1
Cary E, Lahdesmaki I, Badde S. Audiovisual simultaneity windows reflect temporal sensory uncertainty. Psychon Bull Rev 2024. [PMID: 38388825] [DOI: 10.3758/s13423-024-02478-4]
Abstract
The ability to judge the temporal alignment of visual and auditory information is a prerequisite for multisensory integration and segregation. However, each temporal measurement is subject to error. Thus, when judging whether a visual and auditory stimulus were presented simultaneously, observers must rely on a subjective decision boundary to distinguish between measurement error and truly misaligned audiovisual signals. Here, we tested whether these decision boundaries are relaxed with increasing temporal sensory uncertainty, i.e., whether participants make the same type of adjustment an ideal observer would make. Participants judged the simultaneity of audiovisual stimulus pairs with varying temporal offset, while being immersed in different virtual environments. To obtain estimates of participants' temporal sensory uncertainty and simultaneity criteria in each environment, an independent-channels model was fitted to their simultaneity judgments. In two experiments, participants' simultaneity decision boundaries were predicted by their temporal uncertainty, which varied unsystematically with the environment. Hence, observers used a flexibly updated estimate of their own audiovisual temporal uncertainty to establish subjective criteria of simultaneity. This finding implies that, under typical circumstances, audiovisual simultaneity windows reflect an observer's cross-modal temporal uncertainty.
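The model described here can be made concrete with a short sketch. Below is a minimal independent-channels-style simultaneity-judgment model, assuming Gaussian measurement noise and a symmetric decision criterion; the parameterization, data, and fitting choices are illustrative stand-ins, not the authors' actual code.

```python
# Minimal sketch of an independent-channels-style simultaneity-judgment
# model, assuming Gaussian-distributed measurement noise (sd = sigma) and a
# symmetric decision criterion +/- c: "simultaneous" is reported whenever
# the noisy asynchrony measurement falls within the criterion. All data and
# parameter values below are synthetic illustrations.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def p_sync(soa, mu, sigma, c):
    """P('simultaneous') at stimulus onset asynchrony `soa` (ms)."""
    return norm.cdf((c - soa - mu) / sigma) - norm.cdf((-c - soa - mu) / sigma)

def neg_log_lik(params, soa, n_sync, n_trials):
    mu, sigma, c = params
    p = np.clip(p_sync(soa, mu, sigma, c), 1e-6, 1 - 1e-6)
    return -np.sum(n_sync * np.log(p) + (n_trials - n_sync) * np.log(1 - p))

soa = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400], float)
rng = np.random.default_rng(0)
n_trials = 40
n_sync = rng.binomial(n_trials, p_sync(soa, 20.0, 90.0, 150.0))  # simulate

fit = minimize(neg_log_lik, x0=(0.0, 100.0, 100.0),
               args=(soa, n_sync, n_trials),
               bounds=[(-200, 200), (10, 500), (10, 500)])
mu_hat, sigma_hat, c_hat = fit.x
print(f"mu = {mu_hat:.1f} ms, sigma = {sigma_hat:.1f} ms, c = {c_hat:.1f} ms")
```

In these terms, the study's question is whether the fitted criterion c grows with the fitted uncertainty sigma across virtual environments, as an ideal observer's would.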
Affiliation(s)
- Emma Cary: Department of Psychology, Tufts University, Medford, MA 02155, USA
- Ilona Lahdesmaki: Department of Psychology, Tufts University, Medford, MA 02155, USA
- Stephanie Badde: Department of Psychology, Tufts University, Medford, MA 02155, USA
2
Heins N, Pomp J, Kluger DS, Vinbrüx S, Trempler I, Kohler A, Kornysheva K, Zentgraf K, Raab M, Schubotz RI. Surmising synchrony of sound and sight: Factors explaining variance of audiovisual integration in hurdling, tap dancing and drumming. PLoS One 2021; 16:e0253130. [PMID: 34293800] [PMCID: PMC8298114] [DOI: 10.1371/journal.pone.0253130]
Abstract
Auditory and visual percepts are integrated even when they are not perfectly temporally aligned with each other, especially when the visual signal precedes the auditory signal. This window of temporal integration for asynchronous audiovisual stimuli is relatively well examined in the case of speech, while other natural action-induced sounds have been widely neglected. Here, we studied the detection of audiovisual asynchrony in three different whole-body actions with natural action-induced sounds: hurdling, tap dancing, and drumming. In Study 1, we examined whether audiovisual asynchrony detection, assessed by a simultaneity judgment task, differs as a function of sound production intentionality. Based on previous findings, we expected that auditory and visual signals should be integrated over a wider temporal window for actions creating sounds intentionally (tap dancing) than for actions creating sounds incidentally (hurdling). While percentages of perceived synchrony differed in the expected way, we identified two further factors, high event density and low rhythmicity, that also induced higher synchrony ratings. Therefore, in Study 2 we systematically varied event density and rhythmicity, this time using drumming stimuli to exert full control over these variables, in the same simultaneity judgment task. Results suggest that high event density biases observers to integrate rather than segregate auditory and visual signals, even at relatively large asynchronies. Rhythmicity had a similar, albeit weaker, effect when event density was low. Our findings demonstrate that shorter asynchronies and visual-first asynchronies lead to higher synchrony ratings of whole-body action, pointing to clear parallels with audiovisual integration in speech perception. Overconfidence in the naturally expected synchrony of sound and sight was stronger for intentional (vs. incidental) sound production and for movements with high (vs. low) rhythmicity, presumably because both encourage predictive processes. In contrast, high event density appears to increase synchrony judgments simply because it makes the detection of audiovisual asynchrony more difficult. More studies using real-life audiovisual stimuli with varying event densities and rhythmicities are needed to fully uncover the general mechanisms of audiovisual integration.
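As a hedged illustration of the two stimulus factors isolated in Study 2, the sketch below generates onset sequences in which event density (events per second) and rhythmicity (inverse of timing jitter) are varied independently; the generation scheme and parameter values are assumptions for illustration, not the authors' stimulus-construction procedure.

```python
# Hedged sketch of manipulating the two stimulus factors from Study 2 in an
# onset sequence: event density = events per second, rhythmicity = inverse
# of timing jitter around an isochronous grid.
import numpy as np

def onset_sequence(duration_s, events_per_s, jitter_sd_s, rng):
    """Isochronous onsets perturbed by Gaussian jitter (low jitter = high
    rhythmicity); spacing of the base grid sets event density."""
    base = np.arange(0.0, duration_s, 1.0 / events_per_s)
    return np.clip(np.sort(base + rng.normal(0.0, jitter_sd_s, base.size)),
                   0.0, duration_s)

rng = np.random.default_rng(1)
dense_rhythmic = onset_sequence(10.0, 4.0, 0.005, rng)     # many, regular hits
sparse_arrhythmic = onset_sequence(10.0, 1.0, 0.120, rng)  # few, irregular hits
print(len(dense_rhythmic), "vs", len(sparse_arrhythmic), "events in 10 s")
```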
Affiliation(s)
- Nina Heins: Department of Psychology, University of Muenster, Muenster, Germany; Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Jennifer Pomp: Department of Psychology, University of Muenster, Muenster, Germany; Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Daniel S. Kluger: Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany; Institute for Biomagnetism and Biosignal Analysis, University Hospital Muenster, Muenster, Germany
- Stefan Vinbrüx: Institute of Sport and Exercise Sciences, Human Performance and Training, University of Muenster, Muenster, Germany
- Ima Trempler: Department of Psychology, University of Muenster, Muenster, Germany; Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Axel Kohler: Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
- Katja Kornysheva: School of Psychology and Bangor Neuroimaging Unit, Bangor University, Wales, United Kingdom
- Karen Zentgraf: Department of Movement Science and Training in Sports, Institute of Sport Sciences, Goethe University Frankfurt, Frankfurt, Germany
- Markus Raab: Institute of Psychology, German Sport University Cologne, Cologne, Germany; School of Applied Sciences, London South Bank University, London, United Kingdom
- Ricarda I. Schubotz: Department of Psychology, University of Muenster, Muenster, Germany; Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Muenster, Muenster, Germany
3
Keeping in time with social and non-social stimuli: Synchronisation with auditory, visual, and audio-visual cues. Sci Rep 2021; 11:8805. [PMID: 33888822] [PMCID: PMC8062473] [DOI: 10.1038/s41598-021-88112-y]
Abstract
Everyday social interactions require us to closely monitor, predict, and synchronise our movements with those of an interacting partner. Experimental studies of social synchrony typically examine the social-cognitive outcomes associated with synchrony, such as affiliation. On the other hand, research on the sensorimotor aspects of synchronisation generally uses non-social stimuli (e.g. a moving dot). To date, the differences in sensorimotor aspects of synchronisation to social compared to non-social stimuli remain largely unknown. The present study aims to address this gap using a verbal response paradigm where participants were asked to synchronise a 'ba' response in time with social and non-social stimuli, which were presented auditorily, visually, or audio-visually combined. For social stimuli a video/audio recording of an actor performing the same verbal 'ba' response was presented, whereas for non-social stimuli a moving dot, an auditory metronome or both combined were presented. The impact of autistic traits on participants' synchronisation performance was examined using the Autism Spectrum Quotient (AQ). Our results revealed more accurate synchronisation for social compared to non-social stimuli, suggesting that greater familiarity with and motivation in attending to social stimuli may enhance our ability to better predict and synchronise with them. Individuals with fewer autistic traits demonstrated greater social learning, as indexed through an improvement in synchronisation performance to social vs non-social stimuli across the experiment.
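One simple way to quantify synchronisation performance of this kind is to pair each response onset with the nearest cue onset and summarise the signed asynchronies; the sketch below does this with synthetic data and is an illustrative convention, not the paper's exact scoring method.

```python
# Illustrative scoring of synchronisation accuracy: pair each vocal response
# onset with the nearest cue onset and summarise the signed asynchronies.
import numpy as np

def asynchronies(cue_onsets, response_onsets):
    cues = np.asarray(cue_onsets)
    resp = np.asarray(response_onsets)
    nearest = np.abs(resp[:, None] - cues[None, :]).argmin(axis=1)
    return resp - cues[nearest]          # positive = response lagged the cue

cues = np.arange(0.0, 10.0, 0.75)        # e.g., a cue every 750 ms
rng = np.random.default_rng(2)
resp = cues + rng.normal(0.040, 0.030, cues.size)  # lagging, variable 'ba's
asyn = asynchronies(cues, resp)
print(f"mean asynchrony {asyn.mean()*1000:.0f} ms, "
      f"variability (sd) {asyn.std()*1000:.0f} ms")
```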
4
Li S, Ding Q, Yuan Y, Yue Z. Audio-Visual Causality and Stimulus Reliability Affect Audio-Visual Synchrony Perception. Front Psychol 2021; 12:629996. [PMID: 33679553] [PMCID: PMC7930005] [DOI: 10.3389/fpsyg.2021.629996]
Abstract
People can discriminate the synchrony between audio-visual scenes. However, the sensitivity of audio-visual synchrony perception can be affected by many factors. Using a simultaneity judgment task, the present study investigated whether the synchrony perception of complex audio-visual stimuli was affected by audio-visual causality and stimulus reliability. In Experiment 1, the results showed that audio-visual causality could increase one's sensitivity to audio-visual onset asynchrony (AVOA) of both action stimuli and speech stimuli. Moreover, participants were more tolerant of AVOA of speech stimuli than that of action stimuli in the high causality condition, whereas no significant difference between these two kinds of stimuli was found in the low causality condition. In Experiment 2, the speech stimuli were manipulated with either high or low stimulus reliability. The results revealed a significant interaction between audio-visual causality and stimulus reliability. Under the low causality condition, the percentage of “synchronous” responses of audio-visual intact stimuli was significantly higher than that of visual_intact/auditory_blurred stimuli and audio-visual blurred stimuli. In contrast, no significant difference among all levels of stimulus reliability was observed under the high causality condition. Our study supported the synergistic effect of top-down processing and bottom-up processing in audio-visual synchrony perception.
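Stimulus reliability manipulations of the blurred/intact kind can be approximated by spatial low-pass filtering of the video frames. The sketch below shows the general idea with scipy's Gaussian filter; the frames and the blur width are hypothetical stand-ins, not the study's materials or settings.

```python
# Sketch of one way to create a low-reliability visual stimulus: spatial
# Gaussian low-pass filtering of each video frame.
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_frames(frames, sigma_px):
    """frames: (n_frames, height, width) grayscale array; blur space only."""
    return gaussian_filter(frames, sigma=(0.0, sigma_px, sigma_px))

frames = np.random.default_rng(3).random((30, 120, 160))  # stand-in video
low_reliability = degrade_frames(frames, sigma_px=6.0)
```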
Affiliation(s)
- Shao Li: Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Qi Ding: Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Yichen Yuan: Department of Psychology, Sun Yat-sen University, Guangzhou, China
- Zhenzhu Yue: Department of Psychology, Sun Yat-sen University, Guangzhou, China
5
Heins N, Trempler I, Zentgraf K, Raab M, Schubotz RI. Too Late! Influence of Temporal Delay on the Neural Processing of One's Own Incidental and Intentional Action-Induced Sounds. Front Neurosci 2020; 14:573970. [PMID: 33250704] [PMCID: PMC7674666] [DOI: 10.3389/fnins.2020.573970]
Abstract
The influence of delayed auditory feedback on action evaluation and execution of real-life action-induced sounds apart from language and music is still poorly understood. Here, we examined how a temporal delay impacted the behavioral evaluation and neural representation of hurdling and tap-dancing actions in a functional magnetic resonance imaging (fMRI) experiment, postulating that effects of delay diverge between the two, as we create action-induced sounds intentionally in tap dancing, but incidentally in hurdling. Based on previous findings, we expected that conditions differ regarding the engagement of the supplementary motor area (SMA), posterior superior temporal gyrus (pSTG), and primary auditory cortex (A1). Participants were videotaped during a 9-week training of hurdling and tap dancing; in the fMRI scanner, they were presented with point-light videos of their own training videos, including the original or the slightly delayed sound, and had to evaluate how well they performed on each single trial. For the undelayed conditions, we replicated A1 attenuation and enhanced pSTG and SMA engagement for tap dancing (intentionally generated sounds) vs. hurdling (incidentally generated sounds). Delayed auditory feedback did not negatively influence behavioral rating scores in general. Blood-oxygen-level-dependent (BOLD) response transiently increased and then adapted to repeated presentation of point-light videos with delayed sound in pSTG. This region also showed a significantly stronger correlation with the SMA under delayed feedback. Notably, SMA activation increased more for delayed feedback in the tap-dancing condition, covarying with higher rating scores. Findings suggest that action evaluation is more strongly based on top-down predictions from SMA when sounds of intentional action are distorted.
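The delayed-feedback manipulation itself is straightforward to express in code: shift the sound track by padding it with leading silence. The sketch below is a generic illustration; the delay value and sample rate are assumptions, not the study's parameters.

```python
# Generic sketch of a delayed-auditory-feedback manipulation: shift the
# sound track by padding it with leading silence, keeping length fixed.
import numpy as np

def delay_audio(samples, delay_ms, sr):
    pad = np.zeros(int(round(sr * delay_ms / 1000.0)), dtype=samples.dtype)
    return np.concatenate([pad, samples])[:samples.size]

sr = 44100
sound = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1-s stand-in tone
delayed = delay_audio(sound, delay_ms=200, sr=sr)
```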
Affiliation(s)
- Nina Heins: Department of Psychology, University of Münster, Münster, Germany; Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Ima Trempler: Department of Psychology, University of Münster, Münster, Germany; Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Karen Zentgraf: Institute for Sport Sciences, Goethe University Frankfurt, Frankfurt, Germany
- Markus Raab: Institute of Psychology, German Sport University Cologne, Cologne, Germany; School of Applied Sciences, London South Bank University, London, United Kingdom
- Ricarda I Schubotz: Department of Psychology, University of Münster, Münster, Germany; Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
6
Abstract
It is proposed that the perceived present is not a moment in time, but an information structure comprising an integrated set of products of perceptual processing. All information in the perceived present carries an informational time marker identifying it as "present". This marker is exclusive to information in the perceived present. There are other kinds of time markers, such as ordinality ("this stimulus occurred before that one") and duration ("this stimulus lasted for 50 ms"). These are different from the "present" time marker and may be attached to information regardless of whether it is in the perceived present or not. It is proposed that the perceived present is a very short-term and very high-capacity holding area for perceptual information. The maximum holding time for any given piece of information is ~100 ms: This is affected by the need to balance the value of informational persistence for further processing against the problem of obsolescence of the information. The main function of the perceived present is to facilitate access by other specialized, automatic processes.
Affiliation(s)
- Peter A White: School of Psychology, Cardiff University, Tower Building, Park Place, Cardiff, Wales, CF10 3YG, UK
7
Deploying attention to the target location of a pointing action modulates audiovisual processes at nontarget locations. Atten Percept Psychophys 2020; 82:3507-3520. [PMID: 32676805] [DOI: 10.3758/s13414-020-02065-4]
Abstract
The current study examined how the deployment of spatial attention at the onset of a pointing movement influenced audiovisual crossmodal interactions at the target of the pointing action and at nontarget locations. These interactions were quantified by measuring susceptibility to the fission illusion (reporting two flashes when one flash is paired with two auditory beeps) and the fusion illusion (reporting one flash when two flashes are paired with one beep). At movement onset, unimodal stimuli or auditory-visual bimodal stimuli were presented either at the target of the pointing action or in an adjacent, nontarget location. In Experiment 1, perceptual accuracy within the unimodal and bimodal conditions was lower in the nontarget than in the target condition. The fission illusion was uninfluenced by target condition; the fusion illusion, however, was more likely to be reported at the target than at the nontarget location. In Experiment 2, the stimuli from Experiment 1 were further presented at a location near where the eyes were fixated (congruent condition), where the hand was aiming (target), or in a location where neither the eyes were fixated nor the hand was aiming. Susceptibility to the fusion illusion was greatest when the visual location and movement end points were congruent, relative to when either movement or fixation was incongruent. Although attention may facilitate the processing of unisensory and multisensory cues in general, it might have the strongest influence on the audiovisual integration mechanisms that underlie the sound-induced fusion illusion.
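Susceptibility to the two illusions can be scored directly from trial records, as in the hedged sketch below: fission is the proportion of "two flashes" reports on one-flash/two-beep trials, fusion the proportion of "one flash" reports on two-flash/one-beep trials. Field names and the toy data are hypothetical.

```python
# Hedged sketch of scoring fission/fusion illusion susceptibility from
# trial records; field names and data are hypothetical stand-ins.
import numpy as np

def susceptibility(trials, n_flashes, n_beeps, illusory_report):
    mask = (trials['flashes'] == n_flashes) & (trials['beeps'] == n_beeps)
    return np.mean(trials['report'][mask] == illusory_report)

dtype = [('flashes', int), ('beeps', int), ('report', int)]
trials = np.array([(1, 2, 2), (1, 2, 1), (2, 1, 1), (2, 1, 2)], dtype=dtype)
print("fission:", susceptibility(trials, 1, 2, illusory_report=2))
print("fusion: ", susceptibility(trials, 2, 1, illusory_report=1))
```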
8
Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:328. [PMID: 32481538] [PMCID: PMC7348766] [DOI: 10.3390/brainsci10060328]
Abstract
The McGurk effect, in which an incongruent pairing of visual /ga/ with acoustic /ba/ creates the fusion illusion /da/, is the cornerstone of research in audiovisual speech perception. Combination illusions occur when the input modalities are reversed: auditory /ga/ paired with visual /ba/ yields the percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex in the absence of changes to the acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions: fusion, with the visual aspect of the stimulus changing, and combination, with the auditory aspect of the stimulus changing, as well as two unimodal auditory-only and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions, with significant differences between standards and deviants in the N100 time window. The presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created; but when the visual signal is held constant and the auditory signal changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
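The MMN itself is conventionally computed as the deviant-minus-standard difference wave averaged within an a priori time window; the sketch below shows that standard computation on synthetic epochs (sampling rate, window, and effect size are illustrative, not the study's values).

```python
# Standard MMN computation on synthetic epochs: deviant-minus-standard
# difference wave, averaged within an a priori time window.
import numpy as np

def mmn_amplitude(standard_epochs, deviant_epochs, times, window):
    """epochs: (n_trials, n_samples) arrays; times in s; window = (t0, t1)."""
    diff = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    sel = (times >= window[0]) & (times < window[1])
    return diff[sel].mean()

sr = 500.0
times = np.arange(-0.1, 0.5, 1.0 / sr)
rng = np.random.default_rng(4)
standard = rng.normal(0.0, 1.0, (80, times.size))
negativity = 0.5 * ((times > 0.15) & (times < 0.25))   # injected deviance
deviant = rng.normal(0.0, 1.0, (80, times.size)) - negativity
amp = mmn_amplitude(standard, deviant, times, (0.15, 0.25))
print(f"MMN, 150-250 ms window: {amp:.2f} (a.u.)")
```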
Affiliation(s)
- Melissa Randazzo: Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA (corresponding author; Tel.: +1-516-877-4769)
- Ryan Priefer: Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Paul J. Smith: Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Amanda Nagler: Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Trey Avery: Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Karen Froud: Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
9
Zhou HY, Cheung EFC, Chan RCK. Audiovisual temporal integration: Cognitive processing, neural mechanisms, developmental trajectory and potential interventions. Neuropsychologia 2020; 140:107396. [PMID: 32087206] [DOI: 10.1016/j.neuropsychologia.2020.107396]
Abstract
To integrate auditory and visual signals into a unified percept, the paired stimuli must co-occur within a limited time window known as the Temporal Binding Window (TBW). The width of the TBW, a proxy of audiovisual temporal integration ability, has been found to be correlated with higher-order cognitive and social functions. A comprehensive review of studies investigating audiovisual TBW reveals several findings: (1) a wide range of top-down processes and bottom-up features can modulate the width of the TBW, facilitating adaptation to the changing and multisensory external environment; (2) a large-scale brain network works in coordination to ensure successful detection of audiovisual (a)synchrony; (3) developmentally, audiovisual TBW follows a U-shaped pattern across the lifespan, with a protracted developmental course into late adolescence and rebounding in size again in late life; (4) an enlarged TBW is characteristic of a number of neurodevelopmental disorders; and (5) the TBW is highly flexible via perceptual and musical training. Interventions targeting the TBW may be able to improve multisensory function and ameliorate social communicative symptoms in clinical populations.
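To make the "width of the TBW" concrete: one common convention fits a Gaussian-shaped synchrony curve to proportion-"synchronous" data and takes its width at half height. The sketch below illustrates that convention with made-up data; it is one of several TBW definitions in the literature, not necessarily the one used by any study reviewed here.

```python
# One common TBW convention, shown on made-up data: fit a Gaussian-shaped
# synchrony curve and report its full width at half maximum (FWHM).
import numpy as np
from scipy.optimize import curve_fit

def sync_curve(soa, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

soa = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400], float)
p_sync = np.array([0.05, 0.15, 0.45, 0.80, 0.95, 0.85, 0.55, 0.20, 0.05])
(amp, mu, sigma), _ = curve_fit(sync_curve, soa, p_sync, p0=(1.0, 0.0, 150.0))
fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma
print(f"TBW (FWHM) = {fwhm:.0f} ms, centred at {mu:.0f} ms")
```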
Affiliation(s)
- Han-Yu Zhou: Neuropsychology and Applied Cognitive Neuroscience Laboratory, CAS Key Laboratory of Mental Health, Institute of Psychology, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Raymond C K Chan: Neuropsychology and Applied Cognitive Neuroscience Laboratory, CAS Key Laboratory of Mental Health, Institute of Psychology, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
10
Scurry AN, Vercillo T, Nicholson A, Webster M, Jiang F. Aging Impairs Temporal Sensitivity, but not Perceptual Synchrony, Across Modalities. Multisens Res 2019; 32:671-692. [PMID: 31059487] [DOI: 10.1163/22134808-20191343]
Abstract
Encoding the temporal properties of external signals that comprise multimodal events is a major factor guiding everyday experience. However, during the natural aging process, impairments to sensory processing can profoundly affect multimodal temporal perception. Various mechanisms can contribute to temporal perception, and thus it is imperative to understand how each can be affected by age. In the current study, using three different temporal order judgement tasks (unisensory, multisensory, and sensorimotor), we investigated the effects of age on two separate temporal processes: synchronization and integration of multiple signals. These two processes rely on different aspects of temporal information, either the temporal alignment of processed signals or the integration/segregation of signals arising from different modalities, respectively. Results showed that the ability to integrate/segregate multiple signals decreased with age regardless of the task, and that the magnitude of such impairment correlated across tasks, suggesting a widespread mechanism affected by age. In contrast, perceptual synchrony remained stable with age, revealing a distinct intact mechanism. Overall, results from this study suggest that aging has differential effects on temporal processing, and general impairments with aging may impact global temporal sensitivity while context-dependent processes remain unaffected.
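The two processes map onto two standard summaries of a temporal order judgement curve: the point of subjective simultaneity (PSS, perceptual synchrony) and the just noticeable difference (JND, temporal sensitivity). The sketch below fits a cumulative Gaussian to synthetic TOJ data and reads off both; the sign convention, data, and JND definition are common choices, not necessarily the authors'.

```python
# Sketch of PSS and JND estimation from a temporal order judgement curve:
# fit a cumulative Gaussian to P('audio first') as a function of SOA.
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

def cum_gauss(soa, pss, sigma):
    return norm.cdf(soa, loc=pss, scale=sigma)

soa = np.array([-240, -120, -60, 0, 60, 120, 240], float)  # +ve: audio leads
p_audio_first = np.array([0.03, 0.12, 0.33, 0.55, 0.74, 0.90, 0.98])
(pss, sigma), _ = curve_fit(cum_gauss, soa, p_audio_first, p0=(0.0, 80.0))
jnd = sigma * norm.ppf(0.75)   # half the 25-75% spread of the curve
print(f"PSS = {pss:.0f} ms (synchrony), JND = {jnd:.0f} ms (sensitivity)")
```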
Affiliation(s)
- Tiziana Vercillo: Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642, USA
- Alexis Nicholson: Department of Psychology, University of Nevada, Reno, NV 89557, USA
- Michael Webster: Department of Psychology, University of Nevada, Reno, NV 89557, USA
- Fang Jiang: Department of Psychology, University of Nevada, Reno, NV 89557, USA
11
Takehana A, Uehara T, Sakaguchi Y. Audiovisual synchrony perception in observing human motion to music. PLoS One 2019; 14:e0221584. [PMID: 31454393] [PMCID: PMC6711538] [DOI: 10.1371/journal.pone.0221584]
Abstract
To examine how individuals perceive synchrony between music and body motion, we investigated the characteristics of synchrony perception during observation of a Japanese Radio Calisthenics routine. We used the constant stimuli method to present video clips of an individual performing an exercise routine. We generated stimuli with a range of temporal shifts between the visual and auditory streams, and asked participants to make synchrony judgments. We then examined which movement-feature points agreed with music beats when the participants perceived synchrony. We found that extremities (e.g., hands and feet) reached the movement endpoint or moved through the lowest position at music beats associated with synchrony. Movement onsets never agreed with music beats. To investigate whether visual information about the feature points was necessary for synchrony perception, we conducted a second experiment where only limited portions of video clips were presented to the participants. Participants consistently judged synchrony even when the video image did not contain the critical feature points, suggesting that a prediction mechanism contributes to synchrony perception. To discuss the meaning of these feature points with respect to synchrony perception, we examined the temporal relationship between the motion of body parts and the ground reaction force (GRF) of exercise performers, which reflected the total force acting on the performer. Interestingly, vertical GRF showed local peaks consistently synchronized with music beats for most exercises, with timing that was closely correlated with the timing of movement feature points. This result suggests that synchrony perception in humans is based on some global variable anticipated from visual information, instead of the feature points found in the motion of individual body parts. In summary, the present results indicate that synchrony perception during observation of human motion to music depends largely on spatiotemporal prediction of the performer's motion.
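The GRF analysis reduces to a peak-to-beat alignment: find local maxima in the vertical ground reaction force and measure each peak's lag to the nearest music beat. The sketch below shows this logic on a synthetic force trace; the beat grid, sampling rate, and peak-detection settings are illustrative assumptions.

```python
# Sketch of the peak-to-beat alignment analysis: detect local maxima in the
# vertical GRF and measure each peak's lag to the nearest music beat.
import numpy as np
from scipy.signal import find_peaks

sr = 100.0
t = np.arange(0.0, 8.0, 1.0 / sr)
beats = np.arange(0.0, 8.0, 0.5)                            # a beat every 500 ms
grf = 700.0 + 300.0 * np.cos(2 * np.pi * 2.0 * (t - 0.02))  # peaks near beats

peaks, _ = find_peaks(grf, distance=int(0.3 * sr))
peak_times = t[peaks]
nearest = np.abs(peak_times[:, None] - beats[None, :]).argmin(axis=1)
lags = peak_times - beats[nearest]
print(f"mean |lag| of GRF peaks to nearest beat: {np.abs(lags).mean()*1000:.0f} ms")
```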
Affiliation(s)
- Akira Takehana: Department of Mechanical Engineering and Intelligent Systems, Graduate School of Informatics and Engineering, University of Electro-Communications, Chofu, Tokyo, Japan
- Tsukasa Uehara: Department of Mechanical Engineering and Intelligent Systems, Graduate School of Informatics and Engineering, University of Electro-Communications, Chofu, Tokyo, Japan
- Yutaka Sakaguchi: Department of Mechanical Engineering and Intelligent Systems, Graduate School of Informatics and Engineering, University of Electro-Communications, Chofu, Tokyo, Japan; Research Center for Performance Art Science, University of Electro-Communications, Chofu, Tokyo, Japan
12
Dittrich S, Noesselt T. Temporal Audiovisual Motion Prediction in 2D- vs. 3D-Environments. Front Psychol 2018; 9:368. [PMID: 29618999] [PMCID: PMC5871701] [DOI: 10.3389/fpsyg.2018.00368]
Abstract
Predicting motion is essential for many everyday life activities, e.g., in road traffic. Previous studies on motion prediction failed to find consistent results, which might be due to the use of very different stimulus material and behavioural tasks. Here, we directly tested the influence of task (detection, extrapolation) and stimulus features (visual vs. audiovisual and three-dimensional vs. non-three-dimensional) on temporal motion prediction in two psychophysical experiments. In both experiments a ball followed a trajectory toward the observer and temporarily disappeared behind an occluder. In audiovisual conditions a moving white noise (congruent or non-congruent to visual motion direction) was presented concurrently. In experiment 1 the ball reappeared on a predictable or a non-predictable trajectory and participants detected when the ball reappeared. In experiment 2 the ball did not reappear after occlusion and participants judged when the ball would reach a specified position at two possible distances from the occluder (extrapolation task). Both experiments were conducted in three-dimensional space (using a stereoscopic screen and polarised glasses) and also without stereoscopic presentation. Participants benefitted from visually predictable trajectories and concurrent sounds during detection. Additionally, visual facilitation was more pronounced for non-3D stimulation during the detection task. In contrast, for the more complex extrapolation task, group mean results indicated that auditory information impaired motion prediction. However, a post hoc cross-validation procedure (split-half) revealed that participants varied in their ability to use sounds during motion extrapolation. Most participants selectively profited from either near or far extrapolation distances but were impaired for the other. We propose that interindividual differences in extrapolation efficiency might be the mechanism governing this effect. Together, our results indicate that both a realistic experimental environment and subject-specific differences modulate audiovisual motion prediction and need to be considered in future research.
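The extrapolation task can be modelled, at its simplest, as constant-velocity prediction: estimate speed from the visible part of the trajectory and project it through the occlusion. The sketch below encodes that baseline model; all quantities are illustrative, and it deliberately ignores the 3D and auditory factors the experiments manipulated.

```python
# Baseline constant-velocity extrapolation model for the occlusion task:
# project the last seen position forward to a probe position.
def predicted_arrival(last_seen_pos_m, last_seen_t_s, speed_mps, probe_pos_m):
    """Arrival time of a constant-velocity object at `probe_pos_m`."""
    return last_seen_t_s + (probe_pos_m - last_seen_pos_m) / speed_mps

t_near = predicted_arrival(0.0, 1.2, 0.8, probe_pos_m=0.4)  # near probe
t_far = predicted_arrival(0.0, 1.2, 0.8, probe_pos_m=1.0)   # far probe
print(f"predicted arrival: near {t_near:.2f} s, far {t_far:.2f} s")
```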
Affiliation(s)
- Sandra Dittrich: Department of Biological Psychology, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Tömme Noesselt: Department of Biological Psychology, Otto von Guericke University Magdeburg, Magdeburg, Germany; Center for Behavioral Brain Sciences, Magdeburg, Germany
13
Stimulus duration has little effect on auditory, visual and audiovisual temporal order judgement. Exp Brain Res 2018; 236:1273-1282. [DOI: 10.1007/s00221-018-5218-2]
14
Shatzer H, Shen S, Kerlin JR, Pitt MA, Shahin AJ. Neurophysiology underlying influence of stimulus reliability on audiovisual integration. Eur J Neurosci 2018; 48:2836-2848. [PMID: 29363844] [DOI: 10.1111/ejn.13843]
Abstract
We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than blurred visual speech and for less than more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities.
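Alpha-band suppression of the kind reported is commonly quantified by band-pass filtering, taking the Hilbert envelope, and comparing post-onset power with a baseline. The sketch below shows that generic pipeline on a synthetic signal; filter design and epoch timing are assumptions, not the study's analysis parameters.

```python
# Generic pipeline for quantifying alpha (8-14 Hz) suppression: band-pass,
# Hilbert envelope, then compare post-onset power with a pre-onset baseline.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

sr = 250.0
t = np.arange(0.0, 2.0, 1.0 / sr)
rng = np.random.default_rng(5)
# Synthetic channel: alpha present before a stimulus at t = 1 s, absent after.
eeg = np.sin(2 * np.pi * 10.0 * t) * (t < 1.0) + 0.3 * rng.standard_normal(t.size)

sos = butter(4, [8.0, 14.0], btype="band", fs=sr, output="sos")
envelope = np.abs(hilbert(sosfiltfilt(sos, eeg)))
baseline = envelope[t < 1.0].mean()
post = envelope[t >= 1.0].mean()
print(f"alpha envelope change after onset: {100 * (post - baseline) / baseline:.0f}%")
```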
Affiliation(s)
- Hannah Shatzer: Department of Psychology, The Ohio State University, Columbus, OH, USA
- Stanley Shen: Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA 95618, USA
- Jess R Kerlin: Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA 95618, USA
- Mark A Pitt: Department of Psychology, The Ohio State University, Columbus, OH, USA
- Antoine J Shahin: Center for Mind and Brain, University of California, 267 Cousteau Place, Davis, CA 95618, USA
15
Shahin AJ, Shen S, Kerlin JR. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech. Lang Cogn Neurosci 2017; 32:1102-1118. [PMID: 28966930] [PMCID: PMC5617130] [DOI: 10.1080/23273798.2017.1283428]
Abstract
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, visual speech leading (Experiment 1) or lagging (Experiment 2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact rather than blurred (Experiment 1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
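Noise vocoding with a variable channel count, as used here, can be sketched as: split the signal into log-spaced bands, extract each band's amplitude envelope, and use it to modulate band-limited noise. The following is a minimal illustrative vocoder, not the authors' stimulus-processing code; band edges and filter order are assumptions.

```python
# Minimal illustrative n-channel noise vocoder: per band, extract the
# amplitude envelope and use it to modulate band-limited noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, sr, n_channels, f_lo=100.0, f_hi=8000.0, seed=0):
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)      # log-spaced bands
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=sr, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))                  # slow amplitude cue
        carrier = sosfiltfilt(sos, rng.standard_normal(signal.size))
        out += envelope * carrier                         # modulated noise
    return out

sr = 22050
t = np.arange(0.0, 1.0, 1.0 / sr)
speechlike = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded_4ch = noise_vocode(speechlike, sr, n_channels=4)
```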
Affiliation(s)
- Antoine J Shahin: Center for Mind and Brain, University of California, Davis, CA 95618
- Stanley Shen: Center for Mind and Brain, University of California, Davis, CA 95618
- Jess R Kerlin: Center for Mind and Brain, University of California, Davis, CA 95618
16
Venezia JH, Thurman SM, Matchin W, George SE, Hickok G. Timing in audiovisual speech perception: A mini review and new psychophysical data. Atten Percept Psychophys 2016; 78:583-601. [PMID: 26669309] [PMCID: PMC4744562] [DOI: 10.3758/s13414-015-1026-y]
Abstract
Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
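The classification logic can be illustrated compactly: correlate the random per-frame visibility of the mouth region with the trial-by-trial percept, so that frames whose visibility systematically differs between response classes are flagged as perceptually relevant. Everything in the sketch below (response model, frame counts, "critical" frames) is a synthetic stand-in for the authors' high-resolution spatiotemporal maps.

```python
# Compact illustration of the classification logic: relate random per-frame
# mask visibility to the trial-by-trial percept. All values are synthetic.
import numpy as np

rng = np.random.default_rng(6)
n_trials, n_frames = 500, 45
visibility = rng.random((n_trials, n_frames))   # per-frame mask transparency
critical = slice(10, 16)                        # hypothetical informative frames
drive = visibility[:, critical].mean(axis=1)
# When the critical mouth frames are visible, the visual /aka/ dominates and
# /apa/ is reported less often (synthetic response rule).
responded_apa = rng.random(n_trials) < np.clip(0.45 - 0.4 * drive, 0.0, 1.0)

# Classification weights: frames whose visibility differs between response
# classes are the perceptually relevant ones (here, strongly negative).
weights = (visibility[responded_apa].mean(axis=0)
           - visibility[~responded_apa].mean(axis=0))
print("most influential frames:", np.sort(np.argsort(np.abs(weights))[-3:]))
```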
Affiliation(s)
- Jonathan H Venezia: Department of Cognitive Sciences, University of California, Irvine, CA 92697, USA
- Steven M Thurman: Department of Psychology, University of California, Los Angeles, CA, USA
- William Matchin: Department of Linguistics, University of Maryland, Baltimore, MD, USA
- Sahara E George: Department of Anatomy and Neurobiology, University of California, Irvine, CA, USA
- Gregory Hickok: Department of Cognitive Sciences, University of California, Irvine, CA 92697, USA