1
Giurgola S, Lo Gerfo E, Farnè A, Roy AC, Bolognini N. Multisensory integration and motor resonance in the primary motor cortex. Cortex 2024; 179:235-246. PMID: 39213776. DOI: 10.1016/j.cortex.2024.07.015.
Abstract
Humans are endowed with a motor system that resonates to speech sounds, but whether concurrent visual information from lip movements can improve speech perception at a motor level through multisensory integration mechanisms remains unknown. The aim of this study was therefore to explore behavioral and neurophysiological correlates of multisensory influences on motor resonance in speech perception. Motor-evoked potentials (MEPs), elicited by single-pulse transcranial magnetic stimulation (TMS) applied over the left lip muscle (orbicularis oris) representation in the primary motor cortex, were recorded in healthy participants during the presentation of syllables in unimodal (visual or auditory) or multisensory (audio-visual) congruent or incongruent conditions. At the behavioral level, participants identified syllables better in the congruent audio-visual condition than in the unimodal conditions, showing a multisensory enhancement effect. Accordingly, at the neurophysiological level, increased MEP amplitudes were found in the congruent audio-visual condition compared with the unimodal ones. Incongruent audio-visual syllables resulting in illusory percepts did not increase corticospinal excitability, which was in fact comparable to that induced by real perception of the same syllable. In conclusion, seeing and hearing congruent bilabial syllables increases the excitability of the lip representation in the primary motor cortex, documenting that multisensory integration can facilitate speech processing by influencing motor resonance. These findings highlight the modulatory role of multisensory processing, showing that it can boost speech perception and that multisensory interactions occur not only within higher-order regions but also within primary motor areas, as indexed by corticospinal excitability changes.
Affiliation(s)
- Serena Giurgola
- Department of Psychology & NeuroMI - Milan Center for Neuroscience, University of Milano-Bicocca, Milan, Italy.
- Alessandro Farnè
- Impact Team of the Lyon Neuroscience Research Centre, INSERM U1028 CNRS UMR5292, University Claude Bernard Lyon 1, Lyon, France
- Alice C Roy
- Laboratoire Dynamique du Langage, Centre National de la Recherche Scientifique, UMR 5596, CNRS Université de Lyon 2, Lyon, France
- Nadia Bolognini
- Department of Psychology & NeuroMI - Milan Center for Neuroscience, University of Milano-Bicocca, Milan, Italy; IRCCS Istituto Auxologico Italiano, Laboratory of Neuropsychology, Milan, Italy.
2
Singh L, Tan A, Quinn PC. Infants recognize words spoken through opaque masks but not through clear masks. Dev Sci 2021; 24:e13117. PMID: 33942441. PMCID: PMC8236912. DOI: 10.1111/desc.13117.
Abstract
COVID-19 has modified numerous aspects of children's social environments. Many children are now spoken to through a mask. There is little empirical evidence attesting to the effects of masked language input on language processing. In addition, not much is known about the effects of clear masks (i.e., transparent face shields) versus opaque masks on language comprehension in children. In the current study, 2-year-old infants were tested on their ability to recognize familiar spoken words in three conditions: words presented with no mask, words presented through a clear mask, and words presented through an opaque mask. Infants were able to recognize familiar words presented without a mask and when hearing words through opaque masks, but not when hearing words through clear masks. Findings suggest that the ability of infants to recover spoken language input through masks varies depending on the surface properties of the mask.
Affiliation(s)
- Leher Singh
- Department of Psychology, National University of Singapore, Singapore
- Agnes Tan
- Department of Psychology, National University of Singapore, Singapore
- Paul C Quinn
- Department of Psychological and Brain Sciences, University of Delaware, Newark, Delaware, USA
3
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk and MacDonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
4
Ujiie Y, Takahashi K. Weaker McGurk Effect for Rubin's Vase-Type Speech in People With High Autistic Traits. Multisens Res 2021; 34:1-17. PMID: 33873157. DOI: 10.1163/22134808-bja10047.
Abstract
While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developed individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin's vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2 years, SD: 1.13) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the AQ-Japanese version. The results showed that accuracy of speech recognition for visual (i.e., lip-reading) and auditory stimuli was not significantly related to participants' AQ. In contrast, audiovisual speech perception was less influenced by facial speech among individuals with high rather than low autistic traits. The weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech.
Affiliation(s)
- Yuta Ujiie
- Graduate School of Psychology, Chukyo University, 101-2 Yagoto Honmachi, Showa-ku, Nagoya-shi, Aichi, 466-8666, Japan
- Japan Society for the Promotion of Science, Kojimachi Business Center Building, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Research and Development Initiative, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
- Kohske Takahashi
- School of Psychology, Chukyo University, 101-2 Yagoto Honmachi, Showa-ku, Nagoya-shi, Aichi, 466-8666, Japan
5
Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. PMID: 31264597. DOI: 10.1163/22134808-00002565.
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a prototypical case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This calls into question the suitability of the illusion as a tool for quantifying the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
6
Morís Fernández L, Macaluso E, Soto-Faraco S. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion. Hum Brain Mapp 2017; 38:5691-5705. PMID: 28792094. DOI: 10.1002/hbm.23758.
Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two models have been used very often to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or non-integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, despite the stimuli being physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise.
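To make the design logic concrete, the following minimal sketch (not the authors' analysis pipeline; the trial labels, effect coding, and contrast vectors are invented for illustration) shows how a 2 × 2 factorial design crossing stimulus type with perceptual outcome can be coded, with contrasts for the two main effects and their interaction.

```python
import numpy as np

# Illustrative 2 x 2 factorial coding: stimulus type (McGurk vs. non-McGurk)
# crossed with perceptual outcome (integrated vs. not integrated).
# The trial labels below are invented for the example.
stimulus = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # 1 = McGurk, 0 = non-McGurk
outcome  = np.array([1, 0, 1, 0, 1, 1, 0, 0])   # 1 = integrated (illusory) percept

# Effect-coded regressors (+1 / -1) and their interaction term.
s = np.where(stimulus == 1, 1, -1)
o = np.where(outcome == 1, 1, -1)
interaction = s * o

# Design matrix: intercept, main effect of stimulus, main effect of
# outcome, and the stimulus-by-outcome interaction.
X = np.column_stack([np.ones_like(s), s, o, interaction])

# Contrast vectors that could be tested against fitted betas:
main_effect_stimulus = np.array([0, 1, 0, 0])   # AV mismatch vs. AV match
main_effect_outcome  = np.array([0, 0, 1, 0])   # integrated vs. non-integrated
interaction_contrast = np.array([0, 0, 0, 1])   # stimulus x outcome
print(X)
```

In a full fMRI analysis these regressors would enter a per-voxel general linear model; the sketch only illustrates how the factorial cells and contrasts are coded.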
Affiliation(s)
- Luis Morís Fernández
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain
- Emiliano Macaluso
- Neuroimaging Laboratory, Santa Lucia Foundation, Rome, Italy; ImpAct Team, Lyon Neuroscience Research Center (UCBL1, INSERM 1028, CNRS 5292), Lyon, France
- Salvador Soto-Faraco
- Multisensory Research Group, Center for Brain and Cognition, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
7
Affiliation(s)
- Stefan R. Schweinberger
- Department of General Psychology, Friedrich Schiller University and DFG Research Unit Person Perception, Jena, Germany
- David M.C. Robertson
- Department of General Psychology, Friedrich Schiller University and DFG Research Unit Person Perception, Jena, Germany
8
Abstract
The relationships of autism quotient (AQ), systematizing (SQ), and empathizing (EQ) with over-selectivity were explored to assess whether over-selectivity is implicated in complex social skills, which has been assumed but not experimentally examined. Eighty participants (aged 18–60) were trained on a simultaneous discrimination task (AB+CD−) and tested in extinction on the degree to which they had learned about both elements of the reinforced (AB) compound. Higher AQ and lower EQ scorers demonstrated greater over-selectivity, but there was no relationship between SQ and over-selectivity. These results imply that high AQ scorers perform similarly to individuals with ASD on this cognitive task, and that over-selectivity may be related to some complex social skills, like empathy.
Affiliation(s)
- Phil Reed
- Department of Psychology, Swansea University, Singleton Park, Swansea, SA2 8PP, UK.
9
McCotter MV, Jordan TR. The Role of Facial Colour and Luminance in Visual and Audiovisual Speech Perception. Perception 2003; 32:921-36. PMID: 14580139. DOI: 10.1068/p3316.
Abstract
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In Experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In Experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
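As a rough sketch of the kinds of display manipulations described above, the code below builds grey-scale, luminance-inverted, and colour-inverted versions of an image; the synthetic RGB frame and the HSV-based approximations of luminance and colour inversion are assumptions of this example, not the procedure used to prepare the actual face stimuli.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

# Synthetic stand-in for one RGB video frame, values in [0, 1].
rng = np.random.default_rng(0)
frame = rng.random((120, 90, 3))

# Grey-scale (GS): standard luma weighting of the RGB channels.
grey = frame @ np.array([0.299, 0.587, 0.114])
gs_frame = np.repeat(grey[..., None], 3, axis=-1)

# Luminance-inverted (LI) approximation: flip the HSV value channel,
# leaving hue and saturation (and hence apparent colour) untouched.
hsv = rgb_to_hsv(frame)
hsv[..., 2] = 1.0 - hsv[..., 2]
li_frame = hsv_to_rgb(hsv)

# Colour-inverted (CI) approximation: rotate hue by 180 degrees while
# keeping the value (brightness) channel as recorded.
hsv_ci = rgb_to_hsv(frame)
hsv_ci[..., 0] = (hsv_ci[..., 0] + 0.5) % 1.0
ci_frame = hsv_to_rgb(hsv_ci)

print(gs_frame.shape, li_frame.shape, ci_frame.shape)
```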
Affiliation(s)
- Maxine V McCotter
- School of Psychology, University of Nottingham, University Park, Nottingham NG7 2RD, UK.
10
Jaekl P, Pesquita A, Alsius A, Munhall K, Soto-Faraco S. The contribution of dynamic visual cues to audiovisual speech perception. Neuropsychologia 2015; 75:402-10. PMID: 26100561. DOI: 10.1016/j.neuropsychologia.2015.06.025.
Abstract
Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point-light displays achieved via motion capture of the original talker. Point-light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared with an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech.
Affiliation(s)
- Philip Jaekl
- Center for Visual Science and Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.
- Ana Pesquita
- UBC Vision Lab, Department of Psychology, University of British Columbia, Vancouver, BC, Canada
- Agnes Alsius
- Department of Psychology, Queen's University, Kingston, ON, Canada
- Kevin Munhall
- Department of Psychology, Queen's University, Kingston, ON, Canada
- Salvador Soto-Faraco
- Centre for Brain and Cognition, Department of Information Technology and Communications, Universitat Pompeu Fabra, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain
11
Jordan TR, Sheen M, Abedipour L, Paterson KB. Visual speech perception in foveal and extrafoveal vision: further implications for divisions in hemispheric projections. PLoS One 2014; 9:e98273. PMID: 25032950. PMCID: PMC4102446. DOI: 10.1371/journal.pone.0098273.
Abstract
It has often been argued that, when observing a talking face, visual speech to the left and right of fixation may produce differences in performance due to divided projections to the two cerebral hemispheres. However, while it seems likely that such a division in hemispheric projections exists for areas away from fixation, the nature and existence of a functional division in visual speech perception at the foveal midline remain to be determined. We investigated this issue by presenting visual speech in matched hemiface displays to the left and right of a central fixation point, either exactly abutting the foveal midline or else located away from the midline in extrafoveal vision. The location of displays relative to the foveal midline was controlled precisely using an automated, gaze-contingent eye-tracking procedure. Visual speech perception showed a clear right hemifield advantage when presented in extrafoveal locations but no hemifield advantage (left or right) when presented abutting the foveal midline. Thus, while visual speech observed in extrafoveal vision appears to benefit from unilateral projections to left-hemisphere processes, no evidence was obtained to indicate that a functional division exists when visual speech is observed around the point of fixation. Implications of these findings for understanding visual speech perception and the nature of functional divisions in hemispheric projection are discussed.
Affiliation(s)
- Lily Abedipour
- School of Psychology, University of Leicester, Leicester, United Kingdom
- Kevin B. Paterson
- School of Psychology, University of Leicester, Leicester, United Kingdom
12
van Wassenhove V. Speech through ears and eyes: interfacing the senses with the supramodal brain. Front Psychol 2013; 4:388. PMID: 23874309. PMCID: PMC3709159. DOI: 10.3389/fpsyg.2013.00388.
Abstract
Our understanding of auditory-visual (AV) speech integration has greatly benefited from recent advances in neurosciences and multisensory research. AV speech integration raises numerous questions relevant to the computational rules needed for binding information (within and across sensory modalities), the representational format in which speech information is encoded in the brain (e.g., auditory vs. articulatory), and how AV speech ultimately interfaces with the linguistic system. The following non-exhaustive review provides a set of empirical findings and theoretical questions that have fed the original proposal for predictive coding in AV speech processing. More recently, predictive coding has pervaded many fields of inquiry and positively reinforced the need to refine the notion of internal models in the brain, together with their implications for the interpretation of neural activity recorded with various neuroimaging techniques. However, it is argued here that the strength of predictive coding frameworks resides in the specificity of the generative internal models, not in their generality; specifically, internal models come with a set of rules applied to particular representational formats, themselves depending on the levels and the network structure at which predictive operations occur. As such, predictive coding in AV speech needs to specify the level(s) and the kinds of internal predictions that are necessary to account for the perceptual benefits or illusions observed in the field. Among those specifications, the actual content of a prediction comes first and foremost, followed by the representational granularity of that prediction in time. This review specifically presents a focused discussion on these issues.
Affiliation(s)
- Virginie van Wassenhove
- Cognitive Neuroimaging Unit, Brain Dynamics, INSERM U992, Gif/Yvette, France; NeuroSpin Center, CEA, DSV/I2BM, Gif/Yvette, France; Cognitive Neuroimaging Unit, University Paris-Sud, Gif/Yvette, France
13
When half a face is as good as a whole: Effects of simple substantial occlusion on visual and audiovisual speech perception. Atten Percept Psychophys 2011; 73:2270-85. DOI: 10.3758/s13414-011-0152-4.
14
Legault I, Gagné JP, Rhoualem W, Anderson-Gosselin P. The effects of blurred vision on auditory-visual speech perception in younger and older adults. Int J Audiol 2010; 49:904-11. DOI: 10.3109/14992027.2010.509112.
15
Bernstein LE, Jiang J, Pantazis D, Lu ZL, Joshi A. Visual phonetic processing localized using speech and nonspeech face gestures in video and point-light displays. Hum Brain Mapp 2010; 32:1660-76. PMID: 20853377. DOI: 10.1002/hbm.21139.
Abstract
The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and nonspeech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed localizers for the fusiform face area (FFA), the lateral occipital complex (LOC), and the visual motion (V5/MT) regions of interest (ROIs). The FFA, the LOC, and V5/MT were significantly less activated for speech relative to nonspeech and control stimuli. Distinct activation of the posterior superior temporal sulcus and the adjacent middle temporal gyrus to speech, independent of media, was obtained in group analyses. Individual analyses showed that speech and nonspeech stimuli were associated with adjacent but different activations, with the speech activations more anterior. We suggest that the speech activation area is the temporal visual speech area (TVSA), and that it can be localized with the combination of stimuli used in this study.
Affiliation(s)
- Lynne E Bernstein
- Division of Communication and Auditory Neuroscience, House Ear Institute, Los Angeles, California, USA.
16
Brault LM, Gilbert JL, Lansing CR, McCarley JS, Kramer AF. Bimodal stimulus presentation and expanded auditory bandwidth improve older adults' speech perception. Hum Factors 2010; 52:479-491. PMID: 21141241. DOI: 10.1177/0018720810380404.
Abstract
OBJECTIVE: A pair of experiments investigated the hypothesis that bimodal (auditory-visual) speech presentation and expanded auditory bandwidth would improve speech intelligibility and increase working memory performance for older adults by reducing the cognitive effort needed for speech perception. BACKGROUND: Although telephone communication is important for helping older adults maintain social engagement, age-related sensory and working memory limits may make telephone conversations difficult. METHOD: Older adults with either age-normal hearing or mild-to-moderate sensorineural hearing loss performed a running memory task. Participants heard word strings of unpredictable length and at the end of each string were required to repeat back the final three words. Words were presented monaurally in telephone bandwidth (300 Hz to 3300 Hz) or expanded bandwidth (50 Hz to 7500 Hz), in quiet (65 dBZ SPL), or in white noise (65 dBZ SPL with noise at 60 dBZ SPL), with or without a visual display of the talker. RESULTS: In quiet listening conditions, bimodal presentation increased the number of words correctly reported per trial, but only for listeners with hearing loss and with high lipreading proficiency. Stimulus bandwidth did not affect performance. In noise, bimodal presentation and expanded bandwidth improved performance for all participant groups but did so by improving speech intelligibility, not by improving working memory. CONCLUSION: Expanded bandwidth and bimodal presentation can improve speech perceptibility in difficult listening conditions but may not always improve working memory performance. APPLICATION: Results can inform the design of telephone features to improve ease of communication for older adults.
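The two bandwidth conditions lend themselves to a simple signal-processing illustration. The sketch below band-limits a signal to 300-3300 Hz versus 50-7500 Hz with a zero-phase Butterworth filter; the white-noise test signal, 16 kHz sample rate, and filter order are assumptions of this example rather than details of the study's stimulus preparation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000                        # sample rate in Hz (assumed for the example)
rng = np.random.default_rng(0)
signal = rng.standard_normal(fs)  # one second of white noise as a stand-in

def band_limit(x, low_hz, high_hz, fs, order=4):
    """Zero-phase Butterworth band-pass between low_hz and high_hz."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

telephone = band_limit(signal, 300, 3300, fs)  # telephone bandwidth
expanded = band_limit(signal, 50, 7500, fs)    # expanded bandwidth

# The expanded condition retains more low- and high-frequency energy.
print(np.var(telephone), np.var(expanded))
```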
Affiliation(s)
- Lynn M Brault
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
17
Rimell AN, Mansfield NJ, Hands D. The influence of content, task and sensory interaction on multimedia quality perception. Ergonomics 2008; 51:85-97. PMID: 17852366. DOI: 10.1080/00140130701526432.
Abstract
Human sensory interaction plays an important (but not yet fully understood) role in determining how individuals interact with the world around them. There are numerous types of sensory interaction and this paper examines the interaction of the auditory and visual senses for viewers of multimedia systems. This paper addresses two questions: first, does perception of quality in one modality affect the perception of quality in the other modality and, second, does focusing attention towards one modality affect the viewer's ability to detect errors in the other modality? The perception of audio quality and video quality are closely linked for certain multimedia content. To investigate this relationship, two experiments were conducted where participants were presented with multimedia content where varying distortion had been introduced into both the auditory and visual streams. Participants were asked to state their opinion of the audio, video or overall quality using a standardized scale. Results and subsequent statistical analysis showed that subjective audio quality varied with the video quality and vice versa. Furthermore, when a participant was attending to just one modality, they were less sensitive to reduced quality in the other modality.
Affiliation(s)
- A N Rimell
- Department of Human Sciences, Loughborough University, Loughborough, LE11 3TU, UK
18
van Wassenhove V, Grant KW, Poeppel D. Temporal window of integration in auditory-visual speech perception. Neuropsychologia 2006; 45:598-607. PMID: 16530232. DOI: 10.1016/j.neuropsychologia.2006.01.001.
Abstract
Forty-three normal-hearing participants were tested in two experiments, which focused on temporal coincidence in auditory-visual (AV) speech perception. In these experiments, audio recordings of /pa/ and /ba/ were dubbed onto video recordings of /ka/ or /ga/, respectively (ApVk, AbVg), to produce the illusory "fusion" percepts /ta/ or /da/ [McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-747]. In Experiment 1, an identification task using McGurk pairs with asynchronies ranging from -467 ms (auditory lead) to +467 ms was conducted. Fusion responses were prevalent over temporal asynchronies from -30 ms to +170 ms and more robust for audio lags. In Experiment 2, simultaneity judgments for incongruent and congruent audiovisual tokens (AdVd, AtVt) were collected. McGurk pairs were more readily judged as asynchronous than congruent pairs. Characteristics of the temporal window over which simultaneity and fusion responses were maximal were quite similar, suggesting the existence of a 200 ms duration asymmetric bimodal temporal integration window.
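A toy sketch of how such data can be summarised: given fusion-response rates at each stimulus onset asynchrony, it reports the asynchrony range over which fusion exceeds a criterion. The rates, SOAs, and the 0.6 criterion below are invented for illustration and do not reproduce the study's data or analysis.

```python
import numpy as np

# Invented example data: stimulus onset asynchronies (ms; negative values
# mean the audio leads the video) and the proportion of fusion responses.
soas = np.array([-467, -267, -133, -67, -30, 0, 33, 67, 100, 133,
                 167, 200, 267, 333, 467])
fusion_rate = np.array([0.10, 0.15, 0.30, 0.55, 0.70, 0.80, 0.82, 0.80,
                        0.75, 0.72, 0.68, 0.50, 0.30, 0.20, 0.10])

criterion = 0.60  # arbitrary threshold defining "prevalent" fusion

# SOAs at which fusion responses exceed the criterion define an
# asymmetric temporal integration window (wider for audio lags).
inside = soas[fusion_rate >= criterion]
print(f"Approximate integration window: {inside.min()} ms to {inside.max()} ms")
```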
19
Thomas SM, Jordan TR. Contributions of oral and extraoral facial movement to visual and audiovisual speech perception. J Exp Psychol Hum Percept Perform 2005; 30:873-88. PMID: 15462626. DOI: 10.1037/0096-1523.30.5.873.
Abstract
Seeing a talker's face influences auditory speech recognition, but the visible input essential for this influence has yet to be established. Using a new seamless editing technique, the authors examined effects of restricting visible movement to oral or extraoral areas of a talking face. In Experiment 1, visual speech identification and visual influences on identifying auditory speech were compared across displays in which the whole face moved, the oral area moved, or the extraoral area moved. Visual speech influences on auditory speech recognition were substantial and unchanging across whole-face and oral-movement displays. However, extraoral movement also influenced identification of visual and audiovisual speech. Experiments 2 and 3 demonstrated that these results are dependent on intact and upright facial contexts, but only with extraoral movement displays.
Affiliation(s)
- Sharon M Thomas
- MRC Institute of Hearing Research, University Park, Nottingham, England.
20
Abstract
Four experiments examined the nature of multisensory speech information. In Experiment 1, participants were asked to match heard voices with dynamic visual-alone video clips of speakers' articulating faces. This cross-modal matching task was used to examine whether vocal source matching can be accomplished across sensory modalities. The results showed that observers could match speaking faces and voices, indicating that information about the speaker was available for cross-modal comparisons. In a series of follow-up experiments, several stimulus manipulations were used to determine some of the critical acoustic and optic patterns necessary for specifying cross-modal source information. The results showed that cross-modal source information was not available in static visual displays of faces and was not contingent on a prominent acoustic cue to vocal identity (f0). Furthermore, cross-modal matching was not possible when the acoustic signal was temporally reversed.
Affiliation(s)
- Lorin Lachs
- Department of Psychology, California State University, Fresno
21
Calvert GA, Campbell R. Reading speech from still and moving faces: the neural substrates of visible speech. J Cogn Neurosci 2003; 15:57-70. PMID: 12590843. DOI: 10.1162/089892903321107828.
Abstract
Speech is perceived both by ear and by eye. Unlike heard speech, some seen speech gestures can be captured in stilled image sequences. Previous studies have shown that in hearing people, natural time-varying silent seen speech can access the auditory cortex (left superior temporal regions). Using functional magnetic resonance imaging (fMRI), the present study explored the extent to which this circuitry was activated when seen speech was deprived of its time-varying characteristics. In the scanner, hearing participants were instructed to look for a prespecified visible speech target sequence ("voo" or "ahv") among other monosyllables. In one condition, the image sequence comprised a series of stilled key frames showing apical gestures (e.g., separate frames for "v" and "oo" [from the target] or "ee" and "m" [i.e., from nontarget syllables]). In the other condition, natural speech movement of the same overall segment duration was seen. In contrast to a baseline condition in which the letter "V" was superimposed on a resting face, stilled speech face images generated activation in posterior cortical regions associated with the perception of biological movement, despite the lack of apparent movement in the speech image sequence. Activation was also detected in traditional speech-processing regions including the left inferior frontal (Broca's) area, left superior temporal sulcus (STS), and left supramarginal gyrus (the dorsal aspect of Wernicke's area). Stilled speech sequences also generated activation in the ventral premotor cortex and anterior inferior parietal sulcus bilaterally. Moving faces generated significantly greater cortical activation than stilled face sequences, and in similar regions. However, a number of differences between stilled and moving speech were also observed. In the visual cortex, stilled faces generated relatively more activation in primary visual regions (V1/V2), while visual movement areas (V5/MT+) were activated to a greater extent by moving faces. Cortical regions activated more by naturally moving speaking faces included the auditory cortex (Brodmann's Areas 41/42; lateral parts of Heschl's gyrus) and the left STS and inferior frontal gyrus. Seen speech with normal time-varying characteristics appears to have preferential access to "purely" auditory processing regions specialized for language, possibly via acquired dynamic audiovisual integration mechanisms in STS. When seen speech lacks natural time-varying characteristics, access to speech-processing systems in the left temporal lobe may be achieved predominantly via action-based speech representations, realized in the ventral premotor cortex.
Affiliation(s)
- Gemma A Calvert
- University Laboratory of Physiology, University of Oxford, UK.
22
Thomas SM, Jordan TR. Determining the influence of Gaussian blurring on inversion effects with talking faces. Percept Psychophys 2002; 64:932-44. PMID: 12269300. DOI: 10.3758/bf03196797.
Abstract
Perception of visual speech and the influence of visual speech on auditory speech perception are affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and the effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.
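Decreasing spatial frequency by Gaussian blurring is easy to illustrate: the sketch below low-pass filters an image with increasing blur widths, removing progressively more fine featural detail. The synthetic frame and the sigma values are assumptions of this example, not the blur parameters used in the experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic grey-scale frame standing in for a video still of a face.
rng = np.random.default_rng(0)
frame = rng.random((240, 180))

# A larger sigma removes more high-spatial-frequency (fine featural)
# detail, leaving mainly coarse configural structure.
mild_blur = gaussian_filter(frame, sigma=2)
strong_blur = gaussian_filter(frame, sigma=8)

# Residual high-frequency content after blurring (smaller = more detail removed).
print(np.std(frame - mild_blur), np.std(frame - strong_blur))
```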
23
Thomas SM, Jordan TR. Techniques for the production of point-light and fully illuminated video displays from identical recordings. Behav Res Methods Instrum Comput 2001; 33:59-64. PMID: 11296720. DOI: 10.3758/bf03195347.
Abstract
Illumination of only a few key points on a moving human body or face is enough to convey a compelling perception of human motion. A full understanding of the perception of biological motion from point-light displays requires accurate comparison with the perception of motion in normal, fully illuminated versions of the same images. Traditionally, these two types of stimuli (point-light and fully illuminated) have been filmed separately, allowing the introduction of uncontrolled variation across recordings. This is undesirable for accurate comparison of perceptual performance across the two types of display. This article describes simple techniques, using proprietary software, that allow production of point-light and fully illuminated video displays from identical recordings. These techniques are potentially useful for many studies of motion perception, by permitting precise comparison of perceptual performances across point-light displays and their fully illuminated counterparts with accuracy and comparative ease.
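As a crude sketch of deriving a point-light-style display and a fully illuminated display from the same frame, the code below keeps only pixels brighter than a threshold (standing in for reflective markers) and blacks out everything else; the synthetic frame and the simple thresholding rule are assumptions of this example and do not reproduce the article's video-production technique.

```python
import numpy as np

# Synthetic grey-scale frame in [0, 1], standing in for one video frame
# recorded with bright markers attached to the talker's face.
rng = np.random.default_rng(0)
frame = rng.random((240, 180))

# Fully illuminated display: the frame exactly as recorded.
full_display = frame

# Point-light-style display: keep only the brightest pixels (the
# "markers") and set everything else to black.
threshold = 0.98
point_light = np.where(frame >= threshold, 1.0, 0.0)

print(f"{int(point_light.sum())} marker pixels retained out of {frame.size}")
```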
Affiliation(s)
- S M Thomas
- University of Nottingham, University Park, England.
24
Jordan TR, Thomas SM. Effects of horizontal viewing angle on visual and audiovisual speech recognition. J Exp Psychol Hum Percept Perform 2001. DOI: 10.1037/0096-1523.27.6.1386.