1
Jertberg RM, Wienicke FJ, Andruszkiewicz K, Begeer S, Chakrabarti B, Geurts HM, de Vries R, Van der Burg E. Differences Between Autistic and Non-Autistic Individuals in Audiovisual Speech Integration: A Systematic Review and Meta-analysis. Neurosci Biobehav Rev 2024:105787. PMID: 38945419; DOI: 10.1016/j.neubiorev.2024.105787.
Abstract
Research has indicated unique challenges in audiovisual integration of speech among autistic individuals, although methodological differences have led to divergent findings. We conducted a systematic literature search to identify studies that measured audiovisual speech integration among both autistic and non-autistic individuals. Across the 18 identified studies (combined N = 952), autistic individuals showed impaired audiovisual integration compared to their non-autistic peers (g = 0.69, 95% CI [0.53, 0.85], p <.001). This difference was not found to be influenced by participants' mean ages, studies' sample sizes, risk-of-bias scores, or paradigms investigated. However, a subgroup analysis suggested that child studies may show larger between-group differences than adult ones. The prevailing pattern of impaired audiovisual speech integration in autism may have cascading effects on communicative and social behavior. However, small samples and inconsistency in design/analysis translated into considerable heterogeneity in findings and opacity regarding the influence of underlying unisensory and attentional factors. We recommend three key directions for future research: larger samples, more research with adults, and standardization of methodology and analytical approaches.
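For readers who want to see how a pooled effect of this kind (the reported g = 0.69 with its confidence interval) is typically obtained, the sketch below implements a standard DerSimonian-Laird random-effects pooling of study-level Hedges' g values. The study-level effects and variances are invented for illustration only; they are not the values from the 18 studies in this review.

```python
import numpy as np

def pooled_hedges_g(g, var_g):
    """DerSimonian-Laird random-effects pooling of study-level Hedges' g values."""
    g, var_g = np.asarray(g, float), np.asarray(var_g, float)
    w = 1.0 / var_g                                   # fixed-effect (inverse-variance) weights
    g_fixed = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - g_fixed) ** 2)                # Cochran's Q
    df = len(g) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                     # between-study variance estimate
    w_re = 1.0 / (var_g + tau2)                       # random-effects weights
    g_re = np.sum(w_re * g) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return g_re, (g_re - 1.96 * se, g_re + 1.96 * se)  # pooled g and 95% CI

# Hypothetical study-level effects (not the studies analysed in the review)
g_vals = [0.9, 0.4, 0.75, 0.6, 1.1]
g_vars = [0.08, 0.05, 0.10, 0.06, 0.12]
print(pooled_hedges_g(g_vals, g_vars))
```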
Affiliation(s)
- Robert M Jertberg
- Department of Clinical and Developmental Psychology, Vrije Universiteit Amsterdam, The Netherlands and Amsterdam Public Health Research Institute, Amsterdam, Netherlands.
- Frederik J Wienicke
- Department of Clinical Psychology, Behavioural Science Institute, Radboud University, Nijmegen, Netherlands
- Krystian Andruszkiewicz
- Department of Clinical and Developmental Psychology, Vrije Universiteit Amsterdam, The Netherlands and Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Sander Begeer
- Department of Clinical and Developmental Psychology, Vrije Universiteit Amsterdam, The Netherlands and Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Bhismadev Chakrabarti
- Centre for Autism, School of Psychology and Clinical Language Sciences, University of Reading, UK; India Autism Center, Kolkata, India; Department of Psychology, Ashoka University, India
- Hilde M Geurts
- Department of Psychology, Universiteit van Amsterdam, the Netherlands; Leo Kannerhuis (Youz/Parnassiagroup), the Netherlands
- Ralph de Vries
- Medical Library, Vrije Universiteit, Amsterdam, the Netherlands
- Erik Van der Burg
- Department of Psychology, Universiteit van Amsterdam, the Netherlands
2
Lodeiro Colatosti A, Pla Gil I, Morant Ventura A, Latorre Monteagudo E, Chacón Aranda L, Marco Algarra J. Normal hearing and verbal discrimination in real sounds environments. Acta Otorrinolaringol Esp 2024:S2173-5735(24)00066-8. PMID: 38908790; DOI: 10.1016/j.otoeng.2024.05.005.
Abstract
INTRODUCTION Human beings are constantly exposed to complex acoustic environments in daily life, which pose challenges even for individuals with normal hearing. Speech perception relies not only on fixed elements within the acoustic wave but is also influenced by various factors. These factors include speech intensity, environmental noise, the presence of other speakers, individual characteristics, spatial separation of sound sources, ambient reverberation, and audiovisual cues. The objective of this study is twofold: to determine the auditory capacity of normal-hearing individuals to discriminate spoken words in real-life acoustic conditions and to perform a phonetic analysis of misunderstood spoken words. MATERIALS AND METHODS This is a descriptive, observational, cross-sectional study involving 20 normal-hearing individuals. Verbal audiometry was conducted in an open-field environment, with the speech material masked by simulated real-world acoustic environments at various sound intensity levels. To enhance the sound presentation, 2D visual images related to the sounds were displayed on a television. We analyzed the percentage of correct answers and performed a phonetic analysis of misunderstood Spanish bisyllabic words in each environment. RESULTS The sample comprised 14 women (70%) and 6 men (30%), with a mean age of 26 ± 5.4 years, a mean air-conduction hearing threshold of 10.56 ± 3.52 dB SPL in the right ear, and 10.12 ± 2.49 dB SPL in the left ear. The percentage of verbal discrimination was 97.2 ± 5.04% in the "Ocean" sound environment, 94 ± 4.58% in "Restaurant", and 86.2 ± 9.94% in "Traffic" (p < 0.001). Regarding the phonetic analysis, the allophones that exhibited statistically significant differences were as follows: [o] (p = 0.002) within the group of vocalic phonemes, [n] (p < 0.001) among the voiced nasal consonants, [r] (p = 0.0016) among the voiced fricatives, and [b] (p < 0.001) and [g] (p = 0.045) among the voiced stops. CONCLUSION The dynamic properties of the acoustic environment can impact the ability of a normal-hearing individual to extract information from a voice signal. Our study demonstrates that this ability decreases when the voice signal is masked by one or more simultaneous interfering voices, as in the "Restaurant" environment, and when it is masked by continuous and intense noise such as "Traffic". Regarding the phonetic analysis, when the sound environment consisted of continuous low-frequency noise, nasal consonants were particularly challenging to identify. Furthermore, in situations with distracting verbal signals, vowels and trill consonants exhibited the worst intelligibility.
Affiliation(s)
- Adriana Lodeiro Colatosti
- Servicio de Otorrinolaringología, Hospital Clínico Universitario de Valencia, Valencia, Spain; Servicio de Otorrinolaringología, Hospital General Universitario de Castellón, Castellón de la Plana, Spain.
- Ignacio Pla Gil
- Servicio de Otorrinolaringología, Hospital Clínico Universitario de Valencia, Valencia, Spain; Universidad de Valencia, Valencia, Spain
- Antonio Morant Ventura
- Servicio de Otorrinolaringología, Hospital Clínico Universitario de Valencia, Valencia, Spain; Universidad de Valencia, Valencia, Spain
- Lucía Chacón Aranda
- Servicio de Otorrinolaringología, Hospital Clínico Universitario de Valencia, Valencia, Spain
- Jaime Marco Algarra
- Servicio de Otorrinolaringología, Hospital Clínico Universitario de Valencia, Valencia, Spain; Universidad de Valencia, Valencia, Spain
3
Jertberg RM, Begeer S, Geurts HM, Chakrabarti B, Van der Burg E. Age, not autism, influences multisensory integration of speech stimuli among adults in a McGurk/MacDonald paradigm. Eur J Neurosci 2024;59:2979-2994. PMID: 38570828; DOI: 10.1111/ejn.16319.
Abstract
Differences between autistic and non-autistic individuals in perception of the temporal relationships between sights and sounds are theorized to underlie difficulties in integrating relevant sensory information. These, in turn, are thought to contribute to problems with speech perception and higher level social behaviour. However, the literature establishing this connection often involves limited sample sizes and focuses almost entirely on children. To determine whether these differences persist into adulthood, we compared 496 autistic and 373 non-autistic adults (aged 17 to 75 years). Participants completed an online version of the McGurk/MacDonald paradigm, a multisensory illusion indicative of the ability to integrate audiovisual speech stimuli. Audiovisual asynchrony was manipulated, and participants responded both to the syllable they perceived (revealing their susceptibility to the illusion) and to whether or not the audio and video were synchronized (allowing insight into temporal processing). In contrast with prior research with smaller, younger samples, we detected no evidence of impaired temporal or multisensory processing in autistic adults. Instead, we found that in both groups, multisensory integration correlated strongly with age. This contradicts prior presumptions that differences in multisensory perception persist and even increase in magnitude over the lifespan of autistic individuals. It also suggests that the compensatory role multisensory integration may play as the individual senses decline with age is intact. These findings challenge existing theories and provide an optimistic perspective on autistic development. They also underline the importance of expanding autism research to better reflect the age range of the autistic population.
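The core individual-differences result described here, relating susceptibility to the McGurk/MacDonald illusion to age, reduces to a correlation between a per-participant fusion rate and age. The minimal sketch below uses simulated stand-in data (the fusion-rate variable, effect size, and simulated sample are illustrative assumptions, not the study's data).

```python
import numpy as np
from scipy import stats

# Simulated stand-in data: one fused-percept proportion per participant on
# incongruent (e.g., auditory /ba/ + visual /ga/) trials, plus age in years.
rng = np.random.default_rng(0)
age = rng.uniform(17, 75, size=300)                 # age range matching the study's sample
fused_prop = np.clip(0.2 + 0.006 * age + rng.normal(0.0, 0.10, age.size), 0.0, 1.0)

# Susceptibility-to-illusion vs. age: the kind of association the abstract reports
r, p = stats.pearsonr(age, fused_prop)
print(f"r = {r:.2f}, p = {p:.3g}")
```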
Affiliation(s)
- Robert M Jertberg
- Department of Clinical and Developmental Psychology, Vrije Universiteit Amsterdam, The Netherlands and Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Sander Begeer
- Department of Clinical and Developmental Psychology, Vrije Universiteit Amsterdam, The Netherlands and Amsterdam Public Health Research Institute, Amsterdam, Netherlands
- Hilde M Geurts
- Dutch Autism and ADHD Research Center (d'Arc), Brain & Cognition, Department of Psychology, Universiteit van Amsterdam, Amsterdam, The Netherlands
- Leo Kannerhuis (Youz/Parnassiagroup), Den Haag, The Netherlands
- Bhismadev Chakrabarti
- Centre for Autism, School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- India Autism Center, Kolkata, India
- Department of Psychology, Ashoka University, Sonipat, India
- Erik Van der Burg
- Dutch Autism and ADHD Research Center (d'Arc), Brain & Cognition, Department of Psychology, Universiteit van Amsterdam, Amsterdam, The Netherlands
4
Butcher N, Bennetts RJ, Sexton L, Barbanta A, Lander K. Eye movement differences when recognising and learning moving and static faces. Q J Exp Psychol (Hove) 2024:17470218241252145. PMID: 38644390; DOI: 10.1177/17470218241252145.
Abstract
Seeing a face in motion can help subsequent face recognition. Several explanations have been proposed for this "motion advantage," but other factors that might play a role have received less attention. For example, facial movement might enhance recognition by attracting attention to the internal facial features, thereby facilitating identification. However, there is no direct evidence that motion increases attention to regions of the face that facilitate identification (i.e., internal features) compared with static faces. We tested this hypothesis by recording participants' eye movements while they completed famous face recognition (Experiment 1, N = 32) and face-learning (Experiment 2, N = 60; Experiment 3, N = 68) tasks, with presentation style manipulated (moving or static). Across all three experiments, a motion advantage was found, and participants directed a higher proportion of fixations to the internal features (i.e., eyes, nose, and mouth) of moving versus static faces. Conversely, the proportion of fixations to the internal non-feature area (i.e., cheeks, forehead, chin) and external area (Experiment 3) was significantly reduced for moving compared with static faces (all ps < .05). Results suggest that during both familiar and unfamiliar face recognition, facial motion is associated with increased attention to internal facial features, but only during familiar face recognition is the magnitude of the motion advantage significantly related to the proportion of fixations directed to the internal features.
Affiliation(s)
- Natalie Butcher
- Department of Psychology, Teesside University, Middlesbrough, UK
- Laura Sexton
- Department of Psychology, Teesside University, Middlesbrough, UK
- School of Psychology, Faculty of Health Sciences and Wellbeing, University of Sunderland, Sunderland, UK
- Karen Lander
- Division of Psychology, Communication and Human Neuroscience, University of Manchester, Manchester, UK
5
Jackson IR, Perugia E, Stone MA, Saunders GH. The impact of face coverings on audio-visual contributions to communication with conversational speech. Cogn Res Princ Implic 2024;9:25. PMID: 38652383; PMCID: PMC11039583; DOI: 10.1186/s41235-024-00552-y.
Abstract
The use of face coverings can make communication more difficult by removing access to visual cues as well as affecting the physical transmission of speech sounds. This study aimed to assess the independent and combined contributions of visual and auditory cues to impaired communication when face coverings are used. In an online task, 150 participants rated videos of natural conversation along three dimensions: (1) how much they could follow, (2) how much effort was required, and (3) the clarity of the speech. Visual and audio variables were independently manipulated in each video, so that the same video could be presented with or without a superimposed surgical-style mask, accompanied by one of four audio conditions (unfiltered audio, or audio filtered to simulate the attenuation associated with a surgical mask, an FFP3 mask, or a visor). Hypotheses and analyses were pre-registered. Both the audio and visual variables had a statistically significant negative impact across all three dimensions. Whether or not talkers' faces were visible made the largest contribution to participants' ratings. The study identifies a degree of attenuation whose negative effects can be overcome by the restoration of visual cues. The significant effects observed in this nominally low-demand task (speech in quiet) highlight the importance of visual and audio cues in everyday life and suggest that both should be considered in future face mask designs.
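To make the audio manipulation concrete, here is a minimal sketch of the general idea of simulating a face covering by attenuating the high-frequency part of a recording. The 2 kHz cutoff and 6 dB attenuation are placeholder values chosen for illustration, not the filter specifications used in this study.

```python
import numpy as np

def simulate_mask_attenuation(signal, fs, cutoff_hz=2000.0, atten_db=6.0):
    """Crude frequency-domain simulation of a face covering: attenuate all
    energy above `cutoff_hz` by `atten_db` dB. Cutoff and attenuation values
    here are illustrative, not those used in the study."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    gain = np.ones_like(freqs)
    gain[freqs >= cutoff_hz] = 10.0 ** (-atten_db / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(signal))

# Example: 1 s of white noise at 16 kHz, attenuated as if heard through a mask
fs = 16000
noise = np.random.default_rng(1).standard_normal(fs)
masked = simulate_mask_attenuation(noise, fs, cutoff_hz=2000.0, atten_db=6.0)
```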
Affiliation(s)
- I R Jackson
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK.
- E Perugia
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK
- M A Stone
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK
- Manchester Academic Health Science Centre, Manchester, UK
- G H Saunders
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, Manchester, M13 9PL, UK
6
Lalonde K, Peng ZE, Halverson DM, Dwyer GA. Children's use of spatial and visual cues for release from perceptual masking. J Acoust Soc Am 2024;155:1559-1569. PMID: 38393738; PMCID: PMC10890829; DOI: 10.1121/10.0024766.
Abstract
This study examined the role of visual speech in providing release from perceptual masking in children by comparing visual speech benefit across conditions with and without a spatial separation cue. Auditory-only and audiovisual speech recognition thresholds in a two-talker speech masker were obtained from 21 children with typical hearing (7-9 years of age) using a color-number identification task. The target was presented from a loudspeaker at 0° azimuth. Masker source location varied across conditions. In the spatially collocated condition, the masker was also presented from the loudspeaker at 0° azimuth. In the spatially separated condition, the masker was presented from the loudspeaker at 0° azimuth and a loudspeaker at -90° azimuth, with the signal from the -90° loudspeaker leading the signal from the 0° loudspeaker by 4 ms. The visual stimulus (static image or video of the target talker) was presented at 0° azimuth. Children achieved better thresholds when the spatial cue was provided and when the visual cue was provided. Visual and spatial cue benefit did not differ significantly depending on the presence of the other cue. Additional studies are needed to characterize how children's preferential use of visual and spatial cues varies depending on the strength of each cue.
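The spatially separated masker condition described above (the same masker presented from two loudspeakers, with the -90° signal leading the 0° signal by 4 ms) can be sketched as follows. The sampling rate and the noise stand-in for the two-talker masker are assumptions made for illustration.

```python
import numpy as np

def leading_copy_pair(masker, fs, lead_ms=4.0):
    """Return (front_signal, side_signal) where the side (-90 deg) loudspeaker
    carries the same masker but leads the front (0 deg) loudspeaker by
    `lead_ms`, as in the spatially separated condition described above.
    Implemented by delaying the front copy by the lead time."""
    delay = int(round(fs * lead_ms / 1000.0))   # 4 ms -> 64 samples at 16 kHz
    side = masker
    front = np.concatenate([np.zeros(delay), masker])[: len(masker)]
    return front, side

fs = 16000
masker = np.random.default_rng(2).standard_normal(2 * fs)  # stand-in for a two-talker masker
front, side = leading_copy_pair(masker, fs, lead_ms=4.0)
```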
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Z Ellen Peng
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Destinee M Halverson
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Grace A Dwyer
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
7
Ross LA, Molholm S, Butler JS, Del Bene VA, Brima T, Foxe JJ. Neural correlates of audiovisual narrative speech perception in children and adults on the autism spectrum: A functional magnetic resonance imaging study. Autism Res 2024;17:280-310. PMID: 38334251; DOI: 10.1002/aur.3104.
Abstract
Autistic individuals show substantially reduced benefit from observing visual articulations during audiovisual speech perception, a multisensory integration deficit that is particularly relevant to social communication. This has mostly been studied using simple syllabic or word-level stimuli, and it remains unclear how altered lower-level multisensory integration translates to the processing of more complex natural multisensory stimulus environments in autism. Here, functional neuroimaging was used to compare neural correlates of audiovisual gain (AV-gain) in 41 autistic individuals with those of 41 age-matched non-autistic controls when presented with a complex audiovisual narrative. Participants were presented with continuous narration of a story in auditory-alone, visual-alone, and both synchronous and asynchronous audiovisual speech conditions. We hypothesized that previously identified differences in audiovisual speech processing in autism would be characterized by activation differences in brain regions well known to be associated with audiovisual enhancement in neurotypicals. However, our results did not provide evidence for altered processing of the auditory-alone, visual-alone, or audiovisual conditions, or of AV-gain, in regions associated with the respective task when comparing activation patterns between groups. Instead, we found that autistic individuals responded with higher activations in mostly frontal regions where the activation to the experimental conditions was below baseline (de-activations) in the control group. These frontal effects were observed in both unisensory and audiovisual conditions, suggesting that these altered activations were not specific to multisensory processing but reflective of more general mechanisms, such as altered disengagement of Default Mode Network processes during observation of the language stimulus across conditions.
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- School of Mathematics and Statistics, Technological University Dublin, City Campus, Dublin, Ireland
- Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
- Heersink School of Medicine, Department of Neurology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Tufikameni Brima
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, USA
8
Gijbels L, Lee AKC, Yeatman JD. Children with developmental dyslexia have equivalent audiovisual speech perception performance but their perceptual weights differ. Dev Sci 2024;27:e13431. PMID: 37403418; DOI: 10.1111/desc.13431.
Abstract
As reading is inherently a multisensory, audiovisual (AV) process where visual symbols (i.e., letters) are connected to speech sounds, the question has been raised whether individuals with reading difficulties, like children with developmental dyslexia (DD), have broader impairments in multisensory processing. This question has been posed before, yet it remains unanswered due to (a) the complexity and contentious etiology of DD along with (b) lack of consensus on developmentally appropriate AV processing tasks. We created an ecologically valid task for measuring multisensory AV processing by leveraging the natural phenomenon that speech perception improves when listeners are provided visual information from mouth movements (particularly when the auditory signal is degraded). We designed this AV processing task with low cognitive and linguistic demands such that children with and without DD would have equal unimodal (auditory and visual) performance. We then collected data in a group of 135 children (age 6.5-15) with an AV speech perception task to answer the following questions: (1) How do AV speech perception benefits manifest in children, with and without DD? (2) Do children all use the same perceptual weights to create AV speech perception benefits, and (3) what is the role of phonological processing in AV speech perception? We show that children with and without DD have equal AV speech perception benefits on this task, but that children with DD rely less on auditory processing in more difficult listening situations to create these benefits and weigh both incoming information streams differently. Lastly, any reported differences in speech perception in children with DD might be better explained by differences in phonological processing than differences in reading skills. RESEARCH HIGHLIGHTS: Children with versus without developmental dyslexia have equal audiovisual speech perception benefits, regardless of their phonological awareness or reading skills. Children with developmental dyslexia rely less on auditory performance to create audiovisual speech perception benefits. Individual differences in speech perception in children might be better explained by differences in phonological processing than differences in reading skills.
Affiliation(s)
- Liesbeth Gijbels
- Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, USA
- University of Washington, Institute for Learning & Brain Sciences, Seattle, Washington, USA
- Adrian K C Lee
- Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, USA
- University of Washington, Institute for Learning & Brain Sciences, Seattle, Washington, USA
- Jason D Yeatman
- Division of Developmental-Behavioral Pediatrics, Stanford University School of Medicine, Stanford, California, USA
- Stanford University Graduate School of Education, Stanford, California, USA
- Stanford University Department of Psychology, Stanford, California, USA
9
Mok S, Park S, Whang M. Examining the Impact of Digital Human Gaze Expressions on Engagement Induction. Biomimetics (Basel) 2023;8:610. PMID: 38132549; PMCID: PMC10742036; DOI: 10.3390/biomimetics8080610.
Abstract
With advancements in technology, digital humans are becoming increasingly sophisticated, with their application scope widening to include interactions with real people. However, research on expressions that facilitate natural engagement in interactions between real people and digital humans is scarce. With this study, we aimed to examine the differences in user engagement as measured by subjective evaluations, eye tracking, and electroencephalogram (EEG) responses relative to different gaze expressions in various conversational contexts. Conversational situations were categorized as face-to-face, face-to-video, and digital human interactions, with gaze expressions segmented into eye contact and gaze avoidance. Story stimuli incorporating twelve sentences verified to elicit positive and negative emotional responses were employed in the experiments after validation. A total of 45 participants (31 females and 14 males) underwent stimulation through positive and negative stories while exhibiting eye contact or gaze avoidance under each of the three conversational conditions. Engagement was assessed using subjective evaluation metrics in conjunction with measures of the subjects' gaze and brainwave activity. The findings revealed engagement disparities between the face-to-face and digital-human conversation conditions. Notably, only positive stimuli elicited variations in engagement based on gaze expression across different conversation conditions. Gaze analysis corroborated the engagement differences, aligning with prior research on social sensitivity, but only in response to positive stimuli. This research departs from traditional studies of unnatural interactions with digital humans, focusing instead on interactions with digital humans designed to mimic the appearance of real humans. This study demonstrates the potential for gaze expression to induce engagement, regardless of the human or digital nature of the conversational dyads.
Affiliation(s)
- Subin Mok
- Department of Emotion Engineering, Sangmyung University, Seoul 03016, Republic of Korea
- Sung Park
- Department of Emotion Engineering, Sangmyung University, Seoul 03016, Republic of Korea
- Mincheol Whang
- Department of Human-Centered Artificial Intelligence, Sangmyung University, Seoul 03016, Republic of Korea
10
Ahn E, Majumdar A, Lee T, Brang D. Evidence for a Causal Dissociation of the McGurk Effect and Congruent Audiovisual Speech Perception via TMS. bioRxiv [Preprint] 2023:2023.11.27.568892. PMID: 38077093; PMCID: PMC10705272; DOI: 10.1101/2023.11.27.568892.
Abstract
Congruent visual speech improves speech perception accuracy, particularly in noisy environments. Conversely, mismatched visual speech can alter what is heard, leading to an illusory percept known as the McGurk effect. This illusion has been widely used to study audiovisual speech integration, illustrating that auditory and visual cues are combined in the brain to generate a single coherent percept. While prior transcranial magnetic stimulation (TMS) and neuroimaging studies have identified the left posterior superior temporal sulcus (pSTS) as a causal region involved in the generation of the McGurk effect, it remains unclear whether this region is critical only for this illusion or also for the more general benefits of congruent visual speech (e.g., increased accuracy and faster reaction times). Indeed, recent correlative research suggests that the benefits of congruent visual speech and the McGurk effect reflect largely independent mechanisms. To better understand how these different features of audiovisual integration are causally generated by the left pSTS, we used single-pulse TMS to temporarily impair processing while subjects were presented with either incongruent (McGurk) or congruent audiovisual combinations. Consistent with past research, we observed that TMS to the left pSTS significantly reduced the strength of the McGurk effect. Importantly, however, left pSTS stimulation did not affect the positive benefits of congruent audiovisual speech (increased accuracy and faster reaction times), demonstrating a causal dissociation between the two processes. Our results are consistent with models proposing that the pSTS is but one of multiple critical areas supporting audiovisual speech interactions. Moreover, these data add to a growing body of evidence suggesting that the McGurk effect is an imperfect surrogate measure for more general and ecologically valid audiovisual speech behaviors.
Affiliation(s)
- EunSeon Ahn
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Areti Majumdar
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Taraz Lee
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
11
Karagkouni O. The Effects of the Use of Protective Face Mask on the Voice and Its Relation to Self-Perceived Voice Changes. J Voice 2023;37:802.e1-802.e14. PMID: 34167856; DOI: 10.1016/j.jvoice.2021.04.014.
Abstract
OBJECTIVES The purpose of this study was to investigate the effects that the use of a protective face mask has on the voice, to search for associations between the self-reported voice changes and the levels of discomfort experienced by the participants, and to detect any change in phonatory patterns while speaking with a face mask. METHODS This was a cross-sectional, observational study, conducted by distributing an online questionnaire. Of the 155 people who participated in the study, 143 wore a protective face mask during their working hours and qualified. Five groups of questions were used to measure Speech Difficulties, Mask-Related Behaviors caused by the use of a face mask, alterations in Voice Perceptual Features, Vocal Tract Discomfort levels, and the Greek version of the Voice Handicap Index. The participants self-evaluated their voice and stated the frequency and severity of the symptoms they experienced during the mask usage period. RESULTS The results showed that the use of a protective face mask increases the self-perception of changes in the voice, especially in voice-breathing coordination, and has a great effect on intelligibility and overall communication. The majority of people stated that they have to speak louder and that they have noticed alterations in the perceptual features of their voice, with hoarseness and volume being the most frequently affected. Almost every symptom in the Vocal Tract Discomfort group was present, with Dry, Lump in Throat, Tight, and Short Breath being the most severe, and Dry and Short Breath being the most common among them. Physical, Functional, and Emotional effects were also observed through the Voice Handicap Index. CONCLUSION The use of a protective face mask increases the vocal effort of the speaker, affects voice-breathing coordination, limits overall communication, alters the perceptual features of the voice, increases vocal tract discomfort levels, and results in psychosocial and socioemotional difficulties. All these effects may result in the establishment of a voice disorder, especially in a high-risk population.
12
Moradi S, Rönnberg J. Perceptual Doping: A Hypothesis on How Early Audiovisual Speech Stimulation Enhances Subsequent Auditory Speech Processing. Brain Sci 2023;13:601. PMID: 37190566; DOI: 10.3390/brainsci13040601.
Abstract
Face-to-face communication is one of the most common means of communication in daily life. We benefit from both auditory and visual speech signals that lead to better language understanding. People prefer face-to-face communication when access to auditory speech cues is limited because of background noise in the surrounding environment or in the case of hearing impairment. We demonstrated that an early, short period of exposure to audiovisual speech stimuli facilitates subsequent auditory processing of speech stimuli for correct identification, but early auditory exposure does not. We called this effect “perceptual doping” because early audiovisual speech stimulation dopes, or recalibrates, auditory phonological and lexical maps in the mental lexicon in a way that results in better processing of auditory speech signals for correct identification. This short opinion paper provides an overview of perceptual doping and how it differs from similar auditory perceptual aftereffects following exposure to audiovisual speech materials, its underlying cognitive mechanism, and its potential usefulness in the aural rehabilitation of people with hearing difficulties.
Affiliation(s)
- Shahram Moradi
- Department of Health, Social and Welfare Studies, Faculty of Health and Social Sciences, University of South-Eastern Norway, 3918 Porsgrunn, Norway
- Jerker Rönnberg
- Department of Behavioral Sciences and Learning, Linnaeus Centre Head, Linköping University, 581 83 Linköping, Sweden
13
Chalas N, Omigie D, Poeppel D, van Wassenhove V. Hierarchically nested networks optimize the analysis of audiovisual speech. iScience 2023;26:106257. PMID: 36909667; PMCID: PMC9993032; DOI: 10.1016/j.isci.2023.106257.
Abstract
In conversational settings, seeing the speaker's face elicits internal predictions about the upcoming acoustic utterance. Understanding how the listener's cortical dynamics tune to the temporal statistics of audiovisual (AV) speech is thus essential. Using magnetoencephalography, we explored how large-scale frequency-specific dynamics of human brain activity adapt to AV speech delays. First, we show that the amplitude of phase-locked responses parametrically decreases with natural AV speech synchrony, a pattern that is consistent with predictive coding. Second, we show that the temporal statistics of AV speech affect large-scale oscillatory networks at multiple spatial and temporal resolutions. We demonstrate a spatial nestedness of oscillatory networks during the processing of AV speech: these oscillatory hierarchies are such that high-frequency activity (beta, gamma) is contingent on the phase response of low-frequency (delta, theta) networks. Our findings suggest that the endogenous temporal multiplexing of speech processing confers adaptability within the temporal regimes that are essential for speech comprehension.
Affiliation(s)
- Nikos Chalas
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, 48149 Münster, Germany
- CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit; CNRS; Université Paris-Saclay, 91191 Gif/Yvette, France
- School of Biology, Faculty of Sciences, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Corresponding author
- Diana Omigie
- Department of Psychology, Goldsmiths University London, London, UK
- David Poeppel
- Department of Psychology, New York University, New York, NY 10003, USA
- Ernst Struengmann Institute for Neuroscience, 60528 Frankfurt am Main, Frankfurt, Germany
- Virginie van Wassenhove
- CEA, DRF/Joliot, NeuroSpin, INSERM, Cognitive Neuroimaging Unit; CNRS; Université Paris-Saclay, 91191 Gif/Yvette, France
- Corresponding author
14
Intensive Training of Spatial Hearing Promotes Auditory Abilities of Bilateral Cochlear Implant Adults: A Pilot Study. Ear Hear 2023;44:61-76. PMID: 35943235; DOI: 10.1097/aud.0000000000001256.
Abstract
OBJECTIVE The aim of this study was to evaluate the feasibility of a virtual reality-based spatial hearing training protocol in bilateral cochlear implant (CI) users and to provide pilot data on the impact of this training on different qualities of hearing. DESIGN Twelve bilateral CI adults aged between 19 and 69 years followed an intensive 10-week rehabilitation program comprising eight virtual reality training sessions (two per week) interspersed with several evaluation sessions (2 weeks before training started, after four and eight training sessions, and 1 month after the end of training). During each 45-minute training session, participants localized a sound source whose position varied in azimuth and/or in elevation. At the start of each trial, CI users received no information about sound location, but after each response, feedback was given to enable error correction. Participants were divided into two groups: a multisensory feedback group (audiovisual spatial cue) and a unisensory group (visual spatial cue) who only received feedback in a wholly intact sensory modality. Training benefits were measured at each evaluation point using three tests: 3D sound localization in virtual reality, the French Matrix test, and the Speech, Spatial and other Qualities of Hearing questionnaire. RESULTS The training was well accepted and all participants attended the whole rehabilitation program. Four training sessions spread across 2 weeks were insufficient to induce significant performance changes, whereas performance on all three tests improved after eight training sessions. Front-back confusions decreased from 32% to 14.1% (p = 0.017); the speech recognition threshold improved from 1.5 dB to -0.7 dB signal-to-noise ratio (p = 0.029), and eight CI users achieved a negative signal-to-noise ratio. One month after the end of structured training, these performance improvements were still present, and quality of life was significantly improved for both self-reports of sound localization (from 5.3 to 6.7, p = 0.015) and speech understanding (from 5.2 to 5.9, p = 0.048). CONCLUSIONS This pilot study shows the feasibility and potential clinical relevance of this type of intervention involving a sensory immersive environment and could pave the way for more systematic rehabilitation programs after cochlear implantation.
15
Hadley LV, Culling JF. Timing of head turns to upcoming talkers in triadic conversation: Evidence for prediction of turn ends and interruptions. Front Psychol 2022;13:1061582. PMID: 36605274; PMCID: PMC9807761; DOI: 10.3389/fpsyg.2022.1061582.
Abstract
In conversation, people are able to listen to an utterance and respond within only a few hundred milliseconds. It takes substantially longer to prepare even a simple utterance, suggesting that interlocutors may make use of predictions about when the talker is about to end. But it is not only the upcoming talker that needs to anticipate the prior talker's ending; listeners who are simply following the conversation could also benefit from predicting the turn end in order to shift attention appropriately with the turn switch. In this paper, we examined whether people predict upcoming turn ends when watching conversational turns switch between others by analysing natural conversations. These conversations were between triads of older adults in different levels and types of noise. The analysis focused on the observer during turn switches between the other two parties, using head orientation (i.e., saccades from one talker to the next) to identify when their focus moved from one talker to the next. For non-overlapping utterances, observers started to turn to the upcoming talker before the prior talker had finished speaking in 17% of turn switches (rising to 26% when accounting for motor-planning time). For overlapping utterances, observers started to turn towards the interrupter before they interrupted in 18% of turn switches (rising to 33% when accounting for motor-planning time). The timing of head turns was more precise at lower than at higher noise levels and was not affected by noise type. These findings demonstrate that listeners in natural group conversation situations often exhibit head movements that anticipate the end of one conversational turn and the beginning of another. Furthermore, this work demonstrates the value of analysing head movement as a cue to social attention, which could be relevant for advancing communication technology such as hearing devices.
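A minimal sketch of how anticipatory head turns like these can be scored is given below: a turn counts as anticipatory if its onset precedes the prior talker's turn end, and a more lenient criterion also credits onsets that fall within an assumed motor-planning window after the end. The 200 ms window and the example timestamps are illustrative assumptions, not the study's parameters or data.

```python
import numpy as np

def anticipatory_turn_rate(turn_onsets, turn_ends, planning_ms=200.0):
    """Proportion of observer head-turn onsets that begin before the prior
    talker's turn end (strict criterion), and the proportion that begin before
    the end plus an assumed motor-planning window (lenient criterion)."""
    turn_onsets = np.asarray(turn_onsets, float)   # head-turn start times (s)
    turn_ends = np.asarray(turn_ends, float)       # corresponding turn-end times (s)
    before_end = np.mean(turn_onsets < turn_ends)
    within_planning = np.mean(turn_onsets < turn_ends + planning_ms / 1000.0)
    return before_end, within_planning

# Hypothetical onset/end pairs (seconds into the conversation)
onsets = [10.2, 15.9, 22.4, 30.1]
ends   = [10.4, 15.6, 22.6, 30.3]
print(anticipatory_turn_rate(onsets, ends))
```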
Affiliation(s)
- Lauren V. Hadley
- Hearing Sciences – Scottish Section, School of Medicine, University of Nottingham, Glasgow, United Kingdom
- John F. Culling
- School of Psychology, Cardiff University, Cardiff, United Kingdom
16
Chawarska K, Lewkowicz D, Feiner H, Macari S, Vernetti A. Attention to audiovisual speech does not facilitate language acquisition in infants with familial history of autism. J Child Psychol Psychiatry 2022;63:1466-1476. PMID: 35244219; DOI: 10.1111/jcpp.13595.
Abstract
BACKGROUND Due to familial liability, siblings of children with ASD exhibit elevated risk for language delays. The processes contributing to language delays in this population remain unclear. METHODS Considering well-established links between attention to dynamic audiovisual cues inherent in a speaker's face and speech processing, we investigated if attention to a speaker's face and mouth differs in 12-month-old infants at high familial risk for ASD but without ASD diagnosis (hr-sib; n = 91) and in infants at low familial risk (lr-sib; n = 62) for ASD and whether attention at 12 months predicts language outcomes at 18 months. RESULTS At 12 months, hr-sib and lr-sib infants did not differ in attention to face (p = .14), mouth preference (p = .30), or in receptive and expressive language scores (p = .36, p = .33). At 18 months, the hr-sib infants had lower receptive (p = .01) but not expressive (p = .84) language scores than the lr-sib infants. In the lr-sib infants, greater attention to the face (p = .022) and a mouth preference (p = .025) contributed to better language outcomes at 18 months. In the hr-sib infants, neither attention to the face nor a mouth preference was associated with language outcomes at 18 months. CONCLUSIONS Unlike low-risk infants, high-risk infants do not appear to benefit from audiovisual prosodic and speech cues in the service of language acquisition despite intact attention to these cues. We propose that impaired processing of audiovisual cues may constitute the link between genetic risk factors and poor language outcomes observed across the autism risk spectrum and may represent a promising endophenotype in autism.
Affiliation(s)
- Katarzyna Chawarska
- Child Study Center, Yale University School of Medicine, New Haven, CT, USA
- Haskins Laboratories, New Haven, CT, USA
- David Lewkowicz
- Child Study Center, Yale University School of Medicine, New Haven, CT, USA
- Haskins Laboratories, New Haven, CT, USA
- Hannah Feiner
- Child Study Center, Yale University School of Medicine, New Haven, CT, USA
- Suzanne Macari
- Child Study Center, Yale University School of Medicine, New Haven, CT, USA
- Angelina Vernetti
- Child Study Center, Yale University School of Medicine, New Haven, CT, USA
17
Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022;152:3216. PMID: 36586857; PMCID: PMC9894660; DOI: 10.1121/10.0015262.
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey
- PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers
- Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle
- Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA
18
Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022;263:119598. DOI: 10.1016/j.neuroimage.2022.119598.
Abstract
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listening to a naturalistic narrative stimulus. Its goal was to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information, such as the posterior superior temporal gyrus, as well as parts of the broader language network, including the semantic system. To this end, we presented 53 participants with a continuous narration of a story in auditory alone, visual alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network, as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but many regions of the wider semantic network and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
- Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
- John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
19
The multisensory cocktail party problem in children: Synchrony-based segregation of multiple talking faces improves in early childhood. Cognition 2022;228:105226. PMID: 35882100; DOI: 10.1016/j.cognition.2022.105226.
Abstract
Extraction of meaningful information from multiple talkers relies on perceptual segregation. The temporal synchrony statistics inherent in everyday audiovisual (AV) speech offer a powerful basis for perceptual segregation. We investigated the developmental emergence of synchrony-based perceptual segregation of multiple talkers in 3-7-year-old children. Children either saw four identical or four different faces articulating temporally jittered versions of the same utterance and heard the audible version of the same utterance either synchronized with one of the talkers or desynchronized with all of them. Eye tracking revealed that selective attention to the temporally synchronized talking face increased while attention to the desynchronized faces decreased with age and that attention to the talkers' mouth primarily drove responsiveness. These findings demonstrate that the temporal synchrony statistics inherent in fluent AV speech assume an increasingly greater role in perceptual segregation of the multisensory clutter created by multiple talking faces in early childhood.
20
Taris: An online speech recognition framework with sequence to sequence neural networks for both audio-only and audio-visual speech. Comput Speech Lang 2022. DOI: 10.1016/j.csl.2022.101349.
21
Bernstein LE, Jordan N, Auer ET, Eberhardt SP. Lipreading: A Review of Its Continuing Importance for Speech Recognition With an Acquired Hearing Loss and Possibilities for Effective Training. Am J Audiol 2022;31:453-469. DOI: 10.1044/2021_aja-21-00112.
Abstract
PURPOSE The goal of this review article is to reinvigorate interest in lipreading and lipreading training for adults with acquired hearing loss. Most adults benefit from being able to see the talker when speech is degraded; however, the effect size is related to their lipreading ability, which is typically poor in adults who have experienced normal hearing through most of their lives. Lipreading training has been viewed as a possible avenue for rehabilitation of adults with an acquired hearing loss, but most training approaches have not been particularly successful. Here, we describe lipreading and theoretically motivated approaches to its training, as well as examples of successful training paradigms. We discuss some extensions to auditory-only (AO) and audiovisual (AV) speech recognition. METHOD Visual speech perception and word recognition are described. Traditional and contemporary views of training and perceptual learning are outlined. We focus on the roles of external and internal feedback and the training task in perceptual learning, and we describe results of lipreading training experiments. RESULTS Lipreading is commonly characterized as limited to viseme perception. However, evidence demonstrates subvisemic perception of visual phonetic information. Lipreading words also relies on lexical constraints, not unlike auditory spoken word recognition. Lipreading has been shown to be difficult to improve through training, but under specific feedback and task conditions, training can be successful, and learning can generalize to untrained materials, including AV sentence stimuli in noise. The results on lipreading have implications for AO and AV training and for use of acoustically processed speech in face-to-face communication. CONCLUSION Given its importance for speech recognition with a hearing loss, we suggest that the research and clinical communities integrate lipreading in their efforts to improve speech recognition in adults with acquired hearing loss.
Collapse
Affiliation(s)
- Lynne E. Bernstein
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Nicole Jordan
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Edward T. Auer
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| | - Silvio P. Eberhardt
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
| |
Collapse
|
22
|
Mcleod RWJ, Gallagher M, Hall A, Bant SP, Culling JF. Acoustic analysis of the effect of personal protective equipment on speech understanding: lessons for clinical environments. Int J Audiol 2022:1-6. [DOI: 10.1080/14992027.2022.2070780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Affiliation(s)
| | | | - Andy Hall
- ENT Department, University Hospital of Wales, Cardiff, UK
| | - Sarah P. Bant
- Audiology Department, Betsi Cadwaladr University Health Board, Bangor, UK
| | | |
Collapse
|
23
|
Cox TJ, Dodgson G, Harris L, Perugia E, Stone MA, Walsh M. Improving the measurement and acoustic performance of transparent face masks and shields. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:2931. [PMID: 35649945 DOI: 10.1121/10.0010384] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/25/2021] [Accepted: 04/16/2022] [Indexed: 06/15/2023]
Abstract
Opaque face masks harm communication by preventing speech-reading (lip-reading) and attenuating high-frequency sound. Although transparent masks and shields (visors) with clear plastic inserts allow speech-reading, they usually create more sound attenuation than opaque masks. Consequently, an iterative process was undertaken to create a better design, and the instructions to make it are published. The experiments showed that lowering the mass of the plastic inserts decreases the high-frequency sound attenuation. A shield with a clear thermoplastic polyurethane (TPU) panel had an insertion loss of (2.0 ± 1.1) dB for 1.25-8 kHz, which improves on previous designs that had attenuations of 11.9 dB and above. A cloth mask with a TPU insert was designed and had an insertion loss of (4.6 ± 2.3) dB for 2-8 kHz, which is better than the 9-22 dB reported previously in the literature. The speech intelligibility index was also evaluated. Investigations to improve measurement protocols that use either mannikins or human talkers were undertaken. Manufacturing variability and inconsistency of human speaking were greater sources of experimental error than fitting differences. It was shown that measurements from a mannikin could match those from humans if insertion losses from four human talkers were averaged.
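For readers who want to reproduce this kind of measure, insertion loss is simply the drop in measured level caused by the mask or shield, usually averaged over the frequency bands of interest. A minimal sketch in Python, using hypothetical third-octave band levels rather than the paper's data:

import numpy as np

# Illustrative only (not the authors' analysis code): insertion loss is the
# level measured without the mask minus the level measured with it, here
# averaged over third-octave bands from 1.25 to 8 kHz as in the abstract.
# The band levels below are hypothetical example values in dB SPL.
bands_hz = np.array([1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000])
no_mask_db = np.array([62.0, 61.5, 60.8, 59.9, 58.7, 57.2, 55.4, 53.9, 52.1])
with_mask_db = np.array([60.3, 59.6, 58.5, 57.4, 56.0, 54.8, 53.1, 51.6, 49.8])

insertion_loss_db = no_mask_db - with_mask_db  # per-band insertion loss
print(f"mean insertion loss: {insertion_loss_db.mean():.1f} dB "
      f"(+/- {insertion_loss_db.std(ddof=1):.1f} dB)")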
Collapse
Affiliation(s)
- Trevor J Cox
- Acoustics Research Centre, University of Salford, Salford, M5 4WT, United Kingdom
| | - George Dodgson
- Maker Space, University of Salford, Salford, M5 4WT, United Kingdom
| | - Lara Harris
- Acoustics Research Centre, University of Salford, Salford, M5 4WT, United Kingdom
| | - Emanuele Perugia
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, M13 9PL, United Kingdom
| | - Michael A Stone
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, M13 9PL, United Kingdom
| | - Michael Walsh
- Maker Space, University of Salford, Salford, M5 4WT, United Kingdom
| |
Collapse
|
24
|
Poon BT, Jenstad LM. Communication with face masks during the COVID-19 pandemic for adults with hearing loss. Cogn Res Princ Implic 2022; 7:24. [PMID: 35312877 PMCID: PMC8935619 DOI: 10.1186/s41235-022-00376-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2021] [Accepted: 02/27/2022] [Indexed: 12/01/2022] Open
Abstract
Face masks have become common protective measures in community and workplace environments to help reduce the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Face masks can make it difficult to hear and understand speech, particularly for people with hearing loss. An aim of our cross-sectional survey was to investigate the extent to which face masks, as a health and safety protective measure against SARS-CoV-2, have affected speech understanding in the day-to-day lives of adults with deafness or hearing loss, and to identify possible strategies to improve communication accessibility. We analyzed closed- and open-ended survey responses of 656 adults who self-identified as D/deaf or hard of hearing. Over 80% of respondents reported difficulty with understanding others who wore face masks. The proportion of those experiencing difficulty increased with increasing hearing loss severity. Recommended practical supports to facilitate communication and social interaction included more widespread use of clear face masks to aid lip-reading; improved clarity in policy guidance on face masks; and greater public awareness and understanding about ways to more clearly communicate with adults with hearing loss while wearing face masks.
Collapse
Affiliation(s)
- Brenda T Poon
- Wavefront Centre for Communication Accessibility, 2005 Quebec Street, Vancouver, BC, V5T 2Z6, Canada. .,School of Population and Public Health, University of British Columbia, 440 - 2206 East Mall, Vancouver, BC, V6T 1Z3, Canada.
| | - Lorienne M Jenstad
- School of Audiology and Speech Sciences, University of British Columbia, 4th Floor Friedman Building, 2177 Wesbrook Mall, Vancouver, BC, V6T 1Z3, Canada
| |
Collapse
|
25
|
Bernstein LE, Auer ET, Eberhardt SP. During Lipreading Training With Sentence Stimuli, Feedback Controls Learning and Generalization to Audiovisual Speech in Noise. Am J Audiol 2022; 31:57-77. [PMID: 34965362 DOI: 10.1044/2021_aja-21-00034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
PURPOSE This study investigated the effects of external feedback on perceptual learning of visual speech during lipreading training with sentence stimuli. The goal was to improve visual-only (VO) speech recognition and increase accuracy of audiovisual (AV) speech recognition in noise. The rationale was that spoken word recognition depends on the accuracy of sublexical (phonemic/phonetic) speech perception; effective feedback during training must support sublexical perceptual learning. METHOD Normal-hearing (NH) adults were assigned to one of three types of feedback: Sentence feedback was the entire sentence printed after responding to the stimulus. Word feedback was the correct response words and perceptually near but incorrect response words. Consonant feedback was correct response words and consonants in incorrect but perceptually near response words. Six training sessions were given. Pre- and posttraining testing included an untrained control group. Test stimuli were disyllable nonsense words for forced-choice consonant identification, and isolated words and sentences for open-set identification. Words and sentences were VO, AV, and audio-only (AO) with the audio in speech-shaped noise. RESULTS Lipreading accuracy increased during training. Pre- and posttraining tests of consonant identification showed no improvement beyond test-retest increases obtained by untrained controls. Isolated word recognition with a talker not seen during training showed that the control group improved more than the sentence group. Tests of untrained sentences showed that the consonant group significantly improved in all of the stimulus conditions (VO, AO, and AV). Its mean words correct scores increased by 9.2 percentage points for VO, 3.4 percentage points for AO, and 9.8 percentage points for AV stimuli. CONCLUSIONS Consonant feedback during training with sentence stimuli significantly increased perceptual learning. The training generalized to untrained VO, AO, and AV sentence stimuli. Lipreading training has potential to significantly improve adults' face-to-face communication in noisy settings in which the talker can be seen.
Collapse
Affiliation(s)
- Lynne E. Bernstein
- Department of Speech, Language, and Hearing Sciences, George Washington University, DC
| | - Edward T. Auer
- Department of Speech, Language, and Hearing Sciences, George Washington University, DC
| | - Silvio P. Eberhardt
- Department of Speech, Language, and Hearing Sciences, George Washington University, DC
| |
Collapse
|
26
|
Sönnichsen R, Llorach Tó G, Hochmuth S, Hohmann V, Radeloff A. How Face Masks Interfere With Speech Understanding of Normal-Hearing Individuals: Vision Makes the Difference. Otol Neurotol 2022; 43:282-288. [PMID: 34999618 PMCID: PMC8843397 DOI: 10.1097/mao.0000000000003458] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
OBJECTIVE To investigate the effects of wearing a simulated mask on speech perception of normal-hearing subjects. STUDY DESIGN Prospective cohort study. SETTING University hospital. PATIENTS Fifteen normal-hearing, native German speakers (8 female, 7 male). INTERVENTION Different experimental conditions with and without simulated face masks using the audiovisual version of the female German Matrix test (Oldenburger Satztest, OLSA). MAIN OUTCOME MEASURES Signal-to-noise ratio (SNR) at speech intelligibility of 80%. RESULTS The SNR at which 80% speech intelligibility was achieved deteriorated by a mean of 4.1 dB SNR when simulating a medical mask and by 5.1 dB SNR when simulating a cloth mask in comparison to the audiovisual condition without mask. Interestingly, the contribution of the visual component alone was 2.6 dB SNR and thus had a larger effect than the acoustic component in the medical mask condition. CONCLUSIONS As expected, speech understanding with face masks was significantly worse than under control conditions. Thus, the speaker's use of face masks leads to a significant deterioration of speech understanding by the normal-hearing listener. The data suggest that these effects may play a role in many everyday situations that typically involve noise.
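The ranking of visual versus acoustic contributions follows directly from the reported numbers if one assumes the two components add up to the total deterioration; a back-of-envelope check (our decomposition, not the authors' analysis):

# Medical-mask condition from the abstract: 4.1 dB total SRT deterioration,
# of which 2.6 dB is attributed to losing the visual signal alone.
total_deterioration_db = 4.1
visual_component_db = 2.6
acoustic_component_db = total_deterioration_db - visual_component_db  # ~1.5 dB
print(f"visual: {visual_component_db} dB > acoustic: {acoustic_component_db:.1f} dB")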
Collapse
Affiliation(s)
- Rasmus Sönnichsen
- Department of Otolaryngology, Head and Neck Surgery, University of Oldenburg
| | - Gerard Llorach Tó
- Auditory Signal Processing and Hearing Devices, University of Oldenburg
| | - Sabine Hochmuth
- Department of Otolaryngology, Head and Neck Surgery, University of Oldenburg
| | - Volker Hohmann
- Auditory Signal Processing and Hearing Devices, University of Oldenburg
- Research Center Neurosensory Science, University of Oldenburg
- Cluster of Excellence “Hearing 4 All”, University of Oldenburg, Oldenburg, Germany
| | - Andreas Radeloff
- Department of Otolaryngology, Head and Neck Surgery, University of Oldenburg
- Research Center Neurosensory Science, University of Oldenburg
- Cluster of Excellence “Hearing 4 All”, University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
27
|
Cieśla K, Wolak T, Lorens A, Mentzel M, Skarżyński H, Amedi A. Effects of training and using an audio-tactile sensory substitution device on speech-in-noise understanding. Sci Rep 2022; 12:3206. [PMID: 35217676 PMCID: PMC8881456 DOI: 10.1038/s41598-022-06855-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022] Open
Abstract
Understanding speech in background noise is challenging. Wearing face masks, imposed by the COVID-19 pandemic, makes it even harder. We developed a multi-sensory setup, including a sensory substitution device (SSD) that can deliver speech simultaneously through audition and as vibrations on the fingertips. The vibrations correspond to low frequencies extracted from the speech input. We trained two groups of non-native English speakers in understanding distorted speech in noise. After a short session (30-45 min) of repeating sentences, with or without concurrent matching vibrations, we showed a comparable mean group improvement of 14-16 dB in Speech Reception Threshold (SRT) in two test conditions, i.e., when the participants were asked to repeat sentences only from hearing and also when matching vibrations on fingertips were present. This is a very strong effect if one considers that a 10 dB difference corresponds to a doubling of the perceived loudness. The number of sentence repetitions needed for both types of training to complete the task was comparable. Meanwhile, the mean group SNR for the audio-tactile training (14.7 ± 8.7) was significantly lower (harder) than for the auditory training (23.9 ± 11.8), which indicates a potential facilitating effect of the added vibrations. In addition, both before and after training, most of the participants (70-80%) showed better performance (by a mean of 4-6 dB) in speech-in-noise understanding when the audio sentences were accompanied with matching vibrations. This is the same magnitude of multisensory benefit that we reported, with no training at all, in our previous study using the same experimental procedures. After training, performance in this test condition was also best in both groups (SRT ~ 2 dB). The least significant effect of both training types was found in the third test condition, i.e. when participants were repeating sentences accompanied with non-matching tactile vibrations, and the performance in this condition was also poorest after training. The results indicate that both types of training may remove some level of difficulty in sound perception, which might enable a more proper use of speech inputs delivered via vibrotactile stimulation. We discuss the implications of these novel findings with respect to basic science. In particular, we show that even in adulthood, i.e. long after the classical "critical periods" of development have passed, a new pairing between a certain computation (here, speech processing) and an atypical sensory modality (here, touch) can be established and trained, and that this process can be rapid and intuitive. We further present possible applications of our training program and the SSD for auditory rehabilitation in patients with hearing (and sight) deficits, as well as healthy individuals in suboptimal acoustic situations.
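The loudness comparison above rests on the common rule of thumb that perceived loudness roughly doubles for every 10 dB increase in level; a quick illustration of what the reported improvements imply under that approximation (a textbook rule, not part of the study's analysis):

# Rule of thumb: loudness ratio ~= 2 ** (delta_dB / 10), so +10 dB ~ 2x louder.
for delta_db in (10, 14, 16):
    ratio = 2 ** (delta_db / 10)
    print(f"{delta_db:2d} dB improvement -> ~{ratio:.1f}x perceived loudness")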
Collapse
Affiliation(s)
- K Cieśla
- The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel. .,World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland.
| | - T Wolak
- World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
| | - A Lorens
- World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
| | - M Mentzel
- The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel
| | - H Skarżyński
- World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
| | - A Amedi
- The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel
| |
Collapse
|
28
|
Trudeau-Fisette P, Arnaud L, Ménard L. Visual Influence on Auditory Perception of Vowels by French-Speaking Children and Adults. Front Psychol 2022; 13:740271. [PMID: 35282186 PMCID: PMC8913716 DOI: 10.3389/fpsyg.2022.740271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 01/04/2022] [Indexed: 11/26/2022] Open
Abstract
Audiovisual interaction in speech perception is well defined in adults. Despite the large body of evidence suggesting that children are also sensitive to visual input, very few empirical studies have been conducted. To further investigate whether visual inputs influence auditory perception of phonemes in preschoolers in the same way as in adults, we conducted an audiovisual identification test. The auditory stimuli (/e/-/ø/ continuum) were presented either in an auditory condition only or simultaneously with a visual presentation of the articulation of the vowel /e/ or /ø/. The results suggest that, although all participants experienced visual influence on auditory perception, substantial individual differences exist in the 5- to 6-year-old group. While additional work is required to confirm this hypothesis, we suggest that auditory and visual systems are developing at that age and that multisensory phonological categorization of the rounding contrast took place only in children whose sensory systems and sensorimotor representations were mature.
Collapse
Affiliation(s)
- Paméla Trudeau-Fisette
- Laboratoire de Phonétique, Université du Québec à Montréal, Montreal, QC, Canada
- Centre for Research on Brain, Language and Music, Montreal, QC, Canada
- *Correspondence: Paméla Trudeau-Fisette,
| | - Laureline Arnaud
- Centre for Research on Brain, Language and Music, Montreal, QC, Canada
- Integrated Program in Neuroscience, McGill University, Montreal, QC, Canada
| | - Lucie Ménard
- Laboratoire de Phonétique, Université du Québec à Montréal, Montreal, QC, Canada
- Centre for Research on Brain, Language and Music, Montreal, QC, Canada
| |
Collapse
|
29
|
Zhao S, Li Y, Wang C, Feng C, Feng W. Updating the dual-mechanism model for cross-sensory attentional spreading: The influence of space-based visual selective attention. Hum Brain Mapp 2021; 42:6038-6052. [PMID: 34553806 PMCID: PMC8596974 DOI: 10.1002/hbm.25668] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2021] [Revised: 08/24/2021] [Accepted: 09/14/2021] [Indexed: 11/08/2022] Open
Abstract
Selective attention to visual stimuli can spread cross‐modally to task‐irrelevant auditory stimuli through either the stimulus‐driven binding mechanism or the representation‐driven priming mechanism. The stimulus‐driven attentional spreading occurs whenever a task‐irrelevant sound is delivered simultaneously with a spatially attended visual stimulus, whereas the representation‐driven attentional spreading occurs only when the object representation of the sound is congruent with that of the to‐be‐attended visual object. The current study recorded event‐related potentials in a space‐selective visual object‐recognition task to examine the exact roles of space‐based visual selective attention in both the stimulus‐driven and representation‐driven cross‐modal attentional spreading, which remain controversial in the literature. Our results yielded that the representation‐driven auditory Nd component (200–400 ms after sound onset) did not differ according to whether the peripheral visual representations of audiovisual target objects were spatially attended or not, but was decreased when the auditory representations of target objects were presented alone. In contrast, the stimulus‐driven auditory Nd component (200–300 ms) was decreased but still prominent when the peripheral visual constituents of audiovisual nontarget objects were spatially unattended. These findings demonstrate not only that the representation‐driven attentional spreading is independent of space‐based visual selective attention and benefits in an all‐or‐nothing manner from object‐based visual selection for actually presented visual representations of target objects, but also that although the stimulus‐driven attentional spreading is modulated by space‐based visual selective attention, attending to visual modality per se is more likely to be the endogenous determinant of the stimulus‐driven attentional spreading.
Collapse
Affiliation(s)
- Song Zhao
- Department of Psychology, School of Education, Soochow University, Suzhou, Jiangsu, China.,Department of English, School of Foreign Languages, Soochow University, Suzhou, Jiangsu, China
| | - Yang Li
- Department of Psychology, School of Education, Soochow University, Suzhou, Jiangsu, China
| | - Chongzhi Wang
- Department of Psychology, School of Education, Soochow University, Suzhou, Jiangsu, China
| | - Chengzhi Feng
- Department of Psychology, School of Education, Soochow University, Suzhou, Jiangsu, China
| | - Wenfeng Feng
- Department of Psychology, School of Education, Soochow University, Suzhou, Jiangsu, China.,Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, Jiangsu, China
| |
Collapse
|
30
|
Gijbels L, Yeatman JD, Lalonde K, Lee AKC. Audiovisual Speech Processing in Relationship to Phonological and Vocabulary Skills in First Graders. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:5022-5040. [PMID: 34735292 PMCID: PMC9150669 DOI: 10.1044/2021_jslhr-21-00196] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Revised: 07/06/2021] [Accepted: 08/11/2021] [Indexed: 06/13/2023]
Abstract
PURPOSE It is generally accepted that adults use visual cues to improve speech intelligibility in noisy environments, but findings regarding visual speech benefit in children are mixed. We explored factors that contribute to audiovisual (AV) gain in young children's speech understanding. We examined whether there is an AV benefit to speech-in-noise recognition in children in first grade and if visual salience of phonemes influences their AV benefit. We explored if individual differences in AV speech enhancement could be explained by vocabulary knowledge, phonological awareness, or general psychophysical testing performance. METHOD Thirty-seven first graders completed online psychophysical experiments. We used an online single-interval, four-alternative forced-choice picture-pointing task with age-appropriate consonant-vowel-consonant words to measure auditory-only, visual-only, and AV word recognition in noise at -2 and -8 dB SNR. We obtained standard measures of vocabulary and phonological awareness and included a general psychophysical test to examine correlations with AV benefits. RESULTS We observed a significant overall AV gain among children in first grade. This effect was mainly attributed to the benefit at -8 dB SNR, for visually distinct targets. Individual differences were not explained by any of the child variables. Boys showed lower auditory-only performances, leading to significantly larger AV gains. CONCLUSIONS This study shows AV benefit, of distinctive visual cues, to word recognition in challenging noisy conditions in first graders. The cognitive and linguistic constraints of the task may have minimized the impact of individual differences of vocabulary and phonological awareness on AV benefit. The gender difference should be studied on a larger sample and age range.
Collapse
Affiliation(s)
- Liesbeth Gijbels
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Institute for Learning & Brain Sciences, University of Washington, Seattle
| | - Jason D. Yeatman
- Division of Developmental-Behavioral Pediatrics, School of Medicine, Stanford University, CA
- Graduate School of Education, Stanford University, CA
| | - Kaylah Lalonde
- Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE
| | - Adrian K. C. Lee
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Institute for Learning & Brain Sciences, University of Washington, Seattle
| |
Collapse
|
31
|
Beadle J, Kim J, Davis C. Effects of Age and Uncertainty on the Visual Speech Benefit in Noise. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:5041-5060. [PMID: 34762813 DOI: 10.1044/2021_jslhr-20-00495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
PURPOSE Listeners understand significantly more speech in noise when the talker's face can be seen (visual speech) in comparison to an auditory-only baseline (a visual speech benefit). This study investigated whether the visual speech benefit is reduced when the correspondence between auditory and visual speech is uncertain and whether any reduction is affected by listener age (older vs. younger) and by how severely the auditory signal is masked. METHOD Older and younger adults completed a speech recognition in noise task that included an auditory-only condition and four auditory-visual (AV) conditions in which one, two, four, or six silent talking face videos were presented. One face always matched the auditory signal; the other face(s) did not. Auditory speech was presented in noise at -6 and -1 dB signal-to-noise ratio (SNR). RESULTS When the SNR was -6 dB, for both age groups, the standard-sized visual speech benefit was reduced as more talking faces were presented. When the SNR was -1 dB, younger adults received the standard-sized visual speech benefit even when two talking faces were presented, whereas older adults did not. CONCLUSIONS The size of the visual speech benefit obtained by older adults was always smaller when AV correspondence was uncertain; this was not the case for younger adults. Difficulty establishing AV correspondence may be a factor that limits older adults' speech recognition in noisy AV environments. Supplemental Material https://doi.org/10.23641/asha.16879549.
Collapse
Affiliation(s)
- Julie Beadle
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- The HEARing Cooperative Research Centre, Carlton, Victoria, Australia
| | - Jeesun Kim
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
| | - Chris Davis
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales, Australia
- The HEARing Cooperative Research Centre, Carlton, Victoria, Australia
| |
Collapse
|
32
|
Wahn B, Schmitz L, Kingstone A, Böckler-Raettig A. When eyes beat lips: speaker gaze affects audiovisual integration in the McGurk illusion. PSYCHOLOGICAL RESEARCH 2021; 86:1930-1943. [PMID: 34854983 PMCID: PMC9363401 DOI: 10.1007/s00426-021-01618-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2020] [Accepted: 11/10/2021] [Indexed: 11/26/2022]
Abstract
Eye contact is a dynamic social signal that captures attention and plays a critical role in human communication. In particular, direct gaze often accompanies communicative acts in an ostensive function: a speaker directs her gaze towards the addressee to highlight the fact that this message is being intentionally communicated to her. The addressee, in turn, integrates the speaker’s auditory and visual speech signals (i.e., her vocal sounds and lip movements) into a unitary percept. It is an open question whether the speaker’s gaze affects how the addressee integrates the speaker’s multisensory speech signals. We investigated this question using the classic McGurk illusion, an illusory percept created by presenting mismatching auditory (vocal sounds) and visual information (speaker’s lip movements). Specifically, we manipulated whether the speaker (a) moved his eyelids up/down (i.e., open/closed his eyes) prior to speaking or did not show any eye motion, and (b) spoke with open or closed eyes. When the speaker’s eyes moved (i.e., opened or closed) before an utterance, and when the speaker spoke with closed eyes, the McGurk illusion was weakened (i.e., addressees reported significantly fewer illusory percepts). In line with previous research, this suggests that motion (opening or closing), as well as the closed state of the speaker’s eyes, captured addressees’ attention, thereby reducing the influence of the speaker’s lip movements on the addressees’ audiovisual integration process. Our findings reaffirm the power of speaker gaze to guide attention, showing that its dynamics can modulate low-level processes such as the integration of multisensory speech signals.
Collapse
Affiliation(s)
- Basil Wahn
- Department of Psychology, Leibniz Universität Hannover, Hannover, Germany.
| | - Laura Schmitz
- Institute of Sports Science, Leibniz Universität Hannover, Hannover, Germany
| | - Alan Kingstone
- Department of Psychology, University of British Columbia, Vancouver, BC, Canada
| | | |
Collapse
|
33
|
Zhao T, Hu A, Su R, Lyu C, Wang L, Yan N. Phonetic versus spatial processes during motor-oriented imitations of visuo-labial and visuo-lingual speech: A functional near-infrared spectroscopy study. Eur J Neurosci 2021; 55:154-174. [PMID: 34854143 DOI: 10.1111/ejn.15550] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2021] [Revised: 10/21/2021] [Accepted: 11/23/2021] [Indexed: 12/28/2022]
Abstract
While a large amount of research has studied the facilitation of visual speech on auditory speech recognition, few studies have investigated the processing of visual speech gestures in motor-oriented tasks that focus on the spatial and motor features of the articulator actions instead of the phonetic features of auditory and visual speech. The current study examined the engagement of spatial and phonetic processing of visual speech in a motor-oriented speech imitation task. Functional near-infrared spectroscopy (fNIRS) was used to measure the haemodynamic activities related to spatial processing and audiovisual integration in the superior parietal lobe (SPL) and the posterior superior/middle temporal gyrus (pSTG/pMTG) respectively. In addition, visuo-labial and visuo-lingual speech were compared to examine the influence of visual familiarity and audiovisual association on the processes in question. fNIRS revealed significant activations in the SPL but found no supra-additive audiovisual activations in the pSTG/pMTG, suggesting that the processing of audiovisual speech stimuli was primarily focused on spatial processes related to action comprehension and preparation, whereas phonetic processes related to audiovisual integration were minimal. Comparisons between visuo-labial and visuo-lingual speech imitations revealed no significant difference in the activation of the SPL or the pSTG/pMTG, suggesting that a higher degree of visual familiarity and audiovisual association did not significantly influence how visuo-labial speech was processed compared with visuo-lingual speech. The current study offered insights into the pattern of visual-speech processing under a motor-oriented task objective and provided further evidence for the modulation of multimodal speech integration by voluntary selective attention and task objective.
Collapse
Affiliation(s)
- Tinghao Zhao
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.,Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Anming Hu
- Department of Rehabilitation Medicine, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
| | - Rongfeng Su
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.,Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Chengchen Lyu
- Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Lan Wang
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.,Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Nan Yan
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.,Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| |
Collapse
|
34
|
Fleming JT, Maddox RK, Shinn-Cunningham BG. Spatial alignment between faces and voices improves selective attention to audio-visual speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:3085. [PMID: 34717460 DOI: 10.1121/10.0006415] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 09/01/2021] [Indexed: 06/13/2023]
Abstract
The ability to see a talker's face improves speech intelligibility in noise, provided that the auditory and visual speech signals are approximately aligned in time. However, the importance of spatial alignment between corresponding faces and voices remains unresolved, particularly in multi-talker environments. In a series of online experiments, we investigated this using a task that required participants to selectively attend a target talker in noise while ignoring a distractor talker. In experiment 1, we found improved task performance when the talkers' faces were visible, but only when corresponding faces and voices were presented in the same hemifield (spatially aligned). In experiment 2, we tested for possible influences of eye position on this result. In auditory-only conditions, directing gaze toward the distractor voice reduced performance, but this effect could not fully explain the cost of audio-visual (AV) spatial misalignment. Lowering the signal-to-noise ratio (SNR) of the speech from +4 to -4 dB increased the magnitude of the AV spatial alignment effect (experiment 3), but accurate closed-set lipreading caused a floor effect that influenced results at lower SNRs (experiment 4). Taken together, these results demonstrate that spatial alignment between faces and voices contributes to the ability to selectively attend AV speech.
Collapse
Affiliation(s)
- Justin T Fleming
- Speech and Hearing Bioscience and Technology Program, Harvard University, 243 Charles Street, Boston, Massachusetts 02114, USA
| | - Ross K Maddox
- Department of Biomedical Engineering, University of Rochester, 430 Elmwood Avenue, Rochester, New York 14620, USA
| | - Barbara G Shinn-Cunningham
- Neuroscience Institute, Carnegie Mellon University, 4825 Frew Street, Pittsburgh, Pennsylvania 15213, USA
| |
Collapse
|
35
|
Banks B, Gowen E, Munro KJ, Adank P. Eye Gaze and Perceptual Adaptation to Audiovisual Degraded Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:3432-3445. [PMID: 34463528 DOI: 10.1044/2021_jslhr-21-00106] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Purpose Visual cues from a speaker's face may benefit perceptual adaptation to degraded speech, but current evidence is limited. We aimed to replicate results from previous studies to establish the extent to which visual speech cues can lead to greater adaptation over time, extending existing results to a real-time adaptation paradigm (i.e., without a separate training period). A second aim was to investigate whether eye gaze patterns toward the speaker's mouth were related to better perception, hypothesizing that listeners who looked more at the speaker's mouth would show greater adaptation. Method A group of listeners (n = 30) was presented with 90 noise-vocoded sentences in audiovisual format, whereas a control group (n = 29) was presented with the audio signal only. Recognition accuracy was measured throughout and eye tracking was used to measure fixations toward the speaker's eyes and mouth in the audiovisual group. Results Previous studies were partially replicated: The audiovisual group had better recognition throughout and adapted slightly more rapidly, but both groups showed an equal amount of improvement overall. Longer fixations on the speaker's mouth in the audiovisual group were related to better overall accuracy. An exploratory analysis further demonstrated that the duration of fixations to the speaker's mouth decreased over time. Conclusions The results suggest that visual cues may not benefit adaptation to degraded speech as much as previously thought. Longer fixations on a speaker's mouth may play a role in successfully decoding visual speech cues; however, this will need to be confirmed in future research to fully understand how patterns of eye gaze are related to audiovisual speech recognition. All materials, data, and code are available at https://osf.io/2wqkf/.
Collapse
Affiliation(s)
- Briony Banks
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
| | - Emma Gowen
- Division of Neuroscience and Experimental Psychology, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
| | - Kevin J Munro
- Manchester Centre for Audiology and Deafness, Faculty of Biology, Medicine and Health, The University of Manchester, United Kingdom
- Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, United Kingdom
| | - Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
| |
Collapse
|
36
|
van de Rijt LPH, van Opstal AJ, van Wanrooij MM. Multisensory Integration-Attention Trade-Off in Cochlear-Implanted Deaf Individuals. Front Neurosci 2021; 15:683804. [PMID: 34393707 PMCID: PMC8358073 DOI: 10.3389/fnins.2021.683804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
The cochlear implant (CI) allows profoundly deaf individuals to partially recover hearing. Still, due to the coarse acoustic information provided by the implant, CI users have considerable difficulties in recognizing speech, especially in noisy environments. CI users therefore rely heavily on visual cues to augment speech recognition, more so than normal-hearing individuals. However, it is unknown how attention to one (focused) or both (divided) modalities plays a role in multisensory speech recognition. Here we show that unisensory speech listening and reading were negatively impacted in divided-attention tasks for CI users—but not for normal-hearing individuals. Our psychophysical experiments revealed that, as expected, listening thresholds were consistently better for the normal-hearing, while lipreading thresholds were largely similar for the two groups. Moreover, audiovisual speech recognition for normal-hearing individuals could be described well by probabilistic summation of auditory and visual speech recognition, while CI users were better integrators than expected from statistical facilitation alone. Our results suggest that this benefit in integration comes at a cost. Unisensory speech recognition is degraded for CI users when attention needs to be divided across modalities. We conjecture that CI users exhibit an integration-attention trade-off. They focus solely on a single modality during focused-attention tasks, but need to divide their limited attentional resources in situations with uncertainty about the upcoming stimulus modality. We argue that in order to determine the benefit of a CI for speech recognition, situational factors need to be discounted by presenting speech in realistic or complex audiovisual environments.
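The probabilistic-summation benchmark referred to here is the standard prediction for independent auditory and visual channels: the audiovisual score expected if either channel alone can yield a correct response. A minimal sketch with hypothetical scores (the function name and the example values are illustrative, not the authors'):

def probability_summation(p_a: float, p_v: float) -> float:
    # Predicted audiovisual proportion correct if the auditory-only and
    # visual-only channels contribute independently (statistical facilitation).
    return p_a + p_v - p_a * p_v

# hypothetical proportions correct
p_a, p_v, observed_av = 0.45, 0.30, 0.75
predicted_av = probability_summation(p_a, p_v)
print(f"predicted AV: {predicted_av:.2f}, observed AV: {observed_av:.2f}, "
      f"benefit beyond summation: {observed_av - predicted_av:+.2f}")

Observed scores above the prediction, as the abstract reports for CI users, indicate integration beyond statistical facilitation.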
Collapse
Affiliation(s)
- Luuk P H van de Rijt
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, Netherlands
| | - A John van Opstal
- Department of Biophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| | - Marc M van Wanrooij
- Department of Biophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| |
Collapse
|
37
|
Trotter AS, Banks B, Adank P. The Relevance of the Availability of Visual Speech Cues During Adaptation to Noise-Vocoded Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:2513-2528. [PMID: 34161748 DOI: 10.1044/2021_jslhr-20-00575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Purpose This study first aimed to establish whether viewing specific parts of the speaker's face (eyes or mouth), compared to viewing the whole face, affected adaptation to distorted noise-vocoded sentences. Second, this study also aimed to replicate results on processing of distorted speech from lab-based experiments in an online setup. Method We monitored recognition accuracy online while participants were listening to noise-vocoded sentences. We first established if participants were able to perceive and adapt to audiovisual four-band noise-vocoded sentences when the entire moving face was visible (AV Full). Four further groups were then tested: a group in which participants viewed the moving lower part of the speaker's face (AV Mouth), a group in which participants saw only the moving upper part of the face (AV Eyes), a group in which participants could not see the moving lower or upper face (AV Blocked), and a group in which participants saw an image of a still face (AV Still). Results Participants repeated around 40% of the key words correctly and adapted during the experiment, but only when the moving mouth was visible. In contrast, performance was at floor level, and no adaptation took place, in conditions when the moving mouth was occluded. Conclusions The results show the importance of being able to observe relevant visual speech information from the speaker's mouth region, but not the eyes/upper face region, when listening and adapting to distorted sentences online. Second, the results also demonstrated that it is feasible to run speech perception and adaptation studies online, but that not all findings reported for lab studies replicate. Supplemental Material https://doi.org/10.23641/asha.14810523.
Collapse
Affiliation(s)
- Antony S Trotter
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
| | - Briony Banks
- Department of Psychology, Lancaster University, United Kingdom
| | - Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
| |
Collapse
|
38
|
Potential of Augmented Reality Platforms to Improve Individual Hearing Aids and to Support More Ecologically Valid Research. Ear Hear 2021; 41 Suppl 1:140S-146S. [PMID: 33105268 PMCID: PMC7676615 DOI: 10.1097/aud.0000000000000961] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
An augmented reality (AR) platform combines several technologies in a system that can render individual “digital objects” that can be manipulated for a given purpose. In the audio domain, these may, for example, be generated by speaker separation, noise suppression, and signal enhancement. Access to the “digital objects” could be used to augment auditory objects that the user wants to hear better. Such AR platforms in conjunction with traditional hearing aids may contribute to closing the gap for people with hearing loss through multimodal sensor integration, leveraging extensive current artificial intelligence research, and machine-learning frameworks. This could take the form of an attention-driven signal enhancement and noise suppression platform, together with context awareness, which would improve the interpersonal communication experience in complex real-life situations. In that sense, an AR platform could serve as a frontend to current and future hearing solutions. The AR device would enhance the signals to be attended, but the hearing amplification would still be handled by hearing aids. In this article, suggestions are made about why AR platforms may offer ideal affordances to compensate for hearing loss, and how research-focused AR platforms could help toward better understanding of the role of hearing in everyday life.
Collapse
|
39
|
Llorach G, Kirschner F, Grimm G, Zokoll MA, Wagener KC, Hohmann V. Development and evaluation of video recordings for the OLSA matrix sentence test. Int J Audiol 2021; 61:311-321. [PMID: 34109902 DOI: 10.1080/14992027.2021.1930205] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
OBJECTIVE The aim was to create and validate an audiovisual version of the German matrix sentence test (MST), which uses the existing audio-only speech material. DESIGN Video recordings were recorded and dubbed with the audio of the existing German MST. The current study evaluates the MST in conditions including audio and visual modalities, speech in quiet and noise, and open and closed-set response formats. SAMPLE One female talker recorded repetitions of the German MST sentences. Twenty-eight young normal-hearing participants completed the evaluation study. RESULTS The audiovisual benefit in quiet was 7.0 dB in sound pressure level (SPL). In noise, the audiovisual benefit was 4.9 dB in signal-to-noise ratio (SNR). Speechreading scores ranged from 0% to 84% speech reception in visual-only sentences (mean = 50%). Audiovisual speech reception thresholds (SRTs) had a larger standard deviation than audio-only SRTs. Audiovisual SRTs improved successively with increasing number of lists performed. The final video recordings are openly available. CONCLUSIONS The video material achieved similar results as the literature in terms of gross speech intelligibility, despite the inherent asynchronies of dubbing. Due to ceiling effects, adaptive procedures targeting 80% intelligibility should be used. At least one or two training lists should be performed.
Collapse
Affiliation(s)
- Gerard Llorach
- Hörzentrum Oldenburg GmbH, Oldenburg, Germany.,Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany.,Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
| | - Frederike Kirschner
- Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany.,Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
| | - Giso Grimm
- Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany.,Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
| | - Melanie A Zokoll
- Hörzentrum Oldenburg GmbH, Oldenburg, Germany.,Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
| | - Kirsten C Wagener
- Hörzentrum Oldenburg GmbH, Oldenburg, Germany.,Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany.,Hörtech gGmbH, Oldenburg, Germany
| | - Volker Hohmann
- Hörzentrum Oldenburg GmbH, Oldenburg, Germany.,Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany.,Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
40
|
Ricketts TA, Picou EM. Symmetrical and asymmetrical directional benefits are present for talkers at the front and side. Int J Audiol 2021; 61:177-186. [PMID: 34106803 DOI: 10.1080/14992027.2021.1931488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
OBJECTIVE The purpose of the study was to examine the effects of symmetrical and asymmetrical directional microphone settings on speech recognition, localisation and microphone preference in listening conditions with on- and off-axis talkers. DESIGN A within-subjects repeated-measures evaluation of three hearing aid microphone settings (bilateral omnidirectional, bilateral directional, asymmetrical directional) was completed in a moderately reverberant laboratory. An exploratory analysis of the potential relationship between microphone preference and unaided measures was also completed. STUDY SAMPLE Twenty adult listeners with mild to moderately severe bilateral hearing loss participated. RESULTS The directional and asymmetric microphone settings resulted in equivalent benefits for sentence recognition in noise, word recall, and localisation speed regardless of the speech loudspeaker location (on- or off-axis). However, localisation accuracy was significantly worse with the asymmetric fitting than the directional setting when speech was presented from the rear hemisphere. Listeners who always preferred directional microphones had significantly poorer unaided speech recognition than those who preferred the omnidirectional setting for one or more listening conditions. CONCLUSIONS Benefits from directional and asymmetric processing were small in the current study, but generally similar to each other. Unaided speech recognition in noise performance may have utility as a clinical predictor of preference for directional processing.
Collapse
Affiliation(s)
- Todd A Ricketts
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Erin M Picou
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| |
Collapse
|
41
|
Chen H, Du J, Hu Y, Dai LR, Yin BC, Lee CH. Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement. Neural Netw 2021; 143:171-182. [PMID: 34157642 DOI: 10.1016/j.neunet.2021.06.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2020] [Revised: 04/17/2021] [Accepted: 06/03/2021] [Indexed: 11/26/2022]
Abstract
In this paper, we propose a visual embedding approach to improve embedding aware speech enhancement (EASE) by synchronizing visual lip frames at the phone and place of articulation levels. We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE). Next, we extract audio-visual embedding from noisy speech and lip frames in an information intersection manner, utilizing a complementarity of audio and visual features for multi-modal EASE (MEASE). Experiments on the TCD-TIMIT corpus corrupted by simulated additive noises show that our proposed subword based VEASE approach is more effective than conventional embedding at the word level. Moreover, visual embedding at the articulation place level, leveraging upon a high correlation between place of articulation and lip shapes, demonstrates an even better performance than that at the phone level. Finally the experiments establish that the proposed MEASE framework, incorporating both audio and visual embeddings, yields significantly better speech quality and intelligibility than those obtained with the best visual-only and audio-only EASE systems.
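As a rough schematic of what "embedding aware" enhancement means in practice (conditioning a mask estimator on both noisy-speech features and a visual lip embedding), the sketch below is illustrative only and does not reproduce the authors' MEASE architecture; the array sizes, the single linear layer, and all variable names are hypothetical:

import numpy as np

rng = np.random.default_rng(0)

def enhance_frame(noisy_spec, visual_emb, w, b):
    # Concatenate audio features with the visual embedding, apply one
    # hypothetical linear layer plus a sigmoid to get a spectral mask,
    # and return the masked (enhanced) magnitude spectrum.
    x = np.concatenate([noisy_spec, visual_emb])
    mask = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return mask * noisy_spec

n_freq, n_emb = 257, 64
noisy_spec = rng.random(n_freq)    # hypothetical magnitude-spectrum frame
visual_emb = rng.random(n_emb)     # hypothetical lip-embedding frame
w = 0.01 * rng.standard_normal((n_freq, n_freq + n_emb))
b = np.zeros(n_freq)
print(enhance_frame(noisy_spec, visual_emb, w, b).shape)  # (257,)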
Collapse
Affiliation(s)
- Hang Chen
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China
| | - Jun Du
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China.
| | - Yu Hu
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China
| | - Li-Rong Dai
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China
| | - Bao-Cai Yin
- iFlytek Research, iFlytek Co., Ltd., Hefei, Anhui, China
| | - Chin-Hui Lee
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| |
Collapse
|
42
|
Pinsonnault-Skvarenina A, de Lacerda ABM, Hotton M, Gagné JP. Communication With Older Adults in Times of a Pandemic: Practical Suggestions for the Health Care Professionals. Public Health Rev 2021; 42:1604046. [PMID: 34168899 PMCID: PMC8190655 DOI: 10.3389/phrs.2021.1604046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Accepted: 04/01/2021] [Indexed: 01/22/2023] Open
Abstract
In order to limit the spread of the coronavirus, several protective measures have been put in place in the community, in private and public residences and in health care centers. Some measures have a negative impact on communication. They include physical distancing, the use of face masks and shields as well as the increased use of telephone and videoconferencing for distance communication. The effects of COVID-19 are particularly harsh on older adults. Consequently, older adults, especially those with hearing loss, are particularly at risk of experiencing communication breakdowns and increased social isolation. Health care professionals should learn about and be encouraged to use communication strategies to maintain good interactions with their patients. This article proposes practical suggestions to health professionals who interact with older adults, especially those who have difficulty understanding speech. The goal of this article is to inform on the prevalence of hearing loss, the hearing difficulties experienced by older adults, the manifestations of hearing problems, the effects of pandemic protection measures on communication and the strategies that can be used to optimize professional-patient communication during a pandemic.
Collapse
Affiliation(s)
- Alexis Pinsonnault-Skvarenina
- School of Speech-Language Pathology and Audiology, Faculty of Medicine, University of Montreal, Montreal, QC, Canada.,Center for Interdisciplinary Research in Rehabilitation of Greater Montreal (CRIR), Montreal, QC, Canada
| | - Adriana Bender Moreira de Lacerda
- School of Speech-Language Pathology and Audiology, Faculty of Medicine, University of Montreal, Montreal, QC, Canada.,Research Center of the Institut universitaire de gériatrie de Montréal (CRIUGM), Montréal, QC, Canada
| | - Mathieu Hotton
- School of Rehabilitation, Faculty of Medecine, Université Laval, Québec, QC, Canada.,Center for Interdisciplinary Research in Rehabilitation and Social Integration (CIRRIS), Québec, QC, Canada
| | - Jean-Pierre Gagné
- School of Speech-Language Pathology and Audiology, Faculty of Medicine, University of Montreal, Montreal, QC, Canada.,Center for Interdisciplinary Research in Rehabilitation of Greater Montreal (CRIR), Montreal, QC, Canada.,Research Center of the Institut universitaire de gériatrie de Montréal (CRIUGM), Montréal, QC, Canada.,Titulaire de la Chaire de la Fondation Caroline-Durand en audition et vieillissement de l'Université de Montréal, Montréal, Québec, QC, Canada
| |
Collapse
|
43
|
Lewkowicz DJ, Schmuckler M, Agrawal V. The multisensory cocktail party problem in adults: Perceptual segregation of talking faces on the basis of audiovisual temporal synchrony. Cognition 2021; 214:104743. [PMID: 33940250 DOI: 10.1016/j.cognition.2021.104743] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2020] [Revised: 04/16/2021] [Accepted: 04/21/2021] [Indexed: 10/21/2022]
Abstract
Social interactions often involve a cluttered multisensory scene consisting of multiple talking faces. We investigated whether audiovisual temporal synchrony can facilitate perceptual segregation of talking faces. Participants either saw four identical or four different talking faces producing temporally jittered versions of the same visible speech utterance and heard the audible version of the same speech utterance. The audible utterance was either synchronized with the visible utterance produced by one of the talking faces or not synchronized with any of them. Eye tracking indicated that participants exhibited a marked preference for the synchronized talking face, that they gazed more at the mouth than the eyes overall, that they gazed more at the eyes of an audiovisually synchronized than a desynchronized talking face, and that they gazed more at the mouth when all talking faces were audiovisually desynchronized. These findings demonstrate that audiovisual temporal synchrony plays a major role in perceptual segregation of multisensory clutter and that adults rely on differential scanning strategies of a talker's eyes and mouth to discover sources of multisensory coherence.
Collapse
Affiliation(s)
- David J Lewkowicz
- Haskins Laboratories, New Haven, CT, USA; Yale Child Study Center, New Haven, CT, USA.
| | - Mark Schmuckler
- Department of Psychology, University of Toronto at Scarborough, Toronto, Canada
| | | |
Collapse
|
44
|
Jones SA, Noppeney U. Ageing and multisensory integration: A review of the evidence, and a computational perspective. Cortex 2021; 138:1-23. [PMID: 33676086 DOI: 10.1016/j.cortex.2021.02.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Revised: 01/23/2021] [Accepted: 02/02/2021] [Indexed: 11/29/2022]
Abstract
The processing of multisensory signals is crucial for effective interaction with the environment, but our ability to perform this vital function changes as we age. In the first part of this review, we summarise existing research into the effects of healthy ageing on multisensory integration. We note that age differences vary substantially with the paradigms and stimuli used: older adults often receive at least as much benefit (to both accuracy and response times) as younger controls from congruent multisensory stimuli, but are also consistently more negatively impacted by the presence of intersensory conflict. In the second part, we outline a normative Bayesian framework that provides a principled and computationally informed perspective on the key ingredients involved in multisensory perception, and how these are affected by ageing. Applying this framework to the existing literature, we conclude that changes to sensory reliability, prior expectations (together with attentional control), and decisional strategies all contribute to the age differences observed. However, we find no compelling evidence of any age-related changes to the basic inference mechanisms involved in multisensory perception.
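To make the "normative Bayesian framework" invoked above concrete, the sketch below shows reliability-weighted (maximum-likelihood) cue combination, the standard normative model of multisensory integration; the function name, variable names, and numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def fuse_cues(audio_est, audio_sigma, visual_est, visual_sigma):
    """Reliability-weighted (maximum-likelihood) fusion of two cues.

    Each cue's weight is proportional to its reliability (inverse variance),
    so the fused estimate is pulled toward the more precise modality and its
    variance is never larger than that of either cue alone.
    """
    w_a = (1 / audio_sigma**2) / (1 / audio_sigma**2 + 1 / visual_sigma**2)
    w_v = 1 - w_a
    fused_est = w_a * audio_est + w_v * visual_est
    fused_sigma = np.sqrt(1 / (1 / audio_sigma**2 + 1 / visual_sigma**2))
    return fused_est, fused_sigma

# Illustrative values only: an observer with noisier vision (larger visual_sigma)
# weights the auditory cue more heavily, with no change to the combination rule.
print(fuse_cues(audio_est=10.0, audio_sigma=2.0, visual_est=12.0, visual_sigma=1.0))
```

Under this scheme, an age-related decline in visual reliability shifts weight toward the auditory estimate without altering the inference rule itself, which is one way to separate changes in sensory reliability from changes in the underlying inference mechanism that the review discusses.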
Collapse
Affiliation(s)
- Samuel A Jones
- The Staffordshire Centre for Psychological Research, Staffordshire University, Stoke-on-Trent, UK.
| | - Uta Noppeney
- Donders Institute for Brain, Cognition & Behaviour, Radboud University, Nijmegen, the Netherlands.
| |
Collapse
|
45
|
Vannuscorps G, Andres M, Carneiro SP, Rombaux E, Caramazza A. Typically Efficient Lipreading without Motor Simulation. J Cogn Neurosci 2021; 33:611-621. [PMID: 33416443 DOI: 10.1162/jocn_a_01666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
All it takes is a face-to-face conversation in a noisy environment to realize that viewing a speaker's lip movements contributes to speech comprehension. What are the processes underlying the perception and interpretation of visual speech? Brain areas that control speech production are also recruited during lipreading. This finding raises the possibility that lipreading may be supported, at least to some extent, by a covert unconscious imitation of the observed speech movements in the observer's own speech motor system (a motor simulation). However, whether, and if so to what extent, motor simulation contributes to visual speech interpretation remains unclear. In two experiments, we found that several participants with congenital facial paralysis were as good at lipreading as the control population and performed these tasks in a way that is qualitatively similar to the controls, despite severely reduced or even completely absent lip motor representations. Although it remains an open question whether this conclusion generalizes to other experimental conditions and to typically developed participants, these findings considerably narrow the space of hypotheses for a role of motor simulation in lipreading. Beyond its theoretical significance in the field of speech perception, this finding also calls for a re-examination of the more general hypothesis, developed within the motor simulation and mirror neuron frameworks, that motor simulation underlies action perception and interpretation.
Collapse
|
46
|
Lalonde K, Werner LA. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit. Brain Sci 2021; 11:49. [PMID: 33466253 PMCID: PMC7824772 DOI: 10.3390/brainsci11010049] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2020] [Revised: 12/30/2020] [Accepted: 12/30/2020] [Indexed: 02/07/2023] Open
Abstract
The natural environments in which infants and children learn speech and language are noisy and multimodal. Adults rely on the multimodal nature of speech to compensate for noisy environments during speech communication. Multiple mechanisms underlie mature audiovisual benefit to speech perception, including reduced uncertainty as to when auditory speech will occur, use of correlations between the amplitude envelope of auditory and visual signals in fluent speech, and use of visual phonetic knowledge for lexical access. This paper reviews evidence regarding infants' and children's use of temporal and phonetic mechanisms in audiovisual speech perception benefit. The ability to use temporal cues for audiovisual speech perception benefit emerges in infancy. Although infants are sensitive to the correspondence between auditory and visual phonetic cues, the ability to use this correspondence for audiovisual benefit may not emerge until age four. A more cohesive account of the development of audiovisual speech perception may follow from a more thorough understanding of the development of sensitivity to and use of various temporal and phonetic cues.
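One of the temporal mechanisms mentioned above is the correlation between the auditory amplitude envelope and a visual articulatory signal in fluent speech. The sketch below shows one way such a correlation could be computed; the lip-area trace, sampling rates, and synthetic signals are assumptions for illustration, not the authors' procedure.

```python
import numpy as np
from scipy.signal import hilbert, resample

def envelope_correlation(audio, lip_area):
    """Pearson correlation between the auditory amplitude envelope and a
    frame-by-frame visual measure such as mouth-opening area.

    The envelope is resampled to the number of video frames so the two
    signals can be compared sample-for-sample.
    """
    envelope = np.abs(hilbert(audio))            # auditory amplitude envelope
    envelope = resample(envelope, len(lip_area))  # align to the video frame count
    return np.corrcoef(envelope, lip_area)[0, 1]

# Tiny synthetic check: a lip trace derived from the audio's own envelope
# should correlate strongly with it (all values here are illustrative only).
fs, fps, dur = 16000, 30, 2.0
t = np.arange(int(fs * dur)) / fs
audio = np.sin(2 * np.pi * 200 * t) * (1 + 0.8 * np.sin(2 * np.pi * 3 * t))
lip = resample(np.abs(hilbert(audio)), int(fps * dur))
print(envelope_correlation(audio, lip))
```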
Collapse
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE 68131, USA
| | - Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA 98105, USA
| |
Collapse
|
47
|
|
48
|
Atilgan H, Bizley JK. Training enhances the ability of listeners to exploit visual information for auditory scene analysis. Cognition 2020; 208:104529. [PMID: 33373937 PMCID: PMC7868888 DOI: 10.1016/j.cognition.2020.104529] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2020] [Revised: 11/24/2020] [Accepted: 11/25/2020] [Indexed: 11/25/2022]
Abstract
The ability to use temporal relationships between cross-modal cues facilitates perception and behavior. Previously we observed that temporally correlated changes in the size of a visual stimulus and the intensity in an auditory stimulus influenced the ability of listeners to perform an auditory selective attention task (Maddox, Atilgan, Bizley, & Lee, 2015). Participants detected timbral changes in a target sound while ignoring those in a simultaneously presented masker. When the visual stimulus was temporally coherent with the target sound, performance was significantly better than when the visual stimulus was temporally coherent with the masker, despite the visual stimulus conveying no task-relevant information. Here, we trained observers to detect audiovisual temporal coherence and asked whether this changed the way in which they were able to exploit visual information in the auditory selective attention task. We observed that after training, participants were able to benefit from temporal coherence between the visual stimulus and both the target and masker streams, relative to the condition in which the visual stimulus was coherent with neither sound. However, we did not observe such changes in a second group that were trained to discriminate modulation rate differences between temporally coherent audiovisual streams, although they did show an improvement in their overall performance. A control group did not change their performance between pretest and post-test and did not change how they exploited visual information. These results provide insights into how crossmodal experience may optimize multisensory integration.
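The coherent versus incoherent manipulation described above pairs an auditory intensity envelope with a visual size change that either shares that envelope or follows an independent one. The sketch below shows how such stimuli might be generated in principle; the smoothing cutoff, durations, and scaling are illustrative assumptions, not the study's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_envelope(duration_s, fs, cutoff_hz=7, rng=rng):
    """Low-pass filtered noise used as a slowly varying modulation envelope."""
    n = int(duration_s * fs)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spectrum[freqs > cutoff_hz] = 0
    env = np.fft.irfft(spectrum, n)
    return (env - env.min()) / (env.max() - env.min())  # rescale to 0..1

fs = 44100
env = smooth_envelope(2.0, fs)
carrier = rng.standard_normal(int(2.0 * fs))
audio = carrier * env                       # auditory intensity follows the envelope

# Coherent condition: the visual radius is driven by the *same* envelope;
# incoherent condition: it is driven by an independently generated one.
radius_coherent = 50 + 30 * env
radius_incoherent = 50 + 30 * smooth_envelope(2.0, fs)
```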
Collapse
|
49
|
Lee HJ, Lee JM, Choi JY, Jung J. The Effects of Preoperative Audiovisual Speech Perception on the Audiologic Outcomes of Cochlear Implantation in Patients with Postlingual Deafness. Audiol Neurootol 2020; 26:149-156. [PMID: 33352550 DOI: 10.1159/000509969] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Accepted: 07/06/2020] [Indexed: 11/19/2022] Open
Abstract
INTRODUCTION Patients with postlingual deafness usually depend on visual information for communication, and their lipreading ability could influence cochlear implantation (CI) outcomes. However, it is unclear whether preoperative visual dependency in postlingual deafness positively or negatively affects auditory rehabilitation after CI. Herein, we investigated the influence of preoperative audiovisual perception on CI outcomes. METHOD In this retrospective case-comparison study, 118 patients with postlingual deafness who underwent unilateral CI were enrolled. Evaluation of speech perception was performed under both audiovisual (AV) and audio-only (AO) conditions before and after CI. Before CI, the speech perception test was performed under hearing aid (HA)-assisted conditions. After CI, the speech perception test was performed under the CI-only condition. Only patients with preoperative AO speech perception scores of 10% or less were included. RESULTS Multivariable regression analysis showed that age, gender, residual hearing, operation side, education level, and HA usage were not correlated with either postoperative AV (pAV) or AO (pAO) speech perception. However, duration of deafness showed a significant negative correlation with both pAO (p = 0.003) and pAV (p = 0.015) speech perception scores. Notably, the preoperative AV speech perception score was not correlated with pAO speech perception (R2 = 0.00134, p = 0.693) but was positively associated with pAV speech perception (R2 = 0.0731, p = 0.003). CONCLUSION Preoperative dependency on audiovisual information may positively influence pAV speech perception in patients with postlingual deafness.
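As a rough illustration of the kind of multivariable regression reported above, the sketch below fits a linear model of a postoperative score on several candidate predictors with statsmodels; the synthetic data, column names, and coefficients are invented for demonstration and do not reproduce the study's dataset, coding, or exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic records standing in for 118 implant recipients; only the structure
# of the analysis (a multivariable regression of a postoperative score on
# candidate predictors) mirrors the abstract, not the numbers.
rng = np.random.default_rng(1)
n = 118
df = pd.DataFrame({
    "age": rng.uniform(20, 80, n),
    "deafness_duration": rng.uniform(0, 40, n),   # years of deafness
    "pre_av_score": rng.uniform(20, 90, n),        # preoperative AV score (%)
})
df["post_av_score"] = (40 - 0.5 * df["deafness_duration"]
                       + 0.2 * df["pre_av_score"] + rng.normal(0, 10, n))

model = smf.ols("post_av_score ~ age + deafness_duration + pre_av_score", data=df).fit()
print(model.summary())  # coefficients, R-squared, and per-predictor p-values
```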
Collapse
Affiliation(s)
- Hyun Jin Lee
- Department of Otolaryngology-Head and Neck Surgery, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - Jeon Mi Lee
- Department of Otorhinolaryngology, Ilsan Paik Hospital, Inje University College of Medicine, Goyang, Republic of Korea
| | - Jae Young Choi
- Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Jinsei Jung
- Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, Republic of Korea
| |
Collapse
|
50
|
Dorman MF, Natale SC, Agrawal S. The Benefit of Remote and On-Ear Directional Microphone Technology Persists in the Presence of Visual Information. J Am Acad Audiol 2020; 32:39-44. [PMID: 33296930 DOI: 10.1055/s-0040-1718893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
BACKGROUND Both the Roger remote microphone and on-ear, adaptive beamforming technologies (e.g., Phonak UltraZoom) have been shown to improve speech understanding in noise for cochlear implant (CI) listeners when tested in audio-only (A-only) test environments. PURPOSE Our aim was to determine if adult and pediatric CI recipients benefited from these technologies in a more common environment: one in which both audio and visual cues were available and when overall performance was high. STUDY SAMPLE Ten adult CI listeners (Experiment 1) and seven pediatric CI listeners (Experiment 2) were tested. DESIGN Adults were tested in quiet and in two levels of noise (level 1 and level 2) in A-only and audio-visual (AV) environments. There were four device conditions: (1) an ear canal-level, omnidirectional microphone (T-mic) in quiet, (2) the T-mic in noise, (3) an adaptive directional mic (UltraZoom) in noise, and (4) a wireless, remote mic (Roger Pen) in noise. Pediatric listeners were tested in quiet and in level 1 noise in A-only and AV environments. The test conditions were: (1) a behind-the-ear level omnidirectional mic (processor mic) in quiet, (2) the processor mic in noise, (3) the T-mic in noise, and (4) the Roger Pen in noise. DATA COLLECTION AND ANALYSES In each test condition, sentence understanding was assessed (percent correct) and ease of listening ratings were obtained. The sentence understanding data were entered into repeated-measures analyses of variance. RESULTS For both adult and pediatric listeners in the AV test conditions in level 1 noise, performance with the Roger Pen was significantly higher than with the T-mic. For both populations, performance in level 1 noise with the Roger Pen approached the level of baseline performance in quiet. Ease of listening in noise was rated higher in the Roger Pen conditions than in the T-mic or processor mic conditions in both A-only and AV test conditions. CONCLUSION The Roger remote mic and on-ear directional mic technologies benefit both speech understanding and ease of listening in a realistic laboratory test environment and are likely to do the same in real-world listening environments.
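The sentence-understanding analysis described above relies on repeated-measures analysis of variance across device conditions. The sketch below shows a minimal version of such an analysis with a single within-subject factor; the listener count, condition labels, and simulated percent-correct scores are illustrative assumptions rather than the study's data or full design.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format scores: 10 listeners tested with three microphone
# configurations in noise. Means and spread are simulated for demonstration.
rng = np.random.default_rng(2)
subjects, devices = range(1, 11), ["T-mic", "UltraZoom", "RogerPen"]
rows = [
    {"subject": s, "device": d,
     "pct_correct": float(np.clip(
         rng.normal({"T-mic": 55, "UltraZoom": 70, "RogerPen": 85}[d], 8), 0, 100))}
    for s in subjects for d in devices
]
scores = pd.DataFrame(rows)

# One within-subject factor (device); a significant effect would normally be
# followed up with pairwise comparisons between microphone conditions.
result = AnovaRM(scores, depvar="pct_correct", subject="subject", within=["device"]).fit()
print(result)
```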
Collapse
Affiliation(s)
- Michael F Dorman
- Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona
| | - Sarah Cook Natale
- Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona
| | | |
Collapse
|