1
Sabesan S, Fragner A, Bench C, Drakopoulos F, Lesica NA. Large-scale electrophysiology and deep learning reveal distorted neural signal dynamics after hearing loss. eLife 2023; 12:e85108. [PMID: 37162188] [PMCID: PMC10202456] [DOI: 10.7554/elife.85108]
Abstract
Listeners with hearing loss often struggle to understand speech in noise, even with a hearing aid. To better understand the auditory processing deficits that underlie this problem, we made large-scale brain recordings from gerbils, a common animal model for human hearing, while presenting a large database of speech and noise sounds. We first used manifold learning to identify the neural subspace in which speech is encoded and found that it is low-dimensional and that the dynamics within it are profoundly distorted by hearing loss. We then trained a deep neural network (DNN) to replicate the neural coding of speech with and without hearing loss and analyzed the underlying network dynamics. We found that hearing loss primarily impacts spectral processing, creating nonlinear distortions in cross-frequency interactions that result in a hypersensitivity to background noise that persists even after amplification with a hearing aid. Our results identify a new focus for efforts to design improved hearing aids and demonstrate the power of DNNs as a tool for the study of central brain structures.
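The dimensionality analysis in this entry lends itself to a small illustration. Below is a minimal sketch of one common way to gauge the dimensionality of a neural subspace: PCA plus the participation ratio. The simulated recordings, latent count, and noise level are invented stand-ins; the paper's actual manifold-learning pipeline is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated population activity: 200 neurons x 5000 time bins, driven by a
# 4-dimensional latent signal plus independent noise (stand-in for recordings).
n_neurons, n_bins, n_latent = 200, 5000, 4
latents = rng.standard_normal((n_latent, n_bins))
mixing = rng.standard_normal((n_neurons, n_latent))
activity = mixing @ latents + 0.5 * rng.standard_normal((n_neurons, n_bins))

# PCA via eigendecomposition of the neuron-by-neuron covariance matrix.
activity -= activity.mean(axis=1, keepdims=True)
cov = activity @ activity.T / n_bins
eigvals = np.linalg.eigvalsh(cov)[::-1]          # variances, descending

# Participation ratio: an effective dimensionality of the response subspace.
pr = eigvals.sum() ** 2 / (eigvals ** 2).sum()
print(f"participation ratio ≈ {pr:.1f} (ground truth: {n_latent} latents)")
```

With four planted latent dimensions the participation ratio lands near four, which is the sense in which a response subspace is called low-dimensional.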
Affiliation(s)
- Ciaran Bench
- Ear Institute, University College London, London, United Kingdom
2
Phatak SA, Zion DJ, Grant KW. Consonant Perception in Connected Syllables Spoken at a Conversational Syllabic Rate. Trends Hear 2023; 27:23312165231156673. [PMID: 36794551] [PMCID: PMC9936395] [DOI: 10.1177/23312165231156673]
Abstract
Closed-set consonant identification, measured using nonsense syllables, has been commonly used to investigate the encoding of speech cues in the human auditory system. Such tasks also evaluate the robustness of speech cues to masking from background noise and their impact on auditory-visual speech integration. However, extending the results of these studies to everyday speech communication has been a major challenge due to acoustic, phonological, lexical, contextual, and visual speech cue differences between consonants in isolated syllables and in conversational speech. In an attempt to isolate and address some of these differences, recognition of consonants spoken in multisyllabic nonsense phrases (e.g., aBaSHaGa spoken as /ɑbɑʃɑɡɑ/) produced at an approximately conversational syllabic rate was measured and compared with consonant recognition using Vowel-Consonant-Vowel bisyllables spoken in isolation. After accounting for differences in stimulus audibility using the Speech Intelligibility Index, consonants spoken in sequence at a conversational syllabic rate were found to be more difficult to recognize than those produced in isolated bisyllables. Specifically, place- and manner-of-articulation information was transmitted better in isolated nonsense syllables than for multisyllabic phrases. The contribution of visual speech cues to place-of-articulation information was also lower for consonants spoken in sequence at a conversational syllabic rate. These data imply that auditory-visual benefit based on models of feature complementarity from isolated syllable productions may over-estimate real-world benefit of integrating auditory and visual speech cues.
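The Speech Intelligibility Index used here to equate audibility is standardized (ANSI S3.5-1997); the sketch below shows only its core computation, per-band audibility weighted by band importance. The band weights and SNRs are illustrative placeholders, not the standard's tabulated values.

```python
import numpy as np

# Hypothetical octave-band importance weights (must sum to 1); ANSI S3.5-1997
# defines several procedure-specific sets, and these values are placeholders.
band_hz    = np.array([250, 500, 1000, 2000, 4000, 8000])
importance = np.array([0.06, 0.14, 0.22, 0.27, 0.21, 0.10])
snr_db     = np.array([12.0, 8.0, 3.0, -2.0, -6.0, -10.0])  # per-band speech-to-noise ratio

# Core SII idea: each band's audibility is its SNR mapped onto [0, 1] over a
# 30-dB range, then weighted by that band's importance and summed.
audibility = np.clip((snr_db + 15.0) / 30.0, 0.0, 1.0)
sii = float(np.sum(importance * audibility))
print(f"SII ≈ {sii:.2f}")   # 0 = speech inaudible, 1 = speech fully audible
```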
Affiliation(s)
- Sandeep A. Phatak
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
- Danielle J. Zion
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
- Ken W. Grant
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
3
Butera IM, Stevenson RA, Gifford RH, Wallace MT. Visually Biased Perception in Cochlear Implant Users: A Study of the McGurk and Sound-Induced Flash Illusions. Trends Hear 2023; 27:23312165221076681. [PMID: 37377212] [PMCID: PMC10334005] [DOI: 10.1177/23312165221076681]
Abstract
The reduction in spectral resolution by cochlear implants oftentimes requires complementary visual speech cues to facilitate understanding. Despite substantial clinical characterization of auditory-only speech measures, relatively little is known about the audiovisual (AV) integrative abilities that most cochlear implant (CI) users rely on for daily speech comprehension. In this study, we tested AV integration in 63 CI users and 69 normal-hearing (NH) controls using the McGurk and sound-induced flash illusions. To our knowledge, this study is the largest to date measuring the McGurk effect in this population and the first that tests the sound-induced flash illusion (SIFI). When presented with conflicting AV speech stimuli (i.e., the phoneme "ba" dubbed onto the viseme "ga"), we found that 55 CI users (87%) reported a fused percept of "da" or "tha" on at least one trial. After applying an error correction based on unisensory responses, we found that among those susceptible to the illusion, CI users experienced lower fusion than controls, a result that was concordant with results from the SIFI, where the pairing of a single circle flashing on the screen with multiple beeps resulted in fewer illusory flashes for CI users. While illusion perception in these two tasks appears to be uncorrelated among CI users, we identified a negative correlation in the NH group. Because neither illusion appears to provide further explanation of variability in CI outcome measures, further research is needed to determine how these findings relate to CI users' speech understanding, particularly in ecological listening conditions that are naturally multisensory.
Affiliation(s)
- Iliza M. Butera
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Ryan A. Stevenson
- Department of Psychology, University of Western Ontario, London, ON, Canada
- Brain and Mind Institute, University of Western Ontario, London, ON, Canada
- René H. Gifford
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Mark T. Wallace
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN, USA
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Kennedy Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Psychology, Vanderbilt University, Nashville, TN, USA
4
Lu H, McKinney MF, Zhang T, Oxenham AJ. Investigating age, hearing loss, and background noise effects on speaker-targeted head and eye movements in three-way conversations. J Acoust Soc Am 2021; 149:1889. [PMID: 33765809] [DOI: 10.1121/10.0003707]
Abstract
Although beamforming algorithms for hearing aids can enhance performance, the wearer's head may not always face the target talker, potentially limiting real-world benefits. This study aimed to determine the extent to which eye tracking improves the accuracy of locating the current talker in three-way conversations and to test the hypothesis that eye movements become more likely to track the target talker with increasing background noise levels, particularly in older and/or hearing-impaired listeners. Conversations between a participant and two confederates were held around a small table in quiet and with background noise levels of 50, 60, and 70 dB sound pressure level, while the participant's eye and head movements were recorded. Ten young normal-hearing listeners were tested, along with ten older normal-hearing listeners and eight hearing-impaired listeners. Head movements generally undershot the talker's position by 10°-15°, but head and eye movements together predicted the talker's position well. Contrary to our original hypothesis, no major differences in listening behavior were observed between the groups or between noise levels, although the hearing-impaired listeners tended to spend less time looking at the current talker than the other groups, especially at the highest noise level.
Affiliation(s)
- Hao Lu
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
- Martin F McKinney
- Starkey Hearing Technologies, 6700 Washington Avenue South, Eden Prairie, Minnesota 55344, USA
- Tao Zhang
- Starkey Hearing Technologies, 6700 Washington Avenue South, Eden Prairie, Minnesota 55344, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
5
Polonenko MJ, Maddox RK. Exposing distinct subcortical components of the auditory brainstem response evoked by continuous naturalistic speech. eLife 2021; 10:e62329. [PMID: 33594974] [PMCID: PMC7946424] [DOI: 10.7554/elife.62329]
Abstract
Speech processing is built upon encoding by the auditory nerve and brainstem, yet we know very little about how these processes unfold in specific subcortical structures. These structures are deep and respond quickly, making them difficult to study during ongoing speech. Recent techniques have begun to address this problem, but yield temporally broad responses with consequently ambiguous neural origins. Here, we describe a method that pairs re-synthesized ‘peaky’ speech with deconvolution analysis of electroencephalography recordings. We show that in adults with normal hearing the method quickly yields robust responses whose component waves reflect activity from distinct subcortical structures spanning auditory nerve to rostral brainstem. We further demonstrate the versatility of peaky speech by simultaneously measuring bilateral and ear-specific responses across different frequency bands and discuss the important practical considerations such as talker choice. The peaky speech method holds promise as a tool for investigating speech encoding and processing, and for clinical applications.
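Deconvolution of the EEG against a pulse-like regressor is the key computational step described here. A minimal sketch under stated assumptions: a synthetic pulse train stands in for the 'peaky' speech regressor, and regularized spectral division recovers the response kernel. This is a generic formulation on simulated data, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, dur = 10_000, 20                     # 10 kHz EEG sampling rate, 20 s of signal
n = fs * dur

# Pulse-train regressor standing in for the glottal-pulse times of 'peaky' speech.
stim = np.zeros(n)
stim[rng.choice(n - fs, size=2000, replace=False)] = 1.0

# Synthetic EEG: pulse train convolved with a known brainstem-like kernel, plus noise.
t_k = np.arange(0, 0.015, 1 / fs)
kernel = np.sin(2 * np.pi * 500 * t_k) * np.exp(-t_k / 0.003)
eeg = np.convolve(stim, kernel)[:n] + 2.0 * rng.standard_normal(n)

# Deconvolution by regularized spectral division:
#   h = IFFT( FFT(eeg) * conj(FFT(stim)) / (|FFT(stim)|^2 + lambda) )
S, E = np.fft.rfft(stim), np.fft.rfft(eeg)
lam = 0.1 * np.mean(np.abs(S) ** 2)      # ridge term guards near-zero stimulus power
h = np.fft.irfft(E * np.conj(S) / (np.abs(S) ** 2 + lam), n=n)

ms = 1000 / fs
print(f"recovered peak at {np.argmax(h[:len(t_k)]) * ms:.1f} ms, "
      f"true peak at {np.argmax(kernel) * ms:.1f} ms")
```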
Affiliation(s)
- Melissa J Polonenko
- Department of Neuroscience, University of Rochester, Rochester, United States
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, United States
- Center for Visual Science, University of Rochester, Rochester, United States
- Ross K Maddox
- Department of Neuroscience, University of Rochester, Rochester, United States
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, United States
- Center for Visual Science, University of Rochester, Rochester, United States
- Department of Biomedical Engineering, University of Rochester, Rochester, United States
6
Bernstein JGW, Venezia JH, Grant KW. Auditory and auditory-visual frequency-band importance functions for consonant recognition. J Acoust Soc Am 2020; 147:3712. [PMID: 32486805] [DOI: 10.1121/10.0001301]
Abstract
The relative importance of individual frequency regions for speech intelligibility has been firmly established for broadband auditory-only (AO) conditions. Yet, speech communication often takes place face-to-face. This study tested the hypothesis that under auditory-visual (AV) conditions, where visual information is redundant with high-frequency auditory cues, lower frequency regions will increase in relative importance compared to AO conditions. Frequency band-importance functions for consonants were measured for eight hearing-impaired and four normal-hearing listeners. Speech was filtered into four 1/3-octave bands each separated by an octave to minimize energetic masking. On each trial, the signal-to-noise ratio (SNR) in each band was selected randomly from a 10-dB range. AO and AV band-importance functions were estimated using three logistic-regression analyses: a primary model relating performance to the four independent SNRs; a control model that also included band-interaction terms; and a different set of four control models, each examining one band at a time. For both listener groups, the relative importance of the low-frequency bands increased under AV conditions, consistent with earlier studies using isolated speech bands. All three analyses showed similar results, indicating the absence of cross-band interactions. These results suggest that accurate prediction of AV speech intelligibility may require different frequency-importance functions than for AO conditions.
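The primary analysis, logistic regression relating per-trial correctness to the four band SNRs, can be sketched compactly. The data below are simulated and the "true" weights invented; the fitted coefficients play the role of band-importance estimates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Simulated experiment: on each trial the SNR in four bands is drawn from a
# 10-dB range, and correctness depends on a weighted sum of the band SNRs.
# The weights below are hypothetical stand-ins for true band importance.
n_trials = 5000
true_weights = np.array([0.05, 0.10, 0.20, 0.08])   # per-dB effect of each band
snr = rng.uniform(-5, 5, size=(n_trials, 4))        # 10-dB range per band
p_correct = 1 / (1 + np.exp(-(0.3 + snr @ true_weights)))
correct = rng.random(n_trials) < p_correct

# Primary model of the kind described: per-trial correctness vs. the four
# independent band SNRs; each coefficient indexes that band's importance.
model = LogisticRegression().fit(snr, correct)
importance = model.coef_.ravel() / model.coef_.sum()
print("normalized band importance:", np.round(importance, 2))
```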
Affiliation(s)
- Joshua G W Bernstein
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
- Jonathan H Venezia
- Veterans Affairs Loma Linda Healthcare System, 11201 Benton Street, Loma Linda, California 92357, USA
- Ken W Grant
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
7
Abstract
OBJECTIVES: The present study investigated presentation modality differences in lexical encoding and working memory representations of spoken words of older, hearing-impaired adults. Two experiments were undertaken: a memory-scanning experiment and a stimulus gating experiment. The primary objective of experiment 1 was to determine whether memory encoding and retrieval and scanning speeds are different for easily identifiable words presented in auditory-visual (AV), auditory-only (AO), and visual-only (VO) modalities. The primary objective of experiment 2 was to determine if memory encoding and retrieval speed differences observed in experiment 1 could be attributed to the early availability of AV speech information compared with AO or VO conditions.
DESIGN: Twenty-six adults over age 60 years with bilateral mild to moderate sensorineural hearing loss participated in experiment 1, and 24 adults who took part in experiment 1 participated in experiment 2. An item recognition reaction-time paradigm (memory-scanning) was used in experiment 1 to measure (1) lexical encoding speed, that is, the speed at which an easily identifiable word was recognized and placed into working memory, and (2) retrieval speed, that is, the speed at which words were retrieved from memory and compared with similarly encoded words (memory scanning) presented in AV, AO, and VO modalities. Experiment 2 used a time-gated word identification task to test whether the time course of stimulus information available to participants predicted the modality-related memory encoding and retrieval speed results from experiment 1.
RESULTS: The results of experiment 1 revealed significant differences among the modalities with respect to both memory encoding and retrieval speed, with AV fastest and VO slowest. These differences motivated an examination of the time course of stimulus information available as a function of modality. Results from experiment 2 indicated the encoding and retrieval speed advantages for AV and AO words compared with VO words were mostly driven by the time course of stimulus information. The AV advantage seen in encoding and retrieval speeds is likely due to a combination of robust stimulus information available to the listener earlier in time and lower attentional demands compared with AO or VO encoding and retrieval.
CONCLUSIONS: Significant modality differences in lexical encoding and memory retrieval speeds were observed across modalities. The memory scanning speed advantage observed for AV compared with AO or VO modalities was strongly related to the time course of stimulus information. In contrast, lexical encoding and retrieval speeds for VO words could not be explained by the time-course of stimulus information alone. Working memory processes for the VO modality may be impacted by greater attentional demands and less information availability compared with the AV and AO modalities. Overall, these results support the hypothesis that the presentation modality for speech inputs (AV, AO, or VO) affects how older adult listeners with hearing loss encode, remember, and retrieve what they hear.
8
Lalonde K, Werner LA. Perception of incongruent audiovisual English consonants. PLoS One 2019; 14:e0213588. [PMID: 30897109] [PMCID: PMC6428273] [DOI: 10.1371/journal.pone.0213588]
Abstract
Causal inference—the process of deciding whether two incoming signals come from the same source—is an important step in audiovisual (AV) speech perception. This research explored causal inference and perception of incongruent AV English consonants. Nine adults were presented auditory, visual, congruent AV, and incongruent AV consonant-vowel syllables. Incongruent AV stimuli included auditory and visual syllables with matched vowels, but mismatched consonants. Open-set responses were collected. For most incongruent syllables, participants were aware of the mismatch between auditory and visual signals (59.04%) or reported the auditory syllable (33.73%). Otherwise, participants reported the visual syllable (1.13%) or some other syllable (6.11%). Statistical analyses were used to assess whether visual distinctiveness and place, voice, and manner features predicted responses. Mismatch responses occurred more when the auditory and visual consonants were visually distinct, when place and manner differed across auditory and visual consonants, and for consonants with high visual accuracy. Auditory responses occurred more when the auditory and visual consonants were visually similar, when place and manner were the same across auditory and visual stimuli, and with consonants produced further back in the mouth. Visual responses occurred more when voicing and manner were the same across auditory and visual stimuli, and for front and middle consonants. Other responses were variable, but typically matched the visual place, auditory voice, and auditory manner of the input. Overall, results indicate that causal inference and incongruent AV consonant perception depend on salience and reliability of auditory and visual inputs and degree of redundancy between auditory and visual inputs. A parameter-free computational model of incongruent AV speech perception based on unimodal confusions, with a causal inference rule, was applied. Data from the current study present an opportunity to test and improve the generalizability of current AV speech integration models.
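The parameter-free model mentioned here is not reproduced, but the causal-inference step it relies on has a standard Bayesian form (e.g., Körding et al., 2007): compare the likelihood that the auditory and visual cues share one cause against the likelihood of two independent causes. A sketch with illustrative noise and prior parameters:

```python
import numpy as np

def p_common(x_a, x_v, sigma_a=1.0, sigma_v=2.0, sigma_p=4.0, prior_c=0.5):
    """Posterior probability that auditory cue x_a and visual cue x_v share
    one cause, assuming Gaussian cue noise and a zero-mean Gaussian prior
    over sources (Koerding et al., 2007 style; all values illustrative)."""
    va, vv, vp = sigma_a**2, sigma_v**2, sigma_p**2
    # Likelihood of the cue pair under a single shared source (C = 1).
    var1 = va * vv + va * vp + vv * vp
    like1 = np.exp(-((x_a - x_v)**2 * vp + x_a**2 * vv + x_v**2 * va) / (2 * var1)) \
            / (2 * np.pi * np.sqrt(var1))
    # Likelihood under two independent sources (C = 2).
    like2 = np.exp(-x_a**2 / (2 * (va + vp)) - x_v**2 / (2 * (vv + vp))) \
            / (2 * np.pi * np.sqrt((va + vp) * (vv + vp)))
    return prior_c * like1 / (prior_c * like1 + (1 - prior_c) * like2)

# Congruent cues favor a common cause; discrepant cues favor separate causes.
print(f"similar cues:    P(C=1) = {p_common(0.5, 0.7):.2f}")
print(f"discrepant cues: P(C=1) = {p_common(0.5, 6.0):.2f}")
```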
Affiliation(s)
- Kaylah Lalonde
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
- Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, United States of America
9
10
Hennequin A, Rochet-Capellan A, Gerber S, Dohen M. Does the Visual Channel Improve the Perception of Consonants Produced by Speakers of French With Down Syndrome? J Speech Lang Hear Res 2018; 61:957-972. [PMID: 29635399] [DOI: 10.1044/2017_jslhr-h-17-0112]
Abstract
PURPOSE: This work evaluates whether seeing the speaker's face could improve the speech intelligibility of adults with Down syndrome (DS). This is not straightforward because DS induces a number of anatomical and motor anomalies affecting the orofacial zone.
METHOD: A speech-in-noise perception test was used to evaluate the intelligibility of 16 consonants (Cs) produced in a vowel-consonant-vowel context (Vo = /a/) by 4 speakers with DS and 4 control speakers. Forty-eight naïve participants were asked to identify the stimuli in 3 modalities: auditory (A), visual (V), and auditory-visual (AV). The probability of correct responses was analyzed, as well as AV gain, confusions, and transmitted information as a function of modality and phonetic features.
RESULTS: The probability of correct response follows the trend AV > A > V, with smaller values for the DS than the control speakers in A and AV but not in V. This trend depended on the C: the V information particularly improved the transmission of place of articulation and to a lesser extent of manner, whereas voicing remained specifically altered in DS.
CONCLUSIONS: The results suggest that the V information is intact in the speech of people with DS and improves the perception of some phonetic features in Cs in a similar way as for control speakers. This result has implications for further studies, rehabilitation protocols, and specific training of caregivers.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.6002267
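Transmitted information, the feature-level measure analyzed here, is mutual information computed from a confusion matrix (after Miller and Nicely, 1955). A minimal sketch with an invented three-consonant matrix:

```python
import numpy as np

def transmitted_information(confusions):
    """Mutual information (bits) between stimulus and response, computed
    from a count-based confusion matrix (rows: stimuli, cols: responses)."""
    p = confusions / confusions.sum()              # joint probabilities
    ps = p.sum(axis=1, keepdims=True)              # stimulus marginals
    pr = p.sum(axis=0, keepdims=True)              # response marginals
    nz = p > 0                                     # avoid log(0)
    return float(np.sum(p[nz] * np.log2(p[nz] / (ps @ pr)[nz])))

# Made-up 3-consonant confusion matrix (e.g., /b/, /d/, /g/ in noise).
conf = np.array([[40,  8,  2],
                 [10, 35,  5],
                 [ 3,  7, 40]])
ti = transmitted_information(conf)
print(f"transmitted information = {ti:.2f} bits (ceiling here: {np.log2(3):.2f})")
```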
Affiliation(s)
- Silvain Gerber
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
- Marion Dohen
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
11
Abstract
The objective of this study was to review research on sensory and cognitive interactions in older adults published since 2009, the approximate date of the most recent reviews on this topic. After an electronic database search of articles published in English since 2009 on measures of hearing and cognition or vision and cognition in older adults, a total of 437 articles were identified. Screening by title and abstract for appropriateness of topic and for articles presenting original research in peer-reviewed journals reduced the final number of articles reviewed to 34. These articles were qualitatively evaluated and synthesized with the existing knowledge base. Additional evidence has been obtained since 2009 associating declines in vision, hearing, or both with declines in cognition among older adults. The observed sensory-cognitive associations are generally stronger when more than one sensory domain is measured and when the sensory measures involve more than simple threshold sensitivity. Evidence continues to accumulate supporting a link between decline in sensory function and cognitive decline in older adults.
12
Calandruccio L, Buss E. Spectral integration of English speech for non-native English speakers. J Acoust Soc Am 2017; 142:1646. [PMID: 28964099] [PMCID: PMC5614729] [DOI: 10.1121/1.5003933]
Abstract
When listening in noisy environments, good speech perception often relies on the ability to integrate cues distributed across disparate frequency regions. The present study evaluated this ability in non-native speakers of English. Native English-speaking and native Mandarin-speaking listeners who acquired English as their second language participated. English sentence recognition was evaluated in a two-stage procedure. First, the bandwidth associated with ∼15% correct was determined for a band centered on 500 Hz and a band centered at 2500 Hz. Performance was then evaluated for each band alone and both bands combined. Data indicated that non-natives needed significantly wider bandwidths than natives to achieve comparable performance with just the low or just the high band alone. Further, even when provided with wider bandwidth within each frequency region, non-natives were worse than natives at integrating information across bands. These data support the idea that greater bandwidth requirements and a reduced ability to integrate speech cues distributed across frequency may play an important role in the greater difficulty non-natives often experience when listening to English speech in noisy environments.
Affiliation(s)
- Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, 11635 Euclid Avenue, Cleveland, Ohio 44106, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, 170 Manning Drive, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
13
Stacey PC, Kitterick PT, Morris SD, Sumner CJ. The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure. Hear Res 2016; 336:17-28. [PMID: 27085797] [PMCID: PMC5706637] [DOI: 10.1016/j.heares.2016.04.002]
Abstract
Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues.
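Sine-wave vocoding, the manipulation used here to remove informative temporal fine structure, keeps each band's envelope and replaces its carrier with a tone. A schematic version, with invented band edges and a synthetic input rather than the study's stimuli:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def sine_vocode(signal, fs, edges):
    """Schematic tone vocoder: band-pass analysis, envelope extraction via
    the Hilbert transform, envelopes re-imposed on sine carriers at each
    band's centre frequency (discarding temporal fine structure)."""
    t = np.arange(len(signal)) / fs
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))                     # keep the envelope...
        carrier = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)   # ...replace TFS with a tone
        out += envelope * carrier
    return out

# Demo on a synthetic harmonic complex standing in for speech.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech_like = sum(np.sin(2 * np.pi * f * t) for f in (150, 300, 450, 1200))
edges = np.array([100, 400, 1000, 2500, 6000])   # 4 analysis bands (illustrative)
vocoded = sine_vocode(speech_like, fs, edges)
print("output RMS:", np.round(np.sqrt(np.mean(vocoded ** 2)), 3))
```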
Affiliation(s)
- Paula C Stacey
- Division of Psychology, Nottingham Trent University, Burton Street, Nottingham NG1 4BU, UK
- Pádraig T Kitterick
- NIHR Nottingham Hearing Biomedical Research Unit, Ropewalk House, 113 The Ropewalk, Nottingham NG1 5DU, UK
- Saffron D Morris
- MRC Institute of Hearing Research, University Park, Nottingham NG7 2RD, UK
- Christian J Sumner
- MRC Institute of Hearing Research, University Park, Nottingham NG7 2RD, UK
14
Frtusova JB, Phillips NA. The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment. Front Psychol 2016; 7:490. [PMID: 27148106] [PMCID: PMC4828631] [DOI: 10.3389/fpsyg.2016.00490]
Abstract
This study examined the effect of auditory-visual (AV) speech stimuli on working memory in older adults with poorer-hearing (PH) in comparison to age- and education-matched older adults with better hearing (BH). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioral results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the PH group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the PH group showed a more robust AV benefit; however, the BH group showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the PH group to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.
Affiliation(s)
- Natalie A. Phillips
- Cognition, Aging, and Psychophysiology Lab, Department of Psychology, Concordia University, Montreal, QC, Canada
15
Statistical Learning, Syllable Processing, and Speech Production in Healthy Hearing and Hearing-Impaired Preschool Children. Ear Hear 2016; 37:e57-71. [DOI: 10.1097/aud.0000000000000197]
16
Healy EW, Yoho SE, Wang Y, Wang D. An algorithm to improve speech recognition in noise for hearing-impaired listeners. J Acoust Soc Am 2013; 134:3029-38. [PMID: 24116438] [PMCID: PMC3799726] [DOI: 10.1121/1.4820893]
Abstract
Despite considerable effort, monaural (single-microphone) algorithms capable of increasing the intelligibility of speech in noise have remained elusive. Successful development of such an algorithm is especially important for hearing-impaired (HI) listeners, given their particular difficulty in noisy backgrounds. In the current study, an algorithm based on binary masking was developed to separate speech from noise. Unlike the ideal binary mask, which requires prior knowledge of the premixed signals, the masks used to segregate speech from noise in the current study were estimated by training the algorithm on speech not used during testing. Sentences were mixed with speech-shaped noise and with babble at various signal-to-noise ratios (SNRs). Testing using normal-hearing and HI listeners indicated that intelligibility increased following processing in all conditions. These increases were larger for HI listeners, for the modulated background, and for the least-favorable SNRs. They were also often substantial, allowing several HI listeners to improve intelligibility from scores near zero to values above 70%.
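The study's masks were estimated by a trained classifier, which is not reproduced here. The ideal binary mask it approximates is, however, simple to state: keep time-frequency units whose local SNR exceeds a criterion. A sketch using prior knowledge of synthetic premixed signals:

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(3)
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = np.sin(2 * np.pi * 440 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # modulated tone stand-in
noise = rng.standard_normal(len(t))
mixture = speech + noise

# Ideal binary mask: with *prior* knowledge of the premixed signals, keep
# time-frequency units whose local SNR exceeds a criterion (here 0 dB).
_, _, S = stft(speech, fs=fs, nperseg=512)
_, _, N = stft(noise, fs=fs, nperseg=512)
_, _, M = stft(mixture, fs=fs, nperseg=512)
local_snr_db = 20 * np.log10(np.abs(S) / (np.abs(N) + 1e-12))
mask = (local_snr_db > 0).astype(float)

_, separated = istft(M * mask, fs=fs, nperseg=512)
print(f"mask keeps {100 * mask.mean():.0f}% of time-frequency units")
```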
Affiliation(s)
- Eric W Healy
- Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210
17
Seeing and hearing a word: combining eye and ear is more efficient than combining the parts of a word. PLoS One 2013; 8:e64803. [PMID: 23734220] [PMCID: PMC3667182] [DOI: 10.1371/journal.pone.0064803]
Abstract
To understand why human sensitivity for complex objects is so low, we study how word identification combines eye and ear or parts of a word (features, letters, syllables). Our observers identify printed and spoken words presented concurrently or separately. When researchers measure threshold (energy of the faintest visible or audible signal) they may report either sensitivity (one over the human threshold) or efficiency (ratio of the best possible threshold to the human threshold). When the best possible algorithm identifies an object (like a word) in noise, its threshold is independent of how many parts the object has. But, with human observers, efficiency depends on the task. In some tasks, human observers combine parts efficiently, needing hardly more energy to identify an object with more parts. In other tasks, they combine inefficiently, needing energy nearly proportional to the number of parts, over a 60:1 range. Whether presented to eye or ear, efficiency for detecting a short sinusoid (tone or grating) with few features is a substantial 20%, while efficiency for identifying a word with many features is merely 1%. Why? We show that the low human sensitivity for words is a cost of combining their many parts. We report a dichotomy between inefficient combining of adjacent features and efficient combining across senses. Joining our results with a survey of the cue-combination literature reveals that cues combine efficiently only if they are perceived as aspects of the same object. Observers give different names to adjacent letters in a word, and combine them inefficiently. Observers give the same name to a word’s image and sound, and combine them efficiently. The brain’s machinery optimally combines only cues that are perceived as originating from the same object. Presumably such cues each find their own way through the brain to arrive at the same object representation.
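The efficiency measure at the heart of this abstract is a ratio of energy thresholds. A short illustration with made-up numbers:

```python
# Efficiency = (energy threshold of the ideal observer) / (human energy threshold).
# Values below are made up: thresholds expressed as signal energies in noise.
ideal_threshold_energy = 1.0
human_threshold_energy = 100.0   # human needs 100x the energy of the ideal observer
efficiency = ideal_threshold_energy / human_threshold_energy
print(f"efficiency = {efficiency:.0%}")   # 1%, like word identification in the study
```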
18
The Influence of Audiovisual Ceiling Performance on the Relationship Between Reverberation and Directional Benefit. Ear Hear 2012; 33:604-14. [DOI: 10.1097/aud.0b013e31825641e4]
19
Obermeier C, Dolk T, Gunter TC. The benefit of gestures during communication: Evidence from hearing and hearing-impaired individuals. Cortex 2012; 48:857-70. [DOI: 10.1016/j.cortex.2011.02.007]
20
Altieri N, Pisoni DB, Townsend JT. Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions. Seeing Perceiving 2011; 24:513-39. [PMID: 21968081] [DOI: 10.1163/187847611x595864]
Abstract
Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration.
Affiliation(s)
- Nicholas Altieri
- Department of Psychology, University of Oklahoma, OK 73072, USA
21
Kong YY, Braida LD. Cross-frequency integration for consonant and vowel identification in bimodal hearing. J Speech Lang Hear Res 2011; 54:959-980. [PMID: 21060139] [PMCID: PMC3107368] [DOI: 10.1044/1092-4388(2010/10-0197)]
Abstract
PURPOSE: Improved speech recognition in binaurally combined acoustic-electric stimulation (otherwise known as bimodal hearing) could arise when listeners integrate speech cues from the acoustic and electric hearing. The aims of this study were (a) to identify speech cues extracted in electric hearing and residual acoustic hearing in the low-frequency region and (b) to investigate cochlear implant (CI) users' ability to integrate speech cues across frequencies.
METHOD: Normal-hearing (NH) and CI subjects participated in consonant and vowel identification tasks. Each subject was tested in 3 listening conditions: CI alone (vocoder speech for NH), hearing aid (HA) alone (low-pass filtered speech for NH), and both. Integration ability for each subject was evaluated using a model of optimal integration, the PreLabeling integration model (Braida, 1991).
RESULTS: Only a few CI listeners demonstrated bimodal benefit for phoneme identification in quiet. Speech cues extracted from the CI and the HA were highly redundant for consonants but were complementary for vowels. CI listeners also exhibited reduced integration ability for both consonant and vowel identification compared with their NH counterparts.
CONCLUSION: These findings suggest that reduced bimodal benefits in CI listeners are due to insufficient complementary speech cues across ears, a decrease in integration ability, or both.
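Braida's (1991) PreLabeling model operates in a multidimensional d-prime space and is not reproduced here. The simpler sketch below captures the optimal-integration idea it formalizes, multiplying independent cue likelihoods from the two devices; all probabilities are invented for illustration.

```python
import numpy as np

# Rows: presented phoneme; columns: probability of each cue pattern from each
# device alone. All numbers are invented; the PreLabeling model itself works
# in a multidimensional d-prime space rather than on raw probabilities.
phonemes = ["i", "a", "u"]
p_given_ci = np.array([[0.6, 0.3, 0.1],    # P(cue pattern | phoneme), CI alone
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]])
p_given_ha = np.array([[0.5, 0.2, 0.3],    # P(cue pattern | phoneme), HA alone
                       [0.3, 0.5, 0.2],
                       [0.3, 0.2, 0.5]])

# Optimal (independent-cue) combination: multiply likelihoods, renormalize.
joint = p_given_ci * p_given_ha
p_bimodal = joint / joint.sum(axis=1, keepdims=True)

for i, ph in enumerate(phonemes):
    print(f"/{ph}/: CI alone {p_given_ci[i, i]:.2f} -> bimodal {p_bimodal[i, i]:.2f}")
```

An integration deficit, in this framing, shows up as measured bimodal performance falling short of this multiplicative prediction.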
22
23
Healy EW, Carson KA. Influence of broad auditory tuning on across-frequency integration of speech patterns. J Speech Lang Hear Res 2010; 53:1087-95. [PMID: 20689025] [PMCID: PMC2954411] [DOI: 10.1044/1092-4388(2010/09-0185)]
Abstract
PURPOSE: The purpose of the present study was to assess whether diminished tolerance for disruptions to across-frequency timing in listeners with hearing impairment can be attributed to broad auditory tuning.
METHOD: In 2 experiments in which random assignment was used, sentences were represented as 3 noise bands centered at 530, 1500, and 4243 Hz, which were amplitude modulated by 3 corresponding narrow speech bands. To isolate broad tuning from other influences of hearing impairment, listeners with normal hearing (45 in Experiment 1 and 30 in Experiment 2) were presented with these vocoder stimuli, having carrier band filter slopes of 12, 24, and 192 dB/octave. These speech patterns were presented in synchrony and with between-band asynchronies up to 40 ms.
RESULTS: Mean intelligibility scores were reduced in conditions of severe, but not moderate, simulated broadening. Although scores fell as asynchrony increased, the steeper drop in performance characteristic of listeners with hearing impairment tested previously was not observed in conditions of simulated broadening.
CONCLUSIONS: The intolerance for small across-frequency asynchronies observed previously does not appear attributable to broad tuning. Instead, the present data suggest that the across-frequency processing mechanism in at least some listeners with hearing impairment might be less robust to this type of degradation.
Affiliation(s)
- Eric W Healy
- Department of Speech and Hearing Science, The Ohio State University, Pressey Hall, Room 110, 1070 Carmack Road, Columbus, OH 43210, USA
24
Hall JW, Buss E, Grose JH. Spectral integration of speech bands in normal-hearing and hearing-impaired listeners. J Acoust Soc Am 2008; 124:1105-15. [PMID: 18681600] [PMCID: PMC2633714] [DOI: 10.1121/1.2940582]
Abstract
This investigation examined whether listeners with mild-moderate sensorineural hearing impairment have a deficit in the ability to integrate synchronous spectral information in the perception of speech. In stage 1, the bandwidth of filtered speech centered either on 500 or 2500 Hz was varied adaptively to determine the width required for approximately 15%-25% correct recognition. In stage 2, these criterion bandwidths were presented simultaneously and percent correct performance was determined in fixed block trials. Experiment 1 tested normal-hearing listeners in quiet and in masking noise. The main findings were (1) there was no correlation between the criterion bandwidths at 500 and 2500 Hz; (2) listeners achieved a high percent correct in stage 2 (approximately 80%); and (3) performance in quiet and noise was similar. Experiment 2 tested listeners with mild-moderate sensorineural hearing impairment. The main findings were (1) the impaired listeners showed high variability in stage 1, with some listeners requiring narrower and others requiring wider bandwidths than normal, and (2) hearing-impaired listeners achieved percent correct performance in stage 2 that was comparable to normal. The results indicate that listeners with mild-moderate sensorineural hearing loss do not have an essential deficit in the ability to integrate across-frequency speech information.
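Stage 1's adaptive bandwidth tracking can be sketched with a weighted up-down staircase (Kaernbach, 1991), which converges on a chosen percent-correct point. The psychometric function, step sizes, and 20% target below are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulated_listener(bandwidth_oct):
    """Toy psychometric function: recognition improves with bandwidth."""
    p = 1 / (1 + np.exp(-(bandwidth_oct - 1.5) / 0.3))
    return rng.random() < p

# Weighted up-down staircase converging on 20% correct: at equilibrium
# p * down_step = (1 - p) * up_step, so with p = 0.2 the down step is 4x the up step.
bw, up, down = 2.0, 0.05, 0.20     # bandwidth and step sizes in octaves
track = []
for _ in range(400):
    track.append(bw)
    bw = bw - down if simulated_listener(bw) else bw + up   # correct -> narrower (harder)
    bw = max(bw, 0.05)

print(f"criterion bandwidth ≈ {np.mean(track[-100:]):.2f} octaves (~20% correct)")
```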
Affiliation(s)
- Joseph W Hall
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina School of Medicine, Chapel Hill, North Carolina 27599, USA