1. The Cross-Modal Suppressive Role of Visual Context on Speech Intelligibility: An ERP Study. Brain Sci 2020; 10:810. [PMID: 33147691; PMCID: PMC7692090; DOI: 10.3390/brainsci10110810]
Abstract
The efficacy of audiovisual (AV) integration is reflected in the degree of cross-modal suppression of the auditory event-related potentials (ERPs, P1-N1-P2), while stronger semantic encoding is reflected in enhanced late ERP negativities (e.g., N450). We hypothesized that increasing visual stimulus reliability should lead to more robust AV integration and enhanced semantic prediction, reflected in suppression of auditory ERPs and an enhanced N450, respectively. EEG was acquired while individuals watched and listened to clear and blurred videos of a speaker uttering intact or highly intelligible degraded (vocoded) words and made binary judgments about word meaning (animate or inanimate). We found that intact speech evoked larger negativity between 280 and 527 ms than vocoded speech, suggestive of more robust semantic prediction for the intact signal. For visual reliability, we found that greater cross-modal ERP suppression occurred for clear than for blurred videos prior to sound onset and for the P2 ERP. Additionally, the later semantic-related negativity tended to be larger for clear than for blurred videos. These results suggest that the cross-modal effect is largely confined to suppression of early auditory networks, with a weak effect on networks associated with semantic prediction. However, the semantic-related visual effect on the late negativity may have been tempered by the vocoded signal's high reliability.
2. Teng X, Cogan GB, Poeppel D. Speech fine structure contains critical temporal cues to support speech segmentation. Neuroimage 2019; 202:116152. [PMID: 31484039; DOI: 10.1016/j.neuroimage.2019.116152]
Abstract
Segmenting the continuous speech stream into units for further perceptual and linguistic analyses is fundamental to speech recognition. The speech amplitude envelope (SE) has long been considered a fundamental temporal cue for segmenting speech. Does the temporal fine structure (TFS), a significant part of speech signals often considered to contain primarily spectral information, contribute to speech segmentation? Using magnetoencephalography, we show that the TFS entrains cortical responses between 3 and 6 Hz and demonstrate, using mutual information analysis, that (i) the temporal information in the TFS can be reconstructed from a measure of frame-to-frame spectral change and correlates with the SE, and (ii) spectral resolution is key to the extraction of such temporal information. Furthermore, we show behavioural evidence that, when the SE is temporally distorted, the TFS provides cues for speech segmentation and aids speech recognition significantly. Our findings show that it is insufficient to investigate solely the SE to understand temporal speech segmentation, as the SE and the TFS derived from a band-filtering method convey comparable, if not inseparable, temporal information. We argue for a more synthetic view of speech segmentation: the auditory system groups speech signals coherently in both temporal and spectral domains.
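As a concrete illustration of the spectral-change idea, the sketch below computes a frame-to-frame change measure from a magnitude spectrogram, plus a broadband Hilbert envelope for comparison. The window length and Euclidean distance metric are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from scipy.signal import stft, hilbert

def spectral_change(x, fs, nperseg=512):
    """Frame-to-frame change in the magnitude spectrum (illustrative metric)."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z)
    # Euclidean distance between successive spectral frames yields a temporal "change" curve
    return np.linalg.norm(np.diff(mag, axis=1), axis=0)

def broadband_envelope(x):
    """Hilbert amplitude envelope, for comparison with the change curve."""
    return np.abs(hilbert(x))
```

Applied to a TFS-only signal, such a change curve could then be correlated with the envelope of the original speech, in the spirit of the mutual-information analysis reported above.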
Affiliation(s)
- Xiangbin Teng: Department of Neuroscience, Max-Planck-Institute for Empirical Aesthetics, Frankfurt, 60322, Germany
- Gregory B Cogan: Department of Neurosurgery, Duke University, Durham, NC 27710, USA
- David Poeppel: Department of Neuroscience, Max-Planck-Institute for Empirical Aesthetics, Frankfurt, 60322, Germany; Department of Psychology, New York University, New York, NY 10003, USA
3. Barona R, Vizcaino JA, Krstulovic C, Barona L, Comeche C, Montalt J, Ubeda M, Polo C. Does Asymmetric Hearing Loss Affect the Ability to Understand in Noisy Environments? J Int Adv Otol 2019; 15:267-271. [PMID: 31418717; DOI: 10.5152/iao.2019.5765]
Abstract
OBJECTIVES: This study aimed to determine whether, in asymmetric hearing loss, the presence of an ear with a better or worse hearing threshold is related to better or worse speech-in-noise (SiN) intelligibility. MATERIALS AND METHODS: A total of 618 subjects with different degrees of hearing loss were evaluated for their ability to understand SiN. A stepwise forward logistic regression analysis was performed to identify the factors that affect performance, and the factors associated with very high or very low performance were determined. RESULTS: Age (especially beyond 70 years) and hearing loss (especially from moderate loss onward) negatively influenced SiN intelligibility. Remarkably high intelligibility was identified in subjects whose contralateral ear had a better auditory threshold. CONCLUSION: Although age and hearing loss are known factors that affect SiN intelligibility, this appears to be the first description of a better-hearing contralateral ear preserving SiN intelligibility.
Affiliation(s)
- Rafael Barona: Department of Otolaryngology, Barona Clinic, Valencia, Spain
- Luz Barona: Department of Otolaryngology, Barona Clinic, Valencia, Spain
- Carmen Comeche: Department of Otolaryngology, Barona Clinic, Valencia, Spain
- Jose Montalt: Department of Otolaryngology, Barona Clinic, Valencia, Spain
- Mercedes Ubeda: Department of Otolaryngology, Barona Clinic, Valencia, Spain
- Carolina Polo: Valencia Catholic University Saint Vincent Martyr, Valencia, Spain
4. Ihlefeld A, Chen YW, Sanes DH. Developmental Conductive Hearing Loss Reduces Modulation Masking Release. Trends Hear 2016; 20:2331216516676255. [PMID: 28215119; PMCID: PMC5318943; DOI: 10.1177/2331216516676255]
Abstract
Hearing-impaired individuals experience difficulties in detecting or understanding speech, especially against background sounds occupying the same frequency range. However, normally hearing (NH) human listeners experience less difficulty detecting a target tone in background noise when the envelope of that noise is temporally gated (modulated) than when that envelope is flat across time (unmodulated). This perceptual benefit is called modulation masking release (MMR). When flanking masker energy is added well outside the frequency band of the target, and comodulated with the original modulated masker, detection thresholds improve further (MMR+). In contrast, if the flanking masker is antimodulated with the original masker, thresholds worsen (MMR−). These interactions across disparate frequency ranges are thought to require central nervous system (CNS) processing. Therefore, we explored the effect of developmental conductive hearing loss (CHL) in gerbils on MMR characteristics, as a test for putative CNS mechanisms. The detection thresholds of NH gerbils were lower in modulated noise, when compared with unmodulated noise. The addition of a comodulated flanker further improved performance, whereas an antimodulated flanker worsened performance. However, for CHL-reared gerbils, all three forms of masking release were reduced when compared with NH animals. These results suggest that developmental CHL impairs both within- and across-frequency processing and provide behavioral evidence that CNS mechanisms are affected by a peripheral hearing impairment.
Affiliation(s)
- Antje Ihlefeld: Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
- Yi-Wen Chen: Center for Neural Science, New York University, New York, NY, USA
- Dan H Sanes: Center for Neural Science; Department of Psychology; Department of Biology, New York University, New York, NY, USA
5. Xu Y, Chen M, LaFaire P, Tan X, Richter CP. Distorting temporal fine structure by phase shifting and its effects on speech intelligibility and neural phase locking. Sci Rep 2017; 7:13387. [PMID: 29042580; PMCID: PMC5645416; DOI: 10.1038/s41598-017-12975-3]
Abstract
Envelope (E) and temporal fine structure (TFS) are important features of acoustic signals, and their corresponding perceptual functions have been investigated with various listening tasks. To further understand the underlying neural processing of TFS, experiments in humans and animals were conducted to demonstrate the effects of modifying the TFS of natural speech sentences on both speech recognition and neural coding. The TFS of natural speech sentences was modified by distorting the phase while maintaining the magnitude. Speech intelligibility was then tested in normal-hearing listeners using the intact and reconstructed sentences presented in quiet and against background noise. Sentences with modified TFS were then used to evoke neural activity in auditory neurons of the inferior colliculus in guinea pigs. Our study demonstrated that speech intelligibility in humans relied on the periodic cues of speech TFS in both quiet and noisy listening conditions. Furthermore, recordings of neural activity from the guinea pig inferior colliculus showed that individual auditory neurons exhibit phase-locking patterns to the periodic cues of speech TFS, patterns that disappear when the reconstructed sounds no longer contain them. Thus, the periodic cues of TFS are essential for speech intelligibility and are encoded in auditory neurons by phase locking.
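A minimal sketch of this magnitude-preserving phase distortion, assuming a single global phase offset applied in the frequency domain (the study's exact shift scheme, e.g., frame-based shifts or specific offsets, is not reproduced here):

```python
import numpy as np

def shift_phase(x, shift_rad):
    """Offset the phase spectrum by shift_rad while keeping the magnitude intact."""
    X = np.fft.rfft(x)
    # Leave the DC bin untouched; rotate the phase of all other components
    X[1:] = np.abs(X[1:]) * np.exp(1j * (np.angle(X[1:]) + shift_rad))
    return np.fft.irfft(X, n=len(x))
```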
Affiliation(s)
- Yingyue Xu: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL 60611, USA
- Maxin Chen: Northwestern University, Department of Biomedical Engineering, 2145 Sheridan Road, Tech E310, Evanston, IL 60208, USA
- Petrina LaFaire: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL 60611, USA
- Xiaodong Tan: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL 60611, USA
- Claus-Peter Richter: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL 60611, USA; Northwestern University, The Hugh Knowles Center, Department of Communication Sciences and Disorders, 2240 Campus Drive, Evanston, IL 60208, USA
6. Predictions of Speech Chimaera Intelligibility Using Auditory Nerve Mean-Rate and Spike-Timing Neural Cues. J Assoc Res Otolaryngol 2017; 18:687-710. [PMID: 28748487; DOI: 10.1007/s10162-017-0627-7]
Abstract
Perceptual studies of speech intelligibility have shown that slow variations of the acoustic envelope (ENV) in a small set of frequency bands provide adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
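For reference, a minimal sketch of chimaera synthesis in the style of Smith, Delgutte, and Oxenham (2002), on which this literature builds: within each band, the Hilbert envelope of one signal is imposed on the TFS of another. The band edges and filter order below are illustrative assumptions, not the paper's analysis filterbank.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def chimaera(env_src, tfs_src, fs, edges=(80, 300, 800, 2000, 5000)):
    """Combine the ENV of env_src with the TFS of tfs_src (equal-length signals)."""
    out = np.zeros(len(env_src))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band_env = np.abs(hilbert(sosfiltfilt(sos, env_src)))            # ENV of one signal
        band_tfs = np.cos(np.angle(hilbert(sosfiltfilt(sos, tfs_src))))  # TFS of the other
        out += band_env * band_tfs
    return out
```

Swapping the two arguments produces the complementary chimaera, giving the ENV-retaining and TFS-retaining categories compared above.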
7. D'Aquila LA, Desloge JG, Reed CM, Braida LD. Effect of Energy Equalization on the Intelligibility of Speech in Fluctuating Background Interference for Listeners With Hearing Impairment. Trends Hear 2017; 21:2331216517710354. [PMID: 28602128; PMCID: PMC5470656; DOI: 10.1177/2331216517710354]
Abstract
The masking release (MR; i.e., better speech recognition in fluctuating compared with continuous noise backgrounds) that is evident for listeners with normal hearing (NH) is generally reduced or absent for listeners with sensorineural hearing impairment (HI). In this study, a real-time signal-processing technique was developed to improve MR in listeners with HI and offer insight into the mechanisms influencing the size of MR. This technique compares short-term and long-term estimates of energy, increases the level of short-term segments whose energy is below the average energy, and normalizes the overall energy of the processed signal to be equivalent to that of the original long-term estimate. This signal-processing algorithm was used to create two types of energy-equalized (EEQ) signals: EEQ1, which operated on the wideband speech plus noise signal, and EEQ4, which operated independently on each of four bands with equal logarithmic width. Consonant identification was tested in backgrounds of continuous and various types of fluctuating speech-shaped Gaussian noise including those with both regularly and irregularly spaced temporal fluctuations. Listeners with HI achieved similar scores for EEQ and the original (unprocessed) stimuli in continuous-noise backgrounds, while superior performance was obtained for the EEQ signals in fluctuating background noises that had regular temporal gaps but not for those with irregularly spaced fluctuations. Thus, in noise backgrounds with regularly spaced temporal fluctuations, the energy-normalized signals led to larger values of MR and higher intelligibility than obtained with unprocessed signals.
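A minimal offline sketch of the EEQ steps described above (compare short-term to long-term energy estimates, boost below-average segments, renormalize overall energy). The frame length is an assumption, and the published real-time implementation differs in detail.

```python
import numpy as np

def eeq(x, fs, frame_ms=20):
    """Energy-equalize: raise below-average short-term segments, keep overall energy."""
    x = x.astype(float)
    n = int(fs * frame_ms / 1000)   # short-term frame length (assumed value)
    long_term = np.mean(x ** 2)     # long-term energy estimate
    y = x.copy()
    for start in range(0, len(y) - n + 1, n):
        seg = y[start:start + n]    # view into y; in-place scaling modifies y
        short_term = np.mean(seg ** 2)
        if 0 < short_term < long_term:
            seg *= np.sqrt(long_term / short_term)  # lift low-energy dips
    y *= np.sqrt(np.sum(x ** 2) / np.sum(y ** 2))   # restore original overall energy
    return y
```

EEQ4 would apply the same operation independently within four bands of equal logarithmic width before summing.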
Affiliation(s)
- Laura A D'Aquila: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Joseph G Desloge: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Charlotte M Reed: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Louis D Braida: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA, USA
8. Desloge JG, Reed CM, Braida LD, Perez ZD, D'Aquila LA. Masking release for hearing-impaired listeners: The effect of increased audibility through reduction of amplitude variability. J Acoust Soc Am 2017; 141:4452. [PMID: 28679277; PMCID: PMC5552388; DOI: 10.1121/1.4985186]
Abstract
The masking release (i.e., better speech recognition in fluctuating compared to continuous noise backgrounds) observed for normal-hearing (NH) listeners is generally reduced or absent in hearing-impaired (HI) listeners. One explanation for this lies in the effects of reduced audibility: elevated thresholds may prevent HI listeners from taking advantage of signals available to NH listeners during the dips of temporally fluctuating noise where the interference is relatively weak. This hypothesis was addressed through the development of a signal-processing technique designed to increase the audibility of speech during dips in interrupted noise. This technique acts to (i) compare short-term and long-term estimates of energy, (ii) increase the level of short-term segments whose energy is below the average energy, and (iii) normalize the overall energy of the processed signal to be equivalent to that of the original long-term estimate. Evaluations of this energy-equalizing (EEQ) technique included consonant identification and sentence reception in backgrounds of continuous and regularly interrupted noise. For HI listeners, performance was generally similar for processed and unprocessed signals in continuous noise; however, superior performance for EEQ processing was observed in certain regularly interrupted noise backgrounds.
Affiliation(s)
- Joseph G Desloge: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Charlotte M Reed: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Louis D Braida: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Zachary D Perez: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Laura A D'Aquila: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
9. Shahin AJ, Shen S, Kerlin JR. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech. Lang Cogn Neurosci 2017; 32:1102-1118. [PMID: 28966930; PMCID: PMC5617130; DOI: 10.1080/23273798.2017.1283428]
Abstract
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, with visual speech leading (experiment 1) or lagging (experiment 2) the acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels) and when the visual speech was intact rather than blurred (experiment 1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
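A minimal sketch of the kind of N-channel noise vocoding used to degrade the acoustic speech: band envelopes of the speech modulate band-limited noise. Log-spaced band edges and fourth-order filters are illustrative assumptions, not the study's exact vocoder.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, f_lo=80.0, f_hi=6000.0):
    """Replace the speech carrier in each band with envelope-modulated noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges (assumes f_hi < fs/2)
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        env = np.abs(hilbert(sosfiltfilt(sos, x)))  # speech envelope in this band
        out += env * sosfiltfilt(sos, noise)        # modulate band-limited noise
    return out
```

Fewer channels discard more spectral detail, which is why the 4-channel condition sounds less speech-like than the 8- to 32-channel conditions.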
Affiliation(s)
- Antoine J Shahin: Center for Mind and Brain, University of California, Davis, CA, 95618
- Stanley Shen: Center for Mind and Brain, University of California, Davis, CA, 95618
- Jess R Kerlin: Center for Mind and Brain, University of California, Davis, CA, 95618
10. Reed CM, Desloge JG, Braida LD, Perez ZD, Léger AC. Level variations in speech: Effect on masking release in hearing-impaired listeners. J Acoust Soc Am 2016; 140:102. [PMID: 27475136; PMCID: PMC6910012; DOI: 10.1121/1.4954746]
Abstract
Acoustic speech is marked by time-varying changes in the amplitude envelope that may pose difficulties for hearing-impaired listeners. Removal of these variations (e.g., by the Hilbert transform) could improve speech reception for such listeners, particularly in fluctuating interference. Léger, Reed, Desloge, Swaminathan, and Braida [(2015b). J. Acoust. Soc. Am. 138, 389-403] observed that a normalized measure of masking release obtained for hearing-impaired listeners using speech processed to preserve temporal fine-structure (TFS) cues was larger than that for unprocessed or envelope-based speech. This study measured masking release for two other speech signals in which level variations were minimal: peak clipping and TFS processing of an envelope signal. Consonant identification was measured for hearing-impaired listeners in backgrounds of continuous and fluctuating speech-shaped noise. The normalized masking release obtained using speech with normal variations in overall level was substantially less than that observed using speech processed to achieve highly restricted level variations. These results suggest that the performance of hearing-impaired listeners in fluctuating noise may be improved by signal processing that leads to a decrease in stimulus level variations.
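For concreteness, minimal sketches of two classic level-flattening manipulations of the kind compared here: infinite peak clipping and unit-envelope Hilbert TFS speech. These are textbook forms, assumed for illustration, not the study's exact processing chain.

```python
import numpy as np
from scipy.signal import hilbert

def peak_clip(x):
    """Infinite peak clipping: keep only the zero-crossing structure."""
    return np.sign(x)

def tfs_speech(x):
    """Discard the Hilbert envelope; keep the cosine of the instantaneous phase."""
    return np.cos(np.angle(hilbert(x)))
```

Both outputs have nearly constant amplitude over time, which is the property hypothesized above to improve masking release in fluctuating noise.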
Affiliation(s)
- Charlotte M Reed: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Joseph G Desloge: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Louis D Braida: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Zachary D Perez: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Agnès C Léger: School of Psychological Sciences, University of Manchester, Manchester, M13 9PL, United Kingdom