1. de la Cruz-Pavía I, Hegde M, Cabrera L, Nazzi T. Infants' abilities to segment word forms from spectrally degraded speech in the first year of life. Dev Sci 2024; 27:e13533. PMID: 38853379. DOI: 10.1111/desc.13533.
Abstract
Infants begin to segment word forms from fluent speech, a crucial task in lexical processing, between 4 and 7 months of age. Prior work has established that infants rely on a variety of cues available in the speech signal (i.e., prosodic, statistical, acoustic-segmental, and lexical) to accomplish this task. In two experiments with French-learning 6- and 10-month-olds, we use a psychoacoustic approach to examine whether and how degradation of the two fundamental acoustic components extracted from speech by the auditory system, namely temporal information (both frequency and amplitude modulation) and spectral information, impacts word form segmentation. Infants were familiarized with passages containing target words in which frequency modulation (FM) information was replaced with pure tones using a vocoder, while amplitude modulation (AM) was preserved in either 8 or 16 spectral bands. Infants were then tested on their recognition of the target versus novel control words. While the 6-month-olds were unable to segment in either condition, the 10-month-olds succeeded, although only in the 16-spectral-band condition. These findings suggest that 6-month-olds need FM temporal cues for speech segmentation whereas 10-month-olds do not, although they need the AM cues to be presented in enough spectral bands (i.e., 16). This developmental change in infants' sensitivity to spectrotemporal cues likely results from an increase in the range of available segmentation procedures, and/or from a shift from a vowel to a consonant bias in lexical processing between the two ages, as vowels are more affected by our acoustic manipulations. RESEARCH HIGHLIGHTS:
- Although segmenting speech into word forms is crucial for lexical acquisition, the acoustic information that infants' auditory system extracts to process continuous speech remains unknown.
- We examined infants' sensitivity to spectrotemporal cues in speech segmentation using vocoded speech, and revealed a developmental change between 6 and 10 months of age.
- We showed that FM information, that is, the fast temporal modulations of speech, is necessary for 6- but not 10-month-old infants to segment word forms.
- Moreover, reducing the number of spectral bands impacts 10-month-olds' segmentation: they succeed when 16 bands are preserved but fail with 8 bands.
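The vocoder manipulation described above can be illustrated with a minimal tone-excited vocoder sketch in Python. The log-spaced band edges, fourth-order Butterworth filters, and half-bandwidth envelope cutoff below are illustrative assumptions, not the authors' exact parameters.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def tone_vocode(x, fs, n_bands=16, f_lo=80.0, f_hi=7000.0, env_cutoff=None):
        """Replace FM cues with pure tones while preserving AM in n_bands bands."""
        edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # assumed log spacing
        t = np.arange(len(x)) / fs
        out = np.zeros_like(x)
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos, x)
            env = np.abs(hilbert(band))                 # AM: Hilbert envelope
            cutoff = env_cutoff if env_cutoff else (hi - lo) / 2
            sos_lp = butter(4, cutoff, btype="low", fs=fs, output="sos")
            env = sosfiltfilt(sos_lp, env)
            fc = np.sqrt(lo * hi)                       # geometric band center
            out += env * np.sin(2 * np.pi * fc * t)     # tone carrier: FM removed
        return out / np.max(np.abs(out))

Setting n_bands to 8 or 16 approximates the two conditions contrasted in this experiment; passing a low env_cutoff (e.g., 8.0) would instead approximate the "slow AM" degradation used in entry 3 below.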
Affiliation(s)
- Irene de la Cruz-Pavía
- Faculty of Social and Human Sciences, Universidad de Deusto, Bilbao, Spain
- Basque Foundation for Science Ikerbasque, Bilbao, Spain
- Monica Hegde
- INCC UMR 8002, CNRS, Université Paris Cité, F-75006 Paris, France
- Thierry Nazzi
- INCC UMR 8002, CNRS, Université Paris Cité, F-75006 Paris, France
2. Borjigin A, Bharadwaj HM. Individual Differences Elucidate the Perceptual Benefits Associated with Robust Temporal Fine-Structure Processing. bioRxiv [Preprint] 2024:2023.09.20.558670. PMID: 37790457. PMCID: PMC10542537. DOI: 10.1101/2023.09.20.558670.
Abstract
The auditory system is unique among sensory systems in its ability to phase-lock to and precisely follow very fast, cycle-by-cycle fluctuations in the phase of sound-driven cochlear vibrations. Yet the perceptual role of this temporal fine structure (TFS) code is debated. This fundamental gap is attributable to our inability to experimentally manipulate TFS cues without altering other perceptually relevant cues. Here, we circumvented this limitation by leveraging individual differences across 200 participants to systematically compare variations in TFS sensitivity with performance on a range of speech perception tasks. TFS sensitivity was assessed through detection of interaural time/phase differences, while speech perception was evaluated by word identification under noise interference. Results suggest that greater TFS sensitivity is not associated with greater masking release from fundamental-frequency or spatial cues, but appears to contribute to resilience against the effects of reverberation. We also found that greater TFS sensitivity is associated with faster response times, indicating reduced listening effort. These findings highlight the perceptual significance of TFS coding for everyday hearing.
Affiliation(s)
- Agudemu Borjigin
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907, USA
- Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
- Hari M. Bharadwaj
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907, USA
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA 15213, USA
3. Hegde M, Nazzi T, Cabrera L. An auditory perspective on phonological development in infancy. Front Psychol 2024; 14:1321311. PMID: 38327506. PMCID: PMC10848800. DOI: 10.3389/fpsyg.2023.1321311.
Abstract
Introduction: The auditory system encodes the phonetic features of languages by processing spectro-temporal modulations in speech, which can be described at two time scales: relatively slow amplitude variations over time (AM, further distinguished into the slowest components, <8-16 Hz, and faster components, 16-500 Hz), and frequency modulations (FM, oscillating at higher rates, about 600 Hz-10 kHz). While adults require only the slowest AM cues to identify and discriminate speech sounds, infants have been shown to also require faster AM cues (>8-16 Hz) for similar tasks. Methods: Using an observer-based psychophysical method, this study measured the ability of typical-hearing 6-month-olds, 10-month-olds, and adults to detect a change in the vowel or consonant features of consonant-vowel syllables when temporal modulations are selectively degraded. Two acoustically degraded conditions were designed, replacing FM cues with pure tones in 32 frequency bands and then extracting AM cues in each frequency band with two different low-pass cutoff frequencies: (1) half the bandwidth (Fast AM condition), (2) <8 Hz (Slow AM condition). Results: In the Fast AM condition, results show that with reduced FM cues, 85% of 6-month-olds, 72.5% of 10-month-olds, and 100% of adults successfully categorized phonemes. Among participants who passed the Fast AM condition, 67% of 6-month-olds, 75% of 10-month-olds, and 95% of adults passed the Slow AM condition. Furthermore, across the three age groups, the proportion of participants able to detect a phonetic category change did not differ between the vowel and consonant conditions. However, age-related differences were observed for vowel categorization: while the 6- and 10-month-old groups did not differ from one another, they both independently differed from adults. Moreover, for consonant categorization, 10-month-olds were more affected by acoustic temporal degradation than 6-month-olds, showing a greater decline in detection success rates between the Fast AM and Slow AM conditions. Discussion: The degradation of FM and faster AM cues (>8 Hz) appears to strongly affect consonant processing at 10 months of age. These findings suggest that between 6 and 10 months, infants show different developmental trajectories in the perceptual weight of speech temporal acoustic cues for vowel and consonant processing, possibly linked to phonological attunement.
Affiliation(s)
- Monica Hegde
- Integrative Neuroscience and Cognition Center (INCC-UMR 8002), Université Paris Cité-CNRS, Paris, France
4. Apoux F, Miller-Viacava N, Ferrière R, Dai H, Krause B, Sueur J, Lorenzi C. Auditory discrimination of natural soundscapes. J Acoust Soc Am 2023; 153:2706. PMID: 37133815. DOI: 10.1121/10.0017972.
Abstract
A previous modelling study reported that the spectro-temporal cues perceptually relevant to humans provide enough information to accurately classify "natural soundscapes" recorded in four distinct temperate habitats of a biosphere reserve [Thoret, Varnet, Boubenec, Ferrière, Le Tourneau, Krause, and Lorenzi (2020). J. Acoust. Soc. Am. 147, 3260]. The goal of the present study was to assess this prediction for humans using 2 s samples taken from the same soundscape recordings. Thirty-one listeners were asked to discriminate these recordings based on differences in habitat, season, or period of the day using an oddity task. Listeners' performance was well above chance, demonstrating effective processing of these differences and suggesting generally high sensitivity for natural soundscape discrimination. This performance did not improve with up to 10 h of training. Additional results obtained for habitat discrimination indicate that temporal cues play only a minor role; instead, listeners appear to base their decisions primarily on gross spectral cues related to biological sound sources and habitat acoustics. Convolutional neural networks were trained to perform a similar task using spectro-temporal cues extracted by an auditory model as input. The results are consistent with the idea that humans exclude the available temporal information when discriminating short samples of habitats, implying a form of suboptimality.
Affiliation(s)
- Frédéric Apoux
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, Ecole normale supérieure, Université Paris Sciences et Lettres (PSL), Paris, 75005, France
- Nicole Miller-Viacava
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, Ecole normale supérieure, Université Paris Sciences et Lettres (PSL), Paris, 75005, France
- Régis Ferrière
- International Research Laboratory for Interdisciplinary Global Environmental Studies (iGLOBES), CNRS, ENS-PSL University, University of Arizona, Tucson, Arizona 85721, USA
- Huanping Dai
- Speech Language and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Bernie Krause
- Wild Sanctuary, 1102 Princeton Drive, Sonoma, California 95476, USA
- Jérôme Sueur
- Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, 57 rue Cuvier, 75005 Paris, France
- Christian Lorenzi
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, Ecole normale supérieure, Université Paris Sciences et Lettres (PSL), Paris, 75005, France
5. de la Cruz-Pavía I, Eloy C, Perrineau-Hecklé P, Nazzi T, Cabrera L. Consonant bias in adult lexical processing under acoustically degraded listening conditions. JASA Express Lett 2023; 3:2892558. PMID: 37220232. DOI: 10.1121/10.0019576.
Abstract
Consonants facilitate lexical processing across many languages, including French. This study investigates whether acoustic degradation affects this phonological bias in an auditory lexical decision task. French words were processed using an eight-band vocoder, degrading their frequency modulations (FM) while preserving original amplitude modulations (AM). Adult French natives were presented with these French words, preceded by similarly processed pseudoword primes sharing their vowels, consonants, or neither. Results reveal a consonant bias in the listeners' accuracy and response times, despite the reduced spectral and FM information. These degraded conditions resemble current cochlear-implant processors, and attest to the robustness of this phonological bias.
Affiliation(s)
- Irene de la Cruz-Pavía
- Department of Linguistics and Basque Studies, Universidad del País Vasco/Euskal Herriko Unibertsitatea, Vitoria-Gasteiz 01006, Spain
- Coraline Eloy
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
- Paula Perrineau-Hecklé
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
- Thierry Nazzi
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
- Laurianne Cabrera
- Integrative Neuroscience and Cognition Center, Université Paris Cité, Centre National de la Recherche Scientifique, Paris 75006, France
6. Drakopoulos F, Vasilkov V, Osses Vecchi A, Wartenberg T, Verhulst S. Model-based hearing-enhancement strategies for cochlear synaptopathy pathologies. Hear Res 2022; 424:108569. DOI: 10.1016/j.heares.2022.108569.
7. Tran Y, Tang D, McMahon C, Mitchell P, Gopinath B. Using a decision tree approach to determine hearing aid ownership in older adults. Disabil Rehabil 2022:1-7. PMID: 35723014. DOI: 10.1080/09638288.2022.2087761.
Abstract
PURPOSE The main clinical intervention for older adults with hearing loss is the provision of hearing aids. However, uptake and usage in this population have historically been low. The aim of this study was to understand which hearing loss characteristics, from measured audiometric hearing loss and self-perceived hearing handicap, contribute to the decision of hearing aid ownership. MATERIALS AND METHODS A total of 2833 adults aged 50+ years, of whom 329 reported hearing aid ownership, took part in a population-based survey with audiometric hearing assessments. Classification and regression tree (CART) analysis was used to classify hearing aid ownership from audiometric measurements and hearing disability outcomes. RESULTS The CART analysis predicted hearing aid ownership from hearing loss characteristics with an overall accuracy of 92.5%. Including hearing disability increased sensitivity for predicting hearing aid ownership by up to 40% compared with audiometric hearing loss measurements alone. CONCLUSIONS A decision tree approach that considers both objectively measured hearing loss and self-perceived hearing disability could facilitate a more tailored and personalised approach to determining hearing aid needs in the older population. IMPLICATIONS FOR REHABILITATION
- Without intervention, older adults with hearing loss are at higher risk of cognitive decline and higher rates of depression, anxiety, and social isolation.
- The provision of hearing aids can compensate for lost hearing function; however, uptake and usage have been reported as low.
- Using more precise cut-offs from audiometric measures and self-perceived hearing disability scores could facilitate a tailored and personalised approach to screening and identifying older adults' hearing aid needs.
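A hedged sketch of the CART approach described above, using scikit-learn's DecisionTreeClassifier on simulated stand-in data; the study's actual variables, preprocessing, and tree parameters are not given in the abstract, so the feature names and values here are hypothetical:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, recall_score

    rng = np.random.default_rng(0)
    n = 2833
    pta = rng.normal(35, 15, n)        # pure-tone average, dB HL (toy values)
    handicap = rng.normal(20, 10, n)   # self-perceived handicap score (toy values)
    X = np.column_stack([pta, handicap])
    y = (pta + 0.8 * handicap + rng.normal(0, 10, n)) > 60  # toy ownership label

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    tree = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=0)
    tree.fit(X_tr, y_tr)
    pred = tree.predict(X_te)
    print("accuracy:", accuracy_score(y_te, pred))
    print("sensitivity:", recall_score(y_te, pred))  # the metric the study reports improving

The study's reported gain in sensitivity comes from adding the self-perceived disability measure; dropping the second column above and refitting shows the analogous comparison on the toy data.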
Affiliation(s)
- Yvonne Tran
- Macquarie University Hearing, Department of Linguistics, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Diana Tang
- Macquarie University Hearing, Department of Linguistics, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Catherine McMahon
- Macquarie University Hearing, Department of Linguistics, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
- Paul Mitchell
- Centre for Vision Research, Department of Ophthalmology and Westmead Institute for Medical Research, University of Sydney, Sydney, Australia
- Bamini Gopinath
- Macquarie University Hearing, Department of Linguistics, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, Australia
8. Differential weighting of temporal envelope cues from the low-frequency region for Mandarin sentence recognition in noise. BMC Neurosci 2022; 23:35. PMID: 35698039. PMCID: PMC9190152. DOI: 10.1186/s12868-022-00721-z.
Abstract
BACKGROUND Temporal envelope cues are conveyed by cochlear implants (CIs) to hearing loss patients to restore hearing. Although CIs enable users to communicate in clear listening environments, noisy environments still pose a problem. To improve the speech-processing strategies used in Chinese CIs, we explored the relative contributions of the temporal envelope in various frequency regions to Mandarin sentence recognition in noise. METHODS Original speech material from the Mandarin version of the Hearing in Noise Test (MHINT) was mixed with speech-shaped noise (SSN), sinusoidally amplitude-modulated speech-shaped noise (SAM SSN), and sinusoidally amplitude-modulated (SAM) white noise (4 Hz), each at a +5 dB signal-to-noise ratio. Envelope information of the noise-corrupted speech material was extracted from 30 contiguous bands that were allocated to five frequency regions. The intelligibility of the noise-corrupted speech material (with temporal cues from one or two regions removed) was measured to estimate the relative weights of temporal envelope cues from the five frequency regions. RESULTS In SSN, the mean weights of Regions 1-5 were 0.34, 0.19, 0.20, 0.16, and 0.11, respectively; in SAM SSN, 0.34, 0.17, 0.24, 0.14, and 0.11; and in SAM white noise, 0.46, 0.24, 0.22, 0.06, and 0.02. CONCLUSIONS The results suggest that for all three types of noise, the temporal envelope in the low-frequency region transmits the greatest amount of information for Mandarin sentence recognition, which differs from the perception strategy employed in clear listening environments.
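The noise-mixing step in the methods can be sketched as follows. RMS-based scaling to a target SNR is one standard approach, and the 4-Hz sinusoidal amplitude modulation of the masker is shown as well; signal lengths and calibration details are assumptions:

    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Scale `noise` so the speech-to-noise RMS ratio equals snr_db, then mix."""
        rms = lambda x: np.sqrt(np.mean(x ** 2))
        gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
        return speech + gain * noise

    def sam(noise, fs, fm=4.0, depth=1.0):
        """Sinusoidally amplitude-modulate a noise carrier (e.g., 4-Hz SAM)."""
        t = np.arange(len(noise)) / fs
        return noise * (1 + depth * np.sin(2 * np.pi * fm * t)) / (1 + depth)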
9. Individualized Assays of Temporal Coding in the Ascending Human Auditory System. eNeuro 2022; 9:ENEURO.0378-21.2022. PMID: 35193890. PMCID: PMC8925652. DOI: 10.1523/eneuro.0378-21.2022.
Abstract
Neural phase-locking to temporal fluctuations is a fundamental and unique mechanism by which acoustic information is encoded by the auditory system. The perceptual role of this metabolically expensive mechanism, the neural phase-locking to temporal fine structure (TFS) in particular, is debated. Although hypothesized, it is unclear whether auditory perceptual deficits in certain clinical populations are attributable to deficits in TFS coding. Efforts to uncover the role of TFS have been impeded by the fact that there are no established assays for quantifying the fidelity of TFS coding at the individual level. While many candidates have been proposed, for an assay to be useful, it should not only intrinsically depend on TFS coding, but should also have the property that individual differences in the assay reflect TFS coding per se over and beyond other sources of variance. Here, we evaluate a range of behavioral and electroencephalogram (EEG)-based measures as candidate individualized measures of TFS sensitivity. Our comparisons of behavioral and EEG-based metrics suggest that extraneous variables dominate both behavioral scores and EEG amplitude metrics, rendering them ineffective. After adjusting behavioral scores using lapse rates, and extracting latency or percent-growth metrics from EEG, interaural timing sensitivity measures exhibit robust behavior-EEG correlations. Together with the fact that unambiguous theoretical links can be made relating binaural measures and phase-locking to TFS, our results suggest that these "adjusted" binaural assays may be well suited for quantifying individual TFS processing.
10. Zheng Z, Li K, Feng G, Guo Y, Li Y, Xiao L, Liu C, He S, Zhang Z, Qian D, Feng Y. Relative Weights of Temporal Envelope Cues in Different Frequency Regions for Mandarin Vowel, Consonant, and Lexical Tone Recognition. Front Neurosci 2021; 15:744959. PMID: 34924928. PMCID: PMC8678109. DOI: 10.3389/fnins.2021.744959.
Abstract
Objectives: Mandarin-speaking users of cochlear implants (CIs) perform more poorly than their English-speaking counterparts. This may be because present CI speech coding schemes are largely based on English. This study aims to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (vowel and consonant) and lexical tone recognition, to provide information for speech coding schemes specific to Mandarin. Design: Eleven normal-hearing subjects were studied using acoustic temporal E cues that were extracted from 30 continuous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions, and their relative weights were calculated using the least-squares approach. Results: For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43-84.82%, 76.27-95.24%, and 96.58%, respectively; for consonant recognition, 35.49-63.77%, 67.75-78.87%, and 87.87%; and for lexical tone recognition, 60.80-97.15%, 73.16-96.87%, and 96.73%. From frequency region 1 to frequency region 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition, 0.10, 0.16, 0.18, 0.23, and 0.33; and in lexical tone recognition, 0.38, 0.18, 0.14, 0.16, and 0.14. Conclusion: The region that contributed most to vowel recognition was Region 2 (502-1,022 Hz), which contains first formant (F1) information; Region 5 (3,856-7,562 Hz) contributed most to consonant recognition; and Region 1 (80-502 Hz), which contains fundamental frequency (F0) information, contributed most to lexical tone recognition.
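The least-squares weighting analysis named in the design can be sketched as a regression of percent-correct scores on indicators of which regions were present in each condition, with the coefficients normalized to sum to one. The condition matrix and scores below are toy values, not the study's data:

    import numpy as np

    # Rows: stimulus conditions; columns: regions 1-5 (1 = region presented).
    presence = np.array([
        [1, 1, 1, 0, 0],
        [1, 1, 0, 1, 0],
        [0, 1, 1, 1, 1],
        [1, 0, 1, 0, 1],
        [0, 0, 1, 1, 1],
        [1, 1, 0, 0, 1],
    ])
    scores = np.array([62.0, 55.0, 70.0, 58.0, 49.0, 60.0])  # toy percent-correct

    coef, *_ = np.linalg.lstsq(presence, scores, rcond=None)
    weights = np.clip(coef, 0.0, None)
    weights /= weights.sum()   # relative weights that sum to 1, as reported above
    print(weights)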
Affiliation(s)
- Zhong Zheng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Keyi Li
- Sydney Institute of Language and Commerce, Shanghai University, Shanghai, China
- Gang Feng
- Department of Graduate, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China
- Yang Guo
- Ear, Nose, and Throat Institute and Otorhinolaryngology Department, Eye and ENT Hospital of Fudan University, Shanghai, China
- Yinan Li
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Lili Xiao
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Chengqi Liu
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Shouhuan He
- Department of Otolaryngology, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
- Zhen Zhang
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Di Qian
- Department of Otolaryngology, Shenzhen Longhua District People's Hospital, Shenzhen, China
- Yanmei Feng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
11. Jenson D. Audiovisual incongruence differentially impacts left and right hemisphere sensorimotor oscillations: Potential applications to production. PLoS One 2021; 16:e0258335. PMID: 34618866. PMCID: PMC8496780. DOI: 10.1371/journal.pone.0258335.
Abstract
Speech production gives rise to distinct auditory and somatosensory feedback signals, which are dynamically integrated to enable online monitoring and error correction, though it remains unclear how the sensorimotor system supports the integration of these multimodal signals. Capitalizing on the parity of sensorimotor processes supporting perception and production, the current study employed the McGurk paradigm to induce multimodal sensory congruence/incongruence. EEG data from a cohort of 39 typical speakers were decomposed with independent component analysis to identify bilateral mu rhythms, indices of sensorimotor activity. Subsequent time-frequency analyses revealed bilateral patterns of event-related desynchronization (ERD) across alpha and beta frequency ranges over the time course of perceptual events. Right mu activity was characterized by reduced ERD during all cases of audiovisual incongruence, while left mu activity was attenuated and protracted in McGurk trials eliciting sensory fusion. Results were interpreted as suggesting distinct hemispheric contributions, with right-hemisphere mu activity supporting a coarse incongruence-detection process and left-hemisphere mu activity reflecting a more granular level of analysis, including phonological identification and incongruence resolution. Findings are also considered with regard to incongruence detection and resolution processes during production.
Affiliation(s)
- David Jenson
- Department of Speech and Hearing Sciences, Washington State University, Spokane, Washington, United States of America
12. Viswanathan V, Shinn-Cunningham BG, Heinz MG. Temporal fine structure influences voicing confusions for consonant identification in multi-talker babble. J Acoust Soc Am 2021; 150:2664. PMID: 34717498. PMCID: PMC8514254. DOI: 10.1121/10.0006527.
Abstract
To understand the mechanisms of speech perception in everyday listening environments, it is important to elucidate the relative contributions of different acoustic cues in transmitting phonetic content. Previous studies suggest that the envelope of speech in different frequency bands conveys most speech content, while the temporal fine structure (TFS) can aid in segregating target speech from background noise. However, the role of TFS in conveying phonetic content beyond what envelopes convey for intact speech in complex acoustic scenes is poorly understood. The present study addressed this question using online psychophysical experiments that measured the identification of consonants in multi-talker babble for intelligibility-matched intact and 64-channel envelope-vocoded stimuli. Consonant confusion patterns revealed that listeners were more biased in the vocoded (versus intact) condition toward reporting that they heard an unvoiced consonant, despite envelope and place cues being largely preserved. This result was replicated when babble instances were varied across independent experiments, suggesting that TFS conveys voicing information beyond what is conveyed by envelopes for intact speech in babble. Given that multi-talker babble is a masker that is ubiquitous in everyday environments, this finding has implications for the design of assistive listening devices such as cochlear implants.
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Michael G. Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
13. Zheng Z, Li K, Guo Y, Wang X, Xiao L, Liu C, He S, Feng G, Feng Y. The Relative Weight of Temporal Envelope Cues in Different Frequency Regions for Mandarin Disyllabic Word Recognition. Front Neurosci 2021; 15:670192. PMID: 34335156. PMCID: PMC8320289. DOI: 10.3389/fnins.2021.670192.
Abstract
Objectives Acoustic temporal envelope (E) cues containing speech information are distributed across all frequency spectra. To provide a theoretical basis for the signal coding of hearing devices, we examined the relative weight of E cues in different frequency regions for Mandarin disyllabic word recognition in quiet. Design E cues were extracted from 30 continuous frequency bands within the range of 80 to 7,562 Hz using Hilbert decomposition and assigned to five frequency regions from low to high. Disyllabic word recognition scores were obtained from 20 normal-hearing participants using the E cues available in two, three, or four frequency regions. The relative weights of the five frequency regions were calculated using a least-squares approach. Results Participants correctly identified 3.13-38.13%, 27.50-83.13%, or 75.00-93.13% of words when presented with two, three, or four frequency regions, respectively. Increasing the number of frequency regions improved recognition scores and decreased the magnitude of the differences in scores between combinations, suggesting a synergistic effect among E cues from different frequency regions. The mean weights of the E cues of frequency regions 1-5 were 0.31, 0.19, 0.26, 0.22, and 0.02, respectively. Conclusion For Mandarin disyllabic words, the E cues of frequency regions 1 (80-502 Hz) and 3 (1,022-1,913 Hz) contributed more to word recognition than other regions, while frequency region 5 (3,856-7,562 Hz) contributed little.
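The Hilbert-based envelope extraction named in the design can be sketched as below, with 30 bands between 80 and 7,562 Hz allocated to five regions; the band spacing (log-spaced here) and filter order are assumptions:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def band_envelopes(x, fs, n_bands=30, f_lo=80.0, f_hi=7562.0):
        """Return the Hilbert envelope of each of n_bands bandpass channels."""
        edges = np.geomspace(f_lo, f_hi, n_bands + 1)
        envs = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            envs.append(np.abs(hilbert(sosfiltfilt(sos, x))))
        return np.array(envs), edges

    def to_regions(envs, n_regions=5):
        """Group consecutive bands into equal-sized frequency regions."""
        per_region = len(envs) // n_regions
        return [envs[i * per_region:(i + 1) * per_region] for i in range(n_regions)]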
Affiliation(s)
- Zhong Zheng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Keyi Li
- Sydney Institute of Language and Commerce, Shanghai University, Shanghai, China
- Yang Guo
- Ear, Nose, and Throat Institute and Otorhinolaryngology Department, Eye and ENT Hospital of Fudan University, Shanghai, China
- Xinrong Wang
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Lili Xiao
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Chengqi Liu
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
- Shouhuan He
- Department of Otolaryngology, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
- Gang Feng
- The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, China
- Yanmei Feng
- Department of Otolaryngology-Head and Neck Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
- Shanghai Key Laboratory of Sleep Disordered Breathing, Shanghai, China
14. Varnet L, Léger AC, Boucher S, Bonnet C, Petit C, Lorenzi C. Contributions of Age-Related and Audibility-Related Deficits to Aided Consonant Identification in Presbycusis: A Causal-Inference Analysis. Front Aging Neurosci 2021; 13:640522. PMID: 33732140. PMCID: PMC7956988. DOI: 10.3389/fnagi.2021.640522.
Abstract
The decline of speech intelligibility in presbycusis can be regarded as resulting from the combined contribution of two main groups of factors: (1) audibility-related factors and (2) age-related factors. In particular, there is now an abundant scientific literature on the crucial role of suprathreshold auditory abilities and cognitive functions, which have been found to decline with age even in the absence of audiometric hearing loss. However, researchers investigating the direct effect of aging in presbycusis have to deal with the methodological issue that age and peripheral hearing loss covary to a large extent. In the present study, we analyzed a dataset of consonant-identification scores measured in quiet and in noise for a large cohort (n = 459, age 42-92) of hearing-impaired (HI) and normal-hearing (NH) listeners. HI listeners were provided with frequency-dependent amplification adjusted to their audiometric profile. Their scores in the two conditions were predicted from their pure-tone average (PTA) and age, as well as from their Extended Speech Intelligibility Index (ESII), a measure of the impact of audibility loss on speech intelligibility. We relied on a causal-inference approach combined with Bayesian modeling to disentangle the direct causal effects of age and audibility on intelligibility from the indirect effect of age on hearing loss. The analysis revealed that the direct effect of PTA on HI intelligibility scores was five times larger than the effect of age. This overwhelming effect of PTA was not due to residual audibility loss despite amplification, as confirmed by an ESII-based model. More plausibly, the marginal role of age is a consequence of the relatively low cognitive demands of the task used in this study. Furthermore, the amount of variance in intelligibility scores was smaller for NH than for HI listeners, even after accounting for age and audibility, reflecting the presence of additional suprathreshold deficits in the latter group. Although the nonsense-syllable materials and the particular amplification settings used in this study potentially restrict the generalization of the findings, we think that these promising results call for a wider use of causal-inference analysis in audiology, e.g., as a way to disentangle the influence of the various cognitive factors and suprathreshold deficits associated with presbycusis.
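The study's Bayesian causal-inference model is not reproduced here; as a minimal illustration of the underlying comparison only, one can regress intelligibility on standardized PTA and age so that the two coefficients are on a common scale. The data below are simulated, with age and PTA deliberately made to covary:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    n = 459
    age = rng.uniform(42, 92, n)
    pta = 20 + 0.5 * (age - 42) + rng.normal(0, 10, n)        # PTA covaries with age
    score = 90 - 0.8 * pta - 0.1 * age + rng.normal(0, 5, n)  # toy intelligibility

    z = lambda v: (v - v.mean()) / v.std()
    X = np.column_stack([z(pta), z(age)])
    model = LinearRegression().fit(X, score)
    print("standardized effects (PTA, age):", model.coef_)

Unlike this plain regression, the causal-inference analysis additionally separates the direct effect of age from its indirect effect through hearing loss.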
Affiliation(s)
- Léo Varnet
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d'Études Cognitives, École normale supérieure, Université Paris Sciences & Lettres, Paris, France
- Agnès C. Léger
- Manchester Centre for Audiology and Deafness, Division of Human Communication, Development & Hearing, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, United Kingdom
- Sophie Boucher
- Complexité du Vivant, Sorbonne Universités, Université Pierre et Marie Curie, Université Paris VI, Paris, France
- Institut de l'Audition, Institut Pasteur, INSERM, Paris, France
- Centre Hospitalier Universitaire d'Angers, Angers, France
- Crystel Bonnet
- Complexité du Vivant, Sorbonne Universités, Université Pierre et Marie Curie, Université Paris VI, Paris, France
- Institut de l'Audition, Institut Pasteur, INSERM, Paris, France
- Christine Petit
- Institut de l'Audition, Institut Pasteur, INSERM, Paris, France
- Collège de France, Paris, France
- Christian Lorenzi
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d'Études Cognitives, École normale supérieure, Université Paris Sciences & Lettres, Paris, France
15. Cabrera L, Halliday LF. Relationship between sensitivity to temporal fine structure and spoken language abilities in children with mild-to-moderate sensorineural hearing loss. J Acoust Soc Am 2020; 148:3334. PMID: 33261401. PMCID: PMC7613189. DOI: 10.1121/10.0002669.
Abstract
Children with sensorineural hearing loss show considerable variability in spoken language outcomes. The present study tested whether specific deficits in supra-threshold auditory perception might contribute to this variability. In a previous study by Halliday, Rosen, Tuomainen, and Calcus [(2019). J. Acoust. Soc. Am. 146, 4299], children with mild-to-moderate sensorineural hearing loss (MMHL) were shown to perform more poorly than those with normal hearing (NH) on measures designed to assess sensitivity to the temporal fine structure (TFS; the rapid oscillations in the amplitude of narrowband signals over short time intervals). However, they performed within normal limits on measures assessing sensitivity to the envelope (E; the slow fluctuations in the overall amplitude). Here, individual differences in unaided sensitivity to the TFS accounted for significant variance in the spoken language abilities of children with MMHL after controlling for nonverbal intelligence quotient, family history of language difficulties, and hearing loss severity. Aided sensitivity to the TFS and E cues was equally important for children with MMHL, whereas for children with NH, E cues were more important. These findings suggest that deficits in TFS perception may contribute to the variability in spoken language outcomes in children with sensorineural hearing loss.
Affiliation(s)
- Laurianne Cabrera
- Integrative Neuroscience and Cognition Center, CNRS-Université de Paris, Paris, 75006, France
- Lorna F. Halliday
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
16. Schiller IS, Morsomme D, Kob M, Remacle A. Noise and a Speaker's Impaired Voice Quality Disrupt Spoken Language Processing in School-Aged Children: Evidence From Performance and Response Time Measures. J Speech Lang Hear Res 2020; 63:2115-2131. PMID: 32569506. DOI: 10.1044/2020_jslhr-19-00348.
Abstract
Purpose Our aim was to investigate isolated and combined effects of speech-shaped noise (SSN) and a speaker's impaired voice quality on spoken language processing in first-grade children. Method In individual examinations, 53 typically developing children aged 5-6 years performed a speech perception task (phoneme discrimination) and a listening comprehension task (sentence-picture matching). Speech stimuli were randomly presented in a 2 × 2 factorial design with the factors noise (no added noise vs. SSN at 0 dB SNR) and voice quality (normal voice vs. impaired voice). Outcome measures were task performance and response time (RT). Results SSN and impaired voice quality significantly lowered children's performance and increased RTs in the speech perception task, particularly when combined. Regarding listening comprehension, a significant interaction between noise and voice quality indicated that children's performance was hindered by SSN when the speaker's voice was impaired but not when it was normal. RTs in this task were unaffected by noise or voice quality. Conclusions Results suggest that speech signal degradations caused by a speaker's impaired voice and background noise generate more processing errors and increased listening effort in young school-aged children. This finding is vital for classroom listening and highlights the importance of ensuring teachers' vocal health and adequate room acoustics.
Affiliation(s)
- Isabel S Schiller
- Faculty of Psychology, Speech Therapy, and Education Sciences, University of Liège, Belgium
- Dominique Morsomme
- Faculty of Psychology, Speech Therapy, and Education Sciences, University of Liège, Belgium
- Malte Kob
- Erich Thienhaus Institute, Detmold University of Music, Germany
- Angélique Remacle
- Faculty of Psychology, Speech Therapy, and Education Sciences, University of Liège, Belgium
- Fund for Scientific Research (F.R.S.-FNRS), Brussels, Belgium
17.
Abstract
OBJECTIVES Adults can use slow temporal envelope cues, or amplitude modulation (AM), to identify speech sounds in quiet. Faster AM cues and the temporal fine structure, or frequency modulation (FM), play a more important role in noise. This study assessed whether fast and slow temporal modulation cues play a similar role in infants' speech perception by comparing the ability of normal-hearing 3-month-olds and adults to use slow temporal envelope cues in discriminating consonant contrasts. DESIGN English consonant-vowel syllables differing in voicing or place of articulation were processed by 2 tone-excited vocoders to replace the original FM cues with pure tones in 32 frequency bands. AM cues were extracted in each frequency band with 2 different cutoff frequencies, 256 or 8 Hz. Discrimination was assessed for infants and adults using an observer-based testing method, in quiet or in a speech-shaped noise. RESULTS For infants, the effect of eliminating fast AM cues was the same in quiet and in noise: a high proportion of infants discriminated when both fast and slow AM cues were available, but less than half of the infants also discriminated when only slow AM cues were preserved. For adults, the effect of eliminating fast AM cues was greater in noise than in quiet: all adults discriminated in quiet whether or not fast AM cues were available, but in noise eliminating fast AM cues reduced the percentage of adults reaching criterion from 71% to 21%. CONCLUSIONS In quiet, infants seem to depend on fast AM cues more than adults do. In noise, adults seem to depend on FM cues to a greater extent than infants do. However, infants and adults are similarly affected by a loss of fast AM cues in noise. Experience with the native language seems to change the relative importance of different acoustic cues for speech perception.
18. Effects of Various Extents of High-Frequency Hearing Loss on Speech Recognition and Gap Detection at Low Frequencies in Patients with Sensorineural Hearing Loss. Neural Plast 2018; 2017:8941537. PMID: 29445551. PMCID: PMC5763132. DOI: 10.1155/2017/8941537.
Abstract
This study explored whether time-compressed speech perception varies with the degree of hearing loss in individuals with high-frequency sensorineural hearing loss (HF SNHL). Sixty-five HF SNHL individuals with different cutoff frequencies were recruited and further divided into mildly, moderately, and/or severely affected subgroups based on the averaged thresholds of all frequencies exhibiting hearing loss. Time-compressed speech recognition scores under both quiet and noisy conditions, as well as gap detection thresholds at low frequencies where hearing thresholds were normal, were obtained from all patients and compared with data from 11 age-matched individuals with normal hearing thresholds at all frequencies. Correlations of the time-compressed speech recognition scores with the extent of HF SNHL and with the 1 kHz gap detection thresholds were examined across all participants. We found that the time-compressed speech recognition scores were significantly affected by and correlated with the extent of HF SNHL. The time-compressed speech recognition scores also correlated with the 1 kHz gap detection thresholds, except when the compression ratio of speech was 0.8 under the quiet condition. Above all, the extent of HF SNHL was significantly correlated with the 1 kHz gap detection thresholds.
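Time-compressed speech of the kind used here is typically generated with a phase vocoder, so that duration changes without shifting pitch. A sketch using librosa follows; the study's actual compression method is not specified in the abstract, and the file name is a hypothetical placeholder:

    import librosa

    y, fs = librosa.load("sentence.wav", sr=None)  # hypothetical stimulus file
    # A compression ratio of 0.8 (duration x 0.8) corresponds to rate = 1/0.8.
    compressed = librosa.effects.time_stretch(y, rate=1 / 0.8)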
19. Sheft S, Cheng MY, Shafiro V. Discrimination of Stochastic Frequency Modulation by Cochlear Implant Users. J Am Acad Audiol 2018; 26:572-81. PMID: 26134724. DOI: 10.3766/jaaa.14067.
Abstract
BACKGROUND Past work has shown that low-rate frequency modulation (FM) may help preserve signal coherence, aid segmentation at word and syllable boundaries, and benefit speech intelligibility in the presence of a masker. PURPOSE This study evaluated whether difficulties in speech perception by cochlear implant (CI) users relate to a deficit in the ability to discriminate among stochastic low-rate patterns of FM. RESEARCH DESIGN This is a correlational study assessing the association between the ability to discriminate stochastic patterns of low-rate FM and the intelligibility of speech in noise. STUDY SAMPLE Thirteen postlingually deafened adult CI users participated in this study. DATA COLLECTION AND ANALYSIS Using modulators derived from 5-Hz lowpass noise applied to a 1-kHz carrier, thresholds were measured in terms of frequency excursion (both in quiet and with a speech-babble masker present), stimulus duration, and signal-to-noise ratio in the presence of a speech-babble masker. Speech perception ability was assessed in the presence of the same speech-babble masker. Relationships were evaluated with Pearson product-moment correlation analysis with correction for family-wise error, and with commonality analysis to determine the unique and common contributions of the psychoacoustic variables to the association with speech ability. RESULTS Significant correlations were obtained between masked speech intelligibility and three metrics of FM discrimination involving either signal-to-noise ratio or stimulus duration, with shared variance among the three measures accounting for much of the effect. Compared with past results from young normal-hearing adults and older adults with either normal hearing or mild-to-moderate hearing loss, mean FM discrimination thresholds obtained from CI users were higher in all conditions. CONCLUSIONS The ability to process the pattern of frequency excursions of stochastic FM may, in part, share a common basis with speech perception in noise. Discrimination of differences in the temporally distributed place coding of the stimulus could serve as this common basis for CI users.
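The stochastic FM stimuli described in the data-collection section can be sketched as a lowpass-noise modulator imposed on a 1-kHz carrier with a given peak frequency excursion; the filter order and normalization below are assumptions:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def stochastic_fm(fs=44100, dur=1.0, fc=1000.0, excursion_hz=100.0):
        """Carrier at fc, frequency-modulated by 5-Hz lowpass noise."""
        n = int(fs * dur)
        mod = np.random.default_rng().standard_normal(n)
        sos = butter(4, 5.0, btype="low", fs=fs, output="sos")  # 5-Hz lowpass noise
        mod = sosfiltfilt(sos, mod)
        mod *= excursion_hz / np.max(np.abs(mod))       # scale to peak excursion in Hz
        phase = 2 * np.pi * np.cumsum(fc + mod) / fs    # integrate instantaneous frequency
        return np.sin(phase)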
Affiliation(s)
- Stanley Sheft
- Department of Communication Disorders and Sciences, Rush University Medical Center, Chicago, IL
- Min-Yu Cheng
- Department of Communication Disorders and Sciences, Rush University Medical Center, Chicago, IL
- Valeriy Shafiro
- Department of Communication Disorders and Sciences, Rush University Medical Center, Chicago, IL
20. Chang SA, Won JH, Kim H, Oh SH, Tyler RS, Cho CH. Frequency-Limiting Effects on Speech and Environmental Sound Identification for Cochlear Implant and Normal Hearing Listeners. J Audiol Otol 2018; 22:28-38. PMID: 29325391. PMCID: PMC5784366. DOI: 10.7874/jao.2017.00178.
Abstract
BACKGROUND AND OBJECTIVES It is important to understand the frequency regions of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CIs and with normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass or high-pass filtering cutoff frequencies. SUBJECTS AND METHODS Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants produced by a male and a female speaker, and of environmental sounds, was measured. Crossover frequencies, at which the LPF and HPF conditions yield identical identification scores, were determined for each identification test. RESULTS CI and NH subjects showed similar changes in identification performance as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may constrain identification performance similarly for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification, due to the inefficient coding of acoustic cues through CI sound processors. CONCLUSIONS These findings provide vital information, in Korean, for understanding how the frequency information received for speech and environmental sounds through a CI processor differs from normal hearing.
Affiliation(s)
- Son-A Chang
- Department of Otolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul, Korea
- Jong Ho Won
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, TN, USA
- HyangHee Kim
- Graduate Program of Speech and Language Pathology, Department and Research Institute of Rehabilitation Medicine, Yonsei University College of Medicine, Seoul, Korea
- Seung-Ha Oh
- Department of Otolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul, Korea
- Richard S Tyler
- Department of Otolaryngology-Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, IA, USA
- Chang Hyun Cho
- Department of Otolaryngology-Head and Neck Surgery, Gachon University Gil Medical Center, Incheon, Korea
21. The Relative Weight of Temporal Envelope Cues in Different Frequency Regions for Mandarin Sentence Recognition. Neural Plast 2017; 2017:7416727. PMID: 28203463. PMCID: PMC5288535. DOI: 10.1155/2017/7416727.
Abstract
Acoustic temporal envelope (E) cues containing speech information are distributed across the frequency spectrum. To investigate the relative weight of E cues in different frequency regions for Mandarin sentence recognition, E information was extracted from 30 contiguous bands across the range of 80-7,562 Hz using Hilbert decomposition and then allocated to five frequency regions. Recognition scores were obtained from 40 normal-hearing listeners with acoustic E cues from one or two random regions. While recognition scores ranged from 8.2% to 16.3% when E information from only one region was available, they ranged from 57.9% to 87.7% when E information from two frequency regions was presented, suggesting a synergistic effect among the temporal E cues in different frequency regions. Next, the relative contributions of the E information from the five frequency regions to sentence perception were computed using a least-squares approach. The results demonstrated that, for Mandarin Chinese, a tonal language, the temporal E cues of Frequency Region 1 (80-502 Hz) and Region 3 (1,022-1,913 Hz) contributed more to the intelligibility of sentence recognition than other regions, particularly the region of 80-502 Hz, which contains fundamental frequency (F0) information.
22. Marmel F, Plack CJ, Hopkins K. The role of excitation-pattern cues in the detection of frequency shifts in bandpass-filtered complex tones. J Acoust Soc Am 2015; 137:2687-97. PMID: 25994700. PMCID: PMC5044982. DOI: 10.1121/1.4919315.
Abstract
One task intended to measure sensitivity to temporal fine structure (TFS) involves the discrimination of a harmonic complex tone from a tone in which all harmonics are shifted upwards by the same amount in hertz. Both tones are passed through a fixed bandpass filter centered on the high harmonics to reduce the availability of excitation-pattern cues and a background noise is used to mask combination tones. The role of frequency selectivity in this "TFS1" task was investigated by varying level. Experiment 1 showed that listeners performed more poorly at a high level than at a low level. Experiment 2 included intermediate levels and showed that performance deteriorated for levels above about 57 dB sound pressure level. Experiment 3 estimated the magnitude of excitation-pattern cues from the variation in forward masking of a pure tone as a function of frequency shift in the complex tones. There was negligible variation, except for the lowest level used. The results indicate that the changes in excitation level at threshold for the TFS1 task would be too small to be usable. The results are consistent with the TFS1 task being performed using TFS cues, and with frequency selectivity having an indirect effect on performance via its influence on TFS cues.
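The TFS1 stimulus pair described above can be sketched as a harmonic complex versus a complex whose components are all shifted upward by the same amount in hertz. The bandpass filtering centered on the high harmonics and the background masking noise are omitted for brevity, and the parameter values below are illustrative, not the authors':

    import numpy as np

    def complex_tone(fs, dur, f0, shift_hz=0.0, n_harmonics=20):
        """Sum of equal-amplitude components at f0*k + shift_hz."""
        t = np.arange(int(fs * dur)) / fs
        freqs = f0 * np.arange(1, n_harmonics + 1) + shift_hz
        return sum(np.sin(2 * np.pi * f * t) for f in freqs)

    fs = 44100
    standard = complex_tone(fs, 0.5, f0=200.0)                # harmonic complex
    target = complex_tone(fs, 0.5, f0=200.0, shift_hz=50.0)   # all components +50 Hz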
Affiliation(s)
- Frederic Marmel
- School of Psychological Sciences, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- Christopher J. Plack
- School of Psychological Sciences, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- Kathryn Hopkins
- School of Psychological Sciences, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
23
Malone BJ, Scott BH, Semple MN. Encoding frequency contrast in primate auditory cortex. J Neurophysiol 2014; 111:2244-63. [PMID: 24598525 DOI: 10.1152/jn.00878.2013] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Changes in amplitude and frequency jointly determine much of the communicative significance of complex acoustic signals, including human speech. We have previously described responses of neurons in the core auditory cortex of awake rhesus macaques to sinusoidal amplitude modulation (SAM) signals. Here we report a complementary study of sinusoidal frequency modulation (SFM) in the same neurons. Responses to SFM were analogous to SAM responses in that changes in multiple parameters defining SFM stimuli (e.g., modulation frequency, modulation depth, carrier frequency) were robustly encoded in the temporal dynamics of the spike trains. For example, changes in the carrier frequency produced highly reproducible changes in shapes of the modulation period histogram, consistent with the notion that the instantaneous probability of discharge mirrors the moment-by-moment spectrum at low modulation rates. The upper limit for phase locking was similar across SAM and SFM within neurons, suggesting shared biophysical constraints on temporal processing. Using spike train classification methods, we found that neural thresholds for modulation depth discrimination are typically far lower than would be predicted from frequency tuning to static tones. This "dynamic hyperacuity" suggests a substantial central enhancement of the neural representation of frequency changes relative to the auditory periphery. Spike timing information was superior to average rate information when discriminating among SFM signals, and even when discriminating among static tones varying in frequency. This finding held even when differences in total spike count across stimuli were normalized, indicating both the primacy and generality of temporal response dynamics in cortical auditory processing.
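SAM and SFM tones follow directly from their definitions; the sketch below derives the SFM phase by integrating the instantaneous frequency fc + df·sin(2π·fm·t). All parameter values are placeholders, not the study's settings.

```python
# Sinusoidal amplitude-modulated (SAM) and frequency-modulated (SFM) tones.
import numpy as np

def sam(fc, fm, depth, fs, dur):
    t = np.arange(int(fs * dur)) / fs
    return (1 + depth * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

def sfm(fc, fm, df, fs, dur):
    t = np.arange(int(fs * dur)) / fs
    # phase = 2*pi * integral of (fc + df*sin(2*pi*fm*t)) dt
    phase = 2 * np.pi * fc * t - (df / fm) * np.cos(2 * np.pi * fm * t)
    return np.sin(phase)

fs = 48000
am_tone = sam(fc=4000, fm=16, depth=0.8, fs=fs, dur=0.5)
fm_tone = sfm(fc=4000, fm=16, df=400, fs=fs, dur=0.5)
```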
Affiliation(s)
- Brian J Malone
- Department of Otolaryngology-Head and Neck Surgery, University of California, San Francisco, California
- Brian H Scott
- Laboratory of Neuropsychology, National Institute of Mental Health/National Institutes of Health, Bethesda, Maryland
- Malcolm N Semple
- Center for Neural Science, New York University, New York, New York
24
Cabrera L, Bertoncini J, Lorenzi C. Perception of speech modulation cues by 6-month-old infants. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2013; 56:1733-1744. [PMID: 24023378 DOI: 10.1044/1092-4388(2013/12-0169)] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
PURPOSE The capacity of 6-month-old infants to discriminate a voicing contrast (/aba/-/apa/) on the basis of amplitude modulation (AM) cues and frequency modulation (FM) cues was evaluated. METHOD Several vocoded speech conditions were designed to either degrade FM cues in 4 or 32 bands or degrade AM in 32 bands. Infants were familiarized with the vocoded stimuli for a period of either 1 or 2 min. Vocoded speech discrimination was assessed using the head-turn preference procedure. RESULTS Infants discriminated /aba/ from /apa/ in each condition; however, familiarization time was found to strongly influence infants' responses (i.e., their preference for novel vs. familiar stimuli). CONCLUSIONS Six-month-old infants do not require FM cues and can use the slowest (<16 Hz) AM cues to discriminate voicing. Moreover, 6-month-old infants can use AM cues extracted from only 4 broad frequency bands to discriminate voicing.
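A tone vocoder of the kind described, which keeps each band's AM (Hilbert envelope) and replaces the fine structure with a pure tone at the band center, can be sketched as follows. The band layout, filter order, and frequency range are illustrative assumptions, not the study's exact design.

```python
# Sketch of a tone vocoder: preserve per-band AM, discard FM.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tone_vocode(x, fs, n_bands, f_lo=80.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # log-spaced band edges
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))           # this band's AM
        carrier = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)   # tone at band center
        out += env * carrier                                 # FM cues removed
    return out

fs = 16000
speech = np.random.randn(fs)  # stand-in for a recorded utterance
vocoded_4 = tone_vocode(speech, fs, n_bands=4)
vocoded_32 = tone_vocode(speech, fs, n_bands=32)
```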
25
Rickard NA, Heidtke UJ, O'Beirne GA. Assessment of auditory processing disorder in children using an adaptive filtered speech test. Int J Audiol 2013; 52:687-97. [PMID: 23879742 DOI: 10.3109/14992027.2013.802380] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
OBJECTIVE One type of test commonly used to assess auditory processing disorder (APD) is the 'filtered words test' (FWT), in which a monaural, low-redundancy speech sample is distorted by using filtering to modify its frequency content. One limitation of the various existing FWTs is that they are performed using a constant level of low-pass filtering, making them prone to ceiling and floor effects that compromise their efficiency and accuracy. A recently developed computer-based test, the University of Canterbury Adaptive Speech Test - Filtered Words (UCAST-FW), uses an adaptive procedure intended to improve the efficiency and sensitivity of the test over its constant-level counterparts. DESIGN The UCAST-FW was administered to school-aged children to investigate the ability of the test to distinguish between children with and without APD. STUDY SAMPLE Fifteen children aged 7-13 years diagnosed with APD, and an age-matched control group of 10 children with no history of listening difficulties. RESULTS The data obtained demonstrate a significant difference between the UCAST-FW results of children with APD and those with normal auditory processing. CONCLUSIONS These findings provide evidence that the UCAST-FW may discriminate between children with and without APD with greater sensitivity than its constant-level counterparts.
Affiliation(s)
- Natalie A Rickard
- Department of Communication Disorders, University of Canterbury, Christchurch, New Zealand
26
Shamma S, Lorenzi C. On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:2818-33. [PMID: 23654388 PMCID: PMC3663870 DOI: 10.1121/1.4795783] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]
Abstract
There is much debate on how the spectrotemporal modulations of speech (or its spectrogram) are encoded in the responses of the auditory nerve, and whether speech intelligibility is best conveyed via the "envelope" (E) or "temporal fine-structure" (TFS) of the neural responses. Studies making wide use of vocoders to resolve this question have commonly assumed that manipulating the amplitude-modulation and frequency-modulation components of the vocoded signal alters the relative importance of E or TFS encoding on the nerve, thus facilitating assessment of their relative importance to intelligibility. Here we argue that this assumption is incorrect, and that the vocoder approach is ineffective in differentially altering the neural E and TFS. In fact, we demonstrate using a simplified model of early auditory processing that both neural E and TFS encode the speech spectrogram with constant and comparable relative effectiveness regardless of the vocoder manipulations. However, we also show that neural TFS cues are less vulnerable than their E counterparts under severely noisy conditions, and hence should play a more prominent role in cochlear stimulation strategies.
Affiliation(s)
- Shihab Shamma
- Electrical and Computer Engineering Department and Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA.
27
Oxenham AJ. Revisiting place and temporal theories of pitch. ACOUSTICAL SCIENCE AND TECHNOLOGY 2013; 34:388-396. [PMID: 25364292 PMCID: PMC4215732 DOI: 10.1250/ast.34.388] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
The nature of pitch and its neural coding have been studied for over a century. A popular debate has revolved around the question of whether pitch is coded via "place" cues in the cochlea, or via timing cues in the auditory nerve. In the most recent incarnation of this debate, the role of temporal fine structure has been emphasized in conveying important pitch and speech information, particularly because the lack of temporal fine structure coding in cochlear implants might explain some of the difficulties faced by cochlear implant users in perceiving music and pitch contours in speech. In addition, some studies have postulated that hearing-impaired listeners may have a specific deficit related to processing temporal fine structure. This article reviews some of the recent literature surrounding the debate, and argues that much of the recent evidence suggesting the importance of temporal fine structure processing can also be accounted for using spectral (place) or temporal-envelope cues.
Affiliation(s)
- Andrew J. Oxenham
- Department of Psychology, University of Minnesota, Elliott Hall, N218, 75 East River Parkway, Minneapolis, MN 55455, USA
28
Lorenzi C, Wallaert N, Gnansia D, Leger AC, Ives DT, Chays A, Garnier S, Cazals Y. Temporal-envelope reconstruction for hearing-impaired listeners. J Assoc Res Otolaryngol 2012; 13:853-65. [PMID: 23007719 PMCID: PMC3505588 DOI: 10.1007/s10162-012-0350-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2011] [Accepted: 09/09/2012] [Indexed: 10/27/2022] Open
Abstract
Recent studies suggest that normal-hearing listeners maintain robust speech intelligibility despite severe degradations of amplitude-modulation (AM) cues, by using temporal-envelope information recovered from broadband frequency-modulation (FM) speech cues at the output of cochlear filters. This study aimed to assess whether cochlear damage affects this capacity to reconstruct temporal-envelope information from FM. This was achieved by measuring the ability of 40 normal-hearing listeners and 41 listeners with mild-to-moderate hearing loss to identify syllables processed to degrade AM cues while leaving FM cues intact within three broad frequency bands spanning the range 65-3,645 Hz. Stimuli were presented at 65 dB SPL to both groups; for the hearing-impaired listeners, they were presented either unamplified or amplified according to a modified half-gain rule. Hearing-impaired listeners showed significantly poorer identification scores than normal-hearing listeners at both presentation levels. However, the deficit shown by hearing-impaired listeners for amplified stimuli was relatively modest. Overall, the hearing-impaired data and the results of a simulation study were consistent with a poorer-than-normal ability to reconstruct temporal-envelope information resulting from a broadening of cochlear filters by a factor ranging from 2 to 4. These results suggest that mild-to-moderate cochlear hearing loss has only a modest detrimental effect on peripheral, temporal-envelope reconstruction mechanisms.
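The envelope-recovery effect at issue can be demonstrated compactly: flatten the envelope of a signal whose components beat, then refilter it with a narrow cochlea-like filter, and amplitude modulation reappears at the output. This sketch assumes scipy.signal.gammatone (available in SciPy 1.6+); the signal and filter parameters are made up for illustration.

```python
# Demonstration of temporal-envelope recovery from FM cues after cochlear-like
# filtering. Widening the filter (as cochlear damage does) weakens the effect.
import numpy as np
from scipy.signal import gammatone, lfilter, hilbert

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1100 * t)  # beating pair
fm_only = np.cos(np.angle(hilbert(x)))  # keep phase (FM), flatten the envelope

b, a = gammatone(1050, "iir", fs=fs)    # narrow filter near the components
y = lfilter(b, a, fm_only)
recovered = np.abs(hilbert(y))
print(recovered.std() / recovered.mean())  # nonzero: AM has been reinstated
```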
Affiliation(s)
- Christian Lorenzi
- Equipe Audition (CNRS, Université Paris Descartes, École normale supérieure), Institut d'Étude de la Cognition, École normale supérieure, Paris Sciences et Lettres, 29 rue d'Ulm, 75005 Paris, France.
29
Psychophysiological analyses demonstrate the importance of neural envelope coding for speech perception in noise. J Neurosci 2012; 32:1747-56. [PMID: 22302814 DOI: 10.1523/jneurosci.4493-11.2012] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Understanding speech in noisy environments is often taken for granted; however, this task is particularly challenging for people with cochlear hearing loss, even with hearing aids or cochlear implants. A significant limitation to improving auditory prostheses is our lack of understanding of the neural basis for robust speech perception in noise. Perceptual studies suggest the slowly varying component of the acoustic waveform (envelope, ENV) is sufficient for understanding speech in quiet, but the rapidly varying temporal fine structure (TFS) is important in noise. These perceptual findings have important implications for cochlear implants, which currently only provide ENV; however, neural correlates have been difficult to evaluate due to cochlear transformations between acoustic TFS and recovered neural ENV. Here, we demonstrate the relative contributions of neural ENV and TFS by quantitatively linking neural coding, predicted from a computational auditory nerve model, with perception of vocoded speech in noise measured from normal hearing human listeners. Regression models with ENV and TFS coding as independent variables predicted speech identification and phonetic feature reception at both positive and negative signal-to-noise ratios. We found that: (1) neural ENV coding was a primary contributor to speech perception, even in noise; and (2) neural TFS contributed in noise mainly in the presence of neural ENV, but rarely as the primary cue itself. These results suggest that neural TFS has less perceptual salience than previously thought due to cochlear signal processing transformations between TFS and ENV. Because these transformations differ between normal and impaired ears, these findings have important translational implications for auditory prostheses.
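The regression step (predicting identification scores from neural ENV and TFS coding metrics) amounts to ordinary least squares with two predictors. The numbers below are fabricated solely to show the computation, not data from the study.

```python
# Least-squares fit of speech scores on neural ENV and TFS coding metrics.
import numpy as np

env_coding = np.array([0.9, 0.7, 0.5, 0.4, 0.2])   # hypothetical ENV metric
tfs_coding = np.array([0.8, 0.6, 0.5, 0.3, 0.1])   # hypothetical TFS metric
scores = np.array([92.0, 75.0, 60.0, 48.0, 25.0])  # hypothetical % correct

X = np.column_stack([np.ones_like(scores), env_coding, tfs_coding])
beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
print(dict(zip(["intercept", "ENV weight", "TFS weight"], beta.round(2))))
```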
30
Léger AC, Moore BCJ, Lorenzi C. Temporal and spectral masking release in low- and mid-frequency regions for normal-hearing and hearing-impaired listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:1502-14. [PMID: 22352520 DOI: 10.1121/1.3665993] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
"Masking release" (MR), the improvement of speech intelligibility in modulated compared with unmodulated maskers, is typically smaller than normal for hearing-impaired listeners. The extent to which this is due to reduced audibility or to suprathreshold processing deficits is unclear. Here, the effects of audibility were controlled by using stimuli restricted to the low- (≤1.5 kHz) or mid-frequency (1-3 kHz) region for normal-hearing listeners and hearing-impaired listeners with near-normal hearing in the tested region. Previous work suggests that the latter may have suprathreshold deficits. Both spectral and temporal MR were measured. Consonant identification was measured in quiet and in the presence of unmodulated, amplitude-modulated, and spectrally modulated noise at three signal-to-noise ratios (the same ratios for the two groups). For both frequency regions, consonant identification was poorer for the hearing-impaired than for the normal-hearing listeners in all conditions. The results suggest the presence of suprathreshold deficits for the hearing-impaired listeners, despite near-normal audiometric thresholds over the tested frequency regions. However, spectral MR and temporal MR were similar for the two groups. Thus, the suprathreshold deficits for the hearing-impaired group did not lead to reduced MR.
Affiliation(s)
- Agnès C Léger
- Equipe Audition, Département d'Etudes Cognitives, École normale supérieure, 29 rue d'Ulm, 75005 Paris, France.
31
Whitmal NA, DeRoy K. Adaptive bandwidth measurements of importance functions for speech intelligibility prediction. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:4032-4043. [PMID: 22225057 PMCID: PMC3253602 DOI: 10.1121/1.3641453] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/06/2010] [Revised: 08/23/2011] [Accepted: 08/30/2011] [Indexed: 05/28/2023]
Abstract
The Articulation Index (AI) and Speech Intelligibility Index (SII) predict intelligibility scores from measurements of speech and hearing parameters. One component in the prediction is the "importance function," a weighting function that characterizes contributions of particular spectral regions of speech to speech intelligibility. Previous work with SII predictions for hearing-impaired subjects suggests that prediction accuracy might improve if importance functions for individual subjects were available. Unfortunately, previous importance function measurements have required extensive intelligibility testing with groups of subjects, using speech processed by various fixed-bandwidth low-pass and high-pass filters. A more efficient approach appropriate to individual subjects is desired. The purpose of this study was to evaluate the feasibility of measuring importance functions for individual subjects with adaptive-bandwidth filters. In two experiments, ten subjects with normal hearing listened to vowel-consonant-vowel (VCV) nonsense words processed by low-pass and high-pass filters whose bandwidths were varied adaptively to produce specified performance levels in accordance with the transformed up-down rules of Levitt [(1971). J. Acoust. Soc. Am. 49, 467-477]. Local linear psychometric functions were fit to the resulting data and used to generate an importance function for VCV words. Results indicate that the adaptive method is reliable and efficient, and produces importance function data consistent with the corresponding AI/SII importance function.
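A transformed up-down track of the kind cited (here 2-down/1-up, converging on roughly 70.7% correct) can be sketched as below. The simulated listener, starting cutoff, step size, and stopping rule are all placeholders, not the study's settings.

```python
# 2-down/1-up adaptive track on a low-pass filter cutoff (Levitt-style).
import random

def simulated_listener(cutoff_hz):
    # Toy psychometric function: a higher cutoff passes more speech energy.
    return random.random() < min(0.99, cutoff_hz / 4000.0)

cutoff, step = 2000.0, 200.0
correct_run, direction, reversals = 0, None, []
while len(reversals) < 8:
    if simulated_listener(cutoff):
        correct_run += 1
        if correct_run == 2:          # two correct in a row -> make it harder
            correct_run = 0
            if direction == "up":
                reversals.append(cutoff)
            cutoff, direction = cutoff - step, "down"
    else:                             # one error -> make it easier
        correct_run = 0
        if direction == "down":
            reversals.append(cutoff)
        cutoff, direction = cutoff + step, "up"

print(f"threshold cutoff ~ {sum(reversals[-6:]) / 6:.0f} Hz")
```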
Affiliation(s)
- Nathaniel A Whitmal
- Department of Communication Disorders, University of Massachusetts, Amherst, Massachusetts 01003, USA.
32
Wang S, Xu L, Mannell R. Relative contributions of temporal envelope and fine structure cues to lexical tone recognition in hearing-impaired listeners. J Assoc Res Otolaryngol 2011; 12:783-94. [PMID: 21833816 DOI: 10.1007/s10162-011-0285-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2010] [Accepted: 07/25/2011] [Indexed: 11/24/2022] Open
Abstract
It has been reported that normal-hearing Chinese speakers base their lexical tone recognition on fine structure regardless of temporal envelope cues. However, a few psychoacoustic and perceptual studies have demonstrated that listeners with sensorineural hearing impairment may have an impaired ability to use fine structure information, whereas their ability to use temporal envelope information is close to normal. The purpose of this study is to investigate the relative contributions of temporal envelope and fine structure cues to lexical tone recognition in normal-hearing and hearing-impaired native Mandarin Chinese speakers. Twenty-two normal-hearing subjects and 31 subjects with various degrees of sensorineural hearing loss participated in the study. Sixteen sets of Mandarin monosyllables with four tone patterns each were processed through a "chimeric synthesizer" in which the temporal envelope from a monosyllabic word of one tone was paired with the fine structure from the same monosyllable of another tone. The chimeric tokens were generated in three channel conditions (4, 8, and 16 channels). Results showed that differences in tone responses among the three channel conditions were minor. On average, 90.9%, 70.9%, 57.5%, and 38.2% of tone responses were consistent with fine structure for the normal-hearing, moderately, moderately-to-severely, and severely hearing-impaired groups, respectively, whereas 6.8%, 21.1%, 31.4%, and 44.7% of tone responses were consistent with temporal envelope cues for these groups. Tone responses that were consistent with neither temporal envelope nor fine structure averaged 2.3%, 8.0%, 11.1%, and 17.1% for the same groups. Pure-tone average thresholds were negatively correlated with tone responses consistent with fine structure, but positively correlated with tone responses based on temporal envelope cues. Consistent with the idea that spectral resolvability underlies fine structure coding, these results demonstrated that, as hearing loss becomes more severe, lexical tone recognition relies increasingly on temporal envelope rather than fine structure cues due to widened auditory filters.
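The "chimeric synthesizer" operation reduces to a per-band Hilbert decomposition: within each analysis band, take the envelope from one token and the fine structure (cosine of the instantaneous phase) from another. This is a minimal sketch with an illustrative band layout, not the exact synthesizer used in the study.

```python
# Auditory chimera: envelope of one signal paired with fine structure of another.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def chimera(env_source, tfs_source, fs, n_bands=8, f_lo=80.0, f_hi=7000.0):
    n = min(len(env_source), len(tfs_source))
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        a_env = hilbert(sosfiltfilt(sos, env_source[:n]))
        a_tfs = hilbert(sosfiltfilt(sos, tfs_source[:n]))
        out += np.abs(a_env) * np.cos(np.angle(a_tfs))  # E of one, TFS of other
    return out

fs = 16000
tone2_token = np.random.randn(fs)  # stand-in for a monosyllable with one tone
tone4_token = np.random.randn(fs)  # stand-in for the same syllable, other tone
mixed = chimera(tone2_token, tone4_token, fs)
```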
Affiliation(s)
- Shuo Wang
- Beijing Institute of Otolaryngology, Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China
33
Ardoint M, Agus T, Sheft S, Lorenzi C. Importance of temporal-envelope speech cues in different spectral regions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:EL115-EL121. [PMID: 21877769 DOI: 10.1121/1.3602462] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
This study investigated the ability to use temporal-envelope (E) cues in a consonant identification task when presented within one or two frequency bands. Syllables were split into five bands spanning the range 70-7300 Hz with each band processed to preserve E cues and degrade temporal fine-structure cues. Identification scores were measured for normal-hearing listeners in quiet for individual processed bands and for pairs of bands. Consistent patterns of results were obtained in both the single- and dual-band conditions: identification scores increased systematically with band center frequency, showing that E cues in the higher bands (1.8-7.3 kHz) convey greater information.
Affiliation(s)
- Marine Ardoint
- Université Paris Descartes, École normale supérieure, Paris, France.
34
Fogerty D. Perceptual weighting of the envelope and fine structure across frequency bands for sentence intelligibility: effect of interruption at the syllabic-rate and periodic-rate of speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:489-500. [PMID: 21786914 PMCID: PMC3155597 DOI: 10.1121/1.3592220] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/23/2010] [Accepted: 04/25/2011] [Indexed: 05/16/2023]
Abstract
Listeners often have only fragments of speech available for understanding the intended message due to competing background noise. To maximize successful speech recognition, listeners must allocate their perceptual resources to the most informative acoustic properties. The speech signal contains temporally varying acoustics in the envelope and fine structure that are present across the frequency spectrum. Understanding how listeners perceptually weigh these acoustic properties in different frequency regions during interrupted speech is essential for the design of assistive listening devices. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for interrupted sentence materials. Perceptual weights were obtained during interruption at the syllabic rate (i.e., 4 Hz) and the periodic rate (i.e., 128 Hz) of speech. Potential interactions between interruption and fundamental frequency information were investigated by shifting the natural pitch contour higher relative to the interruption rate. The availability of each acoustic property was varied independently by adding noise at different levels. Perceptual weights were determined by correlating a listener's performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated similar relative weights across the interruption conditions, with emphasis on the envelope in the high frequencies.
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA.
35
Fogerty D. Perceptual weighting of individual and concurrent cues for sentence intelligibility: frequency, envelope, and fine structure. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:977-88. [PMID: 21361454 PMCID: PMC3070991 DOI: 10.1121/1.3531954] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/17/2010] [Revised: 11/19/2010] [Accepted: 12/06/2010] [Indexed: 05/16/2023]
Abstract
The speech signal may be divided into frequency bands, each containing temporal properties of the envelope and fine structure. For maximal speech understanding, listeners must allocate their perceptual resources to the most informative acoustic properties. Understanding this perceptual weighting is essential for the design of assistive listening devices that need to preserve these important speech cues. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for sentence materials. Perceptual weights were obtained under two listening contexts: (1) when each acoustic property was presented individually and (2) when multiple acoustic properties were available concurrently. The processing method was designed to vary the availability of each acoustic property independently by adding noise at different levels. Perceptual weights were determined by correlating a listener's performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated that weights were (1) equal when acoustic properties were presented individually and (2) biased toward envelope and mid-frequency information when multiple properties were available. Results suggest a complex interaction between the available acoustic properties and the listening context in determining how best to allocate perceptual resources when listening to speech in noise.
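The correlational analysis described here reduces to correlating each cue's trial-by-trial availability with response correctness and treating the (normalized) correlations as relative weights. The trial data below are fabricated purely to show the computation.

```python
# Trial-by-trial correlational estimate of perceptual weights.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200
avail = rng.integers(0, 2, size=(n_trials, 3))  # envelope, fine structure, frequency
# Simulated listener relying most on cue 0 and least on cue 2.
p = 0.2 + 0.35 * avail[:, 0] + 0.2 * avail[:, 1] + 0.05 * avail[:, 2]
correct = (rng.random(n_trials) < p).astype(float)

weights = np.array([np.corrcoef(avail[:, k], correct)[0, 1] for k in range(3)])
print((weights / weights.sum()).round(2))  # relative perceptual weights
```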
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA.
36
Envelope coding in auditory nerve fibers following noise-induced hearing loss. J Assoc Res Otolaryngol 2010; 11:657-73. [PMID: 20556628 DOI: 10.1007/s10162-010-0223-6] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2010] [Accepted: 05/25/2010] [Indexed: 10/19/2022] Open
Abstract
Recent perceptual studies suggest that listeners with sensorineural hearing loss (SNHL) have a reduced ability to use temporal fine-structure cues, whereas the effects of SNHL on temporal envelope cues are generally thought to be minimal. Several perceptual studies suggest that envelope coding may actually be enhanced following SNHL and that this enhancement may degrade listening in modulated maskers (e.g., competing talkers). The present study examined physiological effects of SNHL on envelope coding in auditory nerve (AN) fibers in relation to fine-structure coding. Responses were compared between anesthetized chinchillas with normal hearing and those with a mild-to-moderate noise-induced hearing loss. Temporal envelope coding of narrowband-modulated stimuli (sinusoidally amplitude-modulated tones and single-formant stimuli) was quantified with several neural metrics. The relative strength of envelope and fine-structure coding was compared using shuffled correlogram analyses. On average, the strength of envelope coding was enhanced in noise-exposed AN fibers. A high degree of enhanced envelope coding was observed in AN fibers with high thresholds and very steep rate-level functions, which were likely associated with severe outer and inner hair cell damage. Degradation in fine-structure coding was observed in that the transition between AN fibers coding primarily fine structure or envelope occurred at lower characteristic frequencies following SNHL. This relative fine-structure degradation occurred despite no degradation in the fundamental ability of AN fibers to encode fine structure, and it did not depend on reduced frequency selectivity. Overall, these data suggest the need to consider the relative effects of SNHL on envelope and fine-structure coding when evaluating perceptual deficits in temporal processing of complex stimuli.
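A shuffled correlogram of the kind referred to pools spike-time differences across different repetitions of the same stimulus while excluding within-train pairs, so only stimulus-locked timing survives. The spike trains below are fabricated stand-ins; the bin width and maximum lag are arbitrary choices.

```python
# Toy shuffled autocorrelogram (SAC) over repeated presentations.
import numpy as np

rng = np.random.default_rng(1)
dur, n_reps = 1.0, 20
base = np.arange(0.0, dur, 0.01)  # idealized spikes every 10 ms
trains = [np.sort(base + rng.normal(0, 0.001, base.size)) for _ in range(n_reps)]

max_lag, bin_w = 0.02, 0.0005
edges = np.arange(-max_lag, max_lag + bin_w, bin_w)
sac = np.zeros(edges.size - 1)
for i in range(n_reps):
    for j in range(n_reps):
        if i == j:
            continue  # the "shuffle": only across-train spike pairs count
        d = (trains[i][:, None] - trains[j][None, :]).ravel()
        sac += np.histogram(d[np.abs(d) <= max_lag], bins=edges)[0]
print(int(sac.argmax()), "= index of the peak lag bin (near zero lag)")
```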