1
Margolis RH, Rao A, Wilson RH, Saly GL. Non-linguistic auditory speech processing. Int J Audiol 2023;62:217-226. PMID: 35369837. DOI: 10.1080/14992027.2022.2055654.
Abstract
OBJECTIVES A method for testing auditory processing of non-linguistic speech-like stimuli was developed and evaluated. DESIGN Monosyllabic words were temporally reversed and distorted. Stimuli were matched for spectrum and level. Listeners discriminated between distorted and undistorted stimuli. STUDY SAMPLE Three groups were tested: a Normal group of 12 normal-hearing participants, a Senior group of 12 seniors, and a Hearing Loss group of 12 participants with thresholds of at least 35 dB HL at one or more frequencies. RESULTS The Senior group scored lower than the Normal group, and the Hearing Loss group scored lower than the Senior group. Scores for forward compressed speech were slightly higher than those for backward compressed speech, but the difference was not statistically significant. Retest scores were slightly higher than first-test scores, but again the difference was not statistically significant. CONCLUSIONS Large differences in discrimination of distorted speech were observed among the three groups. Age and hearing loss separately affected performance. The depressed performance of the Senior group may reflect "hidden hearing loss" attributed to cochlear synaptopathy. The backward-distorted speech task may be a useful, language-independent, non-linguistic test of speech processing.
Affiliation(s)
- Robert H Margolis: Speech and Hearing Science, College of Health Solutions, Arizona State University, Tempe, Arizona; Audiology Incorporated, Arden Hills, Minnesota, USA
- Aparna Rao: Speech and Hearing Science, College of Health Solutions, Arizona State University, Tempe, Arizona
- Richard H Wilson: Speech and Hearing Science, College of Health Solutions, Arizona State University, Tempe, Arizona
2
Hou L, Xu L. Role of short-time acoustic temporal fine structure cues in sentence recognition for normal-hearing listeners. J Acoust Soc Am 2018;143:EL127. PMID: 29495716. PMCID: PMC5820060. DOI: 10.1121/1.5024817.
Abstract
Short-time processing was employed to manipulate the amplitude, bandwidth, and temporal fine structure (TFS) in sentences. Fifty-two native-English-speaking, normal-hearing listeners participated in four sentence-recognition experiments. Results showed that recovered envelope (E) played an important role in speech recognition when the bandwidth was > 1 equivalent rectangular bandwidth. Removing TFS drastically reduced sentence recognition. Preserving TFS greatly improved sentence recognition when amplitude information was available at a rate ≥ 10 Hz (i.e., time segment ≤ 100 ms). Therefore, the short-time TFS facilitates speech perception together with the recovered E and works with the coarse amplitude cues to provide useful information for speech recognition.
Affiliation(s)
- Limin Hou: Communication and Information Engineering, Shanghai University, Shanghai, China
- Li Xu: Communication Sciences and Disorders, Ohio University, Athens, Ohio 45701, USA
3
Xu Y, Chen M, LaFaire P, Tan X, Richter CP. Distorting temporal fine structure by phase shifting and its effects on speech intelligibility and neural phase locking. Sci Rep 2017;7:13387. PMID: 29042580. PMCID: PMC5645416. DOI: 10.1038/s41598-017-12975-3.
Abstract
Envelope (E) and temporal fine structure (TFS) are important features of acoustic signals, and their corresponding perceptual functions have been investigated with various listening tasks. To further understand the underlying neural processing of TFS, experiments in humans and animals were conducted to demonstrate the effects of modifying the TFS of natural speech sentences on both speech recognition and neural coding. The TFS of natural speech sentences was modified by distorting the phase while maintaining the magnitude. Speech intelligibility was then tested in normal-hearing listeners using the intact and reconstructed sentences presented in quiet and against background noise. Sentences with modified TFS were then used to evoke neural activity in auditory neurons of the inferior colliculus in guinea pigs. Our study demonstrated that speech intelligibility in humans relied on the periodic cues of speech TFS in both quiet and noisy listening conditions. Furthermore, recordings of neural activity from the guinea pig inferior colliculus showed that individual auditory neurons exhibit phase-locking patterns to the periodic cues of speech TFS, and that these patterns disappear when the reconstructed sounds no longer contain periodic patterns. Thus, the periodic cues of TFS are essential for speech intelligibility and are encoded in auditory neurons by phase locking.
Affiliation(s)
- Yingyue Xu: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Maxin Chen: Northwestern University, Department of Biomedical Engineering, 2145 Sheridan Road, Tech E310, Evanston, IL, 60208, USA
- Petrina LaFaire: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Xiaodong Tan: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Claus-Peter Richter: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA; Northwestern University, The Hugh Knowles Center, Department of Communication Sciences and Disorders, 2240 Campus Drive, Evanston, IL, 60208, USA
4
Predictions of Speech Chimaera Intelligibility Using Auditory Nerve Mean-Rate and Spike-Timing Neural Cues. J Assoc Res Otolaryngol 2017;18:687-710. PMID: 28748487. DOI: 10.1007/s10162-017-0627-7.
Abstract
Perceptual studies of speech intelligibility have shown that slow variations of the acoustic envelope (ENV) in a small set of frequency bands provide adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
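The chimaera construction underlying this study can be sketched with a Hilbert decomposition: the envelope of one signal is paired with the fine structure of another. The broadband, two-tone example below is only an illustration (actual chimaeric-speech stimuli are synthesized band by band through an analysis filterbank), and the function name `make_chimaera` is ours, not from the paper.

```python
import numpy as np
from scipy.signal import hilbert

def make_chimaera(x, y):
    """Pair the Hilbert envelope (ENV) of x with the temporal fine
    structure (TFS) of y. Single-band sketch only."""
    env_x = np.abs(hilbert(x))            # ENV of x
    tfs_y = np.cos(np.angle(hilbert(y)))  # unit-amplitude TFS of y
    return env_x * tfs_y

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 5 * t) * np.sin(2 * np.pi * 200 * t)  # 5-Hz-modulated tone
y = np.sin(2 * np.pi * 300 * t)                              # steady carrier
chim = make_chimaera(x, y)  # fluctuates like x, oscillates like y
```

The resulting signal inherits the slow amplitude fluctuations of `x` while its rapid oscillation follows `y`, which is exactly the cue conflict the study exploits.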
5
Qi B, Mao Y, Liu J, Liu B, Xu L. Relative contributions of acoustic temporal fine structure and envelope cues for lexical tone perception in noise. J Acoust Soc Am 2017;141:3022. PMID: 28599529. PMCID: PMC5415402. DOI: 10.1121/1.4982247.
Abstract
Previous studies have shown that lexical tone perception in quiet relies on the acoustic temporal fine structure (TFS) but not on the envelope (E) cues. The contributions of TFS to speech recognition in noise are under debate. In the present study, Mandarin tone tokens were mixed with speech-shaped noise (SSN) or two-talker babble (TTB) at five signal-to-noise ratios (SNRs; -18 to +6 dB). The TFS and E were then extracted from each of the 30 bands using the Hilbert transform. Twenty-five combinations of TFS and E from the sound mixtures of the same tone tokens at various SNRs were created. Twenty normal-hearing, native-Mandarin-speaking listeners participated in the tone-recognition test. Results showed that tone-recognition performance improved as the SNRs in either TFS or E increased. The masking effects on tone perception for the TTB were weaker than those for the SSN. For both types of masker, the perceptual weights of TFS and E in tone perception in noise were nearly equivalent, with E playing a slightly greater role than TFS. Thus, the relative contributions of TFS and E cues to lexical tone perception in noise or in competing-talker maskers differ from those in quiet and those to speech perception of non-tonal languages.
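The per-band Hilbert decomposition used here splits a band signal into a slowly varying envelope and a unit-amplitude fine structure whose product reconstructs the original. A minimal broadband sketch with `scipy.signal.hilbert` (the function name `hilbert_decompose` is ours, and real stimuli are processed per filterbank band, not broadband):

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_decompose(x):
    """Split a band signal into envelope (E) and temporal fine
    structure (TFS) via the analytic signal."""
    analytic = hilbert(x)
    env = np.abs(analytic)            # E: slow amplitude contour
    tfs = np.cos(np.angle(analytic))  # TFS: rapid unit-amplitude oscillation
    return env, tfs

fs = 8000
t = np.arange(fs) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
env, tfs = hilbert_decompose(x)
# env * tfs equals Re[analytic], i.e., the original signal x
```

Because `env * tfs` reproduces the band signal, E and TFS from different SNR mixtures can be recombined freely, which is how the 25 TFS/E combinations in this study are constructed.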
Affiliation(s)
- Beier Qi: Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Yitao Mao: Department of Radiology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Jiaxing Liu: Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Bo Liu: Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Li Xu: Communication Sciences and Disorders, Ohio University, Athens, Ohio 45701, USA
6
Nambi PMA, Mahajan Y, Francis N, Bhat JS. Temporal fine structure mediated recognition of speech in the presence of multitalker babble. J Acoust Soc Am 2016;140:EL296. PMID: 27794309. DOI: 10.1121/1.4964416.
Abstract
This experiment investigated the mechanisms of temporal fine structure (TFS)-mediated speech recognition in multi-talker babble. The signal-to-noise ratio for 50% recognition (SNR-50) was measured for naive listeners when the TFS was retained in its original form (ORIG-TFS), when the TFS was time-reversed (REV-TFS), and when the TFS was replaced by noise (NO-TFS). The original envelope was unchanged. In the REV-TFS condition, periodicity cues for stream segregation were preserved, but envelope recovery was compromised. Both mechanisms were compromised in the NO-TFS condition. The SNR-50 was lowest for ORIG-TFS, followed by REV-TFS, which was lower than NO-TFS. Results suggest that both stream segregation and envelope recovery aid TFS-mediated speech recognition.
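The REV-TFS manipulation (original envelope, time-reversed fine structure) can be sketched with the same Hilbert machinery. This broadband single-band version and the name `make_rev_tfs` are our illustration, not the authors' exact processing chain, which operates on filterbank bands:

```python
import numpy as np
from scipy.signal import hilbert

def make_rev_tfs(x):
    """Keep the original Hilbert envelope but replace the TFS with the
    fine structure of the time-reversed signal (REV-TFS sketch)."""
    env = np.abs(hilbert(x))                      # original envelope
    tfs_rev = np.cos(np.angle(hilbert(x[::-1])))  # TFS of reversed signal
    return env * tfs_rev

fs = 8000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
y = make_rev_tfs(x)
```

The output keeps the original amplitude contour (so envelope cues are intact by construction) while the carrier no longer matches it, which is what disrupts envelope recovery downstream.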
Affiliation(s)
- Pitchai Muthu Arivudai Nambi: Department of Audiology and Speech Language Pathology, Kasturba Medical College (Manipal University), Mangalore, India
- Yatin Mahajan: The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia
- Nikita Francis: Department of Audiology and Speech Language Pathology, Kasturba Medical College (Manipal University), Mangalore, India
- Jayashree S Bhat: Department of Audiology and Speech Language Pathology, Kasturba Medical College (Manipal University), Mangalore, India
7
Reed CM, Desloge JG, Braida LD, Perez ZD, Léger AC. Level variations in speech: Effect on masking release in hearing-impaired listeners. J Acoust Soc Am 2016;140:102. PMID: 27475136. PMCID: PMC6910012. DOI: 10.1121/1.4954746.
Abstract
Acoustic speech is marked by time-varying changes in the amplitude envelope that may pose difficulties for hearing-impaired listeners. Removal of these variations (e.g., by the Hilbert transform) could improve speech reception for such listeners, particularly in fluctuating interference. Léger, Reed, Desloge, Swaminathan, and Braida [(2015b). J. Acoust. Soc. Am. 138, 389-403] observed that a normalized measure of masking release obtained for hearing-impaired listeners using speech processed to preserve temporal fine-structure (TFS) cues was larger than that for unprocessed or envelope-based speech. This study measured masking release for two other speech signals in which level variations were minimal: peak clipping and TFS processing of an envelope signal. Consonant identification was measured for hearing-impaired listeners in backgrounds of continuous and fluctuating speech-shaped noise. The normalized masking release obtained using speech with normal variations in overall level was substantially less than that observed using speech processed to achieve highly restricted level variations. These results suggest that the performance of hearing-impaired listeners in fluctuating noise may be improved by signal processing that leads to a decrease in stimulus level variations.
Affiliation(s)
- Charlotte M Reed: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Joseph G Desloge: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Louis D Braida: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Zachary D Perez: Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
- Agnès C Léger: School of Psychological Sciences, University of Manchester, Manchester, M13 9PL, United Kingdom
8
Léger AC, Reed CM, Desloge JG, Swaminathan J, Braida LD. Consonant identification in noise using Hilbert-transform temporal fine-structure speech and recovered-envelope speech for listeners with normal and impaired hearing. J Acoust Soc Am 2015;138:389-403. PMID: 26233038. PMCID: PMC4514718. DOI: 10.1121/1.4922949.
Abstract
Consonant-identification ability was examined in normal-hearing (NH) and hearing-impaired (HI) listeners in the presence of steady-state and 10-Hz square-wave interrupted speech-shaped noise. The Hilbert transform was used to process speech stimuli (16 consonants in a-C-a syllables) to present envelope cues, temporal fine-structure (TFS) cues, or envelope cues recovered from TFS speech. The performance of the HI listeners was inferior to that of the NH listeners both in terms of lower levels of performance in the baseline condition and in the need for higher signal-to-noise ratio to yield a given level of performance. For NH listeners, scores were higher in interrupted noise than in steady-state noise for all speech types (indicating substantial masking release). For HI listeners, masking release was typically observed for TFS and recovered-envelope speech but not for unprocessed and envelope speech. For both groups of listeners, TFS and recovered-envelope speech yielded similar levels of performance and consonant confusion patterns. The masking release observed for TFS and recovered-envelope speech may be related to level effects associated with the manner in which the TFS processing interacts with the interrupted noise signal, rather than to the contributions of TFS cues per se.
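The "recovered envelope" effect invoked above, in which narrowband filtering of envelope-flattened TFS speech partially restores amplitude fluctuations, can be demonstrated on a synthetic beating tone pair. Everything below is our illustrative sketch, not the study's processing: the brick-wall FFT filter merely stands in for a narrow auditory filter.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 990 * t) + np.cos(2 * np.pi * 1010 * t)  # beats at 20 Hz
env = np.abs(hilbert(x))                 # original envelope
tfs = np.cos(np.angle(hilbert(x)))       # "TFS speech": envelope flattened

# Narrowband filtering (brick-wall band-pass, 980-1020 Hz, standing in
# for an auditory filter) re-introduces envelope fluctuations.
X = np.fft.rfft(tfs)
f = np.fft.rfftfreq(len(tfs), 1 / fs)
X[(f < 980) | (f > 1020)] = 0
tfs_filtered = np.fft.irfft(X, len(tfs))

recovered = np.abs(hilbert(tfs_filtered))  # recovered envelope beats again
```

Although `tfs` itself has an essentially flat envelope, the envelope of the filtered version fluctuates in step with the original beats, which is why mean-rate neural responses to TFS speech can still carry envelope information.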
Affiliation(s)
- Agnès C Léger: School of Psychological Sciences, University of Manchester, Manchester, M13 9PL, United Kingdom
- Charlotte M Reed: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Joseph G Desloge: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Jayaganesh Swaminathan: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Louis D Braida: Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA