51
Heeren WFL. Vocalic correlates of pitch in whispered versus normal speech. J Acoust Soc Am 2015; 138:3800-3810. [PMID: 26723334] [DOI: 10.1121/1.4937762]
Abstract
In whispered speech, the fundamental frequency is absent as the main cue to pitch. This study investigated how different pitch targets can be acoustically coded in whispered relative to normal speech. Secondary acoustic correlates found in normal speech may be preserved in whisper; alternatively, whispering speakers may provide compensatory information. Compared to earlier studies, a more comprehensive set of acoustic correlates (duration, intensity, formants, center of gravity, spectral balance) and a larger set of materials were included. To elicit maximal acoustic differences among the low, mid, and high pitch targets, linguistic and semantic load were minimized: 12 native Dutch speakers produced the point vowels (/a, i, u/) in nonsense vowel-consonant-vowel targets (with C = {/s/, /f/}). Acoustic analyses showed that in addition to the systematic changes in formants reported before, center of gravity, spectral balance, and intensity also varied with pitch target, in both whispered and normal speech. Some acoustic correlates differed more in whispered than in normal speech, suggesting that speakers can adopt a compensatory strategy when coding pitch in the speech mode lacking the main cue. Speakers furthermore varied in the extent to which they used particular correlates, and in the combination of correlates they altered systematically.
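Two of the spectral correlates analyzed here, center of gravity and spectral balance, are straightforward to compute from a power spectrum. The sketch below shows one common definition of each; the 1 kHz band split and the power weighting are illustrative choices, not the paper's exact analysis settings.

```python
import numpy as np

def spectral_measures(signal, fs, split_hz=1000.0):
    """Two correlates from the study: spectral center of gravity
    (power-weighted mean frequency) and spectral balance (dB level
    difference between high and low bands). The 1 kHz band edge is an
    illustrative choice, not the paper's analysis setting."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    cog = np.sum(freqs * power) / np.sum(power)          # center of gravity, Hz
    low = np.sum(power[freqs < split_hz])
    high = np.sum(power[freqs >= split_hz])
    balance_db = 10.0 * np.log10((high + 1e-12) / (low + 1e-12))
    return cog, balance_db
```

For a pure tone below the band edge, the center of gravity sits at the tone frequency and the balance is strongly negative (energy concentrated in the low band).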
Affiliation(s)
- Willemijn F L Heeren
- Leiden University Centre for Linguistics, Leiden Institute for Brain and Cognition, Leiden University, P. N. van Eyckhof 3, 2311 BV Leiden, The Netherlands
52
Heeren WFL. Coding pitch differences in voiceless fricatives: Whispered relative to normal speech. J Acoust Soc Am 2015; 138:3427-3438. [PMID: 26723300] [DOI: 10.1121/1.4936859]
Abstract
Intonation can be perceived in whispered speech despite the absence of the fundamental frequency. In the past, acoustic correlates of pitch in whisper were sought in vowel content, but recent studies of normal speech have demonstrated correlates of intonation in consonants as well. This study examined how consonants may contribute to the coding of intonation in whispered relative to normal speech. The acoustic characteristics of whispered, voiceless fricatives /s/ and /f/, produced at different pitch targets (low, mid, high), were investigated and compared to corresponding normal speech productions to assess whether whisper contained secondary or compensatory pitch correlates. Furthermore, listener sensitivity to fricative cues to pitch in whisper was established, also relative to normal speech. Consistent with recent studies, acoustic correlates of whispered and normal speech fricatives varied systematically with pitch target. Comparable findings across speech modes showed that the acoustic correlates were secondary. Discrimination of vowel-fricative-vowel stimuli was less accurate and slower in whispered than in normal speech, which is attributed to differences in the available acoustic cues. Perception of fricatives presented without their vowel contexts, however, revealed comparable processing speeds and response accuracies between speech modes, supporting the finding that within fricatives, acoustic correlates of pitch are similar across speech modes.
Affiliation(s)
- Willemijn F L Heeren
- Utrecht Institute of Linguistics OTS, Utrecht University, Trans 10, 3512 JK Utrecht, The Netherlands
53
Gao N, Xu XD, Chi FL, Zeng FG, Fu QJ, Jia XH, Yin YB, Ping LC, Kang HY, Feng HH, Wu YZ, Jiang Y. Objective and subjective evaluations of the Nurotron Venus cochlear implant system via animal experiments and clinical trials. Acta Otolaryngol 2015; 136:68-77. [PMID: 26382170] [DOI: 10.3109/00016489.2015.1086022]
Abstract
CONCLUSION This study described objective and subjective evaluations of the Nurotron® Venus™ cochlear implant system and indicated that the system performs satisfactorily. OBJECTIVE To observe the performance of the Nurotron® Venus™ cochlear implant (CI) system via electrophysiological and psychophysical evaluations. METHODS A 26-electrode CI system was specially designed. MRI compatibility was assessed in animal and cadaveric head experiments, electrically evoked auditory brainstem responses (EABRs) were recorded in a cat experiment, and the correlation between the electrically evoked stapedius reflex threshold (ESRT) and the C level, together with psychophysical evaluations, was examined in clinical trials. RESULTS In the animal and cadaveric head experiments, magnet dislocation could not be prevented in 1.5 T MRI without removal of the internal magnet. The EABR was clearly elicited in the cat experiment. A human clinical trial involving 57 post-lingually deafened native Mandarin-speaking patients was performed; the ESRT was strongly correlated with the C level (p < 0.001). Residual hearing in the implanted ear was preserved at each audiometric frequency in 27.5-46.3% of patients post-operatively. A pitch ranking test revealed that place pitches were generally ordered from apical to basal electrodes. Recognition of 301 disyllabic words, environmental sounds, and numerals was significantly better than pre-operative performance and reached a plateau.
Affiliation(s)
- Na Gao
- Department of Otology and Skull Base Surgery, Eye Ear Nose & Throat Hospital, Fudan University, Shanghai, PR China
- Shanghai Auditory Medical Center, Shanghai, PR China
- Key Laboratory of Hearing Science, Ministry of Health, Shanghai, PR China
- Xin-Da Xu
- Department of Otology and Skull Base Surgery, Eye Ear Nose & Throat Hospital, Fudan University, Shanghai, PR China
- Shanghai Auditory Medical Center, Shanghai, PR China
- Key Laboratory of Hearing Science, Ministry of Health, Shanghai, PR China
- Fang-Lu Chi
- Department of Otology and Skull Base Surgery, Eye Ear Nose & Throat Hospital, Fudan University, Shanghai, PR China
- Shanghai Auditory Medical Center, Shanghai, PR China
- Key Laboratory of Hearing Science, Ministry of Health, Shanghai, PR China
- Fan-Gang Zeng
- Departments of Anatomy and Neurobiology, Biomedical Engineering, Cognitive Sciences and Otolaryngology - Head and Neck Surgery, University of California, Irvine, CA, USA
- Qian-Jie Fu
- Department of Biomedical Engineering, University of Southern California, Los Angeles, CA, USA
- Xian-Hao Jia
- Department of Otology and Skull Base Surgery, Eye Ear Nose & Throat Hospital, Fudan University, Shanghai, PR China
- Shanghai Auditory Medical Center, Shanghai, PR China
- Key Laboratory of Hearing Science, Ministry of Health, Shanghai, PR China
- Yan-Bo Yin
- Department of Otology and Skull Base Surgery, Eye Ear Nose & Throat Hospital, Fudan University, Shanghai, PR China
- Shanghai Auditory Medical Center, Shanghai, PR China
- Key Laboratory of Hearing Science, Ministry of Health, Shanghai, PR China
- Li-Chuan Ping
- Nurotron Biotechnology Inc., Hangzhou, Zhejiang, PR China
- Hou-Yong Kang
- Department of Otolaryngology - Head and Neck Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, PR China
- Hai-Hong Feng
- Shanghai Acoustics Laboratory, Chinese Academy of Sciences, Shanghai, PR China
- Yong-Zhen Wu
- Department of Otology and Skull Base Surgery, Eye Ear Nose & Throat Hospital, Fudan University, Shanghai, PR China
- Shanghai Auditory Medical Center, Shanghai, PR China
- Key Laboratory of Hearing Science, Ministry of Health, Shanghai, PR China
- Ye Jiang
- Department of Otology and Skull Base Surgery, Eye Ear Nose & Throat Hospital, Fudan University, Shanghai, PR China
- Shanghai Auditory Medical Center, Shanghai, PR China
- Key Laboratory of Hearing Science, Ministry of Health, Shanghai, PR China
54
Kong YY, Somarowthu A, Ding N. Effects of Spectral Degradation on Attentional Modulation of Cortical Auditory Responses to Continuous Speech. J Assoc Res Otolaryngol 2015; 16:783-96. [PMID: 26362546] [DOI: 10.1007/s10162-015-0540-x]
Abstract
This study investigates the effect of spectral degradation on cortical speech encoding in complex auditory scenes. Young normal-hearing listeners were simultaneously presented with two speech streams and were instructed to attend to only one of them. The speech mixtures were subjected to noise-channel vocoding to preserve the temporal envelope and degrade the spectral information of speech. Each subject was tested with five spectral resolution conditions (unprocessed speech and 64-, 32-, 16-, and 8-channel vocoder conditions) and two target-to-masker ratio (TMR) conditions (3 and 0 dB). Ongoing electroencephalographic (EEG) responses and speech comprehension were measured in each spectral and TMR condition for each subject. Neural tracking of each speech stream was characterized by cross-correlating the EEG responses with the envelope of each of the simultaneous speech streams at different time lags. Results showed that spectral degradation and TMR both significantly influenced how top-down attention modulated the EEG responses to the attended and unattended speech: the EEG responses to the two streams differed more for the higher (unprocessed, 64-, and 32-channel) than the lower (16- and 8-channel) spectral resolution conditions, and for the higher (3 dB) than the lower (0 dB) TMR condition. The magnitude of the differential neural modulation responses to the attended and unattended speech streams correlated significantly with speech comprehension scores. These results suggest that severe spectral degradation and low TMR hinder speech stream segregation, making it difficult to employ top-down attention to differentially process different speech streams.
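The neural-tracking analysis described above, cross-correlating the EEG with each stream's envelope over a range of time lags, can be sketched as follows. The lag range, z-score normalization, and sampling rate here are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def lagged_xcorr(eeg, envelope, fs, max_lag_ms=300):
    """Correlate an EEG channel with a speech envelope at a range of
    time lags (EEG delayed relative to the stimulus), in the spirit of
    the paper's neural-tracking analysis. Both signals are z-scored
    first; the lag range is an illustrative choice."""
    eeg = (eeg - eeg.mean()) / eeg.std()
    env = (envelope - envelope.mean()) / envelope.std()
    max_lag = int(max_lag_ms / 1000 * fs)
    lags = np.arange(0, max_lag + 1)
    r = np.empty(len(lags))
    for i, lag in enumerate(lags):
        n = len(env) - lag
        r[i] = np.dot(env[:n], eeg[lag:lag + n]) / n   # correlation at this lag
    return lags * (1000.0 / fs), r                     # lag axis in ms
```

The lag at which the correlation peaks estimates the neural response latency to the tracked envelope.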
Affiliation(s)
- Ying-Yee Kong
- Department of Communication Sciences and Disorders, Northeastern University, Boston, MA 02115, USA; Department of Bioengineering, Northeastern University, Boston, MA 02115, USA
- Ala Somarowthu
- Department of Bioengineering, Northeastern University, Boston, MA 02115, USA
- Nai Ding
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Zhejiang, China
55
Cabrera L, Tsao FM, Liu HM, Li LY, Hu YH, Lorenzi C, Bertoncini J. The perception of speech modulation cues in lexical tones is guided by early language-specific experience. Front Psychol 2015; 6:1290. [PMID: 26379605] [PMCID: PMC4551816] [DOI: 10.3389/fpsyg.2015.01290]
Abstract
A number of studies have shown that infants reorganize their perception of speech sounds according to their native language categories during the first year of life. Still, little is known about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native-language experience starts to noticeably affect the perceptual processing of basic acoustic cues [i.e., frequency-modulation (FM) and amplitude-modulation information] known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low) was assessed in 6- and 10-month-old infants learning either French or Mandarin, using a visual habituation paradigm. The lexical tones were presented in two conditions designed either to keep intact or to severely degrade the FM and fine spectral cues needed to accurately perceive the voice-pitch trajectory. A third condition assessed the discrimination of the same voice-pitch trajectories using click trains containing only the FM cues related to the fundamental frequency (F0) in French- and Mandarin-learning 10-month-olds. Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast, whereas the French-learning 10-month-olds failed. However, only the French 10-month-olds discriminated the degraded lexical tones in which FM, and thus voice-pitch, cues were reduced. Moreover, the Mandarin-learning 10-month-olds discriminated the pitch trajectories presented in click trains better than the French infants. Altogether, these results reveal that the perceptual reorganization for lexical tones occurring during the first year of life is coupled with changes in the auditory ability to use speech modulation cues.
Affiliation(s)
- Laurianne Cabrera
- Centre National de la Recherche Scientifique, Laboratoire de Psychologie de la Perception, Université Paris Descartes, Paris, France
- Feng-Ming Tsao
- Department of Psychology, National Taiwan University, Taipei, Taiwan
- Huei-Mei Liu
- Department of Special Education, National Taiwan Normal University, Taipei, Taiwan
- Lu-Yang Li
- Department of Psychology, National Taiwan University, Taipei, Taiwan
- You-Hsin Hu
- Department of Psychology, National Taiwan University, Taipei, Taiwan
- Christian Lorenzi
- Centre National de la Recherche Scientifique, Laboratoire des Systèmes Perceptifs, Institut d'Etude de la Cognition, Ecole Normale Supérieure, Paris, France
- Josiane Bertoncini
- Centre National de la Recherche Scientifique, Laboratoire de Psychologie de la Perception, Université Paris Descartes, Paris, France
56
Effects of steep high-frequency hearing loss on speech recognition using temporal fine structure in low-frequency region. Hear Res 2015; 326:66-74. [DOI: 10.1016/j.heares.2015.04.004]
57
Liu X, Xu Y. Relations between affective music and speech: evidence from dynamics of affective piano performance and speech production. Front Psychol 2015. [PMID: 26217252] [PMCID: PMC4495307] [DOI: 10.3389/fpsyg.2015.00886]
Abstract
This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, it uses finger force and articulatory effort as indexes of the dynamics of affective piano performance and speech production, respectively. Moreover, physical constraints such as piano fingerings and speech articulatory constraints are included for the first time, given their potential contribution to different patterns of dynamics. A piano performance experiment and a speech production experiment were conducted with four emotions: anger, fear, happiness, and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment; i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under fear and in speech production under anger. Using production experiments, this study is the first to support previous perception studies on the relations between affective music and speech. Moreover, it is the first to provide quantitative evidence for the importance of considering motor aspects such as dynamics when comparing music performance and speech production, in which motor mechanisms play a crucial role.
Affiliation(s)
- Xiaoluan Liu
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Yi Xu
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
58
Relative Contributions of Spectral and Temporal Cues to Korean Phoneme Recognition. PLoS One 2015; 10:e0131807. [PMID: 26162017] [PMCID: PMC4498788] [DOI: 10.1371/journal.pone.0131807]
Abstract
This study aimed to evaluate the relative contributions of spectral and temporal information to Korean phoneme recognition and to compare them with those to English phoneme recognition. Eleven normal-hearing Korean-speaking listeners participated. Korean phonemes, including 18 consonants in a /Ca/ format and 17 vowels in a /hVd/ format, were processed through a noise vocoder. The spectral information was controlled by varying the number of channels (1, 2, 3, 4, 6, 8, 12, and 16), whereas the temporal information was controlled by varying the lowpass cutoff frequency of the envelope extractor (1 to 512 Hz in octave steps). A total of 80 vocoder conditions (8 channel counts × 10 lowpass cutoff frequencies) were presented to listeners for phoneme recognition. While vowel recognition depended predominantly on spectral cues, a tradeoff between spectral and temporal information was evident for consonant recognition. Overall consonant recognition was dramatically lower than English consonant recognition under similar vocoder conditions. The complexity of the Korean consonant repertoire, in particular the three-way distinction of stops, hinders recognition of vocoder-processed phonemes.
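A minimal noise vocoder along the lines described above (band-split, envelope extraction below a cutoff, envelope-modulated noise carriers) might look like this. The FFT brick-wall filters, log-spaced band edges, and 80-7000 Hz analysis range are simplifications for illustration, not the study's actual filter bank.

```python
import numpy as np

def noise_vocode(x, fs, n_channels=8, env_cutoff_hz=16.0,
                 f_lo=80.0, f_hi=7000.0, seed=0):
    """Minimal noise-channel vocoder: split the signal into log-spaced
    analysis bands, extract each band's envelope (rectified magnitude
    low-passed below env_cutoff_hz), and use it to modulate band-limited
    noise. FFT 'brick-wall' filters stand in for a real filter bank."""
    def bandpass(sig, lo, hi):
        spec = np.fft.rfft(sig)
        freqs = np.fft.rfftfreq(len(sig), 1.0 / fs)
        spec[(freqs < lo) | (freqs >= hi)] = 0.0       # zero out-of-band bins
        return np.fft.irfft(spec, n=len(sig))

    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(x, lo, hi)
        env = bandpass(np.abs(band), 0.0, env_cutoff_hz)  # low-pass the rectified band
        env = np.maximum(env, 0.0)                        # clip filter ringing
        carrier = bandpass(rng.standard_normal(len(x)), lo, hi)
        out += env * carrier                              # envelope-modulated noise band
    return out
```

Raising `n_channels` restores spectral detail; raising `env_cutoff_hz` restores temporal detail, mirroring the study's two manipulated parameters.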
59
The Role of Temporal Envelope and Fine Structure in Mandarin Lexical Tone Perception in Auditory Neuropathy Spectrum Disorder. PLoS One 2015; 10:e0129710. [PMID: 26052707] [PMCID: PMC4459992] [DOI: 10.1371/journal.pone.0129710]
Abstract
Temporal information in a signal can be partitioned into temporal envelope (E) and fine structure (FS). Fine structure is important for lexical tone perception in normal-hearing (NH) listeners, and listeners with sensorineural hearing loss (SNHL) have an impaired ability to use FS in lexical tone perception due to reduced frequency resolution. The present study aimed to assess which acoustic aspect (E or FS) plays the more important role in lexical tone perception in subjects with auditory neuropathy spectrum disorder (ANSD), and to determine whether a deficit in temporal resolution or in frequency resolution is more detrimental to FS processing in pitch perception. Fifty-eight native Mandarin Chinese-speaking subjects (27 with ANSD, 16 with SNHL, and 15 with NH) were assessed for (1) their ability to recognize lexical tones using acoustic E or FS cues with the "auditory chimera" technique, (2) temporal resolution as measured with the temporal gap detection (TGD) threshold, and (3) frequency resolution as measured with the Q10dB values of psychophysical tuning curves. Overall, 26.5%, 60.2%, and 92.1% of lexical tone responses were consistent with FS cues for listeners with ANSD, SNHL, and NH, respectively. The mean TGD threshold was significantly higher for listeners with ANSD (11.9 ms) than for listeners with SNHL (4.0 ms; p < 0.001) or NH (3.9 ms; p < 0.001), with no significant difference between the SNHL and NH groups. In contrast, the mean Q10dB for listeners with SNHL (1.8±0.4) was significantly lower than that for listeners with ANSD (3.5±1.0; p < 0.001) or NH (3.4±0.9; p < 0.001), with no significant difference between the ANSD and NH groups. These results suggest that reduced temporal resolution, as opposed to reduced frequency selectivity, leads to greater degradation of FS processing for pitch perception in ANSD subjects.
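The chimera construction used here, pairing the envelope of one signal with the fine structure of another, can be illustrated in a single band via the analytic signal. The study's chimeras were built per frequency band, so this one-band version is only a sketch of the E/FS decomposition.

```python
import numpy as np

def hilbert_analytic(x):
    """Analytic signal via the standard FFT construction (the same
    result scipy.signal.hilbert would give)."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0        # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0            # Nyquist bin kept as-is
    return np.fft.ifft(spec * h)

def single_band_chimera(env_src, ts_src):
    """Single-band 'auditory chimera': the Hilbert envelope (E) of one
    signal carried on the temporal fine structure (FS, cosine of the
    Hilbert phase) of another."""
    n = min(len(env_src), len(ts_src))
    env = np.abs(hilbert_analytic(env_src[:n]))        # temporal envelope (E)
    fine = np.cos(np.angle(hilbert_analytic(ts_src[:n])))  # fine structure (FS)
    return env * fine
```

The resulting signal inherits its spectral location from the fine-structure source while its amplitude fluctuations follow the envelope source.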
60
The binaural masking-level difference of Mandarin tone detection and the binaural intelligibility-level difference of Mandarin tone recognition in the presence of speech-spectrum noise. PLoS One 2015; 10:e0120977. [PMID: 25835987] [PMCID: PMC4383418] [DOI: 10.1371/journal.pone.0120977]
Abstract
Binaural hearing exploits differences between the signals arriving at the two ears, and it can make signals easier to detect and recognize in a noisy environment. This phenomenon is quantified in laboratory studies as the binaural masking-level difference (BMLD) for detection and the binaural intelligibility-level difference (BILD) for recognition. Mandarin is one of the most widely used languages, but no BMLD or BILD values based on Mandarin tones have been published. Therefore, this study investigated the BMLD and BILD of Mandarin tones. The BMLDs of Mandarin tone detection were measured from the detection threshold differences for the four tones of the voiced vowels /i/ (i.e., /i1/, /i2/, /i3/, and /i4/) and /u/ (i.e., /u1/, /u2/, /u3/, and /u4/) in the presence of speech-spectrum noise when presented interaurally in phase (S0N0) and interaurally in antiphase (SπN0). The BILDs of Mandarin tone recognition in speech-spectrum noise were determined as the differences in the target-to-masker ratio (TMR) required for 50% correct tone recognition between the S0N0 and SπN0 conditions. The detection thresholds for the four tones of /i/ and /u/ differed significantly (p<0.001) between the S0N0 and SπN0 conditions. The average detection thresholds of Mandarin tones were all lower in the SπN0 condition than in the S0N0 condition, and the BMLDs ranged from 7.3 to 11.5 dB. The TMR for 50% correct Mandarin tone recognition differed significantly (p<0.001) between the S0N0 and SπN0 conditions, at -13.4 and -18.0 dB, respectively, with a mean BILD of 4.6 dB. The study showed that the thresholds of Mandarin tone detection and recognition in the presence of speech-spectrum noise improve when phase inversion is applied to the target speech. The average BILDs of Mandarin tones are smaller than the average BMLDs.
61
Ambert-Dahan E, Giraud AL, Sterkers O, Samson S. Judgment of musical emotions after cochlear implantation in adults with progressive deafness. Front Psychol 2015; 6:181. [PMID: 25814961] [PMCID: PMC4357245] [DOI: 10.3389/fpsyg.2015.00181]
Abstract
While cochlear implantation is rather successful in restoring speech comprehension in quiet environments (Nimmons et al., 2008), other auditory tasks, such as music perception, can remain challenging for implant users. Here, we tested how patients who had received a cochlear implant (CI) after post-lingual progressive deafness perceive emotions in music. Thirteen adult CI recipients with good verbal comprehension (dissyllabic words ≥70%) and 13 normal-hearing participants matched for age, gender, and education listened to 40 short musical excerpts that selectively expressed fear, happiness, sadness, and peacefulness (Vieillard et al., 2008). The participants were asked to rate (on a 0–100 scale) how much the musical stimuli expressed these four cardinal emotions, and to judge their emotional valence (unpleasant–pleasant) and arousal (relaxing–stimulating). Although CI users performed above chance level, their emotional judgments (mean correctness scores) were generally impaired for happy, scary, and sad, but not for peaceful excerpts. CI users also demonstrated deficits in perceiving the arousal of musical excerpts, whereas rating of valence remained unaffected. The current findings indicate that judgments of emotional categories and dimensions of musical excerpts are not uniformly impaired after cochlear implantation. These results are discussed in relation to the relatively spared abilities of CI users in perceiving temporal (rhythm and meter) as compared to spectral (pitch and timbre) musical dimensions, which might benefit the processing of musical emotions (Cooper et al., 2008).
Affiliation(s)
- Emmanuèle Ambert-Dahan
- Unité Otologie, Implants auditifs et Chirurgie de la base du crâne, Assistance Publique Hôpitaux de Paris - Groupe Hospitalier Pitié-Salpêtrière, Paris, France; Laboratoire PSITEC (EA 4072), Neuropsychologie: Audition, Cognition et Action, Department of Psychology, Université de Lille 3, Villeneuve d'Ascq, France
- Anne-Lise Giraud
- Neuroscience Department, Campus Biotech, University of Geneva, Geneva, Switzerland
- Olivier Sterkers
- Unité Otologie, Implants auditifs et Chirurgie de la base du crâne, Assistance Publique Hôpitaux de Paris - Groupe Hospitalier Pitié-Salpêtrière, Paris, France
- Séverine Samson
- Laboratoire PSITEC (EA 4072), Neuropsychologie: Audition, Cognition et Action, Department of Psychology, Université de Lille 3, Villeneuve d'Ascq, France; Unité d'épilepsie, Assistance Publique Hôpitaux de Paris - Groupe Hospitalier Pitié-Salpêtrière, Paris, France
62
Stilp CE, Goupell MJ. Spectral and temporal resolutions of information-bearing acoustic changes for understanding vocoded sentences. J Acoust Soc Am 2015; 137:844-55. [PMID: 25698018] [PMCID: PMC4336249] [DOI: 10.1121/1.4906179]
Abstract
Short-time spectral changes in the speech signal are important for understanding noise-vocoded sentences. These information-bearing acoustic changes, measured using cochlea-scaled entropy in cochlear implant simulations [CSECI; Stilp et al. (2013). J. Acoust. Soc. Am. 133(2), EL136-EL141; Stilp (2014). J. Acoust. Soc. Am. 135(3), 1518-1529], may offer better understanding of speech perception by cochlear implant (CI) users. However, perceptual importance of CSECI for normal-hearing listeners was tested at only one spectral resolution and one temporal resolution, limiting generalizability of results to CI users. Here, experiments investigated the importance of these informational changes for understanding noise-vocoded sentences at different spectral resolutions (4-24 spectral channels; Experiment 1), temporal resolutions (4-64 Hz cutoff for low-pass filters that extracted amplitude envelopes; Experiment 2), or when both parameters varied (6-12 channels, 8-32 Hz; Experiment 3). Sentence intelligibility was reduced more by replacing high-CSECI intervals with noise than replacing low-CSECI intervals, but only when sentences had sufficient spectral and/or temporal resolution. High-CSECI intervals were more important for speech understanding as spectral resolution worsened and temporal resolution improved. Trade-offs between CSECI and intermediate spectral and temporal resolutions were minimal. These results suggest that signal processing strategies that emphasize information-bearing acoustic changes in speech may improve speech perception for CI users.
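The cochlea-scaled entropy measure referenced above can be roughly sketched as the Euclidean distance between successive short-time band-energy vectors. The log-spaced bands and 16 ms frames below simplify the published cochlea-scaled filter spacing, so this is a caricature of the method, not the authors' implementation.

```python
import numpy as np

def cochlea_scaled_entropy(x, fs, n_bands=16, slice_ms=16.0,
                           f_lo=100.0, f_hi=8000.0):
    """Rough sketch of cochlea-scaled entropy: slice the signal into
    short frames, measure energy in frequency bands (here simply
    log-spaced rather than truly cochlea-scaled), and take the Euclidean
    distance between successive band-amplitude vectors."""
    hop = int(slice_ms / 1000 * fs)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    profiles = []
    for i in range(0, len(x) - hop + 1, hop):
        frame = x[i:i + hop]
        spec = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        profiles.append([spec[(freqs >= lo) & (freqs < hi)].sum()
                         for lo, hi in zip(edges[:-1], edges[1:])])
    profiles = np.sqrt(np.array(profiles))             # amplitude-like scale
    return np.linalg.norm(np.diff(profiles, axis=0), axis=1)  # one value per slice boundary
```

Intervals where this trace is large carry the most spectral change; the study's manipulation replaced either high- or low-valued intervals with noise.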
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292
- Matthew J Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742
63
Heeren WFL, van Heuven VJ. The interaction of lexical and phrasal prosody in whispered speech. J Acoust Soc Am 2014; 136:3272. [PMID: 25480073] [DOI: 10.1121/1.4901705]
Abstract
The production and perception of Dutch whispered boundary tones, i.e., phrasal prosody, were investigated as a function of characteristics of the tone-bearing word, i.e., lexical prosody. More specifically, the disyllabic tone-bearing word also carried a pitch accent, either on the same syllable as the boundary tone (clash condition) or on the directly adjacent syllable (no-clash condition). In a statement/question classification task, listeners showed moderate but above-chance performance for both conditions in whisper, although performance was much worse and slower than in normal speech. The syllabic rhymes of speakers' productions were examined for acoustic correlates of boundary tones. Results showed mainly secondary cues to intonation, that is, cues that are present in whisper as in normal speech, but minimal compensatory cues that would reflect speakers' efforts to enhance their whispered speech signal. This suggests that multiple prosodic events in close proximity are challenging to produce and perceive in whispered speech. Classification performance increased moderately when the one cue that whispering speakers did seem to employ compensatorily was enhanced: changing the spectral tilt of the utterance-final syllable improved perception, especially for the poorer speakers and for intonation on stressed syllables.
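The compensatory cue identified here, spectral tilt, can be manipulated with a simple frequency-domain re-weighting. The tilt value and reference frequency below are arbitrary illustrations, not the adjustment used in the study.

```python
import numpy as np

def apply_spectral_tilt(x, fs, db_per_octave=-6.0, f_ref=1000.0):
    """Apply a uniform spectral tilt: each frequency component is scaled
    by db_per_octave for every octave it lies above (or below) f_ref.
    Negative values darken the signal; positive values brighten it.
    The -6 dB/octave default is an illustrative choice."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    gain_db = np.zeros_like(freqs)
    nz = freqs > 0                                    # leave DC untouched
    gain_db[nz] = db_per_octave * np.log2(freqs[nz] / f_ref)
    spec *= 10.0 ** (gain_db / 20.0)                  # dB gain -> linear scale
    return np.fft.irfft(spec, n=len(x))
```

With a -6 dB/octave tilt, a component one octave below the reference gains 6 dB while one an octave above loses 6 dB, a 12 dB relative shift.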
Affiliation(s)
- W F L Heeren
- Leiden University Centre for Linguistics, Leiden Institute for Brain and Cognition, Leiden University, Cleveringaplaats 1, 2311 BD Leiden, The Netherlands
- V J van Heuven
- Leiden University Centre for Linguistics, Leiden Institute for Brain and Cognition, Leiden University, Cleveringaplaats 1, 2311 BD Leiden, The Netherlands
64
Li A, Wang N, Li J, Zhang J, Liu Z. Mandarin lexical tones identification among children with cochlear implants or hearing aids. Int J Pediatr Otorhinolaryngol 2014; 78:1945-52. [PMID: 25234731] [DOI: 10.1016/j.ijporl.2014.08.033]
Abstract
OBJECTIVES Mandarin Chinese is a lexical tone language that has four tones, with a change in tone denoting a change in lexical meaning. There are few studies regarding lexical tone identification abilities in deafened children using either cochlear implants (CIs) or hearing aids (HAs). Furthermore, no study has compared the lexical tone identification abilities of deafened children with their hearing devices turned on and off. The present study aimed to investigate the lexical tone identification abilities of deafened children with CIs or HAs. METHODS Forty prelingually deafened children (20 with CIs and 20 with HAs) participated in the study. In the HA group, 20 children were binaurally aided. In the CI group, all of the children were unilaterally implanted. All of the subjects completed a computerized lexical tone pairs test with their hearing devices turned on and off. The correct answers of all items were recorded as the total score and the correct answers of the tone pairs were recorded as subtotal scores. RESULTS No significant differences in the tone pair identification scores were found between the CI group and HA group either with the devices turned on or off (t=1.62, p=0.11; t=1.863, p=0.07, respectively). The scores in the aided condition were higher than in the unaided condition regardless of the device used (t=22.09, p<0.001, in the HA group; t=20.20, p<0.001, in the CI group). Significantly higher scores were found in the tone pairs that contained tone 4. Age at fitting of the devices was correlated with tone identification abilities in both the CI and HA groups. Other demographic factors were not correlated with tone identification ability. CONCLUSIONS The hearing device, whether a hearing aid or cochlear implant, is beneficial for tone identification. The lexical tone identification abilities were similar regardless of whether the subjects wore a HA or CI. 
Lexical tone pairs with different durations and dissimilar tone contour patterns are more easily identified. Receiving devices at an earlier age tends to produce better lexical tone identification abilities in prelingually deafened children.
Collapse
Affiliation(s)
- Aifeng Li
- Department of Otorhinolaryngology Head and Neck Surgery, Beijing Chao-Yang Hospital, Capital Medical University, PR China; College of Otolaryngology, Capital Medical University, PR China; Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing 100020, PR China
| | - Ningyu Wang
- Department of Otorhinolaryngology Head and Neck Surgery, Beijing Chao-Yang Hospital, Capital Medical University, PR China; College of Otolaryngology, Capital Medical University, PR China; Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing 100020, PR China.
| | - Jinlan Li
- Department of Otorhinolaryngology Head and Neck Surgery, Beijing Chao-Yang Hospital, Capital Medical University, PR China; College of Otolaryngology, Capital Medical University, PR China; Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing 100020, PR China
| | - Juan Zhang
- Department of Otorhinolaryngology Head and Neck Surgery, Beijing Chao-Yang Hospital, Capital Medical University, PR China; College of Otolaryngology, Capital Medical University, PR China; Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing 100020, PR China
| | - Zhiyong Liu
- Department of Otorhinolaryngology Head and Neck Surgery, Beijing Chao-Yang Hospital, Capital Medical University, PR China; College of Otolaryngology, Capital Medical University, PR China; Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing 100020, PR China
| |
Collapse
|
65
|
Zhou N, Pfingst BE. Relationship between multipulse integration and speech recognition with cochlear implants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:1257. [PMID: 25190399 PMCID: PMC4165232 DOI: 10.1121/1.4890640] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
Comparisons between performance with cochlear implants and postmortem conditions of the cochlea in humans have shown mixed results. The limitations of those studies favor the use of within-subject designs and non-invasive measures to estimate cochlear conditions. One non-invasive correlate of cochlear health is multipulse integration, established in an animal model. The present study used this measure to relate neural health in human cochlear implant users to their speech recognition performance. The multipulse-integration slopes were derived from psychophysical detection thresholds measured at two pulse rates (80 and 640 pulses per second). A within-subject design was used in eight subjects with bilateral implants, in which the direction and magnitude of ear differences in the multipulse-integration slopes were compared with those of the speech-recognition results. The speech measures included speech reception threshold for sentences and phoneme recognition in noise. The magnitude of ear difference in the integration slopes was significantly correlated with the magnitude of ear difference in speech reception thresholds, consonant recognition in noise, and transmission of place of articulation of consonants. These results suggest that multipulse integration predicts speech recognition in noise and perception of features that use dynamic spectral cues.
Collapse
Affiliation(s)
- Ning Zhou
- Department of Communication Sciences and Disorders, East Carolina University, Greenville, North Carolina 27834
| | - Bryan E Pfingst
- Kresge Hearing Research Institute, Department of Otolaryngology, University of Michigan, Ann Arbor, Michigan 48109-5616
| |
Collapse
|
66
|
Cabrera L, Tsao FM, Gnansia D, Bertoncini J, Lorenzi C. The role of spectro-temporal fine structure cues in lexical-tone discrimination for French and Mandarin listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:877-882. [PMID: 25096121 DOI: 10.1121/1.4887444] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
The role of spectro-temporal modulation cues in conveying tonal information for lexical tones was assessed in native-Mandarin and native-French adult listeners using a lexical-tone discrimination task. The fundamental frequency (F0) of Thai tones was either degraded using an 8-band vocoder that reduced fine spectral details and frequency-modulation cues, or extracted and used to modulate the F0 of click trains. Mandarin listeners scored lower than French listeners in the discrimination of vocoded lexical tones. For click trains, Mandarin listeners outperformed French listeners. These preliminary results suggest that the perceptual weight of the fine spectro-temporal modulation cues conveying F0 information is enhanced for adults speaking a tonal language.
Collapse
Affiliation(s)
- Laurianne Cabrera
- Laboratoire de Psychologie de la Perception, CNRS, Université Paris Descartes, 45 rue des saints Pères, 75006 Paris, France
| | - Feng-Ming Tsao
- Department of Psychology, National Taiwan University, Number 1, Section 4, Roosevelt Road, Taipei, 106, Taiwan
| | - Dan Gnansia
- Neurelec, 2720 Chemin de Saint-Bernard Porte, 06224, Vallauris, France
| | - Josiane Bertoncini
- Laboratoire de Psychologie de la Perception, CNRS, Université Paris Descartes, 45 rue des saints Pères, 75006 Paris, France
| | - Christian Lorenzi
- Laboratoire des systèmes perceptifs, CNRS, Institut d'Etude de la Cognition, Ecole normale supérieure, Paris Sciences et Lettres, 29 rue d'Ulm, 75005 Paris, France
| |
Collapse
|
67
|
Lee T, Yu S, Yuan M, Wong TKC, Kong YY. The effect of enhancing temporal periodicity cues on Cantonese tone recognition by cochlear implantees. Int J Audiol 2014; 53:546-57. [PMID: 24694089 DOI: 10.3109/14992027.2014.893374] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
OBJECTIVES This study investigates the efficacy of a cochlear implant (CI) processing method that enhances the temporal periodicity cues of speech. DESIGN Subjects participated in word and tone identification tasks. Two processing conditions, the conventional advanced combination encoder (ACE) and tone-enhanced ACE, were tested. Test materials were Cantonese disyllabic words recorded from one male and one female speaker. Speech-shaped noise was added to clean speech. The fundamental frequency information for periodicity enhancement was extracted from the clean speech. Electrical stimuli generated from the noisy speech with and without periodicity enhancement were presented via direct stimulation using a Laura 34 research processor. Subjects were asked to identify the presented word. STUDY SAMPLE Seven post-lingually deafened native Cantonese-speaking CI users. RESULTS Percent correct word, segmental structure, and tone identification scores were calculated. While word and segmental structure identification accuracy remained similar between the two processing conditions, tone identification in noise was better with tone-enhanced ACE than with conventional ACE. Significant improvement in tone perception was found only for the female voice. CONCLUSIONS Temporal periodicity cues are important to tone perception in noise. Pitch and tone perception by CI users could be improved when listeners received enhanced temporal periodicity cues.
Collapse
Affiliation(s)
- Tan Lee
- Department of Electronic Engineering, The Chinese University of Hong Kong, China
| | | | | | | | | |
Collapse
|
68
|
Heeren WFL, Lorenzi C. Perception of prosody in normal and whispered French. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:2026-2040. [PMID: 25235001 DOI: 10.1121/1.4868359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
The current study explored perception of prosody in normal and whispered speech using a two-interval, two-alternative forced-choice psychophysical task where listeners discriminated between French noun phrases pronounced as declaratives or interrogatives. Stimuli were either presented between 50 and 8000 Hz or filtered into one of three broad frequency regions, corresponding to harmonic-resolvability regions for normal speech (resolved, partially resolved, unresolved harmonics). Normal speech was presented against a speech-shaped noise masker, whereas whispered speech was presented in quiet. The results showed that discrimination performance was differentially affected by filtering for normal and whispered speech, suggesting that cues to prosody differ between speech modes. For whispered speech, evidence was mainly derived from the high-frequency region, whereas for normal speech, evidence was mainly derived from the low-frequency (resolved harmonics) region. Modeling of the early stages of auditory processing confirmed that for whispered speech, perception of prosody was not based on temporal auditory cues and suggests that listeners may rely on place of excitation (spectral) cues that are, in contrast with suggestions made by earlier work, distributed across the spectrum.
Collapse
Affiliation(s)
- Willemijn F L Heeren
- Unité Mixte de Recherche LSP 8248, CNRS, Ecole Normale Supérieure, Paris 75005, France
| | - Christian Lorenzi
- Unité Mixte de Recherche LSP 8248, CNRS, Ecole Normale Supérieure, Paris 75005, France
| |
Collapse
|
69
|
Limb CJ, Roy AT. Technological, biological, and acoustical constraints to music perception in cochlear implant users. Hear Res 2014; 308:13-26. [DOI: 10.1016/j.heares.2013.04.009] [Citation(s) in RCA: 138] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/17/2013] [Revised: 04/04/2013] [Accepted: 04/22/2013] [Indexed: 11/30/2022]
|
70
|
Liu C, Azimi B, Bhandary M, Hu Y. Contribution of low-frequency harmonics to Mandarin Chinese tone identification in quiet and six-talker babble background. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:428-438. [PMID: 24437783 DOI: 10.1121/1.4837255] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
The goal of this study was to investigate Mandarin Chinese tone identification in quiet and multi-talker babble conditions for normal-hearing listeners. Tone identification was measured with speech stimuli and stimuli with low and/or high harmonics that were embedded in three Mandarin vowels with two fundamental frequencies. There were six types of stimuli: all harmonics (All), low harmonics (Low), high harmonics (High), and the first (H1), second (H2), and third (H3) harmonic. Results showed that, in quiet conditions, individual harmonics carried frequency contour information well enough for tone identification with high accuracy; however, in noisy conditions, tone identification with individual low harmonics (i.e., H1, H2, and H3) was significantly lower than with the Low, High, and All harmonics. Moreover, tone identification with individual harmonics in noise was lower for a low F0 than for a high F0, and also depended on vowel category. Tone identification with individual low-frequency harmonics was accounted for by local signal-to-noise ratios, indicating that the audibility of harmonics in noise may play a primary role in tone identification.
Collapse
Affiliation(s)
- Chang Liu
- Department of Communication Sciences and Disorders, University of Texas at Austin, 1 University Station A1100, Austin, Texas 78712
| | - Behnam Azimi
- Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201
| | - Moulesh Bhandary
- Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201
| | - Yi Hu
- Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53201
| |
Collapse
|
71
|
Lee CY, Tao L, Bond ZS. Effects of speaker variability and noise on Mandarin tone identification by native and non-native listeners. SPEECH LANGUAGE AND HEARING 2013. [DOI: 10.1179/2050571x12z.0000000003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/31/2022]
|
72
|
Ter-Mikaelian M, Semple MN, Sanes DH. Effects of spectral and temporal disruption on cortical encoding of gerbil vocalizations. J Neurophysiol 2013; 110:1190-204. [PMID: 23761696 DOI: 10.1152/jn.00645.2012] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Animal communication sounds contain spectrotemporal fluctuations that provide powerful cues for detection and discrimination. Human perception of speech is influenced by both spectral and temporal acoustic features but is most critically dependent on envelope information. To investigate the neural coding principles underlying the perception of communication sounds, we explored the effect of disrupting the spectral or temporal content of five different gerbil call types on neural responses in the awake gerbil's primary auditory cortex (AI). The vocalizations were impoverished spectrally by reduction to 4 or 16 channels of band-passed noise. For this acoustic manipulation, the average firing rate of a neuron did not carry sufficient information to distinguish between call types. In contrast, the discharge patterns of individual AI neurons reliably categorized vocalizations composed of only four spectral bands with the appropriate natural token. The pooled responses of small populations of AI cells classified spectrally disrupted and natural calls with an accuracy that paralleled human performance on an analogous speech task. To assess whether discharge pattern was robust to temporal perturbations of an individual call, vocalizations were disrupted by time-reversing segments of variable duration. For this acoustic manipulation, cortical neurons were relatively insensitive to short reversal lengths. Consistent with human perception of speech, these results indicate that the stable representation of communication sounds in AI is more dependent on sensitivity to slow temporal envelopes than on spectral detail.
Collapse
Affiliation(s)
- Maria Ter-Mikaelian
- Center for Neural Science, New York University, New York, New York 10003, USA
| | | | | |
Collapse
|
73
|
Chen F, Guan T, Wong LLN. Effect of temporal fine structure on speech intelligibility modeling. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2013; 2013:4199-4202. [PMID: 24110658 DOI: 10.1109/embc.2013.6610471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Temporal fine structure (TFS) carries important information for the speech perception of hearing-impaired listeners and for the design of novel prosthetic hearing devices. This study assessed the performance of present intelligibility indices in predicting the intelligibility of speech containing different amounts of TFS information. Speech intelligibility data were collected from vocoded and wideband Mandarin sentences containing little/partial and intact TFS information, respectively, and were then subjected to correlation analysis with existing intelligibility indices. It was found that, though performing well in predicting the intelligibility of vocoded or wideband speech separately, present intelligibility indices were not highly correlated with the intelligibility scores when a general function was used to map all intelligibility measures to intelligibility scores. Further analysis showed that the intelligibility prediction power could be significantly improved when multiple condition-dependent functions were used for mapping intelligibility measures to intelligibility scores.
Collapse
|
74
|
Wang S, Liu B, Zhang H, Dong R, Mannell R, Newall P, Chen X, Qi B, Zhang L, Han D. Mandarin lexical tone recognition in sensorineural hearing-impaired listeners and cochlear implant users. Acta Otolaryngol 2013; 133:47-54. [PMID: 23240663 DOI: 10.3109/00016489.2012.705438] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
CONCLUSIONS As hearing loss becomes more severe, the tone recognition performance of hearing-impaired listeners declines gradually. The tone recognition performance of cochlear implant listeners is below or close to that of severely hearing-impaired listeners. OBJECTIVES The present study aimed to investigate the Mandarin lexical tone recognition performance of sensorineural hearing-impaired listeners and post-lingually deafened cochlear implant users. METHODS Tone recognition performance was measured for 30 normal-hearing subjects, 41 sensorineural hearing-impaired listeners, and 12 cochlear implant users using 128 monosyllables recorded by a male and a female adult native Mandarin speaker. RESULTS The results indicated that the accuracy of tone recognition was 99.3%, 96.4%, 93.7%, 83.9%, and 81.0% for the normal-hearing, moderate, moderate-to-severe, severely hearing-impaired, and cochlear implant subjects, respectively. For the hearing-impaired subjects, a significantly negative correlation was observed between tone recognition performance and the audiometric hearing thresholds. For cochlear implant subjects, Tone 3 was the easiest to perceive and Tone 2 the hardest. They tended to misperceive Tone 1 as Tone 2, and Tone 2 as Tones 1 and 3.
Collapse
Affiliation(s)
- Shuo Wang
- Beijing Tongren Hospital, Capital Medical University, Beijing Institute of Otolaryngology, Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing, China
| | | | | | | | | | | | | | | | | | | |
Collapse
|
75
|
Li B, Rong R. Tones in whispered Mandarin. 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING 2012. [DOI: 10.1109/iscslp.2012.6423539] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
|
76
|
Chen F, Wong LLN, Tahmina Q, Azimi B, Hu Y. The effects of binaural spectral resolution mismatch on Mandarin speech perception in simulated electric hearing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:EL142-EL148. [PMID: 22894313 DOI: 10.1121/1.4737595] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
This study assessed the effects of binaural spectral resolution mismatch on the intelligibility of Mandarin speech in noise using bilateral cochlear implant simulations. Noise-vocoded Mandarin speech, corrupted by speech-shaped noise at 0 and 5 dB signal-to-noise ratios, was presented unilaterally or bilaterally to normal-hearing listeners with mismatched spectral resolution between ears. Significant binaural benefits for Mandarin speech recognition were observed only with matched spectral resolution between ears. In addition, the performance of tone identification was more robust to noise than that of sentence recognition, suggesting that factors other than tone identification might account more for the degraded sentence recognition in noise.
Collapse
Affiliation(s)
- Fei Chen
- Division of Speech and Hearing Sciences, The University of Hong Kong, Prince Philip Dental Hospital, 34 Hospital Road, Hong Kong.
| | | | | | | | | |
Collapse
|
77
|
Abstract
OBJECTIVES The primary goal of this study was to investigate how speech perception is altered by the provision of a preview or "prime" of a sample of speech just before it is presented in masking. A same-different test paradigm was developed which enabled the effect of priming to be measured with energetic maskers in addition to those that most likely produced both energetic and informational masking. Using this paradigm, the benefit of priming in overcoming energetic and informational masking was compared. DESIGN Twenty-four normal-hearing subjects listened to nonsense sentences presented in a background of competing speech (two-talker babble) or one of two types of speech-shaped noise. Both target and masker were presented via loudspeaker directly in front of the listeners. In the baseline condition, the listeners were then shown a sentence on a computer screen that either matched the auditory target sentence exactly or contained a replacement for one of the three target key words. Their task was to judge whether the printed sentence matched the auditory target and respond via computer keyboard. In the first experimental condition, the printed sentence preceded rather than followed the auditory presentation (the priming condition). In the second experimental condition, the perception of spatial separation was created between target and masker by presenting the masker from two loudspeakers (front and 60° to the right) and imposing a 4-msec delay in the masker coming from the front loudspeaker. This resulted in the target being heard from the front while, because of the precedence effect, the masker was heard well to the right (the spatial condition). In a third experimental condition, spatial separation and priming were combined. A total of five signal-to-noise ratios were tested for each masker. RESULTS The competing speech masker produced more masking than noise, consistent with previous findings. 
For the competing speech masker, the signal-to-noise ratio for 80% correct performance was approximately 6.7 dB lower when the listeners read the sentences first (the priming condition) than in the baseline condition. This priming effect was similar to the improvement obtained when the target and masker were separated spatially. Significant priming effects were also observed with speech-shaped noise maskers, and when there was perceived spatial separation between target and masker, conditions in which informational masking was believed to have been minimal. There seemed to be an additive effect of spatial separation and priming in the two-talker babble condition. CONCLUSIONS (1) Priming was effective in improving speech perception in all conditions, including those consisting of primarily energetic masking. (2) It is not clear how much benefit from priming could be attributed to release from informational masking. (3) Performance on the same-different task was linearly related to performance on an open-set speech recognition task using the same target and masker.
Collapse
|
78
|
Ping L, Yuan M, Feng H. Musical Pitch Discrimination by Cochlear Implant Users. Ann Otol Rhinol Laryngol 2012; 121:328-36. [DOI: 10.1177/000348941212100508] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Objectives: The main goal of this study was to investigate the effects of acoustic characteristics, including timbre and fundamental frequency (F0), on the musical pitch discrimination of cochlear implant users. Methods: Eight postlingually deafened cochlear implant users were recruited, along with 8 control subjects with normal hearing. Pitch discrimination tests were carried out using test stimuli from 4 musical instruments plus synthetic complex stimuli. Three reference tones with different F0s were used. Results: The mean difference limens were 1.8 to 10.7 semitones in the just-noticeable difference task and 2.1 to 13.6 semitones in the pitch-direction discrimination task for different timbre and F0 combinations. Three-way analysis of variance showed that the acoustic characteristics of the musical stimuli, such as timbre and F0, significantly influenced pitch discrimination performance. Conclusions: Acoustic characteristics determine the complexity of the electrical stimulation pattern, which directly affects performance in pitch discrimination. A place pattern with a clear and regular low-order harmonic structure is most important for good pitch discrimination. A clear F0-related temporal pattern is also useful when the F0 is low. Pitch perception performance will worsen when there is interference in the high-frequency channels.
Collapse
|
79
|
Hwang CF, Chen HC, Yang CH, Peng JP, Weng CH. Comparison of Mandarin tone and speech perception between advanced combination encoder and continuous interleaved sampling speech-processing strategies in children. Am J Otolaryngol 2012; 33:338-44. [PMID: 21982716 DOI: 10.1016/j.amjoto.2011.08.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2011] [Revised: 08/14/2011] [Accepted: 08/16/2011] [Indexed: 11/29/2022]
Abstract
OBJECTIVE This study was performed to compare cochlear implant (CI) users' performance in Mandarin speech and tone perception between 2 types of speech-processing strategies, advanced combination encoder (ACE) and continuous interleaved sampling (CIS), under quiet and noisy conditions. METHODS This study involved 10 congenitally deaf children (age range, 5.7-15.3 years; mean, 9.2 years) who received the Nucleus 24-channel CI system (CI24R; Cochlear Ltd, Lane Cove, NSW, Australia). The subjects had used ACE since their CI devices were switched on. Speech and tone perception tests were administered under quiet and noisy (+5 dB signal-to-noise ratio) conditions with the ACE and CIS strategies, 20 minutes and 2 weeks apart. RESULTS Regardless of the strategy used, subjects showed significantly higher scores in speech perception than in tone recognition. Under noisy conditions, subjects had significantly higher tone identification scores with the CIS than with the ACE strategy (P = .038). There was no significant difference in speech identification scores between the strategies. Subjects showed significantly higher tone identification and speech perception scores under quiet than under noisy (+5 dB signal-to-noise ratio) conditions. Subjectively, 6 subjects preferred the ACE strategy, and the remaining 4 preferred the CIS strategy. The strategy preference of the subjects was related to speech perception performance rather than tone identification. A significant correlation was observed between tone identification and speech recognition, regardless of whether speech was evaluated by consonants (r = 0.669, P < .001), vowels (r = 0.426, P = .001), or sentences (r = 0.294, P = .023). CONCLUSION There are only 4 tone patterns in Mandarin, far fewer than the number of speech sounds; nevertheless, tone identification is poorer than speech perception. The CIS speech-processing strategy may improve tone identification under noisy conditions.
Until improved speech-processing strategies that code the acoustic characteristics of tone are developed, it would be worthwhile to try both CIS and ACE with CI users and to select the more suitable strategy according to subjective preference and objective performance.
Collapse
Affiliation(s)
- Chung-Feng Hwang
- Department of Otolaryngology, Chang Gung Memorial Hospital-Kaohsiung Medical Center, Chang Gung University College of Medicine, Niaosong, Kaohsiung City, Taiwan.
| | | | | | | | | |
Collapse
|
80
|
Wang S, Liu B, Dong R, Zhou Y, Li J, Qi B, Chen X, Han D, Zhang L. Music and lexical tone perception in Chinese adult cochlear implant users. Laryngoscope 2012; 122:1353-60. [PMID: 22362607 DOI: 10.1002/lary.23271] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2011] [Revised: 01/31/2012] [Accepted: 02/07/2012] [Indexed: 11/05/2022]
Abstract
OBJECTIVES/HYPOTHESIS The present study's aim was to assess the music perception ability of Chinese adult cochlear implant users and to investigate the correlation between music and Mandarin-Chinese lexical tone perception. STUDY DESIGN Case-control study. METHODS Twenty normal-hearing subjects and 21 adult cochlear implant users participated in the Musical Sounds in Cochlear Implants (MuSIC) perception test, comprising six objective and two subjective musical subtests. Music perception performance was compared between the normal-hearing and cochlear implant subjects. Sixteen of the 21 cochlear implant users also performed a tone identification test to investigate the correlation between music and tone perception. RESULTS Cochlear implant users performed significantly worse than normal-hearing subjects on the pitch discrimination, instrument identification, and instrument detection tests, whereas they performed close to normal-hearing subjects on the melody discrimination, chord discrimination, rhythm discrimination, and emotion and dissonance rating subtests. Lexical tone perception was significantly correlated with the pitch discrimination, melody discrimination, and instrument identification tests. Duration of hearing aid use was found to be correlated with the pitch discrimination ability of cochlear implant users. CONCLUSIONS Chinese postlingually deafened cochlear implant users performed significantly more poorly on pitch discrimination and timbre perception tasks than normal-hearing listeners. Lexical tone perception was significantly correlated with music pitch perception, supporting the notion that tone and music perception may share a similar pitch perception mechanism.
Collapse
Affiliation(s)
- Shuo Wang
- Beijing Institute of Otolaryngology, Beijing Tongren Hospital, Capital Medical University, Beijing, China
| | | | | | | | | | | | | | | | | |
Collapse
|
81
|
Milczynski M, Chang JE, Wouters J, van Wieringen A. Perception of Mandarin Chinese with cochlear implants using enhanced temporal pitch cues. Hear Res 2012; 285:1-12. [PMID: 22361414 DOI: 10.1016/j.heares.2012.02.006] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/23/2011] [Revised: 01/22/2012] [Accepted: 02/08/2012] [Indexed: 11/25/2022]
Abstract
A cochlear implant (CI) signal processing strategy named F0 modulation (F0mod) was compared with the advanced combination encoder (ACE) strategy in a group of four post-lingually deafened Mandarin-Chinese-speaking CI listeners. F0mod provides an enhanced temporal pitch cue by amplitude modulating the multichannel electrical stimulation pattern at the fundamental frequency (F0) of the incoming speech signal. Word and sentence recognition tests were carried out in quiet and in noise. The responses for the word-recognition test were further segmented into phoneme and tone scores. Off-line implementations of ACE and F0mod were used, and electrical stimulation patterns were streamed directly to the CI subject's implant. To focus on the feasibility of enhanced temporal cues for tonal language perception, idealized F0 information extracted from speech tokens in quiet was used in the F0mod processing of speech-in-noise mixtures. The results indicated significantly better lexical tone perception with the F0mod strategy than with ACE for the male voice (p<0.05). No significant differences in sentence recognition were found between F0mod and ACE.
Affiliation(s)
- Matthias Milczynski
- ExpORL, Dept. Neurosciences, K.U.Leuven, O & N 2, Herestraat 49 bus 721, B-3000 Leuven, Belgium.
|
82
|
Feng YM, Xu L, Zhou N, Yang G, Yin SK. Sine-wave speech recognition in a tonal language. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:EL133-EL138. [PMID: 22352612 PMCID: PMC3272062 DOI: 10.1121/1.3670594] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/16/2011] [Accepted: 11/10/2011] [Indexed: 05/31/2023]
Abstract
It is hypothesized that in sine-wave replicas of natural speech, lexical tone recognition would be severely impaired due to the loss of F0 information, but that linguistic information at the sentence level could still be retrieved even with limited tone information. Forty-one native Mandarin-Chinese-speaking listeners participated in the experiments. Results showed that sine-wave tone-recognition performance was on average only 32.7% correct. However, sine-wave sentence-recognition performance was very accurate, approximately 92% correct on average. Therefore, the functional load of lexical tones in sentence recognition is limited, and the high recognition accuracy for sine-wave sentences is likely attributable to perceptual organization influenced by top-down processes.
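Sine-wave replicas of the kind used here are conventionally synthesized by summing a few sinusoids that track the formant center frequencies and amplitudes of the original utterance. A minimal sketch of that synthesis, with invented names and flat "formant" tracks standing in for measured ones:

```python
import numpy as np

def sinewave_replica(formant_tracks, amps, fs):
    """Sum time-varying sinusoids that follow formant tracks.

    formant_tracks : (n_formants, n_samples) instantaneous frequencies in Hz
    amps           : matching (n_formants, n_samples) amplitude tracks
    fs             : sampling rate in Hz
    """
    # Integrate instantaneous frequency to get each sinusoid's phase.
    phases = 2 * np.pi * np.cumsum(formant_tracks, axis=1) / fs
    return np.sum(amps * np.sin(phases), axis=0)

# Toy usage: three flat "formants" at 500, 1500, 2500 Hz for 200 ms.
fs = 8000
n = fs // 5
tracks = np.tile(np.array([[500.0], [1500.0], [2500.0]]), (1, n))
amps = np.ones_like(tracks) / 3
replica = sinewave_replica(tracks, amps, fs)
```

Real sine-wave speech would use formant tracks estimated from natural utterances; the cumulative-sum phase keeps the sinusoids continuous as the tracks move.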
Affiliation(s)
- Yan-Mei Feng
- Department of Otolaryngology, Shanghai Sixth People's Hospital, Institute of Otolaryngology, Shanghai Jiao Tong University, Shanghai 200233, People's Republic of China.
|
83
|
Wang S, Xu L, Mannell R. Relative contributions of temporal envelope and fine structure cues to lexical tone recognition in hearing-impaired listeners. J Assoc Res Otolaryngol 2011; 12:783-94. [PMID: 21833816 DOI: 10.1007/s10162-011-0285-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2010] [Accepted: 07/25/2011] [Indexed: 11/24/2022] Open
Abstract
It has been reported that normal-hearing Chinese speakers base their lexical tone recognition on fine structure regardless of temporal envelope cues. However, a few psychoacoustic and perceptual studies have demonstrated that listeners with sensorineural hearing impairment may have an impaired ability to use fine structure information, whereas their ability to use temporal envelope information is close to normal. The purpose of this study was to investigate the relative contributions of temporal envelope and fine structure cues to lexical tone recognition in normal-hearing and hearing-impaired native Mandarin Chinese speakers. Twenty-two normal-hearing subjects and 31 subjects with various degrees of sensorineural hearing loss participated in the study. Sixteen sets of Mandarin monosyllables, each with four tone patterns, were processed through a "chimeric synthesizer" in which the temporal envelope from a monosyllabic word of one tone was paired with the fine structure from the same monosyllable of another tone. The chimeric tokens were generated in three channel conditions (4, 8, and 16 channels). Results showed that differences in tone responses among the three channel conditions were minor. On average, 90.9%, 70.9%, 57.5%, and 38.2% of tone responses were consistent with fine structure for the normal-hearing, moderately, moderately-to-severely, and severely hearing-impaired groups, respectively, whereas 6.8%, 21.1%, 31.4%, and 44.7% of tone responses were consistent with temporal envelope cues for those groups. Tone responses consistent with neither temporal envelope nor fine structure averaged 2.3%, 8.0%, 11.1%, and 17.1% for the same groups. Pure-tone average thresholds were negatively correlated with tone responses consistent with fine structure, but positively correlated with tone responses based on temporal envelope cues. Consistent with the idea that spectral resolvability underlies fine structure coding, these results demonstrate that, as hearing loss becomes more severe, lexical tone recognition relies increasingly on temporal envelope rather than fine structure cues, due to widened auditory filters.
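A "chimeric synthesizer" of the kind described above pairs the envelope of one signal with the temporal fine structure of another within each frequency band. A single-band sketch using the Hilbert transform (the study used 4-, 8-, and 16-band versions; the function name here is illustrative):

```python
import numpy as np
from scipy.signal import hilbert

def chimera_band(env_source, fine_source):
    """Single-band auditory chimera: the Hilbert envelope of env_source
    imposed on the temporal fine structure (cosine phase) of fine_source."""
    env = np.abs(hilbert(env_source))               # envelope of signal A
    fine = np.cos(np.angle(hilbert(fine_source)))   # fine structure of signal B
    return env * fine

# Toy usage: 4 Hz envelope taken from a modulated 300 Hz tone, combined
# with the fine structure of a 500 Hz tone.
fs = 8000
t = np.arange(fs) / fs
a = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 300 * t)
b = np.sin(2 * np.pi * 500 * t)
out = chimera_band(a, b)
```

A multi-band version would band-pass filter both inputs into matching channels, form one chimera per channel, and sum the channels.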
Affiliation(s)
- Shuo Wang
- Beijing Institute of Otolaryngology, Key Laboratory of Otolaryngology Head and Neck Surgery (Capital Medical University), Ministry of Education, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China
|
84
|
Li T, Fu QJ. Perceptual adaptation of voice gender discrimination with spectrally shifted vowels. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2011; 54:1240-1245. [PMID: 21173392 PMCID: PMC3580211 DOI: 10.1044/1092-4388(2010/10-0168)] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
PURPOSE To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. METHOD Voice gender discrimination was measured for 10 normal-hearing subjects during 5 days of adaptation to spectrally shifted vowels, produced by processing the speech of 5 male and 5 female talkers with 16-channel sine-wave vocoders. The subjects were randomly divided into 2 groups, one using a 50-Hz and the other a 200-Hz temporal-envelope cutoff frequency. No preview or feedback was provided. RESULTS There was significant adaptation in voice gender discrimination with the 200-Hz cutoff frequency, but significant improvement was observed only for 3 female talkers with F(0) > 180 Hz and 3 male talkers with F(0) < 170 Hz. There was no significant adaptation with the 50-Hz cutoff frequency. CONCLUSIONS Temporal envelope cues are important for voice gender discrimination under spectrally shifted conditions with perceptual adaptation, but spectral shift may limit the exclusive use of spectral information and/or of formant structure for voice gender discrimination. The results have implications for cochlear implant users and for understanding voice gender discrimination.
Affiliation(s)
- Tianhao Li
- House Ear Institute, Los Angeles, CA, USA.
|
85
|
Contribution of spectral cues to mandarin lexical tone recognition in normal-hearing and hearing-impaired Mandarin Chinese speakers. Ear Hear 2011; 32:97-103. [PMID: 20625301 DOI: 10.1097/aud.0b013e3181ec5c28] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
OBJECTIVE The purpose of this study was to investigate the contribution of spectral fine structure and spectral envelope cues to the recognition of Mandarin lexical tones in normal-hearing and sensorineural hearing-impaired Mandarin-speaking listeners. DESIGN Four groups of subjects participated in the study: 20 normal-hearing, 20 moderately, 20 moderately-to-severely, and 8 severely hearing-impaired listeners. The original speech materials consisted of 16 sets of Mandarin monosyllables spoken by a male and a female talker. Each monosyllable had four tonal patterns, resulting in a total of 64 combinations of consonants, vowels, and tones. A Linear Predictive Coding (LPC) algorithm was used to create two sets of synthesized materials: 128 tokens with the original spectral fine structure mixed with the spectral envelope from a different tone, and 128 tokens with noise fine structure and the original spectral envelope. All subjects participated in tone recognition tests using the two sets of chimeric tone tokens. Oral responses to tones were recorded and scored as percent correct. RESULTS Hearing-impaired listeners could take advantage of spectral fine structure in the recognition of lexical tones, but with increasing hearing loss their tone recognition worsened, especially for severely hearing-impaired listeners. Hearing-impaired listeners showed significant differences in tone recognition between the male and female voices. Tone 3 was the easiest tone to perceive, followed by tone 2, whereas tones 1 and 4 were hard for all subjects, particularly when only the spectral envelope cue was available. Hearing-impaired listeners showed significantly poorer lexical tone recognition than normal-hearing listeners when using spectral envelope cues.
CONCLUSIONS These results demonstrate that the spectral fine structure cue dominates lexical tone recognition for all subjects. Listeners with sensorineural hearing impairment showed a reduced ability to recognize lexical tones using both spectral fine structure and spectral envelope cues, which may result from their impaired auditory spectral resolution.
|
86
|
|
87
|
Li X, Jeng FC. Noise tolerance in human frequency-following responses to voice pitch. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:EL21-EL26. [PMID: 21302977 DOI: 10.1121/1.3528775] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Speech communication usually occurs in the presence of background noise. This study examined noise tolerance in the brainstem's processing of voice pitch, as reflected by the scalp-recorded frequency-following response (FFR) from 12 normal-hearing adults. By systematically manipulating signal-to-noise ratio (SNR) across three different stimulus intensities, the results indicated that Frequency Error, Slope Error, and Tracking Accuracy remained relatively stable until SNR was degraded to 0 dB or lower (i.e., a turning point). This turning point not only provides physiological evidence for the noise tolerance of pitch processing but also allows recommendation of a minimal SNR when evaluating pitch processing in difficult-to-test patients.
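Setting the SNR of a stimulus-plus-noise mixture, as systematically manipulated here, comes down to scaling the noise against the signal's power. A generic sketch (not the authors' code; the function name is invented):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that the signal-to-noise ratio of the mix is snr_db."""
    ps = np.mean(signal ** 2)               # signal power
    pn = np.mean(noise ** 2)                # noise power before scaling
    gain = np.sqrt(ps / (pn * 10 ** (snr_db / 10)))
    return signal + gain * noise

# Toy usage: a 100 Hz tone mixed with white noise at 0 dB SNR.
rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 100 * np.arange(8000) / 8000)
noise = rng.standard_normal(8000)
mixed = mix_at_snr(sig, noise, snr_db=0.0)
```

At 0 dB SNR the scaled noise carries exactly the same average power as the signal, which is the turning point the abstract identifies.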
Affiliation(s)
- Ximing Li
- Communication Sciences and Disorders, Ohio University, Grover Center W224, Athens, Ohio 45701, USA.
|
88
|
Abstract
OBJECTIVE The purpose of the present study was to test the hypothesis that cochlear implant (CI) users' music perception is correlated with their lexical tone perception, and that the two types of perception share similar mechanisms in electric hearing. DESIGN A lexical tone perception test and a pitch interval discrimination test were administered to a group of CI users and a group of normal-hearing (NH) listeners. STUDY SAMPLE Nineteen adult CI users and 10 NH listeners, all native Mandarin-Chinese speakers, participated in the study. RESULTS Tone-perception performance of the CI group was, on average, 58.3% correct (± 19.78% correct), and performance of the NH group was near perfect. The CI group had a mean threshold of 5.66 semitones (± 5.57 semitones) in pitch discrimination, as compared to a threshold of 0.44 semitone for the NH group. There was a strong correlation between the CI users' tone-perception performance and their pitch discrimination threshold (r = -0.75, p < 0.001). CONCLUSION Musical and lexical pitch perception are strongly correlated and might share similar mechanisms in electric hearing.
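The thresholds above are expressed in semitones, i.e., on a log-frequency scale on which one octave equals 12 semitones. The conversion between frequency ratios and semitone intervals is:

```python
import math

def semitones(f1, f2):
    """Pitch interval from f1 to f2 (Hz), in semitones (12 per octave)."""
    return 12.0 * math.log2(f2 / f1)

def shift_by_semitones(f, n):
    """Frequency reached by moving n semitones up (or down, n < 0) from f."""
    return f * 2.0 ** (n / 12.0)

# Examples: an octave is 12 semitones; a 0.44-semitone step (the NH
# threshold reported above) from a 220 Hz reference.
octave = semitones(440.0, 880.0)
nh_step = shift_by_semitones(220.0, 0.44)
```

So the CI group's mean 5.66-semitone threshold corresponds to needing nearly a 39% frequency change before two pitches are reliably distinguished, versus about 2.6% for the NH group's 0.44 semitone.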
Affiliation(s)
- Wuqing Wang
- Eye, Ear, Nose, and Throat Hospital, Fudan University, Shanghai, China
|
89
|
Lexical tone perception with HiResolution and HiResolution 120 sound-processing strategies in pediatric Mandarin-speaking cochlear implant users. Ear Hear 2010; 30:169-77. [PMID: 19194297 DOI: 10.1097/aud.0b013e31819342cf] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
OBJECTIVES Lexical tone recognition tends to be poor in cochlear implant users. The HiResolution (HiRes) sound-processing strategy is designed to better preserve temporal fine structure, or the detailed envelope information, of an acoustic signal. The newer HiRes 120 strategy builds on HiRes by increasing the amount of potential spectral information delivered to the implant user. The purpose of this study was to examine lexical tone recognition in native Mandarin Chinese-speaking children with cochlear implants using the HiRes and HiRes 120 sound-processing strategies. Tone recognition performance was tested with HiRes at baseline and then after up to 6 mo of HiRes 120 experience in the same subjects. DESIGN Twenty prelingually deafened, native Mandarin-speaking children, with ages ranging from 3.5 to 16.5 yr, participated. All children completed a computerized tone contrast test on three occasions: (1) using HiRes immediately before conversion to HiRes 120 (baseline), (2) 1 mo after conversion, and (3) 3 mo after conversion. Twelve of the 20 children also were tested 6 mo after conversion. In addition, the parents of 18 children completed a questionnaire at the 3-mo follow-up visit regarding the preference of sound-processing strategies and the children's experience related to various aspects of auditory perception and speech production using HiRes 120. RESULTS As a group, no statistically significant differences were seen between the tone recognition scores using HiRes and HiRes 120. Individual scores showed great variability. Tone recognition performance ranged from chance (50% correct) to nearly perfect. Using the conventional HiRes strategy, 6 of the 20 children achieved high-level tone recognition performance (i.e., ≥90% correct), whereas 7 performed at a level not significantly different from chance (50-60% correct).
At the final test, either 3 or 6 mo after conversion, all children achieved tone recognition performance with HiRes 120 that was equal to or better than that with HiRes, although some children's tone recognition performance was initially worse at the 1- or 3-mo follow-up intervals than at baseline. Eight of the 20 children showed statistically significant improvement in tone recognition performance with HiRes 120 on at least one of the follow-up tests. Age at implantation was correlated with tone recognition performance at all four test intervals. Parents of most of the children indicated that the children preferred HiRes 120 over HiRes. CONCLUSIONS As a group, HiRes 120 did not provide significantly improved lexical tone recognition compared to HiRes, at least over the length of the study (up to 6 mo). There were large individual differences in lexical tone recognition among the prelingually deafened, native Mandarin-speaking children with cochlear implants using either HiRes or HiRes 120. Six of the 20 children performed at or near ceiling in the baseline HiRes condition. Of the remainder, approximately half showed significantly better tone recognition when subsequently tested with HiRes 120, although the extent to which this improvement may be attributable to factors other than the change in processing strategy (e.g., general development) is unknown. The children who benefited most from HiRes 120 tended to be those implanted at younger ages.
|
90
|
Schatzer R, Krenmayr A, Au DKK, Kals M, Zierhofer C. Temporal fine structure in cochlear implants: preliminary speech perception results in Cantonese-speaking implant users. Acta Otolaryngol 2010; 130:1031-9. [PMID: 20141488 DOI: 10.3109/00016481003591731] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
CONCLUSION Acute comparisons between continuous interleaved sampling (CIS) and a temporal fine structure (TFS) coding strategy in Cantonese-speaking cochlear implant (CI) users did not reveal any significant differences in speech perception. Performance with the unfamiliar TFS coding strategy was on a par with CIS. Benefits of extended fine structure use observed in other studies should be investigated for tonal languages. OBJECTIVES CIS-based stimulation strategies lack an explicit representation of fine structure, which is crucial for tonal language speech perception. The aim of this study was to assess speech recognition with a TFS coding strategy in Cantonese-speaking CI users with no prior fine structure experience. METHODS The fine structure coding strategy encodes TFS on a few apical channels, while the remaining more basal channels carry CIS stimuli. Twelve MED-EL implantees and long-term CIS users participated in a study comparing recognition of Cantonese lexical tones and CHINT sentences between CIS and fine structure stimulation. RESULTS Mean tone identification scores in 12 subjects were 59.2% with CIS and 59.2% with fine structure stimulation using 4 TFS channels; mean scores on CHINT sentences in 8 subjects were 54.2% with CIS and 55.9% with TFS stimulation. Differences between the two strategies were not significant on any speech test. Two additional TFS strategy variants and pulse rates were tested in six subjects; no significant differences between strategies were found.
Affiliation(s)
- Reinhold Schatzer
- C. Doppler Laboratory for Active Implantable Systems, Institute of Ion Physics and Applied Physics, University of Innsbruck, Innsbruck, Austria.
|
91
|
Altmann CF, Júnior CGDO, Heinemann L, Kaiser J. Processing of spectral and amplitude envelope of animal vocalizations in the human auditory cortex. Neuropsychologia 2010; 48:2824-32. [DOI: 10.1016/j.neuropsychologia.2010.05.024] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2009] [Revised: 05/09/2010] [Accepted: 05/12/2010] [Indexed: 11/28/2022]
|
92
|
Stohl JS, Throckmorton CS, Collins LM. Investigating the effects of stimulus duration and context on pitch perception by cochlear implant users. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:318-326. [PMID: 19603888 PMCID: PMC2723905 DOI: 10.1121/1.3133246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2008] [Revised: 04/21/2009] [Accepted: 04/22/2009] [Indexed: 05/28/2023]
Abstract
Cochlear implant sound processing strategies that use time-varying pulse rates to transmit fine structure information are one proposed method for improving the spectral representation of a sound, with the eventual goal of improving speech recognition in noisy conditions, speech recognition in tonal languages, and music identification and appreciation. However, many of the perceptual phenomena associated with time-varying rates are not well understood. In this study, the effects of stimulus duration on both the place-pitch and rate-pitch percepts were investigated via psychophysical experiments. Four Nucleus CI24 cochlear implant users participated in these experiments, which included a short-duration pitch ranking task and three adaptive pulse rate discrimination tasks. When duration was fixed from trial to trial and rate was varied adaptively, results suggested that the place-pitch and rate-pitch percepts may be independent of duration for durations above 10 and 20 ms, respectively. When duration was varied and pulse rates were fixed, performance was highly variable within and across subjects. Implications for multi-rate sound processing strategies are discussed.
Affiliation(s)
- Joshua S Stohl
- Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27708-0291, USA
|
93
|
Yuan M, Lee T, Yuen KCP, Soli SD, van Hasselt CA, Tong MCF. Cantonese tone recognition with enhanced temporal periodicity cues. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:327-337. [PMID: 19603889 DOI: 10.1121/1.3117447] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
This study investigated the contributions of temporal periodicity cues and the effectiveness of enhancing these cues for Cantonese tone recognition in noise. A multichannel noise-excited vocoder was used to simulate speech processing in cochlear implants. Ten normal-hearing listeners were tested. Temporal envelope and periodicity cues (TEPCs) below 500 Hz were extracted from four frequency bands: 60-500, 500-1000, 1000-2000, and 2000-4000 Hz. The test stimuli were obtained by combining TEPC-modulated noise signals from individual bands. For periodicity enhancement, temporal fluctuations in the range 20-500 Hz were replaced by a sinusoid with frequency equal to the fundamental frequency of original speech. Tone identification experiments were carried out using disyllabic word carriers. Results showed that TEPCs from the two high-frequency bands were more important for tone identification than TEPCs from the low-frequency bands. The use of periodicity-enhanced TEPCs led to consistent improvement of tone identification accuracy. The improvement was more significant at low signal-to-noise ratios, and more noticeable for female than for male voices. Analysis of error distributions showed that the enhancement method reduced tone identification errors and did not show any negative effect on the recognition of segmental structures.
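The enhancement step, replacing the 20-500 Hz envelope fluctuations of a band with a clean sinusoid at the speech F0, can be sketched as follows. This is a simplified single-band version with a constant F0 (the study operated per band with the F0 track of the original speech); the function name, filter orders, and modulation-depth parameter are assumptions for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def enhance_periodicity(band_signal, f0, fs, depth=1.0):
    """Keep a band's slow (<20 Hz) envelope but replace its faster
    fluctuations with an F0-rate sinusoidal modulator."""
    env = np.abs(hilbert(band_signal))             # full Hilbert envelope
    b, a = butter(4, 20 / (fs / 2), btype="low")
    slow_env = filtfilt(b, a, env)                 # <20 Hz envelope component
    t = np.arange(len(band_signal)) / fs
    periodic = 0.5 * (1 + np.sin(2 * np.pi * f0 * t))  # F0 modulator in [0, 1]
    return slow_env * (1 - depth + depth * periodic)

# Toy usage: a steady 1 kHz band signal, enhanced with a 200 Hz F0.
fs = 8000
t = np.arange(fs) / fs
band = np.sin(2 * np.pi * 1000 * t)
tepc = enhance_periodicity(band, f0=200.0, fs=fs)
```

The returned modulator would then be used to amplitude-modulate the band's noise carrier in the vocoder, giving the listener a cleaner temporal periodicity cue.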
Affiliation(s)
- Meng Yuan
- Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
|
94
|
Li X, Ning Z, Brashears R, Rife K. Relative Contributions of Spectral and Temporal Cues for Speech Recognition in Patients with Sensorineural Hearing Loss. J Otol 2008. [DOI: 10.1016/s1672-2930(08)50019-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
|
95
|
Stone MA, Füllgrabe C, Moore BCJ. Benefit of high-rate envelope cues in vocoder processing: effect of number of channels and spectral region. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:2272-82. [PMID: 19062865 DOI: 10.1121/1.2968678] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]
Abstract
In cochlear implants, or vocoder simulations of cochlear implants, the transmission of envelope cues at high rates (related to voice fundamental frequency, f0) may be limited by the widths of the filters used to form the channels and/or by the cutoff frequency, f(lp), of the low-pass filters used for envelope extraction. The effect of varying f(lp) in tone and noise vocoders was investigated for channel numbers, N, from 6 to 18. As N increased, the widths of the channels decreased. The value of f(lp) was 45 Hz (envelope or "E" filter), or 180 Hz (pitch or "P" filter). The following combinations of cutoff frequencies were used for channels below and above 1500 Hz, respectively: EE, PE, EP, and PP. Results from a competing-talker task showed that the tone vocoder led to better intelligibility than the noise vocoder. The PP condition led to the best intelligibility and the EE condition to the worst. For N=6, intelligibility was better for condition PE than for condition EP. For N=18, the reverse was true. The results indicate that the channel bandwidths can compromise the transmission of f0-related envelope information, and suggest that vocoder simulations of cochlear-implant processing have limitations.
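The role of the envelope-extraction cutoff f(lp) can be illustrated with a one-channel tone vocoder. This is a sketch under simplified assumptions (Butterworth filters of arbitrary order, invented function names), not the authors' processing chain:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def tone_vocode_channel(x, fs, band, f_lp, carrier_f):
    """One tone-vocoder channel: band-pass filter, extract the envelope,
    smooth it with a low-pass at f_lp, re-impose it on a tone carrier."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    band_sig = filtfilt(b, a, x)
    env = np.abs(hilbert(band_sig))
    bl, al = butter(2, f_lp / (fs / 2), btype="low")
    env = np.maximum(filtfilt(bl, al, env), 0.0)   # keep envelope non-negative
    t = np.arange(len(x)) / fs
    return env * np.sin(2 * np.pi * carrier_f * t)

# "E" (45 Hz) vs "P" (180 Hz) envelope filters on the same input:
# a 1 kHz tone amplitude-modulated at an F0-like rate of 120 Hz.
fs = 16000
t = np.arange(fs) / fs
x = 0.5 * (1 + 0.9 * np.sin(2 * np.pi * 120 * t)) * np.sin(2 * np.pi * 1000 * t)
e_out = tone_vocode_channel(x, fs, (800, 1200), f_lp=45.0, carrier_f=1000.0)
p_out = tone_vocode_channel(x, fs, (800, 1200), f_lp=180.0, carrier_f=1000.0)
```

With the 45 Hz "E" filter the 120 Hz envelope fluctuation is largely removed, while the 180 Hz "P" filter passes it, which is the f0-related cue the study manipulates.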
Affiliation(s)
- Michael A Stone
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom.
|
96
|
Zhou N, Xu L. Lexical tone recognition with spectrally mismatched envelopes. Hear Res 2008; 246:36-43. [PMID: 18848614 DOI: 10.1016/j.heares.2008.09.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/29/2008] [Revised: 09/16/2008] [Accepted: 09/17/2008] [Indexed: 12/21/2022]
Abstract
It has been shown that frequency-place mismatch has detrimental effects on English speech recognition. The present study investigated the effects of mismatched spectral distribution of envelopes on Mandarin Chinese tone recognition using a noise-excited vocoder. In Experiment 1, speech samples were processed to simulate a cochlear implant at various insertion depths. The carrier bands were shifted basally relative to the analysis bands by 1-7 mm in the cochlea. Nine normal-hearing Mandarin Chinese listeners participated in this experiment. Basal shift of the carriers only slightly affected tone recognition. The resistance of tone recognition to spectral shift can be attributed to overall amplitude contour cues that are independent of spectral manipulations. Experiment 2 examined the effects of frequency compression, in which analysis bands widened by 2, 6, and 10 mm were compressively allocated to narrower carrier bands. Five of the 9 subjects participated in Experiment 2. It appears that the expanded frequency information, especially at the low-frequency end, can compensate for the distortion from frequency compression. Thus, spectral shift might not pose a severe problem for tone recognition, and allocating a wider frequency range to include more low-frequency information might be beneficial for tone recognition.
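Shifts stated in millimeters along the cochlea, as above, are conventionally converted to frequencies with the Greenwood place-frequency map. A sketch using the standard human constants (this is the common convention for such simulations, not code from the study):

```python
import math

# Greenwood function constants for the human cochlea,
# with x measured in mm from the apex.
A, ALPHA, K = 165.4, 0.06, 0.88

def greenwood_freq(x_mm):
    """Characteristic frequency (Hz) at distance x_mm from the cochlear apex."""
    return A * (10 ** (ALPHA * x_mm) - K)

def greenwood_place(f_hz):
    """Inverse map: cochlear place (mm from apex) tuned to frequency f_hz."""
    return math.log10(f_hz / A + K) / ALPHA

# Example: the frequency reached by shifting a carrier band 3 mm basally
# from the place of 1 kHz.
x = greenwood_place(1000.0)
f_shifted = greenwood_freq(x + 3.0)
```

A basal shift therefore maps each analysis band onto a higher-frequency carrier band, which is how the 1-7 mm shifts in Experiment 1 would be realized.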
Affiliation(s)
- Ning Zhou
- School of Hearing, Speech and Language Sciences, Ohio University, Athens, OH 45701, USA
|
97
|
Niebuhr O. Coding of intonational meanings beyond F0: evidence from utterance-final /t/ aspiration in German. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:1252-1263. [PMID: 18681611 DOI: 10.1121/1.2940588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
An acoustic analysis of a German read-speech corpus showed that utterance-final /t/ aspirations differ systematically depending on the accompanying nuclear accent contour. Two contours were included: Terminal-falling early and late F0 peaks in terms of the Kiel Intonation Model. They correspond to H+L*L-% and L*+HL-% within the autosegmental metrical (AM) model. Aspirations in early-peak contexts were characterized by (a) "short", (b) "high-intensity" noise with (c) "low" frequency values for the spectral energy maximum above the lower spectral energy boundary. The opposite holds for aspirations accompanying late-peak productions. Starting from the acoustic analysis, a perception experiment was performed using a variant of the semantic differential paradigm. The stimuli were varied in the duration and intensity pattern as well as the spectral energy pattern of the final /t/ aspiration. Results revealed that the different noise patterns found in connection with early and late peak productions were able to change the attitudinal meaning of the stimuli toward the meaning profile of the respective F0 peak category. This suggests that final aspirations can be part of the coding of meanings, so far solely associated with intonation contours. Hence, the traditionally separated segmental and suprasegmental coding levels seem to be more intertwined than previously thought.
Affiliation(s)
- Oliver Niebuhr
- Institute of Phonetics and Digital Speech Processing, Christian-Albrecht-University, Kiel, Germany.
|
98
|
Morton KD, Torrione PA, Throckmorton CS, Collins LM. Mandarin Chinese tone identification in cochlear implants: predictions from acoustic models. Hear Res 2008; 244:66-76. [PMID: 18706497 DOI: 10.1016/j.heares.2008.07.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/23/2008] [Revised: 07/15/2008] [Accepted: 07/22/2008] [Indexed: 10/21/2022]
Abstract
It has been established that current cochlear implants do not supply adequate spectral information for perception of tonal languages. Comprehension of a tonal language, such as Mandarin Chinese, requires recognition of lexical tones. New strategies of cochlear stimulation such as variable stimulation rate and current steering may provide the means of delivering more spectral information and thus may provide the auditory fine-structure required for tone recognition. Several cochlear implant signal processing strategies are examined in this study, the continuous interleaved sampling (CIS) algorithm, the frequency amplitude modulation encoding (FAME) algorithm, and the multiple carrier frequency algorithm (MCFA). These strategies provide different types and amounts of spectral information. Pattern recognition techniques can be applied to data from Mandarin Chinese tone recognition tasks using acoustic models as a means of testing the abilities of these algorithms to transmit the changes in fundamental frequency indicative of the four lexical tones. The ability of processed Mandarin Chinese tones to be correctly classified may predict trends in the effectiveness of different signal processing algorithms in cochlear implants. The proposed techniques can predict trends in performance of the signal processing techniques in quiet conditions but fail to do so in noise.
Affiliation(s)
- Kenneth D Morton
- Duke University Department of Electrical and Computer Engineering, Box 90291, Durham, NC 27708-0291, USA
|
99
|
Xu L, Pfingst BE. Spectral and temporal cues for speech recognition: implications for auditory prostheses. Hear Res 2007; 242:132-40. [PMID: 18249077 DOI: 10.1016/j.heares.2007.12.010] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/30/2007] [Revised: 12/16/2007] [Accepted: 12/19/2007] [Indexed: 11/30/2022]
Abstract
Features of stimulation important for speech recognition in people with normal hearing and in people using implanted auditory prostheses include spectral information represented by place of stimulation along the tonotopic axis and temporal information represented in low-frequency envelopes of the signal. The relative contributions of these features to speech recognition and their interactions have been studied using vocoder-like simulations of cochlear implant speech processors presented to listeners with normal hearing. In these studies, spectral/place information was manipulated by varying the number of channels and the temporal-envelope information was manipulated by varying the lowpass cutoffs of the envelope extractors. Consonant and vowel recognition in quiet reached plateau at 8 and 12 channels and lowpass cutoff frequencies of 16 Hz and 4 Hz, respectively. Phoneme (especially vowel) recognition in noise required larger numbers of channels. Lexical tone recognition required larger numbers of channels and higher lowpass cutoff frequencies. There was a tradeoff between spectral/place and temporal-envelope requirements. Most current auditory prostheses seem to deliver adequate temporal-envelope information, but the number of effective channels is suboptimal, particularly for speech recognition in noise, lexical tone recognition, and music perception.
Collapse
Affiliation(s)
- Li Xu
- School of Hearing, Speech and Language Sciences, Ohio University, Athens, OH 45701, USA.
| | | |
Collapse
|
100
|
Xu H, Kotak VC, Sanes DH. Conductive hearing loss disrupts synaptic and spike adaptation in developing auditory cortex. J Neurosci 2007; 27:9417-26. [PMID: 17728455 PMCID: PMC6673134 DOI: 10.1523/jneurosci.1992-07.2007] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Although sensorineural hearing loss (SNHL) is known to compromise central auditory structure and function, the impact of milder forms of hearing loss on cellular neurophysiology remains mostly undefined. We induced conductive hearing loss (CHL) in developing gerbils, reared the animals for 8-13 d, and subsequently assessed the temporal features of auditory cortex layer 2/3 pyramidal neurons in a thalamocortical brain slice preparation with whole-cell recordings. Repetitive stimulation of the ventral medial geniculate nucleus (MGv) evoked robust short-term depression of the postsynaptic potentials in control neurons, and this depression increased monotonically at higher stimulation frequencies. In contrast, CHL neurons displayed a faster rate of synaptic depression and a smaller asymptotic amplitude. Moreover, the latency of MGv evoked potentials was consistently longer in CHL neurons for all stimulus rates. A separate assessment of spike frequency adaptation in response to trains of injected current pulses revealed that CHL neurons displayed less adaptation compared with controls, although there was an increase in temporal jitter. For each of these properties, nearly identical findings were observed for SNHL neurons. Together, these data show that CHL significantly alters the temporal properties of auditory cortex synapses and spikes, and this may contribute to processing deficits that attend mild to moderate hearing loss.
Collapse
Affiliation(s)
- Han Xu
- Center for Neural Science and
| | | |
- Dan H. Sanes
- Center for Neural Science and
- Department of Biology, New York University, New York, New York 10003
| |
Collapse
|