1. Nittrouer S, Lowenstein JH. Recognition of Sentences With Complex Syntax in Speech Babble by Adolescents With Normal Hearing or Cochlear Implants. J Speech Lang Hear Res 2023;66:1110-1135. [PMID: 36758200] [PMCID: PMC10205108] [DOI: 10.1044/2022_jslhr-22-00407]
Abstract
PURPOSE General language abilities of children with cochlear implants have been thoroughly investigated, especially at young ages, but far less is known about how well they process language in real-world settings, especially in higher grades. This study addressed this gap in knowledge by examining recognition of sentences with complex syntactic structures in backgrounds of speech babble by adolescents with cochlear implants, and peers with normal hearing.
DESIGN Two experiments were conducted. First, new materials were developed using young adults with normal hearing as the normative sample, creating a corpus of sentences with controlled, but complex syntactic structures presented in three kinds of babble that varied in voice gender and number of talkers. Second, recognition by adolescents with normal hearing or cochlear implants was examined for these new materials and for sentence materials used with these adolescents at younger ages. Analyses addressed three objectives: (1) to assess the stability of speech recognition across a multiyear age range, (2) to evaluate speech recognition of sentences with complex syntax in babble, and (3) to explore how bottom-up and top-down mechanisms account for performance under these conditions.
RESULTS Results showed: (1) Recognition was stable across the ages of 10-14 years for both groups. (2) Adolescents with normal hearing performed similarly to young adults with normal hearing, showing effects of syntactic complexity and background babble; adolescents with cochlear implants showed poorer recognition overall, and diminished effects of both factors. (3) Top-down language and working memory primarily explained recognition for adolescents with normal hearing, but the bottom-up process of perceptual organization primarily explained recognition for adolescents with cochlear implants.
CONCLUSIONS Comprehension of language in real-world settings relies on different mechanisms for adolescents with cochlear implants than for adolescents with normal hearing. A novel finding was that perceptual organization is a critical factor.
SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21965228
Affiliation(s)
- Susan Nittrouer
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville
- Joanna H. Lowenstein
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville

2. Bernstein LE, Jordan N, Auer ET, Eberhardt SP. Lipreading: A Review of Its Continuing Importance for Speech Recognition With an Acquired Hearing Loss and Possibilities for Effective Training. Am J Audiol 2022;31:453-469. [PMID: 35316072] [PMCID: PMC9524756] [DOI: 10.1044/2021_aja-21-00112]
Abstract
PURPOSE The goal of this review article is to reinvigorate interest in lipreading and lipreading training for adults with acquired hearing loss. Most adults benefit from being able to see the talker when speech is degraded; however, the effect size is related to their lipreading ability, which is typically poor in adults who have experienced normal hearing through most of their lives. Lipreading training has been viewed as a possible avenue for rehabilitation of adults with an acquired hearing loss, but most training approaches have not been particularly successful. Here, we describe lipreading and theoretically motivated approaches to its training, as well as examples of successful training paradigms. We discuss some extensions to auditory-only (AO) and audiovisual (AV) speech recognition.
METHOD Visual speech perception and word recognition are described. Traditional and contemporary views of training and perceptual learning are outlined. We focus on the roles of external and internal feedback and the training task in perceptual learning, and we describe results of lipreading training experiments.
RESULTS Lipreading is commonly characterized as limited to viseme perception. However, evidence demonstrates subvisemic perception of visual phonetic information. Lipreading words also relies on lexical constraints, not unlike auditory spoken word recognition. Lipreading has been shown to be difficult to improve through training, but under specific feedback and task conditions, training can be successful, and learning can generalize to untrained materials, including AV sentence stimuli in noise. The results on lipreading have implications for AO and AV training and for use of acoustically processed speech in face-to-face communication.
CONCLUSION Given its importance for speech recognition with a hearing loss, we suggest that the research and clinical communities integrate lipreading in their efforts to improve speech recognition in adults with acquired hearing loss.
Affiliation(s)
- Lynne E. Bernstein
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
- Nicole Jordan
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
- Edward T. Auer
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC
- Silvio P. Eberhardt
- Department of Speech, Language & Hearing Sciences, George Washington University, Washington, DC

3. Automatic Speech Recognition (ASR) Systems for Children: A Systematic Literature Review. Appl Sci (Basel) 2022. [DOI: 10.3390/app12094419]
Abstract
Automatic speech recognition (ASR) transforms acoustic speech signals into text. Over the last few decades, an enormous amount of research has been devoted to speech recognition (SR); however, most studies have focused on building ASR systems for adult speech. The recognition of children’s speech was neglected for some time, leaving the field of children’s SR research wide open. Children’s SR is a challenging task because children’s articulatory, acoustic, physical, and linguistic characteristics vary far more than those of adult speech. The field has therefore become a very attractive area of research, and it is important to understand where its main centers of attention lie: the most widely used methods for extracting acoustic features, the various acoustic models, the speech datasets, the SR toolkits used during recognition, and so on. ASR systems and interfaces are extensively used in, and integrated into, various real-life applications, such as search engines, the healthcare industry, biometric analysis, car systems, the military, aids for people with disabilities, and mobile devices. This work presents a systematic literature review (SLR) that extracts the relevant information from 76 research papers published from 2009 to 2020 in the field of ASR for children. The objective of the review is to shed light on trends in children’s speech recognition research and to analyze the potential of current techniques to recognize children’s speech.

4. Holder JT, Gifford RH. Effect of Increased Daily Cochlear Implant Use on Auditory Perception in Adults. J Speech Lang Hear Res 2021;64:4044-4055. [PMID: 34546763] [PMCID: PMC9132064] [DOI: 10.1044/2021_jslhr-21-00066]
Abstract
Purpose Despite the recommendation for cochlear implant (CI) processor use during all waking hours, variability in average daily wear time remains high. Previous work has shown that objective wear time is significantly correlated with speech recognition outcomes. We aimed to investigate the causal link between daily wear time and speech recognition outcomes and assess one potential underlying mechanism, spectral processing, driving the causal link. We hypothesized that increased CI use would result in improved speech recognition via improved spectral processing.
Method Twenty adult CI recipients completed two study visits. The baseline visit included auditory perception testing (speech recognition and spectral processing measures), questionnaire administration, and documentation of data logging from the CI software. Participants watched an educational video, and they were informed of the compensation schedule. Participants were then asked to increase their daily CI use over a 4-week period during everyday life. Baseline measures were reassessed following the 4-week period.
Results Seventeen out of 20 participants increased their daily CI use. On average, participants' speech recognition improved by 3.0, 2.4, and 7.0 percentage points per hour of increased average daily CI use for consonant-nucleus-consonant words, AzBio sentences, and AzBio sentences in noise, respectively. Questionnaire scores were similar between visits. Spectral processing showed significant improvement and accounted for a small amount of variance in the change in speech recognition values.
Conclusions Improved consistency of processor use over a 4-week period yielded significant improvements in speech recognition scores. Though a significant factor, spectral processing is likely not the only mechanism driving improvement in speech recognition; further research is warranted.
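The per-hour slopes reported in the Results lend themselves to a quick back-of-the-envelope projection. The snippet below is purely illustrative arithmetic using the slopes quoted in the abstract; the two-hour increase in daily use is a hypothetical scenario, not a figure from the study.

```python
# Illustrative projection from the per-hour slopes reported above.
# The 2-hour increase in average daily CI use is a hypothetical example.
slopes = {
    "CNC words": 3.0,                 # percentage points per added hour/day
    "AzBio sentences": 2.4,
    "AzBio sentences in noise": 7.0,
}
extra_hours = 2
for measure, pts_per_hour in slopes.items():
    print(f"{measure}: +{pts_per_hour * extra_hours:.1f} percentage points")
```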
Affiliation(s)
- Jourdan T. Holder
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- René H. Gifford
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN

5. Yoon YS, Mills I, Toliver B, Park C, Whitaker G, Drew C. Comparisons in Frequency Difference Limens Between Sequential and Simultaneous Listening Conditions in Normal-Hearing Listeners. Am J Audiol 2021;30:266-274. [PMID: 33769845] [DOI: 10.1044/2021_aja-20-00134]
Abstract
Purpose We compared frequency difference limens (FDLs) in normal-hearing listeners under two listening conditions: sequential and simultaneous.
Method Eighteen adult listeners participated in three experiments. FDLs were measured using a method of limits applied to the comparison frequency. In the sequential listening condition, the tones were presented with a half-second interval between them; in the simultaneous listening condition, the tones were presented at the same time. In the first experiment, one of four reference tones (125, 250, 500, or 750 Hz), presented to the left ear, was paired with one of four starting comparison tones (250, 500, 750, or 1000 Hz), presented to the right ear. The second and third experiments used the same testing conditions as the first, except that the comparison stimuli were two- and three-tone complexes. The subjects were asked whether the tones sounded the same or different. When a subject chose “different,” the comparison frequency was decreased by 10% of the frequency difference between the reference and comparison tones. FDLs were determined when the subject chose “same” three times in a row.
Results FDLs were significantly broader (worse) with simultaneous listening than with sequential listening for the two- and three-tone complex conditions but not for the single-tone condition. The FDLs were narrowest (best) with the three-tone complex under both listening conditions. FDLs broadened as the testing frequencies increased for the single tone and the two-tone complex, but were not broadened at frequencies above 250 Hz for the three-tone complex.
Conclusion The results suggest that sequential and simultaneous frequency discrimination are mediated by different processes at different stages of the auditory pathway for complex tones, but not for pure tones.
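The adaptive rule in the Method section (shrink the gap by 10% of the current reference-comparison difference after each “different” response, stop after three consecutive “same” responses) can be captured in a few lines. The following is a minimal sketch of that stopping rule; the function names and the toy deterministic listener are our illustrative assumptions, not the authors' procedure code.

```python
# Minimal sketch of the descending method-of-limits rule described above.
# respond_same models the listener's same/different judgment on one trial.

def measure_fdl(reference_hz, comparison_hz, respond_same):
    """Return the frequency difference limen (FDL) in Hz."""
    same_streak = 0
    while same_streak < 3:                      # stop after 3 "same" in a row
        if respond_same(reference_hz, comparison_hz):
            same_streak += 1
        else:
            same_streak = 0
            # Shrink the gap by 10% of the current frequency difference.
            comparison_hz -= 0.10 * (comparison_hz - reference_hz)
    return comparison_hz - reference_hz

# Toy listener that hears tones as "same" once they are within 2% of the
# reference frequency; in the study this judgment came from a human observer.
fdl = measure_fdl(250.0, 500.0,
                  lambda ref, comp: (comp - ref) / ref < 0.02)
print(f"Estimated FDL: {fdl:.1f} Hz")
```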
Affiliation(s)
- Yang-Soo Yoon
- Department of Communication Sciences and Disorders, Baylor University, Waco, TX
- Ivy Mills
- Department of Communication Sciences and Disorders, Baylor University, Waco, TX
- BaileyAnn Toliver
- Department of Communication Sciences and Disorders, Baylor University, Waco, TX
- Christine Park
- Department of Communication Sciences and Disorders, Baylor University, Waco, TX
- George Whitaker
- Division of Otolaryngology, Baylor Scott & White Medical Center, Temple, TX
- Carrie Drew
- Department of Communication Sciences and Disorders, Baylor University, Waco, TX

6. Errors on a Speech-in-Babble Sentence Recognition Test Reveal Individual Differences in Acoustic Phonetic Perception and Babble Misallocations. Ear Hear 2021;42:673-690. [PMID: 33928926] [DOI: 10.1097/aud.0000000000001020]
Abstract
OBJECTIVES The ability to recognize words in connected speech under noisy listening conditions is critical to everyday communication. Many processing levels contribute to an individual listener's ability to recognize words correctly against background speech, and there is clinical need for measures of individual differences at these different levels. Typical listening tests of speech recognition in noise require a list of items to obtain a single threshold score. Measures of diverse abilities could instead be obtained by mining the various open-set recognition errors committed during multi-item tests. This study sought to demonstrate that an error-mining approach using open-set responses from a clinical sentence-in-babble test can characterize abilities beyond the signal-to-noise ratio (SNR) threshold. A stimulus-response phoneme-to-phoneme sequence alignment software system was used to obtain automatic, accurate quantitative error scores. The method was applied to a database of responses from normal-hearing (NH) adults, and relationships between two types of response errors and words-correct scores were evaluated using mixed-models regression.
DESIGN Two hundred thirty-three NH adults completed three lists of the Quick Speech in Noise test. Their individual open-set speech recognition responses were automatically phonemically transcribed and submitted to a phoneme-to-phoneme stimulus-response sequence alignment system. The computed alignments were mined for a measure of acoustic phonetic perception, a measure of response text that could not be attributed to the stimulus, and a count of words correct. The mined data were statistically analyzed to determine whether the response errors were significant factors, beyond stimulus SNR, in accounting for the number of words correct per response from each participant. The study addressed two hypotheses: (1) individuals whose perceptual errors are less severe recognize more words correctly under the difficult listening conditions created by babble masking, and (2) listeners who are better able to exclude incorrect speech information, such as intrusions from the background babble and filled-in content, recognize more stimulus words correctly.
RESULTS Statistical analyses showed that acoustic phonetic accuracy and exclusion of the babble background were significant factors, beyond the stimulus sentence SNR, in accounting for the number of words a participant recognized. There was also evidence that poorer acoustic phonetic accuracy could co-occur with higher words-correct scores. This paradoxical result came from a subset of listeners who had also performed subjective accuracy judgments; their results suggested that they recognized more words while also misallocating acoustic cues from the background into the stimulus, without realizing their errors. Because the Quick Speech in Noise test stimuli are locked to their own babble sample, misallocations of whole words from babble into the responses could be investigated in detail. The high rate of common misallocation errors for some sentences supported the view that the functional stimulus was the combination of the target sentence and its babble.
CONCLUSIONS Individual differences among NH listeners arise both in the words accurately identified and in the errors committed during open-set recognition of sentences in babble maskers. Error mining to characterize individual listeners can be performed automatically at the levels of acoustic phonetic perception and the misallocation of background babble words into open-set responses. Error mining can thus increase test information and the efficiency and accuracy of characterizing individual listeners.
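The stimulus-response phoneme-to-phoneme alignment at the core of this error-mining approach is, in essence, a dynamic-programming sequence alignment. The sketch below shows a generic Needleman-Wunsch-style alignment from which substitutions, insertions, and deletions could be tallied; it is a simplified stand-in for the study's alignment software, with illustrative scoring values and phoneme labels of our own choosing.

```python
# Generic global alignment of a stimulus phoneme string against a response
# phoneme string; None in a pair marks an insertion or deletion.

def align_phonemes(stimulus, response, gap=-1, match=1, mismatch=-1):
    n, m = len(stimulus), len(response)
    # score[i][j]: best score aligning stimulus[:i] with response[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if stimulus[i - 1] == response[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and stimulus[i - 1] == response[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((stimulus[i - 1], response[j - 1])); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((stimulus[i - 1], None)); i -= 1   # deletion
        else:
            pairs.append((None, response[j - 1])); j -= 1   # insertion
    return pairs[::-1]

# "cat" /k ae t/ heard as "cap" /k ae p/: the final pair exposes a t->p
# substitution that an error-mining analysis would tally.
print(align_phonemes(["k", "ae", "t"], ["k", "ae", "p"]))
```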

7. Musacchia G, Ortiz-Mantilla S, Roesler CP, Rajendran S, Morgan-Byrne J, Benasich AA. Effects of noise and age on the infant brainstem response to speech. Clin Neurophysiol 2018;129:2623-2634. [DOI: 10.1016/j.clinph.2018.08.005]

8. Nittrouer S, Krieg LM, Lowenstein JH. Speech Recognition in Noise by Children With and Without Dyslexia: How Is It Related to Reading? Res Dev Disabil 2018;77:98-113. [PMID: 29724639] [PMCID: PMC5947872] [DOI: 10.1016/j.ridd.2018.04.014]
Abstract
PURPOSE Developmental dyslexia is commonly viewed as a phonological deficit that makes it difficult to decode written language. But children with dyslexia typically exhibit other problems as well, including poor speech recognition in noise. The purpose of this study was to examine whether the speech-in-noise problems of children with dyslexia are related to their reading problems and, if so, whether a common underlying factor might explain both. The specific hypothesis examined was that a spectral processing disorder results in these children receiving smeared signals, which could explain both the diminished sensitivity to phonological structure, leading to reading problems, and the difficulties with speech recognition in noise. The alternative hypothesis tested was that children with dyslexia simply have broadly based language deficits.
PARTICIPANTS Ninety-seven children between the ages of 7 years, 10 months and 12 years, 9 months participated: 46 with dyslexia and 51 without.
METHODS Children were tested on two dependent measures: word reading, and sentence recognition in noise with two types of materials, unprocessed (UP) signals and spectrally smeared (SM) signals. Data were collected for four predictor variables: phonological awareness, vocabulary, grammatical knowledge, and digit span.
RESULTS Children with dyslexia showed deficits on both dependent measures and on all predictor variables. Their scores for speech recognition in noise were poorer than those of children without dyslexia for both the UP and SM signals, but by equivalent amounts across signal conditions, indicating that they were not disproportionately hindered by spectral distortion. Correlation analyses on scores from children with dyslexia showed that reading ability and speech-in-noise recognition were only mildly correlated, and each skill was related to different underlying abilities.
CONCLUSIONS No substantial evidence was found to support the suggestion that the reading and speech-recognition-in-noise problems of children with dyslexia arise from a single factor definable as a spectral processing disorder. The reading and speech-recognition-in-noise deficits of these children appeared to be largely independent.
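Spectral smearing of the kind used for the SM condition is commonly implemented by blurring the short-time magnitude spectrum while preserving phase. The study's actual smearing algorithm and bandwidth are not specified here, so the following is only a generic sketch of that family of manipulations, with assumed parameter values.

```python
# Generic spectral smearing sketch: blur the short-time magnitude spectrum
# across frequency bins with a Gaussian, keep the original phases, and
# resynthesize. Not the study's exact algorithm.

import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import gaussian_filter1d

def smear(signal, fs, smear_bins=8.0):
    f, t, Z = stft(signal, fs=fs, nperseg=512)
    mag = gaussian_filter1d(np.abs(Z), sigma=smear_bins, axis=0)  # blur across frequency
    smeared = mag * np.exp(1j * np.angle(Z))                       # keep original phase
    _, y = istft(smeared, fs=fs, nperseg=512)
    return y
```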

9.
Abstract
OBJECTIVES Spectral resolution is a correlate of open-set speech understanding in postlingually deaf adults and prelingually deaf children who use cochlear implants (CIs). To apply measures of spectral resolution to assess device efficacy in younger CI users, it is necessary to understand how spectral resolution develops in normal-hearing children. In this study, spectral ripple discrimination (SRD) was used to measure listeners' sensitivity to a shift in phase of the spectral envelope of a broadband noise. SRD requires both resolution of peak-to-peak location (frequency resolution) and resolution of peak-to-trough intensity (across-channel intensity resolution).
DESIGN SRD was measured as the highest ripple density (in ripples per octave) at which a listener could discriminate a 90° shift in phase of the sinusoidally modulated amplitude spectrum. A 2 × 3 between-subjects design was used to assess the effects of age (7-month-old infants versus adults) and ripple peak-to-trough "depth" (10, 13, and 20 dB) on SRD in normal-hearing listeners (experiment 1). In experiment 2, SRD thresholds in the same age groups were compared using a task in which ripple starting phases were randomized across trials to obscure within-channel intensity cues. In experiment 3, the randomized-starting-phase method was used to measure SRD as a function of age (3-month-old infants, 7-month-old infants, and young adults) and ripple depth (10 and 20 dB, in a repeated measures design).
RESULTS In experiment 1, there was a significant interaction between age and ripple depth. Infant SRDs were significantly poorer than adult SRDs at 10 and 13 dB ripple depths but adult-like at 20 dB depth, a result consistent with immature across-channel intensity resolution. In contrast, the trajectory of SRD as a function of depth was steeper for infants than for adults, suggesting that frequency resolution was better in infants than in adults. However, in experiment 2, infant performance was significantly poorer than that of adults at 20 dB depth, suggesting that variability in infants' use of within-channel intensity cues, rather than better frequency resolution, explained the results of experiment 1. In experiment 3, age effects were seen, with both groups of infants showing poorer SRD than adults, but, unlike experiment 1, no significant interaction between age and depth was observed.
CONCLUSIONS Measurement of SRD thresholds in individual 3- to 7-month-old infants is feasible. Performance of normal-hearing infants on SRD may be limited by across-channel intensity resolution despite mature frequency resolution. These findings have significant implications for design and stimulus choice when applying SRD to testing infants with CIs. The high degree of variability in infant SRD can be somewhat reduced by obscuring within-channel cues.
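A spectral ripple stimulus of the kind described (broadband noise whose amplitude spectrum is sinusoidally modulated across log frequency, with discrimination tested via a 90° shift of the ripple phase) can be synthesized in a few lines. The sketch below uses assumed passband, duration, and depth values and is not the study's stimulus-generation code.

```python
# Hedged sketch of a spectral ripple noise: random-phase broadband noise
# whose log-amplitude spectrum is sinusoidally modulated across octaves,
# with controllable ripple density, peak-to-trough depth, and starting
# phase. A 90-degree phase shift moves spectral peaks to former troughs.

import numpy as np

def ripple_noise(fs=44100, dur=0.5, f_lo=100.0, f_hi=5000.0,
                 ripples_per_octave=2.0, depth_db=20.0, phase=0.0):
    n = int(fs * dur)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec = np.zeros(freqs.size, dtype=complex)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    # Flat magnitude, random phase inside the passband.
    spec[band] = np.exp(2j * np.pi * np.random.rand(band.sum()))
    # Sinusoidal ripple in dB across octaves re: f_lo; depth_db is the
    # full peak-to-trough excursion, hence the factor of 2.
    octaves = np.log2(freqs[band] / f_lo)
    ripple_db = (depth_db / 2.0) * np.sin(
        2 * np.pi * ripples_per_octave * octaves + phase)
    spec[band] *= 10.0 ** (ripple_db / 20.0)
    x = np.fft.irfft(spec, n)
    return x / np.max(np.abs(x))  # normalize peak amplitude

# Standard vs. 90-degree phase-shifted ripple, as in the discrimination task.
standard = ripple_noise(phase=0.0)
shifted = ripple_noise(phase=np.pi / 2)
```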
Affiliation(s)
- David L. Horn
- Virginia Merrill Bloedel Hearing Research Center, Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, Washington, USA; Division of Otolaryngology, Seattle Children's Hospital, Seattle, Washington, USA; and Department of Speech and Hearing Sciences, University of Washington, Seattle, Washington, USA

10. Han W, Chun H, Kim G, Jin IK. Substitution Patterns of Phoneme Errors in Hearing Aid and Cochlear Implant Users. J Audiol Otol 2017;21:28-32. [PMID: 28417105] [PMCID: PMC5392003] [DOI: 10.7874/jao.2017.21.1.28]
Abstract
Background and Objectives It is well established that speech perception errors increase with noise in listeners who have sensorineural hearing loss. However, detailed information about their error patterns is lacking. The purpose of the present study was to analyze substitution patterns of phoneme errors in Korean hearing aid (HA) and cochlear implant (CI) users who are postlingually deafened adults.
Subjects and Methods In quiet and under two noise conditions, the phoneme errors of twenty HA and fourteen CI users were measured using monosyllabic words, and substitution patterns were analyzed in terms of manner of articulation.
Results Both groups showed a high percentage of nasal and plosive substitutions regardless of background condition.
Conclusions These findings provide vital information for understanding the speech perception of hearing-impaired listeners and, when applied to auditory training, for improving their ability to communicate.
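Tallying substitutions by manner of articulation reduces to mapping each perceived phoneme to its manner class and counting. The toy sketch below illustrates that bookkeeping; the manner table and the confusion pairs are invented for illustration and are not data from the study.

```python
# Toy tally of phoneme substitutions by the manner of articulation of the
# perceived (substituted-in) phoneme. Manner classes and confusions are
# illustrative placeholders.

from collections import Counter

MANNER = {"p": "plosive", "t": "plosive", "k": "plosive",
          "m": "nasal", "n": "nasal",
          "s": "fricative", "f": "fricative"}

# (intended phoneme, perceived phoneme) pairs from an error analysis.
confusions = [("t", "p"), ("t", "k"), ("s", "t"), ("n", "m"), ("f", "p")]

substituted_manner = Counter(MANNER[heard] for _, heard in confusions)
print(substituted_manner)  # Counter({'plosive': 4, 'nasal': 1})
```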
Collapse
Affiliation(s)
- Woojae Han
- Division of Speech Pathology and Audiology, Research Institute of Audiology and Speech Pathology, College of Natural Science, Hallym University, Chuncheon, Korea
- Hyungi Chun
- Department of Speech Pathology and Audiology, Hallym University Graduate School, Chuncheon, Korea
- Gibbeum Kim
- Department of Speech Pathology and Audiology, Hallym University Graduate School, Chuncheon, Korea
- In-Ki Jin
- Division of Speech Pathology and Audiology, Research Institute of Audiology and Speech Pathology, College of Natural Science, Hallym University, Chuncheon, Korea