1. Drouin JR, Davis CP. Individual differences in visual pattern completion predict adaptation to degraded speech. Brain and Language 2024; 255:105449. PMID: 39083999. DOI: 10.1016/j.bandl.2024.105449.
Abstract
Recognizing acoustically degraded speech relies on predictive processing whereby incomplete auditory cues are mapped to stored linguistic representations via pattern recognition processes. While listeners vary in their ability to recognize degraded speech, performance improves when a written transcription is presented, allowing completion of the partial sensory pattern to preexisting representations. Building on work characterizing predictive processing as pattern completion, we examined the relationship between domain-general pattern recognition and individual variation in degraded speech learning. Participants completed a visual pattern recognition task to measure individual-level tendency towards pattern completion. Participants were also trained to recognize noise-vocoded speech with written transcriptions and tested on speech recognition pre- and post-training using a retrieval-based transcription task. Listeners significantly improved in recognizing speech after training, and pattern completion on the visual task predicted improvement for novel items. The results implicate pattern completion as a domain-general learning mechanism that can facilitate speech adaptation in challenging contexts.
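The training stimuli here are noise-vocoded speech. As an illustration of how such stimuli are commonly generated, the sketch below shows a generic channel vocoder (an assumption about the general technique, not the stimulus-generation code used in this study; the band count, filter orders, and envelope cutoff are arbitrary choices):
```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_bands=8, f_lo=100.0, f_hi=8000.0, env_cutoff=30.0):
    """Return a noise-vocoded version of the 1-D float signal x (fs must exceed 2 * f_hi)."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)            # log-spaced band edges
    env_sos = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    noise = np.random.randn(len(x))                          # broadband carrier noise
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)                      # analysis band
        env = np.clip(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0.0, None)  # smoothed envelope
        out += env * sosfiltfilt(band_sos, noise)            # envelope-modulated, band-limited noise
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2)) + 1e-12
    return out * (rms_in / rms_out)                          # match overall level to the input
```
Fewer bands yield a more degraded signal; adaptation to that degradation is what the transcription-based training is intended to support.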
Affiliations
- Julia R Drouin: Division of Speech and Hearing Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Communication Sciences and Disorders, California State University Fullerton, Fullerton, CA 92831, USA
- Charles P Davis: Department of Psychology & Neuroscience, Duke University, Durham, NC 27708, USA
2. Ueda K, Hashimoto M, Takeichi H, Wakamiya K. Interrupted mosaic speech revisited: Gain and loss in intelligibility by stretching. The Journal of the Acoustical Society of America 2024; 155:1767-1779. PMID: 38441439. DOI: 10.1121/10.0025132.
Abstract
Our previous investigation on the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Ueda, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970-980]. In this previous study, however, gap durations and temporal resolution were confounded. In the current investigation, we therefore observed the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% by interrupting it with 20 and 80 ms gaps. Intelligibility improved, however, to 92% and 54% (14% and 21% gains for 20 and 80 ms gaps, respectively) by stretching mosaic segments to fill silent gaps (n = 21). By contrast, the intelligibility was impoverished to a minimum of 9% (7% loss) when stretching stimuli interrupted with 160 ms gaps. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement by stretching, but not for the loss. The probability summation model accounted for "U"-shaped intelligibility curves and the gain and loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli.
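The probability summation idea invoked above can be illustrated with the textbook form of the model (this is only the generic calculation across independent perceptual units, not the authors' fitted model or its parameter values):
```python
def prob_summation(p_unit: float, n_units: int) -> float:
    """P(at least one of n independent perceptual units is detected)."""
    return 1.0 - (1.0 - p_unit) ** n_units

# Fewer surviving units (e.g., longer silent gaps) lowers predicted intelligibility.
for n in (8, 4, 2, 1):
    print(n, round(prob_summation(0.3, n), 3))
```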
Affiliations
- Kazuo Ueda: Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Masashi Hashimoto: Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi: Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Wakamiya: Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
3. Sathe NC, Kain A, Reiss LAJ. Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society of America 2024; 155:68-77. PMID: 38174963. PMCID: PMC10990566. DOI: 10.1121/10.0024245.
Abstract
Hearing-impaired (HI) listeners have been shown to exhibit increased fusion of dichotic vowels, even with different fundamental frequency (F0), leading to binaural spectral averaging and interference. To determine if similar fusion and averaging occurs for consonants, four natural and synthesized stop consonants (/pa/, /ba/, /ka/, /ga/) at three F0s of 74, 106, and 185 Hz were presented dichotically-with ΔF0 varied-to normal-hearing (NH) and HI listeners. Listeners identified the one or two consonants perceived, and response options included /ta/ and /da/ as fused percepts. As ΔF0 increased, both groups showed decreases in fusion and increases in percent correct identification of both consonants, with HI listeners displaying similar fusion but poorer identification. Both groups exhibited spectral averaging (psychoacoustic fusion) of place of articulation but phonetic feature fusion for differences in voicing. With synthetic consonants, NH subjects showed increased fusion and decreased identification. Most HI listeners were unable to discriminate the synthetic consonants. The findings suggest smaller differences between groups in consonant fusion than vowel fusion, possibly due to the presence of more cues for segregation in natural speech or reduced reliance on spectral cues for consonant perception. The inability of HI listeners to discriminate synthetic consonants suggests a reliance on cues other than formant transitions for consonant discrimination.
Affiliations
- Nishad C Sathe: Oregon Health and Science University, Portland, Oregon 97239, USA
- Alexander Kain: Oregon Health and Science University, Portland, Oregon 97239, USA
- Lina A J Reiss: Oregon Health and Science University, Portland, Oregon 97239, USA
4. van der Willigen RF, Versnel H, van Opstal AJ. Spectral-temporal processing of naturalistic sounds in monkeys and humans. Journal of Neurophysiology 2024; 131:38-63. PMID: 37965933. PMCID: PMC11305640. DOI: 10.1152/jn.00129.2023.
Abstract
Human speech and vocalizations in animals are rich in joint spectrotemporal (S-T) modulations, wherein acoustic changes in both frequency and time are functionally related. In principle, the primate auditory system could process these complex dynamic sounds based on either an inseparable representation of S-T features or, alternatively, a separable representation. The separability hypothesis implies an independent processing of spectral and temporal modulations. We collected comparative data on the S-T hearing sensitivity in humans and macaque monkeys to a wide range of broadband dynamic spectrotemporal ripple stimuli employing a yes-no signal-detection task. Ripples were systematically varied, as a function of density (spectral modulation frequency), velocity (temporal modulation frequency), or modulation depth, to cover a listener's full S-T modulation sensitivity, derived from a total of 87 psychometric ripple detection curves. Audiograms were measured to control for normal hearing. Determined were hearing thresholds, reaction time distributions, and S-T modulation transfer functions (MTFs), both at the ripple detection thresholds and at suprathreshold modulation depths. Our psychophysically derived MTFs are consistent with the hypothesis that both monkeys and humans employ analogous perceptual strategies: S-T acoustic information is primarily processed separably. Singular value decomposition (SVD), however, revealed a small, but consistent, inseparable spectral-temporal interaction. Finally, SVD analysis of the known visual spatiotemporal contrast sensitivity function (CSF) highlights that human vision is space-time inseparable to a much larger extent than is the case for S-T sensitivity in hearing. Thus, the specificity with which the primate brain encodes natural sounds appears to be less strict than is required to adequately deal with natural images.
NEW & NOTEWORTHY We provide comparative data on primate audition of naturalistic sounds comprising hearing thresholds, reaction time distributions, and spectral-temporal modulation transfer functions. Our psychophysical experiments demonstrate that auditory information is primarily processed in a spectral-temporal-independent manner by both monkeys and humans. Singular value decomposition of known visual spatiotemporal contrast sensitivity, in comparison to our auditory spectral-temporal sensitivity, revealed a striking contrast in how the brain encodes natural sounds as opposed to natural images, as vision appears to be space-time inseparable.
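The separability question is assessed with singular value decomposition of the measured modulation transfer function. A minimal sketch of how an SVD-based separability index can be computed, assuming the MTF has been tabulated as a density-by-velocity matrix (the paper's exact analysis pipeline is not reproduced here):
```python
import numpy as np

def separability_index(mtf: np.ndarray) -> float:
    """Fraction of MTF energy captured by the best rank-1 (separable) approximation."""
    s = np.linalg.svd(mtf, compute_uv=False)   # singular values in descending order
    return float(s[0] ** 2 / np.sum(s ** 2))   # 1.0 means fully spectral-temporal separable

# A perfectly separable MTF is the outer product of a spectral and a temporal profile.
spectral = np.array([1.0, 0.8, 0.5, 0.2])       # sensitivity vs. ripple density
temporal = np.array([1.0, 0.9, 0.6, 0.3, 0.1])  # sensitivity vs. ripple velocity
print(separability_index(np.outer(spectral, temporal)))  # prints 1.0
```
Values below 1.0 indicate residual spectral-temporal interaction, the small but consistent effect the abstract reports.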
Affiliations
- Robert F van der Willigen: Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands; School of Communication, Media and Information Technology, Rotterdam University of Applied Sciences, Rotterdam, The Netherlands; Research Center Creating 010, Rotterdam University of Applied Sciences, Rotterdam, The Netherlands
- Huib Versnel: Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands; Department of Otorhinolaryngology and Head & Neck Surgery, UMC Utrecht Brain Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- A John van Opstal: Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
5. Porto L, Wouters J, van Wieringen A. Speech perception in noise, working memory, and attention in children: A scoping review. Hearing Research 2023; 439:108883. PMID: 37722287. DOI: 10.1016/j.heares.2023.108883.
Abstract
PURPOSE Speech perception in noise is an everyday occurrence for adults and children alike. The factors that influence how well individuals cope with noise during spoken communication are not well understood, particularly in the case of children. This article aims to review the available evidence on how working memory and attention play a role in children's speech perception in noise, how characteristics of measures affect results, and how this relationship differs in non-typical populations. METHOD This article is a scoping review of the literature available on PubMed. Forty articles were included for meeting the inclusion criteria of including children as participants, some measure of speech perception in noise, some measure of attention and/or working memory, and some attempt to establish relationships between the measures. Findings were charted and presented keeping in mind how they relate to the research questions. RESULTS The majority of studies report that attention and especially working memory are involved in speech perception in noise by children. We provide an overview of the impact of certain task characteristics on findings across the literature, as well as how these affect non-typical populations. CONCLUSION While most of the work reviewed here provides evidence suggesting that working memory and attention are important abilities employed by children in overcoming the difficulties imposed by noise during spoken communication, methodological variability still prevents a clearer picture from emerging.
Affiliations
- Lyan Porto: Department of Neurosciences, Research Group Experimental Oto-Rhino-Laryngology, University of Leuven, O&N II, Herestraat 49, Leuven 3000, Belgium
- Jan Wouters: Department of Neurosciences, Research Group Experimental Oto-Rhino-Laryngology, University of Leuven, O&N II, Herestraat 49, Leuven 3000, Belgium
- Astrid van Wieringen: Department of Neurosciences, Research Group Experimental Oto-Rhino-Laryngology, University of Leuven, O&N II, Herestraat 49, Leuven 3000, Belgium; Department of Special Needs Education, University of Oslo, Norway
6. Yasmin S, Irsik VC, Johnsrude IS, Herrmann B. The effects of speech masking on neural tracking of acoustic and semantic features of natural speech. Neuropsychologia 2023; 186:108584. PMID: 37169066. DOI: 10.1016/j.neuropsychologia.2023.108584.
Abstract
Listening environments contain background sounds that mask speech and lead to communication challenges. Sensitivity to slow acoustic fluctuations in speech can help segregate speech from background noise. Semantic context can also facilitate speech perception in noise, for example, by enabling prediction of upcoming words. However, not much is known about how different degrees of background masking affect the neural processing of acoustic and semantic features during naturalistic speech listening. In the current electroencephalography (EEG) study, participants listened to engaging, spoken stories masked at different levels of multi-talker babble to investigate how neural activity in response to acoustic and semantic features changes with acoustic challenges, and how such effects relate to speech intelligibility. The pattern of neural response amplitudes associated with both acoustic and semantic speech features across masking levels was U-shaped, such that amplitudes were largest for moderate masking levels. This U-shape may be due to increased attentional focus when speech comprehension is challenging, but manageable. The latency of the neural responses increased linearly with increasing background masking, and neural latency change associated with acoustic processing most closely mirrored the changes in speech intelligibility. Finally, tracking responses related to semantic dissimilarity remained robust until severe speech masking (-3 dB SNR). The current study reveals that neural responses to acoustic features are highly sensitive to background masking and decreasing speech intelligibility, whereas neural responses to semantic features are relatively robust, suggesting that individuals track the meaning of the story well even in moderate background sound.
Affiliations
- Sonia Yasmin: Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON N6A 3K7, Canada
- Vanessa C Irsik: Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON N6A 3K7, Canada
- Ingrid S Johnsrude: Department of Psychology & the Brain and Mind Institute, The University of Western Ontario, London, ON N6A 3K7, Canada; School of Communication and Speech Disorders, The University of Western Ontario, London, ON N6A 5B7, Canada
- Björn Herrmann: Rotman Research Institute, Baycrest, Toronto, ON M6A 2E1, Canada; Department of Psychology, University of Toronto, Toronto, ON M5S 1A1, Canada
7. Hou J, Chen C, Dong Q. Early musical training benefits to non-musical cognitive ability associated with the Gestalt principles. Frontiers in Psychology 2023; 14:1134116. PMID: 37554141. PMCID: PMC10405822. DOI: 10.3389/fpsyg.2023.1134116.
Abstract
Musical training has been shown to facilitate music perception, which involves detecting the consistencies, boundaries, and segmentations in pieces of music that are associated with the Gestalt principles. The current study tested whether musical training also benefits non-musical cognitive abilities that draw on Gestalt principles. Three groups of Chinese participants (with early, late, and no musical training) were compared on the Motor-Free Visual Perception Test (MVPT). Participants with early musical training performed significantly better on the Gestalt-like Visual Closure subtest than those with late or no musical training, but no significant differences were identified on the other, Gestalt-unlike subtests of the MVPT (Visual Memory, Visual Discrimination, Spatial Relationship, and Figure Ground). These findings suggest a benefit of early musical training for non-musical cognitive abilities associated with the Gestalt principles.
Affiliations
- Jiancheng Hou: Research Center for Cross-Straits Cultural Development, Fujian Normal University, Fuzhou, Fujian, China; State Key Lab of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China; School of Public Health, Indiana University Bloomington, Bloomington, IN, United States
- Chuansheng Chen: Department of Psychological Science, University of California, Irvine, CA, United States
- Qi Dong: State Key Lab of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
8. Nittrouer S, Lowenstein JH. Recognition of Sentences With Complex Syntax in Speech Babble by Adolescents With Normal Hearing or Cochlear Implants. Journal of Speech, Language, and Hearing Research 2023; 66:1110-1135. PMID: 36758200. PMCID: PMC10205108. DOI: 10.1044/2022_jslhr-22-00407.
Abstract
PURPOSE General language abilities of children with cochlear implants have been thoroughly investigated, especially at young ages, but far less is known about how well they process language in real-world settings, especially in higher grades. This study addressed this gap in knowledge by examining recognition of sentences with complex syntactic structures in backgrounds of speech babble by adolescents with cochlear implants, and peers with normal hearing. DESIGN Two experiments were conducted. First, new materials were developed using young adults with normal hearing as the normative sample, creating a corpus of sentences with controlled, but complex syntactic structures presented in three kinds of babble that varied in voice gender and number of talkers. Second, recognition by adolescents with normal hearing or cochlear implants was examined for these new materials and for sentence materials used with these adolescents at younger ages. Analyses addressed three objectives: (1) to assess the stability of speech recognition across a multiyear age range, (2) to evaluate speech recognition of sentences with complex syntax in babble, and (3) to explore how bottom-up and top-down mechanisms account for performance under these conditions. RESULTS Results showed: (1) Recognition was stable across the ages of 10-14 years for both groups. (2) Adolescents with normal hearing performed similarly to young adults with normal hearing, showing effects of syntactic complexity and background babble; adolescents with cochlear implants showed poorer recognition overall, and diminished effects of both factors. (3) Top-down language and working memory primarily explained recognition for adolescents with normal hearing, but the bottom-up process of perceptual organization primarily explained recognition for adolescents with cochlear implants. CONCLUSIONS Comprehension of language in real-world settings relies on different mechanisms for adolescents with cochlear implants than for adolescents with normal hearing. A novel finding was that perceptual organization is a critical factor. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.21965228.
Affiliations
- Susan Nittrouer: Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville
- Joanna H. Lowenstein: Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville
9. Ozmeral EJ, Menon KN. Selective auditory attention modulates cortical responses to sound location change for speech in quiet and in babble. PLoS One 2023; 18:e0268932. PMID: 36638116. PMCID: PMC9838839. DOI: 10.1371/journal.pone.0268932.
Abstract
Listeners use the spatial location or change in spatial location of coherent acoustic cues to aid in auditory object formation. From stimulus-evoked onset responses in normal-hearing listeners using electroencephalography (EEG), we have previously shown measurable tuning to stimuli changing location in quiet, revealing a potential window into the cortical representations of auditory scene analysis. These earlier studies used non-fluctuating, spectrally narrow stimuli, so it was still unknown whether previous observations would translate to speech stimuli, and whether responses would be preserved for stimuli in the presence of background maskers. To examine the effects that selective auditory attention and interferers have on object formation, we measured cortical responses to speech changing location in the free field with and without background babble (+6 dB SNR) during both passive and active conditions. Active conditions required listeners to respond to the onset of the speech stream when it occurred at a new location, explicitly indicating 'yes' or 'no' to whether the stimulus occurred at a block-specific location either 30 degrees to the left or right of midline. In the aggregate, results show similar evoked responses to speech stimuli changing location in quiet compared to babble background. However, the effect of the two background environments diverges somewhat when considering the magnitude and direction of the location change and where the subject was attending. In quiet, attention to the right hemifield appeared to evoke a stronger response than attention to the left hemifield when speech shifted in the rightward direction. No such difference was found in babble conditions. Therefore, consistent with challenges associated with cocktail party listening, directed spatial attention could be compromised in the presence of stimulus noise and likely leads to poorer use of spatial cues in auditory streaming.
Affiliations
- Erol J Ozmeral: Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States of America
- Katherine N Menon: Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, United States of America
10. Zheng C, Zhang H, Liu W, Luo X, Li A, Li X, Moore BCJ. Sixty Years of Frequency-Domain Monaural Speech Enhancement: From Traditional to Deep Learning Methods. Trends in Hearing 2023; 27:23312165231209913. PMID: 37956661. PMCID: PMC10658184. DOI: 10.1177/23312165231209913.
Abstract
Frequency-domain monaural speech enhancement has been extensively studied for over 60 years, and a great number of methods have been proposed and applied to many devices. In the last decade, monaural speech enhancement has made tremendous progress with the advent and development of deep learning, and performance using such methods has been greatly improved relative to traditional methods. This survey paper first provides a comprehensive overview of traditional and deep-learning methods for monaural speech enhancement in the frequency domain. The fundamental assumptions of each approach are then summarized and analyzed to clarify their limitations and advantages. A comprehensive evaluation of some typical methods was conducted using the WSJ + Deep Noise Suppression (DNS) challenge and Voice Bank + DEMAND datasets to give an intuitive and unified comparison. The benefits of monaural speech enhancement methods using objective metrics relevant for normal-hearing and hearing-impaired listeners were evaluated. The objective test results showed that compression of the input features was important for simulated normal-hearing listeners but not for simulated hearing-impaired listeners. Potential future research and development topics in monaural speech enhancement are suggested.
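As a point of reference for the traditional methods this review contrasts with deep learning, a minimal magnitude spectral-subtraction sketch is shown below (an illustration of the classic frequency-domain pipeline only; it is not code from the paper, and the frame length, noise-estimation window, and spectral floor are arbitrary assumptions):
```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, n_noise_frames=10, floor=0.05, nperseg=512):
    """Very basic magnitude spectral subtraction with a noise estimate from the first frames."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)                        # analysis STFT
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)  # assumes a speech-free lead-in
    clean_mag = np.maximum(mag - noise_mag, floor * mag)             # subtract with a spectral floor
    _, y = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)  # resynthesis with noisy phase
    return y[: len(x)]
```
Deep-learning enhancers surveyed in the paper typically replace the fixed subtraction rule with a learned time-frequency gain or mapping, while keeping the same analysis-modification-resynthesis structure.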
Affiliations
- Chengshi Zheng: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Huiyong Zhang: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Wenzhe Liu: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Xiaoxue Luo: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Andong Li: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Xiaodong Li: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Brian C. J. Moore: Cambridge Hearing Group, Department of Psychology, University of Cambridge, Cambridge, UK
11. Zheng C, Xu C, Wang M, Li X, Moore BCJ. Evaluation of deep marginal feedback cancellation for hearing aids using speech and music. Trends in Hearing 2023; 27:23312165231192290. PMID: 37551089. PMCID: PMC10408330. DOI: 10.1177/23312165231192290.
Abstract
Speech and music both play fundamental roles in daily life. Speech is important for communication while music is important for relaxation and social interaction. Both speech and music have a large dynamic range. This does not pose problems for listeners with normal hearing. However, for hearing-impaired listeners, elevated hearing thresholds may result in low-level portions of sound being inaudible. Hearing aids with frequency-dependent amplification and amplitude compression can partly compensate for this problem. However, the gain required for low-level portions of sound to compensate for the hearing loss can be larger than the maximum stable gain of a hearing aid, leading to acoustic feedback. Feedback control is used to avoid such instability, but this can lead to artifacts, especially when the gain is only just below the maximum stable gain. We previously proposed a deep-learning method called DeepMFC for controlling feedback and reducing artifacts and showed that when the sound source was speech DeepMFC performed much better than traditional approaches. However, its performance using music as the sound source was not assessed and the way in which it led to improved performance for speech was not determined. The present paper reveals how DeepMFC addresses feedback problems and evaluates DeepMFC using speech and music as sound sources with both objective and subjective measures. DeepMFC achieved good performance for both speech and music when it was trained with matched training materials. When combined with an adaptive feedback canceller it provided over 13 dB of additional stable gain for hearing-impaired listeners.
Affiliations
- Chengshi Zheng: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Chenyang Xu: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Meihuang Wang: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Xiaodong Li: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Brian C. J. Moore: Cambridge Hearing Group, Department of Psychology, University of Cambridge, Cambridge, UK
12. Wong LLN, Zhu S, Chen Y, Li X, Chan WMC. Discrimination of consonants in quiet and in noise in Mandarin-speaking children with normal hearing. PLoS One 2023; 18:e0283198. PMID: 36943841. PMCID: PMC10030016. DOI: 10.1371/journal.pone.0283198.
Abstract
OBJECTIVE Given the critical role of consonants in speech perception and the lack of knowledge on consonant perception in noise in Mandarin-speaking children, the current study aimed to investigate Mandarin consonant discrimination in normal-hearing children, in relation to the effects of age and signal-to-noise ratios (S/N). DESIGN A discrimination task consisting of 33 minimal pairs in monosyllabic words was designed to explore the development of consonant discrimination in five test conditions: 0, -5, -10, -15 dB S/Ns, and quiet. STUDY SAMPLE Forty Mandarin-speaking, normal-hearing children aged from 4;0 to 8;9 in one-year age increments were recruited, and their performance was compared to that of 10 adult listeners. RESULTS Significant main effects of age and test condition, as well as an interaction between these variables, were noted. Consonant discrimination in quiet and in noise improved as children became older. Consonants that were difficult to discriminate in quiet and in noise were mainly velar contrasts. Noise seemed to have less effect on the discrimination of affricates and fricatives, and plosives appeared to be more difficult to discriminate in noise than in quiet. Place contrasts between alveolar and palato-alveolar consonants were difficult in quiet. CONCLUSIONS The findings are the first to reveal typical perceptual development of Mandarin consonant discrimination in children and can serve as a reference for comparison with children with disordered perceptual development, such as those with hearing loss.
Affiliations
- Lena L N Wong: Faculty of Education, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Shufeng Zhu: Faculty of Education, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Yuan Chen: Department of Special Education and Counselling, The Education University of Hong Kong, Hong Kong SAR, China
- Xinxin Li: Faculty of Education, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Wing M C Chan: Faculty of Education, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
13. Szalárdy O, Tóth B, Farkas D, Orosz G, Winkler I. Do we parse the background into separate streams in the cocktail party? Frontiers in Human Neuroscience 2022; 16:952557. DOI: 10.3389/fnhum.2022.952557.
Abstract
In the cocktail party situation, people with normal hearing usually follow a single speaker among multiple concurrent ones. However, there is no agreement in the literature as to whether the background is segregated into multiple streams/speakers. The current study varied the number of concurrent speech streams and investigated target detection and memory for the contents of a target stream as well as the processing of distractors. A male-voiced target stream was either presented alone (single-speech), together with one male-voiced distractor (one-distractor), or with a male- and a female-voiced distractor (two-distractor). Behavioral measures of target detection and content tracking performance as well as target- and distractor-detection-related event-related brain potentials (ERPs) were assessed. We found that the N2 amplitude decreased whereas the P3 amplitude increased from the single-speech to the concurrent speech streams conditions. Importantly, the behavioral effect of distractors differed between the conditions with one vs. two distractor speech streams, and the non-zero voltages in the N2 time window for distractor numerals and in the P3 time window for syntactic violations appearing in the non-target speech stream significantly differed between the one- and two-distractor conditions for the same (male) speaker. These results support the notion that the two background speech streams are segregated, as they show that distractors and syntactic violations appearing in the non-target streams are processed even when two non-target speech streams are delivered together with the target stream.
14. Buss E, Miller MK, Leibold LJ. Maturation of Speech-in-Speech Recognition for Whispered and Voiced Speech. Journal of Speech, Language, and Hearing Research 2022; 65:3117-3128. PMID: 35868232. PMCID: PMC9911131. DOI: 10.1044/2022_jslhr-21-00620.
Abstract
PURPOSE Some speech recognition data suggest that children rely less on voice pitch and harmonicity to support auditory scene analysis than adults. Two experiments evaluated development of speech-in-speech recognition using voiced speech and whispered speech, which lacks the harmonic structure of voiced speech. METHOD Listeners were 5- to 7-year-olds and adults with normal hearing. Targets were monosyllabic words organized into three-word sets that differ in vowel content. Maskers were two-talker or one-talker streams of speech. Targets and maskers were recorded by different female talkers in both voiced and whispered speaking styles. For each masker, speech reception thresholds (SRTs) were measured in all four combinations of target and masker speech, including matched and mismatched speaking styles for the target and masker. RESULTS Children performed more poorly than adults overall. For the two-talker masker, this age effect was smaller for the whispered target and masker than for the other three conditions. Children's SRTs in this condition were predominantly positive, suggesting that they may have relied on a wholistic listening strategy rather than segregating the target from the masker. For the one-talker masker, age effects were consistent across the four conditions. Reduced informational masking for the one-talker masker could be responsible for differences in age effects for the two maskers. A benefit of mismatching the target and masker speaking style was observed for both target styles in the two-talker masker and for the voiced targets in the one-talker masker. CONCLUSIONS These results provide no compelling evidence that young school-age children and adults are differentially sensitive to the cues present in voiced and whispered speech. Both groups benefit from mismatches in speaking style under some conditions. These benefits could be due to a combination of reduced perceptual similarity, harmonic cancelation, and differences in energetic masking.
Affiliations
- Emily Buss: Department of Otolaryngology-Head and Neck Surgery, University of North Carolina at Chapel Hill
- Margaret K. Miller: Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
- Lori J. Leibold: Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
15. Age-related differences in the neural network interactions underlying the predictability gain. Cortex 2022; 154:269-286. DOI: 10.1016/j.cortex.2022.05.020.
16. Wilms V, Drijvers L, Brouwer S. The Effects of Iconic Gestures and Babble Language on Word Intelligibility in Sentence Context. Journal of Speech, Language, and Hearing Research 2022; 65:1822-1838. PMID: 35439423. DOI: 10.1044/2022_jslhr-21-00387.
Abstract
PURPOSE This study investigated to what extent iconic co-speech gestures help word intelligibility in sentence context in two different linguistic maskers (native vs. foreign). It was hypothesized that sentence recognition improves with the presence of iconic co-speech gestures and with foreign compared to native babble. METHOD Thirty-two native Dutch participants performed a Dutch word recognition task in context in which they were presented with videos in which an actress uttered short Dutch sentences (e.g., Ze begint te openen, "She starts to open"). Participants were presented with a total of six audiovisual conditions: no background noise (i.e., clear condition) without gesture, no background noise with gesture, French babble without gesture, French babble with gesture, Dutch babble without gesture, and Dutch babble with gesture; and they were asked to type down what was said by the Dutch actress. The accurate identification of the action verbs at the end of the target sentences was measured. RESULTS The results demonstrated that performance on the task was better in the gesture compared to the nongesture conditions (i.e., gesture enhancement effect). In addition, performance was better in French babble than in Dutch babble. CONCLUSIONS Listeners benefit from iconic co-speech gestures during communication and from foreign background speech compared to native. These insights into multimodal communication may be valuable to everyone who engages in multimodal communication and especially to a public who often works in public places where competing speech is present in the background.
Affiliations
- Veerle Wilms: Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
- Linda Drijvers: Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Susanne Brouwer: Centre for Language Studies, Radboud University, Nijmegen, the Netherlands
17. Roberts B, Summers RJ, Bailey PJ. Effects of stimulus naturalness and contralateral interferers on lexical bias in consonant identification. The Journal of the Acoustical Society of America 2022; 151:3369. PMID: 35649936. DOI: 10.1121/10.0011395.
Abstract
Lexical bias is the tendency to perceive an ambiguous speech sound as a phoneme completing a word; more ambiguity typically causes greater reliance on lexical knowledge. A speech sound ambiguous between /g/ and /k/ is more likely to be perceived as /g/ before /ɪft/ and as /k/ before /ɪs/. The magnitude of this difference-the Ganong shift-increases when high cognitive load limits available processing resources. The effects of stimulus naturalness and informational masking on Ganong shifts and reaction times were explored. Tokens between /gɪ/ and /kɪ/ were generated using morphing software, from which two continua were created ("giss"-"kiss" and "gift"-"kift"). In experiment 1, Ganong shifts were considerably larger for sine- than noise-vocoded versions of these continua, presumably because the spectral sparsity and unnatural timbre of the former increased cognitive load. In experiment 2, noise-vocoded stimuli were presented alone or accompanied by contralateral interferers with constant within-band amplitude envelope, or within-band envelope variation that was the same or different across bands. The latter, with its implied spectro-temporal variation, was predicted to cause the greatest cognitive load. Reaction-time measures matched this prediction; Ganong shifts showed some evidence of greater lexical bias for frequency-varying interferers, but were influenced by context effects and diminished over time.
Affiliations
- Brian Roberts: School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers: School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Peter J Bailey: Department of Psychology, University of York, Heslington, York YO10 5DD, United Kingdom
18. Luberadzka J, Kayser H, Hohmann V. Making sense of periodicity glimpses in a prediction-update loop: A computational model of attentive voice tracking. The Journal of the Acoustical Society of America 2022; 151:712. PMID: 35232067. PMCID: PMC9088677. DOI: 10.1121/10.0009337.
Abstract
Humans are able to follow a speaker even in challenging acoustic conditions. The perceptual mechanisms underlying this ability remain unclear. A computational model of attentive voice tracking, consisting of four computational blocks: (1) sparse periodicity-based auditory features (sPAF) extraction, (2) foreground-background segregation, (3) state estimation, and (4) top-down knowledge, is presented. The model connects the theories about auditory glimpses, foreground-background segregation, and Bayesian inference. It is implemented with the sPAF, sequential Monte Carlo sampling, and probabilistic voice models. The model is evaluated by comparing it with the human data obtained in the study by Woods and McDermott [Curr. Biol. 25(17), 2238-2246 (2015)], which measured the ability to track one of two competing voices with time-varying parameters [fundamental frequency (F0) and formants (F1,F2)]. Three model versions were tested, which differ in the type of information used for the segregation: version (a) uses the oracle F0, version (b) uses the estimated F0, and version (c) uses the spectral shape derived from the estimated F0 and oracle F1 and F2. Version (a) simulates the optimal human performance in conditions with the largest separation between the voices, version (b) simulates the conditions in which the separation is not sufficient to follow the voices, and version (c) is closest to the human performance for moderate voice separation.
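The state-estimation block relies on sequential Monte Carlo sampling. A generic bootstrap particle filter for tracking a slowly varying F0 from noisy frame-wise pitch estimates is sketched below; the published model's sparse periodicity features and probabilistic voice models are far more elaborate, and all constants here are arbitrary assumptions:
```python
import numpy as np

rng = np.random.default_rng(0)

def track_f0(observations, n_particles=500, drift_sd=3.0, obs_sd=10.0):
    """Posterior-mean F0 track from a sequence of noisy frame-wise F0 estimates (Hz)."""
    particles = rng.uniform(80.0, 300.0, n_particles)   # initial F0 hypotheses
    estimates = []
    for z in observations:
        particles = particles + rng.normal(0.0, drift_sd, n_particles)         # random-walk prediction
        weights = np.exp(-0.5 * ((z - particles) / obs_sd) ** 2) + 1e-300      # Gaussian likelihood
        weights /= weights.sum()
        estimates.append(float(np.sum(weights * particles)))                   # posterior-mean estimate
        particles = particles[rng.choice(n_particles, n_particles, p=weights)] # multinomial resampling
    return np.array(estimates)

# Example: a voice gliding from 120 to 180 Hz, observed with noisy pitch estimates.
true_f0 = np.linspace(120.0, 180.0, 50)
print(track_f0(true_f0 + rng.normal(0.0, 8.0, 50))[-5:])
```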
Affiliations
- Joanna Luberadzka: Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Hendrik Kayser: Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Volker Hohmann: Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
19. Bilinguals Show Proportionally Greater Benefit From Visual Speech Cues and Sentence Context in Their Second Compared to Their First Language. Ear and Hearing 2021; 43:1316-1326. PMID: 34966162. DOI: 10.1097/aud.0000000000001182.
Abstract
OBJECTIVES Speech perception in noise is challenging, but evidence suggests that it may be facilitated by visual speech cues (e.g., lip movements) and supportive sentence context in native speakers. Comparatively few studies have investigated speech perception in noise in bilinguals, and little is known about the impact of visual speech cues and supportive sentence context in a first language compared to a second language within the same individual. The current study addresses this gap by directly investigating the extent to which bilinguals benefit from visual speech cues and supportive sentence context under similarly noisy conditions in their first and second language. DESIGN Thirty young adult English-French/French-English bilinguals were recruited from the undergraduate psychology program at Concordia University and from the Montreal community. They completed a speech perception in noise task during which they were presented with video-recorded sentences and instructed to repeat the last word of each sentence out loud. Sentences were presented in three different modalities: visual-only, auditory-only, and audiovisual. Additionally, sentences had one of two levels of context: moderate (e.g., "In the woods, the hiker saw a bear.") and low (e.g., "I had not thought about that bear."). Each participant completed this task in both their first and second language; crucially, the level of background noise was calibrated individually for each participant and was the same throughout the first language and second language (L2) portions of the experimental task. RESULTS Overall, speech perception in noise was more accurate in bilinguals' first language compared to the second. However, participants benefited from visual speech cues and supportive sentence context to a proportionally greater extent in their second language compared to their first. At the individual level, performance during the speech perception in noise task was related to aspects of bilinguals' experience in their second language (i.e., age of acquisition, relative balance between the first and the second language). CONCLUSIONS Bilinguals benefit from visual speech cues and sentence context in their second language during speech in noise and do so to a greater extent than in their first language given the same level of background noise. Together, this indicates that L2 speech perception can be conceptualized within an inverse effectiveness hypothesis framework with a complex interplay of sensory factors (i.e., the quality of the auditory speech signal and visual speech cues) and linguistic factors (i.e., presence or absence of supportive context and L2 experience of the listener).
20. Roberts B, Summers RJ, Bailey PJ. Mandatory dichotic integration of second-formant information: Contralateral sine bleats have predictable effects on consonant place judgments. The Journal of the Acoustical Society of America 2021; 150:3693. PMID: 34852626. DOI: 10.1121/10.0007132.
Abstract
Speech-on-speech informational masking arises because the interferer disrupts target processing (e.g., capacity limitations) or corrupts it (e.g., intrusions into the target percept); the latter should produce predictable errors. Listeners identified the consonant in monaural buzz-excited three-formant analogues of approximant-vowel syllables, forming a place of articulation series (/w/-/l/-/j/). There were two 11-member series; the vowel was either high-front or low-back. Series members shared formant-amplitude contours, fundamental frequency, and F1+F3 frequency contours; they were distinguished solely by the F2 frequency contour before the steady portion. Targets were always presented in the left ear. For each series, F2 frequency and amplitude contours were also used to generate interferers with altered source properties-sine-wave analogues of F2 (sine bleats) matched to their buzz-excited counterparts. Accompanying each series member with a fixed mismatched sine bleat in the contralateral ear produced systematic and predictable effects on category judgments; these effects were usually largest for bleats involving the fastest rate or greatest extent of frequency change. Judgments of isolated sine bleats using the three place labels were often unsystematic or arbitrary. These results indicate that informational masking by interferers involved corruption of target processing as a result of mandatory dichotic integration of F2 information, despite the grouping cues disfavoring this integration.
Affiliations
- Brian Roberts: School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers: School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Peter J Bailey: Department of Psychology, University of York, Heslington, York YO10 5DD, United Kingdom
21. Fitzhugh MC, LaCroix AN, Rogalsky C. Distinct Contributions of Working Memory and Attentional Control to Sentence Comprehension in Noise in Persons With Stroke. Journal of Speech, Language, and Hearing Research 2021; 64:3230-3241. PMID: 34284642. PMCID: PMC8740654. DOI: 10.1044/2021_jslhr-20-00694.
Abstract
Purpose Sentence comprehension deficits are common following a left hemisphere stroke and have primarily been investigated under optimal listening conditions. However, ample work in neurotypical controls indicates that background noise affects sentence comprehension and the cognitive resources it engages. The purpose of this study was to examine how background noise affects sentence comprehension poststroke using both energetic and informational maskers. We further sought to identify whether sentence comprehension in noise abilities are related to poststroke cognitive abilities, specifically working memory and/or attentional control. Method Twenty persons with chronic left hemisphere stroke completed a sentence-picture matching task where they listened to sentences presented in three types of maskers: multispeakers, broadband noise, and silence (control condition). Working memory, attentional control, and hearing thresholds were also assessed. Results A repeated-measures analysis of variance identified participants to have the greatest difficulty with the multispeakers condition, followed by broadband noise and then silence. Regression analyses, after controlling for age and hearing ability, identified working memory as a significant predictor of listening engagement (i.e., mean reaction time) in broadband noise and multispeakers and attentional control as a significant predictor of informational masking effects (computed as a reaction time difference score where broadband noise is subtracted from multispeakers). Conclusions The results from this study indicate that background noise impacts sentence comprehension abilities poststroke and that these difficulties may arise due to deficits in the cognitive resources supporting sentence comprehension and not other factors such as age or hearing. These findings also highlight a relationship between working memory abilities and sentence comprehension in background noise. We further suggest that attentional control abilities contribute to sentence comprehension by supporting the additional demands associated with informational masking. Supplemental Material https://doi.org/10.23641/asha.14984511.
Affiliations
- Megan C. Fitzhugh: Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA
22. Ozmeral EJ, Eddins DA, Eddins AC. Selective auditory attention modulates cortical responses to sound location change in younger and older adults. Journal of Neurophysiology 2021; 126:803-815. PMID: 34288759. DOI: 10.1152/jn.00609.2020.
Abstract
The present study measured scalp potentials in response to low-frequency, narrowband noise bursts changing location in the front, azimuthal plane. At question was whether selective auditory attention has a modulatory effect on the cortical encoding of spatial change and whether older listeners with normal-hearing thresholds would show depressed cortical representation for spatial changes relative to younger listeners. Young and older normal-hearing listeners were instructed to either passively listen to the stimulus presentation or actively attend to a single location (either 30° left or right of midline) and detect when a noise stream moved to the attended location. Prominent peaks of the electroencephalographic scalp waveforms were compared across groups, locations, and attention conditions. In addition, an opponent-channel model of spatial coding was performed to capture the effect of attention on spatial-change tuning. Younger listeners showed not only larger responses overall but a greater dynamic range in their response to location changes. Results suggest that younger listeners were acquiring and encoding key spatial cues at early cortical processing areas. On the other hand, each group exhibited modulatory effects of attention to spatial-change tuning, indicating that both younger and older listeners selectively attend to space in a manner that amplifies the available signal.NEW & NOTEWORTHY In complex acoustic scenes, listeners take advantage of spatial cues to selectively attend to sounds that are deemed immediately relevant. At the neural level, selective attention amplifies electrical responses to spatial changes. We tested whether older and younger listeners have comparable modulatory effects of attention to stimuli moving in the free field. Results indicate that although older listeners do have depressed overall responses, selective attention enhances spatial-change tuning in younger and older listeners alike.
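The opponent-channel analysis mentioned above can be illustrated with a toy model: two broadly tuned hemifield channels whose normalised difference codes azimuth. This sketch only conveys the general modelling idea, not the authors' fitted channel parameters:
```python
import numpy as np

def channel_response(azimuth_deg, preferred_deg, sigma_deg=40.0):
    """Gaussian-tuned activation of one broadly tuned spatial channel."""
    return np.exp(-0.5 * ((azimuth_deg - preferred_deg) / sigma_deg) ** 2)

def opponent_code(azimuth_deg):
    """Normalised right-minus-left channel difference; monotonic with azimuth across the front."""
    right = channel_response(azimuth_deg, +90.0)
    left = channel_response(azimuth_deg, -90.0)
    return (right - left) / (right + left)

for az in (-60, -30, 0, 30, 60):   # degrees azimuth, negative = left
    print(az, round(opponent_code(az), 3))
```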
Affiliations
- Erol J Ozmeral: Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- David A Eddins: Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
- Ann Clock Eddins: Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida
23
|
Bader M, Schröger E, Grimm S. Auditory Pattern Representations Under Conditions of Uncertainty-An ERP Study. Front Hum Neurosci 2021; 15:682820. [PMID: 34305553 PMCID: PMC8299531 DOI: 10.3389/fnhum.2021.682820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2021] [Accepted: 06/11/2021] [Indexed: 11/13/2022] Open
Abstract
The auditory system is able to recognize auditory objects and is thought to form predictive models of them even though the acoustic information arriving at our ears is often imperfect, intermixed, or distorted. We investigated implicit regularity extraction for acoustically intact versus disrupted six-tone sound patterns via event-related potentials (ERPs). In an exact-repetition condition, identical patterns were repeated; in two distorted-repetition conditions, one randomly chosen segment in each sound pattern was replaced either by white noise or by a wrong pitch. In a roving-standard paradigm, sound patterns were repeated 1-12 times (standards) in a row before a new pattern (deviant) occurred. The participants were not informed about the roving rule and had to detect rarely occurring loudness changes. Behavioral detectability of pattern changes was assessed in a subsequent behavioral task. Pattern changes (standard vs. deviant) elicited mismatch negativity (MMN) and P3a, and were behaviorally detected above the chance level in all conditions, suggesting that the auditory system extracts regularities despite distortions in the acoustic input. However, MMN and P3a amplitude were decreased by distortions. At the level of MMN, both types of distortions caused similar impairments, suggesting that auditory regularity extraction is largely determined by the stimulus statistics of matching information. At the level of P3a, wrong-pitch distortions caused larger decreases than white-noise distortions. Wrong-pitch distortions likely prevented the engagement of restoration mechanisms and the segregation of disrupted from true pattern segments, causing stronger informational interference with the relevant pattern information.
Affiliation(s)
- Maria Bader
- Cognitive and Biological Psychology, Institute of Psychology-Wilhelm Wundt, Faculty of Life Sciences, Leipzig University, Leipzig, Germany
- Erich Schröger
- Cognitive and Biological Psychology, Institute of Psychology-Wilhelm Wundt, Faculty of Life Sciences, Leipzig University, Leipzig, Germany
- Sabine Grimm
- Cognitive and Biological Psychology, Institute of Psychology-Wilhelm Wundt, Faculty of Life Sciences, Leipzig University, Leipzig, Germany

24
Fitzhugh MC, Schaefer SY, Baxter LC, Rogalsky C. Cognitive and neural predictors of speech comprehension in noisy backgrounds in older adults. LANGUAGE, COGNITION AND NEUROSCIENCE 2020; 36:269-287. [PMID: 34250179 PMCID: PMC8261331 DOI: 10.1080/23273798.2020.1828946] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/06/2019] [Accepted: 09/18/2020] [Indexed: 06/13/2023]
Abstract
Older adults often experience difficulties comprehending speech in noisy backgrounds, which hearing loss does not fully explain. It remains unknown how cognitive abilities, brain networks, and age-related hearing loss may uniquely contribute to speech in noise comprehension at the sentence level. In 31 older adults, using cognitive measures and resting-state fMRI, we investigated the cognitive and neural predictors of speech comprehension under energetic (broadband noise) and informational (multi-speaker) masking. Better hearing thresholds and greater working memory abilities were associated with better speech comprehension under energetic masking. Conversely, faster processing speed and stronger functional connectivity between frontoparietal and language networks were associated with better speech comprehension under informational masking. Our findings highlight the importance of the frontoparietal network in older adults' ability to comprehend speech in multi-speaker backgrounds and indicate that hearing loss and working memory contribute to speech comprehension under energetic, but not informational, masking.
Affiliation(s)
- Megan C. Fitzhugh
- Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA
- College of Health Solutions, Arizona State University, Tempe, AZ
- Sydney Y. Schaefer
- School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ

25
Roberts B, Summers RJ. Informational masking of speech depends on masker spectro-temporal variation but not on its coherence. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:2416. [PMID: 33138537 DOI: 10.1121/10.0002359] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/08/2020] [Accepted: 10/07/2020] [Indexed: 06/11/2023]
Abstract
The impact of an extraneous formant on intelligibility is affected by the extent (depth) of variation in its formant-frequency contour. Two experiments explored whether this impact also depends on masker spectro-temporal coherence, using a method ensuring that interference occurred only through informational masking. Targets were monaural three-formant analogues (F1+F2+F3) of natural sentences presented alone or accompanied by a contralateral competitor for F2 (F2C) that listeners must reject to optimize recognition. The standard F2C was created using the inverted F2 frequency contour and constant amplitude. Variants were derived by dividing F2C into abutting segments (100-200 ms, 10-ms rise/fall). Segments were presented either in the correct order (coherent) or in random order (incoherent), introducing abrupt discontinuities into the F2C frequency contour. F2C depth was also manipulated (0%, 50%, or 100%) prior to segmentation, and the frequency contour of each segment either remained time-varying or was set to constant at the geometric mean frequency of that segment. The extent to which F2C lowered keyword scores depended on segment type (frequency-varying vs constant) and depth, but not segment order. This outcome indicates that the impact on intelligibility depends critically on the overall amount of frequency variation in the competitor, but not its spectro-temporal coherence.
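For readers unfamiliar with how such competitors are built, the sketch below shows one way an inverted, depth-scaled formant-frequency contour of this general kind could be computed. The choice of a log-frequency inversion about the contour's geometric mean, and the function names, are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def scale_depth(f_contour_hz, depth=1.0):
    """Scale the extent of frequency variation about the contour's
    geometric mean on a log scale (0.0 = constant, 1.0 = original)."""
    log_f = np.log(f_contour_hz)
    mean_log_f = log_f.mean()
    return np.exp(mean_log_f + depth * (log_f - mean_log_f))

def invert_contour(f_contour_hz):
    """Reflect a formant-frequency contour about its geometric mean
    frequency on a log scale, preserving the extent of variation."""
    log_f = np.log(f_contour_hz)
    mean_log_f = log_f.mean()
    return np.exp(2.0 * mean_log_f - log_f)

# Example: a toy F2 contour; a 50%-depth competitor (F2C) variant.
f2 = np.array([1500.0, 1400.0, 1650.0, 1800.0, 1550.0])
f2c = invert_contour(scale_depth(f2, depth=0.5))
```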
Affiliation(s)
- Brian Roberts
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
- School of Psychology, Aston University, Birmingham B4 7ET, United Kingdom

26
Tóth B, Honbolygó F, Szalárdy O, Orosz G, Farkas D, Winkler I. The effects of speech processing units on auditory stream segregation and selective attention in a multi-talker (cocktail party) situation. Cortex 2020; 130:387-400. [DOI: 10.1016/j.cortex.2020.06.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2019] [Revised: 03/24/2020] [Accepted: 06/08/2020] [Indexed: 10/23/2022]

27
Eipert L, Klump GM. Uncertainty-based informational masking in a vowel discrimination task for young and old Mongolian gerbils. Hear Res 2020; 392:107959. [PMID: 32330738 DOI: 10.1016/j.heares.2020.107959] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/18/2019] [Revised: 03/13/2020] [Accepted: 04/01/2020] [Indexed: 11/25/2022]
Abstract
Informational masking emerges with processing of complex sounds in the central auditory system and can be affected by uncertainty emerging from trial-to-trial variation of stimulus features. Uncertainty can be non-informative but confusing and thus mask otherwise salient stimulus changes, resulting in increased discrimination thresholds. With increasing age, the ability to process such complex sound scenes degrades. Here, 6 young and 4 old gerbils were tested behaviorally in a vowel discrimination task. Animals were trained to discriminate between sequentially presented target and reference vowels of the vowel pair /I/-/i/. Reference and target vowels were generated by shifting the three formants of the reference vowel in steps towards the formants of the target vowels. Non-informative but distracting uncertainty was introduced by random changes in location, level, fundamental frequency, or all three features combined. Young gerbils tested with uncertainty for the target vowels, or for both target and reference vowels, showed similar informational masking effects in both conditions. Young and old gerbils were tested with uncertainty for the target vowels only. Old gerbils showed no threshold increase when discriminating vowels without uncertainty in comparison with young gerbils. When uncertainty was introduced, vowel discrimination thresholds increased for young and old gerbils, and thresholds increased most when all three uncertainty features were combined. Old gerbils were more susceptible to non-informative uncertainty, and their thresholds increased more than those of young gerbils. Gerbils' vowel discrimination thresholds are compared to human performance in the same task (Eipert et al., 2019).
Affiliation(s)
- Lena Eipert
- Cluster of Excellence Hearing4all, Division Animal Physiology and Behavior, Department of Neuroscience, School of Medicine and Health Sciences, University of Oldenburg, D-26111, Oldenburg, Germany
- Georg M Klump
- Cluster of Excellence Hearing4all, Division Animal Physiology and Behavior, Department of Neuroscience, School of Medicine and Health Sciences, University of Oldenburg, D-26111, Oldenburg, Germany

28
McCullagh EA, Rotschafer SE, Auerbach BD, Klug A, Kaczmarek LK, Cramer KS, Kulesza RJ, Razak KA, Lovelace JW, Lu Y, Koch U, Wang Y. Mechanisms underlying auditory processing deficits in Fragile X syndrome. FASEB J 2020; 34:3501-3518. [PMID: 32039504 DOI: 10.1096/fj.201902435r] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2019] [Revised: 12/31/2019] [Accepted: 01/18/2020] [Indexed: 01/14/2023]
Abstract
Autism spectrum disorders (ASD) are strongly associated with auditory hypersensitivity or hyperacusis (difficulty tolerating sounds). Fragile X syndrome (FXS), the most common monogenetic cause of ASD, has emerged as a powerful gateway for exploring underlying mechanisms of hyperacusis and auditory dysfunction in ASD. This review discusses examples of disruption of the auditory pathways in FXS at molecular, synaptic, and circuit levels in animal models as well as in FXS individuals. These examples highlight the involvement of multiple mechanisms, from aberrant synaptic development and ion channel deregulation of auditory brainstem circuits, to impaired neuronal plasticity and network hyperexcitability in the auditory cortex. Though a relatively new area of research, recent discoveries have increased interest in auditory dysfunction and mechanisms underlying hyperacusis in this disorder. This rapidly growing body of data has yielded novel research directions addressing critical questions regarding the timing and possible outcomes of human therapies for auditory dysfunction in ASD.
Affiliation(s)
- Elizabeth A McCullagh
- Department of Physiology and Biophysics, University of Colorado Anschutz, Aurora, CO, USA
- Department of Integrative Biology, Oklahoma State University, Stillwater, OK, USA
- Sarah E Rotschafer
- Department of Neurobiology and Behavior, University of California, Irvine, CA, USA
- Department of Biomedical Sciences, Mercer University School of Medicine, Savannah, GA, USA
- Benjamin D Auerbach
- Center for Hearing and Deafness, Department of Communicative Disorders & Sciences, SUNY at Buffalo, Buffalo, NY, USA
- Achim Klug
- Department of Physiology and Biophysics, University of Colorado Anschutz, Aurora, CO, USA
- Leonard K Kaczmarek
- Departments of Pharmacology and Cellular and Molecular Physiology, Yale University, New Haven, CT, USA
- Karina S Cramer
- Department of Neurobiology and Behavior, University of California, Irvine, CA, USA
- Randy J Kulesza
- Department of Anatomy, Lake Erie College of Osteopathic Medicine, Erie, PA, USA
- Khaleel A Razak
- Department of Psychology, University of California, Riverside, CA, USA
- Yong Lu
- Department of Anatomy and Neurobiology, College of Medicine, Northeast Ohio Medical University, Rootstown, OH, USA
- Ursula Koch
- Institute of Biology, Neurophysiology, Freie Universität Berlin, Berlin, Germany
- Yuan Wang
- Department of Biomedical Sciences, Program in Neuroscience, Florida State University, Tallahassee, FL, USA

29
Summers RJ, Roberts B. Informational masking of speech by acoustically similar intelligible and unintelligible interferers. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1113. [PMID: 32113320 DOI: 10.1121/10.0000688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2019] [Accepted: 01/19/2020] [Indexed: 06/10/2023]
Abstract
Masking experienced when target speech is accompanied by a single interfering voice is often primarily informational masking (IM). IM is generally greater when the interferer is intelligible than when it is not (e.g., speech from an unfamiliar language), but the relative contributions of acoustic-phonetic and linguistic interference are often difficult to assess owing to acoustic differences between interferers (e.g., different talkers). Three-formant analogues (F1+F2+F3) of natural sentences were used as targets and interferers. Targets were presented monaurally either alone or accompanied contralaterally by interferers from another sentence (F0 = 4 semitones higher); a target-to-masker ratio (TMR) between ears of 0, 6, or 12 dB was used. Interferers were either intelligible or rendered unintelligible by delaying F2 and advancing F3 by 150 ms relative to F1, a manipulation designed to minimize spectro-temporal differences between corresponding interferers. Target-sentence intelligibility (keywords correct) was 67% when presented alone, but fell considerably when an unintelligible interferer was present (49%) and significantly further when the interferer was intelligible (41%). Changes in TMR produced neither a significant main effect nor an interaction with interferer type. Interference with acoustic-phonetic processing of the target can explain much of the impact on intelligibility, but linguistic factors-particularly interferer intrusions-also make an important contribution to IM.
Affiliation(s)
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom

30
Eipert L, Selle A, Klump GM. Uncertainty in location, level and fundamental frequency results in informational masking in a vowel discrimination task for young and elderly subjects. Hear Res 2019; 377:142-152. [DOI: 10.1016/j.heares.2019.03.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Revised: 03/15/2019] [Accepted: 03/18/2019] [Indexed: 10/27/2022]

31
Majumder S, Deen MJ. Smartphone Sensors for Health Monitoring and Diagnosis. SENSORS (BASEL, SWITZERLAND) 2019; 19:E2164. [PMID: 31075985 PMCID: PMC6539461 DOI: 10.3390/s19092164] [Citation(s) in RCA: 131] [Impact Index Per Article: 26.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 04/27/2019] [Accepted: 04/30/2019] [Indexed: 12/29/2022]
Abstract
Over the past few decades, we have witnessed a dramatic rise in life expectancy owing to significant advances in medical science, technology, and medicine, as well as increased awareness about nutrition, education, and environmental and personal hygiene. Consequently, the elderly population in many countries is expected to rise rapidly in the coming years. A rapidly rising elderly demographic is expected to adversely affect the socioeconomic systems of many nations in terms of costs associated with their healthcare and wellbeing. In addition, diseases related to the cardiovascular system, eye, respiratory system, skin and mental health are widespread globally. However, most of these diseases can be avoided and/or properly managed through continuous monitoring. In order to enable continuous health monitoring as well as to serve growing healthcare needs, affordable, non-invasive and easy-to-use healthcare solutions are critical. The ever-increasing penetration of smartphones, coupled with embedded sensors and modern communication technologies, makes the smartphone an attractive technology for enabling continuous and remote monitoring of an individual's health and wellbeing with negligible additional costs. In this paper, we present a comprehensive review of the state-of-the-art research and developments in smartphone-sensor based healthcare technologies. A discussion on regulatory policies for medical devices and their implications in smartphone-based healthcare systems is presented. Finally, some future research perspectives and concerns regarding smartphone-based healthcare systems are described.
Affiliation(s)
- Sumit Majumder
- Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4L8, Canada
- M Jamal Deen
- Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4L8, Canada
- School of Biomedical Engineering, McMaster University, Hamilton, ON L8S 4L8, Canada

32
Szalárdy O, Tóth B, Farkas D, György E, Winkler I. Neuronal Correlates of Informational and Energetic Masking in the Human Brain in a Multi-Talker Situation. Front Psychol 2019; 10:786. [PMID: 31024409 PMCID: PMC6465330 DOI: 10.3389/fpsyg.2019.00786] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 03/21/2019] [Indexed: 11/13/2022] Open
Abstract
Human listeners can follow the voice of one speaker while several others are talking at the same time. This process requires segregating the speech streams from each other and continuously directing attention to the target stream. We investigated the functional brain networks underlying this ability. Two speech streams were presented simultaneously to participants, who followed one of them and detected targets within it (target stream). The loudness of the distractor speech stream varied on five levels: moderately softer, slightly softer, equal, slightly louder, or moderately louder than the attended. Performance measures showed that the most demanding task was the moderately softer distractors condition, which indicates that a softer distractor speech may receive more covert attention than louder distractors and, therefore, they require more cognitive resources. EEG-based measurement of functional connectivity between various brain regions revealed frequency-band specific networks: (1) energetic masking (comparing the louder distractor conditions with the equal loudness condition) was predominantly associated with stronger connectivity between the frontal and temporal regions at the lower alpha (8–10 Hz) and gamma (30–70 Hz) bands; (2) informational masking (comparing the softer distractor conditions with the equal loudness condition) was associated with a distributed network between parietal, frontal, and temporal regions at the theta (4–8 Hz) and beta (13–30 Hz) bands. These results suggest the presence of distinct cognitive and neural processes for solving the interference from energetic vs. informational masking.
Affiliation(s)
- Orsolya Szalárdy
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Brigitta Tóth
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Dávid Farkas
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Erika György
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary

33
Abstract
Research in speech perception has explored how knowledge of a language influences phonetic perception. The current study investigated whether such linguistic influences extend to the perceptual (sequential) organization of speech. Listeners heard sinewave analogs of word pairs (e.g., loose seam, which contains a single [s] frication but is perceived as two /s/ phonemes) cycle continuously, which causes the stimulus to split apart into foreground and background percepts. They had to identify the foreground percept when the stimuli were heard as nonspeech and then again when heard as speech. Of interest was how grouping changed across listening condition when [s] was heard as speech or as a hiss. Although the section of the signal that was identified as the foreground differed little across listening condition, a strong bias to perceive [s] as forming the onset of the foreground was observed in the speech condition (Experiment 1). This effect was reduced in Experiment 2 by increasing the stimulus repetition rate. Findings suggest that the sequential organization of speech arises from the interaction of auditory and linguistic processes, with the former constraining the latter.

34
Roberts B, Summers RJ. Dichotic integration of acoustic-phonetic information: Competition from extraneous formants increases the effect of second-formant attenuation on intelligibility. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:1230. [PMID: 31067923 DOI: 10.1121/1.5091443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/27/2018] [Accepted: 02/01/2019] [Indexed: 06/09/2023]
Abstract
Differences in ear of presentation and level do not prevent effective integration of concurrent speech cues such as formant frequencies. For example, presenting the higher formants of a consonant-vowel syllable in the opposite ear to the first formant protects them from upward spread of masking, allowing them to remain effective speech cues even after substantial attenuation. This study used three-formant (F1+F2+F3) analogues of natural sentences and extended the approach to include competitive conditions. Target formants were presented dichotically (F1+F3; F2), either alone or accompanied by an extraneous competitor for F2 (i.e., F1±F2C+F3; F2) that listeners must reject to optimize recognition. F2C was created by inverting the F2 frequency contour and using the F2 amplitude contour without attenuation. In experiment 1, F2C was always absent and intelligibility was unaffected until F2 attenuation exceeded 30 dB; F2 still provided useful information at 48-dB attenuation. In experiment 2, attenuating F2 by 24 dB caused considerable loss of intelligibility when F2C was present, but had no effect in its absence. Factors likely to contribute to this interaction include informational masking from F2C acting to swamp the acoustic-phonetic information carried by F2, and interaural inhibition from F2C acting to reduce the effective level of F2.
Affiliation(s)
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom

35
Wang D, Chen J. Supervised Speech Separation Based on Deep Learning: An Overview. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2018; 26:1702-1726. [PMID: 31223631 PMCID: PMC6586438 DOI: 10.1109/taslp.2018.2842159] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
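As a concrete illustration of the "training targets" mentioned above, the sketch below computes an ideal ratio mask, one commonly used target in this literature, from separately available clean-speech and noise signals. The STFT settings and the exponent beta are illustrative assumptions, not values prescribed by the overview.

```python
import numpy as np
from scipy.signal import stft

def ideal_ratio_mask(speech, noise, fs=16000, nperseg=512, beta=0.5):
    """Compute an ideal ratio mask (IRM) from clean speech and noise;
    the mask serves as a training target for a separation network."""
    _, _, S = stft(speech, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    speech_energy = np.abs(S) ** 2
    noise_energy = np.abs(N) ** 2
    return (speech_energy / (speech_energy + noise_energy + 1e-12)) ** beta

# Applying the mask to the mixture spectrogram and inverting the STFT
# would give an (oracle) estimate of the clean speech.
```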
Affiliation(s)
- DeLiang Wang
- Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA, and also with the Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University, Xi'an 710072, China
- Jitong Chen
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA. He is now with Silicon Valley AI Lab, Baidu Research, Sunnyvale, CA 94089 USA

36
Kamourieh S, Braga RM, Leech R, Mehta A, Wise RJS. Speech Registration in Symptomatic Memory Impairment. Front Aging Neurosci 2018; 10:201. [PMID: 30038566 PMCID: PMC6046456 DOI: 10.3389/fnagi.2018.00201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2017] [Accepted: 06/13/2018] [Indexed: 11/20/2022] Open
Abstract
Background: An inability to recall recent conversations often indicates impaired episodic memory retrieval. It may also reflect a failure of attentive registration of spoken sentences which leads to unsuccessful memory encoding. The hypothesis was that patients complaining of impaired memory would demonstrate impaired function of “multiple demand” (MD) brain regions, whose activation profile generalizes across cognitive domains, during speech registration in naturalistic listening conditions. Methods: Using functional MRI, brain activity was measured in 22 normal participants and 31 patients complaining of memory impairment, 21 of whom had possible or probable Alzheimer’s disease (AD). Participants heard a target speaker, either speaking alone or in the presence of distracting background speech, followed by a question to determine if the target speech had been registered. Results: Patients performed poorly at registering verbal information, which correlated with their scores on a screening test of cognitive impairment. Speech registration was associated with widely distributed activity in both auditory cortex and in MD cortex. Additional regions were most active when the target speech had to be separated from background speech. Activity in midline and lateral frontal MD cortex was reduced in the patients. A central cholinesterase inhibitor to increase brain acetylcholine levels in half the patients was not observed to alter brain activity or improve task performance at a second fMRI scan performed 6–11 weeks later. However, individual performances spontaneously fluctuated between the two scanning sessions, and these performance differences correlated with activity within a right hemisphere fronto-temporal system previously associated with sustained auditory attention. Conclusions: Midline and lateralized frontal regions that are engaged in task-dependent attention to, and registration of, verbal information are potential targets for transcranial brain stimulation to improve speech registration in neurodegenerative conditions.
Affiliation(s)
- Salwa Kamourieh
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom
- Rodrigo M Braga
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom
- Center for Brain Science, Harvard University, Cambridge, MA, United States
- Robert Leech
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom
- Amrish Mehta
- Department of Neuroradiology, Charing Cross Hospital, Imperial College Healthcare NHS Trust, Faculty of Medicine, Imperial College London, London, United Kingdom
- Richard J S Wise
- Computational, Cognitive, and Clinical Neuroimaging Laboratory, Division of Brain Sciences, Imperial College London, Hammersmith Hospital, London, United Kingdom

37
The effects of attention and task-relevance on the processing of syntactic violations during listening to two concurrent speech streams. COGNITIVE AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2018; 18:932-948. [DOI: 10.3758/s13415-018-0614-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

38
Josupeit A, Schoenmaker E, van de Par S, Hohmann V. Sparse periodicity-based auditory features explain human performance in a spatial multitalker auditory scene analysis task. Eur J Neurosci 2018; 51:1353-1363. [PMID: 29855099 DOI: 10.1111/ejn.13981] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2017] [Revised: 05/15/2018] [Accepted: 05/21/2018] [Indexed: 11/28/2022]
Abstract
Human listeners robustly decode speech information from a talker of interest that is embedded in a mixture of spatially distributed interferers. A relevant question is which time-frequency segments of the speech are predominantly used by a listener to solve such a complex Auditory Scene Analysis task. A recent psychoacoustic study investigated the relevance of low signal-to-noise ratio (SNR) components of a target signal on speech intelligibility in a spatial multitalker situation. For this, a three-talker stimulus was manipulated in the spectro-temporal domain such that target speech time-frequency units below a variable SNR threshold (SNRcrit ) were discarded while keeping the interferers unchanged. The psychoacoustic data indicate that only target components at and above a local SNR of about 0 dB contribute to intelligibility. This study applies an auditory scene analysis "glimpsing" model to the same manipulated stimuli. Model data are found to be similar to the human data, supporting the notion of "glimpsing," that is, that salient speech-related information is predominantly used by the auditory system to decode speech embedded in a mixture of sounds, at least for the tested conditions of three overlapping speech signals. This implies that perceptually relevant auditory information is sparse and may be processed with low computational effort, which is relevant for neurophysiological research of scene analysis and novelty processing in the auditory system.
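The sketch below illustrates the kind of spectro-temporal manipulation described: target time-frequency units whose local SNR falls below a criterion (SNRcrit) are discarded before resynthesis. The STFT parameters and the overlap-add reconstruction are assumptions for illustration, not the exact processing used in the cited study.

```python
import numpy as np
from scipy.signal import stft, istft

def discard_low_snr_units(target, interferers, snr_crit_db=0.0,
                          fs=44100, nperseg=1024):
    """Zero out target time-frequency units whose local SNR (target vs.
    summed interferers) falls below snr_crit_db, then resynthesize."""
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, I = stft(interferers, fs=fs, nperseg=nperseg)
    local_snr_db = 10.0 * np.log10(
        (np.abs(T) ** 2 + 1e-12) / (np.abs(I) ** 2 + 1e-12))
    T_glimpsed = np.where(local_snr_db >= snr_crit_db, T, 0.0)
    _, target_glimpsed = istft(T_glimpsed, fs=fs, nperseg=nperseg)
    return target_glimpsed
```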
Affiliation(s)
- Angela Josupeit
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Esther Schoenmaker
- Acoustics Group and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Steven van de Par
- Acoustics Group and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Volker Hohmann
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany

39
Nie Y, Galvin JJ, Morikawa M, André V, Wheeler H, Fu QJ. Music and Speech Perception in Children Using Sung Speech. Trends Hear 2018; 22:2331216518766810. [PMID: 29609496 PMCID: PMC5888806 DOI: 10.1177/2331216518766810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training, participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet were significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Affiliation(s)
- Yingjiu Nie
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Michael Morikawa
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Victoria André
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Harley Wheeler
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
- Qian-Jie Fu
- Department of Head and Neck Surgery, University of California-Los Angeles, CA, USA

40
Abstract
The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas"-the abstract structure shared by different occurrences of the same type of sound source-and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schemas were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.

41
Roberts B, Summers RJ. Informational masking of speech by time-varying competitors: Effects of frequency region and number of interfering formants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:891. [PMID: 29495741 DOI: 10.1121/1.5023476] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This study explored the extent to which informational masking of speech depends on the frequency region and number of extraneous formants in an interferer. Target formants-monotonized three-formant (F1+F2+F3) analogues of natural sentences-were presented monaurally, with target ear assigned randomly on each trial. Interferers were presented contralaterally. In experiment 1, single-formant interferers were created using the time-reversed F2 frequency contour and constant amplitude, root-mean-square (RMS)-matched to F2. Interferer center frequency was matched to that of F1, F2, or F3, while maintaining the extent of formant-frequency variation (depth) on a log scale. Adding an interferer lowered intelligibility; the effect of frequency region was small and broadly tuned around F2. In experiment 2, interferers comprised either one formant (F1, the most intense) or all three, created using the time-reversed frequency contours of the corresponding targets and RMS-matched constant amplitudes. Interferer formant-frequency variation was scaled to 0%, 50%, or 100% of the original depth. Increasing the depth of formant-frequency variation and number of formants in the interferer had independent and additive effects. These findings suggest that the impact on intelligibility depends primarily on the overall extent of frequency variation in each interfering formant (up to ∼100% depth) and the number of extraneous formants.
Affiliation(s)
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom

42
Kobald SO, Wascher E, Heppner H, Getzmann S. Eye blinks are related to auditory information processing: evidence from a complex speech perception task. PSYCHOLOGICAL RESEARCH 2018; 83:1281-1291. [DOI: 10.1007/s00426-017-0952-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2017] [Accepted: 11/23/2017] [Indexed: 10/18/2022]

43
Van Bentum GC, Van Opstal AJ, Van Aartrijk CMM, Van Wanrooij MM. Level-weighted averaging in elevation to synchronous amplitude-modulated sounds. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:3094. [PMID: 29195479 PMCID: PMC6147220 DOI: 10.1121/1.5011182] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
To program a goal-directed response in the presence of multiple sounds, the audiomotor system should separate the sound sources. The authors examined whether the brain can segregate synchronous broadband sounds in the midsagittal plane, using amplitude modulations as an acoustic discrimination cue. To succeed in this task, the brain has to use pinna-induced spectral-shape cues and temporal envelope information. The authors tested spatial segregation performance in the midsagittal plane in two paradigms in which human listeners were required to localize, or distinguish, a target amplitude-modulated broadband sound when a non-modulated broadband distractor was played simultaneously at another location. The level difference between the amplitude-modulated and distractor stimuli was systematically varied, as well as the modulation frequency of the target sound. The authors found that participants were unable to segregate, or localize, the synchronous sounds. Instead, they invariably responded toward a level-weighted average of both sound locations, irrespective of the modulation frequency. An increased variance in the response distributions for double sounds of equal level was also observed, which cannot be accounted for by a segregation model, or by a probabilistic averaging model.
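A minimal sketch of the level-weighted averaging account suggested by these responses is given below. The assumption that weights are proportional to linear sound intensity (10**(level/10)) is illustrative and may differ from the authors' model.

```python
def level_weighted_average(elevations_deg, levels_db):
    """Predict a single response elevation as the average of source
    elevations weighted by their (linear) sound intensities.
    Assumes weights proportional to 10**(level/10); the weighting in
    the cited study may differ."""
    weights = [10.0 ** (level / 10.0) for level in levels_db]
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, elevations_deg)) / total

# Example: a target at +20 deg that is 6 dB less intense than a
# distractor at -20 deg pulls the predicted response toward the distractor.
print(level_weighted_average([20.0, -20.0], [54.0, 60.0]))
```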

44
Stachurski M, Summers RJ, Roberts B. Stream segregation of concurrent speech and the verbal transformation effect: Influence of fundamental frequency and lateralization cues. Hear Res 2017; 354:16-27. [PMID: 28843209 DOI: 10.1016/j.heares.2017.07.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/04/2017] [Revised: 07/25/2017] [Accepted: 07/31/2017] [Indexed: 10/19/2022]
Abstract
Repeating a recorded word produces verbal transformations (VTs); perceptual regrouping of acoustic-phonetic elements may contribute to this effect. The influence of fundamental frequency (F0) and lateralization grouping cues was explored by presenting two concurrent sequences of the same word resynthesized on different F0s (100 and 178 Hz). In experiment 1, listeners monitored both sequences simultaneously, reporting for each any change in stimulus identity. Three lateralization conditions were used - diotic, ±680-μs interaural time difference, and dichotic. Results were similar for the first two conditions, but fewer forms and later initial transformations were reported in the dichotic condition. This suggests that large lateralization differences per se have little effect - rather, there are more possibilities for regrouping when each ear receives both sequences. In the dichotic condition, VTs reported for one sequence were also more independent of those reported for the other. Experiment 2 used diotic stimuli and explored the effect of the number of sequences presented and monitored. The most forms and earliest transformations were reported when two sequences were presented but only one was monitored, indicating that high task demands decreased reporting of VTs for concurrent sequences. Overall, these findings support the idea that perceptual regrouping contributes to the VT effect.
Affiliation(s)
- Marcin Stachurski
- Psychology, School of Life and Health Sciences, Aston University, Birmingham, B4 7ET, UK
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham, B4 7ET, UK
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham, B4 7ET, UK

45
Josupeit A, Hohmann V. Modeling speech localization, talker identification, and word recognition in a multi-talker setting. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:35. [PMID: 28764452 DOI: 10.1121/1.4990375] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This study introduces a model for solving three different auditory tasks in a multi-talker setting: target localization, target identification, and word recognition. The model was used to simulate psychoacoustic data from a call-sign-based listening test involving multiple spatially separated talkers [Brungart and Simpson (2007). Percept. Psychophys. 69(1), 79-91]. The main characteristics of the model are (i) the extraction of salient auditory features ("glimpses") from the multi-talker signal and (ii) the use of a classification method that finds the best target hypothesis by comparing feature templates from clean target signals to the glimpses derived from the multi-talker mixture. The four features used were periodicity, periodic energy, and periodicity-based interaural time and level differences. The model results widely exceeded probability of chance for all subtasks and conditions, and generally coincided strongly with the subject data. This indicates that, despite their sparsity, glimpses provide sufficient information about a complex auditory scene. This also suggests that complex source superposition models may not be needed for auditory scene analysis. Instead, simple models of clean speech may be sufficient to decode even complex multi-talker scenes.
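The classification step described above, comparing feature templates from clean target signals against glimpses extracted from the mixture, can be sketched schematically as a nearest-template search. The distance metric and feature layout below are illustrative assumptions, not the cited model's exact formulation.

```python
import numpy as np

def classify_by_template(glimpse_features, templates):
    """Pick the hypothesis whose clean-signal template best matches the
    glimpse features extracted from the multi-talker mixture.

    glimpse_features: (n_glimpses, n_dims) array from the mixture.
    templates: dict mapping hypothesis label -> (n_frames, n_dims) array
    of features derived from clean target signals.
    """
    def score(template):
        # For each glimpse, distance to the nearest template frame;
        # a smaller summed distance means a better match.
        dists = np.linalg.norm(
            glimpse_features[:, None, :] - template[None, :, :], axis=-1)
        return dists.min(axis=1).sum()

    return min(templates, key=lambda label: score(templates[label]))
```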
Affiliation(s)
- Angela Josupeit
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany
- Volker Hohmann
- Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany

46
Farris HE, Ryan MJ. Schema vs. primitive perceptual grouping: the relative weighting of sequential vs. spatial cues during an auditory grouping task in frogs. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2017; 203:175-182. [PMID: 28197725 PMCID: PMC10084916 DOI: 10.1007/s00359-017-1149-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2016] [Revised: 01/19/2017] [Accepted: 01/20/2017] [Indexed: 10/20/2022]
Abstract
Perceptually, grouping sounds based on their sources is critical for communication. This is especially true in túngara frog breeding aggregations, where multiple males produce overlapping calls that consist of an FM 'whine' followed by harmonic bursts called 'chucks'. Phonotactic females use at least two cues to group whines and chucks: whine-chuck spatial separation and sequence. Spatial separation is a primitive cue, whereas sequence is schema-based, as chuck production is morphologically constrained to follow whines, meaning that males cannot produce the components simultaneously. When one cue is available, females perceptually group whines and chucks using relative comparisons: components with the smallest spatial separation or those closest to the natural sequence are more likely grouped. By simultaneously varying the temporal sequence and spatial separation of a single whine and two chucks, this study measured between-cue perceptual weighting during a specific grouping task. Results show that whine-chuck spatial separation is a stronger grouping cue than temporal sequence, as grouping is more likely for stimuli with smaller spatial separation and non-natural sequence than those with larger spatial separation and natural sequence. Compared to the schema-based whine-chuck sequence, we propose that spatial cues have less variance, potentially explaining their preferred use when grouping during directional behavioral responses.
Affiliation(s)
- Hamilton E Farris
- Neuroscience Center, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Department of Cell Biology and Anatomy, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Department of Otorhinolaryngology, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Michael J Ryan
- Department of Integrative Biology, University of Texas, 1 University Station C0930, Austin, TX, 78712, USA
- Smithsonian Tropical Research Institute, Balboa, Panama

47
Informational masking and the effects of differences in fundamental frequency and fundamental-frequency contour on phonetic integration in a formant ensemble. Hear Res 2017; 344:295-303. [DOI: 10.1016/j.heares.2016.10.026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

48
Summers RJ, Bailey PJ, Roberts B. WITHDRAWN: Informational masking and the effects of differences in fundamental frequency and fundamental-frequency contour on phonetic integration in a formant ensemble. Hear Res 2017:S0378-5955(16)30380-X. [PMID: 28110077 DOI: 10.1016/j.heares.2016.10.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/22/2016] [Revised: 10/17/2016] [Accepted: 10/21/2016] [Indexed: 10/20/2022]
Affiliation(s)
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK
- Peter J Bailey
- Department of Psychology, University of York, Heslington, York YO10 5DD, UK
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK

49
van Laarhoven T, Keetels M, Schakel L, Vroomen J. Audio-visual speech in noise perception in dyslexia. Dev Sci 2016; 21. [DOI: 10.1111/desc.12504] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2016] [Accepted: 08/09/2016] [Indexed: 11/30/2022]
Affiliation(s)
- Thijs van Laarhoven
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Mirjam Keetels
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Lemmy Schakel
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands

50
Helfer KS, Merchant GR, Freyman RL. Aging and the effect of target-masker alignment. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:3844. [PMID: 27908027 PMCID: PMC5392104 DOI: 10.1121/1.4967297] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/11/2016] [Revised: 10/05/2016] [Accepted: 10/25/2016] [Indexed: 05/29/2023]
Abstract
Similarity between target and competing speech messages plays a large role in how easy or difficult it is to understand messages of interest. Much research on informational masking has used highly aligned target and masking utterances that are very similar semantically and syntactically. However, listeners rarely encounter situations in real life where they must understand one sentence in the presence of another (or more than one) highly aligned, syntactically similar competing sentence(s). The purpose of the present study was to examine the effect of syntactic/semantic similarity of target and masking speech in different spatial conditions among younger, middle-aged, and older adults. The results of this experiment indicate that differences in speech recognition between older and younger participants were largest when the masker surrounded the target and was more similar to the target, especially at more adverse signal-to-noise ratios. Differences among listeners and the effect of similarity were much less robust, and all listeners were relatively resistant to masking, when maskers were located on one side of the target message. The present results suggest that previous studies using highly aligned stimuli may have overestimated age-related speech recognition problems.
Affiliation(s)
- Karen S Helfer
- Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
- Gabrielle R Merchant
- Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
- Richard L Freyman
- Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA