1. Windle R, Dillon H, Heinrich A. A review of auditory processing and cognitive change during normal ageing, and the implications for setting hearing aids for older adults. Front Neurol 2023; 14:1122420. PMID: 37409017; PMCID: PMC10318159; DOI: 10.3389/fneur.2023.1122420.
Abstract
Throughout our adult lives there is a decline in peripheral hearing, auditory processing and elements of cognition that support listening ability. Audiometry provides no information about the status of auditory processing and cognition, and older adults often struggle with complex listening situations, such as speech-in-noise perception, even if their peripheral hearing appears normal. Hearing aids can address some aspects of peripheral hearing impairment and improve signal-to-noise ratios. However, they cannot directly enhance central processes and may introduce distortion to sound that might act to undermine listening ability. This review paper highlights the need to consider the distortion introduced by hearing aids, specifically when considering normally ageing older adults. We focus on patients with age-related hearing loss because they represent the vast majority of the population attending audiology clinics. We believe that it is important to recognize that the combination of peripheral and central, auditory and cognitive decline makes older adults some of the most complex patients seen in audiology services, so they should not be treated as "standard" despite the high prevalence of age-related hearing loss. We argue that a primary concern should be to avoid hearing aid settings that introduce distortion to speech envelope cues, a concern that is not new. The primary cause of such distortion is the speed and range of change in hearing aid amplification (i.e., compression). We argue that slow-acting compression should be considered as a default for some users, and that other advanced features should be reconsidered as they may also introduce distortion that some users may not be able to tolerate. We discuss how this can be incorporated into a pragmatic approach to hearing aid fitting that does not require an increased load on audiology services.
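To make the compression-speed issue concrete, the sketch below implements a minimal single-channel compressor in Python. The parameter values are hypothetical and real hearing aids use multi-channel, far more sophisticated schemes; the point is only that short attack/release times make the gain track the syllabic envelope and flatten it, whereas long time constants approximate slow-acting compression and leave envelope cues largely intact.

```python
import numpy as np

def compress(x, fs, threshold_db=-40.0, ratio=3.0, attack_ms=5.0, release_ms=50.0):
    """Minimal broadband dynamic-range compressor (illustrative sketch only).

    Short attack/release times ("fast-acting") let the gain follow the speech
    envelope and distort it; long times (hundreds of ms) approximate
    slow-acting compression, which holds the gain nearly constant across syllables.
    """
    a_atk = np.exp(-1.0 / (fs * attack_ms * 1e-3))
    a_rel = np.exp(-1.0 / (fs * release_ms * 1e-3))
    level = 1e-9                                     # smoothed envelope estimate
    out = np.empty_like(x, dtype=float)
    for n, s in enumerate(x):
        mag = abs(s)
        a = a_atk if mag > level else a_rel          # rise with attack, fall with release
        level = a * level + (1.0 - a) * mag
        over_db = 20.0 * np.log10(level) - threshold_db
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0.0 else 0.0
        out[n] = s * 10.0 ** (gain_db / 20.0)
    return out
```

Comparing the output envelope for, say, attack_ms=5, release_ms=50 against attack_ms=500, release_ms=2000 makes the fast versus slow distinction visible directly in the processed waveform.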
Affiliation(s)
- Richard Windle: Audiology Department, Royal Berkshire NHS Foundation Trust, Reading, United Kingdom
- Harvey Dillon: NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom; Department of Linguistics, Macquarie University, North Ryde, NSW, Australia
- Antje Heinrich: NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom; Division of Human Communication, Development and Hearing, School of Health Sciences, University of Manchester, Manchester, United Kingdom
2. Hernández-Pérez H, Mikiel-Hunter J, McAlpine D, Dhar S, Boothalingam S, Monaghan JJM, McMahon CM. Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners. PLoS Biol 2021; 19:e3001439. PMID: 34669696; PMCID: PMC8559948; DOI: 10.1371/journal.pbio.3001439.
Abstract
The ability to navigate "cocktail party" situations by focusing on sounds of interest over irrelevant, background sounds is often considered in terms of cortical mechanisms. However, subcortical circuits such as the pathway underlying the medial olivocochlear (MOC) reflex modulate the activity of the inner ear itself, supporting the extraction of salient features from the auditory scene prior to any cortical processing. To understand the contribution of auditory subcortical nuclei and the cochlea in complex listening tasks, we made physiological recordings along the auditory pathway while listeners engaged in detecting non(sense) words in lists of words. Both naturally spoken speech and intrinsically noisy, vocoded speech (filtering that mimics processing by a cochlear implant, CI) significantly activated the MOC reflex, but this was not the case for speech in background noise, which engaged midbrain and cortical resources more. A model of the initial stages of auditory processing reproduced the specific effects of each form of speech degradation, providing a rationale for goal-directed gating of the MOC reflex based on enhancing the representation of the energy envelope of the acoustic waveform. Our data reveal the coexistence of two strategies in the auditory system that may facilitate speech understanding in situations where the signal is either intrinsically degraded or masked by extrinsic acoustic energy. Whereas intrinsically degraded streams recruit the MOC reflex to improve the peripheral representation of speech cues, extrinsically masked streams rely more on higher auditory centres to denoise signals.
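For readers unfamiliar with vocoding, the sketch below is a minimal noise vocoder of the kind commonly used to mimic cochlear-implant-style processing. Band count, band edges, and the envelope cutoff are illustrative choices, not the parameters used in this study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_bands=6, lo=100.0, hi=6000.0, env_cutoff=50.0):
    """Band-limit the speech, extract each band's envelope, and use it to modulate
    band-limited noise carriers. Band edges must stay below fs/2."""
    edges = np.geomspace(lo, hi, n_bands + 1)                      # log-spaced band edges
    env_sos = butter(2, env_cutoff, btype='low', fs=fs, output='sos')
    carrier = np.random.default_rng(0).standard_normal(len(speech))
    out = np.zeros(len(speech))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, (f1, f2), btype='bandpass', fs=fs, output='sos')
        env = sosfiltfilt(env_sos, np.abs(hilbert(sosfiltfilt(sos, speech))))
        out += sosfiltfilt(sos, np.clip(env, 0, None) * sosfiltfilt(sos, carrier))
    return out
```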
Affiliation(s)
- Heivet Hernández-Pérez: Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- Jason Mikiel-Hunter: Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- David McAlpine: Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- Sumitrajit Dhar: Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois, United States of America
- Sriram Boothalingam: University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Jessica J. M. Monaghan: Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia; National Acoustic Laboratories, Sydney, Australia
- Catherine M. McMahon: Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
3. Listening to speech with a guinea pig-to-human brain-to-brain interface. Sci Rep 2021; 11:12231. PMID: 34112826; PMCID: PMC8192924; DOI: 10.1038/s41598-021-90823-1.
Abstract
Nicolelis wrote in his 2003 review on brain-machine interfaces (BMIs) that the design of a successful BMI relies on general physiological principles describing how neuronal signals are encoded. Our study explored whether neural information can be exchanged between the brains of different species, similar to the information exchange between computers. We show for the first time that single words processed by the guinea pig auditory system are intelligible to humans who receive the processed information via a cochlear implant. We recorded the neural response patterns to single spoken words with multi-channel electrodes from the guinea pig inferior colliculus. The recordings served as a blueprint for trains of biphasic, charge-balanced electrical pulses, which a cochlear implant delivered to the cochlear implant user's ear. Study participants completed a four-word forced-choice test and identified the correct word in 34.8% of trials. The participants' recognition, defined by the ability to choose the same word twice, whether right or wrong, was 53.6%. For all sessions, the participants received no training and no feedback. The results show that lexical information can be transmitted from an animal to a human auditory system. In the discussion, we consider how learning from the animals might help in developing novel coding strategies.
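To put the 34.8% score in context (chance in a four-word forced-choice test is 25%), a one-sided binomial test is the natural check. The trial count below is purely hypothetical, since the abstract does not report it.

```python
from scipy.stats import binomtest

n_trials = 230                         # assumed for illustration only; not reported in the abstract
n_correct = round(0.348 * n_trials)
result = binomtest(n_correct, n_trials, p=0.25, alternative='greater')
print(f"{n_correct}/{n_trials} correct vs. 25% chance: p = {result.pvalue:.4f}")
```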
4. Gao X, Yan T, Huang T, Li X, Zhang YX. Speech in noise perception improved by training fine auditory discrimination: far and applicable transfer of perceptual learning. Sci Rep 2020; 10:19320. PMID: 33168921; PMCID: PMC7653913; DOI: 10.1038/s41598-020-76295-9.
Abstract
A longstanding focus of perceptual learning research is learning specificity, the difficulty for learning to transfer to tasks and situations beyond the training setting. Previous studies have focused on promoting transfer across stimuli, such as from one sound frequency to another. Here we examined whether learning could transfer across tasks, particularly from fine discrimination of sound features to speech perception in noise, one of the most frequently encountered perceptual challenges in real life. Separate groups of normal-hearing listeners were trained on auditory interaural level difference (ILD) discrimination, interaural time difference (ITD) discrimination, and fundamental frequency (F0) discrimination with non-speech stimuli delivered through headphones. While ITD training led to no improvement, both ILD and F0 training produced learning as well as transfer to speech-in-noise perception when the noise differed from the speech in the trained feature. These training benefits did not require similarity of task or stimuli between training and application settings, demonstrating far and wide transfer. Thus, notwithstanding task specificity among basic perceptual skills such as discrimination of different sound features, auditory learning appears readily transferable between these skills and the "upstream" tasks that utilize them, providing an effective approach to improving performance in challenging situations or challenged populations.
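The trained cues are straightforward to impose on headphone stimuli. The sketch below applies an interaural level difference and an interaural time difference to a mono signal; the sign convention and parameter names are ours, not the study's.

```python
import numpy as np

def apply_ild_itd(mono, fs, ild_db=0.0, itd_us=0.0):
    """Return a 2-column (left, right) array carrying the requested ILD and ITD.
    Positive values favour the left ear in this illustrative convention."""
    d = int(round(abs(itd_us) * 1e-6 * fs))          # ITD rounded to whole samples
    left = mono * 10 ** (+ild_db / 40)               # half of the ILD on each ear
    right = mono * 10 ** (-ild_db / 40)
    if itd_us > 0:                                   # left leads -> delay the right ear
        right = np.concatenate([np.zeros(d), right])[:len(mono)]
    elif itd_us < 0:                                 # right leads -> delay the left ear
        left = np.concatenate([np.zeros(d), left])[:len(mono)]
    return np.column_stack([left, right])
```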
Affiliation(s)
- Xiang Gao
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Tingting Yan
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Ting Huang
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Xiaoli Li
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Yu-Xuan Zhang
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China.
| |
5. Noise-Sensitive But More Precise Subcortical Representations Coexist with Robust Cortical Encoding of Natural Vocalizations. J Neurosci 2020; 40:5228-5246. PMID: 32444386; DOI: 10.1523/jneurosci.2731-19.2020.
Abstract
Humans and animals maintain accurate sound discrimination in the presence of loud sources of background noise. It is commonly assumed that this ability relies on the robustness of auditory cortex responses. However, only a few attempts have been made to characterize neural discrimination of communication sounds masked by noise at each stage of the auditory system and to quantify the effects of noise on neuronal discrimination in terms of alterations in amplitude modulations. Here, we measured neural discrimination between communication sounds masked by a vocalization-shaped stationary noise from multiunit responses recorded in the cochlear nucleus, inferior colliculus, auditory thalamus, and primary and secondary auditory cortex at several signal-to-noise ratios (SNRs) in anesthetized male or female guinea pigs. Masking noise decreased sound discrimination of neuronal populations in each auditory structure, but collicular and thalamic populations showed better performance than cortical populations at each SNR. In contrast, in each auditory structure, discrimination by neuronal populations was slightly decreased when tone-vocoded vocalizations were tested. These results shed new light on the specific contributions of subcortical structures to robust sound encoding, and suggest that the distortion of slow amplitude modulation cues conveyed by communication sounds is one of the factors constraining neuronal discrimination at subcortical and cortical levels.
Significance Statement: Dissecting how auditory neurons discriminate communication sounds in noise is a major goal in auditory neuroscience. Robust sound coding in noise is often viewed as a specific property of cortical networks, although this remains to be demonstrated. Here, we tested the discrimination performance of neuronal populations at five levels of the auditory system in response to conspecific vocalizations masked by noise. In each acoustic condition, subcortical neurons discriminated target vocalizations better than cortical ones, and in each structure the reduction in discrimination performance was related to the reduction in slow amplitude modulation cues.
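Presenting the maskers at fixed SNRs reduces to scaling the noise against the target's level. A minimal helper is sketched below; an RMS-based SNR definition is assumed, and the study's exact calibration may differ.

```python
import numpy as np

def mix_at_snr(target, noise, snr_db):
    """Add `noise` to `target` after scaling it so that the RMS signal-to-noise
    ratio of the mixture equals `snr_db`."""
    noise = noise[:len(target)]
    rms_t = np.sqrt(np.mean(target ** 2))
    rms_n = np.sqrt(np.mean(noise ** 2))
    return target + noise * (rms_t / rms_n) / 10 ** (snr_db / 20)
```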
6. Manno FAM, Lau C, Fernandez-Ruiz J, Manno SHC, Cheng SH, Barrios FA. The human amygdala disconnecting from auditory cortex preferentially discriminates musical sound of uncertain emotion by altering hemispheric weighting. Sci Rep 2019; 9:14787. PMID: 31615998; PMCID: PMC6794305; DOI: 10.1038/s41598-019-50042-1.
Abstract
How do humans discriminate emotion from non-emotion? The specific psychophysical cues and neural responses involved in resolving emotional information in sound are unknown. In this study we used a discrimination psychophysical-fMRI sparse sampling paradigm to locate threshold responses to happy and sad acoustic stimuli. The fine structure and envelope of auditory signals were covaried to vary emotional certainty. We report that emotion identification at threshold in music utilizes fine structure cues. The auditory cortex was activated but did not vary with emotional uncertainty. Amygdala activation was modulated by emotion identification and was absent when emotional stimuli were identifiable only at chance, especially in the left hemisphere. The right hemisphere amygdala was considerably more deactivated in response to uncertain emotion. The threshold of emotion was signified by right amygdala deactivation and a change in left amygdala activation that exceeded that of the right amygdala. Functional sex differences were noted during binaural presentation of uncertain emotional stimuli, where the right amygdala showed larger activation in females. Negative control (silent stimuli) experiments investigated sparse sampling of silence to ensure that modulation effects were inherent to emotional resolvability. No functional modulation of Heschl's gyrus occurred during silence; however, during rest the amygdala baseline state was asymmetrically lateralized. The evidence indicates that changing patterns of activation and deactivation between the left and right amygdala are a hallmark feature of discriminating emotion from non-emotion in music.
Affiliation(s)
- Francis A M Manno: School of Biomedical Engineering, Faculty of Engineering, The University of Sydney, Sydney, New South Wales, Australia; Department of Physics, City University of Hong Kong, HKSAR, China
- Condon Lau: Department of Physics, City University of Hong Kong, HKSAR, China
- Juan Fernandez-Ruiz: Departamento de Fisiología, Facultad de Medicina, Universidad Nacional Autónoma de México, México City, 04510, Mexico
- Shuk Han Cheng: Department of Biomedical Sciences, City University of Hong Kong, HKSAR, China
- Fernando A Barrios: Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
7. Manno FAM, Cruces RR, Lau C, Barrios FA. Uncertain Emotion Discrimination Differences Between Musicians and Non-musicians Is Determined by Fine Structure Association: Hilbert Transform Psychophysics. Front Neurosci 2019; 13:902. PMID: 31619943; PMCID: PMC6759500; DOI: 10.3389/fnins.2019.00902.
Abstract
Humans perceive musical sound as a complex phenomenon, which is known to induce an emotional response. The cues used to perceive emotion in music have not been unequivocally elucidated. Here, we sought to identify the attributes of sound that confer an emotion to music and to determine whether professional musicians perceive musical emotion differently from non-musicians. The objective was to determine which sound cues are used to resolve emotional signals. Happy or sad classical music excerpts, modified in fine structure or envelope to convey different degrees of emotional certainty, were presented. Certainty was determined by identification of the emotional characteristic presented during a forced-choice discrimination task. Participants were categorized as good or poor performers (n = 32, age 21.16 ± 2.59 SD) and, in a separate group, as musicians in the first or last year of music education at a conservatory (n = 32, age 21.97 ± 2.42). We found that temporal fine structure information is essential for correct emotional identification. Non-musicians used less fine structure information to discriminate emotion in music compared with musicians. The present psychophysical experiments revealed which cues are used to resolve emotional signals and how they differ between non-musicians and musically educated individuals.
Affiliation(s)
- Francis A. M. Manno: School of Biomedical Engineering, Faculty of Engineering, University of Sydney, Sydney, NSW, Australia; Department of Physics, City University of Hong Kong, Hong Kong, China
- Raul R. Cruces: Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico
- Condon Lau: Department of Physics, City University of Hong Kong, Hong Kong, China
- Fernando A. Barrios: Instituto de Neurobiología, Universidad Nacional Autónoma de México, Querétaro, Mexico
8. Jacobi I, Sheikh Rashid M, de Laat JAPM, Dreschler WA. Age Dependence of Thresholds for Speech in Noise in Normal-Hearing Adolescents. Trends Hear 2017; 21:2331216517743641. PMID: 29212433; PMCID: PMC5724638; DOI: 10.1177/2331216517743641.
Abstract
Previously reported effects of age on speech reception thresholds in noise in adolescents, as measured by an online screening survey, require further study in a well-controlled teenage sample. Speech reception thresholds (SRTs) of 72 normal-hearing adolescent students were analyzed by means of the online speech-in-noise screening tool Earcheck (in Dutch: Oorcheck). Screening was performed at school and included pure-tone audiometry to ensure normal hearing thresholds. The students' ages ranged from 12 to 17 years. A group of young adults was included as a control group. Data were controlled for effects of gender and level of education. SRT scores within the controlled teenage sample revealed an effect of age, on the order of an improvement of −0.2 dB per year. Effects of level of education and gender were not significant. Hearing screening tools that are based on SRTs for speech in noise should control for an effect of age when assessing adolescents. Based on the present data, a correction factor of −0.2 dB per year between the ages of 12 and 17 is proposed. The proposed age-corrected SRT cut-off scores need to be evaluated in a larger sample that includes hearing-impaired adolescents.
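In a screening tool, the proposed correction amounts to relaxing the cut-off by 0.2 dB for every year the listener is younger than the oldest adolescents tested. A minimal sketch, with the 17-year reference age chosen here only for illustration:

```python
def expected_srt_offset(age_years, reference_age=17.0, slope_db_per_year=0.2):
    """Expected elevation (dB) of an adolescent's speech reception threshold relative to
    a 17-year-old, using the reported improvement of about 0.2 dB per year (ages 12-17)."""
    return max(0.0, reference_age - age_years) * slope_db_per_year

# Example: a 13-year-old's pass/fail cut-off would be relaxed by about 0.8 dB.
print(expected_srt_offset(13))   # 0.8
```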
Affiliation(s)
- Irene Jacobi: Department of Clinical and Experimental Audiology, Academic Medical Centre, Amsterdam, The Netherlands
- Marya Sheikh Rashid: Department of Clinical and Experimental Audiology, Academic Medical Centre, Amsterdam, The Netherlands
- Jan A P M de Laat: Department of Audiology, Leiden University Medical Centre, Leiden, The Netherlands
- Wouter A Dreschler: Department of Clinical and Experimental Audiology, Academic Medical Centre, Amsterdam, The Netherlands
9. A mechanoelectrical mechanism for detection of sound envelopes in the hearing organ. Nat Commun 2018; 9:4175. PMID: 30302006; PMCID: PMC6177430; DOI: 10.1038/s41467-018-06725-w.
Abstract
To understand speech, the slowly varying outline, or envelope, of the acoustic stimulus is used to distinguish words. A small amount of information about the envelope is sufficient for speech recognition, but the mechanism used by the auditory system to extract the envelope is not known. Several different theories have been proposed, including envelope detection by auditory nerve dendrites as well as various mechanisms involving the sensory hair cells. We used recordings from human and animal inner ears to show that the dominant mechanism for envelope detection is distortion introduced by mechanoelectrical transduction channels. This electrical distortion, which is not apparent in the sound-evoked vibrations of the basilar membrane, tracks the envelope, excites the auditory nerve, and transmits information about the shape of the envelope to the brain. The sound envelope is important for speech perception. Here, the authors look at mechanisms by which the sound envelope is encoded, finding that it arises from distortion produced by mechanoelectrical transduction channels. Surprisingly, the envelope is not present in basilar membrane vibrations.
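The core idea, that an asymmetric, saturating transduction nonlinearity demodulates the envelope, can be illustrated numerically: passing an amplitude-modulated tone through a Boltzmann-style open-probability curve creates a spectral component at the modulation rate that is absent from the stimulus itself. The parameter values below are illustrative and not fitted to hair-cell data.

```python
import numpy as np

fs = 48000
t = np.arange(int(0.2 * fs)) / fs
fc, fm, m = 4000.0, 100.0, 0.8                        # carrier, envelope rate, modulation depth
stimulus = (1 + m * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

def met_open_probability(x, x0=0.2, s=0.15):
    """First-order Boltzmann curve: a saturating, asymmetric nonlinearity (illustrative)."""
    return 1.0 / (1.0 + np.exp(-(x - x0) / s))

receptor = met_open_probability(stimulus)

freqs = np.fft.rfftfreq(len(t), 1 / fs)
k = np.argmin(np.abs(freqs - fm))                     # bin at the 100 Hz envelope rate
spec_in = np.abs(np.fft.rfft(stimulus)) / len(t)
spec_out = np.abs(np.fft.rfft(receptor)) / len(t)
print(f"100 Hz component - stimulus: {spec_in[k]:.5f}, after nonlinearity: {spec_out[k]:.5f}")
```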
10. Neural representation of interaural correlation in human auditory brainstem: Comparisons between temporal-fine structure and envelope. Hear Res 2018; 365:165-173. PMID: 29853322; DOI: 10.1016/j.heares.2018.05.015.
Abstract
Central processing of interaural correlation (IAC), which depends on the precise representation of acoustic signals from the two ears, is essential for both localization and recognition of auditory objects. A complex soundwave is initially filtered by the peripheral auditory system into multiple narrowband waves, which are further decomposed into two functionally distinctive components: the quickly-varying temporal fine structure (TFS) and the slowly-varying envelope. In rats, a narrowband noise can evoke auditory-midbrain frequency-following responses (FFRs) that contain both a TFS component (FFRTFS) and an envelope component (FFREnv), which represent the TFS and envelope of the narrowband noise, respectively. These two components differ in their sensitivity to interaural time disparity. In human listeners, the present study investigated whether the FFRTFS and FFREnv components of brainstem FFRs to a narrowband noise differ in sensitivity to IAC and whether there are potential brainstem mechanisms underlying the integration of the two components. The results showed that although both the amplitude of FFRTFS and that of FFREnv were significantly affected by shifts of IAC between 1 and 0, the stimulus-to-response correlation for FFRTFS, but not that for FFREnv, was sensitive to the IAC shifts. Moreover, in addition to the correlation between the binaurally evoked FFRTFS and FFREnv, the correlation between the IAC-shift-induced change of FFRTFS and that of FFREnv was significant. Thus, TFS information is represented more precisely in the human auditory brainstem than envelope information, and the correlation between FFRTFS and FFREnv for the same narrowband noise suggests a brainstem binding mechanism underlying the perceptual integration of the TFS and envelope signals.
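A common way to separate the two FFR components (not necessarily the exact procedure used in this study) is to record responses to the stimulus presented in opposite polarities: adding the two emphasizes the polarity-invariant envelope-following response, while subtracting them emphasizes the polarity-following TFS response.

```python
import numpy as np

def split_ffr_components(resp_pos, resp_neg):
    """Separate envelope- and TFS-dominated FFR components from responses to
    opposite-polarity presentations of the same stimulus (common FFR practice)."""
    ffr_env = 0.5 * (resp_pos + resp_neg)   # polarity-invariant: envelope following
    ffr_tfs = 0.5 * (resp_pos - resp_neg)   # polarity-inverting: fine-structure following
    return ffr_env, ffr_tfs

# A stimulus-to-response correlation of the kind reported in the abstract is then,
# for example, np.corrcoef(stimulus_envelope, ffr_env)[0, 1].
```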
11. Factors Affecting Speech Reception in Background Noise with a Vocoder Implementation of the FAST Algorithm. J Assoc Res Otolaryngol 2018; 19:467-478. PMID: 29744731; DOI: 10.1007/s10162-018-0672-x.
Abstract
Speech segregation in background noise remains a difficult task for individuals with hearing loss. Several signal processing strategies have been developed to improve the efficacy of hearing assistive technologies in complex listening environments. The present study measured speech reception thresholds in normal-hearing listeners attending to speech processed by a vocoder based on the Fundamental Asynchronous Stimulus Timing algorithm (FAST; Smith et al. 2014), which triggers pulses based on channel magnitudes in order to preserve envelope timing cues, with two different reconstruction bandwidths (narrowband and broadband) used to control the degree of spectrotemporal resolution. Five types of background noise were used, including the same male talker, a female talker, a time-reversed male talker, a time-reversed female talker, and speech-shaped noise, to probe the contributions of different types of speech segregation cues and to elucidate how degradation affects speech reception across these conditions. Maskers were spatialized using head-related transfer functions in order to create co-located and spatially separated conditions. Results indicate that benefits arising from voicing and spatial cues can be preserved using the FAST algorithm but are reduced with a reduction in spectral resolution.
12. Focal versus distributed temporal cortex activity for speech sound category assignment. Proc Natl Acad Sci U S A 2018; 115:E1299-E1308. PMID: 29363598; PMCID: PMC5819402; DOI: 10.1073/pnas.1714279115.
Abstract
When listening to speech, phonemes are represented in a distributed fashion in our temporal and prefrontal cortices. How these representations are selected in a phonemic decision context, and in particular whether distributed or focal neural information is required for explicit phoneme recognition, is unclear. We hypothesized that focal and early neural encoding of acoustic signals is sufficiently informative to access speech sound representations and permit phoneme recognition. We tested this hypothesis by combining a simple speech-phoneme categorization task with univariate and multivariate analyses of fMRI, magnetoencephalography, intracortical, and clinical data. We show that neural information available focally in the temporal cortex prior to decision-related neural activity is specific enough to account for human phonemic identification. Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with the more classical notions of hierarchical processing and efficient coding, which are especially relevant in speech processing. Using fMRI and magnetoencephalography during syllable identification, we show that sensory and decisional activity colocalize to a restricted part of the posterior superior temporal gyrus (pSTG). Next, using intracortical recordings, we demonstrate that early and focal neural activity in this region distinguishes correct from incorrect decisions and can be machine-decoded to classify syllables. Crucially, significant machine decoding was possible from neuronal activity sampled across different regions of the temporal and frontal lobes, despite weak or absent sensory or decision-related responses. These findings show that speech-sound categorization relies on an efficient readout of focal pSTG neural activity, while more distributed activity patterns, although classifiable by machine learning, instead reflect collateral processes of sensory perception and decision.
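The "machine decoding" referred to here is, at its simplest, cross-validated classification of trial-wise neural features. The sketch below runs such an analysis on simulated data (fabricated features and two hypothetical syllable classes) purely to show the shape of the pipeline, not the study's actual analysis.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels = 120, 40
labels = rng.integers(0, 2, n_trials)                 # two syllable categories (simulated)
pattern = rng.normal(size=n_channels)                 # class-dependent spatial pattern
features = np.outer(labels - 0.5, pattern) + rng.normal(scale=3.0, size=(n_trials, n_channels))

scores = cross_val_score(LinearDiscriminantAnalysis(), features, labels, cv=5)
print(f"cross-validated decoding accuracy: {scores.mean():.2f} (chance = 0.50)")
```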
13. Riecke L, Formisano E, Sorger B, Başkent D, Gaudrain E. Neural Entrainment to Speech Modulates Speech Intelligibility. Curr Biol 2017; 28:161-169.e5. PMID: 29290557; DOI: 10.1016/j.cub.2017.11.033.
Abstract
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and acoustic speech signal, listening task, and speech intelligibility have been observed repeatedly. However, a methodological bottleneck has so far prevented clarification of whether speech-brain entrainment contributes functionally to (i.e., causes) speech intelligibility or is merely an epiphenomenon of it. To address this long-standing issue, we experimentally manipulated speech-brain entrainment without concomitant acoustic and task-related variations, using a brain stimulation approach that enables modulating listeners' neural activity with transcranial currents carrying speech-envelope information. Results from two experiments involving a cocktail-party-like scenario and a listening situation devoid of aural speech-amplitude envelope input reveal consistent effects on listeners' speech-recognition performance, demonstrating a causal role of speech-brain entrainment in speech intelligibility. Our findings imply that speech-brain entrainment is critical for auditory speech comprehension and suggest that transcranial stimulation with speech-envelope-shaped currents can be utilized to modulate speech comprehension in impaired listening conditions.
Affiliation(s)
- Lars Riecke: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 EV Maastricht, the Netherlands
- Elia Formisano: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 EV Maastricht, the Netherlands
- Bettina Sorger: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 EV Maastricht, the Netherlands
- Deniz Başkent: Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, 9700 RB Groningen, the Netherlands
- Etienne Gaudrain: Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, 9700 RB Groningen, the Netherlands; CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Inserm UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, 69366 Lyon Cedex 07, France
14. Xu Y, Chen M, LaFaire P, Tan X, Richter CP. Distorting temporal fine structure by phase shifting and its effects on speech intelligibility and neural phase locking. Sci Rep 2017; 7:13387. PMID: 29042580; PMCID: PMC5645416; DOI: 10.1038/s41598-017-12975-3.
Abstract
Envelope (E) and temporal fine structure (TFS) are important features of acoustic signals, and their corresponding perceptual functions have been investigated with various listening tasks. To further understand the underlying neural processing of TFS, experiments in humans and animals were conducted to demonstrate the effects of modifying the TFS of natural speech sentences on both speech recognition and neural coding. The TFS of natural speech sentences was modified by distorting the phase while maintaining the magnitude. Speech intelligibility was then tested in normal-hearing listeners using the intact and reconstructed sentences presented in quiet and against background noise. Sentences with modified TFS were then used to evoke neural activity in auditory neurons of the inferior colliculus in guinea pigs. Our study demonstrated that speech intelligibility in humans relied on the periodic cues of the speech TFS in both quiet and noisy listening conditions. Furthermore, recordings of neural activity from the guinea pig inferior colliculus showed that individual auditory neurons exhibit phase-locking patterns to the periodic cues of the speech TFS, which disappear when the reconstructed sounds no longer contain periodic patterns. Thus, the periodic cues of TFS are essential for speech intelligibility and are encoded in auditory neurons by phase locking.
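One plausible way to distort the phase while maintaining the magnitude (the abstract does not spell out the exact phase-shifting scheme used) is to jitter the spectral phase by a controllable amount and resynthesize:

```python
import numpy as np

def distort_phase(x, amount=1.0, seed=0):
    """Perturb the spectral phase by `amount` (0 = intact, 1 = fully random) while
    leaving the magnitude spectrum unchanged; illustrative, not the study's exact method."""
    spectrum = np.fft.rfft(x)
    rng = np.random.default_rng(seed)
    jitter = amount * rng.uniform(-np.pi, np.pi, size=spectrum.shape)
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * (np.angle(spectrum) + jitter)), n=len(x))
```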
Affiliation(s)
- Yingyue Xu: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Maxin Chen: Northwestern University, Department of Biomedical Engineering, 2145 Sheridan Road, Tech E310, Evanston, IL, 60208, USA
- Petrina LaFaire: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Xiaodong Tan: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA
- Claus-Peter Richter: Northwestern University, Department of Otolaryngology, 320 E. Superior Street, Searle 12-561, Chicago, IL, 60611, USA; Northwestern University, The Hugh Knowles Center, Department of Communication Sciences and Disorders, 2240 Campus Drive, Evanston, IL, 60208, USA
15. Qi B, Mao Y, Liu J, Liu B, Xu L. Relative contributions of acoustic temporal fine structure and envelope cues for lexical tone perception in noise. J Acoust Soc Am 2017; 141:3022. PMID: 28599529; PMCID: PMC5415402; DOI: 10.1121/1.4982247.
Abstract
Previous studies have shown that lexical tone perception in quiet relies on the acoustic temporal fine structure (TFS) but not on the envelope (E) cues. The contributions of TFS to speech recognition in noise are under debate. In the present study, Mandarin tone tokens were mixed with speech-shaped noise (SSN) or two-talker babble (TTB) at five signal-to-noise ratios (SNRs; -18 to +6 dB). The TFS and E were then extracted from each of the 30 bands using the Hilbert transform. Twenty-five combinations of TFS and E from the sound mixtures of the same tone tokens at various SNRs were created. Twenty normal-hearing, native-Mandarin-speaking listeners participated in the tone-recognition test. Results showed that tone-recognition performance improved as the SNRs in either TFS or E increased. The masking effects on tone perception for the TTB were weaker than those for the SSN. For both types of masker, the perceptual weights of TFS and E in tone perception in noise were nearly equivalent, with E playing a slightly greater role than TFS. Thus, the relative contributions of TFS and E cues to lexical tone perception in noise or in competing-talker maskers differ from those in quiet and from those to speech perception of non-tonal languages.
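The band-wise Hilbert decomposition and E/TFS recombination can be sketched as follows; three bands are used instead of the 30 in the study, and the filter orders and band edges are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_env_tfs(x, fs, band):
    """Band-limit a signal and split it into envelope (E) and temporal fine structure (TFS)."""
    sos = butter(4, band, btype='bandpass', fs=fs, output='sos')
    analytic = hilbert(sosfiltfilt(sos, x))
    return np.abs(analytic), np.cos(np.angle(analytic))          # E, TFS

def recombine(env_source, tfs_source, fs, bands=((100, 500), (500, 1500), (1500, 4000))):
    """Combine the E of one sound mixture with the TFS of another, band by band."""
    out = np.zeros(len(env_source))
    for band in bands:
        env, _ = band_env_tfs(env_source, fs, band)
        _, tfs = band_env_tfs(tfs_source, fs, band)
        out += env * tfs
    return out
```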
Affiliation(s)
- Beier Qi: Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Yitao Mao: Department of Radiology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Jiaxing Liu: Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Bo Liu: Department of Otolaryngology-Head and Neck Surgery, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Li Xu: Communication Sciences and Disorders, Ohio University, Athens, Ohio 45701, USA
16. Hedrick MS, Moon IJ, Woo J, Won JH. Effects of Physiological Internal Noise on Model Predictions of Concurrent Vowel Identification for Normal-Hearing Listeners. PLoS One 2016; 11:e0149128. PMID: 26866811; PMCID: PMC4750862; DOI: 10.1371/journal.pone.0149128.
Abstract
Previous studies have shown that concurrent vowel identification improves with increasing temporal onset asynchrony of the vowels, even if the vowels have the same fundamental frequency. The current study investigated the possible underlying neural processing involved in concurrent vowel perception. The individual vowel stimuli from a previously published study were used as inputs for a phenomenological auditory-nerve (AN) model. Spectrotemporal representations of simulated neural excitation patterns were constructed (i.e., neurograms) and then matched quantitatively with the neurograms of the single vowels using the Neurogram Similarity Index Measure (NSIM). A novel computational decision model was used to predict concurrent vowel identification. To facilitate optimum matches between the model predictions and the behavioral human data, internal noise was added at either neurogram generation or neurogram matching using the NSIM procedure. The best fit to the behavioral data was achieved with a signal-to-noise ratio (SNR) of 8 dB for internal noise added at the neurogram but with a much smaller amount of internal noise (SNR of 60 dB) for internal noise added at the level of the NSIM computations. The results suggest that accurate modeling of concurrent vowel data from listeners with normal hearing may partly depend on internal noise and where internal noise is hypothesized to occur during the concurrent vowel identification process.
Affiliation(s)
- Mark S. Hedrick: Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, TN, United States of America
- Il Joon Moon: Department of Otorhinolaryngology-Head and Neck Surgery, Samsung Medical Center, Sungkyunkwan University, School of Medicine, Seoul, Korea
- Jihwan Woo: Department of Biomedical Engineering, University of Ulsan, Ulsan, Korea
- Jong Ho Won: Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, TN, United States of America
17. Wang Q, Li L. Auditory midbrain representation of a break in interaural correlation. J Neurophysiol 2015; 114:2258-64. PMID: 26269559; DOI: 10.1152/jn.00645.2015.
Abstract
The auditory peripheral system filters broadband sounds into narrowband waves and decomposes narrowband waves into quickly varying temporal fine structures (TFSs) and slowly varying envelopes. When a noise is presented binaurally (with the interaural correlation being 1), human listeners can detect a transient break in interaural correlation (BIC), which does not alter monaural inputs substantially. The central correlates of BIC are unknown. This study examined whether phase locking-based frequency-following responses (FFRs) of neuron populations in the rat auditory midbrain [inferior colliculus (IC)] to interaurally correlated steady-state narrowband noises are modulated by introduction of a BIC. The results showed that the noise-induced FFR exhibited both a TFS component (FFRTFS) and an envelope component (FFREnv), signaling the center frequency and bandwidth, respectively. Introduction of either a BIC or an interaurally correlated amplitude gap (which had the summated amplitude matched to the BIC) significantly reduced both FFRTFS and FFREnv. However, the BIC-induced FFRTFS reduction and FFREnv reduction were not correlated with the amplitude gap-induced FFRTFS reduction and FFREnv reduction, respectively. Thus, although introduction of a BIC does not affect monaural inputs, it causes a temporary reduction in sustained responses of IC neuron populations to the noise. This BIC-induced FFR reduction is not based on a simple linear summation of noise signals.
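A break in interaural correlation is easy to construct digitally: present identical noise to the two ears (IAC = 1) and replace a brief segment in one ear with independent noise (IAC = 0), which leaves the monaural statistics essentially unchanged. The durations below are illustrative, not those used in the study, and broadband noise stands in for the narrowband stimuli.

```python
import numpy as np

def binaural_noise_with_bic(fs, dur=1.0, bic_onset=0.45, bic_dur=0.1, seed=0):
    """Diotic Gaussian noise (IAC = 1) containing a brief break in interaural correlation:
    during the break the right-ear segment is replaced by independent noise (IAC = 0)."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)
    left = rng.standard_normal(n)
    right = left.copy()
    i0, i1 = int(bic_onset * fs), int((bic_onset + bic_dur) * fs)
    right[i0:i1] = rng.standard_normal(i1 - i0)       # uncorrelated segment -> the BIC
    return np.column_stack([left, right])
```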
Affiliation(s)
- Qian Wang: Department of Psychology and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China
- Liang Li: Department of Psychology and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, People's Republic of China; Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, People's Republic of China; PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, People's Republic of China; Beijing Institute for Brain Disorders, Beijing, People's Republic of China