1. Maruyama H, Motoyoshi I. Two-stage spectral space and the perceptual properties of sound textures. J Acoust Soc Am 2025; 157:2067-2076. PMID: 40130951. DOI: 10.1121/10.0036219.
Abstract
Textural sounds, such as wind, flowing water, and footsteps, are pervasive in the natural environment. Recent studies have shown that the perception of auditory textures can be described, and the textures synthesized, from multiple classes of time-averaged statistics, or from the linear spectra and energy spectra of input sounds. These findings raise the possibility that explicit perceptual properties of a textural sound, such as heaviness and complexity, could be predicted from the two-stage spectra. In the present study, rating data were collected for 17 different perceptual properties across 325 real-world sounds, and the relationship between the ratings and the two-stage spectral characteristics was investigated. The analysis showed that the ratings for each property were strongly and systematically correlated with specific frequency bands in the two-stage spectral space. A subsequent experiment further demonstrated that manipulating power at critical frequency bands significantly alters the perceived property of natural sounds in the predicted direction. The results suggest that the perceptual impression of sound texture depends strongly on the power distribution across first- and second-order acoustic filters in the early auditory system.
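For intuition, here is a minimal sketch of a two-stage spectral analysis of the kind described above: a first stage of band-pass "cochlear" filters yields per-band envelope power (the linear spectrum), and a second stage takes the spectrum of each band envelope (the energy, or modulation, spectrum). The log-spaced Butterworth bank and all parameters are illustrative assumptions, not the authors' exact model.

```python
# Minimal two-stage spectral analysis sketch (illustrative parameters,
# not the exact model of Maruyama & Motoyoshi).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def two_stage_spectra(x, fs, n_bands=30, fmin=80.0, fmax=6000.0):
    """Return (stage-1 band powers, stage-2 envelope spectra) for signal x."""
    edges = np.geomspace(fmin, fmax, n_bands + 1)   # log-spaced band edges
    lin_power, mod_spectra = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                  # first-order: band envelope
        lin_power.append(np.mean(env ** 2))          # stage 1: power per band
        mod = np.abs(np.fft.rfft(env - env.mean()))  # stage 2: envelope spectrum
        mod_spectra.append(mod)
    return np.array(lin_power), np.array(mod_spectra)

fs = 16000
x = np.random.randn(fs * 2)   # 2 s of noise as a stand-in "texture"
lin, mod = two_stage_spectra(x, fs)
print(lin.shape, mod.shape)
```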
Affiliation(s)
- Hironori Maruyama
- Graduate School of Arts and Sciences, The University of Tokyo, Meguro-ku, Tokyo, 153-8902, Japan
- Japan Society for the Promotion of Science (JSPS), Chiyoda-ku, Tokyo, 102-0083, Japan
- Isamu Motoyoshi
- Graduate School of Arts and Sciences, The University of Tokyo, Meguro-ku, Tokyo, 153-8902, Japan
2. Cherri D, Ozmeral EJ, Gallun FJ, Seitz AR, Eddins DA. Feasibility and Repeatability of an Abbreviated Auditory Perceptual and Cognitive Test Battery. J Speech Lang Hear Res 2025; 68:719-739. PMID: 39700469; PMCID: PMC11842072. DOI: 10.1044/2024_jslhr-23-00590.
Abstract
PURPOSE Auditory perceptual and cognitive tasks could, in the long term, help guide rehabilitation and intervention strategies in audiology clinics, which typically operate at a fast pace and on strict timelines. The rationale of this study was to assess the test-retest reliability of an abbreviated test battery and to evaluate age-related auditory perceptual and cognitive effects on these measures. METHOD Experiment 1 evaluated the test-retest repeatability of an abbreviated test battery and its use in adverse listening environments. Ten participants completed two visits, each including four conditions: quiet, background noise, external noise, and background mixed with external noise. In Experiment 2, auditory perceptual and cognitive assessments were collected from younger adults with normal hearing and older adults with and without hearing loss. The full test battery included measures of frequency selectivity, temporal fine structure and envelope processing, spectrotemporal and spatial processing, and cognition, as well as an external measure of tolerance to background noise. RESULTS Experiment 1 showed good test-retest repeatability and nonsignificant effects of background or external noise. In Experiment 2, effects of age and hearing loss were observed across auditory perceptual and cognitive measures, except for measures of temporal envelope perception and tolerance to background noise. CONCLUSIONS These data support the use of an abbreviated test battery in relatively uncontrolled listening environments such as clinic waiting rooms. With an efficient test battery, perceptual and cognitive deficits can be assessed with minimal resources and little clinician involvement, owing to the automated nature of the test and the use of consumer-grade technology. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.28021070
Affiliation(s)
- Dana Cherri
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Erol J. Ozmeral
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Aaron R. Seitz
- Department of Psychology, Northeastern University, Boston, MA
- David A. Eddins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa
- Department of Communication Sciences and Disorders, University of Central Florida, Orlando
3. Giroud J, Trébuchon A, Mercier M, Davis MH, Morillon B. The human auditory cortex concurrently tracks syllabic and phonemic timescales via acoustic spectral flux. Sci Adv 2024; 10:eado8915. PMID: 39705351. DOI: 10.1126/sciadv.ado8915.
Abstract
Dynamical theories of speech processing propose that the auditory cortex parses acoustic information in parallel at the syllabic and phonemic timescales. We developed a paradigm to independently manipulate both linguistic timescales and acquired intracranial recordings from 11 patients with epilepsy while they listened to French sentences. Our results indicate that (i) syllabic and phonemic timescales are both reflected in the acoustic spectral flux; (ii) during comprehension, the auditory cortex tracks the syllabic timescale in the theta range, while neural activity in the alpha-beta range phase-locks to the phonemic timescale; (iii) these neural dynamics occur simultaneously and share a joint spatial location; (iv) the spectral flux embeds two timescales (in the theta and low-beta ranges) across 17 natural languages. These findings help us understand how the human brain extracts acoustic information from the continuous speech signal at multiple timescales simultaneously, a prerequisite for subsequent linguistic processing.
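The central acoustic quantity here, spectral flux, is straightforward to compute from a spectrogram as the frame-to-frame positive change in spectral magnitude. A minimal sketch follows; the window and hop sizes are illustrative assumptions, not the paper's settings.

```python
# Sketch: acoustic spectral flux from a magnitude spectrogram
# (window/hop lengths are illustrative, not the study's exact settings).
import numpy as np
from scipy.signal import stft

def spectral_flux(x, fs, win_s=0.025, hop_s=0.010):
    nper = int(win_s * fs)
    f, t, Z = stft(x, fs=fs, nperseg=nper, noverlap=nper - int(hop_s * fs))
    mag = np.abs(Z)
    # positive spectral change summed over frequency: one value per frame step
    flux = np.sum(np.maximum(np.diff(mag, axis=1), 0.0), axis=0)
    return flux, t[1:]

fs = 16000
x = np.random.randn(fs)       # stand-in for a speech recording
flux, t = spectral_flux(x, fs)
print(flux.shape)
```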
Affiliation(s)
- Jérémy Giroud
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Agnès Trébuchon
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- APHM, Clinical Neurophysiology, Timone Hospital, Marseille, France
- Manuel Mercier
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Matthew H Davis
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Benjamin Morillon
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des Systèmes, Marseille, France
4. Huo M, Sun Y, Fogerty D, Tang Y. Release from same-talker speech-in-speech masking: Effects of masker intelligibility and other contributing factors. J Acoust Soc Am 2024; 156:2960-2973. PMID: 39485097. DOI: 10.1121/10.0034235.
Abstract
Human speech perception declines in the presence of masking speech, particularly when the masker is intelligible and acoustically similar to the target. A prior investigation demonstrated a substantial reduction in masking when the intelligibility of competing speech was reduced by corrupting voiced segments with noise [Huo, Sun, Fogerty, and Tang (2023), "Quantifying informational masking due to masker intelligibility in same-talker speech-in-speech perception," in Interspeech 2023, pp. 1783-1787]. As this processing also reduced the prominence of voiced segments, it was unclear whether the unmasking was due to reduced linguistic content, reduced acoustic similarity, or both. The current study compared the masking produced by original competing speech (high intelligibility) with that produced by competing speech with time-reversed voiced segments (VS-reversed, low intelligibility) at various target-to-masker ratios. Modeling results demonstrated similar energetic masking between the two maskers. However, intelligibility of the target speech was considerably better with the VS-reversed masker than with the original masker, likely due to the reduced linguistic content. Further corrupting the masker's voiced segments resulted in additional release from masking. Acoustic analyses showed that the proportion of target voiced segments overlapping with masker voiced segments, and the similarity between the overlapping target and masker voiced segments, affected listeners' speech recognition. Evidence also suggested that modulation masking in the spectro-temporal domain interferes with listeners' ability to glimpse the target.
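For intuition, here is a minimal sketch of the voiced-segment (VS) reversal manipulation described above. The segment boundaries are placeholders; in the study they would come from phonetic annotation or a voicing detector, not from hand-picked times.

```python
# Sketch: time-reverse labeled voiced segments in place.
# Segment boundaries below are hypothetical placeholders.
import numpy as np

def reverse_voiced(x, fs, voiced_spans):
    """Return a copy of x with each (t0, t1) span, in seconds, time-reversed."""
    y = x.copy()
    for t0, t1 in voiced_spans:
        i0, i1 = int(t0 * fs), int(t1 * fs)
        y[i0:i1] = y[i0:i1][::-1]          # reversal preserves the local spectrum
    return y

fs = 16000
x = np.random.randn(fs)                     # stand-in for a masker sentence
y = reverse_voiced(x, fs, [(0.10, 0.25), (0.40, 0.62)])
print(np.allclose(x[:int(0.1 * fs)], y[:int(0.1 * fs)]))  # unvoiced parts untouched
```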
Affiliation(s)
- Mingyue Huo
- Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Yinglun Sun
- Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Daniel Fogerty
- Department of Speech & Hearing Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Yan Tang
- Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
5. Nechaev D, Milekhina O, Tomozova M, Supin A. Hearing Sensitivity to Gliding Rippled Spectra in Hearing-Impaired Listeners. Audiol Res 2024; 14:928-938. PMID: 39585000; PMCID: PMC11587035. DOI: 10.3390/audiolres14060078.
Abstract
OBJECTIVES Sensitivity to gliding ripples in rippled-spectrum signals was measured in both normal-hearing and hearing-impaired listeners. METHODS The test signal was a 2-oct-wide rippled noise centered at 2 kHz, with the ripples gliding downward along the frequency scale. Both the gliding velocity and the ripple density were frequency-proportional across the signal band. Ripple density was specified in ripples/oct and velocity in oct/s. The listener's task was to discriminate between the signal with gliding ripples and a non-rippled signal. RESULTS In all listener groups, increasing the ripple density decreased the maximal velocity at which ripple gliding was detectable. This velocity limit decreased with hearing loss. CONCLUSIONS The results can be explained by deteriorated temporal resolution in hearing-impaired listeners.
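A hedged sketch of how a gliding rippled noise of this general kind can be synthesized as a sum of random-phase tones whose amplitudes follow a sinusoidal profile on a log-frequency axis. For simplicity it uses a constant ripple density and velocity rather than the frequency-proportional variant used in the study; all parameter values are illustrative.

```python
# Sketch: ~2-oct rippled noise (1-4 kHz, centered near 2 kHz) with ripples
# gliding downward in log frequency. Constant density/velocity for simplicity.
import numpy as np

def gliding_ripple(fs=44100, dur=1.0, f0=1000.0, octaves=2.0,
                   density=2.0, velocity=4.0, depth=1.0, n_comp=400):
    t = np.arange(int(dur * fs)) / fs
    x = np.zeros_like(t)
    rng = np.random.default_rng(0)
    pos = rng.uniform(0.0, octaves, n_comp)   # component positions in octaves re f0
    for p in pos:
        f = f0 * 2.0 ** p
        # envelope 1 + m*cos(2*pi*density*(p + velocity*t)): for velocity > 0 the
        # ripple peaks drift toward lower frequencies over time (downward glide)
        amp = 1.0 + depth * np.cos(2 * np.pi * density * (p + velocity * t))
        x += amp * np.cos(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    return x / np.max(np.abs(x))

sig = gliding_ripple()
print(sig.shape)
```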
Affiliation(s)
- Dmitry Nechaev
- A.N. Severtsov Institute of Ecology and Evolution, 119071 Moscow, Russia (shared with O.M., M.T., and A.S.)
6. Clonan AC, Zhai X, Stevenson IH, Escabí MA. Interference of mid-level sound statistics underlie human speech recognition sensitivity in natural noise. bioRxiv [preprint] 2024:2024.02.13.579526. PMID: 38405870; PMCID: PMC10888804. DOI: 10.1101/2024.02.13.579526.
Abstract
Recognizing speech in noise, such as in a busy restaurant, is an essential cognitive skill whose difficulty varies across environments and noise levels. Although there is growing evidence that the auditory system relies on statistical representations for perceiving and coding natural sounds, it is less clear how statistical cues and neural representations contribute to segregating speech in natural auditory scenes. We demonstrate that human listeners rely on mid-level statistics to segregate and recognize speech in environmental noise. Using natural backgrounds and variants with perturbed spectro-temporal statistics, we show that speech recognition accuracy at a fixed noise level varies extensively across natural backgrounds (0% to 100%). Furthermore, for each background, the unique interference created by summary statistics can mask or unmask speech, thus hindering or improving speech recognition. To identify the neural coding strategy and statistical cues that influence accuracy, we developed generalized perceptual regression, a framework that links summary statistics from a neural model to word recognition accuracy. Whereas a peripheral cochlear model accounts for only 60% of the perceptual variance, summary statistics from a mid-level auditory midbrain model accurately predict single-trial sensory judgments, accounting for more than 90% of the perceptual variance. Furthermore, perceptual weights from the regression framework identify which statistics and tuned neural filters are influential and how they impact recognition. Thus, perception of speech in natural backgrounds relies on a mid-level auditory representation in which multiple summary statistics interfere, impacting recognition beneficially or detrimentally across natural background sounds.
Affiliation(s)
- Alex C Clonan
- Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269
- Biomedical Engineering, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
- Xiu Zhai
- Biomedical Engineering, Wentworth Institute of Technology, Boston, MA 02115
- Ian H Stevenson
- Biomedical Engineering, University of Connecticut, Storrs, CT 06269
- Psychological Sciences, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
- Monty A Escabí
- Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269
- Psychological Sciences, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
7. Fogerty D, Ahlstrom JB, Dubno JR. Attenuation and distortion components of age-related hearing loss: Contributions to recognizing temporal-envelope filtered speech in modulated noise. J Acoust Soc Am 2024; 156:93-106. PMID: 38958486; PMCID: PMC11223777. DOI: 10.1121/10.0026450.
Abstract
Older adults with hearing loss may experience difficulty recognizing speech in noise due to factors related to attenuation (e.g., reduced audibility and sensation levels, SLs) and distortion (e.g., reduced temporal fine structure, TFS, processing). Furthermore, speech recognition may improve when the amplitude modulation spectra of the speech and masker are non-overlapping. The current study investigated this by filtering the amplitude modulation spectrum into different modulation rates for speech and speech-modulated noise. The modulation depth of the noise was manipulated to vary the SL of speech glimpses. Younger adults with normal hearing and older adults with normal or impaired hearing listened to natural speech or speech vocoded to degrade TFS cues. Control groups of younger adults were tested on all conditions with spectrally shaped speech and threshold-matching noise, which reduced audibility to match that of the older hearing-impaired group. All groups benefited from increased masker modulation depth and preservation of syllabic-rate speech modulations. Older adults with hearing loss had reduced speech recognition across all conditions. This was explained by factors related to attenuation, due to reduced SLs, and distortion, due to reduced TFS processing, which resulted in poorer auditory processing of speech cues during the dips of the masker.
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
8. Guest DR, Rajappa N, Oxenham AJ. Limitations in human auditory spectral analysis at high frequencies. J Acoust Soc Am 2024; 156:326-340. PMID: 38990035; PMCID: PMC11240212. DOI: 10.1121/10.0026475.
Abstract
Humans are adept at identifying spectral patterns, such as vowels, in different rooms, at different sound levels, or produced by different talkers. How this feat is achieved remains poorly understood. Two psychoacoustic analogs of spectral pattern recognition are spectral profile analysis and spectrotemporal ripple direction discrimination. This study tested whether pattern-recognition abilities observed previously at low frequencies are also observed at extended high frequencies. At low frequencies (center frequency ∼500 Hz), listeners were able to achieve accurate profile-analysis thresholds, consistent with prior literature. However, at extended high frequencies (center frequency ∼10 kHz), listeners' profile-analysis thresholds were either unmeasurable or could not be distinguished from performance based on overall loudness cues. A similar pattern of results was observed with spectral ripple discrimination, where performance was again considerably better at low than at high frequencies. Collectively, these results suggest a severe deficit in listeners' ability to analyze patterns of intensity across frequency in the extended high-frequency region that cannot be accounted for by cochlear frequency selectivity. One interpretation is that the auditory system is not optimized to analyze such fine-grained across-frequency profiles at extended high frequencies, as they are not typically informative for everyday sounds.
Affiliation(s)
- Daniel R Guest
- Department of Biomedical Engineering, University of Rochester, Rochester, New York 14642, USA
- Neha Rajappa
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, USA
9. Gulati D, Ray S. Auditory and Visual Gratings Elicit Distinct Gamma Responses. eNeuro 2024; 11:ENEURO.0116-24.2024. PMID: 38604776; PMCID: PMC11046261. DOI: 10.1523/eneuro.0116-24.2024.
Abstract
Sensory stimulation is often accompanied by fluctuations at high frequencies (>30 Hz) in brain signals. These could be "narrowband" oscillations in the gamma band (30-70 Hz) or nonoscillatory "broadband" high-gamma (70-150 Hz) activity. Narrowband gamma oscillations, which are induced by some visual stimuli such as gratings and have been shown to weaken with healthy aging and the onset of Alzheimer's disease, hold promise as potential biomarkers. However, since delivering visual stimuli is cumbersome, requiring head stabilization for eye tracking, an equivalent auditory paradigm could be useful. Although simple auditory stimuli have been shown to produce high-gamma activity, whether specific auditory stimuli can also produce narrowband gamma oscillations is unknown. We tested whether auditory ripple stimuli, which are considered an analog of visual gratings, could elicit narrowband oscillations in auditory areas. We recorded 64-channel electroencephalography from 36 subjects (18 male, 18 female) while they either fixated on a monitor while passively viewing static visual gratings, or listened to stationary and moving ripples, played over loudspeakers, with their eyes open or closed. We found that while visual gratings induced narrowband gamma oscillations with suppression in the alpha band (8-12 Hz), auditory ripples did not produce narrowband gamma but instead elicited a very strong broadband high-gamma response and suppression in the beta band (14-26 Hz). Even though we used equivalent stimuli in both modalities, our findings indicate that the underlying neuronal circuitry may not share common strategies for stimulus processing.
Affiliation(s)
- Divya Gulati
- Centre for Neuroscience, Indian Institute of Science, Bengaluru 560012, India
- Supratim Ray
- Centre for Neuroscience, Indian Institute of Science, Bengaluru 560012, India
10. Zaar J, Simonsen LB, Laugesen S. A spectro-temporal modulation test for predicting speech reception in hearing-impaired listeners with hearing aids. Hear Res 2024; 443:108949. PMID: 38281473. DOI: 10.1016/j.heares.2024.108949.
Abstract
Spectro-temporal modulation (STM) detection sensitivity has been shown to be associated with speech-in-noise reception in hearing-impaired (HI) individuals. Based on previous research, a recent study [Zaar, Simonsen, Dau, and Laugesen (2023). Hear Res 427:108650] introduced an STM test paradigm with audibility compensation, employing STM stimulus variants using noise and complex tones as carrier signals. The study demonstrated that the test was suitable for the target population of elderly individuals with moderate-to-severe hearing loss and showed promising predictions of speech-reception thresholds (SRTs) measured in a realistic setup with spatially distributed speech and noise maskers and linear audibility compensation. The present study further investigated the suggested STM test with respect to (i) test-retest variability for the most promising STM stimulus variants, (ii) its predictive power with respect to realistic speech-in-noise reception with non-linear hearing-aid amplification, (iii) its connection to effects of directionality and noise reduction (DIR+NR) hearing-aid processing, and (iv) its relation to DIR+NR preference. Thirty elderly HI participants took part in a combined laboratory and field study, in which STM thresholds were collected with a complex-tone-based and a noise-based STM stimulus design; SRTs were measured with spatially distributed speech and noise maskers using hearing aids with non-linear amplification and two different levels of DIR+NR; and subjective reports and preference ratings were obtained in two field periods with the two DIR+NR hearing-aid settings. The results indicate that the noise-carrier-based STM test variant (i) showed optimal test-retest properties, (ii) yielded a highly significant correlation with SRTs (R2=0.61), exceeding and complementing the predictive power of the audiogram, (iii) yielded a significant correlation (R2=0.51) with the DIR+NR-induced SRT benefit, and (iv) did not correlate significantly with subjective preference for DIR+NR settings in the field. Overall, the suggested STM test represents a valuable tool for diagnosing speech-reception problems that remain once hearing-aid amplification has been provided, and the resulting need for and benefit from DIR+NR hearing-aid processing.
Affiliation(s)
- Johannes Zaar
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark; Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Lisbeth Birkelund Simonsen
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark; Interacoustics Research Unit, DK-2800 Kgs. Lyngby, Denmark
- Søren Laugesen
- Interacoustics Research Unit, DK-2800 Kgs. Lyngby, Denmark
11. Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. PMID: 38151889. DOI: 10.1111/ejn.16221.
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role in transmitting information during communication. Speech is inherently a dynamic signal, and a recent line of research has focused on neural activity that follows the temporal structure of speech. We review findings characterising neural dynamics in the processing of continuous acoustics, which allow us to compare these dynamics with temporal aspects of human speech. We highlight properties and constraints shared by neural and speech dynamics, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins, and summarise open questions in the field.
Affiliation(s)
- Benedikt Zoefel
- Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
- Université de Toulouse III Paul Sabatier, Toulouse, France
- Anne Kösem
- Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
12. Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. PMID: 38358954; PMCID: PMC10868781. DOI: 10.1371/journal.pbio.3002498.
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech without affecting speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech were better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted, but disappeared when segregation cues, i.e., speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
Affiliation(s)
- Jiaxin Gao
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Honghua Chen
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Mingxuan Fang
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nanhu Brain-computer Interface Institute, Hangzhou, China
- The State Key Lab of Brain-Machine Intelligence; The MOE Frontier Science Center for Brain Science & Brain-Machine Integration, Zhejiang University, Hangzhou, China
13. Leonard MK, Gwilliams L, Sellers KK, Chung JE, Xu D, Mischler G, Mesgarani N, Welkenhuysen M, Dutta B, Chang EF. Large-scale single-neuron speech sound encoding across the depth of human cortex. Nature 2024; 626:593-602. PMID: 38093008; PMCID: PMC10866713. DOI: 10.1038/s41586-023-06839-2.
Abstract
Understanding the neural basis of speech perception requires that we study the human brain both at the scale of the fundamental computational unit of neurons and in their organization across the depth of cortex. Here we used high-density Neuropixels arrays to record from 685 neurons across cortical layers at nine sites in a high-level auditory region that is critical for speech, the superior temporal gyrus, while participants listened to spoken sentences. Single neurons encoded a wide range of speech sound cues, including features of consonants and vowels, relative vocal pitch, onsets, amplitude envelope, and sequence statistics. Each cross-laminar recording exhibited dominant tuning to a primary speech feature, while also containing a substantial proportion of neurons that encoded other features, contributing to heterogeneous selectivity. Spatially, neurons at similar cortical depths tended to encode similar speech features. Activity across all cortical layers was predictive of high-frequency field potentials (electrocorticography), providing a neuronal origin for macroelectrode recordings from the cortical surface. Together, these results establish single-neuron tuning across the cortical laminae as an important dimension of speech encoding in human superior temporal gyrus.
Affiliation(s)
- Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Laura Gwilliams
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Kristin K Sellers
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Jason E Chung
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Duo Xu
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Gavin Mischler
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Nima Mesgarani
- Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
14. López-Ramos D, Marrufo-Pérez MI, Eustaquio-Martín A, López-Bascuas LE, Lopez-Poveda EA. Adaptation to Noise in Spectrotemporal Modulation Detection and Word Recognition. Trends Hear 2024; 28:23312165241266322. PMID: 39267369; PMCID: PMC11401146. DOI: 10.1177/23312165241266322.
Abstract
Noise adaptation is the improvement in auditory function as the signal of interest is delayed relative to the noise onset. Here, we investigated whether noise adaptation occurs in spectral, temporal, and spectrotemporal modulation detection, as well as in speech recognition. Eighteen normal-hearing adults participated in the experiments. In the modulation detection tasks, the signal was a 200-ms spectrally and/or temporally modulated ripple noise. The spectral modulation rate was two cycles per octave, the temporal modulation rate was 10 Hz, and the spectrotemporal modulations combined these two modulations, resulting in a downward-moving ripple. A control experiment was performed to determine whether the results generalized to upward-moving ripples. In the speech recognition task, the signal consisted of disyllabic words, either unprocessed or vocoded to retain only envelope cues. Modulation detection thresholds at 0 dB signal-to-noise ratio and speech reception thresholds were measured in quiet and in white noise (at 60 dB SPL) for noise-signal onset delays of 50 ms (early condition) and 800 ms (late condition). Adaptation was calculated as the threshold difference between the early and late conditions. Adaptation in word recognition was statistically significant for vocoded words (2.1 dB) but not for natural words (0.6 dB). Adaptation was statistically significant in spectral (2.1 dB) and temporal (2.2 dB) modulation detection but not in spectrotemporal modulation detection (downward ripple: 0.0 dB, upward ripple: -0.4 dB). The findings suggest that noise adaptation in speech recognition is unrelated to improvements in the encoding of spectrotemporal modulation cues.
Affiliation(s)
- David López-Ramos
- Instituto de Neurociencias de Castilla y León, Universidad de Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca, Universidad de Salamanca, Salamanca, Spain
- Miriam I. Marrufo-Pérez
- Instituto de Neurociencias de Castilla y León, Universidad de Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca, Universidad de Salamanca, Salamanca, Spain
- Almudena Eustaquio-Martín
- Instituto de Neurociencias de Castilla y León, Universidad de Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca, Universidad de Salamanca, Salamanca, Spain
- Luis E. López-Bascuas
- Departamento de Psicología Experimental, Procesos Cognitivos y Logopedia, Universidad Complutense de Madrid, Madrid, Spain
- Enrique A. Lopez-Poveda
- Instituto de Neurociencias de Castilla y León, Universidad de Salamanca, Salamanca, Spain
- Instituto de Investigación Biomédica de Salamanca, Universidad de Salamanca, Salamanca, Spain
- Departamento de Cirugía, Facultad de Medicina, Universidad de Salamanca, Salamanca, Spain
15. van der Willigen RF, Versnel H, van Opstal AJ. Spectral-temporal processing of naturalistic sounds in monkeys and humans. J Neurophysiol 2024; 131:38-63. PMID: 37965933; PMCID: PMC11305640. DOI: 10.1152/jn.00129.2023.
Abstract
Human speech and vocalizations in animals are rich in joint spectrotemporal (S-T) modulations, wherein acoustic changes in both frequency and time are functionally related. In principle, the primate auditory system could process these complex dynamic sounds based on either an inseparable representation of S-T features or, alternatively, a separable representation. The separability hypothesis implies independent processing of spectral and temporal modulations. We collected comparative data on the S-T hearing sensitivity of humans and macaque monkeys to a wide range of broadband dynamic spectrotemporal ripple stimuli, employing a yes-no signal-detection task. Ripples were systematically varied in density (spectral modulation frequency), velocity (temporal modulation frequency), and modulation depth to cover a listener's full S-T modulation sensitivity, derived from a total of 87 psychometric ripple detection curves. Audiograms were measured to control for normal hearing. We determined hearing thresholds, reaction time distributions, and S-T modulation transfer functions (MTFs), both at the ripple detection thresholds and at suprathreshold modulation depths. Our psychophysically derived MTFs are consistent with the hypothesis that monkeys and humans employ analogous perceptual strategies: S-T acoustic information is processed primarily in a separable manner. Singular value decomposition (SVD), however, revealed a small but consistent inseparable spectral-temporal interaction. Finally, SVD analysis of the known visual spatiotemporal contrast sensitivity function (CSF) highlights that human vision is space-time inseparable to a much larger extent than is the case for S-T sensitivity in hearing. Thus, the specificity with which the primate brain encodes natural sounds appears to be less strict than what is required to adequately deal with natural images. NEW & NOTEWORTHY We provide comparative data on primate audition of naturalistic sounds comprising hearing thresholds, reaction time distributions, and spectral-temporal modulation transfer functions. Our psychophysical experiments demonstrate that auditory information is processed primarily in a spectral-temporal-independent manner by both monkeys and humans. Singular value decomposition of known visual spatiotemporal contrast sensitivity, in comparison to our auditory spectral-temporal sensitivity, revealed a striking contrast in how the brain encodes natural sounds as opposed to natural images, as vision appears to be space-time inseparable.
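The separability analysis rests on a simple linear-algebra fact: a fully separable MTF is a rank-1 matrix, so the fraction of its energy captured by the first singular value measures how separable it is. A toy sketch with a synthetic MTF (the matrix and axis values are made up for illustration; real MTFs would come from the psychometric data):

```python
# Sketch: SVD-based separability index of a spectral-temporal MTF.
# The toy MTF below is synthetic, purely for illustration.
import numpy as np

def separability_index(mtf):
    s = np.linalg.svd(mtf, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)   # 1.0 = perfectly separable (rank 1)

dens = np.linspace(0.25, 4.0, 8)        # spectral modulation axis, cyc/oct
vel = np.linspace(2.0, 64.0, 10)        # temporal modulation axis, Hz
sep_part = np.outer(np.exp(-dens), np.exp(-vel / 32.0))   # separable component
interaction = 0.05 * np.outer(dens, vel) / (dens.max() * vel.max())
mtf = sep_part + interaction            # mostly separable + small S-T interaction
print(f"separability index: {separability_index(mtf):.3f}")  # close to, but < 1
```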
Affiliation(s)
- Robert F van der Willigen
- Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- School of Communication, Media and Information Technology, Rotterdam University of Applied Sciences, Rotterdam, The Netherlands
- Research Center Creating 010, Rotterdam University of Applied Sciences, Rotterdam, The Netherlands
- Huib Versnel
- Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Department of Otorhinolaryngology and Head & Neck Surgery, UMC Utrecht Brain Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- A John van Opstal
- Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
16. Fogerty D, Ahlstrom JB, Dubno JR. Sentence recognition with modulation-filtered speech segments for younger and older adults: Effects of hearing impairment and cognition. J Acoust Soc Am 2023; 154:3328-3343. PMID: 37983296; PMCID: PMC10663055. DOI: 10.1121/10.0022445.
Abstract
This study investigated word recognition for sentences temporally filtered within and across acoustic-phonetic segments providing primarily vocalic or consonantal cues. Amplitude modulation was filtered at syllabic (0-8 Hz) or slow phonemic (8-16 Hz) rates. Sentence-level modulation properties were also varied by amplifying or attenuating segments. Participants were older adults with normal or impaired hearing. Their speech recognition was compared with that of younger normal-hearing adults who heard speech either unmodified or spectrally shaped, with and without threshold-matching noise that reduced audibility to match the hearing-impaired thresholds. Participants also completed cognitive and speech recognition measures. Overall, the results confirm the primary contribution of syllabic speech modulations to recognition and demonstrate the importance of these modulations across vowel and consonant segments. Group differences demonstrated a hearing loss-related impairment in processing modulation-filtered speech, particularly at 8-16 Hz. This impairment could not be fully explained by age or poorer audibility. Principal components analysis identified a single factor score that summarized speech recognition across modulation-filtered conditions; analysis of individual differences explained 81% of the variance in this summary factor among the older adults with hearing loss. These results suggest that a combination of cognitive abilities and speech glimpsing abilities contributes to speech recognition in this group.
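A minimal sketch of the kind of modulation filtering described above, applied here to a single wideband envelope for simplicity (the study filtered within acoustic-phonetic segments; the filter design details are assumptions):

```python
# Sketch: restrict a signal's amplitude modulations to a syllabic (0-8 Hz)
# or slow phonemic (8-16 Hz) rate band (simplified wideband version).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_filter(x, fs, band):
    env = np.abs(hilbert(x))                    # amplitude envelope
    fine = x / np.maximum(env, 1e-12)           # carrier (temporal fine structure)
    lo, hi = band
    if lo <= 0:                                 # low-pass for the 0-8 Hz band
        sos = butter(4, hi, btype="lowpass", fs=fs, output="sos")
    else:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    env_f = np.maximum(sosfiltfilt(sos, env), 0.0)  # keep envelope nonnegative
    return fine * env_f                         # reimpose filtered envelope

fs = 16000
x = np.random.randn(fs)                         # stand-in for a sentence
syllabic = modulation_filter(x, fs, (0, 8))
phonemic = modulation_filter(x, fs, (8, 16))
print(syllabic.shape, phonemic.shape)
```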
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, Illinois 61820, USA
- Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, South Carolina 29425, USA
17. Pomper U, Curetti LZ, Chait M. Neural dynamics underlying successful auditory short-term memory performance. Eur J Neurosci 2023; 58:3859-3878. PMID: 37691137; PMCID: PMC10946728. DOI: 10.1111/ejn.16140.
Abstract
Listeners often operate in complex acoustic environments, consisting of many concurrent sounds. Accurately encoding and maintaining such auditory objects in short-term memory is crucial for communication and scene analysis. Yet, the neural underpinnings of successful auditory short-term memory (ASTM) performance are currently not well understood. To elucidate this issue, we presented a novel, challenging auditory delayed match-to-sample task while recording MEG. Human participants listened to 'scenes' comprising three concurrent tone pip streams. The task was to indicate, after a delay, whether a probe stream was present in the just-heard scene. We present three key findings: First, behavioural performance revealed faster responses in correct versus incorrect trials as well as in 'probe present' versus 'probe absent' trials, consistent with ASTM search. Second, successful compared with unsuccessful ASTM performance was associated with a significant enhancement of event-related fields and oscillatory activity in the theta, alpha and beta frequency ranges. This extends previous findings of an overall increase of persistent activity during short-term memory performance. Third, using distributed source modelling, we found these effects to be confined mostly to sensory areas during encoding, presumably related to ASTM contents per se. Parietal and frontal sources then became relevant during the maintenance stage, indicating that effective STM operation also relies on ongoing inhibitory processes suppressing task-irrelevant information. In summary, our results deliver a detailed account of the neural patterns that differentiate successful from unsuccessful ASTM performance in the context of a complex, multi-object auditory scene.
Affiliation(s)
- Ulrich Pomper
- Ear Institute, University College London, London, UK
- Faculty of Psychology, University of Vienna, Vienna, Austria
- Maria Chait
- Ear Institute, University College London, London, UK
18. Van Opstal AJ, Noordanus E. Towards personalized and optimized fitting of cochlear implants. Front Neurosci 2023; 17:1183126. PMID: 37521701; PMCID: PMC10372492. DOI: 10.3389/fnins.2023.1183126.
Abstract
A cochlear implant (CI) is a neurotechnological device that restores hearing in people with total sensorineural hearing loss. It contains a sophisticated speech processor that analyzes and transforms the acoustic input and distributes its time-enveloped spectral content to the auditory nerve as electrical pulse trains, delivered to selected frequency channels of a multi-contact electrode surgically inserted in the cochlear duct. This remarkable brain interface enables the deaf to regain hearing and understand speech. However, tuning the large (>50) number of speech-processor parameters, so-called "device fitting," is a tedious and complex process, mainly carried out in the clinic through one-size-fits-all procedures. Current fitting typically relies on limited and often subjective data that must be collected in limited time. Despite the success of the CI as a hearing-restoration device, variability in speech-recognition scores among users remains very large and mostly unexplained. The major factors underlying this variability involve three levels: (i) variability in auditory-system malfunction among CI users, (ii) variability in the selectivity of electrode-to-auditory-nerve (EL-AN) activation, and (iii) a lack of objective perceptual measures to optimize the fitting. We argue that variability in speech recognition can only be alleviated by using objective patient-specific data in an individualized fitting procedure that incorporates knowledge from all three levels. In this paper, we propose a series of experiments aimed at collecting a large amount of objective (i.e., quantitative, reproducible, and reliable) data that characterize the three processing levels of the user's auditory system. Machine-learning algorithms that process these data should eventually enable the clinician to derive reliable and personalized characteristics of the user's auditory system, the quality of EL-AN signal transfer, and predictions of the perceptual effects of changes in the current fitting.
Affiliation(s)
- A. John Van Opstal
- Donders Centre for Neuroscience, Section Neurophysics, Radboud University, Nijmegen, Netherlands
19. Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023; 147:105111. PMID: 36822385. DOI: 10.1016/j.neubiorev.2023.105111.
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred rhythmicity range of 4 to 8 Hz, it has been hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between the speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of the speech envelope, characterized by the modulation spectrum, reliably correlates with the syllable rate only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% of the variance of the speech envelope. These results indicate that local features in the speech envelope, rather than the modulation spectrum, are a more reliable acoustic correlate of syllables.
20. Ding N, Gao J, Wang J, Sun W, Fang M, Liu X, Zhao H. Speech recognition in echoic environments and the effect of aging and hearing impairment. Hear Res 2023; 431:108725. PMID: 36931021. DOI: 10.1016/j.heares.2023.108725.
Abstract
Temporal modulations provide critical cues for speech recognition. When temporal modulations are distorted by, e.g., reverberation, speech intelligibility drops, and the drop can be explained by the amount of distortion to the speech modulation spectrum, i.e., the spectrum of temporal modulations. Here, we test a condition in which speech is contaminated by a single echo. Speech is delayed by either 0.125 s or 0.25 s to create an echo, and these two conditions notch out the temporal modulations at 4 or 2 Hz, respectively. We evaluate how well young and older listeners recognize such echoic speech. For young listeners, the speech recognition rate is not influenced by the echo, even on exposure to the first echoic sentence. For older listeners, the speech recognition rate drops below 60% on the first echoic sentence but rapidly recovers to above 75% with exposure to a few sentences. Further analyses reveal that both age and the hearing threshold influence the recognition of echoic speech in older listeners. These results show that the recognition of echoic speech cannot be fully explained by distortions to the modulation spectrum, and suggest that the auditory system has mechanisms to effectively compensate for the influence of single echoes.
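The notch frequencies follow from simple comb filtering: adding a copy of a signal delayed by T cancels modulations at (2k+1)/(2T) Hz, so T = 0.125 s nulls 4 Hz and T = 0.25 s nulls 2 Hz. A few lines make this concrete:

```python
# Sketch: a single echo x(t) + x(t - T) comb-filters temporal modulations,
# with nulls at (2k+1)/(2T) Hz; first null = 1/(2T).
import numpy as np

T = 0.125                        # echo delay in seconds
fm = np.linspace(0.1, 16, 500)   # modulation frequencies, Hz
H = np.abs(1 + np.exp(-2j * np.pi * fm * T))   # gain applied to each modulation
print(f"gain at 4 Hz: {H[np.argmin(np.abs(fm - 4))]:.3f}")   # ~0 (notch)
print(f"gain at 8 Hz: {H[np.argmin(np.abs(fm - 8))]:.3f}")   # ~2 (peak)
```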
Affiliation(s)
- Nai Ding
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Jiaxin Gao
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Jing Wang
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Wenhui Sun
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, Zhejiang, China
- Mingxuan Fang
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Xiaoling Liu
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Hua Zhao
- College of Biomedical Engineering and Instrument Science, Department of Nursing, The Second Affiliated Hospital of Zhejiang University School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China
21. He F, Stevenson IH, Escabí MA. Two stages of bandwidth scaling drives efficient neural coding of natural sounds. PLoS Comput Biol 2023; 19:e1010862. PMID: 36787338; PMCID: PMC9970106. DOI: 10.1371/journal.pcbi.1010862.
Abstract
Theories of efficient coding propose that the auditory system is optimized for the statistical structure of natural sounds, yet the transformations underlying optimal acoustic representations are not well understood. Using a database of natural sounds, including human speech, and a physiologically inspired auditory model, we explore the consequences of peripheral (cochlear) and mid-level (auditory midbrain) filter tuning transformations on the representation of natural sound spectra and modulation statistics. Whereas Fourier-based sound decompositions have constant time-frequency resolution at all frequencies, cochlear and auditory midbrain filter bandwidths increase in proportion to the filter center frequency. This form of bandwidth scaling produces a systematic decrease in spectral resolution and increase in temporal resolution with increasing frequency. Here we demonstrate that cochlear bandwidth scaling produces a frequency-dependent gain that counteracts the tendency of natural sound power to decrease with frequency, resulting in a whitened output representation. Similarly, bandwidth scaling in mid-level auditory filters further enhances the representation of natural sounds by producing a whitened modulation power spectrum (MPS) with higher modulation entropy than both the cochlear outputs and the conventional Fourier MPS. These findings suggest that the tuning characteristics of the peripheral and mid-level auditory system together produce a whitened output representation in three dimensions (frequency, temporal, and spectral modulation) that reduces redundancies and allows for a more efficient use of neural resources. This hierarchical multi-stage tuning strategy is thus likely optimized to extract available information and may underlie perceptual sensitivity to natural sounds.
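The whitening argument can be checked with a little arithmetic: integrating a 1/f power spectrum over a constant-Q (proportional-bandwidth) band gives ln(hi/lo), which is the same for every center frequency. A toy sketch (the Q value and center frequencies are illustrative assumptions):

```python
# Sketch: constant-Q (proportional-bandwidth) filters whiten 1/f spectra,
# illustrating the bandwidth-scaling argument (Q value is illustrative).
import numpy as np

f = np.linspace(20.0, 20000.0, 200001)
df = f[1] - f[0]
psd = 1.0 / f                      # natural sounds: power falls roughly as 1/f
q = 4.0                            # Q = center frequency / bandwidth
for cf in np.geomspace(125.0, 8000.0, 7):
    lo, hi = cf * (1 - 0.5 / q), cf * (1 + 0.5 / q)
    band = (f >= lo) & (f <= hi)
    print(f"cf = {cf:7.1f} Hz -> band power = {psd[band].sum() * df:.4f}")
# Band power is nearly identical at every center frequency (whitened output);
# constant-bandwidth bands would instead show power falling with frequency.
```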
Affiliation(s)
- Fengrong He
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Ian H. Stevenson
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- The Connecticut Institute for Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Monty A. Escabí
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- The Connecticut Institute for Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
22. Zaar J, Simonsen LB, Dau T, Laugesen S. Toward a clinically viable spectro-temporal modulation test for predicting supra-threshold speech reception in hearing-impaired listeners. Hear Res 2023; 427:108650. DOI: 10.1016/j.heares.2022.108650.
Abstract
The ability of hearing-impaired listeners to detect spectro-temporal modulation (STM) has been shown to correlate with individual listeners' speech reception performance. However, the STM detection tests used in previous studies were overly challenging, especially for elderly listeners with moderate-to-severe hearing loss. Furthermore, the speech tests considered as a reference were not optimized to yield ecologically valid outcomes that represent real-life speech reception deficits. The present study investigated an STM detection measurement paradigm with individualized audibility compensation, focusing on its clinical viability and relevance as a real-life supra-threshold speech intelligibility predictor. STM thresholds were measured in 13 elderly hearing-impaired native Danish listeners using four previously established (noise-carrier-based) and two novel complex-tone-carrier-based STM stimulus variants. Speech reception thresholds (SRTs) were measured (i) in a realistic spatial speech-on-speech setup and (ii) using co-located stationary noise, both with individualized amplification. In contrast with previous related studies, the proposed measurement paradigm yielded robust STM thresholds for all listeners and conditions. The STM thresholds were positively correlated with the SRTs, whereby significant correlations were found for the realistic speech-test condition but not for the stationary-noise condition. Three STM stimulus variants (one noise-carrier-based and two complex-tone-based) yielded significant predictions of SRTs, accounting for up to 53% of the SRT variance. The results of the study could form the basis for a clinically viable STM test for quantifying supra-threshold speech reception deficits in aided hearing-impaired listeners.
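As an illustration of the stimulus class involved, a moving spectro-temporal ripple on a complex-tone carrier can be sketched as a bank of log-spaced tones whose amplitudes follow a sinusoid in log-frequency and time. This is a generic ripple generator with made-up parameters, not the study's calibrated test variants:

```python
import numpy as np

def ripple(fs=44100, dur=1.0, f0=250.0, n_tones=40, octaves=5.0,
           omega=2.0, w=4.0, depth=0.5, seed=1):
    """Moving spectro-temporal ripple on a log-spaced complex-tone carrier.
    omega: spectral density (cycles/octave); w: temporal rate (Hz); the
    sign convention of the envelope phase sets the sweep direction."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    x = np.zeros_like(t)
    for i in range(n_tones):
        pos = i * octaves / (n_tones - 1)        # tone position in octaves
        env = 1.0 + depth * np.sin(2 * np.pi * (w * t + omega * pos))
        phase = rng.uniform(0, 2 * np.pi)        # random carrier phase
        x += env * np.sin(2 * np.pi * f0 * 2.0 ** pos * t + phase)
    return x / np.max(np.abs(x))

stim = ripple()    # a 2-cycles/octave, 4 Hz moving ripple
```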
Collapse
Affiliation(s)
- Johannes Zaar
- Eriksholm Research Centre, DK-3070 Snekkersten, Denmark; Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
| | | | - Torsten Dau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
| | - Søren Laugesen
- Interacoustics Research Unit, DK-2800, Kgs. Lyngby, Denmark
| |
Collapse
|
23
|
Edraki A, Chan WY, Jensen J, Fogerty D. Spectro-temporal modulation glimpsing for speech intelligibility prediction. Hear Res 2022; 426:108620. [PMID: 36175300 PMCID: PMC10125146 DOI: 10.1016/j.heares.2022.108620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Revised: 09/14/2022] [Accepted: 09/20/2022] [Indexed: 11/22/2022]
Abstract
We compare two alternative speech intelligibility prediction algorithms: time-frequency glimpse proportion (GP) and spectro-temporal glimpsing index (STGI). Both algorithms hypothesize that listeners understand speech in challenging acoustic environments by "glimpsing" partially available information from degraded speech. GP defines glimpses as those time-frequency regions whose local signal-to-noise ratio is above a certain threshold and estimates intelligibility as the proportion of the time-frequency regions glimpsed. STGI, on the other hand, applies glimpsing to the spectro-temporal modulation (STM) domain and uses a similarity measure based on the normalized cross-correlation between the STM envelopes of the clean and degraded speech signals to estimate intelligibility as the proportion of the STM channels glimpsed. Our experimental results demonstrate that STGI extends the notion of glimpsing proportion to a wider range of distortions, including non-linear signal processing, and outperforms GP for the additive uncorrelated noise datasets we tested. Furthermore, the results show that spectro-temporal modulation analysis enables STGI to account for the effects of masker type on speech intelligibility, leading to superior performance over GP in modulated noise datasets.
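The glimpsing idea itself fits in a few lines. The sketch below computes a simplified glimpse proportion on an STFT grid; the published GP uses an auditory filter-bank front end and STGI operates on spectro-temporal modulation channels, so the -5 dB criterion, window settings, and white-noise stand-ins here are purely illustrative:

```python
import numpy as np
from scipy.signal import stft

def glimpse_proportion(clean, noise, fs, thresh_db=-5.0):
    """Proportion of time-frequency cells whose local SNR exceeds a
    threshold: a simplified glimpse proportion. The published GP uses a
    gammatone front end rather than this plain STFT."""
    _, _, S = stft(clean, fs, nperseg=512)
    _, _, N = stft(noise, fs, nperseg=512)
    local_snr = 10 * np.log10((np.abs(S) ** 2 + 1e-12) /
                              (np.abs(N) ** 2 + 1e-12))
    return float(np.mean(local_snr > thresh_db))

fs = 16000
rng = np.random.default_rng(0)
clean = rng.standard_normal(fs)          # stand-ins for speech and masker
noise = 0.5 * rng.standard_normal(fs)
print(glimpse_proportion(clean, noise, fs))
```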
Collapse
Affiliation(s)
- Amin Edraki
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada.
| | - Wai-Yip Chan
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
| | - Jesper Jensen
- Department of Electronic Systems, Aalborg University, Aalborg 9220, Denmark; Demant A/S, Smørum 2765, Denmark
| | - Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA
| |
Collapse
|
24
|
Marczyk A, O'Brien B, Tremblay P, Woisard V, Ghio A. Correlates of vowel clarity in the spectrotemporal modulation domain: Application to speech impairment evaluation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:2675. [PMID: 36456260 DOI: 10.1121/10.0015024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/14/2021] [Accepted: 10/13/2022] [Indexed: 06/17/2023]
Abstract
This article reports on vowel clarity metrics based on spectrotemporal modulations of speech signals. Motivated by previous findings on the relevance of modulation-based metrics for speech intelligibility assessment and pathology classification, the current study used factor analysis to identify regions within a bi-dimensional modulation space (the magnitude power spectrum, as in Elliott and Theunissen [(2009). PLoS Comput. Biol. 5(3), e1000302]) by relating them to a set of conventional acoustic metrics of vowel space area and vowel distinctiveness. Two indices based on the energy ratio between high and low modulation rates across the temporal and spectral dimensions of the modulation space emerged from the analyses. These indices served as input for measurements of central tendency and for classification analyses aimed at identifying vowel-related speech impairments in French native speakers with head and neck cancer (HNC) and Parkinson dysarthria (PD). Following the analysis, vowel-related speech impairment was identified in the HNC speakers but not in the PD speakers. These results were consistent with findings based on subjective evaluations of speech intelligibility. The findings are consistent with previous studies indicating that impaired speech is associated with attenuated energy in higher spectrotemporal modulation bands.
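Energy-ratio indices of this kind can be derived from the modulation power spectrum, the 2D Fourier transform of a (log-)spectrogram. A rough sketch with a plain STFT front end (the 4 Hz and 1 cycle/kHz split points are placeholders, not the regions identified by the study's factor analysis):

```python
import numpy as np
from scipy.signal import stft

def mps_ratios(x, fs, tm_split=4.0, sm_split=1.0):
    """High/low energy ratios along the temporal (Hz) and spectral
    (cycles/kHz) axes of the modulation power spectrum, i.e. the 2D FFT
    of the log spectrogram (after Elliott and Theunissen, 2009)."""
    f, t, S = stft(x, fs, nperseg=256, noverlap=192)
    logS = np.log(np.abs(S) + 1e-9)
    mps = np.abs(np.fft.fftshift(np.fft.fft2(logS))) ** 2
    wt = np.fft.fftshift(np.fft.fftfreq(t.size, d=t[1] - t[0]))           # Hz
    wf = np.fft.fftshift(np.fft.fftfreq(f.size, d=(f[1] - f[0]) / 1e3))   # cyc/kHz
    tm = mps[:, np.abs(wt) > tm_split].sum() / mps[:, np.abs(wt) <= tm_split].sum()
    sm = mps[np.abs(wf) > sm_split].sum() / mps[np.abs(wf) <= sm_split].sum()
    return tm, sm

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)   # stand-in for a vowel segment
print(mps_ratios(x, fs))
```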
Collapse
Affiliation(s)
- Anna Marczyk
- Aix-Marseille Université, CNRS, LPL, UMR 7309, Aix-en-Provence, France
| | - Benjamin O'Brien
- Aix-Marseille Université, CNRS, LPL, UMR 7309, Aix-en-Provence, France
| | - Pascale Tremblay
- Université Laval, Faculté de Médecine, Département de Réadaptation, Quebec City, Quebec G1V 0A6, Canada
| | | | - Alain Ghio
- Aix-Marseille Université, CNRS, LPL, UMR 7309, Aix-en-Provence, France
| |
Collapse
|
25
|
Ramdani C, Ogier M, Coutrot A. Communicating and reading emotion with masked faces in the Covid era: A short review of the literature. Psychiatry Res 2022; 316:114755. [PMID: 35963061 PMCID: PMC9338224 DOI: 10.1016/j.psychres.2022.114755] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/13/2022] [Revised: 07/26/2022] [Accepted: 07/28/2022] [Indexed: 11/25/2022]
Abstract
Face masks have proven to be key to slowing the spread of the SARS-CoV-2 virus in the COVID-19 pandemic context. However, wearing face masks is not devoid of "side-effects", at both the physical and psychosocial levels. In particular, masks hinder emotion reading from facial expressions, as they hide a significant part of the face. This disturbs both holistic and featural processing of facial expressions and therefore impairs emotion recognition, and it influences many aspects of human social behavior. Communication in general is disrupted by face masks, as they modify the wearer's voice and prevent the audience from using lip reading or other non-verbal cues for speech comprehension. Individuals suffering from psychiatric conditions with impairment of communication are at higher risk of distress because masks increase their difficulty in reading emotions from faces. The identification and acknowledgement of these "side-effects" on communication are necessary because they warrant further work on adaptive solutions that will help foster the use of face masks by the greatest number of people.
Collapse
Affiliation(s)
- Celine Ramdani
- French Armed Forces Biomedical Research Institute, Bretigny sur Orge, France.
| | - Michael Ogier
- French Armed Forces Biomedical Research Institute, Bretigny sur Orge, France
| | | |
Collapse
|
26
|
Kim SG. On the encoding of natural music in computational models and human brains. Front Neurosci 2022; 16:928841. [PMID: 36203808 PMCID: PMC9531138 DOI: 10.3389/fnins.2022.928841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] Open
Abstract
This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
Collapse
|
27
|
Brungart DS, Sherlock LP, Kuchinsky SE, Perry TT, Bieber RE, Grant KW, Bernstein JGW. Assessment methods for determining small changes in hearing performance over time. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:3866. [PMID: 35778214 DOI: 10.1121/10.0011509] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Although the behavioral pure-tone threshold audiogram is considered the gold standard for quantifying hearing loss, assessment of speech understanding, especially in noise, is more relevant to quality of life but is only partly related to the audiogram. Metrics of speech understanding in noise are therefore an attractive target for assessing hearing over time. However, speech-in-noise assessments have more potential sources of variability than pure-tone threshold measures, making it a challenge to obtain results reliable enough to detect small changes in performance. This review examines the benefits and limitations of speech-understanding metrics and their application to longitudinal hearing assessment, and identifies potential sources of variability, including learning effects, differences in item difficulty, and between- and within-individual variations in effort and motivation. We conclude by recommending the integration of non-speech auditory tests, which provide information about aspects of auditory health that have reduced variability and fewer central influences than speech tests, in parallel with the traditional audiogram and speech-based assessments.
Collapse
Affiliation(s)
- Douglas S Brungart
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
| | - LaGuinn P Sherlock
- Hearing Conservation and Readiness Branch, U.S. Army Public Health Center, E1570 8977 Sibert Road, Aberdeen Proving Ground, Maryland 21010, USA
| | - Stefanie E Kuchinsky
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
| | - Trevor T Perry
- Hearing Conservation and Readiness Branch, U.S. Army Public Health Center, E1570 8977 Sibert Road, Aberdeen Proving Ground, Maryland 21010, USA
| | - Rebecca E Bieber
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
| | - Ken W Grant
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
| | - Joshua G W Bernstein
- Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
| |
Collapse
|
28
|
Gallun FJ, Coco L, Koerner TK, de Larrea-Mancera ESL, Molis MR, Eddins DA, Seitz AR. Relating Suprathreshold Auditory Processing Abilities to Speech Understanding in Competition. Brain Sci 2022; 12:brainsci12060695. [PMID: 35741581 PMCID: PMC9221421 DOI: 10.3390/brainsci12060695] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2022] [Revised: 05/17/2022] [Accepted: 05/25/2022] [Indexed: 11/28/2022] Open
Abstract
(1) Background: Difficulty hearing in noise is exacerbated in older adults. Older adults are more likely to have audiometric hearing loss, although some individuals with normal pure-tone audiograms also have difficulty perceiving speech in noise. Additional variables also likely account for speech understanding in noise. It has been suggested that one important class of variables is the ability to process auditory information once it has been detected. Here, we tested a set of these “suprathreshold” auditory processing abilities and related them to performance on a two-part test of speech understanding in competition with and without spatial separation of the target and masking speech. Testing was administered in the Portable Automated Rapid Testing (PART) application developed by our team; PART facilitates psychoacoustic assessments of auditory processing. (2) Methods: Forty-one individuals (average age 51 years) completed assessments of sensitivity to temporal fine structure (TFS) and spectrotemporal modulation (STM) detection via an iPad running the PART application. Statistical models were used to evaluate the strength of associations between performance on the auditory processing tasks and speech understanding in competition. Age and pure-tone average (PTA) were also included as potential predictors. (3) Results: The model providing the best fit also included age and a measure of diotic frequency modulation (FM) detection but none of the other potential predictors. However, even the best-fitting models accounted for 31% or less of the variance, supporting work suggesting that other variables (e.g., cognitive processing abilities) also contribute significantly to speech understanding in noise. (4) Conclusions: The results of the current study do not provide strong support for previous suggestions that suprathreshold processing abilities alone can be used to explain difficulties in speech understanding in competition among older adults. This discrepancy could be due to the speech tests used, the listeners tested, or the suprathreshold tests chosen. Future work with larger numbers of participants is warranted, including a range of cognitive tests and additional assessments of suprathreshold auditory processing abilities.
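For readers unfamiliar with this kind of model comparison, the sketch below fits nested linear models and compares them by AIC, in the general spirit of the analysis described; the data, predictors, and coefficients are entirely synthetic:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 41
# Synthetic per-listener predictors and a speech-in-competition score;
# none of these values come from the study.
df = pd.DataFrame({
    "age": rng.uniform(20, 75, n),
    "pta": rng.uniform(0, 40, n),          # pure-tone average (dB HL)
    "fm":  rng.normal(0, 1, n),            # diotic FM detection (z-scored)
    "stm": rng.normal(0, 1, n),            # STM sensitivity (z-scored)
})
df["srt"] = 0.05 * df.age + 0.5 * df.fm + rng.normal(0, 1, n)

# Compare candidate predictor sets by AIC; lower is a better trade-off
# between fit and model complexity.
for formula in ["srt ~ age", "srt ~ age + fm", "srt ~ age + fm + pta + stm"]:
    fit = smf.ols(formula, df).fit()
    print(f"{formula:32s} AIC={fit.aic:7.1f} R2={fit.rsquared:.2f}")
```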
Collapse
Affiliation(s)
- Frederick J. Gallun
- Oregon Hearing Research Center, Oregon Health & Science University, Portland, OR 97239, USA; (L.C.); (T.K.K.)
- VA RR&D National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, OR 97239, USA;
- Correspondence: ; Tel.: +1-503-494-4331
| | - Laura Coco
- Oregon Hearing Research Center, Oregon Health & Science University, Portland, OR 97239, USA; (L.C.); (T.K.K.)
- VA RR&D National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, OR 97239, USA;
| | - Tess K. Koerner
- Oregon Hearing Research Center, Oregon Health & Science University, Portland, OR 97239, USA; (L.C.); (T.K.K.)
- VA RR&D National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, OR 97239, USA;
| | | | - Michelle R. Molis
- VA RR&D National Center for Rehabilitative Auditory Research, VA Portland Health Care System, Portland, OR 97239, USA;
| | - David A. Eddins
- Department of Communication Science & Disorders, University of South Florida, Tampa, FL 33620, USA;
| | - Aaron R. Seitz
- Department of Psychology, University of California, Riverside, CA 92521, USA; (E.S.L.d.L.-M.); (A.R.S.)
| |
Collapse
|
29
|
Conroy C, Byrne AJ, Kidd G. Forward masking of spectrotemporal modulation detection. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 151:1181. [PMID: 35232084 PMCID: PMC8865928 DOI: 10.1121/10.0009404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Revised: 01/14/2022] [Accepted: 01/15/2022] [Indexed: 06/14/2023]
Abstract
Recent work has suggested that there may be specialized mechanisms in the auditory system for coding spectrotemporal modulations (STMs), tuned to different combinations of spectral modulation frequency, temporal modulation frequency, and STM sweep direction. The current study sought evidence of such mechanisms using a psychophysical forward masking paradigm. The detectability of a target comprising upward sweeping STMs was measured following the presentation of modulated maskers applied to the same carrier. Four maskers were tested, which had either (1) the same spectral modulation frequency as the target but a flat temporal envelope, (2) the same temporal modulation frequency as the target but a flat spectral envelope, (3) the same spectral and temporal modulation frequencies as the target but the opposite sweep direction (downward sweeping STMs), or (4) the same spectral and temporal modulation frequencies as the target and the same sweep direction (upward sweeping STMs). Forward masking was greatest for the masker fully matched to the target (4), intermediate for the masker with the opposite sweep direction (3), and negligible for the other two (1, 2). These findings are consistent with the suggestion that the detectability of the target was mediated by an STM-specific coding mechanism with sweep-direction selectivity.
Collapse
Affiliation(s)
- Christopher Conroy
- Department of Speech, Language & Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Andrew J Byrne
- Department of Speech, Language & Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Gerald Kidd
- Department of Speech, Language & Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| |
Collapse
|
30
|
Arehart KH, Chon SH, Lundberg EMH, Harvey LO, Kates JM, Anderson MC, Rallapalli VH, Souza PE. A comparison of speech intelligibility and subjective quality with hearing-aid processing in older adults with hearing loss. Int J Audiol 2022; 61:46-58. [PMID: 33913795 PMCID: PMC11108258 DOI: 10.1080/14992027.2021.1900609] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Revised: 12/18/2020] [Accepted: 02/24/2021] [Indexed: 10/21/2022]
Abstract
OBJECTIVE This study characterised the relationship between speech intelligibility and quality in listeners with hearing loss for a range of hearing-aid processing settings and acoustic conditions. DESIGN Binaural speech intelligibility scores and quality ratings were measured for sentences presented in babble noise and processed through a hearing-aid simulation. The intelligibility-quality relationship was investigated by (1) assessing the effects of experimental conditions on each task; (2) directly comparing intelligibility scores and quality ratings for each participant across the range of conditions; and (3) comparing the association between signal envelope fidelity (represented by a cepstral correlation metric) and intelligibility and quality. STUDY SAMPLE Participants were 15 adults (7 females; age range 59-81 years) with mild to moderately severe sensorineural hearing loss. RESULTS Intelligibility and quality showed a positive association both with each other and with changes to signal fidelity introduced by the entire acoustic and signal-processing system including the additive noise and the hearing-aid output. As signal fidelity decreased, quality ratings changed at a slower rate than intelligibility scores. Individual psychometric functions were more variable for quality compared to intelligibility. CONCLUSIONS Variability in the intelligibility-quality relationship reinforces the importance of measuring both intelligibility and quality in clinical hearing-aid fittings.
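One way to read "cepstral correlation": band envelopes of the clean and processed signals are projected onto a small cosine basis across bands, and the resulting coefficient trajectories are correlated. A simplified sketch in that spirit, using plain STFT bands rather than the auditory-model front end of the published metric:

```python
import numpy as np
from scipy.signal import stft

def cepstral_correlation(clean, degraded, fs, n_bands=32, n_cep=6):
    """Simplified envelope-fidelity measure in the spirit of the cepstral
    correlation metric: project band envelopes (dB) onto a cosine basis
    across bands, then correlate the coefficient trajectories."""
    def cepseq(x):
        _, _, S = stft(x, fs, nperseg=512)
        env = 20 * np.log10(np.abs(S[:n_bands]) + 1e-6)
        k = np.arange(n_bands)
        basis = np.cos(np.outer(np.arange(1, n_cep + 1),
                                np.pi * k / (n_bands - 1)))
        return basis @ env                 # cepstral coefficients over time
    c, d = cepseq(clean), cepseq(degraded)
    return float(np.mean([np.corrcoef(c[i], d[i])[0, 1] for i in range(n_cep)]))

fs = 16000
rng = np.random.default_rng(0)
clean = rng.standard_normal(2 * fs)
degraded = clean + 0.3 * rng.standard_normal(2 * fs)   # noisy stand-in
print(cepstral_correlation(clean, degraded, fs))
```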
Collapse
Affiliation(s)
| | - Song Hui Chon
- Audio Engineering Technology, Belmont University, Nashville, TN, USA
| | | | - Lewis O. Harvey
- Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
| | - James M. Kates
- SLHS Department, University of Colorado Boulder, Boulder, CO, USA
| | | | - Varsha H. Rallapalli
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA
| | - Pamela E. Souza
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA
| |
Collapse
|
31
|
Veugen LCE, van Opstal AJ, van Wanrooij MM. Reaction Time Sensitivity to Spectrotemporal Modulations of Sound. Trends Hear 2022; 26:23312165221127589. [PMID: 36172759 PMCID: PMC9523861 DOI: 10.1177/23312165221127589] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 07/18/2022] [Accepted: 09/02/2022] [Indexed: 11/24/2022] Open
Abstract
We tested whether sensitivity to acoustic spectrotemporal modulations can be observed from reaction times for normal-hearing and impaired-hearing conditions. In a manual reaction-time task, normal-hearing listeners had to detect the onset of a ripple (with a density between 0 and 8 cycles/octave and a fixed modulation depth of 50%) that moved up or down the log-frequency axis at constant velocity (between 0 and 64 Hz) in otherwise-unmodulated broadband white noise. Spectral and temporal modulations elicited band-pass filtered sensitivity characteristics, with the fastest detection rates around 1 cycle/octave and 32 Hz for normal-hearing conditions. These results closely resemble data from other studies that typically used the modulation-depth threshold as a sensitivity criterion. To simulate hearing impairment, stimuli were processed with a 6-channel cochlear-implant vocoder and with a hearing-aid simulation that introduced separate spectral smearing and low-pass filtering. Reaction times were always much slower than for normal hearing, especially for the highest spectral densities. Binaural performance was predicted well by the benchmark race model of binaural independence, which models statistical facilitation of independent monaural channels. For the impaired-hearing simulations this implied a "best-of-both-worlds" principle in which the listeners relied on the hearing-aid ear to detect spectral modulations and on the cochlear-implant ear for temporal-modulation detection. Although singular-value decomposition indicated that the joint spectrotemporal sensitivity matrix could be largely reconstructed from independent temporal and spectral sensitivity functions, in line with time-spectrum separability, a substantial inseparable spectral-temporal interaction was present in all hearing conditions. These results suggest that the reaction-time task yields a valid and effective objective measure of acoustic spectrotemporal-modulation sensitivity.
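The race model invoked here treats the two monaural channels as independent racers, with the binaural response triggered by whichever finishes first. A Monte Carlo sketch with hypothetical reaction-time distributions (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# Hypothetical monaural reaction-time distributions (ms) for each ear.
rt_ha = rng.normal(420, 60, n)     # e.g., hearing-aid-simulation ear
rt_ci = rng.normal(470, 80, n)     # e.g., cochlear-implant-vocoder ear

# Race model of binaural independence: the binaural response is triggered
# by whichever independent monaural channel finishes first.
rt_bin = np.minimum(rt_ha, rt_ci)

print(rt_ha.mean(), rt_ci.mean(), rt_bin.mean())
# Statistical facilitation: the mean of the minimum is faster than either
# monaural mean even though the channels never interact.
```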
Collapse
Affiliation(s)
- Lidwien C. E. Veugen
- Department of Biophysics, Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
| | - A. John van Opstal
- Department of Biophysics, Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
| | - Marc M. van Wanrooij
- Department of Biophysics, Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
| |
Collapse
|
32
|
Reybrouck M, Vuust P, Brattico E. Neural Correlates of Music Listening: Does the Music Matter? Brain Sci 2021; 11:1553. [PMID: 34942855 PMCID: PMC8699514 DOI: 10.3390/brainsci11121553] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2021] [Revised: 11/16/2021] [Accepted: 11/18/2021] [Indexed: 11/29/2022] Open
Abstract
The last decades have seen a proliferation of music and brain studies, with a major focus on plastic changes as the outcome of continuous and prolonged engagement with music. Thanks to the advent of neuroaesthetics, research on music cognition has broadened its scope by considering the multifarious phenomenon of listening in all its forms, from incidental listening to the skillful attentive listening of experts, and all its possible effects. The latter range from objective, sensory effects directly linked to the acoustic features of the music to subjectively affective and even transformational effects for the listener. Of special importance is the finding that neural activity in the reward circuit of the brain is a key component of a conscious listening experience. We propose that the connection between music and the reward system makes music listening a gate towards not only hedonia but also eudaimonia, namely a life well lived, full of meaning that aims at realizing one's own "daimon" or true nature. It is argued, further, that music listening, even when conceptualized in this aesthetic and eudaimonic framework, remains a learnable skill that changes the way brain structures respond to sounds and how they interact with each other.
Collapse
Affiliation(s)
- Mark Reybrouck
- Faculty of Arts, University of Leuven, 3000 Leuven, Belgium
- Department of Art History, Musicology and Theater Studies, IPEM Institute for Psychoacoustics and Electronic Music, 9000 Ghent, Belgium
| | - Peter Vuust
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University, 8000 Aarhus, Denmark; (P.V.); (E.B.)
- The Royal Academy of Music Aarhus/Aalborg, 8000 Aarhus, Denmark
| | - Elvira Brattico
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University, 8000 Aarhus, Denmark; (P.V.); (E.B.)
- Department of Education, Psychology, Communication, University of Bari Aldo Moro, 70122 Bari, Italy
| |
Collapse
|
33
|
Sidiras C, Sanchez-Lopez R, Pedersen ER, Sørensen CB, Nielsen J, Schmidt JH. User-Operated Audiometry Project (UAud) - Introducing an Automated User-Operated System for Audiometric Testing Into Everyday Clinic Practice. Front Digit Health 2021; 3:724748. [PMID: 34713194 PMCID: PMC8529271 DOI: 10.3389/fdgth.2021.724748] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2021] [Accepted: 09/13/2021] [Indexed: 11/16/2022] Open
Abstract
Hearing loss is the third leading cause of years lived with disability. It is estimated that 430 million people worldwide are affected, and the number of cases is expected to increase in the future. There is therefore increased pressure on hearing health systems around the world to improve efficiency and reduce costs to ensure increased access to quality hearing health care. Here, we describe the User-Operated Audiometry project, the goal of which is to introduce an automated system for user-operated audiometric testing into everyday clinic practice as a means to relieve part of this pressure. The alternative to the existing referral route is presented in which examination is executed via the user-operated system. This route is conceptualized as an interaction between the patient, the system, and the hearing care professional (HCP). Technological requirements of the system and challenges that are related to the interaction between patients, the user-operated system, and the HCPs within the specific medical setting are discussed. Lastly, a strategy for the development and implementation of user-operated audiometry is presented, which includes initial investigations, a validation study, and implementation in a real-life clinical situation.
Collapse
Affiliation(s)
- Christos Sidiras
- Faculty of Engineering, The Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark
| | - Raul Sanchez-Lopez
- Interacoustics Research Unit, Kongens Lyngby, Denmark; Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Ellen Raben Pedersen
- Faculty of Engineering, The Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark
| | - Chris Bang Sørensen
- Faculty of Engineering, The Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark
| | - Jacob Nielsen
- Faculty of Engineering, The Maersk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark
| | - Jesper Hvass Schmidt
- Department of Clinical Research, Faculty of Health Science, University of Southern Denmark, Odense, Denmark; OPEN, Open Patient Data Explorative Network, Odense University Hospital, Odense, Denmark; Research Unit for ORL-Head and Neck Surgery and Audiology, Odense University Hospital and University of Southern Denmark, Odense, Denmark
| |
Collapse
|
34
|
Homma NY, Bajo VM. Lemniscal Corticothalamic Feedback in Auditory Scene Analysis. Front Neurosci 2021; 15:723893. [PMID: 34489635 PMCID: PMC8417129 DOI: 10.3389/fnins.2021.723893] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2021] [Accepted: 07/30/2021] [Indexed: 12/15/2022] Open
Abstract
Sound information is transmitted from the ear to central auditory stations of the brain via several nuclei. In addition to these ascending pathways, there exist descending projections that can influence information processing at each of these nuclei. A major descending pathway in the auditory system is the feedback projection from layer VI of the primary auditory cortex (A1) to the ventral division of the medial geniculate body (MGBv) in the thalamus. The corticothalamic axons have small glutamatergic terminals that can modulate thalamic processing and thalamocortical information transmission. Corticothalamic neurons also provide input to GABAergic neurons of the thalamic reticular nucleus (TRN), which receives collaterals from the ascending thalamic axons. The balance of corticothalamic and TRN inputs has been shown to refine frequency tuning, firing patterns, and gating of MGBv neurons. Therefore, the thalamus is not merely a relay stage in the chain of auditory nuclei but participates in complex aspects of sound processing that include top-down modulations. In this review, we aim (i) to examine how lemniscal corticothalamic feedback modulates responses of MGBv neurons and (ii) to explore how this feedback contributes to auditory scene analysis, particularly frequency and harmonic perception. Finally, we discuss potential implications of the role of corticothalamic feedback in music and speech perception, where precise spectral and temporal processing is essential.
Collapse
Affiliation(s)
- Natsumi Y. Homma
- Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, United States
- Coleman Memorial Laboratory, Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, United States
| | - Victoria M. Bajo
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
| |
Collapse
|
35
|
Homma NY, Hullett PW, Atencio CA, Schreiner CE. Auditory Cortical Plasticity Dependent on Environmental Noise Statistics. Cell Rep 2021; 30:4445-4458.e5. [PMID: 32234479 PMCID: PMC7326484 DOI: 10.1016/j.celrep.2020.03.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Revised: 08/07/2019] [Accepted: 03/05/2020] [Indexed: 01/14/2023] Open
Abstract
During critical periods, neural circuits develop to form receptive fields that adapt to the sensory environment and enable optimal performance of relevant tasks. We hypothesized that early exposure to background noise can improve signal-in-noise processing, and the resulting receptive field plasticity in the primary auditory cortex can reveal functional principles guiding that important task. We raised rat pups in different spectro-temporal noise statistics during their auditory critical period. As adults, they showed enhanced behavioral performance in detecting vocalizations in noise. Concomitantly, encoding of vocalizations in noise in the primary auditory cortex improves with noise-rearing. Significantly, spectro-temporal modulation plasticity shifts cortical preferences away from the exposed noise statistics, thus reducing noise interference with the foreground sound representation. Auditory cortical plasticity shapes receptive field preferences to optimally extract foreground information in noisy environments during noise-rearing. Early noise exposure induces cortical circuits to implement efficient coding in the joint spectral and temporal modulation domain.
Collapse
Affiliation(s)
- Natsumi Y Homma
- Coleman Memorial Laboratory, Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, CA 94143, USA; Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Patrick W Hullett
- Coleman Memorial Laboratory, Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, CA 94143, USA; Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Craig A Atencio
- Coleman Memorial Laboratory, Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, CA 94143, USA; Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Christoph E Schreiner
- Coleman Memorial Laboratory, Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, CA 94143, USA; Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA 94143, USA.
| |
Collapse
|
36
|
Stavropoulos TA, Isarangura S, Hoover EC, Eddins DA, Seitz AR, Gallun FJ. Exponential spectro-temporal modulation generation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:1434. [PMID: 33765775 PMCID: PMC8097710 DOI: 10.1121/10.0003604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/12/2020] [Revised: 01/20/2021] [Accepted: 02/06/2021] [Indexed: 05/23/2023]
Abstract
Traditionally, real-time generation of spectro-temporally modulated noise has been performed on a linear amplitude scale, partially due to computational constraints. Experiments often require modulation that is sinusoidal on a logarithmic amplitude scale, however, because many perceptual and physiological measures scale linearly with exponential changes in signal magnitude. A method is presented for computing exponential spectro-temporal modulation, showing that it can be expressed analytically as a sum over linearly offset sidebands whose component amplitudes are values of the modified Bessel function of the first kind. This approach greatly improves the efficiency and precision of stimulus generation over current methods, facilitating real-time generation for a broad range of carrier and envelope signals.
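The underlying identity is straightforward to verify numerically in the purely temporal case: a log-sinusoidal (exponential) envelope equals a sum of harmonic sidebands weighted by modified Bessel functions of the first kind. A sketch of that check (temporal modulation only; the paper treats the full spectro-temporal case):

```python
import numpy as np
from scipy.special import iv       # modified Bessel function of the first kind

fs, fm, depth_db = 48000, 32.0, 20.0    # rate (Hz) and peak-to-mean depth (dB)
t = np.arange(fs) / fs
beta = depth_db * np.log(10) / 20       # dB sinusoid -> exponent scale

# Exponential (log-sinusoidal) envelope: sinusoidal on a dB scale.
env = np.exp(beta * np.cos(2 * np.pi * fm * t))

# The same envelope as Bessel-weighted harmonic sidebands, from
# e^{z cos x} = I_0(z) + 2 * sum_n I_n(z) cos(n x).
n = np.arange(1, 17)
env_series = iv(0, beta) + 2 * (iv(n, beta)[:, None] *
                                np.cos(2 * np.pi * fm * np.outer(n, t))).sum(0)

print(np.max(np.abs(env - env_series)))   # ~1e-13: the expansion is exact
```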
Collapse
Affiliation(s)
- Trevor A Stavropoulos
- Brain Game Center for Mental Fitness and Well-being, University of California, Riverside, California 92521, USA
| | - Sittiprapa Isarangura
- Department of Communication Sciences and Disorders, Mahidol University, Bangkok, Thailand
| | - Eric C Hoover
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
| | - David A Eddins
- Auditory and Speech Science Laboratory, University of South Florida, Tampa, Florida 33612, USA
| | - Aaron R Seitz
- Brain Game Center for Mental Fitness and Well-being, University of California, Riverside, California 92521, USA
| | - Frederick J Gallun
- National Center for Rehabilitative Auditory Research, Portland VA Medical Center, Portland, Oregon 97239, USA
| |
Collapse
|
37
|
Stefaniak JD, Lambon Ralph MA, De Dios Perez B, Griffiths TD, Grube M. Auditory beat perception is related to speech output fluency in post-stroke aphasia. Sci Rep 2021; 11:3168. [PMID: 33542379 PMCID: PMC7862238 DOI: 10.1038/s41598-021-82809-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Accepted: 01/25/2021] [Indexed: 11/08/2022] Open
Abstract
Aphasia affects at least one third of stroke survivors, and there is increasing awareness that more fundamental deficits in auditory processing might contribute to impaired language performance in such individuals. We performed a comprehensive battery of psychoacoustic tasks assessing the perception of tone pairs and sequences across the domains of pitch, rhythm, and timbre in 17 individuals with post-stroke aphasia and 17 controls. At the level of individual differences, we demonstrated a correlation between metrical pattern (beat) perception and speech output fluency with a strong effect size (Spearman's rho = 0.72). This dissociated from more basic auditory timing perception, which did not correlate with output fluency. The effect was also specific with respect to the language and cognitive measures, among which phonological, semantic, and executive function did not correlate with beat detection. We interpret the data in terms of a requirement for the analysis of the metrical structure of sound to construct fluent output, with both being a function of higher-order "temporal scaffolding". The beat perception task herein allows measurement of timing analysis without any need to account for motor output deficits, and could be a potential clinical tool to examine this. This work suggests strategies to improve fluency after stroke by training in metrical pattern perception.
Collapse
Affiliation(s)
- James D Stefaniak
- Division of Neuroscience and Experimental Psychology, University of Manchester, Manchester Academic Health Science Centre, Oxford Road, Manchester, UK.
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK.
| | | | - Blanca De Dios Perez
- Division of Psychiatry and Applied Psychology, University of Nottingham, Nottingham, UK
| | - Timothy D Griffiths
- Newcastle University Medical School, Framlington Place, Newcastle-upon-Tyne, UK
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
| | - Manon Grube
- Newcastle University Medical School, Framlington Place, Newcastle-upon-Tyne, UK
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
| |
Collapse
|
38
|
Ponsot E, Varnet L, Wallaert N, Daoud E, Shamma SA, Lorenzi C, Neri P. Mechanisms of Spectrotemporal Modulation Detection for Normal- and Hearing-Impaired Listeners. Trends Hear 2021; 25:2331216520978029. [PMID: 33620023 PMCID: PMC7905488 DOI: 10.1177/2331216520978029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 10/26/2020] [Accepted: 11/06/2020] [Indexed: 11/20/2022] Open
Abstract
Spectrotemporal modulations (STM) are essential features of speech signals that make them intelligible. While their encoding has been widely investigated in neurophysiology, we still lack a full understanding of how STMs are processed at the behavioral level and how cochlear hearing loss impacts this processing. Here, we introduce a novel methodological framework based on psychophysical reverse correlation deployed in the modulation space to characterize the mechanisms underlying STM detection in noise. We derive perceptual filters for young normal-hearing and older hearing-impaired individuals performing a detection task of an elementary target STM (a given product of temporal and spectral modulations) embedded in other masking STMs. Analyzed with computational tools, our data show that both groups rely on a comparable linear (band-pass)-nonlinear processing cascade, which can be well accounted for by a temporal modulation filter bank model combined with cross-correlation against the target representation. Our results also suggest that the modulation mistuning observed for the hearing-impaired group results primarily from broader cochlear filters. Yet, we find idiosyncratic behaviors that cannot be captured by cochlear tuning alone, highlighting the need to consider variability originating from additional mechanisms. Overall, this integrated experimental-computational approach offers a principled way to assess suprathreshold processing distortions in each individual and could thus be used to further investigate interindividual differences in speech intelligibility.
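Psychophysical reverse correlation itself reduces to a conditional average: the difference between the mean random masker on "target present" responses and on "target absent" responses is proportional to the observer's perceptual filter. A toy sketch with a simulated linear observer (the filter shape and trial counts are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_chan = 5000, 20        # trials and masker modulation channels

# Simulated observer: responds "present" when the weighted masker energy
# plus internal noise exceeds a criterion; the weights are the (normally
# unknown) perceptual filter we try to recover.
true_filter = np.exp(-0.5 * ((np.arange(n_chan) - 8) / 2.0) ** 2)
maskers = rng.standard_normal((n_trials, n_chan))
resp = maskers @ true_filter + rng.standard_normal(n_trials) > 0

# Reverse correlation: the difference of masker averages conditioned on
# the two responses is proportional to the perceptual filter.
kernel = maskers[resp].mean(axis=0) - maskers[~resp].mean(axis=0)
print(np.corrcoef(kernel, true_filter)[0, 1])   # close to 1
```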
Collapse
Affiliation(s)
- Emmanuel Ponsot
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
- Hearing Technology @ WAVES, Department of Information Technology, Ghent University, Ghent, Belgium
| | - Léo Varnet
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Nicolas Wallaert
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Elza Daoud
- Aix-Marseille Université, UMR CNRS 7260, Laboratoire Neurosciences Intégratives et Adaptatives, Centre Saint-Charles, Marseille, France
| | - Shihab A. Shamma
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Christian Lorenzi
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Peter Neri
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| |
Collapse
|
39
|
Cortical potentials evoked by tone frequency changes compared to frequency discrimination and speech perception: Thresholds in normal-hearing and hearing-impaired subjects. Hear Res 2020; 401:108154. [PMID: 33387905 DOI: 10.1016/j.heares.2020.108154] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/15/2020] [Revised: 11/29/2020] [Accepted: 12/08/2020] [Indexed: 11/21/2022]
Abstract
Frequency discrimination ability varies within the normal-hearing population, partially explained by factors such as musical training and age, and it deteriorates with hearing loss. Frequency discrimination, while essential for several auditory tasks, is not routinely measured in clinical settings. This study investigates cortical auditory evoked potentials in response to frequency changes, known as acoustic change complexes (ACCs), and explores their value as a clinically applicable objective measurement of frequency discrimination. In 12 normal-hearing and 13 age-matched hearing-impaired subjects, ACC thresholds were recorded at 4 base frequencies (0.5, 1, 2, 4 kHz) and compared to psychophysically assessed frequency discrimination thresholds. ACC thresholds had a moderate to strong correlation with psychophysical frequency discrimination thresholds. In addition, ACC thresholds increased with hearing loss, and higher ACC thresholds were associated with poorer speech perception in noise. The ACC threshold in response to a frequency change therefore holds promise as an objective clinical measurement in hearing impairment, indicative of frequency discrimination ability and related to speech perception. However, recordings as conducted in the current study are relatively time consuming; the current clinical application would be most relevant in cases where behavioral testing is unreliable.
Collapse
|
40
|
Learning metrics on spectrotemporal modulations reveals the perception of musical instrument timbre. Nat Hum Behav 2020; 5:369-377. [PMID: 33257878 DOI: 10.1038/s41562-020-00987-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2019] [Accepted: 09/18/2020] [Indexed: 11/08/2022]
Abstract
Humans excel at using sounds to make judgements about their immediate environment. In particular, timbre is an auditory attribute that conveys crucial information about the identity of a sound source, especially for music. While timbre has been primarily considered to occupy a multidimensional space, unravelling the acoustic correlates of timbre remains a challenge. Here we re-analyse 17 datasets from published studies between 1977 and 2016 and observe that original results are only partially replicable. We use a data-driven computational account to reveal the acoustic correlates of timbre. Human dissimilarity ratings are simulated with metrics learned on acoustic spectrotemporal modulation models inspired by cortical processing. We observe that timbre has both generic and experiment-specific acoustic correlates. These findings provide a broad overview of former studies on musical timbre and identify its relevant acoustic substrates according to biologically inspired models.
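The metric-learning step can be illustrated compactly: if dissimilarity is modeled as a weighted distance over spectrotemporal-modulation features, the squared distance is linear in the weights, so they can be estimated directly from ratings. A toy sketch with simulated features and noiseless ratings (nothing below comes from the study's 17 datasets):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_sounds, n_feat = 15, 10
F = rng.standard_normal((n_sounds, n_feat))   # stand-in STM features per sound
w_true = rng.uniform(0, 1, n_feat)            # simulated perceptual weights

pairs = [(i, j) for i in range(n_sounds) for j in range(i + 1, n_sounds)]
D2 = np.array([(F[i] - F[j]) ** 2 for i, j in pairs])   # squared feature diffs
d2_rated = D2 @ w_true                                  # simulated dissimilarities

# Squared weighted distance is linear in the weights, so non-negative
# least squares recovers them from the pairwise ratings.
w_hat, _ = nnls(D2, d2_rated)
print(np.allclose(w_hat, w_true, atol=1e-6))   # True on this noiseless toy
```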
Collapse
|
41
|
Edraki A, Chan WY, Jensen J, Fogerty D. Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2020; 29:210-225. [PMID: 33748329 PMCID: PMC7978234 DOI: 10.1109/taslp.2020.3039929] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
Spectro-temporal modulations are believed to mediate the analysis of speech sounds in the human primary auditory cortex. Inspired by humans' robustness in comprehending speech in challenging acoustic environments, we propose an intrusive speech intelligibility prediction (SIP) algorithm, wSTMI, for normal-hearing listeners based on spectro-temporal modulation analysis (STMA) of the clean and degraded speech signals. In the STMA, each of 55 modulation frequency channels contributes an intermediate intelligibility measure. A sparse linear model, with parameters optimized using Lasso regression, combines the intermediate measures of the 8 most salient channels for SIP. In comparison with a suite of 10 SIP algorithms, wSTMI performs consistently well across 13 datasets, which together cover degradation conditions including modulated noise, noise reduction processing, reverberation, near-end listening enhancement, and speech interruption. We show that the optimized parameters of wSTMI may be interpreted in terms of modulation transfer functions of the human auditory system. Thus, the proposed approach offers evidence affirming previous studies of the perceptual characteristics underlying speech signal intelligibility.
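The sparse-combination step can be sketched with scikit-learn: Lasso regression over per-channel intermediate measures drives most channel weights to zero, retaining a small salient subset. All data below are synthetic stand-ins, not wSTMI's actual channel outputs:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# Synthetic stand-ins: rows are degraded-speech conditions, columns are
# intermediate intelligibility measures from 55 modulation channels.
X = rng.uniform(0, 1, (200, 55))
y = X[:, [3, 10, 24]] @ np.array([0.5, 0.3, 0.2]) + 0.02 * rng.standard_normal(200)

model = Lasso(alpha=0.01).fit(X, y)            # L1 penalty enforces sparsity
salient = np.flatnonzero(model.coef_)          # surviving channel subset
print(salient, np.round(model.coef_[salient], 2))
```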
Collapse
Affiliation(s)
- Amin Edraki
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
| | - Wai-Yip Chan
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
| | - Jesper Jensen
- Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark
| | - Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
| |
Collapse
|
42
|
Coding of consonant-vowel transition in children with central auditory processing disorder: an electrophysiological study. Eur Arch Otorhinolaryngol 2020; 278:3673-3681. [PMID: 33052460 DOI: 10.1007/s00405-020-06425-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Accepted: 10/05/2020] [Indexed: 10/23/2022]
Abstract
INTRODUCTION The acoustic change complex (ACC) is an important tool for investigating the encoding of the acoustic properties of speech signals in various populations. However, few studies have explored the usefulness of the ACC as a tool to study the neural encoding of the consonant-vowel (CV) transition in children with central auditory processing disorder (CAPD). Thus, the present study aims to investigate the utility of the ACC as an objective tool to study the neural representation of the CV transition in children with CAPD. METHODS Twenty children diagnosed as having CAPD and 20 typically developing counterparts in the age range of 8-14 years participated. The ACC was acquired using the naturally produced CV syllable /sa/ with a duration of 380 ms. RESULTS Latencies of N1' and P2' were prolonged in children with CAPD compared to their typically developing counterparts, whereas the amplitudes of N1' and P2' did not show any significant difference. Scalp topography showed significantly different activation patterns for children with and without CAPD. CONCLUSION Prolonged ACC latencies indicated poor encoding of the CV transition in children with CAPD. The difference in scalp topography might reflect the involvement of additional brain areas in the neural discrimination task in children with CAPD.
Collapse
|
43
|
Momeni M, Rahmani M. Speech signal analysis of alzheimer's diseases in farsi using auditory model system. Cogn Neurodyn 2020; 15:453-461. [PMID: 34040671 DOI: 10.1007/s11571-020-09644-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Revised: 09/21/2020] [Accepted: 10/06/2020] [Indexed: 11/25/2022] Open
Abstract
In recent years, extensive studies have been conducted on the diagnosis of Alzheimer's disease (AD) using non-invasive speech signal analysis. In this study, Farsi speech signals were analyzed using an auditory model system (AMS) in order to recognize AD. After pre-processing of the speech signals, the AMS produced four-dimensional outputs as a function of time, frequency, rate, and scale. The AMS outputs were averaged over time to analyze the rate-frequency-scale representation for both groups, Alzheimer's patients and healthy control subjects. The maxima of the spectral modulation, temporal modulation, and frequency dimensions were then extracted and classified with a support vector machine (SVM). The SVM achieved promising recognition accuracy compared with prevalent approaches in the field of speech processing. These results demonstrate the applicability of the proposed algorithm for non-invasive, low-cost recognition of Alzheimer's disease using only a few features extracted from the speech signal.
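The classification stage described here is a standard supervised pipeline: a few modulation-domain features per speaker fed to an SVM. A hedged sketch with synthetic feature values (the group means and spreads are invented, not the study's measurements):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Invented per-speaker features: peak spectral modulation (cycles/octave),
# peak temporal modulation (Hz), and peak frequency (Hz).
controls = rng.normal([1.2, 4.0, 900], [0.3, 1.0, 150], (40, 3))
patients = rng.normal([0.8, 2.5, 700], [0.3, 1.0, 150], (40, 3))
X = np.vstack([controls, patients])
y = np.r_[np.zeros(40), np.ones(40)]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())   # cross-validated accuracy
```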
Collapse
Affiliation(s)
- Maryam Momeni
- Department of Electrical Engineering, Faculty of Engineering, Arak University, Arak, Iran
| | - Mahdiyeh Rahmani
- Department of Electrical Engineering, Faculty of Engineering, Arak University, Arak, Iran
| |
Collapse
|
44
|
Lelo de Larrea-Mancera ES, Stavropoulos T, Hoover EC, Eddins DA, Gallun FJ, Seitz AR. Portable Automated Rapid Testing (PART) for auditory assessment: Validation in a young adult normal-hearing population. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:1831. [PMID: 33138479 PMCID: PMC7541091 DOI: 10.1121/10.0002108] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/26/2019] [Revised: 09/14/2020] [Accepted: 09/16/2020] [Indexed: 05/23/2023]
Abstract
This study aims to determine the degree to which Portable Automated Rapid Testing (PART), a freely available program running on a tablet computer, is capable of reproducing standard laboratory results. Undergraduate students were assigned to one of three within-subject conditions that examined repeatability of performance on a battery of psychoacoustical tests of temporal fine structure processing, spectro-temporal amplitude modulation, and targets in competition. The repeatability condition examined test/retest with the same system, the headphones condition examined the effects of varying headphones (passive and active noise-attenuating), and the noise condition examined repeatability in the presence of recorded cafeteria noise. In general, performance on the test battery showed high repeatability, even across manipulated conditions, and was similar to that reported in the literature. These data serve as validation that suprathreshold psychoacoustical tests can be made accessible to run on consumer-grade hardware and perform in less controlled settings. This dataset also provides a distribution of thresholds that can be used as a normative baseline against which auditory dysfunction can be identified in future work.
Collapse
Affiliation(s)
| | - Trevor Stavropoulos
- Brain Game Center, University of California Riverside, 1201 University Avenue, Riverside California 92521, USA
| | - Eric C Hoover
- University of Maryland, College Park, Maryland 20742, USA
| | | | | | - Aaron R Seitz
- Psychology Department, University of California, Riverside, 900 University Avenue, Riverside, California 92521, USA
| |
Collapse
|
45
|
Keshishian M, Akbari H, Khalighinejad B, Herrero JL, Mehta AD, Mesgarani N. Estimating and interpreting nonlinear receptive field of sensory neural responses with deep neural network models. eLife 2020; 9:53445. [PMID: 32589140 PMCID: PMC7347387 DOI: 10.7554/elife.53445] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2019] [Accepted: 06/21/2020] [Indexed: 12/21/2022] Open
Abstract
Our understanding of nonlinear stimulus transformations by neural circuits is hindered by the lack of comprehensive yet interpretable computational modeling frameworks. Here, we propose a data-driven approach based on deep neural networks to directly model arbitrarily nonlinear stimulus-response mappings. Reformulating the exact function of a trained neural network as a collection of stimulus-dependent linear functions enables a locally linear receptive field interpretation of the neural network. Predicting the neural responses recorded invasively from the auditory cortex of neurosurgical patients as they listened to speech, this approach significantly improves the prediction accuracy of auditory cortical responses, particularly in nonprimary areas. Moreover, interpreting the functions learned by neural networks uncovered three distinct types of nonlinear transformations of speech that varied considerably from primary to nonprimary auditory regions. The ability of this framework to capture arbitrary stimulus-response mappings while maintaining model interpretability leads to a better understanding of cortical processing of sensory signals.
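The core reformulation is easy to demonstrate on a toy ReLU network (a generic two-layer example, not the paper's trained model): within the linear region containing a given stimulus, the network is exactly an input-dependent linear function whose weights can be read off from the pattern of active units, and those weights act as a stimulus-dependent, locally linear receptive field.

```python
# Toy sketch of the 'locally linear' reading of a ReLU network (assumed
# two-layer net; not the paper's model): for any input x, the network
# output equals W_eff(x) @ x + b_eff(x), where W_eff depends on which
# ReLU units are active at x.
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(16, 64)), rng.normal(size=16)  # hidden layer
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)    # linear readout

def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def local_linear(x):
    mask = (W1 @ x + b1 > 0).astype(float)     # active ReLU pattern at x
    W_eff = W2 @ (mask[:, None] * W1)          # stimulus-dependent weights
    b_eff = W2 @ (mask * b1) + b2
    return W_eff, b_eff

x = rng.normal(size=64)                        # e.g. a spectrogram patch
W_eff, b_eff = local_linear(x)
assert np.allclose(forward(x), W_eff @ x + b_eff)  # exact on this region
```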
Collapse
Affiliation(s)
- Menoua Keshishian
- Department of Electrical Engineering, Columbia University, New York, United States.,Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| | - Hassan Akbari
- Department of Electrical Engineering, Columbia University, New York, United States.,Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| | - Bahar Khalighinejad
- Department of Electrical Engineering, Columbia University, New York, United States.,Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| | - Jose L Herrero
- Feinstein Institute for Medical Research, Manhasset, United States.,Department of Neurosurgery, Hofstra-Northwell School of Medicine and Feinstein Institute for Medical Research, Manhasset, United States
| | - Ashesh D Mehta
- Feinstein Institute for Medical Research, Manhasset, United States.,Department of Neurosurgery, Hofstra-Northwell School of Medicine and Feinstein Institute for Medical Research, Manhasset, United States
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, United States.,Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| |
Collapse
|
46
|
Spiking network optimized for word recognition in noise predicts auditory system hierarchy. PLoS Comput Biol 2020; 16:e1007558. [PMID: 32559204 PMCID: PMC7329140 DOI: 10.1371/journal.pcbi.1007558] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 07/01/2020] [Accepted: 11/22/2019] [Indexed: 11/21/2022] Open
Abstract
The auditory neural code is resilient to acoustic variability and capable of recognizing sounds amongst competing sound sources, yet the transformations enabling these noise-robust abilities are largely unknown. We report that a hierarchical spiking neural network (HSNN) optimized to maximize word recognition accuracy in noise and across multiple talkers predicts the organizational hierarchy of the ascending auditory pathway. Comparisons with data from the auditory nerve, midbrain, thalamus, and cortex reveal that the optimal HSNN predicts several transformations of the ascending auditory pathway, including a sequential loss of temporal resolution and synchronization ability and increasing sparseness and selectivity. The optimal organizational scheme enhances performance by selectively filtering out noise and fast temporal cues, such as voicing periodicity, that are not directly relevant to the word recognition task. An identical network arranged to enable high information transfer fails to predict auditory pathway organization and has substantially poorer performance. Furthermore, conventional single-layer linear and nonlinear receptive field networks that capture the overall feature extraction of the HSNN fail to achieve similar performance. The findings suggest that the auditory pathway hierarchy and its sequential nonlinear feature extraction computations enhance relevant cues while removing non-informative sources of noise, thus enhancing the representation of sounds in noise-impoverished conditions. The brain's ability to recognize sounds in the presence of competing sounds or background noise is essential for everyday hearing tasks. How the brain accomplishes noise resiliency, however, is poorly understood. Using neural recordings from the ascending auditory pathway and an auditory spiking network model trained for sound recognition in noise, we explore the computational strategies that enable noise robustness. Our results suggest that the hierarchical feature organization of the ascending auditory pathway and the resulting computations are critical for sound recognition in the presence of noise.
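The building block of such a network can be illustrated with generic leaky integrate-and-fire dynamics; the sketch below uses textbook equations and placeholder parameters, not the paper's optimized HSNN architecture.

```python
# Minimal leaky integrate-and-fire (LIF) layer sketch (generic textbook
# dynamics): one layer of spiking units driven by an input current, the
# kind of building block stacked to form a hierarchical spiking network.
import numpy as np

def lif_layer(I, dt=1e-3, tau=0.02, v_th=1.0, v_reset=0.0):
    """I: (time, units) input current. Returns a binary spike train."""
    n_t, n_units = I.shape
    v = np.zeros(n_units)
    spikes = np.zeros_like(I)
    for t in range(n_t):
        v += dt / tau * (-v + I[t])          # leaky integration
        fired = v >= v_th
        spikes[t, fired] = 1.0
        v[fired] = v_reset                   # reset after a spike
    return spikes

rng = np.random.default_rng(3)
out = lif_layer(rng.random((500, 8)) * 2.0)  # 500 ms of noisy drive
print(out.sum(axis=0))                       # spike counts per unit
```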
Collapse
|
47
|
Lotfi Y, Moossavi A, Afshari PJ, Bakhshi E, Sadjedi H. Spectro-temporal modulation detection and its relation to speech perception in children with auditory processing disorder. Int J Pediatr Otorhinolaryngol 2020; 131:109860. [PMID: 31958768 DOI: 10.1016/j.ijporl.2020.109860] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/06/2019] [Revised: 12/31/2019] [Accepted: 01/01/2020] [Indexed: 11/18/2022]
Abstract
OBJECTIVES Poor speech perception in noise is one of the most common complaints reported for children with auditory processing disorder (APD). APD is defined as a deficit in the perceptual processing of acoustic information in the auditory system, to which decreased spectro-temporal resolution may also contribute. Since the recognition of a spoken message in the context of other sounds is based on the processing of auditory spectro-temporal modulations, assessing sensitivity to spectro-temporal modulations can evaluate the listener's ability to retrieve and integrate speech segments covered by noise. Therefore, the purpose of this study was to examine spectro-temporal modulation (STM) detection and its relation to speech perception in children with APD and to compare the results with age-matched, normally developing children. METHODS 35 children with APD and 32 normal-hearing children (8-12 years old) were enrolled. To examine STM detection performance, six STM stimulus conditions were employed, combining three temporal modulation rates (4, 12, and 32 Hz) with two spectral modulation densities (0.5 and 2.0 cycles/octave). Initially, the STM detection thresholds at these six STM stimulus conditions were measured in both groups and the results were compared. Thereafter, the relation between STM detection thresholds and speech perception tests, including consonant-vowel-in-noise and word-in-noise tests, was assessed. RESULTS STM sensitivity was poorer than normal in the APD children at all STM stimulus conditions. Children with APD displayed significantly poorer STM detection thresholds than normally developing children (p < 0.05). Significant correlations were found between STM detection thresholds and speech perception in noise in both groups (p < 0.05). CONCLUSION The results suggest that altered encoding of spectro-temporal acoustic cues in the auditory nervous system may be one of the underlying factors in reduced STM detection performance in children with APD. The present study suggests that a poor ability to extract STM cues in children with APD can be an underlying factor in their listening problems in noise and poor speech perception in challenging situations.
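A spectro-temporal modulation (moving-ripple) stimulus of the general kind used in such tasks can be built as a bank of log-spaced tones whose envelopes drift at a given temporal rate and spectral density. The sketch below is a generic construction; the study's exact stimulus parameters beyond the rates and densities named above are assumed.

```python
# Generic moving-ripple (STM) stimulus sketch: log-spaced tone carriers
# with a drifting sinusoidal spectro-temporal envelope.
import numpy as np

def moving_ripple(rate_hz=4.0, density_cyc_oct=0.5, dur=1.0, fs=44100,
                  f_lo=250.0, octaves=5, n_comp=100, depth=0.9):
    t = np.arange(int(dur * fs)) / fs
    x = np.linspace(0.0, octaves, n_comp)        # component position (oct)
    freqs = f_lo * 2.0 ** x
    rng = np.random.default_rng(4)
    sig = np.zeros_like(t)
    for xi, f in zip(x, freqs):
        env = 1.0 + depth * np.sin(
            2 * np.pi * (rate_hz * t + density_cyc_oct * xi))
        sig += env * np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    return sig / np.max(np.abs(sig))

stim = moving_ripple(rate_hz=12.0, density_cyc_oct=2.0)  # one condition
```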
Collapse
Affiliation(s)
- Younes Lotfi
- Department of Audiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Abdollah Moossavi
- Department of Otolaryngology and Head and Neck Surgery, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
| | | | - Enayatollah Bakhshi
- Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
| | - Hamed Sadjedi
- Faculty of Engineering, Shahed University, Tehran, Iran
| |
Collapse
|
48
|
Narne VK, Jain S, Sharma C, Baer T, Moore BCJ. Narrow-band ripple glide direction discrimination and its relationship to frequency selectivity estimated using psychophysical tuning curves. Hear Res 2020; 389:107910. [PMID: 32086020 DOI: 10.1016/j.heares.2020.107910] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/08/2019] [Revised: 01/29/2020] [Accepted: 02/06/2020] [Indexed: 10/25/2022]
Abstract
The highest spectral ripple density at which the discrimination of ripple glide direction was possible (STRtdir task) was assessed for one-octave wide (narrowband) stimuli with center frequencies of 500, 1000, 2000, and 4000 Hz and for a broadband stimulus. A pink noise lowpass filtered at the lower edge frequency of the rippled-noise stimuli was used to mask possible combination ripples. The relationship between thresholds measured using the STRtdir task and estimates of the sharpness of tuning (Q10) derived from fast psychophysical tuning curves was assessed for subjects with normal hearing (NH) and cochlear hearing loss (CHL). The STRtdir thresholds for the narrowband stimuli were highly correlated with Q10 values for the same center frequency, supporting the idea that STRtdir thresholds for the narrowband stimuli provide a good measure of frequency resolution. Both the STRtdir thresholds and the Q10 values were lower (worse) for the subjects with CHL than for the subjects with NH. For both the NH and CHL subjects, mean STRtdir thresholds for the broadband stimulus were not significantly higher (better) than for the narrowband stimuli, suggesting little or no ability to combine information across center frequencies.
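Q10 is a standard sharpness-of-tuning measure: the tip frequency divided by the bandwidth of the tuning curve 10 dB above its tip. Given a sampled psychophysical tuning curve, it can be computed as in the following sketch, which uses idealized, assumed data rather than the study's measurements.

```python
# Sketch of computing Q10 from a psychophysical tuning curve (standard
# definition; idealized toy data, not the study's).
import numpy as np

def q10(freqs, masker_levels):
    """freqs (Hz) and masker_levels (dB) sampled along the PTC."""
    tip = np.argmin(masker_levels)
    cutoff = masker_levels[tip] + 10.0
    lo = np.interp(cutoff, masker_levels[:tip + 1][::-1],
                   freqs[:tip + 1][::-1])                 # low-side crossing
    hi = np.interp(cutoff, masker_levels[tip:], freqs[tip:])  # high side
    return freqs[tip] / (hi - lo)

f = np.linspace(500, 2000, 200)
ptc = 40 + 0.05 * ((f - 1000) / 10) ** 2   # idealized V-shaped tuning curve
print(f"Q10 = {q10(f, ptc):.1f}")
```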
Collapse
Affiliation(s)
- Vijaya Kumar Narne
- Department of Audiology, JSS Institute of Speech and Hearing, Mysore, India.
| | - Saransh Jain
- Department of Audiology, JSS Institute of Speech and Hearing, Mysore, India
| | - Chitkala Sharma
- Department of Audiology, JSS Institute of Speech and Hearing, Mysore, India
| | - Thomas Baer
- Department of Experimental Psychology, University of Cambridge, Cambridge, UK
| | - Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, Cambridge, UK
| |
Collapse
|
49
|
Bellur A, Elhilali M. Audio object classification using distributed beliefs and attention. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2020; 28:729-739. [PMID: 33564695 PMCID: PMC7869589 DOI: 10.1109/taslp.2020.2966867] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
One of the unique characteristics of human hearing is its ability to recognize acoustic objects even in the presence of severe noise and distortions. In this work, we explore two mechanisms underlying this ability: 1) redundant mapping of acoustic waveforms along distributed latent representations and 2) adaptive feedback based on prior knowledge to selectively attend to targets of interest. We propose a bio-mimetic account of acoustic object classification by developing a novel distributed deep belief network, validated for the task of robust acoustic object classification using the UrbanSound database. The proposed distributed belief network (DBN) encompasses an array of independent sub-networks trained generatively to capture different abstractions of natural sounds. A supervised classifier then performs a readout of this distributed mapping. The overall architecture not only matches the state-of-the-art system for acoustic object classification but leads to significant improvement over the baseline in mismatched noisy conditions (31.4% relative improvement in 0 dB conditions). Furthermore, we incorporate mechanisms of attentional feedback that allow the DBN to deploy local memories of sound targets estimated at multiple views to bias network activation when attending to a particular object. This adaptive feedback results in further improvement of object classification in unseen noise conditions (a relative improvement of 54% over the baseline in 0 dB conditions).
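As a loose, hedged sketch of the "array of generatively trained sub-networks plus supervised readout" idea, the example below uses scikit-learn RBMs as stand-ins for the paper's sub-networks and logistic regression as the readout; the data, sizes, and training regime are all placeholders, not the paper's DBN.

```python
# Hedged sketch of a distributed generative mapping with a supervised
# readout: several independently trained RBMs (stand-ins for the paper's
# sub-networks) feed one classifier.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.random((200, 64))          # toy sound features scaled to [0, 1]
y = rng.integers(0, 10, 200)       # toy labels for 10 'acoustic objects'

# Each sub-network learns its own unsupervised abstraction of the input.
subnets = [BernoulliRBM(n_components=16, n_iter=10, random_state=i).fit(X)
           for i in range(3)]

# The supervised classifier reads out the concatenated latent activations.
Z = np.hstack([net.transform(X) for net in subnets])
readout = LogisticRegression(max_iter=1000).fit(Z, y)
print(readout.score(Z, y))
```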
Collapse
Affiliation(s)
- Ashwin Bellur
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| |
Collapse
|
50
|
Nechaev DI, Milekhina ON, Supin AY. Estimates of Ripple-Density Resolution Based on the Discrimination From Rippled and Nonrippled Reference Signals. Trends Hear 2019; 23:2331216518824435. [PMID: 30669951 DOI: 10.1177/2331216518824435] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Rippled-spectrum stimuli are used to evaluate the resolution of the spectro-temporal structure of sounds. Measurements of spectrum-pattern resolution imply the discrimination between the test and reference stimuli. Therefore, estimates of rippled-pattern resolution could depend on both the test stimulus and the reference stimulus type. In this study, the ripple-density resolution was measured using combinations of two test stimuli and two reference stimuli. The test stimuli were rippled-spectrum signals with constant phase or rippled-spectrum signals with ripple-phase reversals. The reference stimuli were rippled-spectrum signals with opposite ripple phase to the test or nonrippled signals. The spectra were centered at 2 kHz and had an equivalent rectangular bandwidth of 1 oct and a level of 70 dB sound pressure level. A three-alternative forced-choice procedure was combined with an adaptive procedure. With rippled reference stimuli, the mean ripple-density resolution limits were 8.9 ripples/oct (phase-reversals test stimulus) or 7.7 ripples/oct (constant-phase test stimulus). With nonrippled reference stimuli, the mean resolution limits were 26.1 ripples/oct (phase-reversals test stimulus) or 22.2 ripples/oct (constant-phase test stimulus). Different contributions of excitation-pattern and temporal-processing mechanisms are assumed for measurements with rippled and nonrippled reference stimuli: The excitation-pattern mechanism is more effective for the discrimination of rippled stimuli that differ in their ripple-phase patterns, whereas the temporal-processing mechanism is more effective for the discrimination of rippled and nonrippled stimuli.
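The adaptive portion of such measurements can be illustrated with a generic 2-down/1-up staircase on ripple density driven by a simulated listener; the study's exact adaptive rule and parameters are not specified here, so everything below is an assumption for illustration.

```python
# Generic 2-down/1-up adaptive track on ripple density with a simulated
# observer (toy psychometric function; not the study's procedure).
import numpy as np

rng = np.random.default_rng(6)

def listener_correct(density, limit=9.0):
    """Toy 3AFC observer: near-perfect below ~'limit' ripples/oct,
    guessing (1-in-3) well above it."""
    p = 1/3 + (2/3) / (1 + np.exp(4 * (density - limit)))
    return rng.random() < p

density, step = 2.0, 1.25
n_correct, last_move, reversals = 0, 0, []
while len(reversals) < 8:
    if listener_correct(density):
        n_correct += 1
        if n_correct == 2:                   # 2-down: make the task harder
            n_correct = 0
            if last_move == -1:
                reversals.append(density)
            last_move = +1
            density *= step
    else:
        n_correct = 0                        # 1-up: make the task easier
        if last_move == +1:
            reversals.append(density)
        last_move = -1
        density /= step

print(f"estimated limit ~ {np.mean(reversals[-6:]):.1f} ripples/oct")
```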
Collapse
Affiliation(s)
- Dmitry I Nechaev
- Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow, Russia
| | - Olga N Milekhina
- Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow, Russia
| | - Alexander Ya Supin
- Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow, Russia
| |
Collapse
|