1. Herff SA, Bonetti L, Cecchetti G, Vuust P, Kringelbach ML, Rohrmeier MA. Hierarchical syntax model of music predicts theta power during music listening. Neuropsychologia 2024; 199:108905. PMID: 38740179. DOI: 10.1016/j.neuropsychologia.2024.108905.
Abstract
Linguistic research has shown that the depth of syntactic embedding is reflected in brain theta power. Here, we test whether this also extends to non-linguistic stimuli, specifically music. We used a hierarchical model of musical syntax to continuously quantify two types of expert-annotated harmonic dependencies throughout a piece of Western classical music: prolongation and preparation. Prolongations can roughly be understood as a musical analogue to linguistic coordination between constituents that share the same function (e.g., 'pizza' and 'pasta' in 'I ate pizza and pasta'). Preparation refers to the dependency between two harmonies whereby the first implies a resolution towards the second (e.g., dominant towards tonic; similar to how the adjective implies the presence of a noun in 'I like spicy … '). Source-reconstructed MEG data from sixty-five participants listening to the musical piece were then analysed. We used Bayesian Mixed Effects models to predict the theta envelope in the brain, using the number of open prolongation and preparation dependencies as predictors whilst controlling for the audio envelope. We observed that prolongation and preparation both carry independent and distinguishable predictive value for theta-band fluctuations in key linguistic areas such as the Angular, Superior Temporal, and Heschl's Gyri, or their right-lateralised homologues, with preparation showing additional predictive value for areas associated with the reward system and prediction. Musical expertise further mediated these effects in language-related brain areas. These results show that the predictions of precisely formalised music-theoretical models are reflected in the brain activity of listeners, which furthers our understanding of the perception and cognition of musical structure.
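The regression structure described in the abstract (a per-participant mixed model predicting a continuous theta measure from the number of open prolongation and preparation dependencies, controlling for the audio envelope) can be sketched as follows. This is a hedged illustration only: it uses a frequentist mixed model as a stand-in for the study's Bayesian models, and the column names and synthetic data are assumptions.

```python
# Minimal mixed-effects sketch with synthetic stand-in data (assumed column names;
# frequentist stand-in for the Bayesian mixed models used in the study).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_time = 10, 200
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_time),
    "prolongation_open": rng.integers(0, 5, n_subj * n_time),
    "preparation_open": rng.integers(0, 5, n_subj * n_time),
    "audio_envelope": rng.normal(size=n_subj * n_time),
})
# synthetic outcome so the example runs end to end
df["theta"] = 0.1 * df["prolongation_open"] + 0.2 * df["preparation_open"] + rng.normal(size=len(df))

model = smf.mixedlm(
    "theta ~ prolongation_open + preparation_open + audio_envelope",
    data=df,
    groups=df["subject"],        # random intercept per participant
)
print(model.fit().summary())
```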
Affiliation(s)
- Steffen A Herff
- Sydney Conservatorium of Music, University of Sydney, Sydney, Australia; The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia; Digital and Cognitive Musicology Lab, College of Humanities, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Leonardo Bonetti
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music, Aarhus/Aalborg, Denmark; Centre for Eudaimonia and Human Flourishing, Linacre College, University of Oxford, Oxford, United Kingdom; Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Gabriele Cecchetti
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia; Digital and Cognitive Musicology Lab, College of Humanities, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Peter Vuust
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music, Aarhus/Aalborg, Denmark
- Morten L Kringelbach
- Center for Music in the Brain, Department of Clinical Medicine, Aarhus University & The Royal Academy of Music, Aarhus/Aalborg, Denmark; Centre for Eudaimonia and Human Flourishing, Linacre College, University of Oxford, Oxford, United Kingdom; Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Martin A Rohrmeier
- Digital and Cognitive Musicology Lab, College of Humanities, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
2. Kries J, De Clercq P, Gillis M, Vanthornhout J, Lemmens R, Francart T, Vandermosten M. Exploring neural tracking of acoustic and linguistic speech representations in individuals with post-stroke aphasia. Hum Brain Mapp 2024; 45:e26676. PMID: 38798131. PMCID: PMC11128780. DOI: 10.1002/hbm.26676.
Abstract
Aphasia is a communication disorder that affects processing of language at different levels (e.g., acoustic, phonological, semantic). Recording brain activity via electroencephalography (EEG) while people listen to a continuous story makes it possible to analyze brain responses to acoustic and linguistic properties of speech. When the neural activity aligns with these speech properties, it is referred to as neural tracking. Even though measuring neural tracking of speech may present an interesting approach to studying aphasia in an ecologically valid way, it has not yet been investigated in individuals with stroke-induced aphasia. Here, we explored processing of acoustic and linguistic speech representations in individuals with aphasia in the chronic phase after stroke and in age-matched healthy controls. We found decreased neural tracking of acoustic speech representations (envelope and envelope onsets) in individuals with aphasia. In addition, word surprisal displayed decreased amplitudes in individuals with aphasia around 195 ms over frontal electrodes, although this effect was not corrected for multiple comparisons. These results show that there is potential to capture language processing impairments in individuals with aphasia by measuring neural tracking of continuous speech. However, more research is needed to validate these results. Nonetheless, this exploratory study shows that neural tracking of naturalistic, continuous speech presents a powerful approach to studying aphasia.
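The envelope-reconstruction ("backward model") analysis this abstract refers to can be sketched as a regularized linear mapping from time-lagged EEG to the speech envelope, scored by the correlation between reconstructed and actual envelopes. The sampling rate, lag window, regularization strength, and placeholder data below are assumptions, not the study's parameters.

```python
# Sketch of a backward (decoder) model of neural tracking: reconstruct the speech
# envelope from time-lagged multichannel EEG with ridge regression, score by correlation.
import numpy as np
from sklearn.linear_model import Ridge

def lag_matrix(eeg, lags):
    """Stack lagged copies of EEG (samples x channels); EEG lags behind the stimulus."""
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for i, lag in enumerate(lags):
        X[:, i * c:(i + 1) * c] = np.roll(eeg, -lag, axis=0)
    return X

fs = 64                                    # Hz, assumed common sampling rate
lags = np.arange(0, int(0.4 * fs))         # use EEG from 0-400 ms after the stimulus
rng = np.random.default_rng(0)
eeg_tr, env_tr = rng.standard_normal((6000, 32)), rng.standard_normal(6000)  # placeholders
eeg_te, env_te = rng.standard_normal((2000, 32)), rng.standard_normal(2000)

decoder = Ridge(alpha=1e3).fit(lag_matrix(eeg_tr, lags), env_tr)
env_hat = decoder.predict(lag_matrix(eeg_te, lags))
print("envelope reconstruction r =", np.corrcoef(env_hat, env_te)[0, 1])
```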
Affiliation(s)
- Jill Kries
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Department of Psychology, Stanford University, Stanford, California, USA
- Pieter De Clercq
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Robin Lemmens
- Experimental Neurology, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Laboratory of Neurobiology, VIB-KU Leuven Center for Brain and Disease Research, Leuven, Belgium
- Department of Neurology, University Hospitals Leuven, Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
- Maaike Vandermosten
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Leuven, Belgium
3. Kaneshiro B, Nguyen DT, Norcia AM, Dmochowski JP, Berger J. Inter-subject correlation of electroencephalographic and behavioural responses reflects time-varying engagement with natural music. Eur J Neurosci 2024; 59:3162-3183. PMID: 38626924. DOI: 10.1111/ejn.16324.
Abstract
Musical engagement can be conceptualized through various activities, modes of listening and listener states. Recent research has reported that a state of focused engagement can be indexed by the inter-subject correlation (ISC) of audience responses to a shared naturalistic stimulus. While statistically significant ISC has been reported during music listening, we lack insight into the temporal dynamics of engagement over the course of musical works-such as those composed in the Western classical style-which involve the formulation of expectations that are realized or derailed at subsequent points of arrival. Here, we use the ISC of electroencephalographic (EEG) and continuous behavioural (CB) responses to investigate the time-varying dynamics of engagement with functional tonal music. From a sample of adult musicians who listened to a complete cello concerto movement, we found that ISC varied throughout the excerpt for both measures. In particular, significant EEG ISC was observed during periods of musical tension that built to climactic highpoints, while significant CB ISC corresponded more to declarative entrances and points of arrival. Moreover, we found that a control stimulus retaining envelope characteristics of the intact music, but little other temporal structure, also elicited significantly correlated EEG and CB responses, though to lesser extents than the original version. In sum, these findings shed light on the temporal dynamics of engagement during music listening and clarify specific aspects of musical engagement that may be indexed by each measure.
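Time-resolved inter-subject correlation of the kind discussed above can be illustrated, in simplified form, as the mean pairwise correlation of subjects' responses within sliding windows. The published EEG analysis used correlated component analysis; this sketch shows only the basic idea for a single response channel, with assumed window and step sizes and placeholder data.

```python
# Simplified sketch of time-resolved inter-subject correlation (ISC).
import numpy as np
from itertools import combinations

def windowed_isc(responses, win, step):
    """responses: (n_subjects, n_samples); returns mean pairwise ISC per window."""
    n_subj, n_samp = responses.shape
    pairs = list(combinations(range(n_subj), 2))
    isc = []
    for start in range(0, n_samp - win + 1, step):
        seg = responses[:, start:start + win]
        isc.append(np.mean([np.corrcoef(seg[i], seg[j])[0, 1] for i, j in pairs]))
    return np.array(isc)

fs = 25                                                       # Hz, assumed response rate
responses = np.random.default_rng(0).standard_normal((20, fs * 300))  # 20 subjects, 5 min
isc_trace = windowed_isc(responses, win=fs * 10, step=fs)     # 10 s windows, 1 s step
```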
Affiliation(s)
- Blair Kaneshiro
- Center for Computer Research in Music and Acoustics, Stanford University, Stanford, California, USA
- Center for the Study of Language and Information, Stanford University, Stanford, California, USA
- Graduate School of Education, Stanford University, Stanford, California, USA
- Duc T Nguyen
- Center for Computer Research in Music and Acoustics, Stanford University, Stanford, California, USA
- Center for the Study of Language and Information, Stanford University, Stanford, California, USA
- Anthony M Norcia
- Department of Psychology, Stanford University, Stanford, California, USA
- Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, USA
- Jacek P Dmochowski
- Department of Biomedical Engineering, City College of New York, New York, New York, USA
- Jonathan Berger
- Center for Computer Research in Music and Acoustics, Stanford University, Stanford, California, USA
4. Sergeeva A, Christensen CB, Kidmose P. Towards ASSR-based hearing assessment using natural sounds. J Neural Eng 2024; 21:026045. PMID: 38579741. DOI: 10.1088/1741-2552/ad3b6b.
Abstract
Objective. The auditory steady-state response (ASSR) allows estimation of hearing thresholds. The ASSR can be estimated from electroencephalography (EEG) recordings from electrodes positioned both on the scalp and within the ear (ear-EEG). Ear-EEG can potentially be integrated into hearing aids, which would enable automatic fitting of the hearing device in daily life. The conventional stimuli for ASSR-based hearing assessment, such as pure tones and chirps, are monotonous and tiresome, making them inconvenient for repeated use in everyday situations. In this study we investigate the use of natural speech sounds for ASSR estimation.
Approach. EEG was recorded from 22 normal-hearing subjects from both scalp and ear electrodes. Subjects were stimulated monaurally with 180 min of speech stimulus modified by applying a 40 Hz amplitude modulation (AM) to an octave frequency sub-band centered at 1 kHz. Each 50 ms sub-interval in the AM sub-band was scaled to match one of 10 pre-defined levels (0-45 dB sensation level, 5 dB steps). The apparent latency of the ASSR was estimated as the lag of maximum average cross-correlation between the envelope of the AM sub-band and the recorded EEG, and was used to align the EEG signal with the audio signal. The EEG was then split into sub-epochs of 50 ms length and sorted according to stimulation level. The ASSR was estimated for each level for both scalp- and ear-EEG.
Main results. Significant ASSRs with increasing amplitude as a function of presentation level were recorded from both scalp and ear electrode configurations.
Significance. Utilizing natural sounds in ASSR estimation offers the potential for electrophysiological hearing assessments that are more comfortable and less fatiguing than existing ASSR methods. Combined with ear-EEG, this approach may allow convenient hearing-threshold estimation in everyday life, utilizing ambient sounds. Additionally, it may facilitate both initial fitting and subsequent adjustments of hearing aids outside of clinical settings.
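The two signal-processing steps described in the Approach can be sketched as follows: imposing a 40 Hz amplitude modulation on a band-limited portion of a natural sound, and estimating an apparent latency as the lag of maximum cross-correlation between the modulation-band envelope and the EEG. Filter design, sampling rate and the placeholder signals are assumptions, not the study's parameters.

```python
# Sketch: (1) 40 Hz AM on an octave band around 1 kHz, (2) apparent latency from
# the lag of maximum cross-correlation between the modulated-band envelope and EEG.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, correlate

fs = 4096                                           # Hz, assumed sampling rate
t = np.arange(0, 10, 1 / fs)
audio = np.random.default_rng(0).standard_normal(t.size)   # placeholder speech segment

# 1) octave band around 1 kHz, modulated at 40 Hz, re-inserted into the signal
sos = butter(4, [707, 1414], btype="bandpass", fs=fs, output="sos")
subband = sosfiltfilt(sos, audio)
am = 0.5 * (1 + np.sin(2 * np.pi * 40 * t))
stimulus = (audio - subband) + subband * am

# 2) apparent latency from envelope-EEG cross-correlation (positive lags only)
eeg = np.random.default_rng(1).standard_normal(t.size)     # placeholder recording
env = np.abs(hilbert(subband * am))
xcorr = correlate(eeg - eeg.mean(), env - env.mean(), mode="full")
lag_s = np.argmax(xcorr[t.size - 1:]) / fs
print(f"apparent latency ≈ {lag_s * 1e3:.1f} ms")
```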
Affiliation(s)
- Anna Sergeeva
- Department of Electrical and Computer Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
- Christian Bech Christensen
- Department of Electrical and Computer Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
- Preben Kidmose
- Department of Electrical and Computer Engineering, Aarhus University, Finlandsgade 22, 8200 Aarhus N, Denmark
5. Karunathilake IMD, Brodbeck C, Bhattasali S, Resnik P, Simon JZ. Neural Dynamics of the Processing of Speech Features: Evidence for a Progression of Features from Acoustic to Sentential Processing. bioRxiv [Preprint] 2024:2024.02.02.578603. PMID: 38352332. PMCID: PMC10862830. DOI: 10.1101/2024.02.02.578603.
Abstract
When we listen to speech, our brain's neurophysiological responses "track" its acoustic features, but it is less well understood how these auditory responses are modulated by linguistic content. Here, we recorded magnetoencephalography (MEG) responses while subjects listened to four types of continuous-speech-like passages: speech-envelope modulated noise, English-like non-words, scrambled words, and narrative passage. Temporal response function (TRF) analysis provides strong neural evidence for the emergent features of speech processing in cortex, from acoustics to higher-level linguistics, as incremental steps in neural speech processing. Critically, we show a stepwise hierarchical progression of progressively higher order features over time, reflected in both bottom-up (early) and top-down (late) processing stages. Linguistically driven top-down mechanisms take the form of late N400-like responses, suggesting a central role of predictive coding mechanisms at multiple levels. As expected, the neural processing of lower-level acoustic feature responses is bilateral or right lateralized, with left lateralization emerging only for lexical-semantic features. Finally, our results identify potential neural markers of the computations underlying speech perception and comprehension.
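A forward temporal response function (TRF) of the kind used in such analyses maps time-lagged stimulus features onto each neural channel; a minimal ridge-regression sketch follows. The sampling rate, lag window, regularization and placeholder data are assumed, and dedicated toolboxes (e.g., the mTRF-Toolbox or Eelbrain) implement the full workflow.

```python
# Minimal forward TRF sketch: regress lagged stimulus features onto each channel,
# then score prediction accuracy (r) per channel on held-out data.
import numpy as np
from sklearn.linear_model import Ridge

fs = 100
lags = np.arange(0, int(0.5 * fs))            # 0-500 ms feature-to-response lags

def lagged(features, lags):
    """features: (samples, n_features) -> (samples, n_features * n_lags)."""
    return np.concatenate([np.roll(features, lag, axis=0) for lag in lags], axis=1)

rng = np.random.default_rng(0)
features = rng.standard_normal((12000, 2))    # e.g., envelope + a word-onset train
meg = rng.standard_normal((12000, 20))        # placeholder sensor data
X = lagged(features, lags)
X_tr, X_te, y_tr, y_te = X[:9000], X[9000:], meg[:9000], meg[9000:]

trf = Ridge(alpha=1e2).fit(X_tr, y_tr)
pred = trf.predict(X_te)
acc = [np.corrcoef(pred[:, ch], y_te[:, ch])[0, 1] for ch in range(meg.shape[1])]
print("mean prediction accuracy:", np.mean(acc))
```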
Affiliation(s)
- Christian Brodbeck
- Department of Computing and Software, McMaster University, Hamilton, ON, Canada
- Shohini Bhattasali
- Department of Language Studies, University of Toronto, Scarborough, Canada
- Philip Resnik
- Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA
- Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA
- Department of Biology, University of Maryland, College Park, MD, USA
- Institute for Systems Research, University of Maryland, College Park, MD, USA
6. MacIntyre AD, Carlyon RP, Goehring T. Neural Decoding of the Speech Envelope: Effects of Intelligibility and Spectral Degradation. Trends Hear 2024; 28:23312165241266316. PMID: 39183533. PMCID: PMC11345737. DOI: 10.1177/23312165241266316.
Abstract
During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech-brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may provide clinical use as an objective measure of stimulus encoding by the brain-for example during cochlear implant listening, wherein the speech signal is severely spectrally degraded. Yet, interplay between acoustic and linguistic factors may lead to top-down modulation of perception, thereby complicating audiological applications. To address this ambiguity, we assess neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (n = 38; 18-35 years old) using vocoded speech. We dissociate sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from held-out EEG data, with decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary by spectral resolution, intelligible speech was associated with better decoding accuracy in general. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance by experimental condition. In general, while robust neural decoding was observed at the individual and group level, variability within participants would most likely prevent the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.
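The random-permutation significance test mentioned above can be sketched by building a null distribution of reconstruction accuracies from deliberately mismatched envelope/EEG pairings (here via circular shifts) and comparing the observed accuracy against it. The shift strategy and the number of permutations are illustrative assumptions.

```python
# Sketch of a permutation test for decoder significance: null distribution from
# circularly shifted envelopes, p-value from the rank of the observed correlation.
import numpy as np

def permutation_p(env_true, env_hat, n_perm=1000, rng=np.random.default_rng(0)):
    observed = np.corrcoef(env_true, env_hat)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        shift = rng.integers(1, env_true.size)
        null[i] = np.corrcoef(np.roll(env_true, shift), env_hat)[0, 1]
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p

rng = np.random.default_rng(1)
env_true = rng.standard_normal(5000)                   # placeholder held-out envelope
env_hat = 0.3 * env_true + rng.standard_normal(5000)   # placeholder reconstruction
r, p = permutation_p(env_true, env_hat)
print(f"r = {r:.3f}, permutation p ≈ {p:.3f}")
```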
Affiliation(s)
- Robert P. Carlyon
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Tobias Goehring
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
7. Ha J, Baek SC, Lim Y, Chung JH. Validation of cost-efficient EEG experimental setup for neural tracking in an auditory attention task. Sci Rep 2023; 13:22682. PMID: 38114579. PMCID: PMC10730561. DOI: 10.1038/s41598-023-49990-6.
Abstract
When individuals listen to speech, their neural activity phase-locks to its slow temporal rhythm, which is commonly referred to as "neural tracking". The neural tracking mechanism allows for the detection of an attended sound source in a multi-talker situation by decoding neural signals obtained by electroencephalography (EEG), known as auditory attention decoding (AAD). Neural tracking with AAD can be utilized as an objective measurement tool in diverse clinical contexts, and it has potential to be applied to neuro-steered hearing devices. To effectively utilize this technology, it is essential to enhance the accessibility of EEG experimental setup and analysis. The aim of the study was to develop a cost-efficient neural tracking system and validate the feasibility of neural tracking measurement by conducting an AAD task, using offline and real-time decoder models, outside a soundproofed environment. We devised a neural tracking system capable of conducting AAD experiments using an OpenBCI board and an Arduino board. Nine participants were recruited to assess the performance of AAD using the developed system, which involved presenting competing speech signals in an experimental setting without soundproofing. The offline decoder model demonstrated an average performance of 90%, and the real-time decoder model achieved a performance of 78%. The present study demonstrates the feasibility of implementing neural tracking and AAD using cost-effective devices in a practical environment.
Affiliation(s)
- Jiyeon Ha
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, 60322, Frankfurt am Main, Germany
- Yoonseob Lim
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Jae Ho Chung
- Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, 04763, Korea
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul, 02792, Korea
- Department of Otolaryngology-Head and Neck Surgery, College of Medicine, Hanyang University, Seoul, 04763, Korea
- Department of Otolaryngology-Head and Neck Surgery, School of Medicine, Hanyang University, 222-Wangshimni-ro, Seongdong-gu, Seoul, 133-792, Korea
8. Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. Front Hum Neurosci 2023; 17:1283206. PMID: 38162285. PMCID: PMC10754997. DOI: 10.3389/fnhum.2023.1283206.
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers-an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
Affiliation(s)
- Edmund C. Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY, United States
9. Di Liberto GM, Attaheri A, Cantisani G, Reilly RB, Ní Choisdealbha Á, Rocha S, Brusini P, Goswami U. Emergence of the cortical encoding of phonetic features in the first year of life. Nat Commun 2023; 14:7789. PMID: 38040720. PMCID: PMC10692113. DOI: 10.1038/s41467-023-43490-x.
Abstract
Even prior to producing their first words, infants are developing a sophisticated speech processing system, with robust word recognition present by 4-6 months of age. These emergent linguistic skills, observed with behavioural investigations, are likely to rely on increasingly sophisticated neural underpinnings. The infant brain is known to robustly track the speech envelope, however previous cortical tracking studies were unable to demonstrate the presence of phonetic feature encoding. Here we utilise temporal response functions computed from electrophysiological responses to nursery rhymes to investigate the cortical encoding of phonetic features in a longitudinal cohort of infants when aged 4, 7 and 11 months, as well as adults. The analyses reveal an increasingly detailed and acoustically invariant phonetic encoding emerging over the first year of life, providing neurophysiological evidence that the pre-verbal human cortex learns phonetic categories. By contrast, we found no credible evidence for age-related increases in cortical tracking of the acoustic spectrogram.
Affiliation(s)
- Giovanni M Di Liberto
- ADAPT Centre, School of Computer Science and Statistics, Trinity College, The University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Adam Attaheri
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Giorgia Cantisani
- ADAPT Centre, School of Computer Science and Statistics, Trinity College, The University of Dublin, Dublin, Ireland
- Laboratoire des Systèmes Perceptifs, Département d'études Cognitives, École normale supérieure, PSL University, CNRS, 75005, Paris, France
- Richard B Reilly
- Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
- School of Engineering, Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- School of Medicine, Trinity College, The University of Dublin, Dublin, Ireland
- Áine Ní Choisdealbha
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Sinead Rocha
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Perrine Brusini
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Usha Goswami
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
10. An H, Lee J, Suh MW, Lim Y. Neural correlation of speech envelope tracking for background noise in normal hearing. Front Neurosci 2023; 17:1268591. PMID: 37916182. PMCID: PMC10616241. DOI: 10.3389/fnins.2023.1268591.
Abstract
Everyday speech communication often occurs in environments with background noise, and the impact of noise on speech recognition can vary depending on factors such as noise type, noise intensity, and the listener's hearing ability. However, the extent to which the neural mechanisms of speech understanding are influenced by different types and levels of noise remains unknown. This study aims to investigate whether individuals exhibit distinct neural responses and attention strategies depending on noise conditions. We recorded electroencephalography (EEG) data from 20 participants with normal hearing (13 males) and evaluated both neural tracking of speech envelopes and behavioral performance in speech understanding in the presence of varying types of background noise. Participants engaged in an EEG experiment consisting of two separate sessions. The first session involved listening to a 12-min story presented binaurally without any background noise. In the second session, speech understanding scores were measured using matrix sentences presented in speech-shaped noise (SSN) and story-noise background conditions, at noise levels corresponding to the sentence recognition score (SRS). We observed differences in neural envelope correlation depending on noise type but not on noise level. Interestingly, the impact of noise type on the variation in envelope tracking was more pronounced among participants with higher speech perception scores, while those with lower scores exhibited similar envelope correlations regardless of the noise condition. The findings suggest that even individuals with normal hearing may adopt different strategies to understand speech in challenging listening environments, depending on the type of noise.
Affiliation(s)
- HyunJung An
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- JeeWon Lee
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul, Republic of Korea
- Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul, Republic of Korea
- Yoonseob Lim
- Center for Intelligent and Interactive Robotics, Korea Institute of Science and Technology, Seoul, Republic of Korea
- Department of HY-KIST Bio-convergence, Hanyang University, Seoul, Republic of Korea
11. Ahmed F, Nidiffer AR, Lalor EC. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario. bioRxiv [Preprint] 2023:2023.08.23.554451. PMID: 37662393. PMCID: PMC10473711. DOI: 10.1101/2023.08.23.554451.
Abstract
Seeing the speaker's face greatly improves our speech comprehension in noisy environments. This is due to the brain's ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers - an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person's gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model - one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker's face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
- Aaron R. Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
- Edmund C. Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, and Center for Visual Science, University of Rochester, Rochester, NY 14627, USA
12. Deoisres S, Lu Y, Vanheusden FJ, Bell SL, Simpson DM. Continuous speech with pauses inserted between words increases cortical tracking of speech envelope. PLoS One 2023; 18:e0289288. PMID: 37498891. PMCID: PMC10374040. DOI: 10.1371/journal.pone.0289288.
Abstract
The decoding multivariate temporal response function (decoder), or speech envelope reconstruction, approach is a well-known tool for assessing cortical tracking of the speech envelope. It is used to analyse the correlation between the speech stimulus and the neural response. It is known that auditory late responses are enhanced with longer gaps between stimuli, but it is not clear whether this applies to the decoder, and whether the addition of gaps/pauses in continuous speech could be used to increase envelope reconstruction accuracy. We investigated this in normal-hearing participants who listened to continuous speech with no added pauses (natural speech), and then with short (250 ms) or long (500 ms) silent pauses inserted between each word. The total durations of the continuous speech stimuli with no, short, and long pauses were approximately 10 minutes, 16 minutes, and 21 minutes, respectively. EEG and the speech envelope were simultaneously acquired and then filtered into delta (1-4 Hz) and theta (4-8 Hz) frequency bands. In addition to analysing responses to the whole speech envelope, the speech envelope was also segmented to focus the response analysis on onset and non-onset regions of speech separately. Our results show that continuous speech with additional pauses inserted between words significantly increases speech envelope reconstruction correlations compared to natural speech, in both the delta and theta frequency bands. These increases in speech envelope reconstruction appear to be dominated by the onset regions of the speech envelope. Introducing pauses in speech stimuli has potential clinical benefit for increasing auditory evoked response detectability, though with the disadvantage of the speech sounding less natural. The strong effect of pauses and onsets on the decoder should be considered when comparing results from different speech corpora. Whether the increased cortical response, when longer pauses are introduced, reflects improved intelligibility requires further investigation.
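The band-splitting step described here (filtering both the EEG and the speech envelope into delta and theta bands before reconstruction) can be sketched with zero-phase band-pass filters; the filter order, sampling rate and placeholder signals below are assumptions.

```python
# Sketch: zero-phase band-pass filtering of EEG and speech envelope into
# delta (1-4 Hz) and theta (4-8 Hz) bands prior to envelope reconstruction.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 128                                    # Hz, assumed common sampling rate
bands = {"delta": (1, 4), "theta": (4, 8)}

def band_limit(signal, lo, hi, fs):
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal, axis=0)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((fs * 600, 32))   # placeholder: 10 min, 32 channels
envelope = rng.standard_normal(fs * 600)    # placeholder speech envelope
filtered = {name: (band_limit(eeg, lo, hi, fs), band_limit(envelope, lo, hi, fs))
            for name, (lo, hi) in bands.items()}
```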
Affiliation(s)
- Suwijak Deoisres
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
- Yuhan Lu
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Frederique J Vanheusden
- Department of Engineering, School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
- Steven L Bell
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
- David M Simpson
- Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
13. Van Herck S, Economou M, Bempt FV, Ghesquière P, Vandermosten M, Wouters J. Pulsatile modulation greatly enhances neural synchronization at syllable rate in children. Neuroimage 2023:120223. PMID: 37315772. DOI: 10.1016/j.neuroimage.2023.120223.
Abstract
Neural processing of the speech envelope is of crucial importance for speech perception and comprehension. This envelope processing is often investigated by measuring neural synchronization to sinusoidal amplitude-modulated stimuli at different modulation frequencies. However, it has been argued that these stimuli lack ecological validity. Pulsatile amplitude-modulated stimuli, on the other hand, are suggested to be more ecologically valid and efficient, and have increased potential to uncover the neural mechanisms behind some developmental disorders such as dyslexia. Nonetheless, pulsatile stimuli have not yet been investigated in pre-reading and beginning-reading children, a crucial age range for developmental reading research. We performed a longitudinal study to examine the potential of pulsatile stimuli in this age range. Fifty-two typically reading children were tested at three time points from the middle of their last year of kindergarten (5 years old) to the end of first grade (7 years old). Using electroencephalography, we measured neural synchronization to syllable-rate and phoneme-rate sinusoidal and pulsatile amplitude-modulated stimuli. Our results revealed that the pulsatile stimuli significantly enhance neural synchronization at syllable rate, compared to the sinusoidal stimuli. Additionally, the pulsatile stimuli at syllable rate elicited a different hemispheric specialization, more closely resembling natural speech envelope tracking. We postulate that using pulsatile stimuli greatly increases EEG data acquisition efficiency compared to the common sinusoidal amplitude-modulated stimuli in research with younger children and in developmental reading research.
Affiliation(s)
- Shauni Van Herck
- Research Group ExpORL, Department of Neurosciences, KU Leuven, Belgium; Parenting and Special Education Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Belgium
- Maria Economou
- Research Group ExpORL, Department of Neurosciences, KU Leuven, Belgium; Parenting and Special Education Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Belgium
- Femke Vanden Bempt
- Research Group ExpORL, Department of Neurosciences, KU Leuven, Belgium; Parenting and Special Education Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Belgium
- Pol Ghesquière
- Parenting and Special Education Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Belgium
- Jan Wouters
- Research Group ExpORL, Department of Neurosciences, KU Leuven, Belgium
14. Ribas-Prats T, Arenillas-Alcón S, Pérez-Cruz M, Costa-Faidella J, Gómez-Roig MD, Escera C. Speech-Encoding Deficits in Neonates Born Large-for-Gestational Age as Revealed With the Envelope Frequency-Following Response. Ear Hear 2023. PMID: 36759954. DOI: 10.1097/aud.0000000000001330.
Abstract
Objectives. The present envelope frequency-following response (FFRenv) study aimed at characterizing the neural encoding of the fundamental frequency of speech sounds in neonates born at the higher end of the birth weight continuum (>90th percentile), known as large-for-gestational age (LGA).
Design. Twenty-five LGA newborns were recruited from the maternity unit of Sant Joan de Déu Barcelona Children's Hospital and paired by age and sex with 25 babies born adequate-for-gestational age (AGA), all from healthy mothers and normal pregnancies. FFRenvs were elicited to the /da/ syllable and recorded while the baby was sleeping in its cradle after a successful universal hearing screening. Neural encoding of the envelope of the stimulus fundamental frequency (F0env) was characterized through the FFRenv spectral amplitude. Relationships between electrophysiological parameters and maternal/neonatal variables that may condition neonatal neurodevelopment were assessed, including pregestational body mass index (BMI), maternal gestational weight gain and neonatal BMI.
Results. LGA newborns showed smaller spectral amplitudes at the F0env compared to the AGA group. Significant negative correlations were found between neonatal BMI and the spectral amplitude at the F0env.
Conclusions. Our results indicate that in spite of having a healthy pregnancy, the central auditory system of LGA neonates is impaired in encoding a fundamental aspect of speech sounds, namely their fundamental frequency. The negative correlation between the neonates' BMI and the FFRenv indicates that this impaired encoding is independent of the pregnant woman's BMI and weight gain during pregnancy, supporting the role of the neonatal BMI. We suggest that the higher adipose tissue observed in the LGA group may impair, via proinflammatory products, the fine-grained central auditory system microstructure required for the neural encoding of the fundamental frequency of speech sounds.
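The kind of spectral-amplitude measure described in the Design (amplitude of the averaged frequency-following response at the stimulus fundamental frequency) can be sketched as follows; the F0 value, epoch length and sampling rate are assumed values, not the study's exact parameters.

```python
# Sketch: spectral amplitude of an averaged FFR at (approximately) the stimulus F0.
import numpy as np

fs = 13_500                                    # Hz, assumed recording rate
f0 = 113                                       # Hz, assumed syllable F0
ffr = np.random.default_rng(0).standard_normal((200, int(0.17 * fs)))  # placeholder epochs
avg = ffr.mean(axis=0)                         # average across epochs first

spectrum = np.abs(np.fft.rfft(avg)) / avg.size
freqs = np.fft.rfftfreq(avg.size, d=1 / fs)
f0_amp = spectrum[np.argmin(np.abs(freqs - f0))]   # amplitude at the bin nearest F0
print(f"spectral amplitude at ~{f0} Hz: {f0_amp:.4f}")
```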
Affiliation(s)
- Teresa Ribas-Prats
- Brainlab-Cognitive Neuroscience Research Group, Department of Clinical Psychology and Psychobiology, University of Barcelona, Catalonia, Spain; Institute of Neurosciences, University of Barcelona, Catalonia, Spain; Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Catalonia, Spain
- Sonia Arenillas-Alcón
- Brainlab-Cognitive Neuroscience Research Group, Department of Clinical Psychology and Psychobiology, University of Barcelona, Catalonia, Spain; Institute of Neurosciences, University of Barcelona, Catalonia, Spain; Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Catalonia, Spain
- Míriam Pérez-Cruz
- Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Catalonia, Spain; BCNatal-Barcelona Center for Maternal Fetal and Neonatal Medicine (Hospital Sant Joan de Déu and Hospital Clínic), University of Barcelona, Barcelona, Catalonia, Spain
- Jordi Costa-Faidella
- Brainlab-Cognitive Neuroscience Research Group, Department of Clinical Psychology and Psychobiology, University of Barcelona, Catalonia, Spain; Institute of Neurosciences, University of Barcelona, Catalonia, Spain; Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Catalonia, Spain
- Maria Dolores Gómez-Roig
- Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Catalonia, Spain; BCNatal-Barcelona Center for Maternal Fetal and Neonatal Medicine (Hospital Sant Joan de Déu and Hospital Clínic), University of Barcelona, Barcelona, Catalonia, Spain
- Carles Escera
- Brainlab-Cognitive Neuroscience Research Group, Department of Clinical Psychology and Psychobiology, University of Barcelona, Catalonia, Spain; Institute of Neurosciences, University of Barcelona, Catalonia, Spain; Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Catalonia, Spain
15. Mesik J, Wojtczak M. The effects of data quantity on performance of temporal response function analyses of natural speech processing. Front Neurosci 2023; 16:963629. PMID: 36711133. PMCID: PMC9878558. DOI: 10.3389/fnins.2022.963629.
Abstract
In recent years, temporal response function (TRF) analyses of neural activity recordings evoked by continuous naturalistic stimuli have become increasingly popular for characterizing response properties within the auditory hierarchy. However, despite this rise in TRF usage, relatively few educational resources for these tools exist. Here we use a dual-talker continuous speech paradigm to demonstrate how a key parameter of experimental design, the quantity of acquired data, influences TRF analyses fit to either individual data (subject-specific analyses), or group data (generic analyses). We show that although model prediction accuracy increases monotonically with data quantity, the amount of data required to achieve significant prediction accuracies can vary substantially based on whether the fitted model contains densely (e.g., acoustic envelope) or sparsely (e.g., lexical surprisal) spaced features, especially when the goal of the analyses is to capture the aspect of neural responses uniquely explained by specific features. Moreover, we demonstrate that generic models can exhibit high performance on small amounts of test data (2-8 min), if they are trained on a sufficiently large data set. As such, they may be particularly useful for clinical and multi-task study designs with limited recording time. Finally, we show that the regularization procedure used in fitting TRF models can interact with the quantity of data used to fit the models, with larger training quantities resulting in systematically larger TRF amplitudes. Together, demonstrations in this work should aid new users of TRF analyses, and in combination with other tools, such as piloting and power analyses, may serve as a detailed reference for choosing acquisition duration in future studies.
Affiliation(s)
- Juraj Mesik
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States
16. Lu H, Mehta AH, Oxenham AJ. Methodological considerations when measuring and analyzing auditory steady-state responses with multi-channel EEG. Curr Res Neurobiol 2022; 3:100061. PMID: 36386860. PMCID: PMC9647176. DOI: 10.1016/j.crneur.2022.100061.
Abstract
The auditory steady-state response (ASSR) has been traditionally recorded with few electrodes and is often measured as the voltage difference between mastoid and vertex electrodes (vertical montage). As high-density EEG recording systems have gained popularity, multi-channel analysis methods have been developed to integrate the ASSR signal across channels. The phases of ASSR across electrodes can be affected by factors including the stimulus modulation rate and re-referencing strategy, which will in turn affect the estimated ASSR strength. To explore the relationship between the classical vertical-montage ASSR and whole-scalp ASSR, we applied these two techniques to the same data to estimate the strength of ASSRs evoked by tones with sinusoidal amplitude modulation rates of around 40, 100, and 200 Hz. The whole-scalp methods evaluated in our study, with either linked-mastoid or common-average reference, included ones that assume equal phase across all channels, as well as ones that allow for different phase relationships. The performance of simple averaging was compared to that of more complex methods involving principal component analysis. Overall, the root-mean-square of the phase locking values (PLVs) across all channels provided the most efficient method to detect ASSR across the range of modulation rates tested here.
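The channel-combination metric highlighted in the abstract (the root-mean-square of phase-locking values across channels) can be sketched as follows: compute the consistency of FFT phase across epochs at the modulation frequency for each channel, then take the RMS across channels. Epoch length, modulation rate, sampling rate and placeholder data are assumed values.

```python
# Sketch: per-channel phase-locking value (PLV) at the modulation rate, then the
# root-mean-square across channels as a whole-scalp ASSR summary.
import numpy as np

fs, f_mod, epoch_len = 1000, 40, 1.0            # Hz, Hz, seconds (assumed)
n_samp = int(epoch_len * fs)
eeg = np.random.default_rng(0).standard_normal((300, n_samp, 64))  # epochs x samples x channels

freqs = np.fft.rfftfreq(n_samp, d=1 / fs)
k = np.argmin(np.abs(freqs - f_mod))            # FFT bin at the modulation rate
phase = np.angle(np.fft.rfft(eeg, axis=1)[:, k, :])       # epochs x channels
plv = np.abs(np.mean(np.exp(1j * phase), axis=0))         # PLV per channel
plv_rms = np.sqrt(np.mean(plv ** 2))                      # whole-scalp summary
print(f"RMS phase-locking value at {f_mod} Hz: {plv_rms:.3f}")
```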
Affiliation(s)
- Hao Lu
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, MN, 55455, USA
- Anahita H. Mehta
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, MN, 55455, USA
- Andrew J. Oxenham
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, MN, 55455, USA
17. Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. PMID: 36137861. DOI: 10.1016/j.heares.2022.108607.
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
18. Raghavendra S, Lee S, Chun H, Martin BA, Tan CT. Cortical entrainment to speech produced by cochlear implant talkers and normal-hearing talkers. Front Neurosci 2022; 16:927872. PMID: 36017176. PMCID: PMC9396306. DOI: 10.3389/fnins.2022.927872.
Abstract
Cochlear implants (CIs) are commonly used to restore the ability to hear in those with severe or profound hearing loss. CIs provide the auditory feedback necessary for users to monitor and control their speech production. However, the speech produced by CI users may not match the perceived sound quality of speech produced by normal-hearing talkers, and this difference is easily noticeable in daily conversation. In this study, we attempt to address this difference as perceived by normal-hearing listeners when listening to continuous speech produced by CI talkers and normal-hearing talkers. We used a regenerative model to decode and reconstruct the speech envelope from the single-trial electroencephalogram (EEG) recorded on the scalp of the normal-hearing listeners. The bootstrap Spearman correlation between the actual speech envelope and the envelope reconstructed from the EEG was computed as a metric to quantify the difference in response to the speech produced by the two talker groups. The same listeners were asked to rate the perceived sound quality of the speech produced by the two talker groups as a behavioral sound quality assessment. The results show that both the perceived sound quality ratings and the computed metric, which can be seen as the degree of cortical entrainment to the actual speech envelope across the normal-hearing listeners, were higher for speech produced by normal-hearing talkers than for speech produced by CI talkers. The first purpose of the study was to determine how well the envelope of speech is represented neurophysiologically via its similarity to the envelope reconstructed from EEG. The second purpose was to show how well this representation of speech differentiates between the CI and normal-hearing talker groups in terms of perceived sound quality.
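A bootstrap Spearman correlation between the actual and reconstructed envelopes, of the kind used here as the tracking metric, can be sketched by resampling time segments with replacement and recomputing the rank correlation; the segment length, number of resamples and placeholder data are illustrative assumptions.

```python
# Sketch: bootstrap distribution of the Spearman correlation between the actual
# speech envelope and an EEG-reconstructed envelope, resampling time segments.
import numpy as np
from scipy.stats import spearmanr

def bootstrap_spearman(env, env_hat, seg_len, n_boot=200, rng=np.random.default_rng(1)):
    n_seg = env.size // seg_len
    stats = []
    for _ in range(n_boot):
        pick = rng.choice(n_seg, size=n_seg, replace=True)
        idx = np.concatenate([np.arange(s * seg_len, (s + 1) * seg_len) for s in pick])
        stats.append(spearmanr(env[idx], env_hat[idx])[0])
    return np.array(stats)

rng = np.random.default_rng(0)
env = rng.standard_normal(6400)                  # placeholder actual envelope
env_hat = 0.2 * env + rng.standard_normal(6400)  # placeholder reconstructed envelope
rho = bootstrap_spearman(env, env_hat, seg_len=64)
print(f"median rho = {np.median(rho):.3f}")
```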
Affiliation(s)
- Shruthi Raghavendra
- Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX, United States
- Sungmin Lee
- Department of Speech-Language Pathology and Audiology, Tongmyong University, Busan, South Korea
- Hyungi Chun
- Graduate Center, City University of New York, New York City, NY, United States
- Brett A. Martin
- Graduate Center, City University of New York, New York City, NY, United States
- Chin-Tuan Tan
- Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, TX, United States
19. David W, Gransier R, Wouters J. Evaluation of phase-locking to parameterized speech envelopes. Front Neurol 2022; 13:852030. PMID: 35989900. PMCID: PMC9382131. DOI: 10.3389/fneur.2022.852030.
Abstract
Humans rely on the temporal processing ability of the auditory system to perceive speech during everyday communication. The temporal envelope of speech is essential for speech perception, particularly envelope modulations below 20 Hz. In the literature, the neural representation of this speech envelope is usually investigated by recording neural phase-locked responses to speech stimuli. However, these phase-locked responses are not only associated with envelope modulation processing, but also with processing of linguistic information at a higher-order level when speech is comprehended. It is thus difficult to disentangle the responses into components from the acoustic envelope itself and the linguistic structures in speech (such as words, phrases and sentences). Another way to investigate neural modulation processing is to use sinusoidal amplitude-modulated stimuli at different modulation frequencies to obtain the temporal modulation transfer function. However, these transfer functions are considerably variable across modulation frequencies and individual listeners. To tackle the issues of both speech and sinusoidal amplitude-modulated stimuli, the recently introduced Temporal Speech Envelope Tracking (TEMPEST) framework proposed the use of stimuli with a distribution of envelope modulations. The framework aims to assess the brain's capability to process temporal envelopes in different frequency bands using stimuli with speech-like envelope modulations. In this study, we provide a proof-of-concept of the framework using stimuli with modulation frequency bands around the syllable and phoneme rate in natural speech. We evaluated whether the evoked phase-locked neural activity correlates with the speech-weighted modulation transfer function measured using sinusoidal amplitude-modulated stimuli in normal-hearing listeners. Since many studies on modulation processing employ different metrics and comparing their results is difficult, we included different power- and phase-based metrics and investigate how these metrics relate to each other. Results reveal a strong correspondence across listeners between the neural activity evoked by the speech-like stimuli and the activity evoked by the sinusoidal amplitude-modulated stimuli. Furthermore, strong correspondence was also apparent between each metric, facilitating comparisons between studies using different metrics. These findings indicate the potential of the TEMPEST framework to efficiently assess the neural capability to process temporal envelope modulations within a frequency band that is important for speech perception.
Collapse
Affiliation(s)
- Wouter David
- ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
| | | | | |
Collapse
|
20
|
Raghavendra S, Lee S, Chen F, Martin BA, Tan CT. Cortical Entrainment to Speech Produced by Cochlear Implant Users and Normal-Hearing Talkers. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:3577-3581. [PMID: 36085647 DOI: 10.1109/embc48229.2022.9871276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
The perceived sound quality of speech produced by hard-of-hearing individuals greatly depends on the degree and configuration of their hearing loss. A cochlear implant (CI) may provide some compensation and auditory feedback to monitor/control speech production. However, to date, the speech produced by CI users is still different in quality from that produced by normal-hearing (NH) talkers. In this study, we attempted to address this difference by examining the cortical activity of NH listeners when listening to continuous speech produced by 8 CI talkers and 8 NH talkers. We utilized a discriminative model to decode and reconstruct the speech envelope from the single-trial electroencephalogram (EEG) recorded from scalp electrodes in NH listeners when listening to continuous speech. The correlation coefficient between the reconstructed envelope and original speech envelope was computed as a metric to quantify the difference in response to the speech produced by CI and NH talkers. The same listeners were asked to rate the perceived sound quality of the speech as a behavioral sound quality assessment. Both behavioral perceived sound quality ratings and the cortical entrainment to the speech envelope were higher for the speech set produced by NH talkers than for the speech set produced by CI talkers.
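Envelope reconstruction of this kind is commonly implemented as a regularized linear backward model on time-lagged EEG, with the reconstruction-envelope correlation as the score. A minimal sketch under assumed lag range and regularization (not the study's discriminative model or its exact parameters):

import numpy as np

def lag_matrix(eeg, n_lags):
    # eeg: (n_samples, n_channels); returns (n_samples, n_channels * n_lags),
    # where column block k holds the EEG delayed by k samples.
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        X[k:, k * c:(k + 1) * c] = eeg[:n - k, :]
    return X

def train_decoder(eeg, envelope, n_lags=32, lam=1e3):
    X = lag_matrix(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def reconstruction_score(eeg, envelope, weights, n_lags=32):
    reconstructed = lag_matrix(eeg, n_lags) @ weights
    return np.corrcoef(reconstructed, envelope)[0, 1]   # the correlation metric described above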
Collapse
|
21
|
Muncke J, Kuruvila I, Hoppe U. Prediction of Speech Intelligibility by Means of EEG Responses to Sentences in Noise. Front Neurosci 2022; 16:876421. [PMID: 35720724 PMCID: PMC9198593 DOI: 10.3389/fnins.2022.876421] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Accepted: 03/13/2022] [Indexed: 11/13/2022] Open
Abstract
Objective Understanding speech in noisy conditions is challenging even for people with mild hearing loss, and intelligibility for an individual person is usually evaluated by using several subjective test methods. In the last few years, a method has been developed to determine a temporal response function (TRF) between speech envelope and simultaneous electroencephalographic (EEG) measurements. By using this TRF it is possible to predict the EEG signal for any speech signal. Recent studies have suggested that the accuracy of this prediction varies with the level of noise added to the speech signal and can objectively predict individual speech intelligibility. Here we assess the variations of the TRF itself when it is calculated for measurements with different signal-to-noise ratios and apply these variations to predict speech intelligibility. Methods For 18 normal-hearing subjects, the individual threshold of 50% speech intelligibility was determined by using a speech in noise test. Additionally, subjects listened passively to speech material of the speech in noise test at different signal-to-noise ratios close to the individual threshold of 50% speech intelligibility while an EEG was recorded. Afterwards, the shapes of the TRFs for each signal-to-noise ratio and subject were compared with the derived intelligibility. Results The strongest effect of variations in stimulus signal-to-noise ratio on the TRF shape occurred close to 100 ms after the stimulus presentation, and was located in the left central scalp region. The investigated variations in TRF morphology showed a strong correlation with speech intelligibility, and we were able to predict the individual threshold of 50% speech intelligibility with a mean deviation of less than 1.5 dB. Conclusion The intelligibility of speech in noise can be predicted by analyzing the shape of the TRF derived from different stimulus signal-to-noise ratios. Because TRFs are interpretable, in a manner similar to auditory evoked potentials, this method offers new options for clinical diagnostics.
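A TRF is the forward counterpart of the backward decoder sketched above: it maps the speech envelope onto the EEG, and its shape (for instance the deflection near 100 ms analysed here) can then be compared across signal-to-noise ratios. A minimal regularized least-squares sketch, with lag range and regularization as assumptions:

import numpy as np

def estimate_trf(envelope, eeg_channel, fs, tmin=0.0, tmax=0.4, lam=1e2):
    # envelope, eeg_channel: 1-D arrays of equal length; returns lag times (s) and TRF weights.
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.column_stack([np.roll(envelope, k) for k in lags])
    X[:lags.max() + 1, :] = 0.0                      # drop samples contaminated by wrap-around
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg_channel)
    return lags / fs, w

def peak_latency(lag_times, trf):
    return lag_times[np.argmax(np.abs(trf))]         # latency of the largest TRF deflection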
Collapse
Affiliation(s)
- Jan Muncke
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
| | - Ivine Kuruvila
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
- WS Audiology, Erlangen, Germany
| | - Ulrich Hoppe
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
| |
Collapse
|
22
|
Holtze B, Rosenkranz M, Jaeger M, Debener S, Mirkovic B. Ear-EEG Measures of Auditory Attention to Continuous Speech. Front Neurosci 2022; 16:869426. [PMID: 35592265 PMCID: PMC9111016 DOI: 10.3389/fnins.2022.869426] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Accepted: 03/25/2022] [Indexed: 11/13/2022] Open
Abstract
Auditory attention is an important cognitive function used to separate relevant from irrelevant auditory information. However, most findings on attentional selection have been obtained in highly controlled laboratory settings using bulky recording setups and unnaturalistic stimuli. Recent advances in electroencephalography (EEG) facilitate the measurement of brain activity outside the laboratory, and around-the-ear sensors such as the cEEGrid promise unobtrusive acquisition. In parallel, methods such as speech envelope tracking, intersubject correlations and spectral entropy measures emerged which allow us to study attentional effects in the neural processing of natural, continuous auditory scenes. In the current study, we investigated whether these three attentional measures can be reliably obtained when using around-the-ear EEG. To this end, we analyzed the cEEGrid data of 36 participants who attended to one of two simultaneously presented speech streams. Speech envelope tracking results confirmed a reliable identification of the attended speaker from cEEGrid data. The accuracies in identifying the attended speaker increased when fitting the classification model to the individual. Artifact correction of the cEEGrid data with artifact subspace reconstruction did not increase the classification accuracy. Intersubject correlations were higher for those participants attending to the same speech stream than for those attending to different speech streams, replicating previously obtained results with high-density cap-EEG. We also found that spectral entropy decreased over time, possibly reflecting the decrease in the listener's level of attention. Overall, these results support the idea of using ear-EEG measurements to unobtrusively monitor auditory attention to continuous speech. This knowledge may help to develop assistive devices that support listeners separating relevant from irrelevant information in complex auditory environments.
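Of the three measures, spectral entropy is the simplest to sketch: a flat EEG spectrum gives values near 1, a peaked spectrum values near 0, and a decrease over time is read here as waning attention. Window length and frequency band below are assumptions, not the study's settings:

import numpy as np
from scipy.signal import welch

def spectral_entropy(x, fs, fmin=1.0, fmax=32.0):
    f, psd = welch(x, fs=fs, nperseg=min(len(x), 2 * int(fs)))
    band = (f >= fmin) & (f <= fmax)
    p = psd[band] / psd[band].sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(band.sum())   # normalized to [0, 1]

def entropy_over_time(x, fs, win_s=10.0):
    w = int(win_s * fs)
    return np.array([spectral_entropy(x[i:i + w], fs)
                     for i in range(0, len(x) - w + 1, w)])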
Collapse
Affiliation(s)
- Björn Holtze
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Marc Rosenkranz
- Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Division Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg, Germany
| | - Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
| | - Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
23
|
Bröhl F, Keitel A, Kayser C. MEG Activity in Visual and Auditory Cortices Represents Acoustic Speech-Related Information during Silent Lip Reading. eNeuro 2022; 9:ENEURO.0209-22.2022. [PMID: 35728955 PMCID: PMC9239847 DOI: 10.1523/eneuro.0209-22.2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2022] [Accepted: 06/06/2022] [Indexed: 11/21/2022] Open
Abstract
Speech is an intrinsically multisensory signal, and seeing the speaker's lips forms a cornerstone of communication in acoustically impoverished environments. Still, it remains unclear how the brain exploits visual speech for comprehension. Previous work debated whether lip signals are mainly processed along the auditory pathways or whether the visual system directly implements speech-related processes. To probe this, we systematically characterized dynamic representations of multiple acoustic and visual speech-derived features in source localized MEG recordings that were obtained while participants listened to speech or viewed silent speech. Using a mutual-information framework we provide a comprehensive assessment of how well temporal and occipital cortices reflect the physically presented signals and unique aspects of acoustic features that were physically absent but may be critical for comprehension. Our results demonstrate that both cortices feature a functionally specific form of multisensory restoration: during lip reading, they reflect unheard acoustic features, independent of co-existing representations of the visible lip movements. This restoration emphasizes the unheard pitch signature in occipital cortex and the speech envelope in temporal cortex and is predictive of lip-reading performance. These findings suggest that when seeing the speaker's lips, the brain engages both visual and auditory pathways to support comprehension by exploiting multisensory correspondences between lip movements and spectro-temporal acoustic cues.
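As a rough illustration of relating a speech-derived feature to band-limited brain activity via mutual information, a simple equal-count binning estimator is sketched below. Published frameworks of this kind typically rely on more refined estimators (for example Gaussian-copula methods), so treat this only as a conceptual stand-in:

import numpy as np

def binned_mi(x, y, n_bins=8):
    # Mutual information in bits between two 1-D signals, using rank-based equal-count bins.
    edges_x = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    edges_y = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    qx, qy = np.digitize(x, edges_x), np.digitize(y, edges_y)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (qx, qy), 1)
    joint /= joint.sum()
    px, py = joint.sum(axis=1, keepdims=True), joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))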
Collapse
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
| | - Anne Keitel
- Psychology, University of Dundee, Dundee DD1 4HN, United Kingdom
| | - Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
| |
Collapse
|
24
|
Phelps J, Attaheri A, Bozic M. How bilingualism modulates selective attention in children. Sci Rep 2022; 12:6381. [PMID: 35430617 PMCID: PMC9013372 DOI: 10.1038/s41598-022-09989-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2021] [Accepted: 03/31/2022] [Indexed: 11/09/2022] Open
Abstract
AbstractThere is substantial evidence that learning and using multiple languages modulates selective attention in children. The current study investigated the mechanisms that drive this modification. Specifically, we asked whether the need for constant management of competing languages in bilinguals increases attentional capacity, or draws on the available resources such that they need to be economised to support optimal task performance. Monolingual and bilingual children aged 7–12 attended to a narrative presented in one ear, while ignoring different types of interference in the other ear. We used EEG to capture the neural encoding of attended and unattended speech envelopes, and assess how well they can be reconstructed from the responses of the neuronal populations that encode them. Despite equivalent behavioral performance, monolingual and bilingual children encoded attended speech differently, with the pattern of encoding across conditions in bilinguals suggesting a redistribution of the available attentional capacity, rather than its enhancement.
Collapse
|
25
|
Geirnaert S, Francart T, Bertrand A. Time-adaptive Unsupervised Auditory Attention Decoding Using EEG-based Stimulus Reconstruction. IEEE J Biomed Health Inform 2022; 26:3767-3778. [PMID: 35344501 DOI: 10.1109/jbhi.2022.3162760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
The goal of auditory attention decoding (AAD) is to determine to which speaker out of multiple competing speakers a listener is attending based on the brain signals recorded via, e.g., electroencephalography (EEG). AAD algorithms are a fundamental building block of so-called neuro-steered hearing devices that would allow identifying the speaker that should be amplified based on the brain activity. A common approach is to train a subject-specific stimulus decoder that reconstructs the amplitude envelope of the attended speech signal. However, training this decoder requires a dedicated 'ground-truth' EEG recording of the subject under test, during which the attended speaker is known. Furthermore, this decoder remains fixed during operation and can thus not adapt to changing conditions and situations. Therefore, we propose an online time-adaptive unsupervised stimulus reconstruction method that continuously and automatically adapts over time when new EEG and audio data are streaming in. The adaptive decoder does not require ground-truth attention labels obtained from a training session with the end-user and instead can be initialized with a generic subject-independent decoder or even completely random values. We propose two different implementations: a sliding window and recursive implementation, which we extensively validate on three independent datasets based on multiple performance metrics. We show that the proposed time-adaptive unsupervised decoder outperforms a time-invariant supervised decoder, representing an important step toward practically applicable AAD algorithms for neuro-steered hearing devices.
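The recursive flavour of such a time-adaptive, self-labelling decoder can be pictured as exponentially weighted updates of the covariance statistics, with each new window labelled by the current decoder itself. This is a conceptual sketch only, not the authors' algorithm; the forgetting factor, regularization, and random initialization are assumptions (the abstract notes that a generic or even random initial decoder suffices):

import numpy as np

class AdaptiveEnvelopeDecoder:
    def __init__(self, n_features, lam=1e2, forget=0.9, seed=0):
        self.R = lam * np.eye(n_features)   # running lagged-EEG autocovariance
        self.r = 1e-3 * np.random.default_rng(seed).standard_normal(n_features)  # random start
        self.forget = forget

    def weights(self):
        return np.linalg.solve(self.R, self.r)

    def update(self, X, env_a, env_b):
        # X: (n_samples, n_features) lagged EEG for one incoming window;
        # env_a, env_b: candidate speech envelopes of the two competing speakers.
        reconstructed = X @ self.weights()
        corr_a = np.corrcoef(reconstructed, env_a)[0, 1]
        corr_b = np.corrcoef(reconstructed, env_b)[0, 1]
        attended = env_a if corr_a >= corr_b else env_b       # self-labelled attention decision
        self.R = self.forget * self.R + (1 - self.forget) * (X.T @ X) / len(X)
        self.r = self.forget * self.r + (1 - self.forget) * (X.T @ attended) / len(X)
        return "A" if corr_a >= corr_b else "B"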
Collapse
|
26
|
Gillis M, Decruy L, Vanthornhout J, Francart T. Hearing loss is associated with delayed neural responses to continuous speech. Eur J Neurosci 2022; 55:1671-1690. [PMID: 35263814 DOI: 10.1111/ejn.15644] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Revised: 02/21/2022] [Accepted: 02/23/2022] [Indexed: 11/28/2022]
Abstract
We investigated the impact of hearing loss on the neural processing of speech. Using a forward modeling approach, we compared the neural responses to continuous speech of 14 adults with sensorineural hearing loss with those of age-matched normal-hearing peers. Compared to their normal-hearing peers, hearing-impaired listeners had increased neural tracking and delayed neural responses to continuous speech in quiet. The latency also increased with the degree of hearing loss. As speech understanding decreased, neural tracking decreased in both populations; however, a significantly different trend was observed for the latency of the neural responses. For normal-hearing listeners, the latency increased with increasing background noise level. However, for hearing-impaired listeners, this increase was not observed. Our results support the idea that the neural response latency indicates the efficiency of neural speech processing: when more or different brain regions are involved in processing speech, communication pathways in the brain become longer. These longer communication pathways hamper the information integration among these brain regions, which is reflected in longer processing times. Altogether, this suggests decreased neural speech processing efficiency in hearing-impaired listeners, as more time and more or different brain regions are required to process speech. Our results suggest that this reduction in neural speech processing efficiency occurs gradually as hearing deteriorates. From our results, it is apparent that sound amplification does not solve hearing loss. Even when listening to speech in silence at a comfortable loudness, hearing-impaired listeners process speech less efficiently.
Collapse
Affiliation(s)
- Marlies Gillis
- KU Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, MD, USA
| | | | - Tom Francart
- KU Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| |
Collapse
|
27
|
Etard O, Messaoud RB, Gaugain G, Reichenbach T. No Evidence of Attentional Modulation of the Neural Response to the Temporal Fine Structure of Continuous Musical Pieces. J Cogn Neurosci 2021; 34:411-424. [PMID: 35015867 DOI: 10.1162/jocn_a_01811] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Speech and music are spectrotemporally complex acoustic signals that are highly relevant for humans. Both contain a temporal fine structure that is encoded in the neural responses of subcortical and cortical processing centers. The subcortical response to the temporal fine structure of speech has recently been shown to be modulated by selective attention to one of two competing voices. Music similarly often consists of several simultaneous melodic lines, and a listener can selectively attend to a particular one at a time. However, the neural mechanisms that enable such selective attention remain largely enigmatic, not least since most investigations to date have focused on short and simplified musical stimuli. Here, we studied the neural encoding of classical musical pieces in human volunteers, using scalp EEG recordings. We presented volunteers with continuous musical pieces composed of one or two instruments. In the latter case, the participants were asked to selectively attend to one of the two competing instruments and to perform a vibrato identification task. We used linear encoding and decoding models to relate the recorded EEG activity to the stimulus waveform. We show that we can measure neural responses to the temporal fine structure of melodic lines played by one single instrument, at the population level as well as for most individual participants. The neural response peaks at a latency of 7.6 msec and is not measurable past 15 msec. When analyzing the neural responses to the temporal fine structure elicited by competing instruments, we found no evidence of attentional modulation. We observed, however, that low-frequency neural activity exhibited a modulation consistent with the behavioral task at latencies from 100 to 160 msec, in a similar manner to the attentional modulation observed in continuous speech (N100). Our results show that, much like speech, the temporal fine structure of music is tracked by neural activity. In contrast to speech, however, this response appears unaffected by selective attention in the context of our experiment.
Collapse
|
28
|
Straetmans L, Holtze B, Debener S, Jaeger M, Mirkovic B. Neural tracking to go: auditory attention decoding and saliency detection with mobile EEG. J Neural Eng 2021; 18. [PMID: 34902846 DOI: 10.1088/1741-2552/ac42b5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2021] [Accepted: 12/13/2021] [Indexed: 11/11/2022]
Abstract
OBJECTIVE Neuro-steered assistive technologies have been suggested to offer a major advancement in future devices like neuro-steered hearing aids. Auditory attention decoding methods would in that case allow for identification of an attended speaker within complex auditory environments, exclusively from neural data. Decoding the attended speaker using neural information has so far only been done in controlled laboratory settings. Yet, it is known that ever-present factors like distraction and movement are reflected in the neural signal parameters related to attention. APPROACH Thus, in the current study we applied a two-competing-speaker paradigm to investigate performance of a commonly applied EEG-based auditory attention decoding (AAD) model outside of the laboratory during leisure walking and distraction. Unique environmental sounds were added to the auditory scene and served as distractor events. MAIN RESULTS The current study shows, for the first time, that the attended speaker can be accurately decoded during natural movement. At a temporal resolution as short as 5 s and without artifact attenuation, decoding was found to be significantly above chance level. Further, as hypothesized, we found a decrease in attention to the to-be-attended and the to-be-ignored speech stream after the occurrence of a salient event. Additionally, we demonstrate that it is possible to predict neural correlates of distraction with a computational model of auditory saliency based on acoustic features. CONCLUSION Taken together, our study shows that auditory attention tracking outside of the laboratory in ecologically valid conditions is feasible and a step towards the development of future neural-steered hearing aids.
Collapse
Affiliation(s)
- Lisa Straetmans
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstraße 114-118, Oldenburg, Niedersachsen, 26129, Germany
| | - B Holtze
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, Germany
| | - Stefan Debener
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, Germany
| | - Manuela Jaeger
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, Germany
| | - Bojana Mirkovic
- Department of Psychology, Carl von Ossietzky Universität Oldenburg Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, Oldenburg, Niedersachsen, 26129, Germany
| |
Collapse
|
29
|
Wilson RH, Scherer NJ. Waveform Amplitude and Temporal Symmetric/Asymmetric Characteristics of Phoneme and Syllable Segments in the W-1 Spondaic Words Recorded by Four Speakers. J Am Acad Audiol 2021; 32:445-463. [PMID: 34847585 DOI: 10.1055/s-0041-1730959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
BACKGROUND The amplitude and temporal asymmetry of the speech waveform are mostly associated with voiced speech utterances and are obvious in recent graphic depictions in the literature. The asymmetries are attributed to the presence and interactions of the major formants characteristic of voicing with possible contributions from the unidirectional air flow that accompanies speaking. PURPOSE This study investigated the amplitude symmetry/asymmetry characteristics (polarity) of speech waveforms that to our knowledge have not been quantified. STUDY SAMPLE Thirty-six spondaic words spoken by two male speakers and two female speakers were selected because they were multisyllabic words providing a reasonable sampling of speech sounds, and because four recordings were available that were not related to the topic under study. RESEARCH DESIGN Collectively, the words were segmented into phonemes (vowels [130], diphthongs [77], voiced consonants [258], voiceless consonants [219]), syllables (82), and blends (6). For each segment the following were analyzed separately for the positive and negative datum points: peak amplitude, the percent of the total segment datum points, the root-mean-square (rms) amplitude, and the crest factor. DATA COLLECTION AND ANALYSES The digitized words (44,100 samples/s; 16-bit) were parsed into 144 files (36 words × 4 speakers), edited, transcribed to numeric values (±1), and stored in a spreadsheet in which all analyses were performed with in-house routines. Overall, approximately 85% of each waveform was analyzed, which excluded portions of silent intervals, transitions, and diminished waveform endings. RESULTS The vowel, diphthong, and syllable segments had durations (180-220 ms) that were about twice as long as the consonant durations (∼90 ms) and peak and rms amplitudes that were 6 to 12 dB higher than the consonant peak and rms amplitudes. Vowel, diphthong, and syllable segments had 10% more positive datum points (55%) than negative points (45%), which suggested temporal asymmetries within the segments. With voiced consonants, the distribution of positive and negative datum points dropped to 52 and 48%, and it was essentially equal for the voiceless consonants (50.3 and 49.6%). The mean rms amplitudes of the negative datum points were higher than the rms amplitudes for the positive points by 2 dB (vowels, diphthongs, and syllables), 1 dB (voiced consonants), and 0.1 dB (voiceless consonants). The 144 waveforms and segmentations are illustrated in the Supplementary Material along with the tabularized positive and negative segment characteristics. CONCLUSIONS The temporal and amplitude waveform asymmetries were by far most notable in segments that had a voicing component, which included the voiced consonants. These asymmetries were characterized by larger envelopes and more energy in the negative side of the waveform segment than in the positive side. Interestingly, these segments had more positive datum points than negative points, which indicated temporal asymmetry. All aspects of the voiceless consonants were equally divided between the positive and negative domains. There were female/male differences, but with these limited samples such differences should not be generalized beyond the speakers in this study. The influence of the temporal and amplitude asymmetries on monaural word-recognition performance is thought to be negligible.
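The per-segment quantities analysed above (share of positive versus negative samples, and separate peak, rms, and crest-factor values for each polarity) reduce to a few lines; this sketch assumes a segment already normalized to ±1, and is not the authors' in-house routine:

import numpy as np

def polarity_metrics(segment):
    # segment: 1-D waveform samples for one phoneme/syllable segment.
    metrics = {}
    for name, x in (("positive", segment[segment > 0]),
                    ("negative", -segment[segment < 0])):   # magnitudes of the negative samples
        rms = np.sqrt(np.mean(x ** 2)) if x.size else 0.0
        metrics[name] = {
            "percent_of_points": 100.0 * x.size / segment.size,
            "peak": float(x.max()) if x.size else 0.0,
            "rms": float(rms),
            "crest_factor_db": float(20 * np.log10(x.max() / rms)) if rms > 0 else float("nan"),
        }
    return metrics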
Collapse
Affiliation(s)
- Richard H Wilson
- Speech and Hearing Sciences, Arizona State University, Tempe, Arizona
| | - Nancy J Scherer
- Speech and Hearing Sciences, Arizona State University, Tempe, Arizona
| |
Collapse
|
30
|
Gransier R, Wouters J. Neural auditory processing of parameterized speech envelopes. Hear Res 2021; 412:108374. [PMID: 34800800 DOI: 10.1016/j.heares.2021.108374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Revised: 10/01/2021] [Accepted: 10/13/2021] [Indexed: 10/19/2022]
Abstract
Speech perception depends highly on the neural processing of the speech envelope. Several auditory processing deficits are hypothesized to result in a reduction in fidelity of the neural representation of the speech envelope across the auditory pathway. Furthermore, this reduction in fidelity is associated with supra-threshold speech processing deficits. Investigating the mechanisms that affect the neural encoding of the speech envelope can be of great value to gain insight into the different mechanisms that account for this reduced neural representation, and to develop stimulation strategies for hearing prostheses that aim to restore it. In this perspective, we discuss the importance of neural assessment of phase-locking to the speech envelope from an audiological view and introduce the Temporal Envelope Speech Tracking (TEMPEST) stimulus framework, which enables the electrophysiological assessment of envelope processing across the auditory pathway in a systematic and standardized way. We postulate that this framework can be used to gain insight into the salience of speech-like temporal envelopes in the neural code and to evaluate the effectiveness of stimulation strategies that aim to restore temporal processing across the auditory pathway with auditory prostheses.
Collapse
Affiliation(s)
- Robin Gransier
- ExpORL, Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium.
| | - Jan Wouters
- ExpORL, Department of Neurosciences, KU Leuven, 3000 Leuven, Belgium; Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium
| |
Collapse
|
31
|
θ-Band Cortical Tracking of the Speech Envelope Shows the Linear Phase Property. eNeuro 2021; 8:ENEURO.0058-21.2021. [PMID: 34380659 PMCID: PMC8387159 DOI: 10.1523/eneuro.0058-21.2021] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Revised: 07/10/2021] [Accepted: 07/29/2021] [Indexed: 11/30/2022] Open
Abstract
When listening to speech, low-frequency cortical activity tracks the speech envelope. It remains controversial, however, whether such envelope-tracking neural activity reflects entrainment of neural oscillations or superposition of transient responses evoked by sound features. Recently, it has been suggested that the phase of envelope-tracking activity can potentially distinguish entrained oscillations from evoked responses. Here, we analyze the phase of envelope tracking in humans during passive listening, and observe that the phase lag between cortical activity and speech envelope tends to change linearly across frequency in the θ band (4–8 Hz), suggesting that the θ-band envelope-tracking activity can be readily modeled by evoked responses.
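The linear-phase property can be checked directly by fitting a line to the cross-spectral phase between the speech envelope and the EEG across the θ band; under a fixed-delay (evoked-response) account the slope corresponds to a constant latency. A sketch with assumed window length and band edges, not the study's exact analysis:

import numpy as np
from scipy.signal import csd

def theta_phase_latency(envelope, eeg, fs, fmin=4.0, fmax=8.0):
    f, pxy = csd(envelope, eeg, fs=fs, nperseg=4 * int(fs))
    band = (f >= fmin) & (f <= fmax)
    phase = np.unwrap(np.angle(pxy[band]))
    slope, _ = np.polyfit(f[band], phase, 1)
    return -slope / (2 * np.pi)   # linear phase phi(f) = -2*pi*f*delay implies this delay in seconds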
Collapse
|
32
|
Kuruvila I, Muncke J, Fischer E, Hoppe U. Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model. Front Physiol 2021; 12:700655. [PMID: 34408661 PMCID: PMC8365753 DOI: 10.3389/fphys.2021.700655] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2021] [Accepted: 07/05/2021] [Indexed: 11/25/2022] Open
Abstract
The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate the segregation capability by modeling a relationship between the speech signals present in an auditory scene, and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer the auditory attention are based on linear systems theory where cues such as speech envelopes are mapped on to the EEG signals. Here, we present a joint convolutional neural network (CNN)—long short-term memory (LSTM) model to infer the auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrogram of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets analyzed corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy.
Collapse
Affiliation(s)
- Ivine Kuruvila
- Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - Jan Muncke
- Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| | | | - Ulrich Hoppe
- Department of Audiology, ENT-Clinic, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
| |
Collapse
|
33
|
Soni S, Tata MS. Brain electrical dynamics in speech segmentation depends upon prior experience with the language. BRAIN AND LANGUAGE 2021; 219:104967. [PMID: 34022679 DOI: 10.1016/j.bandl.2021.104967] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/28/2020] [Revised: 04/26/2021] [Accepted: 05/10/2021] [Indexed: 06/12/2023]
Abstract
It remains unclear whether the process of speech tracking, which facilitates speech segmentation, reflects top-down mechanisms related to prior linguistic models or stimulus-driven mechanisms, or possibly both. To address this, we recorded electroencephalography (EEG) responses from native and non-native speakers of English that had different prior experience with the English language but heard acoustically identical stimuli. Despite a significant difference in the ability to segment and perceive speech, our EEG results showed that theta-band tracking of the speech envelope did not depend significantly on prior experience with language. However, tracking in the theta-band did show changes across repetitions of the same sentence, suggesting a priming effect. Furthermore, native and non-native speakers showed different phase dynamics at word boundaries, suggesting differences in segmentation mechanisms. Finally, we found that the correlation between higher frequency dynamics reflecting phoneme-level processing and perceptual segmentation of words might depend on prior experience with the spoken language.
Collapse
Affiliation(s)
- Shweta Soni
- The University of Lethbridge, Lethbridge, AB, Canada.
| | | |
Collapse
|
34
|
Verschueren E, Vanthornhout J, Francart T. The Effect of Stimulus Choice on an EEG-Based Objective Measure of Speech Intelligibility. Ear Hear 2021; 41:1586-1597. [PMID: 33136634 DOI: 10.1097/aud.0000000000000875] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
OBJECTIVES Recently, an objective measure of speech intelligibility (SI), based on brain responses derived from the electroencephalogram (EEG), has been developed using isolated Matrix sentences as a stimulus. We investigated whether this objective measure of SI can also be used with natural speech as a stimulus, as this would be beneficial for clinical applications. DESIGN We recorded the EEG in 19 normal-hearing participants while they listened to two types of stimuli: Matrix sentences and a natural story. Each stimulus was presented at different levels of SI by adding speech weighted noise. SI was assessed in two ways for both stimuli: (1) behaviorally and (2) objectively by reconstructing the speech envelope from the EEG using a linear decoder and correlating it with the acoustic envelope. We also calculated temporal response functions (TRFs) to investigate the temporal characteristics of the brain responses in the EEG channels covering different brain areas. RESULTS For both stimulus types, the correlation between the speech envelope and the reconstructed envelope increased with increasing SI. In addition, correlations were higher for the natural story than for the Matrix sentences. Similar to the linear decoder analysis, TRF amplitudes increased with increasing SI for both stimuli. Remarkably, although SI remained unchanged under the no-noise and +2.5 dB SNR conditions, neural speech processing was affected by the addition of this small amount of noise: TRF amplitudes across the entire scalp decreased between 0 and 150 ms, while amplitudes between 150 and 200 ms increased in the presence of noise. TRF latency changes as a function of SI appeared to be stimulus specific: the latency of the prominent negative peak in the early responses (50 to 300 ms) increased with increasing SI for the Matrix sentences, but remained unchanged for the natural story. CONCLUSIONS These results show (1) the feasibility of natural speech as a stimulus for the objective measure of SI; (2) that neural tracking of speech is enhanced using a natural story compared to Matrix sentences; and (3) that noise and the stimulus type can change the temporal characteristics of the brain responses. These results might reflect the integration of incoming acoustic features and top-down information, suggesting that the choice of the stimulus has to be considered based on the intended purpose of the measurement.
Collapse
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology (ExpORL), Department of Neurosciences, KU Leuven-University of Leuven, Leuven, Belgium
| | | | | |
Collapse
|
35
|
Rosenkranz M, Holtze B, Jaeger M, Debener S. EEG-Based Intersubject Correlations Reflect Selective Attention in a Competing Speaker Scenario. Front Neurosci 2021; 15:685774. [PMID: 34194296 PMCID: PMC8236636 DOI: 10.3389/fnins.2021.685774] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2021] [Accepted: 05/18/2021] [Indexed: 11/13/2022] Open
Abstract
Several solutions have been proposed to study the relationship between ongoing brain activity and natural sensory stimuli, such as running speech. Computing the intersubject correlation (ISC) has been proposed as one possible approach. Previous evidence suggests that ISCs between the participants' electroencephalogram (EEG) may be modulated by attention. The current study addressed this question in a competing-speaker paradigm, where participants (N = 41) had to attend to one of two concurrently presented speech streams. ISCs between participants' EEG were higher for participants attending to the same story compared to participants attending to different stories. Furthermore, we found that ISCs between individual and group data predicted whether an individual attended to the left or right speech stream. Interestingly, the magnitude of the shared neural response with others attending to the same story was related to the individual neural representation of the attended and ignored speech envelope. Overall, our findings indicate that ISC differences reflect the magnitude of selective attentional engagement to speech.
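A simple way to picture the ISC measure: correlate each participant's response with the average response of all other participants in the same attention group. Published ISC pipelines often first extract correlated components; this leave-one-out version is only a simplified stand-in under that assumption:

import numpy as np

def leave_one_out_isc(responses):
    # responses: (n_subjects, n_samples), one matched channel or component per subject.
    iscs = []
    for i in range(responses.shape[0]):
        others_mean = np.delete(responses, i, axis=0).mean(axis=0)
        iscs.append(np.corrcoef(responses[i], others_mean)[0, 1])
    return np.array(iscs)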
Collapse
Affiliation(s)
- Marc Rosenkranz
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Björn Holtze
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Division Hearing, Fraunhofer Institute for Digital Media Technology IDMT, Speech and Audio Technology, Oldenburg, Germany
| | - Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
36
|
Gonzalez JE, Musiek FE. The Onset-Offset N1-P2 Auditory Evoked Response in Individuals With High-Frequency Sensorineural Hearing Loss: Responses to Broadband Noise. Am J Audiol 2021; 30:423-432. [PMID: 34057857 DOI: 10.1044/2021_aja-20-00113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
Purpose Clinical use of electrophysiologic measures has been limited to use of brief stimuli to evoke responses. While brief stimuli elicit onset responses in individuals with normal hearing and normal central auditory nervous system (CANS) function, responses represent the integrity of a fraction of the mainly excitatory central auditory neurons. Longer stimuli could provide information regarding excitatory and inhibitory CANS function. Our goal was to measure the onset-offset N1-P2 auditory evoked response in subjects with normal hearing and subjects with moderate high-frequency sensorineural hearing loss (HFSNHL) to determine whether the response can be measured in individuals with moderate HFSNHL and, if so, whether waveform components differ between participant groups. Method Waveforms were obtained from 10 participants with normal hearing and seven participants with HFSNHL aged 40-67 years using 2,000-ms broadband noise stimuli with 40-ms rise-fall times presented at 50 dB SL referenced to stimulus threshold. Amplitudes and latencies were analyzed via repeated-measures analysis of variance (ANOVA). N1 and P2 onset latencies were compared to offset counterparts via repeated-measures ANOVA after subtracting 2,000 ms from the offset latencies to account for stimulus duration. Offset-to-onset trough-to-peak amplitude ratios between groups were compared using a one-way ANOVA. Results Responses were evoked from all participants. There were no differences between participant groups for the waveform components measured. Response × Participant Group interactions were not significant. Offset N1-P2 latencies were significantly shorter than onset counterparts after adjusting for stimulus duration (normal hearing: 43 ms shorter; HFSNHL: 47 ms shorter). Conclusions Onset-offset N1-P2 responses were resistant to moderate HFSNHL. It is likely that the onset was elicited by the presentation of a sound in silence and the offset by the change in stimulus envelope from plateau to fall, suggesting an excitatory onset response and an inhibitory-influenced offset response. Results indicated this protocol can be used to investigate CANS function in individuals with moderate HFSNHL. Supplemental Material https://doi.org/10.23641/asha.14669007.
Collapse
Affiliation(s)
- Jennifer E. Gonzalez
- Speech and Hearing Science, College of Health Solutions, Arizona State University, Tempe
| | - Frank E. Musiek
- Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson
| |
Collapse
|
37
|
Irsik VC, Almanaseer A, Johnsrude IS, Herrmann B. Cortical Responses to the Amplitude Envelopes of Sounds Change with Age. J Neurosci 2021; 41:5045-5055. [PMID: 33903222 PMCID: PMC8197634 DOI: 10.1523/jneurosci.2715-20.2021] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Revised: 02/16/2021] [Accepted: 03/26/2021] [Indexed: 11/21/2022] Open
Abstract
Many older listeners have difficulty understanding speech in noise, when cues to speech-sound identity are less redundant. The amplitude envelope of speech fluctuates dramatically over time, and features such as the rate of amplitude change at onsets (attack) and offsets (decay) signal critical information about the identity of speech sounds. Aging is also thought to be accompanied by increases in cortical excitability, which may differentially alter sensitivity to envelope dynamics. Here, we recorded electroencephalography in younger and older human adults (of both sexes) to investigate how aging affects neural synchronization to 4 Hz amplitude-modulated noises with different envelope shapes (ramped: slow attack and sharp decay; damped: sharp attack and slow decay). We observed that subcortical responses did not differ between age groups, whereas older compared with younger adults exhibited larger cortical responses to sound onsets, consistent with an increase in auditory cortical excitability. Neural activity in older adults synchronized more strongly to rapid-onset, slow-offset (damped) envelopes, was less sinusoidal, and was more peaked. Younger adults demonstrated the opposite pattern, showing stronger synchronization to slow-onset, rapid-offset (ramped) envelopes, as well as a more sinusoidal neural response shape. The current results suggest that age-related changes in the excitability of auditory cortex alter responses to envelope dynamics. This may be part of the reason why older adults experience difficulty understanding speech in noise. SIGNIFICANCE STATEMENT Many middle-aged and older adults report difficulty understanding speech when there is background noise, which can trigger social withdrawal and negative psychosocial health outcomes. The difficulty may be related to age-related changes in how the brain processes temporal sound features. We tested younger and older people on their sensitivity to different envelope shapes, using EEG. Our results demonstrate that aging is associated with heightened sensitivity to sounds with a sharp attack and gradual decay, and sharper neural responses that deviate from the sinusoidal features of the stimulus, perhaps reflecting increased excitability in the aged auditory cortex. Altered responses to temporal sound features may be part of the reason why older adults often experience difficulty understanding speech in social situations.
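The ramped versus damped 4 Hz stimuli contrasted here differ only in the within-cycle envelope shape; a sketch of such stimuli follows, where the exponential envelope is an illustrative assumption rather than the study's exact waveform:

import numpy as np

def am_noise(fs=44100, dur_s=4.0, f_mod=4.0, shape="damped", seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur_s)) / fs
    cycle = (t * f_mod) % 1.0                         # position within each modulation cycle, 0..1
    if shape == "damped":                             # sharp attack, slow (exponential) decay
        env = np.exp(-5.0 * cycle)
    else:                                             # "ramped": slow attack, sharp decay
        env = np.exp(-5.0 * (1.0 - cycle))
    return env * rng.standard_normal(t.size)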
Collapse
Affiliation(s)
- Vanessa C Irsik
- Department of Psychology & the Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
| | - Ala Almanaseer
- Department of Psychology & the Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
| | - Ingrid S Johnsrude
- Department of Psychology & the Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
- School of Communication and Speech Disorders, University of Western Ontario, London, Ontario N6A 5B7, Canada
| | - Björn Herrmann
- Department of Psychology & the Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
- Rotman Research Institute Baycrest, Toronto, Ontario M6A 2E1, Canada
- Department of Psychology, University of Toronto, Toronto, Ontario M5S 1A1, Canada
| |
Collapse
|
38
|
Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. [PMID: 33744458 PMCID: PMC8204264 DOI: 10.1016/j.neuroimage.2021.117958] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Revised: 03/08/2021] [Accepted: 03/09/2021] [Indexed: 11/01/2022] Open
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed 'speech tracking'), many studies consider the broadband speech envelope, which combines acoustic fluctuations across the spectral range. Using EEG recordings, we show that using this broadband envelope can provide a distorted picture of speech encoding. We systematically investigated the encoding of spectrally-limited speech-derived envelopes presented by individual and multiple noise carriers in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2 - 0.83 kHz) and high (2.66 - 8 kHz) frequency speech-derived envelopes. This was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect the context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low frequency brain activity. As low and high frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands, and are easily confounded when considering the broadband speech envelope.
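A spectrally-limited speech-derived envelope of this kind can be pictured as band-pass filtering the speech, taking the Hilbert envelope, and imposing it on a noise carrier. The band edges below follow the low band quoted in the abstract (0.2-0.83 kHz); the filter order and carrier choice are assumptions:

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_limited_envelope(speech, fs, lo_hz, hi_hz, order=4):
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, speech)))

def envelope_on_noise_carrier(speech, fs, lo_hz=200.0, hi_hz=830.0, seed=0):
    envelope = band_limited_envelope(speech, fs, lo_hz, hi_hz)
    carrier = np.random.default_rng(seed).standard_normal(len(speech))
    return envelope * carrier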
Collapse
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany.
| | - Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
| |
Collapse
|
39
|
Wang L, Wu EX, Chen F. EEG-based auditory attention decoding using speech-level-based segmented computational models. J Neural Eng 2021; 18. [PMID: 33957606 DOI: 10.1088/1741-2552/abfeba] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Accepted: 05/06/2021] [Indexed: 11/11/2022]
Abstract
Objective. Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments. Approach. Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals based on corresponding EEG signals recorded when subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed with the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed based on segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated according to the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to those of traditional AAD methods with unified decoding functions. Main results. Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly with classification accuracies that approximated or exceeded 80% based on corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional directions. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times compared with the unified decoder. Significance. This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggest that this proposed computational model may be an effective method for the application of neuro-controlled brain-computer interfaces in complex auditory scenes.
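The decomposition into higher- and lower-RMS-level segments at a -10 dB relative RMS threshold can be sketched frame-wise as follows; the frame length is an assumption, and this is not the study's exact segmentation code:

import numpy as np

def rms_level_mask(speech, fs, frame_s=0.02, threshold_db=-10.0):
    # Returns one boolean per frame: True for higher-RMS-level frames, False for lower.
    frame = int(frame_s * fs)
    n_frames = len(speech) // frame
    frames = speech[:n_frames * frame].reshape(n_frames, frame)
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    utterance_rms = np.sqrt(np.mean(speech ** 2))
    relative_db = 20 * np.log10((frame_rms + 1e-12) / (utterance_rms + 1e-12))
    return relative_db >= threshold_db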
Collapse
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
| | - Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
| |
Collapse
|
40
|
de Cheveigné A, Slaney M, Fuglsang SA, Hjortkjaer J. Auditory stimulus-response modeling with a match-mismatch task. J Neural Eng 2021; 18. [PMID: 33849003 DOI: 10.1088/1741-2552/abf771] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Accepted: 04/13/2021] [Indexed: 11/12/2022]
Abstract
Objective. An auditory stimulus can be related to the brain response that it evokes by a stimulus-response model fit to the data. This offers insight into perceptual processes within the brain and is also of potential use for devices such as brain computer interfaces (BCIs). The quality of the model can be quantified by measuring the fit with a regression problem, or by applying it to a classification task and measuring its performance. Approach. Here we focus on a match-mismatch (MM) task that entails deciding whether a segment of brain signal matches, via a model, the auditory stimulus that evoked it. Main results. Using these metrics, we describe a range of models of increasing complexity that we compare to methods in the literature, showing state-of-the-art performance. We document in detail one particular implementation, calibrated on a publicly-available database, that can serve as a robust reference to evaluate future developments. Significance. The MM task allows stimulus-response models to be evaluated in the limit of very high model accuracy, making it an attractive alternative to the more commonly used task of auditory attention detection. The MM task does not require class labels, so it is immune to mislabeling, and it is applicable to data recorded in listening scenarios with only one sound source, thus it is cheap to obtain large quantities of training and testing data. Performance metrics from this task, associated with regression accuracy, provide complementary insights into the relation between stimulus and response, as well as information about discriminatory power directly applicable to BCI applications.
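On top of any stimulus-response model, the match-mismatch decision reduces to asking whether a reconstructed segment correlates better with its own time-aligned stimulus segment than with a segment drawn from elsewhere. A sketch assuming reconstructions produced by some previously trained decoder (for example, a backward model like the one sketched under an earlier entry):

import numpy as np

def match_mismatch_accuracy(reconstructions, envelopes, seed=0):
    # reconstructions, envelopes: (n_segments, n_samples) arrays of matched pairs.
    rng = np.random.default_rng(seed)
    n = len(envelopes)
    correct = 0
    for i in range(n):
        j = (i + rng.integers(1, n)) % n                     # a mismatched segment, j != i
        r_match = np.corrcoef(reconstructions[i], envelopes[i])[0, 1]
        r_mismatch = np.corrcoef(reconstructions[i], envelopes[j])[0, 1]
        correct += int(r_match > r_mismatch)
    return correct / n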
Collapse
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Paris, CNRS UMR 8248, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, Paris, PSL, France; UCL Ear Institute, London, United Kingdom; Audition, DEC, ENS, 29 rue d'Ulm, 75230 Paris, France
| | - Malcolm Slaney
- Google Research, Machine Hearing Group, Mountain View, CA, United States of America
| | - Søren A Fuglsang
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
| | - Jens Hjortkjaer
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark; Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
| |
Collapse
|
41
|
Kuruvila I, Can Demir K, Fischer E, Hoppe U. Inference of the Selective Auditory Attention Using Sequential LMMSE Estimation. IEEE Trans Biomed Eng 2021; 68:3501-3512. [PMID: 33891545 DOI: 10.1109/tbme.2021.3075337] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Attentive listening in a multispeaker environment such as a cocktail party requires suppression of the interfering speakers and the noise around. People with normal hearing perform remarkably well in such situations. Analysis of the cortical signals using electroencephalography (EEG) has revealed that the EEG signals track the envelope of the attended speech more strongly than that of the interfering speech. This has enabled the development of algorithms that can decode the selective attention of a listener in controlled experimental settings. However, often these algorithms require longer trial durations and computationally expensive calibration to obtain a reliable inference of attention. In this paper, we present a novel framework to decode the attention of a listener within trial durations of the order of two seconds. It comprises three modules: 1) dynamic estimation of the temporal response functions (TRFs) in every trial using a sequential linear minimum mean squared error (LMMSE) estimator, 2) extraction of the N1-P2 peak of the estimated TRF that serves as a marker related to the attentional state, and 3) computation of a probabilistic measure of the attentional state using a support vector machine followed by a logistic regression. The efficacy of the proposed decoding framework was evaluated using EEG data collected from 27 subjects. The total number of electrodes required to infer the attention was four: one for the signal estimation, one for the noise estimation, and the other two being the reference and the ground electrodes. Our results make further progress towards the realization of neuro-steered hearing aids.
Collapse
|
42
|
Holtze B, Jaeger M, Debener S, Adiloğlu K, Mirkovic B. Are They Calling My Name? Attention Capture Is Reflected in the Neural Tracking of Attended and Ignored Speech. Front Neurosci 2021; 15:643705. [PMID: 33828451 PMCID: PMC8019946 DOI: 10.3389/fnins.2021.643705] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2020] [Accepted: 02/19/2021] [Indexed: 11/15/2022] Open
Abstract
Difficulties in selectively attending to one among several speakers have mainly been attributed to distraction caused by ignored speech. In the current study, we therefore investigated the neural processing of ignored speech in a two-competing-speaker paradigm. For this, we recorded participants' brain activity using electroencephalography (EEG) to track the neural representation of the attended and ignored speech envelopes. To provoke distraction, we occasionally embedded each participant's first name in the ignored speech stream. Retrospective reports, as well as the presence of a P3 component in response to the name, indicate that participants noticed the occurrence of their name. As predicted, the neural representation of the ignored speech envelope increased after the name was presented therein, suggesting that the name had attracted the participant's attention. Interestingly, and in contrast to our hypothesis, the neural tracking of the attended speech envelope also increased after the name occurrence. We therefore conclude that the name may have distracted participants only briefly, if at all, and instead alerted them to refocus on their actual task. These observations remained robust even when the sound intensity of the ignored speech stream, and thus of the name, was attenuated.
Collapse
Affiliation(s)
- Björn Holtze
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| | - Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
| | - Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
| | - Kamil Adiloğlu
- Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany
| | - Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
43
|
Elmahallawi TH, Gabr TA, Darwish ME, Seleem FM. Children with developmental language disorder: a frequency following response in the noise study. Braz J Otorhinolaryngol 2021; 88:954-961. [PMID: 33766501 PMCID: PMC9615520 DOI: 10.1016/j.bjorl.2021.01.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2020] [Revised: 12/21/2020] [Accepted: 01/31/2021] [Indexed: 11/27/2022] Open
Abstract
Introduction Children with developmental language disorder have been reported to have poor temporal auditory processing. Objective This work aimed to investigate speech processing in quiet and in noise using the frequency following response. Methods Two groups of children were included: a control group (15 children with normal language development) and a study group (25 children diagnosed with developmental language disorder). All children underwent an intelligence scale, language assessment, full audiological evaluation, and frequency following response testing in quiet and in noise (+5 QNR and +10 QNR). Results There was no statistically significant difference between the groups with regard to IQ or PTA. In the study group, advanced analysis of the frequency following response showed reduced F0 and F2 amplitudes. Noise affected both the transient and sustained components of the frequency following response in this group. Conclusion Children with developmental language disorder have difficulty with speech processing, especially in the presence of background noise. The frequency following response is an efficient procedure that can be used to address speech processing problems in children with developmental language disorder.
Collapse
Affiliation(s)
- Trandil H Elmahallawi
- Tanta University Hospitals, Otolaryngology Head and Neck Surgery Department, Audiovestibular Unit, Tanta, Egypt
| | - Takwa A Gabr
- Kafrelsheikh University Hospitals, Otolaryngology Head and Neck Surgery Department, Audiovestibular Unit, Kafrelsheikh, Egypt.
| | - Mohamed E Darwish
- Tanta University Hospitals, Otolaryngology Head and Neck Surgery Department, Phoniatrics Unit, Tanta, Egypt
| | - Fatma M Seleem
- Tanta University Hospitals, Otolaryngology Head and Neck Surgery Department, Audiovestibular Unit, Tanta, Egypt
| |
Collapse
|
44
|
Erkens J, Schulte M, Vormann M, Wilsch A, Herrmann CS. Hearing Impaired Participants Improve More Under Envelope-Transcranial Alternating Current Stimulation When Signal to Noise Ratio Is High. Neurosci Insights 2021; 16:2633105520988854. [PMID: 33709079 PMCID: PMC7907945 DOI: 10.1177/2633105520988854] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Accepted: 12/31/2020] [Indexed: 11/16/2022] Open
Abstract
An issue commonly reported by hearing aid users is difficulty understanding speech in complex hearing scenarios, that is, when speech is presented together with background noise or in situations with multiple speakers. Conventional hearing aids are already designed with these issues in mind, using beamforming to enhance sound from a specific direction only, but they remain limited because they can only modulate incoming sound at the cochlear level. However, there is evidence that age-related hearing loss might be partially caused at later stages of the hearing process, as brain processes slow down and become less efficient. In this study, we tested whether it is possible to improve hearing at the cortical level by improving neural tracking of speech. The speech envelopes of target sentences were transformed into an electrical signal and delivered to elderly participants' cortices using transcranial alternating current stimulation (tACS). We compared two signal-to-noise ratios (SNRs) and five delays between sound presentation and stimulation, ranging from 50 ms to 150 ms, and examined the differences in effects between elderly normal-hearing and elderly hearing-impaired participants. When the task was performed at a high SNR, hearing-impaired participants appeared to gain more from envelope-tACS than when the task was performed at a lower SNR. This was not the case for normal-hearing participants. Furthermore, a post-hoc analysis of the different time-lags suggests that participants performed significantly better at a stimulation time-lag of 150 ms when the task was presented at a high SNR. In this paper, we outline why these effects are worth exploring further and what they tell us about the optimal tACS time-lag.
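A rough sketch of how a speech envelope could be turned into a lagged stimulation waveform of the kind described above is given below; the Hilbert-envelope approach, low-pass cutoff, current scaling, and lag value are illustrative assumptions rather than the study's exact pipeline.

```python
# Hedged sketch of deriving an envelope-tACS waveform from a speech signal.
# Cutoff, scaling, and lag are illustrative choices, not the study's values.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope_tacs(speech, fs, lag_ms=150, cutoff_hz=10, max_current_ma=1.0):
    env = np.abs(hilbert(speech))                     # broadband envelope
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    env = filtfilt(b, a, env)                         # keep slow modulations
    env = env - env.mean()
    env = env / np.max(np.abs(env)) * max_current_ma  # scale to stimulator range
    delay = int(round(lag_ms / 1000 * fs))
    return np.concatenate([np.zeros(delay), env])     # stimulation lags the audio

# Toy "speech": an amplitude-modulated tone standing in for a sentence.
fs = 16000
t = np.arange(0, 2.0, 1 / fs)
speech = np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 4 * t))
stim = envelope_tacs(speech, fs, lag_ms=150)
print(stim.shape)
```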
Collapse
Affiliation(s)
- Jules Erkens
- Department of Psychology, Cluster of Excellence “Hearing4All,” European Medical School, Carl von Ossietzky University, Oldenburg, Germany
| | | | | | - Anna Wilsch
- Department of Psychology, Cluster of Excellence “Hearing4All,” European Medical School, Carl von Ossietzky University, Oldenburg, Germany
| | - Christoph S Herrmann
- Department of Psychology, Cluster of Excellence “Hearing4All,” European Medical School, Carl von Ossietzky University, Oldenburg, Germany
- Research Center Neurosensory Science, Carl von Ossietzky University, Oldenburg, Germany
| |
Collapse
|
45
|
Duprez J, Stokkermans M, Drijvers L, Cohen MX. Synchronization between Keyboard Typing and Neural Oscillations. J Cogn Neurosci 2021; 33:887-901. [PMID: 33571075 DOI: 10.1162/jocn_a_01692] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Rhythmic neural activity synchronizes with certain rhythmic behaviors, such as breathing, sniffing, saccades, and speech. The extent to which neural oscillations synchronize with higher-level, more complex behaviors is largely unknown. Here, we investigated electrophysiological synchronization with keyboard typing, an omnipresent behavior performed daily by a very large number of people. Keyboard typing is rhythmic, with frequency characteristics roughly matching the neural oscillatory dynamics associated with cognitive control, notably midfrontal theta (4-7 Hz) oscillations. We tested the hypothesis that synchronization occurs between typing and midfrontal theta and breaks down when errors are committed. Thirty healthy participants typed words and sentences on a keyboard without visual feedback while EEG was recorded. Typing rhythmicity was investigated by interkeystroke interval analyses and by a kernel density estimation method. We used a multivariate spatial filtering technique to investigate frequency-specific synchronization between typing and neuronal oscillations. Our results demonstrate theta rhythmicity in typing (around 6.5 Hz) in both behavioral analyses. Synchronization between typing and neuronal oscillations occurred at frequencies ranging from 4 to 15 Hz, but to a larger extent at lower frequencies. However, the peak synchronization frequency was idiosyncratic across participants, therefore specific neither to theta nor to midfrontal regions, and correlated somewhat with peak typing frequency. Errors and trials associated with stronger cognitive control were not associated with changes in synchronization at any frequency. As a whole, this study shows that brain-behavior synchronization does occur during keyboard typing but is not specific to midfrontal theta.
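The interkeystroke-interval analysis can be illustrated with a short sketch that estimates the modal typing frequency from keystroke timestamps using a kernel density estimate; the interval bounds, default bandwidth, and simulated timestamps are assumptions for illustration only.

```python
# Sketch of estimating a typing-rate peak from inter-keystroke intervals (IKIs)
# with a kernel density estimate; timestamps and thresholds are illustrative.
import numpy as np
from scipy.stats import gaussian_kde

def typing_peak_frequency(keystroke_times):
    """Return the modal typing frequency (Hz) from keystroke timestamps (s)."""
    ikis = np.diff(np.sort(keystroke_times))
    ikis = ikis[(ikis > 0.05) & (ikis < 1.0)]   # discard implausible bursts/pauses
    kde = gaussian_kde(ikis)
    grid = np.linspace(0.05, 1.0, 500)
    modal_iki = grid[np.argmax(kde(grid))]      # most common interval
    return 1.0 / modal_iki

# Simulated timestamps with a mean interval of ~154 ms (~6.5 Hz typing).
rng = np.random.default_rng(1)
times = np.cumsum(rng.normal(0.154, 0.03, 300).clip(0.06))
print(round(typing_peak_frequency(times), 1))
```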
Collapse
Affiliation(s)
- Joan Duprez
- University Rennes, France; Radboud University Medical Centre, Nijmegen, The Netherlands
| | | | - Linda Drijvers
- Radboud University, Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Michael X Cohen
- Radboud University Medical Centre, Nijmegen, The Netherlands
| |
Collapse
|
46
|
Effect of number and placement of EEG electrodes on measurement of neural tracking of speech. PLoS One 2021; 16:e0246769. [PMID: 33571299 PMCID: PMC7877609 DOI: 10.1371/journal.pone.0246769] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2020] [Accepted: 01/25/2021] [Indexed: 11/19/2022] Open
Abstract
Measurement of neural tracking of natural running speech from the electroencephalogram (EEG) is an increasingly popular method in auditory neuroscience and has applications in audiology. The method involves decoding the envelope of the speech signal from the EEG and calculating its correlation with the envelope of the audio stream presented to the subject. Typically, EEG systems with 64 or more electrodes are used, but practical applications require set-ups with fewer electrodes. Here, we determine the optimal number of electrodes, and the best positions to place a limited number of electrodes on the scalp. We propose a channel selection strategy based on a utility metric, which allows a quick quantitative assessment of the influence of a channel (or a group of channels) on the reconstruction error. We consider two use cases: a subject-specific case, where the optimal number and position of the electrodes are determined for each subject individually, and a subject-independent case, where the electrodes are placed at the same positions (in the 10-20 system) for all subjects. We evaluated our approach using 64-channel EEG data from 90 subjects. In the subject-specific case, the correlation between the actual and reconstructed envelope first increased as the number of electrodes decreased, with an optimum at around 20 electrodes, yielding 29% higher correlations with the optimal number of electrodes than with all electrodes. This means that our strategy of removing electrodes can be used to improve the correlation metric in high-density EEG recordings. In the subject-independent case, decoding performance remained stable when decreasing from 64 to 22 channels; with fewer channels, the correlation decreased. For a maximal decrease in correlation of 10%, 32 well-placed electrodes were sufficient in 91% of the subjects.
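The general idea of ranking channels by their contribution to envelope reconstruction can be sketched as a naive greedy backward elimination, shown below; this is a simplification in the spirit of the approach, not the paper's utility metric, and the ridge parameter and toy data are arbitrary.

```python
# Naive greedy backward elimination for channel selection in envelope
# reconstruction. A simplification, not the paper's utility metric.
import numpy as np

def reconstruction_corr(eeg, env, lam=1e2):
    """Fit a ridge decoder env ~ eeg @ w and return the training correlation."""
    w = np.linalg.solve(eeg.T @ eeg + lam * np.eye(eeg.shape[1]), eeg.T @ env)
    return np.corrcoef(eeg @ w, env)[0, 1]

def greedy_channel_ranking(eeg, env):
    """Return channel indices in the order they are removed (least useful first)."""
    remaining = list(range(eeg.shape[1]))
    removal_order = []
    while len(remaining) > 1:
        scores = [reconstruction_corr(eeg[:, [c for c in remaining if c != ch]], env)
                  for ch in remaining]
        worst = remaining[int(np.argmax(scores))]   # removing it hurts the least
        remaining.remove(worst)
        removal_order.append(worst)
    removal_order.append(remaining[0])
    return removal_order

# Toy data: only the first three channels carry the envelope.
rng = np.random.default_rng(2)
eeg = rng.standard_normal((2000, 8))
env = eeg[:, :3] @ np.array([1.0, 0.5, -0.8]) + rng.standard_normal(2000)
print(greedy_channel_ranking(eeg, env))  # informative channels tend to be removed last
```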
Collapse
|
47
|
Reetzke R, Gnanateja GN, Chandrasekaran B. Neural tracking of the speech envelope is differentially modulated by attention and language experience. BRAIN AND LANGUAGE 2021; 213:104891. [PMID: 33290877 PMCID: PMC7856208 DOI: 10.1016/j.bandl.2020.104891] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Revised: 09/22/2020] [Accepted: 11/18/2020] [Indexed: 05/13/2023]
Abstract
The ability to selectively attend to a speech signal amid competing sounds is a significant challenge, especially for listeners trying to comprehend non-native speech. Attention is critical for directing neural processing resources to the most essential information. Here, neural tracking of the speech envelope of an English story narrative and cortical auditory evoked potentials (CAEPs) to non-speech stimuli were simultaneously assayed in native and non-native listeners of English. Although native listeners exhibited higher narrative comprehension accuracy, non-native listeners exhibited enhanced neural tracking of the speech envelope and heightened CAEP magnitudes. These results support an emerging view that although attention to a target speech signal enhances neural tracking of the speech envelope, this mechanism itself may not confer speech comprehension advantages. Our findings suggest that non-native listeners may engage neural attentional processes that enhance low-level acoustic features, regardless of whether the target signal contains speech or non-speech information.
Collapse
Affiliation(s)
- Rachel Reetzke
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, United States; Center for Autism and Related Disorders, Kennedy Krieger Institute, United States
| | - G Nike Gnanateja
- Department of Communication Science and Disorders, University of Pittsburgh, United States
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, University of Pittsburgh, United States.
| |
Collapse
|
48
|
Abstract
OBJECTIVE Birdsong sounds are often used to inform visually impaired people about the presence of basic infrastructure, and therefore need to be salient in noisy urban environments. How salient sounds are processed in the brain could inform the choice of optimal birdsong for such environments; however, the brain activity related to birdsong salience is not yet known. METHODS Oscillatory magnetoencephalographic (MEG) activity and subjective salience induced by six birdsongs under three background noise conditions were measured. Thirteen participants completed the MEG measurements and 11 participants took part in paired-comparison tests. We estimated the power of induced oscillatory activity and explored its relationship with the subjective salience of the birdsongs using sparse regression analysis. RESULTS According to the sparse regression analysis, subjective salience was explained by the power of induced alpha (8-13 Hz) activity in the frontal region, induced beta (13-30 Hz) activity in the occipital region, and induced gamma (30-50 Hz) activity in the parietal region. The power of the frontal alpha and parietal gamma activity varied significantly across both birds and noise conditions. CONCLUSION These results indicate that frontal alpha activity is related to the salience of birdsong and that parietal gamma activity is related to differences in salience across noisy environments. They suggest that salient birdsong in a noisy environment activates the bottom-up attention network.
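A hedged sketch of the kind of sparse regression described above, relating band-power features to subjective salience with an L1 penalty, is given below; the feature layout, penalty strength, and data are purely illustrative stand-ins.

```python
# Sparse (L1-penalised) regression relating induced band power to subjective
# salience, broadly in the spirit of the analysis above; data are illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
# 18 stimulus/noise combinations (6 birdsongs x 3 noise conditions),
# 3 band-power features (e.g. alpha, beta, gamma) per combination.
band_power = rng.standard_normal((18, 3))
salience = 0.9 * band_power[:, 0] - 0.4 * band_power[:, 2] \
           + 0.1 * rng.standard_normal(18)

model = Lasso(alpha=0.05).fit(band_power, salience)
print(model.coef_)   # near-zero weights mark bands the sparse fit discards
```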
Collapse
|
49
|
Kuruvila I, Fischer E, Hoppe U. An LMMSE-based Estimation of Temporal Response Function in Auditory Attention Decoding. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2020; 2020:2837-2840. [PMID: 33018597 DOI: 10.1109/embc44109.2020.9175866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
One of the remarkable abilities of humans is to focus attention on a particular speaker in a multi-speaker environment, known as the cocktail party effect. How the human brain solves this non-trivial task is a challenge to which the scientific community has not yet found answers. In recent years, progress has been made thanks to the development of system identification methods based on least squares (LS) that map the relationship between a listener's cortical signals and the speech signals present in an auditory scene. Results from numerous two-speaker experiments simulating the cocktail party effect have shown that auditory attention can be inferred from electroencephalography (EEG) using the LS method, and it has been suggested that these methods have the potential to be integrated into hearing aids for algorithmic control. However, a major challenge with LS methods is that a large number of scalp EEG electrodes is required to obtain a reliable estimate of attention. Here we present a new system identification method based on the linear minimum mean squared error (LMMSE) criterion that can estimate attention with only two electrodes: one for signal estimation and the other for noise estimation. The algorithm is tested using EEG signals collected from ten subjects and its performance is compared against the state-of-the-art LS algorithm.
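For context, the baseline least-squares TRF estimation that the abstract refers to can be sketched as a lagged ridge regression from the stimulus envelope to a single EEG channel; the lag range, regularisation, and simulated data are illustrative assumptions, and this is not the proposed LMMSE estimator.

```python
# Baseline least-squares TRF estimation (lagged ridge regression from stimulus
# envelope to one EEG channel); parameters and data are illustrative only.
import numpy as np

def lagged_design(stim, n_lags):
    """Build a (samples x lags) design matrix of delayed copies of the stimulus."""
    X = np.zeros((len(stim), n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[:len(stim) - k]
    return X

def estimate_trf(stim, eeg, n_lags=32, lam=1e1):
    """Ridge-regularised LS estimate of the TRF mapping stimulus to EEG."""
    X = lagged_design(stim, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)

# Simulate an EEG channel as the stimulus convolved with a known kernel plus noise.
rng = np.random.default_rng(4)
stim = rng.standard_normal(4096)
true_trf = np.hanning(32)
eeg = np.convolve(stim, true_trf)[:4096] + rng.standard_normal(4096)
print(np.round(estimate_trf(stim, eeg)[:5], 2))   # approximates the kernel's onset
```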
Collapse
|
50
|
Lacey S, Jamal Y, List SM, McCormick K, Sathian K, Nygaard LC. Stimulus Parameters Underlying Sound-Symbolic Mapping of Auditory Pseudowords to Visual Shapes. Cogn Sci 2020; 44:e12883. [PMID: 32909637 PMCID: PMC7896554 DOI: 10.1111/cogs.12883] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2018] [Revised: 06/06/2020] [Accepted: 07/01/2020] [Indexed: 12/12/2022]
Abstract
Sound symbolism refers to non-arbitrary mappings between the sounds of words and their meanings and is often studied by pairing auditory pseudowords such as "maluma" and "takete" with rounded and pointed visual shapes, respectively. However, it is unclear what auditory properties of pseudowords contribute to their perception as rounded or pointed. Here, we compared perceptual ratings of the roundedness/pointedness of large sets of pseudowords and shapes to their acoustic and visual properties using a novel application of representational similarity analysis (RSA). Representational dissimilarity matrices (RDMs) of the auditory and visual ratings of roundedness/pointedness were significantly correlated crossmodally. The auditory perceptual RDM correlated significantly with RDMs of spectral tilt, the temporal fast Fourier transform (FFT), and the speech envelope. Conventional correlational analyses showed that ratings of pseudowords transitioned from rounded to pointed as vocal roughness (as measured by the harmonics-to-noise ratio, pulse number, fraction of unvoiced frames, mean autocorrelation, shimmer, and jitter) increased. The visual perceptual RDM correlated significantly with RDMs of global indices of visual shape (the simple matching coefficient, image silhouette, image outlines, and Jaccard distance). Crossmodally, the RDMs of the auditory spectral parameters correlated weakly but significantly with those of the global indices of visual shape. Our work establishes the utility of RSA for analysis of large stimulus sets and offers novel insights into the stimulus parameters underlying sound symbolism, showing that sound-to-shape mapping is driven by acoustic properties of pseudowords and suggesting audiovisual cross-modal correspondence as a basis for language users' sensitivity to this type of sound symbolism.
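The core RSA step, building representational dissimilarity matrices and correlating them, can be sketched as follows; the stimulus counts, feature dimensions, and random data are stand-ins for illustration only, so the resulting correlation here is not meaningful in itself.

```python
# Minimal RSA sketch: build representational dissimilarity matrices (RDMs)
# from two feature sets and correlate them (Spearman); data are random stand-ins.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
ratings_auditory = rng.standard_normal((40, 1))   # 40 pseudowords, 1 rating dimension
acoustic_features = rng.standard_normal((40, 6))  # e.g. spectral measures per pseudoword

# pdist returns the condensed form of each RDM (its upper triangle).
rdm_ratings = pdist(ratings_auditory, metric="euclidean")
rdm_acoustic = pdist(acoustic_features, metric="euclidean")

rho, p = spearmanr(rdm_ratings, rdm_acoustic)
print(round(rho, 2), round(p, 3))   # with random data, rho is expected near zero
```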
Collapse
Affiliation(s)
- Simon Lacey
- Department of Neurology, Milton S. Hershey Medical Center, Penn State College of Medicine, Hershey, PA 17033-0859, USA
- Department of Neural & Behavioral Sciences, Milton S. Hershey Medical Center, Penn State College of Medicine, Hershey, PA 17033-0859, USA
- Department of Neurology, Emory University, Atlanta, GA 30322, USA
| | - Yaseen Jamal
- Department of Psychology, Emory University, Atlanta, GA 30322, USA
| | - Sara M. List
- Department of Neurology, Emory University, Atlanta, GA 30322, USA
- Department of Psychology, Emory University, Atlanta, GA 30322, USA
| | - Kelly McCormick
- Department of Neurology, Emory University, Atlanta, GA 30322, USA
- Department of Psychology, Emory University, Atlanta, GA 30322, USA
| | - K. Sathian
- Department of Neurology, Milton S. Hershey Medical Center, Penn State College of Medicine, Hershey, PA 17033-0859, USA
- Department of Neural & Behavioral Sciences, Milton S. Hershey Medical Center, Penn State College of Medicine, Hershey, PA 17033-0859, USA
- Department of Psychology, Milton S. Hershey Medical Center, Penn State College of Medicine, Hershey, PA 17033-0859, USA
- Department of Neurology, Emory University, Atlanta, GA 30322, USA
- Department of Psychology, Emory University, Atlanta, GA 30322, USA
| | - Lynne C. Nygaard
- Department of Psychology, Emory University, Atlanta, GA 30322, USA
| |
Collapse
|