1
Riegel J, Schüller A, Reichenbach T. No Evidence of Musical Training Influencing the Cortical Contribution to the Speech-Frequency-Following Response and Its Modulation through Selective Attention. eNeuro 2024; 11:ENEURO.0127-24.2024. [PMID: 39160069 DOI: 10.1523/eneuro.0127-24.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 08/21/2024] Open
Abstract
Musicians can have better abilities than non-musicians to understand speech in adverse conditions such as background noise. However, the neural mechanisms behind such enhanced behavioral performance remain largely unclear. Studies have found that the subcortical frequency-following response to the fundamental frequency of speech and its higher harmonics (speech-FFR) may be involved, since it is larger in people with musical training than in those without. Recent research has shown that the speech-FFR consists of a cortical contribution in addition to the subcortical sources. Both the subcortical and the cortical contributions are modulated by selective attention to one of two competing speakers. However, it is unknown whether the strength of the cortical contribution to the speech-FFR, or its attentional modulation, is influenced by musical training. Here we investigate these issues through magnetoencephalographic (MEG) recordings of 52 subjects (18 musicians, 25 non-musicians, and 9 neutral participants) listening to two competing male speakers while selectively attending to one of them. The speech-in-noise comprehension abilities of the participants were not assessed. We find that musicians and non-musicians display comparable cortical speech-FFRs and exhibit similar subject-to-subject variability in the response. We also do not observe a difference between musicians and non-musicians in the modulation of the neural response through selective attention. Moreover, when assessing whether the cortical speech-FFRs are influenced by particular aspects of musical training, no significant effects emerged. Taken together, we did not find any effect of musical training on the cortical speech-FFR.
Affiliation(s)
- Jasmin Riegel
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
2
Puffay C, Vanthornhout J, Gillis M, Clercq PD, Accou B, Hamme HV, Francart T. Classifying coherent versus nonsense speech perception from EEG using linguistic speech features. Sci Rep 2024; 14:18922. [PMID: 39143297 PMCID: PMC11324895 DOI: 10.1038/s41598-024-69568-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Accepted: 08/06/2024] [Indexed: 08/16/2024] Open
Abstract
When a person listens to natural speech, the relation between features of the speech signal and the corresponding evoked electroencephalogram (EEG) is indicative of neural processing of the speech signal. Using linguistic representations of speech, we investigate the differences in neural processing between speech in a native language and speech in a foreign language that is not understood. We conducted experiments using three stimuli: a comprehensible language, an incomprehensible language, and randomly shuffled words from a comprehensible language, while recording the EEG signal of native Dutch-speaking participants. We modeled the neural tracking of linguistic features of the speech signals using a deep-learning model in a match-mismatch task that relates EEG signals to speech, while accounting for lexical segmentation features reflecting acoustic processing. The deep-learning model effectively classifies coherent versus nonsense languages. We also observed significant differences in tracking patterns between comprehensible and incomprehensible speech stimuli within the same language. This demonstrates the potential of deep-learning frameworks for measuring speech understanding objectively.
Affiliation(s)
- Corentin Puffay
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
- Department of Electrical Engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Marlies Gillis
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
- Bernd Accou
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
- Department of Electrical Engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Hugo Van Hamme
- Department of Electrical Engineering (ESAT), KU Leuven, PSI, Leuven, Belgium
- Tom Francart
- Department Neurosciences, KU Leuven, ExpORL, Leuven, Belgium
3
Polonenko MJ, Maddox RK. Fundamental frequency predominantly drives talker differences in auditory brainstem responses to continuous speech. bioRxiv 2024:2024.07.12.603125. [PMID: 39026858 PMCID: PMC11257598 DOI: 10.1101/2024.07.12.603125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Deriving human neural responses to natural speech is now possible, but the responses to male- and female-uttered speech have been shown to differ. These talker differences may complicate interpretations or restrict experimental designs geared toward more realistic communication scenarios. This study found that when a male and female talker had the same fundamental frequency, auditory brainstem responses (ABRs) were very similar. Those responses became smaller and later with increasing fundamental frequency, as did click ABRs with increasing stimulus rates. Modeled responses suggested that the speech and click ABR differences were reasonably predicted by peripheral and brainstem processing of stimulus acoustics.
Affiliation(s)
- Melissa J. Polonenko
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, NY, 14642
- Ross K. Maddox
- Kresge Hearing Research Institute, Department of Otolaryngology – Head and Neck Surgery, University of Michigan, Ann Arbor, MI, 48109, USA
- Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, NY, 14642
4
Schüller A, Schilling A, Krauss P, Reichenbach T. The Early Subcortical Response at the Fundamental Frequency of Speech Is Temporally Separated from Later Cortical Contributions. J Cogn Neurosci 2024; 36:475-491. [PMID: 38165737 DOI: 10.1162/jocn_a_02103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2024]
Abstract
Most parts of speech are voiced, exhibiting a degree of periodicity with a fundamental frequency and many higher harmonics. Some neural populations respond to this temporal fine structure, in particular at the fundamental frequency. This frequency-following response to speech consists of both subcortical and cortical contributions and can be measured through EEG as well as through magnetoencephalography (MEG), although the two differ in the aspects of neural activity that they capture: EEG is sensitive to both radial and tangential sources as well as to deep sources, whereas MEG is more restricted to the measurement of tangential and superficial neural activity. EEG responses to continuous speech have shown an early subcortical contribution, at a latency of around 9 msec, in agreement with MEG measurements in response to short speech tokens, whereas MEG responses to continuous speech have not yet revealed such an early component. Here, we analyze MEG responses to long segments of continuous speech. We find an early subcortical response at latencies of 4-11 msec, followed by later right-lateralized cortical activities at delays of 20-58 msec as well as potential subcortical activities. Our results show that the early subcortical component of the FFR to continuous speech can be measured from MEG in populations of participants and that its latency agrees with that measured with EEG. They furthermore show that the early subcortical component is temporally well separated from later cortical contributions, enabling an independent assessment of both components toward further aspects of speech processing.
Affiliation(s)
- Patrick Krauss
- Friedrich-Alexander-Universität Erlangen-Nürnberg
- Universitätsklinikum Erlangen
5
Bachmann FL, Kulasingham JP, Eskelund K, Enqvist M, Alickovic E, Innes-Brown H. Extending Subcortical EEG Responses to Continuous Speech to the Sound-Field. Trends Hear 2024; 28:23312165241246596. [PMID: 38738341 DOI: 10.1177/23312165241246596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2024] Open
Abstract
The auditory brainstem response (ABR) is a valuable clinical tool for objective hearing assessment, which is conventionally detected by averaging neural responses to thousands of short stimuli. Progressing beyond these unnatural stimuli, brainstem responses to continuous speech presented via earphones have been recently detected using linear temporal response functions (TRFs). Here, we extend earlier studies by measuring subcortical responses to continuous speech presented in the sound-field, and assess the amount of data needed to estimate brainstem TRFs. Electroencephalography (EEG) was recorded from 24 normal hearing participants while they listened to clicks and stories presented via earphones and loudspeakers. Subcortical TRFs were computed after accounting for non-linear processing in the auditory periphery by either stimulus rectification or an auditory nerve model. Our results demonstrated that subcortical responses to continuous speech could be reliably measured in the sound-field. TRFs estimated using auditory nerve models outperformed simple rectification, and 16 minutes of data was sufficient for the TRFs of all participants to show clear wave V peaks for both earphones and sound-field stimuli. Subcortical TRFs to continuous speech were highly consistent in both earphone and sound-field conditions, and with click ABRs. However, sound-field TRFs required slightly more data (16 minutes) to achieve clear wave V peaks compared to earphone TRFs (12 minutes), possibly due to effects of room acoustics. By investigating subcortical responses to sound-field speech stimuli, this study lays the groundwork for bringing objective hearing assessment closer to real-life conditions, which may lead to improved hearing evaluations and smart hearing technologies.
Affiliation(s)
- Joshua P Kulasingham
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Martin Enqvist
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Emina Alickovic
- Eriksholm Research Centre, Snekkersten, Denmark
- Automatic Control, Department of Electrical Engineering, Linköping University, Linköping, Sweden
- Hamish Innes-Brown
- Eriksholm Research Centre, Snekkersten, Denmark
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
6
Commuri V, Kulasingham JP, Simon JZ. Cortical responses time-locked to continuous speech in the high-gamma band depend on selective attention. Front Neurosci 2023; 17:1264453. [PMID: 38156264 PMCID: PMC10752935 DOI: 10.3389/fnins.2023.1264453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 11/21/2023] [Indexed: 12/30/2023] Open
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking to the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study, 22 human subjects listened to concurrent, fixed-rate speech from male and female speakers and were asked to selectively attend to one speaker at a time while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male speaker's speech, with a peak latency of ~40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
Affiliation(s)
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, MD, United States
7
Schüller A, Schilling A, Krauss P, Rampp S, Reichenbach T. Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech. J Neurosci 2023; 43:7429-7440. [PMID: 37793908 PMCID: PMC10621774 DOI: 10.1523/jneurosci.1247-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 09/07/2023] [Accepted: 09/21/2023] [Indexed: 10/06/2023] Open
Abstract
Selective attention to one of several competing speakers is required for comprehending a target speaker among other voices and for successful communication with them. It moreover has been found to involve the neural tracking of low-frequency speech rhythms in the auditory cortex. Effects of selective attention have also been found in subcortical neural activities, in particular regarding the frequency-following response related to the fundamental frequency of speech (speech-FFR). Recent investigations have, however, shown that the speech-FFR contains cortical contributions as well. It remains unclear whether these are also modulated by selective attention. Here we used magnetoencephalography to assess the attentional modulation of the cortical contributions to the speech-FFR. We presented both male and female participants with two competing speech signals and analyzed the cortical responses during attentional switching between the two speakers. Our findings revealed robust attentional modulation of the cortical contribution to the speech-FFR: the neural responses were higher when the speaker was attended than when they were ignored. We also found that, regardless of attention, a voice with a lower fundamental frequency elicited a larger cortical contribution to the speech-FFR than a voice with a higher fundamental frequency. Our results show that the attentional modulation of the speech-FFR does not only occur subcortically but extends to the auditory cortex as well.
SIGNIFICANCE STATEMENT: Understanding speech in noise requires attention to a target speaker. One of the speech features that a listener can use to identify a target voice among others and attend it is the fundamental frequency, together with its higher harmonics. The fundamental frequency arises from the opening and closing of the vocal folds and is tracked by high-frequency neural activity in the auditory brainstem and in the cortex. Previous investigations showed that the subcortical neural tracking is modulated by selective attention. Here we show that attention affects the cortical tracking of the fundamental frequency as well: it is stronger when a particular voice is attended than when it is ignored.
Affiliation(s)
- Alina Schüller
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Achim Schilling
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Patrick Krauss
- Neuroscience Laboratory, University Hospital Erlangen, 91058 Erlangen, Germany
- Pattern Recognition Lab, Department Computer Science, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Stefan Rampp
- Department of Neurosurgery, University Hospital Erlangen, 91058 Erlangen, Germany
- Department of Neurosurgery, University Hospital Halle (Saale), 06120 Halle (Saale), Germany
- Department of Neuroradiology, University Hospital Erlangen, 91058 Erlangen, Germany
- Tobias Reichenbach
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
8
Commuri V, Kulasingham JP, Simon JZ. Cortical Responses Time-Locked to Continuous Speech in the High-Gamma Band Depend on Selective Attention. bioRxiv 2023:2023.07.20.549567. [PMID: 37546895 PMCID: PMC10401961 DOI: 10.1101/2023.07.20.549567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking to the speaker's fundamental frequency in the high-gamma band (70-200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study, 22 human subjects listened to concurrent, fixed-rate speech from male and female speakers and were asked to selectively attend to one speaker at a time while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male speaker's speech, with a peak latency of approximately 40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.
Affiliation(s)
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, MD, United States
9
Easwar V, Aiken S, Beh K, McGrath E, Galloy M, Scollie S, Purcell D. Variability in the Estimated Amplitude of Vowel-Evoked Envelope Following Responses Caused by Assumed Neurophysiologic Processing Delays. J Assoc Res Otolaryngol 2022; 23:759-769. [PMID: 36002663 PMCID: PMC9789223 DOI: 10.1007/s10162-022-00855-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/08/2021] [Accepted: 06/16/2022] [Indexed: 01/06/2023] Open
Abstract
Vowel-evoked envelope following responses (EFRs) reflect neural encoding of the fundamental frequency of voice (f0). Accurate analysis of EFRs elicited by natural vowels requires the use of methods like the Fourier analyzer (FA) to consider the production-related f0 changes. The FA's accuracy in estimating EFRs is, however, dependent on the assumed neurophysiological processing delay needed to time-align the f0 time course and the recorded electroencephalogram (EEG). For male-spoken vowels (f0 ~ 100 Hz), a constant 10-ms delay correction is often assumed. Since processing delays vary with stimulus and physiological factors, we quantified (i) the delay-related variability that would occur in EFR estimation, and (ii) the influence of stimulus frequency, non-f0 related neural activity, and the listener's age on such variability. EFRs were elicited by the low-frequency first formant, and mid-frequency second and higher formants of /u/, /a/, and /i/ in young adults and 6- to 17-year-old children. To time-align with the f0 time course, EEG was shifted by delays between 5 and 25 ms to encompass plausible response latencies. The delay-dependent range in EFR amplitude did not vary by stimulus frequency or age and was significantly smaller when interference from low-frequency activity was reduced. On average, the delay-dependent range was < 22% of the maximum variability in EFR amplitude that could be expected by noise. Results suggest that using a constant EEG delay correction in FA analysis does not substantially alter EFR amplitude estimation. In the present study, the lack of substantial variability was likely facilitated by using vowels with small f0 ranges.
Affiliation(s)
- Vijayalakshmi Easwar
- Department of Communication Sciences and Disorders & Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- National Acoustic Laboratories, Sydney, Australia
- Steven Aiken
- School of Communication Sciences and Disorders, Dalhousie University, Nova Scotia, Canada
- Krystal Beh
- Department of Communication Sciences and Disorders & National Centre for Audiology, Western University, London, ON, Canada
- Emma McGrath
- Department of Communication Sciences and Disorders & Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Mary Galloy
- Department of Communication Sciences and Disorders & Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Susan Scollie
- Department of Communication Sciences and Disorders & National Centre for Audiology, Western University, London, ON, Canada
- David Purcell
- Department of Communication Sciences and Disorders & National Centre for Audiology, Western University, London, ON, Canada
10
Boothalingam S, Easwar V, Bross A. External and middle ear influence on envelope following responses. J Acoust Soc Am 2022; 152:2794. [PMID: 36456277 DOI: 10.1121/10.0015004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 10/11/2022] [Indexed: 06/17/2023]
Abstract
Considerable between-subject variability in envelope following response (EFR) amplitude limits its clinical translation. Based on a pattern of lower amplitude and larger variability in the low (<1.2 kHz) and high (>8 kHz), relative to mid (1-3 kHz), frequency carriers, we hypothesized that between-subject variability in the external and middle ear (EM) contributes to between-subject variability in EFR amplitude. We predicted that equalizing the stimulus reaching the cochlea by accounting for EM differences using forward pressure level (FPL) calibration would at least partially improve response amplitude and reduce between-subject variability. In 21 young normal-hearing adults, EFRs at four modulation rates (91, 96, 101, and 106 Hz) were measured concurrently from four frequency bands [low (0.091-1.2 kHz), mid (1-3 kHz), high (4-5.4 kHz), and very high (vHigh; 8-9.4 kHz)], respectively, with 12 harmonics each. The results indicate that FPL calibration in-ear and in a coupler leads to larger EFR amplitudes in the low and vHigh frequency bands relative to conventional coupler root-mean-square calibration. However, improvement in variability was modest with FPL calibration. This lack of a statistically significant improvement in variability suggests that the dominant source of variability in EFR amplitude may arise from cochlear and/or neural processing.
Affiliation(s)
- Sriram Boothalingam
- Department of Communication Sciences and Disorders, Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, USA
- Vijayalakshmi Easwar
- Department of Communication Sciences and Disorders, Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, USA
- Abigail Bross
- Department of Communication Sciences and Disorders, Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin 53705, USA
11
Brodbeck C, Simon JZ. Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Front Neurosci 2022; 16:828546. [PMID: 36003957 PMCID: PMC9393379 DOI: 10.3389/fnins.2022.828546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 07/08/2022] [Indexed: 11/13/2022] Open
Abstract
Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.
Affiliation(s)
- Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, College Park, MD, United States
12
Easwar V, Purcell D, Eeckhoutte MV, Aiken SJ. The Influence of Male- and Female-Spoken Vowel Acoustics on Envelope-Following Responses. Semin Hear 2022; 43:223-239. [PMID: 36313043 PMCID: PMC9605803 DOI: 10.1055/s-0042-1756165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023] Open
Abstract
The influence of male and female vowel characteristics on the envelope-following responses (EFRs) is not well understood. This study explored the role of vowel characteristics on the EFR at the fundamental frequency (f0) in response to the vowel /ε/ (as in "head"). Vowel tokens were spoken by five males and five females and EFRs were measured in 25 young adults (21 females). An auditory model was used to estimate changes in auditory processing that might account for talker effects on EFR amplitude. There were several differences between male and female vowels in relation to the EFR. For male talkers, EFR amplitudes were correlated with the bandwidth and harmonic count of the first formant, and the amplitude of the trough below the second formant. For female talkers, EFR amplitudes were correlated with the range of f0 frequencies and the amplitude of the trough above the second formant. The model suggested that the f0 EFR reflects a wide distribution of energy in speech, with primary contributions from high-frequency harmonics mediated from cochlear regions basal to the peaks of the first and second formants, not from low-frequency harmonics with energy near f0. Vowels produced by female talkers tend to produce lower-amplitude EFR, likely because they depend on higher-frequency harmonics where speech sound levels tend to be lower. This work advances auditory electrophysiology by showing how the EFR evoked by speech relates to the acoustics of speech, for both male and female voices.
Affiliation(s)
- Vijayalakshmi Easwar
- Department of Communication Sciences and Disorders & Waisman Center, University of Wisconsin, Madison
- Department of Communication Sciences, National Acoustic Laboratories, Sydney, Australia
- David Purcell
- National Center for Audiology, School of Communication Sciences and Disorders, Western University, London, Canada
- Maaike Van Eeckhoutte
- Division of Hearing Systems, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Copenhagen Hearing and Balance Centre - Ear, Nose, Throat and Audiology Clinic, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark
- National Center for Audiology, Western University, London, Canada
- Steven J. Aiken
- School of Communication Sciences and Disorders, Departments of Surgery and Psychology and Neuroscience, Dalhousie University, Halifax, Canada
13
Kegler M, Weissbart H, Reichenbach T. The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Front Neurosci 2022; 16:915744. [PMID: 35942153 PMCID: PMC9355803 DOI: 10.3389/fnins.2022.915744] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/08/2022] [Accepted: 07/04/2022] [Indexed: 11/21/2022] Open
Abstract
Spoken language comprehension requires rapid and continuous integration of information, from lower-level acoustic to higher-level linguistic features. Much of this processing occurs in the cerebral cortex. Its neural activity exhibits, for instance, correlates of predictive processing, emerging at delays of a few hundred milliseconds. However, the auditory pathways are also characterized by extensive feedback loops from higher-level cortical areas to lower-level ones as well as to subcortical structures. Early neural activity can therefore be influenced by higher-level cognitive processes, but it remains unclear whether such feedback contributes to linguistic processing. Here, we investigated early speech-evoked neural activity that emerges at the fundamental frequency. We analyzed EEG recordings obtained when subjects listened to a story read by a single speaker. We identified a response tracking the speaker's fundamental frequency that occurred at a delay of 11 ms, while another response elicited by the high-frequency modulation of the envelope of higher harmonics exhibited a larger magnitude and longer latency of about 18 ms with an additional significant component at around 40 ms. Notably, while the earlier components of the response likely originate from the subcortical structures, the latter presumably involves contributions from cortical regions. Subsequently, we determined the magnitude of these early neural responses for each individual word in the story. We then quantified the context-independent frequency of each word and used a language model to compute context-dependent word surprisal and precision. The word surprisal represented how predictable a word is, given the previous context, and the word precision reflected the confidence about predicting the next word from the past context. We found that the word-level neural responses at the fundamental frequency were predominantly influenced by the acoustic features: the average fundamental frequency and its variability. Amongst the linguistic features, only context-independent word frequency showed a weak but significant modulation of the neural response to the high-frequency envelope modulation. Our results show that the early neural response at the fundamental frequency is already influenced by acoustic as well as linguistic information, suggesting top-down modulation of this neural response.
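For readers unfamiliar with the linguistic features named in this abstract, the following is a hedged sketch of how they are commonly written. The surprisal formula is the standard information-theoretic definition. The precision formula is an assumption: the abstract only says that precision reflects confidence in predicting the next word, and inverse entropy of the predictive distribution is one common way to express that. The symbols $w_n$ (the $n$-th word), $V$ (the vocabulary), $S$, $H_n$ and $\pi_n$ are introduced here for illustration and do not appear in the abstract.

```latex
% Surprisal of word w_n given its preceding context
S(w_n) = -\log P(w_n \mid w_1, \dots, w_{n-1})

% Entropy of the language model's prediction for position n
H_n = -\sum_{w \in V} P(w \mid w_1, \dots, w_{n-1}) \, \log P(w \mid w_1, \dots, w_{n-1})

% Precision as inverse entropy (assumed form; high when the prediction is confident)
\pi_n = \frac{1}{H_n}
```

Under these definitions, surprisal depends on the word that actually occurred, whereas precision depends only on the shape of the predicted distribution before the word is heard.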
Affiliation(s)
- Mikolaj Kegler
  - Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
- Hugo Weissbart
  - Donders Centre for Cognitive Neuroimaging, Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Tobias Reichenbach
  - Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
  - Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
  - *Correspondence: Tobias Reichenbach
|
14
|
Wang L, Wang Y, Liu Z, Wu EX, Chen F. A Speech-Level–Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes. Front Neurosci 2022; 15:760611. [PMID: 35221885 PMCID: PMC8866945 DOI: 10.3389/fnins.2021.760611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 12/30/2021] [Indexed: 11/21/2022] Open
Abstract
In competing-speaker environments, human listeners need to focus or switch their auditory attention according to dynamic intentions. Reliable cortical tracking of the speech envelope is an effective feature for decoding the target speech from neural signals. Moreover, previous studies revealed that root mean square (RMS)–level–based speech segmentation made a great contribution to target speech perception under the modulation of sustained auditory attention. This study further investigated the effect of RMS-level–based speech segmentation on auditory attention decoding (AAD) performance with both sustained and switched attention in competing-speaker auditory scenes. Objective biomarkers derived from cortical activity were also developed to index the dynamic auditory attention states. In the current study, subjects were asked to concentrate or switch their attention between two competing speaker streams. The neural responses to the higher- and lower-RMS-level speech segments were analyzed via the linear temporal response function (TRF) before and after attention switched from one speaker stream to the other. Furthermore, the AAD performance of the unified TRF decoding model was compared to that of the speech-RMS-level–based segmented decoding model under dynamic changes of the auditory attention state. The results showed that the weight of the typical TRF component at approximately 100-ms time lag was sensitive to the switching of auditory attention. Compared to the unified AAD model, the segmented AAD model improved attention decoding performance under both sustained and switched auditory attention modulations across a wide range of signal-to-masker ratios (SMRs). In competing-speaker scenes, the TRF weight and AAD accuracy could be used as effective indicators to detect changes of auditory attention. In addition, across a wide range of SMRs (i.e., from 6 to –6 dB in this study), the segmented AAD model showed robust decoding performance even with short decision windows, suggesting that this speech-RMS-level–based model has the potential to decode dynamic attention states in realistic auditory scenarios.
Affiliation(s)
- Lei Wang
  - Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
  - Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Yihan Wang
  - Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Zhixing Liu
  - Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Ed X. Wu
  - Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Fei Chen
  - Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
  - *Correspondence: Fei Chen
|
15
|
Bachmann FL, MacDonald EN, Hjortkjær J. Neural Measures of Pitch Processing in EEG Responses to Running Speech. Front Neurosci 2022; 15:738408. [PMID: 35002597 PMCID: PMC8729880 DOI: 10.3389/fnins.2021.738408] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/08/2021] [Accepted: 11/01/2021] [Indexed: 11/13/2022] Open
Abstract
Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical perspectives, potentially complementing auditory brainstem responses (ABRs) or frequency-following responses (FFRs) that are current clinical standards. However, while it is well-known that the auditory brainstem responds both to transient amplitude variations and the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss challenges in disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked brainstem response were correlated with standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of the speech signal as model predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from broadband amplitude variations is not possible given the high co-variance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In cortex, both simulations and data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow variations in F0 (relative pitch). Yet, no association between subcortical F0-tracking and cortical responses to relative pitch could be detected. These results indicate that while subcortical speech responses are comparable to click-evoked ABRs, dissociating pitch-related processing in the auditory brainstem may be challenging with natural speech stimuli.
Affiliation(s)
- Florine L Bachmann
  - Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Ewen N MacDonald
  - Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Jens Hjortkjær
  - Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
  - Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark
|
16
|
Van Canneyt J, Wouters J, Francart T. Cortical compensation for hearing loss, but not age, in neural tracking of the fundamental frequency of the voice. J Neurophysiol 2021; 126:791-802. [PMID: 34232756 DOI: 10.1152/jn.00156.2021] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/15/2023] Open
Abstract
Auditory processing is affected by advancing age and hearing loss, but the underlying mechanisms are still unclear. We investigated the effects of age and hearing loss on temporal processing of naturalistic stimuli in the auditory system. We used a recently developed objective measure for neural phase-locking to the fundamental frequency of the voice (f0) which uses continuous natural speech as a stimulus, that is, "f0-tracking." The f0-tracking responses from 54 normal-hearing and 14 hearing-impaired adults of varying ages were analyzed. The responses were evoked by a Flemish story with a male talker and contained contributions from both subcortical and cortical sources. Results indicated that advancing age was related to smaller responses with less cortical response contributions. This is consistent with an age-related decrease in neural phase-locking ability at frequencies in the range of the f0, possibly due to decreased inhibition in the auditory system. Conversely, hearing-impaired subjects displayed larger responses compared with age-matched normal-hearing controls. This was due to additional cortical response contributions in the 38- to 50-ms latency range, which were stronger for participants with more severe hearing loss. This is consistent with hearing-loss-induced cortical reorganization and recruitment of additional neural resources to aid in speech perception.
NEW & NOTEWORTHY: Previous studies disagree on the effects of age and hearing loss on the neurophysiological processing of the fundamental frequency of the voice (f0), in part due to confounding effects. Using a novel electrophysiological technique, natural speech stimuli, and controlled study design, we quantified and disentangled the effects of age and hearing loss on neural f0 processing. We uncovered evidence for underlying neurophysiological mechanisms, including a cortical compensation mechanism for hearing loss, but not for age.
Affiliation(s)
- Jan Wouters
  - ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Tom Francart
  - ExpORL, Department of Neurosciences, KU Leuven, Leuven, Belgium
|
17
|
Van Canneyt J, Wouters J, Francart T. Enhanced neural tracking of the fundamental frequency of the voice. IEEE Trans Biomed Eng 2021; 68:3612-3619. [PMID: 33983878 DOI: 10.1109/tbme.2021.3080123] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: 'F0 tracking' is a novel method that investigates neural processing of the fundamental frequency of the voice (f0) in continuous speech. Using linear modelling, a feature that reflects the f0 of a presented speech stimulus is predicted from neural EEG responses. The correlation between the predicted and the 'actual' f0 feature is a measure for neural response strength. In this study, we aimed to design a new f0 feature that approximates the expected human EEG response to the f0 in order to improve neural tracking results.
METHODS: Two techniques were explored: constructing the feature with a phenomenological model to simulate neural processing in the auditory periphery and low-pass filtering the feature to approximate the effect of more central processing.
RESULTS: Analysis of EEG data evoked by a Flemish story in 34 subjects indicated that both the auditory model and the low-pass filter significantly improved the correlations between the actual and reconstructed feature. The combination of both strategies almost doubled the mean correlation across subjects, from 0.078 to 0.13. Moreover, canonical correlation analysis revealed two distinct processes contributing to the f0 response: one driven by a broad range of auditory nerve fibers with center frequency up to 8 kHz and one driven by a narrower selection of auditory nerve fibers, possibly responding to unresolved harmonics.
CONCLUSION: Optimizing the f0 feature towards the expected neural response significantly improves f0-tracking correlations.
SIGNIFICANCE: The optimized f0 feature enhances the f0-tracking method, facilitating future research on temporal auditory processing in the human brain.
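The general idea behind the f0-tracking measure described in this abstract (predict a stimulus feature from the EEG with a linear model, then correlate the prediction with the actual feature) can be illustrated with a minimal sketch. This is not the authors' pipeline: the ridge-regularised decoder, the lag range, the synthetic data and every name below (`lagged_design`, `fit_decoder`, `pearson`, `ridge`, `delay`) are assumptions chosen for illustration, and a white-noise signal stands in for the real f0 feature.

```python
import numpy as np

def lagged_design(eeg, lags):
    """Stack time-shifted copies of every EEG channel (samples x channels*lags)."""
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for i, lag in enumerate(lags):
        # Row t of `shifted` holds the EEG at time t + lag, so the decoder
        # can use neural activity that follows the stimulus feature at time t.
        shifted = np.roll(eeg, -lag, axis=0)
        if lag > 0:
            shifted[-lag:] = 0.0  # zero the samples that np.roll wrapped around
        X[:, i * c:(i + 1) * c] = shifted
    return X

def fit_decoder(X, y, ridge=1.0):
    """Ridge-regularised least squares: w = (X'X + ridge*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

def pearson(a, b):
    """Pearson correlation between two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic, purely illustrative data (assumed sizes and noise level).
rng = np.random.default_rng(0)
n_samples, n_channels, delay = 4000, 8, 3
feature = rng.standard_normal(n_samples)          # stand-in for the f0 feature
gains = rng.standard_normal(n_channels)           # per-channel response strength
eeg = np.zeros((n_samples, n_channels))
eeg[delay:] = np.outer(feature[:-delay], gains)   # response lags stimulus by `delay` samples
eeg += 3.0 * rng.standard_normal(eeg.shape)       # additive background noise

lags = list(range(0, 6))
X = lagged_design(eeg, lags)

# Train the decoder on the first half, evaluate on the second half.
half = n_samples // 2
w = fit_decoder(X[:half], feature[:half])
r = pearson(X[half:] @ w, feature[half:])
print(f"reconstruction correlation on held-out data: {r:.3f}")
```

The held-out correlation `r` plays the role of the response-strength measure in the abstract. With real recordings the value would be far smaller than in this synthetic case, as the reported means of 0.078 and 0.13 suggest.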
|