1. Kovács P, Tóth B, Honbolygó F, Szalárdy O, Kohári A, Mády K, Magyari L, Winkler I. Speech prosody supports speaker selection and auditory stream segregation in a multi-talker situation. Brain Res 2023; 1805:148246. PMID: 36657631. DOI: 10.1016/j.brainres.2023.148246. Received 09/06/2022; revised 01/06/2023; accepted 01/12/2023.
Abstract
To process speech in a multi-talker environment, listeners need to segregate the mixture of incoming speech streams and focus their attention on one of them. Potentially, speech prosody could aid the segregation of different speakers, the selection of the desired speech stream, and the detection of targets within the attended stream. To test these possibilities, we recorded behavioral responses and extracted event-related potentials and functional brain networks from electroencephalographic signals recorded while participants listened to two concurrent speech streams, performing a lexical detection and a recognition memory task in parallel. Prosody manipulation was applied to the attended speech stream in one group of participants and to the ignored speech stream in another group. Naturally recorded speech stimuli were either intact, synthetically F0-flattened, or prosodically suppressed by the speaker. The results show that prosody, especially the parsing cues mediated by speech rate, facilitates stream selection, while playing a smaller role in auditory stream segregation and target detection.
Affiliation(s)
- Petra Kovács: Department of Cognitive Science, Budapest University of Technology and Economics, Hungary
- Brigitta Tóth: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungary
- Ferenc Honbolygó: Brain Imaging Centre, Research Centre for Natural Sciences, Hungary
- Orsolya Szalárdy: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungary; Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Anna Kohári: Research Group of Phonetics, Institute for General and Hungarian Linguistics, Hungarian Research Centre for Linguistics, Hungary
- Katalin Mády: Research Group of Phonetics, Institute for General and Hungarian Linguistics, Hungarian Research Centre for Linguistics, Hungary
- Lilla Magyari: Department of Social Studies, Faculty of Social Sciences, University of Stavanger, Stavanger, Norway; Norwegian Centre for Reading Education and Research, Faculty of Arts and Education, University of Stavanger, Stavanger, Norway
- István Winkler: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungary
2. Szalárdy O, Tóth B, Farkas D, Orosz G, Winkler I. Do we parse the background into separate streams in the cocktail party? Front Hum Neurosci 2022; 16:952557. DOI: 10.3389/fnhum.2022.952557. Received 05/25/2022; accepted 10/06/2022. Open access.
Abstract
In the cocktail party situation, people with normal hearing usually follow a single speaker among multiple concurrent ones. However, there is no agreement in the literature as to whether the background is segregated into multiple streams/speakers. The current study varied the number of concurrent speech streams and investigated target detection and memory for the contents of a target stream, as well as the processing of distractors. A male-voiced target stream was presented either alone (single-speech), together with one male-voiced distractor (one-distractor), or together with a male- and a female-voiced distractor (two-distractor). Behavioral measures of target detection and content tracking performance, as well as event-related brain potentials (ERPs) related to target and distractor detection, were assessed. We found that the N2 amplitude decreased whereas the P3 amplitude increased from the single-speech to the concurrent speech streams conditions. Importantly, the behavioral effect of distractors differed between the conditions with one vs. two distractor speech streams. Moreover, the non-zero voltages elicited in the N2 time window by distractor numerals and in the P3 time window by syntactic violations appearing in the non-target speech stream differed significantly between the one- and two-distractor conditions for the same (male) speaker. These results support the notion that the two background speech streams are segregated, as they show that distractors and syntactic violations appearing in the non-target streams are processed even when two non-target speech streams are delivered together with the target stream.
3. Szalárdy O, Tóth B, Farkas D, Hajdu B, Orosz G, Winkler I. Who said what? The effects of speech tempo on target detection and information extraction in a multi-talker situation: An ERP and functional connectivity study. Psychophysiology 2020; 58:e13747. PMID: 33314262. DOI: 10.1111/psyp.13747. Received 08/05/2020; revised 10/24/2020; accepted 11/18/2020.
Abstract
People with normal hearing can usually follow one of several concurrent speakers. Speech tempo affects both the separation of concurrent speech streams and information extraction from them. The current study varied the tempo of two concurrent speech streams to investigate these processes in a multi-talker situation. Listeners performed a target-detection and a content-tracking task, while target-related ERPs and functional brain networks sensitive to speech tempo were extracted from the EEG signal. At slower than normal speech tempo, building the two streams required longer processing times and, possibly, the utilization of higher-order (e.g., syntactic and semantic) cues. The observed longer reaction times and higher connectivity strength in a theta-band network associated with frontal control over auditory/speech processing are compatible with this notion. With increasing tempo, target detection performance decreased and the N2b and P3b amplitudes increased. These data suggest an increased need for strictly allocating target-detection-related resources at higher tempo. This was also reflected by the observed increase in the strength of gamma-band networks within and between frontal, temporal, and cingulate areas. At the fastest tested speech tempo, there was a sharp drop in recognition memory performance, while target detection performance increased compared to the normal speech tempo. This was accompanied by a significant increase in the strength of a low-alpha network associated with the suppression of task-irrelevant speech. These results suggest that participants prioritized the immediate target detection task over continuous content tracking, likely due to some capacity limit reached at the fastest speech tempo.
Affiliation(s)
- Orsolya Szalárdy: Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary; Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- Brigitta Tóth: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- Dávid Farkas: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- Botond Hajdu: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
- Gábor Orosz: Unité de Recherche Pluridisciplinaire Sport Santé Société, Université d'Artois, Université de Lille, Université du Littoral Côte d'Opale, Liévin, France
- István Winkler: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary
4. Yuriko Santos Kawata N, Hashimoto T, Kawashima R. Neural mechanisms underlying concurrent listening of simultaneous speech. Brain Res 2020; 1738:146821. PMID: 32259518. DOI: 10.1016/j.brainres.2020.146821. Received 12/21/2019; revised 03/31/2020; accepted 04/03/2020.
Abstract
Can we identify what two people are saying at the same time? Although it is difficult to repeat two or more simultaneous messages perfectly, listeners can report information from both speakers. A concurrent/divided listening task may require enhanced attention and segregation of speech rather than selection and suppression. However, the neural mechanisms of concurrent listening to multi-speaker speech have yet to be clarified. The present study used functional magnetic resonance imaging to examine the neural responses of healthy young adults listening to concurrent male and female speakers, in an attempt to reveal the mechanism of concurrent listening. After practice and multiple trials testing concurrent listening, 31 participants achieved performance comparable with that of selective listening. Furthermore, compared to selective listening, concurrent listening induced greater activation in the anterior cingulate cortex, bilateral anterior insula, frontoparietal regions, and the periaqueductal gray region. In addition to the salience network for multi-speaker listening, attentional modulation and enhanced segregation of these signals could be used to achieve successful concurrent listening. These results indicate a potential mechanism by which one can listen to two voices with enhanced attention to saliency signals.
Affiliation(s)
- Natasha Yuriko Santos Kawata: Department of Functional Brain Imaging, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Japan
- Teruo Hashimoto: Division of Developmental Cognitive Neuroscience, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Japan
- Ryuta Kawashima: Department of Functional Brain Imaging and Division of Developmental Cognitive Neuroscience, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Japan
5. Szalárdy O, Tóth B, Farkas D, Orosz G, Honbolygó F, Winkler I. Linguistic predictability influences auditory stimulus classification within two concurrent speech streams. Psychophysiology 2020; 57:e13547. DOI: 10.1111/psyp.13547. Received 08/01/2019; revised 01/20/2020; accepted 01/22/2020.
Affiliation(s)
- Orsolya Szalárdy: Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary; Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Brigitta Tóth: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Dávid Farkas: Analytics Development, Performance Management and Analytics, Business Development, Integrated Supply Chain Management, Nokia Business Services, Nokia Operations, Nokia, Budapest, Hungary
- Gábor Orosz: Department of Psychology, Stanford University, Stanford, CA, USA
- Ferenc Honbolygó: Brain Imaging Centre, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- István Winkler: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
6. Szalárdy O, Tóth B, Farkas D, György E, Winkler I. Neuronal Correlates of Informational and Energetic Masking in the Human Brain in a Multi-Talker Situation. Front Psychol 2019; 10:786. PMID: 31024409. PMCID: PMC6465330. DOI: 10.3389/fpsyg.2019.00786. Received 11/20/2018; accepted 03/21/2019. Open access.
Abstract
Human listeners can follow the voice of one speaker while several others are talking at the same time. This process requires segregating the speech streams from each other and continuously directing attention to the target stream. We investigated the functional brain networks underlying this ability. Two speech streams were presented simultaneously to participants, who followed one of them and detected targets within it (target stream). The loudness of the distractor speech stream varied on five levels: moderately softer, slightly softer, equal, slightly louder, or moderately louder than the attended stream. Performance measures showed that the most demanding condition was the one with moderately softer distractors, which indicates that softer distractor speech may receive more covert attention than louder distractors and, therefore, requires more cognitive resources. EEG-based measurement of functional connectivity between various brain regions revealed frequency-band-specific networks: (1) energetic masking (comparing the louder distractor conditions with the equal-loudness condition) was predominantly associated with stronger connectivity between the frontal and temporal regions in the lower alpha (8–10 Hz) and gamma (30–70 Hz) bands; (2) informational masking (comparing the softer distractor conditions with the equal-loudness condition) was associated with a distributed network between parietal, frontal, and temporal regions in the theta (4–8 Hz) and beta (13–30 Hz) bands. These results suggest the presence of distinct cognitive and neural processes for resolving the interference from energetic vs. informational masking.
Affiliation(s)
- Orsolya Szalárdy: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Brigitta Tóth: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Dávid Farkas: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Erika György: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- István Winkler: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
7. Tóth B, Farkas D, Urbán G, Szalárdy O, Orosz G, Hunyadi L, Hajdu B, Kovács A, Szabó BT, Shestopalova LB, Winkler I. Attention and speech-processing related functional brain networks activated in a multi-speaker environment. PLoS One 2019; 14:e0212754. PMID: 30818389. PMCID: PMC6394951. DOI: 10.1371/journal.pone.0212754. Received 07/13/2018; accepted 02/10/2019. Open access.
Abstract
Human listeners can focus on one speech stream out of several concurrent ones. The present study aimed to assess the whole-brain functional networks underlying (a) the process of focusing attention on a single speech stream vs. dividing attention between two streams and (b) speech processing on different time scales and at different depths. Two spoken narratives were presented simultaneously while listeners were instructed to (a) track and memorize the contents of a speech stream and (b) detect the presence of numerals or syntactic violations in the same ("focused attended condition") or in the parallel stream ("divided attended condition"). Speech content tracking was found to be associated with stronger connectivity in lower frequency bands (delta band, 0.5–4 Hz), whereas the detection tasks were linked with networks operating in the faster alpha (8–10 Hz) and beta (13–30 Hz) bands. These results suggest that the oscillation frequencies of the dominant brain networks during speech processing may be related to the duration of the time window within which information is integrated. We also found that focusing attention on a single speaker, compared to dividing attention between two concurrent speakers, was predominantly associated with connections involving the frontal cortices in the delta (0.5–4 Hz), alpha (8–10 Hz), and beta (13–30 Hz) bands, whereas dividing attention between two parallel speech streams was linked with stronger connectivity involving the parietal cortices in the delta and beta frequency bands. Overall, connections strengthened by focused attention may reflect control over information selection, whereas connections strengthened by divided attention may reflect the need for maintaining two streams in parallel and the related control processes necessary for performing the tasks.
Affiliation(s)
- Brigitta Tóth: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Dávid Farkas: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Gábor Urbán: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- Orsolya Szalárdy: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Gábor Orosz: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Social and Educational Psychology, Eötvös Loránd University, Budapest, Hungary
- László Hunyadi: Department of General and Applied Linguistics, University of Debrecen, Debrecen, Hungary
- Botond Hajdu: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Annamária Kovács: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
- Beáta Tünde Szabó: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Piliscsaba, Hungary
- István Winkler: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary