1
Crespo K, Vlach H, Kaushanskaya M. The effects of speaker and exemplar variability in children's cross-situational word learning. Psychon Bull Rev 2024;31:1650-1660. PMID: 38228967. DOI: 10.3758/s13423-023-02444-6.
Abstract
Cross-situational word learning (XSWL) - children's ability to learn words by tracking co-occurrence statistics of words and their referents over time - has been identified as a fundamental mechanism underlying lexical learning. However, it is unknown whether children can acquire new words when faced with variable input in XSWL paradigms, such as varying object exemplars and variable speakers. In the present study, we examine the separate and combined effects of exemplar and speaker variability on XSWL in typically developing English-speaking monolingual children. Results revealed that variability in speakers and exemplars did not facilitate or hinder XSWL performance. However, input that varied in both speakers and exemplars simultaneously did hinder children's word learning. Results from this work suggest that XSWL mechanisms may support categorization and generalization beyond word-object associations, but that accommodating multiple forms of variable input may incur costs. Overall, this research provides new theoretical insights into how fundamental mechanisms of word learning scale to more complex and naturalistic forms of input.
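The co-occurrence-tracking mechanism at the heart of XSWL can be illustrated with a minimal associative sketch. This is a generic illustration of the statistical-learning idea, not the authors' paradigm or model; the trials and word-referent pairs below are invented. The learner accumulates word-referent co-occurrence counts across individually ambiguous trials and guesses, for each word, the referent it co-occurred with most often.

```python
from collections import defaultdict

def cross_situational_learner(trials):
    """Accumulate word-referent co-occurrence counts across ambiguous trials.

    Each trial is a (words, referents) pair; within a trial the learner never
    sees which word maps to which referent.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in trials:
        for w in words:
            for r in referents:
                counts[w][r] += 1
    # Guess: for each word, the referent it co-occurred with most often.
    return {w: max(refs, key=refs.get) for w, refs in counts.items()}

# Three individually ambiguous trials; only aggregation across trials
# disambiguates the three (invented) word-referent mappings.
trials = [
    (["dax", "blick"], ["ball", "cup"]),
    (["dax", "wug"], ["ball", "shoe"]),
    (["blick", "wug"], ["cup", "shoe"]),
]
lexicon = cross_situational_learner(trials)
```

No single trial identifies any mapping, yet the aggregated counts recover "dax" = ball, "blick" = cup, "wug" = shoe.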
Affiliations
- Kimberly Crespo, Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- Haley Vlach, Department of Educational Psychology, University of Wisconsin-Madison, Madison, WI, USA
- Margarita Kaushanskaya, Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, USA
2
Luthra S. Why are listeners hindered by talker variability? Psychon Bull Rev 2024;31:104-121. PMID: 37580454. PMCID: PMC10864679. DOI: 10.3758/s13423-023-02355-6.
Abstract
Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
Affiliations
- Sahil Luthra, Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
3
Har-Shai Yahav P, Sharaabi A, Zion Golumbic E. The effect of voice familiarity on attention to speech in a cocktail party scenario. Cereb Cortex 2024;34:bhad475. PMID: 38142293. DOI: 10.1093/cercor/bhad475.
Abstract
Selective attention to one speaker in multi-talker environments can be affected by the acoustic and semantic properties of speech. One highly ecological feature of speech that has the potential to assist selective attention is voice familiarity. Here, we tested how voice familiarity interacts with selective attention by measuring the neural speech-tracking response to both target and non-target speech in a dichotic listening "cocktail party" paradigm. We recorded magnetoencephalography (MEG) from n = 33 participants, who were presented with concurrent narratives in two different voices and instructed to pay attention to one ear ("target") and ignore the other ("non-target"). Participants were familiarized with one of the voices during the week prior to the experiment, rendering this voice familiar to them. Using multivariate speech-tracking analysis, we estimated the neural responses to both stimuli and replicated their well-established modulation by selective attention. Importantly, speech tracking was also affected by voice familiarity: responses were enhanced for target speech and reduced for non-target speech in the contralateral hemisphere when these were in a familiar versus an unfamiliar voice. These findings offer valuable insight into how voice familiarity, and by extension auditory semantics, interacts with goal-driven attention and facilitates perceptual organization and speech processing in noisy environments.
Affiliations
- Paz Har-Shai Yahav, The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Aviya Sharaabi, The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
- Elana Zion Golumbic, The Gonda Center for Multidisciplinary Brain Research, Bar Ilan University, Ramat Gan 5290002, Israel
4
Shorey AE, King CJ, Theodore RM, Stilp CE. Talker adaptation or "talker" adaptation? Musical instrument variability impedes pitch perception. Atten Percept Psychophys 2023;85:2488-2501. PMID: 37258892. DOI: 10.3758/s13414-023-02722-4.
Abstract
Listeners show perceptual benefits (faster and/or more accurate responses) when perceiving speech spoken by a single talker versus multiple talkers, known as talker adaptation. While near-exclusively studied in speech and with talkers, some aspects of talker adaptation might reflect domain-general processes. Music, like speech, is a sound class replete with acoustic variation, such as a multitude of pitch and instrument possibilities. Thus, it was hypothesized that perceptual benefits from structure in the acoustic signal (i.e., hearing the same sound source on every trial) are not specific to speech but rather a general auditory response. Forty nonmusician participants completed a simple musical task that mirrored talker adaptation paradigms. Low- or high-pitched notes were presented in single- and mixed-instrument blocks. Reflecting both music research on pitch and timbre interdependence and mirroring traditional "talker" adaptation paradigms, listeners were faster to make their pitch judgments when presented with a single instrument timbre relative to when the timbre was selected from one of four instruments from trial to trial. A second experiment ruled out the possibility that participants were responding faster to the specific instrument chosen as the single-instrument timbre. Consistent with general theoretical approaches to perception, perceptual benefits from signal structure are not limited to speech.
Affiliations
- Anya E Shorey, Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, KY 40272, USA
- Caleb J King, Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, KY 40272, USA
- Rachel M Theodore, Department of Speech, Language, and Hearing Sciences, University of Connecticut, 2 Alethia Drive, Unit 1085, Storrs, CT 06269-1085, USA; Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, 337 Mansfield Road, Unit 1272, Storrs, CT 06269-1272, USA
- Christian E Stilp, Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, KY 40272, USA
5
Crespo K, Vlach H, Kaushanskaya M. The effects of bilingualism on children's cross-situational word learning under different variability conditions. J Exp Child Psychol 2023;229:105621. PMID: 36689904. PMCID: PMC10088528. DOI: 10.1016/j.jecp.2022.105621.
Abstract
In the current study, we examined the separate and combined effects of exemplar and speaker variability on monolingual and bilingual children's cross-situational word learning performance. Results revealed that children's word learning performance did not differ when the input varied in a single dimension (i.e., exemplars or speakers) compared with a condition with no variability independent of their linguistic background. However, when performance in conditions that varied in a single dimension (i.e., exemplars or speakers) was compared with a condition that varied in multiple dimensions (i.e., exemplars and speakers), bilingual word learning advantages were observed; bilinguals were more likely to learn word-referent associations than monolinguals. Together, results suggest that children can learn and generalize word-referent associations from input that varies in exemplars and speakers and that bilingualism may bolster learning under conditions of increased input variability.
Affiliations
- Kimberly Crespo, Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215, USA
- Haley Vlach, Department of Educational Psychology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Margarita Kaushanskaya, Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI 53706, USA
6
Target voice probability influences enhancement in auditory selective attention. Atten Percept Psychophys 2023;85:879-888. PMID: 36918507. DOI: 10.3758/s13414-023-02683-8.
Abstract
Auditory selective attention is thought to consist of two mechanisms: an enhancement mechanism that boosts the target signal, and a suppression mechanism that attenuates concurrent distracting signals. The current study explored the conditions necessary to observe enhancement of predictable auditory objects. Participants heard scenes consisting of three voices and a distracting noise. They were asked to find the gender singleton (target) and report whether it was saying even or odd numbers. One of the voices appeared as the high-probability target (70%) across trials. We expected responses to be faster when the high-probability target was in the scene, and results from Experiment 1 supported that prediction. However, this target enhancement effect was substantially weakened when a distracting noise was also in the scene, suggesting that the distractor captured attention and interfered with enhancement. Experiment 2 tested the hypothesis that distractor predictability modulates target enhancement by varying the probability of the distractor. Although this hypothesis was not supported, the results of Experiment 1 were replicated. Findings support the existence of an easily disrupted enhancement mechanism that boosts the representation of highly probable target objects.
7
Noyce AL, Kwasa JAC, Shinn-Cunningham BG. Defining attention from an auditory perspective. Wiley Interdiscip Rev Cogn Sci 2023;14:e1610. PMID: 35642475. PMCID: PMC9712589. DOI: 10.1002/wcs.1610.
Abstract
Attention prioritizes certain information at the expense of other information in ways that are similar across vision, audition, and other sensory modalities. It influences how (and even what) information is represented and processed, affecting brain activity at every level. Much of the core research into cognitive and neural mechanisms of attention has used visual tasks. However, the same top-down, object-based, and bottom-up attentional processes shape auditory perception, largely through the same underlying cognitive networks. This article is categorized under: Psychology > Attention.
8
Mathias SR, Knowles EE, Mollon J, Rodrigue AL, Woolsey MK, Hernandez AM, Garrett AS, Fox PT, Olvera RL, Peralta JM, Kumar S, Göring HH, Duggirala R, Curran JE, Blangero J, Glahn DC. The genetic contribution to solving the cocktail-party problem. iScience 2022;25:104997. PMID: 36111257. PMCID: PMC9468408. DOI: 10.1016/j.isci.2022.104997.
Abstract
Communicating in everyday situations requires solving the cocktail-party problem, or segregating the acoustic mixture into its constituent sounds and attending to those of most interest. Humans show dramatic variation in this ability, leading some to experience real-world problems irrespective of whether they meet criteria for clinical hearing loss. Here, we estimated the genetic contribution to cocktail-party listening by measuring speech-reception thresholds (SRTs) in 425 people from large families and ranging in age from 18 to 91 years. Roughly half the variance of SRTs was explained by genes (h² = 0.567). The genetic correlation between SRTs and hearing thresholds (HTs) was medium (ρG = 0.392), suggesting that the genetic factors influencing cocktail-party listening were partially distinct from those influencing sound sensitivity. Aging and socioeconomic status also strongly influenced SRTs. These findings may represent a first step toward identifying genes for "hidden hearing loss," or hearing problems in people with normal HTs.
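As a toy illustration of partitioning trait variance into genetic and environmental components: the study itself used family-based variance-components methods, but a much simpler classical approach, Falconer's formula, estimates narrow-sense heritability from twin correlations as h2 = 2 * (rMZ - rDZ). The sketch and the numbers below are invented for illustration only and are unrelated to the study's data.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def falconer_h2(r_mz, r_dz):
    """Falconer's estimate: h2 = 2 * (r_MZ - r_DZ).

    MZ twins share ~100% of segregating genes, DZ twins ~50%, so doubling
    the difference in twin-pair correlations isolates the additive genetic
    contribution (under strong classical assumptions).
    """
    return 2.0 * (r_mz - r_dz)

# Invented twin-pair correlations for a hypothetical SRT-like trait:
h2 = falconer_h2(0.60, 0.32)  # 2 * (0.60 - 0.32) = 0.56
```

With these made-up correlations the estimate (0.56) happens to land near the study's reported h² = 0.567, which shows the scale of the quantity, not a reanalysis.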
Affiliations
- Samuel R. Mathias, Department of Psychiatry, Boston Children's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA
- Emma E.M. Knowles, Department of Psychiatry, Boston Children's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA
- Josephine Mollon, Department of Psychiatry, Boston Children's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA
- Amanda L. Rodrigue, Department of Psychiatry, Boston Children's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA
- Mary K. Woolsey, Research Imaging Institute, University of Texas Health Science Center, San Antonio, TX 78229, USA
- Alyssa M. Hernandez, Research Imaging Institute, University of Texas Health Science Center, San Antonio, TX 78229, USA
- Amy S. Garrett, Research Imaging Institute, University of Texas Health Science Center, San Antonio, TX 78229, USA
- Peter T. Fox, Research Imaging Institute, University of Texas Health Science Center, San Antonio, TX 78229, USA; South Texas Veterans Health Care System, San Antonio, TX 78229, USA
- Rene L. Olvera, South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Juan M. Peralta, South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Satish Kumar, South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Harald H.H. Göring, South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Ravi Duggirala, South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Joanne E. Curran, South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- John Blangero, South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- David C. Glahn, Department of Psychiatry, Boston Children's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA
9
Lee JJ, Perrachione TK. Implicit and explicit learning in talker identification. Atten Percept Psychophys 2022;84:2002-2015. PMID: 35534783. PMCID: PMC10081569. DOI: 10.3758/s13414-022-02500-8.
Abstract
In the real world, listeners seem to implicitly learn talkers' vocal identities during interactions that prioritize attending to the content of talkers' speech. In contrast, most laboratory experiments of talker identification employ training paradigms that require listeners to explicitly practice identifying voices. Here, we investigated whether listeners become familiar with talkers' vocal identities during initial exposures that do not involve explicit talker identification. Participants were assigned to one of three exposure tasks, in which they heard identical stimuli but were differentially required to attend to the talkers' vocal identity or to the verbal content of their speech: (1) matching the talker to a concurrent visual cue (talker-matching); (2) discriminating whether the talker was the same as the prior trial (talker 1-back); or (3) discriminating whether speech content matched the previous trial (verbal 1-back). All participants were then tested on their ability to learn to identify talkers from novel speech content. Critically, we manipulated whether the talkers during this post-test differed from those heard during training. Compared to learning to identify novel talkers, listeners were significantly more accurate learning to identify the talkers they had previously been exposed to in the talker-matching and verbal 1-back tasks, but not the talker 1-back task. The correlation between talker identification test performance and exposure task performance was also greater when the talkers were the same in both tasks. These results suggest that listeners learn talkers' vocal identity implicitly during speech perception, even if they are not explicitly attending to the talkers' identity.
Affiliations
- Jayden J Lee, Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
- Tyler K Perrachione, Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
10
Huet MP, Micheyl C, Gaudrain E, Parizet E. Vocal and semantic cues for the segregation of long concurrent speech stimuli in diotic and dichotic listening: the Long-SWoRD test. J Acoust Soc Am 2022;151:1557. PMID: 35364949. DOI: 10.1121/10.0007225.
Abstract
It is not always easy to follow a conversation in a noisy environment. To distinguish between two speakers, a listener must mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background noise. The development of an intelligibility task with long stimuli, the Long-SWoRD test, is introduced. This protocol allows participants to fully benefit from cognitive resources, such as semantic knowledge, to separate two talkers in a realistic listening environment. Moreover, this task also provides the experimenters with a means to infer fluctuations in auditory selective attention. Two experiments document the performance of normal-hearing listeners in situations where the perceptual separability of the competing voices ranges from easy to hard, using a combination of voice and binaural cues. The results show a strong effect of voice differences when the voices are presented diotically. In addition, analyzing the influence of the semantic context on the pattern of responses indicates that semantic information induces a response bias both when the competing voices are distinguishable and when they are indistinguishable from one another.
Affiliations
- Moïra-Phoebé Huet, Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
- Etienne Gaudrain, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Centre National de la Recherche Scientifique UMR5292, Institut National de la Santé et de la Recherche Médicale U1028, Université Claude Bernard Lyon 1, Université de Lyon, Centre Hospitalier Le Vinatier, Neurocampus, 95 boulevard Pinel, Bron Cedex, 69675, France
- Etienne Parizet, Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
11
Luberadzka J, Kayser H, Hohmann V. Making sense of periodicity glimpses in a prediction-update loop: a computational model of attentive voice tracking. J Acoust Soc Am 2022;151:712. PMID: 35232067. PMCID: PMC9088677. DOI: 10.1121/10.0009337.
Abstract
Humans are able to follow a speaker even in challenging acoustic conditions. The perceptual mechanisms underlying this ability remain unclear. A computational model of attentive voice tracking is presented, consisting of four blocks: (1) sparse periodicity-based auditory feature (sPAF) extraction, (2) foreground-background segregation, (3) state estimation, and (4) top-down knowledge. The model connects the theories about auditory glimpses, foreground-background segregation, and Bayesian inference. It is implemented with the sPAF, sequential Monte Carlo sampling, and probabilistic voice models. The model is evaluated by comparing it with the human data obtained in the study by Woods and McDermott [Curr. Biol. 25(17), 2238-2246 (2015)], which measured the ability to track one of two competing voices with time-varying parameters [fundamental frequency (F0) and formants (F1, F2)]. Three model versions were tested, which differ in the type of information used for the segregation: version (a) uses the oracle F0, version (b) uses the estimated F0, and version (c) uses the spectral shape derived from the estimated F0 and oracle F1 and F2. Version (a) simulates the optimal human performance in conditions with the largest separation between the voices, version (b) simulates the conditions in which the separation is not sufficient to follow the voices, and version (c) is closest to the human performance for moderate voice separation.
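Sequential Monte Carlo sampling of the kind used for state estimation can be sketched as a bootstrap particle filter tracking a drifting fundamental frequency from noisy per-frame pitch observations. This is a generic illustration of the predict-weight-resample loop, not the authors' implementation; all parameter values and the observation model are invented.

```python
import math
import random

def particle_filter_f0(observations, n_particles=500, drift_sd=5.0,
                       obs_sd=10.0, seed=0):
    """Bootstrap particle filter tracking a slowly drifting F0 (Hz).

    Each frame: predict (random-walk drift), weight (Gaussian likelihood of
    the observed pitch), estimate (posterior mean), resample (multinomial).
    """
    rng = random.Random(seed)
    # Initialize particle hypotheses around the first observation.
    particles = [observations[0] + rng.gauss(0, obs_sd)
                 for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: each hypothesized F0 drifts by a small random step.
        particles = [p + rng.gauss(0, drift_sd) for p in particles]
        # Weight: Gaussian likelihood of observation z under each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_sd) ** 2) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Estimate: weighted posterior mean of the F0 state.
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Toy run: a steady 120-Hz voice; the filter should settle near 120 Hz.
track = particle_filter_f0([120.0] * 40)
```

The same loop generalizes to richer states (e.g., F0 plus formants) by widening the state vector and the likelihood model.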
Affiliations
- Joanna Luberadzka, Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Hendrik Kayser, Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Volker Hohmann, Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
12
Distinct mechanisms for talker adaptation operate in parallel on different timescales. Psychon Bull Rev 2021;29:627-634. PMID: 34731443. DOI: 10.3758/s13423-021-02019-3.
Abstract
The mapping between speech acoustics and phonemic representations is highly variable across talkers, and listeners are slower to recognize words when listening to multiple talkers compared with a single talker. Listeners' speech processing efficiency in mixed-talker settings improves when given time to reorient their attention to each new talker. However, it remains unknown how much time is needed to fully reorient attention to a new talker in mixed-talker settings so that speech processing becomes as efficient as when listening to a single talker. In this study, we examined how speech processing efficiency improves in mixed-talker settings as a function of the duration of continuous speech from a talker. In single-talker and mixed-talker conditions, listeners identified target words either in isolation or preceded by a carrier vowel of parametrically varying durations from 300 to 1,500 ms. Listeners' word identification was significantly slower in every mixed-talker condition compared with the corresponding single-talker condition. The costs associated with processing mixed-talker speech declined significantly as the duration of the speech carrier increased from 0 to 600 ms. However, increasing the carrier duration beyond 600 ms did not achieve further reduction in talker variability-related processing costs. These results suggest that two parallel mechanisms support processing talker variability: A stimulus-driven mechanism that operates on short timescales to reorient attention to new auditory sources, and a top-down mechanism that operates over longer timescales to allocate the cognitive resources needed to accommodate uncertainty in acoustic-phonemic correspondences during contexts where speech may come from multiple talkers.
13
Lim SJ, Carter YD, Njoroge JM, Shinn-Cunningham BG, Perrachione TK. Talker discontinuity disrupts attention to speech: evidence from EEG and pupillometry. Brain Lang 2021;221:104996. PMID: 34358924. PMCID: PMC8515637. DOI: 10.1016/j.bandl.2021.104996.
Abstract
Speech is processed less efficiently from discontinuous, mixed talkers than from one consistent talker, but little is known about the neural mechanisms for processing talker variability. Here, we measured psychophysiological responses to talker variability using electroencephalography (EEG) and pupillometry while listeners performed a delayed recall of digit span task. Listeners heard and recalled seven-digit sequences with both talker (single- vs. mixed-talker digits) and temporal (0- vs. 500-ms inter-digit intervals) discontinuities. Talker discontinuity reduced serial recall accuracy. Both talker and temporal discontinuities elicited P3a-like neural evoked responses, while rapid processing of mixed-talkers' speech led to increased phasic pupil dilation. Furthermore, mixed-talkers' speech produced less alpha oscillatory power during working memory maintenance, but not during speech encoding. Overall, these results are consistent with an auditory attention and streaming framework in which talker discontinuity leads to involuntary, stimulus-driven attentional reorientation to novel speech sources, resulting in the processing interference classically associated with talker variability.
Affiliations
- Sung-Joo Lim, Department of Speech, Language, and Hearing Sciences, Boston University, United States
- Yaminah D Carter, Department of Speech, Language, and Hearing Sciences, Boston University, United States
- J Michelle Njoroge, Department of Speech, Language, and Hearing Sciences, Boston University, United States
- Tyler K Perrachione, Department of Speech, Language, and Hearing Sciences, Boston University, United States
14
Viswanathan V, Bharadwaj HM, Shinn-Cunningham BG, Heinz MG. Modulation masking and fine structure shape neural envelope coding to predict speech intelligibility across diverse listening conditions. J Acoust Soc Am 2021;150:2230. PMID: 34598642. PMCID: PMC8483789. DOI: 10.1121/10.0006385.
Abstract
A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain signal-to-noise ratio in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech-intelligibility models and technologies attempting to improve real-world speech communication.
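The notion of an envelope-domain signal-to-noise ratio can be illustrated with a crude acoustic sketch. Note this is a simplification: the study derived its envelope-domain SNR from neural (EEG) encoding of the target envelope, whereas the toy below operates directly on acoustic waveforms, with rectify-and-smooth envelope extraction standing in for cochlear processing. All signals and parameters are invented.

```python
import numpy as np

def envelope(x, fs, cutoff_hz=32.0):
    """Crude temporal envelope: full-wave rectify, then moving-average
    smooth with a window of roughly one cycle of the cutoff frequency."""
    win = max(1, int(fs / cutoff_hz))
    kernel = np.ones(win) / win
    return np.convolve(np.abs(x), kernel, mode="same")

def envelope_snr_db(target, masker, fs):
    """Envelope-domain SNR: modulation power of the target envelope
    relative to the masker envelope, in dB (mean removed via variance)."""
    pt = np.var(envelope(target, fs))
    pm = np.var(envelope(masker, fs))
    return 10.0 * np.log10(pt / pm)

# Toy scene: a strongly 4-Hz-modulated 500-Hz target against a quieter,
# similarly modulated 700-Hz masker (one tenth the amplitude).
fs = 8000
t = np.arange(fs) / fs
target = (1.0 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)
masker = 0.1 * (1.0 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 700 * t)
snr_db = envelope_snr_db(target, masker, fs)
```

Because the masker envelope is scaled by 0.1, its modulation power drops by a factor of about 100, so the envelope-domain SNR comes out near +20 dB; shrinking that gap models stronger modulation masking.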
Collapse
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
| | - Hari M Bharadwaj
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
| | | | - Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
| |
Collapse
|
15
|
Daly HR, Pitt MA. Distractor probability influences suppression in auditory selective attention. Cognition 2021; 216:104849. [PMID: 34332212 DOI: 10.1016/j.cognition.2021.104849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2020] [Revised: 03/05/2021] [Accepted: 07/11/2021] [Indexed: 10/20/2022]
Abstract
Auditory selective attention is thought to facilitate listening to the sound of interest (e.g., voice or music) in a noisy environment. One mechanism thought to underlie this ability is suppression of distracting stimuli. However, little is known about its operation or characteristics. We tested whether suppression in auditory selective attention capitalizes on statistical regularities in the environment to facilitate attention. Participants listened to seven-second scenes consisting of several voices speaking sequences of numbers and a distractor, which occurred more (70%) or less (30%) frequently across trials. Participants had to find the voice that was a gender singleton and report whether it was saying even or odd numbers. If suppression is an active component of auditory selective attention, task performance was expected to be better when the more frequent distractor was present. Results across the experiment and three replications revealed significantly shorter response times (RTs) when the high-probability distractor was in the scene relative to the low-probability distractor. Results suggest a suppression mechanism that mitigates the detrimental influence of a frequently occurring distracting sound.
Collapse
Affiliation(s)
- Heather R Daly
- Department of Psychology, The Ohio State University, United States of America.
| | - Mark A Pitt
- Department of Psychology, The Ohio State University, United States of America
| |
Collapse
|
16
|
Jett B, Buss E, Best V, Oleson J, Calandruccio L. Does Sentence-Level Coarticulation Affect Speech Recognition in Noise or a Speech Masker? J Speech Lang Hear Res 2021; 64:1390-1403. [PMID: 33784185 PMCID: PMC8608179 DOI: 10.1044/2021_jslhr-20-00450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/31/2020] [Revised: 12/04/2020] [Accepted: 01/05/2021] [Indexed: 06/12/2023]
Abstract
Purpose Three experiments were conducted to better understand the role of between-word coarticulation in masked speech recognition. Specifically, we explored whether naturally coarticulated sentences supported better masked speech recognition as compared to sentences derived from individually spoken concatenated words. We hypothesized that sentence recognition thresholds (SRTs) would be similar for coarticulated and concatenated sentences in a noise masker but would be better for coarticulated sentences in a speech masker. Method Sixty young adults participated (n = 20 per experiment). An adaptive tracking procedure was used to estimate SRTs in the presence of noise or two-talker speech maskers. Targets in Experiments 1 and 2 were matrix-style sentences, while targets in Experiment 3 were semantically meaningful sentences. All experiments included coarticulated and concatenated targets; Experiments 2 and 3 included a third target type, concatenated keyword-intensity-matched (KIM) sentences, in which the words were concatenated but individually scaled to replicate the intensity contours of the coarticulated sentences. Results Regression analyses evaluated the main effects of target type, masker type, and their interaction. Across all three experiments, effects of target type were small (< 2 dB). In Experiment 1, SRTs were slightly poorer for coarticulated than concatenated sentences. In Experiment 2, coarticulation facilitated speech recognition compared to the concatenated KIM condition. When listeners had access to semantic context (Experiment 3), a coarticulation benefit was observed in noise but not in the speech masker. Conclusions Overall, differences between SRTs for sentences with and without between-word coarticulation were small. Beneficial effects of coarticulation were only observed relative to the concatenated KIM targets; for unscaled concatenated targets, it appeared that consistent audibility across the sentence offset any benefit of coarticulation. Contrary to our hypothesis, effects of coarticulation generally were not more pronounced in speech maskers than in noise maskers.
Collapse
Affiliation(s)
- Brandi Jett
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH
| | - Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
| | - Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, MA
| | - Jacob Oleson
- Department of Biostatistics, University of Iowa, Iowa City
| | - Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH
| |
Collapse
|
17
|
Selecting among competing models of talker adaptation: Attention, cognition, and memory in speech processing efficiency. Cognition 2020; 204:104393. [PMID: 32688132 DOI: 10.1016/j.cognition.2020.104393] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2020] [Revised: 06/14/2020] [Accepted: 06/29/2020] [Indexed: 11/24/2022]
Abstract
Phonetic variability across talkers imposes additional processing costs during speech perception, often measured by performance decrements between single- and mixed-talker conditions. However, models differ in their predictions about whether accommodating greater phonetic variability (i.e., more talkers) imposes greater processing costs. We measured speech processing efficiency in a speeded word identification task, in which we manipulated the number of talkers (1, 2, 4, 8, or 16) listeners heard. Word identification was less efficient in every mixed-talker condition compared to the single-talker condition, but the magnitude of this performance decrement was not affected by the number of talkers. Furthermore, in a condition with uniform transition probabilities between two talkers, word identification was more efficient on trials where the talker was the same as on the prior trial than on trials where the talker switched. These results support an auditory streaming model of talker adaptation, where processing costs associated with changing talkers result from attentional reorientation.
Collapse
|
18
|
Cai H, Dent ML. Attention capture in birds performing an auditory streaming task. PLoS One 2020; 15:e0235420. [PMID: 32589692 PMCID: PMC7319309 DOI: 10.1371/journal.pone.0235420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2020] [Accepted: 06/15/2020] [Indexed: 11/19/2022] Open
Abstract
Numerous animal models have been used to investigate the neural mechanisms of auditory processing in complex acoustic environments, but it is unclear whether an animal's auditory attention is functionally similar to a human's in processing competing auditory scenes. Here we investigated the effects of attention capture in birds performing an objective auditory streaming paradigm. The classical ABAB… patterned pure tone sequences were modified and used for the task. We trained the birds to selectively attend to a target stream and only respond to the deviant appearing in the target stream, even though their attention may be captured by a deviant in the background stream. When no deviant appeared in the background stream, the birds experienced the buildup of the streaming process in a qualitatively similar way as they did in a subjective paradigm. Although the birds were trained to selectively attend to the target stream, they failed to avoid the involuntary attention switch caused by the background deviant, especially when the background deviant was sequentially unpredictable. Their global performance deteriorated more with increasingly salient background deviants, where the buildup process was reset by the background distractor. Moreover, sequential predictability of the background deviant facilitated the recovery of the buildup process after attention capture. This is the first study that addresses the perceptual consequences of the joint effects of top-down and bottom-up attention in behaving animals.
Collapse
Affiliation(s)
- Huaizhen Cai
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
| | - Micheal L. Dent
- Department of Psychology, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
| |
Collapse
|
19
|
Bologna WJ, Ahlstrom JB, Dubno JR. Contributions of Voice Expectations to Talker Selection in Younger and Older Adults With Normal Hearing. Trends Hear 2020; 24:2331216520915110. [PMID: 32372720 PMCID: PMC7225833 DOI: 10.1177/2331216520915110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2019] [Revised: 03/02/2020] [Accepted: 03/03/2020] [Indexed: 11/17/2022] Open
Abstract
Focused attention on expected voice features, such as fundamental frequency (F0) and spectral envelope, may facilitate segregation and selection of a target talker in competing talker backgrounds. Age-related declines in attention may limit these abilities in older adults, resulting in poorer speech understanding in complex environments. To test this hypothesis, younger and older adults with normal hearing listened to sentences with a single competing talker. For most trials, listener attention was directed to the target by a cue phrase that matched the target talker's F0 and spectral envelope. For a small percentage of randomly occurring probe trials, the target's voice unexpectedly differed from the cue phrase in terms of F0 and spectral envelope. Overall, keyword recognition for the target talker was poorer for older adults than younger adults. Keyword recognition was poorer on probe trials than standard trials for both groups, and incorrect responses on probe trials contained keywords from the single-talker masker. No interaction was observed between age group and the decline in keyword recognition on probe trials. Thus, reduced performance by older adults overall could not be attributed to declines in attention to an expected voice. Rather, other cognitive abilities, such as speed of processing and linguistic closure, were predictive of keyword recognition for younger and older adults. Moreover, the effects of age interacted with the sex of the target talker, such that older adults had greater difficulty understanding target keywords from female talkers than male talkers.
Collapse
Affiliation(s)
- William J. Bologna
- Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
| | - Jayne B. Ahlstrom
- Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
| | - Judy R. Dubno
- Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
| |
Collapse
|
20
|
Deng Y, Reinhart RMG, Choi I, Shinn-Cunningham BG. Causal links between parietal alpha activity and spatial auditory attention. eLife 2019; 8:e51184. [PMID: 31782732 PMCID: PMC6904218 DOI: 10.7554/elife.51184] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2019] [Accepted: 11/28/2019] [Indexed: 11/13/2022] Open
Abstract
Both visual and auditory spatial selective attention result in lateralized alpha (8-14 Hz) oscillatory power in parietal cortex: alpha increases in the hemisphere ipsilateral to attentional focus. Brain stimulation studies suggest a causal relationship between parietal alpha and suppression of the representation of contralateral visual space. However, there is no evidence that parietal alpha controls auditory spatial attention. Here, we performed high definition transcranial alternating current stimulation (HD-tACS) on human subjects performing an auditory task in which they directed attention based on either spatial or nonspatial features. Alpha (10 Hz) but not theta (6 Hz) HD-tACS of right parietal cortex interfered with attending left but not right auditory space. Parietal stimulation had no effect for nonspatial auditory attention. Moreover, performance in post-stimulation trials returned rapidly to baseline. These results demonstrate a causal, frequency-, hemispheric-, and task-specific effect of parietal alpha brain stimulation on top-down control of auditory spatial attention. Notably, power in alpha oscillations was lateralized with attention, and stimulation selectively disrupted this attentional control.
Collapse
Affiliation(s)
- Yuqi Deng
- Biomedical Engineering, Boston University, Boston, United States
| | | | - Inyong Choi
- Communication Sciences and Disorders, University of Iowa, Iowa City, United States
| | - Barbara G Shinn-Cunningham
- Biomedical Engineering, Boston University, Boston, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| |
Collapse
|
21
|
Choi JY, Perrachione TK. Time and information in perceptual adaptation to speech. Cognition 2019; 192:103982. [PMID: 31229740 PMCID: PMC6732236 DOI: 10.1016/j.cognition.2019.05.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2018] [Revised: 05/11/2019] [Accepted: 05/25/2019] [Indexed: 11/18/2022]
Abstract
Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically-confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.
Collapse
Affiliation(s)
- Ja Young Choi
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
| | - Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States.
| |
Collapse
|
22
|
Working-memory disruption by task-irrelevant talkers depends on degree of talker familiarity. Atten Percept Psychophys 2019; 81:1108-1118. [PMID: 30993655 DOI: 10.3758/s13414-019-01727-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
When one is listening, familiarity with an attended talker's voice improves speech comprehension. Here, we instead investigated the effect of familiarity with a distracting talker. In an irrelevant-speech task, we assessed listeners' working memory for the serial order of spoken digits when a task-irrelevant, distracting sentence was produced by either a familiar or an unfamiliar talker (with rare omissions of the task-irrelevant sentence). We tested two groups of listeners using the same experimental procedure. The first group were undergraduate psychology students (N = 66) who had attended an introductory statistics course. Critically, each student had been taught by one of two course instructors, whose voices served as the familiar and unfamiliar task-irrelevant talkers. The second group of listeners were family members and friends (N = 20) who had known either one of the two talkers for more than 10 years. Students, but not family members and friends, made more errors when the task-irrelevant talker was familiar versus unfamiliar. Interestingly, the effect of talker familiarity was not modulated by the presence of task-irrelevant speech: Students experienced stronger working memory disruption by a familiar talker, irrespective of whether they heard a task-irrelevant sentence during memory retention or merely expected it. While previous work has shown that familiarity with an attended talker benefits speech comprehension, our findings indicate that familiarity with an ignored talker disrupts working memory for target speech. The absence of this effect in family members and friends suggests that the degree of familiarity modulates the memory disruption.
Collapse
|
23
|
Lin G, Carlile S. The Effects of Switching Non-Spatial Attention During Conversational Turn Taking. Sci Rep 2019; 9:8057. [PMID: 31147609 PMCID: PMC6542845 DOI: 10.1038/s41598-019-44560-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Accepted: 05/17/2019] [Indexed: 11/09/2022] Open
Abstract
This study examined the effect of a change in target voice on word recall during a multi-talker conversation. Two experiments were conducted using matrix sentences to assess the cost of a single endogenous switch in non-spatial attention. Performance in a yes-no recognition task was significantly worse when a target voice changed compared to when it remained the same after a turn-taking gap. We observed a decrease in target hit rate and sensitivity, and an increase in masker confusion errors following a change in voice. These results highlight the cognitive demands of not only engaging attention on a new talker, but also of disengaging attention from a previous target voice. This shows that exposure to a voice can have a biasing effect on attention that persists well after a turn-taking gap. A second experiment showed that there was no change in switching performance using different talker combinations. This demonstrates that switching costs were consistent and did not depend on the degree of acoustic differences in target voice characteristics.
Collapse
Affiliation(s)
- Gaven Lin
- School of Medical Sciences and The Bosch Institute, University of Sydney, Sydney, New South Wales, Australia.
| | - Simon Carlile
- School of Medical Sciences and The Bosch Institute, University of Sydney, Sydney, New South Wales, Australia
| |
Collapse
|
24
|
Lim SJ, Shinn-Cunningham BG, Perrachione TK. Effects of talker continuity and speech rate on auditory working memory. Atten Percept Psychophys 2019; 81:1167-1177. [PMID: 30737757 PMCID: PMC6752734 DOI: 10.3758/s13414-019-01684-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech processing is slower and less accurate when listeners encounter speech from multiple talkers compared to one continuous talker. However, interference from multiple talkers has been investigated only using immediate speech recognition or long-term memory recognition tasks. These tasks reveal opposite effects of speech processing time on speech recognition - while fast processing of multi-talker speech impedes immediate recognition, it also results in more abstract and less talker-specific long-term memories for speech. Here, we investigated whether and how processing multi-talker speech disrupts working memory maintenance, an intermediate stage between perceptual recognition and long-term memory. In a digit sequence recall task, listeners encoded seven-digit sequences and recalled them after a 5-s delay. Sequences were spoken by either a single talker or multiple talkers at one of three presentation rates (0-, 200-, and 500-ms inter-digit intervals). Listeners' recall was slower and less accurate for sequences spoken by multiple talkers than a single talker. Especially for the fastest presentation rate, listeners were less efficient when recalling sequences spoken by multiple talkers. Our results reveal that talker-specificity effects for speech working memory are most prominent when listeners must rapidly encode speech. These results suggest that, like immediate speech recognition, working memory for speech is susceptible to interference from variability across talkers. While many studies ascribe effects of talker variability to the need to calibrate perception to talker-specific acoustics, these results are also consistent with the idea that a sudden change of talkers disrupts attentional focus, interfering with efficient working-memory processing.
Collapse
Affiliation(s)
- Sung-Joo Lim
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA.
- Biomedical Engineering, Boston University, Boston, MA, USA.
| | | | - Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA, 02215, USA
| |
Collapse
|
25
|
Goldsworthy RL, Markle KL. Pediatric Hearing Loss and Speech Recognition in Quiet and in Different Types of Background Noise. J Speech Lang Hear Res 2019; 62:758-767. [PMID: 30950727 PMCID: PMC9907566 DOI: 10.1044/2018_jslhr-h-17-0389] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/19/2017] [Revised: 04/23/2018] [Accepted: 10/12/2018] [Indexed: 05/27/2023]
Abstract
Purpose Speech recognition deteriorates with hearing loss, particularly in fluctuating background noise. This study examined how hearing loss affects speech recognition in different types of noise to clarify how characteristics of the noise interact with the benefits listeners receive when listening in fluctuating compared to steady-state noise. Method Speech reception thresholds were measured for a closed set of spondee words in children (ages 5-17 years) in quiet, speech-spectrum noise, 2-talker babble, and instrumental music. Twenty children with normal hearing and 43 children with hearing loss participated; children with hearing loss were subdivided into cochlear implant (18 children) and hearing aid (25 children) groups. A cohort of adults with normal hearing was included for comparison. Results Hearing loss had a large effect on speech recognition for each condition, but the effect of hearing loss was largest in 2-talker babble and smallest in speech-spectrum noise. Children with normal hearing had better speech recognition in 2-talker babble than in speech-spectrum noise, whereas children with hearing loss had worse recognition in 2-talker babble than in speech-spectrum noise. Almost all subjects had better speech recognition in instrumental music compared to speech-spectrum noise, but with less of a difference observed for children with hearing loss. Conclusions Speech recognition is more sensitive to the effects of hearing loss when measured in fluctuating compared to steady-state noise. Speech recognition measured in fluctuating noise depends on an interaction of hearing loss with characteristics of the background noise; specifically, children with hearing loss were able to derive a substantial benefit for listening in fluctuating noise when measured in instrumental music compared to 2-talker babble.
Collapse
|
26
|
Object-based attention in complex, naturalistic auditory streams. Sci Rep 2019; 9:2854. [PMID: 30814547 PMCID: PMC6393668 DOI: 10.1038/s41598-019-39166-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2018] [Accepted: 01/14/2019] [Indexed: 11/08/2022] Open
Abstract
In vision, objects have been described as the 'units' on which non-spatial attention operates in many natural settings. Here, we test the idea of object-based attention in the auditory domain within ecologically valid auditory scenes, composed of two spatially and temporally overlapping sound streams (speech signal vs. environmental soundscapes in Experiment 1 and two speech signals in Experiment 2). Top-down attention was directed to one or the other auditory stream by a non-spatial cue. To test for high-level, object-based attention effects we introduce an auditory repetition detection task in which participants have to detect brief repetitions of auditory objects, ruling out any possible confounds with spatial or feature-based attention. The participants' responses were significantly faster and more accurate in the valid cue condition compared to the invalid cue condition, indicating a robust cue-validity effect of high-level, object-based auditory attention.
Collapse
|
27
|
Kreitewolf J, Mathias SR, Trapeau R, Obleser J, Schönwiesner M. Perceptual grouping in the cocktail party: Contributions of voice-feature continuity. J Acoust Soc Am 2018; 144:2178. [PMID: 30404485 DOI: 10.1121/1.5058684] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Accepted: 09/18/2018] [Indexed: 06/08/2023]
Abstract
Cocktail parties pose a difficult yet solvable problem for the auditory system. Previous work has shown that the cocktail-party problem is considerably easier when all sounds in the target stream are spoken by the same talker (the voice-continuity benefit). The present study investigated the contributions of two of the most salient voice features, glottal-pulse rate (GPR) and vocal-tract length (VTL), to the voice-continuity benefit. Twenty young, normal-hearing listeners participated in two experiments. On each trial, listeners heard concurrent sequences of spoken digits from three different spatial locations and reported the digits coming from a target location. Critically, across conditions, GPR and VTL either remained constant or varied across target digits. Additionally, across experiments, the target location either remained constant (Experiment 1) or varied (Experiment 2) within a trial. In Experiment 1, listeners benefited from continuity in either voice feature, but VTL continuity was more helpful than GPR continuity. In Experiment 2, spatial discontinuity greatly hindered listeners' abilities to exploit continuity in GPR and VTL. The present results suggest that selective attention benefits from continuity in target voice features and that VTL and GPR play different roles for perceptual grouping and stream segregation in the cocktail party.
Collapse
Affiliation(s)
- Jens Kreitewolf
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
| | - Samuel R Mathias
- Neurocognition, Neurocomputation and Neurogenetics (n3) Division, Yale University School of Medicine, 40 Temple Street, New Haven, Connecticut 06511, USA
| | - Régis Trapeau
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
| | - Jonas Obleser
- Department of Psychology, University of Lübeck, Maria-Goeppert-Straße 9a, D-23562 Lübeck, Germany
| | - Marc Schönwiesner
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
| |
Collapse
|
28
|
Paredes-Gallardo A, Innes-Brown H, Madsen SMK, Dau T, Marozeau J. Auditory Stream Segregation and Selective Attention for Cochlear Implant Listeners: Evidence From Behavioral Measures and Event-Related Potentials. Front Neurosci 2018; 12:581. [PMID: 30186105 PMCID: PMC6110823 DOI: 10.3389/fnins.2018.00581] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2018] [Accepted: 08/02/2018] [Indexed: 11/13/2022] Open
Abstract
The role of the spatial separation between the stimulating electrodes (electrode separation) in sequential stream segregation was explored in cochlear implant (CI) listeners using a deviant detection task. Twelve CI listeners were instructed to attend to a series of target sounds in the presence of interleaved distractor sounds. A deviant was randomly introduced in the target stream either at the beginning, middle or end of each trial. The listeners were asked to detect sequences that contained a deviant and to report its location within the trial. The perceptual segregation of the streams should, therefore, improve deviant detection performance. The electrode range for the distractor sounds was varied, resulting in different amounts of overlap between the target and the distractor streams. For the largest electrode separation condition, event-related potentials (ERPs) were recorded under active and passive listening conditions. The listeners were asked to perform the behavioral task for the active listening condition and encouraged to watch a muted movie for the passive listening condition. Deviant detection performance improved with increasing electrode separation between the streams, suggesting that larger electrode differences facilitate the segregation of the streams. Deviant detection performance was best for deviants occurring late in the sequence, indicating that a segregated percept builds up over time. The analysis of the ERP waveforms revealed that auditory selective attention modulates the ERP responses in CI listeners. Specifically, the responses to the target stream were, overall, larger in the active relative to the passive listening condition. Conversely, the ERP responses to the distractor stream were not affected by selective attention. However, no significant correlation was observed between the behavioral performance and the amount of attentional modulation. Overall, the findings from the present study suggest that CI listeners can use electrode separation to perceptually group sequential sounds. Moreover, selective attention can be deployed on the resulting auditory objects, as reflected by the attentional modulation of the ERPs at the group level.
Collapse
Affiliation(s)
- Andreu Paredes-Gallardo
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Hamish Innes-Brown
- Department of Medical Bionics, The University of Melbourne, Melbourne, VIC, Australia.,Bionics Institute, East Melbourne, VIC, Australia
| | - Sara M K Madsen
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Jeremy Marozeau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby, Denmark
| |
Collapse
|
29
|
Mehraei G, Shinn-Cunningham B, Dau T. Influence of talker discontinuity on cortical dynamics of auditory spatial attention. Neuroimage 2018; 179:548-556. [PMID: 29960089 DOI: 10.1016/j.neuroimage.2018.06.067] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2018] [Revised: 06/12/2018] [Accepted: 06/25/2018] [Indexed: 11/25/2022] Open
Abstract
In everyday acoustic scenes, listeners face the challenge of selectively attending to a sound source and maintaining attention on that source long enough to extract meaning. This task is made more daunting by frequent perceptual discontinuities in the acoustic scene: talkers move in space and conversations switch from one speaker to another in a background of many other sources. The inherent dynamics of such switches directly impact our ability to sustain attention. Here we asked how discontinuity in talker voice affects the ability to focus auditory attention on sounds from a particular location, as well as the neural correlates of the underlying processes. During electroencephalography recordings, listeners attended to a stream of spoken syllables from one direction while ignoring distracting syllables from a different talker in the opposite hemifield. On some trials, the talker switched locations in the middle of the streams, creating a discontinuity. This switch disrupted attentional modulation of cortical responses; specifically, event-related potentials evoked by syllables in the to-be-attended direction were suppressed and power in alpha oscillations (8-12 Hz) was reduced following the discontinuity. Importantly, at an individual level, the ability to maintain attention to a target stream and report its content, despite the discontinuity, correlated with the magnitude of the disruption of these cortical responses. These results have implications for understanding cortical mechanisms supporting attention. The changes in the cortical responses may serve as a predictor of how well individuals can communicate in complex acoustic scenes and may help in the development of assistive devices and interventions to aid clinical populations.
Collapse
Affiliation(s)
- Golbarg Mehraei
- Hearing Systems Group, Technical University of Denmark, Ørsteds Plads Building 352, 2800, Kongens Lyngby, Denmark.
| | - Barbara Shinn-Cunningham
- Center for Research in Sensory Communication and Emerging Neural Technology, Boston University, Boston, MA, 02215, USA; Department of Biomedical Engineering, Boston University, Boston, MA, 02215, USA
| | - Torsten Dau
- Hearing Systems Group, Technical University of Denmark, Ørsteds Plads Building 352, 2800, Kongens Lyngby, Denmark
| |
Collapse
|
30
|
Abstract
OBJECTIVES Cochlear implants (CIs) restore hearing to the profoundly deaf by direct electrical stimulation of the auditory nerve. To provide an optimal electrical stimulation pattern, the CI must be individually fitted to each CI user. To date, CI fitting is based primarily on subjective feedback from the user. However, not all CI users are able to provide such feedback, for example, small children. This study explores the possibility of using the electroencephalogram (EEG) to objectively determine whether CI users are able to hear differences in tones presented to them, which has potential applications in CI fitting or closed-loop systems. DESIGN Deviant and standard stimuli were presented to 12 CI users in an active auditory oddball paradigm. The EEG was recorded in two sessions, and classification of the EEG data was performed with shrinkage linear discriminant analysis. The impact of CI artifact removal on classification performance and the possibility of reusing a trained classifier in future sessions were also evaluated. RESULTS Overall, classification performance was above chance level for all participants, although performance varied considerably between participants. Artifacts were successfully removed from the EEG without impairing classification performance. Finally, reuse of the classifier caused only a small loss in classification performance. CONCLUSIONS Our data provide first evidence that the EEG can be automatically classified on a single-trial basis in CI users. Despite the slightly poorer classification performance across sessions, the classifier and CI artifact correction appear stable over successive sessions. Thus, classifier and artifact-correction weights can be reused without repeating the set-up procedure in every session, which makes the technique easier to apply. With our present data, we can show successful classification of event-related cortical potential patterns in CI users. In the future, this has the potential to objectify and automate parts of CI fitting procedures.
Collapse
|
31
|
Shinn-Cunningham B, Best V, Lee AKC. Auditory Object Formation and Selection. Springer Handbook of Auditory Research 2017. [DOI: 10.1007/978-3-319-51662-2_2] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
|
32
|
Dai L, Shinn-Cunningham BG. Contributions of Sensory Coding and Attentional Control to Individual Differences in Performance in Spatial Auditory Selective Attention Tasks. Front Hum Neurosci 2016; 10:530. [PMID: 27812330 PMCID: PMC5071360 DOI: 10.3389/fnhum.2016.00530] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2016] [Accepted: 10/05/2016] [Indexed: 11/13/2022] Open
Abstract
Listeners with normal hearing thresholds (NHTs) differ in their ability to steer attention to whatever sound source is important. This ability depends on top-down executive control, which modulates the sensory representation of sound in the cortex. Yet, this sensory representation also depends on the coding fidelity of the peripheral auditory system. Both of these factors may thus contribute to the individual differences in performance. We designed a selective auditory attention paradigm in which we could simultaneously measure envelope following responses (EFRs, reflecting peripheral coding), onset event-related potentials (ERPs) from the scalp (reflecting cortical responses to sound) and behavioral scores. We performed two experiments that varied stimulus conditions to alter the degree to which performance might be limited due to fine stimulus details vs. due to control of attentional focus. Consistent with past work, in both experiments we find that attention strongly modulates cortical ERPs. Importantly, in Experiment I, where coding fidelity limits the task, individual behavioral performance correlates with subcortical coding strength (derived by computing how the EFR is degraded for fully masked tones compared to partially masked tones); however, in this experiment, the effects of attention on cortical ERPs were unrelated to individual subject performance. In contrast, in Experiment II, where sensory cues for segregation are robust (and thus less of a limiting factor on task performance), inter-subject behavioral differences correlate with subcortical coding strength. In addition, after factoring out the influence of subcortical coding strength, behavioral differences are also correlated with the strength of attentional modulation of ERPs. 
These results support the hypothesis that behavioral abilities amongst listeners with NHTs can arise due to both subcortical coding differences and differences in attentional control, depending on stimulus characteristics and task demands.
Collapse
Affiliation(s)
- Lengshi Dai
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
| | | |
Collapse
|
34
|
Samson F, Johnsrude IS. Effects of a consistent target or masker voice on target speech intelligibility in two- and three-talker mixtures. J Acoust Soc Am 2016; 139:1037-1046. [PMID: 27036241 DOI: 10.1121/1.4942589] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
When the spatial location or identity of a sound is held constant, it is not masked as effectively by competing sounds. This suggests that experience with a particular voice over time might facilitate perceptual organization in multitalker environments. The current study examines whether listeners benefit from experience with a voice only when it is the target, or also when it is a masker, using diotic presentation and a closed-set task (coordinate response measure). A reliable interaction was observed such that, in two-talker mixtures, consistency of masker or target voice over 3-7 trials significantly benefited target recognition performance, whereas in three-talker mixtures, target, but not masker, consistency was beneficial. Overall, this work suggests that voice consistency improves intelligibility, although somewhat differently when two talkers, compared to three talkers, are present, suggesting that consistent-voice information facilitates intelligibility in at least two different ways. Listeners can use a template-matching strategy to extract a known voice from a mixture when it is the target. However, consistent-voice information facilitates segregation only when two, but not three, talkers are present.
Collapse
Affiliation(s)
- Fabienne Samson
- Department of Psychology, The Brain and Mind Institute, Natural Sciences Center, Room 227, The University of Western Ontario, London, Ontario, N6A 5B7, Canada
| | - Ingrid S Johnsrude
- Department of Psychology, The Brain and Mind Institute, Natural Sciences Center, Room 227, The University of Western Ontario, London, Ontario, N6A 5B7, Canada
| |
Collapse
|
35
|
Zimmermann JF, Moscovitch M, Alain C. Attending to auditory memory. Brain Res 2015; 1640:208-21. [PMID: 26638836 DOI: 10.1016/j.brainres.2015.11.032] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2015] [Revised: 11/18/2015] [Accepted: 11/19/2015] [Indexed: 10/22/2022]
Abstract
Attention to memory describes the process of attending to memory traces when the object is no longer present. It has been studied primarily for representations of visual stimuli, with only a few studies examining attention to sound-object representations in short-term memory. Here, we review the interplay of attention and auditory memory with an emphasis on 1) attending to auditory memory in the absence of related external stimuli (i.e., reflective attention) and 2) effects of existing memory on guiding attention. Attention to auditory memory is discussed in the context of change deafness, and we argue that failures to detect changes in our auditory environments are most likely the result of a faulty comparison system for incoming and stored information. Objects are the primary building blocks of auditory attention, but attention can also be directed to individual features (e.g., pitch). We review short-term and long-term memory-guided modulation of attention based on characteristic features, location, and/or semantic properties of auditory objects, and propose that auditory attention-to-memory pathways emerge after sensory memory. A neural model for auditory attention to memory is developed, which comprises two separate pathways in the parietal cortex, one involved in attention to higher-order features and the other in attention to sensory information. This article is part of a Special Issue entitled SI: Auditory working memory.
Collapse
Affiliation(s)
- Jacqueline F Zimmermann
- University of Toronto, Department of Psychology, Sidney Smith Hall, 100 St. George Street, Toronto, Ontario, Canada M5S 3G3; Rotman Research Institute, Baycrest Hospital, 3560 Bathurst Street, Toronto, Ontario, Canada M6A 2E1.
| | - Morris Moscovitch
- University of Toronto, Department of Psychology, Sidney Smith Hall, 100 St. George Street, Toronto, Ontario, Canada M5S 3G3; Rotman Research Institute, Baycrest Hospital, 3560 Bathurst Street, Toronto, Ontario, Canada M6A 2E1
| | - Claude Alain
- University of Toronto, Department of Psychology, Sidney Smith Hall, 100 St. George Street, Toronto, Ontario, Canada M5S 3G3; Rotman Research Institute, Baycrest Hospital, 3560 Bathurst Street, Toronto, Ontario, Canada M6A 2E1; Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
| |
Collapse
|
36
|
Ohl FW. Role of cortical neurodynamics for understanding the neural basis of motivated behavior - lessons from auditory category learning. Curr Opin Neurobiol 2014; 31:88-94. [PMID: 25241212 DOI: 10.1016/j.conb.2014.08.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2014] [Revised: 08/26/2014] [Accepted: 08/28/2014] [Indexed: 11/25/2022]
Abstract
Rhythmic activity appears in the auditory cortex in both microscopic and macroscopic observables and is modulated by both bottom-up and top-down processes. How this activity serves both types of processes is largely unknown. Here we review studies that have recently improved our understanding of potential functional roles of large-scale global dynamic activity patterns in auditory cortex. The experimental paradigm of auditory category learning allowed critical testing of the hypothesis that global auditory cortical activity states are associated with endogenous cognitive states mediating the meaning associated with an acoustic stimulus rather than with activity states that merely represent the stimulus for further processing.
Collapse
Affiliation(s)
- Frank W Ohl
- Leibniz Institute for Neurobiology, Department of Systems Physiology of Learning, Brenneckestr. 6, D-39118 Magdeburg, Germany.
| |
Collapse
|
37
|
Bendixen A, Koch I. Editorial for special issue: "auditory attention: merging paradigms and perspectives". Psychol Res 2014; 78:301-3. [PMID: 24638844 DOI: 10.1007/s00426-014-0562-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2014] [Accepted: 03/06/2014] [Indexed: 10/25/2022]
Affiliation(s)
- Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| | | |
Collapse
|