1
Cusimano M, Hewitt LB, McDermott JH. Listening with generative models. Cognition 2024; 253:105874. PMID: 39216190. DOI: 10.1016/j.cognition.2024.105874.
Abstract
Perception has long been envisioned to use an internal model of the world to explain the causes of sensory signals. However, such accounts have historically not been testable, typically requiring intractable search through the space of possible explanations. Using auditory scenes as a case study, we leveraged contemporary computational tools to infer explanations of sounds in a candidate internal generative model of the auditory world (ecologically inspired audio synthesizers). Model inferences accounted for many classic illusions. Unlike traditional accounts of auditory illusions, the model is applicable to any sound, and exhibited human-like perceptual organization for real-world sound mixtures. The combination of stimulus-computability and interpretable model structure enabled 'rich falsification', revealing additional assumptions about sound generation needed to account for perception. The results show how generative models can account for the perception of both classic illusions and everyday sensory signals, and illustrate the opportunities and challenges involved in incorporating them into theories of perception.
Affiliation(s)
- Maddie Cusimano
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America.
- Luke B Hewitt
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America
- Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America; McGovern Institute, Massachusetts Institute of Technology, United States of America; Center for Brains, Minds and Machines, Massachusetts Institute of Technology, United States of America; Speech and Hearing Bioscience and Technology, Harvard University, United States of America
2
Luthra S. Why are listeners hindered by talker variability? Psychon Bull Rev 2024; 31:104-121. PMID: 37580454. PMCID: PMC10864679. DOI: 10.3758/s13423-023-02355-6.
Abstract
Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
Affiliation(s)
- Sahil Luthra
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA.
3
Attentional control via synaptic gain mechanisms in auditory streaming. Brain Res 2021; 1778:147720. PMID: 34785256. DOI: 10.1016/j.brainres.2021.147720.
Abstract
Attention is a crucial component in sound source segregation, allowing auditory objects of interest to be both singled out and held in focus. Our study utilizes a fundamental paradigm for sound source segregation: a sequence of interleaved tones, A and B, of different frequencies that can be heard as a single integrated stream or segregated into two streams (auditory streaming paradigm). We focus on the irregular alternations between integrated and segregated that occur for long presentations, so-called auditory bistability. Psychoacoustic experiments demonstrate how attentional control, a listener's intention to experience integrated or segregated, biases perception in favour of different perceptual interpretations. Our data show that this is achieved by prolonging the dominance times of the attended percept and, to a lesser extent, by curtailing the dominance times of the unattended percept, an effect that remains consistent across a range of values for the difference in frequency between A and B. An existing neuromechanistic model describes the neural dynamics of perceptual competition downstream of primary auditory cortex (A1). The model allows us to propose plausible neural mechanisms for attentional control, as linked to different attentional strategies, in a direct comparison with behavioural data. A mechanism based on a percept-specific input gain best accounts for the effects of attentional control.
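To make the paradigm concrete, here is a minimal Python sketch of an ABA- triplet sequence of the kind used in auditory streaming experiments; all parameter values (tone frequency, frequency separation, durations) are illustrative assumptions rather than values taken from the study.

```python
import numpy as np

def aba_sequence(f_a=500.0, df_semitones=6.0, tone_dur=0.12,
                 gap=0.02, n_triplets=20, fs=44100):
    """Generate an ABA- triplet sequence for the streaming paradigm.

    Small df favours hearing one integrated stream; large df favours
    segregation into separate A and B streams.
    """
    f_b = f_a * 2.0 ** (df_semitones / 12.0)   # B tone df semitones above A
    t = np.arange(int(tone_dur * fs)) / fs
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.01)  # 10-ms on/off ramps
    def tone(f):
        return ramp * np.sin(2 * np.pi * f * t)
    silence = np.zeros(int(gap * fs))
    # ABA-: A tone, B tone, A tone, then a silent slot
    triplet = np.concatenate([tone(f_a), silence, tone(f_b), silence,
                              tone(f_a), silence,
                              np.zeros(len(t) + len(silence))])
    return np.tile(triplet, n_triplets)
```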
4
Luthra S, Peraza-Santiago G, Beeson K, Saltzman D, Crinnion AM, Magnuson JS. Robust Lexically Mediated Compensation for Coarticulation: Christmash Time Is Here Again. Cogn Sci 2021; 45:e12962. PMID: 33877697. PMCID: PMC8243960. DOI: 10.1111/cogs.12962.
Abstract
A long-standing question in cognitive science is how high-level knowledge is integrated with sensory input. For example, listeners can leverage lexical knowledge to interpret an ambiguous speech sound, but do such effects reflect direct top-down influences on perception or merely postperceptual biases? A critical test case in the domain of spoken word recognition is lexically mediated compensation for coarticulation (LCfC). Previous LCfC studies have shown that a lexically restored context phoneme (e.g., /s/ in Christma#) can alter the perceived place of articulation of a subsequent target phoneme (e.g., the initial phoneme of a stimulus from a tapes-capes continuum), consistent with the influence of an unambiguous context phoneme in the same position. Because this phoneme-to-phoneme compensation for coarticulation is considered sublexical, scientists agree that evidence for LCfC would constitute strong support for top-down interaction. However, results from previous LCfC studies have been inconsistent, and positive effects have often been small. Here, we conducted extensive piloting of stimuli prior to testing for LCfC. Specifically, we ensured that context items elicited robust phoneme restoration (e.g., that the final phoneme of Christma# was reliably identified as /s/) and that unambiguous context-final segments (e.g., a clear /s/ at the end of Christmas) drove reliable compensation for coarticulation for a subsequent target phoneme. We observed robust LCfC in a well-powered, preregistered experiment with these pretested items (N = 40) as well as in a direct replication study (N = 40). These results provide strong evidence in favor of computational models of spoken word recognition that include top-down feedback.
Affiliation(s)
- James S. Magnuson
- Psychological Sciences, University of Connecticut
- BCBL, Basque Center on Cognition, Brain and Language
- Ikerbasque, Basque Foundation for Science
5
Friston KJ, Sajid N, Quiroga-Martinez DR, Parr T, Price CJ, Holmes E. Active listening. Hear Res 2021; 399:107998. PMID: 32732017. PMCID: PMC7812378. DOI: 10.1016/j.heares.2020.107998.
Abstract
This paper introduces active listening, as a unified framework for synthesising and recognising speech. The notion of active listening inherits from active inference, which considers perception and action under one universal imperative: to maximise the evidence for our (generative) models of the world. First, we describe a generative model of spoken words that simulates (i) how discrete lexical, prosodic, and speaker attributes give rise to continuous acoustic signals; and conversely (ii) how continuous acoustic signals are recognised as words. The 'active' aspect involves (covertly) segmenting spoken sentences and borrows ideas from active vision. It casts speech segmentation as the selection of internal actions, corresponding to the placement of word boundaries. Practically, word boundaries are selected that maximise the evidence for an internal model of how individual words are generated. We establish face validity by simulating speech recognition and showing how the inferred content of a sentence depends on prior beliefs and background noise. Finally, we consider predictive validity by associating neuronal or physiological responses, such as the mismatch negativity and P300, with belief updating under active listening, which is greatest in the absence of accurate prior beliefs about what will be heard next.
Affiliation(s)
- Karl J Friston
- The Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, UK.
- Noor Sajid
- The Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, UK
- Thomas Parr
- The Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, UK
- Cathy J Price
- The Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, UK
- Emma Holmes
- The Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London, WC1N 3AR, UK
6
Jayakody DMP, Menegola HK, Yiannos JM, Goodman-Simpson J, Friedland PL, Taddei K, Laws SM, Weinborn M, Martins RN, Sohrabi HR. The Peripheral Hearing and Central Auditory Processing Skills of Individuals With Subjective Memory Complaints. Front Neurosci 2020; 14:888. PMID: 32982675. PMCID: PMC7475691. DOI: 10.3389/fnins.2020.00888.
Abstract
Purpose: This study examined the central auditory processing (CAP) assessment results of adults between 45 and 85 years of age with probable pre-clinical Alzheimer's disease, i.e., individuals with subjective memory complaints (SMCs), as compared to those not reporting significant levels of memory complaints (non-SMCs). It was hypothesized that the SMC group would perform significantly more poorly on tests of central auditory skills than participants with non-SMCs (control group).
Methods: A total of 95 participants were recruited from the larger Western Australia Memory Study and were classified as SMCs (N = 61; 20 males and 41 females, mean age 71.47 ± 7.18 years) and non-SMCs (N = 34; 10 males, 24 females, mean age 68.85 ± 7.69 years). All participants completed a peripheral hearing assessment; a CAP assessment battery including Dichotic Digits, the Duration Pattern Test, Dichotic Sentence Identification, Synthetic Sentence Identification with Ipsilateral Competing Message (SSI-ICM), and Quick Speech-in-Noise; and a cognitive screening assessment.
Results: The SMC group performed significantly more poorly than the control group in the SSI-ICM −10 and −20 dB signal-to-noise conditions. No significant differences were found between the two groups on the peripheral hearing threshold measurements and other CAP assessments.
Conclusions: The results suggest that individuals with SMCs perform poorly on specific CAP assessments in comparison to controls. Poor CAP in SMC individuals may impose a higher cost on their finite pool of cognitive resources. The CAP results provide yet another biomarker supporting the hypothesis that SMCs may be a primary indication of neuropathological changes in the brain. Longitudinal follow-up of individuals with SMCs and decreased CAP abilities should clarify whether this group is at higher risk of developing dementia compared with non-SMCs and SMC individuals without CAP difficulties.
Affiliation(s)
- Dona M P Jayakody
- Ear Science Institute Australia, Subiaco, WA, Australia
- Ear Sciences Centre, Faculty of Health and Medical Sciences, The University of Western Australia, Crawley, WA, Australia
- Jessica M Yiannos
- Ear Science Institute Australia, Subiaco, WA, Australia
- School of Human Sciences, The University of Western Australia, Crawley, WA, Australia
- Peter L Friedland
- Department of Otolaryngology Head Neck Skull Base Surgery, Sir Charles Gairdner Hospital, Nedlands, WA, Australia
- School of Medicine, University of Notre Dame, Fremantle, WA, Australia
- Kevin Taddei
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Simon M Laws
- Collaborative Genomics Group, School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- School of Pharmacy and Biomedical Sciences, Faculty of Health Sciences, Curtin Health Innovation Research Institute, Curtin University, Bentley, WA, Australia
- Michael Weinborn
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- School of Psychological Science, The University of Western Australia, Nedlands, WA, Australia
- Ralph N Martins
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Macquarie University, Sydney, NSW, Australia
- Hamid R Sohrabi
- School of Medical and Health Sciences, Edith Cowan University, Joondalup, WA, Australia
- Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Macquarie University, Sydney, NSW, Australia
- Centre for Healthy Ageing, School of Psychology and Exercise Science, Murdoch University, Murdoch, WA, Australia
7
Canales-Johnson A, Billig AJ, Olivares F, Gonzalez A, Garcia MDC, Silva W, Vaucheret E, Ciraolo C, Mikulan E, Ibanez A, Huepe D, Noreika V, Chennu S, Bekinschtein TA. Dissociable Neural Information Dynamics of Perceptual Integration and Differentiation during Bistable Perception. Cereb Cortex 2020; 30:4563-4580. PMID: 32219312. PMCID: PMC7325715. DOI: 10.1093/cercor/bhaa058.
Abstract
At any given moment, we experience a perceptual scene as a single whole and yet we may distinguish a variety of objects within it. This phenomenon instantiates two properties of conscious perception: integration and differentiation. Integration is the property of experiencing a collection of objects as a unitary percept and differentiation is the property of experiencing these objects as distinct from each other. Here, we evaluated the neural information dynamics underlying integration and differentiation of perceptual contents during bistable perception. Participants listened to a sequence of tones (auditory bistable stimuli) experienced either as a single stream (perceptual integration) or as two parallel streams (perceptual differentiation) of sounds. We computed neurophysiological indices of information integration and information differentiation with electroencephalographic and intracranial recordings. When perceptual alternations were endogenously driven, the integrated percept was associated with an increase in neural information integration and a decrease in neural differentiation across frontoparietal regions, whereas the opposite pattern was observed for the differentiated percept. However, when perception was exogenously driven by a change in the sound stream (no bistability), neural oscillatory power distinguished between percepts but information measures did not. We demonstrate that perceptual integration and differentiation can be mapped to theoretically motivated neural information signatures, suggesting a direct relationship between phenomenology and neurophysiology.
Affiliation(s)
- Andrés Canales-Johnson
- Department of Psychology, University of Cambridge, Cambridge CB2 3EB, UK
- Vicerectoria de Investigacion y Posgrado, Universidad Catolica del Maule, Talca 3480112, Chile
- Alexander J Billig
- Brain and Mind Institute, University of Western Ontario, London N6A 3K7, Canada
- UCL Ear Institute, University College London, London, UK
- Francisco Olivares
- Facultad de Psicologia, Universidad Diego Portales, Santiago 8370076, Chile
- Andrés Gonzalez
- Facultad de Psicologia, Universidad Diego Portales, Santiago 8370076, Chile
- María del Carmen Garcia
- Programa de Cirugía de Epilepsia, Hospital Italiano de Buenos Aires, Buenos Aires C1199ABB, Argentina
- Walter Silva
- Programa de Cirugía de Epilepsia, Hospital Italiano de Buenos Aires, Buenos Aires C1199ABB, Argentina
- Esteban Vaucheret
- Programa de Cirugía de Epilepsia, Hospital Italiano de Buenos Aires, Buenos Aires C1199ABB, Argentina
- Carlos Ciraolo
- Programa de Cirugía de Epilepsia, Hospital Italiano de Buenos Aires, Buenos Aires C1199ABB, Argentina
- Ezequiel Mikulan
- Laboratory of Experimental Psychology and Neuroscience (LPEN), Institute of Cognitive and Translational Neuroscience (INCyT), INECO Foundation, Favaloro University, Buenos Aires 1126, Argentina
- Agustín Ibanez
- Laboratory of Experimental Psychology and Neuroscience (LPEN), Institute of Cognitive and Translational Neuroscience (INCyT), INECO Foundation, Favaloro University, Buenos Aires 1126, Argentina
- National Scientific and Technical Research Council (CONICET), Buenos Aires, Argentina
- School of Psychology, Center for Social and Cognitive Neuroscience (CSCN), Universidad Adolfo Ibáñez, Santiago 2485, Chile
- David Huepe
- School of Psychology, Center for Social and Cognitive Neuroscience (CSCN), Universidad Adolfo Ibáñez, Santiago 2485, Chile
- Valdas Noreika
- Department of Psychology, University of Cambridge, Cambridge CB2 3EB, UK
- Srivas Chennu
- School of Computing, University of Kent, Chatham ME4 4AG, UK
- Department of Clinical Neurosciences, University of Cambridge, Cambridge CB2 3EB, UK
8
Szalárdy O, Tóth B, Farkas D, Orosz G, Honbolygó F, Winkler I. Linguistic predictability influences auditory stimulus classification within two concurrent speech streams. Psychophysiology 2020; 57:e13547. DOI: 10.1111/psyp.13547.
Affiliation(s)
- Orsolya Szalárdy
- Faculty of Medicine, Institute of Behavioural Sciences, Semmelweis University, Budapest, Hungary
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Brigitta Tóth
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Dávid Farkas
- Analytics Development, Performance Management and Analytics, Business Development, Integrated Supply Chain Management, Nokia Business Services, Nokia Operations, Nokia, Budapest, Hungary
- Gábor Orosz
- Department of Psychology, Stanford University, Stanford, CA, USA
- Ferenc Honbolygó
- Brain Imaging Centre, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
9
Little DF, Snyder JS, Elhilali M. Ensemble modeling of auditory streaming reveals potential sources of bistability across the perceptual hierarchy. PLoS Comput Biol 2020; 16:e1007746. PMID: 32275706. PMCID: PMC7185718. DOI: 10.1371/journal.pcbi.1007746.
Abstract
Perceptual bistability (the spontaneous, irregular fluctuation of perception between two interpretations of a stimulus) occurs when observing a large variety of ambiguous stimulus configurations. This phenomenon has the potential to serve as a tool for, among other things, understanding how function varies across individuals, given the large individual differences that manifest during perceptual bistability. Yet it remains difficult to interpret the functional processes at work without knowing where bistability arises during perception. In this study we explore the hypothesis that bistability originates from multiple sources distributed across the perceptual hierarchy. We develop a hierarchical model of auditory processing comprising three distinct levels: a Peripheral, tonotopic analysis; a Central analysis computing features found more centrally in the auditory system; and an Object analysis, where sounds are segmented into different streams. We model bistable perception within this system by applying adaptation, inhibition and noise to one or all of the three levels of the hierarchy. We evaluate a large ensemble of variations of this hierarchical model, where each model has a different configuration of adaptation, inhibition and noise. This approach avoids the assumption that a single configuration must be invoked to explain the data. Each model is evaluated based on its ability to replicate two hallmarks of bistability during auditory streaming: the selectivity of bistability to specific stimulus configurations, and the characteristic log-normal pattern of perceptual switches. Consistent with a distributed origin, a broad range of model parameters across this hierarchy leads to a plausible form of perceptual bistability.
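As a toy illustration of the adaptation, inhibition and noise ingredients the model ensemble varies, the sketch below simulates two competing percept units; it is a generic rivalry model under assumed parameter values, not the paper's three-level hierarchy.

```python
import numpy as np

def simulate_switches(T=600.0, dt=0.01, beta=1.1, phi=0.6,
                      tau_a=80.0, sigma=0.08, seed=0):
    """Minimal competition model: two units with mutual inhibition (beta),
    slow adaptation (phi, tau_a) and noise (sigma) produce irregular
    dominance switches, as in bistable streaming.
    """
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    r = np.zeros((n, 2))          # firing rates of the two percept units
    a = np.zeros(2)               # slow adaptation variables
    gain = lambda x: 1.0 / (1.0 + np.exp(-8.0 * (x - 0.5)))
    for i in range(1, n):
        inp = 1.0 - beta * r[i - 1, ::-1] - phi * a   # drive minus inhibition
        noise = sigma * np.sqrt(dt) * rng.standard_normal(2)
        r[i] = r[i - 1] + dt * (-r[i - 1] + gain(inp)) + noise
        a += (dt / tau_a) * (r[i] - a)
    return r.argmax(axis=1)       # index of the dominant unit over time
```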
Affiliation(s)
- David F. Little
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada, United States of America
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
10
Bergevin C, Narayan C, Williams J, Mhatre N, Steeves JKE, Bernstein JGW, Story B. Overtone focusing in biphonic Tuvan throat singing. eLife 2020; 9:e50476. PMID: 32048990. PMCID: PMC7064340. DOI: 10.7554/elife.50476.
Abstract
Khoomei is a unique singing style originating from the Republic of Tuva in central Asia. Singers produce two pitches simultaneously: a booming low-frequency rumble alongside a hovering high-pitched whistle-like tone. The biomechanics of this biphonation are not well understood. Here, we use sound analysis, dynamic magnetic resonance imaging, and vocal tract modeling to demonstrate how biphonation is achieved by modulating vocal tract morphology. Tuvan singers show remarkable control in shaping their vocal tract to narrowly focus the harmonics (or overtones) emanating from their vocal cords. The biphonic sound is a combination of the fundamental pitch and a focused filter state, which is at the higher pitch (1-2 kHz) and formed by merging two formants, thereby greatly enhancing sound production in a very narrow frequency range. Most importantly, we demonstrate that this biphonation is a phenomenon arising from linear filtering rather than from a nonlinear source.
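The linear source-filter account can be sketched in a few lines: a pulse-train source passed through two second-order resonators whose centres are brought close together, approximating the merged-formant "focused filter state". The resonator design and all numeric values here are illustrative assumptions, not measurements from the singers in the study.

```python
import numpy as np
from scipy.signal import lfilter

def resonator(x, center_hz, bandwidth_hz, fs):
    """Second-order all-pole resonance: one 'formant' of the vocal tract."""
    r = np.exp(-np.pi * bandwidth_hz / fs)
    a = [1.0, -2.0 * r * np.cos(2 * np.pi * center_hz / fs), r ** 2]
    return lfilter([1.0 - r], a, x)

# Glottal-like pulse train filtered by two formants moved close together
# (~1.5 kHz), so their combined gain narrowly amplifies a few harmonics.
fs = 44100
source = np.zeros(fs)
source[::fs // 110] = 1.0    # ~110-Hz fundamental (illustrative)
focused = resonator(resonator(source, 1450.0, 100.0, fs), 1550.0, 100.0, fs)
```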
Affiliation(s)
- Christopher Bergevin
- Physics and Astronomy, York University, Toronto, Canada
- Centre for Vision Research, York University, Toronto, Canada
- Fields Institute for Research in Mathematical Sciences, Toronto, Canada
- Kavli Institute of Theoretical Physics, University of California, Santa Barbara, United States
- Chandan Narayan
- Languages, Literatures and Linguistics, York University, Toronto, Canada
- Jennifer KE Steeves
- Centre for Vision Research, York University, Toronto, Canada
- Psychology, York University, Toronto, Canada
- Joshua GW Bernstein
- National Military Audiology & Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, United States
- Brad Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, United States
11
Enhanced auditory disembedding in an interleaved melody recognition test is associated with absolute pitch ability. Sci Rep 2019; 9:7838. PMID: 31127171. PMCID: PMC6534562. DOI: 10.1038/s41598-019-44297-x.
Abstract
Absolute pitch (AP) and autism have recently been associated with each other. Neurocognitive theories of autism could perhaps explain this co-occurrence. This study investigates whether AP musicians show an advantage in an interleaved melody recognition task (IMRT), an auditory version of an embedded figures test often investigated in autism with respect to these theories. A total of N = 59 professional musicians (AP = 27) participated in the study. In each trial a probe melody was followed by an interleaved sequence. Participants had to indicate whether the probe melody was present in the interleaved sequence. Sensitivity index d′ and response bias c were calculated according to signal detection theory. Additionally, a pitch adjustment test measuring fine-grained differences in absolute pitch proficiency, the Autism-Spectrum Quotient, and a visual embedded figures test were conducted. AP possessors outperformed relative pitch (RP) possessors on the overall IMRT and the fully interleaved condition. AP proficiency, visual disembedding, and musicality predicted 39.2% of the variance in the IMRT. No correlations were found between the IMRT and autistic traits. Results are in line with a detail-oriented cognitive style and enhanced perceptual functioning of AP musicians similar to that observed in autism.
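The sensitivity and bias indices named above follow standard signal detection theory; a minimal sketch, assuming a yes/no design and a log-linear correction for extreme rates (the study may have corrected differently):

```python
from scipy.stats import norm

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' and response bias c for a yes/no detection task.

    Adds 0.5 to every cell (log-linear correction) so hit or false-alarm
    rates of exactly 0 or 1 stay finite.
    """
    h = (hits + 0.5) / (hits + misses + 1.0)
    f = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = norm.ppf(h) - norm.ppf(f)
    c = -0.5 * (norm.ppf(h) + norm.ppf(f))
    return d_prime, c

# e.g., 18 hits / 6 misses and 4 false alarms / 20 correct rejections
print(sdt_indices(18, 6, 4, 20))
```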
12
Abstract
Research in speech perception has explored how knowledge of a language influences phonetic perception. The current study investigated whether such linguistic influences extend to the perceptual (sequential) organization of speech. Listeners heard sinewave analogs of word pairs (e.g., loose seam, which contains a single [s] frication but is perceived as two /s/ phonemes) cycle continuously, which causes the stimulus to split apart into foreground and background percepts. They had to identify the foreground percept when the stimuli were heard as nonspeech and then again when heard as speech. Of interest was how grouping changed across listening condition when [s] was heard as speech or as a hiss. Although the section of the signal that was identified as the foreground differed little across listening condition, a strong bias to perceive [s] as forming the onset of the foreground was observed in the speech condition (Experiment 1). This effect was reduced in Experiment 2 by increasing the stimulus repetition rate. Findings suggest that the sequential organization of speech arises from the interaction of auditory and linguistic processes, with the former constraining the latter.
13
Calandruccio L, Buss E, Bencheck P, Jett B. Does the semantic content or syntactic regularity of masker speech affect speech-on-speech recognition? J Acoust Soc Am 2018; 144:3289. PMID: 30599661. PMCID: PMC6786886. DOI: 10.1121/1.5081679.
Abstract
Speech-on-speech recognition differs substantially across stimuli, but it is unclear what role linguistic features of the masker play in this variability. The linguistic similarity hypothesis suggests that similarity between the sentence-level semantic content of the target and masker speech increases masking. Sentence recognition in a two-talker masker was evaluated with respect to the semantic content and syntactic structure of the masker (experiment 1) and the linguistic similarity of the target and masker (experiment 2). Target and masker sentences were semantically meaningful or anomalous. Masker syntax was varied or the same across sentences. When other linguistic features of the masker were controlled, variability in syntactic structure across masker tokens was only relevant when the masker was played continuously (as opposed to gated); when played continuously, sentence-recognition thresholds were poorer with variable than with consistent masker syntax, but this effect was small (0.5 dB). When the syntactic structure of the masker was held constant, semantic meaningfulness of the masker did not increase masking, and at times performance was better for the meaningful than for the anomalous masker. These data indicate that the sentence-level semantic content of masker speech does not influence speech-on-speech masking. Further, there was no evidence that similarity between the sentence-level semantic content of the target and masker increases masking.
Affiliation(s)
- Lauren Calandruccio
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss
- Department of Head/Neck Surgery and Otolaryngology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Penelope Bencheck
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Brandi Jett
- Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
14
Neural Prediction Errors Distinguish Perception and Misperception of Speech. J Neurosci 2018; 38:6076-6089. PMID: 29891730. DOI: 10.1523/jneurosci.3258-17.2018.
Abstract
Humans use prior expectations to improve perception, especially of sensory signals that are degraded or ambiguous. However, if sensory input deviates from prior expectations, then correct perception depends on adjusting or rejecting prior expectations. Failure to adjust or reject the prior leads to perceptual illusions, especially if there is partial overlap (and thus partial mismatch) between expectations and input. With speech, "slips of the ear" occur when expectations lead to misperception. For instance, an entomologist might be more susceptible to hearing "The ants are my friends" for "The answer, my friend" (in the Bob Dylan song "Blowin' in the Wind"). Here, we contrast two mechanisms by which prior expectations may lead to misperception of degraded speech. First, clear representations of the sounds common to the prior and the input (i.e., expected sounds) may lead to incorrect confirmation of the prior. Second, insufficient representation of the sounds that deviate between prior and input (i.e., prediction errors) could lead to deception. We used crossmodal predictions from written words that partially match degraded speech to compare neural responses when male and female human listeners were deceived into accepting the prior or correctly rejected it. Combined behavioral and multivariate representational similarity analysis of fMRI data shows that veridical perception of degraded speech is signaled by representations of prediction error in the left superior temporal sulcus. Instead of using top-down processes to support perception of expected sensory input, our findings suggest that the strength of neural prediction error representations distinguishes correct perception from misperception.
Significance Statement: Misperceiving spoken words is an everyday experience, with outcomes that range from shared amusement to serious miscommunication. For hearing-impaired individuals, frequent misperception can lead to social withdrawal and isolation, with severe consequences for wellbeing. In this work, we specify the neural mechanisms by which prior expectations, which are so often helpful for perception, can lead to misperception of degraded sensory signals. Most descriptive theories of illusory perception explain misperception as arising from a clear sensory representation of features or sounds that are in common between prior expectations and sensory input. Our work instead provides support for a complementary proposal: that misperception occurs when there is an insufficient sensory representation of the deviation between expectations and sensory signals.
15
Popham S, Boebinger D, Ellis DPW, Kawahara H, McDermott JH. Inharmonic speech reveals the role of harmonicity in the cocktail party problem. Nat Commun 2018; 9:2122. PMID: 29844313. PMCID: PMC5974276. DOI: 10.1038/s41467-018-04551-8.
Abstract
The "cocktail party problem" requires us to discern individual sound sources from mixtures of sources. The brain must use knowledge of natural sound regularities for this purpose. One much-discussed regularity is the tendency for frequencies to be harmonically related (integer multiples of a fundamental frequency). To test the role of harmonicity in real-world sound segregation, we developed speech analysis/synthesis tools to perturb the carrier frequencies of speech, disrupting harmonic frequency relations while maintaining the spectrotemporal envelope that determines phonemic content. We find that violations of harmonicity cause individual frequencies of speech to segregate from each other, impair the intelligibility of concurrent utterances despite leaving intelligibility of single utterances intact, and cause listeners to lose track of target talkers. However, additional segregation deficits result from replacing harmonic frequencies with noise (simulating whispering), suggesting additional grouping cues enabled by voiced speech excitation. Our results demonstrate acoustic grouping cues in real-world sound segregation.
Affiliation(s)
- Sara Popham
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA
- Helen Wills Neuroscience Institute, UC Berkeley, Berkeley, CA, 94720, USA
- Dana Boebinger
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA
- Program in Speech and Hearing Sciences, Harvard University, Cambridge, MA, 02138, USA
- Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA
- Program in Speech and Hearing Sciences, Harvard University, Cambridge, MA, 02138, USA
16
Abstract
The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas" (the abstract structure shared by different occurrences of the same type of sound source) and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schemas were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
17
Neural Decoding of Bistable Sounds Reveals an Effect of Intention on Perceptual Organization. J Neurosci 2018; 38:2844-2853. PMID: 29440556. PMCID: PMC5852662. DOI: 10.1523/jneurosci.3022-17.2018.
Abstract
Auditory signals arrive at the ear as a mixture that the brain must decompose into distinct sources based to a large extent on acoustic properties of the sounds. An important question concerns whether listeners have voluntary control over how many sources they perceive. This has been studied using pure high (H) and low (L) tones presented in the repeating pattern HLH-HLH-, which can form a bistable percept heard either as an integrated whole (HLH-) or as segregated into high (H-H-) and low (-L-) sequences. Although instructing listeners to try to integrate or segregate sounds affects reports of what they hear, this could reflect a response bias rather than a perceptual effect. We had human listeners (15 males, 12 females) continuously report their perception of such sequences and recorded neural activity using MEG. During neutral listening, a classifier trained on patterns of neural activity distinguished between periods of integrated and segregated perception. In other conditions, participants tried to influence their perception by allocating attention either to the whole sequence or to a subset of the sounds. They reported hearing the desired percept for a greater proportion of time than when listening neutrally. Critically, neural activity supported these reports; stimulus-locked brain responses in auditory cortex were more likely to resemble the signature of segregation when participants tried to hear segregation than when attempting to perceive integration. These results indicate that listeners can influence how many sound sources they perceive, as reflected in neural responses that track both the input and its perceptual organization.
Significance Statement: Can we consciously influence our perception of the external world? We address this question using sound sequences that can be heard either as coming from a single source or as two distinct auditory streams. Listeners reported spontaneous changes in their perception between these two interpretations while we recorded neural activity to identify signatures of such integration and segregation. They also indicated that they could, to some extent, choose between these alternatives. This claim was supported by corresponding changes in responses in auditory cortex. By linking neural and behavioral correlates of perception, we demonstrate that the number of objects that we perceive can depend not only on the physical attributes of our environment, but also on how we intend to experience it.
18
Espinoza-Varas B, Hilton J, Guo S. Object-based attention modulates the discrimination of level increments in stop-consonant noise bursts. PLoS One 2018; 13:e0190956. PMID: 29364931. PMCID: PMC5783383. DOI: 10.1371/journal.pone.0190956.
Abstract
This study tested the hypothesis that object-based attention modulates the discrimination of level increments in stop-consonant noise bursts. With consonant-vowel-consonant (CvC) words consisting of an ≈80-dB vowel (v), a pre-vocalic (Cv) and a post-vocalic (vC) stop-consonant noise burst (≈60-dB SPL), we measured discrimination thresholds (LDTs) for level increments (ΔL) in the noise bursts presented either in CvC context or in isolation. In the 2-interval 2-alternative forced-choice task, each observation interval presented a CvC word (e.g., /pæk/ /pæk/), and normal-hearing participants had to discern ΔL in the Cv or vC burst. Based on the linguistic word labels, the auditory events of each trial were perceived as two auditory objects (Cv-v-vC and Cv-v-vC) that group together the bursts and vowels, hindering selective attention to ΔL. To discern ΔL in Cv or vC, the events must be reorganized into three auditory objects: the to-be-attended pre-vocalic (Cv-Cv) or post-vocalic burst pair (vC-vC), and the to-be-ignored vowel pair (v-v). Our results suggest that instead of being automatic this reorganization requires training, in spite of using familiar CvC words. Relative to bursts in isolation, bursts in context always produced inferior ΔL discrimination accuracy (a context effect), which depended strongly on the acoustic separation between the bursts and the vowel: discrimination was much keener for the object set apart from the vowel (post-vocalic) than for the object adjoining it (pre-vocalic) (a temporal-position effect). Variability in CvC dimensions that did not alter the noise-burst perceptual grouping had minor effects on discrimination accuracy. In addition to being robust and persistent, these effects are relatively general, appearing in forced-choice tasks with one or two observation intervals, with or without variability in the temporal position of ΔL, and with either fixed or roving CvC standards. The results lend support to the hypothesis.
Affiliation(s)
- Blas Espinoza-Varas
- Department of Communication Sciences & Disorders, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States of America
- Jeremiah Hilton
- Department of Communication Sciences & Disorders, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States of America
- Department of Biostatistics & Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States of America
- Shaoxuan Guo
- College of Medicine, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States of America
19
Shearer DE, Molis MR, Bennett KO, Leek MR. Auditory stream segregation of iterated rippled noises by normal-hearing and hearing-impaired listeners. J Acoust Soc Am 2018; 143:378. PMID: 29390743. PMCID: PMC5785299. DOI: 10.1121/1.5021333.
Abstract
Individuals with hearing loss are thought to be less sensitive to the often subtle variations of acoustic information that support auditory stream segregation. Perceptual segregation can be influenced by differences in both the spectral and temporal characteristics of interleaved stimuli. The purpose of this study was to determine what stimulus characteristics support sequential stream segregation by normal-hearing and hearing-impaired listeners. Iterated rippled noises (IRNs) were used to assess the effects of tonality, spectral resolvability, and hearing loss on the perception of auditory streams in two pitch regions, corresponding to 250 and 1000 Hz. Overall, listeners with hearing loss were significantly less likely to segregate alternating IRNs into two auditory streams than were normal-hearing listeners. Low-pitched IRNs were generally less likely to segregate into two streams than were higher-pitched IRNs. High-pass filtering was a strong contributor to reduced segregation for both groups. The tonality, or pitch strength, of the IRNs had a significant effect on streaming, but the effect was similar for both groups of subjects. These data demonstrate that stream segregation is influenced by many factors including pitch differences, pitch region, spectral resolution, and degree of stimulus tonality, in addition to the loss of auditory sensitivity.
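IRNs are classically generated by a delay-and-add loop: the delay sets the pitch and the number of iterations sets the pitch strength (tonality). A minimal sketch under assumed parameter values, not those of the study:

```python
import numpy as np

def iterated_rippled_noise(pitch_hz=250.0, n_iter=16, gain=1.0,
                           dur=0.5, fs=44100, seed=None):
    """Delay-and-add IRN: each iteration adds a copy of the waveform
    delayed by 1/pitch_hz. More iterations (or gain closer to 1) give
    stronger tonality; the perceived pitch sits near pitch_hz.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(int(dur * fs))
    d = int(round(fs / pitch_hz))              # delay in samples
    for _ in range(n_iter):
        delayed = np.concatenate([np.zeros(d), x[:-d]])
        x = x + gain * delayed
    return x / np.max(np.abs(x))               # normalize to +/-1
```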
Affiliation(s)
- Daniel E Shearer
- National Center for Rehabilitative Auditory Research, Portland VA Healthcare System, Portland, Oregon 97239, USA
- Michelle R Molis
- National Center for Rehabilitative Auditory Research, Portland VA Healthcare System, Portland, Oregon 97239, USA
- Keri O Bennett
- National Center for Rehabilitative Auditory Research, Portland VA Healthcare System, Portland, Oregon 97239, USA
- Marjorie R Leek
- National Center for Rehabilitative Auditory Research, Portland VA Healthcare System, Portland, Oregon 97239, USA
20
Attention Is Required for Knowledge-Based Sequential Grouping: Insights from the Integration of Syllables into Words. J Neurosci 2017; 38:1178-1188. PMID: 29255005. DOI: 10.1523/jneurosci.2606-17.2017.
Abstract
How the brain groups sequential sensory events into chunks is a fundamental question in cognitive neuroscience. This study investigates whether top-down attention or specific tasks are required for the brain to apply lexical knowledge to group syllables into words. Neural responses tracking the syllabic and word rhythms of a rhythmic speech sequence were concurrently monitored using electroencephalography (EEG). The participants performed different tasks, attending to either the rhythmic speech sequence or a distractor, which was another speech stream or a nonlinguistic auditory/visual stimulus. Attention to speech, but not a lexical-meaning-related task, was required for reliable neural tracking of words, even when the distractor was a nonlinguistic stimulus presented cross-modally. Neural tracking of syllables, however, was reliably observed in all tested conditions. These results strongly suggest that neural encoding of individual auditory events (i.e., syllables) is automatic, while knowledge-based construction of temporal chunks (i.e., words) crucially relies on top-down attention.
Significance Statement: Why we cannot understand speech when not paying attention is an old question in psychology and cognitive neuroscience. Speech processing is a complex process that involves multiple stages, e.g., hearing and analyzing the speech sound, recognizing words, and combining words into phrases and sentences. The current study investigates which speech-processing stage is blocked when we do not listen carefully. We show that the brain can reliably encode syllables, basic units of speech sounds, even when we do not pay attention. Nevertheless, when distracted, the brain cannot group syllables into multisyllabic words, which are basic units for speech meaning. Therefore, the process of converting speech sound into meaning crucially relies on attention.
21
Attention is shaped by semantic level of event-structure during speech comprehension: an electroencephalogram study. Cogn Neurodyn 2017; 11:467-481. PMID: 29067134. DOI: 10.1007/s11571-017-9442-4.
Abstract
The present electroencephalogram study used an attention probe paradigm to investigate how semantic and acoustic structures constrain temporal attention during speech comprehension. Spoken sentences were used as stimuli, with each one containing a four-character critical phrase, of which the third character was the target character. We manipulated not only the semantic relationship between the target character and the immediately preceding two characters, but also the presence/absence of a pitch accent on the first character. In addition, an attention probe was either presented concurrently with the target character or not. The results showed that the N1 effect evoked by the attention probe was of larger amplitude and started earlier (enhanced attention) when the target character and the preceding two characters belonged to the same semantic event than when they spanned a semantic-event boundary, and this effect occurred only in the un-accented conditions. The results suggest that, during speech comprehension, the semantic level of event-structure can constrain attention allocation along the temporal dimension, and reverse the attention attenuation effect of prediction; meanwhile, the semantic and acoustic levels of event-structure interact with each other immediately to modulate auditory-temporal attention. The results were discussed with regard to the predictive coding account of attention.
22
Attentional Modulation of Envelope-Following Responses at Lower (93-109 Hz) but Not Higher (217-233 Hz) Modulation Rates. J Assoc Res Otolaryngol 2017; 19:83-97. PMID: 28971333. PMCID: PMC5783923. DOI: 10.1007/s10162-017-0641-9.
Abstract
Directing attention to sounds of different frequencies allows listeners to perceive a sound of interest, like a talker, in a mixture. Whether cortically generated frequency-specific attention affects responses as low as the auditory brainstem is currently unclear. Participants attended to either a high- or low-frequency tone stream, which was presented simultaneously and tagged with different amplitude modulation (AM) rates. In a replication design, we showed that envelope-following responses (EFRs) were modulated by attention only when the stimulus AM rate was slow enough for the auditory cortex to track—and not for stimuli with faster AM rates, which are thought to reflect ‘purer’ brainstem sources. Thus, we found no evidence of frequency-specific attentional modulation that can be confidently attributed to brainstem generators. The results demonstrate that different neural populations contribute to EFRs at higher and lower rates, compatible with cortical contributions at lower rates. The results further demonstrate that stimulus AM rate can alter conclusions of EFR studies.
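Frequency tagging of this kind can be sketched as follows: each stream is a carrier tone multiplied by a sinusoidal envelope at its own AM rate, so the envelope-following response to each stream can be read out at its tag rate. The carrier frequencies and exact AM rates below are illustrative assumptions, not the study's stimulus parameters.

```python
import numpy as np

def am_tone(carrier_hz, am_hz, dur=1.0, depth=1.0, fs=44100):
    """Sinusoidally amplitude-modulated tone for frequency tagging."""
    t = np.arange(int(dur * fs)) / fs
    env = 1.0 + depth * np.sin(2 * np.pi * am_hz * t)
    return env * np.sin(2 * np.pi * carrier_hz * t)

# Two simultaneous streams, each tagged at its own AM rate (values picked
# from the ranges in the title for illustration only).
low_stream = am_tone(carrier_hz=500.0, am_hz=101.0)
high_stream = am_tone(carrier_hz=4000.0, am_hz=223.0)
mixture = 0.5 * (low_stream + high_stream)
```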
23
Stachurski M, Summers RJ, Roberts B. Stream segregation of concurrent speech and the verbal transformation effect: Influence of fundamental frequency and lateralization cues. Hear Res 2017; 354:16-27. PMID: 28843209. DOI: 10.1016/j.heares.2017.07.016.
Abstract
Repeating a recorded word produces verbal transformations (VTs); perceptual regrouping of acoustic-phonetic elements may contribute to this effect. The influence of fundamental frequency (F0) and lateralization grouping cues was explored by presenting two concurrent sequences of the same word resynthesized on different F0s (100 and 178 Hz). In experiment 1, listeners monitored both sequences simultaneously, reporting for each any change in stimulus identity. Three lateralization conditions were used - diotic, ±680-μs interaural time difference, and dichotic. Results were similar for the first two conditions, but fewer forms and later initial transformations were reported in the dichotic condition. This suggests that large lateralization differences per se have little effect - rather, there are more possibilities for regrouping when each ear receives both sequences. In the dichotic condition, VTs reported for one sequence were also more independent of those reported for the other. Experiment 2 used diotic stimuli and explored the effect of the number of sequences presented and monitored. The most forms and earliest transformations were reported when two sequences were presented but only one was monitored, indicating that high task demands decreased reporting of VTs for concurrent sequences. Overall, these findings support the idea that perceptual regrouping contributes to the VT effect.
Affiliation(s)
- Marcin Stachurski
- Psychology, School of Life and Health Sciences, Aston University, Birmingham, B4 7ET, UK
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham, B4 7ET, UK
- Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham, B4 7ET, UK
24
Mehta AH, Yasin I, Oxenham AJ, Shamma S. Neural correlates of attention and streaming in a perceptually multistable auditory illusion. J Acoust Soc Am 2016; 140:2225. PMID: 27794350. PMCID: PMC5849028. DOI: 10.1121/1.4963902.
Abstract
In a complex acoustic environment, acoustic cues and attention interact in the formation of streams within the auditory scene. In this study, a variant of the "octave illusion" [Deutsch (1974). Nature 251, 307-309] was used to investigate the neural correlates of auditory streaming, and to elucidate the effects of attention on the interaction between sequential and concurrent sound segregation in humans. By directing subjects' attention to different frequencies and ears, it was possible to elicit several different illusory percepts with the identical stimulus. The first experiment tested the hypothesis that the illusion depends on the ability of listeners to perceptually stream the target tones from within the alternating sound sequences. In the second experiment, concurrent psychophysical measures and electroencephalography recordings provided neural correlates of the various percepts elicited by the multistable stimulus. The results show that the perception and neural correlates of the auditory illusion can be manipulated robustly by attentional focus and that the illusion is constrained in much the same way as auditory stream segregation, suggesting common underlying mechanisms.
Affiliation(s)
- Anahita H Mehta: Ear Institute, University College London, 332 Gray's Inn Road, London WC1X 8EE, United Kingdom
- Ifat Yasin: Department of Computer Science, University College London, 66-72 Gower Street, London WC1E 6BT, United Kingdom
- Andrew J Oxenham: Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
- Shihab Shamma: Institute for Systems Research, 2203 A.V. Williams Building, University of Maryland, College Park, Maryland 20742, USA

25
Kösem A, Basirat A, Azizi L, van Wassenhove V. High-frequency neural activity predicts word parsing in ambiguous speech streams. J Neurophysiol 2016; 116:2497-2512. [PMID: 27605528 DOI: 10.1152/jn.00074.2016] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2016] [Accepted: 09/03/2016] [Indexed: 11/22/2022] Open
Abstract
During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically signaled an individual's conscious speech percept.
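As a rough illustration of the band-limited measures involved (not the authors' MEG pipeline; the band edges, sampling rate, and synthetic data below are all assumptions), beta- and gamma-band amplitude envelopes can be extracted as follows:

    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    FS = 1000  # assumed sampling rate (Hz)

    def band_envelope(x, lo, hi, fs=FS, order=4):
        # Band-pass x between lo and hi Hz, return its amplitude envelope.
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return np.abs(hilbert(filtfilt(b, a, x)))

    # Synthetic single-channel example: noise plus a 40-Hz (gamma) burst.
    rng = np.random.default_rng(0)
    t = np.arange(2 * FS) / FS
    sig = 0.1 * rng.standard_normal(len(t))
    sig[FS:FS + FS // 4] += np.sin(2 * np.pi * 40 * t[:FS // 4])

    beta = band_envelope(sig, 15, 30)    # assumed beta band (Hz)
    gamma = band_envelope(sig, 30, 80)   # assumed gamma band (Hz)
    peak_latency = t[np.argmax(gamma)]   # crude single-trial latency estimate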
Affiliation(s)
- Anne Kösem: Cognitive Neuroimaging Unit, CEA DRF/I2BM, Institut National de la Santé et de la Recherche Médicale, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Anahita Basirat: Cognitive Neuroimaging Unit, CEA DRF/I2BM, Institut National de la Santé et de la Recherche Médicale, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France; SCALab, Centre National de la Recherche Scientifique UMR 9193, Université Lille, Lille, France
- Leila Azizi: Cognitive Neuroimaging Unit, CEA DRF/I2BM, Institut National de la Santé et de la Recherche Médicale, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
- Virginie van Wassenhove: Cognitive Neuroimaging Unit, CEA DRF/I2BM, Institut National de la Santé et de la Recherche Médicale, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France

26
Wayne RV, Hamilton C, Jones Huyck J, Johnsrude IS. Working Memory Training and Speech in Noise Comprehension in Older Adults. Front Aging Neurosci 2016; 8:49. [PMID: 27047370 PMCID: PMC4801856 DOI: 10.3389/fnagi.2016.00049] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2015] [Accepted: 02/22/2016] [Indexed: 11/16/2022] Open
Abstract
Understanding speech in the presence of background sound can be challenging for older adults. Speech comprehension in noise appears to depend on working memory and executive-control processes (e.g., Heald and Nusbaum, 2014), and their augmentation through training may have rehabilitative potential for age-related hearing loss. We examined the efficacy of adaptive working-memory training (Cogmed; Klingberg et al., 2002) in 24 older adults, assessing generalization to other working-memory tasks (near transfer) and to other cognitive domains (far transfer) using a cognitive test battery, including the Reading Span test, sensitive to working memory (e.g., Daneman and Carpenter, 1980). We also assessed far transfer to speech-in-noise performance, including a closed-set sentence task (Kidd et al., 2008). To examine the effect of cognitive training on the benefit obtained from semantic context, we also assessed transfer to open-set sentences; half were semantically coherent (high-context) and half were semantically anomalous (low-context). Subjects completed 25 sessions (0.5–1 h each; 5 sessions/week) of both adaptive working-memory training and placebo training over 10 weeks in a crossover design. Subjects' scores on the adaptive working-memory training tasks improved as a result of training. However, training did not transfer to other working-memory tasks, nor to tasks recruiting other cognitive domains. We did not observe any training-related improvement in speech-in-noise performance. Measures of working memory correlated with the intelligibility of low-context, but not high-context, sentences, suggesting that sentence context may reduce the load on working memory. The Reading Span test correlated significantly only with a test of visual episodic memory, suggesting that the Reading Span test is not a pure test of working memory, as is commonly assumed.
Affiliation(s)
- Rachel V Wayne: Department of Psychology, Queen's University, Kingston, ON, Canada
- Cheryl Hamilton: Department of Psychology, Queen's University, Kingston, ON, Canada
- Ingrid S Johnsrude: Department of Psychology, Queen's University, Kingston, ON, Canada; Department of Psychology, School of Communication Sciences and Disorders, The Brain and Mind Institute, University of Western Ontario, London, ON, Canada

27
Masutomi K, Barascud N, Kashino M, McDermott JH, Chait M. Sound segregation via embedded repetition is robust to inattention. J Exp Psychol Hum Percept Perform 2015; 42:386-400. [PMID: 26480248 PMCID: PMC4763252 DOI: 10.1037/xhp0000147] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The segregation of sound sources from the mixture of sounds that enters the ear is a core capacity of human hearing, but the extent to which this process is dependent on attention remains unclear. This study investigated the effect of attention on the ability to segregate sounds via repetition. We utilized a dual task design in which stimuli to be segregated were presented along with stimuli for a "decoy" task that required continuous monitoring. The task to assess segregation presented a target sound 10 times in a row, each time concurrent with a different distractor sound. McDermott, Wrobleski, and Oxenham (2011) demonstrated that repetition causes the target sound to be segregated from the distractors. Segregation was queried by asking listeners whether a subsequent probe sound was identical to the target. A control task presented similar stimuli but probed discrimination without engaging segregation processes. We present results from 3 different decoy tasks: a visual multiple object tracking task, a rapid serial visual presentation (RSVP) digit encoding task, and a demanding auditory monitoring task. Load was manipulated by using high- and low-demand versions of each decoy task. The data provide converging evidence of a small effect of attention that is nonspecific, in that it affected the segregation and control tasks to a similar extent. In all cases, segregation performance remained high despite the presence of a concurrent, objectively demanding decoy task. The results suggest that repetition-based segregation is robust to inattention.
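A minimal sketch of the repetition paradigm follows, with shaped noise standing in for the naturalistic target and distractor sounds; the durations and the sound generator are assumptions, not the study's stimuli.

    import numpy as np

    rng = np.random.default_rng(0)
    FS, DUR, N_REPS = 44100, 0.5, 10     # assumed sample rate and durations
    N = int(FS * DUR)

    def novel_sound(n):
        # Stand-in for a naturalistic novel sound: crudely low-passed noise.
        spec = np.fft.rfft(rng.standard_normal(n))
        spec[200:] = 0
        x = np.fft.irfft(spec, n)
        return x / np.max(np.abs(x))

    target = novel_sound(N)
    # The same target is mixed with a different distractor on each
    # presentation; its repetition is what lets listeners segregate it.
    mixtures = [target + novel_sound(N) for _ in range(N_REPS)]
    trial = np.concatenate(mixtures)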
Affiliation(s)
- Keiko Masutomi: Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
- Makio Kashino: Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
- Josh H McDermott: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

28
Billig AJ, Carlyon RP. Automaticity and primacy of auditory streaming: Concurrent subjective and objective measures. J Exp Psychol Hum Percept Perform 2015; 42:339-353. [PMID: 26414168 PMCID: PMC4763253 DOI: 10.1037/xhp0000146] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Two experiments used subjective and objective measures to study the automaticity and primacy of auditory streaming. Listeners heard sequences of “ABA–” triplets, where “A” and “B” were tones of different frequencies and “–” was a silent gap. Segregation was more frequently reported, and rhythmically deviant triplets less well detected, for a greater between-tone frequency separation and later in the sequence. In Experiment 1, performing a competing auditory task for the first part of the sequence led to a reduction in subsequent streaming compared to when the tones were attended throughout. This is consistent with focused attention promoting streaming, and/or with attention switches resetting it. However, the proportion of segregated reports increased more rapidly following a switch than at the start of a sequence, indicating that some streaming occurred automatically. Modeling ruled out a simple “covert attention” account of this finding. Experiment 2 required listeners to perform subjective and objective tasks concurrently. It revealed superior performance during integrated compared to segregated reports, beyond that explained by the codependence of the two measures on stimulus parameters. We argue that listeners have limited access to low-level stimulus representations once perceptual organization has occurred, and that subjective and objective streaming measures partly index the same processes.
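The "ABA–" sequence is straightforward to synthesize. A minimal sketch, assuming 100-ms tones, a gap equal to one tone duration, and a 500-Hz A tone (the study's exact timing and frequencies may differ):

    import numpy as np

    FS = 44100  # assumed sample rate (Hz)

    def pure_tone(freq, dur, fs=FS):
        t = np.arange(int(dur * fs)) / fs
        return np.sin(2 * np.pi * freq * t) * np.hanning(len(t))

    def aba_sequence(f_a=500.0, semitones=6, tone_dur=0.1, n_triplets=30):
        # Each triplet is A-B-A followed by a silent gap ("-").
        f_b = f_a * 2 ** (semitones / 12)  # larger separation favors streaming
        a, b = pure_tone(f_a, tone_dur), pure_tone(f_b, tone_dur)
        gap = np.zeros(int(tone_dur * FS))
        triplet = np.concatenate([a, b, a, gap])
        return np.concatenate([triplet] * n_triplets)

    seq = aba_sequence(semitones=6)

With small frequency separations listeners tend to hear a single "galloping" rhythm (integration); with large separations, two parallel streams (segregation).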
29
Roaring lions and chirruping lemurs: How the brain encodes sound objects in space. Neuropsychologia 2015; 75:304-13. [DOI: 10.1016/j.neuropsychologia.2015.06.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2014] [Revised: 06/07/2015] [Accepted: 06/10/2015] [Indexed: 01/29/2023]
30
Golden HL, Agustus JL, Goll JC, Downey LE, Mummery CJ, Schott JM, Crutch SJ, Warren JD. Functional neuroanatomy of auditory scene analysis in Alzheimer's disease. Neuroimage Clin 2015; 7:699-708. [PMID: 26029629 PMCID: PMC4446369 DOI: 10.1016/j.nicl.2015.02.019] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Revised: 01/16/2015] [Accepted: 02/24/2015] [Indexed: 11/28/2022]
Abstract
Auditory scene analysis is a demanding computational process that is performed automatically and efficiently by the healthy brain but is vulnerable to the neurodegenerative pathology of Alzheimer's disease. Here we assessed the functional neuroanatomy of auditory scene analysis in Alzheimer's disease (AD) using the well-known 'cocktail party effect' as a model paradigm, whereby stored templates for auditory objects (e.g., hearing one's spoken name) are used to segregate auditory 'foreground' and 'background'. Patients with typical amnestic AD (n = 13) and age-matched healthy individuals (n = 17) underwent functional 3T-MRI using a sparse acquisition protocol with passive listening to auditory stimulus conditions comprising the participant's own name interleaved with or superimposed on multi-talker babble, and spectrally rotated (unrecognisable) analogues of these conditions. Name identification (conditions containing the participant's own name contrasted with spectrally rotated analogues) produced extensive bilateral activation involving superior temporal cortex in both the AD and healthy control groups, with no significant differences between groups. Auditory object segregation (conditions with interleaved name sounds contrasted with superimposed name sounds) produced activation of right posterior superior temporal cortex in both groups, again with no differences between groups. However, the cocktail party effect (interaction of own-name identification with auditory object segregation) produced activation of right supramarginal gyrus in the AD group that was significantly enhanced compared with the healthy control group. The findings delineate an altered functional neuroanatomical profile of auditory scene analysis in Alzheimer's disease that may constitute a novel computational signature of this neurodegenerative pathology.
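Spectral rotation, the control manipulation used here, can be approximated by flipping the spectrum about a fixed axis, which preserves spectrotemporal complexity while destroying intelligibility. A minimal FFT-based sketch, assuming a 2-kHz rotation axis (not necessarily the study's parameters or exact method):

    import numpy as np

    def spectrally_rotate(x, fs, axis_hz=2000.0):
        # Flip the spectrum of x about axis_hz; energy above 2*axis_hz
        # is discarded.
        n = len(x)
        spec = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(n, 1 / fs)
        rotated = np.zeros_like(spec)
        keep = freqs <= 2 * axis_hz
        # Each retained bin takes its value from the bin mirrored
        # about the rotation axis.
        src = np.round((2 * axis_hz - freqs[keep]) * n / fs).astype(int)
        valid = (src >= 0) & (src < len(spec))
        rotated[np.nonzero(keep)[0][valid]] = spec[src[valid]]
        return np.fft.irfft(rotated, n)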
Affiliation(s)
- Hannah L Golden, Jennifer L Agustus, Johanna C Goll, Laura E Downey, Catherine J Mummery, Jonathan M Schott, Sebastian J Crutch, Jason D Warren: Dementia Research Centre, UCL Institute of Neurology, University College London, London, UK

31
The verbal transformation effect and the perceptual organization of speech: influence of formant transitions and F0-contour continuity. Hear Res 2015; 323:22-31. [PMID: 25620314 DOI: 10.1016/j.heares.2015.01.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/13/2014] [Revised: 01/09/2015] [Accepted: 01/12/2015] [Indexed: 11/22/2022]
Abstract
This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners only reported more forms in the transitions-removed condition for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.
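A minimal sketch of the F0-contour manipulation follows, using pure-tone glides as stand-ins for the STRAIGHT-resynthesized monosyllables; the token duration and starting F0 are assumptions.

    import numpy as np

    FS, TOKEN_DUR, F_LOW = 44100, 0.4, 100.0   # assumed values

    def glide(f_start, f_end, dur=TOKEN_DUR, fs=FS):
        # Sine "token" whose F0 glides exponentially from f_start to f_end.
        t = np.arange(int(dur * fs)) / fs
        f = f_start * (f_end / f_start) ** (t / dur)
        phase = 2 * np.pi * np.cumsum(f) / fs   # integrate instantaneous F0
        return np.sin(phase)

    rising = glide(F_LOW, 2 * F_LOW)
    falling = glide(2 * F_LOW, F_LOW)

    n_tokens = 8
    # Same contour on every token: an abrupt octave jump at each join.
    discontinuous = np.concatenate([rising] * n_tokens)
    # Alternating contours: the F0 trajectory is continuous across joins.
    continuous = np.concatenate([rising if i % 2 == 0 else falling
                                 for i in range(n_tokens)])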
32
Clarke J, Gaudrain E, Chatterjee M, Başkent D. T'ain't the way you say it, it's what you say--perceptual continuity of voice and top-down restoration of speech. Hear Res 2014; 315:80-7. [PMID: 25019356 DOI: 10.1016/j.heares.2014.07.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/13/2013] [Revised: 06/25/2014] [Accepted: 07/02/2014] [Indexed: 11/19/2022]
Abstract
Phonemic restoration, or top-down repair of speech, is the ability of the brain to perceptually reconstruct missing speech sounds using remaining speech features, linguistic knowledge, and context. This usually occurs in conditions where the interrupted speech is perceived as continuous. The main goal of this study was to investigate whether voice continuity is necessary for phonemic restoration. Restoration benefit was measured as the improvement in intelligibility of meaningful sentences interrupted with periodic silent gaps after the gaps were filled with noise bursts. A discontinuity was induced in the voice characteristics: the fundamental frequency, the vocal tract length, or both were changed using STRAIGHT to make a talker sound like a different talker from one speech segment to another. Voice discontinuity reduced the global intelligibility of interrupted sentences, confirming the importance of vocal cues for perceptually constructing a speech stream. However, the phonemic restoration benefit persisted across all conditions despite the weaker voice continuity. This finding suggests that participants may have relied more on other cues, such as pitch contours or perhaps even linguistic context, when vocal continuity was disrupted.
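A minimal sketch of the interruption manipulation, assuming a 1.5-Hz interruption rate and a 50% duty cycle (the study's parameters may differ):

    import numpy as np

    rng = np.random.default_rng(1)
    FS = 44100  # assumed sample rate (Hz)

    def interrupt(speech, rate_hz=1.5, duty=0.5, fill_noise=False, fs=FS):
        # Periodically gate `speech`; optionally fill the gaps with noise.
        t = np.arange(len(speech)) / fs
        keep = (t * rate_hz) % 1.0 < duty        # on/off square-wave gate
        if fill_noise:
            gaps = rng.standard_normal(len(speech)) * speech.std()
        else:
            gaps = np.zeros(len(speech))
        return np.where(keep, speech, gaps)

    # speech = ...  # a sentence recording as a float array sampled at FS
    # silent_gaps = interrupt(speech)                    # gaps left silent
    # noise_filled = interrupt(speech, fill_noise=True)  # bursts that can
    #                                                    # promote restoration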
Affiliation(s)
- Jeanne Clarke, Etienne Gaudrain, Deniz Başkent: University of Groningen, University Medical Center Groningen, Department of Otorhinolaryngology/Head and Neck Surgery, Groningen, The Netherlands; University of Groningen, Graduate School of Medical Sciences, Research School of Behavioral and Cognitive Neurosciences, Groningen, The Netherlands