1
Ferrer P, Upadhyay S, Cai JJ, Clement TM. Novel Nuclear Roles for Testis-Specific ACTL7A and ACTL7B Supported by In Vivo Characterizations and AI Facilitated In Silico Mechanistic Modeling with Implications for Epigenetic Regulation in Spermiogenesis. bioRxiv [Preprint] 2024:2024.02.29.582797. PMID: 38464253; PMCID: PMC10925299; DOI: 10.1101/2024.02.29.582797.
Abstract
A mechanistic role for nuclear function of testis-specific actin related proteins (ARPs) is proposed here through contributions of ARP subunit swapping in canonical chromatin regulatory complexes. This is significant to our understanding of both the mechanisms controlling regulation of spermiogenesis and the expanding functional roles of the ARPs in cell biology. Among these roles, actins and ARPs are pivotal not only in cytoskeletal regulation, but also in intranuclear chromatin organization, influencing gene regulation and nucleosome remodeling. This study focuses on two testis-specific ARPs, ACTL7A and ACTL7B, exploring their intranuclear activities and broader implications utilizing combined in vivo, in vitro, and in silico approaches. ACTL7A and ACTL7B, previously associated with structural roles, are hypothesized here to serve in chromatin regulation during germline development. This study confirms the intranuclear presence of ACTL7B in spermatocytes and round spermatids, revealing a potential role in intranuclear processes, and identifies a putative nuclear localization sequence conserved across mammalian ACTL7B, indicating a potentially unique mode of nuclear transport that differs from conventional actin. Ablation of ACTL7B leads to the varied transcriptional changes reported here. Additionally, in the absence of ACTL7A or ACTL7B there is a loss of intranuclear localization of HDAC1 and HDAC3, known regulators of epigenetic acetylation changes that in turn regulate gene expression. Thus, these HDACs are implicated as contributors to the aberrant gene expression observed in the knockout (KO) mouse testis transcriptomic analysis. Furthermore, this study employed and confirmed the accuracy of in silico models to predict ARP interactions with Helicase-SANT-associated (HSA) domains, uncovering putative roles for testis-specific ARPs in nucleosome remodeling complexes. In these models, ACTL7A and ACTL7B were found capable of binding to INO80 and SWI/SNF nucleosome remodeler family members in a manner akin to nuclear actin and ACTL6A. These models thus implicate germline-specific ARP subunit swapping within chromatin regulatory complexes as a potential regulatory mechanism for chromatin and associated molecular machinery adaptations in the nuclear reorganizations required during spermiogenesis. These results hold implications for male fertility and epigenetic programming in the male germline that warrant significant future investigation. In summary, this study reveals that ACTL7A and ACTL7B play intranuclear gene regulation roles in male gametogenesis, adding to previously identified multifaceted roles spanning structural, acrosomal, and flagellar stability. The unique nuclear transport of ACTL7A and ACTL7B, their impact on HDAC nuclear associations and transcriptional processes, and their proposed involvement in nucleosome remodeling complexes, supported by AI-facilitated in silico modeling, contribute to a more comprehensive understanding of the indispensable functions of ARPs broadly in cell biology, and specifically in male fertility.
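The putative nuclear localization sequence (NLS) finding lends itself to a brief illustration. The sketch below is not the authors' analysis and does not reproduce the reported ACTL7B motif; it merely scans a hypothetical protein sequence for the classic monopartite K(K/R)X(K/R) NLS consensus.

```python
import re

# Classic monopartite NLS consensus, K(K/R)X(K/R); the actual putative
# ACTL7B motif reported in the paper is not reproduced here.
NLS_PATTERN = re.compile(r"K[KR].[KR]")

def find_putative_nls(protein_seq):
    """Return (position, motif) pairs from a simple NLS consensus scan."""
    return [(m.start(), m.group()) for m in NLS_PATTERN.finditer(protein_seq)]

# Hypothetical example sequence (not ACTL7B):
print(find_putative_nls("MSTAKKRKLDEAAKRPRQ"))  # [(4, 'KKRK'), (13, 'KRPR')]
```

In practice, candidate motifs from such a scan would then be checked for conservation across mammalian orthologs, as the authors did for ACTL7B.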
Affiliation(s)
- Pierre Ferrer: Interdisciplinary Faculty of Toxicology Program, Texas A&M University, College Station, TX 77843; Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843
- Srijana Upadhyay: Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843
- James J Cai: Department of Veterinary Integrative Biosciences, School of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843
- Tracy M Clement: Interdisciplinary Faculty of Toxicology Program, Texas A&M University, College Station, TX 77843; Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843
2
Luthra S. Why are listeners hindered by talker variability? Psychon Bull Rev 2024; 31:104-121. PMID: 37580454; PMCID: PMC10864679; DOI: 10.3758/s13423-023-02355-6.
Abstract
Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
Affiliation(s)
- Sahil Luthra: Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA
3
Orepic P, Truccolo W, Halgren E, Cash SS, Giraud AL, Proix T. Neural manifolds carry reactivation of phonetic representations during semantic processing. bioRxiv [Preprint] 2024:2023.10.30.564638. PMID: 37961305; PMCID: PMC10634964; DOI: 10.1101/2023.10.30.564638.
Abstract
Traditional models of speech perception posit that neural activity encodes speech through a hierarchy of cognitive processes, from low-level representations of acoustic and phonetic features to high-level semantic encoding. Yet it remains unknown how neural representations are transformed across levels of the speech hierarchy. Here, we analyzed unique microelectrode array recordings of neuronal spiking activity from the human left anterior superior temporal gyrus, a brain region at the interface between phonetic and semantic speech processing, during a semantic categorization task and natural speech perception. We identified distinct neural manifolds for semantic and phonetic features, with a functional separation of the corresponding low-dimensional trajectories. Moreover, phonetic and semantic representations were encoded concurrently and reflected in power increases in the beta and low-gamma local field potentials, suggesting top-down predictive and bottom-up cumulative processes. Our results are the first to demonstrate mechanisms for hierarchical speech transformations that are specific to neuronal population dynamics.
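As a rough illustration of the manifold approach, the sketch below projects simulated spiking data into a low-dimensional space with PCA and averages the projected single-trial trajectories. All sizes and the choice of PCA are assumptions for illustration; the authors' actual dimensionality-reduction pipeline may differ.

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated spike counts: trials x time bins x neurons (all sizes arbitrary)
rng = np.random.default_rng(0)
rates = rng.poisson(5.0, size=(100, 40, 60)).astype(float)

# Pool observations over trials and time, fit a low-dimensional manifold,
# then average the projected trajectories across trials.
pca = PCA(n_components=3)
proj = pca.fit_transform(rates.reshape(-1, 60)).reshape(100, 40, 3)
trajectory = proj.mean(axis=0)   # mean low-dimensional trajectory over time
print(trajectory.shape)          # (40, 3): one 3-D point per time bin
```

Separate phonetic and semantic manifolds would then correspond to fitting such projections on feature-sorted trial groups and comparing the resulting subspaces.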
Affiliation(s)
- Pavo Orepic: Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Wilson Truccolo: Department of Neuroscience, Brown University, Providence, Rhode Island, USA; Carney Institute for Brain Science, Brown University, Providence, Rhode Island, USA
- Eric Halgren: Department of Neuroscience & Radiology, University of California San Diego, La Jolla, California, USA
- Sydney S Cash: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Anne-Lise Giraud: Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland; Institut Pasteur, Université Paris Cité, Hearing Institute, Paris, France
- Timothée Proix: Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
4
Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023; 166:377-424. PMID: 37506665; DOI: 10.1016/j.cortex.2023.05.003.
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic, signal normalization, (2) changes in/selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that-for the first time-implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that-at the level of data analysis presently employed by most studies in the field-the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
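The three hypothesized mechanisms can be made concrete with a toy ideal-observer model. The sketch below is a minimal stand-in, not the authors' implementation (their R library is shared via OSF); all parameter names and values are hypothetical. It also illustrates the paper's central point: each mechanism shifts the categorization response, so a shift alone does not identify the mechanism.

```python
import numpy as np
from scipy.stats import norm

def p_category_b(cue, shift=0.0, mu_a=-1.0, mu_b=1.0, sd=1.0,
                 learned_mu_b=None, bias=0.0):
    """Toy adaptive-speech-perception model (illustrative only).

    shift        -- mechanism 1: pre-linguistic signal normalization
    learned_mu_b -- mechanism 2: updated linguistic representation of B
    bias         -- mechanism 3: post-perceptual decision bias (log-odds)
    """
    x = cue - shift                              # normalize the incoming cue
    mu_b_eff = learned_mu_b if learned_mu_b is not None else mu_b
    log_odds = norm.logpdf(x, mu_b_eff, sd) - norm.logpdf(x, mu_a, sd) + bias
    return 1.0 / (1.0 + np.exp(-log_odds))

cue = 0.2
print(p_category_b(cue))                       # baseline response
print(p_category_b(cue, shift=-0.5))           # normalization account
print(p_category_b(cue, learned_mu_b=0.5))     # representation account
print(p_category_b(cue, bias=1.0))             # decision account
# All three manipulations raise p(B) for the same cue, so the direction of
# an exposure effect cannot by itself distinguish the mechanisms.
```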
Affiliation(s)
- Xin Xie: Language Science, University of California, Irvine, USA
- T Florian Jaeger: Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA
- Chigusa Kurumada: Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
5
Oganian Y, Bhaya-Grossman I, Johnson K, Chang EF. Vowel and formant representation in the human auditory speech cortex. Neuron 2023; 111:2105-2118.e4. PMID: 37105171; PMCID: PMC10330593; DOI: 10.1016/j.neuron.2023.04.004.
Abstract
Vowels, a fundamental component of human speech across all languages, are cued acoustically by formants, resonance frequencies of the vocal tract shape during speaking. An outstanding question in neurolinguistics is how formants are processed neurally during speech perception. To address this, we collected high-density intracranial recordings from the human speech cortex on the superior temporal gyrus (STG) while participants listened to continuous speech. We found that two-dimensional receptive fields based on the first two formants provided the best characterization of vowel sound representation. Neural activity at single sites was highly selective for zones in this formant space. Furthermore, formant tuning is adjusted dynamically for speaker-specific spectral context. However, the entire population of formant-encoding sites was required to accurately decode single vowels. Overall, our results reveal that complex acoustic tuning in the two-dimensional formant space underlies local vowel representations in STG. As a population code, this gives rise to phonological vowel perception.
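A two-dimensional formant receptive field can be caricatured as Gaussian tuning over (F1, F2). The centers and widths below are invented for illustration and are not the fitted receptive fields from the study.

```python
import numpy as np

def formant_rf_response(f1, f2, center=(500.0, 900.0),
                        widths=(150.0, 300.0), gain=1.0):
    """Response of one hypothetical STG site with 2-D Gaussian tuning
    over the first two formants (Hz)."""
    z1 = (f1 - center[0]) / widths[0]
    z2 = (f2 - center[1]) / widths[1]
    return gain * np.exp(-0.5 * (z1**2 + z2**2))

# A site tuned to a back-vowel zone of formant space:
print(formant_rf_response(450, 900))    # /o/-like formants: strong response
print(formant_rf_response(300, 2300))   # /i/-like formants: near zero
```

Speaker-specific adjustment would correspond to shifting or rescaling the formant axes by the talker's spectral context before evaluating such tuning curves.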
Affiliation(s)
- Yulia Oganian: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Ilina Bhaya-Grossman: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA; University of California, Berkeley-University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA 94720, USA
- Keith Johnson: Department of Linguistics, University of California, Berkeley, Berkeley, CA, USA
- Edward F Chang: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
6
Chen F, Zhang K, Guo Q, Lv J. Development of Achieving Constancy in Lexical Tone Identification With Contextual Cues. J Speech Lang Hear Res 2023; 66:1148-1164. PMID: 36995907; DOI: 10.1044/2022_jslhr-22-00257.
Abstract
PURPOSE The aim of this study was to explore when and how Mandarin-speaking children use contextual cues to normalize speech variability in perceiving lexical tones. Two different cognitive mechanisms underlying speech normalization (lower level acoustic normalization and higher level acoustic-phonemic normalization) were investigated through a lexical tone identification task in nonspeech and speech contexts, respectively. A further aim was to reveal how domain-general cognitive abilities contribute to the development of the speech normalization process. METHOD Ninety-four 5- to 8-year-old Mandarin-speaking children (50 boys, 44 girls) and 24 young adults (14 men, 10 women) were asked to identify ambiguous Mandarin high-level and mid-rising tones in either speech or nonspeech contexts. We also tested participants' pitch sensitivity with a nonlinguistic pitch discrimination task and their working memory with the digit span task. RESULTS Higher level acoustic-phonemic normalization of lexical tones emerged at the age of 6 years and was relatively stable thereafter. However, lower level acoustic normalization was less stable across different ages. Neither pitch sensitivity nor working memory affected children's lexical tone normalization. CONCLUSIONS Mandarin-speaking children above 6 years of age successfully achieved constancy in lexical tone normalization based on speech contextual cues. The perceptual normalization of lexical tones was not affected by pitch sensitivity or working memory capacity.
Affiliation(s)
- Fei Chen: School of Foreign Languages, Hunan University, Changsha, China
- Kaile Zhang: Centre for Cognitive and Brain Sciences, University of Macau, China
- Qingqing Guo: School of Foreign Languages, Hunan University, Changsha, China
- Jia Lv: School of Foreign Languages and Literature, Wuhan University, China
7
Klimovich-Gray A, Di Liberto G, Amoruso L, Barrena A, Agirre E, Molinaro N. Increased top-down semantic processing in natural speech linked to better reading in dyslexia. Neuroimage 2023; 273:120072. PMID: 37004829; DOI: 10.1016/j.neuroimage.2023.120072.
Abstract
Early research proposed that individuals with developmental dyslexia use contextual information to facilitate lexical access and compensate for phonological deficits. Yet at present there is no corroborating neuro-cognitive evidence. We explored this with a novel combination of magnetoencephalography (MEG), neural encoding and grey matter volume analyses. We analysed MEG data from 41 adult native Spanish speakers (14 with dyslexic symptoms) who passively listened to naturalistic sentences. We used multivariate Temporal Response Function analysis to capture online cortical tracking of both auditory (speech envelope) and contextual information. To compute contextual information tracking we used word-level Semantic Surprisal derived using a Transformer Neural Network language model. We related online information tracking to participants' reading scores and grey matter volumes within the reading-linked cortical network. We found that right hemisphere envelope tracking was related to better phonological decoding (pseudoword reading) for both groups, with dyslexic readers performing worse overall at this task. Consistently, grey matter volume in the superior temporal and bilateral inferior frontal areas increased with better envelope tracking abilities. Critically, for dyslexic readers only, stronger Semantic Surprisal tracking in the right hemisphere was related to better word reading. These findings further support the notion of a speech envelope tracking deficit in dyslexia and provide novel evidence for top-down semantic compensatory mechanisms.
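For readers unfamiliar with temporal response functions (TRFs), the sketch below shows the core computation in miniature: ridge regression of a neural signal onto time-lagged copies of a stimulus feature, which could be an envelope series or word-onset impulses scaled by surprisal. It is a single-feature toy with made-up data, not the authors' multivariate pipeline.

```python
import numpy as np

def estimate_trf(feature, response, max_lag, alpha=1.0):
    """Single-feature TRF via ridge regression on a lagged design matrix."""
    n = len(response)
    X = np.zeros((n, max_lag + 1))
    for lag in range(max_lag + 1):
        X[lag:, lag] = feature[:n - lag]             # shift feature by lag
    ridge = X.T @ X + alpha * np.eye(max_lag + 1)    # regularized normal equations
    return np.linalg.solve(ridge, X.T @ response)

# Toy data: the "neural" response echoes the feature at a 5-sample delay.
rng = np.random.default_rng(1)
feature = rng.standard_normal(2000)
response = np.roll(feature, 5) + 0.1 * rng.standard_normal(2000)
print(estimate_trf(feature, response, max_lag=10).round(2))  # peak at lag 5
```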
8
Kapadia AM, Tin JAA, Perrachione TK. Multiple sources of acoustic variation affect speech processing efficiency. J Acoust Soc Am 2023; 153:209. PMID: 36732274; PMCID: PMC9836727; DOI: 10.1121/10.0016611.
Abstract
Phonetic variability across talkers imposes additional processing costs during speech perception, evident in performance decrements when listening to speech from multiple talkers. However, within-talker phonetic variation is a less well-understood source of variability in speech, and it is unknown how processing costs from within-talker variation compare to those from between-talker variation. Here, listeners performed a speeded word identification task in which three dimensions of variability were factorially manipulated: between-talker variability (single vs multiple talkers), within-talker variability (single vs multiple acoustically distinct recordings per word), and word-choice variability (two- vs six-word choices). All three sources of variability led to reduced speech processing efficiency. Between-talker variability affected both word-identification accuracy and response time, but within-talker variability affected only response time. Furthermore, between-talker variability, but not within-talker variability, had a greater impact when the target phonological contrasts were more similar. Together, these results suggest that natural between- and within-talker variability reflect two distinct magnitudes of common acoustic-phonetic variability: Both affect speech processing efficiency, but they appear to have qualitatively and quantitatively unique effects due to differences in their potential to obscure acoustic-phonemic correspondences across utterances.
Affiliation(s)
- Alexandra M Kapadia: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jessica A A Tin: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Tyler K Perrachione: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
9
Hullett PW, Kandahari N, Shih TT, Kleen JK, Knowlton RC, Rao VR, Chang EF. Intact speech perception after resection of dominant hemisphere primary auditory cortex for the treatment of medically refractory epilepsy: illustrative case. J Neurosurg Case Lessons 2022; 4:CASE22417. PMID: 36443954; PMCID: PMC9705521; DOI: 10.3171/case22417.
Abstract
BACKGROUND In classic speech network models, the primary auditory cortex is the source of auditory input to Wernicke's area in the posterior superior temporal gyrus (pSTG). Because resection of the primary auditory cortex in the dominant hemisphere removes inputs to the pSTG, there is a risk of speech impairment. However, recent research has shown the existence of other, nonprimary auditory cortex inputs to the pSTG, potentially reducing the risk of primary auditory cortex resection in the dominant hemisphere. OBSERVATIONS Here, the authors present a clinical case of a woman with severe medically refractory epilepsy with a lesional epileptic focus in the left (dominant) Heschl's gyrus. Analysis of neural responses to speech stimuli was consistent with primary auditory cortex localization to Heschl's gyrus. Although the primary auditory cortex was within the proposed resection margins, she underwent lesionectomy with total resection of Heschl's gyrus. Postoperatively, she had no speech deficits and her seizures were fully controlled. LESSONS While resection of the dominant hemisphere Heschl's gyrus/primary auditory cortex warrants caution, this case illustrates the ability to resect the primary auditory cortex without speech impairment and supports recent models of multiple parallel inputs to the pSTG.
Affiliation(s)
- Patrick W. Hullett: Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Nazineen Kandahari: Department of Neurosurgery, University of California San Francisco, San Francisco, California; Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Tina T. Shih: Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Jonathan K. Kleen: Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Robert C. Knowlton: Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Vikram R. Rao: Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Edward F. Chang: Department of Neurosurgery, University of California San Francisco, San Francisco, California
10
Díaz B, Cordero G, Hoogendoorn J, Sebastian-Galles N. Second-language phoneme learning positively relates to voice recognition abilities in the native language: Evidence from behavior and brain potentials. Front Psychol 2022; 13:1008963. PMID: 36312157; PMCID: PMC9596972; DOI: 10.3389/fpsyg.2022.1008963.
Abstract
Previous studies suggest a relationship between second-language learning and voice recognition processes, but the nature of this relationship remains poorly understood. The present study investigates whether phoneme learning relates to voice recognition. A group of bilinguals who varied in their discrimination of a second-language phoneme contrast participated in this study. We assessed participants' voice recognition skills in their native language at the behavioral and brain electrophysiological levels during a voice-avatar learning paradigm. Second-language phoneme discrimination positively correlated with behavioral and brain measures of voice recognition. At the electrophysiological level, correlations were present at two time windows and are interpreted within the dual-process model of recognition memory. The results are relevant to understanding the processes involved in language learning, as they show shared variability between second-language phoneme discrimination and voice recognition processes.
Affiliation(s)
- Begoña Díaz: Department of Basic Sciences, Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain (corresponding author)
- Gaël Cordero: Department of Basic Sciences, Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
- Joyce Hoogendoorn: Center for Brain and Cognition, University Pompeu Fabra, Barcelona, Spain
11
Si C, Zhang C, Lau P, Yang Y, Li B. Modelling representations in speech normalization of prosodic cues. Sci Rep 2022; 12:14635. PMID: 36030274; PMCID: PMC9420126; DOI: 10.1038/s41598-022-18838-w.
Abstract
The lack of invariance problem in speech perception refers to the fundamental problem of how listeners deal with differences in speech sounds produced by various speakers. The current study is the first to test the contribution of mentally stored distributional information to the normalization of prosodic cues. We first modelled distributions of acoustic cues from a speech corpus, then conducted three experiments using both naturally produced lexical tones with estimated distributions and manipulated lexical tones with f0 values generated from simulated distributions. State-of-the-art statistical techniques were used to examine the effects of distribution parameters on normalization and on identification curves with respect to each parameter. Based on the significant effects of distribution parameters, we propose a probabilistic parametric representation (PPR), integrating knowledge from previously established distributions of speakers with their indexical information. PPR is still accessed during speech perception even when contextual information is present. We also discuss the normalization of speech signals produced by an unfamiliar talker, with and without context, and access to long-term stored representations.
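The role of distribution parameters in normalization can be caricatured as classification after z-scoring against a stored speaker distribution. The Gaussian tone targets and speaker parameters below are invented; the sketch only shows how a mean and standard deviation, the kind of parameters a PPR would store, enter the computation.

```python
from scipy.stats import norm

def identify_tone(f0, speaker_mu, speaker_sd, high_z=1.0, mid_z=0.0, sd_z=0.5):
    """Classify an f0 value (Hz) as a high vs. mid tone after z-scoring
    against a stored distribution for this speaker (illustrative only)."""
    z = (f0 - speaker_mu) / speaker_sd            # speaker normalization
    p_high = norm.pdf(z, high_z, sd_z)
    p_mid = norm.pdf(z, mid_z, sd_z)
    return ("high" if p_high > p_mid else "mid"), round(z, 2)

# The same 220-Hz token flips category across stored speaker distributions:
print(identify_tone(220, speaker_mu=200, speaker_sd=20))  # ('high', 1.0)
print(identify_tone(220, speaker_mu=240, speaker_sd=20))  # ('mid', -1.0)
```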
Affiliation(s)
- Chen Si: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China; Hong Kong Polytechnic University-Peking University Research Centre on Chinese Linguistics, Kowloon, Hong Kong SAR, China; Research Centre for Language, Cognition, and Neuroscience, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Caicai Zhang: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China; Hong Kong Polytechnic University-Peking University Research Centre on Chinese Linguistics, Kowloon, Hong Kong SAR, China; Research Centre for Language, Cognition, and Neuroscience, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Puiyin Lau: Department of Statistics and Actuarial Science, University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Yike Yang: Department of Chinese Language and Literature, Hong Kong Shue Yan University, North Point, Hong Kong SAR, China
- Bei Li: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
12
Krumbiegel J, Ufer C, Blank H. Influence of voice properties on vowel perception depends on speaker context. J Acoust Soc Am 2022; 152:820. PMID: 36050169; DOI: 10.1121/10.0013363.
Abstract
Different speakers produce the same intended vowel with very different physical properties. Fundamental frequency (F0) and formant frequencies (FF), the two main parameters that discriminate between voices, also influence vowel perception. While it has been shown that listeners comprehend speech more accurately if they are familiar with a talker's voice, it is still unclear how such prior information is used when decoding the speech stream. In three online experiments, we examined the influence of speaker context via F0 and FF shifts on the perception of /o/-/u/ vowel contrasts. Participants perceived vowels from an /o/-/u/ continuum shifted toward /u/ when F0 was lowered or FF increased relative to the original speaker's voice, and vice versa. This shift was reduced when the speakers were presented in a block-wise context compared to random order. Conversely, the original base voice was perceived to be shifted toward /u/ when presented in the context of a low F0 or high FF speaker, compared to a shift toward /o/ with a high F0 or low FF speaker context. These findings demonstrate that F0 and FF jointly influence vowel perception in speaker context.
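Shifts of this kind are typically quantified by fitting a psychometric function per context and comparing boundary locations. The response proportions below are invented for illustration, not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, x0, k):
    """Probability of a /u/ response along the /o/-/u/ continuum."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8, dtype=float)              # continuum steps 1..7
p_u_baseline = np.array([.02, .05, .15, .50, .85, .95, .98])  # toy values
p_u_low_f0   = np.array([.05, .15, .40, .75, .95, .98, .99])  # toy values

(x0_base, _), _ = curve_fit(psychometric, steps, p_u_baseline, p0=[4.0, 1.0])
(x0_low, _), _  = curve_fit(psychometric, steps, p_u_low_f0, p0=[4.0, 1.0])
print(x0_base - x0_low)  # positive: a lower-F0 context yields more /u/ responses
```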
Affiliation(s)
- Julius Krumbiegel: Institute for Systems Neuroscience, University Hospital Hamburg-Eppendorf, Hamburg, Germany
- Carina Ufer: Institute for Systems Neuroscience, University Hospital Hamburg-Eppendorf, Hamburg, Germany
- Helen Blank: Institute for Systems Neuroscience, University Hospital Hamburg-Eppendorf, Hamburg, Germany
13
Bilinguals Show Proportionally Greater Benefit From Visual Speech Cues and Sentence Context in Their Second Compared to Their First Language. Ear Hear 2021; 43:1316-1326. DOI: 10.1097/aud.0000000000001182.
Abstract
OBJECTIVES Speech perception in noise is challenging, but evidence suggests that it may be facilitated by visual speech cues (e.g., lip movements) and supportive sentence context in native speakers. Comparatively few studies have investigated speech perception in noise in bilinguals, and little is known about the impact of visual speech cues and supportive sentence context in a first language compared to a second language within the same individual. The current study addresses this gap by directly investigating the extent to which bilinguals benefit from visual speech cues and supportive sentence context under similarly noisy conditions in their first and second language. DESIGN Thirty young adult English-French/French-English bilinguals were recruited from the undergraduate psychology program at Concordia University and from the Montreal community. They completed a speech perception in noise task during which they were presented with video-recorded sentences and instructed to repeat the last word of each sentence out loud. Sentences were presented in three different modalities: visual-only, auditory-only, and audiovisual. Additionally, sentences had one of two levels of context: moderate (e.g., "In the woods, the hiker saw a bear.") and low (e.g., "I had not thought about that bear."). Each participant completed this task in both their first and second language; crucially, the level of background noise was calibrated individually for each participant and was the same throughout the first language and second language (L2) portions of the experimental task. RESULTS Overall, speech perception in noise was more accurate in bilinguals' first language compared to the second. However, participants benefited from visual speech cues and supportive sentence context to a proportionally greater extent in their second language compared to their first. At the individual level, performance during the speech perception in noise task was related to aspects of bilinguals' experience in their second language (i.e., age of acquisition, relative balance between the first and the second language). CONCLUSIONS Bilinguals benefit from visual speech cues and sentence context in their second language during speech in noise and do so to a greater extent than in their first language given the same level of background noise. Together, this indicates that L2 speech perception can be conceptualized within an inverse effectiveness hypothesis framework with a complex interplay of sensory factors (i.e., the quality of the auditory speech signal and visual speech cues) and linguistic factors (i.e., presence or absence of supportive context and L2 experience of the listener).
14
Distinct mechanisms for talker adaptation operate in parallel on different timescales. Psychon Bull Rev 2021; 29:627-634. DOI: 10.3758/s13423-021-02019-3.
Abstract
The mapping between speech acoustics and phonemic representations is highly variable across talkers, and listeners are slower to recognize words when listening to multiple talkers compared with a single talker. Listeners' speech processing efficiency in mixed-talker settings improves when given time to reorient their attention to each new talker. However, it remains unknown how much time is needed to fully reorient attention to a new talker in mixed-talker settings so that speech processing becomes as efficient as when listening to a single talker. In this study, we examined how speech processing efficiency improves in mixed-talker settings as a function of the duration of continuous speech from a talker. In single-talker and mixed-talker conditions, listeners identified target words either in isolation or preceded by a carrier vowel of parametrically varying durations from 300 to 1,500 ms. Listeners' word identification was significantly slower in every mixed-talker condition compared with the corresponding single-talker condition. The costs associated with processing mixed-talker speech declined significantly as the duration of the speech carrier increased from 0 to 600 ms. However, increasing the carrier duration beyond 600 ms did not achieve further reduction in talker variability-related processing costs. These results suggest that two parallel mechanisms support processing talker variability: A stimulus-driven mechanism that operates on short timescales to reorient attention to new auditory sources, and a top-down mechanism that operates over longer timescales to allocate the cognitive resources needed to accommodate uncertainty in acoustic-phonemic correspondences during contexts where speech may come from multiple talkers.
15
Zhang K, Peng G. The time course of normalizing speech variability in vowels. Brain Lang 2021; 222:105028. PMID: 34597904; DOI: 10.1016/j.bandl.2021.105028.
Abstract
To achieve perceptual constancy, listeners utilize contextual cues to normalize speech variabilities in speakers. The present study tested the time course of this cognitive process with an event-related potential (ERP) experiment. The first neurophysiological evidence of speech normalization is observed in P2 (130-250 ms), which is functionally related to phonetic and phonological processes. Furthermore, the normalization process was found to ease lexical retrieval, as indexed by smaller N400 (350-470 ms) after larger P2. A cross-language vowel perception task was carried out to further specify whether normalization was processed in the phonetic and/or phonological stage(s). It was found that both phonetic and phonological cues in the speech context contributed to vowel normalization. The results suggest that vowel normalization in the speech context can be observed in the P2 time window and largely overlaps with phonetic and phonological processes.
Affiliation(s)
- Kaile Zhang: Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
- Gang Peng: Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Shenzhen 518055, China
16
Tao R, Zhang K, Peng G. Music Does Not Facilitate Lexical Tone Normalization: A Speech-Specific Perceptual Process. Front Psychol 2021; 12:717110. PMID: 34777097; PMCID: PMC8585521; DOI: 10.3389/fpsyg.2021.717110.
Abstract
Listeners utilize immediate contexts to efficiently normalize variable vocal streams into standard phonological units. However, researchers have debated whether non-speech contexts can also serve as valid cues for speech normalization. Supporters of the two sides have proposed a general-auditory hypothesis and a speech-specific hypothesis to explain the underlying mechanisms. A possible confounding factor in this inconsistency is listeners' perceptual familiarity with the contexts, as the non-speech contexts used were perceptually unfamiliar to listeners. In this study, we examined this confounding factor by recruiting a group of native Cantonese speakers with sufficient musical training experience and a control group with minimal musical training. Participants performed lexical tone judgment tasks in three contextual conditions, i.e., speech, non-speech, and music context conditions. Both groups were familiar with the speech context and unfamiliar with the non-speech context. The musician group was more familiar with the music context than the non-musician group. The results evidenced a lexical tone normalization process in the speech context but not in the non-speech or music contexts. More importantly, musicians did not outperform non-musicians in any contextual condition even though the musicians were experienced at pitch perception, indicating no noticeable transfer in pitch perception from the music domain to the linguistic domain for tonal language speakers. The findings show that even high familiarity with a non-linguistic context cannot elicit an effective lexical tone normalization process, supporting the speech-specific basis of the perceptual normalization process.
Affiliation(s)
- Gang Peng: Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
17
Bhaya-Grossman I, Chang EF. Speech Computations of the Human Superior Temporal Gyrus. Annu Rev Psychol 2022; 73.
Abstract
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
Affiliation(s)
- Ilina Bhaya-Grossman: Department of Neurological Surgery, University of California, San Francisco, California 94143, USA; Joint Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, California 94720, USA
- Edward F Chang: Department of Neurological Surgery, University of California, San Francisco, California 94143, USA
18
Lim SJ, Carter YD, Njoroge JM, Shinn-Cunningham BG, Perrachione TK. Talker discontinuity disrupts attention to speech: Evidence from EEG and pupillometry. Brain Lang 2021; 221:104996. PMID: 34358924; PMCID: PMC8515637; DOI: 10.1016/j.bandl.2021.104996.
Abstract
Speech is processed less efficiently from discontinuous, mixed talkers than one consistent talker, but little is known about the neural mechanisms for processing talker variability. Here, we measured psychophysiological responses to talker variability using electroencephalography (EEG) and pupillometry while listeners performed a delayed recall of digit span task. Listeners heard and recalled seven-digit sequences with both talker (single- vs. mixed-talker digits) and temporal (0- vs. 500-ms inter-digit intervals) discontinuities. Talker discontinuity reduced serial recall accuracy. Both talker and temporal discontinuities elicited P3a-like neural evoked response, while rapid processing of mixed-talkers' speech led to increased phasic pupil dilation. Furthermore, mixed-talkers' speech produced less alpha oscillatory power during working memory maintenance, but not during speech encoding. Overall, these results are consistent with an auditory attention and streaming framework in which talker discontinuity leads to involuntary, stimulus-driven attentional reorientation to novel speech sources, resulting in the processing interference classically associated with talker variability.
Affiliation(s)
- Sung-Joo Lim: Department of Speech, Language, and Hearing Sciences, Boston University, United States
- Yaminah D Carter: Department of Speech, Language, and Hearing Sciences, Boston University, United States
- J Michelle Njoroge: Department of Speech, Language, and Hearing Sciences, Boston University, United States
- Tyler K Perrachione: Department of Speech, Language, and Hearing Sciences, Boston University, United States
19
Stilp CE. Parameterizing spectral contrast effects in vowel categorization using noise contexts. J Acoust Soc Am 2021; 150:2806. PMID: 34717452; DOI: 10.1121/10.0006657.
Abstract
When spectra differ between earlier (context) and later (target) sounds, listeners perceive larger spectral changes than are physically present. When context sounds (e.g., a sentence) possess relatively higher frequencies, the target sound (e.g., a vowel sound) is perceived as possessing relatively lower frequencies, and vice versa. These spectral contrast effects (SCEs) are pervasive in auditory perception, but studies traditionally employed contexts with high spectrotemporal variability that made it difficult to understand exactly when context spectral properties biased perception. Here, contexts were speech-shaped noise divided into four consecutive 500-ms epochs. Contexts were filtered to amplify low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequencies to encourage target perception of /ɛ/ ("bet") or /ɪ/ ("bit"), respectively, via SCEs. Spectral peaks in the context ranged from its initial epoch(s) to its entire duration (onset paradigm), ranged from its final epoch(s) to its entire duration (offset paradigm), or were present for only one epoch (single paradigm). SCE magnitudes increased as spectral-peak durations increased and/or occurred later in the context (closer to the target). Contrary to predictions, brief early spectral peaks still biased subsequent target categorization. Results are compared to related experiments using speech contexts, and physiological and/or psychoacoustic idiosyncrasies of the noise contexts are considered.
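Contexts like these can be built by boosting one band of a noise carrier. The sketch below imposes a low-F1 (100-400 Hz) or high-F1 (550-850 Hz) spectral peak on a 500-ms noise epoch; white noise stands in here for the speech-shaped noise used in the study, and the filter type, order, and gain are assumptions, not the paper's exact signal processing.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def add_spectral_peak(noise, fs, band, gain_db=20.0, order=4):
    """Boost one frequency band of a noise context by the given gain."""
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    band_component = sosfiltfilt(sos, noise)       # zero-phase band extraction
    gain = 10.0 ** (gain_db / 20.0)
    return noise + (gain - 1.0) * band_component   # re-mix with amplified band

fs = 16000
rng = np.random.default_rng(2)
epoch = rng.standard_normal(fs // 2)               # one 500-ms noise epoch
low_f1_context = add_spectral_peak(epoch, fs, (100.0, 400.0))    # biases toward /ɛ/
high_f1_context = add_spectral_peak(epoch, fs, (550.0, 850.0))   # biases toward /ɪ/
```

The onset, offset, and single-epoch paradigms then amount to concatenating filtered and unfiltered epochs in different orders before the target vowel.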
Affiliation(s)
- Christian E Stilp: Department of Psychological and Brain Sciences, 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA
20
Feng L, Oxenham AJ. Spectral Contrast Effects Reveal Different Acoustic Cues for Vowel Recognition in Cochlear-Implant Users. Ear Hear 2021; 41:990-997. PMID: 31815819; PMCID: PMC7874522; DOI: 10.1097/aud.0000000000000820.
Abstract
OBJECTIVES The identity of a speech sound can be affected by the spectrum of a preceding stimulus in a contrastive manner. Although such aftereffects are often reduced in people with hearing loss and cochlear implants (CIs), one recent study demonstrated larger spectral contrast effects in CI users than in normal-hearing (NH) listeners. The present study aimed to shed light on this puzzling finding. We hypothesized that poorer spectral resolution leads CI users to rely on different acoustic cues not only to identify speech sounds but also to adapt to the context. DESIGN Thirteen postlingually deafened adult CI users and 33 NH participants (listening to either vocoded or unprocessed speech) participated in this study. Psychometric functions were estimated in a vowel categorization task along the /ɪ/ to /ɛ/ (as in "bit" and "bet") continuum following a context sentence, the long-term average spectrum of which was manipulated at the level of either fine-grained local spectral cues or coarser global spectral cues. RESULTS In NH listeners with unprocessed speech, the aftereffect was determined solely by the fine-grained local spectral cues, resulting in a surprising insensitivity to the larger, global spectral cues utilized by CI users. Restricting the spectral resolution available to NH listeners via vocoding resulted in patterns of responses more similar to those found in CI users. However, the size of the contrast aftereffect remained smaller in NH listeners than in CI users. CONCLUSIONS Only the spectral contrasts used by listeners contributed to the spectral contrast effects in vowel identification. These results explain why CI users can experience larger-than-normal context effects under specific conditions. The results also suggest that adaptation to new spectral cues can be very rapid for vowel discrimination, but may follow a longer time course to influence spectral contrast effects.
Affiliation(s)
- Lei Feng: Department of Psychology, University of Minnesota, Minneapolis, Minnesota, USA
21
Liu F, Yin Y, Chan AHD, Yip V, Wong PCM. Individuals with congenital amusia do not show context-dependent perception of tonal categories. Brain Lang 2021; 215:104908. PMID: 33578176; DOI: 10.1016/j.bandl.2021.104908.
Abstract
Perceptual adaptation is an active cognitive process in which listeners re-analyse speech categories based on new contexts, situations, or talkers. It involves top-down influences from higher cortical levels on lower-level auditory processes. Individuals with congenital amusia have impaired pitch processing with reduced connectivity between frontal and temporal regions. This study examined whether deficits in amusia lead to impaired perceptual adaptation in lexical tone perception. Thirteen Mandarin-speaking amusics and 13 controls identified the category of target tones on an 8-step continuum ranging from rising to high-level, either in isolation or in a high-/low-pitched context. For tones with no context, amusics exhibited reduced categorical perception compared with controls. While controls' lexical tone categorization demonstrated a significant context effect due to perceptual adaptation, amusics showed similar categorization patterns across both contexts. These findings suggest that congenital amusia impacts the extraction of context-dependent tonal categories in speech perception, indicating that perceptual adaptation may depend on listeners' perceptual acuity.
Affiliation(s)
- Fang Liu: School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Yanjun Yin: Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong, China
- Alice H D Chan: Linguistics and Multilingual Studies, School of Humanities, Nanyang Technological University, Singapore
- Virginia Yip: Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong, China
- Patrick C M Wong: Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong, China; Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong, China
22
Zhang K, Sjerps MJ, Peng G. Integral perception, but separate processing: The perceptual normalization of lexical tones and vowels. Neuropsychologia 2021; 156:107839. PMID: 33798490; DOI: 10.1016/j.neuropsychologia.2021.107839.
Abstract
In tonal languages, speech variability arises in both lexical tone (i.e., suprasegmentally) and vowel quality (segmentally). Listeners can use surrounding speech context to overcome variability in both speech cues, a process known as extrinsic normalization. Although vowels are the main carriers of tones, it is still unknown whether the combined percept (lexical tone and vowel quality) is normalized integrally or in partly separate processes. Here we used electroencephalography (EEG) to investigate the time course of lexical tone normalization and vowel normalization to answer this question. Cantonese adults listened to synthesized three-syllable stimuli in which the identity of a target syllable - ambiguous between high vs. mid-tone (Tone condition) or between /o/ vs. /u/ (Vowel condition) - was dependent on either the tone range (Tone condition) or the formant range (Vowel condition) of the first two syllables. It was observed that the ambiguous tone was more often interpreted as a high-level tone when the context had a relatively low pitch than when it had a high pitch (Tone condition). Similarly, the ambiguous vowel was more often interpreted as /o/ when the context had a relatively low formant range than when it had a relatively high formant range (Vowel condition). These findings show the typical pattern of extrinsic tone and vowel normalization. Importantly, the EEG results of participants showing the contrastive normalization effect demonstrated that the effects of vowel normalization could already be observed within the N2 time window (190-350 ms), while the first reliable effect of lexical tone normalization on cortical processing was observable only from the P3 time window (220-500 ms) onwards. The ERP patterns demonstrate that the contrastive perceptual normalization of lexical tones and that of vowels occur at least in partially separate time windows. This suggests that the extrinsic normalization can operate at the level of phonemes and tonemes separately instead of operating on the whole syllable at once.
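Window-based ERP effects of this kind reduce to mean amplitudes over latency masks. The sketch below computes mean amplitudes in the N2 (190-350 ms) and P3 (220-500 ms) windows named above on simulated epochs; trial, channel, and sampling parameters are arbitrary.

```python
import numpy as np

def window_mean(epochs, times, t_start, t_end):
    """Mean amplitude per trial and channel within a latency window.
    epochs: trials x channels x samples; times: latencies in seconds."""
    mask = (times >= t_start) & (times < t_end)
    return epochs[:, :, mask].mean(axis=-1)

fs = 500
times = np.arange(-0.2, 0.8, 1.0 / fs)               # -200 to 798 ms
rng = np.random.default_rng(3)
epochs = rng.standard_normal((120, 32, times.size))  # simulated EEG epochs

n2 = window_mean(epochs, times, 0.19, 0.35)          # N2 window, 190-350 ms
p3 = window_mean(epochs, times, 0.22, 0.50)          # P3 window, 220-500 ms
print(n2.shape, p3.shape)                            # (120, 32) each
```

Condition contrasts (e.g., low- vs. high-pitch context) would then be tested on these window means.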
Affiliation(s)
- Kaile Zhang: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
- Matthias J Sjerps: Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen, 6525 EN, the Netherlands
- Gang Peng: Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Shenzhen, 518055, China
24
Brodbeck C, Simon JZ. Continuous speech processing. Curr Opin Physiol 2020; 18:25-31.
Abstract
Speech processing in the human brain is grounded in non-specific auditory processing in the general mammalian brain, but relies on human-specific adaptations for processing speech and language. For this reason, many recent neurophysiological investigations of speech processing have turned to the human brain, with an emphasis on continuous speech. Substantial progress has been made using the phenomenon of "neural speech tracking", in which neurophysiological responses time-lock to the rhythm of auditory (and other) features in continuous speech. One broad category of investigations concerns the extent to which speech tracking measures are related to speech intelligibility, which has clinical applications in addition to its scientific importance. Recent investigations have also focused on disentangling different neural processes that contribute to speech tracking. The two lines of research are closely related, since processing stages throughout auditory cortex contribute to speech comprehension, in addition to subcortical processing and higher order and attentional processes.
Affiliation(s)
- Christian Brodbeck: Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
- Jonathan Z. Simon: Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA; Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA; Department of Biology, University of Maryland, College Park, Maryland 20742, USA
25
Fox NP, Leonard M, Sjerps MJ, Chang EF. Transformation of a temporal speech cue to a spatial neural code in human auditory cortex. eLife 2020; 9:e53051. PMID: 32840483; PMCID: PMC7556862; DOI: 10.7554/elife.53051.
Abstract
In speech, listeners extract continuously varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category, and are also sensitive to sub-phonetic VOT differences within a population's preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.
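A toy rendering of the model class the authors describe: one simulated population responds to the coincidence of burst and voicing onset (short VOTs, /ba/-like), another to a temporal gap between them (long VOTs, /pa/-like), so that response amplitude jointly carries category preference and sub-phonetic detail. Tuning parameters here are illustrative, not fitted to the study's recordings.

```python
# Toy gap/coincidence detectors over a VOT continuum (illustrative parameters).
import numpy as np

vots = np.arange(0, 55, 5)                 # VOT continuum in ms (/ba/ -> /pa/)

def coincidence_unit(vot, tau=10.0):
    # Responds maximally when burst and voicing are near-simultaneous.
    return np.exp(-vot / tau)

def gap_unit(vot, threshold=20.0, slope=5.0):
    # Sigmoid response that grows once the burst-voicing gap exceeds threshold.
    return 1.0 / (1.0 + np.exp(-(vot - threshold) / slope))

for vot in vots:
    b, p = coincidence_unit(vot), gap_unit(vot)
    label = "/ba/" if b > p else "/pa/"
    print(f"VOT {vot:2d} ms: coincidence={b:.2f} gap={p:.2f} -> {label}")
```

Note that each unit's amplitude still varies with VOT inside its preferred category, which is the sub-phonetic sensitivity the recordings showed.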
Affiliation(s)
- Neal P Fox
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Matthew Leonard
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Matthias J Sjerps
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, United States
26
Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. [PMID: 32727820 PMCID: PMC7470920 DOI: 10.1523/jneurosci.0279-20.2020] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Revised: 06/01/2020] [Accepted: 06/02/2020] [Indexed: 12/22/2022] Open
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed for the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks. SIGNIFICANCE STATEMENT: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
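The logic of the jittered-onset deconvolution can be sketched in a few lines: because the asynchrony between visual and auditory onsets varies across trials, regressing the recording on two lagged impulse trains recovers the two unisensory response kernels jointly. Kernel shapes, jitter range, and noise level below are invented for illustration.

```python
# Deconvolution sketch: recover overlapping unisensory kernels from jittered
# audiovisual trials by least-squares regression on lagged impulse trains.
import numpy as np

rng = np.random.default_rng(2)
fs, dur = 100, 300                         # Hz, seconds
n = fs * dur
klen = int(0.6 * fs)                       # 600-ms response kernels

# Distinct ground-truth kernels: sustained "visual", phasic "auditory".
t = np.arange(klen) / fs
k_vis = 0.5 * (t < 0.5)                                     # sustained
k_aud = 1.5 * np.exp(-t / 0.08) * np.sin(2 * np.pi * 8 * t)  # phasic

# Trials: visual onset, then auditory onset after a jittered asynchrony.
vis = np.zeros(n); aud = np.zeros(n)
for onset in np.arange(1, dur - 2, 2.0):
    i = int(onset * fs)
    vis[i] = 1.0
    aud[i + int(rng.uniform(0.05, 0.4) * fs)] = 1.0   # 50-400 ms jitter

def lagged(x, klen):
    X = np.zeros((x.size, klen))
    for k in range(klen):
        X[k:, k] = x[: x.size - k]
    return X

X = np.hstack([lagged(vis, klen), lagged(aud, klen)])
y = X @ np.concatenate([k_vis, k_aud]) + rng.normal(scale=1.0, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # joint unisensory estimates
vis_hat, aud_hat = beta[:klen], beta[klen:]
print("visual kernel r:", np.corrcoef(vis_hat, k_vis)[0, 1].round(2))
print("auditory kernel r:", np.corrcoef(aud_hat, k_aud)[0, 1].round(2))
```

Without the jitter the two columns of the design matrix would be collinear and the unisensory kernels unidentifiable, which is exactly the confound the method is built to escape.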
27
Lehet M, Holt LL. Nevertheless, it persists: Dimension-based statistical learning and normalization of speech impact different levels of perceptual processing. Cognition 2020; 202:104328. [PMID: 32502867 DOI: 10.1016/j.cognition.2020.104328] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2018] [Revised: 05/04/2020] [Accepted: 05/13/2020] [Indexed: 11/25/2022]
Abstract
Speech is notoriously variable, with no simple mapping from acoustics to linguistically meaningful units like words and phonemes. Empirical research on this theoretically central issue establishes at least two classes of perceptual phenomena that accommodate acoustic variability: normalization and perceptual learning. Intriguingly, perceptual learning is supported by learning across acoustic variability, but normalization is thought to counteract acoustic variability, leaving open questions about how these two phenomena might interact. Here, we examine the joint impact of normalization and perceptual learning on how acoustic dimensions map to vowel categories. As listeners categorized nonwords as setch or satch, they experienced a shift in short-term distributional regularities across the vowels' acoustic dimensions. Introduction of this 'artificial accent' resulted in a shift in the contribution of vowel duration to categorization. Although this dimension-based statistical learning impacted the influence of vowel duration on vowel categorization, the duration of these very same vowels nonetheless maintained a consistent influence on categorization of a subsequent consonant via duration contrast, a form of normalization. Thus, vowel duration had a duplex role consistent with normalization and perceptual learning operating on distinct levels in the processing hierarchy. We posit that whereas normalization operates across auditory dimensions, dimension-based statistical learning impacts the connection weights among auditory dimensions and phonetic categories.
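A toy two-stage model of the proposed duplex role of vowel duration: a contrastive normalization stage that persists unchanged, feeding a categorization stage whose duration weight is reduced by dimension-based statistical learning. All constants are invented for illustration.

```python
# Two-stage toy model: duration contrast (normalization) feeds a logistic
# cue-to-category mapping whose duration weight adapts (statistical learning).
import numpy as np

def normalize(duration, context_mean):
    # Duration contrast: the same vowel sounds longer after a shorter context.
    return duration - context_mean

def categorize(spectral, duration_norm, w_duration):
    # Logistic cue combination for the setch/satch vowel decision.
    z = 1.2 * spectral + w_duration * duration_norm
    return 1.0 / (1.0 + np.exp(-z))        # P("satch")

vowel_ms, context_ms = 140.0, 120.0
d = normalize(vowel_ms, context_ms) / 50.0   # scaled normalized duration

for phase, w in [("canonical input", 1.0), ("artificial accent", 0.2)]:
    print(phase, "-> P(satch) =", round(categorize(0.1, d, w), 2))
# The normalized duration d is identical across phases (normalization persists);
# only its weight in categorization shifts (dimension-based statistical learning).
```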
Affiliation(s)
- Matthew Lehet
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15232, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, USA
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15232, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15232, USA.
28
Bosker HR, Sjerps MJ, Reinisch E. Spectral contrast effects are modulated by selective attention in "cocktail party" settings. Atten Percept Psychophys 2020; 82:1318-1332. [PMID: 31338824 PMCID: PMC7303055 DOI: 10.3758/s13414-019-01824-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
Speech sounds are perceived relative to spectral properties of surrounding speech. For instance, target words that are ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as "bet" after a "low F1" sentence, but as "bit" after a "high F1" sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham (J.Exp.Psychol.-Hum.Percept.Perform. 44(9), 1447-1457, 2018b) reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic "cocktail party" settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influence target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated based on studies without competing talkers.
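One way to picture the attention-gated contrast effect reported here: the target's effective F1 is pushed away from the long-term average F1 of the attended context only, and categorization follows the shifted value. The constants below are illustrative, not estimates from the paper.

```python
# Attention-gated spectral contrast sketch: only the attended talker's
# long-term F1 shifts the ambiguous target (illustrative constants).
import numpy as np

def p_bet(target_f1, attended_mean_f1, k=0.4, boundary=500.0, slope=0.02):
    # Contrastive shift: a low-F1 attended context raises the effective F1,
    # pushing the ambiguous vowel toward the high-F1 "bet" percept.
    effective_f1 = target_f1 + k * (boundary - attended_mean_f1)
    return 1.0 / (1.0 + np.exp(-slope * (effective_f1 - boundary)))

ambiguous_f1 = 500.0   # midway between "bit" (low F1) and "bet" (high F1)
for label, attended in [("low-F1 talker attended", 420.0),
                        ("high-F1 talker attended", 580.0)]:
    print(label, "-> P(bet) =", round(p_bet(ambiguous_f1, attended), 2))
```

Feeding the unattended talker's mean F1 into the same function would model the pre-attentive account instead; the data here favor the attended-only gating.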
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
- Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
- Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany.
- Institute of General Linguistics, Ludwig Maximilian University Munich, Munich, Germany.
29
Stilp CE. Evaluating peripheral versus central contributions to spectral context effects in speech perception. Hear Res 2020; 392:107983. [PMID: 32464456 DOI: 10.1016/j.heares.2020.107983] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/14/2020] [Revised: 04/07/2020] [Accepted: 04/28/2020] [Indexed: 11/27/2022]
Abstract
Perception of a sound is influenced by spectral properties of surrounding sounds. When frequencies are absent in a preceding acoustic context before being introduced in a subsequent target sound, detection of those frequencies is facilitated via an auditory enhancement effect (EE). When spectral composition differs across a preceding context and subsequent target sound, those differences are perceptually magnified and perception shifts via a spectral contrast effect (SCE). Each effect is thought to receive contributions from peripheral and central neural processing, but the relative contributions are unclear. The present experiments manipulated ear of presentation to elucidate the degrees to which peripheral and central processes contributed to each effect in speech perception. In Experiment 1, EE and SCE magnitudes in consonant categorization were substantially diminished through contralateral presentation of contexts and targets compared to ipsilateral or bilateral presentations. In Experiment 2, spectrally complementary contexts were presented dichotically followed by the target in only one ear. This arrangement was predicted to produce context effects peripherally and cancel them centrally, but the competing contralateral context minimally decreased effect magnitudes. Results confirm peripheral and central contributions to EEs and SCEs in speech perception, but both effects appear to be primarily due to peripheral processing.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY, 40292, USA.
30
Bosker HR, Sjerps MJ, Reinisch E. Temporal contrast effects in human speech perception are immune to selective attention. Sci Rep 2020; 10:5607. [PMID: 32221376 PMCID: PMC7101381 DOI: 10.1038/s41598-020-62613-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2018] [Accepted: 03/16/2020] [Indexed: 11/09/2022] Open
Abstract
Two fundamental properties of perception are selective attention and perceptual contrast, but how these two processes interact remains unknown. Does an attended stimulus history exert a larger contrastive influence on the perception of a following target than an unattended one? Dutch listeners categorized target sounds with a reduced prefix "ge-" marking tense (e.g., ambiguous between gegaan-gaan "gone-go"). In 'single talker' Experiments 1-2, participants perceived the reduced syllable (reporting gegaan) when the target was heard after a fast sentence, but not after a slow sentence (reporting gaan). In 'selective attention' Experiments 3-5, participants listened to two simultaneous sentences from two different talkers, followed by the same target sounds, with instructions to attend only one of the two talkers. Critically, the speech rates of attended and unattended talkers were found to equally influence target perception - even when participants could watch the attended talker speak. In fact, participants' target perception in 'selective attention' Experiments 3-5 did not differ from participants who were explicitly instructed to divide their attention equally across the two talkers (Experiment 6). This suggests that contrast effects of speech rate are immune to selective attention, largely operating prior to attentional stream segregation in the auditory processing hierarchy.
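The attention-immunity conclusion amounts to saying that the normalizing context rate is pooled over all talkers before attentional selection. A minimal sketch, with invented constants, makes the prediction explicit: swapping which talker is attended leaves the outcome unchanged.

```python
# Attention-immune rate normalization sketch: the context rate is pooled over
# ALL talkers, so attended/unattended roles are interchangeable (toy constants).
import numpy as np

def p_gegaan(context_rates, k=1.5, neutral_rate=4.0):
    # Pool speech rate across talkers, then map the contrastive rate
    # difference to P(hearing the reduced "ge-" prefix).
    pooled = np.mean(context_rates)        # syllables per second
    return 1.0 / (1.0 + np.exp(-k * (pooled - neutral_rate)))

print(round(p_gegaan([6.0, 2.5]), 2))   # attended fast, ignored slow
print(round(p_gegaan([2.5, 6.0]), 2))   # attended slow, ignored fast: same
```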
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
- Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
- Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany.
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria.