1
Clarke A, Tyler LK, Marslen-Wilson W. Hearing what is being said: the distributed neural substrate for early speech interpretation. Lang Cogn Neurosci 2024; 39:1097-1116. PMID: 39439863; PMCID: PMC11493057; DOI: 10.1080/23273798.2024.2345308.
Abstract
Speech comprehension is remarkable for the immediacy with which the listener hears what is being said. Here, we focus on the neural underpinnings of this process in isolated spoken words. We analysed source-localised MEG data for nouns using Representational Similarity Analysis to probe the spatiotemporal coordinates of phonology, lexical form, and the semantics of emerging word candidates. Phonological model fit was detectable within 40-50 ms, engaging a bilateral network including superior and middle temporal cortex and extending into anterior temporal and inferior parietal regions. Lexical form emerged within 60-70 ms, and model fit to semantics from 100-110 ms. Strikingly, the majority of vertices in a central core showed model fit to all three dimensions, consistent with a distributed neural substrate for early speech analysis. The early interpretation of speech seems to be conducted in a unified integrative representational space, in conflict with conventional views of a linguistically stratified representational hierarchy.
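For readers unfamiliar with the method, the sketch below illustrates the core RSA computation the study describes: correlating a model representational dissimilarity matrix (RDM) with neural RDMs computed per vertex in sliding time windows. It is a minimal illustration, not the authors' pipeline; the array shapes, the correlation-distance choice, and the `rsa_timecourse` helper are all assumptions.

```python
# Minimal RSA sketch (illustrative, not the authors' code).
# Assumed inputs: meg[word, vertex, time] of source-localised amplitudes,
# and model_rdm[word, word], e.g. pairwise phonological dissimilarities.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_timecourse(meg, model_rdm, win=10):
    """Spearman fit between a model RDM and neural RDMs in sliding windows."""
    n_words, n_vertices, n_times = meg.shape
    model_vec = model_rdm[np.triu_indices(n_words, k=1)]
    fits = np.zeros((n_vertices, n_times - win))
    for v in range(n_vertices):
        for t in range(n_times - win):
            # Neural pattern for each word: activity in a short window at one vertex
            patterns = meg[:, v, t:t + win]
            neural_vec = pdist(patterns, metric="correlation")
            fits[v, t] = spearmanr(model_vec, neural_vec)[0]
    return fits  # one model-fit timecourse per vertex
```

Running this with phonological, lexical, and semantic model RDMs in turn would yield the three spatiotemporal fit maps the abstract compares.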
Affiliation(s)
- Alex Clarke
- Department of Psychology, University of Cambridge, Cambridge, UK
2
Whalen DH. Direct neural coding of speech: reconsideration of Whalen et al. (2006) (L). J Acoust Soc Am 2024; 155:1704-1706. PMID: 38426833; PMCID: PMC10908555; DOI: 10.1121/10.0025125.
Abstract
Previous brain imaging results indicated that speech perception proceeded independently of the auditory primitives that are the product of primary auditory cortex [Whalen, Benson, Richardson, Swainson, Clark, Lai, Mencl, Fulbright, Constable, and Liberman (2006). J. Acoust. Soc. Am. 119, 575-581]. Recent evidence using electrocorticography [Hamilton, Oganian, Hall, and Chang (2021). Cell 184, 4626-4639] indicates that there is a more direct connection from subcortical regions to cortical speech regions than previous studies had shown. Although the mechanism differs, the Hamilton, Oganian, Hall, and Chang result supports the original conclusion even more strongly: Speech perception does not rely on the analysis of primitives from auditory analysis. Rather, the speech signal is processed as speech from the beginning.
3
Wang K, Fang Y, Guo Q, Shen L, Chen Q. Superior attentional efficiency of auditory cue via the ventral auditory-thalamic pathway. J Cogn Neurosci 2024; 36:303-326. PMID: 38010315; DOI: 10.1162/jocn_a_02090.
Abstract
Auditory commands are often executed more efficiently than visual commands. However, empirical evidence on the underlying behavioral and neural mechanisms remains scarce. In two experiments, we manipulated the delivery modality of informative cues and the prediction violation effect and found consistently enhanced reaction time (RT) benefits for the matched auditory cues compared with the matched visual cues. At the neural level, when the bottom-up perceptual input matched the prior prediction induced by the auditory cue, the auditory-thalamic pathway was significantly activated. Moreover, the stronger the auditory-thalamic connectivity, the higher the behavioral benefits of the matched auditory cue. When the bottom-up input violated the prior prediction induced by the auditory cue, the ventral auditory pathway was specifically involved. Moreover, the stronger the ventral auditory-prefrontal connectivity, the larger the behavioral costs caused by the violation of the auditory cue. In addition, the dorsal frontoparietal network showed a supramodal function in reacting to the violation of informative cues irrespective of the delivery modality of the cue. Taken together, the results reveal novel behavioral and neural evidence that the superior efficiency of the auditory cue is twofold: The auditory-thalamic pathway is associated with improvements in task performance when the bottom-up input matches the auditory cue, whereas the ventral auditory-prefrontal pathway is involved when the auditory cue is violated.
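As a toy illustration of the brain-behavior analysis described above, the snippet below computes the cue-validity benefit as the invalid-minus-valid RT difference and correlates it with a connectivity measure across subjects. All numbers are simulated and the variable names are hypothetical; this only makes the structure of the analysis concrete.

```python
# Toy sketch of the connectivity-behavior correlation (simulated data).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_subjects = 30
rt_valid = rng.normal(450, 30, n_subjects)              # ms, matched auditory cue
rt_invalid = rt_valid + rng.normal(40, 15, n_subjects)  # violated cue slows responses
connectivity = rng.normal(0.5, 0.1, n_subjects)         # e.g., auditory-thalamic coupling

rt_benefit = rt_invalid - rt_valid  # larger = bigger gain from a matched cue
r, p = pearsonr(connectivity, rt_benefit)
print(f"connectivity-benefit correlation: r={r:.2f}, p={p:.3f}")
```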
Affiliation(s)
- Ke Wang
- South China Normal University, Guangzhou, China
- Ying Fang
- South China Normal University, Guangzhou, China
- Qiang Guo
- Guangdong Sanjiu Brain Hospital, Guangzhou, China
- Lu Shen
- South China Normal University, Guangzhou, China
- Qi Chen
- South China Normal University, Guangzhou, China
4
Oganian Y, Bhaya-Grossman I, Johnson K, Chang EF. Vowel and formant representation in the human auditory speech cortex. Neuron 2023; 111:2105-2118.e4. PMID: 37105171; PMCID: PMC10330593; DOI: 10.1016/j.neuron.2023.04.004.
Abstract
Vowels, a fundamental component of human speech across all languages, are cued acoustically by formants, the resonance frequencies determined by the shape of the vocal tract during speaking. An outstanding question in neurolinguistics is how formants are processed neurally during speech perception. To address this, we collected high-density intracranial recordings from the human speech cortex on the superior temporal gyrus (STG) while participants listened to continuous speech. We found that two-dimensional receptive fields based on the first two formants provided the best characterization of vowel sound representation. Neural activity at single sites was highly selective for zones in this formant space. Furthermore, formant tuning was adjusted dynamically to the speaker-specific spectral context. However, the entire population of formant-encoding sites was required to accurately decode single vowels. Overall, our results reveal that complex acoustic tuning in the two-dimensional formant space underlies local vowel representations in STG. As a population code, this gives rise to phonological vowel perception.
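The sketch below shows what a two-dimensional formant receptive field of the kind described above might look like computationally: a site's response is modelled as a 2D Gaussian over (F1, F2), and its preferred zone is recovered by curve fitting. The data, parameter values, and `gauss2d` helper are illustrative assumptions, not the authors' model.

```python
# Illustrative 2D formant receptive field (not the published model).
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(X, amp, f1_0, f2_0, sigma_f1, sigma_f2):
    """Response peaks for vowels near the site's preferred (F1, F2) zone."""
    f1, f2 = X
    return amp * np.exp(-((f1 - f1_0) ** 2 / (2 * sigma_f1 ** 2)
                          + (f2 - f2_0) ** 2 / (2 * sigma_f2 ** 2)))

# Simulated data: formants (Hz) of heard vowels and one site's responses.
rng = np.random.default_rng(1)
f1 = rng.uniform(250, 900, 200)
f2 = rng.uniform(800, 2800, 200)
resp = gauss2d((f1, f2), 1.0, 400, 2200, 120, 350) + rng.normal(0, 0.05, 200)

params, _ = curve_fit(gauss2d, (f1, f2), resp, p0=[1.0, 500, 1500, 200, 500])
print("recovered preferred F1, F2:", params[1], params[2])
```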
Affiliation(s)
- Yulia Oganian
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Ilina Bhaya-Grossman
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA; University of California, Berkeley-University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA 94720, USA
- Keith Johnson
- Department of Linguistics, University of California, Berkeley, Berkeley, CA, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
5
Hullett PW, Kandahari N, Shih TT, Kleen JK, Knowlton RC, Rao VR, Chang EF. Intact speech perception after resection of dominant hemisphere primary auditory cortex for the treatment of medically refractory epilepsy: illustrative case. J Neurosurg Case Lessons 2022; 4:CASE22417. PMID: 36443954; PMCID: PMC9705521; DOI: 10.3171/case22417.
Abstract
BACKGROUND: In classic speech network models, the primary auditory cortex is the source of auditory input to Wernicke's area in the posterior superior temporal gyrus (pSTG). Because resection of the primary auditory cortex in the dominant hemisphere removes inputs to the pSTG, there is a risk of speech impairment. However, recent research has shown the existence of other, nonprimary auditory cortex inputs to the pSTG, potentially reducing the risk of primary auditory cortex resection in the dominant hemisphere.
OBSERVATIONS: Here, the authors present a clinical case of a woman with severe medically refractory epilepsy and a lesional epileptic focus in the left (dominant) Heschl's gyrus. Analysis of neural responses to speech stimuli was consistent with primary auditory cortex localization to Heschl's gyrus. Although the primary auditory cortex was within the proposed resection margins, she underwent lesionectomy with total resection of Heschl's gyrus. Postoperatively, she had no speech deficits and her seizures were fully controlled.
LESSONS: While resection of the dominant hemisphere Heschl's gyrus/primary auditory cortex warrants caution, this case illustrates the ability to resect the primary auditory cortex without speech impairment and supports recent models of multiple parallel inputs to the pSTG.
Affiliation(s)
- Patrick W. Hullett
- Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Nazineen Kandahari
- Department of Neurosurgery, University of California San Francisco, San Francisco, California; Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Tina T. Shih
- Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Jonathan K. Kleen
- Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Robert C. Knowlton
- Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Vikram R. Rao
- Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, California
- Edward F. Chang
- Department of Neurosurgery, University of California San Francisco, San Francisco, California
6
Winn MB, Wright RA. Reconsidering commonly used stimuli in speech perception experiments. J Acoust Soc Am 2022; 152:1394. PMID: 36182291; DOI: 10.1121/10.0013415.
Abstract
This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to its unique value and to cases where it might be misunderstood or over-interpreted.
Affiliation(s)
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Richard A Wright
- Department of Linguistics, University of Washington, Seattle, Washington 98195, USA
7
Murphy E, Woolnough O, Rollo PS, Roccaforte ZJ, Segaert K, Hagoort P, Tandon N. Minimal phrase composition revealed by intracranial recordings. J Neurosci 2022; 42:3216-3227. PMID: 35232761; PMCID: PMC8994536; DOI: 10.1523/jneurosci.1575-21.2022.
Abstract
The ability to comprehend phrases is an essential integrative property of the brain. Here, we evaluate the neural processes that enable the transition from single-word processing to a minimal compositional scheme. Previous research has reported conflicting timing effects of composition, and disagreement persists with respect to inferior frontal and posterior temporal contributions. To address these issues, 19 patients (10 male, 9 female) implanted with penetrating depth or surface subdural intracranial electrodes heard auditory recordings of adjective-noun, pseudoword-noun, and adjective-pseudoword phrases and judged whether the phrase matched a picture. Stimulus-dependent alterations in broadband gamma activity, low-frequency power, and phase-locking values across the language-dominant left hemisphere were derived. This revealed a mosaic located on the lower bank of the posterior superior temporal sulcus (pSTS), in which closely neighboring cortical sites displayed exclusive sensitivity to either lexicality or phrase structure, but not both. Distinct timings were found for effects of phrase composition (210-300 ms) and pseudoword processing (∼300-700 ms), and these were localized to neighboring electrodes in pSTS. The pars triangularis and temporal pole encoded anticipation of composition in broadband low frequencies, and both regions exhibited greater functional connectivity with pSTS during phrase composition. Our results suggest that the pSTS is a highly specialized region composed of sparsely interwoven heterogeneous constituents that encodes both lower and higher level linguistic features. This hub in pSTS for minimal phrase processing may form the neural basis for the human-specific computational capacity for forming hierarchically organized linguistic structures.
SIGNIFICANCE STATEMENT: Linguists have claimed that the integration of multiple words into a phrase demands a computational procedure distinct from single-word processing. Here, we provide intracranial recordings from a large patient cohort, with high spatiotemporal resolution, to track the cortical dynamics of phrase composition. Epileptic patients volunteered to participate in a task in which they listened to phrases (red boat), word-pseudoword or pseudoword-word pairs (e.g., red fulg). At the onset of the second word in phrases, greater broadband high gamma activity was found in posterior superior temporal sulcus in electrodes that exclusively indexed phrasal meaning and not lexical meaning. These results provide direct, high-resolution signatures of minimal phrase composition in humans, a potentially species-specific computational capacity.
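Of the measures listed above, the phase-locking value (PLV) is the least self-explanatory; the sketch below shows the standard across-trials definition for two recording sites. The array shapes and the `plv` helper are assumptions for illustration, not the study's code.

```python
# Standard phase-locking value across trials (illustrative helper).
import numpy as np
from scipy.signal import hilbert

def plv(x, y):
    """PLV between two sites: x, y are (n_trials, n_times) band-passed signals."""
    phase_x = np.angle(hilbert(x, axis=-1))
    phase_y = np.angle(hilbert(y, axis=-1))
    # Mean resultant length of the trial-wise phase difference at each timepoint:
    # 1 = perfectly consistent phase lag between sites, 0 = no consistent relation.
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y)), axis=0))
```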
Affiliation(s)
- Elliot Murphy
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Oscar Woolnough
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Patrick S Rollo
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Zachary J Roccaforte
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Katrien Segaert
- School of Psychology and Centre for Human Brain Health, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Peter Hagoort
- Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, 6525 HR Nijmegen, The Netherlands
- Nitin Tandon
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, Texas 77030
- Memorial Hermann Hospital, Texas Medical Center, Houston, Texas 77030
8
Monahan PJ, Schertz J, Fu Z, Pérez A. Unified coding of spectral and temporal phonetic cues: electrophysiological evidence for abstract phonological features. J Cogn Neurosci 2022; 34:618-638. DOI: 10.1162/jocn_a_01817.
Abstract
Spoken word recognition models and phonological theory propose that abstract features play a central role in speech processing. It remains unknown, however, whether auditory cortex encodes linguistic features in a manner beyond the phonetic properties of the speech sounds themselves. We took advantage of the fact that English phonology functionally codes stops and fricatives as voiced or voiceless with two distinct phonetic cues: Fricatives use a spectral cue, whereas stops use a temporal cue. Evidence that these cues can be grouped together would indicate the disjunctive coding of distinct phonetic cues into a functionally defined abstract phonological feature. In English, the voicing feature, which distinguishes the consonants [s] and [t] from [z] and [d], respectively, is hypothesized to be specified only for voiceless consonants (e.g., [s t]). Here, participants listened to syllables in a many-to-one oddball design, while their EEG was recorded. In one block, both voiceless stops and fricatives were the standards. In the other block, both voiced stops and fricatives were the standards. A critical design element was the presence of intercategory variation within the standards. Therefore, a many-to-one relationship, which is necessary to elicit a mismatch negativity (MMN), existed only if the stop and fricative standards were grouped together. In addition to the event-related potentials (ERPs), event-related spectral power was also analyzed. Results showed an MMN effect in the voiceless standards block (an asymmetric MMN) in a time window consistent with processing in auditory cortex, as well as increased prestimulus beta-band oscillatory power to voiceless standards. These findings suggest that (i) there is an auditory memory trace of the standards based on the shared (voiceless) feature, which is only functionally defined; (ii) voiced consonants are underspecified; and (iii) features can serve as a basis for predictive processing. Taken together, these results point toward auditory cortex's ability to functionally code distinct phonetic cues together and suggest that abstract features can be used to parse the continuous acoustic signal.
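For concreteness, the snippet below sketches how the MMN in such an oddball design is quantified: as the deviant-minus-standard difference wave. The function and array shapes are illustrative assumptions, not the authors' pipeline.

```python
# MMN as a deviant-minus-standard difference wave (illustrative sketch).
import numpy as np

def mmn_difference_wave(standard_epochs, deviant_epochs):
    """Epochs are (n_trials, n_times) arrays from the same channel."""
    erp_standard = standard_epochs.mean(axis=0)
    erp_deviant = deviant_epochs.mean(axis=0)
    # A reliable negative deflection roughly 100-250 ms post-stimulus is the MMN.
    return erp_deviant - erp_standard
```

The asymmetry reported above corresponds to this difference wave emerging in the voiceless-standards block but not in the voiced-standards block.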
Affiliation(s)
- Zhanao Fu
- Cambridge University, United Kingdom
- Alejandro Pérez
- University of Toronto Scarborough, Ontario, Canada
- Cambridge University, United Kingdom
9
Bhaya-Grossman I, Chang EF. Speech computations of the human superior temporal gyrus. Annu Rev Psychol 2022; 73:79-102.
Abstract
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
Affiliation(s)
- Ilina Bhaya-Grossman
- Department of Neurological Surgery, University of California, San Francisco, California 94143, USA
- Joint Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, California 94720, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, California 94143, USA
10
Fox NP, Leonard M, Sjerps MJ, Chang EF. Transformation of a temporal speech cue to a spatial neural code in human auditory cortex. eLife 2020; 9:e53051. PMID: 32840483; PMCID: PMC7556862; DOI: 10.7554/elife.53051.
Abstract
In speech, listeners extract continuously varying spectrotemporal cues from the acoustic signal to perceive discrete phonetic categories. Spectral cues are spatially encoded in the amplitude of responses in phonetically tuned neural populations in auditory cortex. It remains unknown whether similar neurophysiological mechanisms encode temporal cues like voice-onset time (VOT), which distinguishes sounds like /b/ and /p/. We used direct brain recordings in humans to investigate the neural encoding of temporal speech cues with a VOT continuum from /ba/ to /pa/. We found that distinct neural populations respond preferentially to VOTs from one phonetic category, and are also sensitive to sub-phonetic VOT differences within a population's preferred category. In a simple neural network model, simulated populations tuned to detect either temporal gaps or coincidences between spectral cues captured encoding patterns observed in real neural data. These results demonstrate that a spatial/amplitude neural code underlies the cortical representation of both spectral and temporal speech cues.
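The simple model the abstract mentions can be caricatured in a few lines: units tuned to a preferred gap between two cue onsets convert VOT, a temporal cue, into response amplitude. The tuning function, parameter values, and unit names below are illustrative assumptions, not the authors' network.

```python
# Caricature of gap/coincidence detectors for VOT (illustrative, simplified).
import numpy as np

def gap_detector_response(vot_ms, preferred_gap_ms, width_ms=15.0):
    """Amplitude peaks when the burst-to-voicing gap matches the unit's tuning."""
    return np.exp(-((vot_ms - preferred_gap_ms) ** 2) / (2 * width_ms ** 2))

vots = np.arange(0, 55, 5)                           # /ba/ ... /pa/ continuum (ms)
coincidence_unit = gap_detector_response(vots, 0.0)  # prefers short VOTs (/b/-like)
gap_unit = gap_detector_response(vots, 50.0)         # prefers long VOTs (/p/-like)
for v, b, p in zip(vots, coincidence_unit, gap_unit):
    print(f"VOT {v:2d} ms  /b/-unit {b:.2f}  /p/-unit {p:.2f}")
```

Amplitude here carries both the category preference and graded sensitivity to within-category VOT differences, mirroring the spatial/amplitude code the study reports.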
Affiliation(s)
- Neal P Fox
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Matthew Leonard
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Matthias J Sjerps
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, Netherlands
- Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, United States