1. Lankinen K, Ahveninen J, Uluç I, Daneshzand M, Mareyam A, Kirsch JE, Polimeni JR, Healy BC, Tian Q, Khan S, Nummenmaa A, Wang QM, Green JR, Kimberley TJ, Li S. Role of articulatory motor networks in perceptual categorization of speech signals: a 7T fMRI study. Cereb Cortex 2023; 33:11517-11525. PMID: 37851854; PMCID: PMC10724868; DOI: 10.1093/cercor/bhad384.
Abstract
Speech and language processing involve complex interactions between cortical areas necessary for articulatory movements and auditory perception and a range of areas through which these are connected and interact. Despite their fundamental importance, the precise mechanisms underlying these processes are not fully elucidated. We measured BOLD signals from normal-hearing participants using high-field 7 Tesla fMRI with 1-mm isotropic voxel resolution. The subjects performed two speech perception tasks (discrimination and classification) and a speech production task during the scan. By employing univariate and multivariate pattern analyses, we identified the neural signatures associated with speech production and perception. The left precentral, premotor, and inferior frontal cortex regions showed significant activations that correlated with phoneme category variability during perceptual discrimination tasks. In addition, the perceived sound categories could be decoded from signals in a region of interest defined based on activation related to the production task. The results support the hypothesis that articulatory motor networks in the left hemisphere, typically associated with speech production, may also play a critical role in the perceptual categorization of syllables. The study provides valuable insights into the intricate neural mechanisms that underlie speech processing.
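As an illustration of the decoding analysis summarized above, the sketch below shows a minimal ROI-based MVPA pipeline of the kind the study describes: a linear classifier cross-validated on trial-by-voxel fMRI patterns. This is not the authors' code; the arrays X and y are synthetic placeholders for single-trial beta estimates from a production-defined ROI and perceived-category labels.

```python
# Minimal sketch of ROI-based MVPA decoding of perceived syllable category.
# X stands in for single-trial beta patterns (trials x voxels) from an ROI
# defined by the production localizer; y holds /ba/ vs /da/ report labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(160, 500))      # placeholder patterns; real data needed
y = rng.integers(0, 2, size=160)     # placeholder category labels

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"mean decoding accuracy: {scores.mean():.3f}")  # chance = 0.5
```

With random placeholder data the accuracy hovers at chance; above-chance accuracy on real trials is what licenses the claim that the ROI carries category information.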
Affiliations
- Kaisu Lankinen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Jyrki Ahveninen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Işıl Uluç: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Mohammad Daneshzand: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Azma Mareyam: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States
- John E Kirsch: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Jonathan R Polimeni: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Brian C Healy: Partners Multiple Sclerosis Center, Brigham and Women's Hospital, Boston, MA 02115, United States; Department of Neurology, Harvard Medical School, Boston, MA 02115, United States; Biostatistics Center, Massachusetts General Hospital, Boston, MA 02114, United States
- Qiyuan Tian: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Sheraz Khan: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Aapo Nummenmaa: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
- Qing Mei Wang: Stroke Biological Recovery Laboratory, Spaulding Rehabilitation Hospital, The Teaching Affiliate of Harvard Medical School, Charlestown, MA 02129, United States
- Jordan R Green: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA 02129, United States
- Teresa J Kimberley: Department of Physical Therapy, School of Health and Rehabilitation Sciences, MGH Institute of Health Professions, Boston, MA 02129, United States
- Shasha Li: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA 02129, United States; Harvard Medical School, Boston, MA 02115, United States
2. Lankinen K, Ahveninen J, Uluç I, Daneshzand M, Mareyam A, Kirsch JE, Polimeni JR, Healy BC, Tian Q, Khan S, Nummenmaa A, Wang QM, Green JR, Kimberley TJ, Li S. Role of Articulatory Motor Networks in Perceptual Categorization of Speech Signals: A 7 T fMRI Study. bioRxiv [Preprint] 2023:2023.07.02.547409. PMID: 37461673; PMCID: PMC10349975; DOI: 10.1101/2023.07.02.547409.
Abstract
BACKGROUND: The association between brain regions involved in speech production and those that play a role in speech perception is not yet fully understood. We compared speech-production-related brain activity with activations resulting from perceptual categorization of syllables using high-field 7 Tesla functional magnetic resonance imaging (fMRI) at 1-mm isotropic voxel resolution, enabling high localization accuracy compared with previous studies.
METHODS: Blood oxygenation level dependent (BOLD) signals were obtained in 20 normal-hearing subjects using a simultaneous multi-slice (SMS) 7T echo-planar imaging (EPI) acquisition with whole-head coverage and 1-mm isotropic resolution. In a speech production localizer task, subjects were asked to produce a silent lip-rounded vowel /u/ in response to the visual cue "U" or to purse their lips when they saw the cue "P". In a phoneme discrimination task, subjects were presented with pairs of syllables that were equiprobably identical or different along an 8-step continuum between the prototypic /ba/ and /da/ sounds. After the presentation of each stimulus pair, the subjects were asked to indicate whether the two syllables they heard were identical or different by pressing one of two buttons. In a phoneme classification task, the subjects heard only one syllable and were asked to indicate whether it was /ba/ or /da/.
RESULTS: Univariate fMRI analyses using a parametric modulation approach suggested that left motor, premotor, and frontal cortex BOLD activations correlate with phoneme category variability in the /ba/-/da/ discrimination task. In contrast, the variability related to acoustic features of the phonemes was highest in the right primary auditory cortex. Our multivariate pattern analysis (MVPA) suggested that left precentral/inferior frontal cortex areas, which were associated with speech production according to the localizer task, also play a role in perceptual categorization of the syllables.
CONCLUSIONS: The results support the hypothesis that articulatory motor networks in the left hemisphere that are activated during speech production could also have a role in perceptual categorization of syllables. Importantly, high voxel resolution combined with advanced coil technology allowed us to pinpoint the exact brain regions involved in both perception and production tasks.
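The univariate analysis above relies on parametric modulation. The sketch below shows one conventional way to set up such a design with nilearn, pairing an unmodulated main-effect regressor with a mean-centered modulator; the onsets, durations, and "variability" values are invented for illustration, not taken from the study.

```python
# Sketch of a parametric-modulation design matrix (nilearn): a main
# regressor for all stimulus pairs plus a mean-centered modulator coding
# per-trial phoneme category variability. All numbers are illustrative.
import numpy as np
import pandas as pd
from nilearn.glm.first_level import make_first_level_design_matrix

tr, n_scans = 2.0, 200
frame_times = np.arange(n_scans) * tr
onsets = np.arange(10.0, 390.0, 20.0)                  # assumed trial onsets (s)
variability = np.random.default_rng(0).uniform(0, 1, onsets.size)

main = pd.DataFrame({"trial_type": "pair", "onset": onsets,
                     "duration": 2.0, "modulation": 1.0})
param = pd.DataFrame({"trial_type": "pair_x_variability", "onset": onsets,
                      "duration": 2.0,
                      "modulation": variability - variability.mean()})
events = pd.concat([main, param], ignore_index=True)

design = make_first_level_design_matrix(frame_times, events, hrf_model="glover")
print(design.columns.tolist())   # main effect, modulator, drifts, constant
```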
Affiliations
- Kaisu Lankinen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Jyrki Ahveninen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Işıl Uluç: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Mohammad Daneshzand: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Azma Mareyam: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US
- John E. Kirsch: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Jonathan R. Polimeni: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Brian C. Healy: Harvard Medical School, Boston, MA, US; Stroke Biological Recovery Laboratory, Spaulding Rehabilitation Hospital, the teaching affiliate of Harvard Medical School, Charlestown, MA, US
- Qiyuan Tian: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Sheraz Khan: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Aapo Nummenmaa: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
- Qing-mei Wang: Stroke Biological Recovery Laboratory, Spaulding Rehabilitation Hospital, the teaching affiliate of Harvard Medical School, Charlestown, MA, US
- Jordan R. Green: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, MA, US
- Teresa J. Kimberley: Department of Physical Therapy, School of Health and Rehabilitation Sciences, MGH Institute of Health Professions, Boston, MA, US
- Shasha Li: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA, US; Harvard Medical School, Boston, MA, US
3. Ostrowski LM, Chinappen DM, Stoyell SM, Song DY, Ross EE, Kramer MA, Emerton BC, Chu CJ. Children with Rolandic epilepsy have micro- and macrostructural abnormalities in white matter constituting networks necessary for language function. Epilepsy Behav 2023; 144:109254. PMID: 37209552; PMCID: PMC10330597; DOI: 10.1016/j.yebeh.2023.109254.
Abstract
INTRODUCTION: Self-limited epilepsy with centrotemporal spikes (SeLECTS) is a transient developmental epilepsy with a seizure onset zone localized to the centrotemporal cortex that commonly impacts aspects of language function. To better understand the relationship between these anatomical findings and symptoms, we characterized the language profile and white matter microstructural and macrostructural features in a cohort of children with SeLECTS.
METHODS: Children with active SeLECTS (n = 13), resolved SeLECTS (n = 12), and controls (n = 17) underwent high-resolution MRI, including diffusion tensor imaging sequences, and multiple standardized neuropsychological measures of language function. We identified the superficial white matter abutting the inferior rolandic cortex and superior temporal gyrus using a cortical parcellation atlas and derived the arcuate fasciculus connecting them using probabilistic tractography. We compared white matter microstructural characteristics (axial, radial, and mean diffusivity, and fractional anisotropy) between groups in each region and tested for linear relationships between diffusivity metrics in these regions and language scores on neuropsychological testing.
RESULTS: We found significant differences in several language modalities in children with SeLECTS compared to controls. Children with SeLECTS performed worse on assessments of phonological awareness (p = 0.045) and verbal comprehension (p = 0.050). Reduced performance was more pronounced in children with active SeLECTS compared to controls, namely in phonological awareness (p = 0.028), verbal comprehension (p = 0.028), and verbal category fluency (p = 0.031), with trends toward worse performance also observed in verbal letter fluency (p = 0.052) and the expressive one-word picture vocabulary test (p = 0.068). Children with active SeLECTS performed worse than children with SeLECTS in remission on tests of verbal category fluency (p = 0.009), verbal letter fluency (p = 0.006), and the expressive one-word picture vocabulary test (p = 0.045). We also found abnormal superficial white matter microstructure in centrotemporal ROIs in children with SeLECTS, characterized by increased diffusivity and fractional anisotropy compared to controls (AD p = 0.014, RD p = 0.028, MD p = 0.020, and FA p = 0.024). Structural connectivity of the arcuate fasciculus connecting perisylvian cortical regions was lower in children with SeLECTS (p = 0.045), and children with SeLECTS had increased diffusivity in the arcuate fasciculus (AD p = 0.007, RD p = 0.006, MD p = 0.016), with no difference in fractional anisotropy (p = 0.22). However, linear tests relating white matter microstructure in these language-network regions to language performance did not withstand correction for multiple comparisons in this sample, although trends were seen between FA in the arcuate fasciculus and verbal category fluency (p = 0.047) and the expressive one-word picture vocabulary test (p = 0.036).
CONCLUSION: We found impaired language development in children with SeLECTS, particularly in those with active SeLECTS, as well as abnormalities in the superficial centrotemporal white matter and in the fibers connecting these regions, the arcuate fasciculus. Although relationships between language performance and white matter abnormalities did not survive correction for multiple comparisons, taken together these results provide evidence of atypical white matter maturation in fibers involved in language processing, which may contribute to the aspects of language function commonly affected by the disorder.
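As a concrete illustration of the group statistics reported above, the sketch below compares one diffusion metric between groups and applies a Bonferroni correction across the four metrics. The sample values are synthetic, and the plain two-sample t-test is a simplification of whatever covariate-adjusted model the study actually used.

```python
# Sketch of a group comparison of tract diffusion metrics (e.g., FA in the
# arcuate fasciculus) between children with SeLECTS and controls.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
fa_selects = rng.normal(0.45, 0.04, size=25)   # placeholder per-subject FA
fa_controls = rng.normal(0.48, 0.04, size=17)

t, p = stats.ttest_ind(fa_selects, fa_controls)
print(f"FA group difference: t = {t:.2f}, p = {p:.3f}")

# Testing AD, RD, MD, and FA in the same region means four comparisons;
# a simple Bonferroni correction multiplies p by the number of tests.
p_bonf = min(p * 4, 1.0)
print(f"Bonferroni-corrected p: {p_bonf:.3f}")
```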
Affiliations
- Lauren M Ostrowski: Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Dhinakaran M Chinappen: Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
- Sally M Stoyell: Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Daniel Y Song: Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Erin E Ross: Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
- Mark A Kramer: Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
- Britt C Emerton: Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA
- Catherine J Chu: Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA
4. Raghavan VS, O’Sullivan J, Bickel S, Mehta AD, Mesgarani N. Distinct neural encoding of glimpsed and masked speech in multitalker situations. PLoS Biol 2023; 21:e3002128. PMID: 37279203; PMCID: PMC10243639; DOI: 10.1371/journal.pbio.3002128.
Abstract
Humans can easily tune in to one talker in a multitalker environment while still picking up bits of background speech; however, it remains unclear how we perceive speech that is masked and to what degree non-target speech is processed. Some models suggest that perception can be achieved through glimpses, which are spectrotemporal regions where a talker has more energy than the background. Other models, however, require the recovery of the masked regions. To clarify this issue, we directly recorded from primary and non-primary auditory cortex (AC) in neurosurgical patients as they attended to one talker in multitalker speech and trained temporal response function models to predict high-gamma neural activity from glimpsed and masked stimulus features. We found that glimpsed speech is encoded at the level of phonetic features for target and non-target talkers, with enhanced encoding of target speech in non-primary AC. In contrast, encoding of masked phonetic features was found only for the target, with a greater response latency and distinct anatomical organization compared to glimpsed phonetic features. These findings suggest separate mechanisms for encoding glimpsed and masked speech and provide neural evidence for the glimpsing model of speech perception.
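The temporal response function (TRF) modeling mentioned above can be sketched as time-lagged ridge regression. In the toy version below, `stim` stands in for glimpsed or masked phonetic-feature time series and `neural` for one electrode's high-gamma envelope; the 100 Hz rate and 0-400 ms lag window are assumptions for illustration.

```python
# Sketch of a TRF: ridge regression from time-lagged stimulus features to
# high-gamma activity. Data and sampling rate are placeholders.
import numpy as np
from sklearn.linear_model import Ridge

fs = 100                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(2)
stim = rng.normal(size=(3000, 8))          # time x phonetic features
neural = rng.normal(size=3000)             # high-gamma envelope, one electrode

lags = np.arange(int(0.4 * fs))            # 0-400 ms of stimulus history
X = np.hstack([np.roll(stim, lag, axis=0) for lag in lags])
X[: lags.max()] = 0                        # drop wrap-around samples

model = Ridge(alpha=1.0).fit(X, neural)
trf = model.coef_.reshape(len(lags), stim.shape[1])  # lags x features
print(trf.shape)                           # one temporal kernel per feature
```

Comparing prediction accuracy for models fed glimpsed versus masked features is the kind of contrast that supports the paper's distinction between the two encoding modes.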
Affiliations
- Vinay S Raghavan: Department of Electrical Engineering, Columbia University, New York, New York, United States of America; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
- James O’Sullivan: Department of Electrical Engineering, Columbia University, New York, New York, United States of America; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
- Stephan Bickel: The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America; Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America; Department of Neurology, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Ashesh D. Mehta: The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, New York, United States of America; Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
- Nima Mesgarani: Department of Electrical Engineering, Columbia University, New York, New York, United States of America; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, New York, United States of America
5. Luthra S, Magnuson JS, Myers EB. Right Posterior Temporal Cortex Supports Integration of Phonetic and Talker Information. Neurobiol Lang 2023; 4:145-177. PMID: 37229142; PMCID: PMC10205075; DOI: 10.1162/nol_a_00091.
Abstract
Though the right hemisphere has been implicated in talker processing, it is thought to play a minimal role in phonetic processing, at least relative to the left hemisphere. Recent evidence suggests that the right posterior temporal cortex may support learning of phonetic variation associated with a specific talker. In the current study, listeners heard a male talker and a female talker, one of whom produced an ambiguous fricative in /s/-biased lexical contexts (e.g., epi?ode) and one who produced it in /ʃ/-biased contexts (e.g., friend?ip). Listeners in a behavioral experiment (Experiment 1) showed evidence of lexically guided perceptual learning, categorizing ambiguous fricatives in line with their previous experience. Listeners in an fMRI experiment (Experiment 2) showed differential phonetic categorization as a function of talker, allowing for an investigation of the neural basis of talker-specific phonetic processing, though they did not exhibit perceptual learning (likely due to characteristics of our in-scanner headphones). Searchlight analyses revealed that the patterns of activation in the right superior temporal sulcus (STS) contained information about who was talking and what phoneme they produced. We take this as evidence that talker information and phonetic information are integrated in the right STS. Functional connectivity analyses suggested that the process of conditioning phonetic identity on talker information depends on the coordinated activity of a left-lateralized phonetic processing system and a right-lateralized talker processing system. Overall, these results clarify the mechanisms through which the right hemisphere supports talker-specific phonetic processing.
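The searchlight analysis mentioned above can be sketched with nilearn's SearchLight estimator, which slides a small sphere through the brain and cross-validates a classifier at each location. The tiny synthetic volumes below only demonstrate the mechanics; with real data the labels could code talker identity or phoneme, as in the study.

```python
# Sketch of a searchlight MVPA over single-trial images. All volumes are
# synthetic stand-ins; with real data, imgs would be trial-wise beta maps.
import numpy as np
import nibabel as nib
from nilearn.decoding import SearchLight
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
data = rng.normal(size=(8, 8, 8, 40))               # x, y, z, trials
imgs = nib.Nifti1Image(data, affine=np.eye(4))
mask = nib.Nifti1Image(np.ones((8, 8, 8), dtype=np.int8), affine=np.eye(4))
y = np.tile([0, 1], 20)                             # e.g., talker labels

sl = SearchLight(mask, radius=2.0, cv=KFold(n_splits=4))
sl.fit(imgs, y)
print(sl.scores_.shape)    # per-voxel cross-validated accuracy map
```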
Affiliations
- Sahil Luthra: Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- James S. Magnuson: Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA; Basque Center on Cognition Brain and Language (BCBL), Donostia-San Sebastián, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- Emily B. Myers: Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA; Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, CT, USA
6. Zoefel B, Gilbert RA, Davis MH. Intelligibility improves perception of timing changes in speech. PLoS One 2023; 18:e0279024. PMID: 36634109; PMCID: PMC9836318; DOI: 10.1371/journal.pone.0279024.
Abstract
Auditory rhythms are ubiquitous in music, speech, and other everyday sounds. Yet, it is unclear how perceived rhythms arise from the repeating structure of sounds. For speech, it is unclear whether rhythm is solely derived from acoustic properties (e.g., rapid amplitude changes), or if it is also influenced by the linguistic units (syllables, words, etc.) that listeners extract from intelligible speech. Here, we present three experiments in which participants were asked to detect an irregularity in rhythmically spoken speech sequences. In each experiment, we reduce the number of possible stimulus properties that differ between intelligible and unintelligible speech sounds and show that these acoustically matched intelligibility conditions nonetheless lead to differences in rhythm perception. In Experiment 1, we replicate a previous study showing that rhythm perception is improved for intelligible (16-channel vocoded) as compared to unintelligible (1-channel vocoded) speech, despite near-identical broadband amplitude modulations. In Experiment 2, we use spectrally rotated 16-channel speech to show that the effect of intelligibility cannot be explained by differences in spectral complexity. In Experiment 3, we compare rhythm perception for sine-wave speech signals when they are heard as non-speech (for naïve listeners) and, subsequent to training, when identical sounds are perceived as speech. In all cases, detection of rhythmic regularity is enhanced when participants perceive the stimulus as speech compared to when they do not. Together, these findings demonstrate that intelligibility enhances the perception of timing changes in speech, which is hence linked to processes that extract abstract linguistic units from sound.
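Since the experiments hinge on channel vocoding, a minimal sketch of the manipulation may help: the signal is split into frequency bands, each band's envelope is extracted and used to modulate band-limited noise, and the bands are summed. Band edges, filter order, and the use of Hilbert envelopes are illustrative choices, not the authors' exact parameters.

```python
# Sketch of noise-vocoding: 16 channels yields largely intelligible speech,
# 1 channel does not, while the broadband envelope is preserved.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(signal, fs, n_channels, f_lo=100.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced bands
    noise = np.random.default_rng(0).normal(size=signal.size)
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        env = np.abs(hilbert(band))            # band envelope
        out += env * sosfiltfilt(sos, noise)   # envelope-modulated noise
    return out

fs = 16000
speech = np.random.default_rng(1).normal(size=fs)  # placeholder 1-s signal
voc16 = vocode(speech, fs, 16)
voc1 = vocode(speech, fs, 1)
```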
Affiliations
- Benedikt Zoefel: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom; Centre National de la Recherche Scientifique (CNRS), Centre de Recherche Cerveau et Cognition (CerCo), Toulouse, France; Université de Toulouse III Paul Sabatier, Toulouse, France
- Rebecca A. Gilbert: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Matthew H. Davis: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
7. Wingfield C, Zhang C, Devereux B, Fonteneau E, Thwaites A, Liu X, Woodland P, Marslen-Wilson W, Su L. On the similarities of representations in artificial and brain neural networks for speech recognition. Front Comput Neurosci 2022; 16:1057439. PMID: 36618270; PMCID: PMC9811675; DOI: 10.3389/fncom.2022.1057439.
Abstract
Introduction: In recent years, machines powered by deep learning have achieved near-human levels of performance in speech recognition. The fields of artificial intelligence and cognitive neuroscience have finally reached a similar level of performance, despite their huge differences in implementation, and so deep learning models can, in principle, serve as candidates for mechanistic models of the human auditory system.
Methods: Utilizing high-performance automatic speech recognition systems and advanced non-invasive human neuroimaging technology, namely magnetoencephalography and multivariate pattern-information analysis, the current study aimed to relate machine-learned representations of speech to recorded human brain representations of the same speech.
Results: In one direction, we found a quasi-hierarchical functional organization in human auditory cortex that qualitatively matched the hidden layers of deep artificial neural networks trained as part of an automatic speech recognizer. In the reverse direction, we modified the hidden-layer organization of the artificial neural network based on neural activation patterns in human brains. The result was a substantial improvement in word recognition accuracy and learned speech representations.
Discussion: We have demonstrated that artificial and brain neural networks can be mutually informative in the domain of speech recognition.
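The multivariate comparison described above is essentially representational similarity analysis (RSA): build a representational dissimilarity matrix (RDM) for each system over the same stimuli and correlate their vectorized upper triangles. The sketch below uses synthetic activations in place of real hidden-layer and MEG data.

```python
# Sketch of RSA between an ANN hidden layer and brain responses.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n_stimuli = 30
ann_layer = rng.normal(size=(n_stimuli, 256))   # hidden-layer activations
brain = rng.normal(size=(n_stimuli, 100))       # e.g., source-space patterns

rdm_ann = pdist(ann_layer, metric="correlation")    # condensed RDMs
rdm_brain = pdist(brain, metric="correlation")

rho, p = spearmanr(rdm_ann, rdm_brain)
print(f"ANN-brain RDM correlation: rho = {rho:.3f} (p = {p:.3f})")
```

Repeating this comparison for every hidden layer against every cortical region is the kind of analysis that yields the quasi-hierarchical mapping the abstract reports.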
Affiliations
- Cai Wingfield: Department of Psychology, Lancaster University, Lancaster, United Kingdom
- Chao Zhang: Department of Engineering, University of Cambridge, Cambridge, United Kingdom
- Barry Devereux: School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, United Kingdom
- Elisabeth Fonteneau: Department of Psychology, University Paul Valéry Montpellier, Montpellier, France
- Andrew Thwaites: Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Xunying Liu: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Phil Woodland: Department of Engineering, University of Cambridge, Cambridge, United Kingdom
- Li Su: Department of Neuroscience, Neuroscience Institute, Insigneo Institute for in silico Medicine, University of Sheffield, Sheffield, United Kingdom; Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
8. Franken MK, Liu BC, Ostry DJ. Towards a somatosensory theory of speech perception. J Neurophysiol 2022; 128:1683-1695. PMID: 36416451; PMCID: PMC9762980; DOI: 10.1152/jn.00381.2022.
Abstract
Speech perception is known to be a multimodal process, relying not only on auditory input but also on the visual system and possibly on the motor system as well. To date, there has been little work on the potential involvement of the somatosensory system in speech perception. In the present review, we identify the somatosensory system as another contributor to speech perception. First, we argue that evidence in favor of a motor contribution to speech perception can just as easily be interpreted as showing somatosensory involvement. Second, physiological and neuroanatomical evidence for auditory-somatosensory interactions across the auditory hierarchy indicates the availability of a neural infrastructure that supports somatosensory involvement in auditory processing in general. Third, there is accumulating evidence for somatosensory involvement in the context of speech specifically. In particular, tactile stimulation modifies speech perception, and auditory speech input elicits activity in somatosensory cortical areas. Moreover, speech sounds can be decoded from activity in somatosensory cortex; lesions to this region affect perception, and vowels can be identified based on somatic input alone. We suggest that the somatosensory involvement in speech perception derives from the somatosensory-auditory pairing that occurs during speech production and learning. By bringing together findings from a set of studies that have not previously been linked, the present article identifies the somatosensory system as a presently unrecognized contributor to speech perception.
Affiliations
- David J Ostry: McGill University, Montreal, Quebec, Canada; Haskins Laboratories, New Haven, Connecticut
9. Beeraka NM, Nikolenko VN, Khaidarovich ZF, Valikovna OM, Aliagayevna RN, Arturovna ZL, Alexandrovich KA, Mikhaleva LM, Sinelnikov MY. Recent Investigations on the Functional Role of Cerebellar Neural Networks in Motor Functions & Nonmotor Functions - Neurodegeneration. Curr Neuropharmacol 2022; 20:1865-1878. PMID: 35272590; PMCID: PMC9886798; DOI: 10.2174/1570159x20666220310121441.
Abstract
The cerebellum is a well-established primary brain center in charge of controlling sensorimotor and non-motor functions. Recent reports have depicted the significance of the cerebellum in higher-order cognitive functions, including emotion processing, language, reward-related behavior, working memory, and social behavior. As it can influence diverse behavioral patterns, any defect in cerebellar function could invoke neuropsychiatric disease, as indicated by the incidence of alexithymia, and induce alterations in emotional and behavioral patterns. Furthermore, such defects can trigger motor diseases, such as ataxia and Parkinson's disease (PD). In this review, we have extensively discussed the role of the cerebellum in motor and non-motor functions and how cerebellar malfunction, in relation to neural circuit wiring, could impact brain function and behavioral outcomes in patients with neuropsychiatric diseases. Relevant data regarding cerebellar non-motor functions have been vividly described, along with the anatomy and physiology of these functions. In addition to defects in the basal ganglia, a lack of activity in motor-related regions of the cerebellum could be associated with the severity of motor symptoms. Altogether, this review delineates the importance of cerebellar involvement in patients with PD and unravels a crucial link between various clinical aspects of PD and specific cerebellar sub-regions.
Affiliations
- Vladimir N. Nikolenko: Department of Human Anatomy, I. M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Moscow, Russia
- Mikhail Y. Sinelnikov: Department of Human Anatomy, I. M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Moscow, Russia
10. Grisoni L, Pulvermüller F. Predictive and perceptual phonemic processing in articulatory motor areas: A prediction potential & mismatch negativity study. Cortex 2022; 155:357-372. DOI: 10.1016/j.cortex.2022.06.017.
11. Cucu MO, Kazanina N, Houghton C. Syllable-Initial Phonemes Affect Neural Entrainment to Consonant-Vowel Syllables. Front Neurosci 2022; 16:826105. PMID: 35774556; PMCID: PMC9237462; DOI: 10.3389/fnins.2022.826105.
Abstract
Neural entrainment to speech appears to rely on syllabic features, especially those pertaining to the acoustic envelope of the stimuli. It has been proposed that the neural tracking of speech depends on the phoneme features. In the present electroencephalography experiment, we examined data from 25 participants to investigate neural entrainment to near-isochronous stimuli comprising syllables beginning with different phonemes. We measured the inter-trial phase coherence of neural responses to these stimuli and assessed the relationship between this coherence and acoustic properties of the stimuli designed to quantify their “edginess.” We found that entrainment was different across different classes of the syllable-initial phoneme and that entrainment depended on the amount of “edge” in the sound envelope. In particular, the best edge marker and predictor of entrainment was the latency of the maximum derivative of each syllable.
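Inter-trial phase coherence, the entrainment measure used here, quantifies how consistently the response phase at a target frequency repeats across trials. A minimal computation is sketched below; the 4 Hz syllable rate, sampling rate, and trial data are placeholders.

```python
# Sketch of inter-trial phase coherence (ITPC) at the syllable rate.
import numpy as np

fs, n_trials, n_samples = 250, 60, 1000
rng = np.random.default_rng(5)
trials = rng.normal(size=(n_trials, n_samples))   # EEG: trials x time

freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
phases = np.angle(np.fft.rfft(trials, axis=1))

target = np.argmin(np.abs(freqs - 4.0))           # assumed ~4 Hz rate
itpc = np.abs(np.mean(np.exp(1j * phases[:, target])))
print(f"ITPC at {freqs[target]:.2f} Hz: {itpc:.3f}")  # 0 = random, 1 = locked
```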
Affiliations
- M. Oana Cucu: Department of Computer Science, University of Bristol, Bristol, United Kingdom; School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
- Nina Kazanina: School of Psychological Sciences, University of Bristol, Bristol, United Kingdom; International Laboratory of Social Neurobiology, Institute for Cognitive Neuroscience, National Research University Higher School of Economics, HSE University, Moscow, Russia
- Conor Houghton: Department of Computer Science, University of Bristol, Bristol, United Kingdom
12. Preisig BC, Riecke L, Hervais-Adelman A. Speech sound categorization: The contribution of non-auditory and auditory cortical regions. Neuroimage 2022; 258:119375. PMID: 35700949; DOI: 10.1016/j.neuroimage.2022.119375.
Abstract
Which processes in the human brain lead to the categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with a disambiguating acoustic feature (the third formant, F3) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in left perisylvian regions (STG, SMG), left inferior frontal regions (vMC, IFG, AI), left supplementary motor cortex (SMA/pre-SMA), and right motor and somatosensory regions (M1/S1) represent listeners' syllable report irrespective of stimulus acoustics. Most of these regions are outside of what is traditionally regarded as auditory or phonological processing areas. Our results indicate that the process of speech sound categorization implicates decision-making mechanisms and auditory-motor transformations.
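The dissociation logic described above (identical stimuli, different percepts) translates directly into a decoding recipe: restrict the analysis to physically identical ambiguous trials and test whether the reported syllable is still decodable, so that classification cannot reflect stimulus acoustics. A schematic version with placeholder data:

```python
# Sketch: decode the syllable report using only trials with identical
# stimuli, so above-chance accuracy cannot be driven by acoustics.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
patterns = rng.normal(size=(120, 300))   # placeholder searchlight patterns
stimulus_id = rng.integers(0, 2, 120)    # which F3 disambiguation was played
report = rng.integers(0, 2, 120)         # listener's /da/ vs /ga/ report

same = stimulus_id == 0                  # keep one physical stimulus only
scores = cross_val_score(SVC(kernel="linear"),
                         patterns[same], report[same], cv=5)
print(f"report decoding, identical stimuli: {scores.mean():.3f}")
```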
Affiliations
- Basil C Preisig: Donders Institute for Brain, Cognition, and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands; Department of Psychology, Neurolinguistics, University of Zurich, 8050 Zurich, Switzerland; Department of Comparative Language Science, Evolutionary Neuroscience of Language, University of Zurich, 8050 Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, 8057 Zurich, Switzerland
- Lars Riecke: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 ER Maastricht, The Netherlands
- Alexis Hervais-Adelman: Department of Psychology, Neurolinguistics, University of Zurich, 8050 Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, 8057 Zurich, Switzerland
13. Zhang L, Du Y. Lip movements enhance speech representations and effective connectivity in auditory dorsal stream. Neuroimage 2022; 257:119311. PMID: 35589000; DOI: 10.1016/j.neuroimage.2022.119311.
Abstract
Viewing a speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed this question by quantifying regional multivariate representation and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and the supramarginal gyrus (SMG). Moreover, neural representations of place-of-articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area while place of articulation was better encoded in the left ventral premotor cortex and SMG. Next, dynamic causal modeling (DCM) analysis showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Moreover, the neurite orientation dispersion of the left arcuate fasciculus, the structural backbone of the auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide a novel insight for speech science: lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway, and this functional enhancement is mediated by the microstructural architecture of the circuit.
Affiliations
- Lei Zhang: CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
- Yi Du: CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China; Chinese Institute for Brain Research, Beijing 102206, China
14. Lim SJ, Thiel C, Sehm B, Deserno L, Lepsien J, Obleser J. Distributed networks for auditory memory differentially contribute to recall precision. Neuroimage 2022; 256:119227. PMID: 35452804; DOI: 10.1016/j.neuroimage.2022.119227.
Abstract
Re-directing attention to objects in working memory can enhance their representational fidelity. However, how this attentional enhancement of memory representations is implemented across distinct sensory and cognitive-control brain networks remains unspecified. The present fMRI experiment leverages psychophysical modelling and multivariate auditory-pattern decoding as behavioral and neural proxies of mnemonic fidelity. Listeners performed an auditory syllable pitch-discrimination task and received retroactive cues (retro-cues) to selectively attend to a to-be-probed syllable in memory. Accompanied by increased neural activation in fronto-parietal and cingulo-opercular networks, valid retro-cues yielded faster and more perceptually sensitive responses in recalling acoustic detail of memorized syllables. Information about the cued auditory object was decodable from hemodynamic response patterns in the superior temporal sulcus (STS), fronto-parietal, and sensorimotor regions. However, among these regions retaining auditory memory objects, neural fidelity in the left STS and its enhancement through attention-to-memory best predicted individuals' gain in auditory memory recall precision. Our results demonstrate how functionally discrete brain regions differentially contribute to the attentional enhancement of memory representations.
Affiliations
- Sung-Joo Lim: Department of Psychology, University of Lübeck, Maria-Goeppert-Str. 9a, Lübeck 23562, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Department of Psychology, Binghamton University, State University of New York, 4400 Vestal Parkway E, Vestal, Binghamton, NY 13902, USA; Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- Christiane Thiel: Department of Psychology, Carl von Ossietzky University of Oldenburg, Oldenburg 26129, Germany
- Bernhard Sehm: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Lorenz Deserno: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Jöran Lepsien: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Jonas Obleser: Department of Psychology, University of Lübeck, Maria-Goeppert-Str. 9a, Lübeck 23562, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Center of Brain, Behavior, and Metabolism, University of Lübeck, Lübeck 23562, Germany
15. Whitehead JC, Armony JL. Intra-individual Reliability of Voice- and Music-elicited Responses and their Modulation by Expertise. Neuroscience 2022; 487:184-197. PMID: 35182696; DOI: 10.1016/j.neuroscience.2022.02.011.
Abstract
A growing number of functional neuroimaging studies have identified regions within the temporal lobe, particularly along the planum polare and planum temporale, that respond more strongly to music than to other types of acoustic stimuli, including voice. These "music-preferred" regions have been reported using a variety of stimulus sets, paradigms, and analysis approaches, and their consistency across studies has been confirmed through meta-analyses. However, the critical question of the intra-subject reliability of these responses has received less attention. Here, we directly assessed this important issue by contrasting brain responses to musical vs. vocal stimuli in the same subjects across three consecutive fMRI runs, using different types of stimuli. Moreover, we investigated whether these music- and voice-preferred responses were reliably modulated by expertise. Results demonstrated that music-preferred activity previously reported in temporal regions, and its modulation by expertise, exhibits high intra-subject reliability. However, we also found that activity in some extra-temporal regions, such as the precentral and middle frontal gyri, did depend on the particular stimuli employed, which may explain why these are less consistently reported in the literature. Taken together, our findings confirm and extend the notion that specific regions in the brain consistently respond more strongly to certain socially relevant stimulus categories, such as faces, voices, and music, but that some of these responses appear to depend, at least to some extent, on the specific features of the paradigm employed.
Affiliations
- Jocelyne C Whitehead: Douglas Mental Health University Institute, Verdun, Canada; BRAMS Laboratory, Centre for Research on Brain, Language and Music, Montreal, Canada; Integrated Program in Neuroscience, McGill University, Montreal, Canada
- Jorge L Armony: Douglas Mental Health University Institute, Verdun, Canada; BRAMS Laboratory, Centre for Research on Brain, Language and Music, Montreal, Canada; Department of Psychiatry, McGill University, Montreal, Canada
16. Tamura S, Hirose N, Mitsudo T, Hoaki N, Nakamura I, Onitsuka T, Hirano Y. Multi-modal imaging of the auditory-larynx motor network for voicing perception. Neuroimage 2022; 251:118981. PMID: 35150835; DOI: 10.1016/j.neuroimage.2022.118981.
Abstract
Voicing is one of the most important characteristics of phonetic speech sounds. Despite its importance, voicing perception mechanisms remain largely unknown. To explore auditory-motor networks associated with voicing perception, we first examined the brain regions that showed common activity for voicing production and perception using functional magnetic resonance imaging. Results indicated that the auditory and speech motor areas, together with the operculum parietale 4 (OP4), were activated during both voicing production and perception. Second, we used magnetoencephalography to examine the dynamic functional connectivity of the auditory-motor networks during a perceptual categorization task of /da/-/ta/ continuum stimuli varying in voice onset time (VOT) from 0 to 40 ms in 10-ms steps. Significant functional connectivity from the auditory cortical regions to the larynx motor area via OP4 was observed only when perceiving the stimulus with a VOT of 30 ms. In addition, regional activity analysis showed that the neural representation of VOT in the auditory cortical regions was mostly correlated with categorical perception of voicing but did not reflect the perception of the stimulus with a VOT of 30 ms. We suggest that the larynx motor area, which is considered to play a crucial role in voicing production, contributes to categorical perception of voicing by complementing the temporal processing in the auditory cortical regions.
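The categorization behavior underlying these analyses is usually summarized with a psychometric function over the VOT continuum. The sketch below fits a logistic curve to invented response proportions; the boundary and slope parameters are what quantify categorical perception.

```python
# Sketch of a psychometric fit to /da/-/ta/ categorization along a VOT
# continuum. Response proportions are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

vot = np.array([0.0, 10.0, 20.0, 30.0, 40.0])    # ms, as in the stimulus set
p_ta = np.array([0.05, 0.10, 0.45, 0.85, 0.97])  # placeholder /ta/ rates

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

(x0, k), _ = curve_fit(logistic, vot, p_ta, p0=[20.0, 0.3])
print(f"category boundary ~ {x0:.1f} ms VOT, slope {k:.2f}")
```

A steep slope around the fitted boundary is the behavioral signature of the categorical voicing perception the study relates to auditory-motor connectivity.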
Affiliations
- Shunsuke Tamura: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Nobuyuki Hirose: Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
- Takako Mitsudo: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Itta Nakamura: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Toshiaki Onitsuka: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Yoji Hirano: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan; Neural Dynamics Laboratory, Research Service, VA Boston Healthcare System, and Department of Psychiatry, Harvard Medical School, Boston, United States
17. Feng G, Gan Z, Yi HG, Ell SW, Roark CL, Wang S, Wong PCM, Chandrasekaran B. Neural dynamics underlying the acquisition of distinct auditory category structures. Neuroimage 2021; 244:118565. PMID: 34543762; DOI: 10.1016/j.neuroimage.2021.118565.
Abstract
Despite the multidimensional and temporally fleeting nature of auditory signals, we quickly learn to assign novel sounds to behaviorally relevant categories. The neural systems underlying the learning and representation of novel auditory categories are far from understood. Current models argue for a rigid specialization of hierarchically organized core regions that are fine-tuned to extracting and mapping relevant auditory dimensions to meaningful categories. Scaffolded within a dual-learning-systems approach, we test a competing hypothesis: the spatial and temporal dynamics of emerging auditory-category representations are not driven by the underlying dimensions but are constrained by category structure and learning strategies. To test these competing models, we used functional magnetic resonance imaging (fMRI) to assess representational dynamics during the feedback-based acquisition of novel non-speech auditory categories with identical dimensions but differing category structures: rule-based (RB) categories, hypothesized to involve an explicit sound-to-rule mapping network, and information-integration (II) categories, involving pre-decisional integration of dimensions via a procedural sound-to-reward mapping network. Adults were assigned to either the RB (n = 30, 19 females) or II (n = 30, 22 females) learning task. Despite similar behavioral learning accuracies, learning strategies derived from computational modeling and the involvement of corticostriatal systems during feedback processing differed across tasks. Spatiotemporal multivariate representational similarity analysis revealed an emerging representation within an auditory sensory-motor pathway exclusively for the II learning task, prominently involving the superior temporal gyrus (STG), inferior frontal gyrus (IFG), and posterior precentral gyrus. In contrast, the RB learning task yielded distributed neural representations within regions involved in cognitive-control and attentional processes that emerged at different time points of learning. Our results unequivocally demonstrate that auditory learners' neural systems are highly flexible and show distinct spatial and temporal patterns that are not dimension-specific but reflect underlying category structures and learning strategies.
Affiliations
- Gangyi Feng: Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Zhenzhong Gan: Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, China, School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou 510631, China
- Han Gyol Yi: Department of Neurological Surgery, University of California, San Francisco, CA 94158, United States
- Shawn W Ell: Department of Psychology, Graduate School of Biomedical Sciences and Engineering, University of Maine, 5742 Little Hall, Room 301, Orono, ME 04469-5742, United States
- Casey L Roark: Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, United States
- Suiping Wang: Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, China, School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou 510631, China
- Patrick C M Wong: Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Bharath Chandrasekaran: Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, United States
18. Abu Bakar AR, Lai KW, Hamzaid NA. The emergence of machine learning in auditory neural impairment: A systematic review. Neurosci Lett 2021; 765:136250. PMID: 34536511; DOI: 10.1016/j.neulet.2021.136250.
Abstract
Hearing loss is a common neurodegenerative disease that can start at any stage of life. Auditory neural impairment may impose challenges in processing incoming auditory stimuli, which can be measured using electroencephalography (EEG). The electrophysiological responses emanating from EEG auditory evoked potentials (AEPs) require highly trained professionals for analysis and interpretation. Reliable automated methods based on machine learning techniques would assist the auditory assessment process and support informed treatment and practice. It is thus highly desirable to develop models that are more efficient and precise by considering the characteristics of brain signals. This study aims to provide a comprehensive review of several state-of-the-art machine learning techniques that adopt EEG evoked responses for auditory assessment, covering the last 13 years. Out of 161 initially screened articles, 11 were retained for synthesis. The review found that the support vector machine (SVM) classifier outperformed the alternatives, with accuracy above 80%, and was recognized as the model best suited to the field of auditory research. This paper discusses the iterative properties of the proposed algorithms and feasible future directions in rehabilitation for the hearing impaired.
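To make the review's headline finding concrete, the sketch below shows the kind of SVM pipeline typically applied to EEG auditory evoked potentials: crude per-trial features feeding a cross-validated classifier. Features, labels, and data are synthetic; real AEP work would use carefully engineered features and preprocessing.

```python
# Sketch of an SVM classifier on EEG auditory-evoked-potential features.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
epochs = rng.normal(size=(100, 64, 300))   # trials x channels x samples
labels = rng.integers(0, 2, size=100)      # e.g., impaired vs. normal AEP

mean_amp = epochs.mean(axis=2)             # per-channel mean amplitude
peak_lat = epochs.argmax(axis=2)           # per-channel peak latency (samples)
X = np.hstack([mean_amp, peak_lat])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, labels, cv=5).mean())   # chance is about 0.5
```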
Affiliation(s)
- Abdul Rauf Abu Bakar
- Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, 50603 Kuala Lumpur, Malaysia.
| | - Khin Wee Lai
- Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, 50603 Kuala Lumpur, Malaysia.
| | - Nur Azah Hamzaid
- Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, 50603 Kuala Lumpur, Malaysia
19
Learning nonnative speech sounds changes local encoding in the adult human cortex. Proc Natl Acad Sci U S A 2021; 118:2101777118. [PMID: 34475209 DOI: 10.1073/pnas.2101777118] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2021] [Accepted: 07/12/2021] [Indexed: 11/18/2022] Open
Abstract
Adults can learn to identify nonnative speech sounds with training, albeit with substantial variability in learning behavior. Increases in behavioral accuracy are associated with increased separability for sound representations in cortical speech areas. However, it remains unclear whether individual auditory neural populations all show the same types of changes with learning, or whether there are heterogeneous encoding patterns. Here, we used high-resolution direct neural recordings to examine local population response patterns, while native English listeners learned to recognize unfamiliar vocal pitch patterns in Mandarin Chinese tones. We found a distributed set of neural populations in bilateral superior temporal gyrus and ventrolateral frontal cortex, where the encoding of Mandarin tones changed throughout training as a function of trial-by-trial accuracy ("learning effect"), including both increases and decreases in the separability of tones. These populations were distinct from populations that showed changes as a function of exposure to the stimuli regardless of trial-by-trial accuracy. These learning effects were driven in part by more variable neural responses to repeated presentations of acoustically identical stimuli. Finally, learning effects could be predicted from speech-evoked activity even before training, suggesting that intrinsic properties of these populations make them amenable to behavior-related changes. Together, these results demonstrate that nonnative speech sound learning involves a wide array of changes in neural representations across a distributed set of brain regions.
20
Abstract
Creating invariant representations from an ever-changing speech signal is a major challenge for the human brain. Such an ability is particularly crucial for preverbal infants who must discover the phonological, lexical, and syntactic regularities of an extremely inconsistent signal in order to acquire language. Within the visual domain, an efficient neural solution to overcome variability consists in factorizing the input into a reduced set of orthogonal components. Here, we asked whether a similar decomposition strategy is used in early speech perception. Using a 256-channel electroencephalographic system, we recorded the neural responses of 3-mo-old infants to 120 natural consonant-vowel syllables with varying acoustic and phonetic profiles. Using multivariate pattern analyses, we show that syllables are factorized into distinct and orthogonal neural codes for consonants and vowels. Concerning consonants, we further demonstrate the existence of two stages of processing. A first phase is characterized by orthogonal and context-invariant neural codes for the dimensions of manner and place of articulation. Within the second stage, manner and place codes are integrated to recover the identity of the phoneme. We conclude that, despite the paucity of articulatory motor plans and speech production skills, pre-babbling infants are already equipped with a structured combinatorial code for speech analysis, which might account for the rapid pace of language acquisition during the first year.
21
Beach SD, Ozernov-Palchik O, May SC, Centanni TM, Gabrieli JDE, Pantazis D. Neural Decoding Reveals Concurrent Phonemic and Subphonemic Representations of Speech Across Tasks. Neurobiol Lang 2021; 2:254-279. [PMID: 34396148 PMCID: PMC8360503 DOI: 10.1162/nol_a_00034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/28/2020] [Accepted: 02/21/2021] [Indexed: 06/13/2023]
Abstract
Robust and efficient speech perception relies on the interpretation of acoustically variable phoneme realizations, yet prior neuroimaging studies are inconclusive regarding the degree to which subphonemic detail is maintained over time as categorical representations arise. It is also unknown whether this depends on the demands of the listening task. We addressed these questions by using neural decoding to quantify the (dis)similarity of brain response patterns evoked during two different tasks. We recorded magnetoencephalography (MEG) as adult participants heard isolated, randomized tokens from a /ba/-/da/ speech continuum. In the passive task, their attention was diverted. In the active task, they categorized each token as ba or da. We found that linear classifiers successfully decoded ba vs. da perception from the MEG data. Data from the left hemisphere were sufficient to decode the percept early in the trial, while the right hemisphere was necessary but not sufficient for decoding at later time points. We also decoded stimulus representations and found that they were maintained longer in the active task than in the passive task; however, these representations did not pattern more like discrete phonemes when an active categorical response was required. Instead, in both tasks, early phonemic patterns gave way to a representation of stimulus ambiguity that coincided in time with reliable percept decoding. Our results suggest that the categorization process does not require the loss of subphonemic detail, and that the neural representation of isolated speech sounds includes concurrent phonemic and subphonemic information.
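A hedged sketch of the time-resolved decoding logic described above: a separate linear classifier is trained at each time point to separate the two percepts, and the resulting accuracy curve shows when percept information becomes available. Trial counts, channel counts, and the injected signal window are invented for illustration.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_channels, n_times = 120, 50, 100
X = rng.normal(size=(n_trials, n_channels, n_times))
y = np.repeat([0, 1], n_trials // 2)            # 0 = "ba" report, 1 = "da" report
X[y == 1, :10, 40:60] += 0.5                    # percept information mid-trial

acc = np.empty(n_times)
for t in range(n_times):                        # one linear classifier per time point
    clf = LogisticRegression(max_iter=1000)
    acc[t] = cross_val_score(clf, X[:, :, t], y, cv=5).mean()
print(f"peak decoding accuracy {acc.max():.2f} at sample {acc.argmax()}")
```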
Affiliation(s)
- Sara D. Beach
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
| | - Ola Ozernov-Palchik
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Sidney C. May
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Lynch School of Education and Human Development, Boston College, Chestnut Hill, MA, USA
| | - Tracy M. Centanni
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Psychology, Texas Christian University, Fort Worth, TX, USA
| | - John D. E. Gabrieli
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Dimitrios Pantazis
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
22
Cummings AE, Wu YC, Ogiela DA. Phonological Underspecification: An Explanation for How a Rake Can Become Awake. Front Hum Neurosci 2021; 15:585817. [PMID: 33679342 PMCID: PMC7925882 DOI: 10.3389/fnhum.2021.585817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2020] [Accepted: 01/25/2021] [Indexed: 11/13/2022] Open
Abstract
Neural markers, such as the mismatch negativity (MMN), have been used to examine the phonological underspecification of English feature contrasts using the Featurally Underspecified Lexicon (FUL) model. However, neural indices have not been examined within the approximant phoneme class, even though there is evidence suggesting processing asymmetries between liquid (e.g., /ɹ/) and glide (e.g., /w/) phonemes. The goal of this study was to determine whether glide phonemes elicit electrophysiological asymmetries related to [consonantal] underspecification when contrasted with liquid phonemes in adult English speakers. Specifically, /ɹɑ/ is categorized as [+consonantal] while /wɑ/ is not specified [i.e., (-consonantal)]. Following the FUL framework, if /w/ is less specified than /ɹ/, the former phoneme should elicit a larger MMN response than the latter phoneme. Fifteen English-speaking adults were presented with two syllables, /ɹɑ/ and /wɑ/, in an event-related potential (ERP) oddball paradigm in which both syllables served as the standard and deviant stimulus in opposite stimulus sets. Three types of analyses were used: (1) traditional mean amplitude measurements; (2) cluster-based permutation analyses; and (3) event-related spectral perturbation (ERSP) analyses. The less specified /wɑ/ elicited a large MMN, while a much smaller MMN was elicited by the more specified /ɹɑ/. In the standard and deviant ERP waveforms, /wɑ/ elicited a significantly larger negative response than did /ɹɑ/. Theta activity elicited by /ɹɑ/ was significantly greater than that elicited by /wɑ/ in the 100-300 ms time window. Also, low gamma activation was significantly lower for /ɹɑ/ vs. /wɑ/ deviants over the left hemisphere, as compared to the right, in the 100-150 ms window. These outcomes suggest that the [consonantal] feature follows the underspecification predictions of FUL previously tested with the place of articulation and voicing features. Thus, this study provides new evidence for phonological underspecification. Moreover, as neural oscillation patterns have not previously been discussed in the underspecification literature, the ERSP analyses identified potential new indices of phonological underspecification.
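For readers unfamiliar with the traditional mean-amplitude measurement mentioned above, the sketch below computes an MMN mean amplitude from a simulated deviant-minus-standard difference wave; the sampling rate, analysis window, and waveform shapes are assumptions, not the study's parameters.
```python
import numpy as np

fs = 500                                   # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.5, 1 / fs)           # epoch from -100 to 500 ms
rng = np.random.default_rng(2)
standard = rng.normal(0, 0.2, t.size)      # simulated standard ERP
deviant = rng.normal(0, 0.2, t.size) - 1.5 * np.exp(-((t - 0.18) ** 2) / 0.002)

difference = deviant - standard            # the MMN lives in the difference wave
win = (t >= 0.1) & (t <= 0.25)             # assumed 100-250 ms analysis window
print(f"MMN mean amplitude in window: {difference[win].mean():.2f} µV")
```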
Affiliation(s)
- Alycia E. Cummings
- Department of Communication Sciences and Disorders, Idaho State University, Meridian, ID, United States
| | - Ying C. Wu
- Swartz Center for Computational Neuroscience, University of California, San Diego, San Diego, CA, United States
| | - Diane A. Ogiela
- Department of Communication Sciences and Disorders, Idaho State University, Meridian, ID, United States
23
Yue Q, Martin RC. Maintaining verbal short-term memory representations in non-perceptual parietal regions. Cortex 2021; 138:72-89. [PMID: 33677329 DOI: 10.1016/j.cortex.2021.01.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2020] [Revised: 11/09/2020] [Accepted: 01/27/2021] [Indexed: 12/13/2022]
Abstract
Buffer accounts of verbal short-term memory (STM) assume dedicated buffers for maintaining different types of information (e.g., phonological, visual) whereas embedded processes accounts argue against the existence of buffers and claim that STM consists of the activated portion of long-term memory (LTM). We addressed this debate by determining whether STM recruits the same neural substrate as LTM, or whether additional regions are involved in short-term storage. Using fMRI with representational similarity analysis (RSA), we examined the representational correspondence of multi-voxel neural activation patterns with the theoretical predictions for the maintenance of both phonological and semantic codes in STM. We found that during the delay period of a phonological STM task, phonological representations could be decoded in the left supramarginal gyrus (SMG) but not the superior temporal gyrus (STG), a speech processing region, for word stimuli. Whereas the pattern in the SMG was specific to phonology, a different region in the left angular gyrus showed RSA decoding evidence for the retention of either phonological or semantic codes, depending on the task context. Taken together, the results provide clear support for a dedicated buffer account of phonological STM, although evidence for a semantic buffer is equivocal.
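The following is a minimal sketch of the representational similarity analysis (RSA) logic used here: a neural representational dissimilarity matrix (RDM) from delay-period activity patterns is compared against a theoretical model RDM via rank correlation. Item counts, voxel counts, and the model itself are synthetic stand-ins.
```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_items, n_voxels = 12, 200
patterns = rng.normal(size=(n_items, n_voxels))    # one delay-period pattern per memorandum
neural_rdm = pdist(patterns, metric="correlation") # neural dissimilarities

model_rdm = pdist(rng.normal(size=(n_items, 5)))   # stand-in phonological model RDM
rho, p = spearmanr(neural_rdm, model_rdm)          # second-order (RSA) correlation
print(f"model-neural correlation: rho = {rho:.2f}, p = {p:.3f}")
```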
Affiliation(s)
- Qiuhai Yue
- Department of Psychological Sciences, Rice University, Houston, TX 77005, USA; Department of Psychology, Vanderbilt University, Nashville, TN 37240, USA.
| | - Randi C Martin
- Department of Psychological Sciences, Rice University, Houston, TX 77005, USA.
24
Urbschat A, Uppenkamp S, Anemüller J. Searchlight Classification Informative Region Mixture Model (SCIM): Identification of Cortical Regions Showing Discriminable BOLD Patterns in Event-Related Auditory fMRI Data. Front Neurosci 2021; 14:616906. [PMID: 33597841 PMCID: PMC7882477 DOI: 10.3389/fnins.2020.616906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Accepted: 12/29/2020] [Indexed: 11/13/2022] Open
Abstract
The investigation of abstract cognitive tasks, e.g., semantic processing of speech, requires both a carefully selected stimulus design and sensitive analysis tools for the corresponding neural activity that are comparable across studies investigating similar research questions. Multi-voxel pattern analysis (MVPA) methods are commonly used in neuroimaging to investigate BOLD responses corresponding to neural activation associated with specific cognitive tasks. Regions of significant activation are identified by a thresholding operation during multivariate pattern analysis, and the results are therefore sensitive to the chosen threshold value. Developing analysis approaches that are largely robust to thresholding is thus an important goal, and the one pursued here. The present paper contributes a novel statistical analysis method for fMRI experiments, the searchlight classification informative region mixture model (SCIM), which is based on the assumption that the whole brain volume can be subdivided into two groups of voxels: spatial positions around which recorded BOLD activity conveys information about the current stimulus condition, and those around which it does not. A generative statistical model is proposed that assigns each position in the brain a probability of being informative, based on a combination of a support vector machine searchlight analysis and Gaussian mixture models. Results from an auditory fMRI study of cortical regions engaged in the semantic processing of speech indicate that the SCIM method identifies physiologically plausible brain regions as informative, similar to the two standard reference methods we compare against, with two important differences. First, SCIM-identified regions are very robust to the choice of the significance threshold, i.e., less "noisy," in contrast to, e.g., the binomial test, whose results in the present experiment depend strongly on the chosen significance threshold, or random permutation tests, which additionally carry very high computational costs. Second, in group analyses the SCIM method identifies a physiologically plausible prefrontal region, the anterior cingulate sulcus, as involved in semantic processing, which the other methods identify only in single-subject analyses.
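A compact sketch of the core SCIM idea under stated assumptions: given a brain-wide map of searchlight classification accuracies, fit a two-component Gaussian mixture (informative vs. uninformative locations) and read off each location's posterior probability of being informative. The accuracy distributions below are simulated, and the sketch omits the spatial modeling of the full method.
```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
acc_noise = rng.normal(0.50, 0.03, 9000)         # chance-level searchlights
acc_info = rng.normal(0.62, 0.04, 1000)          # informative searchlights
acc = np.concatenate([acc_noise, acc_info]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(acc)
informative = int(np.argmax(gmm.means_.ravel())) # component with the higher mean accuracy
p_informative = gmm.predict_proba(acc)[:, informative]
print("locations with P(informative) > 0.5:", int((p_informative > 0.5).sum()))
```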
Affiliation(s)
- Annika Urbschat
- Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
| | - Stefan Uppenkamp
- Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
| | - Jörn Anemüller
- Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
25
Häusler CO, Hanke M. A studyforrest extension, an annotation of spoken language in the German dubbed movie "Forrest Gump" and its audio-description. F1000Res 2021; 10:54. [PMID: 33732435 PMCID: PMC7921887 DOI: 10.12688/f1000research.27621.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/12/2021] [Indexed: 11/20/2022] Open
Abstract
Here we present an annotation of speech in the audio-visual movie "Forrest Gump" and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides information about the exact timing of each of the more than 2,500 spoken sentences, 16,000 words (including 202 non-speech vocalizations), 66,000 phonemes, and their corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embedding which represents each word in a 300-dimensional semantic space. To validate the dataset's quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation's content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity.
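As an illustration of how such an annotation can be turned into a model of hemodynamic activity, the sketch below builds a word-onset regressor from (simulated) annotated timings and convolves it with a canonical double-gamma HRF; the timing statistics and HRF parameterization are assumptions, not the authors' exact model.
```python
import numpy as np
from scipy.stats import gamma

tr, n_scans = 2.0, 300                              # assumed acquisition parameters
rng = np.random.default_rng(5)
word_onsets = np.cumsum(rng.uniform(0.2, 1.0, 800)) # simulated annotated word onsets (s)
word_onsets = word_onsets[word_onsets < n_scans * tr]

stick = np.zeros(int(n_scans * tr * 10))            # stick function at 0.1 s resolution
stick[(word_onsets * 10).astype(int)] = 1.0

t = np.arange(0, 32, 0.1)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6        # canonical double-gamma HRF shape
regressor = np.convolve(stick, hrf)[: stick.size][:: int(tr * 10)]
print("regressor length matches scan count:", regressor.size == n_scans)
```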
Affiliation(s)
- Christian Olaf Häusler
- Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Nordrhein-Westfalen, 52425, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, Nordrhein-Westfalen, 40225, Germany
| | - Michael Hanke
- Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Nordrhein-Westfalen, 52425, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, Nordrhein-Westfalen, 40225, Germany
26
Feng G, Gan Z, Llanos F, Meng D, Wang S, Wong PCM, Chandrasekaran B. A distributed dynamic brain network mediates linguistic tone representation and categorization. Neuroimage 2021; 224:117410. [PMID: 33011415 PMCID: PMC7749825 DOI: 10.1016/j.neuroimage.2020.117410] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Revised: 08/21/2020] [Accepted: 09/25/2020] [Indexed: 12/21/2022] Open
Abstract
Successful categorization requires listeners to represent the incoming sensory information, resolve the "blooming, buzzing confusion" inherent to noisy sensory signals, and leverage the accumulated evidence towards making a decision. Despite decades of intense debate, the neural systems underlying speech categorization remain unresolved. Here we assessed the neural representation and categorization of lexical tones by native Mandarin speakers (N = 31) across a range of acoustic and contextual variabilities (talkers, perceptual saliences, and stimulus-contexts) using functional magnetic resonance imaging (fMRI) and an evidence accumulation model of decision-making. Univariate activation and multivariate pattern analyses reveal that acoustic-variability-tolerant representations of tone category are observed within the middle portion of the left superior temporal gyrus (STG). Activation patterns in the frontal and parietal regions also contained category-relevant information that was differentially sensitive to various forms of variability. The robustness of neural representations of tone category in a distributed fronto-temporoparietal network is associated with trial-by-trial decision-making parameters. These findings support a hybrid model involving a representational core within the STG that operates dynamically within an extensive frontoparietal network to support the representation and categorization of linguistic pitch patterns.
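The evidence-accumulation model of decision-making invoked here belongs to the drift-diffusion family; the following hedged sketch simulates single trials in which noisy evidence accumulates to a bound, yielding a choice and a response time. All parameters are hypothetical.
```python
import numpy as np

def ddm_trial(drift=0.8, bound=1.0, noise=1.0, dt=0.001, rng=None):
    """Accumulate noisy evidence until it hits +bound or -bound."""
    if rng is None:
        rng = np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return x > 0, t                              # (choice, response time)

rng = np.random.default_rng(6)
choices, rts = zip(*(ddm_trial(rng=rng) for _ in range(500)))
print(f"P(correct) = {np.mean(choices):.2f}, mean RT = {np.mean(rts):.3f} s")
```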
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China.
| | - Zhenzhong Gan
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou 510631, China
| | - Fernando Llanos
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
| | - Danting Meng
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou 510631, China
| | - Suiping Wang
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou 510631, China; Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou 510631, China
| | - Patrick C M Wong
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States.
27
Ren J, Xu T, Wang D, Li M, Lin Y, Schoeppe F, Ramirez JSB, Han Y, Luan G, Li L, Liu H, Ahveninen J. Individual Variability in Functional Organization of the Human and Monkey Auditory Cortex. Cereb Cortex 2020; 31:2450-2465. [PMID: 33350445 DOI: 10.1093/cercor/bhaa366] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Revised: 11/01/2020] [Accepted: 11/05/2020] [Indexed: 12/13/2022] Open
Abstract
Accumulating evidence shows that auditory cortex (AC) of humans, and other primates, is involved in more complex cognitive processes than feature segregation only, which are shaped by experience-dependent plasticity and thus likely show substantial individual variability. However, thus far, individual variability of ACs has been considered a methodological impediment rather than a phenomenon of theoretical importance. Here, we examined the variability of ACs using intrinsic functional connectivity patterns in humans and macaques. Our results demonstrate that in humans, interindividual variability is greater near the nonprimary than primary ACs, indicating that variability dramatically increases across the processing hierarchy. ACs are also more variable than comparable visual areas and show higher variability in the left than in the right hemisphere, which may be related to the left lateralization of auditory-related functions such as language. Intriguingly, remarkably similar modality differences and lateralization of variability were also observed in macaques. These connectivity-based findings are consistent with a confirmatory task-based functional magnetic resonance imaging analysis. The quantification of variability in auditory function, and the similar findings in both humans and macaques, will have strong implications for understanding the evolution of advanced auditory functions in humans.
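One common way to quantify inter-individual variability of functional connectivity (offered here as an assumption-laden sketch, not necessarily the authors' exact formula) is to correlate connectivity profiles across subjects at each location and take one minus the mean pairwise correlation as the variability.
```python
import numpy as np

rng = np.random.default_rng(13)
n_subj, n_vertices, n_targets = 10, 500, 100
conn = rng.normal(size=(n_subj, n_vertices, n_targets))  # one connectivity profile per subject and vertex

var_map = np.empty(n_vertices)
for v in range(n_vertices):
    r = np.corrcoef(conn[:, v, :])                # subject-by-subject profile similarity
    upper = r[np.triu_indices(n_subj, k=1)]
    var_map[v] = 1.0 - upper.mean()               # higher = more idiosyncratic
print(f"mean inter-individual variability: {var_map.mean():.2f}")
```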
Affiliation(s)
- Jianxun Ren
- National Engineering Laboratory for Neuromodulation, School of Aerospace Engineering, Tsinghua University, 100084 Beijing, China; Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129, USA
| | - Ting Xu
- Center for the Developing Brain, Child Mind Institute, New York, NY 10022, USA
| | - Danhong Wang
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129, USA
| | - Meiling Li
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129, USA
| | - Yuanxiang Lin
- Department of Neurosurgery, First Affiliated Hospital, Fujian Medical University, 350108 Fuzhou, China
| | - Franziska Schoeppe
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129, USA
| | - Julian S B Ramirez
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA
| | - Ying Han
- Department of Neurology, Xuanwu Hospital of Capital Medical University, 100053 Beijing, China
| | - Guoming Luan
- Department of Neurosurgery, Comprehensive Epilepsy Center, Sanbo Brain Hospital, Capital Medical University, 100093 Beijing, China
| | - Luming Li
- National Engineering Laboratory for Neuromodulation, School of Aerospace Engineering, Tsinghua University, 100084 Beijing, China; Precision Medicine & Healthcare Research Center, Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, 518055 Shenzhen, China; IDG/McGovern Institute for Brain Research, Tsinghua University, 100084 Beijing, China
| | - Hesheng Liu
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129, USA; Department of Neuroscience, Medical University of South Carolina, Charleston, SC 29425, USA
| | - Jyrki Ahveninen
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA 02129, USA
28
Single-cell activity in human STG during perception of phonemes is organized according to manner of articulation. Neuroimage 2020; 226:117499. [PMID: 33186717 DOI: 10.1016/j.neuroimage.2020.117499] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2020] [Revised: 09/29/2020] [Accepted: 10/21/2020] [Indexed: 11/23/2022] Open
Abstract
One of the central tasks of the human auditory system is to extract sound features from incoming acoustic signals that are most critical for speech perception. Specifically, phonological features and phonemes are the building blocks for more complex linguistic entities, such as syllables, words and sentences. Previous ECoG and EEG studies showed that various regions in the superior temporal gyrus (STG) exhibit selective responses to specific phonological features. However, electrical activity recorded by ECoG or EEG grids reflects average responses of large neuronal populations and is therefore limited in providing insights into activity patterns of single neurons. Here, we recorded spiking activity from 45 units in the STG from six neurosurgical patients who performed a listening task with phoneme stimuli. Fourteen units showed significant responsiveness to the stimuli. Using a Naïve-Bayes model, we find that single-cell responses to phonemes are governed by manner-of-articulation features and are organized according to sonority with two main clusters for sonorants and obstruents. We further find that 'neural similarity' (i.e. the similarity of evoked spiking activity between pairs of phonemes) is comparable to the 'perceptual similarity' (i.e. to what extent two phonemes are judged as sounding similar) based on perceptual confusion, assessed behaviorally in healthy subjects. Thus, phonemes that were perceptually similar also had similar neural responses. Taken together, our findings indicate that manner-of-articulation is the dominant organization dimension of phoneme representations at the single-cell level, suggesting a remarkable consistency across levels of analyses, from the single neuron level to that of large neuronal populations and behavior.
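A minimal sketch of Naive Bayes decoding of phoneme identity from single-unit spike counts, in the spirit of the model described above; the unit count matches the responsive units reported, but the tuning, phoneme set, and firing rates are simulated.
```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(7)
n_units, n_phonemes, n_reps = 14, 8, 30
rates = rng.uniform(1, 10, size=(n_phonemes, n_units))    # simulated tuning per phoneme
X = np.concatenate([rng.poisson(r, size=(n_reps, n_units)) for r in rates])
y = np.repeat(np.arange(n_phonemes), n_reps)

acc = cross_val_score(GaussianNB(), X, y, cv=5).mean()
print(f"phoneme decoding accuracy: {acc:.2f} (chance = {1 / n_phonemes:.2f})")
```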
29
Chyl K, Kossowski B, Wang S, Dębska A, Łuniewska M, Marchewka A, Wypych M, Bunt MVD, Mencl W, Pugh K, Jednoróg K. The brain signature of emerging reading in two contrasting languages. Neuroimage 2020; 225:117503. [PMID: 33130273 DOI: 10.1016/j.neuroimage.2020.117503] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2020] [Revised: 10/16/2020] [Accepted: 10/18/2020] [Indexed: 11/19/2022] Open
Abstract
Despite dissimilarities among scripts, a universal hallmark of literacy in skilled readers is the convergent brain activity for print and speech. Little is known, however, about whether this differs as a function of grapheme-to-phoneme transparency in beginning readers. Here we compare speech and orthographic processing circuits in two contrasting languages, Polish and English, in 100 7-year-old children performing fMRI language localizer tasks. Results show limited language variation, with speech-print convergence evident mostly in left frontotemporal perisylvian regions. Correlational and intersect analyses revealed subtle differences in the strength of this coupling in several regions of interest. Specifically, speech-print convergence was higher for transparent Polish than opaque English in the right temporal area, associated with phonological processing. Conversely, speech-print convergence was higher for English than Polish in the left fusiform gyrus, associated with visual word recognition. We conclude that speech-print convergence is a universal marker of reading even at the beginning of reading acquisition, with minor variations that can be explained by differences in grapheme-to-phoneme transparency. This finding at the earliest stages of reading acquisition accords well with claims that reading exhibits a good deal of universality despite differences among writing systems.
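The "intersect" analyses mentioned above are conjunction analyses; as a minimal sketch under assumed thresholds, voxels exceeding a z threshold in both the print and speech localizer maps count as showing speech-print convergence. The maps and threshold below are simulated placeholders.
```python
import numpy as np

rng = np.random.default_rng(12)
z_print = rng.normal(size=(40, 48, 40))      # print localizer z-map (simulated)
z_speech = rng.normal(size=(40, 48, 40))     # speech localizer z-map (simulated)

thr = 3.1                                    # hypothetical voxelwise z threshold
convergence = (z_print > thr) & (z_speech > thr)   # conjunction: active in both
print("convergent voxels:", int(convergence.sum()))
```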
Affiliation(s)
- Katarzyna Chyl
- Laboratory of Language Neurobiology, Nencki Institute of Experimental Biology, PAS, Warsaw, Poland.
| | - Bartosz Kossowski
- Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, PAS, Warsaw, Poland
| | - Shuai Wang
- Shanghai Key Laboratory of Brain Functional Genomics, East China Normal University, China; CNRS, LPL, Aix Marseille University, Aix-en-Provence, France; Institute of Language, Communication and the Brain, Brain and Language Research Institute, Aix Marseille University, Aix-en-Provence, France
| | - Agnieszka Dębska
- Laboratory of Language Neurobiology, Nencki Institute of Experimental Biology, PAS, Warsaw, Poland
| | - Magdalena Łuniewska
- Laboratory of Language Neurobiology, Nencki Institute of Experimental Biology, PAS, Warsaw, Poland
| | - Artur Marchewka
- Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, PAS, Warsaw, Poland
| | - Marek Wypych
- Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, PAS, Warsaw, Poland
| | | | | | - Kenneth Pugh
- Haskins Laboratories, New Haven, CT, USA; Department of Diagnostic Radiology, Yale University School of Medicine, New Haven, CT, USA; Department of Psychology, University of Connecticut, Storrs, CT, USA
| | - Katarzyna Jednoróg
- Laboratory of Language Neurobiology, Nencki Institute of Experimental Biology, PAS, Warsaw, Poland.
30
Feng G, Yi HG, Chandrasekaran B. The Role of the Human Auditory Corticostriatal Network in Speech Learning. Cereb Cortex 2020; 29:4077-4089. [PMID: 30535138 DOI: 10.1093/cercor/bhy289] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2018] [Revised: 08/30/2018] [Indexed: 01/26/2023] Open
Abstract
We establish a mechanistic account of how the mature human brain functionally reorganizes to acquire and represent new speech sounds. Native speakers of English learned to categorize Mandarin lexical tone categories produced by multiple talkers using trial-by-trial feedback. We hypothesized that the corticostriatal system is a key intermediary in mediating temporal lobe plasticity and the acquisition of new speech categories in adulthood. We conducted a functional magnetic resonance imaging experiment in which participants underwent a sound-to-category mapping task. Diffusion tensor imaging data were collected, and probabilistic fiber tracking analysis was employed to assay the auditory corticostriatal pathways. Multivariate pattern analysis showed that talker-invariant novel tone category representations emerged in the left superior temporal gyrus (LSTG) within a few hundred training trials. Univariate analysis showed that the putamen, a subregion of the striatum, was sensitive to positive feedback in correctly categorized trials. With learning, functional coupling between the putamen and LSTG increased during error processing. Furthermore, fiber tractography demonstrated robust structural connectivity between the feedback-sensitive striatal regions and the LSTG regions that represent the newly learned tone categories. Our convergent findings highlight a critical role for the auditory corticostriatal circuitry in mediating the acquisition of new speech categories.
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
31
Saltzman DI, Myers EB. Neural Representation of Articulable and Inarticulable Novel Sound Contrasts: The Role of the Dorsal Stream. Neurobiol Lang 2020; 1:339-364. [PMID: 35784619 PMCID: PMC9248853 DOI: 10.1162/nol_a_00016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/21/2019] [Accepted: 05/23/2020] [Indexed: 06/15/2023]
Abstract
The extent to which articulatory information embedded in incoming speech contributes to the formation of new perceptual categories for speech sounds has been debated for decades. It has been theorized that the acquisition of new speech sound categories requires a network of sensory and speech motor cortical areas (the "dorsal stream") to successfully integrate auditory and articulatory information. However, it is possible that these brain regions are not sensitive specifically to articulatory information, but instead are sensitive to the abstract phonological categories being learned. We tested this hypothesis by training participants over the course of several days on an articulable non-native speech contrast and acoustically matched inarticulable nonspeech analogues. After participants reached comparable levels of proficiency with the two sets of stimuli, activation was measured in fMRI as they passively listened to both sound types. Decoding of category membership for the articulable speech contrast alone revealed a series of left and right hemisphere regions outside of the dorsal stream that have previously been implicated in the emergence of non-native speech sound categories, while no regions could successfully decode the inarticulable nonspeech contrast. Although activation patterns in the left inferior frontal gyrus, the middle temporal gyrus, and the supplementary motor area provided better information for decoding articulable (speech) sounds compared to the inarticulable (sine wave) sounds, the finding that dorsal stream regions do not emerge as good decoders of the articulable contrast alone suggests that other factors, including the strength and structure of the emerging speech categories, are more likely drivers of dorsal stream activation for novel sound learning.
32
Joint Representation of Spatial and Phonetic Features in the Human Core Auditory Cortex. Cell Rep 2020; 24:2051-2062.e2. [PMID: 30134167 DOI: 10.1016/j.celrep.2018.07.076] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2017] [Revised: 04/09/2018] [Accepted: 07/22/2018] [Indexed: 12/12/2022] Open
Abstract
The human auditory cortex simultaneously processes speech and determines the location of a speaker in space. Neuroimaging studies in humans have implicated core auditory areas in processing the spectrotemporal and the spatial content of sound; however, how these features are represented together is unclear. We recorded directly from human subjects implanted bilaterally with depth electrodes in core auditory areas as they listened to speech from different directions. We found local and joint selectivity to spatial and spectrotemporal speech features, where the spatial and spectrotemporal features are organized independently of each other. This representation enables successful decoding of both spatial and phonetic information. Furthermore, we found that the location of the speaker does not change the spectrotemporal tuning of the electrodes but, rather, modulates their mean response level. Our findings contribute to defining the functional organization of responses in the human auditory cortex, with implications for more accurate neurophysiological models of speech processing.
33
Kowialiewski B, Van Calster L, Attout L, Phillips C, Majerus S. Neural Patterns in Linguistic Cortices Discriminate the Content of Verbal Working Memory. Cereb Cortex 2019; 30:2997-3014. [PMID: 31813984 DOI: 10.1093/cercor/bhz290] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2019] [Revised: 09/16/2019] [Accepted: 06/17/2019] [Indexed: 01/11/2023] Open
Abstract
An influential theoretical account of working memory (WM) holds that WM is based on direct activation of long-term memory knowledge. While there is empirical support for this position in the visual WM domain, direct evidence is scarce in the verbal WM domain. The question is critical for models of verbal WM, as whether short-term maintenance of verbal information relies on direct activation within the long-term linguistic knowledge base is still debated. In this study, we examined the extent to which short-term maintenance of lexico-semantic knowledge relies on neural activation patterns in linguistic cortices, using a fast-encoding running span task with word and nonword stimuli that minimizes strategic encoding mechanisms. Multivariate analyses showed specific neural patterns for the encoding and maintenance of word versus nonword stimuli. These patterns were no longer detectable when participants were instructed to stop maintaining the memoranda. The patterns involved specific regions within the dorsal and ventral pathways, which are considered to support phonological and semantic processing to various degrees. This study provides novel evidence for a role of linguistic cortices in representing long-term memory linguistic knowledge during WM processing.
Affiliation(s)
- Benjamin Kowialiewski
- University of Liège, Liège, Belgium; Fund for Scientific Research-F.R.S.-FNRS, Brussels, Belgium
| | - Laurens Van Calster
- University of Liège, Liège, Belgium; University of Geneva, Geneva, Switzerland
| | | | - Christophe Phillips
- University of Liège, Liège, Belgium; Fund for Scientific Research-F.R.S.-FNRS, Brussels, Belgium
| | - Steve Majerus
- University of Liège, Liège, Belgium; Fund for Scientific Research-F.R.S.-FNRS, Brussels, Belgium
34
Feng G, Gan Z, Wang S, Wong PCM, Chandrasekaran B. Task-General and Acoustic-Invariant Neural Representation of Speech Categories in the Human Brain. Cereb Cortex 2019; 28:3241-3254. [PMID: 28968658 DOI: 10.1093/cercor/bhx195] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2016] [Accepted: 07/13/2017] [Indexed: 11/14/2022] Open
Abstract
A significant neural challenge in speech perception is extracting discrete phonetic categories from continuous and multidimensional signals despite varying task demands and surface-acoustic variability. While neural representations of speech categories have been previously identified in frontal and posterior temporal-parietal regions, the task dependency and dimensional specificity of these neural representations are still unclear. Here, we asked native Mandarin participants to listen to speech syllables carrying 4 distinct lexical tone categories across passive listening, repetition, and categorization tasks while they underwent functional magnetic resonance imaging (fMRI). We used searchlight classification and representational similarity analysis (RSA) to identify the dimensional structure underlying neural representation across tasks and surface-acoustic properties. Searchlight classification analyses revealed significant "cross-task" lexical tone decoding within the bilateral superior temporal gyrus (STG) and left inferior parietal lobule (LIPL). RSA revealed that the LIPL and LSTG, in contrast to the RSTG, relate to 2 critical dimensions (pitch height, pitch direction) underlying tone perception. Outside this core representational network, we found greater activation in the inferior frontal and parietal regions for stimuli that are more perceptually similar during tone categorization. Our findings reveal the specific characteristics of fronto-temporo-parietal regions that support speech representation and categorization processing.
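A sketch of the "cross-task" decoding logic: a classifier trained on trials from one task is tested on trials from another, so above-chance transfer implies a task-general category code. The data, the task-related offset, and the four-way tone labels below are synthetic assumptions.
```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(8)

def simulate_task(n, task_offset):
    X = rng.normal(size=(n, 100))
    y = rng.integers(0, 4, n)            # four lexical tone categories
    X[np.arange(n), y] += 1.0            # category signal shared across tasks
    return X + task_offset, y

X_train, y_train = simulate_task(200, task_offset=0.0)   # e.g., passive listening
X_test, y_test = simulate_task(200, task_offset=0.2)     # e.g., categorization
clf = LinearSVC().fit(X_train, y_train)
print(f"cross-task accuracy: {clf.score(X_test, y_test):.2f} (chance 0.25)")
```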
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, 2504A Whitis Avenue (A1100), Austin, TX, USA
| | - Zhenzhong Gan
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou, China
| | - Suiping Wang
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou, China; Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
| | - Patrick C M Wong
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China.,Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
| | - Bharath Chandrasekaran
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, 2504A Whitis Avenue (A1100), Austin, TX, USA; Department of Psychology, The University of Texas at Austin, 108 E. Dean Keeton Stop, Austin, TX, USA; Department of Linguistics, The University of Texas at Austin, 305 E. 23rd Street STOP, Austin, TX, USA; Institute for Mental Health Research, College of Liberal Arts, The University of Texas at Austin, 305 E. 23rd St. Stop, Austin, TX, USA; The Institute for Neuroscience, The University of Texas at Austin, 1 University Station Stop, Austin, TX, USA
35
Yi HG, Leonard MK, Chang EF. The Encoding of Speech Sounds in the Superior Temporal Gyrus. Neuron 2019; 102:1096-1110. [PMID: 31220442 PMCID: PMC6602075 DOI: 10.1016/j.neuron.2019.04.023] [Citation(s) in RCA: 171] [Impact Index Per Article: 34.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2019] [Revised: 04/08/2019] [Accepted: 04/16/2019] [Indexed: 01/02/2023]
Abstract
The human superior temporal gyrus (STG) is critical for extracting meaningful linguistic features from speech input. Local neural populations are tuned to acoustic-phonetic features of all consonants and vowels and to dynamic cues for intonational pitch. These populations are embedded throughout broader functional zones that are sensitive to amplitude-based temporal cues. Beyond speech features, STG representations are strongly modulated by learned knowledge and perceptual goals. Currently, a major challenge is to understand how these features are integrated across space and time in the brain during natural speech comprehension. We present a theory that temporally recurrent connections within STG generate context-dependent phonological representations, spanning longer temporal sequences relevant for coherent percepts of syllables, words, and phrases.
Affiliation(s)
- Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA.
36
Rampinini AC, Handjaras G, Leo A, Cecchetti L, Betta M, Marotta G, Ricciardi E, Pietrini P. Formant Space Reconstruction From Brain Activity in Frontal and Temporal Regions Coding for Heard Vowels. Front Hum Neurosci 2019; 13:32. [PMID: 30837851 PMCID: PMC6383050 DOI: 10.3389/fnhum.2019.00032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2018] [Accepted: 01/21/2019] [Indexed: 11/29/2022] Open
Abstract
Classical studies have isolated a distributed network of temporal and frontal areas engaged in the neural representation of speech perception and production. With modern literature arguing against unique roles for these cortical regions, different theories have favored either neural code-sharing or cortical space-sharing, thus trying to explain the intertwined spatial and functional organization of motor and acoustic components across the fronto-temporal cortical network. In this context, the focus of attention has recently shifted toward specific model fitting, aimed at motor and/or acoustic space reconstruction in brain activity within the language network. Here, we tested a model based on acoustic properties (formants), and one based on motor properties (articulation parameters), where model-free decoding of evoked fMRI activity during perception, imagery, and production of vowels had been successful. Results revealed that phonological information organizes around formant structure during the perception of vowels; interestingly, such a model was reconstructed in a broad temporal region, outside of the primary auditory cortex, but also in the pars triangularis of the left inferior frontal gyrus. Conversely, articulatory features were not associated with brain activity in these regions. Overall, our results call for a degree of interdependence based on acoustic information, between the frontal and temporal ends of the language network.
Affiliation(s)
| | | | - Andrea Leo
- IMT School for Advanced Studies Lucca, Lucca, Italy
| | | | - Monica Betta
- IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Giovanna Marotta
- Department of Philology, Literature and Linguistics, University of Pisa, Pisa, Italy
37
Flinker A, Doyle WK, Mehta AD, Devinsky O, Poeppel D. Spectrotemporal modulation provides a unifying framework for auditory cortical asymmetries. Nat Hum Behav 2019; 3:393-405. [PMID: 30971792 PMCID: PMC6650286 DOI: 10.1038/s41562-019-0548-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2018] [Accepted: 01/28/2019] [Indexed: 11/29/2022]
Abstract
The principles underlying functional asymmetries in cortex remain debated. For example, it is accepted that speech is processed bilaterally in auditory cortex, but a left hemisphere dominance emerges when the input is interpreted linguistically. The mechanisms, however, are contested: what sound features or processing principles underlie laterality? Recent findings across species (humans, canines, bats) provide converging evidence that spectrotemporal sound features drive asymmetrical responses. Typically, accounts invoke models wherein the hemispheres differ in time-frequency resolution or integration window size. We develop a framework that builds on and unifies prevailing models, using spectrotemporal modulation space. Using signal processing techniques motivated by neural responses, we test this approach employing behavioral and neurophysiological measures. We show how psychophysical judgments align with spectrotemporal modulations and then characterize the neural sensitivities to temporal and spectral modulations. We demonstrate differential contributions from both hemispheres, with a left lateralization for temporal modulations and a weaker right lateralization for spectral modulations. We argue that representations in the modulation domain provide a more mechanistic basis to account for lateralization in auditory cortex.
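For concreteness, the modulation-domain representation argued for here can be computed, in simplified form, as the 2-D Fourier transform of a (log-)spectrogram, giving axes of temporal and spectral modulation. The sketch below uses a synthetic amplitude-modulated tone and makes no claim about the authors' exact filter bank.
```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
sound = np.sin(2 * np.pi * 1000 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))

f, frames, S = spectrogram(sound, fs=fs, nperseg=256, noverlap=128)
logS = np.log(S + 1e-10)
mod_spectrum = np.abs(np.fft.fftshift(np.fft.fft2(logS - logS.mean())))
print("modulation spectrum shape (spectral x temporal):", mod_spectrum.shape)
```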
Affiliation(s)
- Adeen Flinker
- Department of Psychology, New York University, New York, NY, USA; Department of Neurology, New York University School of Medicine, New York, NY, USA.
| | - Werner K Doyle
- Department of Neurosurgery, New York University School of Medicine, New York, NY, USA
| | - Ashesh D Mehta
- Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Manhasset, NY, USA
| | - Orrin Devinsky
- Department of Neurology, New York University School of Medicine, New York, NY, USA
| | - David Poeppel
- Department of Psychology, New York University, New York, NY, USA; Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
38
Buchsbaum BR, D'Esposito M. A sensorimotor view of verbal working memory. Cortex 2019; 112:134-148. [DOI: 10.1016/j.cortex.2018.11.010] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2018] [Revised: 10/09/2018] [Accepted: 11/11/2018] [Indexed: 12/16/2022]
39
McCloy DR, Lee AKC. Investigating the fit between phonological feature systems and brain responses to speech using EEG. Lang Cogn Neurosci 2019; 34:662-676. [PMID: 32984429 PMCID: PMC7518517 DOI: 10.1080/23273798.2019.1569246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/14/2018] [Accepted: 01/03/2019] [Indexed: 06/11/2023]
Abstract
This paper describes a technique to assess the correspondence between patterns of similarity in the brain's response to speech sounds and the patterns of similarity encoded in phonological feature systems, by quantifying the recoverability of phonological features from the neural data using supervised learning. The technique is applied to EEG recordings collected during passive listening to consonant-vowel syllables. Three published phonological feature systems are compared, and are shown to differ in their ability to recover certain speech sound contrasts from the neural data. For the phonological feature system that best reflects patterns of similarity in the neural data, a leave-one-out analysis indicates some consistency across subjects in which features have greatest impact on the fit, but considerable across-subject heterogeneity remains in the rank ordering of features in this regard.
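A minimal sketch of quantifying feature "recoverability" by supervised learning, as described: one binary classifier per phonological feature, scored by cross-validation, so that feature systems can be compared by how well their features are recovered. The EEG-like patterns and the feature matrix below are simulated.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
n_trials, n_dims, n_feats = 300, 64, 10
features = rng.integers(0, 2, size=(n_trials, n_feats))  # feature values per heard syllable
X = features @ rng.normal(size=(n_feats, n_dims)) + rng.normal(size=(n_trials, n_dims))

for j in range(n_feats):                                 # recoverability, feature by feature
    clf = LogisticRegression(max_iter=1000)
    score = cross_val_score(clf, X, features[:, j], cv=5).mean()
    print(f"feature {j}: recoverability = {score:.2f}")
```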
Affiliation(s)
- Daniel R McCloy
- University of Washington, Institute for Learning and Brain Sciences, Seattle, WA, United States
| | - Adrian K C Lee
- University of Washington, Institute for Learning and Brain Sciences, Seattle, WA, United States
40
Abbott NT, Shahin AJ. Cross-modal phonetic encoding facilitates the McGurk illusion and phonemic restoration. J Neurophysiol 2018; 120:2988-3000. [PMID: 30303762 DOI: 10.1152/jn.00262.2018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
In spoken language, audiovisual (AV) perception occurs when the visual modality influences encoding of acoustic features (e.g., phonetic representations) at the auditory cortex. We examined how visual speech (mouth movements) transforms phonetic representations, indexed by changes to the N1 auditory evoked potential (AEP). EEG was acquired while human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables, /ba/ and /wa/, presented in auditory-only or AV congruent or incongruent contexts or in a context in which the consonants were replaced by white noise (noise replaced). Subjects reported whether they heard "ba" or "wa." We hypothesized that the auditory N1 amplitude during illusory perception (caused by incongruent AV input, as in the McGurk illusion, or white noise-replaced consonants in CV utterances) should shift to reflect the auditory N1 characteristics of the phonemes conveyed visually (by mouth movements) as opposed to acoustically. Indeed, the N1 AEP became larger and occurred earlier when listeners experienced illusory "ba" (video /ba/, audio /wa/, heard as "ba") and vice versa when they experienced illusory "wa" (video /wa/, audio /ba/, heard as "wa"), mirroring the N1 AEP characteristics for /ba/ and /wa/ observed in natural acoustic situations (e.g., auditory-only setting). This visually mediated N1 behavior was also observed for noise-replaced CVs. Taken together, the findings suggest that information relayed by the visual modality modifies phonetic representations at the auditory cortex and that similar neural mechanisms support the McGurk illusion and visually mediated phonemic restoration. NEW & NOTEWORTHY Using a variant of the McGurk illusion experimental design (using the syllables /ba/ and /wa/), we demonstrate that lipreading influences phonetic encoding at the auditory cortex. We show that the N1 auditory evoked potential morphology shifts to resemble the N1 morphology of the syllable conveyed visually. We also show similar N1 shifts when the consonants are replaced by white noise, suggesting that the McGurk illusion and the visually mediated phonemic restoration rely on common mechanisms.
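For reference, the N1 measurement underlying these comparisons amounts to finding the most negative deflection in an early post-stimulus window; the sketch below does this on a simulated ERP, with an assumed sampling rate and search window.
```python
import numpy as np

fs = 500                                      # sampling rate in Hz (assumed)
t = np.arange(-0.1, 0.4, 1 / fs)
rng = np.random.default_rng(14)
erp = -2.0 * np.exp(-((t - 0.10) ** 2) / 0.0008) + rng.normal(0, 0.1, t.size)

win = (t >= 0.07) & (t <= 0.15)               # typical N1 search window
idx = np.where(win)[0][np.argmin(erp[win])]   # most negative sample in window
print(f"N1 amplitude {erp[idx]:.2f} µV at {t[idx] * 1000:.0f} ms")
```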
Affiliation(s)
- Noelle T Abbott
- Center for Mind and Brain, University of California, Davis, California; San Diego State University-University of California, San Diego Joint Doctoral Program in Language and Communicative Disorders, San Diego, California
| | - Antoine J Shahin
- Center for Mind and Brain, University of California, Davis, California; Department of Cognitive and Information Sciences, University of California, Merced, California
41
Fisher JM, Dick FK, Levy DF, Wilson SM. Neural representation of vowel formants in tonotopic auditory cortex. Neuroimage 2018; 178:574-582. [PMID: 29860083 DOI: 10.1016/j.neuroimage.2018.05.072] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2018] [Revised: 05/29/2018] [Accepted: 05/30/2018] [Indexed: 11/25/2022] Open
Abstract
Speech sounds are encoded by distributed patterns of activity in bilateral superior temporal cortex. However, it is unclear whether speech sounds are topographically represented in cortex, or which acoustic or phonetic dimensions might be spatially mapped. Here, using functional MRI, we investigated the potential spatial representation of vowels, which are largely distinguished from one another by the frequencies of their first and second formants, i.e. peaks in their frequency spectra. This allowed us to generate clear hypotheses about the representation of specific vowels in tonotopic regions of auditory cortex. We scanned participants as they listened to multiple natural tokens of the vowels [ɑ] and [i], which we selected because their first and second formants overlap minimally. Formant-based regions of interest were defined for each vowel based on spectral analysis of the vowel stimuli and independently acquired tonotopic maps for each participant. We found that perception of [ɑ] and [i] yielded differential activation of tonotopic regions corresponding to formants of [ɑ] and [i], such that each vowel was associated with increased signal in tonotopic regions corresponding to its own formants. This pattern was observed in Heschl's gyrus and the superior temporal gyrus, in both hemispheres, and for both the first and second formants. Using linear discriminant analysis of mean signal change in formant-based regions of interest, the identity of untrained vowels was predicted with ∼73% accuracy. Our findings show that cortical encoding of vowels is scaffolded on tonotopy, a fundamental organizing principle of auditory cortex that is not language-specific.
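The linear discriminant analysis step described above can be sketched as follows: vowel identity is predicted from mean signal change in the two formant-based ROIs, with cross-validation. The ROI means, spreads, and trial counts below are simulated, chosen only to mimic the reported separability.
```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n_per_class = 40
# features: mean signal change in the [ɑ]-formant and [i]-formant ROIs
X_a = rng.normal([0.6, 0.2], 0.25, size=(n_per_class, 2))
X_i = rng.normal([0.2, 0.6], 0.25, size=(n_per_class, 2))
X = np.vstack([X_a, X_i])
y = np.repeat([0, 1], n_per_class)              # 0 = [ɑ], 1 = [i]

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"vowel classification accuracy: {acc:.2f}")
```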
Collapse
Affiliation(s)
- Julia M Fisher
- Department of Linguistics, University of Arizona, Tucson, AZ, USA; Statistics Consulting Laboratory, BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - Frederic K Dick
- Department of Psychological Sciences, Birkbeck College, University of London, UK; Birkbeck-UCL Center for Neuroimaging, London, UK; Department of Experimental Psychology, University College London, UK
| | - Deborah F Levy
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Stephen M Wilson
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
| |
Collapse
|
42
|
Sensorimotor Representation of Speech Perception: Cross-Decoding of Place of Articulation Features during Selective Attention to Syllables in 7T fMRI. eNeuro 2018; 5:eN-NWR-0252-17. [PMID: 29610768 PMCID: PMC5880028 DOI: 10.1523/eneuro.0252-17.2018] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2017] [Revised: 02/09/2018] [Accepted: 02/14/2018] [Indexed: 12/25/2022] Open
Abstract
Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such as place of articulation. Selective attention can shape the processing and performance of speech perception tasks. Whether and where sensorimotor integration takes place during attentive speech perception remains to be explored. Here, we investigate articulatory feature representations of spoken consonant-vowel (CV) syllables during two distinct tasks. Fourteen healthy humans attended to either the vowel or the consonant within a syllable in separate delayed-match-to-sample tasks. Single-trial fMRI blood oxygenation level-dependent (BOLD) responses from perception periods were analyzed using multivariate pattern classification and a searchlight approach to reveal neural activation patterns sensitive to the processing of place of articulation (i.e., bilabial/labiodental vs. alveolar). To isolate place of articulation representation from acoustic covariation, we applied a cross-decoding (generalization) procedure across distinct features of manner of articulation (i.e., stop, fricative, and nasal). We found evidence for the representation of place of articulation across tasks and in both tasks separately: for attention to vowels, generalization maps included bilateral clusters of superior and posterior temporal, insular, and frontal regions; for attention to consonants, generalization maps encompassed clusters in temporoparietal, insular, and frontal regions within the right hemisphere only. Our results specify the cortical representation of place of articulation features generalized across manner of articulation during attentive syllable perception, thus supporting sensorimotor integration during attentive speech perception and demonstrating the value of generalization.
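The cross-decoding (generalization) procedure can be summarized in a few lines: train a place-of-articulation classifier on two manners of articulation and test it on the held-out manner, so that acoustic covariation with manner cannot drive the result. The sketch below uses simulated data; feature dimensions and trial counts are arbitrary.

```python
# Sketch of cross-decoding across manner of articulation: the classifier
# never sees the held-out manner during training. Data are placeholders.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(90, 200))                     # trials x searchlight voxels
place = rng.choice(["bilabial", "alveolar"], 90)   # decoding target
manner = np.repeat(["stop", "fricative", "nasal"], 30)

scores = []
for held_out in np.unique(manner):
    train, test = manner != held_out, manner == held_out
    clf = LinearSVC().fit(X[train], place[train])
    scores.append(clf.score(X[test], place[test]))
print(f"mean cross-manner accuracy: {np.mean(scores):.2%}")
```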
Collapse
|
43
|
Mobus GE. Teaching systems thinking to general education students. Ecol Modell 2018. [DOI: 10.1016/j.ecolmodel.2018.01.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
44
|
Yue Q, Martin RC, Hamilton AC, Rose NS. Non-perceptual Regions in the Left Inferior Parietal Lobe Support Phonological Short-term Memory: Evidence for a Buffer Account? Cereb Cortex 2018. [DOI: 10.1093/cercor/bhy037] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Affiliation(s)
- Qiuhai Yue
- Department of Psychology, Rice University, MS-25, P.O. Box 1892, Houston, TX, USA
| | - Randi C Martin
- Department of Psychology, Rice University, MS-25, P.O. Box 1892, Houston, TX, USA
| | - A Cris Hamilton
- Department of Psychology, Rice University, MS-25, P.O. Box 1892, Houston, TX, USA
| | - Nathan S Rose
- Department of Psychology, University of Notre Dame, Notre Dame, IN, USA
| |
Collapse
|
45
|
Carey D, Miquel ME, Evans BG, Adank P, McGettigan C. Vocal Tract Images Reveal Neural Representations of Sensorimotor Transformation During Speech Imitation. Cereb Cortex 2018; 27:3064-3079. [PMID: 28334401 PMCID: PMC5939209 DOI: 10.1093/cercor/bhx056] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2016] [Indexed: 12/23/2022] Open
Abstract
Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question by using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels, compared with native vowels; further, ST for nonnative vowels activated somatomotor cortex bilaterally, compared with ST of native vowels. Using representational similarity analysis (RSA) models constructed from participants’ vocal tract images and from stimulus formant distances, we found in searchlight analyses of the fMRI data that either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters during prearticulatory ST.
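A hedged sketch of the core RSA computation may help: a model representational dissimilarity matrix (RDM) built from formant distances is correlated with a neural RDM from one searchlight sphere. The formant values and activation patterns below are invented for illustration.

```python
# Illustrative RSA step: model RDM from stimulus formant distances versus
# neural RDM from one searchlight sphere. All values are placeholders.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

formants = np.array([[730.0, 1090.0], [270.0, 2290.0],   # F1, F2 (Hz) per vowel
                     [600.0, 1400.0], [350.0, 2000.0]])
model_rdm = pdist(formants, metric="euclidean")           # condition-pair distances

patterns = np.random.default_rng(2).normal(size=(4, 150)) # voxels per condition
neural_rdm = pdist(patterns, metric="correlation")

# Spearman correlation over the vectorized upper triangles is the usual
# RSA model-fit statistic at each searchlight center.
rho, p = spearmanr(model_rdm, neural_rdm)
print(f"model-neural correspondence: rho={rho:.2f}, p={p:.3f}")
```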
Collapse
Affiliation(s)
- Daniel Carey
- Department of Psychology, Royal Holloway, University of London, London TW20 0EX, UK; Combined Universities Brain Imaging Centre, Royal Holloway, University of London, London TW20 0EX, UK; The Irish Longitudinal Study on Ageing (TILDA), Department of Medical Gerontology, Trinity College Dublin, Dublin, Ireland
| | - Marc E Miquel
- William Harvey Research Institute, Queen Mary, University of London, London EC1M 6BQ, UK; Clinical Physics, Barts Health NHS Trust, London EC1A 7BE, UK
| | - Bronwen G Evans
- Department of Speech, Hearing & Phonetic Sciences, University College London, London WC1E 6BT, UK
| | - Patti Adank
- Department of Speech, Hearing & Phonetic Sciences, University College London, London WC1E 6BT, UK
| | - Carolyn McGettigan
- Department of Psychology, Royal Holloway, University of London, London TW20 0EX, UK; Combined Universities Brain Imaging Centre, Royal Holloway, University of London, London TW20 0EX, UK; Institute of Cognitive Neuroscience, University College London, London WC1N 3AR, UK
| |
Collapse
|
46
|
Focal versus distributed temporal cortex activity for speech sound category assignment. Proc Natl Acad Sci U S A 2018; 115:E1299-E1308. [PMID: 29363598 PMCID: PMC5819402 DOI: 10.1073/pnas.1714279115] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
Abstract
When listening to speech, phonemes are represented in a distributed fashion in our temporal and prefrontal cortices. How these representations are selected in a phonemic decision context, and in particular whether distributed or focal neural information is required for explicit phoneme recognition, is unclear. We hypothesized that focal and early neural encoding of acoustic signals is sufficiently informative to access speech sound representations and permit phoneme recognition. We tested this hypothesis by combining a simple speech-phoneme categorization task with univariate and multivariate analyses of fMRI, magnetoencephalography, intracortical, and clinical data. We show that neural information available focally in the temporal cortex prior to decision-related neural activity is specific enough to account for human phonemic identification.

Percepts and words can be decoded from distributed neural activity measures. However, the existence of widespread representations might conflict with the more classical notions of hierarchical processing and efficient coding, which are especially relevant in speech processing. Using fMRI and magnetoencephalography during syllable identification, we show that sensory and decisional activity colocalize to a restricted part of the posterior superior temporal gyrus (pSTG). Next, using intracortical recordings, we demonstrate that early and focal neural activity in this region distinguishes correct from incorrect decisions and can be machine-decoded to classify syllables. Crucially, significant machine decoding was possible from neuronal activity sampled across different regions of the temporal and frontal lobes, despite weak or absent sensory or decision-related responses. These findings show that speech-sound categorization relies on an efficient readout of focal pSTG neural activity, while more distributed activity patterns, although classifiable by machine learning, instead reflect collateral processes of sensory perception and decision.
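The focal-versus-distributed contrast reduces to comparing cross-validated decoding from a small pSTG feature set against decoding from a much larger, spatially distributed set. A toy version, with placeholder data, follows.

```python
# Sketch contrasting focal versus distributed decoding of syllable identity;
# both feature sets and labels are simulated stand-ins.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
y = rng.choice(["pa", "ta"], 80)            # syllable label per trial
focal = rng.normal(size=(80, 20))           # 20 pSTG channels/voxels
distributed = rng.normal(size=(80, 400))    # widespread temporal/frontal sites

for name, X in [("focal pSTG", focal), ("distributed", distributed)]:
    acc = cross_val_score(LinearSVC(), X, y, cv=5).mean()
    print(f"{name}: {acc:.2%}")
```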
Collapse
|
47
|
Neural Mechanisms Underlying Cross-Modal Phonetic Encoding. J Neurosci 2017; 38:1835-1849. [PMID: 29263241 DOI: 10.1523/jneurosci.1566-17.2017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2017] [Revised: 11/17/2017] [Accepted: 12/08/2017] [Indexed: 11/21/2022] Open
Abstract
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent or incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex. SIGNIFICANCE STATEMENT: The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
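The group-level logic implied here, comparing N1 amplitudes between illusion conditions and relating the effect to illusion susceptibility, could look like the following sketch; all values are simulated and the effect sizes are invented.

```python
# Hedged sketch of the implied group-level contrast: paired comparison of
# per-subject N1 amplitudes across illusion conditions, plus a correlation
# with illusory-perception rate. All numbers are simulated.
import numpy as np
from scipy.stats import ttest_rel, pearsonr

rng = np.random.default_rng(4)
n1_illusion_ba = rng.normal(-4.0, 1.0, 20)   # uV, enlarged N1 (placeholder)
n1_illusion_fa = rng.normal(-2.5, 1.0, 20)   # uV, reduced N1 (placeholder)
illusion_rate = rng.uniform(0.3, 0.9, 20)    # proportion of illusory trials

t, p = ttest_rel(n1_illusion_ba, n1_illusion_fa)
r, pr = pearsonr(illusion_rate, n1_illusion_ba - n1_illusion_fa)
print(f"paired t(19)={t:.2f}, p={p:.3f}; effect-vs-susceptibility r={r:.2f}")
```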
Collapse
|
48
|
Rampinini AC, Handjaras G, Leo A, Cecchetti L, Ricciardi E, Marotta G, Pietrini P. Functional and spatial segregation within the inferior frontal and superior temporal cortices during listening, articulation imagery, and production of vowels. Sci Rep 2017; 7:17029. [PMID: 29208951 PMCID: PMC5717247 DOI: 10.1038/s41598-017-17314-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2017] [Accepted: 11/24/2017] [Indexed: 11/09/2022] Open
Abstract
Classical models of language localize speech perception in the left superior temporal and production in the inferior frontal cortex. Nonetheless, neuropsychological, structural and functional studies have questioned such subdivision, suggesting an interwoven organization of the speech function within these cortices. We tested whether sub-regions within frontal and temporal speech-related areas retain specific phonological representations during both perception and production. Using functional magnetic resonance imaging and multivoxel pattern analysis, we showed functional and spatial segregation across the left fronto-temporal cortex during listening, imagery and production of vowels. In accordance with classical models of language and evidence from functional studies, the inferior frontal and superior temporal cortices discriminated among perceived and produced vowels respectively, also engaging in the non-classical, alternative function, i.e., perception in the inferior frontal and production in the superior temporal cortex. Crucially, though, contiguous and non-overlapping sub-regions within these hubs performed either the classical or non-classical function, the latter also representing non-linguistic sounds (i.e., pure tones). Extending previous results and in line with integration theories, our findings not only demonstrate that sensitivity to speech listening exists in production-related regions and vice versa, but they also suggest that the nature of such interwoven organization is built upon low-level perception.
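An illustrative version of the per-task vowel MVPA follows, using a cross-validated confusion matrix to show which vowels a sub-region discriminates; the region, trial counts, and patterns are placeholders, not the study's data.

```python
# Sketch of per-task vowel decoding within one candidate sub-region, with
# a confusion matrix over the five vowels. Data are simulated placeholders.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(5)
vowels = np.tile(["a", "e", "i", "o", "u"], 20)   # 100 trials, balanced labels

for task in ["listening", "imagery", "production"]:
    X = rng.normal(size=(100, 80))                # trials x voxels, one sub-region
    pred = cross_val_predict(LinearSVC(), X, vowels, cv=5)
    print(task)
    print(confusion_matrix(vowels, pred))
```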
Collapse
Affiliation(s)
| | | | - Andrea Leo
- IMT School for Advanced Studies, Lucca, 55100, Italy
| | | | | | - Giovanna Marotta
- Department of Philology, Literature and Linguistics, University of Pisa, Pisa, 56100, Italy
| | | |
Collapse
|
49
|
Tanaka S, Kirino E. The parietal opercular auditory-sensorimotor network in musicians: A resting-state fMRI study. Brain Cogn 2017; 120:43-47. [PMID: 29122368 DOI: 10.1016/j.bandc.2017.11.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2016] [Revised: 10/04/2017] [Accepted: 11/01/2017] [Indexed: 01/09/2023]
Abstract
Auditory-sensorimotor coupling is critical for musical performance, during which auditory and somatosensory feedback signals are used to ensure desired outputs. Previous studies reported opercular activation in subjects performing or listening to music. A functional connectivity analysis suggested the parietal operculum (PO) as a connector hub that links auditory, somatosensory, and motor cortical areas. We therefore examined whether this PO network differs between musicians and non-musicians. We analyzed resting-state PO functional connectivity with Heschl's gyrus (HG), the planum temporale (PT), the precentral gyrus (preCG), and the postcentral gyrus (postCG) in 35 musicians and 35 non-musicians. In musicians, the left PO exhibited increased functional connectivity with the ipsilateral HG, PT, preCG, and postCG, whereas the right PO exhibited enhanced functional connectivity with the contralateral HG, preCG, and postCG and the ipsilateral postCG. Direct functional connectivity between an auditory area (the HG or PT) and a sensorimotor area (the preCG or postCG) did not significantly differ between the groups. The PO's functional connectivity with auditory and sensorimotor areas is enhanced in musicians relative to non-musicians. We propose that the PO network facilitates musical performance by mediating multimodal integration for modulating auditory-sensorimotor control.
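Seed-based connectivity comparisons of this kind are straightforward to sketch: correlate region-averaged resting-state time series per subject, Fisher z-transform, and compare groups. In the toy version below the coupling strengths are invented so that the simulated musician group shows the reported enhancement; region names follow the abstract.

```python
# Minimal sketch of the PO seed-based connectivity comparison, assuming
# region-averaged time series; coupling values are invented for the demo.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(6)

def po_seed_connectivity(n_subjects, coupling):
    """Fisher-z correlation between PO and HG time series, one value per subject."""
    zs = []
    for _ in range(n_subjects):
        po = rng.normal(size=200)                 # 200 resting-state volumes
        hg = coupling * po + rng.normal(size=200) # target region, shared signal
        zs.append(np.arctanh(np.corrcoef(po, hg)[0, 1]))
    return np.array(zs)

musicians = po_seed_connectivity(35, coupling=0.5)
controls = po_seed_connectivity(35, coupling=0.2)
t, p = ttest_ind(musicians, controls)
print(f"PO-HG connectivity, musicians vs non-musicians: t={t:.2f}, p={p:.3g}")
```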
Collapse
Affiliation(s)
- Shoji Tanaka
- Department of Information and Communication Sciences, Sophia University, Tokyo 102-0081, Japan.
| | - Eiji Kirino
- Department of Psychiatry, Juntendo University School of Medicine, Tokyo 113-8431, Japan; Juntendo Shizuoka Hospital, Shizuoka 410-2211, Japan
| |
Collapse
|
50
|
Wingfield C, Su L, Liu X, Zhang C, Woodland P, Thwaites A, Fonteneau E, Marslen-Wilson WD. Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem. PLoS Comput Biol 2017; 13:e1005617. [PMID: 28945744 PMCID: PMC5612454 DOI: 10.1371/journal.pcbi.1005617] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2016] [Accepted: 06/12/2017] [Indexed: 01/06/2023] Open
Abstract
There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental 'machine states', generated as the ASR analysis progresses over time, to the incremental 'brain states', measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain.
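The incremental machine-state/brain-state comparison can be approximated with time-resolved RSA: at each time step, a dissimilarity matrix over stimuli computed from ASR-internal activations is correlated with one computed from EMEG response patterns. The state matrices below are simulated stand-ins, not the paper's HTK-derived models.

```python
# Time-resolved RSA sketch relating machine states to brain states; the
# machine and neural patterns are random placeholders for illustration.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n_stimuli = 40

for step in range(5):
    machine = rng.normal(size=(n_stimuli, 64))   # ASR layer state per stimulus
    brain = rng.normal(size=(n_stimuli, 300))    # EMEG pattern at the same step
    rho, _ = spearmanr(pdist(machine, "correlation"),
                       pdist(brain, "correlation"))
    print(f"time step {step}: machine-brain RSA rho={rho:.2f}")
```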
Collapse
Affiliation(s)
- Cai Wingfield
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Department of Psychology, University of Lancaster, Lancaster, United Kingdom
- * E-mail: (CW); (LS)
| | - Li Su
- China–UK Centre for Cognition and Ageing Research, Faculty of Psychology, Southwest University, Chongqing, China
- Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
- * E-mail: (CW); (LS)
| | - Xunying Liu
- Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
- Department of Engineering, University of Cambridge, Cambridge, United Kingdom
| | - Chao Zhang
- Department of Engineering, University of Cambridge, Cambridge, United Kingdom
| | - Phil Woodland
- Department of Engineering, University of Cambridge, Cambridge, United Kingdom
| | - Andrew Thwaites
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
| | - Elisabeth Fonteneau
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
| | - William D. Marslen-Wilson
- Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
| |
Collapse
|