1
Wang J, Suo J, Liu D, Zhao Y, Tian Y, Bryanston-Cross P, Li WJ, Wang Z. A Nanoparticle-Based Artificial Ear for Personalized Classification of Emotions in the Human Voice Using Deep Learning. ACS Appl Mater Interfaces 2024; 16:51274-51282. PMID: 39285705. DOI: 10.1021/acsami.4c13223.
Abstract
Advances in artificial intelligence and human-computer interaction demand bioinspired sensing modalities capable of comprehending human affective states and speech. However, endowing skin-like interfaces with such intricate perception abilities remains challenging. Here, we developed a flexible piezoresistive artificial ear (AE) sensor based on gold nanoparticles, which converts sound signals into electrical signals through changes in resistance. Tests of the sensor's performance across frequency and sound pressure level (SPL) show that the AE has a frequency response range of 20 Hz to 12 kHz and can sense sound signals from up to 5 m away at a frequency of 1 kHz and an SPL of 126 dB. Furthermore, through deep learning, the device achieves up to 96.9% and 95.0% accuracy in classifying seven emotional states and eight urban environmental noises, respectively. Hence, on one hand, the device can monitor a patient's emotional state through speech, such as sudden yelling and screaming, helping healthcare workers recognize changes in a patient's condition in a timely manner. On the other hand, the device could also be used for real-time monitoring of noise levels around aircraft, ships, factories, and other high-decibel equipment and environments.
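To make the classification step concrete, the sketch below shows one minimal way to train a deep-learning classifier on 1-D sensor waveforms; it is illustrative only, not the authors' model, and the class count, input length, architecture, and synthetic data are assumptions.

```python
# Minimal sketch (not the authors' model): classify 1-D sensor waveforms into
# seven emotion classes with a small 1-D CNN. Shapes, class count, and
# hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

N_CLASSES = 7          # seven emotional categories (from the abstract)
SAMPLES = 4000         # length of one sensor recording, arbitrary here

class EmotionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),                     # fixed-size summary
        )
        self.classifier = nn.Linear(32 * 8, N_CLASSES)

    def forward(self, x):                                # x: (batch, 1, SAMPLES)
        return self.classifier(self.features(x).flatten(1))

# Toy training loop on random data, standing in for labeled sensor recordings.
model = EmotionCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(32, 1, SAMPLES)                          # fake batch of waveforms
y = torch.randint(0, N_CLASSES, (32,))                   # fake emotion labels
for _ in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print("toy training loss:", float(loss))
```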
Affiliation(s)
- Jianfei Wang
- International Research Centre for Nano Handling and Manufacturing of China, Changchun University of Science and Technology, Changchun, Jilin 130022, China
- School of Engineering, University of Warwick, Coventry CV4 7AL, U.K.
- Jiao Suo
- CAS-CityU Joint Laboratory for Robotic Research, Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong, China
- Dongdong Liu
- International Research Centre for Nano Handling and Manufacturing of China, Changchun University of Science and Technology, Changchun, Jilin 130022, China
- Yuliang Zhao
- Department of Control Engineering, Northeastern University, Qinhuangdao, Hebei 066004, China
- Yanling Tian
- School of Engineering, University of Warwick, Coventry CV4 7AL, U.K.
- Wen Jung Li
- CAS-CityU Joint Laboratory for Robotic Research, Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong, China
- Zuobin Wang
- International Research Centre for Nano Handling and Manufacturing of China, Changchun University of Science and Technology, Changchun, Jilin 130022, China
2
Karthik G, Cao CZ, Demidenko MI, Jahn A, Stacey WC, Wasade VS, Brang D. Auditory cortex encodes lipreading information through spatially distributed activity. Curr Biol 2024; 34:4021-4032.e5. PMID: 39153482. PMCID: PMC11387126. DOI: 10.1016/j.cub.2024.07.073.
Abstract
Watching a speaker's face improves speech perception accuracy. This benefit is enabled, in part, by implicit lipreading abilities present in the general population. While it is established that lipreading can alter the perception of a heard word, it is unknown how these visual signals are represented in the auditory system or how they interact with auditory speech representations. One influential, but untested, hypothesis is that visual speech modulates the population-coded representations of phonetic and phonemic features in the auditory system. This model is largely supported by data showing that silent lipreading evokes activity in the auditory cortex, but these activations could alternatively reflect general effects of arousal or attention or the encoding of non-linguistic features such as visual timing information. This gap limits our understanding of how vision supports speech perception. To test the hypothesis that the auditory system encodes visual speech information, we acquired functional magnetic resonance imaging (fMRI) data from healthy adults and intracranial recordings from electrodes implanted in patients with epilepsy during auditory and visual speech perception tasks. Across both datasets, linear classifiers successfully decoded the identity of silently lipread words using the spatial pattern of auditory cortex responses. When we examined the time course of classification using the intracranial recordings, lipread words were classified at earlier time points relative to heard words, suggesting a predictive mechanism that facilitates speech perception. These results support a model in which the auditory system combines the joint neural distributions evoked by heard and lipread words to generate a more precise estimate of what was said.
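The decoding logic described above can be illustrated with a small, self-contained sketch: a cross-validated linear classifier predicts word identity from simulated spatial response patterns. The voxel, word, and trial counts are assumptions, and the data are synthetic, not the study's recordings.

```python
# Minimal sketch of spatial-pattern decoding (simulated data, not the authors'
# pipeline): a cross-validated linear classifier predicts word identity from
# the pattern of auditory-cortex responses.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_words, n_trials_per_word, n_voxels = 4, 20, 200

# Simulate trial-wise response patterns: each word evokes a distinct mean pattern.
word_patterns = rng.normal(size=(n_words, n_voxels))
X = np.vstack([wp + rng.normal(scale=2.0, size=(n_trials_per_word, n_voxels))
               for wp in word_patterns])
y = np.repeat(np.arange(n_words), n_trials_per_word)

clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"decoding accuracy: {acc:.2f} (chance = {1 / n_words:.2f})")
```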
Affiliation(s)
- Ganesan Karthik
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Cody Zhewei Cao
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- Andrew Jahn
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
- William C Stacey
- Department of Neurology, University of Michigan, Ann Arbor, MI 48109, USA
- Vibhangini S Wasade
- Henry Ford Hospital, Detroit, MI 48202, USA; Department of Neurology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
3
Vitória MA, Fernandes FG, van den Boom M, Ramsey N, Raemaekers M. Decoding Single and Paired Phonemes Using 7T Functional MRI. Brain Topogr 2024; 37:731-747. PMID: 38261272. PMCID: PMC11393141. DOI: 10.1007/s10548-024-01034-6.
Abstract
Several studies have shown that mouth movements related to the pronunciation of individual phonemes are represented in the sensorimotor cortex. This would theoretically allow for brain-computer interfaces (BCIs) capable of decoding continuous speech by training classifiers on the activity in the sensorimotor cortex related to the production of individual phonemes. To address this, we investigated the decodability of trials with individual and paired phonemes (pronounced consecutively with a one-second interval) using activity in the sensorimotor cortex. Fifteen participants pronounced 3 different phonemes and 3 combinations of two of those phonemes in a 7T functional MRI experiment. We confirmed that support vector machine (SVM) classification of single and paired phonemes was possible. Importantly, by combining classifiers trained on single phonemes, we were able to classify paired phonemes with an accuracy of 53% (33% chance level), demonstrating that activity of isolated phonemes is present and distinguishable in combined phonemes. An SVM searchlight analysis showed that the phoneme representations are widely distributed in the ventral sensorimotor cortex. These findings provide insights into the neural representations of single and paired phonemes. Furthermore, they support the notion that a speech BCI based on machine learning algorithms trained on individual phonemes using intracranial electrode grids may be feasible.
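A minimal sketch of the combination idea follows: a classifier trained only on single-phoneme patterns is reused to score candidate phoneme pairs. The simulated patterns, noise level, and the assumption that a paired-trial pattern approximates the sum of two single-phoneme patterns are illustrative, not the authors' pipeline.

```python
# Minimal sketch (simulated data, not the study's fMRI pipeline): train a linear
# classifier on single-phoneme patterns, then classify paired-phoneme trials by
# combining single-phoneme class probabilities.
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_voxels, n_train, noise = 150, 30, 1.5
phonemes = [0, 1, 2]
templates = rng.normal(size=(3, n_voxels))        # one mean pattern per phoneme

# Single-phoneme training data.
X_single = np.vstack([templates[p] + rng.normal(scale=noise, size=(n_train, n_voxels))
                      for p in phonemes])
y_single = np.repeat(phonemes, n_train)
clf = LogisticRegression(max_iter=2000).fit(X_single, y_single)

# Paired-phoneme trials: pattern approximated as the sum of two templates.
pairs = list(combinations(phonemes, 2))           # (0,1), (0,2), (1,2)
n_test, correct = 60, 0
for _ in range(n_test):
    true_pair = pairs[rng.integers(len(pairs))]
    x = templates[true_pair[0]] + templates[true_pair[1]] \
        + rng.normal(scale=noise, size=n_voxels)
    proba = clf.predict_proba(x[None, :])[0]
    # Score each candidate pair by the summed single-phoneme probabilities.
    scores = [proba[a] + proba[b] for a, b in pairs]
    correct += pairs[int(np.argmax(scores))] == true_pair
print(f"paired-phoneme accuracy: {correct / n_test:.2f} (chance = {1/3:.2f})")
```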
Affiliation(s)
- Maria Araújo Vitória
- Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Francisco Guerreiro Fernandes
- Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Max van den Boom
- Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
- Nick Ramsey
- Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Mathijs Raemaekers
- Brain Center Rudolf Magnus, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
4
Roswandowitz C, Kathiresan T, Pellegrino E, Dellwo V, Frühholz S. Cortical-striatal brain network distinguishes deepfake from real speaker identity. Commun Biol 2024; 7:711. PMID: 38862808. PMCID: PMC11166919. DOI: 10.1038/s42003-024-06372-6.
Abstract
Deepfakes are viral ingredients of digital environments, and they can trick human cognition into misperceiving the fake as real. Here, we test the neurocognitive sensitivity of 25 participants to accept or reject person identities as recreated in audio deepfakes. We generate high-quality voice identity clones from natural speakers by using advanced deepfake technologies. During an identity matching task, participants show intermediate performance with deepfake voices, indicating levels of deception and resistance to deepfake identity spoofing. On the brain level, univariate and multivariate analyses consistently reveal a central cortico-striatal network that decoded the vocal acoustic pattern and deepfake-level (auditory cortex), as well as natural speaker identities (nucleus accumbens), which are valued for their social relevance. This network is embedded in a broader neural identity and object recognition network. Humans can thus be partly tricked by deepfakes, but the neurocognitive mechanisms identified during deepfake processing open windows for strengthening human resilience to fake information.
Affiliation(s)
- Claudia Roswandowitz
- Cognitive and Affective Neuroscience Unit, Department of Psychology, University of Zurich, Zurich, Switzerland
- Phonetics and Speech Sciences Group, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Neuroscience Centre Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Thayabaran Kathiresan
- Centre for Neuroscience of Speech, University of Melbourne, Melbourne, Australia
- Redenlab, Melbourne, Australia
- Elisa Pellegrino
- Phonetics and Speech Sciences Group, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Volker Dellwo
- Phonetics and Speech Sciences Group, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Sascha Frühholz
- Cognitive and Affective Neuroscience Unit, Department of Psychology, University of Zurich, Zurich, Switzerland
- Neuroscience Centre Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Department of Psychology, University of Oslo, Oslo, Norway
5
Rupp KM, Hect JL, Harford EE, Holt LL, Ghuman AS, Abel TJ. A hierarchy of processing complexity and timescales for natural sounds in human auditory cortex. bioRxiv 2024:2024.05.24.595822. PMID: 38826304. PMCID: PMC11142240. DOI: 10.1101/2024.05.24.595822.
Abstract
Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within auditory cortex, there are critical unanswered questions regarding the organization and dynamics of sound categorization. Here, we performed intracerebral recordings in the context of epilepsy surgery as 20 patient-participants listened to natural sounds. We built encoding models to predict neural responses using features of these sounds extracted from different layers within a sound-categorization deep neural network (DNN). This approach yielded highly accurate models of neural responses throughout auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers of the DNN associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity also existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt, and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. When we estimated the time window over which each recording site integrates information, we found shorter integration windows in core relative to lateral belt and parabelt. Lastly, we found a relationship between the length of the integration window and the complexity of information processing within core (but not lateral belt or parabelt). These findings suggest hierarchies of timescales and processing complexity, and their interrelationship, represent a functional organizational principle of the auditory stream that underlies our perception of complex, abstract auditory information.
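The encoding-model logic can be sketched as follows: fit a ridge regression from each candidate layer's features to a site's response and take the best-predicting layer's depth as that site's representational complexity. The feature sizes and simulated data below are assumptions, not the study's DNN or recordings.

```python
# Minimal encoding-model sketch (simulated data): regress a recording site's
# response onto features from each candidate "layer" and report the depth of
# the best-predicting layer as a proxy for representational complexity.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_sounds, n_feat, n_layers = 120, 50, 5

# Simulated layer features for each sound; the site's response is driven by
# layer 3 features, so layer 3 should win.
layer_feats = [rng.normal(size=(n_sounds, n_feat)) for _ in range(n_layers)]
true_w = rng.normal(size=n_feat)
response = layer_feats[3] @ true_w + rng.normal(scale=2.0, size=n_sounds)

scores = []
for depth, X in enumerate(layer_feats):
    r2 = cross_val_score(Ridge(alpha=10.0), X, response, cv=5, scoring="r2").mean()
    scores.append(r2)
    print(f"layer {depth}: cross-validated R^2 = {r2:.2f}")
print("best layer (proxy for complexity):", int(np.argmax(scores)))
```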
Affiliation(s)
- Kyle M. Rupp
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jasmine L. Hect
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Emily E. Harford
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Lori L. Holt
- Department of Psychology, The University of Texas at Austin, Austin, Texas, United States of America
- Avniel Singh Ghuman
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Taylor J. Abel
- Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
6
Ten Oever S, Martin AE. Interdependence of "What" and "When" in the Brain. J Cogn Neurosci 2024; 36:167-186. PMID: 37847823. DOI: 10.1162/jocn_a_02067.
Abstract
From a brain's-eye view, when a stimulus occurs and what it is are interrelated aspects of interpreting the perceptual world. Yet in practice, the putative perceptual inferences about sensory content and timing are often dichotomized and not investigated as an integrated process. Here we argue that neural temporal dynamics can influence what is perceived, and in turn, stimulus content can influence the time at which perception is achieved. This computational principle results from the highly interdependent relationship of what and when in the environment. Both brain processes and perceptual events display strong temporal variability that is not always modeled; we argue that understanding, and minimally modeling, this temporal variability is key for theories of how the brain generates unified and consistent neural representations, and that we ignore temporal variability in our analysis practice at the peril of both data interpretation and theory-building. Here, we review "what" and "when" interactions in the brain, demonstrate via simulations how temporal variability can result in misguided interpretations and conclusions, and outline how to integrate and synthesize "what" and "when" in theories and models of brain computation.
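A toy simulation, in the spirit of (but not identical to) the simulations mentioned above, illustrates one consequence of unmodeled temporal variability: latency jitter attenuates and broadens the trial-averaged response, which can be misread as a weaker effect. All parameters are arbitrary assumptions.

```python
# Toy illustration (not the authors' simulations): trial-to-trial latency jitter
# flattens the trial average relative to any single trial.
import numpy as np

t = np.linspace(-0.2, 0.8, 1000)                  # seconds
def response(latency, width=0.05):
    return np.exp(-0.5 * ((t - latency) / width) ** 2)

rng = np.random.default_rng(3)
n_trials = 200
jitter_sd = 0.08                                   # 80 ms latency jitter

aligned = np.mean([response(0.2) for _ in range(n_trials)], axis=0)
jittered = np.mean([response(0.2 + rng.normal(scale=jitter_sd))
                    for _ in range(n_trials)], axis=0)

print(f"peak of average, no jitter:   {aligned.max():.2f}")
print(f"peak of average, with jitter: {jittered.max():.2f}")   # attenuated peak
```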
Affiliation(s)
- Sanne Ten Oever
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Centre for Cognitive Neuroimaging, Nijmegen, The Netherlands
- Maastricht University, The Netherlands
- Andrea E Martin
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Centre for Cognitive Neuroimaging, Nijmegen, The Netherlands
7
Lee JH, Cho KH, Cho K. Emerging Trends in Soft Electronics: Integrating Machine Intelligence with Soft Acoustic/Vibration Sensors. Adv Mater 2023; 35:e2209673. PMID: 37043776. DOI: 10.1002/adma.202209673.
Abstract
In the last decade, soft acoustic/vibration sensors have gained tremendous research interest due to their unique ability to detect broadband acoustic/vibration stimuli, enabling prospective applications including voice biometrics, voice-controlled human-machine interfaces, electronic skin, and skin-mountable healthcare devices. Importantly, to benefit most from these sensors, machine learning (ML) is essential for processing their output signals, as it enables more accurate and efficient interpretation of the original data. This paper offers an overview of recent advances in the development of soft acoustic/vibration sensors and in the processing of their signals with ML. First, the key performance parameters of the sensors are discussed. Second, popular transduction mechanisms for the sensors are addressed, followed by an in-depth overview of each type, covering the materials used, structural designs, and sensing performance. Third, potential applications of the sensors are elaborated, and fourth, a thorough discussion of ML is conducted, exploring different types of ML, specific ML algorithms suitable for processing acoustic/vibration signals, and current trends in ML-assisted applications. Finally, the challenges and potential opportunities in soft acoustic/vibration sensor and ML research are highlighted to offer new insights into future prospects in these fields.
Affiliation(s)
- Jeng-Hun Lee
- Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, 37673, South Korea
- Kang Hyuk Cho
- Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, 37673, South Korea
- Kilwon Cho
- Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, 37673, South Korea
8
Oganian Y, Bhaya-Grossman I, Johnson K, Chang EF. Vowel and formant representation in the human auditory speech cortex. Neuron 2023; 111:2105-2118.e4. PMID: 37105171. PMCID: PMC10330593. DOI: 10.1016/j.neuron.2023.04.004.
Abstract
Vowels, a fundamental component of human speech across all languages, are cued acoustically by formants, resonance frequencies of the vocal tract shape during speaking. An outstanding question in neurolinguistics is how formants are processed neurally during speech perception. To address this, we collected high-density intracranial recordings from the human speech cortex on the superior temporal gyrus (STG) while participants listened to continuous speech. We found that two-dimensional receptive fields based on the first two formants provided the best characterization of vowel sound representation. Neural activity at single sites was highly selective for zones in this formant space. Furthermore, formant tuning is adjusted dynamically for speaker-specific spectral context. However, the entire population of formant-encoding sites was required to accurately decode single vowels. Overall, our results reveal that complex acoustic tuning in the two-dimensional formant space underlies local vowel representations in STG. As a population code, this gives rise to phonological vowel perception.
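A minimal sketch of the receptive-field idea follows: a site's vowel response is modeled as a 2-D Gaussian over the first two formants, and its preferred formant zone is recovered from responses alone. Formant ranges, tuning widths, and noise are illustrative assumptions, not the study's analysis.

```python
# Minimal sketch (simulated, not the study's analysis): model a neural site's
# vowel response as a 2-D receptive field over the first two formants (F1, F2)
# and recover its preferred formant zone as a response-weighted centroid.
import numpy as np

rng = np.random.default_rng(4)
n_tokens = 500
f1 = rng.uniform(250, 850, n_tokens)               # Hz, rough F1 range
f2 = rng.uniform(800, 2500, n_tokens)              # Hz, rough F2 range

# A site tuned to a zone around (F1=400 Hz, F2=2000 Hz), i.e. an /i/-like vowel.
pref = np.array([400.0, 2000.0])
width = np.array([120.0, 350.0])
drive = np.exp(-0.5 * (((f1 - pref[0]) / width[0]) ** 2
                       + ((f2 - pref[1]) / width[1]) ** 2))
response = drive + rng.normal(scale=0.1, size=n_tokens)

# Estimate the site's preferred formants from its responses alone.
w = np.clip(response, 0, None)
est = np.array([np.sum(w * f1), np.sum(w * f2)]) / np.sum(w)
print(f"true preferred (F1, F2): {pref}")
print(f"estimated from responses: ({est[0]:.0f}, {est[1]:.0f})")
```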
Affiliation(s)
- Yulia Oganian
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Ilina Bhaya-Grossman
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA; University of California, Berkeley-University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA 94720, USA
- Keith Johnson
- Department of Linguistics, University of California, Berkeley, Berkeley, CA, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
9
Mill RD, Cole MW. Neural representation dynamics reveal computational principles of cognitive task learning. bioRxiv 2023:2023.06.27.546751. PMID: 37425922. PMCID: PMC10327096. DOI: 10.1101/2023.06.27.546751.
Abstract
During cognitive task learning, neural representations must be rapidly constructed for novel task performance, then optimized for robust practiced task performance. How the geometry of neural representations changes to enable this transition from novel to practiced performance remains unknown. We hypothesized that practice involves a shift from compositional representations (task-general activity patterns that can be flexibly reused across tasks) to conjunctive representations (task-specific activity patterns specialized for the current task). Functional MRI during learning of multiple complex tasks substantiated this dynamic shift from compositional to conjunctive representations, which was associated with reduced cross-task interference (via pattern separation) and behavioral improvement. Further, we found that conjunctions originated in subcortex (hippocampus and cerebellum) and slowly spread to cortex, extending multiple memory systems theories to encompass task representation learning. The formation of conjunctive representations hence serves as a computational signature of learning, reflecting cortical-subcortical dynamics that optimize task representations in the human brain.
10
Polver S, Háden GP, Bulf H, Winkler I, Tóth B. Early maturation of sound duration processing in the infant's brain. Sci Rep 2023; 13:10287. PMID: 37355709. PMCID: PMC10290631. DOI: 10.1038/s41598-023-36794-x.
Abstract
The ability to process sound duration is crucial from a very early age, as it lays the foundation for the main functions of auditory perception, such as object perception and music and language acquisition. With the availability of age-appropriate structural anatomical templates, EEG source activity can be reconstructed with much-improved reliability. The current study capitalized on this possibility by reconstructing the sources of event-related potential (ERP) waveforms sensitive to sound duration in 4- and 9-month-old infants. Infants were presented with short (200 ms) and long (300 ms) sounds, delivered equiprobably in random order. Two temporally separate ERP waveforms were found to be modulated by sound duration. Generators of these waveforms were mainly located in the primary and secondary auditory areas and other language-related regions. The results show marked developmental changes between 4 and 9 months, partly reflected in the scalp-recorded ERPs but appearing in the underlying generators in a far more nuanced way. The results also confirm the feasibility of applying anatomical templates in developmental populations.
Affiliation(s)
- Silvia Polver
- Department of Psychology, University of Milano-Bicocca, Milan, Italy
- Gábor P Háden
- Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Budapest, Hungary
- Department of Telecommunications and Media Informatics, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, Hungary
- Hermann Bulf
- Department of Psychology, University of Milano-Bicocca, Milan, Italy
- NeuroMI, Milan Center for Neuroscience, University of Milano-Bicocca, Milan, Italy
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Budapest, Hungary
- Brigitta Tóth
- Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Budapest, Hungary
11
Schwartz E, Alreja A, Richardson RM, Ghuman A, Anzellotti S. Intracranial Electroencephalography and Deep Neural Networks Reveal Shared Substrates for Representations of Face Identity and Expressions. J Neurosci 2023; 43:4291-4303. PMID: 37142430. PMCID: PMC10255163. DOI: 10.1523/jneurosci.1277-22.2023.
Abstract
According to a classical view of face perception (Bruce and Young, 1986; Haxby et al., 2000), face identity and facial expression recognition are performed by separate neural substrates (ventral and lateral temporal face-selective regions, respectively). However, recent studies challenge this view, showing that expression valence can also be decoded from ventral regions (Skerry and Saxe, 2014; Li et al., 2019), and identity from lateral regions (Anzellotti and Caramazza, 2017). These findings could be reconciled with the classical view if regions specialized for one task (either identity or expression) contain a small amount of information for the other task (that enables above-chance decoding). In this case, we would expect representations in lateral regions to be more similar to representations in deep convolutional neural networks (DCNNs) trained to recognize facial expression than to representations in DCNNs trained to recognize face identity (the converse should hold for ventral regions). We tested this hypothesis by analyzing neural responses to faces varying in identity and expression. Representational dissimilarity matrices (RDMs) computed from human intracranial recordings (n = 11 adults; 7 females) were compared with RDMs from DCNNs trained to label either identity or expression. We found that RDMs from DCNNs trained to recognize identity correlated with intracranial recordings more strongly in all regions tested-even in regions classically hypothesized to be specialized for expression. These results deviate from the classical view, suggesting that face-selective ventral and lateral regions contribute to the representation of both identity and expression.SIGNIFICANCE STATEMENT Previous work proposed that separate brain regions are specialized for the recognition of face identity and facial expression. However, identity and expression recognition mechanisms might share common brain regions instead. We tested these alternatives using deep neural networks and intracranial recordings from face-selective brain regions. Deep neural networks trained to recognize identity and networks trained to recognize expression learned representations that correlate with neural recordings. Identity-trained representations correlated with intracranial recordings more strongly in all regions tested, including regions hypothesized to be expression specialized in the classical hypothesis. These findings support the view that identity and expression recognition rely on common brain regions. This discovery may require reevaluation of the roles that the ventral and lateral neural pathways play in processing socially relevant stimuli.
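The representational-similarity comparison can be sketched compactly: build a representational dissimilarity matrix (RDM) from neural patterns and from two candidate feature spaces, then ask which model RDM correlates more strongly with the neural RDM. The data below are simulated and the sizes are assumptions, not the study's recordings or networks.

```python
# Minimal RSA sketch (simulated data): compare a neural RDM with RDMs from two
# candidate model feature spaces via Spearman correlation of their upper triangles.
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

rng = np.random.default_rng(5)
n_stimuli = 40                                     # e.g., faces varying in identity/expression

# Simulated feature spaces: the neural patterns share structure with "model A".
model_a = rng.normal(size=(n_stimuli, 60))
model_b = rng.normal(size=(n_stimuli, 60))
neural = model_a @ rng.normal(size=(60, 80)) + rng.normal(scale=3.0, size=(n_stimuli, 80))

def rdm(patterns):
    # Vectorized upper triangle of a correlation-distance RDM.
    return pdist(patterns, metric="correlation")

for name, feats in [("model A", model_a), ("model B", model_b)]:
    rho, _ = spearmanr(rdm(neural), rdm(feats))
    print(f"neural RDM vs {name} RDM: Spearman rho = {rho:.2f}")
```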
Affiliation(s)
- Emily Schwartz
- Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
- Arish Alreja
- Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
- R Mark Richardson
- Department of Neurosurgery, Massachusetts General Hospital, Boston, Massachusetts 02114
- Harvard Medical School, Boston, Massachusetts 02115
- Avniel Ghuman
- Center for the Neural Basis of Cognition, Carnegie Mellon University/University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Department of Neurological Surgery, University of Pittsburgh Medical Center Presbyterian, Pittsburgh, Pennsylvania 15213
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
- Stefano Anzellotti
- Department of Psychology and Neuroscience, Boston College, Chestnut Hill, Massachusetts 02467
12
Luthra S, Mechtenberg H, Giorio C, Theodore RM, Magnuson JS, Myers EB. Using TMS to evaluate a causal role for right posterior temporal cortex in talker-specific phonetic processing. Brain Lang 2023; 240:105264. PMID: 37087863. PMCID: PMC10286152. DOI: 10.1016/j.bandl.2023.105264.
Abstract
Theories suggest that speech perception is informed by listeners' beliefs of what phonetic variation is typical of a talker. A previous fMRI study found right middle temporal gyrus (RMTG) sensitivity to whether a phonetic variant was typical of a talker, consistent with literature suggesting that the right hemisphere may play a key role in conditioning phonetic identity on talker information. The current work used transcranial magnetic stimulation (TMS) to test whether the RMTG plays a causal role in processing talker-specific phonetic variation. Listeners were exposed to talkers who differed in how they produced voiceless stop consonants while TMS was applied to RMTG, left MTG, or scalp vertex. Listeners subsequently showed near-ceiling performance in indicating which of two variants was typical of a trained talker, regardless of previous stimulation site. Thus, even though the RMTG is recruited for talker-specific phonetic processing, modulation of its function may have only modest consequences.
Affiliation(s)
- James S Magnuson
- University of Connecticut, United States; BCBL. Basque Center on Cognition Brain and Language, Donostia-San Sebastián, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain
13
Park SJ, Lee HB, Kim GW. Eardrum-inspired soft viscoelastic diaphragms for CNN-based speech recognition with audio visualization images. Sci Rep 2023; 13:6414. PMID: 37076548. PMCID: PMC10115895. DOI: 10.1038/s41598-023-33755-2.
Abstract
In this study, we present initial efforts toward a new speech recognition approach aimed at producing alternative input images for convolutional neural network (CNN)-based speech recognition. We explored the potential of tympanic membrane (eardrum)-inspired viscoelastic membrane-type diaphragms to deliver audio visualization images using a cross-recurrence plot (CRP). These images were formed from the two phase-shifted vibration responses of the viscoelastic diaphragms. We expect this technique to replace the fast Fourier transform (FFT) spectrum currently used for speech recognition. We report that this method of creating color images, which combines the two phase-shifted vibration responses of the viscoelastic diaphragms through a CRP, carries a lower computational burden and offers a promising alternative to the short-time Fourier transform (STFT, the conventional spectrogram) when the image resolution (pixel size) is below a critical resolution.
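A minimal sketch of a cross-recurrence plot built from two phase-shifted signals is shown below; the sampling rate, phase shift, and distance threshold are illustrative assumptions rather than the authors' exact construction, but the resulting binary image is the kind of input a CNN could consume.

```python
# Minimal CRP sketch (not the authors' exact construction): threshold pairwise
# distances between two phase-shifted 1-D vibration signals to get a binary image.
import numpy as np

fs = 2000                                           # Hz, assumed sampling rate
t = np.arange(0, 0.25, 1 / fs)
x = np.sin(2 * np.pi * 200 * t)                     # response of diaphragm 1
y = np.sin(2 * np.pi * 200 * t + np.pi / 3)         # phase-shifted response of diaphragm 2

# Downsample for a manageable image size, then threshold pairwise distances.
x_ds, y_ds = x[::4], y[::4]
eps = 0.15 * (x_ds.max() - x_ds.min())              # distance threshold
dist = np.abs(x_ds[:, None] - y_ds[None, :])        # |x_i - y_j| for all (i, j)
crp = (dist < eps).astype(np.uint8)                 # binary CRP "image"

print("CRP image shape:", crp.shape)
print("recurrence rate:", crp.mean().round(3))
```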
Affiliation(s)
- Seok-Jin Park
- Department of Mechanical Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
- Hee-Beom Lee
- Department of Mechanical Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
- Gi-Woo Kim
- Department of Mechanical Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea
14
Yu K, Wood WE, Johnston LG, Theunissen FE. Lesions to Caudomedial Nidopallium Impair Individual Vocal Recognition in the Zebra Finch. J Neurosci 2023; 43:2579-2596. PMID: 36859308. PMCID: PMC10082456. DOI: 10.1523/jneurosci.0643-22.2023.
Abstract
Many social animals can recognize other individuals by their vocalizations. This requires a memory system capable of mapping incoming acoustic signals to one of many known individuals. Using the zebra finch, a social songbird that uses songs and distance calls to communicate individual identity (Elie and Theunissen, 2018), we tested the role of two cortical-like brain regions in a vocal recognition task. We found that the rostral region of the Caudomedial Nidopallium (NCM), a secondary auditory region of the avian pallium, was necessary for maintaining auditory memories for conspecific vocalizations in both male and female birds, whereas HVC (used as a proper name), a premotor area that gates auditory input into the vocal motor and song learning pathways in male birds (Roberts and Mooney, 2013), was not. Both NCM and HVC have previously been implicated in processing the tutor song in the context of song learning (Sakata and Yazaki-Sugiyama, 2020). Our results suggest that NCM might store not only songs as templates for future vocal imitation but also songs and calls for the perceptual discrimination of vocalizers in both male and female birds. NCM could therefore operate as a site for auditory memories for vocalizations used in various facets of communication. We also observed that new auditory memories could be acquired without an intact HVC or NCM, but that for these new memories NCM lesions caused deficits in either memory capacity or auditory discrimination. These results suggest that the high-capacity memory functions of the avian pallial auditory system depend on NCM. SIGNIFICANCE STATEMENT Many aspects of vocal communication require the formation of auditory memories. Voice recognition, for example, requires memory of the acoustical features that identify individual vocalizers. In both birds and primates, the locus and neural correlates of these high-level memories remain poorly described. Previous work suggests that this memory formation is mediated by high-level sensory areas, not traditional memory areas such as the hippocampus. Using lesion experiments, we show that a secondary auditory brain region in songbirds that had previously been implicated in storing song memories for vocal imitation is also implicated in storing vocal memories for individual recognition. The role of the neural circuits in this region in interpreting the meaning of communication calls should be investigated in the future.
Affiliation(s)
- Kevin Yu
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California 94720
- William E Wood
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California 94720
- Leah G Johnston
- Herbert Wertheim School of Optometry and Vision Science, University of California, Berkeley, Berkeley, California 94720
- Frederic E Theunissen
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California 94720
- Departments of Psychology and Integrative Biology, University of California, Berkeley, Berkeley, California 94720
15
Giordano BL, Esposito M, Valente G, Formisano E. Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds. Nat Neurosci 2023; 26:664-672. PMID: 36928634. PMCID: PMC10076214. DOI: 10.1038/s41593-023-01285-9.
Abstract
Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploit a model comparison framework and contrasted the ability of acoustic, semantic (continuous and categorical) and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T human auditory cortex functional magnetic resonance imaging responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses similar to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.
Affiliation(s)
- Bruno L Giordano
- Institut de Neurosciences de La Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France
- Michele Esposito
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Giancarlo Valente
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Faculty of Science and Engineering, Maastricht University, Maastricht, the Netherlands
- Brightlands Institute for Smart Society (BISS), Maastricht University, Maastricht, the Netherlands
16
Sun L, Li C, Wang S, Si Q, Lin M, Wang N, Sun J, Li H, Liang Y, Wei J, Zhang X, Zhang J. Left frontal eye field encodes sound locations during passive listening. Cereb Cortex 2023; 33:3067-3079. PMID: 35858212. DOI: 10.1093/cercor/bhac261.
Abstract
Previous studies reported that auditory cortices (AC) were mostly activated by sounds coming from the contralateral hemifield. As a result, sound locations could be encoded by integrating opposite activations from both sides of AC ("opponent hemifield coding"). However, human auditory "where" pathway also includes a series of parietal and prefrontal regions. It was unknown how sound locations were represented in those high-level regions during passive listening. Here, we investigated the neural representation of sound locations in high-level regions by voxel-level tuning analysis, regions-of-interest-level (ROI-level) laterality analysis, and ROI-level multivariate pattern analysis. Functional magnetic resonance imaging data were collected while participants listened passively to sounds from various horizontal locations. We found that opponent hemifield coding of sound locations not only existed in AC, but also spanned over intraparietal sulcus, superior parietal lobule, and frontal eye field (FEF). Furthermore, multivariate pattern representation of sound locations in both hemifields could be observed in left AC, right AC, and left FEF. Overall, our results demonstrate that left FEF, a high-level region along the auditory "where" pathway, encodes sound locations during passive listening in two ways: a univariate opponent hemifield activation representation and a multivariate full-field activation pattern representation.
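The opponent hemifield idea can be illustrated with a small simulation: each hemisphere's response prefers contralateral locations, and their difference tracks azimuth. The response functions and noise below are assumptions, not the study's fMRI analysis.

```python
# Minimal sketch of "opponent hemifield coding" (simulated): each hemisphere
# prefers contralateral sound locations; the difference of the two responses
# tracks azimuth.
import numpy as np

rng = np.random.default_rng(6)
azimuth = np.linspace(-90, 90, 13)                  # degrees, left (-) to right (+)

def hemi_response(az, preferred_side, noise=0.05):
    # Sigmoidal preference for one hemifield, plus measurement noise.
    return 1 / (1 + np.exp(-preferred_side * az / 20)) + rng.normal(scale=noise, size=az.shape)

left_ac = hemi_response(azimuth, preferred_side=+1)    # left AC prefers right hemifield
right_ac = hemi_response(azimuth, preferred_side=-1)   # right AC prefers left hemifield

opponent = left_ac - right_ac                        # opponent-channel readout
r = np.corrcoef(opponent, azimuth)[0, 1]
print(f"correlation between opponent signal and azimuth: {r:.2f}")
```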
Affiliation(s)
- Liwei Sun
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Chunlin Li
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Songjian Wang
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Qian Si
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Meng Lin
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Ningyu Wang
- Department of Otorhinolaryngology, Head and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing 100020, China
- Jun Sun
- Department of Radiology, Beijing Youan Hospital, Capital Medical University, Beijing 100069, China
- Hongjun Li
- Department of Radiology, Beijing Youan Hospital, Capital Medical University, Beijing 100069, China
- Ying Liang
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Jing Wei
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Xu Zhang
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Capital Medical University, Beijing 100069, China
- Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University, Beijing 100069, China
- Juan Zhang
- Department of Otorhinolaryngology, Head and Neck Surgery, Beijing Chaoyang Hospital, Capital Medical University, Beijing 100020, China
17
Luthra S, Magnuson JS, Myers EB. Right Posterior Temporal Cortex Supports Integration of Phonetic and Talker Information. Neurobiol Lang 2023; 4:145-177. PMID: 37229142. PMCID: PMC10205075. DOI: 10.1162/nol_a_00091.
Abstract
Though the right hemisphere has been implicated in talker processing, it is thought to play a minimal role in phonetic processing, at least relative to the left hemisphere. Recent evidence suggests that the right posterior temporal cortex may support learning of phonetic variation associated with a specific talker. In the current study, listeners heard a male talker and a female talker, one of whom produced an ambiguous fricative in /s/-biased lexical contexts (e.g., epi?ode) and one who produced it in /∫/-biased contexts (e.g., friend?ip). Listeners in a behavioral experiment (Experiment 1) showed evidence of lexically guided perceptual learning, categorizing ambiguous fricatives in line with their previous experience. Listeners in an fMRI experiment (Experiment 2) showed differential phonetic categorization as a function of talker, allowing for an investigation of the neural basis of talker-specific phonetic processing, though they did not exhibit perceptual learning (likely due to characteristics of our in-scanner headphones). Searchlight analyses revealed that the patterns of activation in the right superior temporal sulcus (STS) contained information about who was talking and what phoneme they produced. We take this as evidence that talker information and phonetic information are integrated in the right STS. Functional connectivity analyses suggested that the process of conditioning phonetic identity on talker information depends on the coordinated activity of a left-lateralized phonetic processing system and a right-lateralized talker processing system. Overall, these results clarify the mechanisms through which the right hemisphere supports talker-specific phonetic processing.
Affiliation(s)
- Sahil Luthra
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- James S. Magnuson
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Basque Center on Cognition Brain and Language (BCBL), Donostia-San Sebastián, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- Emily B. Myers
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, CT, USA
18
Lan B, Yang T, Tian G, Ao Y, Jin L, Xiong D, Wang S, Zhang H, Deng L, Sun Y, Zhang J, Deng W, Yang W. Multichannel Gradient Piezoelectric Transducer Assisted with Deep Learning for Broadband Acoustic Sensing. ACS Appl Mater Interfaces 2023; 15:12146-12153. PMID: 36811621. DOI: 10.1021/acsami.2c20520.
Abstract
As an important part of human-machine interfaces, piezoelectric voice recognition has received extensive attention due to its unique self-powered nature. However, conventional voice recognition devices exhibit a limited response frequency band due to the intrinsic hardness and brittleness of piezoelectric ceramics or the flexibility of piezoelectric fibers. Here, we propose a cochlear-inspired multichannel piezoelectric acoustic sensor (MAS) based on gradient PVDF piezoelectric nanofibers, fabricated by a programmable electrospinning technique, for broadband voice recognition. Compared with a common electrospun PVDF membrane-based acoustic sensor, the developed MAS demonstrates a frequency band broadened by 300% and a piezoelectric output enhanced by 334.6%. More importantly, the MAS can serve as a high-fidelity auditory platform for music recording and human voice recognition, with a classification accuracy of up to 100% when combined with deep learning. The programmable bionic gradient piezoelectric nanofiber may provide a universal strategy for the development of intelligent bioelectronics.
Affiliation(s)
- Boling Lan
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Tao Yang
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Guo Tian
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Yong Ao
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Long Jin
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Da Xiong
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Shenglong Wang
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Hongrui Zhang
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Lin Deng
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Yue Sun
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Jieling Zhang
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Weili Deng
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
- Weiqing Yang
- Key Laboratory of Advanced Technologies of Materials (Ministry of Education), School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P. R. China
19
Sun Y, Ming L, Sun J, Guo F, Li Q, Hu X. Brain mechanism of unfamiliar and familiar voice processing: an activation likelihood estimation meta-analysis. PeerJ 2023; 11:e14976. PMID: 36935917. PMCID: PMC10019337. DOI: 10.7717/peerj.14976.
Abstract
Interpersonal communication through vocal information is very important for human society. During verbal interactions, our vocal cord vibrations convey important information regarding voice identity, which allows us to decide how to respond to speakers (e.g., neither greeting a stranger too warmly or speaking too coldly to a friend). Numerous neural studies have shown that identifying familiar and unfamiliar voices may rely on different neural bases. However, the mechanism underlying voice identification of individuals of varying familiarity has not been determined due to vague definitions, confusion of terms, and differences in task design. To address this issue, the present study first categorized three kinds of voice identity processing (perception, recognition and identification) from speakers with different degrees of familiarity. We defined voice identity perception as passively listening to a voice or determining if the voice was human, voice identity recognition as determining if the sound heard was acoustically familiar, and voice identity identification as ascertaining whether a voice is associated with a name or face. Of these, voice identity perception involves processing unfamiliar voices, and voice identity recognition and identification involves processing familiar voices. According to these three definitions, we performed activation likelihood estimation (ALE) on 32 studies and revealed different brain mechanisms underlying processing of unfamiliar and familiar voice identities. The results were as follows: (1) familiar voice recognition/identification was supported by a network involving most regions in the temporal lobe, some regions in the frontal lobe, subcortical structures and regions around the marginal lobes; (2) the bilateral superior temporal gyrus was recruited for voice identity perception of an unfamiliar voice; (3) voice identity recognition/identification of familiar voices was more likely to activate the right frontal lobe than voice identity perception of unfamiliar voices, while voice identity perception of an unfamiliar voice was more likely to activate the bilateral temporal lobe and left frontal lobe; and (4) the bilateral superior temporal gyrus served as a shared neural basis of unfamiliar voice identity perception and familiar voice identity recognition/identification. In general, the results of the current study address gaps in the literature, provide clear definitions of concepts, and indicate brain mechanisms for subsequent investigations.
20
Drown L, Philip B, Francis AL, Theodore RM. Revisiting the left ear advantage for phonetic cues to talker identification. J Acoust Soc Am 2022; 152:3107. PMID: 36456295. PMCID: PMC9715276. DOI: 10.1121/10.0015093.
Abstract
Previous research suggests that learning to use a phonetic property [e.g., voice-onset-time, (VOT)] for talker identity supports a left ear processing advantage. Specifically, listeners trained to identify two "talkers" who only differed in characteristic VOTs showed faster talker identification for stimuli presented to the left ear compared to that presented to the right ear, which is interpreted as evidence of hemispheric lateralization consistent with task demands. Experiment 1 (n = 97) aimed to replicate this finding and identify predictors of performance; experiment 2 (n = 79) aimed to replicate this finding under conditions that better facilitate observation of laterality effects. Listeners completed a talker identification task during pretest, training, and posttest phases. Inhibition, category identification, and auditory acuity were also assessed in experiment 1. Listeners learned to use VOT for talker identity, which was positively associated with auditory acuity. Talker identification was not influenced by ear of presentation, and Bayes factors indicated strong support for the null. These results suggest that talker-specific phonetic variation is not sufficient to induce a left ear advantage for talker identification; together with the extant literature, this instead suggests that hemispheric lateralization for talker-specific phonetic variation requires phonetic variation to be conditioned on talker differences in source characteristics.
Collapse
Affiliation(s)
- Lee Drown
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
| | - Betsy Philip
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
| | - Alexander L Francis
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907-2122, USA
| | - Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
| |
Collapse
|
21
|
Schultz DH, Ito T, Cole MW. Global connectivity fingerprints predict the domain generality of multiple-demand regions. Cereb Cortex 2022; 32:4464-4479. [PMID: 35076709 PMCID: PMC9574240 DOI: 10.1093/cercor/bhab495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 01/26/2023] Open
Abstract
A set of distributed cognitive control networks are known to contribute to diverse cognitive demands, yet it is unclear how these networks gain this domain-general capacity. We hypothesized that this capacity is largely due to the particular organization of the human brain's intrinsic network architecture. Specifically, we tested the possibility that each brain region's domain generality is reflected in its level of global (hub-like) intrinsic connectivity as well as its particular global connectivity pattern ("connectivity fingerprint"). Consistent with prior work, we found that cognitive control networks exhibited domain generality as they represented diverse task context information covering sensory, motor response, and logic rule domains. Supporting our hypothesis, we found that the level of global intrinsic connectivity (estimated with resting-state functional magnetic resonance imaging [fMRI]) was correlated with domain generality during tasks. Further, using a novel information fingerprint mapping approach, we found that each cognitive control region's unique rule response profile ("information fingerprint") could be predicted based on its unique intrinsic connectivity fingerprint and the information content in regions outside cognitive control networks. Together, these results suggest that the human brain's intrinsic network architecture supports its ability to represent diverse cognitive task information largely via the location of multiple-demand regions within the brain's global network organization.
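The fingerprint-based prediction can be loosely pictured with an activity-flow-style sketch, in which a target region's rule "information fingerprint" is estimated as a connectivity-weighted sum of the information held in source regions; all arrays below are synthetic placeholders and the calculation is only a schematic of the idea, not the paper's information fingerprint mapping procedure.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
n_rules, n_sources = 12, 50

# Hypothetical inputs: rule-specific information estimates in source regions
# (outside cognitive-control networks) and one target region's resting-state
# connectivity fingerprint to those sources.
source_info = rng.normal(size=(n_rules, n_sources))   # rules x source regions
fingerprint = rng.normal(size=n_sources)              # target's connectivity weights
observed = source_info @ fingerprint + 0.5 * rng.normal(size=n_rules)  # toy "measured" profile

# Connectivity-weighted prediction of the target's rule response profile.
predicted = source_info @ fingerprint
r, _ = pearsonr(predicted, observed)
print(f"predicted vs observed information fingerprint: r = {r:.2f}")
```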
Collapse
Affiliation(s)
- Douglas H Schultz
- Center for Brain, Biology and Behavior, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.,Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
| | - Takuya Ito
- Center for Molecular and Behavioral Neuroscience, Rutgers University-Newark, Newark, NJ 07102, USA
| | - Michael W Cole
- Center for Molecular and Behavioral Neuroscience, Rutgers University-Newark, Newark, NJ 07102, USA
| |
Collapse
|
22
|
Bailey KM, Giordano BL, Kaas AL, Smith FW. Decoding sounds depicting hand-object interactions in primary somatosensory cortex. Cereb Cortex 2022; 33:3621-3635. [PMID: 36045002 DOI: 10.1093/cercor/bhac296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Revised: 05/24/2022] [Accepted: 07/07/2022] [Indexed: 11/13/2022] Open
Abstract
Neurons, even in the earliest sensory regions of cortex, are subject to a great deal of contextual influences from both within and across modality connections. Recent work has shown that primary sensory areas can respond to and, in some cases, discriminate stimuli that are not of their target modality: for example, primary somatosensory cortex (SI) discriminates visual images of graspable objects. In the present work, we investigated whether SI would discriminate sounds depicting hand-object interactions (e.g. bouncing a ball). In a rapid event-related functional magnetic resonance imaging experiment, participants listened attentively to sounds from 3 categories: hand-object interactions, and control categories of pure tones and animal vocalizations, while performing a one-back repetition detection task. Multivoxel pattern analysis revealed significant decoding of hand-object interaction sounds within SI, but not for either control category. Crucially, in the hand-sensitive voxels defined from an independent tactile localizer, decoding accuracies were significantly higher for hand-object interactions compared to pure tones in left SI. Our findings indicate that simply hearing sounds depicting familiar hand-object interactions elicits different patterns of activity in SI, despite the complete absence of tactile stimulation. These results highlight the rich contextual information that can be transmitted across sensory modalities even to primary sensory areas.
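As a rough illustration of the multivoxel pattern analysis described above, the sketch below cross-validates a linear classifier on hypothetical single-trial voxel patterns from a somatosensory region of interest; the data, labels, and parameter choices are placeholders, not the authors' pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(0)

# Hypothetical data: 60 trials x 200 voxels; labels 0/1 code two sound
# categories (e.g., hand-object interaction vs. pure tone).
X = rng.normal(size=(60, 200))
y = np.repeat([0, 1], 30)

clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=5), scoring="accuracy")
print(f"mean decoding accuracy: {scores.mean():.2f} (chance = 0.50)")
```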
Collapse
Affiliation(s)
- Kerri M Bailey
- School of Psychology, University of East Anglia, Norwich NR4 7TJ, United Kingdom
| | - Bruno L Giordano
- Institut des Neurosciences de La Timone, CNRS UMR 7289, Université Aix-Marseille, Marseille, France
| | - Amanda L Kaas
- Department of Cognitive Neuroscience, Maastricht University, Maastricht 6229 EV, The Netherlands
| | - Fraser W Smith
- School of Psychology, University of East Anglia, Norwich NR4 7TJ, United Kingdom
| |
Collapse
|
23
|
Johnson JF, Belyk M, Schwartze M, Pinheiro AP, Kotz SA. Hypersensitivity to passive voice hearing in hallucination proneness. Front Hum Neurosci 2022; 16:859731. [PMID: 35966990 PMCID: PMC9366353 DOI: 10.3389/fnhum.2022.859731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2022] [Accepted: 06/29/2022] [Indexed: 11/21/2022] Open
Abstract
Voices are a complex and rich acoustic signal processed in an extensive cortical brain network. Specialized regions within this network support voice perception and production and may be differentially affected in pathological voice processing. For example, the experience of hallucinating voices has been linked to hyperactivity in temporal and extra-temporal voice areas, possibly extending into regions associated with vocalization. Predominant self-monitoring hypotheses ascribe a primary role of voice production regions to auditory verbal hallucinations (AVH). Alternative postulations view a generalized perceptual salience bias as causal to AVH. These theories are not mutually exclusive as both ascribe the emergence and phenomenology of AVH to unbalanced top-down and bottom-up signal processing. The focus of the current study was to investigate the neurocognitive mechanisms underlying predisposition brain states for emergent hallucinations, detached from the effects of inner speech. Using the temporal voice area (TVA) localizer task, we explored putative hypersalient responses to passively presented sounds in relation to hallucination proneness (HP). Furthermore, to avoid confounds commonly found in clinical samples, we employed the Launay-Slade Hallucination Scale (LSHS) for the quantification of HP levels in healthy people across an experiential continuum spanning the general population. We report increased activation in the right posterior superior temporal gyrus (pSTG) during the perception of voice features that positively correlates with increased HP scores. In line with prior results, we propose that this right-lateralized pSTG activation might indicate early hypersensitivity to acoustic features coding speaker identity that extends beyond own voice production to perception in healthy participants prone to experience AVH.
Collapse
Affiliation(s)
- Joseph F. Johnson
- Department of Neuropsychology and Psychopharmacology, University of Maastricht, Maastricht, Netherlands
| | - Michel Belyk
- Department of Psychology, Edge Hill University, Ormskirk, United Kingdom
| | - Michael Schwartze
- Department of Neuropsychology and Psychopharmacology, University of Maastricht, Maastricht, Netherlands
| | - Ana P. Pinheiro
- Faculdade de Psicologia, Universidade de Lisboa, Lisbon, Portugal
| | - Sonja A. Kotz
- Department of Neuropsychology and Psychopharmacology, University of Maastricht, Maastricht, Netherlands
- Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| |
Collapse
|
24
|
Rinke P, Schmidt T, Beier K, Kaul R, Scharinger M. Rapid pre-attentive processing of a famous speaker: Electrophysiological effects of Angela Merkel's voice. Neuropsychologia 2022; 173:108312. [PMID: 35781011 DOI: 10.1016/j.neuropsychologia.2022.108312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Revised: 06/27/2022] [Accepted: 06/27/2022] [Indexed: 11/18/2022]
Abstract
The recognition of human speakers by their voices is a remarkable cognitive ability. Previous research has established a voice area in the right temporal cortex involved in the integration of speaker-specific acoustic features. This integration appears to occur rapidly, especially in case of familiar voices. However, the exact time course of this process is less well understood. To this end, we here investigated the automatic change detection response of the human brain while listening to the famous voice of German chancellor Angela Merkel, embedded in the context of acoustically matched voices. A classic passive oddball paradigm contrasted short word stimuli uttered by Merkel with word stimuli uttered by two unfamiliar female speakers. Electrophysiological voice processing indices from 21 participants were quantified as mismatch negativities (MMNs) and P3a differences. Cortical sources were approximated by variable resolution electromagnetic tomography. The results showed amplitude and latency effects for both MMN and P3a: The famous (familiar) voice elicited a smaller but earlier MMN than the unfamiliar voices. The P3a, by contrast, was both larger and later for the familiar than for the unfamiliar voices. Familiar-voice MMNs originated from right-hemispheric regions in temporal cortex, overlapping with the temporal voice area, while unfamiliar-voice MMNs stemmed from left superior temporal gyrus. These results suggest that the processing of a very famous voice relies on pre-attentive right temporal processing within the first 150 ms of the acoustic signal. The findings further our understanding of the neural dynamics underlying familiar voice processing.
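A minimal numpy sketch of how MMN and P3a amplitude and latency can be quantified from a passive oddball recording: average the standard and deviant epochs at one fronto-central channel, take the deviant-minus-standard difference wave, and read off the extrema in conventional time windows. The synthetic data, channel choice, and window boundaries are assumptions for illustration, not the study's analysis parameters.

```python
import numpy as np

# Hypothetical single-channel epochs (n_trials x n_times), 500 Hz sampling,
# epochs running from -100 ms to +600 ms around word onset.
sfreq, baseline = 500, 0.1
rng = np.random.default_rng(1)
standard = rng.normal(size=(400, 350))
deviant = rng.normal(size=(80, 350))

times = np.arange(350) / sfreq - baseline
diff = deviant.mean(axis=0) - standard.mean(axis=0)   # deviant-minus-standard wave

# MMN: most negative deflection 100-250 ms; P3a: most positive deflection 250-400 ms.
mmn_win = (times >= 0.10) & (times <= 0.25)
p3a_win = (times >= 0.25) & (times <= 0.40)
mmn_amp, mmn_lat = diff[mmn_win].min(), times[mmn_win][diff[mmn_win].argmin()]
p3a_amp, p3a_lat = diff[p3a_win].max(), times[p3a_win][diff[p3a_win].argmax()]
print(f"MMN {mmn_amp:.2f} uV at {mmn_lat * 1000:.0f} ms; "
      f"P3a {p3a_amp:.2f} uV at {p3a_lat * 1000:.0f} ms")
```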
Collapse
Affiliation(s)
- Paula Rinke
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany
| | - Tatjana Schmidt
- Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany; Faculté de biologie et de médecine, University of Lausanne, Switzerland
| | - Kjartan Beier
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany
| | - Ramona Kaul
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany
| | - Mathias Scharinger
- Research Group Phonetics, Institute of German Linguistics, Philipps-University Marburg, Germany; Research Center »Deutscher Sprachatlas«, Philipps-University Marburg, Germany; Center for Mind, Brain & Behavior, Universities of Marburg & Gießen, Germany.
| |
Collapse
|
25
|
Yue Q, Martin RC. Phonological Working Memory Representations in the Left Inferior Parietal Lobe in the Face of Distraction and Neural Stimulation. Front Hum Neurosci 2022; 16:890483. [PMID: 35814962 PMCID: PMC9259857 DOI: 10.3389/fnhum.2022.890483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2022] [Accepted: 05/30/2022] [Indexed: 11/21/2022] Open
Abstract
The neural basis of phonological working memory (WM) was investigated through an examination of the effects of irrelevant speech distractors and disruptive neural stimulation from transcranial magnetic stimulation (TMS). Embedded processes models argue that the same regions involved in speech perception are used to support phonological WM, whereas buffer models assume that a region separate from speech perception regions is used to support WM. Thus, according to the embedded processes approach but not the buffer approach, irrelevant speech and TMS to the speech perception region should disrupt the decoding of phonological WM representations. According to the buffer account, decoding of WM items should be possible in the buffer region despite distraction and should be disrupted with TMS to this region. Experiment 1 used fMRI and representational similarity analyses (RSA) with a delayed recognition memory paradigm using nonword stimuli. Results showed that decoding of memory items in the speech perception regions (superior temporal gyrus, STG) was possible in the absence of distractors. However, the decoding evidence in the left STG was susceptible to interference from distractors presented during the delay period, whereas decoding in the proposed buffer region (supramarginal gyrus, SMG) persisted. Experiment 2 examined the causal roles of the speech processing region and the buffer region in phonological WM performance using TMS. TMS to the SMG during the early delay period caused a disruption in recognition performance for the memory nonwords, whereas stimulation of the STG and of an occipital control region did not affect WM performance. Taken together, results from the two experiments are consistent with predictions of a buffer model of phonological WM, pointing to a critical role of the left SMG in maintaining phonological representations.
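The representational similarity analysis (RSA) logic mentioned above can be sketched in a few lines: build a neural representational dissimilarity matrix (RDM) from ROI patterns, build a model RDM, and rank-correlate the two. The item counts, voxel counts, and the stand-in model RDM below are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)

# Hypothetical voxel patterns from one ROI (e.g., left SMG) for 12 nonword
# memory items, plus a stand-in model RDM coding phonological similarity.
patterns = rng.normal(size=(12, 300))                            # items x voxels
neural_rdm = pdist(patterns, metric="correlation")               # condensed neural RDM
model_rdm = pdist(rng.normal(size=(12, 5)), metric="euclidean")  # placeholder model RDM

rho, p = spearmanr(neural_rdm, model_rdm)
print(f"neural-model RDM rank correlation: rho = {rho:.2f}, p = {p:.3f}")
```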
Collapse
Affiliation(s)
- Qiuhai Yue
- Department of Psychological Sciences, Rice University, Houston, TX, United States
- Department of Psychology, Vanderbilt University, Nashville, TN, United States
- *Correspondence: Qiuhai Yue; Randi C. Martin
| | - Randi C. Martin
- Department of Psychological Sciences, Rice University, Houston, TX, United States
- *Correspondence: Qiuhai Yue; Randi C. Martin
| |
Collapse
|
26
|
Preisig BC, Riecke L, Hervais-Adelman A. Speech sound categorization: The contribution of non-auditory and auditory cortical regions. Neuroimage 2022; 258:119375. [PMID: 35700949 DOI: 10.1016/j.neuroimage.2022.119375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2022] [Revised: 05/13/2022] [Accepted: 06/10/2022] [Indexed: 11/26/2022] Open
Abstract
Which processes in the human brain lead to the categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with a disambiguating acoustic feature (third formant, F3) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in left perisylvian regions (STG, SMG), left inferior frontal regions (vMC, IFG, AI), left supplementary motor cortex (SMA/pre-SMA), and right motor and somatosensory regions (M1/S1) represent listeners' syllable report irrespective of stimulus acoustics. Most of these regions are outside of what is traditionally regarded as auditory or phonological processing areas. Our results indicate that the process of speech sound categorization implicates decision-making mechanisms and auditory-motor transformations.
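A multivoxel pattern searchlight of the kind used here can be run, for example, with nilearn's SearchLight estimator, which fits a cross-validated classifier in a sphere around every voxel. The toy image, mask, labels, and radius below are placeholders; this is a generic sketch rather than the study's pipeline.

```python
import numpy as np
import nibabel as nib
from nilearn.decoding import SearchLight
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)

# Hypothetical 4D dataset: an 8 x 8 x 8 volume with 40 "trials"
# (e.g., /da/ vs /ga/ syllable reports), plus a whole-volume mask.
data = rng.normal(size=(8, 8, 8, 40))
affine = np.eye(4)
imgs = nib.Nifti1Image(data, affine)
mask = nib.Nifti1Image(np.ones((8, 8, 8), dtype=np.int8), affine)
y = np.repeat([0, 1], 20)

# 2 mm-radius spheres, each decoded with its own cross-validated classifier.
sl = SearchLight(mask_img=mask, radius=2.0, estimator="svc",
                 cv=KFold(n_splits=4), n_jobs=1)
sl.fit(imgs, y)
print(sl.scores_.shape, sl.scores_.max())   # per-voxel accuracy map
```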
Collapse
Affiliation(s)
- Basil C Preisig
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands; Department of Psychology, Neurolinguistics, University of Zurich, 8050 Zurich, Switzerland; Department of Comparative Language Science, Evolutionary Neuroscience of Language, University of Zurich, 8050 Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, 8057 Zurich, Switzerland.
| | - Lars Riecke
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 ER Maastricht, The Netherlands
| | - Alexis Hervais-Adelman
- Department of Psychology, Neurolinguistics, University of Zurich, 8050 Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, 8057 Zurich, Switzerland
| |
Collapse
|
27
|
Lim SJ, Thiel C, Sehm B, Deserno L, Lepsien J, Obleser J. Distributed networks for auditory memory differentially contribute to recall precision. Neuroimage 2022; 256:119227. [PMID: 35452804 DOI: 10.1016/j.neuroimage.2022.119227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Revised: 03/13/2022] [Accepted: 04/17/2022] [Indexed: 11/25/2022] Open
Abstract
Re-directing attention to objects in working memory can enhance their representational fidelity. However, how this attentional enhancement of memory representations is implemented across distinct sensory and cognitive-control brain networks is unspecified. The present fMRI experiment leverages psychophysical modelling and multivariate auditory-pattern decoding as behavioral and neural proxies of mnemonic fidelity. Listeners performed an auditory syllable pitch-discrimination task and received retro-active cues to selectively attend to a to-be-probed syllable in memory. Accompanied by increased neural activation in fronto-parietal and cingulo-opercular networks, valid retro-cues yielded faster and more perceptually sensitive responses in recalling acoustic detail of memorized syllables. Information about the cued auditory object was decodable from hemodynamic response patterns in superior temporal sulcus (STS), fronto-parietal, and sensorimotor regions. However, among these regions retaining auditory memory objects, neural fidelity in the left STS and its enhancement through attention-to-memory best predicted individuals' gain in auditory memory recall precision. Our results demonstrate how functionally discrete brain regions differentially contribute to the attentional enhancement of memory representations.
Collapse
Affiliation(s)
- Sung-Joo Lim
- Department of Psychology, University of Lübeck, Maria-Goeppert-Str. 9a, Lübeck 23562, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Department of Psychology, Binghamton University, State University of New York, 4400 Vestal Parkway E, Vestal, Binghamton, NY 13902, USA; Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA.
| | - Christiane Thiel
- Department of Psychology, Carl von Ossietzky University of Oldenburg, Oldenburg 26129, Germany
| | - Bernhard Sehm
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
| | - Lorenz Deserno
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
| | - Jöran Lepsien
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
| | - Jonas Obleser
- Department of Psychology, University of Lübeck, Maria-Goeppert-Str. 9a, Lübeck 23562, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; Center of Brain, Behavior, and Metabolism, University of Lübeck, Lübeck 23562, Germany.
| |
Collapse
|
28
|
Rennig J, Beauchamp MS. Intelligibility of audiovisual sentences drives multivoxel response patterns in human superior temporal cortex. Neuroimage 2022; 247:118796. [PMID: 34906712 PMCID: PMC8819942 DOI: 10.1016/j.neuroimage.2021.118796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 11/18/2021] [Accepted: 12/08/2021] [Indexed: 11/18/2022] Open
Abstract
Regions of the human posterior superior temporal gyrus and sulcus (pSTG/S) respond to the visual mouth movements that constitute visual speech and the auditory vocalizations that constitute auditory speech, and neural responses in pSTG/S may underlie the perceptual benefit of visual speech for the comprehension of noisy auditory speech. We examined this possibility through the lens of multivoxel pattern responses in pSTG/S. BOLD fMRI data were collected from 22 participants presented with English sentences in five different formats: visual-only; auditory with and without added auditory noise; and audiovisual with and without auditory noise. Participants reported the intelligibility of each sentence with a button press and trials were sorted post-hoc into those that were more or less intelligible. Response patterns were measured in regions of the pSTG/S identified with an independent localizer. Noisy audiovisual sentences with very similar physical properties evoked very different response patterns depending on their intelligibility. When a noisy audiovisual sentence was reported as intelligible, the pattern was nearly identical to that elicited by clear audiovisual sentences. In contrast, an unintelligible noisy audiovisual sentence evoked a pattern like that of visual-only sentences. This effect was less pronounced for noisy auditory-only sentences, which evoked similar response patterns regardless of intelligibility. The successful integration of visual and auditory speech produces a characteristic neural signature in pSTG/S, highlighting the importance of this region in generating the perceptual benefit of visual speech.
Collapse
Affiliation(s)
- Johannes Rennig
- Division of Neuropsychology, Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
| | - Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Richards Medical Research Building, A607, 3700 Hamilton Walk, Philadelphia, PA 19104-6016, United States.
| |
Collapse
|
29
|
Abstract
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
Collapse
Affiliation(s)
- Ilina Bhaya-Grossman
- Department of Neurological Surgery, University of California, San Francisco, California 94143, USA;
- Joint Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, California 94720, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, California 94143, USA;
| |
Collapse
|
30
|
Al-Zubaidi A, Bräuer S, Holdgraf CR, Schepers IM, Rieger JW. OUP accepted manuscript. Cereb Cortex Commun 2022; 3:tgac007. [PMID: 35281216 PMCID: PMC8914075 DOI: 10.1093/texcom/tgac007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Revised: 01/26/2022] [Accepted: 01/29/2022] [Indexed: 11/14/2022] Open
Affiliation(s)
- Arkan Al-Zubaidi
- Applied Neurocognitive Psychology Lab and Cluster of Excellence Hearing4all, Oldenburg University, Oldenburg, Germany
- Research Center Neurosensory Science, Oldenburg University, 26129 Oldenburg, Germany
| | - Susann Bräuer
- Applied Neurocognitive Psychology Lab and Cluster of Excellence Hearing4all, Oldenburg University, Oldenburg, Germany
| | - Chris R Holdgraf
- Department of Statistics, UC Berkeley, Berkeley, CA 94720, USA
- International Interactive Computing Collaboration
| | - Inga M Schepers
- Applied Neurocognitive Psychology Lab and Cluster of Excellence Hearing4all, Oldenburg University, Oldenburg, Germany
| | - Jochem W Rieger
- Corresponding author: Department of Psychology, Faculty VI, Oldenburg University, 26129 Oldenburg, Germany.
| |
Collapse
|
31
|
Hierarchical cortical networks of "voice patches" for processing voices in human brain. Proc Natl Acad Sci U S A 2021; 118:2113887118. [PMID: 34930846 DOI: 10.1073/pnas.2113887118] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/11/2021] [Indexed: 12/26/2022] Open
Abstract
Humans have an extraordinary ability to recognize and differentiate voices. It is yet unclear whether voices are uniquely processed in the human brain. To explore the underlying neural mechanisms of voice processing, we recorded electrocorticographic signals from intracranial electrodes in epilepsy patients while they listened to six different categories of voice and nonvoice sounds. Subregions in the temporal lobe exhibited preferences for distinct voice stimuli, which were defined as "voice patches." Latency analyses suggested a dual hierarchical organization of the voice patches. We also found that voice patches were functionally connected under both task-engaged and resting states. Furthermore, the left motor areas were coactivated and correlated with the temporal voice patches during the sound-listening task. Taken together, this work reveals hierarchical cortical networks in the human brain for processing human voices.
Collapse
|
32
|
Romanovska L, Bonte M. How Learning to Read Changes the Listening Brain. Front Psychol 2021; 12:726882. [PMID: 34987442 PMCID: PMC8721231 DOI: 10.3389/fpsyg.2021.726882] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Accepted: 11/23/2021] [Indexed: 01/18/2023] Open
Abstract
Reading acquisition reorganizes existing brain networks for speech and visual processing to form novel audio-visual language representations. This requires substantial cortical plasticity that is reflected in changes in brain activation and functional as well as structural connectivity between brain areas. The extent to which a child's brain can accommodate these changes may underlie the high variability in reading outcome in both typical and dyslexic readers. In this review, we focus on reading-induced functional changes of the dorsal speech network in particular and discuss how its reciprocal interactions with the ventral reading network contribute to reading outcome. We discuss how the dynamic and intertwined development of both reading networks may be best captured by approaching reading from a skill learning perspective, using audio-visual learning paradigms and longitudinal designs to follow neuro-behavioral changes while children's reading skills unfold.
Collapse
Affiliation(s)
| | - Milene Bonte
- *Correspondence: Linda Romanovska; Milene Bonte
| |
Collapse
|
33
|
Wang J, Wagley N, Rice ML, Booth JR. Semantic and syntactic specialization during auditory sentence processing in 7-8-year-old children. Cortex 2021; 145:169-186. [PMID: 34731687 PMCID: PMC8633078 DOI: 10.1016/j.cortex.2021.09.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2021] [Revised: 08/27/2021] [Accepted: 09/21/2021] [Indexed: 01/12/2023]
Abstract
Previous studies indicate that adults show specialized syntactic and semantic processes in both the temporal and frontal lobes during language comprehension. Neuro-cognitive models of language development argue that this specialization appears earlier in the temporal than the frontal lobe. However, there is little evidence supporting this proposed progression. Our recently published study (Wang, Rice, & Booth, 2020), using multivoxel pattern analyses, detected that children as young as 5 to 6 years old exhibit specialization and integration in the temporal lobe, but not the frontal lobe. In the current study, we used the same approach to examine semantic and syntactic specialization in children ages 7 to 8 years old. We found support for semantic specialization in the left middle temporal gyrus (MTG) for correct sentences and in the triangular part of the left inferior frontal gyrus (IFG) for incorrect sentences. We also found that the left superior temporal gyrus (STG) played an integration role and was sensitive to both semantic and syntactic processing during both correct and incorrect sentence processing. However, there was no support for syntactic specialization in 7- to 8-year-old children. As compared to our previous study on 5- to 6-year-old children, which only showed semantic specialization in the temporal lobe, the current study suggests a developmental progression to semantic specialization in the frontal lobe. This project represents an important step forward in testing neuro-cognitive models of language processing in children.
Collapse
Affiliation(s)
- Jin Wang
- Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA.
| | - Neelima Wagley
- Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
| | - Mabel L Rice
- Child Language Doctoral Program, University of Kansas, Lawrence, KS, USA
| | - James R Booth
- Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
| |
Collapse
|
34
|
Feng G, Gan Z, Yi HG, Ell SW, Roark CL, Wang S, Wong PCM, Chandrasekaran B. Neural dynamics underlying the acquisition of distinct auditory category structures. Neuroimage 2021; 244:118565. [PMID: 34543762 DOI: 10.1016/j.neuroimage.2021.118565] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Revised: 09/05/2021] [Accepted: 09/06/2021] [Indexed: 11/16/2022] Open
Abstract
Despite the multidimensional and temporally fleeting nature of auditory signals we quickly learn to assign novel sounds to behaviorally relevant categories. The neural systems underlying the learning and representation of novel auditory categories are far from understood. Current models argue for a rigid specialization of hierarchically organized core regions that are fine-tuned to extracting and mapping relevant auditory dimensions to meaningful categories. Scaffolded within a dual-learning systems approach, we test a competing hypothesis: the spatial and temporal dynamics of emerging auditory-category representations are not driven by the underlying dimensions but are constrained by category structure and learning strategies. To test these competing models, we used functional Magnetic Resonance Imaging (fMRI) to assess representational dynamics during the feedback-based acquisition of novel non-speech auditory categories with identical dimensions but differing category structures: rule-based (RB) categories, hypothesized to involve an explicit sound-to-rule mapping network, and information integration (II) based categories, involving pre-decisional integration of dimensions via a procedural-based sound-to-reward mapping network. Adults were assigned to either the RB (n = 30, 19 females) or II (n = 30, 22 females) learning tasks. Despite similar behavioral learning accuracies, learning strategies derived from computational modeling and involvements of corticostriatal systems during feedback processing differed across tasks. Spatiotemporal multivariate representational similarity analysis revealed an emerging representation within an auditory sensory-motor pathway exclusively for the II learning task, prominently involving the superior temporal gyrus (STG), inferior frontal gyrus (IFG), and posterior precentral gyrus. In contrast, the RB learning task yielded distributed neural representations within regions involved in cognitive-control and attentional processes that emerged at different time points of learning. Our results unequivocally demonstrate that auditory learners' neural systems are highly flexible and show distinct spatial and temporal patterns that are not dimension-specific but reflect underlying category structures and learning strategies.
Collapse
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China.
| | - Zhenzhong Gan
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, China, School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou 510631, China
| | - Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, CA 94158, United States
| | - Shawn W Ell
- Department of Psychology, Graduate School of Biomedical Sciences and Engineering, University of Maine, 5742 Little Hall, Room 301, Orono, ME 04469-5742, United States
| | - Casey L Roark
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, United States
| | - Suiping Wang
- Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, China, School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou 510631, China
| | - Patrick C M Wong
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, United States.
| |
Collapse
|
35
|
Lowe MX, Mohsenzadeh Y, Lahner B, Charest I, Oliva A, Teng S. Cochlea to categories: The spatiotemporal dynamics of semantic auditory representations. Cogn Neuropsychol 2021; 38:468-489. [PMID: 35729704 PMCID: PMC10589059 DOI: 10.1080/02643294.2022.2085085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Revised: 03/31/2022] [Accepted: 05/25/2022] [Indexed: 10/17/2022]
Abstract
How does the auditory system categorize natural sounds? Here we apply multimodal neuroimaging to illustrate the progression from acoustic to semantically dominated representations. Combining magnetoencephalographic (MEG) and functional magnetic resonance imaging (fMRI) scans of observers listening to naturalistic sounds, we found superior temporal responses beginning ∼55 ms post-stimulus onset, spreading to extratemporal cortices by ∼100 ms. Early regions were distinguished less by onset/peak latency than by functional properties and overall temporal response profiles. Early acoustically-dominated representations trended systematically toward category dominance over time (after ∼200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: Vocalizations were preferentially distinguished in frontotemporal voice-selective regions and the fusiform; scenes and objects were distinguished in parahippocampal and medial place areas. Our results are consistent with real-world events coded via an extended auditory processing hierarchy, in which acoustic representations rapidly enter multiple streams specialized by category, including areas typically considered visual cortex.
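The MEG-fMRI fusion described above can be sketched as correlating a time-resolved MEG RDM with region-specific fMRI RDMs, yielding one fusion time course per region. The arrays and region names below are synthetic stand-ins, not the study's data or ROI definitions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n_sounds, n_sensors, n_times = 20, 60, 120

# Hypothetical inputs: MEG sensor patterns per sound and time point, and one
# fMRI RDM per region of interest (placeholder region names).
meg = rng.normal(size=(n_sounds, n_sensors, n_times))
fmri_rdms = {"auditory_cortex": pdist(rng.normal(size=(n_sounds, 100))),
             "fusiform": pdist(rng.normal(size=(n_sounds, 100)))}

# Fusion: for each time point, rank-correlate the MEG RDM with each fMRI RDM.
fusion = {roi: np.empty(n_times) for roi in fmri_rdms}
for t in range(n_times):
    meg_rdm = pdist(meg[:, :, t], metric="correlation")
    for roi, rdm in fmri_rdms.items():
        rho, _ = spearmanr(meg_rdm, rdm)
        fusion[roi][t] = rho

print({roi: round(float(ts.max()), 2) for roi, ts in fusion.items()})
```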
Collapse
Affiliation(s)
- Matthew X. Lowe
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Unlimited Sciences, Colorado Springs, CO
| | - Yalda Mohsenzadeh
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- The Brain and Mind Institute, The University of Western Ontario, London, ON, Canada
- Department of Computer Science, The University of Western Ontario, London, ON, Canada
| | - Benjamin Lahner
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
| | - Ian Charest
- Département de Psychologie, Université de Montréal, Montréal, Québec, Canada
- Center for Human Brain Health, University of Birmingham, UK
| | - Aude Oliva
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
| | - Santani Teng
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Smith-Kettlewell Eye Research Institute (SKERI), San Francisco, CA
| |
Collapse
|
36
|
Learning nonnative speech sounds changes local encoding in the adult human cortex. Proc Natl Acad Sci U S A 2021; 118:2101777118. [PMID: 34475209 DOI: 10.1073/pnas.2101777118] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2021] [Accepted: 07/12/2021] [Indexed: 11/18/2022] Open
Abstract
Adults can learn to identify nonnative speech sounds with training, albeit with substantial variability in learning behavior. Increases in behavioral accuracy are associated with increased separability for sound representations in cortical speech areas. However, it remains unclear whether individual auditory neural populations all show the same types of changes with learning, or whether there are heterogeneous encoding patterns. Here, we used high-resolution direct neural recordings to examine local population response patterns, while native English listeners learned to recognize unfamiliar vocal pitch patterns in Mandarin Chinese tones. We found a distributed set of neural populations in bilateral superior temporal gyrus and ventrolateral frontal cortex, where the encoding of Mandarin tones changed throughout training as a function of trial-by-trial accuracy ("learning effect"), including both increases and decreases in the separability of tones. These populations were distinct from populations that showed changes as a function of exposure to the stimuli regardless of trial-by-trial accuracy. These learning effects were driven in part by more variable neural responses to repeated presentations of acoustically identical stimuli. Finally, learning effects could be predicted from speech-evoked activity even before training, suggesting that intrinsic properties of these populations make them amenable to behavior-related changes. Together, these results demonstrate that nonnative speech sound learning involves a wide array of changes in neural representations across a distributed set of brain regions.
Collapse
|
37
|
Di Dona G, Scaltritti M, Sulpizio S. Early differentiation of memory retrieval processes for newly learned voices and phonemes as indexed by the MMN. BRAIN AND LANGUAGE 2021; 220:104981. [PMID: 34166941 DOI: 10.1016/j.bandl.2021.104981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/28/2020] [Revised: 05/31/2021] [Accepted: 06/01/2021] [Indexed: 06/13/2023]
Abstract
Linguistic and vocal information are thought to be differentially processed from the early stages of speech perception, but it remains unclear if this differentiation also concerns automatic processes of memory retrieval. The aim of this ERP study was to compare the automatic retrieval processes for newly learned voices vs phonemes. In a longitudinal experiment, two groups of participants were trained in learning either a new phoneme or a new voice. The MMN elicited by the presentation of these two stimuli was measured before and after training. An enhanced MMN was elicited by the presentation of the learned phoneme, reflecting the activation of an automatic memory retrieval process. Instead, a reduced MMN was elicited by the learned voice, indicating that the voice was perceived as a typical member of the learned voice identity. This suggests that the automatic processes that retrieve linguistic and vocal information are differently affected by experience.
Collapse
Affiliation(s)
- Giuseppe Di Dona
- Dipartimento di Psicologia e Scienze Cognitive, Università degli Studi di Trento, Corso Bettini 84, 38068 Rovereto (TN), Italy.
| | - Michele Scaltritti
- Dipartimento di Psicologia e Scienze Cognitive, Università degli Studi di Trento, Corso Bettini 84, 38068 Rovereto (TN), Italy.
| | - Simone Sulpizio
- Dipartimento di Psicologia, Università degli Studi di Milano-Bicocca, Piazza dell'Ateneo Nuovo 1, 20126 Milano (MI), Italy; Milan Center for Neuroscience (NeuroMi), Università degli Studi di Milano-Bicocca, Piazza dell'Ateneo Nuovo 1, 20126 Milano (MI), Italy.
| |
Collapse
|
38
|
Abstract
Creating invariant representations from an ever-changing speech signal is a major challenge for the human brain. Such an ability is particularly crucial for preverbal infants who must discover the phonological, lexical, and syntactic regularities of an extremely inconsistent signal in order to acquire language. Within the visual domain, an efficient neural solution to overcome variability consists in factorizing the input into a reduced set of orthogonal components. Here, we asked whether a similar decomposition strategy is used in early speech perception. Using a 256-channel electroencephalographic system, we recorded the neural responses of 3-mo-old infants to 120 natural consonant-vowel syllables with varying acoustic and phonetic profiles. Using multivariate pattern analyses, we show that syllables are factorized into distinct and orthogonal neural codes for consonants and vowels. Concerning consonants, we further demonstrate the existence of two stages of processing. A first phase is characterized by orthogonal and context-invariant neural codes for the dimensions of manner and place of articulation. Within the second stage, manner and place codes are integrated to recover the identity of the phoneme. We conclude that, despite the paucity of articulatory motor plans and speech production skills, pre-babbling infants are already equipped with a structured combinatorial code for speech analysis, which might account for the rapid pace of language acquisition during the first year.
Collapse
|
39
|
Khalighinejad B, Patel P, Herrero JL, Bickel S, Mehta AD, Mesgarani N. Functional characterization of human Heschl's gyrus in response to natural speech. Neuroimage 2021; 235:118003. [PMID: 33789135 PMCID: PMC8608271 DOI: 10.1016/j.neuroimage.2021.118003] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2020] [Revised: 03/23/2021] [Accepted: 03/25/2021] [Indexed: 01/11/2023] Open
Abstract
Heschl's gyrus (HG) is a brain area that includes the primary auditory cortex in humans. Due to the limitations in obtaining direct neural measurements from this region during naturalistic speech listening, the functional organization and the role of HG in speech perception remain uncertain. Here, we used intracranial EEG to directly record neural activity in HG in eight neurosurgical patients as they listened to continuous speech stories. We studied the spatial distribution of acoustic tuning and the organization of linguistic feature encoding. We found a main gradient of change from posteromedial to anterolateral parts of HG. We also observed a decrease in frequency and temporal modulation tuning and an increase in phonemic representation, speaker normalization, speech sensitivity, and response latency. We did not observe a difference between the two brain hemispheres. These findings reveal a functional role for HG in processing and transforming simple to complex acoustic features and inform neurophysiological models of speech processing in the human auditory cortex.
Collapse
Affiliation(s)
- Bahar Khalighinejad
- Mortimer B. Zuckerman Brain Behavior Institute, Columbia University, New York, NY, United States; Department of Electrical Engineering, Columbia University, New York, NY, United States
| | - Prachi Patel
- Mortimer B. Zuckerman Brain Behavior Institute, Columbia University, New York, NY, United States; Department of Electrical Engineering, Columbia University, New York, NY, United States
| | - Jose L. Herrero
- Hofstra Northwell School of Medicine, Manhasset, NY, United States; The Feinstein Institutes for Medical Research, Manhasset, NY, United States
| | - Stephan Bickel
- Hofstra Northwell School of Medicine, Manhasset, NY, United States; The Feinstein Institutes for Medical Research, Manhasset, NY, United States
| | - Ashesh D. Mehta
- Hofstra Northwell School of Medicine, Manhasset, NY, United States; The Feinstein Institutes for Medical Research, Manhasset, NY, United States
| | - Nima Mesgarani
- Mortimer B. Zuckerman Brain Behavior Institute, Columbia University, New York, NY, United States; Department of Electrical Engineering, Columbia University, New York, NY, United States; Corresponding author at: Department of Electrical Engineering, Columbia University, New York, NY, United States.
| |
Collapse
|
40
|
Auditory cortical micro-networks show differential connectivity during voice and speech processing in humans. Commun Biol 2021; 4:801. [PMID: 34172824 PMCID: PMC8233416 DOI: 10.1038/s42003-021-02328-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2021] [Accepted: 06/09/2021] [Indexed: 02/05/2023] Open
Abstract
The temporal voice areas (TVAs) in bilateral auditory cortex (AC) appear specialized for voice processing. Previous research assumed a uniform functional profile for the TVAs which are broadly spread along the bilateral AC. Alternatively, the TVAs might comprise separate AC nodes controlling differential neural functions for voice and speech decoding, organized as local micro-circuits. To investigate micro-circuits, we modeled the directional connectivity between TVA nodes during voice processing in humans while acquiring brain activity using neuroimaging. Results show several bilateral AC nodes for general voice decoding (speech and non-speech voices) and for speech decoding in particular. Furthermore, non-hierarchical and differential bilateral AC networks manifest distinct excitatory and inhibitory pathways for voice and speech processing. Finally, while voice and speech processing seem to have distinctive but integrated neural circuits in the left AC, the right AC reveals disintegrated neural circuits for both sounds. Altogether, we demonstrate a functional heterogeneity in the TVAs for voice decoding based on local micro-circuits.
Collapse
|
41
|
Levy DF, Wilson SM. Categorical Encoding of Vowels in Primary Auditory Cortex. Cereb Cortex 2021; 30:618-627. [PMID: 31241149 DOI: 10.1093/cercor/bhz112] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2019] [Revised: 04/05/2019] [Accepted: 05/02/2019] [Indexed: 11/14/2022] Open
Abstract
Speech perception involves mapping from a continuous and variable acoustic speech signal to discrete, linguistically meaningful units. However, it is unclear where in the auditory processing stream speech sound representations cease to be veridical (faithfully encoding precise acoustic properties) and become categorical (encoding sounds as linguistic categories). In this study, we used functional magnetic resonance imaging and multivariate pattern analysis to determine whether tonotopic primary auditory cortex (PAC), defined as tonotopic voxels falling within Heschl's gyrus, represents one class of speech sounds (vowels) veridically or categorically. For each of 15 participants, 4 individualized synthetic vowel stimuli were generated such that the vowels were equidistant in acoustic space, yet straddled a categorical boundary (with the first 2 vowels perceived as [i] and the last 2 perceived as [ɪ]). Each participant's 4 vowels were then presented in a block design with an irrelevant but attention-demanding level change detection task. We found that in PAC bilaterally, neural discrimination between pairs of vowels that crossed the categorical boundary was more accurate than neural discrimination between equivalently spaced vowel pairs that fell within a category. These findings suggest that PAC does not represent vowel sounds veridically, but that encoding of vowels is shaped by linguistically relevant phonemic categories.
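The within-category versus across-boundary comparison can be illustrated with a small decoding sketch: train pairwise classifiers for adjacent vowel pairs and compare their cross-validated accuracies. Trial counts, voxel counts, and patterns below are hypothetical, not the study's data.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(5)

def pair_accuracy(X, y, a, b):
    """Cross-validated accuracy for discriminating vowel a from vowel b."""
    keep = np.isin(y, [a, b])
    clf = LinearSVC(max_iter=10000)
    return cross_val_score(clf, X[keep], y[keep],
                           cv=StratifiedKFold(n_splits=4), scoring="accuracy").mean()

# Hypothetical PAC voxel patterns for 4 acoustically equidistant vowels
# (labels 0-3), with the category boundary falling between vowels 1 and 2.
X = rng.normal(size=(80, 150))
y = np.tile(np.arange(4), 20)

within = np.mean([pair_accuracy(X, y, 0, 1), pair_accuracy(X, y, 2, 3)])
across = pair_accuracy(X, y, 1, 2)
print(f"within-category accuracy {within:.2f} vs across-boundary accuracy {across:.2f}")
```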
Collapse
Affiliation(s)
- Deborah F Levy
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Stephen M Wilson
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| |
Collapse
|
42
|
Baus C, Ruiz-Tada E, Escera C, Costa A. Early detection of language categories in face perception. Sci Rep 2021; 11:9715. [PMID: 33958663 PMCID: PMC8102523 DOI: 10.1038/s41598-021-89007-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2020] [Accepted: 04/09/2021] [Indexed: 02/03/2023] Open
Abstract
Does language categorization influence face identification? The present study addressed this question by means of two experiments. First, to establish language categorization of faces, the memory confusion paradigm was used to create two language categories of faces, Spanish and English. Subsequently, participants underwent an oddball paradigm, in which faces that had been previously paired with one of the two languages (Spanish or English) were presented. We measured EEG perceptual differences (vMMN) between standard and two types of deviant faces: within-language category (faces sharing language with standards) or between-language category (faces paired with the other language). Participants were more likely to confuse faces within the language category than between categories, indicating that faces were categorized by language. At the neural level, an early vMMN was obtained for between-language category faces, but not for within-language category faces. At a later stage, however, larger vMMNs were obtained for faces from the same language category. Our results show that language is a relevant social cue that individuals use to categorize others and that this categorization subsequently affects face perception.
Collapse
Affiliation(s)
- Cristina Baus
- Department of Cognition, Development and Educational Psychology, University of Barcelona, 08035, Barcelona, Spain.
- Center for Brain and Cognition, CBC, Pompeu Fabra University, Barcelona, Spain.
| | | | - Carles Escera
- Brainlab-Cognitive Neuroscience Research Group, Department of Clinical Psychology and Psychobiology, University of Barcelona, Barcelona, Spain
- Institute of Neurosciences, University of Barcelona, Barcelona, Spain
- Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Spain
| | - Albert Costa
- Center for Brain and Cognition, CBC, Pompeu Fabra University, Barcelona, Spain
| |
Collapse
|
43
|
Johnson JF, Belyk M, Schwartze M, Pinheiro AP, Kotz SA. Expectancy changes the self-monitoring of voice identity. Eur J Neurosci 2021; 53:2681-2695. [PMID: 33638190 PMCID: PMC8252045 DOI: 10.1111/ejn.15162] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2020] [Revised: 01/18/2021] [Accepted: 02/20/2021] [Indexed: 12/02/2022]
Abstract
Self‐voice attribution can become difficult when voice characteristics are ambiguous, but functional magnetic resonance imaging (fMRI) investigations of such ambiguity are sparse. We utilized voice‐morphing (self‐other) to manipulate (un‐)certainty in self‐voice attribution in a button‐press paradigm. This allowed investigating how levels of self‐voice certainty alter brain activation in brain regions monitoring voice identity and unexpected changes in voice playback quality. FMRI results confirmed a self‐voice suppression effect in the right anterior superior temporal gyrus (aSTG) when self‐voice attribution was unambiguous. Although the right inferior frontal gyrus (IFG) was more active during a self‐generated compared to a passively heard voice, the putative role of this region in detecting unexpected self‐voice changes during the action was demonstrated only when hearing the voice of another speaker and not when attribution was uncertain. Further research on the link between right aSTG and IFG is required and may establish a threshold monitoring voice identity in action. The current results have implications for a better understanding of the altered experience of self‐voice feedback in auditory verbal hallucinations.
Collapse
Affiliation(s)
- Joseph F Johnson
- Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, the Netherlands
| | - Michel Belyk
- Division of Psychology and Language Sciences, University College London, London, UK
| | - Michael Schwartze
- Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, the Netherlands
| | - Ana P Pinheiro
- Faculdade de Psicologia, Universidade de Lisboa, Lisbon, Portugal
| | - Sonja A Kotz
- Department of Neuropsychology and Psychopharmacology, Maastricht University, Maastricht, the Netherlands.,Department of Neuropsychology, Max Planck Institute for Human and Cognitive Sciences, Leipzig, Germany
| |
Collapse
|
44
|
Wang HS, Hong SK, Han JH, Jung YH, Jeong HK, Im TH, Jeong CK, Lee BY, Kim G, Yoo CD, Lee KJ. Biomimetic and flexible piezoelectric mobile acoustic sensors with multiresonant ultrathin structures for machine learning biometrics. SCIENCE ADVANCES 2021; 7:eabe5683. [PMID: 33579699 PMCID: PMC7880591 DOI: 10.1126/sciadv.abe5683] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Accepted: 12/30/2020] [Indexed: 05/19/2023]
Abstract
Flexible resonant acoustic sensors have attracted substantial attention as an essential component for intuitive human-machine interaction (HMI) in future voice user interfaces (VUIs). Several devices that mimic the basilar membrane have been reported, but they still suffer from dimensional drawbacks because of the difficulty of controlling multiple frequency bands and broadening the resonant spectrum to cover the full range of phonetic frequencies. Here, a highly sensitive piezoelectric mobile acoustic sensor (PMAS) is demonstrated by exploiting an ultrathin membrane for biomimetic frequency-band control. Simulation results show that the resonant bandwidth of a piezoelectric film can be broadened by adopting a lead zirconate titanate (PZT) membrane on the ultrathin polymer to cover the entire voice spectrum. Machine-learning-based biometric authentication is demonstrated with the integrated acoustic sensor module, an algorithm processor, and a customized Android app. Finally, an exceptional reduction in the speaker-identification error rate is achieved by a PMAS module trained on a small amount of data, compared with a conventional microelectromechanical system microphone.
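For illustration only: a minimal Python sketch of machine-learning speaker identification on short acoustic-sensor frames. The feature extraction (log band energies), the SVC classifier, and all signal parameters are assumptions made for the sketch, not the authors' PMAS pipeline or training data.

```python
# Sketch of speaker identification from acoustic frames (simulated signals).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
fs = 8000                      # assumed sampling rate of the sensor output (Hz)
n_speakers, n_clips = 4, 30    # hypothetical enrollment set

def band_energies(signal, n_bands=16):
    """Log energy in equally spaced frequency bands of one clip."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log([b.sum() + 1e-12 for b in bands])

# Synthetic stand-in data: each "speaker" gets a different dominant frequency.
X, y = [], []
for spk in range(n_speakers):
    for _ in range(n_clips):
        t = np.arange(fs) / fs
        f0 = 150 + 200 * spk                      # crude speaker-specific pitch
        clip = np.sin(2 * np.pi * f0 * t) + 0.3 * rng.standard_normal(fs)
        X.append(band_energies(clip))
        y.append(spk)
X, y = np.array(X), np.array(y)

clf = SVC(kernel="rbf", C=1.0)
print("cross-validated identification accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```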
Collapse
Affiliation(s)
- Hee Seung Wang
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Seong Kwang Hong
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Jae Hyun Han
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Young Hoon Jung
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Hyun Kyu Jeong
- School of Computing, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Tae Hong Im
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Chang Kyu Jeong
- Division of Advanced Materials Engineering, Jeonbuk National University, 567 Baekje-daero, Deokjin-gu, Jeonju, Jeonbuk 54896, Republic of Korea
| | - Bo-Yeon Lee
- Department of Nature-Inspired Nano-convergence System, Korea Institute of Machinery and Materials (KIMM), 156 Gajeongbuk-Ro, Yuseong-gu, Daejeon 34103, Republic of Korea
| | - Gwangsu Kim
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Chang D Yoo
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
| | - Keon Jae Lee
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea.
| |
Collapse
|
45
|
Urbschat A, Uppenkamp S, Anemüller J. Searchlight Classification Informative Region Mixture Model (SCIM): Identification of Cortical Regions Showing Discriminable BOLD Patterns in Event-Related Auditory fMRI Data. Front Neurosci 2021; 14:616906. [PMID: 33597841 PMCID: PMC7882477 DOI: 10.3389/fnins.2020.616906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2020] [Accepted: 12/29/2020] [Indexed: 11/13/2022] Open
Abstract
The investigation of abstract cognitive tasks, e.g., semantic processing of speech, requires the simultaneous use of a carefully selected stimulus design and sensitive tools for analyzing the corresponding neural activity that are comparable across different studies investigating similar research questions. Multi-voxel pattern analysis (MVPA) methods are commonly used in neuroimaging to investigate BOLD responses corresponding to neural activation associated with specific cognitive tasks. Regions of significant activation are identified by a thresholding operation during multivariate pattern analysis, and the results are sensitive to the applied threshold value. Investigating analysis approaches that are largely robust to thresholding is thus an important goal pursued here. The present paper contributes a novel statistical analysis method for fMRI experiments, the searchlight classification informative region mixture model (SCIM), based on the assumption that the whole brain volume can be subdivided into two groups of voxels: spatial positions around which the recorded BOLD activity conveys information about the present stimulus condition and those around which it does not. A generative statistical model is proposed that assigns a probability of being informative to each position in the brain, based on a combination of a support vector machine searchlight analysis and Gaussian mixture models. Results from an auditory fMRI study investigating cortical regions engaged in the semantic processing of speech indicate that the SCIM method identifies physiologically plausible brain regions as informative, similar to those from two standard reference methods, with two important differences. SCIM-identified regions are very robust to the choice of the significance threshold, i.e., less “noisy,” in contrast to, e.g., the binomial test, whose results in the present experiment depend strongly on the chosen significance threshold, or random permutation tests, which additionally incur very high computational costs. In group analyses, the SCIM method identifies a physiologically plausible prefrontal region, the anterior cingulate sulcus, as involved in semantic processing, which other methods identify only in single-subject analyses.
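A minimal sketch of the core SCIM idea as summarized above: treat per-voxel searchlight accuracies as draws from a two-component mixture ("informative" vs. "uninformative") and assign each voxel a posterior probability of being informative. The accuracy values below are simulated, and a plain one-dimensional Gaussian mixture stands in for the authors' full generative model.

```python
# Sketch: mixture model over simulated searchlight accuracies.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
chance = 0.5
# Simulated searchlight accuracies: most voxels near chance, a minority above.
acc = np.concatenate([
    rng.normal(chance, 0.03, size=9000),         # uninformative voxels
    rng.normal(chance + 0.12, 0.04, size=1000),  # informative voxels
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(acc)
informative = int(np.argmax(gmm.means_))         # component with higher mean accuracy
p_informative = gmm.predict_proba(acc)[:, informative]

# Unlike a hard accuracy cutoff, the posterior varies smoothly with accuracy,
# which is the robustness-to-thresholding property highlighted in the abstract.
print("voxels with p(informative) > 0.5:", int((p_informative > 0.5).sum()))
```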
Collapse
Affiliation(s)
- Annika Urbschat
- Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
| | - Stefan Uppenkamp
- Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
| | - Jörn Anemüller
- Department of Medical Physics and Acoustics, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
| |
Collapse
|
46
|
Phonetic perception but not perception of speaker gender is impaired in chronic tinnitus. PROGRESS IN BRAIN RESEARCH 2021; 260:397-422. [PMID: 33637229 DOI: 10.1016/bs.pbr.2020.12.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
While tinnitus is known to compromise the perception of speech, it is unclear whether the same holds for extralinguistic speaker information. Furthermore, research with simple tone stimuli has shown that unilateral tinnitus binds spatial attention, thereby impeding the detection of auditory changes in the non-affected ear. Using dichotic listening tasks, we tested left-ear tinnitus patients and control patients for their ability to ignore speech and speaker information in the task-irrelevant ear. To this end, they heard vowel-consonant-vowel (VCV) syllables spoken by gender-ambiguous voices in one ear and, simultaneously, by male or female voices in the contralateral ear. They selectively attended to speech (Exp. 1) or speaker (Exp. 2) information in a designated target ear, classifying either the consonant (/b/ or /g/) in the VCV syllables or the voice gender (male or female) while ignoring distractor voices in the other ear. While performance was comparable across groups in the gender task, tinnitus patients responded more slowly than controls in the consonant task, with no effect of target ear. This suggests that tinnitus hampers phonetic perception in speech while preserving the processing of extralinguistic speaker information. These findings add to the growing evidence for speech perception impairments in tinnitus.
Collapse
|
47
|
FMRI-based identity classification accuracy in left temporal and frontal regions predicts speaker recognition performance. Sci Rep 2021; 11:489. [PMID: 33436825 PMCID: PMC7803954 DOI: 10.1038/s41598-020-79922-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2020] [Accepted: 12/14/2020] [Indexed: 01/29/2023] Open
Abstract
Speaker recognition is characterized by considerable inter-individual variability with poorly understood neural bases. This study aimed (1) to clarify the cerebral correlates of speaker recognition in humans, in particular the involvement of prefrontal areas, using multi-voxel pattern analysis (MVPA) applied to fMRI data from a relatively large group of participants, and (2) to investigate the relationship across participants between fMRI-based classification and the group's variable behavioural performance on the speaker recognition task. A cohort of subjects (N = 40, 28 females), selected to present a wide distribution of voice recognition abilities, underwent an fMRI speaker identification task during which they were asked to recognize three previously learned speakers with finger button presses. The results showed that speaker identity could be significantly decoded from fMRI patterns in voice-sensitive regions, including the bilateral temporal voice areas (TVAs) along the superior temporal sulcus/gyrus, but also in bilateral parietal and left inferior frontal regions. Furthermore, fMRI-based classification accuracy showed a significant correlation with individual behavioural performance in the left anterior STG/STS and left inferior frontal gyrus. These results highlight the role of both temporal and extra-temporal regions in performing a speaker identity recognition task with motor responses.
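An illustrative sketch of the analysis logic described above, using simulated data rather than the study's pipeline: compute a cross-validated ROI decoding accuracy per participant and correlate it with behavioural speaker-recognition performance across participants.

```python
# Sketch: per-participant decoding accuracy vs. behavioural performance.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_subj, n_trials, n_voxels, n_speakers = 40, 60, 50, 3

dec_acc, behaviour = [], []
for s in range(n_subj):
    skill = rng.uniform(0.0, 1.0)                 # latent recognition ability
    y = rng.integers(0, n_speakers, n_trials)     # speaker label per trial
    # Voxel patterns carry more speaker information for "better" participants.
    signal = np.eye(n_speakers)[y] @ rng.standard_normal((n_speakers, n_voxels))
    X = skill * signal + rng.standard_normal((n_trials, n_voxels))
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    dec_acc.append(acc)
    behaviour.append(skill + rng.normal(0, 0.1))  # noisy behavioural score

r, p = pearsonr(dec_acc, behaviour)
print(f"decoding-behaviour correlation: r = {r:.2f}, p = {p:.3g}")
```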
Collapse
|
48
|
Luthra S. The Role of the Right Hemisphere in Processing Phonetic Variability Between Talkers. NEUROBIOLOGY OF LANGUAGE (CAMBRIDGE, MASS.) 2021; 2:138-151. [PMID: 37213418 PMCID: PMC10174361 DOI: 10.1162/nol_a_00028] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/17/2020] [Accepted: 11/13/2020] [Indexed: 05/23/2023]
Abstract
Neurobiological models of speech perception posit that both left and right posterior temporal brain regions are involved in the early auditory analysis of speech sounds. However, frank deficits in speech perception are not readily observed in individuals with right hemisphere damage. Instead, damage to the right hemisphere is often associated with impairments in vocal identity processing. Herein lies an apparent paradox: The mapping between acoustics and speech sound categories can vary substantially across talkers, so why might right hemisphere damage selectively impair vocal identity processing without obvious effects on speech perception? In this review, I attempt to clarify the role of the right hemisphere in speech perception through a careful consideration of its role in processing vocal identity. I review evidence showing that right posterior superior temporal, right anterior superior temporal, and right inferior / middle frontal regions all play distinct roles in vocal identity processing. In considering the implications of these findings for neurobiological accounts of speech perception, I argue that the recruitment of right posterior superior temporal cortex during speech perception may specifically reflect the process of conditioning phonetic identity on talker information. I suggest that the relative lack of involvement of other right hemisphere regions in speech perception may be because speech perception does not necessarily place a high burden on talker processing systems, and I argue that the extant literature hints at potential subclinical impairments in the speech perception abilities of individuals with right hemisphere damage.
Collapse
|
49
|
Feng G, Gan Z, Llanos F, Meng D, Wang S, Wong PCM, Chandrasekaran B. A distributed dynamic brain network mediates linguistic tone representation and categorization. Neuroimage 2021; 224:117410. [PMID: 33011415 PMCID: PMC7749825 DOI: 10.1016/j.neuroimage.2020.117410] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Revised: 08/21/2020] [Accepted: 09/25/2020] [Indexed: 12/21/2022] Open
Abstract
Successful categorization requires listeners to represent the incoming sensory information, resolve the "blooming, buzzing confusion" inherent to noisy sensory signals, and leverage the accumulated evidence towards making a decision. Despite decades of intense debate, the neural systems underlying speech categorization remain unresolved. Here we assessed the neural representation and categorization of lexical tones by native Mandarin speakers (N = 31) across a range of acoustic and contextual variabilities (talkers, perceptual saliences, and stimulus contexts) using functional magnetic resonance imaging (fMRI) and an evidence accumulation model of decision-making. Univariate activation and multivariate pattern analyses reveal that acoustic-variability-tolerant representations of tone category are observed within the middle portion of the left superior temporal gyrus (STG). Activation patterns in frontal and parietal regions also contained category-relevant information that was differentially sensitive to the various forms of variability. The robustness of neural representations of tone category in a distributed fronto-temporo-parietal network is associated with trial-by-trial decision-making parameters. These findings support a hybrid model involving a representational core within the STG that operates dynamically within an extensive frontoparietal network to support the representation and categorization of linguistic pitch patterns.
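A toy sketch of the evidence-accumulation component mentioned above, in the spirit of a drift-diffusion model; the drift, boundary, and noise values are illustrative assumptions, not the parameters fitted in the study.

```python
# Sketch: two-choice evidence accumulation for tone categorization.
import numpy as np

def simulate_trial(drift, boundary=1.0, noise=1.0, dt=0.001, max_t=3.0, rng=None):
    """Accumulate noisy evidence until one of two category boundaries is reached."""
    if rng is None:
        rng = np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < boundary and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    choice = int(x >= 0)          # 1 = category A boundary, 0 = category B
    return choice, t

rng = np.random.default_rng(3)
# A perceptually salient tone (large drift) vs. an ambiguous one (small drift).
for label, drift in [("salient", 2.0), ("ambiguous", 0.3)]:
    trials = [simulate_trial(drift, rng=rng) for _ in range(500)]
    acc = np.mean([c for c, _ in trials])
    rt = np.mean([t for _, t in trials])
    print(f"{label}: P(category A) = {acc:.2f}, mean RT = {rt:.2f} s")
```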
Collapse
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China.
| | - Zhenzhong Gan
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou 510631, China
| | - Fernando Llanos
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States
| | - Danting Meng
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou 510631, China
| | - Suiping Wang
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou 510631, China; Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou 510631, China
| | - Patrick C M Wong
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States.
| |
Collapse
|
50
|
Cortical voice processing is grounded in elementary sound analyses for vocalization relevant sound patterns. Prog Neurobiol 2020; 200:101982. [PMID: 33338555 DOI: 10.1016/j.pneurobio.2020.101982] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Revised: 12/05/2020] [Accepted: 12/11/2020] [Indexed: 01/31/2023]
Abstract
A subregion of the auditory cortex (AC) has been proposed to selectively process voices. The selectivity of this temporal voice area (TVA) and its role in processing non-voice sounds, however, have remained elusive. For a better functional description of the TVA, we investigated its neural responses to voice and non-voice sounds, and critically also to textural sound patterns (TSPs) that share basic features with natural sounds but are perceptually very distant from voices. First, listening to these TSPs elicited activity in large subregions of the TVA that was mainly driven by perceptual ratings of the TSPs along a voice-similarity scale. This similar TVA activity in response to TSPs might partially explain the activation patterns typically observed during voice processing. Second, we reconstructed the TVA activity usually observed during voice processing with a linear combination of activation patterns from TSPs. An analysis of the reconstruction model weights demonstrated that the TVA processes natural voice and non-voice sounds as well as TSPs along similar acoustic and perceptual features. The predominant factor in reconstructing the TVA pattern from TSPs was the perceptual voice-similarity rating. Third, a multi-voxel pattern analysis confirmed that the TSPs contain sufficient sound information to explain TVA activity during voice processing. Altogether, rather than being restricted to higher-order voice processing, the human "voice area" uses mechanisms that evaluate the perceptual and acoustic quality of non-voice sounds and responds to the latter with a "voice-like" processing pattern when it detects some rudimentary perceptual similarity with voices.
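A minimal sketch of the reconstruction idea summarized above: express a voice-evoked activation pattern as a linear (least-squares) combination of TSP-evoked patterns and then inspect the weights. All data below are simulated; the real analysis operates on measured TVA activation maps and perceptual ratings.

```python
# Sketch: least-squares reconstruction of a voice map from TSP maps.
import numpy as np

rng = np.random.default_rng(4)
n_voxels, n_tsps = 2000, 20

tsp_patterns = rng.standard_normal((n_voxels, n_tsps))     # one column per TSP
true_weights = rng.exponential(1.0, n_tsps)                # e.g. larger for voice-like TSPs
voice_pattern = tsp_patterns @ true_weights + 0.5 * rng.standard_normal(n_voxels)

# Ordinary least squares: which combination of TSP maps best explains the voice map?
weights, *_ = np.linalg.lstsq(tsp_patterns, voice_pattern, rcond=None)
reconstruction = tsp_patterns @ weights
r = np.corrcoef(reconstruction, voice_pattern)[0, 1]
print(f"reconstruction fit r = {r:.2f}")
print("largest-weight TSPs:", np.argsort(weights)[-3:][::-1])
```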
Collapse
|