51
Feng G, Yi HG, Chandrasekaran B. The Role of the Human Auditory Corticostriatal Network in Speech Learning. Cereb Cortex 2020; 29:4077-4089. PMID: 30535138; DOI: 10.1093/cercor/bhy289.
Abstract
We establish a mechanistic account of how the mature human brain functionally reorganizes to acquire and represent new speech sounds. Native speakers of English learned to categorize Mandarin lexical tone categories produced by multiple talkers using trial-by-trial feedback. We hypothesized that the corticostriatal system is a key intermediary in mediating temporal lobe plasticity and the acquisition of new speech categories in adulthood. We conducted a functional magnetic resonance imaging experiment in which participants underwent a sound-to-category mapping task. Diffusion tensor imaging data were collected, and probabilistic fiber tracking analysis was employed to assay the auditory corticostriatal pathways. Multivariate pattern analysis showed that talker-invariant novel tone category representations emerged in the left superior temporal gyrus (LSTG) within a few hundred training trials. Univariate analysis showed that the putamen, a subregion of the striatum, was sensitive to positive feedback in correctly categorized trials. With learning, functional coupling between the putamen and LSTG increased during error processing. Furthermore, fiber tractography demonstrated robust structural connectivity between the feedback-sensitive striatal regions and the LSTG regions that represent the newly learned tone categories. Our convergent findings highlight a critical role for the auditory corticostriatal circuitry in mediating the acquisition of new speech categories.
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong SAR, China
- Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR, China
- Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94158, USA
- Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
52
Dash D, Wisler A, Ferrari P, Davenport EM, Maldjian J, Wang J. MEG Sensor Selection for Neural Speech Decoding. IEEE Access 2020; 8:182320-182337. PMID: 33204579; PMCID: PMC7668411; DOI: 10.1109/access.2020.3028831.
Abstract
Direct decoding of speech from the brain is a faster alternative to current electroencephalography (EEG) speller-based brain-computer interfaces (BCI) in providing communication assistance to locked-in patients. Magnetoencephalography (MEG) has recently shown great potential as a non-invasive neuroimaging modality for neural speech decoding, owing in part to its spatial selectivity over other high-temporal-resolution devices. Standard MEG systems have a large number of cryogenically cooled channels/sensors (200-300) encapsulated within a fixed liquid helium dewar, precluding their use as wearable BCI devices. Fortunately, recently developed optically pumped magnetometers (OPM) do not require cryogens and have the potential to be wearable and movable, making them more suitable for BCI applications. This design is also modular, allowing customized montages that include only the sensors necessary for a particular task. As the number of sensors heavily influences the cost, size, and weight of MEG systems, minimizing the number of sensors is critical for designing practical MEG-based BCIs in the future. In this study, we sought to identify an optimal set of MEG channels for decoding imagined and spoken phrases from the MEG signals. Using a forward selection algorithm with a support vector machine classifier, we found that nine optimally located MEG gradiometers provided higher decoding accuracy than using all channels. Additionally, the forward selection algorithm achieved performance similar to dimensionality reduction using a stacked sparse autoencoder. Analysis of the spatial dynamics of speech decoding suggested that both left- and right-hemisphere sensors contribute to speech decoding. Sensors located approximately near Broca's area were commonly among the higher-ranked sensors across all subjects.
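The greedy forward channel-selection loop described above can be sketched as follows; this is an illustrative sketch only, with assumed array names and shapes, a linear-SVM scorer, and a simple stopping rule that are not taken from the paper:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def forward_select_channels(X, y, max_channels=9, cv=5):
        """Greedy forward selection of MEG channels by cross-validated SVM accuracy.
        X: (n_trials, n_channels, n_features_per_channel), y: (n_trials,) phrase labels."""
        n_channels = X.shape[1]
        selected, history = [], []
        while len(selected) < max_channels:
            best_ch, best_score = None, -np.inf
            for ch in range(n_channels):
                if ch in selected:
                    continue
                candidate = selected + [ch]
                X_sub = X[:, candidate, :].reshape(len(X), -1)   # flatten selected channels
                score = cross_val_score(SVC(kernel="linear"), X_sub, y, cv=cv).mean()
                if score > best_score:
                    best_ch, best_score = ch, score
            selected.append(best_ch)
            history.append(best_score)
        return selected, history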
Affiliation(s)
- Debadatta Dash
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Alan Wisler
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
- Paul Ferrari
- MEG Laboratory, Dell Children's Medical Center, Austin, TX 78723, USA
- Department of Psychology, The University of Texas at Austin, Austin, TX 78712, USA
- Joseph Maldjian
- Department of Radiology, University of Texas at Southwestern, Dallas, TX 75390, USA
- Jun Wang
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
53
Luthra S, Correia JM, Kleinschmidt DF, Mesite L, Myers EB. Lexical Information Guides Retuning of Neural Patterns in Perceptual Learning for Speech. J Cogn Neurosci 2020; 32:2001-2012. PMID: 32662731; PMCID: PMC8048099; DOI: 10.1162/jocn_a_01612.
Abstract
A listener's interpretation of a given speech sound can vary probabilistically from moment to moment. Previous experience (i.e., the contexts in which one has encountered an ambiguous sound) can further influence the interpretation of speech, a phenomenon known as perceptual learning for speech. This study used multivoxel pattern analysis to query how neural patterns reflect perceptual learning, leveraging archival fMRI data from a lexically guided perceptual learning study conducted by Myers and Mesite [Myers, E. B., & Mesite, L. M. Neural systems underlying perceptual adjustment to non-standard speech tokens. Journal of Memory and Language, 76, 80-93, 2014]. In that study, participants first heard ambiguous /s/-/ʃ/ blends in either /s/-biased lexical contexts (epi_ode) or /ʃ/-biased contexts (refre_ing); subsequently, they performed a phonetic categorization task on tokens from an /asi/-/aʃi/ continuum. In the current work, a classifier was trained to distinguish between phonetic categorization trials in which participants heard unambiguous productions of /s/ and those in which they heard unambiguous productions of /ʃ/. The classifier was able to generalize this training to ambiguous tokens from the middle of the continuum on the basis of individual participants' trial-by-trial perception. We take these findings as evidence that perceptual learning for speech involves neural recalibration, such that the pattern of activation approximates the perceived category. Exploratory analyses showed that left parietal regions (supramarginal and angular gyri) and right temporal regions (superior, middle, and transverse temporal gyri) were most informative for categorization. Overall, our results inform an understanding of how moment-to-moment variability in speech perception is encoded in the brain.
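The train-on-unambiguous, generalize-to-ambiguous logic described above can be illustrated with a minimal sketch; the array names, shapes, and the logistic-regression classifier are assumptions for illustration, not the authors' pipeline:

    from sklearn.linear_model import LogisticRegression

    # Hypothetical inputs:
    #   clear_X  (n_clear_trials, n_voxels)   patterns for unambiguous /s/ or /ʃ/ tokens
    #   clear_y  (n_clear_trials,)            0 = /s/, 1 = /ʃ/
    #   ambig_X  (n_ambig_trials, n_voxels)   patterns for mid-continuum tokens
    #   reports  (n_ambig_trials,)            listener's trial-by-trial percept (0 or 1)
    def generalize_to_ambiguous(clear_X, clear_y, ambig_X, reports):
        clf = LogisticRegression(max_iter=1000).fit(clear_X, clear_y)
        pred = clf.predict(ambig_X)
        # Agreement between the decoded category and the participant's own percept
        return (pred == reports).mean()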
Affiliation(s)
- João M Correia
- University of Algarve
- Basque Center on Cognition, Brain and Language
- Laura Mesite
- MGH Institute of Health Professions
- Harvard Graduate School of Education
54
Jung YH, Hong SK, Wang HS, Han JH, Pham TX, Park H, Kim J, Kang S, Yoo CD, Lee KJ. Flexible Piezoelectric Acoustic Sensors and Machine Learning for Speech Processing. Adv Mater 2020; 32:e1904020. PMID: 31617274; DOI: 10.1002/adma.201904020.
Abstract
Flexible piezoelectric acoustic sensors have been developed to generate multiple sound signals with high sensitivity, shifting the paradigm of future voice technologies. Speech recognition based on advanced acoustic sensors and optimized machine learning software will serve as an innovative interface for artificial intelligence (AI) services. Collaborative and novel approaches that combine smart sensors with speech algorithms should be pursued to realize a hyperconnected society, which can offer personalized services such as biometric authentication, AI secretaries, and home appliances. Here, representative developments in speech recognition are reviewed in terms of flexible piezoelectric materials, self-powered sensors, machine learning algorithms, and speaker recognition.
Affiliation(s)
- Young Hoon Jung
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Seong Kwang Hong
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Hee Seung Wang
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Jae Hyun Han
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Trung Xuan Pham
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Hyunsin Park
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Junyeong Kim
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Sunghun Kang
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Chang D Yoo
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Keon Jae Lee
- Department of Materials Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
55
Dynamic Time-Locking Mechanism in the Cortical Representation of Spoken Words. eNeuro 2020; 7:ENEURO.0475-19.2020. PMID: 32513662; PMCID: PMC7470935; DOI: 10.1523/eneuro.0475-19.2020.
Abstract
Human speech has a unique capacity to carry and communicate rich meanings. However, it is not known how the highly dynamic and variable perceptual signal is mapped to existing linguistic and semantic representations. Here we took a novel approach, using the natural acoustic variability of sounds and mapping it to magnetoencephalography (MEG) data with physiologically inspired machine-learning models. We aimed to determine how well the models, differing in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers. We discovered that dynamic time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features of speech. In contrast, time-locking was not highlighted in cortical processing of non-speech environmental sounds that conveyed the same meanings as the spoken words, including human-made sounds with temporal modulation content similar to speech. The amplitude envelope of the spoken words was particularly well reconstructed based on cortical evoked responses. Our results indicate that speech is encoded cortically with especially high temporal fidelity. This speech tracking by evoked responses may partly reflect the same underlying neural mechanism as the frequently reported entrainment of cortical oscillations to the amplitude envelope of speech. Furthermore, the phoneme content was reflected in cortical evoked responses simultaneously with the spectrotemporal features, pointing to an instantaneous transformation of the unfolding acoustic features into linguistic representations during speech processing.
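One simple way to implement the kind of stimulus reconstruction reported above (recovering the speech amplitude envelope from evoked responses) is a time-lagged linear "backward" model; the ridge regression, lag range, and array shapes below are assumptions for illustration, not the authors' physiologically inspired models:

    import numpy as np
    from sklearn.linear_model import Ridge

    def lagged_design(meg, max_lag):
        """Stack time-lagged copies of the MEG channels (n_times, n_channels)
        into a design matrix for a linear reconstruction model."""
        n_times, n_ch = meg.shape
        X = np.zeros((n_times, n_ch * (max_lag + 1)))
        for lag in range(max_lag + 1):
            X[lag:, lag * n_ch:(lag + 1) * n_ch] = meg[:n_times - lag]
        return X

    def fit_envelope_decoder(meg, envelope, max_lag=20, alpha=1.0):
        """Fit a ridge 'backward' model that predicts the speech amplitude
        envelope (n_times,) from lagged MEG responses."""
        return Ridge(alpha=alpha).fit(lagged_design(meg, max_lag), envelope)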
56
Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. PMID: 32727820; PMCID: PMC7470920; DOI: 10.1523/jneurosci.0279-20.2020.
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.
SIGNIFICANCE STATEMENT: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
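The deconvolution idea (exploiting the jittered audiovisual onset asynchrony so that overlapping unisensory responses can be separated) can be illustrated with an ordinary least-squares finite-impulse-response model; the function below is a generic sketch with assumed inputs, not the authors' code:

    import numpy as np

    def fir_deconvolve(signal, onsets_by_condition, kernel_len):
        """Estimate a response kernel for each event type (e.g., auditory onset,
        visual onset) by least squares; identifiable because the onset asynchrony
        is jittered across trials.
        signal: (n_times,) iEEG broadband power
        onsets_by_condition: dict mapping condition name -> array of onset samples"""
        n_times = len(signal)
        names = list(onsets_by_condition)
        X = np.zeros((n_times, len(names) * kernel_len))
        for c, name in enumerate(names):
            for onset in onsets_by_condition[name]:
                for lag in range(kernel_len):
                    t = onset + lag
                    if t < n_times:
                        X[t, c * kernel_len + lag] = 1.0
        beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
        return {name: beta[c * kernel_len:(c + 1) * kernel_len]
                for c, name in enumerate(names)}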
57
Ullas S, Hausfeld L, Cutler A, Eisner F, Formisano E. Neural Correlates of Phonetic Adaptation as Induced by Lexical and Audiovisual Context. J Cogn Neurosci 2020; 32:2145-2158. PMID: 32662723; DOI: 10.1162/jocn_a_01608.
Abstract
When speech perception is difficult, one way listeners adjust is by reconfiguring phoneme category boundaries, drawing on contextual information. Both lexical knowledge and lipreading cues are used in this way, but it remains unknown whether these two differing forms of perceptual learning are similar at a neural level. This study compared phoneme boundary adjustments driven by lexical or audiovisual cues, using ultra-high-field 7-T fMRI. During imaging, participants heard exposure stimuli and test stimuli. Exposure stimuli for lexical retuning were audio recordings of words, and those for audiovisual recalibration were audio-video recordings of lip movements during utterances of pseudowords. Test stimuli were ambiguous phonetic strings presented without context, and listeners reported what phoneme they heard. Reports reflected phoneme biases in preceding exposure blocks (e.g., more reported /p/ after /p/-biased exposure). Analysis of corresponding brain responses indicated that both forms of cue use were associated with a network of activity across the temporal cortex, plus parietal, insula, and motor areas. Audiovisual recalibration also elicited significant occipital cortex activity despite the lack of visual stimuli. Activity levels in several ROIs also covaried with strength of audiovisual recalibration, with greater activity accompanying larger recalibration shifts. Similar activation patterns appeared for lexical retuning, but here, no significant ROIs were identified. Audiovisual and lexical forms of perceptual learning thus induce largely similar brain response patterns. However, audiovisual recalibration involves additional visual cortex contributions, suggesting that previously acquired visual information (on lip movements) is retrieved and deployed to disambiguate auditory perception.
Affiliation(s)
- Shruti Ullas
- Maastricht University
- Maastricht Brain Imaging Centre
- Lars Hausfeld
- Maastricht University
- Maastricht Brain Imaging Centre
- Elia Formisano
- Maastricht University
- Maastricht Brain Imaging Centre
- Maastricht Centre for Systems Biology
58
Dash D, Ferrari P, Heitzman D, Wang J. Decoding Speech from Single Trial MEG Signals Using Convolutional Neural Networks and Transfer Learning. Annu Int Conf IEEE Eng Med Biol Soc 2019:5531-5535. PMID: 31947107; DOI: 10.1109/embc.2019.8857874.
Abstract
Decoding speech directly from the brain has the potential to enable the next generation of more efficient brain-computer interfaces (BCIs) to assist in the communication of patients with locked-in syndrome (fully paralyzed but aware). In this study, we explored spectral and temporal features of magnetoencephalography (MEG) signals and used those features to train convolutional neural networks (CNNs) to classify neural signals corresponding to phrases. Experimental results demonstrated the effectiveness of CNNs in decoding speech during perception, imagination, and production tasks. Furthermore, to overcome the long training times of CNNs, we leveraged principal component analysis (PCA) for spatial dimension reduction of the MEG data and transfer learning for model initialization. Both PCA and transfer learning were found to be highly beneficial for faster model training. The best configuration (50 principal coefficients + transfer learning) led to more than 10 times faster training than the original setting, while speech decoding accuracy remained similarly high.
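A minimal sketch of the two speed-ups mentioned above: PCA over the sensor dimension and warm-starting the network from weights trained on another subject. The PyTorch model, component count, and file name are illustrative assumptions, not the authors' architecture:

    import torch
    import torch.nn as nn
    from sklearn.decomposition import PCA

    def reduce_channels(trials, n_components=50):
        """PCA over the channel dimension: (n_trials, n_channels, n_times)
        -> (n_trials, n_components, n_times)."""
        n_trials, n_ch, n_times = trials.shape
        flat = trials.transpose(0, 2, 1).reshape(-1, n_ch)   # time points x channels
        reduced = PCA(n_components=n_components).fit_transform(flat)
        return reduced.reshape(n_trials, n_times, n_components).transpose(0, 2, 1)

    class PhraseCNN(nn.Module):
        def __init__(self, n_components=50, n_classes=5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(n_components, 32, kernel_size=7), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(32, n_classes))

        def forward(self, x):               # x: (batch, n_components, n_times)
            return self.net(x)

    model = PhraseCNN()
    # Transfer learning: initialize from weights saved after training on another
    # subject, then fine-tune on the current subject's data (hypothetical file name).
    # model.load_state_dict(torch.load("pretrained_other_subject.pt"))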
59
NeuroVAD: Real-Time Voice Activity Detection from Non-Invasive Neuromagnetic Signals. Sensors 2020; 20:2248. PMID: 32316162; PMCID: PMC7218843; DOI: 10.3390/s20082248.
Abstract
Neural speech decoding-driven brain-computer interface (BCI), or speech-BCI, is a novel paradigm for exploring communication restoration for locked-in (fully paralyzed but aware) patients. Speech-BCIs aim to transform neural signals directly into text or speech, which has the potential for a higher communication rate than current BCIs. Although recent progress has demonstrated the potential of speech-BCIs from either invasive or non-invasive neural signals, the majority of the systems developed so far still assume that the onset and offset of the speech utterances within the continuous neural recordings are known. This lack of real-time voice/speech activity detection (VAD) is a current obstacle for future applications of neural speech decoding in which BCI users can have a continuous conversation with other speakers. To address this issue, in this study, we attempted to automatically detect voice/speech activity directly from neural signals recorded using magnetoencephalography (MEG). First, we classified whole segments of pre-speech, speech, and post-speech in the neural signals using a support vector machine (SVM). Second, for continuous prediction, we used a long short-term memory recurrent neural network (LSTM-RNN) to efficiently decode the voice activity at each time point via its sequential pattern-learning mechanism. Experimental results demonstrated the possibility of real-time VAD directly from non-invasive neural signals with about 88% accuracy.
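A minimal sketch of a per-time-point LSTM voice-activity detector of the kind described above (PyTorch; the layer sizes and input conventions are assumptions, not the authors' exact architecture):

    import torch
    import torch.nn as nn

    class NeuroVAD(nn.Module):
        """Predicts a speech/no-speech probability at every time point of an MEG segment."""
        def __init__(self, n_channels, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):               # x: (batch, time, channels)
            out, _ = self.lstm(x)
            return torch.sigmoid(self.head(out)).squeeze(-1)   # (batch, time)

    # Training would compare the per-sample output against binary speech labels, e.g.:
    # loss = nn.functional.binary_cross_entropy(model(x), labels.float())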
60
Dash D, Ferrari P, Wang J. Decoding Imagined and Spoken Phrases From Non-invasive Neural (MEG) Signals. Front Neurosci 2020; 14:290. PMID: 32317917; PMCID: PMC7154084; DOI: 10.3389/fnins.2020.00290.
Abstract
Speech production is a hierarchical mechanism involving the synchronization of the brain and the oral articulators, whereby linguistic intentions are transformed into meaningful sounds. Individuals with locked-in syndrome (fully paralyzed but aware) lose their motor ability completely, including articulation and even eye movement. The neural pathway may be the only option for resuming a certain level of communication for these patients. Current brain-computer interfaces (BCIs) use patients' visual and attentional correlates to build communication, resulting in a slow communication rate (a few words per minute). Direct decoding of imagined speech from the neural signals (and then driving a speech synthesizer) has the potential for a higher communication rate. In this study, we investigated the decoding of five imagined and spoken phrases from single-trial, non-invasive magnetoencephalography (MEG) signals collected from eight adult subjects. Two machine learning algorithms were used. One was an artificial neural network (ANN) with statistical features as the baseline approach. The other was convolutional neural networks (CNNs) applied to the spatial, spectral, and temporal features extracted from the MEG signals. Experimental results indicated the possibility of decoding imagined and spoken phrases directly from neuromagnetic signals. CNNs were found to be highly effective, with an average decoding accuracy of up to 93% for the imagined and 96% for the spoken phrases.
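The "statistical features" baseline mentioned above can be illustrated with a small per-channel feature extractor; the particular statistics used here are an assumption for illustration, not the authors' exact feature set:

    import numpy as np
    from scipy.stats import skew, kurtosis

    def statistical_features(trial):
        """Summary statistics per MEG channel for one trial (n_channels x n_times),
        concatenated into a single feature vector for an ANN classifier."""
        return np.concatenate([trial.mean(axis=1), trial.std(axis=1),
                               skew(trial, axis=1), kurtosis(trial, axis=1)])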
Affiliation(s)
- Debadatta Dash
- Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, United States
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX, United States
- Paul Ferrari
- MEG Lab, Dell Children's Medical Center, Austin, TX, United States
- Department of Psychology, University of Texas at Austin, Austin, TX, United States
- Jun Wang
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX, United States
- Department of Communication Sciences and Disorders, University of Texas at Austin, Austin, TX, United States
61
Al-Wasity S, Vogt S, Vuckovic A, Pollick FE. Hyperalignment of motor cortical areas based on motor imagery during action observation. Sci Rep 2020; 10:5362. PMID: 32210277; PMCID: PMC7093515; DOI: 10.1038/s41598-020-62071-2.
Abstract
Multivariate Pattern Analysis (MVPA) has grown in importance due to its capacity to use both coarse- and fine-scale patterns of brain activity. However, a major limitation of multivariate analysis is the difficulty of aligning features across brains, which makes MVPA a subject-specific analysis. Recent work by Haxby et al. (2011) introduced a method called Hyperalignment that explored neural activity in ventral temporal cortex during object recognition and demonstrated the ability to align individual patterns of brain activity into a common high-dimensional space to facilitate Between Subject Classification (BSC). Here we examined BSC based on Hyperalignment of motor cortex during a task of motor imagery of three natural actions (lift, knock and throw). To achieve this, we collected brain activity during the combined tasks of action observation and motor imagery for a parametric action space containing 25 stick-figure blends of the three natural actions. From these responses we derived Hyperalignment transformation parameters that were used to map subjects' representational spaces of the motor imagery task in the motor cortex into a common model representational space. Results showed that BSC of the neural response patterns based on Hyperalignment exceeded both BSC based on anatomical alignment and a standard Within Subject Classification (WSC) approach. We also found that results were sensitive to the order in which participants entered the Hyperalignment algorithm. These results demonstrate the effectiveness of Hyperalignment for aligning neural responses across subjects in motor cortex to enable BSC of motor imagery.
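The core step of Hyperalignment is an orthogonal (Procrustes) transform estimated per subject; a minimal sketch of that single step is below. Matrix shapes are assumed, and the full algorithm iterates this alignment against a running common model space rather than a single pass:

    import numpy as np

    def procrustes_align(source, target):
        """Orthogonal Procrustes: find rotation R such that source @ R best
        approximates target (both are n_samples x n_voxels response matrices)."""
        u, _, vt = np.linalg.svd(source.T @ target)
        return u @ vt

    # aligned = source @ procrustes_align(source, target) maps one subject's
    # responses into the reference (common model) space used for
    # between-subject classification.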
Affiliation(s)
- Salim Al-Wasity
- School of Psychology, University of Glasgow, Glasgow, G12 8QB, UK
- School of Engineering, University of Glasgow, Glasgow, G12 8QB, UK
- College of Engineering, University of Wasit, Wasit, Iraq
- Stefan Vogt
- Department of Psychology, Lancaster University, Lancaster, LA1 4YF, UK
- Frank E Pollick
- School of Psychology, University of Glasgow, Glasgow, G12 8QB, UK
62
Correia JM, Caballero-Gaudes C, Guediche S, Carreiras M. Phonatory and articulatory representations of speech production in cortical and subcortical fMRI responses. Sci Rep 2020; 10:4529. PMID: 32161310; PMCID: PMC7066132; DOI: 10.1038/s41598-020-61435-y.
Abstract
Speaking involves coordination of multiple neuromotor systems, including respiration, phonation and articulation. Developing non-invasive imaging methods to study how the brain controls these systems is critical for understanding the neurobiology of speech production. Recent models and animal research suggest that regions beyond the primary motor cortex (M1) help orchestrate the neuromotor control needed for speaking, including cortical and sub-cortical regions. Using contrasts between speech conditions with controlled respiratory behavior, this fMRI study investigates articulatory gestures involving the tongue, lips and velum (i.e., alveolars versus bilabials, and nasals versus orals), and phonatory gestures (i.e., voiced versus whispered speech). Multivariate pattern analysis (MVPA) was used to decode articulatory gestures in M1, cerebellum and basal ganglia. Furthermore, apart from confirming the role of a mid-M1 region for phonation, we found that a dorsal M1 region, linked to respiratory control, showed significant differences for voiced compared to whispered speech despite matched lung volume observations. This region was also functionally connected to tongue and lip M1 seed regions, underscoring its importance in the coordination of speech. Our study confirms and extends current knowledge regarding the neural mechanisms underlying neuromotor speech control and holds promise for non-invasively studying the neural dysfunctions involved in motor-speech disorders.
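The seed-based functional connectivity analysis mentioned above reduces to correlating a seed region's time course with every voxel's time course; a minimal sketch with assumed array shapes:

    import numpy as np

    def seed_connectivity(seed_ts, voxel_ts):
        """Pearson correlation between a seed time course (n_timepoints,) and
        each voxel's time course (n_timepoints x n_voxels)."""
        seed = (seed_ts - seed_ts.mean()) / seed_ts.std()
        vox = (voxel_ts - voxel_ts.mean(axis=0)) / voxel_ts.std(axis=0)
        return seed @ vox / len(seed)       # (n_voxels,) correlation map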
Affiliation(s)
- Joao M Correia
- BCBL, Basque Center on Cognition Brain and Language, San Sebastian, Spain
- Centre for Biomedical Research (CBMR)/Department of Psychology, University of Algarve, Faro, Portugal
- Sara Guediche
- BCBL, Basque Center on Cognition Brain and Language, San Sebastian, Spain
- Manuel Carreiras
- BCBL, Basque Center on Cognition Brain and Language, San Sebastian, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- University of the Basque Country, UPV/EHU, Bilbao, Spain
63
Repetition enhancement to voice identities in the dog brain. Sci Rep 2020; 10:3989. PMID: 32132562; PMCID: PMC7055288; DOI: 10.1038/s41598-020-60395-7.
Abstract
In the human speech signal, cues of speech sounds and voice identities are conflated, but they are processed separately in the human brain. The processing of speech sounds and voice identities is typically performed by non-primary auditory regions in humans and non-human primates. Additionally, these processes exhibit functional asymmetry in humans, indicating the involvement of distinct mechanisms. Behavioural studies indicate analogous side biases in dogs, but neural evidence for this functional dissociation is missing. In two experiments, using an fMRI adaptation paradigm, we presented awake dogs with natural human speech that either varied in segmental (change in speech sound) or suprasegmental (change in voice identity) content. In auditory regions, we found a repetition enhancement effect for voice identity processing in a secondary auditory region, the caudal ectosylvian gyrus. The same region did not show repetition effects for speech sounds, nor did the primary auditory cortex exhibit sensitivity to changes in either the segmental or the suprasegmental content. Furthermore, we did not find evidence for functional asymmetry in the processing of either speech sounds or voice identities. Our results in dogs corroborate former human and non-human primate evidence on the role of secondary auditory regions in the processing of suprasegmental cues, suggesting similar neural sensitivity to the identity of the vocalizer across the mammalian order.
64
Mercure E, Evans S, Pirazzoli L, Goldberg L, Bowden-Howl H, Coulson-Thaker K, Beedie I, Lloyd-Fox S, Johnson MH, MacSweeney M. Language Experience Impacts Brain Activation for Spoken and Signed Language in Infancy: Insights From Unimodal and Bimodal Bilinguals. Neurobiol Lang 2020; 1:9-32. PMID: 32274469; PMCID: PMC7145445; DOI: 10.1162/nol_a_00001.
Abstract
Recent neuroimaging studies suggest that monolingual infants activate a left-lateralized frontotemporal brain network in response to spoken language, which is similar to the network involved in processing spoken and signed language in adulthood. However, it is unclear how brain activation to language is influenced by early experience in infancy. To address this question, we present functional near-infrared spectroscopy (fNIRS) data from 60 hearing infants (4 to 8 months of age): 19 monolingual infants exposed to English, 20 unimodal bilingual infants exposed to two spoken languages, and 21 bimodal bilingual infants exposed to English and British Sign Language (BSL). Across all infants, spoken language elicited activation in a bilateral brain network including the inferior frontal and posterior temporal areas, whereas sign language elicited activation in the right temporoparietal area. A significant difference in brain lateralization was observed between groups. Activation in the posterior temporal region was not lateralized in monolinguals and bimodal bilinguals, but right lateralized in response to both language modalities in unimodal bilinguals. This suggests that the experience of two spoken languages influences brain activation for sign language when experienced for the first time. Multivariate pattern analyses (MVPAs) could classify distributed patterns of activation within the left hemisphere for spoken and signed language in monolinguals (proportion correct = 0.68; p = 0.039) but not in unimodal or bimodal bilinguals. These results suggest that bilingual experience in infancy influences brain activation for language and that unimodal bilingual experience has greater impact on early brain lateralization than bimodal bilingual experience.
Affiliation(s)
- Samuel Evans
- University College London, London, UK
- University of Westminster, London, UK
- Laura Pirazzoli
- Birkbeck - University of London, London, UK
- Boston Children's Hospital, Boston, Massachusetts, US
- Harriet Bowden-Howl
- University College London, London, UK
- University of Plymouth, Plymouth, Devon, UK
- Sarah Lloyd-Fox
- Birkbeck - University of London, London, UK
- University of Cambridge, Cambridge, Cambridgeshire, UK
- Mark H. Johnson
- Birkbeck - University of London, London, UK
- University of Cambridge, Cambridge, Cambridgeshire, UK
65
Dash D, Ferrari P, Malik S, Montillo A, Maldjian JA, Wang J. Determining the Optimal Number of MEG Trials: A Machine Learning and Speech Decoding Perspective. Brain Informatics: International Conference (BI 2018), Arlington, TX, USA, December 7-9, 2018; 11309:163-172. PMID: 31768504; PMCID: PMC6876632; DOI: 10.1007/978-3-030-05587-5_16.
Abstract
Advancing knowledge about neural speech mechanisms is critical for developing next-generation, faster brain-computer interfaces to assist in speech communication for patients with severe neurological conditions (e.g., locked-in syndrome). Among current neuroimaging techniques, magnetoencephalography (MEG) provides a direct representation of the large-scale neural dynamics of underlying cognitive processes owing to its optimal spatiotemporal resolution. However, MEG-measured neural signals are small in magnitude compared with the background noise, and hence MEG usually suffers from a low signal-to-noise ratio (SNR) at the single-trial level. To overcome this limitation, it is common to record many trials of the same event-task and use the time-locked average signal for analysis, which can be very time-consuming. In this study, we investigated the effect of the number of MEG recording trials required for speech decoding using a machine learning algorithm. We used a wavelet filter to generate denoised neural features to train an artificial neural network (ANN) for speech decoding. We found that wavelet-based denoising increased the SNR of the neural signal prior to analysis and facilitated accurate speech decoding performance using as few as 40 single trials. This study may open up the possibility of limiting the number of MEG trials for other task-evoked studies as well.
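A common wavelet soft-thresholding recipe of the kind referred to above; the wavelet family, decomposition level, and universal threshold are illustrative choices and not necessarily the authors' exact filter settings:

    import numpy as np
    import pywt

    def wavelet_denoise(signal, wavelet="db4", level=4):
        """Soft-threshold wavelet denoising of a single-channel MEG trace."""
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        # Noise level estimated from the finest detail coefficients (universal threshold)
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        thresh = sigma * np.sqrt(2 * np.log(len(signal)))
        coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
        return pywt.waverec(coeffs, wavelet)[:len(signal)]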
Affiliation(s)
- Debadatta Dash
- Department of Bioengineering, University of Texas at Dallas, Richardson, USA
- Paul Ferrari
- Department of Psychology, University of Texas at Austin, Austin, USA
- MEG Laboratory, Dell Children's Medical Center, Austin, USA
- Saleem Malik
- MEG Lab, Cook Children's Hospital, Fort Worth, TX, USA
- Albert Montillo
- Department of Radiology, UT Southwestern Medical Center, Dallas, USA
- Department of Bioinformatics, UT Southwestern Medical Center, Dallas, USA
- Joseph A Maldjian
- Department of Radiology, UT Southwestern Medical Center, Dallas, USA
- Jun Wang
- Department of Bioengineering, University of Texas at Dallas, Richardson, USA
- Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, USA
66
Bodin C, Belin P. Exploring the cerebral substrate of voice perception in primate brains. Philos Trans R Soc Lond B Biol Sci 2019; 375:20180386. PMID: 31735143; PMCID: PMC6895549; DOI: 10.1098/rstb.2018.0386.
Abstract
One can consider human language to be the Swiss army knife of the vast domain of animal communication. There is now growing evidence suggesting that this technology may have emerged from already operational material instead of being a sudden innovation. Sharing ideas and thoughts with conspecifics via language constitutes an amazing ability, but what value would it hold if our conspecifics were not first detected and recognized? Conspecific voice (CV) perception is fundamental to communication and widely shared across the animal kingdom. Two questions that arise then are: is this apparently shared ability reflected in common cerebral substrate? And, how has this substrate evolved? The paper addresses these questions by examining studies on the cerebral basis of CV perception in humans' closest relatives, non-human primates. Neuroimaging studies, in particular, suggest the existence of a ‘voice patch system’, a network of interconnected cortical areas that can provide a common template for the cerebral processing of CV in primates. This article is part of the theme issue ‘What can animal communication teach us about human language?’
Affiliation(s)
- Clémentine Bodin
- Institut de Neurosciences de la Timone, UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France
- Pascal Belin
- Institut de Neurosciences de la Timone, UMR 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Marseille, France
- Département de Psychologie, Université de Montréal, Montréal, Canada
67
Feng G, Gan Z, Wang S, Wong PCM, Chandrasekaran B. Task-General and Acoustic-Invariant Neural Representation of Speech Categories in the Human Brain. Cereb Cortex 2019; 28:3241-3254. PMID: 28968658; DOI: 10.1093/cercor/bhx195.
Abstract
A significant neural challenge in speech perception includes extracting discrete phonetic categories from continuous and multidimensional signals despite varying task demands and surface-acoustic variability. While neural representations of speech categories have been previously identified in frontal and posterior temporal-parietal regions, the task dependency and dimensional specificity of these neural representations are still unclear. Here, we asked native Mandarin participants to listen to speech syllables carrying 4 distinct lexical tone categories across passive listening, repetition, and categorization tasks while they underwent functional magnetic resonance imaging (fMRI). We used searchlight classification and representational similarity analysis (RSA) to identify the dimensional structure underlying neural representation across tasks and surface-acoustic properties. Searchlight classification analyses revealed significant "cross-task" lexical tone decoding within the bilateral superior temporal gyrus (STG) and left inferior parietal lobule (LIPL). RSA revealed that the LIPL and LSTG, in contrast to the RSTG, relate to 2 critical dimensions (pitch height, pitch direction) underlying tone perception. Outside this core representational network, we found greater activation in the inferior frontal and parietal regions for stimuli that are more perceptually similar during tone categorization. Our findings reveal the specific characteristics of fronto-temporo-parietal regions that support speech representation and categorization processing.
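The representational similarity analysis used above boils down to correlating a neural representational dissimilarity matrix (RDM) with a model RDM (e.g., one built from pitch height or pitch direction); a minimal sketch, with the distance and correlation metrics chosen here as assumptions:

    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    def rsa_correlation(patterns, model_rdm_condensed):
        """Correlate the neural RDM computed from condition patterns
        (n_conditions x n_voxels) with a model RDM given in condensed form."""
        neural_rdm = pdist(patterns, metric="correlation")   # 1 - Pearson r per pair
        rho, p = spearmanr(neural_rdm, model_rdm_condensed)
        return rho, p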
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, 2504A Whitis Avenue (A1100), Austin, TX, USA
- Zhenzhong Gan
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou, China
- Suiping Wang
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Patrick C M Wong
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Bharath Chandrasekaran
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, 2504A Whitis Avenue (A1100), Austin, TX, USA
- Department of Psychology, The University of Texas at Austin, 108 E. Dean Keeton Stop, Austin, TX, USA
- Department of Linguistics, The University of Texas at Austin, 305 E. 23rd Street STOP, Austin, TX, USA
- Institute for Mental Health Research, College of Liberal Arts, The University of Texas at Austin, 305 E. 23rd St. Stop, Austin, TX, USA
- The Institute for Neuroscience, The University of Texas at Austin, 1 University Station Stop, Austin, TX, USA
68
Faces and voices in the brain: A modality-general person-identity representation in superior temporal sulcus. Neuroimage 2019; 201:116004. DOI: 10.1016/j.neuroimage.2019.07.017.
69
Hajj N, Rizk Y, Awad M. A subjectivity classification framework for sports articles using improved cortical algorithms. Neural Comput Appl 2019. DOI: 10.1007/s00521-018-3549-3.
70
The Jena Speaker Set (JESS) - A database of voice stimuli from unfamiliar young and old adult speakers. Behav Res Methods 2019; 52:990-1007. PMID: 31637667; DOI: 10.3758/s13428-019-01296-0.
Abstract
Here we describe the Jena Speaker Set (JESS), a free database for unfamiliar adult voice stimuli, comprising voices from 61 young (18-25 years) and 59 old (60-81 years) female and male speakers uttering various sentences, syllables, read text, semi-spontaneous speech, and vowels. Listeners rated two voice samples (short sentences) per speaker for attractiveness, likeability, two measures of distinctiveness ("deviation"-based [DEV] and "voice in the crowd"-based [VITC]), regional accent, and age. Interrater reliability was high, with Cronbach's α between .82 and .99. Young voices were generally rated as more attractive than old voices, but particularly so when male listeners judged female voices. Moreover, young female voices were rated as more likeable than both young male and old female voices. Young voices were judged to be less distinctive than old voices according to the DEV measure, with no differences in the VITC measure. In age ratings, listeners almost perfectly discriminated young from old voices; additionally, young female voices were perceived as being younger than young male voices. Correlations between the rating dimensions above demonstrated (among other things) that DEV-based distinctiveness was strongly negatively correlated with rated attractiveness and likeability. By contrast, VITC-based distinctiveness was uncorrelated with rated attractiveness and likeability in young voices, although a moderate negative correlation was observed for old voices. Overall, the present results demonstrate systematic effects of vocal age and gender on impressions based on the voice and inform as to the selection of suitable voice stimuli for further research into voice perception, learning, and memory.
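The interrater reliability reported above (Cronbach's α) can be computed in a few lines; the stimuli-by-raters input layout below is an assumption for illustration:

    import numpy as np

    def cronbach_alpha(ratings):
        """Cronbach's alpha for a (n_stimuli x n_raters) matrix of ratings,
        treating raters as 'items'."""
        ratings = np.asarray(ratings, dtype=float)
        k = ratings.shape[1]
        rater_vars = ratings.var(axis=0, ddof=1)
        total_var = ratings.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - rater_vars.sum() / total_var)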
71
Ogg M, Carlson TA, Slevc LR. The Rapid Emergence of Auditory Object Representations in Cortex Reflect Central Acoustic Attributes. J Cogn Neurosci 2019; 32:111-123. PMID: 31560265; DOI: 10.1162/jocn_a_01472.
Abstract
Human listeners are bombarded by acoustic information that the brain rapidly organizes into coherent percepts of objects and events in the environment, which aids speech and music perception. The efficiency of auditory object recognition belies the critical constraint that acoustic stimuli necessarily require time to unfold. Using magnetoencephalography, we studied the time course of the neural processes that transform dynamic acoustic information into auditory object representations. Participants listened to a diverse set of 36 tokens comprising everyday sounds from a typical human environment. Multivariate pattern analysis was used to decode the sound tokens from the magnetoencephalographic recordings. We show that sound tokens can be decoded from brain activity beginning 90 msec after stimulus onset with peak decoding performance occurring at 155 msec poststimulus onset. Decoding performance was primarily driven by differences between category representations (e.g., environmental vs. instrument sounds), although within-category decoding was better than chance. Representational similarity analysis revealed that these emerging neural representations were related to harmonic and spectrotemporal differences among the stimuli, which correspond to canonical acoustic features processed by the auditory pathway. Our findings begin to link the processing of physical sound properties with the perception of auditory objects and events in cortex.
72
Abstract
How do we learn what we know about others? Answering this question requires understanding the perceptual mechanisms with which we recognize individuals and their actions, and the processes by which the resulting perceptual representations lead to inferences about people's mental states and traits. This review discusses recent behavioral, neural, and computational studies that have contributed to this broad research program, encompassing both social perception and social cognition.
Affiliation(s)
- Stefano Anzellotti
- Department of Psychology, Boston College, Boston, Massachusetts 02467, USA
- Liane L Young
- Department of Psychology, Boston College, Boston, Massachusetts 02467, USA
73
Gallivan JP, Chapman CS, Wolpert DM, Flanagan JR. Decision-making in sensorimotor control. Nat Rev Neurosci 2019; 19:519-534. PMID: 30089888; DOI: 10.1038/s41583-018-0045-9.
Abstract
Skilled sensorimotor interactions with the world result from a series of decision-making processes that determine, on the basis of information extracted during the unfolding sequence of events, which movements to make and when and how to make them. Despite this inherent link between decision-making and sensorimotor control, research into each of these two areas has largely evolved in isolation, and it is only fairly recently that researchers have begun investigating how they interact and, together, influence behaviour. Here, we review recent behavioural, neurophysiological and computational research that highlights the role of decision-making processes in the selection, planning and control of goal-directed movements in humans and nonhuman primates.
Affiliation(s)
- Jason P Gallivan
- Centre for Neuroscience Studies and Department of Psychology, Queen's University, Kingston, Ontario, Canada
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario, Canada
- Craig S Chapman
- Faculty of Kinesiology, Sport, and Recreation and Neuroscience and Mental Health Institute, University of Alberta, Edmonton, Alberta, Canada
- Daniel M Wolpert
- Department of Engineering, University of Cambridge, Cambridge, UK
- Zuckerman Mind Brain Behavior Institute, Department of Neuroscience, Columbia University, New York, NY, USA
- J Randall Flanagan
- Centre for Neuroscience Studies and Department of Psychology, Queen's University, Kingston, Ontario, Canada
74
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. PMID: 31393261; PMCID: PMC6687434; DOI: 10.7554/elife.48116.
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
75
Ogg M, Slevc LR. Acoustic Correlates of Auditory Object and Event Perception: Speakers, Musical Timbres, and Environmental Sounds. Front Psychol 2019; 10:1594. PMID: 31379658; PMCID: PMC6650748; DOI: 10.3389/fpsyg.2019.01594.
Abstract
Human listeners must identify and orient themselves to auditory objects and events in their environment. What acoustic features support a listener's ability to differentiate the great variety of natural sounds they might encounter? Studies of auditory object perception typically examine identification (and confusion) responses or dissimilarity ratings between pairs of objects and events. However, the majority of this prior work has been conducted within single categories of sound. This separation has precluded a broader understanding of the general acoustic attributes that govern auditory object and event perception within and across different behaviorally relevant sound classes. The present experiments take a broader approach by examining multiple categories of sound relative to one another. This approach bridges critical gaps in the literature and allows us to identify (and assess the relative importance of) features that are useful for distinguishing sounds within, between and across behaviorally relevant sound categories. To do this, we conducted behavioral sound identification (Experiment 1) and dissimilarity rating (Experiment 2) studies using a broad set of stimuli that leveraged the acoustic variability within and between different sound categories via a diverse set of 36 sound tokens (12 utterances from different speakers, 12 instrument timbres, and 12 everyday objects from a typical human environment). Multidimensional scaling solutions as well as analyses of item-pair-level responses as a function of different acoustic qualities were used to understand what acoustic features informed participants' responses. In addition to the spectral and temporal envelope qualities noted in previous work, listeners' dissimilarity ratings were associated with spectrotemporal variability and aperiodicity. Subsets of these features (along with fundamental frequency variability) were also useful for making specific within- or between-category judgments. Dissimilarity ratings largely paralleled sound identification performance; however, the results of these tasks did not completely mirror one another. In addition, musical training was related to improved sound identification performance.
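The multidimensional scaling step described above can be sketched with scikit-learn; the two-dimensional embedding and the precomputed-dissimilarity input are assumptions for illustration:

    from sklearn.manifold import MDS

    def mds_embedding(dissimilarity_matrix, n_dims=2, seed=0):
        """Embed a symmetric (n_sounds x n_sounds) dissimilarity matrix into a
        low-dimensional perceptual space."""
        mds = MDS(n_components=n_dims, dissimilarity="precomputed", random_state=seed)
        return mds.fit_transform(dissimilarity_matrix)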
Affiliation(s)
- Mattson Ogg
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
- L. Robert Slevc
- Neuroscience and Cognitive Science Program, University of Maryland, College Park, College Park, MD, United States
- Department of Psychology, University of Maryland, College Park, College Park, MD, United States
76
|
Rutten S, Santoro R, Hervais-Adelman A, Formisano E, Golestani N. Cortical encoding of speech enhances task-relevant acoustic information. Nat Hum Behav 2019; 3:974-987. [DOI: 10.1038/s41562-019-0648-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 06/03/2019] [Indexed: 11/09/2022]
|
77
|
Vandermosten M, Correia J, Vanderauwera J, Wouters J, Ghesquière P, Bonte M. Brain activity patterns of phonemic representations are atypical in beginning readers with family risk for dyslexia. Dev Sci 2019; 23:e12857. [PMID: 31090993 DOI: 10.1111/desc.12857] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2018] [Revised: 04/03/2019] [Accepted: 04/29/2019] [Indexed: 12/13/2022]
Abstract
There is an ongoing debate about whether phonological deficits in dyslexics should be attributed to (a) less well-specified representations of speech sounds, as suggested by studies in young children with a familial risk for dyslexia, or (b) impaired access to these phonemic representations, as suggested by studies in adults with dyslexia. These conflicting findings are rooted in between-study differences in sample characteristics and/or testing techniques. The current study uses the same multivariate functional MRI (fMRI) approach previously used in adults with dyslexia to investigate phonemic representations in 30 beginning readers with a familial risk and 24 beginning readers without a familial risk of dyslexia, of whom 20 were later retrospectively classified as dyslexic. Based on fMRI response patterns evoked by listening to different utterances of /bA/ and /dA/ sounds, multivoxel analyses indicate that the underlying activation patterns of the two phonemes were distinct in children with a low family risk but not in children with a high family risk. However, no group differences were observed between children who were later classified as typical versus dyslexic readers, regardless of their family risk status, indicating that poor phonemic representations constitute a risk for dyslexia but are not sufficient to result in reading problems. We hypothesize that poor phonemic representations are trait (family risk) rather than state (dyslexia) dependent, and that representational deficits only lead to reading difficulties when they occur in conjunction with other neuroanatomical or neurofunctional deficits.
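A minimal sketch of the kind of multivoxel pattern classification described in this abstract, assuming synthetic trial-by-voxel data and a generic cross-validated linear classifier. Trial counts, voxel counts, and classifier settings are illustrative assumptions, not the study's analysis.

```python
# Sketch: cross-validated multivoxel classification of /bA/ vs /dA/ response
# patterns within a region of interest (synthetic data).
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(42)
n_trials, n_voxels = 80, 200
X = rng.normal(size=(n_trials, n_voxels))      # trial-by-voxel response patterns
y = np.repeat([0, 1], n_trials // 2)           # 0 = /bA/, 1 = /dA/

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

# Above-chance accuracy would indicate that the ROI carries distinct
# activation patterns for the two phonemes in a given group.
print(f"mean decoding accuracy: {acc.mean():.2f}")
```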
Collapse
Affiliation(s)
- Maaike Vandermosten
- Research Group ExpORL, Department of Neuroscience, KU Leuven, Leuven, Belgium.,Department of Cognitive Neuroscience and Maastricht Brain Imaging Center, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
| | - Joao Correia
- Department of Cognitive Neuroscience and Maastricht Brain Imaging Center, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands.,Basque Center on Cognition, Brain and Language, San Sebastian, Spain
| | - Jolijn Vanderauwera
- Research Group ExpORL, Department of Neuroscience, KU Leuven, Leuven, Belgium.,Parenting and Special Education Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
| | - Jan Wouters
- Research Group ExpORL, Department of Neuroscience, KU Leuven, Leuven, Belgium
| | - Pol Ghesquière
- Parenting and Special Education Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
| | - Milene Bonte
- Department of Cognitive Neuroscience and Maastricht Brain Imaging Center, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
| |
Collapse
|
78
|
Yi HG, Leonard MK, Chang EF. The Encoding of Speech Sounds in the Superior Temporal Gyrus. Neuron 2019; 102:1096-1110. [PMID: 31220442 PMCID: PMC6602075 DOI: 10.1016/j.neuron.2019.04.023] [Citation(s) in RCA: 173] [Impact Index Per Article: 34.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2019] [Revised: 04/08/2019] [Accepted: 04/16/2019] [Indexed: 01/02/2023]
Abstract
The human superior temporal gyrus (STG) is critical for extracting meaningful linguistic features from speech input. Local neural populations are tuned to acoustic-phonetic features of all consonants and vowels and to dynamic cues for intonational pitch. These populations are embedded throughout broader functional zones that are sensitive to amplitude-based temporal cues. Beyond speech features, STG representations are strongly modulated by learned knowledge and perceptual goals. Currently, a major challenge is to understand how these features are integrated across space and time in the brain during natural speech comprehension. We present a theory that temporally recurrent connections within STG generate context-dependent phonological representations, spanning longer temporal sequences relevant for coherent percepts of syllables, words, and phrases.
Collapse
Affiliation(s)
- Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA.
| |
Collapse
|
79
|
Sjerps MJ, Fox NP, Johnson K, Chang EF. Speaker-normalized sound representations in the human auditory cortex. Nat Commun 2019; 10:2465. [PMID: 31165733 PMCID: PMC6549175 DOI: 10.1038/s41467-019-10365-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2018] [Accepted: 05/03/2019] [Indexed: 11/08/2022] Open
Abstract
The acoustic dimensions that distinguish speech sounds (like the vowel differences in "boot" and "boat") also differentiate speakers' voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners' perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener's perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
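The contrast-enhancement idea mentioned in this abstract can be sketched in a few lines: the effective first-formant (F1) value of a target is evaluated relative to the F1 of the preceding context, so the same physical target is perceived differently after a low-F1 versus a high-F1 speaker. The function, gain parameter, and formant values below are illustrative assumptions, not the model fitted in the paper.

```python
# Sketch of a simple contrast-enhancement account of speaker normalization.
import numpy as np

def contrast_enhanced_f1(target_f1_hz: float, context_f1_hz: np.ndarray,
                         gain: float = 0.5) -> float:
    """Shift the target F1 away from the mean F1 of the preceding context."""
    context_mean = float(np.mean(context_f1_hz))
    return target_f1_hz + gain * (target_f1_hz - context_mean)

ambiguous_target = 500.0                              # Hz, midway between two vowels
low_f1_speaker = np.array([380.0, 400.0, 420.0])      # preceding phrase, low-F1 voice
high_f1_speaker = np.array([580.0, 600.0, 620.0])     # preceding phrase, high-F1 voice

print(contrast_enhanced_f1(ambiguous_target, low_f1_speaker))    # pushed upward
print(contrast_enhanced_f1(ambiguous_target, high_f1_speaker))   # pushed downward
```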
Collapse
Affiliation(s)
- Matthias J Sjerps
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Wundtlaan 1, Nijmegen, 6525 XD, Netherlands
| | - Neal P Fox
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA
| | - Keith Johnson
- Department of Linguistics, University of California, Berkeley, 1203 Dwinelle Hall #2650, Berkeley, California, 94720, USA
| | - Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA.
- Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA.
| |
Collapse
|
80
|
Rampinini AC, Handjaras G, Leo A, Cecchetti L, Betta M, Marotta G, Ricciardi E, Pietrini P. Formant Space Reconstruction From Brain Activity in Frontal and Temporal Regions Coding for Heard Vowels. Front Hum Neurosci 2019; 13:32. [PMID: 30837851 PMCID: PMC6383050 DOI: 10.3389/fnhum.2019.00032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2018] [Accepted: 01/21/2019] [Indexed: 11/29/2022] Open
Abstract
Classical studies have isolated a distributed network of temporal and frontal areas engaged in the neural representation of speech perception and production. With modern literature arguing against unique roles for these cortical regions, different theories have favored either neural code-sharing or cortical space-sharing, thus trying to explain the intertwined spatial and functional organization of motor and acoustic components across the fronto-temporal cortical network. In this context, the focus of attention has recently shifted toward specific model fitting, aimed at motor and/or acoustic space reconstruction in brain activity within the language network. Here, we tested a model based on acoustic properties (formants), and one based on motor properties (articulation parameters), where model-free decoding of evoked fMRI activity during perception, imagery, and production of vowels had been successful. Results revealed that phonological information organizes around formant structure during the perception of vowels; interestingly, such a model was reconstructed in a broad temporal region, outside of the primary auditory cortex, but also in the pars triangularis of the left inferior frontal gyrus. Conversely, articulatory features were not associated with brain activity in these regions. Overall, our results call for a degree of interdependence based on acoustic information, between the frontal and temporal ends of the language network.
Collapse
Affiliation(s)
| | | | - Andrea Leo
- IMT School for Advanced Studies Lucca, Lucca, Italy
| | | | - Monica Betta
- IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Giovanna Marotta
- Department of Philology, Literature and Linguistics, University of Pisa, Pisa, Italy
| | | | | |
Collapse
|
81
|
Buchsbaum BR, D'Esposito M. A sensorimotor view of verbal working memory. Cortex 2019; 112:134-148. [DOI: 10.1016/j.cortex.2018.11.010] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2018] [Revised: 10/09/2018] [Accepted: 11/11/2018] [Indexed: 12/16/2022]
|
82
|
Neural processes of vocal social perception: Dog-human comparative fMRI studies. Neurosci Biobehav Rev 2019; 85:54-64. [PMID: 29287629 DOI: 10.1016/j.neubiorev.2017.11.017] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2017] [Revised: 11/20/2017] [Accepted: 11/23/2017] [Indexed: 11/20/2022]
Abstract
In this review we focus on the exciting new opportunities in comparative neuroscience to study the neural processes of vocal social perception by comparing dog and human neural activity using fMRI methods. The dog is a relatively new addition to this research area; however, it has considerable potential to become a standard species in such investigations. Although there has been great interest in the emergence of human language abilities, most fMRI research to date has focused on homologue comparisons within primates. Because dogs belong to a very different clade of mammalian evolution, they could give such research agendas a more general mammalian foundation. In addition, broadening the scope of investigations into vocal communication in general can also deepen our understanding of human vocal skills. Because dogs have been selected for, and live in, an anthropogenic environment, research with them may also be informative about the way in which human non-linguistic and linguistic signals are represented in a mammalian brain that lacks the capacity for language production.
Collapse
|
83
|
Ogg M, Moraczewski D, Kuchinsky SE, Slevc LR. Separable neural representations of sound sources: Speaker identity and musical timbre. Neuroimage 2019; 191:116-126. [PMID: 30731247 DOI: 10.1016/j.neuroimage.2019.01.075] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2018] [Revised: 12/14/2018] [Accepted: 01/30/2019] [Indexed: 11/28/2022] Open
Abstract
Human listeners can quickly and easily recognize different sound sources (objects and events) in their environment. Understanding how this impressive ability is accomplished can improve signal processing and machine intelligence applications along with assistive listening technologies. However, it is not clear how the brain represents the many sounds that humans can recognize (such as speech and music) at the level of individual sources, categories and acoustic features. To examine the cortical organization of these representations, we used patterns of fMRI responses to decode 1) four individual speakers and instruments from one another (separately, within each category), 2) the superordinate category labels associated with each stimulus (speech or instrument), and 3) a set of simple synthesized sounds that could be differentiated entirely on their acoustic features. Data were collected using an interleaved silent steady state sequence to increase the temporal signal-to-noise ratio, and mitigate issues with auditory stimulus presentation in fMRI. Largely separable clusters of voxels in the temporal lobes supported the decoding of individual speakers and instruments from other stimuli in the same category. Decoding the superordinate category of each sound was more accurate and involved a larger portion of the temporal lobes. However, these clusters all overlapped with areas that could decode simple, acoustically separable stimuli. Thus, individual sound sources from different sound categories are represented in separate regions of the temporal lobes that are situated within regions implicated in more general acoustic processes. These results bridge an important gap in our understanding of cortical representations of sounds and their acoustics.
Collapse
Affiliation(s)
- Mattson Ogg
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA.
| | - Dustin Moraczewski
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA
| | - Stefanie E Kuchinsky
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Center for Advanced Study of Language, University of Maryland, College Park, MD, 20742, USA; Maryland Neuroimaging Center, University of Maryland, College Park, MD, 20742, USA
| | - L Robert Slevc
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA
| |
Collapse
|
84
|
Hellbernd N, Sammler D. Neural bases of social communicative intentions in speech. Soc Cogn Affect Neurosci 2019; 13:604-615. [PMID: 29771359 PMCID: PMC6022564 DOI: 10.1093/scan/nsy034] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2017] [Accepted: 05/13/2018] [Indexed: 11/15/2022] Open
Abstract
Our ability to understand others’ communicative intentions in speech is key to successful social interaction. Indeed, misunderstanding an ‘excuse me’ as apology, while meant as criticism, may have important consequences. Recent behavioural studies have provided evidence that prosody, that is, vocal tone, is an important indicator for speakers’ intentions. Using a novel audio-morphing paradigm, the present functional magnetic resonance imaging study examined the neurocognitive mechanisms that allow listeners to ‘read’ speakers’ intents from vocal prosodic patterns. Participants categorized prosodic expressions that gradually varied in their acoustics between criticism, doubt, and suggestion. Categorizing typical exemplars of the three intentions induced activations along the ventral auditory stream, complemented by amygdala and mentalizing system. These findings likely depict the stepwise conversion of external perceptual information into abstract prosodic categories and internal social semantic concepts, including the speaker’s mental state. Ambiguous tokens, in turn, involved cingulo-opercular areas known to assist decision-making in case of conflicting cues. Auditory and decision-making processes were flexibly coupled with the amygdala, depending on prosodic typicality, indicating enhanced categorization efficiency of overtly relevant, meaningful prosodic signals. Altogether, the results point to a model in which auditory prosodic categorization and socio-inferential conceptualization cooperate to translate perceived vocal tone into a coherent representation of the speaker’s intent.
Collapse
Affiliation(s)
- Nele Hellbernd
- Otto Hahn Group Neural Bases of Intonation in Speech and Music, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, D-04103 Leipzig, Germany
| | - Daniela Sammler
- Otto Hahn Group Neural Bases of Intonation in Speech and Music, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, D-04103 Leipzig, Germany
| |
Collapse
|
85
|
Venezia JH, Thurman SM, Richards VM, Hickok G. Hierarchy of speech-driven spectrotemporal receptive fields in human auditory cortex. Neuroimage 2018; 186:647-666. [PMID: 30500424 DOI: 10.1016/j.neuroimage.2018.11.049] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2018] [Revised: 10/11/2018] [Accepted: 11/26/2018] [Indexed: 12/22/2022] Open
Abstract
Existing data indicate that cortical speech processing is hierarchically organized. Numerous studies have shown that early auditory areas encode fine acoustic details while later areas encode abstracted speech patterns. However, it remains unclear precisely what speech information is encoded across these hierarchical levels. Estimation of speech-driven spectrotemporal receptive fields (STRFs) provides a means to explore cortical speech processing in terms of acoustic or linguistic information associated with characteristic spectrotemporal patterns. Here, we estimate STRFs from cortical responses to continuous speech in fMRI. Using a novel approach based on filtering randomly-selected spectrotemporal modulations (STMs) from aurally-presented sentences, STRFs were estimated for a group of listeners and categorized using a data-driven clustering algorithm. 'Behavioral STRFs' highlighting STMs crucial for speech recognition were derived from intelligibility judgments. Clustering revealed that STRFs in the supratemporal plane represented a broad range of STMs, while STRFs in the lateral temporal lobe represented circumscribed STM patterns important to intelligibility. Detailed analysis recovered a bilateral organization with posterior-lateral regions preferentially processing STMs associated with phonological information and anterior-lateral regions preferentially processing STMs associated with word- and phrase-level information. Regions in lateral Heschl's gyrus preferentially processed STMs associated with vocalic information (pitch).
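For readers unfamiliar with receptive field estimation, the sketch below shows a generic lagged-regression estimate of a spectrotemporal receptive field (STRF) from a stimulus spectrogram and a single response time course. It is not the filtered spectrotemporal-modulation procedure used in this study; all dimensions and data are placeholders.

```python
# Sketch: generic regression-based STRF estimate for one cortical response.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_times, n_freqs, n_lags = 1000, 32, 10

spec = rng.normal(size=(n_times, n_freqs))    # stimulus time-frequency representation
response = rng.normal(size=n_times)           # one voxel/electrode time course

# Build a design matrix of time-lagged spectrogram frames.
X = np.zeros((n_times, n_freqs * n_lags))
for lag in range(n_lags):
    X[lag:, lag * n_freqs:(lag + 1) * n_freqs] = spec[:n_times - lag, :]

model = Ridge(alpha=10.0).fit(X, response)
strf = model.coef_.reshape(n_lags, n_freqs)   # lags x frequencies receptive field
print(strf.shape)
```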
Collapse
Affiliation(s)
- Jonathan H Venezia
- VA Loma Linda Healthcare System, Loma Linda, CA, USA; Dept. of Otolaryngology, School of Medicine, Loma Linda University, Loma Linda, CA, USA.
| | | | - Virginia M Richards
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, USA
| | - Gregory Hickok
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, USA
| |
Collapse
|
86
|
Hebart MN, Baker CI. Deconstructing multivariate decoding for the study of brain function. Neuroimage 2018; 180:4-18. [PMID: 28782682 PMCID: PMC5797513 DOI: 10.1016/j.neuroimage.2017.08.005] [Citation(s) in RCA: 138] [Impact Index Per Article: 23.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2017] [Revised: 07/28/2017] [Accepted: 08/01/2017] [Indexed: 12/24/2022] Open
Abstract
Multivariate decoding methods were developed originally as tools to enable accurate predictions in real-world applications. The realization that these methods can also be employed to study brain function has led to their widespread adoption in the neurosciences. However, prior to the rise of multivariate decoding, the study of brain function was firmly embedded in a statistical philosophy grounded on univariate methods of data analysis. In this way, multivariate decoding for brain interpretation grew out of two established frameworks: multivariate decoding for predictions in real-world applications, and classical univariate analysis based on the study and interpretation of brain activation. We argue that this led to two confusions, one reflecting a mixture of multivariate decoding for prediction or interpretation, and the other a mixture of the conceptual and statistical philosophies underlying multivariate decoding and classical univariate analysis. Here we attempt to systematically disambiguate multivariate decoding for the study of brain function from the frameworks it grew out of. After elaborating these confusions and their consequences, we describe six, often unappreciated, differences between classical univariate analysis and multivariate decoding. We then focus on how the common interpretation of what is signal and noise changes in multivariate decoding. Finally, we use four examples to illustrate where these confusions may impact the interpretation of neuroimaging data. We conclude with a discussion of potential strategies to help resolve these confusions in interpreting multivariate decoding results, including the potential departure from multivariate decoding methods for the study of brain function.
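One difference the review elaborates, namely that decoding can detect information carried in activity patterns even when mean activation does not differ, can be illustrated with a toy simulation. The data below are constructed solely for that purpose and do not correspond to any real experiment.

```python
# Toy contrast between a univariate test on mean ROI activation and
# multivariate decoding of the same trials. Conditions differ in their voxel
# pattern but not in their mean signal.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_per_cond, n_voxels = 50, 20

pattern = rng.normal(size=n_voxels)
pattern -= pattern.mean()                    # zero-mean pattern: no net activation change
cond_a = rng.normal(size=(n_per_cond, n_voxels)) + pattern
cond_b = rng.normal(size=(n_per_cond, n_voxels)) - pattern

# Univariate: compare mean ROI signal per trial between conditions.
t, p = ttest_ind(cond_a.mean(axis=1), cond_b.mean(axis=1))
print(f"univariate t-test p = {p:.2f}")      # typically not significant here

# Multivariate: decode condition from the full voxel pattern.
X = np.vstack([cond_a, cond_b])
y = np.repeat([0, 1], n_per_cond)
acc = cross_val_score(LinearSVC(dual=False), X, y, cv=5).mean()
print(f"decoding accuracy = {acc:.2f}")      # typically well above chance
```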
Collapse
Affiliation(s)
- Martin N Hebart
- Section on Learning and Plasticity, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA.
| | - Chris I Baker
- Section on Learning and Plasticity, Laboratory of Brain and Cognition, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA
| |
Collapse
|
87
|
Huang N, Slaney M, Elhilali M. Connecting Deep Neural Networks to Physical, Perceptual, and Electrophysiological Auditory Signals. Front Neurosci 2018; 12:532. [PMID: 30154688 PMCID: PMC6102345 DOI: 10.3389/fnins.2018.00532] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2018] [Accepted: 07/16/2018] [Indexed: 11/13/2022] Open
Abstract
Deep neural networks have recently been shown to capture the intricate transformations of signals from sensory profiles to semantic representations that facilitate recognition or discrimination of complex stimuli. In this vein, convolutional neural networks (CNNs) have been used very successfully in image and audio classification. Designed to imitate the hierarchical structure of the nervous system, CNNs build up activations of increasing complexity that transform the incoming signal into object-level representations. In this work, we employ a CNN trained for large-scale audio object classification to gain insights about the contribution of various audio representations that guide sound perception. The analysis contrasts activation of different layers of the CNN with acoustic features extracted directly from the scenes, perceptual salience obtained from behavioral responses of human listeners, and neural oscillations recorded by electroencephalography (EEG) in response to the same natural scenes. All three measures are tightly linked quantities believed to guide percepts of salience and object formation when listening to complex scenes. The results paint a picture of the intricate interplay between low-level and object-level representations in guiding auditory salience, one that is very much dependent on context and sound category.
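A minimal sketch of how intermediate layer activations of an audio CNN can be read out for later comparison with acoustic features or EEG measures, in the spirit of the layer-wise analysis described here. The tiny PyTorch network, its dimensions, and the random input are stand-ins for the large-scale classifier used in the paper.

```python
# Sketch: capture intermediate CNN layer activations with forward hooks.
import torch
import torch.nn as nn

class TinyAudioCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.conv2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Linear(32 * 16 * 25, n_classes)

    def forward(self, x):
        x = self.conv2(self.conv1(x))
        return self.head(x.flatten(1))

model = TinyAudioCNN()
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

model.conv1.register_forward_hook(save_activation("conv1"))
model.conv2.register_forward_hook(save_activation("conv2"))

spectrogram = torch.randn(1, 1, 64, 100)       # batch x channel x freq x time
_ = model(spectrogram)
for name, act in activations.items():
    print(name, tuple(act.shape))               # layer activations for later correlation
```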
Collapse
Affiliation(s)
- Nicholas Huang
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
| | - Malcolm Slaney
- Machine Hearing, Google AI, Google (United States), Mountain View, CA, United States
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, United States
| |
Collapse
|
88
|
Hjortkjær J, Kassuba T, Madsen KH, Skov M, Siebner HR. Task-Modulated Cortical Representations of Natural Sound Source Categories. Cereb Cortex 2018; 28:295-306. [PMID: 29069292 DOI: 10.1093/cercor/bhx263] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In everyday sound environments, we recognize sound sources and events by attending to relevant aspects of an acoustic input. Evidence about the cortical mechanisms involved in extracting relevant category information from natural sounds is, however, limited to speech. Here, we used functional MRI to measure cortical response patterns while human listeners categorized real-world sounds created by objects of different solid materials (glass, metal, wood) manipulated by different sound-producing actions (striking, rattling, dropping). In different sessions, subjects had to identify either material or action categories in the same sound stimuli. The sound-producing action and the material of the sound source could be decoded from multivoxel activity patterns in auditory cortex, including Heschl's gyrus and planum temporale. Importantly, decoding success depended on task relevance and category discriminability. Action categories were more accurately decoded in auditory cortex when subjects identified action information. Conversely, the material of the same sound sources was decoded with higher accuracy in the inferior frontal cortex during material identification. Representational similarity analyses indicated that both early and higher-order auditory cortex selectively enhanced spectrotemporal features relevant to the target category. Together, the results indicate a cortical selection mechanism that favors task-relevant information in the processing of nonvocal sound categories.
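The representational similarity logic in this abstract can be sketched generically: a neural representational dissimilarity matrix (RDM) computed from condition-wise voxel patterns is compared against a model RDM built from category labels. All values below are synthetic, and the 3 x 3 condition structure is only an illustrative assumption.

```python
# Sketch: compare a neural RDM with a category-based model RDM.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_conditions, n_voxels = 9, 150                         # e.g., 3 materials x 3 actions

patterns = rng.normal(size=(n_conditions, n_voxels))    # mean pattern per condition
neural_rdm = pdist(patterns, metric="correlation")      # condition-pair dissimilarities

# Model RDM: 0 if two conditions share the target category, 1 otherwise.
labels = np.repeat([0, 1, 2], 3)                        # e.g., material label per condition
model_rdm = pdist(labels[:, None], metric="hamming")

rho, p = spearmanr(neural_rdm, model_rdm)
print(f"neural-model RDM correlation: rho={rho:.2f}")
```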
Collapse
Affiliation(s)
- Jens Hjortkjær
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, 2650 Hvidovre, Denmark.,Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | - Tanja Kassuba
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544, USA
| | - Kristoffer H Madsen
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, 2650 Hvidovre, Denmark.,Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
| | - Martin Skov
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, 2650 Hvidovre, Denmark.,Decision Neuroscience Research Group, Copenhagen Business School, 2000 Frederiksberg, Denmark
| | - Hartwig R Siebner
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, 2650 Hvidovre, Denmark.,Department of Neurology, Copenhagen University Hospital Bispebjerg, Copenhagen, 2400 København NV, Denmark
| |
Collapse
|
89
|
What's what in auditory cortices? Neuroimage 2018; 176:29-40. [DOI: 10.1016/j.neuroimage.2018.04.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2018] [Revised: 04/04/2018] [Accepted: 04/12/2018] [Indexed: 11/30/2022] Open
|
90
|
Electrophysiological correlates of voice memory for young and old speakers in young and old listeners. Neuropsychologia 2018; 116:215-227. [PMID: 28802769 DOI: 10.1016/j.neuropsychologia.2017.08.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2017] [Revised: 08/04/2017] [Accepted: 08/07/2017] [Indexed: 11/23/2022]
Abstract
Faces of one's own-age group are easier to recognize than other-age faces. Using behavioral measures and EEG, we studied whether an own-age bias (OAB) also exists in voice memory. Young (19-26 years) and old (60-75 years) participants studied young (18-25 years) and old (60-77 years) unfamiliar voices from short sentences. Subsequently, they classified studied and novel voices as "old" (i.e. studied) or "new", from the same sentences. Recognition performance was higher in young compared to old participants, and for old compared to young voices, with no OAB. At the same time, we found evidence for higher distinctiveness of old compared to young voices, both in terms of acoustic measures and subjective ratings (independent of rater age). Analyses of event-related brain potentials (ERPs) indicated more negative-going deflections (400-1000 ms) for old compared to young voices in young participants. In old participants, we observed a reversed OLD/NEW memory effect, with overall more positive amplitudes for novel compared to studied old (but not young) voices (400-1000 ms). Time-frequency analyses revealed less beta power (16-26 Hz) for young compared to old voices at left anterior sites, and also reduced beta power for correctly recognized studied (compared to novel) voices at left posterior sites (300-900 ms). These findings could suggest an engagement of cortical areas during stimulus-specific recollection from about 300 ms, in a task that emphasized the analysis of individual acoustic features.
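A rough sketch of the induced beta-band power contrast described here, computed from synthetic single-channel epochs with a standard spectrogram. The sampling rate, epoch length, trial split, and band limits are assumptions for illustration, not the study's recording parameters.

```python
# Sketch: mean beta-band (16-26 Hz) power for studied vs. novel voice trials.
import numpy as np
from scipy.signal import spectrogram

fs = 250                                    # Hz, assumed sampling rate
rng = np.random.default_rng(11)
epochs = rng.normal(size=(40, 2 * fs))      # 40 single-channel epochs of 2 s

def beta_power(epoch, fs, fmin=16.0, fmax=26.0):
    f, t, sxx = spectrogram(epoch, fs=fs, nperseg=fs // 2, noverlap=fs // 4)
    band = (f >= fmin) & (f <= fmax)
    return sxx[band].mean()                 # mean power in the beta band

studied = np.array([beta_power(e, fs) for e in epochs[:20]])
novel = np.array([beta_power(e, fs) for e in epochs[20:]])
print(f"studied - novel beta power: {studied.mean() - novel.mean():.3f}")
```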
Collapse
|
91
|
de Borst AW, de Gelder B. Mental Imagery Follows Similar Cortical Reorganization as Perception: Intra-Modal and Cross-Modal Plasticity in Congenitally Blind. Cereb Cortex 2018; 29:2859-2875. [DOI: 10.1093/cercor/bhy151] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2018] [Revised: 05/27/2018] [Accepted: 06/05/2018] [Indexed: 11/14/2022] Open
Abstract
Cortical plasticity in congenitally blind individuals leads to cross-modal activation of the visual cortex and may lead to superior perceptual processing in the intact sensory domains. Although mental imagery is often defined as a quasi-perceptual experience, it is unknown whether it follows similar cortical reorganization as perception in blind individuals. In this study, we show that auditory versus tactile perception evokes similar intra-modal discriminative patterns in congenitally blind compared with sighted participants. These results indicate that cortical plasticity following visual deprivation does not influence broad intra-modal organization of auditory and tactile perception as measured by our task. Furthermore, not only the blind, but also the sighted participants showed cross-modal discriminative patterns for perception modality in the visual cortex. During mental imagery, both groups showed similar decoding accuracies for imagery modality in the intra-modal primary sensory cortices. However, no cross-modal discriminative information for imagery modality was found in early visual cortex of blind participants, in contrast to the sighted participants. We did find evidence of cross-modal activation of higher visual areas in blind participants, including the representation of specific imagined auditory features in visual area V4.
Collapse
Affiliation(s)
- A W de Borst
- Department of Computer Science, University College London, London, UK
- Brain and Emotion Lab, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
| | - B de Gelder
- Department of Computer Science, University College London, London, UK
- Brain and Emotion Lab, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
| |
Collapse
|
92
|
Kragel PA, Koban L, Barrett LF, Wager TD. Representation, Pattern Information, and Brain Signatures: From Neurons to Neuroimaging. Neuron 2018; 99:257-273. [PMID: 30048614 PMCID: PMC6296466 DOI: 10.1016/j.neuron.2018.06.009] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2018] [Revised: 06/01/2018] [Accepted: 06/05/2018] [Indexed: 01/22/2023]
Abstract
Human neuroimaging research has transitioned from mapping local effects to developing predictive models of mental events that integrate information distributed across multiple brain systems. Here we review work demonstrating how multivariate predictive models have been utilized to provide quantitative, falsifiable predictions; establish mappings between brain and mind with larger effects than traditional approaches; and help explain how the brain represents mental constructs and processes. Although there is increasing progress toward the first two of these goals, models are only beginning to address the latter objective. By explicitly identifying gaps in knowledge, research programs can move deliberately and programmatically toward the goal of identifying brain representations underlying mental states and processes.
Collapse
Affiliation(s)
- Philip A Kragel
- Department of Psychology and Neuroscience and the Institute of Cognitive Science, University of Colorado, Boulder, CO, USA; Institute for Behavioral Genetics, University of Colorado, Boulder, CO, USA
| | - Leonie Koban
- Department of Psychology and Neuroscience and the Institute of Cognitive Science, University of Colorado, Boulder, CO, USA
| | - Lisa Feldman Barrett
- Department of Psychology, Northeastern University, Boston, MA, USA; Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Tor D Wager
- Department of Psychology and Neuroscience and the Institute of Cognitive Science, University of Colorado, Boulder, CO, USA.
| |
Collapse
|
93
|
Maguinness C, Roswandowitz C, von Kriegstein K. Understanding the mechanisms of familiar voice-identity recognition in the human brain. Neuropsychologia 2018; 116:179-193. [DOI: 10.1016/j.neuropsychologia.2018.03.039] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2017] [Revised: 03/28/2018] [Accepted: 03/29/2018] [Indexed: 11/26/2022]
|
94
|
Neural Prediction Errors Distinguish Perception and Misperception of Speech. J Neurosci 2018; 38:6076-6089. [PMID: 29891730 DOI: 10.1523/jneurosci.3258-17.2018] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2017] [Revised: 03/08/2018] [Accepted: 03/28/2018] [Indexed: 11/21/2022] Open
Abstract
Humans use prior expectations to improve perception, especially of sensory signals that are degraded or ambiguous. However, if sensory input deviates from prior expectations, then correct perception depends on adjusting or rejecting prior expectations. Failure to adjust or reject the prior leads to perceptual illusions, especially if there is partial overlap (and thus partial mismatch) between expectations and input. With speech, "slips of the ear" occur when expectations lead to misperception. For instance, an entomologist might be more susceptible to hear "The ants are my friends" for "The answer, my friend" (in the Bob Dylan song Blowin' in the Wind). Here, we contrast two mechanisms by which prior expectations may lead to misperception of degraded speech. First, clear representations of the common sounds in the prior and input (i.e., expected sounds) may lead to incorrect confirmation of the prior. Second, insufficient representations of sounds that deviate between prior and input (i.e., prediction errors) could lead to deception. We used crossmodal predictions from written words that partially match degraded speech to compare neural responses when male and female human listeners were deceived into accepting the prior or correctly rejected it. Combined behavioral and multivariate representational similarity analysis of fMRI data shows that veridical perception of degraded speech is signaled by representations of prediction error in the left superior temporal sulcus. Instead of using top-down processes to support perception of expected sensory input, our findings suggest that the strength of neural prediction error representations distinguishes correct perception from misperception. SIGNIFICANCE STATEMENT Misperceiving spoken words is an everyday experience, with outcomes that range from shared amusement to serious miscommunication. For hearing-impaired individuals, frequent misperception can lead to social withdrawal and isolation, with severe consequences for wellbeing. In this work, we specify the neural mechanisms by which prior expectations, which are so often helpful for perception, can lead to misperception of degraded sensory signals. Most descriptive theories of illusory perception explain misperception as arising from a clear sensory representation of features or sounds that are in common between prior expectations and sensory input. Our work instead provides support for a complementary proposal: that misperception occurs when there is an insufficient sensory representation of the deviation between expectations and sensory signals.
Collapse
|
95
|
Fisher JM, Dick FK, Levy DF, Wilson SM. Neural representation of vowel formants in tonotopic auditory cortex. Neuroimage 2018; 178:574-582. [PMID: 29860083 DOI: 10.1016/j.neuroimage.2018.05.072] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2018] [Revised: 05/29/2018] [Accepted: 05/30/2018] [Indexed: 11/25/2022] Open
Abstract
Speech sounds are encoded by distributed patterns of activity in bilateral superior temporal cortex. However, it is unclear whether speech sounds are topographically represented in cortex, or which acoustic or phonetic dimensions might be spatially mapped. Here, using functional MRI, we investigated the potential spatial representation of vowels, which are largely distinguished from one another by the frequencies of their first and second formants, i.e. peaks in their frequency spectra. This allowed us to generate clear hypotheses about the representation of specific vowels in tonotopic regions of auditory cortex. We scanned participants as they listened to multiple natural tokens of the vowels [ɑ] and [i], which we selected because their first and second formants overlap minimally. Formant-based regions of interest were defined for each vowel based on spectral analysis of the vowel stimuli and independently acquired tonotopic maps for each participant. We found that perception of [ɑ] and [i] yielded differential activation of tonotopic regions corresponding to formants of [ɑ] and [i], such that each vowel was associated with increased signal in tonotopic regions corresponding to its own formants. This pattern was observed in Heschl's gyrus and the superior temporal gyrus, in both hemispheres, and for both the first and second formants. Using linear discriminant analysis of mean signal change in formant-based regions of interest, the identity of untrained vowels was predicted with ∼73% accuracy. Our findings show that cortical encoding of vowels is scaffolded on tonotopy, a fundamental organizing principle of auditory cortex that is not language-specific.
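The linear discriminant analysis step reported here (predicting vowel identity from mean signal change in formant-based regions of interest, at roughly 73% accuracy) can be sketched as follows. The two ROI features and their values are synthetic stand-ins, not the study's data.

```python
# Sketch: LDA classification of vowels from mean signal in formant-based ROIs.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_trials = 60

# Feature 1: signal in ROIs tuned near the formants of [ɑ];
# feature 2: signal in ROIs tuned near the formants of [i].
a_trials = np.column_stack([rng.normal(1.0, 0.5, n_trials), rng.normal(0.2, 0.5, n_trials)])
i_trials = np.column_stack([rng.normal(0.2, 0.5, n_trials), rng.normal(1.0, 0.5, n_trials)])

X = np.vstack([a_trials, i_trials])
y = np.repeat(["a", "i"], n_trials)

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"vowel classification accuracy: {acc:.2f}")
```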
Collapse
Affiliation(s)
- Julia M Fisher
- Department of Linguistics, University of Arizona, Tucson, AZ, USA; Statistics Consulting Laboratory, BIO5 Institute, University of Arizona, Tucson, AZ, USA
| | - Frederic K Dick
- Department of Psychological Sciences, Birkbeck College, University of London, UK; Birkbeck-UCL Center for Neuroimaging, London, UK; Department of Experimental Psychology, University College London, UK
| | - Deborah F Levy
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Stephen M Wilson
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA.
| |
Collapse
|
96
|
A "voice patch" system in the primate brain for processing vocal information? Hear Res 2018; 366:65-74. [PMID: 29776691 DOI: 10.1016/j.heares.2018.04.010] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/05/2018] [Revised: 04/14/2018] [Accepted: 04/25/2018] [Indexed: 12/13/2022]
Abstract
We review behavioural and neural evidence for the processing of information contained in conspecific vocalizations (CVs) in three primate species: humans, macaques and marmosets. We focus on abilities that are present and ecologically relevant in all three species: the detection of and sensitivity to CVs, and the processing of identity cues in CVs. Current evidence, although fragmentary, supports the notion of a "voice patch system" in the primate brain analogous to the face patch system of visual cortex: a series of discrete, interconnected cortical areas supporting increasingly abstract representations of the vocal input. A central question concerns the degree to which the voice patch system is conserved in evolution. We outline challenges that arise and suggest potential avenues for comparing the organization of the voice patch system across primate brains.
Collapse
|
97
|
de Borst AW, de Gelder B. fMRI-based Multivariate Pattern Analyses Reveal Imagery Modality and Imagery Content Specific Representations in Primary Somatosensory, Motor and Auditory Cortices. Cereb Cortex 2018; 27:3994-4009. [PMID: 27473324 DOI: 10.1093/cercor/bhw211] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2016] [Accepted: 06/13/2016] [Indexed: 11/12/2022] Open
Abstract
Previous studies have shown that the early visual cortex contains content-specific representations of stimuli during visual imagery, and that these representational patterns of imagery content have a perceptual basis. To date, there is little evidence for the presence of a similar organization in the auditory and tactile domains. Using fMRI-based multivariate pattern analyses we showed that primary somatosensory, auditory, motor, and visual cortices are discriminative for imagery of touch versus sound. In the somatosensory, motor and visual cortices the imagery modality discriminative patterns were similar to perception modality discriminative patterns, suggesting that top-down modulations in these regions rely on similar neural representations as bottom-up perceptual processes. Moreover, we found evidence for content-specific representations of the stimuli during auditory imagery in the primary somatosensory and primary motor cortices. Both the imagined emotions and the imagined identities of the auditory stimuli could be successfully classified in these regions.
Collapse
Affiliation(s)
- Aline W de Borst
- Brain and Emotion Laboratory, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Limburg 6200 MD, the Netherlands
| | - Beatrice de Gelder
- Brain and Emotion Laboratory, Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Limburg 6200 MD, the Netherlands.,Department of Psychiatry and Mental Health, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| |
Collapse
|
98
|
Kato M, Yokoyama C, Kawasaki A, Takeda C, Koike T, Onoe H, Iriki A. Individual identity and affective valence in marmoset calls: in vivo brain imaging with vocal sound playback. Anim Cogn 2018; 21:331-343. [PMID: 29488110 PMCID: PMC5908821 DOI: 10.1007/s10071-018-1169-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2017] [Revised: 02/12/2018] [Accepted: 02/15/2018] [Indexed: 12/29/2022]
Abstract
As with humans, vocal communication is an important social tool for nonhuman primates. Common marmosets (Callithrix jacchus) often produce whistle-like 'phee' calls when they are visually separated from conspecifics. The neural processes specific to phee call perception, however, are largely unknown, despite the possibility that these processes involve social information. Here, we examined behavioral and whole-brain mapping evidence regarding the detection of individual conspecific phee calls using an audio playback procedure. Phee calls evoked sound exploratory responses when the caller changed, indicating that marmosets can discriminate between caller identities. Positron emission tomography with [18F] fluorodeoxyglucose revealed that perception of phee calls from a single subject was associated with activity in the dorsolateral prefrontal, medial prefrontal, orbitofrontal cortices, and the amygdala. These findings suggest that these regions are implicated in cognitive and affective processing of salient social information. However, phee calls from multiple subjects induced brain activation in only some of these regions, such as the dorsolateral prefrontal cortex. We also found distinctive brain deactivation and functional connectivity associated with phee call perception depending on the caller change. According to changes in pupillary size, phee calls from a single subject induced a higher arousal level compared with those from multiple subjects. These results suggest that marmoset phee calls convey information about individual identity and affective valence depending on the consistency or variability of the caller. Based on the flexible perception of the call based on individual recognition, humans and marmosets may share some neural mechanisms underlying conspecific vocal perception.
Collapse
Affiliation(s)
- Masaki Kato
- Laboratory for Symbolic Cognitive Development, RIKEN Brain Science Institute, Wako, Saitama, Japan
- Research Development Section, Research Promotion Hub, Office for Enhancing Institutional Capacity, Hokkaido University, Sapporo, Hokkaido, Japan
| | - Chihiro Yokoyama
- Division of Bio-Function Dynamics Imaging, RIKEN Center for Life Science Technologies, Kobe, Hyogo, Japan.
| | - Akihiro Kawasaki
- Division of Bio-Function Dynamics Imaging, RIKEN Center for Life Science Technologies, Kobe, Hyogo, Japan
| | - Chiho Takeda
- Division of Bio-Function Dynamics Imaging, RIKEN Center for Life Science Technologies, Kobe, Hyogo, Japan
| | - Taku Koike
- Laboratory for Symbolic Cognitive Development, RIKEN Brain Science Institute, Wako, Saitama, Japan
| | - Hirotaka Onoe
- Division of Bio-Function Dynamics Imaging, RIKEN Center for Life Science Technologies, Kobe, Hyogo, Japan
| | - Atsushi Iriki
- Laboratory for Symbolic Cognitive Development, RIKEN Brain Science Institute, Wako, Saitama, Japan.
- RIKEN-NTU Research Centre for Human Biology, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.
| |
Collapse
|
99
|
Kell AJ, Yamins DL, Shook EN, Norman-Haignere SV, McDermott JH. A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy. Neuron 2018; 98:630-644.e16. [DOI: 10.1016/j.neuron.2018.03.044] [Citation(s) in RCA: 232] [Impact Index Per Article: 38.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2017] [Revised: 12/22/2017] [Accepted: 03/23/2018] [Indexed: 11/28/2022]
|
100
|
Activity in Human Auditory Cortex Represents Spatial Separation Between Concurrent Sounds. J Neurosci 2018; 38:4977-4984. [PMID: 29712782 DOI: 10.1523/jneurosci.3323-17.2018] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2017] [Revised: 03/05/2018] [Accepted: 03/09/2018] [Indexed: 11/21/2022] Open
Abstract
The primary and posterior auditory cortex (AC) are known for their sensitivity to spatial information, but how this information is processed is not yet understood. AC that is sensitive to spatial manipulations is also modulated by the number of auditory streams present in a scene (Smith et al., 2010), suggesting that spatial and nonspatial cues are integrated for stream segregation. We reasoned that, if this is the case, then it is the distance between sounds rather than their absolute positions that is essential. To test this hypothesis, we measured human brain activity in response to spatially separated concurrent sounds with fMRI at 7 tesla in five men and five women. Stimuli were spatialized amplitude-modulated broadband noises recorded for each participant via in-ear microphones before scanning. Using a linear support vector machine classifier, we investigated whether sound location and/or location plus spatial separation between sounds could be decoded from the activity in Heschl's gyrus and the planum temporale. The classifier was successful only when comparing patterns associated with the conditions that had the largest difference in perceptual spatial separation. Our pattern of results suggests that the representation of spatial separation is not merely the combination of single locations, but rather is an independent feature of the auditory scene.SIGNIFICANCE STATEMENT Often, when we think of auditory spatial information, we think of where sounds are coming from-that is, the process of localization. However, this information can also be used in scene analysis, the process of grouping and segregating features of a soundwave into objects. Essentially, when sounds are further apart, they are more likely to be segregated into separate streams. Here, we provide evidence that activity in the human auditory cortex represents the spatial separation between sounds rather than their absolute locations, indicating that scene analysis and localization processes may be independent.
Collapse
|