1. Degano G, Donhauser PW, Gwilliams L, Merlo P, Golestani N. Speech prosody enhances the neural processing of syntax. Commun Biol 2024; 7:748. PMID: 38902370; PMCID: PMC11190187; DOI: 10.1038/s42003-024-06444-7.
Abstract
Human language relies on the correct processing of syntactic information, as it is essential for successful communication between speakers. As an abstract level of language, syntax has often been studied separately from the physical form of the speech signal, thus often masking the interactions that can promote better syntactic processing in the human brain. However, behavioral and neural evidence from adults supports the idea that prosody and syntax interact, and studies in infants support the notion that prosody assists language learning. Here we analyze an MEG dataset to investigate how acoustic cues, specifically prosody, interact with syntactic representations in the brains of native English speakers. More specifically, to examine whether prosody enhances the cortical encoding of syntactic representations, we decode syntactic phrase boundaries directly from brain activity and evaluate possible modulations of this decoding by the prosodic boundaries. Our findings demonstrate that the presence of prosodic boundaries improves the neural representation of phrase boundaries, indicating a facilitative role of prosodic cues in the processing of abstract linguistic features. This work has implications for interactive models of how the brain processes different linguistic features. Future research is needed to establish the neural underpinnings of prosody-syntax interactions in languages with different typological characteristics.
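The decoding-and-modulation logic lends itself to a minimal sketch: train a time-resolved classifier on sensor data, then score it separately for epochs with and without a prosodic boundary. Synthetic data throughout; this illustrates the general analysis style, not the authors' pipeline, and all sizes and names are arbitrary.

```python
# Sketch: time-resolved decoding of phrase boundaries from MEG-like epochs,
# evaluated separately for epochs with vs. without a prosodic boundary.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_epochs, n_sensors, n_times = 200, 50, 40
X = rng.standard_normal((n_epochs, n_sensors, n_times))
y = rng.integers(0, 2, n_epochs)            # 1 = syntactic phrase boundary
prosodic = rng.integers(0, 2, n_epochs)     # 1 = prosodic boundary present

# Build in a boundary signal that is stronger when a prosodic boundary co-occurs
effect = 0.8 * y[:, None] * (1 + prosodic)[:, None]
X[:, :10, 15:25] += effect[..., None]

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(5, shuffle=True, random_state=0)
auc = np.zeros((2, n_times))
for t in range(n_times):
    proba = cross_val_predict(clf, X[:, :, t], y, cv=cv, method="predict_proba")[:, 1]
    for p in (0, 1):   # score the same decoder separately per prosody condition
        m = prosodic == p
        auc[p, t] = roc_auc_score(y[m], proba[m])
print("peak AUC without / with prosodic boundary:",
      auc[0].max().round(2), auc[1].max().round(2))
```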
Affiliation(s)
- Giulio Degano: Department of Psychology, Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland
- Peter W Donhauser: Ernst Strüngmann Institute for Neuroscience in Cooperation with Max Planck Society, Frankfurt am Main, Germany
- Laura Gwilliams: Department of Psychology, Stanford University, Stanford, CA, USA
- Paola Merlo: Department of Linguistics, University of Geneva, Geneva, Switzerland; University Centre for Informatics, University of Geneva, Geneva, Switzerland
- Narly Golestani: Department of Psychology, Faculty of Psychology and Educational Sciences, University of Geneva, Geneva, Switzerland; Brain and Language Lab, Cognitive Science Hub, University of Vienna, Vienna, Austria; Department of Behavioral and Cognitive Biology, Faculty of Life Sciences, University of Vienna, Vienna, Austria
2. Nora A, Rinkinen O, Renvall H, Service E, Arkkila E, Smolander S, Laasonen M, Salmelin R. Impaired Cortical Tracking of Speech in Children with Developmental Language Disorder. J Neurosci 2024; 44:e2048232024. PMID: 38589232; PMCID: PMC11140678; DOI: 10.1523/jneurosci.2048-23.2024.
Abstract
In developmental language disorder (DLD), learning to comprehend and express oneself with spoken language is impaired, but the reason for this remains unknown. Using millisecond-scale magnetoencephalography recordings combined with machine learning models, we investigated whether the possible neural basis of this disruption lies in poor cortical tracking of speech. The stimuli were common spoken Finnish words (e.g., dog, car, hammer) and sounds with corresponding meanings (e.g., dog bark, car engine, hammering). In both children with DLD (10 boys and 7 girls) and typically developing (TD) control children (14 boys and 3 girls), aged 10-15 years, the cortical activation to spoken words was best modeled as time-locked to the unfolding speech input, with a ∼100 ms latency between sound and cortical activation. The amplitude envelope (amplitude changes) and spectrogram (detailed time-varying spectral content) of the spoken words, but not of other sounds, were successfully decoded from time-locked brain responses in bilateral temporal areas; from the cortical responses, the models could tell with ∼75-85% accuracy which of two sounds had been presented to the participant. However, the cortical representation of amplitude envelope information was poorer in children with DLD than in TD children at longer latencies (∼200-300 ms lag). We interpret this effect as reflecting poorer retention of acoustic-phonetic information in short-term memory. This impaired tracking could potentially affect the processing and learning of words as well as of continuous speech. The present results offer an explanation for the problems in language comprehension and acquisition in DLD.
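A minimal sketch of this style of stimulus-reconstruction analysis, assuming a fixed ~100 ms brain lag and a ridge decoder; the identification step mirrors the "which of two sounds" test. Synthetic data, not the study's code.

```python
# Sketch: reconstruct the amplitude envelope from lagged multichannel responses,
# then identify which of two "words" evoked a given response.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_ch, lag = 30, 10                          # 10 samples = 100 ms at 100 Hz
W = rng.standard_normal(n_ch)

def brain(env):                             # response: envelope at +100 ms lag, plus noise
    r = np.zeros((env.size, n_ch))
    r[lag:] = env[:-lag, None] * W + 0.5 * rng.standard_normal((env.size - lag, n_ch))
    return r

train_env = np.abs(np.cumsum(rng.standard_normal(2000)))   # training envelope
dec = Ridge(alpha=1.0).fit(brain(train_env)[lag:], train_env[:-lag])

words = np.abs(np.cumsum(rng.standard_normal((2, 500)), axis=1))  # two test envelopes
for i in (0, 1):
    rec = dec.predict(brain(words[i])[lag:])
    corr = [np.corrcoef(rec, w[:-lag])[0, 1] for w in words]
    print(f"response to word {i}: identified as word {int(np.argmax(corr))}")
```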
Affiliation(s)
- Anni Nora: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- Oona Rinkinen: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
- Hanna Renvall: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland; BioMag Laboratory, HUS Diagnostic Center, Helsinki University Hospital, Helsinki FI-00029, Finland
- Elisabet Service: Department of Linguistics and Languages, Centre for Advanced Research in Experimental and Applied Linguistics (ARiEAL), McMaster University, Hamilton, Ontario L8S 4L8, Canada; Department of Psychology and Logopedics, University of Helsinki, Helsinki FI-00014, Finland
- Eva Arkkila: Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland
- Sini Smolander: Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland; Research Unit of Logopedics, University of Oulu, Oulu FI-90014, Finland; Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
- Marja Laasonen: Department of Otorhinolaryngology and Phoniatrics, Head and Neck Center, Helsinki University Hospital and University of Helsinki, Helsinki FI-00014, Finland; Department of Logopedics, University of Eastern Finland, Joensuu FI-80101, Finland
- Riitta Salmelin: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo FI-00076, Finland; Aalto NeuroImaging (ANI), Aalto University, Espoo FI-00076, Finland
3. Rupp KM, Hect JL, Harford EE, Holt LL, Ghuman AS, Abel TJ. A hierarchy of processing complexity and timescales for natural sounds in human auditory cortex. bioRxiv [Preprint] 2024:2024.05.24.595822. PMID: 38826304; PMCID: PMC11142240; DOI: 10.1101/2024.05.24.595822.
Abstract
Efficient behavior is supported by humans' ability to rapidly recognize acoustically distinct sounds as members of a common category. Within auditory cortex, there are critical unanswered questions regarding the organization and dynamics of sound categorization. Here, we performed intracerebral recordings in the context of epilepsy surgery as 20 patient-participants listened to natural sounds. We built encoding models to predict neural responses using features of these sounds extracted from different layers within a sound-categorization deep neural network (DNN). This approach yielded highly accurate models of neural responses throughout auditory cortex. The complexity of a cortical site's representation (measured by the depth of the DNN layer that produced the best model) was closely related to its anatomical location, with shallow, middle, and deep layers of the DNN associated with core (primary auditory cortex), lateral belt, and parabelt regions, respectively. Smoothly varying gradients of representational complexity also existed within these regions, with complexity increasing along a posteromedial-to-anterolateral direction in core and lateral belt, and along posterior-to-anterior and dorsal-to-ventral dimensions in parabelt. When we estimated the time window over which each recording site integrates information, we found shorter integration windows in core relative to lateral belt and parabelt. Lastly, we found a relationship between the length of the integration window and the complexity of information processing within core (but not lateral belt or parabelt). These findings suggest that hierarchies of timescales and processing complexity, and their interrelationship, represent a functional organizational principle of the auditory stream that underlies our perception of complex, abstract auditory information.
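The layer-wise encoding analysis can be sketched as follows: fit one cross-validated ridge model per DNN layer and take the depth of the best-predicting layer as a site's representational complexity. Random features stand in for the sound-categorization DNN and the intracranial responses; nothing here reproduces the authors' models.

```python
# Sketch: per-layer encoding models; "complexity" = depth of the best layer.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_sounds = 300
layers = [rng.standard_normal((n_sounds, 64)) for _ in range(6)]  # features per DNN layer
site = layers[4] @ rng.standard_normal(64) + rng.standard_normal(n_sounds)  # deep-layer site

scores = [cross_val_score(RidgeCV(alphas=np.logspace(-2, 4, 10)), F, site, cv=5).mean()
          for F in layers]
print("best layer (0 = shallowest):", int(np.argmax(scores)))  # expect 4 by construction
```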
Affiliation(s)
- Kyle M. Rupp: Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jasmine L. Hect: Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Emily E. Harford: Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Lori L. Holt: Department of Psychology, The University of Texas at Austin, Austin, Texas, United States of America
- Avniel Singh Ghuman: Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Taylor J. Abel: Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America; Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
4. Chang A, Teng X, Assaneo MF, Poeppel D. The human auditory system uses amplitude modulation to distinguish music from speech. PLoS Biol 2024; 22:e3002631. PMID: 38805517; PMCID: PMC11132470; DOI: 10.1371/journal.pbio.3002631.
Abstract
Music and speech are complex and distinct auditory signals that are both foundational to the human experience. The mechanisms underpinning each domain are widely investigated. However, what perceptual mechanism transforms a sound into music or speech, and what basic acoustic information is required to distinguish between them, remain open questions. Here, we hypothesized that a sound's amplitude modulation (AM), an essential temporal acoustic feature driving the auditory system across processing levels, is critical for distinguishing music and speech. Specifically, in contrast to paradigms using naturalistic acoustic signals (which can be challenging to interpret), we used a noise-probing approach to untangle the auditory mechanism: if AM rate and regularity are critical for perceptually distinguishing music and speech, judgments of artificially noise-synthesized, ambiguous audio signals should align with their AM parameters. Across 4 experiments (N = 335), signals with a higher peak AM frequency tended to be judged as speech, and those with a lower peak AM frequency as music. Interestingly, this principle is consistently used by all listeners for speech judgments, but only by musically sophisticated listeners for music. In addition, signals with more regular AM are judged as music over speech, and this feature is more critical for music judgment, regardless of musical sophistication. The data suggest that the auditory system can rely on a low-level acoustic property as basic as AM to distinguish music from speech, a simple principle that provokes both neurophysiological and evolutionary experiments and speculations.
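A small sketch of the key acoustic measurement, assuming sinusoidally amplitude-modulated noise: extract the envelope and read off the peak AM frequency from its spectrum. Rates and the search band are illustrative only.

```python
# Sketch: estimate a sound's peak AM frequency, the feature linked to
# speech-like (faster AM) vs. music-like (slower AM) judgments.
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 4.0, 1 / fs)
rng = np.random.default_rng(3)

def am_noise(rate_hz):                      # noise carrier with sinusoidal AM
    return (1 + np.sin(2 * np.pi * rate_hz * t)) * rng.standard_normal(t.size)

for rate in (1.5, 5.0):                     # music-like vs. speech-like AM rate
    env = np.abs(hilbert(am_noise(rate)))   # amplitude envelope
    spec = np.abs(np.fft.rfft(env - env.mean()))
    freqs = np.fft.rfftfreq(env.size, 1 / fs)
    band = (freqs > 0.5) & (freqs < 20)     # search low modulation rates only
    print(f"true AM {rate} Hz -> peak {freqs[band][spec[band].argmax()]:.1f} Hz")
```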
Affiliation(s)
- Andrew Chang: Department of Psychology, New York University, New York, New York, United States of America
- Xiangbin Teng: Department of Psychology, Chinese University of Hong Kong, Hong Kong SAR, China
- M. Florencia Assaneo: Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
- David Poeppel: Department of Psychology, New York University, New York, New York, United States of America; Ernst Struengmann Institute for Neuroscience, Frankfurt am Main, Germany; Center for Language, Music, and Emotion (CLaME), New York University, New York, New York, United States of America; Music and Audio Research Lab (MARL), New York University, New York, New York, United States of America
5. Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024; 34:bhae155. PMID: 38687241; PMCID: PMC11059272; DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers during functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes to phoneme analysis. We show (i) that the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that, for this modulation, both acoustic and phonetic information must be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
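The quilting manipulation can be sketched as segment shuffling. Real phoneme quilts operate on labeled phoneme segments and smooth the segment joins, so this fixed-length version is only a simplified stand-in for the stimulus logic.

```python
# Sketch: chop a signal into short segments and pseudo-randomly reorder them,
# destroying linguistic order while keeping local acoustics.
import numpy as np

rng = np.random.default_rng(4)
fs = 16000
speech = rng.standard_normal(fs * 2)        # placeholder for a speech waveform
seg = int(0.05 * fs)                        # ~50 ms "phoneme-sized" segments
n = len(speech) // seg
order = rng.permutation(n)
quilt = np.concatenate([speech[i * seg:(i + 1) * seg] for i in order])
print(quilt.shape, "segments reordered:", n)
```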
Affiliation(s)
- Seung-Goo Kim: Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino: Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath: Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States; Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States; Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
6. Sankaran N, Leonard MK, Theunissen F, Chang EF. Encoding of melody in the human auditory cortex. Sci Adv 2024; 10:eadk0010. PMID: 38363839; PMCID: PMC10871532; DOI: 10.1126/sciadv.adk0010.
Abstract
Melody is a core component of music in which discrete pitches are serially arranged to convey emotion and meaning. Perception varies along several pitch-based dimensions: (i) the absolute pitch of notes, (ii) the difference in pitch between successive notes, and (iii) the statistical expectation of each note given prior context. How the brain represents these dimensions, and whether their encoding is specialized for music, remain unknown. We recorded high-density neurophysiological activity directly from the human auditory cortex while participants listened to Western musical phrases. Pitch, pitch-change, and expectation were selectively encoded at different cortical sites, indicating a spatial map for representing distinct melodic dimensions. The same participants listened to spoken English, and we compared responses to music and speech. Cortical sites selective for music encoded expectation, while sites that encoded pitch and pitch-change in music used the same neural code to represent equivalent properties of speech. Findings reveal how the perception of melody recruits both music-specific and general-purpose sound representations.
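The three melodic regressors can be sketched on a toy pitch sequence. Expectation is reduced here to a simple bigram surprisal; the study used a richer statistical model of Western music, so this is only the shape of the computation.

```python
# Sketch: absolute pitch, pitch change, and context-based expectation
# (surprisal) for a short melody given as MIDI pitches.
import numpy as np
from collections import Counter

notes = [60, 62, 64, 65, 64, 62, 60, 62, 64, 64, 65, 67]
pitch = np.array(notes, float)                              # absolute pitch
dpitch = np.diff(pitch, prepend=pitch[0])                   # pitch change

bigrams = Counter(zip(notes[:-1], notes[1:]))               # note-to-note statistics
ctx = Counter(notes[:-1])
surprisal = [0.0] + [-np.log2(bigrams[(a, b)] / ctx[a])
                     for a, b in zip(notes[:-1], notes[1:])]
print("pitch change:", dpitch)
print("surprisal (bits):", np.round(surprisal, 2))
```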
Affiliation(s)
- Narayan Sankaran: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Matthew K. Leonard: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Frederic Theunissen: Department of Psychology, University of California, Berkeley, 2121 Berkeley Way, Berkeley, CA 94720, USA
- Edward F. Chang: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
7. Miceli G, Caccia A. The Auditory Agnosias: a Short Review of Neurofunctional Evidence. Curr Neurol Neurosci Rep 2023; 23:671-679. PMID: 37747655; PMCID: PMC10673750; DOI: 10.1007/s11910-023-01302-1.
Abstract
PURPOSE OF REVIEW: To investigate the neurofunctional correlates of pure auditory agnosia and its varieties (global, verbal, and nonverbal), based on 116 anatomoclinical reports published between 1893 and 2022, with emphasis on hemispheric lateralization, intrahemispheric lesion site, and underlying cognitive impairments. RECENT FINDINGS: Pure auditory agnosia is rare, and observations accumulate slowly. Recent patient reports and neuroimaging studies on neurotypical subjects offer insights into the putative mechanisms underlying auditory agnosia, while challenging traditional accounts. Global auditory agnosia frequently results from bilateral temporal damage. Verbal auditory agnosia strictly correlates with language-dominant hemisphere lesions. Damage involves the auditory pathways, but the critical lesion site is unclear. Both the auditory cortex and associative areas are reasonable candidates, but cases resulting from brainstem damage are on record. The hemispheric correlates of nonverbal auditory input disorders are less clear. They correlate with unilateral damage to either hemisphere, but evidence is scarce. Based on published cases, pure auditory agnosias are neurologically and functionally heterogeneous. Phenotypes are influenced by co-occurring cognitive impairments. Future studies should start from these facts and integrate patient data with studies in neurotypical individuals.
Affiliation(s)
- Gabriele Miceli: Professor of Neurology, Center for Mind/Brain Studies, University of Trento, Trento, Italy
8. Sankaran N, Leonard MK, Theunissen F, Chang EF. Encoding of melody in the human auditory cortex. bioRxiv [Preprint] 2023:2023.10.17.562771. PMID: 37905047; PMCID: PMC10614915; DOI: 10.1101/2023.10.17.562771.
Abstract
Melody is a core component of music in which discrete pitches are serially arranged to convey emotion and meaning. Perception of melody varies along several pitch-based dimensions: (1) the absolute pitch of notes, (2) the difference in pitch between successive notes, and (3) the higher-order statistical expectation of each note conditioned on its prior context. While humans readily perceive melody, how these dimensions are collectively represented in the brain and whether their encoding is specialized for music remains unknown. Here, we recorded high-density neurophysiological activity directly from the surface of human auditory cortex while Western participants listened to Western musical phrases. Pitch, pitch-change, and expectation were selectively encoded at different cortical sites, indicating a spatial code for representing distinct dimensions of melody. The same participants listened to spoken English, and we compared evoked responses to music and speech. Cortical sites selective for music were systematically driven by the encoding of expectation. In contrast, sites that encoded pitch and pitch-change used the same neural code to represent equivalent properties of speech. These findings reveal the multidimensional nature of melody encoding, consisting of both music-specific and domain-general sound representations in auditory cortex. Teaser: The human brain contains both general-purpose and music-specific neural populations for processing distinct attributes of melody.
9. Markow ZE, Trobaugh JW, Richter EJ, Tripathy K, Rafferty SM, Svoboda AM, Schroeder ML, Burns-Yocum TM, Bergonzi KM, Chevillet MA, Mugler EM, Eggebrecht AT, Culver JP. Ultra-high density imaging arrays for diffuse optical tomography of human brain improve resolution, signal-to-noise, and information decoding. bioRxiv [Preprint] 2023:2023.07.21.549920. PMID: 37547013; PMCID: PMC10401969; DOI: 10.1101/2023.07.21.549920.
Abstract
Functional magnetic resonance imaging (fMRI) has dramatically advanced non-invasive human brain mapping and decoding. Functional near-infrared spectroscopy (fNIRS) and high-density diffuse optical tomography (HD-DOT), like fMRI, non-invasively measure blood oxygen fluctuations related to brain activity at the brain surface, using lighter-weight equipment that circumvents the ergonomic and logistical limitations of fMRI. HD-DOT grids have smaller inter-optode spacing (∼13 mm) than sparse fNIRS (∼30 mm) and therefore provide higher image quality, with spatial resolution roughly half that of fMRI. Herein, simulations indicated that reducing inter-optode spacing to 6.5 mm would further improve image quality and the noise-resolution tradeoff, with diminishing returns below 6.5 mm. We then constructed an ultra-high-density DOT system (6.5 mm spacing) with 140 dB dynamic range that imaged stimulus-evoked activations with 30-50% higher spatial resolution, and repeatable multi-focal activity in excellent agreement with participant-matched fMRI. Further, this system decoded visual stimulus position with 19-35% lower error than previous HD-DOT throughout occipital cortex.
10. Bellur A, Thakkar K, Elhilali M. Explicit-memory multiresolution adaptive framework for speech and music separation. EURASIP J Audio Speech Music Process 2023; 2023:20. PMID: 37181589; PMCID: PMC10169896; DOI: 10.1186/s13636-023-00286-7.
Abstract
The human auditory system employs a number of principles to facilitate the selection of perceptually separated streams from a complex sound mixture. The brain leverages multi-scale redundant representations of the input and uses memory (or priors) to guide the selection of a target sound from the input mixture. Moreover, feedback mechanisms refine the memory constructs resulting in further improvement of selectivity of a particular sound object amidst dynamic backgrounds. The present study proposes a unified end-to-end computational framework that mimics these principles for sound source separation applied to both speech and music mixtures. While the problems of speech enhancement and music separation have often been tackled separately due to constraints and specificities of each signal domain, the current work posits that common principles for sound source separation are domain-agnostic. In the proposed scheme, parallel and hierarchical convolutional paths map input mixtures onto redundant but distributed higher-dimensional subspaces and utilize the concept of temporal coherence to gate the selection of embeddings belonging to a target stream abstracted in memory. These explicit memories are further refined through self-feedback from incoming observations in order to improve the system's selectivity when faced with unknown backgrounds. The model yields stable outcomes of source separation for both speech and music mixtures and demonstrates benefits of explicit memory as a powerful representation of priors that guide information selection from complex inputs.
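The temporal-coherence principle the framework builds on can be sketched with toy arrays: embedding channels whose activity correlates over time with a remembered target template are gated through, and the rest are suppressed. This is not the paper's network, only the selection rule it relies on.

```python
# Sketch: memory-gated selection of embedding channels by temporal coherence
# with a target template.
import numpy as np

rng = np.random.default_rng(5)
T, C = 400, 16
target = rng.standard_normal(T)               # remembered target stream (memory)
emb = rng.standard_normal((C, T))             # channels of a learned embedding
emb[:4] += 1.5 * target                       # 4 channels carry the target stream

coh = np.array([np.corrcoef(ch, target)[0, 1] for ch in emb])
gate = (coh - coh.min()) / (coh.max() - coh.min())   # soft selection weights
selected = gate[:, None] * emb                # memory-gated embeddings
print("top channels:", np.argsort(coh)[-4:])  # expect channels 0..3
```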
Affiliation(s)
- Ashwin Bellur: Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Karan Thakkar: Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
- Mounya Elhilali: Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
11. Giordano BL, Esposito M, Valente G, Formisano E. Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds. Nat Neurosci 2023; 26:664-672. PMID: 36928634; PMCID: PMC10076214; DOI: 10.1038/s41593-023-01285-9.
Abstract
Recognizing sounds implicates the cerebral transformation of input waveforms into semantic representations. Although past research identified the superior temporal gyrus (STG) as a crucial cortical region, the computational fingerprint of these cerebral transformations remains poorly characterized. Here, we exploited a model comparison framework and contrasted the ability of acoustic, semantic (continuous and categorical), and sound-to-event deep neural network representation models to predict perceived sound dissimilarity and 7 T human auditory cortex functional magnetic resonance imaging responses. We confirm that spectrotemporal modulations predict early auditory cortex (Heschl's gyrus) responses, and that auditory dimensions (for example, loudness, periodicity) predict STG responses and perceived dissimilarity. Sound-to-event deep neural networks predict Heschl's gyrus responses similar to acoustic models but, notably, they outperform all competing models at predicting both STG responses and perceived dissimilarity. Our findings indicate that STG entails intermediate acoustic-to-semantic sound representations that neither acoustic nor semantic models can account for. These representations are compositional in nature and relevant to behavior.
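The model-comparison logic resembles representational similarity analysis: build one representational dissimilarity matrix (RDM) per candidate feature space and rank models by how well their RDM correlates with the behavioral or fMRI RDM. A sketch with random stand-in features, where the "winning" model is built in by construction:

```python
# Sketch: rank feature models by RDM correlation with behavior.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(6)
n_sounds = 40
dnn = rng.standard_normal((n_sounds, 128))      # sound-to-event DNN features
acoustic = rng.standard_normal((n_sounds, 64))  # spectrotemporal features
behavior_rdm = pdist(dnn) + 0.3 * rng.standard_normal(n_sounds * (n_sounds - 1) // 2)

for name, F in [("acoustic", acoustic), ("DNN", dnn)]:
    rho = spearmanr(pdist(F), behavior_rdm).correlation
    print(f"{name}: rho = {rho:.2f}")           # DNN should win by construction
```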
Affiliation(s)
- Bruno L Giordano: Institut de Neurosciences de La Timone, UMR 7289, CNRS and Université Aix-Marseille, Marseille, France
- Michele Esposito: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Giancarlo Valente: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands
- Elia Formisano: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Faculty of Science and Engineering, Maastricht University, Maastricht, the Netherlands; Brightlands Institute for Smart Society (BISS), Maastricht University, Maastricht, the Netherlands
12. Setti F, Handjaras G, Bottari D, Leo A, Diano M, Bruno V, Tinti C, Cecchetti L, Garbarini F, Pietrini P, Ricciardi E. A modality-independent proto-organization of human multisensory areas. Nat Hum Behav 2023; 7:397-410. PMID: 36646839; PMCID: PMC10038796; DOI: 10.1038/s41562-022-01507-3.
Abstract
The processing of multisensory information is based upon the capacity of brain regions, such as the superior temporal cortex, to combine information across modalities. However, it is still unclear whether the representation of coherent auditory and visual events requires any prior audiovisual experience to develop and function. Here we measured brain synchronization during the presentation of an audiovisual, audio-only or video-only version of the same narrative in distinct groups of sensory-deprived (congenitally blind and deaf) and typically developed individuals. Intersubject correlation analysis revealed that the superior temporal cortex was synchronized across auditory and visual conditions, even in sensory-deprived individuals who lack any audiovisual experience. This synchronization was primarily mediated by low-level perceptual features, and relied on a similar modality-independent topographical organization of slow temporal dynamics. The human superior temporal cortex is naturally endowed with a functional scaffolding to yield a common representation across multisensory events.
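Intersubject correlation (ISC) itself is compact enough to sketch: correlate each subject's regional time course with the mean of all other subjects; the cross-modal variant applies the same logic across the audio-only and video-only groups. Synthetic data below, not the study's pipeline.

```python
# Sketch: leave-one-out intersubject correlation for one ROI time course.
import numpy as np

rng = np.random.default_rng(7)
n_subj, T = 10, 300
shared = rng.standard_normal(T)               # stimulus-driven component
data = shared + 0.8 * rng.standard_normal((n_subj, T))  # one ROI time course per subject

isc = [np.corrcoef(data[s], data[np.arange(n_subj) != s].mean(0))[0, 1]
       for s in range(n_subj)]
print("mean leave-one-out ISC:", np.round(np.mean(isc), 2))
```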
Affiliation(s)
- Francesca Setti: MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Davide Bottari: MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Andrea Leo: Department of Translational Research and Advanced Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy
- Matteo Diano: Department of Psychology, University of Turin, Turin, Italy
- Valentina Bruno: Manibus Lab, Department of Psychology, University of Turin, Turin, Italy
- Carla Tinti: Department of Psychology, University of Turin, Turin, Italy
- Luca Cecchetti: MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
- Pietro Pietrini: MoMiLab, IMT School for Advanced Studies Lucca, Lucca, Italy
13. Carta S, Mangiacotti AMA, Valdes AL, Reilly RB, Franco F, Di Liberto GM. The impact of temporal synchronisation imprecision on TRF analyses. J Neurosci Methods 2023; 385:109765. PMID: 36481165; DOI: 10.1016/j.jneumeth.2022.109765.
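The core issue named in the title can be simulated in a few lines: timing error between stimulus triggers and the recording smears and attenuates the estimated response. A toy simulation, using simple event-related averaging rather than the paper's full TRF machinery; all numbers are arbitrary.

```python
# Sketch: trigger jitter blurs the estimated stimulus-locked response.
import numpy as np

rng = np.random.default_rng(8)
fs, T, L = 1000, 200000, 300                 # 1 kHz, 200 s, 300 ms window
kernel = np.exp(-np.arange(L) / 50) * np.sin(np.arange(L) / 20)  # true response
events = rng.choice(np.arange(L, T - 2 * L), 400, replace=False)
resp = rng.standard_normal(T)
for e in events:
    resp[e:e + L] += kernel                  # each event evokes the kernel

for jitter_ms in (0, 20):                    # timing error on recorded triggers
    marks = events + rng.normal(0, jitter_ms, events.size).astype(int)
    est = np.mean([resp[m:m + L] for m in marks], axis=0)
    print(f"jitter sd {jitter_ms} ms -> corr with true response "
          f"{np.corrcoef(est, kernel)[0, 1]:.2f}")
```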
Affiliation(s)
- Sara Carta: ADAPT Centre, Trinity College, The University of Dublin, Ireland; School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland
- Anthony M A Mangiacotti: Department of Psychology, Middlesex University, London, United Kingdom; FISPPA Department, University of Padova, Padova, Italy
- Alejandro Lopez Valdes: Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland; Global Brain Health Institute, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland; School of Engineering, Trinity College, The University of Dublin, Ireland
- Richard B Reilly: Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland; School of Engineering, Trinity College, The University of Dublin, Ireland; School of Medicine, Trinity College, The University of Dublin, Ireland
- Fabia Franco: Department of Psychology, Middlesex University, London, United Kingdom
- Giovanni M Di Liberto: ADAPT Centre, Trinity College, The University of Dublin, Ireland; School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
14. Lage-Castellanos A, De Martino F, Ghose GM, Gulban OF, Moerel M. Selective attention sharpens population receptive fields in human auditory cortex. Cereb Cortex 2022; 33:5395-5408. PMID: 36336333; PMCID: PMC10152083; DOI: 10.1093/cercor/bhac427.
Abstract
Selective attention enables the preferential processing of relevant stimulus aspects. Invasive animal studies have shown that attending to a sound feature rapidly modifies neuronal tuning throughout the auditory cortex. Human neuroimaging studies have reported enhanced auditory cortical responses with selective attention. To date, it remains unclear how the results obtained with functional magnetic resonance imaging (fMRI) in humans relate to the electrophysiological findings in animal models. Here we aim to narrow the gap between animal and human research by combining a selective attention task similar in design to those used in animal electrophysiology with high spatial resolution ultra-high field fMRI at 7 Tesla. Specifically, human participants performed a detection task in which the probability of target occurrence varied with sound frequency. Contrary to previous fMRI studies, we show that selective attention resulted in population receptive field sharpening, and consequently reduced responses, at the attended sound frequencies. The difference between our results and those of previous fMRI studies supports the notion that the influence of selective attention on auditory cortex is diverse and may depend on context, stimulus, and task.
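The pRF analysis can be sketched as fitting a Gaussian tuning curve to a voxel's responses across sound frequencies and comparing the fitted width between attention conditions. A synthetic single-voxel example, with the sharpening built in by construction:

```python
# Sketch: Gaussian pRF fit per condition; compare tuning width (sigma).
import numpy as np
from scipy.optimize import curve_fit

freqs = np.linspace(0, 1, 20)                     # log-frequency axis (a.u.)
gauss = lambda f, amp, mu, sigma: amp * np.exp(-(f - mu) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(9)
for cond, sigma_true in [("unattended", 0.20), ("attended", 0.12)]:
    resp = gauss(freqs, 1.0, 0.5, sigma_true) + 0.05 * rng.standard_normal(freqs.size)
    (amp, mu, sigma), _ = curve_fit(gauss, freqs, resp, p0=[1, 0.5, 0.2])
    print(f"{cond}: fitted pRF width sigma = {abs(sigma):.2f}")
```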
Affiliation(s)
- Agustin Lage-Castellanos: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD Maastricht, The Netherlands; Maastricht Brain Imaging Center (MBIC), 6200 MD Maastricht, The Netherlands; Department of NeuroInformatics, Cuban Neuroscience Center, Havana City 11600, Cuba
- Federico De Martino: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD Maastricht, The Netherlands; Maastricht Brain Imaging Center (MBIC), 6200 MD Maastricht, The Netherlands; Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN 55455, United States
- Geoffrey M Ghose: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, MN 55455, United States
- Michelle Moerel: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD Maastricht, The Netherlands; Maastricht Brain Imaging Center (MBIC), 6200 MD Maastricht, The Netherlands; Maastricht Centre for Systems Biology, Maastricht University, 6200 MD Maastricht, The Netherlands
15. Dheerendra P, Baumann S, Joly O, Balezeau F, Petkov CI, Thiele A, Griffiths TD. The Representation of Time Windows in Primate Auditory Cortex. Cereb Cortex 2021; 32:3568-3580. PMID: 34875029; PMCID: PMC9376871; DOI: 10.1093/cercor/bhab434.
Abstract
Whether human and nonhuman primates process the temporal dimension of sound similarly remains an open question. We examined the brain basis for the processing of acoustic time windows in rhesus macaques using stimuli simulating the spectrotemporal complexity of vocalizations. We conducted functional magnetic resonance imaging in awake macaques to identify the functional anatomy of response patterns to different time windows. We then contrasted it against the responses to identical stimuli used previously in humans. Despite a similar overall pattern, ranging from the processing of shorter time windows in core areas to longer time windows in lateral belt and parabelt areas, monkeys exhibited lower sensitivity to longer time windows than humans. This difference in neuronal sensitivity might be explained by a specialization of the human brain for processing longer time windows in speech.
Affiliation(s)
- Pradeep Dheerendra: Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK; Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G128QB, UK
- Simon Baumann: National Institute of Mental Health, NIH, Bethesda, MD 20892-1148, USA; Department of Psychology, University of Turin, Torino 10124, Italy
- Olivier Joly: Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Fabien Balezeau: Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Alexander Thiele: Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Timothy D Griffiths: Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
16. Schmitt LM, Erb J, Tune S, Rysop AU, Hartwigsen G, Obleser J. Predicting speech from a cortical hierarchy of event-based time scales. Sci Adv 2021; 7:eabi6070. PMID: 34860554; PMCID: PMC8641937; DOI: 10.1126/sciadv.abi6070.
Abstract
How do predictions in the brain incorporate the temporal unfolding of context in our natural environment? We here provide evidence for a neural coding scheme that sparsely updates contextual representations at the boundary of events. This yields a hierarchical, multilayered organization of predictive language comprehension. Training artificial neural networks to predict the next word in a story at five stacked time scales and then using model-based functional magnetic resonance imaging, we observe an event-based “surprisal hierarchy” evolving along a temporoparietal pathway. Along this hierarchy, surprisal at any given time scale gated bottom-up and top-down connectivity to neighboring time scales. In contrast, surprisal derived from continuously updated context influenced temporoparietal activity only at short time scales. Representing context in the form of increasingly coarse events constitutes a network architecture for making predictions that is both computationally efficient and contextually diverse.
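The stacked-time-scale idea can be sketched with crude n-gram models whose context length varies, standing in for the neural language models that the study trimmed to context windows of different lengths; add-one smoothing and the toy corpus are illustrative only.

```python
# Sketch: word surprisal under models with increasingly long context.
import numpy as np
from collections import Counter

text = ("the dog chased the cat and the cat ran up the tree and "
        "the dog barked at the tree").split()
vocab = sorted(set(text))

def mean_surprisal(order):                     # context length = `order` words
    counts, ctx = Counter(), Counter()
    for i in range(order, len(text)):
        counts[tuple(text[i - order:i + 1])] += 1
        ctx[tuple(text[i - order:i])] += 1
    s = [-np.log2((counts[tuple(text[i - order:i + 1])] + 1)
                  / (ctx[tuple(text[i - order:i])] + len(vocab)))
         for i in range(order, len(text))]
    return np.mean(s)

for order in (0, 1, 2):
    print(f"context {order} words: mean surprisal {mean_surprisal(order):.2f} bits")
```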
Affiliation(s)
- Lea-Maria Schmitt: Department of Psychology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany; Center of Brain, Behavior and Metabolism, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany
- Julia Erb: Department of Psychology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany; Center of Brain, Behavior and Metabolism, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany
- Sarah Tune: Department of Psychology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany; Center of Brain, Behavior and Metabolism, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany
- Anna U. Rysop: Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1 A, 04103 Leipzig, Germany
- Gesa Hartwigsen: Lise Meitner Research Group Cognition and Plasticity, Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1 A, 04103 Leipzig, Germany
- Jonas Obleser: Department of Psychology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany; Center of Brain, Behavior and Metabolism, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany
17. Predicting neuronal response properties from hemodynamic responses in the auditory cortex. Neuroimage 2021; 244:118575. PMID: 34517127; DOI: 10.1016/j.neuroimage.2021.118575.
Abstract
Recent functional MRI (fMRI) studies have highlighted differences in responses to natural sounds along the rostral-caudal axis of the human superior temporal gyrus. However, due to the indirect nature of the fMRI signal, it has been challenging to relate these fMRI observations to actual neuronal response properties. To bridge this gap, we present a forward model of the fMRI responses to natural sounds combining a neuronal model of the auditory cortex with physiological modeling of the hemodynamic BOLD response. Neuronal responses are modeled with a dynamic recurrent firing rate model, reflecting the tonotopic, hierarchical processing in the auditory cortex along with the spectro-temporal tradeoff in the rostral-caudal axis of its belt areas. To link modeled neuronal response properties with human fMRI data in the auditory belt regions, we generated a space of neuronal models, which differed parametrically in spectral and temporal specificity of neuronal responses. Then, we obtained predictions of fMRI responses through a biophysical model of the hemodynamic BOLD response (P-DCM). Using Bayesian model comparison, our results showed that the hemodynamic BOLD responses of the caudal belt regions in the human auditory cortex were best explained by modeling faster temporal dynamics and broader spectral tuning of neuronal populations, while rostral belt regions were best explained through fine spectral tuning combined with slower temporal dynamics. These results support the hypotheses of complementary neural information processing along the rostral-caudal axis of the human superior temporal gyrus.
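The forward-modeling chain can be sketched in two stages: a leaky-integrator firing-rate model with a region-specific time constant, followed by a hemodynamic stage. For brevity, a canonical double-gamma HRF stands in for the paper's physiological P-DCM model, so this only illustrates the overall neuronal-to-BOLD pipeline.

```python
# Sketch: firing-rate dynamics (fast vs. slow time constants) -> BOLD prediction.
import numpy as np
from scipy.stats import gamma

dt, T = 0.1, 60.0
t = np.arange(0, T, dt)
stim = ((t % 20) < 5).astype(float)            # 5 s sound blocks every 20 s

def firing_rate(tau):                          # leaky integration of the input
    r = np.zeros_like(t)
    for i in range(1, t.size):
        r[i] = r[i - 1] + dt * (-r[i - 1] + stim[i]) / tau
    return r

hrf_t = np.arange(0, 30, dt)
hrf = gamma.pdf(hrf_t, 6) - 0.1 * gamma.pdf(hrf_t, 16)   # double-gamma HRF

for name, tau in [("caudal (fast)", 0.2), ("rostral (slow)", 2.0)]:
    bold = np.convolve(firing_rate(tau), hrf)[:t.size] * dt
    print(f"{name}: peak predicted BOLD {bold.max():.3f}")
```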
18. Moerel M, Yacoub E, Gulban OF, Lage-Castellanos A, De Martino F. Using high spatial resolution fMRI to understand representation in the auditory network. Prog Neurobiol 2021; 207:101887. PMID: 32745500; PMCID: PMC7854960; DOI: 10.1016/j.pneurobio.2020.101887.
Abstract
Following rapid methodological advances, ultra-high field (UHF) functional and anatomical magnetic resonance imaging (MRI) has been repeatedly and successfully used for the investigation of the human auditory system in recent years. Here, we review this work and argue that UHF MRI is uniquely suited to shed light on how sounds are represented throughout the network of auditory brain regions. That is, the provided gain in spatial resolution at UHF can be used to study the functional role of the small subcortical auditory processing stages and details of cortical processing. Further, by combining high spatial resolution with the versatility of MRI contrasts, UHF MRI has the potential to localize the primary auditory cortex in individual hemispheres. This is a prerequisite to study how sound representation in higher-level auditory cortex evolves from that in early (primary) auditory cortex. Finally, the access to independent signals across auditory cortical depths, as afforded by UHF, may reveal the computations that underlie the emergence of an abstract, categorical sound representation based on low-level acoustic feature processing. Efforts on these research topics are underway. Here we discuss promises as well as challenges that come with studying these research questions using UHF MRI, and provide a future outlook.
Affiliation(s)
- Michelle Moerel: Maastricht Centre for Systems Biology, Maastricht University, Maastricht, the Netherlands; Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht, the Netherlands
- Essa Yacoub: Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, USA
- Omer Faruk Gulban: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht, the Netherlands; Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, USA; Brain Innovation B.V., Maastricht, the Netherlands
- Agustin Lage-Castellanos: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht, the Netherlands; Department of NeuroInformatics, Cuban Center for Neuroscience, Cuba
- Federico De Martino: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, the Netherlands; Maastricht Brain Imaging Center (MBIC), Maastricht, the Netherlands; Center for Magnetic Resonance Research, Department of Radiology, University of Minnesota, Minneapolis, USA
19. Fuglsang SA, Madsen KH, Puonti O, Hjortkjær J, Siebner HR. Mapping cortico-subcortical sensitivity to 4 Hz amplitude modulation depth in human auditory system with functional MRI. Neuroimage 2021; 246:118745. PMID: 34808364; DOI: 10.1016/j.neuroimage.2021.118745.
Abstract
Temporal modulations in the envelope of acoustic waveforms at rates around 4 Hz constitute a strong acoustic cue in speech and other natural sounds. It is often assumed that the ascending auditory pathway is increasingly sensitive to slow amplitude modulation (AM), but sensitivity to AM is typically considered separately for individual stages of the auditory system. Here, we used blood oxygen level dependent (BOLD) fMRI in twenty human subjects (10 male) to measure sensitivity of regional neural activity in the auditory system to 4 Hz temporal modulations. Participants were exposed to AM noise stimuli varying parametrically in modulation depth to characterize modulation-depth effects on BOLD responses. A Bayesian hierarchical modeling approach was used to model potentially nonlinear relations between AM depth and group-level BOLD responses in auditory regions of interest (ROIs). Sound stimulation activated the auditory brainstem and cortex structures in single subjects. BOLD responses to noise exposure in core and belt auditory cortices scaled positively with modulation depth. This finding was corroborated by whole-brain cluster-level inference. Sensitivity to AM depth variations was particularly pronounced in the Heschl's gyrus but also found in higher-order auditory cortical regions. None of the sound-responsive subcortical auditory structures showed a BOLD response profile that reflected the parametric variation in AM depth. The results are compatible with the notion that early auditory cortical regions play a key role in processing low-rate modulation content of sounds in the human auditory system.
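The stimulus manipulation is simple to sketch: 4 Hz amplitude-modulated noise whose modulation depth varies parametrically, with overall level equalized across depths. Durations and depth values below are arbitrary choices, not the study's exact parameters.

```python
# Sketch: 4 Hz AM noise with parametric modulation depth m, RMS-equalized.
import numpy as np

fs, dur, fm = 16000, 2.0, 4.0                  # sample rate, seconds, AM rate
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(10)
carrier = rng.standard_normal(t.size)

for m in (0.0, 0.25, 0.5, 1.0):                # modulation depths
    am = (1 + m * np.sin(2 * np.pi * fm * t)) * carrier
    rms = np.sqrt(np.mean(am ** 2))
    am /= rms                                  # equalize overall level across depths
    print(f"depth {m:.2f}: raw RMS {rms:.2f} (equalized to 1)")
```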
Affiliation(s)
- Søren A Fuglsang: Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Amager and Hvidovre, Hvidovre, Denmark
- Kristoffer H Madsen: Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Amager and Hvidovre, Hvidovre, Denmark; Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
- Oula Puonti: Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Amager and Hvidovre, Hvidovre, Denmark; Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Jens Hjortkjær: Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Amager and Hvidovre, Hvidovre, Denmark; Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark
- Hartwig R Siebner: Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Amager and Hvidovre, Hvidovre, Denmark; Department of Neurology, Copenhagen University Hospital Bispebjerg and Frederiksberg, Copenhagen, Denmark; Department of Clinical Medicine, Faculty of Medical and Health Sciences, University of Copenhagen, Copenhagen, Denmark
20. Bhaya-Grossman I, Chang EF. Speech Computations of the Human Superior Temporal Gyrus. Annu Rev Psychol 2022; 73:79-102.
Abstract
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
Affiliation(s)
- Ilina Bhaya-Grossman: Department of Neurological Surgery, University of California, San Francisco, California 94143, USA; Joint Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, California 94720, USA
- Edward F Chang: Department of Neurological Surgery, University of California, San Francisco, California 94143, USA
21. Khalighinejad B, Patel P, Herrero JL, Bickel S, Mehta AD, Mesgarani N. Functional characterization of human Heschl's gyrus in response to natural speech. Neuroimage 2021; 235:118003. PMID: 33789135; PMCID: PMC8608271; DOI: 10.1016/j.neuroimage.2021.118003.
Abstract
Heschl's gyrus (HG) is a brain area that includes the primary auditory cortex in humans. Due to the limitations in obtaining direct neural measurements from this region during naturalistic speech listening, the functional organization and the role of HG in speech perception remain uncertain. Here, we used intracranial EEG to directly record neural activity in HG in eight neurosurgical patients as they listened to continuous speech stories. We studied the spatial distribution of acoustic tuning and the organization of linguistic feature encoding. We found a main gradient of change from posteromedial to anterolateral parts of HG. We also observed a decrease in frequency and temporal modulation tuning and an increase in phonemic representation, speaker normalization, speech sensitivity, and response latency. We did not observe a difference between the two brain hemispheres. These findings reveal a functional role for HG in processing and transforming simple to complex acoustic features and inform neurophysiological models of speech processing in the human auditory cortex.
Collapse
Affiliation(s)
- Bahar Khalighinejad
- Mortimer B. Zuckerman Brain Behavior Institute, Columbia University, New York, NY, United States; Department of Electrical Engineering, Columbia University, New York, NY, United States
| | - Prachi Patel
- Mortimer B. Zuckerman Brain Behavior Institute, Columbia University, New York, NY, United States; Department of Electrical Engineering, Columbia University, New York, NY, United States
| | - Jose L. Herrero
- Hofstra Northwell School of Medicine, Manhasset, NY, United States; The Feinstein Institutes for Medical Research, Manhasset, NY, United States
| | - Stephan Bickel
- Hofstra Northwell School of Medicine, Manhasset, NY, United States; The Feinstein Institutes for Medical Research, Manhasset, NY, United States
| | - Ashesh D. Mehta
- Hofstra Northwell School of Medicine, Manhasset, NY, United States; The Feinstein Institutes for Medical Research, Manhasset, NY, United States
| | - Nima Mesgarani
- Mortimer B. Zuckerman Brain Behavior Institute, Columbia University, New York, NY, United States; Department of Electrical Engineering, Columbia University, New York, NY, United States; Corresponding author at: Department of Electrical Engineering, Columbia University, New York, NY, United States.
| |
Collapse
|
23
|
Riad R, Karadayi J, Bachoud-Lévi AC, Dupoux E. Learning spectro-temporal representations of complex sounds with parameterized neural networks. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:353. [PMID: 34340514 DOI: 10.1121/10.0005482] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Accepted: 06/08/2021] [Indexed: 06/13/2023]
Abstract
Deep learning models have become potential candidates for auditory neuroscience research thanks to their recent successes in a variety of auditory tasks, yet these models often lack the interpretability needed to fully understand the exact computations they perform. Here, we proposed a parametrized neural network layer that computes specific spectro-temporal modulations based on Gabor filters [learnable spectro-temporal filters (STRFs)] and is fully interpretable. We evaluated this layer on speech activity detection, speaker verification, urban sound classification, and zebra finch call type classification. We found that models based on learnable STRFs are on par with the state of the art for all tasks and obtain the best performance for speech activity detection. Because this layer remains a Gabor filter, it is fully interpretable, and we used quantitative measures to describe the distribution of the learned spectro-temporal modulations. Filters adapted to each task and focused mostly on low temporal and spectral modulations. The analyses show that the filters learned on human speech have spectro-temporal parameters similar to those measured directly in the human auditory cortex. Finally, we observed that the tasks organized themselves in a meaningful way: the human vocalization tasks clustered close to each other, while bird vocalizations lay far from both the human vocalization and urban sound tasks.
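To make the learnable-STRF idea concrete, here is a minimal numpy sketch of a Gabor-parameterized spectro-temporal filter applied to a spectrogram. It illustrates the filter family the abstract describes, not the authors' implementation (their layer learns the Gabor parameters end to end); all parameter values, axis spans, and array shapes below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_strf(omega_t, omega_f, sigma_t, sigma_f, n_t=64, n_f=64):
    """2D Gabor kernel tuned to a temporal modulation rate omega_t (Hz)
    and a spectral modulation scale omega_f (cycles/octave).
    A Gaussian envelope windows a 2D cosine carrier."""
    t = np.linspace(-0.5, 0.5, n_t)   # time axis (s), assumed span
    f = np.linspace(-2.0, 2.0, n_f)   # frequency axis (octaves re: center)
    T, F = np.meshgrid(t, f, indexing="ij")
    envelope = np.exp(-T**2 / (2 * sigma_t**2) - F**2 / (2 * sigma_f**2))
    carrier = np.cos(2 * np.pi * (omega_t * T + omega_f * F))
    return envelope * carrier

spec = np.random.rand(500, 64)        # stand-in (time x frequency) spectrogram
kernel = gabor_strf(omega_t=4.0, omega_f=0.5, sigma_t=0.1, sigma_f=0.5)
response = fftconvolve(spec, kernel, mode="same")  # model "neural" output
```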
Collapse
Affiliation(s)
- Rachid Riad
- Ecole des Hautes Etudes en Sciences Sociales, CNRS, Institut National de Recherche en Informatique et en Automatique, Département d'Études Cognitives, Ecole Normale Supérieure-Paris Sciences et Lettres University, 29 Rue d'Ulm, 75005 Paris, France
| | - Julien Karadayi
- Ecole des Hautes Etudes en Sciences Sociales, CNRS, Institut National de Recherche en Informatique et en Automatique, Département d'Études Cognitives, Ecole Normale Supérieure-Paris Sciences et Lettres University, 29 Rue d'Ulm, 75005 Paris, France
| | - Anne-Catherine Bachoud-Lévi
- NeuroPsychologie Interventionnelle, Département d'Études Cognitives, Ecole Normale Supérieure, Institut National de la Santé et de la Recherche Médicale, Institut Mondor de Recherche Biomédicale, Neuratris, Université Paris-Est Créteil, Paris Sciences et Lettres University, 29 Rue d'Ulm, 75005 Paris, France
| | - Emmanuel Dupoux
- Ecole des Hautes Etudes en Sciences Sociales, CNRS, Institut National de Recherche en Informatique et en Automatique, Département d'Études Cognitives, Ecole Normale Supérieure-Paris Sciences et Lettres University, 29 Rue d'Ulm, 75005 Paris, France
| |
Collapse
|
24
|
Fast Periodic Auditory Stimulation Reveals a Robust Categorical Response to Voices in the Human Brain. eNeuro 2021; 8:ENEURO.0471-20.2021. [PMID: 34016602 PMCID: PMC8225406 DOI: 10.1523/eneuro.0471-20.2021] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Revised: 03/03/2021] [Accepted: 04/04/2021] [Indexed: 11/21/2022] Open
Abstract
Voices are arguably among the most relevant sounds in humans' everyday life, and several studies have suggested the existence of voice-selective regions in the human brain. Despite two decades of research, defining the human brain regions supporting voice recognition remains challenging. Moreover, whether neural selectivity to voices is merely driven by acoustic properties specific to human voices (e.g., spectrogram, harmonicity), or whether it also reflects a higher-level categorization response, is still under debate. Here, we objectively measured rapid automatic categorization responses to human voices with fast periodic auditory stimulation (FPAS) combined with electroencephalography (EEG). Participants were tested with stimulation sequences containing heterogeneous non-vocal sounds from different categories presented at 4 Hz (i.e., four stimuli/s), with vocal sounds appearing every three stimuli (1.333 Hz). A few minutes of stimulation are sufficient to elicit robust 1.333 Hz voice-selective focal brain responses over superior temporal regions of individual participants. This response is virtually absent for sequences using frequency-scrambled sounds, but is clearly observed when voices are presented among sounds from musical instruments matched for pitch and harmonicity-to-noise ratio (HNR). Overall, our FPAS paradigm demonstrates that the human brain seamlessly categorizes human voices when compared with other sounds, including musical instrument sounds matched for low-level acoustic features, and that voice-selective responses are at least partially independent of low-level acoustic features, making FPAS a powerful and versatile tool for understanding human auditory categorization in general.
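The frequency-tagging logic behind FPAS can be sketched in a few lines: the categorical response is read out as spectral amplitude at the oddball frequency (1.333 Hz) relative to neighboring frequency bins. This is a hedged sketch over synthetic data; the sampling rate, sequence length, and the `snr_at` helper are assumptions, not the study's pipeline.

```python
import numpy as np

fs = 512                                  # assumed sampling rate (Hz)
eeg = np.random.randn(60 * fs)            # one 60-s sequence, one channel (stand-in)

spectrum = np.abs(np.fft.rfft(eeg)) / len(eeg)
freqs = np.fft.rfftfreq(len(eeg), d=1 / fs)

def snr_at(target, spectrum, freqs, n_neighbors=10, skip=1):
    """Amplitude at the target frequency divided by the mean amplitude of
    neighboring bins (excluding `skip` bins on each side)."""
    idx = np.argmin(np.abs(freqs - target))
    neighbors = np.r_[spectrum[idx - skip - n_neighbors: idx - skip],
                      spectrum[idx + skip + 1: idx + skip + 1 + n_neighbors]]
    return spectrum[idx] / neighbors.mean()

print("voice-selective response (1.333 Hz):", snr_at(4 / 3, spectrum, freqs))
print("general stimulation response (4 Hz):", snr_at(4.0, spectrum, freqs))
```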
Collapse
|
25
|
Nakai T, Koide-Majima N, Nishimoto S. Correspondence of categorical and feature-based representations of music in the human brain. Brain Behav 2021; 11:e01936. [PMID: 33164348 PMCID: PMC7821620 DOI: 10.1002/brb3.1936] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/17/2020] [Revised: 09/24/2020] [Accepted: 10/21/2020] [Indexed: 01/11/2023] Open
Abstract
INTRODUCTION Humans tend to categorize auditory stimuli into discrete classes, such as animal species, language, musical instrument, and music genre. Of these, music genre is a frequently used dimension of human music preference and is determined based on the categorization of complex auditory stimuli. Neuroimaging studies have reported that the superior temporal gyrus (STG) is involved in response to general music-related features. However, there is considerable uncertainty over how discrete music categories are represented in the brain and which acoustic features are more suited for explaining such representations. METHODS We used a total of 540 music clips to examine comprehensive cortical representations and the functional organization of music genre categories. For this purpose, we applied a voxel-wise modeling approach to music-evoked brain activity measured using functional magnetic resonance imaging. In addition, we introduced a novel technique for feature-brain similarity analysis and assessed how discrete music categories are represented based on the cortical response pattern to acoustic features. RESULTS Our findings indicated distinct cortical organizations for different music genres in the bilateral STG, and they revealed representational relationships between different music genres. On comparing different acoustic feature models, we found that these representations of music genres could be explained largely by a biologically plausible spectro-temporal modulation-transfer function model. CONCLUSION Our findings have elucidated the quantitative representation of music genres in the human cortex, indicating the possibility of modeling this categorization of complex auditory stimuli based on brain activity.
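A voxel-wise modeling analysis of the kind described here can be sketched as regularized regression from stimulus features to each voxel's response, scored by cross-validated prediction accuracy. The sketch below uses scikit-learn on synthetic stand-in data; the feature dimensionality, voxel count, and ridge penalty grid are all assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

X = np.random.randn(540, 100)   # per-clip stimulus features (e.g., modulation energies)
Y = np.random.randn(540, 2000)  # per-clip BOLD response for each voxel (stand-in)

scores = np.zeros(Y.shape[1])
for train, test in KFold(n_splits=5).split(X):
    model = RidgeCV(alphas=np.logspace(0, 4, 9)).fit(X[train], Y[train])
    pred = model.predict(X[test])
    # Pearson r per voxel between predicted and measured responses
    xc = pred - pred.mean(0)
    yc = Y[test] - Y[test].mean(0)
    scores += (xc * yc).sum(0) / np.sqrt((xc**2).sum(0) * (yc**2).sum(0))
scores /= 5  # mean cross-validated prediction accuracy per voxel
```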
Collapse
Affiliation(s)
- Tomoya Nakai
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Graduate School of Frontier Biosciences, Osaka University, Suita, Japan
| | - Naoko Koide-Majima
- Graduate School of Frontier Biosciences, Osaka University, Suita, Japan; AI Science Research and Development Promotion Center, National Institute of Information and Communications Technology, Suita, Japan
| | - Shinji Nishimoto
- Center for Information and Neural Networks, National Institute of Information and Communications Technology, Suita, Japan; Graduate School of Frontier Biosciences, Osaka University, Suita, Japan; Graduate School of Medicine, Osaka University, Suita, Japan
| |
Collapse
|
26
|
Ponsot E, Varnet L, Wallaert N, Daoud E, Shamma SA, Lorenzi C, Neri P. Mechanisms of Spectrotemporal Modulation Detection for Normal- and Hearing-Impaired Listeners. Trends Hear 2021; 25:2331216520978029. [PMID: 33620023 PMCID: PMC7905488 DOI: 10.1177/2331216520978029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 10/26/2020] [Accepted: 11/06/2020] [Indexed: 11/20/2022] Open
Abstract
Spectrotemporal modulations (STM) are essential features of speech signals that make them intelligible. While their encoding has been widely investigated in neurophysiology, we still lack a full understanding of how STMs are processed at the behavioral level and how cochlear hearing loss impacts this processing. Here, we introduce a novel methodological framework based on psychophysical reverse correlation deployed in the modulation space to characterize the mechanisms underlying STM detection in noise. We derive perceptual filters for young normal-hearing and older hearing-impaired individuals performing a detection task of an elementary target STM (a given product of temporal and spectral modulations) embedded in other masking STMs. Analyzed with computational tools, our data show that both groups rely on a comparable linear (band-pass)-nonlinear processing cascade, which can be well accounted for by a temporal modulation filter bank model combined with cross-correlation against the target representation. Our results also suggest that the modulation mistuning observed for the hearing-impaired group results primarily from broader cochlear filters. Yet, we find idiosyncratic behaviors that cannot be captured by cochlear tuning alone, highlighting the need to consider variability originating from additional mechanisms. Overall, this integrated experimental-computational approach offers a principled way to assess suprathreshold processing distortions in each individual and could thus be used to further investigate interindividual differences in speech intelligibility.
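The reverse-correlation logic can be illustrated with a classification-image sketch: average the masker modulation profiles according to the observer's responses, and the difference reveals which spectrotemporal modulations pushed the observer toward "yes". All data and dimensions below are synthetic assumptions; the authors' derivation additionally used model-based analyses.

```python
import numpy as np

# Each trial: a masker profile in the modulation domain (rate x scale) plus the
# observer's yes/no response. All arrays are synthetic stand-ins.
n_trials, n_rates, n_scales = 3000, 16, 16
maskers = np.random.randn(n_trials, n_rates, n_scales)
target_present = np.random.rand(n_trials) < 0.5
responses = np.random.rand(n_trials) < 0.5        # observer said "target present"

def mean_diff(m, said_yes):
    """Mean masker on 'yes' trials minus mean masker on 'no' trials."""
    return m[said_yes].mean(axis=0) - m[~said_yes].mean(axis=0)

# Perceptual filter: average the classification images from signal and noise trials
perceptual_filter = (mean_diff(maskers[target_present], responses[target_present]) +
                     mean_diff(maskers[~target_present], responses[~target_present])) / 2
```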
Collapse
Affiliation(s)
- Emmanuel Ponsot
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
- Hearing Technology @ WAVES, Department of Information Technology, Ghent University, Ghent, Belgium
| | - Léo Varnet
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Nicolas Wallaert
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Elza Daoud
- Aix-Marseille Université, UMR CNRS 7260, Laboratoire Neurosciences Intégratives et Adaptatives, Centre Saint-Charles, Marseille, France
| | - Shihab A. Shamma
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Christian Lorenzi
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
| | - Peter Neri
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, Université PSL, CNRS, Paris, France
Collapse
|
27
|
Pinho AL, Amadon A, Fabre M, Dohmatob E, Denghien I, Torre JJ, Ginisty C, Becuwe-Desmidt S, Roger S, Laurier L, Joly-Testault V, Médiouni-Cloarec G, Doublé C, Martins B, Pinel P, Eger E, Varoquaux G, Pallier C, Dehaene S, Hertz-Pannier L, Thirion B. Subject-specific segregation of functional territories based on deep phenotyping. Hum Brain Mapp 2020; 42:841-870. [PMID: 33368868 PMCID: PMC7856658 DOI: 10.1002/hbm.25189] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Revised: 07/11/2020] [Accepted: 08/04/2020] [Indexed: 11/08/2022] Open
Abstract
Functional magnetic resonance imaging (fMRI) has opened the possibility to investigate how brain activity is modulated by behavior. Most studies so far are bound to one single task, in which functional responses to a handful of contrasts are analyzed and reported as a group average brain map. In contrast, recent data-collection efforts have started to target a systematic spatial representation of multiple mental functions. In this paper, we leverage the Individual Brain Charting (IBC) dataset, a high-resolution task-fMRI dataset acquired in a fixed environment, in order to study the feasibility of individual mapping. First, we verify that the IBC brain maps reproduce those obtained from previous, large-scale datasets using the same tasks. Second, we confirm that the elementary spatial components, inferred across all tasks, are consistently mapped within and, to a lesser extent, across participants. Third, we demonstrate the relevance of the topographic information of the individual contrast maps, showing that contrasts from one task can be predicted by contrasts from other tasks. Finally, we showcase the benefit of contrast accumulation for the fine functional characterization of brain regions within a prespecified network. To this end, we analyze the cognitive profile of functional territories pertaining to the language network and show that these profiles generalize across participants.
Collapse
Affiliation(s)
| | - Alexis Amadon
- Université Paris-Saclay, CEA, CNRS, BAOBAB, NeuroSpin, Gif-sur-Yvette, France
| | - Murielle Fabre
- Cognitive Neuroimaging Unit, INSERM, CEA, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, 91191, France
| | - Elvis Dohmatob
- Université Paris-Saclay, Inria, CEA, Palaiseau, France; Criteo AI Lab, Paris, France
| | - Isabelle Denghien
- Cognitive Neuroimaging Unit, INSERM, CEA, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, 91191, France
| | | | | | | | | | | | | | | | | | | | - Philippe Pinel
- Cognitive Neuroimaging Unit, INSERM, CEA, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, 91191, France
| | - Evelyn Eger
- Cognitive Neuroimaging Unit, INSERM, CEA, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, 91191, France
| | | | - Christophe Pallier
- Cognitive Neuroimaging Unit, INSERM, CEA, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, 91191, France
| | - Stanislas Dehaene
- Cognitive Neuroimaging Unit, INSERM, CEA, Université Paris-Saclay, NeuroSpin center, Gif-sur-Yvette, 91191, France; Collège de France, Paris, France
| | - Lucie Hertz-Pannier
- CEA Saclay/DRF/IFJ/NeuroSpin/UNIACT, Paris, France; UMR 1141, NeuroDiderot, Université de Paris, Paris, France
| | | |
Collapse
|
28
|
Sohoglu E, Davis MH. Rapid computations of spectrotemporal prediction error support perception of degraded speech. eLife 2020; 9:e58077. [PMID: 33147138 PMCID: PMC7641582 DOI: 10.7554/elife.58077] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2020] [Accepted: 10/19/2020] [Indexed: 12/15/2022] Open
Abstract
Human speech perception can be described as Bayesian perceptual inference, but how are these Bayesian computations instantiated neurally? We used magnetoencephalographic recordings of brain responses to degraded spoken words and experimentally manipulated signal quality and prior knowledge. We first demonstrate that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations (e.g. spectrogram or articulatory features). Critically, we found an interaction between speech signal quality and expectations from prior written text on the quality of neural representations; increased signal quality enhanced neural representations of speech that mismatched with prior expectations, but led to greater suppression of speech that matched prior expectations. This interaction is a unique neural signature of prediction error computations and is apparent in neural responses within 100 ms of speech input. Our findings contribute to the detailed specification of a computational model of speech perception based on predictive coding frameworks.
Collapse
Affiliation(s)
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, United Kingdom
| | - Matthew H Davis
- MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom
| |
Collapse
|
29
|
Dynamic Time-Locking Mechanism in the Cortical Representation of Spoken Words. eNeuro 2020; 7:ENEURO.0475-19.2020. [PMID: 32513662 PMCID: PMC7470935 DOI: 10.1523/eneuro.0475-19.2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2019] [Revised: 05/15/2020] [Accepted: 06/01/2020] [Indexed: 11/21/2022] Open
Abstract
Human speech has a unique capacity to carry and communicate rich meanings. However, it is not known how the highly dynamic and variable perceptual signal is mapped to existing linguistic and semantic representations. In a novel approach, we exploited the natural acoustic variability of sounds and mapped them to magnetoencephalography (MEG) data using physiologically inspired machine-learning models. We aimed to determine how well models differing in their representation of temporal information serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers. We discovered that dynamic time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features of speech. In contrast, time-locking was not highlighted in cortical processing of non-speech environmental sounds that conveyed the same meanings as the spoken words, including human-made sounds with temporal modulation content similar to speech. The amplitude envelope of the spoken words was particularly well reconstructed based on cortical evoked responses. Our results indicate that speech is encoded cortically with especially high temporal fidelity. This speech tracking by evoked responses may partly reflect the same underlying neural mechanism as the frequently reported entrainment of cortical oscillations to the amplitude envelope of speech. Furthermore, the phoneme content was reflected in cortical evoked responses simultaneously with the spectrotemporal features, pointing to an instantaneous transformation of the unfolding acoustic features into linguistic representations during speech processing.
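The time-locked (convolutive) encoding idea can be sketched as a lagged ridge regression of the MEG signal on the speech amplitude envelope, yielding a temporal response function (TRF) whose dominant lag plays the role of the latency between sound and cortical activation. Sampling rate, lag range, penalty, and all data below are illustrative assumptions.

```python
import numpy as np

fs = 200                                     # assumed sampling rate (Hz)
envelope = np.abs(np.random.randn(fs * 60))  # word-stream amplitude envelope (stand-in)
meg = np.random.randn(fs * 60)               # one MEG channel (stand-in)

lags = np.arange(0, int(0.4 * fs))           # 0-400 ms lags
X = np.stack([np.roll(envelope, lag) for lag in lags], axis=1)
X[:lags[-1]] = 0                             # zero out wrap-around samples

# Ridge-regularized TRF: the lag of its peak indexes the stimulus-brain latency
lam = 1e2
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ meg)
```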
Collapse
|
30
|
Sohoglu E, Kumar S, Chait M, Griffiths TD. Multivoxel codes for representing and integrating acoustic features in human cortex. Neuroimage 2020; 217:116661. [PMID: 32081785 PMCID: PMC7339141 DOI: 10.1016/j.neuroimage.2020.116661] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2019] [Revised: 02/13/2020] [Accepted: 02/15/2020] [Indexed: 10/25/2022] Open
Abstract
Using fMRI and multivariate pattern analysis, we determined whether spectral and temporal acoustic features are represented by independent or integrated multivoxel codes in human cortex. Listeners heard band-pass noise varying in frequency (spectral) and amplitude-modulation (AM) rate (temporal) features. In the superior temporal plane, changes in multivoxel activity due to frequency were largely invariant with respect to AM rate (and vice versa), consistent with an independent representation. In contrast, in posterior parietal cortex, multivoxel representation was exclusively integrated and tuned to specific conjunctions of frequency and AM features (albeit weakly). Direct between-region comparisons show that whereas independent coding of frequency weakened with increasing levels of the hierarchy, such a progression for AM and integrated coding was less fine-grained and only evident in the higher hierarchical levels from non-core to parietal cortex (with AM coding weakening and integrated coding strengthening). Our findings support the notion that primary auditory cortex can represent spectral and temporal acoustic features in an independent fashion and suggest a role for parietal cortex in feature integration and the structuring of sensory input.
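The independence test in this design can be sketched as cross-condition decoding: train a classifier on frequency at one AM rate and test it at the other. Generalization implies an independent (AM-invariant) frequency code; a drop to chance despite good within-condition decoding implies an integrated code. All patterns and labels below are synthetic stand-ins, not the study's data.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic multivoxel patterns: trials x voxels, with frequency and AM labels
patterns = np.random.randn(400, 300)
freq = np.repeat([0, 1], 200)                # low vs high carrier frequency
am = np.tile(np.repeat([0, 1], 100), 2)      # slow vs fast AM rate

clf = LinearSVC().fit(patterns[am == 0], freq[am == 0])
transfer = clf.score(patterns[am == 1], freq[am == 1])
# transfer comparable to within-condition accuracy -> independent frequency code;
# transfer near chance despite good within-condition accuracy -> integrated code
```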
Collapse
Affiliation(s)
- Ediz Sohoglu
- School of Psychology, University of Sussex, Brighton, BN1 9QH, United Kingdom.
| | - Sukhbinder Kumar
- Institute of Neurobiology, Medical School, Newcastle University, Newcastle Upon Tyne, NE2 4HH, United Kingdom; Wellcome Trust Centre for Human Neuroimaging, University College London, London, WC1N 3BG, United Kingdom
| | - Maria Chait
- Ear Institute, University College London, London, United Kingdom
| | - Timothy D Griffiths
- Institute of Neurobiology, Medical School, Newcastle University, Newcastle Upon Tyne, NE2 4HH, United Kingdom; Wellcome Trust Centre for Human Neuroimaging, University College London, London, WC1N 3BG, United Kingdom
| |
Collapse
|
31
|
Erb J, Schmitt LM, Obleser J. Temporal selectivity declines in the aging human auditory cortex. eLife 2020; 9:55300. [PMID: 32618270 PMCID: PMC7410487 DOI: 10.7554/elife.55300] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2020] [Accepted: 07/02/2020] [Indexed: 12/03/2022] Open
Abstract
Current models successfully describe the auditory cortical response to natural sounds with a set of spectro-temporal features. However, these models have hardly been linked to the ill-understood neurobiological changes that occur in the aging auditory cortex. Modelling the hemodynamic response to a rich natural sound mixture in N = 64 listeners of varying age, we here show that in older listeners' auditory cortex, the key feature of temporal rate is represented with a markedly broader tuning. This loss of temporal selectivity is most prominent in primary auditory cortex and planum temporale, with no such changes in adjacent auditory or other brain areas. Amongst older listeners, we observe a direct relationship between chronological age and temporal-rate tuning, unconfounded by auditory acuity or model goodness of fit. In line with senescent neural dedifferentiation more generally, our results highlight decreased selectivity to temporal information as a hallmark of the aging auditory cortex. It can often be difficult for an older person to understand what someone is saying, particularly in noisy environments. Exactly how and why this age-related change occurs is not clear, but it is thought that older individuals may become less able to tune in to certain features of sound. Newer tools are making it easier to study age-related changes in hearing in the brain. For example, functional magnetic resonance imaging (fMRI) can allow scientists to 'see' and measure how certain parts of the brain react to different features of sound. Using fMRI data, researchers can compare how younger and older people process speech. They can also track how speech processing in the brain changes with age. Now, Erb et al. show that older individuals have a harder time tuning into the rhythm of speech. In the experiments, 64 people between the ages of 18 and 78 were asked to listen to speech in a noisy setting while they underwent fMRI. The researchers then tested a computer model using the data. In the older individuals, the brain's tuning to the timing or rhythm of speech was broader, while the younger participants were more able to finely tune into this feature of sound. The older a person was, the less able their brain was to distinguish rhythms in speech, likely making it harder to understand what had been said. This hearing change likely occurs because brain cells become less specialised over time, which can contribute to many kinds of age-related cognitive decline. This new information about why understanding speech becomes more difficult with age may help scientists develop better hearing aids that are individualised to a person's specific needs.
Collapse
Affiliation(s)
- Julia Erb
- Department of Psychology, University of Lübeck, Lübeck, Germany
| | | | - Jonas Obleser
- Department of Psychology, University of Lübeck, Lübeck, Germany
| |
Collapse
|
32
|
Affiliation(s)
- Daniela Sammler
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
| |
Collapse
|
33
|
Bellur A, Elhilali M. Audio object classification using distributed beliefs and attention. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2020; 28:729-739. [PMID: 33564695 PMCID: PMC7869589 DOI: 10.1109/taslp.2020.2966867] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
One of the unique characteristics of human hearing is its ability to recognize acoustic objects even in the presence of severe noise and distortions. In this work, we explore two mechanisms underlying this ability: 1) redundant mapping of acoustic waveforms along distributed latent representations and 2) adaptive feedback based on prior knowledge to selectively attend to targets of interest. We propose a bio-mimetic account of acoustic object classification by developing a novel distributed deep belief network validated on the task of robust acoustic object classification using the UrbanSound database. The proposed distributed belief network (DBN) encompasses an array of independent sub-networks trained generatively to capture different abstractions of natural sounds. A supervised classifier then performs a readout of this distributed mapping. The overall architecture not only matches the state-of-the-art system for acoustic object classification but leads to significant improvement over the baseline in mismatched noisy conditions (31.4% relative improvement in 0 dB conditions). Furthermore, we incorporate mechanisms of attentional feedback that allow the DBN to deploy local memories of sound targets estimated at multiple views to bias network activation when attending to a particular object. This adaptive feedback results in further improvement of object classification in unseen noise conditions (relative improvement of 54% over the baseline in 0 dB conditions).
Collapse
Affiliation(s)
- Ashwin Bellur
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| |
Collapse
|
34
|
Knott V, Wright N, Shah D, Baddeley A, Bowers H, de la Salle S, Labelle A. Change in the Neural Response to Auditory Deviance Following Cognitive Therapy for Hallucinations in Patients With Schizophrenia. Front Psychiatry 2020; 11:555. [PMID: 32595542 PMCID: PMC7304235 DOI: 10.3389/fpsyt.2020.00555] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Accepted: 06/01/2020] [Indexed: 12/28/2022] Open
Abstract
Adjunctive psychotherapeutic approaches recommended for patients with schizophrenia (SZ) who are fully or partially resistant to pharmacotherapy have rarely utilized biomarkers to enhance the understanding of treatment-effective mechanisms. As SZ patients with persistent auditory verbal hallucinations (AVH) frequently evidence reduced neural responsiveness to external auditory stimulation, which may impact cognitive and functional outcomes, this study examined the effects of cognitive behavioral therapy for voices (CBTv) on clinical and AVH symptoms and on the sensory processing of auditory deviants as measured with the electroencephalographically derived mismatch negativity (MMN) response. Twenty-four patients with SZ and AVH were randomly assigned to group CBTv treatment or a treatment-as-usual (TAU) condition. Patients in the group CBTv condition received treatment for 5 months while the matched control patients received TAU for the same period, followed by 5 months of group CBTv. Assessments were conducted at baseline and at the end of treatment. Although it did not produce consistent changes in the frequency of AVHs, CBTv (vs. TAU) improved patients' appraisal of, and behavioral/emotional responses to, AVHs (p = 0.001), and increased both MMN generation (p = 0.001) and auditory cortex current density (p = 0.002) in response to tone pitch deviants. Improvements in AVH symptoms were correlated with changes in pitch-deviant MMN and current density in left primary auditory cortex. These findings of improved auditory information processing and symptom response attributable to CBTv suggest potential clinical and functional benefits of psychotherapeutic approaches for patients with persistent AVHs.
Collapse
Affiliation(s)
- Verner Knott
- School of Psychology, University of Ottawa, Ottawa, ON, Canada; Clinical Neuroelectrophysiology and Cognitive Research Laboratory, University of Ottawa Institute of Mental Health Research, Ottawa, ON, Canada; Department of Psychiatry, University of Ottawa, Ottawa, ON, Canada
| | - Nicola Wright
- Schizophrenia Program, The Royal Ottawa Mental Health Centre, Ottawa, ON, Canada
| | - Dhrasti Shah
- School of Psychology, University of Ottawa, Ottawa, ON, Canada
| | - Ashley Baddeley
- Clinical Neuroelectrophysiology and Cognitive Research Laboratory, University of Ottawa Institute of Mental Health Research, Ottawa, ON, Canada
| | - Hayley Bowers
- Schizophrenia Program, The Royal Ottawa Mental Health Centre, Ottawa, ON, Canada
| | - Sara de la Salle
- School of Psychology, University of Ottawa, Ottawa, ON, Canada; Clinical Neuroelectrophysiology and Cognitive Research Laboratory, University of Ottawa Institute of Mental Health Research, Ottawa, ON, Canada
| | - Alain Labelle
- Department of Psychiatry, University of Ottawa, Ottawa, ON, Canada; Schizophrenia Program, The Royal Ottawa Mental Health Centre, Ottawa, ON, Canada
| |
Collapse
|
35
|
Herff C, Diener L, Angrick M, Mugler E, Tate MC, Goldrick MA, Krusienski DJ, Slutzky MW, Schultz T. Generating Natural, Intelligible Speech From Brain Activity in Motor, Premotor, and Inferior Frontal Cortices. Front Neurosci 2019; 13:1267. [PMID: 31824257 PMCID: PMC6882773 DOI: 10.3389/fnins.2019.01267] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2019] [Accepted: 11/07/2019] [Indexed: 12/17/2022] Open
Abstract
Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that can preserve conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, we chose subsequent units of speech based on the measured ECoG activity to generate audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.
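The unit-selection idea can be sketched as a nearest-neighbor lookup: for each incoming ECoG feature frame, select the audio unit whose paired training ECoG features are closest, then concatenate the selected units. This omits the authors' refinements (e.g., any smoothing or transition constraints); all arrays, sizes, and the `brain_to_speech` helper are synthetic assumptions.

```python
import numpy as np

# Paired training data: ECoG feature frames and the speech unit (audio snippet)
# recorded at the same moment. All arrays are synthetic stand-ins.
train_ecog = np.random.randn(5000, 64)     # frames x ECoG features
train_units = np.random.randn(5000, 160)   # matching 10-ms audio units @ 16 kHz

def brain_to_speech(test_ecog):
    """Concatenate, per test frame, the training unit whose ECoG features match best."""
    out = []
    for frame in test_ecog:
        dists = np.linalg.norm(train_ecog - frame, axis=1)
        out.append(train_units[np.argmin(dists)])
    return np.concatenate(out)

audio = brain_to_speech(np.random.randn(200, 64))  # 2 s of reconstructed audio
```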
Collapse
Affiliation(s)
- Christian Herff
- School of Mental Health & Neuroscience, Maastricht University, Maastricht, Netherlands
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
| | - Lorenz Diener
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
| | - Miguel Angrick
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
| | - Emily Mugler
- Department of Neurology, Northwestern University, Chicago, IL, United States
| | - Matthew C. Tate
- Department of Neurosurgery, Northwestern University, Chicago, IL, United States
| | - Matthew A. Goldrick
- Department of Linguistics, Northwestern University, Chicago, IL, United States
| | - Dean J. Krusienski
- Biomedical Engineering Department, Virginia Commonwealth University, Richmond, VA, United States
| | - Marc W. Slutzky
- Department of Neurology, Northwestern University, Chicago, IL, United States
- Department of Physiology, Northwestern University, Chicago, IL, United States
- Department of Physical Medicine & Rehabilitation, Northwestern University, Chicago, IL, United States
| | - Tanja Schultz
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
| |
Collapse
|
36
|
Abstract
Humans and other animals use spatial hearing to rapidly localize events in the environment. However, neural encoding of sound location is a complex process involving the computation and integration of multiple spatial cues that are not represented directly in the sensory organ (the cochlea). Our understanding of these mechanisms has increased enormously in the past few years. Current research is focused on the contribution of animal models for understanding human spatial audition, the effects of behavioural demands on neural sound location encoding, the emergence of a cue-independent location representation in the auditory cortex, and the relationship between single-source and concurrent location encoding in complex auditory scenes. Furthermore, computational modelling seeks to unravel how neural representations of sound source locations are derived from the complex binaural waveforms of real-life sounds. In this article, we review and integrate the latest insights from neurophysiological, neuroimaging and computational modelling studies of mammalian spatial hearing. We propose that the cortical representation of sound location emerges from recurrent processing taking place in a dynamic, adaptive network of early (primary) and higher-order (posterior-dorsal and dorsolateral prefrontal) auditory regions. This cortical network accommodates changing behavioural requirements and is especially relevant for processing the location of real-life, complex sounds and complex auditory scenes.
Collapse
|
37
|
Ten Oever S, Sack AT. Interactions Between Rhythmic and Feature Predictions to Create Parallel Time-Content Associations. Front Neurosci 2019; 13:791. [PMID: 31427917 PMCID: PMC6688653 DOI: 10.3389/fnins.2019.00791] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2019] [Accepted: 07/15/2019] [Indexed: 11/13/2022] Open
Abstract
The brain is inherently proactive, constantly predicting the when (moment) and what (content) of future input in order to optimize information processing. Previous research on such predictions has mainly studied the "when" or "what" domain separately, leaving the potential integration of both types of predictive information uninvestigated. In the absence of such integration, temporal cues are assumed to enhance any upcoming content at the predicted moment in time (general temporal predictor). However, if the when and what prediction domains were integrated, a much more flexible neural mechanism could be proposed in which temporal-feature interactions would allow for the creation of multiple concurrent time-content predictions (parallel time-content predictor). Here, we used a temporal association paradigm in two experiments in which sound identity was systematically paired with a specific time delay after the offset of a rhythmic visual input stream. In Experiment 1, we revealed that participants associated the time delay of presentation with the identity of the sound. In Experiment 2, we unexpectedly found that the strength of this temporal association was negatively related to the EEG steady-state evoked responses (SSVEP) in preceding trials: after high neuronal responses, participants responded inconsistently with the time-content associations, similar to adaptation mechanisms. In this experiment, time-content associations were present only when SSVEP responses in previous trials were low. These results tentatively show that it is possible to represent multiple time-content paired predictions in parallel; however, future research is needed to investigate this interaction further.
Collapse
Affiliation(s)
- Sanne Ten Oever
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Centre, Maastricht, Netherlands
| | - Alexander T Sack
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Centre, Maastricht, Netherlands
| |
Collapse
|
38
|
Daube C, Ince RAA, Gross J. Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech. Curr Biol 2019; 29:1924-1937.e9. [PMID: 31130454 PMCID: PMC6584359 DOI: 10.1016/j.cub.2019.04.067] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2018] [Revised: 03/25/2019] [Accepted: 04/25/2019] [Indexed: 01/06/2023]
Abstract
When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that, to extract semantic meaning, the signal is transformed into unknown, intermediate neuronal representations. Traditionally, studies of such intermediate representations are guided by linguistically defined concepts, such as phonemes. Here, we argue that in order to arrive at an unbiased understanding of the neuronal responses to speech, we should focus instead on representations obtained directly from the stimulus. We illustrate our view with a data-driven, information-theoretic analysis of a dataset of 24 young, healthy humans who listened to a 1 h narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results (the improved performance of an encoding model combining annotated linguistic and acoustic features, and the decoding of phoneme subgroups from phoneme-locked responses) can be explained by an encoding model based entirely on acoustic features. These acoustic features capitalize on acoustic edges and outperform Gabor-filtered spectrograms, which can explicitly describe the spectrotemporal characteristics of individual phonemes. By replicating our results in publicly available electroencephalography (EEG) data, we conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, we believe that in order to further our understanding of human cortical responses to speech, we should also explore low-level and parsimonious explanations for apparent high-level phenomena.
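An "acoustic edge" regressor of the kind the paper favors can be sketched as the half-wave-rectified derivative of the amplitude envelope. The sketch below is an assumption-laden illustration (synthetic audio, Hilbert envelope), not the authors' exact feature set.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
audio = np.random.randn(fs * 5)               # stand-in speech waveform
envelope = np.abs(hilbert(audio))             # broadband amplitude envelope
edges = np.maximum(np.diff(envelope, prepend=envelope[0]), 0)
# `edges` peaks at rapid energy onsets; used as a low-level predictor, it can
# mimic apparently phoneme-locked components of the cortical response.
```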
Collapse
Affiliation(s)
- Christoph Daube
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, UK.
| | - Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, UK
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, UK; Institute for Biomagnetism and Biosignalanalysis, University of Münster, Malmedyweg 15, 48149 Münster, Germany
| |
Collapse
|
39
|
Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng 2019; 16:036019. [PMID: 30831567 PMCID: PMC6822609 DOI: 10.1088/1741-2552/ab0c59] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
OBJECTIVE Direct synthesis of speech from neural signals could provide a fast and natural way of communication to people with neurological diseases. Invasively measured brain activity (electrocorticography; ECoG) supplies the necessary temporal and spatial resolution to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the complex dynamics are still not fully understood, and it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech. APPROACH Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology which is well-suited to work with the small amount of data available from each participant. MAIN RESULTS In a study with six participants, we achieved correlations up to r = 0.69 between the reconstructed and original logMel spectrograms. We transferred our prediction back into an audible waveform by applying a WaveNet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output. SIGNIFICANCE To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
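The intermediate logMel target can be computed in a few lines; below is a hedged sketch using librosa with illustrative frame parameters and stand-in audio (the study's exact analysis settings may differ).

```python
import numpy as np
import librosa

sr = 16000
audio = np.random.randn(sr * 3)                    # stand-in for spoken audio
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=400,
                                     hop_length=160, n_mels=40)
logmel = np.log(mel + 1e-6)                        # regression target (40 mel bands x time frames)
```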
Collapse
Affiliation(s)
- Miguel Angrick
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
| | | | | | | | | | | | | |
Collapse
|
40
|
Kriegeskorte N, Douglas PK. Interpreting encoding and decoding models. Curr Opin Neurobiol 2019; 55:167-179. [PMID: 31039527 DOI: 10.1016/j.conb.2019.04.002] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2018] [Revised: 04/08/2019] [Accepted: 04/10/2019] [Indexed: 11/18/2022]
Abstract
Encoding and decoding models are widely used in systems, cognitive, and computational neuroscience to make sense of brain-activity data. However, the interpretation of their results requires care. Decoding models can help reveal whether particular information is present in a brain region in a format the decoder can exploit. Encoding models make comprehensive predictions about representational spaces. In the context of sensory experiments, where stimuli are experimentally controlled, encoding models enable us to test and compare brain-computational theories. Encoding and decoding models typically include fitted linear-model components. Sometimes the weights of the fitted linear combinations are interpreted as reflecting, in an encoding model, the contribution of different sensory features to the representation or, in a decoding model, the contribution of different measured brain responses to a decoded feature. Such interpretations can be problematic when the predictor variables or their noise components are correlated and when priors (or penalties) are used to regularize the fit. Encoding and decoding models are evaluated in terms of their generalization performance. The correct interpretation depends on the level of generalization a model achieves (e.g. to new response measurements for the same stimuli, to new stimuli from the same population, or to stimuli from a different population). Significant decoding or encoding performance of a single model (at whatever level of generality) does not provide strong constraints for theory. Many models must be tested and inferentially compared for analyses to drive theoretical progress.
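One caution the review raises, that fitted linear weights are not directly interpretable, has a standard remedy not spelled out in the abstract: converting decoder weights into encoding ("activation") patterns via the transformation of Haufe et al. (2014). A minimal sketch with synthetic data:

```python
import numpy as np

X = np.random.randn(1000, 50)                          # brain responses: samples x channels
s = X @ np.random.randn(50) + np.random.randn(1000)    # feature to be decoded
w = np.linalg.lstsq(X, s, rcond=None)[0]               # decoding weights

# Weights w mix signal-carrying and noise-cancelling contributions, so large
# weights need not mean strong encoding. The corresponding activation pattern
# is a = Cov(X) w / Var(w^T X):
s_hat = X @ w
a = np.cov(X, rowvar=False) @ w / s_hat.var()
```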
Collapse
Affiliation(s)
- Nikolaus Kriegeskorte
- Department of Psychology, Department of Neuroscience, Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States.
| | - Pamela K Douglas
- Center for Cognitive Neuroscience, University of California, Los Angeles, CA, United States
| |
Collapse
|
41
|
Early Blindness Shapes Cortical Representations of Auditory Frequency within Auditory Cortex. J Neurosci 2019; 39:5143-5152. [PMID: 31010853 DOI: 10.1523/jneurosci.2896-18.2019] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2018] [Revised: 04/03/2019] [Accepted: 04/04/2019] [Indexed: 12/29/2022] Open
Abstract
Early loss of vision is classically linked to large-scale cross-modal plasticity within occipital cortex. Much less is known about the effects of early blindness on auditory cortex. Here, we examine the effects of early blindness on the cortical representation of auditory frequency within human primary and secondary auditory areas using fMRI. We observe that 4 individuals with early blindness (2 females) and a group of 5 individuals with anophthalmia (1 female), a condition in which both eyes fail to develop, have lower response amplitudes and narrower voxelwise tuning bandwidths compared with a group of typically sighted individuals. These results provide some of the first evidence in human participants for compensatory plasticity within nondeprived sensory areas as a result of sensory loss. SIGNIFICANCE STATEMENT Early blindness has been linked to enhanced perception of the auditory world, including auditory localization and pitch perception. Here we used fMRI to compare neural responses to auditory stimuli within auditory cortex across sighted, early blind, and anophthalmic individuals, in whom both eyes fail to develop. We find more refined frequency tuning in blind subjects, providing some of the first evidence in human subjects for compensation within nondeprived primary sensory areas as a result of blindness early in life.
Collapse
|
42
|
Kay K, Jamison KW, Vizioli L, Zhang R, Margalit E, Ugurbil K. A critical assessment of data quality and venous effects in sub-millimeter fMRI. Neuroimage 2019; 189:847-869. [PMID: 30731246 PMCID: PMC7737092 DOI: 10.1016/j.neuroimage.2019.02.006] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2018] [Revised: 02/02/2019] [Accepted: 02/04/2019] [Indexed: 01/07/2023] Open
Abstract
Advances in hardware, pulse sequences, and reconstruction techniques have made it possible to perform functional magnetic resonance imaging (fMRI) at sub-millimeter resolution while maintaining high spatial coverage and acceptable signal-to-noise ratio. Here, we examine whether sub-millimeter fMRI can be used as a routine method for obtaining accurate measurements of fine-scale local neural activity. We conducted fMRI in human visual cortex during a simple event-related visual experiment (7 T, gradient-echo EPI, 0.8-mm isotropic voxels, 2.2-s sampling rate, 84 slices), and developed analysis and visualization tools to assess the quality of the data. Our results fall along three lines of inquiry. First, we find that the acquired fMRI images, combined with appropriate surface-based processing, provide reliable and accurate measurements of fine-scale blood oxygenation level dependent (BOLD) activity patterns. Second, we show that the highly folded structure of cortex causes substantial biases on spatial resolution and data visualization. Third, we examine the well-recognized issue of venous contributions to fMRI signals. In a systematic assessment of large sections of cortex measured at a fine scale, we show that time-averaged T2*-weighted EPI intensity is a simple, robust marker of venous effects. These venous effects are unevenly distributed across cortex, are more pronounced in gyri and outer cortical depths, and are, to a certain degree, in consistent locations across subjects relative to cortical folding. Furthermore, we show that these venous effects are strongly correlated with BOLD responses evoked by the experiment. We conclude that sub-millimeter fMRI can provide robust information about fine-scale BOLD activity patterns, but special care must be exercised in visualizing and interpreting these patterns, especially with regard to the confounding influence of the brain's vasculature. To help translate these methodological findings to neuroscience research, we provide practical suggestions for both high-resolution and standard-resolution fMRI studies.
Collapse
Affiliation(s)
- Kendrick Kay
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, USA.
| | - Keith W Jamison
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, USA
| | - Luca Vizioli
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, USA
| | - Ruyuan Zhang
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, USA
| | - Eshed Margalit
- Stanford Neurosciences Institute, Stanford University, USA
| | - Kamil Ugurbil
- Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, USA
| |
Collapse
|
43
|
Towards reconstructing intelligible speech from the human auditory cortex. Sci Rep 2019; 9:874. [PMID: 30696881 PMCID: PMC6351601 DOI: 10.1038/s41598-018-37359-z] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2018] [Accepted: 11/30/2018] [Indexed: 11/08/2022] Open
Abstract
Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic to establish a direct communication with the brain and has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications. To advance the state-of-the-art in speech neuroprosthesis, we combined the recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and the acoustic representation that is used as the target of reconstruction, including auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which not only can restore communications for paralyzed patients but also have the potential to transform human-computer interaction technologies.
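The linear-regression baseline that the deep models are compared against can be sketched as a ridge mapping from time-lagged neural features back to the auditory spectrogram. Feature rate, lag span, electrode count, penalty, and all data below are synthetic assumptions.

```python
import numpy as np

fs = 100                                       # assumed feature rate (Hz)
neural = np.random.randn(fs * 120, 128)        # high-gamma power: time x electrodes
spec = np.abs(np.random.randn(fs * 120, 32))   # target auditory spectrogram (stand-in)

lags = range(0, 30)                            # use 0-290 ms of subsequent neural context
X = np.hstack([np.roll(neural, -lag, axis=0) for lag in lags])

lam = 1e3                                      # ridge penalty
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ spec)
reconstruction = X @ W                         # linearly reconstructed spectrogram
```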
Collapse
|
44
|
Varoquaux G, Poldrack RA. Predictive models avoid excessive reductionism in cognitive neuroimaging. Curr Opin Neurobiol 2018; 55:1-6. [PMID: 30513462 DOI: 10.1016/j.conb.2018.11.002] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2018] [Revised: 09/15/2018] [Accepted: 11/19/2018] [Indexed: 11/28/2022]
Abstract
Understanding the organization of complex behavior as it relates to the brain requires modeling the behavior, the relevant mental processes, and the corresponding neural activity. Experiments in cognitive neuroscience typically study a psychological process via controlled manipulations, reducing behavior to one of its components. Such reductionism can easily lead to paradigm-bound theories. Predictive models can generalize brain-mind associations to arbitrary new tasks and stimuli. We argue that they are needed to broaden theories beyond specific paradigms. Predicting behavior from neural activity can support robust reverse inference, isolating brain structures that support particular mental processes. The converse prediction enables modeling brain responses as a function of a complete description of the task, rather than building on oppositions.
Collapse
|
45
|
Norman-Haignere SV, McDermott JH. Neural responses to natural and model-matched stimuli reveal distinct computations in primary and nonprimary auditory cortex. PLoS Biol 2018; 16:e2005127. [PMID: 30507943 PMCID: PMC6292651 DOI: 10.1371/journal.pbio.2005127] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Revised: 12/13/2018] [Accepted: 11/08/2018] [Indexed: 11/19/2022] Open
Abstract
A central goal of sensory neuroscience is to construct models that can explain neural responses to natural stimuli. As a consequence, sensory models are often tested by comparing neural responses to natural stimuli with model responses to those stimuli. One challenge is that distinct model features are often correlated across natural stimuli, and thus model features can predict neural responses even if they do not in fact drive them. Here, we propose a simple alternative for testing a sensory model: we synthesize a stimulus that yields the same model response as each of a set of natural stimuli, and test whether the natural and "model-matched" stimuli elicit the same neural responses. We used this approach to test whether a common model of auditory cortex, in which spectrogram-like peripheral input is processed by linear spectrotemporal filters, can explain fMRI responses in humans to natural sounds. Prior studies have shown that this model has good predictive power throughout auditory cortex, but this finding could reflect feature correlations in natural stimuli. We observed that fMRI responses to natural and model-matched stimuli were nearly equivalent in primary auditory cortex (PAC) but that nonprimary regions, including those selective for music or speech, showed highly divergent responses to the two sound sets. This dissociation between primary and nonprimary regions was less clear from model predictions due to the influence of feature correlations across natural stimuli. Our results provide a signature of hierarchical organization in human auditory cortex, and suggest that nonprimary regions compute higher-order stimulus properties that are not well captured by traditional models. Our methodology enables stronger tests of sensory models and could be broadly applied in other domains.
Collapse
Affiliation(s)
- Sam V. Norman-Haignere
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Zuckerman Institute of Mind, Brain and Behavior, Columbia University, New York, New York, United States of America
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, ENS, PSL University, CNRS, Paris, France
| | - Josh H. McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, Massachusetts, United States of America
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| |
Collapse
|
46
|
Venezia JH, Thurman SM, Richards VM, Hickok G. Hierarchy of speech-driven spectrotemporal receptive fields in human auditory cortex. Neuroimage 2018; 186:647-666. [PMID: 30500424 DOI: 10.1016/j.neuroimage.2018.11.049] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2018] [Revised: 10/11/2018] [Accepted: 11/26/2018] [Indexed: 12/22/2022] Open
Abstract
Existing data indicate that cortical speech processing is hierarchically organized. Numerous studies have shown that early auditory areas encode fine acoustic details while later areas encode abstracted speech patterns. However, it remains unclear precisely what speech information is encoded across these hierarchical levels. Estimation of speech-driven spectrotemporal receptive fields (STRFs) provides a means to explore cortical speech processing in terms of acoustic or linguistic information associated with characteristic spectrotemporal patterns. Here, we estimate STRFs from cortical responses to continuous speech in fMRI. Using a novel approach based on filtering randomly selected spectrotemporal modulations (STMs) from aurally presented sentences, STRFs were estimated for a group of listeners and categorized using a data-driven clustering algorithm. 'Behavioral STRFs' highlighting STMs crucial for speech recognition were derived from intelligibility judgments. Clustering revealed that STRFs in the supratemporal plane represented a broad range of STMs, while STRFs in the lateral temporal lobe represented circumscribed STM patterns important to intelligibility. Detailed analysis recovered a bilateral organization with posterior-lateral regions preferentially processing STMs associated with phonological information and anterior-lateral regions preferentially processing STMs associated with word- and phrase-level information. Regions in lateral Heschl's gyrus preferentially processed STMs associated with vocalic information (pitch).
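For readers unfamiliar with STRF estimation, here is a minimal, generic sketch using lagged ridge regression on a simulated spectrogram and response. The paper's own method (filtering randomly selected STMs from sentences) is more elaborate; everything below is simulated and the names are hypothetical.

```python
# Minimal sketch of STRF estimation by regularized (ridge) regression.
# `stim` stands in for a stimulus spectrogram (time x frequency bands),
# `resp` for a cortical response time course.
import numpy as np

rng = np.random.default_rng(0)
n_t, n_freq, n_lags = 2000, 16, 10

stim = rng.standard_normal((n_t, n_freq))

# Ground-truth STRF: responds to one frequency band at a short latency.
true_strf = np.zeros((n_lags, n_freq))
true_strf[3, 8] = 1.0

# Lagged design matrix: column block `lag` holds the stimulus shifted by `lag`.
X = np.zeros((n_t, n_lags * n_freq))
for lag in range(n_lags):
    X[lag:, lag * n_freq:(lag + 1) * n_freq] = stim[:n_t - lag]

resp = X @ true_strf.ravel() + 0.5 * rng.standard_normal(n_t)

# Ridge solution: w = (X'X + lambda*I)^-1 X'y.
lam = 10.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ resp)
strf = w.reshape(n_lags, n_freq)
print("peak at (lag, freq):", np.unravel_index(np.abs(strf).argmax(), strf.shape))
```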
Collapse
Affiliation(s)
- Jonathan H Venezia
- VA Loma Linda Healthcare System, Loma Linda, CA, USA; Dept. of Otolaryngology, School of Medicine, Loma Linda University, Loma Linda, CA, USA.
| | | | - Virginia M Richards
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, USA
| | - Gregory Hickok
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, USA
| |
Collapse
|
47
|
Erb J, Armendariz M, De Martino F, Goebel R, Vanduffel W, Formisano E. Homology and Specificity of Natural Sound-Encoding in Human and Monkey Auditory Cortex. Cereb Cortex 2018; 29:3636-3650. [DOI: 10.1093/cercor/bhy243] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2018] [Revised: 08/08/2018] [Accepted: 09/05/2018] [Indexed: 01/01/2023] Open
Abstract
Understanding homologies and differences in auditory cortical processing in human and nonhuman primates is an essential step in elucidating the neurobiology of speech and language. Using fMRI responses to natural sounds, we investigated the representation of multiple acoustic features in the auditory cortex of awake macaques and humans. Comparative analyses revealed homologous large-scale topographies not only for frequency but also for temporal and spectral modulations. In both species, posterior regions preferentially encoded relatively fast temporal and coarse spectral information, whereas anterior regions encoded slow temporal and fine spectral modulations. Conversely, we observed a striking interspecies difference in cortical sensitivity to temporal modulations: while decoding from macaque auditory cortex was most accurate at fast rates (>30 Hz), human sensitivity was highest at ~3 Hz, a rate relevant for speech analysis. These findings suggest that the characteristic tuning of human auditory cortex to slow temporal modulations is unique and may have emerged as a critical step in the evolution of speech and language.
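The temporal-modulation rates at issue here can be quantified from a sound's amplitude envelope. Below is a minimal sketch, assuming a synthetic amplitude-modulated noise rather than the natural sounds used in the study, and a raw Hilbert envelope rather than the cochlear filterbank a real analysis would typically use.

```python
# Minimal sketch of quantifying temporal-modulation content: take a
# sound's amplitude envelope and compute its modulation spectrum.
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)

# Noise carrier, amplitude-modulated at 3 Hz (a speech-relevant rate).
carrier = rng.standard_normal(t.size)
sound = (1 + np.sin(2 * np.pi * 3 * t)) * carrier

# Amplitude envelope via the Hilbert transform, then its spectrum.
envelope = np.abs(hilbert(sound))
envelope -= envelope.mean()
mod_spectrum = np.abs(np.fft.rfft(envelope))
mod_freqs = np.fft.rfftfreq(envelope.size, 1 / fs)

# Restrict to slow rates; the mask is a prefix, so indices align.
peak = mod_freqs[mod_spectrum[mod_freqs < 30].argmax()]
print(f"dominant modulation rate: {peak:.1f} Hz")  # ~3 Hz
```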
Collapse
Affiliation(s)
- Julia Erb
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD Maastricht, The Netherlands
- Maastricht Brain Imaging Center (MBIC), 6200 MD Maastricht, The Netherlands
- Department of Psychology, University of Lübeck, Lübeck, Germany
| | | | - Federico De Martino
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD Maastricht, The Netherlands
- Maastricht Brain Imaging Center (MBIC), 6200 MD Maastricht, The Netherlands
| | - Rainer Goebel
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD Maastricht, The Netherlands
- Maastricht Brain Imaging Center (MBIC), 6200 MD Maastricht, The Netherlands
| | - Wim Vanduffel
- Laboratorium voor Neuro-en Psychofysiologie, KU Leuven, Leuven, Belgium
- MGH Martinos Center, Charlestown, MA, USA
- Harvard Medical School, Boston, MA, USA
- Leuven Brain Institute, Leuven, Belgium
| | - Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD Maastricht, The Netherlands
- Maastricht Brain Imaging Center (MBIC), 6200 MD Maastricht, The Netherlands
- Maastricht Center for Systems Biology (MaCSBio), 6200 MD Maastricht, The Netherlands
| |
Collapse
|
48
|
Cortical tracking of multiple streams outside the focus of attention in naturalistic auditory scenes. Neuroimage 2018; 181:617-626. [DOI: 10.1016/j.neuroimage.2018.07.052] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2018] [Revised: 07/19/2018] [Accepted: 07/22/2018] [Indexed: 11/30/2022] Open
|
49
|
King AJ, Teki S, Willmore BDB. Recent advances in understanding the auditory cortex. F1000Res 2018; 7.
Abstract
Our ability to make sense of the auditory world results from neural processing that begins in the ear, goes through multiple subcortical areas, and continues in the cortex. The specific contribution of the auditory cortex to this chain of processing is far from understood. Although many of the properties of neurons in the auditory cortex resemble those of subcortical neurons, they show somewhat more complex selectivity for sound features, which is likely to be important for the analysis of natural sounds, such as speech, in real-life listening conditions. Furthermore, recent work has shown that auditory cortical processing is highly context-dependent, integrates auditory inputs with other sensory and motor signals, depends on experience, and is shaped by cognitive demands, such as attention. Thus, in addition to being the locus for more complex sound selectivity, the auditory cortex is increasingly understood to be an integral part of the network of brain regions responsible for prediction, auditory perceptual decision-making, and learning. In this review, we focus on three key areas that are contributing to this understanding: the sound features that are preferentially represented by cortical neurons, the spatial organization of those preferences, and the cognitive roles of the auditory cortex.
Collapse
Affiliation(s)
- Andrew J King
- Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, OX1 3PT, UK
| | - Sundeep Teki
- Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, OX1 3PT, UK
| | - Ben D B Willmore
- Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, OX1 3PT, UK
| |
Collapse
|
50
|
Dai B, Chen C, Long Y, Zheng L, Zhao H, Bai X, Liu W, Zhang Y, Liu L, Guo T, Ding G, Lu C. Neural mechanisms for selectively tuning in to the target speaker in a naturalistic noisy situation. Nat Commun 2018; 9:2405. [PMID: 29921937 PMCID: PMC6008393 DOI: 10.1038/s41467-018-04819-z] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2017] [Accepted: 05/29/2018] [Indexed: 11/23/2022] Open
Abstract
The neural mechanism for selectively tuning in to a target speaker while tuning out the others in a multi-speaker situation (i.e., the cocktail-party effect) remains elusive. Here we addressed this issue by measuring brain activity simultaneously from a listener and from multiple speakers while they were involved in naturalistic conversations. Results consistently show selectively enhanced interpersonal neural synchronization (INS) between the listener and the attended speaker at the left temporal–parietal junction, compared with that between the listener and the unattended speaker, across different multi-speaker situations. Moreover, INS increases significantly prior to the occurrence of verbal responses, and even when the listener's brain activity precedes that of the speaker. The INS increase is independent of brain-to-speech synchronization in both anatomical location and frequency range. These findings suggest that INS underlies the selective process in a multi-speaker situation through neural predictions at the content level but not the sensory level of speech.

When many people are speaking, e.g., at a party, we can selectively attend to just one speaker. Here, using ‘hyperscanning’, the authors show that interpersonal neural synchronization is selectively increased between a listener and the attended speaker, compared to between the listener and an unattended speaker.
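Interpersonal neural synchronization of the kind reported here is typically quantified as spectral coherence between two participants' signals. The paper used wavelet transform coherence on fNIRS data; as a rough stand-in, the sketch below computes Welch-based magnitude-squared coherence with SciPy on simulated listener and speaker time courses that share a slow common component.

```python
# Minimal sketch of quantifying interpersonal neural synchronization (INS)
# as spectral coherence between a listener's and a speaker's signals.
# Time courses are simulated; real analyses would use recorded data and,
# in this study, wavelet transform coherence rather than Welch coherence.
import numpy as np
from scipy.signal import coherence

fs, n = 10.0, 3000  # 10 Hz sampling, 5 minutes
rng = np.random.default_rng(0)
t = np.arange(n) / fs

shared = np.sin(2 * np.pi * 0.05 * t)  # common slow fluctuation (~0.05 Hz)
listener = shared + rng.standard_normal(n)
speaker = shared + rng.standard_normal(n)

# Magnitude-squared coherence, then averaged over a slow frequency band.
f, Cxy = coherence(listener, speaker, fs=fs, nperseg=512)
band = (f > 0.02) & (f < 0.1)
print(f"mean coherence in 0.02-0.1 Hz band: {Cxy[band].mean():.2f}")
```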
Collapse
Affiliation(s)
- Bohan Dai
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
- Max Planck Institute for Psycholinguistics, Nijmegen, 6525 XD, The Netherlands
- Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, 6525 EN, The Netherlands
| | - Chuansheng Chen
- Department of Psychology and Social Behavior, University of California, Irvine, 92697-7085, CA, USA
| | - Yuhang Long
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Lifen Zheng
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Hui Zhao
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Xialu Bai
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Wenda Liu
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Yuxuan Zhang
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Li Liu
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Taomei Guo
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Guosheng Ding
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China
| | - Chunming Lu
- State Key Laboratory of Cognitive Neuroscience and Learning & IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, 100875, China.
| |
Collapse
|