1. Homma NY, See JZ, Atencio CA, Hu C, Downer JD, Beitel RE, Cheung SW, Najafabadi MS, Olsen T, Bigelow J, Hasenstaub AR, Malone BJ, Schreiner CE. Receptive-field nonlinearities in primary auditory cortex: a comparative perspective. Cereb Cortex 2024;34:bhae364. PMID: 39270676; PMCID: PMC11398879; DOI: 10.1093/cercor/bhae364.
Abstract
Cortical processing of auditory information can be affected by interspecies differences as well as brain states. Here we compare multifeature spectro-temporal receptive fields (STRFs) and associated input/output functions or nonlinearities (NLs) of neurons in primary auditory cortex (AC) of four mammalian species. Single-unit recordings were performed in awake animals (female squirrel monkeys; female and male mice) and anesthetized animals (female squirrel monkeys, rats, and cats). Neuronal responses were modeled as consisting of two STRFs and their associated NLs. The NLs for the STRF with the highest information content show a broad distribution between linear and quadratic forms. In awake animals, we find a higher percentage of quadratic-like NLs, as opposed to more linear NLs in anesthetized animals. Moderate sex differences in the shape of NLs were observed between male and female unanesthetized mice. This indicates that the core AC possesses a rich variety of potential computations, particularly in awake animals, suggesting that multiple computational algorithms are at play to enable the auditory system's robust recognition of auditory events.
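As an illustration of the linear-versus-quadratic classification described above, the sketch below estimates an empirical input/output NL from a filter projection and compares first- and second-order polynomial fits. The binning scheme, polynomial fits, and synthetic data are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def empirical_nonlinearity(proj, spikes, n_bins=15):
    """Estimate the input/output NL P(spike | filter projection) by
    quantile-binning the projection of the stimulus onto an STRF."""
    edges = np.quantile(proj, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(proj, edges[1:-1]), 0, n_bins - 1)
    x = np.array([proj[idx == b].mean() for b in range(n_bins)])
    y = np.array([spikes[idx == b].mean() for b in range(n_bins)])
    return x, y

def linear_vs_quadratic(x, y):
    """Residual error of degree-1 vs degree-2 polynomial fits to the NL;
    a large drop at degree 2 flags a quadratic-like nonlinearity."""
    errs = []
    for deg in (1, 2):
        coef = np.polyfit(x, y, deg)
        errs.append(np.mean((np.polyval(coef, x) - y) ** 2))
    return errs[0], errs[1]

# toy demonstration with a synthetic quadratic ("energy-like") neuron
rng = np.random.default_rng(0)
proj = rng.normal(size=20000)
spikes = rng.poisson(0.1 + 0.5 * proj ** 2)
x, y = empirical_nonlinearity(proj, spikes)
print(linear_vs_quadratic(x, y))  # degree-2 error is far smaller
```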
Affiliations
- Natsumi Y Homma: John & Edward Coleman Memorial Laboratory, Kavli Institute for Fundamental Neuroscience, Department of Otolaryngology-Head and Neck Surgery, University of California San Francisco, San Francisco, CA, USA; Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Jermyn Z See, Craig A Atencio, Congcong Hu, Ralph E Beitel, Steven W Cheung, Mina Sadeghi Najafabadi, Timothy Olsen, James Bigelow, Andrea R Hasenstaub, Christoph E Schreiner: John & Edward Coleman Memorial Laboratory, Kavli Institute for Fundamental Neuroscience, Department of Otolaryngology-Head and Neck Surgery, University of California San Francisco, San Francisco, CA, USA
- Joshua D Downer, Brian J Malone: John & Edward Coleman Memorial Laboratory, Kavli Institute for Fundamental Neuroscience, Department of Otolaryngology-Head and Neck Surgery, University of California San Francisco, San Francisco, CA, USA; Center for Neuroscience, University of California Davis, Davis, CA, USA
2. Kurteff GL, Field AM, Asghar S, Tyler-Kabara EC, Clarke D, Weiner HL, Anderson AE, Watrous AJ, Buchanan RJ, Modur PN, Hamilton LS. Processing of auditory feedback in perisylvian and insular cortex. bioRxiv 2024:2024.05.14.593257. PMID: 38798574; PMCID: PMC11118286; DOI: 10.1101/2024.05.14.593257.
Abstract
When we speak, we not only make movements with our mouth, lips, and tongue, but we also hear the sound of our own voice. Thus, speech production in the brain involves not only controlling the movements we make, but also processing auditory and sensory feedback. Auditory responses are typically suppressed during speech production compared to perception, but how this suppression manifests across space and time is unclear. Here we recorded intracranial EEG in seventeen pediatric, adolescent, and adult patients with medication-resistant epilepsy who performed a reading/listening task to investigate how auditory responses are modulated during speech production. We identified onset and sustained responses to speech in bilateral auditory cortex, with a selective suppression of onset responses during speech production. Onset responses provide a temporal landmark during speech perception that is redundant with forward prediction during speech production. Phonological feature tuning in these "onset suppression" electrodes remained stable between perception and production. Notably, the posterior insula responded at sentence onset for both perception and production, suggesting a role in multisensory integration during feedback control.
Affiliations
- Garret Lynn Kurteff, Alyssa M. Field: Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Saman Asghar: Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA; Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Elizabeth C. Tyler-Kabara: Departments of Neurosurgery and Pediatrics, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Dave Clarke: Departments of Neurosurgery, Pediatrics, and Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Howard L. Weiner, Andrew J. Watrous: Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Anne E. Anderson: Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
- Robert J. Buchanan: Department of Neurosurgery, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Pradeep N. Modur: Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Liberty S. Hamilton (lead contact): Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA; Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
3. López Espejo M, David SV. A sparse code for natural sound context in auditory cortex. Curr Res Neurobiol 2023;6:100118. PMID: 38152461; PMCID: PMC10749876; DOI: 10.1016/j.crneur.2023.100118.
Abstract
Accurate sound perception can require integrating information over hundreds of milliseconds or even seconds. Spectro-temporal models of sound coding by single neurons in auditory cortex indicate that the majority of sound-evoked activity can be attributed to stimulus features occurring within the preceding few tens of milliseconds. It remains uncertain how the auditory system integrates information about sensory context on a longer timescale. Here we characterized long-lasting contextual effects in auditory cortex (AC) using a diverse set of natural sound stimuli. We measured context effects as the difference in a neuron's response to a single probe sound following two different context sounds. Many AC neurons showed context effects lasting longer than the temporal window of a traditional spectro-temporal receptive field. The duration and magnitude of context effects varied substantially across neurons and stimuli. This diversity of context effects formed a sparse code across the neural population that encoded a wider range of contexts than any constituent neuron. Encoding model analysis indicates that context effects can be explained by activity in the local neural population, suggesting that recurrent local circuits support a long-lasting representation of sensory context in auditory cortex.
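The probe-after-context measurement lends itself to a compact sketch: compute the difference between mean probe responses following two contexts and assess how long that difference remains significant. The permutation test and array shapes below are assumptions for illustration, not the authors' exact analysis.

```python
import numpy as np

def context_effect(resp_a, resp_b, n_shuffle=1000, alpha=0.05, rng=None):
    """resp_a, resp_b: (n_trials, T) responses to the same probe sound
    following context A vs context B. Returns the per-timepoint effect
    (difference of trial-averaged PSTHs) and a shuffle-based
    significance mask whose duration indexes how long context lingers."""
    rng = rng or np.random.default_rng(0)
    diff = resp_a.mean(0) - resp_b.mean(0)
    pooled = np.vstack([resp_a, resp_b])
    n_a = len(resp_a)
    null = np.empty((n_shuffle, resp_a.shape[1]))
    for i in range(n_shuffle):
        perm = rng.permutation(len(pooled))
        null[i] = pooled[perm[:n_a]].mean(0) - pooled[perm[n_a:]].mean(0)
    thresh = np.quantile(np.abs(null), 1 - alpha, axis=0)
    return diff, np.abs(diff) > thresh
```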
Affiliations
- Mateo López Espejo: Neuroscience Graduate Program, Oregon Health & Science University, Portland, OR, USA
- Stephen V. David: Otolaryngology, Oregon Health & Science University, Portland, OR, USA
4. Stephen EP, Li Y, Metzger S, Oganian Y, Chang EF. Latent neural dynamics encode temporal context in speech. Hear Res 2023;437:108838. PMID: 37441880; PMCID: PMC11182421; DOI: 10.1016/j.heares.2023.108838.
Abstract
Direct neural recordings from human auditory cortex have demonstrated encoding for acoustic-phonetic features of consonants and vowels. Neural responses also encode distinct acoustic amplitude cues related to timing, such as those that occur at the onset of a sentence after a silent period or the onset of the vowel in each syllable. Here, we used a group reduced rank regression model to show that distributed cortical responses support a low-dimensional latent state representation of temporal context in speech. The timing cues each capture more unique variance than all other phonetic features and exhibit rotational or cyclical dynamics in latent space from activity that is widespread over the superior temporal gyrus. We propose that these spatially distributed timing signals could serve to provide temporal context for, and possibly bind across time, the concurrent processing of individual phonetic features, to compose higher-order phonological (e.g. word-level) representations.
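A minimal, single-group sketch of reduced rank regression (ordinary least squares followed by an Eckart-Young projection of the fitted values) shows how such a low-dimensional latent state can be extracted from distributed responses; variable names and the rank are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Rank-constrained linear map from lagged stimulus features X (T, p)
    to electrode responses Y (T, n). The OLS solution is projected onto
    the top principal components of the fitted values (Eckart-Young)."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Yhat = X @ B_ols
    _, _, Vt = np.linalg.svd(Yhat, full_matrices=False)
    V = Vt[:rank].T              # latent output directions (n, rank)
    B_rr = B_ols @ V @ V.T       # rank-constrained regression weights
    latents = X @ B_ols @ V      # low-dimensional latent state over time
    return B_rr, latents
```

Plotting the latent trajectories against time is the kind of view in which rotational or cyclical dynamics, as described above, would appear.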
Affiliations
- Emily P Stephen: Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA; Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
- Yuanning Li: Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA; School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Sean Metzger: Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA
- Yulia Oganian: Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA; Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Edward F Chang: Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA
5. Sadagopan S, Kar M, Parida S. Quantitative models of auditory cortical processing. Hear Res 2023;429:108697. PMID: 36696724; PMCID: PMC9928778; DOI: 10.1016/j.heares.2023.108697.
Abstract
To generate insight from experimental data, it is critical to understand the inter-relationships between individual data points and place them in context within a structured framework. Quantitative modeling can provide the scaffolding for such an endeavor. Our main objective in this review is to provide a primer on the range of quantitative tools available to experimental auditory neuroscientists. Quantitative modeling is advantageous because it can provide a compact summary of observed data, make underlying assumptions explicit, and generate predictions for future experiments. Quantitative models may be developed to characterize or fit observed data, to test theories of how a task may be solved by neural circuits, to determine how observed biophysical details might contribute to measured activity patterns, or to predict how an experimental manipulation would affect neural activity. In complexity, quantitative models can range from those that are highly biophysically realistic and that include detailed simulations at the level of individual synapses, to those that use abstract and simplified neuron models to simulate entire networks. Here, we survey the landscape of recently developed models of auditory cortical processing, highlighting a small selection of models to demonstrate how they help generate insight into the mechanisms of auditory processing. We discuss examples ranging from models that use details of synaptic properties to explain the temporal pattern of cortical responses to those that use modern deep neural networks to gain insight into human fMRI data. We conclude by discussing a biologically realistic and interpretable model that our laboratory has developed to explore aspects of vocalization categorization in the auditory pathway.
Affiliations
- Srivatsun Sadagopan: Department of Neurobiology, Center for Neuroscience, Center for the Neural Basis of Cognition, Department of Bioengineering, and Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
- Manaswini Kar: Department of Neurobiology, Center for Neuroscience, and Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Satyabrata Parida: Department of Neurobiology and Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
6. Desai M, Field AM, Hamilton LS. Dataset size considerations for robust acoustic and phonetic speech encoding models in EEG. Front Hum Neurosci 2023;16:1001171. PMID: 36741776; PMCID: PMC9895838; DOI: 10.3389/fnhum.2022.1001171.
Abstract
In many experiments that investigate auditory and speech processing in the brain using electroencephalography (EEG), the experimental paradigm is often lengthy and tedious. Typically, the experimenter errs on the side of including more data, more trials, and therefore conducting a longer task to ensure that the data are robust and effects are measurable. Recent studies have used naturalistic stimuli to investigate the brain's response to individual speech features or combinations of multiple features using system identification techniques, such as multivariate temporal receptive field (mTRF) analyses. The neural data collected from such experiments must be divided into a training set and a test set to fit and validate the mTRF weights. While a good strategy is clearly to collect as much data as is feasible, it is unclear how much data are needed to achieve stable results. Furthermore, it is unclear whether the specific stimulus used for mTRF fitting and the choice of feature representation affect how much data would be required for robust and generalizable results. Here, we used previously collected EEG data from our lab using sentence stimuli and movie stimuli, as well as EEG data from an open-source dataset using audiobook stimuli, to better understand how much data needs to be collected for naturalistic speech experiments measuring acoustic and phonetic tuning. We found that the EEG receptive field structure tested here stabilizes after collecting a training dataset of approximately 200 s of TIMIT sentences, around 600 s of movie trailers training set data, and approximately 460 s of audiobook training set data. Thus, we provide suggestions on the minimum amount of data necessary for fitting mTRFs from naturalistic listening data. Our findings are motivated by highly practical concerns when working with children, patient populations, or others who may not tolerate long study sessions. These findings will aid future researchers who wish to study naturalistic speech processing in healthy and clinical populations while minimizing participant fatigue and retaining signal quality.
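A sketch of the underlying procedure: fit a ridge-regularized mTRF on progressively longer training segments and track held-out prediction accuracy until it stabilizes. The lag count, regularization strength, and sampling rate below are illustrative, not the study's exact settings.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """stim: (T, n_feat) -> (T, n_feat * n_lags) time-lagged design matrix."""
    T, n_feat = stim.shape
    X = np.zeros((T, n_feat * n_lags))
    for k in range(n_lags):
        X[k:, k * n_feat:(k + 1) * n_feat] = stim[:T - k]
    return X

def fit_mtrf(stim, eeg, n_lags=60, lam=1e3):
    """Ridge-regularized mTRF mapping stimulus features to EEG channels."""
    X = lagged_design(stim, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)

def stability_curve(stim, eeg, stim_test, eeg_test, durations_s, fs=128):
    """Held-out prediction correlation as a function of training length;
    the curve flattening out marks the 'enough data' point."""
    scores = []
    X_test = lagged_design(stim_test, 60)
    for sec in durations_s:
        n = int(sec * fs)
        w = fit_mtrf(stim[:n], eeg[:n])
        pred = X_test @ w
        r = [np.corrcoef(pred[:, c], eeg_test[:, c])[0, 1]
             for c in range(eeg_test.shape[1])]
        scores.append(np.mean(r))
    return scores
```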
Affiliations
- Maansi Desai, Alyssa M. Field: Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Liberty S. Hamilton (correspondence): Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA; Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
7. Gilday OD, Praegel B, Maor I, Cohen T, Nelken I, Mizrahi A. Surround suppression in mouse auditory cortex underlies auditory edge detection. PLoS Comput Biol 2023;19:e1010861. PMID: 36656876; PMCID: PMC9888713; DOI: 10.1371/journal.pcbi.1010861.
Abstract
Surround suppression (SS) is a fundamental property of sensory processing throughout the brain. In the auditory system, the early processing stream encodes sounds along a single physical dimension: frequency. Previous studies in the auditory system have shown SS to manifest as bandwidth tuning around the preferred frequency. We asked whether bandwidth tuning can be found around frequencies away from the preferred frequency. We exploited the simplicity of spectral representation of sounds to study SS by manipulating both sound frequency and bandwidth. We recorded single unit spiking activity from the auditory cortex (ACx) of awake mice in response to an array of broadband stimuli with varying central frequencies and bandwidths. Our recordings revealed that a significant portion of neuronal response profiles had a preferred bandwidth that varied in a regular way with the sound's central frequency. To gain insight into the possible mechanism underlying these responses, we modelled neuronal activity using a variation of the "Mexican hat" function often used to model SS. The model accounted for response properties of single neurons with high accuracy. Our data and model show that these responses in ACx obey simple rules resulting from the presence of lateral inhibitory sidebands, mostly above the excitatory band of the neuron, that result in sensitivity to the location of top frequency edges, invariant to other spectral attributes. Our work offers a simple explanation for auditory edge detection and possibly other computations of spectral content in sounds.
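A hedged sketch of the kind of "Mexican hat" model described: integrate a difference-of-Gaussians tuning curve over the stimulus band defined by a central frequency and bandwidth. All parameter values are hypothetical; the paper's exact parameterization may differ.

```python
import numpy as np
from scipy.stats import norm

def mexican_hat_response(cf, bw, mu_e=8.0, sig_e=0.5, a_e=1.0,
                         mu_i=8.6, sig_i=0.9, a_i=0.8):
    """Response of a difference-of-Gaussians ("Mexican hat") tuning curve
    to band-limited noise centered at cf with bandwidth bw (octave-like
    units): excitatory minus inhibitory mass inside the band. Placing the
    inhibitory sideband above the excitatory band (mu_i > mu_e) makes the
    cell sensitive to the sound's top frequency edge."""
    lo, hi = cf - bw / 2.0, cf + bw / 2.0
    exc = a_e * (norm.cdf(hi, mu_e, sig_e) - norm.cdf(lo, mu_e, sig_e))
    inh = a_i * (norm.cdf(hi, mu_i, sig_i) - norm.cdf(lo, mu_i, sig_i))
    return max(exc - inh, 0.0)

# response surface over central frequencies and bandwidths, as in the
# broadband stimulus array described in the abstract
cfs = np.linspace(6, 10, 41)
bws = np.linspace(0.1, 3, 30)
grid = np.array([[mexican_hat_response(cf, bw) for bw in bws] for cf in cfs])
```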
Affiliations
- Omri David Gilday: The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- Benedikt Praegel, Ido Maor, Israel Nelken, Adi Mizrahi: The Edmond and Lily Safra Center for Brain Sciences and Department of Neurobiology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Tav Cohen: Department of Neurobiology, The Hebrew University of Jerusalem, Jerusalem, Israel
8. Morrill RJ, Bigelow J, DeKloe J, Hasenstaub AR. Audiovisual task switching rapidly modulates sound encoding in mouse auditory cortex. eLife 2022;11:e75839. PMID: 35980027; PMCID: PMC9427107; DOI: 10.7554/elife.75839.
Abstract
In everyday behavior, sensory systems are in constant competition for attentional resources, but the cellular and circuit-level mechanisms of modality-selective attention remain largely uninvestigated. We conducted translaminar recordings in mouse auditory cortex (AC) during an audiovisual (AV) attention shifting task. Attending to sound elements in an AV stream reduced both pre-stimulus and stimulus-evoked spiking activity, primarily in deep-layer neurons and neurons without spectrotemporal tuning. Despite reduced spiking, stimulus decoder accuracy was preserved, suggesting improved sound encoding efficiency. Similarly, task-irrelevant mapping stimuli during inter-trial intervals evoked fewer spikes without impairing stimulus encoding, indicating that attentional modulation generalized beyond training stimuli. Importantly, spiking reductions predicted trial-to-trial behavioral accuracy during auditory attention, but not visual attention. Together, these findings suggest auditory attention facilitates sound discrimination by filtering sound-irrelevant background activity in AC, and that the deepest cortical layers serve as a hub for integrating extramodal contextual information.
Affiliations
- Ryan J Morrill: Coleman Memorial Laboratory, Neuroscience Graduate Program, and Department of Otolaryngology–Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, USA
- James Bigelow, Jefferson DeKloe: Coleman Memorial Laboratory and Department of Otolaryngology–Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, USA
- Andrea R Hasenstaub: Coleman Memorial Laboratory, Neuroscience Graduate Program, and Department of Otolaryngology–Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, USA
9. DIANA, a Process-Oriented Model of Human Auditory Word Recognition. Brain Sci 2022;12(5):681. PMID: 35625067; PMCID: PMC9140177; DOI: 10.3390/brainsci12050681.
Abstract
This article presents DIANA, a new, process-oriented model of human auditory word recognition, which takes as its input the acoustic signal and can produce as its output word identifications and lexicality decisions, as well as reaction times. This makes it possible to compare its output with human listeners’ behavior in psycholinguistic experiments. DIANA differs from existing models in that it takes more available neuro-physiological evidence on speech processing into account. For instance, DIANA accounts for the effect of ambiguity in the acoustic signal on reaction times following the Hick–Hyman law and it interprets the acoustic signal in the form of spectro-temporal receptive fields, which are attested in the human superior temporal gyrus, instead of in the form of abstract phonological units. The model consists of three components: activation, decision and execution. The activation and decision components are described in detail, both at the conceptual level (in the running text) and at the computational level (in the Appendices). While the activation component is independent of the listener’s task, the functioning of the decision component depends on this task. The article also describes how DIANA could be improved in the future in order to even better resemble the behavior of human listeners.
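The Hick–Hyman component can be illustrated in a few lines: predicted reaction time grows linearly with the entropy of the currently active word candidates. The constants a and b below are hypothetical fit parameters, not values from the article.

```python
import numpy as np

def hick_hyman_rt(p, a=0.2, b=0.15):
    """Predicted reaction time (s) from the entropy H (bits) of the
    probability distribution p over active word candidates: RT = a + b*H.
    More ambiguity in the acoustic signal -> higher H -> slower response."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    H = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return a + b * H

print(hick_hyman_rt([1.0]))        # unambiguous input: fastest decision
print(hick_hyman_rt([0.25] * 4))   # four equiprobable candidates: slower
```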
10.
Abstract
A common approach to interpreting spiking activity is based on identifying the firing fields—regions in physical or configuration spaces that elicit responses of neurons. Common examples include hippocampal place cells that fire at preferred locations in the navigated environment, head direction cells that fire at preferred orientations of the animal’s head, view cells that respond to preferred spots in the visual field, etc. In all these cases, firing fields were discovered empirically, by trial and error. We argue that the existence and a number of properties of the firing fields can be established theoretically, through topological analyses of the neuronal spiking activity. In particular, we use Leray criterion powered by persistent homology theory, Eckhoff conditions and Region Connection Calculus to verify consistency of neuronal responses with a single coherent representation of space.
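A sketch of the topological workflow on synthetic data, assuming the ripser package for persistent homology; the co-firing distance and the circular-track toy data are illustrative choices, not the paper's construction (which additionally uses the Leray criterion, Eckhoff conditions, and Region Connection Calculus).

```python
import numpy as np
from ripser import ripser  # pip install ripser

# Synthetic "firing fields": place-cell-like responses on a circular track.
rng = np.random.default_rng(1)
n_cells, n_samples = 40, 500
theta = rng.uniform(0, 2 * np.pi, n_samples)      # animal position
centers = rng.uniform(0, 2 * np.pi, n_cells)      # field centers
rates = np.exp(np.cos(theta[None, :] - centers[:, None]) / 0.15)

# Pairwise "co-firing" distance: cells with overlapping firing fields
# have correlated rates and hence small distance.
dist = 1.0 - np.corrcoef(rates)

# Persistence diagrams of the induced complex. If the population encodes
# a single coherent circular space, one long-lived 1-cycle should appear.
dgms = ripser(dist, distance_matrix=True, maxdim=1)['dgms']
h1 = dgms[1]
print("longest H1 bar:", (h1[:, 1] - h1[:, 0]).max())
```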
11. Hamilton LS, Oganian Y, Hall J, Chang EF. Parallel and distributed encoding of speech across human auditory cortex. Cell 2021;184:4626-4639.e13. PMID: 34411517; PMCID: PMC8456481; DOI: 10.1016/j.cell.2021.07.019.
Abstract
Speech perception is thought to rely on a cortical feedforward serial transformation of acoustic into linguistic representations. Using intracranial recordings across the entire human auditory cortex, electrocortical stimulation, and surgical ablation, we show that cortical processing across areas is not consistent with a serial hierarchical organization. Instead, response latency and receptive field analyses demonstrate parallel and distinct information processing in the primary and nonprimary auditory cortices. This functional dissociation was also observed with stimulation: stimulating primary auditory cortex evoked auditory hallucinations but did not distort or interfere with speech perception, whereas opposite effects were observed during stimulation of nonprimary cortex in the superior temporal gyrus. Ablation of the primary auditory cortex did not affect speech perception. These results establish a distributed functional organization of parallel information processing throughout the human auditory cortex and demonstrate an essential independent role for nonprimary auditory cortex in speech processing.
Affiliations
- Liberty S Hamilton, Yulia Oganian, Edward F Chang: Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, CA 94158, USA
- Jeffery Hall: Department of Neurology and Neurosurgery, McGill University Montreal Neurological Institute, Montreal, QC H3A 2B4, Canada
12. Boos M, Lücke J, Rieger JW. Generalizable dimensions of human cortical auditory processing of speech in natural soundscapes: A data-driven ultra high field fMRI approach. Neuroimage 2021;237:118106. PMID: 33991696; DOI: 10.1016/j.neuroimage.2021.118106.
Abstract
Speech comprehension in natural soundscapes rests on the ability of the auditory system to extract speech information from a complex acoustic signal with overlapping contributions from many sound sources. Here we reveal the canonical processing of speech in natural soundscapes on multiple scales by using data-driven modeling approaches to characterize sounds and to analyze ultra-high-field fMRI recorded while participants listened to the audio soundtrack of a movie. We show that at the functional level the neuronal processing of speech in natural soundscapes can be surprisingly low dimensional in the human cortex, highlighting the functional efficiency of the auditory system for a seemingly complex task. In particular, we find that a model comprising three functional dimensions of auditory processing in the temporal lobes is shared across participants' fMRI activity. We further demonstrate that the three functional dimensions are implemented in anatomically overlapping networks that process different aspects of speech in natural soundscapes: one is most sensitive to complex auditory features present in speech, another to complex auditory features and fast temporal modulations that are not specific to speech, and one codes mainly sound level. These results were derived with few a priori assumptions and provide a detailed and computationally reproducible account of the cortical activity in the temporal lobe elicited by the processing of speech in natural soundscapes.
Affiliations
- Moritz Boos, Jochem W Rieger: Applied Neurocognitive Psychology Lab, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", University of Oldenburg, Oldenburg, Germany
- Jörg Lücke: Machine Learning Division, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", University of Oldenburg, Oldenburg, Germany
13. Homma NY, Atencio CA, Schreiner CE. Plasticity of Multidimensional Receptive Fields in Core Rat Auditory Cortex Directed by Sound Statistics. Neuroscience 2021;467:150-170. PMID: 33951506; DOI: 10.1016/j.neuroscience.2021.04.028.
Abstract
Sensory cortical neurons can nonlinearly integrate a wide range of inputs. The outcome of this nonlinear process can be approximated by more than one receptive field component or filter to characterize the ensuing stimulus preference. The functional properties of multidimensional filters are, however, not well understood. Here we estimated two spectrotemporal receptive fields (STRFs) per neuron using maximally informative dimension analysis. We compared their temporal and spectral modulation properties and determined the stimulus information captured by the two STRFs in core rat auditory cortical fields, primary auditory cortex (A1) and ventral auditory field (VAF). The first STRF is the dominant filter and acts as a sound feature detector in both fields. The second STRF is less feature specific, prefers lower modulation frequencies, and carries less spike information than the first STRF. The information jointly captured by the two STRFs was larger than that captured by the sum of the individual STRFs, reflecting nonlinear interactions of the two filters. This information gain was larger in A1. We next determined how the acoustic environment affects the structure and relationship of these two STRFs. Rats were exposed to moderate levels of spectrotemporally modulated noise during development. Noise exposure strongly altered the spectrotemporal preference of the first STRF in both cortical fields. The interaction between the two STRFs was reduced by noise exposure in A1 but not in VAF. The results reveal new functional distinctions between A1 and VAF indicating that (i) A1 has stronger interactions of the two STRFs than VAF, (ii) noise exposure diminishes modulation parameter representation contained in the noise more strongly for the first STRF in both fields, and (iii) plasticity induced by noise exposure can affect the strength of filter interactions in A1. Taken together, ascertaining two STRFs per neuron enhances the understanding of cortical information processing and plasticity effects in core auditory cortex.
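The information comparison follows standard MID logic: compare the distribution of stimulus projections with the spike-conditioned distribution, for each filter alone and for the two filters jointly. A binned sketch, with bin counts as an illustrative assumption:

```python
import numpy as np

def projection_info(proj, spikes, n_bins=12):
    """Single-spike information (bits) captured by one or more STRF
    projections: KL divergence between P(projection | spike) and
    P(projection). proj: (n_filters, T); spikes: (T,) spike counts."""
    proj = np.atleast_2d(proj)
    idx = []
    for p in proj:
        edges = np.quantile(p, np.linspace(0, 1, n_bins + 1))[1:-1]
        idx.append(np.clip(np.digitize(p, edges), 0, n_bins - 1))
    flat = np.ravel_multi_index(idx, (n_bins,) * len(proj))
    size = n_bins ** len(proj)
    p_x = np.bincount(flat, minlength=size).astype(float)
    p_xs = np.bincount(flat, weights=spikes.astype(float), minlength=size)
    p_x, p_xs = p_x / p_x.sum(), p_xs / p_xs.sum()
    nz = (p_xs > 0) & (p_x > 0)
    return float(np.sum(p_xs[nz] * np.log2(p_xs[nz] / p_x[nz])))

# nonlinear cooperation: joint information beyond the individual filters
# synergy = projection_info(np.vstack([p1, p2]), s) \
#         - projection_info(p1, s) - projection_info(p2, s)
```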
Affiliations
- Natsumi Y Homma, Christoph E Schreiner: Coleman Memorial Laboratory, Department of Otolaryngology - Head and Neck Surgery, University of California San Francisco, San Francisco, CA, USA; Center for Integrative Neuroscience, University of California San Francisco, San Francisco, CA, USA
- Craig A Atencio: Coleman Memorial Laboratory, Department of Otolaryngology - Head and Neck Surgery, University of California San Francisco, San Francisco, CA, USA
14. Lostanlen V, El-Hajj C, Rossignol M, Lafay G, Andén J, Lagrange M. Time-frequency scattering accurately models auditory similarities between instrumental playing techniques. EURASIP J Audio Speech Music Process 2021;2021:3. PMID: 33488686; PMCID: PMC7801324; DOI: 10.1186/s13636-020-00187-z.
Abstract
Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human participants to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time-frequency scattering features to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of 99.0% ± 1%. An ablation study demonstrates that removing either the joint time-frequency scattering transform or the metric learning algorithm noticeably degrades performance.
Affiliations
- Vincent Lostanlen, Christian El-Hajj, Mathieu Lagrange: LS2N, CNRS, Centrale Nantes, Nantes University, Nantes, France
- Joakim Andén: Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden; Center for Computational Mathematics, Flatiron Institute, New York, NY, USA
15. Pennington JR, David SV. Complementary Effects of Adaptation and Gain Control on Sound Encoding in Primary Auditory Cortex. eNeuro 2020;7:ENEURO.0205-20.2020. PMID: 33109632; PMCID: PMC7675144; DOI: 10.1523/eneuro.0205-20.2020.
Abstract
An important step toward understanding how the brain represents complex natural sounds is to develop accurate models of auditory coding by single neurons. A commonly used model is the linear-nonlinear spectro-temporal receptive field (STRF; LN model). The LN model accounts for many features of auditory tuning, but it cannot account for long-lasting effects of sensory context on sound-evoked activity. Two mechanisms that may support these contextual effects are short-term plasticity (STP) and contrast-dependent gain control (GC), which have inspired expanded versions of the LN model. Both models improve performance over the LN model, but they have never been compared directly. Thus, it is unclear whether they account for distinct processes or describe one phenomenon in different ways. To address this question, we recorded activity of neurons in primary auditory cortex (A1) of awake ferrets during presentation of natural sounds. We then fit models incorporating one nonlinear mechanism (GC or STP) or both (GC+STP) using this single dataset, and measured the correlation between the models' predictions and the recorded neural activity. Both the STP and GC models performed significantly better than the LN model, but the GC+STP model outperformed both individual models. We also quantified the equivalence of STP and GC model predictions and found only modest similarity. Consistent results were observed for a dataset collected in clean and noisy acoustic contexts. These results establish general methods for evaluating the equivalence of arbitrarily complex encoding models and suggest that the STP and GC models describe complementary processes in the auditory system.
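A toy version of the gain control (GC) variant conveys the model family being compared: the linear STRF drive is divisively scaled by recent stimulus contrast before an output rectification. The contrast window, divisive form, and rectifying output nonlinearity are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def ln_gc_response(spec, strf, n_win=50, k=1.0):
    """LN model with contrast-dependent gain control. The linear drive
    through the STRF is divided by (1 + k * contrast), where contrast is
    the std of the log-spectrogram in a trailing window, then rectified.
    spec: (n_freq, T) log-spectrogram; strf: (n_freq, n_lags)."""
    n_freq, n_lags = strf.shape
    T = spec.shape[1]
    rate = np.zeros(T)
    for t in range(n_lags, T):
        drive = np.sum(strf * spec[:, t - n_lags:t])
        contrast = spec[:, max(0, t - n_win):t].std()
        rate[t] = max(drive / (1.0 + k * contrast), 0.0)
    return rate
```

The STP variant replaces the divisive gain term with adapting dynamics on the input channels (see the sketch under entry 20); comparing the two models' predictions on the same recordings, as done above, quantifies how far they describe the same phenomenon.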
Affiliations
- Jacob R Pennington: Department of Mathematics, Washington State University, Vancouver, WA 98686, USA
- Stephen V David: Department of Otolaryngology, Oregon Health and Science University, Portland, OR 97239, USA
16. Keshishian M, Akbari H, Khalighinejad B, Herrero JL, Mehta AD, Mesgarani N. Estimating and interpreting nonlinear receptive field of sensory neural responses with deep neural network models. eLife 2020;9:e53445. PMID: 32589140; PMCID: PMC7347387; DOI: 10.7554/elife.53445.
Abstract
Our understanding of nonlinear stimulus transformations by neural circuits is hindered by the lack of comprehensive yet interpretable computational modeling frameworks. Here, we propose a data-driven approach based on deep neural networks to directly model arbitrarily nonlinear stimulus-response mappings. Reformulating the exact function of a trained neural network as a collection of stimulus-dependent linear functions enables a locally linear receptive field interpretation of the neural network. Predicting the neural responses recorded invasively from the auditory cortex of neurosurgical patients as they listened to speech, this approach significantly improves the prediction accuracy of auditory cortical responses, particularly in nonprimary areas. Moreover, interpreting the functions learned by neural networks uncovered three distinct types of nonlinear transformations of speech that varied considerably from primary to nonprimary auditory regions. The ability of this framework to capture arbitrary stimulus-response mappings while maintaining model interpretability leads to a better understanding of cortical processing of sensory signals.
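The central reformulation, that a trained ReLU network is exactly locally linear around any given stimulus, can be demonstrated directly. The toy two-layer network below uses random rather than trained weights; sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(20, 50)) * 0.1, np.zeros(20)
w2 = rng.normal(size=20) * 0.1

def forward(x):
    """Toy two-layer ReLU network mapping a stimulus vector to a response."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

def local_linear_filter(x):
    """A ReLU network is piecewise linear, so at any stimulus x its exact
    output equals w_eff(x) @ x + b_eff(x): a stimulus-dependent linear
    receptive field determined by the mask of active hidden units."""
    mask = (W1 @ x + b1) > 0
    w_eff = (w2 * mask) @ W1
    b_eff = (w2 * mask) @ b1
    return w_eff, b_eff

x = rng.normal(size=50)
w_eff, b_eff = local_linear_filter(x)
assert np.isclose(forward(x), w_eff @ x + b_eff)  # exact, not approximate
```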
Affiliations
- Menoua Keshishian, Hassan Akbari, Bahar Khalighinejad, Nima Mesgarani: Department of Electrical Engineering, Columbia University, New York, NY, USA; Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Jose L Herrero, Ashesh D Mehta: Feinstein Institute for Medical Research, Manhasset, NY, USA; Department of Neurosurgery, Hofstra-Northwell School of Medicine and Feinstein Institute for Medical Research, Manhasset, NY, USA
17. Streaming of Repeated Noise in Primary and Secondary Fields of Auditory Cortex. J Neurosci 2020;40:3783-3798. PMID: 32273487; DOI: 10.1523/jneurosci.2105-19.2020.
Abstract
Statistical regularities in natural sounds facilitate the perceptual segregation of auditory sources, or streams. Repetition is one cue that drives stream segregation in humans, but the neural basis of this perceptual phenomenon remains unknown. We demonstrated a similar perceptual ability in animals by training ferrets of both sexes to detect a stream of repeating noise samples (foreground) embedded in a stream of random samples (background). During passive listening, we recorded neural activity in primary auditory cortex (A1) and secondary auditory cortex (posterior ectosylvian gyrus, PEG). We used two context-dependent encoding models to test for evidence of streaming of the repeating stimulus. The first was based on average evoked activity per noise sample and the second on the spectro-temporal receptive field. Both approaches tested whether differences in neural responses to repeating versus random stimuli were better modeled by scaling the response to both streams equally (global gain) or by separately scaling the response to the foreground versus background stream (stream-specific gain). Consistent with previous observations of adaptation, we found an overall reduction in global gain when the stimulus began to repeat. However, when we measured stream-specific changes in gain, responses to the foreground were enhanced relative to the background. This enhancement was stronger in PEG than A1. In A1, enhancement was strongest in units with low sparseness (i.e., broad sensory tuning) and with tuning selective for the repeated sample. Enhancement of responses to the foreground relative to the background provides evidence for stream segregation that emerges in A1 and is refined in PEG.

Significance statement: To interact with the world successfully, the brain must parse behaviorally important information from a complex sensory environment. Complex mixtures of sounds often arrive at the ears simultaneously or in close succession, yet they are effortlessly segregated into distinct perceptual sources. This process breaks down in hearing-impaired individuals and speech recognition devices. By identifying the underlying neural mechanisms that facilitate perceptual segregation, we can develop strategies for ameliorating hearing loss and improving speech recognition technology in the presence of background noise. Here, we present evidence to support a hierarchical process, present in primary auditory cortex and refined in secondary auditory cortex, in which sound repetition facilitates segregation.
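The model comparison reduces to scaling per-stream response predictions with one shared gain versus two stream-specific gains; a least-squares sketch, assuming the per-stream predictions are given:

```python
import numpy as np

def compare_gain_models(pred_fg, pred_bg, resp):
    """Global gain: resp ~ g * (pred_fg + pred_bg).
    Stream-specific gain: resp ~ g_fg * pred_fg + g_bg * pred_bg.
    Returns fitted gains and the mean-squared error of each model;
    g_fg > g_bg indicates foreground enhancement (stream segregation)."""
    x = pred_fg + pred_bg
    g = (x @ resp) / (x @ x)
    mse_global = np.mean((resp - g * x) ** 2)
    X = np.column_stack([pred_fg, pred_bg])
    (g_fg, g_bg), *_ = np.linalg.lstsq(X, resp, rcond=None)
    mse_stream = np.mean((resp - X @ np.array([g_fg, g_bg])) ** 2)
    return (g, mse_global), ((g_fg, g_bg), mse_stream)
```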
18. Shih JY, Yuan K, Atencio CA, Schreiner CE. Distinct Manifestations of Cooperative, Multidimensional Stimulus Representations in Different Auditory Forebrain Stations. Cereb Cortex 2020;30:3130-3147. PMID: 32047882; DOI: 10.1093/cercor/bhz299.
Abstract
Classic spectrotemporal receptive fields (STRFs) for auditory neurons are usually expressed as a single linear filter representing a single encoded stimulus feature. Multifilter STRF models represent the stimulus-response relationship of primary auditory cortex (A1) neurons more accurately because they can capture multiple stimulus features. To determine whether multifilter processing is unique to A1, we compared the utility of single-filter versus multifilter STRF models in the ventral medial geniculate body (MGBv), anterior auditory field (AAF), and A1 of ketamine-anesthetized cats. We estimated STRFs using both spike-triggered average (STA) and maximally informative dimension (MID) methods. Comparison of the basic filter properties of the first (MID1) and second (MID2) maximally informative dimensions in the 3 stations revealed broader spectral integration of MID2s in MGBv and A1 as opposed to AAF. MID2 peak latency was substantially longer than for STAs and MID1s in all 3 stations. The 2-filter MID model captured more information and yielded better predictions in many neurons from all 3 areas but disproportionately more so in AAF and A1 compared with MGBv. Significantly, information-enhancing cooperation between the 2 MIDs was largely restricted to A1 neurons. This demonstrates significant differences in how these 3 forebrain stations process auditory information, as expressed in effective and synergistic multifilter processing.
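Of the two estimators compared, the spike-triggered average is simple enough to sketch directly over a binned spectrogram; MID estimation requires iterative information maximization and is not shown. Array conventions below are assumptions.

```python
import numpy as np

def spike_triggered_average(spec, spikes, n_lags):
    """First-order, single-filter STRF estimate: the average of the
    (n_freq, n_lags) spectrogram window preceding each spike, weighted
    by spike count. spec: (n_freq, T); spikes: (T,) binned counts."""
    sta = np.zeros((spec.shape[0], n_lags))
    for t in np.nonzero(spikes)[0]:
        if t >= n_lags:
            sta += spikes[t] * spec[:, t - n_lags:t]
    return sta / max(spikes[n_lags:].sum(), 1)
```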
Affiliations
- Jonathan Y Shih, Craig A Atencio, Christoph E Schreiner: Department of Otolaryngology-Head and Neck Surgery, Coleman Memorial Laboratory, UCSF Center for Integrative Neuroscience, University of California, San Francisco, CA 94158-0444, USA
- Kexin Yuan: Department of Otolaryngology-Head and Neck Surgery, Coleman Memorial Laboratory, UCSF Center for Integrative Neuroscience, University of California, San Francisco, CA 94158-0444, USA; Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing 100084, China
19. Sadras N, Pesaran B, Shanechi MM. A point-process matched filter for event detection and decoding from population spike trains. J Neural Eng 2019;16:066016. PMID: 31437831; DOI: 10.1088/1741-2552/ab3dbc.
Abstract
Objective: Information encoding in neurons can be described through their response fields. The spatial response field of a neuron is the region of space in which a sensory stimulus or a behavioral event causes that neuron to fire. Neurons can also exhibit temporal response fields (TRFs), which characterize a transient response to stimulus or behavioral event onsets. These neurons can thus be described by a spatio-temporal response field (STRF). The activity of neurons with STRFs can be well-described with point process models that characterize binary spike trains with an instantaneous firing rate that is a function of both time and space. However, developing decoders for point process models of neurons that exhibit TRFs is challenging because it requires prior knowledge of event onset times, which are unknown. Indeed, point process filters (PPF) to date have largely focused on decoding neuronal activity without considering TRFs. Also, neural classifiers have required data to be behavior- or stimulus-aligned, i.e. event times to be known, which is often not possible in real-world applications. Our objective in this work is to develop a viable decoder for neurons with STRFs when event times are unknown.

Approach: To enable decoding of neurons with STRFs, we develop a novel point-process matched filter (PPMF) that can detect events and estimate their onset times from population spike trains. We also devise a PPF for neurons with transient responses as characterized by STRFs. When neurons exhibit STRFs and event times are unknown, the PPMF can be combined with the PPF or with discrete classifiers for continuous and discrete brain state decoding, respectively.

Main results: We validate our algorithm on two datasets: simulated spikes from neurons that encode visual saliency in response to stimuli, and prefrontal spikes recorded in a monkey performing a delayed-saccade task. We show that the PPMF can estimate the stimulus times and saccade times accurately. Further, the PPMF combined with the PPF can decode visual saliency maps without knowing the stimulus times. Similarly, the PPMF combined with a point process classifier can decode the saccade direction without knowing the saccade times.

Significance: These event detection and decoding algorithms can help develop neurotechnologies to decode cognitive states from neural responses that exhibit STRFs.
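The matched filter at the heart of the method is a sliding Poisson log-likelihood ratio between an event-locked rate template and baseline firing. A compact sketch under the assumption of binned counts and a known template; the paper's exact formulation and detection threshold may differ.

```python
import numpy as np

def ppmf(spikes, lam_event, lam0, dt=0.001):
    """Point-process matched filter. spikes: (n_cells, T) binned counts;
    lam_event: (n_cells, L) event-triggered rate templates (spikes/s);
    lam0: (n_cells,) baseline rates. Returns, for each candidate onset t,
    the Poisson log-likelihood ratio of 'event starts at t' vs baseline."""
    n_cells, L = lam_event.shape
    T = spikes.shape[1]
    log_ratio = np.log(lam_event / lam0[:, None])
    penalty = np.sum(lam_event - lam0[:, None]) * dt
    llr = np.empty(T - L)
    for t in range(T - L):
        llr[t] = np.sum(spikes[:, t:t + L] * log_ratio) - penalty
    return llr  # threshold the peaks to detect events and onset times
```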
Affiliations
- Nitin Sadras: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
20. Lopez Espejo M, Schwartz ZP, David SV. Spectral tuning of adaptation supports coding of sensory context in auditory cortex. PLoS Comput Biol 2019;15:e1007430. PMID: 31626624; PMCID: PMC6821137; DOI: 10.1371/journal.pcbi.1007430.
Abstract
Perception of vocalizations and other behaviorally relevant sounds requires integrating acoustic information over hundreds of milliseconds. Sound-evoked activity in auditory cortex typically has much shorter latency, but the acoustic context, i.e., sound history, can modulate sound-evoked activity over longer periods. Contextual effects are attributed to modulatory phenomena, such as stimulus-specific adaptation and contrast gain control. However, an encoding model that links context to natural sound processing has yet to be established. We tested whether a model in which spectrally tuned inputs undergo adaptation mimicking short-term synaptic plasticity (STP) can account for contextual effects during natural sound processing. Single-unit activity was recorded from primary auditory cortex of awake ferrets during presentation of noise with natural temporal dynamics and fully natural sounds. Encoding properties were characterized by a standard linear-nonlinear spectro-temporal receptive field (LN) model and variants that incorporated STP-like adaptation. In the adapting models, STP was applied either globally across all input spectral channels or locally to subsets of channels. For most neurons, models incorporating local STP predicted neural activity as well or better than LN and global STP models. The strength of nonlinear adaptation varied across neurons. Within neurons, adaptation was generally stronger for spectral channels with excitatory than inhibitory gain. Neurons showing improved STP model performance also tended to undergo stimulus-specific adaptation, suggesting a common mechanism for these phenomena. When STP models were compared between passive and active behavior conditions, response gain often changed, but average STP parameters were stable. Thus, spectrally and temporally heterogeneous adaptation, subserved by a mechanism with STP-like dynamics, may support representation of the complex spectro-temporal patterns that comprise natural sounds across wide-ranging sensory contexts.
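The adapting front end can be sketched as Tsodyks-Markram-style depression applied to an individual spectral channel before the linear STRF stage; the parameters and the Euler update below are illustrative, not the paper's fitted values.

```python
import numpy as np

def stp_depress(x, u=0.3, tau_rec=0.15, dt=0.01):
    """Short-term-plasticity-like depression of one input channel.
    x: nonnegative stimulus envelope for a spectral channel. Synaptic
    resources r deplete in proportion to the input (release fraction u)
    and recover with time constant tau_rec (s); the output is the
    depressed drive r * x."""
    r, out = 1.0, np.empty_like(x)
    for t, xt in enumerate(x):
        out[t] = r * xt
        r += dt * ((1.0 - r) / tau_rec - u * r * xt)
        r = min(max(r, 0.0), 1.0)
    return out

# "local" STP: adapt each spectral channel separately before the STRF,
# as in the locally adapting model variant described above
# spec_adapted = np.stack([stp_depress(ch) for ch in spec])
```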
Collapse
Affiliation(s)
- Mateo Lopez Espejo
- Neuroscience Graduate Program, Oregon Health and Science University, Portland, OR, United States of America
| | - Zachary P. Schwartz
- Neuroscience Graduate Program, Oregon Health and Science University, Portland, OR, United States of America
| | - Stephen V. David
- Oregon Hearing Research Center, Oregon Health and Science University, Portland, OR, United States of America
21
Cortical Tracking of Complex Sound Envelopes: Modeling the Changes in Response with Intensity. eNeuro 2019; 6:ENEURO.0082-19.2019. [PMID: 31171606 PMCID: PMC6597859 DOI: 10.1523/eneuro.0082-19.2019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2019] [Revised: 05/02/2019] [Accepted: 05/03/2019] [Indexed: 11/21/2022] Open
Abstract
Characterizing how the brain responds to stimuli has been a goal of sensory neuroscience for decades. One key approach has been to fit linear models to describe the relationship between sensory inputs and neural responses. This has included models aimed at predicting spike trains, local field potentials, BOLD responses, and EEG/MEG. In the case of EEG/MEG, one explicit use of this linear modeling approach has been the fitting of so-called temporal response functions (TRFs). TRFs have been used to study how auditory cortex tracks the amplitude envelope of acoustic stimuli, including continuous speech. However, such linear models typically assume that variations in the amplitude of the stimulus feature (i.e., the envelope) produce variations in the magnitude but not the latency or morphology of the resulting neural response. Here, we show that by amplitude binning the stimulus envelope, and then using it to fit a multivariate TRF, we can better account for these amplitude-dependent changes, and that this leads to a significant improvement in model performance for both amplitude-modulated noise and continuous speech in humans. We also show that this performance can be further improved through the inclusion of an additional envelope representation that emphasizes onsets and positive changes in the stimulus, consistent with the idea that while some neurons track the entire envelope, others respond preferentially to onsets in the stimulus. We contend that these results have practical implications for researchers interested in modeling brain responses to amplitude modulated sounds.
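The amplitude-binning trick can be made concrete with a small sketch: split the envelope into level bins so each bin receives its own TRF, then fit all bins jointly by ridge regression. This is only a schematic of the approach described above; bin counts, lag lengths, and function names are illustrative. With n_bins = 1 it collapses to the ordinary single-filter TRF, so the comparison in the abstract amounts to asking whether the extra bins improve cross-validated prediction.

```python
import numpy as np

def binned_envelope_design(env, n_bins=8, n_lags=40):
    """Lagged design matrix from an amplitude-binned envelope.

    env : 1-D numpy array of nonnegative envelope samples.
    """
    edges = np.quantile(env, np.linspace(0, 1, n_bins + 1))
    which = np.clip(np.digitize(env, edges[1:-1]), 0, n_bins - 1)
    binned = np.zeros((env.size, n_bins))
    binned[np.arange(env.size), which] = env     # envelope, split by level
    X = np.zeros((env.size, n_bins * n_lags))
    for lag in range(n_lags):                    # stack time lags
        X[lag:, lag * n_bins:(lag + 1) * n_bins] = binned[:env.size - lag]
    return X

def fit_trf(X, response, lam=1.0):
    """Ridge-regression TRF: one filter per amplitude bin."""
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ response)
```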
22
Dong M, Huang X, Xu B. Unsupervised speech recognition through spike-timing-dependent plasticity in a convolutional spiking neural network. PLoS One 2018; 13:e0204596. [PMID: 30496179 PMCID: PMC6264808 DOI: 10.1371/journal.pone.0204596] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2018] [Accepted: 09/11/2018] [Indexed: 11/17/2022] Open
Abstract
Speech recognition (SR) has been improved significantly by artificial neural networks (ANNs), but ANNs have the drawbacks of biological implausibility and excessive power consumption because of the nonlocal transfer of real-valued errors and weights. Spiking neural networks (SNNs) have the potential to overcome these drawbacks owing to their efficient spike communication and their natural ability to exploit the kinds of synaptic plasticity rules found in the brain for weight modification. However, existing SNN models for SR have either performed poorly or been trained in biologically implausible ways. In this paper, we present a biologically inspired convolutional SNN model for SR. The network adopts the time-to-first-spike coding scheme for fast and efficient information processing. A biological learning rule, spike-timing-dependent plasticity (STDP), is used to adjust the synaptic weights of convolutional neurons to form receptive fields in an unsupervised way. In the convolutional structure, the strategy of local weight sharing is introduced and can lead to better feature extraction of speech signals than global weight sharing. We first evaluated the SNN model with a linear support vector machine (SVM) on the TIDIGITS dataset, where it achieved 97.5% accuracy, comparable to the best results of ANNs. Analysis of the network outputs showed that not only are the output data more linearly separable, but they also have fewer dimensions and become sparse. To further confirm the validity of our model, we trained it on a more difficult recognition task based on the TIMIT dataset, where it achieved 93.8% accuracy. Moreover, a linear spike-based classifier, the tempotron, can also achieve accuracies very close to those of the SVM on both tasks. These results demonstrate that an STDP-based convolutional SNN model equipped with local weight sharing and temporal coding is capable of solving the SR task accurately and efficiently.
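The STDP rule at the heart of such models is simple to state. The sketch below is a generic pair-based update, not the paper's exact variant (which is simplified to suit time-to-first-spike coding); the constants are illustrative.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=0.02):
    """Pair-based STDP: potentiate pre-before-post, depress post-before-pre."""
    dt = t_post - t_pre
    if dt >= 0:
        w += a_plus * np.exp(-dt / tau)     # causal pairing strengthens
    else:
        w -= a_minus * np.exp(dt / tau)     # acausal pairing weakens
    return float(np.clip(w, 0.0, 1.0))      # keep weights bounded
```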
Affiliation(s)
- Meng Dong
- School of Automation, Harbin University of Science and Technology, Harbin, Heilongjiang, China
- Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Xuhui Huang
- Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Bo Xu
- Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
23
Sadras N, Shanechi MM. Decoding Spike Trains from Neurons with Spatio-Temporal Receptive Fields. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:2012-2015. [PMID: 30440795 DOI: 10.1109/embc.2018.8512598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The point-process filter (PPF) is a real-time recursive algorithm that computes the minimum mean-squared error estimate of a behavioral state, given neural spiking observations. When used with stimulus-sensitive neurons that represent behavioral states transiently, the PPF needs to know the times at which stimuli will occur. However, these times will not be known a priori. In this work, we develop a matched-filter point process filter (MF-PPF) that can decode behavioral states that are encoded transiently in neural activity when stimulus times are unknown. A linear filter matched to each neuron's temporal receptive field is used to estimate stimulus onset times, which are then fed into the PPF to decode the behavioral state. As an example, we use the MF-PPF to decode visual saliency from simulated superior colliculus spiking activity. This new decoder has the potential to decode behavioral states from brain regions with transient representations and temporal receptive fields.
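For reference, one recursion of a point-process filter with a log-linear rate model looks roughly as follows. This is a generic, textbook-style sketch using a Gaussian approximation to the posterior, not the MF-PPF itself; in the MF-PPF, the matched-filter stage would supply the estimated stimulus times that determine when the transient rate model is active. All names are illustrative.

```python
import numpy as np

def ppf_update(x_prev, P_prev, dN, beta, A, Q, dt=0.001):
    """One step of a point-process filter with rates exp(beta . [1, x]).

    x_prev, P_prev : previous posterior mean and covariance
    dN             : (n_neurons,) 0/1 spike indicators in this bin
    beta           : (n_neurons, n_states + 1) rate parameters
    A, Q           : linear state-transition matrix and noise covariance
    """
    x = A @ x_prev                          # predict
    P = A @ P_prev @ A.T + Q
    b0, B = beta[:, 0], beta[:, 1:]
    lam = np.exp(b0 + B @ x) * dt           # expected spikes per bin
    info = (B.T * lam) @ B                  # observation information
    P = np.linalg.inv(np.linalg.inv(P) + info)
    x = x + P @ (B.T @ (dN - lam))          # innovation update
    return x, P
```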
24
Zhu S, Allitt B, Samuel A, Lui L, Rosa MGP, Rajan R. Distributed representation of vocalization pitch in marmoset primary auditory cortex. Eur J Neurosci 2018; 49:179-198. [PMID: 30307660 DOI: 10.1111/ejn.14204] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2018] [Revised: 09/10/2018] [Accepted: 10/04/2018] [Indexed: 11/30/2022]
Abstract
The pitch of vocalizations is a key communication feature aiding recognition of individuals and separating sound sources in complex acoustic environments. The neural representation of the pitch of periodic sounds is well defined. However, many natural sounds, like complex vocalizations, contain rich, aperiodic or not strictly periodic frequency content and/or include high-frequency components, but still evoke a strong sense of pitch. Indeed, such sounds are the rule, not the exception, but the cortical mechanisms for encoding the pitch of such sounds are unknown. We investigated how neurons in the high-frequency representation of primary auditory cortex (A1) of marmosets encoded changes in pitch of four natural vocalizations, two centred around a dominant frequency similar to the neuron's best sensitivity and two around a much lower dominant frequency. Pitch was varied over a fine range that can be used by marmosets to differentiate individuals. The responses of most high-frequency A1 neurons were sensitive to pitch changes in all four vocalizations, with a smaller proportion of the neurons showing pitch-insensitive responses. Classically defined excitatory drive, from the neuron's monaural frequency response area, predicted responses to changes in vocalization pitch in <30% of neurons, suggesting that most of the observed pitch tuning is not a simple frequency-level response. Moreover, 39% of A1 neurons showed call-invariant tuning of pitch. These results suggest that distributed activity across A1 can represent the pitch of natural sounds over a fine, functionally relevant range, and exhibits pitch tuning for vocalizations within and outside the classical neural tuning area.
Affiliation(s)
- Shuyu Zhu
- Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia
- Centre of Excellence in Integrative Brain Function, Australian Research Council, Clayton, Victoria, Australia
- Ben Allitt
- Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia
- Anil Samuel
- Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia
- Leo Lui
- Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia
- Centre of Excellence in Integrative Brain Function, Australian Research Council, Clayton, Victoria, Australia
- Marcello G P Rosa
- Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia
- Centre of Excellence in Integrative Brain Function, Australian Research Council, Clayton, Victoria, Australia
- Ramesh Rajan
- Biomedicine Discovery Institute and Department of Physiology, Monash University, Clayton, Victoria, Australia
25
Abstract
Our ability to make sense of the auditory world results from neural processing that begins in the ear, goes through multiple subcortical areas, and continues in the cortex. The specific contribution of the auditory cortex to this chain of processing is far from understood. Although many of the properties of neurons in the auditory cortex resemble those of subcortical neurons, they show somewhat more complex selectivity for sound features, which is likely to be important for the analysis of natural sounds, such as speech, in real-life listening conditions. Furthermore, recent work has shown that auditory cortical processing is highly context-dependent, integrates auditory inputs with other sensory and motor signals, depends on experience, and is shaped by cognitive demands, such as attention. Thus, in addition to being the locus for more complex sound selectivity, the auditory cortex is increasingly understood to be an integral part of the network of brain regions responsible for prediction, auditory perceptual decision-making, and learning. In this review, we focus on three key areas that are contributing to this understanding: the sound features that are preferentially represented by cortical neurons, the spatial organization of those preferences, and the cognitive roles of the auditory cortex.
Affiliation(s)
- Andrew J King
- Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, OX1 3PT, UK
- Sundeep Teki
- Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, OX1 3PT, UK
- Ben D B Willmore
- Department of Physiology, Anatomy & Genetics, University of Oxford, Oxford, OX1 3PT, UK
26
Schwartz ZP, David SV. Focal Suppression of Distractor Sounds by Selective Attention in Auditory Cortex. Cereb Cortex 2018; 28:323-339. [PMID: 29136104 PMCID: PMC6057511 DOI: 10.1093/cercor/bhx288] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2017] [Indexed: 11/15/2022] Open
Abstract
Auditory selective attention is required for parsing crowded acoustic environments, but cortical systems mediating the influence of behavioral state on auditory perception are not well characterized. Previous neurophysiological studies suggest that attention produces a general enhancement of neural responses to important target sounds versus irrelevant distractors. However, behavioral studies suggest that in the presence of masking noise, attention provides a focal suppression of distractors that compete with targets. Here, we compared effects of attention on cortical responses to masking versus non-masking distractors, controlling for effects of listening effort and general task engagement. We recorded single-unit activity from primary auditory cortex (A1) of ferrets during behavior and found that selective attention decreased responses to distractors masking targets in the same spectral band, compared with spectrally distinct distractors. This suppression enhanced neural target detection thresholds, suggesting that limited attention resources serve to focally suppress responses to distractors that interfere with target detection. Changing effort by manipulating target salience consistently modulated spontaneous but not evoked activity. Task engagement and changing effort tended to affect the same neurons, while attention affected an independent population, suggesting that distinct feedback circuits mediate effects of attention and effort in A1.
Affiliation(s)
- Zachary P Schwartz
- Neuroscience Graduate Program, Oregon Health and Science University, OR, USA
- Stephen V David
- Oregon Hearing Research Center, Oregon Health and Science University, OR, USA
- Address Correspondence to Stephen V. David, Oregon Hearing Research Center, Oregon Health and Science University, 3181 SW Sam Jackson Park Road, MC L335A, Portland, OR 97239, USA.
27
Hamilton LS, Huth AG. The revolution will not be controlled: natural stimuli in speech neuroscience. Lang Cogn Neurosci 2018; 35:573-582. [PMID: 32656294 PMCID: PMC7324135 DOI: 10.1080/23273798.2018.1499946] [Citation(s) in RCA: 106] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/21/2018] [Accepted: 07/03/2018] [Indexed: 05/22/2023]
Abstract
Humans have a unique ability to produce and consume rich, complex, and varied language in order to communicate ideas to one another. Still, outside of natural reading, the most common methods for studying how our brains process speech or understand language use only isolated words or simple sentences. Recent studies have upset this status quo by employing complex natural stimuli and measuring how the brain responds to language as it is used. In this article we argue that natural stimuli offer many advantages over simplified, controlled stimuli for studying how language is processed by the brain. Furthermore, the downsides of using natural language stimuli can be mitigated using modern statistical and computational techniques.
Affiliation(s)
- Liberty S. Hamilton
- Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, Austin, USA
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, USA
- Alexander G. Huth
- Department of Neuroscience, The University of Texas at Austin, Austin, USA
- Department of Computer Science, The University of Texas at Austin, Austin, USA
28
See JZ, Atencio CA, Sohal VS, Schreiner CE. Coordinated neuronal ensembles in primary auditory cortical columns. eLife 2018; 7:e35587. [PMID: 29869986 PMCID: PMC6017807 DOI: 10.7554/elife.35587] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2018] [Accepted: 06/03/2018] [Indexed: 12/15/2022] Open
Abstract
The synchronous activity of groups of neurons is increasingly thought to be important in cortical information processing and transmission. However, most studies of processing in the primary auditory cortex (AI) have viewed neurons as independent filters; little is known about how coordinated AI neuronal activity is expressed throughout cortical columns and how it might enhance the processing of auditory information. To address this, we recorded from populations of neurons in AI cortical columns of anesthetized rats and, using dimensionality reduction techniques, identified multiple coordinated neuronal ensembles (cNEs), which are groups of neurons with reliable synchronous activity. We show that cNEs reflect local network configurations with enhanced information encoding properties that cannot be accounted for by stimulus-driven synchronization alone. Furthermore, similar cNEs were identified in both spontaneous and evoked activity, indicating that columnar cNEs are stable functional constructs that may represent principal units of information processing in AI.
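A common first step in such analyses is deciding how many ensembles the data support. The sketch below uses the familiar eigenvalue-versus-surrogate test on the neuron-by-neuron correlation matrix; the paper's pipeline (PCA followed by ICA to recover ensemble weights) would build on a step like this rather than being identical to it. Function and variable names are illustrative.

```python
import numpy as np

def n_ensembles(spike_matrix, n_shuffles=100, seed=0):
    """Count correlation-matrix eigenvalues exceeding circular-shift surrogates.

    spike_matrix : (n_neurons, n_bins) binned counts; assumes every
                   neuron fires at least once (nonzero variance).
    """
    rng = np.random.default_rng(seed)
    ev = np.linalg.eigvalsh(np.corrcoef(spike_matrix))
    null_max = np.empty(n_shuffles)
    for k in range(n_shuffles):
        # circular shifts preserve each neuron's autostructure
        shifted = np.array([np.roll(row, rng.integers(1, row.size))
                            for row in spike_matrix])
        null_max[k] = np.linalg.eigvalsh(np.corrcoef(shifted)).max()
    return int(np.sum(ev > np.quantile(null_max, 0.95)))
```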
Affiliation(s)
- Jermyn Z See
- UCSF Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Coleman Memorial Laboratory, University of California, San Francisco, San Francisco, United States
- Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, United States
- Department of Psychiatry, University of California, San Francisco, United States
- Craig A Atencio
- UCSF Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Coleman Memorial Laboratory, University of California, San Francisco, San Francisco, United States
- Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, United States
- Vikaas S Sohal
- UCSF Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Department of Psychiatry, University of California, San Francisco, United States
- Christoph E Schreiner
- UCSF Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Coleman Memorial Laboratory, University of California, San Francisco, San Francisco, United States
- Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, United States
29
Kuchibhotla K, Bathellier B. Neural encoding of sensory and behavioral complexity in the auditory cortex. Curr Opin Neurobiol 2018; 52:65-71. [PMID: 29709885 DOI: 10.1016/j.conb.2018.04.002] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2018] [Revised: 03/01/2018] [Accepted: 04/07/2018] [Indexed: 01/07/2023]
Abstract
Converging evidence now supports the idea that auditory cortex is an important step for the emergence of auditory percepts. Recent studies have extended the list of complex, nonlinear sound features coded by cortical neurons. Moreover, we are beginning to uncover general properties of cortical representations, such as invariance and discreteness, which reflect the structure of auditory perception. Complexity, however, emerges not only through nonlinear shaping of auditory information into perceptual bricks. Behavioral context and task-related information strongly influence cortical encoding of sounds via ascending neuromodulation and descending top-down frontal control. These effects appear to be mediated through local inhibitory networks. Thus, auditory cortex can be seen as a hub linking structured sensory representations with behavioral variables.
Affiliation(s)
- Kishore Kuchibhotla
- Department of Psychological and Brain Sciences, Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, United States; Laboratoire de Neurosciences Cognitives, INSERM U960, École Normale Supérieure - PSL Research University, Paris, France
- Brice Bathellier
- Unité de Neuroscience, Information et Complexité (UNIC), FRE 3693, Centre National de la Recherche Scientifique and Paris-Saclay University, Gif-sur-Yvette, 91198, France.
30
David SV. Incorporating behavioral and sensory context into spectro-temporal models of auditory encoding. Hear Res 2018; 360:107-123. [PMID: 29331232 PMCID: PMC6292525 DOI: 10.1016/j.heares.2017.12.021] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/27/2017] [Revised: 12/18/2017] [Accepted: 12/26/2017] [Indexed: 01/11/2023]
Abstract
For several decades, auditory neuroscientists have used spectro-temporal encoding models to understand how neurons in the auditory system represent sound. Derived from early applications of systems identification tools to the auditory periphery, the spectro-temporal receptive field (STRF) and more sophisticated variants have emerged as an efficient means of characterizing representation throughout the auditory system. Most of these encoding models describe neurons as static sensory filters. However, auditory neural coding is not static. Sensory context, reflecting the acoustic environment, and behavioral context, reflecting the internal state of the listener, can both influence sound-evoked activity, particularly in central auditory areas. This review explores recent efforts to integrate context into spectro-temporal encoding models. It begins with a brief tutorial on the basics of estimating and interpreting STRFs. Then it describes three recent studies that have characterized contextual effects on STRFs, emerging over a range of timescales, from many minutes to tens of milliseconds. An important theme of this work is not simply that context influences auditory coding, but also that contextual effects span a large continuum of internal states. The added complexity of these context-dependent models introduces new experimental and theoretical challenges that must be addressed in order to be used effectively. Several new methodological advances promise to address these limitations and allow the development of more comprehensive context-dependent models in the future.
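The basics of STRF estimation that the review's tutorial covers reduce, in the simplest case, to regularized linear regression on a lagged spectrogram. A minimal sketch follows; the ridge penalty and array shapes are illustrative, and real analyses cross-validate the regularizer.

```python
import numpy as np

def estimate_strf(spec, resp, n_lags=30, lam=100.0):
    """Linear STRF by ridge-regularized regression.

    spec : (n_freqs, n_times) stimulus spectrogram
    resp : (n_times,) measured response
    Returns (n_freqs, n_lags) weights so that the prediction is
    r(t) ~ sum_{f,u} strf[f, u] * spec[f, t - u].
    """
    n_f, n_t = spec.shape
    X = np.zeros((n_t, n_f * n_lags))
    for u in range(n_lags):                  # lagged copies of the stimulus
        X[u:, u * n_f:(u + 1) * n_f] = spec[:, :n_t - u].T
    w = np.linalg.solve(X.T @ X + lam * np.eye(n_f * n_lags), X.T @ resp)
    return w.reshape(n_lags, n_f).T
```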
Affiliation(s)
- Stephen V David
- Oregon Hearing Research Center, Oregon Health & Science University, 3181 SW Sam Jackson Park Rd, MC L335A, Portland, OR 97239, United States.
31
Młynarski W, McDermott JH. Learning Midlevel Auditory Codes from Natural Sound Statistics. Neural Comput 2017; 30:631-669. [PMID: 29220308 DOI: 10.1162/neco_a_01048] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. To gain insight into such midlevel representations for sound, we designed a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer, the model forms a sparse convolutional code of spectrograms using a dictionary of learned spectrotemporal kernels. To generalize from specific kernel activation patterns, the second layer encodes patterns of time-varying magnitude of multiple first-layer coefficients. When trained on corpora of speech and environmental sounds, some second-layer units learned to group similar spectrotemporal features. Others instantiate opponency between distinct sets of features. Such groupings might be instantiated by neurons in the auditory cortex, providing a hypothesis for midlevel neuronal computation.
32
Holdgraf CR, Rieger JW, Micheli C, Martin S, Knight RT, Theunissen FE. Encoding and Decoding Models in Cognitive Electrophysiology. Front Syst Neurosci 2017; 11:61. [PMID: 29018336 PMCID: PMC5623038 DOI: 10.3389/fnsys.2017.00061] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022] Open
Abstract
Cognitive neuroscience has seen rapid growth in the size and complexity of data recorded from the human brain as well as in the computational tools available to analyze this data. This data explosion has resulted in an increased use of multivariate, model-based methods for asking neuroscience questions, allowing scientists to investigate multiple hypotheses with a single dataset, to use complex, time-varying stimuli, and to study the human brain under more naturalistic conditions. These tools come in the form of "Encoding" models, in which stimulus features are used to model brain activity, and "Decoding" models, in which neural features are used to generate a stimulus output. Here we review the current state of encoding and decoding models in cognitive electrophysiology and provide a practical guide toward conducting experiments and analyses in this emerging field. Our examples focus on using linear models in the study of human language and audition. We show how to calculate auditory receptive fields from natural sounds as well as how to decode neural recordings to predict speech. The paper aims to be a useful tutorial to these approaches, and a practical introduction to using machine learning and applied statistics to build models of neural activity. The data analytic approaches we discuss may also be applied to other sensory modalities, motor systems, and cognitive systems, and we cover some examples in these areas. In addition, a collection of Jupyter notebooks is publicly available as a complement to the material covered in this paper, providing code examples and tutorials for predictive modeling in Python. The aim is to provide a practical understanding of predictive modeling of human brain data and to propose best practices in conducting these analyses.
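The encoding/decoding symmetry the authors describe is easy to see in code: the same regression machinery is pointed in opposite directions. Below is a toy sketch with random data (the paper's own Python notebooks are the authoritative tutorials); shapes and names here are illustrative, and real analyses add time lags and cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
stim = rng.standard_normal((1000, 20))                    # stimulus features over time
mixing = rng.standard_normal((20, 64))
neural = stim @ mixing + rng.standard_normal((1000, 64))  # synthetic recordings

encoder = Ridge(alpha=1.0).fit(stim, neural)   # encoding: stimulus -> brain
decoder = Ridge(alpha=1.0).fit(neural, stim)   # decoding: brain -> stimulus
print(encoder.score(stim, neural), decoder.score(neural, stim))
```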
Affiliation(s)
- Christopher R. Holdgraf
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Office of the Vice Chancellor for Research, Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, United States
- Jochem W. Rieger
- Department of Psychology, Carl-von-Ossietzky University, Oldenburg, Germany
- Cristiano Micheli
- Department of Psychology, Carl-von-Ossietzky University, Oldenburg, Germany
- Institut des Sciences Cognitives Marc Jeannerod, Lyon, France
- Stephanie Martin
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Robert T. Knight
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Frederic E. Theunissen
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Department of Psychology, University of California, Berkeley, Berkeley, CA, United States
33
Casey MA. Music of the 7Ts: Predicting and Decoding Multivoxel fMRI Responses with Acoustic, Schematic, and Categorical Music Features. Front Psychol 2017; 8:1179. [PMID: 28769835 PMCID: PMC5509941 DOI: 10.3389/fpsyg.2017.01179] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2016] [Accepted: 06/28/2017] [Indexed: 11/26/2022] Open
Abstract
Underlying the experience of listening to music are parallel streams of auditory, categorical, and schematic qualia, whose representations and cortical organization remain largely unresolved. We collected high-field (7T) fMRI data in a music listening task, and analyzed the data using multivariate decoding and stimulus-encoding models. Twenty subjects participated in the experiment, which measured BOLD responses evoked by naturalistic listening to twenty-five music clips from five genres. Our first analysis applied machine classification to the multivoxel patterns that were evoked in temporal cortex. Results yielded above-chance levels for both stimulus identification and genre classification–cross-validated by holding out data from multiple of the stimuli during model training and then testing decoding performance on the held-out data. Genre model misclassifications were significantly correlated with those in a corresponding behavioral music categorization task, supporting the hypothesis that geometric properties of multivoxel pattern spaces underlie observed musical behavior. A second analysis employed a spherical searchlight regression analysis which predicted multivoxel pattern responses to music features representing melody and harmony across a large area of cortex. The resulting prediction-accuracy maps yielded significant clusters in the temporal, frontal, parietal, and occipital lobes, as well as in the parahippocampal gyrus and the cerebellum. These maps provide evidence in support of our hypothesis that geometric properties of music cognition are neurally encoded as multivoxel representational spaces. The maps also reveal a cortical topography that differentially encodes categorical and absolute-pitch information in distributed and overlapping networks, with smaller specialized regions that encode tonal music information in relative-pitch representations.
Affiliation(s)
- Michael A Casey
- Bregman Music and Audio Lab, Computer Science and Music Departments, Dartmouth College, Hanover, NH, United States
34
Abstract
Sounds in everyday life seldom appear in isolation. Both humans and machines are constantly flooded with a cacophony of sounds that need to be sorted through and scoured for relevant information, a phenomenon referred to as the 'cocktail party problem'. A key component in parsing acoustic scenes is the role of attention, which mediates perception and behaviour by focusing both sensory and cognitive resources on pertinent information in the stimulus space. The current article provides a review of modelling studies of auditory attention. The review highlights how the term attention refers to a multitude of behavioural and cognitive processes that can shape sensory processing. Attention can be modulated by 'bottom-up' sensory-driven factors, as well as 'top-down' task-specific goals, expectations and learned schemas. Essentially, it acts as a selection process or processes that focus both sensory and cognitive resources on the most relevant events in the soundscape, with relevance being dictated by the stimulus itself (e.g. a loud explosion) or by a task at hand (e.g. listen to announcements in a busy airport). Recent computational models of auditory attention provide key insights into its role in facilitating perception in cluttered auditory scenes. This article is part of the themed issue 'Auditory and visual scene analysis'.
Affiliation(s)
- Emine Merve Kaya
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
- Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N Charles Street, Barton Hall, Baltimore, MD 21218, USA
35
Bach JH, Kollmeier B, Anemüller J. Matching Pursuit Analysis of Auditory Receptive Fields' Spectro-Temporal Properties. Front Syst Neurosci 2017; 11:4. [PMID: 28232791 PMCID: PMC5299023 DOI: 10.3389/fnsys.2017.00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2016] [Accepted: 01/23/2017] [Indexed: 11/13/2022] Open
Abstract
Gabor filters have long been proposed as models for spectro-temporal receptive fields (STRFs), with their specific spectral and temporal rate of modulation qualitatively replicating characteristics of STRF filters estimated from responses to auditory stimuli in physiological data. The present study builds on the Gabor-STRF model by proposing a methodology to quantitatively decompose STRFs into a set of optimally matched Gabor filters through matching pursuit, and by quantitatively evaluating spectral and temporal characteristics of STRFs in terms of the derived optimal Gabor-parameters. To summarize a neuron's spectro-temporal characteristics, we introduce a measure for the “diagonality,” i.e., the extent to which an STRF exhibits spectro-temporal transients which cannot be factorized into a product of a spectral and a temporal modulation. With this methodology, it is shown that approximately half of 52 analyzed zebra finch STRFs can each be well approximated by a single Gabor or a linear combination of two Gabor filters. Moreover, the dominant Gabor functions tend to be oriented either in the spectral or in the temporal direction, with truly “diagonal” Gabor functions rarely being necessary for reconstruction of an STRF's main characteristics. As a toy example for the applicability of STRF and Gabor-STRF filters to auditory detection tasks, we use STRF filters as features in an automatic event detection task and compare them to idealized Gabor filters and mel-frequency cepstral coefficients (MFCCs). STRFs classify a set of six everyday sounds with an accuracy similar to reference Gabor features (94% recognition rate). Spectro-temporal STRF and Gabor features outperform reference spectral MFCCs in quiet and in low noise conditions (down to 0 dB signal to noise ratio).
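The decomposition itself is ordinary matching pursuit with a Gabor dictionary. The sketch below builds unit-norm spectro-temporal Gabor atoms and greedily peels them off an STRF; in this picture, atoms with both modulation rates nonzero are the "diagonal" ones the authors quantify. Parameter grids and names are illustrative.

```python
import numpy as np

def gabor_2d(shape, f_mod, t_mod, phase=0.0, sigma=0.25):
    """Unit-norm Gabor atom over (frequency, time)."""
    nf, nt = shape
    f, t = np.meshgrid(np.linspace(-1, 1, nf), np.linspace(-1, 1, nt),
                       indexing="ij")
    g = np.exp(-(f**2 + t**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * (f_mod * f + t_mod * t) + phase)
    return g / np.linalg.norm(g)

def matching_pursuit(strf, atoms, n_atoms=2):
    """Greedily approximate an STRF with a few dictionary atoms."""
    residual, chosen = strf.copy(), []
    for _ in range(n_atoms):
        scores = np.array([np.sum(residual * a) for a in atoms])
        k = int(np.argmax(np.abs(scores)))
        chosen.append((k, scores[k]))
        residual = residual - scores[k] * atoms[k]
    return chosen, residual

# small dictionary over spectral/temporal modulation rates
atoms = [gabor_2d((32, 40), fm, tm)
         for fm in (0, 1, 2) for tm in (0, 1, 2) if (fm, tm) != (0, 0)]
```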
Affiliation(s)
- Jörg-Hendrik Bach
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Jörn Anemüller
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Correspondence: Jörn Anemüller
36
Yildiz IB, Mesgarani N, Deneve S. Predictive Ensemble Decoding of Acoustical Features Explains Context-Dependent Receptive Fields. J Neurosci 2016; 36:12338-12350. [PMID: 27927954 PMCID: PMC5148225 DOI: 10.1523/jneurosci.4648-15.2016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2015] [Revised: 09/18/2016] [Accepted: 09/20/2016] [Indexed: 11/23/2022] Open
Abstract
A primary goal of auditory neuroscience is to identify the sound features extracted and represented by auditory neurons. Linear encoding models, which describe neural responses as a function of the stimulus, have been primarily used for this purpose. Here, we provide theoretical arguments and experimental evidence in support of an alternative approach, based on decoding the stimulus from the neural response. We used a Bayesian normative approach to predict the responses of neurons detecting relevant auditory features, despite ambiguities and noise. We compared the model predictions to recordings from the primary auditory cortex of ferrets and found that: (1) the decoding filters of auditory neurons resemble the filters learned from the statistics of speech sounds; (2) the decoding model captures the dynamics of responses better than a linear encoding model of similar complexity; and (3) the decoding model accounts for the accuracy with which the stimulus is represented in neural activity, whereas the linear encoding model performs very poorly. Most importantly, our model predicts that neuronal responses are fundamentally shaped by "explaining away," a divisive competition between alternative interpretations of the auditory scene. SIGNIFICANCE STATEMENT Neural responses in the auditory cortex are dynamic, nonlinear, and hard to predict. Traditionally, encoding models have been used to describe neural responses as a function of the stimulus. However, in addition to external stimulation, neural activity is strongly modulated by the responses of other neurons in the network. We hypothesized that auditory neurons aim to collectively decode their stimulus. In particular, a stimulus feature that is decoded (or explained away) by one neuron is not explained by another. We demonstrated that this novel Bayesian decoding model is better at capturing the dynamic responses of cortical neurons in ferrets. Whereas the linear encoding model poorly reflects the selectivity of neurons, the decoding model can account for the strong nonlinearities observed in neural data.
Affiliation(s)
- Izzet B Yildiz
- Group for Neural Theory, Laboratoire de Neurosciences Cognitives, Département d'Etudes Cognitives, Ecole Normale Supérieure, 75005 Paris, France
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, New York 10027
- Sophie Deneve
- Group for Neural Theory, Laboratoire de Neurosciences Cognitives, Département d'Etudes Cognitives, Ecole Normale Supérieure, 75005 Paris, France
37
Cheung C, Hamilton LS, Johnson K, Chang EF. The auditory representation of speech sounds in human motor cortex. eLife 2016; 5:e12577. [PMID: 26943778 PMCID: PMC4786411 DOI: 10.7554/elife.12577] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2015] [Accepted: 02/12/2016] [Indexed: 11/13/2022] Open
Abstract
In humans, listening to speech evokes neural responses in the motor cortex. This has been controversially interpreted as evidence that speech sounds are processed as articulatory gestures. However, it is unclear what information is actually encoded by such neural activity. We used high-density direct human cortical recordings while participants spoke and listened to speech sounds. Motor cortex neural patterns during listening were substantially different than during articulation of the same sounds. During listening, we observed neural activity in the superior and inferior regions of ventral motor cortex. During speaking, responses were distributed throughout somatotopic representations of speech articulators in motor cortex. The structure of responses in motor cortex during listening was organized along acoustic features similar to auditory cortex, rather than along articulatory features as during speaking. Motor cortex does not contain articulatory representations of perceived actions in speech, but rather, represents auditory vocal information.
Affiliation(s)
- Connie Cheung
- Graduate Program in Bioengineering, University of California, Berkeley-University of California, San Francisco, San Francisco, United States
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
- Liberty S Hamilton
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
- Keith Johnson
- Department of Linguistics, University of California, Berkeley, Berkeley, United States
- Edward F Chang
- Graduate Program in Bioengineering, University of California, Berkeley-University of California, San Francisco, San Francisco, United States
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, United States
- Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, United States
- Department of Physiology, University of California, San Francisco, San Francisco, United States
38
Willmore BDB, Schoppe O, King AJ, Schnupp JWH, Harper NS. Incorporating Midbrain Adaptation to Mean Sound Level Improves Models of Auditory Cortical Processing. J Neurosci 2016; 36:280-9. [PMID: 26758822 PMCID: PMC4710761 DOI: 10.1523/jneurosci.2441-15.2016] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2015] [Revised: 11/03/2015] [Accepted: 11/10/2015] [Indexed: 11/21/2022] Open
Abstract
Adaptation to stimulus statistics, such as the mean level and contrast of recently heard sounds, has been demonstrated at various levels of the auditory pathway. It allows the nervous system to operate over the wide range of intensities and contrasts found in the natural world. Yet current standard models of the response properties of auditory neurons do not incorporate such adaptation. Here we present a model of neural responses in the ferret auditory cortex (the IC Adaptation model), which takes into account adaptation to mean sound level at a lower level of processing: the inferior colliculus (IC). The model performs high-pass filtering with frequency-dependent time constants on the sound spectrogram, followed by half-wave rectification, and passes the output to a standard linear-nonlinear (LN) model. We find that the IC Adaptation model consistently predicts cortical responses better than the standard LN model for a range of synthetic and natural stimuli. The IC Adaptation model introduces no extra free parameters, so it improves predictions without sacrificing parsimony. Furthermore, the time constants of adaptation in the IC appear to be matched to the statistics of natural sounds, suggesting that neurons in the auditory midbrain predict the mean level of future sounds and adapt their responses appropriately. SIGNIFICANCE STATEMENT An ability to accurately predict how sensory neurons respond to novel stimuli is critical if we are to fully characterize their response properties. Attempts to model these responses have had a distinguished history, but it has proven difficult to improve their predictive power significantly beyond that of simple, mostly linear receptive field models. Here we show that auditory cortex receptive field models benefit from a nonlinear preprocessing stage that replicates known adaptation properties of the auditory midbrain. This improves their predictive power across a wide range of stimuli but keeps model complexity low as it introduces no new free parameters. Incorporating the adaptive coding properties of neurons will likely improve receptive field models in other sensory modalities too.
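The preprocessing stage described above (channel-wise high-pass filtering with frequency-dependent time constants, then half-wave rectification) can be sketched directly. This toy version subtracts an exponentially weighted running mean per channel; the published model's exact filter form and fitted time constants may differ.

```python
import numpy as np

def ic_adaptation(spec, taus, dt=0.005):
    """Mean-level adaptation before the linear-nonlinear stage.

    spec : (n_freqs, n_times) spectrogram
    taus : (n_freqs,) adaptation time constants in seconds
    """
    alpha = np.exp(-dt / np.asarray(taus))[:, None]   # per-channel smoothing
    mean = np.zeros((spec.shape[0], 1))
    out = np.empty_like(spec, dtype=float)
    for t in range(spec.shape[1]):
        mean = alpha * mean + (1 - alpha) * spec[:, t:t + 1]
        out[:, t:t + 1] = np.maximum(spec[:, t:t + 1] - mean, 0.0)
    return out
```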
Affiliation(s)
- Ben D B Willmore
- Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
- Oliver Schoppe
- Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
- Bio-Inspired Information Processing, Technische Universität München, 85748 Garching, Germany
- Andrew J King
- Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
- Jan W H Schnupp
- Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
- Nicol S Harper
- Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
39
Thorson IL, Liénard J, David SV. The Essential Complexity of Auditory Receptive Fields. PLoS Comput Biol 2015; 11:e1004628. [PMID: 26683490 PMCID: PMC4684325 DOI: 10.1371/journal.pcbi.1004628] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2015] [Accepted: 10/26/2015] [Indexed: 12/05/2022] Open
Abstract
Encoding properties of sensory neurons are commonly modeled using linear finite impulse response (FIR) filters. For the auditory system, the FIR filter is instantiated in the spectro-temporal receptive field (STRF), often in the framework of the generalized linear model. Despite widespread use of the FIR STRF, numerous formulations for linear filters are possible that require many fewer parameters, potentially permitting more efficient and accurate model estimates. To explore these alternative STRF architectures, we recorded single-unit neural activity from auditory cortex of awake ferrets during presentation of natural sound stimuli. We compared performance of > 1000 linear STRF architectures, evaluating their ability to predict neural responses to a novel natural stimulus. Many were able to outperform the FIR filter. Two basic constraints on the architecture lead to the improved performance: (1) factorization of the STRF matrix into a small number of spectral and temporal filters and (2) low-dimensional parameterization of the factorized filters. The best parameterized model was able to outperform the full FIR filter in both primary and secondary auditory cortex, despite requiring fewer than 30 parameters, about 10% of the number required by the FIR filter. After accounting for noise from finite data sampling, these STRFs were able to explain an average of 40% of A1 response variance. The simpler models permitted more straightforward interpretation of sensory tuning properties. They also showed greater benefit from incorporating nonlinear terms, such as short term plasticity, that provide theoretical advances over the linear model. Architectures that minimize parameter count while maintaining maximum predictive power provide insight into the essential degrees of freedom governing auditory cortical function. They also maximize statistical power available for characterizing additional nonlinear properties that limit current auditory models. Understanding how the brain solves sensory problems can provide useful insight for the development of automated systems such as speech recognizers and image classifiers. Recent developments in nonlinear regression and machine learning have produced powerful algorithms for characterizing the input-output relationship of complex systems. However, the complexity of sensory neural systems, combined with practical limitations on experimental data, make it difficult to apply arbitrarily complex analyses to neural data. In this study we pushed analysis in the opposite direction, toward simpler models. We asked how simple a model can be while still capturing the essential sensory properties of neurons in auditory cortex. We found that substantially simpler formulations of the widely-used spectro-temporal receptive field are able to perform as well as the best current models. These simpler formulations define new basis sets that can be incorporated into state-of-the-art machine learning algorithms for a more exhaustive exploration of sensory processing.
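The parameter savings from factorization are easy to see. A rank-K STRF is a sum of K outer products of a spectral and a temporal filter, so it needs K*(n_freqs + n_lags) numbers instead of n_freqs*n_lags. The example below builds a rank-1 STRF from a Gaussian spectral bump and a gamma-like temporal kernel; shapes and filter forms are illustrative, not the paper's fitted parameterizations.

```python
import numpy as np

def factorized_strf(spectral, temporal):
    """Rank-K STRF from K spectral and K temporal filters.

    spectral : (K, n_freqs), temporal : (K, n_lags)
    """
    return np.einsum("kf,kt->ft", spectral, temporal)

# rank-1 example: 30 + 25 = 55 parameters instead of 30 * 25 = 750
freqs, lags = np.arange(30), np.arange(25)
s = np.exp(-0.5 * ((freqs - 12) / 3.0) ** 2)[None, :]   # spectral bump
t = (lags * np.exp(-lags / 5.0))[None, :]               # temporal kernel
strf = factorized_strf(s, t)                            # shape (30, 25)
```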
Affiliation(s)
- Ivar L. Thorson
- Oregon Hearing Research Center, Oregon Health & Science University, Portland, Oregon, United States of America
- Jean Liénard
- Department of Mathematics, Washington State University, Vancouver, Washington, United States of America
- Stephen V. David
- Oregon Hearing Research Center, Oregon Health & Science University, Portland, Oregon, United States of America
40
Carlin MA, Elhilali M. A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields. IEEE/ACM Trans Audio Speech Lang Process 2015; 23:2422-2433. [PMID: 29904642 PMCID: PMC5997283 DOI: 10.1109/taslp.2015.2481179] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectrotemporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
Affiliation(s)
- Michael A Carlin
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA
41
Sandler RA, Marmarelis VZ. Understanding spike-triggered covariance using Wiener theory for receptive field identification. J Vis 2015; 15:16. [PMID: 26230978 DOI: 10.1167/15.9.16] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Receptive field identification is a vital problem in sensory neurophysiology and vision. Much research has been done on identifying the receptive fields of nonlinear neurons whose firing rate is determined by the nonlinear interactions of a small number of linear filters. Although more advanced methods have been proposed, spike-triggered covariance (STC) continues to be the most widely used method in such situations due to its simplicity and intuitiveness. Although the connection between STC and Wiener/Volterra kernels has often been mentioned in the literature, this relationship has never been explicitly derived. Here we derive this relationship and show that the STC matrix is actually a modified version of the second-order Wiener kernel, which incorporates the input autocorrelation and mixes first- and second-order dynamics. It is then shown how, with little modification of the STC method, the Wiener kernels may be obtained and, from them, the principal dynamic modes, a set of compact and efficient linear filters that essentially combine the spike-triggered average and STC matrix and generalize to systems with both continuous and point-process outputs. Finally, using Wiener theory, we show how these obtained filters may be corrected when they were estimated using correlated inputs. Our correction technique is shown to be superior to those commonly used in the literature for both correlated Gaussian images and natural images.
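For concreteness, the quantities involved can be computed as follows. This sketch uses one common convention in which the STC matrix is the spike-weighted covariance of the centered stimuli minus the raw stimulus covariance; the paper's point is precisely that such estimates mix first- and second-order Wiener dynamics unless corrected. Names are illustrative.

```python
import numpy as np

def sta_stc(stimulus, spikes):
    """Spike-triggered average and covariance difference.

    stimulus : (n_samples, n_dims) stimulus vectors preceding each bin
    spikes   : (n_samples,) spike counts (assumed to contain spikes)
    """
    n = spikes.sum()
    sta = (spikes @ stimulus) / n                    # spike-triggered average
    centered = stimulus - sta
    stc = (centered.T * spikes) @ centered / (n - 1) \
        - np.cov(stimulus.T)                         # change from prior covariance
    return sta, stc
```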
42
Carlin MA, Elhilali M. Modeling attention-driven plasticity in auditory cortical receptive fields. Front Comput Neurosci 2015; 9:106. [PMID: 26347643 PMCID: PMC4541291 DOI: 10.3389/fncom.2015.00106] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2014] [Accepted: 07/30/2015] [Indexed: 11/24/2022] Open
Abstract
To navigate complex acoustic environments, listeners adapt neural processes to focus on behaviorally relevant sounds in the acoustic foreground while minimizing the impact of distractors in the background, an ability referred to as top-down selective attention. Particularly striking examples of attention-driven plasticity have been reported in primary auditory cortex via dynamic reshaping of spectro-temporal receptive fields (STRFs). By enhancing the neural response to features of the foreground while suppressing those to the background, STRFs can act as adaptive contrast matched filters that directly contribute to an improved cognitive segregation between behaviorally relevant and irrelevant sounds. In this study, we propose a novel discriminative framework for modeling attention-driven plasticity of STRFs in primary auditory cortex. The model describes a general strategy for cortical plasticity via an optimization that maximizes discriminability between the foreground and distractors while maintaining a degree of stability in the cortical representation. The first instantiation of the model describes a form of feature-based attention and yields STRF adaptation patterns consistent with a contrast matched filter previously reported in neurophysiological studies. An extension of the model captures a form of object-based attention, where top-down signals act on an abstracted representation of the sensory input characterized in the modulation domain. The object-based model makes explicit predictions in line with limited neurophysiological data currently available but can be readily evaluated experimentally. Finally, we draw parallels between the model and anatomical circuits reported to be engaged during active attention. The proposed model strongly suggests an interpretation of attention-driven plasticity as a discriminative adaptation operating at the level of sensory cortex, in line with similar strategies previously described across different sensory modalities.
Affiliation(s)
- Michael A Carlin
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
43
Clemens J, Rau F, Hennig RM, Hildebrandt KJ. Context-dependent coding and gain control in the auditory system of crickets. Eur J Neurosci 2015; 42:2390-406. [PMID: 26179973 DOI: 10.1111/ejn.13019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2014] [Revised: 07/07/2015] [Accepted: 07/08/2015] [Indexed: 11/29/2022]
Abstract
Sensory systems process stimuli that vary greatly in intensity and complexity. To maintain efficient information transmission, neural systems need to adjust their properties to these different sensory contexts, yielding adaptive or stimulus-dependent codes. Here, we demonstrated adaptive spectrotemporal tuning in a small neural network, i.e. the peripheral auditory system of the cricket. We found that the tuning of cricket auditory neurons was sharper for complex multi-band than for simple single-band stimuli. Information-theoretic considerations revealed that this sharpening improved information transmission by separating the neural representations of individual stimulus components. A network model inspired by the structure of the cricket auditory system suggested two putative mechanisms underlying this adaptive tuning: a saturating peripheral nonlinearity could change the spectral tuning, whereas broad feed-forward inhibition was able to reproduce the observed adaptive sharpening of temporal tuning. Our study reveals a surprisingly dynamic code, usually found in more complex nervous systems, and suggests that stimulus-dependent codes could be implemented using common neural computations.
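The two candidate mechanisms can be illustrated with a toy model; the channel layout, constants, and stimuli below are assumptions for illustration only, not the fitted network model from the study.

```python
# Toy sketch of the mechanisms named above: a saturating peripheral
# nonlinearity feeding channels that share broad feed-forward inhibition.
import numpy as np

def peripheral(x, k=1.0):
    # saturating receptor nonlinearity
    return np.tanh(k * np.asarray(x, dtype=float))

def network_response(band_inputs, w_inh=0.3):
    # each channel is excited by its own band and inhibited by the
    # pooled activity of all channels (broad feed-forward inhibition)
    drive = peripheral(band_inputs)
    inhibition = w_inh * drive.sum()
    return np.maximum(drive - inhibition, 0.0)

single_band = [1.0, 0.0, 0.0]   # simple stimulus: one band active
multi_band  = [1.0, 0.7, 0.7]   # complex stimulus: several bands active

print("single-band response:", network_response(single_band))
print("multi-band response :", network_response(multi_band))
# With several active bands the pooled inhibition grows, so only the most
# strongly driven channel stays clearly above threshold: the response
# profile across channels is sharper for the complex stimulus.
```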
Affiliation(s)
- Jan Clemens: Behavioral Physiology Group, Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany; Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany; Princeton Neuroscience Institute, Princeton University, Washington Road, Princeton, NJ 08540, USA
- Florian Rau: Behavioral Physiology Group, Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- R Matthias Hennig: Behavioral Physiology Group, Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- K Jannis Hildebrandt: Cluster of Excellence 'Hearing4all', Department for Neuroscience, University of Oldenburg, Oldenburg, Germany; Research Center Neurosensory Science, University of Oldenburg, Oldenburg, Germany
44. Bibikov NG. Some features of the sound-signal envelope extracted by cochlear nucleus neurons in grass frog. Biophysics (Nagoya-shi) 2015. [DOI: 10.1134/s0006350915030045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022] Open
45. Lynch EP, Houghton CJ. Parameter estimation of neuron models using in-vitro and in-vivo electrophysiological data. Front Neuroinform 2015; 9:10. [PMID: 25941485 PMCID: PMC4403314 DOI: 10.3389/fninf.2015.00010] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/14/2014] [Accepted: 03/27/2015] [Indexed: 11/30/2022] Open
Abstract
Spiking neuron models can accurately predict the response of neurons to somatically injected currents if the model parameters are carefully tuned. Predicting the response of in-vivo neurons to natural stimuli presents a far more challenging modeling problem. In this study, an algorithm is presented for parameter estimation of spiking neuron models. The algorithm is a hybrid evolutionary algorithm that uses a spike train metric as its fitness function. We apply it to parameter discovery in modeling two experimental data sets: in-vitro current-injection responses from a regular-spiking pyramidal neuron are modeled using spiking neurons, and in-vivo extracellular auditory data are modeled using a two-stage model consisting of a stimulus filter followed by a spiking neuron model.
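A minimal sketch of the overall approach, assuming a leaky integrate-and-fire model, a van Rossum-style spike-train distance as the fitness function, and a simple (mu + lambda) evolutionary loop; none of these specific choices are claimed to match the authors' algorithm.

```python
# Sketch: evolutionary fit of (tau, v_th) for a toy LIF neuron, scored
# by a van Rossum-style spike-train distance. All settings are assumed.
import numpy as np

rng = np.random.default_rng(2)
dt, T = 1e-3, 2.0
time = np.arange(0, T, dt)
I = 1.5 + 0.5 * np.sin(2 * np.pi * 3 * time)   # injected current (a.u.)

def lif_spikes(tau, v_th):
    v, spikes = 0.0, []
    for i, t in enumerate(time):
        v += dt / tau * (-v + I[i])            # Euler LIF step
        if v >= v_th:
            spikes.append(t); v = 0.0          # spike and reset
    return np.array(spikes)

def van_rossum(s1, s2, tau_m=0.02):
    # distance between exponentially filtered spike trains on a grid
    f1 = np.zeros_like(time); f2 = np.zeros_like(time)
    for s in s1: f1 += (time >= s) * np.exp(-(time - s) / tau_m)
    for s in s2: f2 += (time >= s) * np.exp(-(time - s) / tau_m)
    return np.sqrt(np.sum((f1 - f2) ** 2) * dt / tau_m)

target = lif_spikes(tau=0.05, v_th=1.0)        # stand-in for recorded data

# (mu + lambda) evolutionary loop over the two parameters
pop = rng.uniform([0.01, 0.5], [0.2, 2.0], size=(20, 2))
for gen in range(30):
    fit = np.array([van_rossum(lif_spikes(*p), target) for p in pop])
    parents = pop[np.argsort(fit)[:5]]         # keep the 5 best
    offspring = parents[rng.integers(0, 5, 15)] + \
                rng.normal(0, [0.005, 0.05], (15, 2))
    pop = np.clip(np.vstack([parents, offspring]), [0.01, 0.5], [0.2, 2.0])

print("recovered (tau, v_th):", pop[0])        # best parent of last generation
```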
Affiliation(s)
- Eoin P Lynch: School of Mathematics, Trinity College Dublin, Dublin, Ireland; Department of Computer Science, University of Bristol, Bristol, UK
- Conor J Houghton: Department of Computer Science, University of Bristol, Bristol, UK
46. Lindeberg T, Friberg A. Idealized computational models for auditory receptive fields. PLoS One 2015; 10:e0119032. [PMID: 25822973 PMCID: PMC4379182 DOI: 10.1371/journal.pone.0119032] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/04/2014] [Accepted: 01/24/2015] [Indexed: 11/19/2022] Open
Abstract
We present a theory by which idealized models of auditory receptive fields can be derived in a principled, axiomatic manner from a set of structural properties that (i) enable invariance of receptive field responses under natural sound transformations and (ii) ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters as well as a novel family of generalized Gammatone filters, with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second layer of receptive fields from a spectrogram, it is shown that the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or a cascade of time-causal first-order integrators over the temporal domain and a Gaussian filter over the log-spectral domain. For each filter family, the spectro-temporal receptive fields can either be separable over the time-frequency domain or be adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals.
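For context, the classical Gammatone filter has the form g(t) = t^(gamma-1) exp(-2 pi b t) cos(2 pi f t) for t >= 0, with integer order gamma; letting gamma vary continuously is one way, in the spirit of the generalized family above, to trade spectral selectivity against temporal delay. The sketch below uses assumed parameter values and is not claimed to reproduce the paper's axiomatic derivation.

```python
# Sketch: Gammatone impulse responses for a few envelope orders gamma,
# showing that larger gamma shifts the envelope peak (temporal delay).
# Parameter values (f, b, fs) are illustrative assumptions.
import numpy as np

def gammatone(t, f=1000.0, b=125.0, gamma=4.0):
    t = np.asarray(t, dtype=float)
    env = np.where(t >= 0,
                   np.power(np.maximum(t, 0.0), gamma - 1)
                   * np.exp(-2 * np.pi * b * t), 0.0)
    return env * np.cos(2 * np.pi * f * t)

fs = 16000
t = np.arange(0, 0.05, 1 / fs)
for gamma in (2.0, 4.0, 6.0):
    h = gammatone(t, gamma=gamma)
    h /= np.sqrt(np.sum(h ** 2))      # unit-energy normalization
    delay = t[np.argmax(np.abs(h))]   # envelope peak location
    print(f"gamma={gamma}: peak (temporal delay) at {delay * 1000:.2f} ms")
```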
Affiliation(s)
- Tony Lindeberg: Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Anders Friberg: Department of Speech, Music and Hearing, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
47. Spectrotemporal response properties of core auditory cortex neurons in awake monkey. PLoS One 2015; 10:e0116118. [PMID: 25680187 PMCID: PMC4332665 DOI: 10.1371/journal.pone.0116118] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/06/2014] [Accepted: 12/03/2014] [Indexed: 11/19/2022] Open
Abstract
To date, most studies of core auditory cortex (AC) have characterized the spectral and temporal tuning properties of cells in anesthetized preparations. Because experiments in awake animals are scarce, here we used dynamic spectral-temporal broadband ripples to study the properties of the spectrotemporal receptive fields (STRFs) of AC cells in awake monkeys. We show that AC neurons were typically most sensitive to low ripple densities (spectral) and low velocities (temporal), and that most cells were not selective for a particular spectrotemporal sweep direction. A substantial proportion of neurons preferred amplitude-modulated sounds (at zero ripple density) to dynamic ripples (at non-zero densities). The vast majority (>93%) of modulation transfer functions were separable with respect to spectral and temporal modulations, indicating that time and spectrum are processed independently in AC neurons. We also analyzed the linear predictability of AC responses to natural vocalizations on the basis of the STRF. We discuss our findings in the light of results obtained from the monkey midbrain inferior colliculus by comparing the spectrotemporal tuning properties and linear predictability of these two important auditory stages.
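Separability of a modulation transfer function (MTF) is commonly quantified with an SVD-based index: a separable MTF factors as an outer product of a spectral and a temporal profile, which the SVD detects as a dominant first singular value. The synthetic MTF below is an assumption used only to illustrate the test, not data from the study.

```python
# Sketch: SVD separability index for a modulation transfer function
# MTF(ripple density, ripple velocity). The synthetic MTF is assumed.
import numpy as np

rng = np.random.default_rng(3)
densities  = np.linspace(0, 4, 9)       # cyc/oct
velocities = np.linspace(-64, 64, 17)   # Hz

spectral = np.exp(-densities / 1.0)             # low-pass in density
temporal = np.exp(-np.abs(velocities) / 16.0)   # low-pass in velocity
mtf = np.outer(spectral, temporal) \
      + 0.02 * rng.standard_normal((densities.size, velocities.size))

s = np.linalg.svd(mtf, compute_uv=False)
alpha_sep = s[0] ** 2 / np.sum(s ** 2)   # separability index in [0, 1]
print(f"separability index: {alpha_sep:.3f}")  # near 1 -> separable MTF
```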
48. Meyer AF, Diepenbrock JP, Ohl FW, Anemüller J. Temporal variability of spectro-temporal receptive fields in the anesthetized auditory cortex. Front Comput Neurosci 2014; 8:165. [PMID: 25566049 PMCID: PMC4274980 DOI: 10.3389/fncom.2014.00165] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/24/2014] [Accepted: 11/30/2014] [Indexed: 11/13/2022] Open
Abstract
Temporal variability of neuronal response characteristics during sensory stimulation is a ubiquitous phenomenon that may reflect processes such as stimulus-driven adaptation, top-down modulation or spontaneous fluctuations. It poses a challenge to functional characterization methods such as the receptive field, since these often assume stationarity. We propose a novel method for estimating sensory neurons' receptive fields that extends the classic static linear receptive field model to the time-varying case. Here, the long-term estimate of the static receptive field serves as the mean of a probabilistic prior distribution from which the short-term, temporally localized receptive field may deviate stochastically with time-varying standard deviation. The corresponding generalized linear model permits robust characterization of temporal variability in receptive field structure even for highly non-Gaussian stimulus ensembles. We computed and analyzed short-term auditory spectro-temporal receptive field (STRF) estimates with a characteristic temporal resolution of 5-30 s, based on model simulations and responses from a total of 60 single-unit recordings in anesthetized Mongolian gerbil auditory midbrain and cortex. Stimulation was performed with short (100 ms) overlapping frequency-modulated tones. The results demonstrate identification of time-varying STRFs, with predictive model likelihoods exceeding those of baseline static STRF estimation. Quantitative characterization reveals a higher degree of STRF variability in auditory cortex than in the midbrain. Cluster analysis indicates that significant deviations from the long-term static STRF are brief but reliably estimated. We hypothesize that the observed variability more likely reflects spontaneous or state-dependent internal fluctuations interacting with stimulus-induced processing than artifacts of the experimental or stimulus design.
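A hedged sketch of the estimation idea, simplified from the GLM to a Gaussian-noise model: within a short window, the local STRF estimate is regularized toward the long-term static STRF, which acts as the prior mean. All dimensions, constants, and the linear readout are assumptions.

```python
# Sketch: MAP estimate of a short-term STRF with a Gaussian prior centered
# on the long-term static STRF. Gaussian noise replaces the paper's GLM.
import numpy as np

rng = np.random.default_rng(4)
D, Tw = 30, 400                       # STRF dimension, samples per window

w_static = rng.standard_normal(D)     # long-term STRF (prior mean)
w_true = w_static.copy()
w_true[:5] += 1.5                     # transient deviation in this window

X = rng.standard_normal((Tw, D))      # local stimulus design matrix
y = X @ w_true + rng.standard_normal(Tw)

lam = 10.0                            # prior-precision / noise-variance ratio
# MAP estimate: argmin ||y - Xw||^2 + lam * ||w - w_static||^2
w_local = np.linalg.solve(X.T @ X + lam * np.eye(D),
                          X.T @ y + lam * w_static)

print("error of static prior:", np.linalg.norm(w_static - w_true))
print("error of local MAP   :", np.linalg.norm(w_local - w_true))
```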
Affiliation(s)
- Arne F Meyer: Medizinische Physik and Cluster of Excellence Hearing4all, Department of Medical Physics and Acoustics, Carl von Ossietzky University, Oldenburg, Germany
- Jan-Philipp Diepenbrock: Department of Systems Physiology of Learning, Leibniz Institute for Neurobiology, Magdeburg, Germany
- Frank W Ohl: Department of Systems Physiology of Learning, Leibniz Institute for Neurobiology, Magdeburg, Germany; Department of Neuroprosthetics, Institute of Biology, Otto-von-Guericke University, Magdeburg, Germany
- Jörn Anemüller: Medizinische Physik and Cluster of Excellence Hearing4all, Department of Medical Physics and Acoustics, Carl von Ossietzky University, Oldenburg, Germany
49. Lazar AA, Slutskiy YB. Channel identification machines for multidimensional receptive fields. Front Comput Neurosci 2014; 8:117. [PMID: 25309413 PMCID: PMC4176398 DOI: 10.3389/fncom.2014.00117] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/01/2014] [Accepted: 08/31/2014] [Indexed: 12/04/2022] Open
Abstract
We present algorithms for identifying multidimensional receptive fields directly from spike trains produced by biophysically grounded neuron models. We demonstrate that only the projection of a receptive field onto the input stimulus space may be identified perfectly, and we derive conditions under which this identification is possible. We also provide detailed examples of the identification of neural circuits incorporating spatiotemporal and spectrotemporal receptive fields.
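The identification limit stated here can be reproduced in a toy linear setting: when stimuli are confined to a subspace of the full input space, identification recovers the receptive field's projection onto that subspace, not the field itself. The dimensions and noise level below are assumptions, and the linear-rate readout is a deliberate simplification of the spiking models treated in the paper.

```python
# Sketch: least-squares identification under subspace-limited stimulation
# recovers only the projection of the true receptive field.
import numpy as np

rng = np.random.default_rng(5)
D, K, T = 40, 10, 5000                # RF dim, stimulus subspace dim, samples

B = np.linalg.qr(rng.standard_normal((D, K)))[0]   # orthonormal stimulus basis
h_true = rng.standard_normal(D)                    # true receptive field

coeff = rng.standard_normal((T, K))
X = coeff @ B.T                                    # stimuli confined to span(B)
y = X @ h_true + 0.1 * rng.standard_normal(T)      # noisy linear responses

h_hat, *_ = np.linalg.lstsq(X, y, rcond=None)      # minimum-norm LS solution
h_proj = B @ (B.T @ h_true)                        # projection onto span(B)

print("||h_hat - h_proj||:", np.linalg.norm(h_hat - h_proj))  # small
print("||h_hat - h_true||:", np.linalg.norm(h_hat - h_true))  # large
```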
Affiliation(s)
- Aurel A Lazar: Bionet Group, Department of Electrical Engineering, Columbia University in the City of New York, New York, NY, USA
- Yevgeniy B Slutskiy: Bionet Group, Department of Electrical Engineering, Columbia University in the City of New York, New York, NY, USA
50. Online stimulus optimization rapidly reveals multidimensional selectivity in auditory cortical neurons. J Neurosci 2014; 34:8963-75. [PMID: 24990917 DOI: 10.1523/jneurosci.0260-14.2014] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 12/31/2022] Open
Abstract
Neurons in sensory brain regions shape our perception of the surrounding environment through two parallel operations: decomposition and integration. For example, auditory neurons decompose sounds by separately encoding their frequency, temporal modulation, intensity, and spatial location. Neurons also integrate across these various features to support a unified perceptual gestalt of an auditory object. At higher levels of a sensory pathway, neurons may select for a restricted region of feature space defined by the intersection of multiple, independent stimulus dimensions. To further characterize how auditory cortical neurons decompose and integrate multiple facets of an isolated sound, we developed an automated procedure that manipulated five fundamental acoustic properties in real time based on single-unit feedback in awake mice. Within several minutes, the online approach converged on regions of the multidimensional stimulus manifold that reliably drove neurons at significantly higher rates than predefined stimuli did. Optimized stimuli were cross-validated against pure tone receptive fields and spectrotemporal receptive field estimates in the inferior colliculus and primary auditory cortex. We observed, from midbrain to cortex, increases in both level invariance and frequency selectivity, which may underlie the equivalent sparseness of responses in the two areas. We found that onset and steady-state spike rates increased proportionately as the stimulus was tailored to the multidimensional receptive field. Separately evaluating the amount of leverage each sound feature exerted on the overall firing rate revealed interdependencies between stimulus features, as well as hierarchical shifts in selectivity and invariance that may go unnoticed with traditional approaches.
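A sketch of an online optimization loop in the spirit of the procedure described, assuming a simulated neuron with Gaussian multidimensional tuning and greedy stochastic hill-climbing; the parameter set, bounds, and step sizes are illustrative assumptions, not the authors' method.

```python
# Sketch: online stimulus optimization by stochastic hill-climbing on the
# measured firing rate of a simulated neuron. All settings are assumed.
import numpy as np

rng = np.random.default_rng(6)

# five acoustic parameters, e.g. (freq kHz, AM rate Hz, level dB, BW oct, azimuth deg)
lo = np.array([ 4.0,  2.0, 20.0, 0.1, -90.0])
hi = np.array([64.0, 80.0, 80.0, 2.0,  90.0])
preferred = np.array([16.0, 12.0, 60.0, 0.5, 20.0])  # hidden neuron preference

def firing_rate(params):
    # toy multidimensional tuning: Gaussian around the preferred point,
    # with Poisson noise standing in for trial-to-trial variability
    z = (params - preferred) / (0.15 * (hi - lo))
    return 50.0 * np.exp(-0.5 * np.sum(z ** 2)) + rng.poisson(1.0)

params = (lo + hi) / 2                 # start at the center of the space
best_rate = firing_rate(params)
for trial in range(300):
    cand = np.clip(params + rng.normal(0, 0.05 * (hi - lo)), lo, hi)
    r = firing_rate(cand)
    if r > best_rate:                  # greedy: keep improvements only
        params, best_rate = cand, r

print("optimized parameters:", np.round(params, 2))
print("rate at optimum     :", round(float(best_rate), 1), "spk/s")
```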