Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Khatami F, Escabí MA. Spiking network optimized for word recognition in noise predicts auditory system hierarchy. PLoS Comput Biol 2020;16:e1007558. [PMID: 32559204 DOI: 10.1371/journal.pcbi.1007558] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 07/01/2020] [Accepted: 11/22/2019] [Indexed: 11/21/2022] Open

For:	Khatami F, Escabí MA. Spiking network optimized for word recognition in noise predicts auditory system hierarchy. PLoS Comput Biol 2020;16:e1007558. [PMID: 32559204 DOI: 10.1371/journal.pcbi.1007558] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 07/01/2020] [Accepted: 11/22/2019] [Indexed: 11/21/2022] Open

Number

Cited by Other Article(s)

Saddler MR, McDermott JH. Models optimized for real-world tasks reveal the necessity of precise temporal coding in hearing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.21.590435. [PMID: 38712054 PMCID: PMC11071365 DOI: 10.1101/2024.04.21.590435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2024]

Tuckute G, Feather J, Boebinger D, McDermott JH. Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions. PLoS Biol 2023;21:e3002366. [PMID: 38091351 PMCID: PMC10718467 DOI: 10.1371/journal.pbio.3002366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2022] [Accepted: 10/06/2023] [Indexed: 12/18/2023] Open

Deng B, Fan Y, Wang J, Yang S. Auditory perception architecture with spiking neural network and implementation on FPGA. Neural Netw 2023;165:31-42. [PMID: 37276809 DOI: 10.1016/j.neunet.2023.05.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Revised: 05/07/2023] [Accepted: 05/16/2023] [Indexed: 06/07/2023]

Beguš G, Zhou A, Zhao TC. Encoding of speech in convolutional layers and the brain stem based on language experience. Sci Rep 2023;13:6480. [PMID: 37081119 PMCID: PMC10119295 DOI: 10.1038/s41598-023-33384-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Accepted: 04/12/2023] [Indexed: 04/22/2023] Open

Abstract

Comparing artificial neural networks with outputs of neuroimaging techniques has recently seen substantial advances in (computer) vision and text-based language models. Here, we propose a framework to compare biological and artificial neural computations of spoken language representations and propose several new challenges to this paradigm. The proposed technique is based on a similar principle that underlies electroencephalography (EEG): averaging of neural (artificial or biological) activity across neurons in the time domain, and allows to compare encoding of any acoustic property in the brain and in intermediate convolutional layers of an artificial neural network. Our approach allows a direct comparison of responses to a phonetic property in the brain and in deep neural networks that requires no linear transformations between the signals. We argue that the brain stem response (cABR) and the response in intermediate convolutional layers to the exact same stimulus are highly similar without applying any transformations, and we quantify this observation. The proposed technique not only reveals similarities, but also allows for analysis of the encoding of actual acoustic properties in the two signals: we compare peak latency (i) in cABR relative to the stimulus in the brain stem and in (ii) intermediate convolutional layers relative to the input/output in deep convolutional networks. We also examine and compare the effect of prior language exposure on the peak latency in cABR and in intermediate convolutional layers. Substantial similarities in peak latency encoding between the human brain and intermediate convolutional networks emerge based on results from eight trained networks (including a replication experiment). The proposed technique can be used to compare encoding between the human brain and intermediate convolutional layers for any acoustic property and for other neuroimaging techniques.

Collapse

Sadagopan S, Kar M, Parida S. Quantitative models of auditory cortical processing. Hear Res 2023;429:108697. [PMID: 36696724 PMCID: PMC9928778 DOI: 10.1016/j.heares.2023.108697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/18/2022] [Revised: 12/17/2022] [Accepted: 01/12/2023] [Indexed: 01/15/2023]

He F, Stevenson IH, Escabí MA. Two stages of bandwidth scaling drives efficient neural coding of natural sounds. PLoS Comput Biol 2023;19:e1010862. [PMID: 36787338 PMCID: PMC9970106 DOI: 10.1371/journal.pcbi.1010862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Revised: 02/27/2023] [Accepted: 01/09/2023] [Indexed: 02/15/2023] Open

Smith SS, Sollini J, Akeroyd MA. Inferring the basis of binaural detection with a modified autoencoder. Front Neurosci 2023;17:1000079. [PMID: 36777633 PMCID: PMC9909603 DOI: 10.3389/fnins.2023.1000079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2022] [Accepted: 01/02/2023] [Indexed: 01/28/2023] Open

Norman-Haignere SV, Long LK, Devinsky O, Doyle W, Irobunda I, Merricks EM, Feldstein NA, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Multiscale temporal integration organizes hierarchical computation in human auditory cortex. Nat Hum Behav 2022;6:455-469. [PMID: 35145280 PMCID: PMC8957490 DOI: 10.1038/s41562-021-01261-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Accepted: 11/18/2021] [Indexed: 01/11/2023]

Keshishian M, Norman-Haignere SV, Mesgarani N. Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2021;34:24455-24467. [PMID: 38737583 PMCID: PMC11087060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 05/14/2024]

Abstract

Natural signals such as speech are hierarchically structured across many different timescales, spanning tens (e.g., phonemes) to hundreds (e.g., words) of milliseconds, each of which is highly variable and context-dependent. While deep neural networks (DNNs) excel at recognizing complex patterns from natural signals, relatively little is known about how DNNs flexibly integrate across multiple timescales. Here, we show how a recently developed method for studying temporal integration in biological neural systems - the temporal context invariance (TCI) paradigm - can be used to understand temporal integration in DNNs. The method is simple: we measure responses to a large number of stimulus segments presented in two different contexts and estimate the smallest segment duration needed to achieve a context invariant response. We applied our method to understand how the popular DeepSpeech2 model learns to integrate across time in speech. We find that nearly all of the model units, even in recurrent layers, have a compact integration window within which stimuli substantially alter the response and outside of which stimuli have little effect. We show that training causes these integration windows to shrink at early layers and expand at higher layers, creating a hierarchy of integration windows across the network. Moreover, by measuring integration windows for time-stretched/compressed speech, we reveal a transition point, midway through the trained network, where integration windows become yoked to the duration of stimulus structures (e.g., phonemes or words) rather than absolute time. Similar phenomena were observed in a purely recurrent and purely convolutional network although structure-yoked integration was more prominent in the recurrent network. These findings suggest that deep speech recognition systems use a common motif to encode the hierarchical structure of speech: integrating across short, time-yoked windows at early layers and long, structure-yoked windows at later layers. Our method provides a straightforward and general-purpose toolkit for understanding temporal integration in black-box machine learning models.

Collapse

Deng B, Fan Y, Wang J, Yang S. Reconstruction of a Fully Paralleled Auditory Spiking Neural Network and FPGA Implementation. IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS 2021;15:1320-1331. [PMID: 34699367 DOI: 10.1109/tbcas.2021.3122549] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]