1
Baek S, Lee J. SNN and sound: a comprehensive review of spiking neural networks in sound. Biomed Eng Lett 2024; 14:981-991. [PMID: 39220030 PMCID: PMC11362401 DOI: 10.1007/s13534-024-00406-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 05/08/2024] [Accepted: 06/24/2024] [Indexed: 09/04/2024]
Abstract
The rapid advancement of AI and machine learning has significantly enhanced sound and acoustic recognition technologies, moving beyond traditional models to more sophisticated neural network-based methods. Among these, Spiking Neural Networks (SNNs) are particularly noteworthy. SNNs mimic biological neurons and operate on principles similar to the human brain, using analog computing mechanisms. This capability allows for efficient sound processing with low power consumption and minimal latency, ideal for real-time applications in embedded systems. This paper reviews recent developments in SNNs for sound recognition, underscoring their potential to overcome the limitations of digital computing and suggesting directions for future research. The unique attributes of SNNs could lead to breakthroughs in mimicking human auditory processing more closely.
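To make the idea of "mimicking biological neurons" concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, one of the most common spiking neuron models. It is a generic textbook model, not a method taken from the review. The function name and all parameter values (time step, membrane time constant, threshold, reset) are illustrative assumptions. The membrane potential leaks toward rest and integrates its input. Whenever it crosses threshold, the neuron emits an all-or-nothing spike and resets.

```python
import numpy as np

def simulate_lif(input_current, dt=1e-3, tau_m=20e-3, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Hypothetical LIF sketch driven by a sampled input (in voltage units, R*I).

    Integrates tau_m * dV/dt = -(V - v_rest) + I(t) with forward Euler and
    emits a spike, then resets, whenever V reaches v_thresh.
    Returns the membrane trace and the indices of time steps with a spike.
    """
    v = v_rest
    trace = np.empty(len(input_current))
    spikes = []
    for k, i_k in enumerate(input_current):
        # Leak toward rest plus integration of the current input sample.
        v += dt / tau_m * (-(v - v_rest) + i_k)
        if v >= v_thresh:
            spikes.append(k)  # all-or-nothing event: only its timing is kept
            v = v_reset
        trace[k] = v
    return trace, spikes

_, no_spikes = simulate_lif(np.full(200, 0.5))   # settles below threshold
_, regular = simulate_lif(np.full(200, 1.5))     # drives regular firing
print(len(no_spikes), regular[:3])
```

With a constant drive of 0.5, the membrane settles below the threshold of 1.0 and never fires. A drive of 1.5 produces a regular spike train. The information is carried by whether and when spikes occur, which is what makes event-driven, low-power hardware implementations attractive.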
Affiliation(s)
- Suwhan Baek
- AI R&D Laboratory, Posco-Holdings, Cheongam-ro, Pohang-si, Gyeongsangbuk-do 37673 Korea
- Department of Computer Science, Kwangwoon University, Gwangun-ro, Nowon-gu, Seoul, 01899 Republic of Korea
- Jaewon Lee
- Department of Psychology, Seoul National University, Gwanak-ro, Gwanak-gu, Seoul, 08826 Republic of Korea
2
Generative Adversarial Network for Musical Notation Recognition during Music Teaching. Comput Intell Neurosci 2022; 2022:8724688. [PMID: 35712062 PMCID: PMC9197657 DOI: 10.1155/2022/8724688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 05/14/2022] [Indexed: 11/17/2022]
Abstract
To improve the quality and efficiency of music teaching, we aim to automate the teaching of music notation. Building on computer vision techniques and note recognition algorithms, we improve the generative adversarial network (GAN) to raise the accuracy and efficiency of recognizing short music scores. We adopt an embedded matching structure based on adversarial neural networks that unifies the generator and the discriminator at the note input side. Each network layer is then arranged in a cascade structure so that note features from different levels are preserved in each convolutional layer. Residual blocks are inserted into some network layers to break the symmetry of the network structure and strengthen the adversarial network's ability to extract note features. To verify the efficiency of our method, we validate it on monophonic spectrum, polyphonic spectrum, and miscellaneous spectrum datasets. The experimental results show that our method achieves the best recognition accuracy on the monophonic and miscellaneous spectrum datasets, outperforming the machine learning method. In recognizing detailed note information, our method is also more efficient than other deep learning methods.
3
Analysis of Two-Piano Teaching Assistant Training Based on Neural Network Model Sound Sequence Recognition. Comput Intell Neurosci 2022; 2022:5768291. [PMID: 35694593 PMCID: PMC9184186 DOI: 10.1155/2022/5768291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Revised: 05/13/2022] [Accepted: 05/20/2022] [Indexed: 11/21/2022]
Abstract
In today's society, as material living standards gradually improve, people increasingly pursue spiritual enjoyment. Learning the piano has become one way to enrich spiritual life, and more and more people attach importance to it. In piano teaching, the two-piano method is a distinctive form of playing. The recognition accuracy of two-piano sound sequences drops severely in noisy and reverberant environments. To address this problem, this paper proposes an auxiliary training analysis system based on a neural network model. First, to learn the nonlinear relationship between sound sequences and target task labels from massive data, a multitask preprocessing method that combines speech enhancement and detection is used to supervise the training of the deep neural network. Then, a convolutional neural network is used to build an end-to-end recognition system, and a phonological sequence model checks and corrects the initial recognition results. Finally, sequence recognition is carried out under noisy conditions: a speech enhancement front-end module first improves articulation, and the sequence recognition model then performs recognition. Comparison with traditional training methods shows that our method improves players' training efficiency and performance quality. It also overcomes the limitations of traditional two-piano training, provides a more scientific training tool, and puts artificial intelligence technology into practice in two-piano teaching.
4
Yu Q, Song S, Ma C, Pan L, Tan KC. Synaptic Learning With Augmented Spikes. IEEE Trans Neural Netw Learn Syst 2022; 33:1134-1146. [PMID: 33471768 DOI: 10.1109/tnnls.2020.3040969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Traditional neuron models use analog values for information representation and computation, whereas spiking neuron models use all-or-nothing spikes. With a more brain-like processing paradigm, spiking neurons are more promising for improvements in efficiency and computational capability. They extend the computation of traditional neurons with an additional dimension of time carried by all-or-nothing spikes. Could one benefit from both the accuracy of analog values and the time-processing capability of spikes? In this article, we introduce the concept of augmented spikes, which carry complementary information in spike coefficients in addition to spike latencies. We propose a new augmented spiking neuron model and new synaptic learning rules to process and learn patterns of augmented spikes. We provide systematic insights into the properties and characteristics of our methods, including classification of augmented spike patterns, learning capacity, construction of causality, feature detection, robustness, and applicability to practical tasks such as acoustic and visual pattern recognition. Our augmented approaches show several advanced learning properties and reliably outperform baselines that use typical all-or-nothing spikes. They significantly improve the accuracies of a temporal-based approach on sound and MNIST recognition tasks to 99.38% and 97.90%, respectively, highlighting the effectiveness and potential merits of our methods. More importantly, our augmented approaches are versatile and can easily be generalized to other spike-based systems, potentially contributing to their development, including neuromorphic computing.
5
Yan Q, Zheng Y, Jia S, Zhang Y, Yu Z, Chen F, Tian Y, Huang T, Liu JK. Revealing Fine Structures of the Retinal Receptive Field by Deep-Learning Networks. IEEE Trans Cybern 2022; 52:39-50. [PMID: 32167923 DOI: 10.1109/tcyb.2020.2972983] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Deep convolutional neural networks (CNNs) have demonstrated impressive performance on many visual tasks. Recently, they have become useful models of the visual system in neuroscience. However, it is still not clear what CNNs learn in terms of neuronal circuits. When a deep CNN with many layers is used to model the visual system, it is not easy to compare its structural components with possible neuroscience underpinnings, because the circuits from the retina to the higher visual cortex are highly complex. Here, we address this issue by focusing on single retinal ganglion cells, using biophysical models and recording data from animals. By training CNNs with white noise images to predict neuronal responses, we found that fine structures of the retinal receptive field can be revealed. Specifically, the learned convolutional filters resemble biological components of the retinal circuit. This suggests that a CNN learning from a single retinal cell reveals a minimal neural network carried out in this cell. Furthermore, when CNNs learned from different cells are transferred between cells, transfer learning performance varies, which indicates that CNNs are cell specific. Moreover, when CNNs are transferred between different types of input images, here white noise versus natural images, transfer learning performs well, which implies that CNNs indeed capture the full computational ability of a single retinal cell for different inputs. Taken together, these results suggest that CNNs could be used to reveal the structural components of neuronal circuits and provide a powerful model for neural system identification.
6
Song S, Ma C, Sun W, Xu J, Dang J, Yu Q. Efficient learning with augmented spikes: A case study with image classification. Neural Netw 2021; 142:205-212. [PMID: 34023641 DOI: 10.1016/j.neunet.2021.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2020] [Revised: 02/15/2021] [Accepted: 05/06/2021] [Indexed: 10/21/2022]
Abstract
Efficient learning of spikes plays a valuable role in training spiking neural networks (SNNs) to respond as desired to input stimuli. However, current learning rules are limited to a binary form of spikes. The seemingly ubiquitous phenomenon of bursting in nervous systems suggests a new way to carry more information with spike bursts in addition to spike times. Based on this, we introduce an advanced form of spikes, the augmented spikes, in which spike coefficients carry additional information. How neurons could learn from and benefit from augmented spikes remains unclear. In this paper, we propose two new efficient learning rules to process spatiotemporal patterns composed of augmented spikes. We examine the learning abilities of our methods on a synthetic recognition task with augmented spike patterns and on two practical image classification tasks. Experimental results demonstrate that our rules can extract information carried by both the timing and the coefficients of spikes. Compared with benchmarks, our proposed approaches achieve remarkable performance and good robustness under various noise conditions. The improved performance indicates the merits of augmented spikes and our learning rules, which could benefit and generalize to a broad range of spike-based platforms.
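One possible shape for such a rule is a Tempotron-style update extended to augmented spikes, sketched below as a hypothetical example. It is not claimed to be either of the two rules the authors propose. When the neuron misclassifies a pattern, each weight moves by lr * sum_j c_ij K(t_max - t_ij), where t_max is the time of maximal membrane potential. Spikes with larger coefficients therefore receive proportionally more credit. The kernel, time constants, learning rate, time grid, and the name `aug_tempotron_step` are all assumptions.

```python
import numpy as np

TAU_M, TAU_S = 20.0, 5.0  # assumed kernel time constants (ms)

def kernel(t):
    """Double-exponential PSP kernel (unnormalised); zero for t < 0."""
    t = np.maximum(np.asarray(t, dtype=float), 0.0)
    return np.exp(-t / TAU_M) - np.exp(-t / TAU_S)

def aug_tempotron_step(weights, spike_times, spike_coeffs, label,
                       threshold=1.0, lr=0.01, t_grid=None):
    """One hypothetical Tempotron-style update using spike coefficients.

    label: True if the neuron should fire for this pattern, else False.
    On an error, w_i changes by +/- lr * sum_j c_ij * K(t_max - t_ij).
    """
    weights = np.asarray(weights, dtype=float)
    if t_grid is None:
        t_grid = np.arange(0.0, 100.0, 0.1)  # assumed 100 ms window, 0.1 ms step
    # Row i: afferent i's coefficient-weighted PSP trace sum_j c_ij * K(t - t_ij).
    contrib = np.array([
        sum((c * kernel(t_grid - t) for t, c in zip(times, coeffs)),
            np.zeros_like(t_grid))
        for times, coeffs in zip(spike_times, spike_coeffs)
    ])
    v = weights @ contrib
    k_max = int(np.argmax(v))  # index of maximal membrane potential (t_max)
    if (v[k_max] >= threshold) == label:
        return weights  # correct decision: no change
    sign = 1.0 if label else -1.0  # potentiate misses, depress false alarms
    # Spikes after t_max contribute nothing because K is zero for negative lags.
    return weights + sign * lr * contrib[:, k_max]
```

Setting every coefficient to 1 recovers the classic Tempotron update. In this sketch, the coefficients only rescale each spike's eligibility at t_max.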
Affiliation(s)
- Shiming Song
- Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Chenxiang Ma
- Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Wei Sun
- Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Junhai Xu
- Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Jianwu Dang
- Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
- Qiang Yu
- Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
7
Yu Q, Yao Y, Wang L, Tang H, Dang J, Tan KC. Robust Environmental Sound Recognition With Sparse Key-Point Encoding and Efficient Multispike Learning. IEEE Trans Neural Netw Learn Syst 2021; 32:625-638. [PMID: 32203038 DOI: 10.1109/tnnls.2020.2978764] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
The capability for environmental sound recognition (ESR) can determine the fitness of individuals, allowing them to avoid dangers or pursue opportunities when critical sound events occur. The fundamental principles by which biological systems achieve such a remarkable ability remain mysterious. Additionally, the practical importance of ESR has attracted increasing research attention, but the chaotic and nonstationary nature of environmental sounds continues to make it a challenging task. In this article, we propose a spike-based framework for ESR from a more brain-like perspective. Our framework is a unified system that consistently integrates three major functional parts: sparse encoding, efficient learning, and robust readout. We first introduce a simple sparse encoding in which key points are used for feature representation, and we demonstrate that it generalizes to both spike- and nonspike-based systems. Then, we evaluate the learning properties of different learning rules in detail and add our own contributions as improvements. Our results highlight the advantages of multispike learning and provide a selection reference for various spike-based developments. Finally, we combine the multispike readout with the other parts to form a system for ESR. Experimental results show that our framework performs best compared with other baseline approaches. In addition, we show that our spike-based framework has several advantageous characteristics, including early decision making, learning from small datasets, and ongoing dynamic processing. Our framework is the first attempt to apply the multispike characteristic of biological neurons to ESR. The outstanding performance of our approach could draw more research effort toward pushing the spike-based paradigm to new horizons.
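The key-point encoding step can be sketched under one assumption about the detector. Here a key point is taken to be a spectrogram bin that is the maximum of its local time-frequency neighbourhood and exceeds a relative energy threshold. The neighbourhood sizes, the threshold, and the name `key_point_spikes` are illustrative, and the authors' exact detection and masking criteria may differ. Each surviving key point becomes a single spike on the afferent for its frequency channel at its frame index, which is what keeps the encoding sparse.

```python
import numpy as np

def key_point_spikes(spec, df=2, dt=2, rel_threshold=0.1):
    """Turn a (freq x time) spectrogram into sparse spikes at key points.

    Hypothetical criterion: a bin is kept if it is the maximum of its own
    (2*df+1) x (2*dt+1) neighbourhood (clipped at the edges) and exceeds
    rel_threshold * spec.max(). Returns (channel, frame_index) pairs.
    """
    spec = np.asarray(spec, dtype=float)
    n_f, n_t = spec.shape
    floor = rel_threshold * spec.max()
    spikes = []
    for f in range(n_f):
        for t in range(n_t):
            val = spec[f, t]
            if val <= floor:
                continue  # weak bins never produce spikes
            patch = spec[max(f - df, 0):f + df + 1, max(t - dt, 0):t + dt + 1]
            if val >= patch.max():
                spikes.append((f, t))  # one spike per local peak
    return spikes
```

Before the spikes are passed to a spiking neuron, the frame indices would be converted to spike times using the spectrogram's hop size. Nearby bins dominated by a stronger peak are suppressed, so only a handful of spikes survive per sound event.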