Ham J, Yoo HJ, Kim J, Lee B. Vowel speech recognition from rat electroencephalography using long short-term memory neural network.
PLoS One 2022;
17:e0270405. [PMID:
35737731 PMCID:
PMC9223328 DOI:
10.1371/journal.pone.0270405]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Accepted: 06/09/2022] [Indexed: 11/24/2022] Open
Abstract
Over the years, considerable research has been conducted to investigate the mechanisms of speech perception and recognition. Electroencephalography (EEG) is a powerful tool for identifying brain activity; therefore, it has been widely used to determine the neural basis of speech recognition. In particular, for the classification of speech recognition, deep learning-based approaches are in the spotlight because they can automatically learn and extract representative features through end-to-end learning. This study aimed to identify particular components that are potentially related to phoneme representation in the rat brain and to discriminate brain activity for each vowel stimulus on a single-trial basis using a bidirectional long short-term memory (BiLSTM) network and classical machine learning methods. Nineteen male Sprague-Dawley rats subjected to microelectrode implantation surgery to record EEG signals from the bilateral anterior auditory fields were used. Five different vowel speech stimuli were chosen, /a/, /e/, /i/, /o/, and /u/, which have highly different formant frequencies. EEG recorded under randomly given vowel stimuli was minimally preprocessed and normalized by a z-score transformation to be used as input for the classification of speech recognition. The BiLSTM network showed the best performance among the classifiers by achieving an overall accuracy, f1-score, and Cohen’s κ values of 75.18%, 0.75, and 0.68, respectively, using a 10-fold cross-validation approach. These results indicate that LSTM layers can effectively model sequential data, such as EEG; hence, informative features can be derived through BiLSTM trained with end-to-end learning without any additional hand-crafted feature extraction methods.
Collapse