1. Zhang W, Jiang M, Teo KAC, Bhuvanakantham R, Fong L, Sim WKJ, Guo Z, Foo CHV, Chua RHJ, Padmanabhan P, Leong V, Lu J, Gulyás B, Guan C. Revealing the spatiotemporal brain dynamics of covert speech compared with overt speech: A simultaneous EEG-fMRI study. Neuroimage 2024; 293:120629. [PMID: 38697588] [DOI: 10.1016/j.neuroimage.2024.120629]
Abstract
Covert speech (CS) refers to speaking internally to oneself without producing any sound or movement. CS is involved in multiple cognitive functions and disorders. Reconstructing CS content by brain-computer interface (BCI) is also an emerging technique. However, it is still controversial whether CS is a truncated neural process of overt speech (OS) or involves independent patterns. Here, we performed a word-speaking experiment with simultaneous EEG-fMRI. It involved 32 participants, who generated words both overtly and covertly. By integrating spatial constraints from fMRI into EEG source localization, we precisely estimated the spatiotemporal dynamics of neural activity. During CS, EEG source activity was localized in three regions: the left precentral gyrus, the left supplementary motor area, and the left putamen. Although OS involved more brain regions with stronger activations, CS was characterized by an earlier event-locked activation in the left putamen (peak at 262 ms versus 1170 ms). The left putamen was also identified as the only hub node within the functional connectivity (FC) networks of both OS and CS, while showing weaker FC strength towards speech-related regions in the dominant hemisphere during CS. Path analysis revealed significant multivariate associations, indicating an indirect association between the earlier activation in the left putamen and CS, which was mediated by reduced FC towards speech-related regions. These findings revealed the specific spatiotemporal dynamics of CS, offering insights into CS mechanisms that are potentially relevant for future treatment of self-regulation deficits, speech disorders, and development of BCI speech applications.
Affiliation(s)
- Wei Zhang
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Muyun Jiang
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Kok Ann Colin Teo
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; IGP-Neuroscience, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore; Division of Neurosurgery, National University Health System, Singapore
- Raghavan Bhuvanakantham
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- LaiGuan Fong
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore
- Wei Khang Jeremy Sim
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore; IGP-Neuroscience, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore
- Zhiwei Guo
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Parasuraman Padmanabhan
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Victoria Leong
- Division of Psychology, Nanyang Technological University, Singapore; Department of Pediatrics, University of Cambridge, United Kingdom
- Jia Lu
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore; DSO National Laboratories, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Balázs Gulyás
- Cognitive Neuroimaging Centre, Nanyang Technological University, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore; Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Cuntai Guan
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
2. Moon J, Chau T. Online Ternary Classification of Covert Speech by Leveraging the Passive Perception of Speech. Int J Neural Syst 2023; 33:2350048. [PMID: 37522623] [DOI: 10.1142/s012906572350048x]
Abstract
Brain-computer interfaces (BCIs) provide communicative alternatives to those without functional speech. Covert speech (CS)-based BCIs enable communication simply by thinking of words and thus have intuitive appeal. However, an elusive barrier to their clinical translation is the collection of voluminous examples of high-quality CS signals, as iteratively rehearsing words for long durations is mentally fatiguing. Research on CS and speech perception (SP) identifies common spatiotemporal patterns in their respective electroencephalographic (EEG) signals, pointing towards shared encoding mechanisms. The goal of this study was to investigate whether a model that leverages the signal similarities between SP and CS can differentiate speech-related EEG signals online. Ten participants completed a dyadic protocol in which, in each trial, they listened to a randomly selected word and then mentally rehearsed the word. In the offline sessions, eight words were presented to participants. For the subsequent online sessions, the two most distinct words (most separable in terms of their EEG signals) were chosen to form a ternary classification problem (two words and rest). The model comprised a functional mapping derived from SP and CS signals of the same speech token (features are extracted via a Riemannian approach). An average ternary online accuracy of 75.3% (60% chance level) was achieved across participants, with individual accuracies as high as 93%. Moreover, we observed that the signal-to-noise ratio (SNR) of CS signals was enhanced by perception-covert modeling according to the level of high-frequency band correspondence between CS and SP. These findings may lead to less burdensome data collection for training speech BCIs, which could eventually enhance the rate at which the vocabulary can grow.
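The abstract notes that features are "extracted via a Riemannian approach" but does not spell the pipeline out. A common instantiation, sketched below purely for illustration (not the authors' implementation, and using a simple Euclidean mean where a full treatment would use the Riemannian geometric mean), is to compute per-trial spatial covariance matrices and project them to the tangent space of the SPD manifold before feeding a linear classifier. All function names are hypothetical:

```python
import numpy as np

def covariances(trials, eps=1e-10):
    """Per-trial spatial covariance matrices (channels x channels)."""
    # trials: (n_trials, n_channels, n_samples)
    covs = np.array([t @ t.T / t.shape[1] for t in trials])
    # Small ridge keeps every matrix strictly positive definite
    covs += eps * np.eye(trials.shape[1])
    return covs

def logm_spd(C):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def invsqrtm_spd(C):
    """Inverse matrix square root C^{-1/2} of an SPD matrix."""
    w, V = np.linalg.eigh(C)
    return (V * w ** -0.5) @ V.T

def tangent_space_features(trials):
    """Log-map trial covariances to the tangent space at their mean."""
    covs = covariances(trials)
    Cref = covs.mean(axis=0)          # simple (Euclidean) reference point
    iS = invsqrtm_spd(Cref)
    feats = []
    for C in covs:
        L = logm_spd(iS @ C @ iS)     # whiten by the reference, then log-map
        iu = np.triu_indices_from(L)
        feats.append(L[iu])           # vectorize the upper triangle
    return np.array(feats)
```

The resulting feature vectors can be fed to any standard linear classifier.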
Affiliation(s)
- Jae Moon
- Institute of Biomedical Engineering, University of Toronto, Holland Bloorview Kids Rehabilitation Hospital, Toronto, Ontario, Canada
- Tom Chau
- Institute of Biomedical Engineering, University of Toronto, Holland Bloorview Kids Rehabilitation Hospital, Toronto, Ontario, Canada
3. Shi Y, Li Y, Koike Y. Sparse Logistic Regression-Based EEG Channel Optimization Algorithm for Improved Universality across Participants. Bioengineering (Basel) 2023; 10:664. [PMID: 37370595] [DOI: 10.3390/bioengineering10060664]
Abstract
Electroencephalogram (EEG) channel optimization can reduce redundant information and improve EEG decoding accuracy by selecting the most informative channels. This article aims to investigate the universality of EEG channel optimization in terms of how well the selected EEG channels generalize to different participants. In particular, this study proposes a sparse logistic regression (SLR)-based EEG channel optimization algorithm using a non-zero model parameter ranking method. The proposed channel optimization algorithm was evaluated in both individual analysis and group analysis using the raw EEG data, and compared with the conventional channel selection method based on correlation coefficients (CCS). The experimental results demonstrate that the SLR-based EEG channel optimization algorithm not only filters out most redundant channels (removing 75-96.9% of them) while increasing decoding accuracy by 1.65-5.1%, but can also achieve a satisfactory level of decoding accuracy in the group analysis by employing only a few (2-15) common EEG electrodes, even for different participants. The proposed channel optimization algorithm can realize better universality for EEG decoding, which can reduce the burden of EEG data acquisition and enhance the real-world application of EEG-based brain-computer interfaces (BCIs).
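The abstract describes ranking channels by the non-zero parameters of a sparse logistic regression. One simple instantiation of that idea (a sketch, not the authors' algorithm, which may use a different sparsity prior) is L1-penalized logistic regression fitted by proximal gradient descent, with channels ranked by the summed magnitude of their surviving weights. Function names and hyperparameter values below are illustrative:

```python
import numpy as np

def l1_logreg(X, y, lam=0.1, lr=0.1, n_iter=500):
    """L1-penalized logistic regression via proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted probabilities
        grad = X.T @ (p - y) / n                 # logistic-loss gradient
        w = w - lr * grad
        # Soft-thresholding step drives uninformative weights to exactly zero
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

def rank_channels(w, n_channels, feats_per_channel):
    """Rank channels by the summed |weight| of their features."""
    scores = np.abs(w).reshape(n_channels, feats_per_channel).sum(axis=1)
    order = np.argsort(scores)[::-1]
    return order, scores
```

Channels whose score is zero contribute nothing to the fitted model and can be dropped, which is the mechanism behind the channel reduction reported above.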
Affiliation(s)
- Yuxi Shi
- School of Engineering, Tokyo Institute of Technology, Yokohama 226-8503, Japan
- Yuanhao Li
- School of Engineering, Tokyo Institute of Technology, Yokohama 226-8503, Japan
- Yasuharu Koike
- Institute of Innovative Research, Tokyo Institute of Technology, Yokohama 226-8503, Japan
4. Nitta T, Horikawa J, Iribe Y, Taguchi R, Katsurada K, Shinohara S, Kawai G. Linguistic representation of vowels in speech imagery EEG. Front Hum Neurosci 2023; 17:1163578. [PMID: 37275343] [PMCID: PMC10237317] [DOI: 10.3389/fnhum.2023.1163578]
Abstract
Speech imagery recognition from electroencephalograms (EEGs) could potentially become a strong contender among non-invasive brain-computer interfaces (BCIs). In this report, we first extract language representations as the difference of line-spectra of phones by statistically analyzing many EEG signals from the Broca area. We then extract vowels by using iterative search from hand-labeled short-syllable data. The iterative search process consists of principal component analysis (PCA), which visualizes the linguistic representation of vowels through eigenvectors φ(m), and the subspace method (SM), which searches for an optimum line-spectrum for redesigning φ(m). The extracted linguistic representation of the Japanese vowels /i/ /e/ /a/ /o/ /u/ shows two distinct spectral peaks (P1, P2) in the upper frequency range. The five vowels are aligned on the P1-P2 chart. A five-vowel recognition experiment using a data set of five subjects and a convolutional neural network (CNN) classifier gave a mean accuracy rate of 72.6%.
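The PCA step above visualizes vowel structure through the leading eigenvectors of the spectral feature covariance. As a generic sketch of that operation (not the paper's full iterative search, and with hypothetical names), the principal axes can be obtained from an eigendecomposition of the feature covariance matrix:

```python
import numpy as np

def pca_eigenvectors(X, k):
    """Top-k principal axes of a feature matrix X (n_samples x n_features).

    Returns (V, w): columns of V are eigenvectors, w the matching
    eigenvalues (variances), both sorted in decreasing order.
    """
    Xc = X - X.mean(axis=0)                  # center each feature
    C = np.cov(Xc, rowvar=False)             # feature covariance matrix
    w, V = np.linalg.eigh(C)                 # eigh: ascending eigenvalues
    order = np.argsort(w)[::-1]              # re-sort to descending
    return V[:, order[:k]], w[order[:k]]
```

Projecting line-spectrum features onto the first two axes is one way to produce a 2-D chart of the kind described (P1-P2).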
Affiliation(s)
- Tsuneo Nitta
- Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Junsei Horikawa
- Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Yurie Iribe
- Graduate School of Information Science and Technology, Aichi Prefectural University, Nagakute, Japan
- Ryo Taguchi
- Graduate School of Information, Nagoya Institute of Technology, Nagoya, Japan
- Kouichi Katsurada
- Faculty of Science and Technology, Tokyo University of Science, Noda, Japan
- Shuji Shinohara
- School of Science and Engineering, Tokyo Denki University, Saitama, Japan
- Goh Kawai
- Online Learning Support Team, Tokyo University of Foreign Studies, Tokyo, Japan
5. Guo Z, Chen F. Decoding lexical tones and vowels in imagined tonal monosyllables using fNIRS signals. J Neural Eng 2022; 19. [PMID: 36317255] [DOI: 10.1088/1741-2552/ac9e1d]
Abstract
Objective. Speech is a common way of communication. Decoding verbal intent could provide a naturalistic communication channel for people with severe motor disabilities. The active brain-computer interface (BCI) speller is one of the most commonly used speech BCIs. To reduce the spelling time of Chinese words, identifying the vowels and tones embedded in imagined Chinese words is essential. Functional near-infrared spectroscopy (fNIRS) has been widely used in BCIs because it is portable, non-invasive, safe, low cost, and has a relatively high spatial resolution. Approach. In this study, an active BCI speller based on fNIRS is presented by covertly rehearsing tonal monosyllables with vowels (i.e. /a/, /i/, /o/, and /u/) and four lexical tones in Mandarin Chinese (i.e. tones 1, 2, 3, and 4) for 10 s. Main results. fNIRS results showed significant differences in the right superior temporal gyrus between imagined vowels with tone 2/3/4 and those with tone 1 (i.e. more activations and stronger connections to other brain regions for imagined vowels with tones 2/3/4 than for those with tone 1). Speech-related areas for tone imagery (i.e. the right hemisphere) provided the majority of the information for identifying tones, while the left hemisphere had advantages in vowel identification. When both vowels and tones were decoded during the post-stimulus 15 s period, the average classification accuracies exceeded 40% and 70% in multiclass (i.e. four classes) and binary settings, respectively. To spell words more quickly, the time window size for decoding was reduced from 15 s to 2.5 s while the classification accuracies were not significantly reduced. Significance. For the first time, this work demonstrated the possibility of discriminating lexical tones and vowels in imagined tonal syllables simultaneously. In addition, the reduced decoding time window indicates that the spelling time of Chinese words could be significantly reduced in fNIRS-based BCIs.
Affiliation(s)
- Zengzhi Guo
- School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, People's Republic of China; Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
6. Cooney C, Folli R, Coyle D. Opportunities, pitfalls and trade-offs in designing protocols for measuring the neural correlates of speech. Neurosci Biobehav Rev 2022; 140:104783. [PMID: 35907491] [DOI: 10.1016/j.neubiorev.2022.104783]
Abstract
Research on decoding speech and speech-related processes directly from the human brain has intensified in recent years, as such a decoder could positively impact people whose communication capacity is limited by disease or injury. Additionally, it can enable entirely new forms of human-computer interaction and human-machine communication in general, and facilitate a better neuroscientific understanding of speech processes. Here, we synthesize the literature on neural speech decoding pertaining to how speech decoding experiments have been conducted, coalescing around a necessity for thoughtful experimental design aimed at specific research goals, and robust procedures for evaluating speech decoding paradigms. We examine the use of different modalities for presenting stimuli to participants, methods for constructing paradigms including timings and speech rhythms, and possible linguistic considerations. In addition, novel methods for eliciting naturalistic speech and validating imagined speech task performance in experimental settings are presented based on recent research. We also describe the multitude of terms used to instruct participants on how to produce imagined speech during experiments and propose methods for investigating the effect of these terms on imagined speech decoding. We demonstrate that the range of experimental procedures used in neural speech decoding studies can have unintended consequences that affect the reliability of the knowledge obtained. The review delineates the strengths and weaknesses of present approaches and proposes methodological advances which we anticipate will enhance experimental design, and progress toward the optimal design of movement-independent direct speech brain-computer interfaces.
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Derry, UK
- Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown, UK
- Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Derry, UK
7. Multiclass Classification of Imagined Speech Vowels and Words of Electroencephalography Signals Using Deep Learning. Advances in Human-Computer Interaction 2022. [DOI: 10.1155/2022/1374880]
Abstract
This paper focuses on decoding imagined speech from individuals' electroencephalography (EEG) signals, in line with the expansion of brain-computer interfaces to individuals whose speech problems create communication challenges. Decoding an individual's imagined speech from nonstationary and nonlinear EEG signals is a complex task. Related work in the field has revealed that imagined speech decoding performance and accuracy still leave room for improvement. The evolution of deep learning technology increases the likelihood of decoding imagined speech from EEG signals with enhanced performance. We proposed a novel supervised deep learning model that combined temporal convolutional networks and convolutional neural networks to retrieve information from the EEG signals. The experiment was carried out using an open-access dataset of fifteen subjects' imagined speech multichannel signals of vowels and words. The raw multichannel EEG signals of multiple subjects were preprocessed using the discrete wavelet transform. The model was trained and evaluated using the preprocessed signals, and the model hyperparameters were adjusted to achieve higher accuracy in the classification of imagined speech. The experimental results demonstrated that the proposed model achieved a higher overall multiclass accuracy of 0.9649 and a classification error rate of 0.0350. The results indicate that individuals with speech difficulties could leverage a noninvasive EEG-based imagined speech brain-computer interface system as a long-term alternative medium for verbal communication.
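The preprocessing step above applies a discrete wavelet transform to the raw EEG. As a minimal, self-contained sketch of what a DWT does (using the simple Haar wavelet; studies like this one typically use a wavelet library such as PyWavelets with Daubechies wavelets, so treat the function names and wavelet choice here as illustrative):

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficients; len(x) must be even.
    """
    x = np.asarray(x, dtype=float)
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)   # low-pass branch
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)   # high-pass branch
    return approx, detail

def haar_wavedec(x, level):
    """Multi-level decomposition, ordered [cA_n, cD_n, ..., cD_1]."""
    coeffs = []
    a = np.asarray(x, dtype=float)
    for _ in range(level):
        a, d = haar_dwt(a)
        coeffs.append(d)
    coeffs.append(a)
    return coeffs[::-1]
```

Because the Haar basis is orthonormal, the decomposition preserves signal energy, and the per-band coefficients (or statistics of them) can serve as classifier input features.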
8. Effect of functional and effective brain connectivity in identifying vowels from articulation imagery procedures. Cogn Process 2022; 23:593-618. [PMID: 35794496] [DOI: 10.1007/s10339-022-01103-3]
Abstract
Articulation imagery, a form of mental imagery, refers to imagining articulation or speaking to oneself mentally without any articulatory movement. It is an effective research domain for neural disorders that impair speech, as speech imagination closely resembles real voice communication. This work employs electroencephalography (EEG) signals acquired during articulation and articulation imagery to identify the vowel being imagined in different tasks. EEG signals from chosen electrodes are decomposed into a series of intrinsic mode functions using the empirical mode decomposition (EMD) method. Brain connectivity estimators and entropy measures have been computed to analyze the functional cooperation and causal dependence between different cortical regions, as well as the regularity in the signals. The vowels have been classified using machine learning techniques, namely a multiclass support vector machine (MSVM) and a random forest (RF). Three training and testing protocols (articulation, AR; articulation imagery, AI; and articulation vs articulation imagery, AR vs AI) were employed for identifying the imagined vowel. An overall classification accuracy of 80% was obtained for the articulation imagery protocol, higher than for the other two protocols. The MSVM also outperformed the RF in terms of classification accuracy. Brain connectivity estimators combined with machine learning appear reliable for identifying a vowel from the subject's thought, and could thereby assist people with speech impairment.
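The abstract mentions entropy measures of signal regularity without naming a specific one. Sample entropy is one widely used regularity measure for EEG; the sketch below is an illustrative implementation of it (an assumption on my part, not necessarily the measure this paper used):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy: -log of the ratio of (m+1)- to m-length template matches.

    r is the match tolerance as a fraction of the signal's standard deviation.
    Lower values indicate a more regular (predictable) signal.
    """
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def count_matches(mm):
        # All overlapping templates of length mm
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance between template i and all later templates
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

A periodic signal (e.g. a sinusoid) yields a much lower sample entropy than white noise, which is the property such features exploit when characterizing EEG segments.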
9. Liu YP, Gong AM, Ding P, Zhao L, Qian Q, Zhou JH, Su L, Fu YF. [Key technology of brain-computer interaction based on speech imagery]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2022; 39:596-611. [PMID: 35788530] [PMCID: PMC10950764] [DOI: 10.7507/1001-5515.202107018]
Abstract
Speech expression is an important high-level cognitive behavior of humans, and its realization is closely related to brain activity. Both actual speech expression and speech imagination activate some of the same brain areas; speech imagery has therefore become a new paradigm of brain-computer interaction. Brain-computer interfaces (BCIs) based on speech imagery have the advantages of spontaneous generation, no training requirement, and friendliness to subjects, so they have attracted the attention of many scholars. However, this interaction technology is not yet mature in the design of experimental paradigms and the choice of imagination materials, and many issues remain open. In response to these problems, this article first expounds the neural mechanism of speech imagery. Then, by reviewing previous BCI research on speech imagery, it systematically analyzes the mainstream methods and core technologies of experimental paradigms, imagination materials, data processing, and so on. Finally, it discusses the key problems and main challenges that restrict the development of this type of BCI, and considers the future development and application of speech imagery BCI systems.
Affiliation(s)
- Yanpeng Liu
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Anmin Gong
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Peng Ding
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Lei Zhao
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Qian Qian
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Jianhua Zhou
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Lei Su
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Yunfa Fu
- School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China; College of Information Engineering, Engineering University of PAP, Xi'an 710000, P. R. China; Faculty of Science, Kunming University of Science and Technology, Kunming 650500, P. R. China
10. Sheth J, Tankus A, Tran M, Pouratian N, Fried I, Speier W. Generalizing neural signal-to-text brain-computer interfaces. Biomed Phys Eng Express 2021; 7. [PMID: 33836507] [DOI: 10.1088/2057-1976/abf6ab]
Abstract
Objective. Brain-computer interfaces (BCIs) may help patients with faltering communication abilities due to neurodegenerative diseases produce text or speech by direct neural processing. However, their practical realization has proven difficult due to limitations in the speed, accuracy, and generalizability of existing interfaces. The goal of this study is to evaluate the BCI performance of a robust speech decoding system that translates neural signals evoked by speech to a textual output. While previous studies have approached this problem by using neural signals to choose from a limited set of possible words, we employ a more general model that can type any word from a large corpus of English text. Approach. In this study, we create an end-to-end BCI that translates neural signals associated with overt speech into text output. Our decoding system first isolates frequency bands in the input depth-electrode signal encapsulating differential information regarding production of various phonemic classes. These bands form a feature set that then feeds into a Long Short-Term Memory (LSTM) model, which discerns at each time point probability distributions across all phonemes uttered by a subject. Finally, a particle filtering algorithm temporally smooths these probabilities by incorporating prior knowledge of the English language to output text corresponding to the decoded word. The generalizability of our decoder is driven by the lack of a vocabulary constraint on this output word. Main result. This method was evaluated using a dataset of 6 neurosurgical patients implanted with intracranial depth electrodes to identify seizure foci for potential surgical treatment of epilepsy. We averaged 32% word accuracy and, at the phoneme level, obtained 46% precision, 51% recall, and a 73.32% average phoneme error rate, while also achieving significant increases in speed compared with several other BCI approaches. Significance. Our study employs a more general neural signal-to-text model which could facilitate communication by patients in everyday environments.
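The decoder above smooths framewise LSTM phoneme probabilities with an English-language prior via particle filtering. As a simplified stand-in for that idea (deliberately swapping the particle filter for an exact Viterbi decode over a bigram transition prior; all names and probabilities below are illustrative):

```python
import numpy as np

def viterbi_smooth(frame_probs, trans, eps=1e-12):
    """Most likely state path given framewise probabilities and a transition prior.

    frame_probs: (T, S) per-frame phoneme probabilities (e.g. softmax output)
    trans:       (S, S) bigram transition prior, rows summing to 1
    """
    T, S = frame_probs.shape
    logp = np.log(frame_probs + eps)
    logt = np.log(trans + eps)
    score = logp[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + logt            # cand[prev, cur]
        back[t] = np.argmax(cand, axis=0)       # best predecessor per state
        score = cand[back[t], np.arange(S)] + logp[t]
    # Backtrace from the best final state
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With a self-favoring transition prior, an isolated low-confidence frame gets overridden by its temporal context, which is the effect language-prior smoothing provides.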
Affiliation(s)
- Janaki Sheth
- Department of Physics and Astronomy, UCLA, Los Angeles, CA, United States of America
- Ariel Tankus
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel; Functional Neurosurgery Unit, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel; Department of Neurology and Neurosurgery, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Michelle Tran
- Department of Neurosurgery, UCLA, Los Angeles, CA, United States of America
- Nader Pouratian
- Department of Neurosurgery, UCLA, Los Angeles, CA, United States of America
- Itzhak Fried
- Department of Neurosurgery, UCLA, Los Angeles, CA, United States of America
- William Speier
- Department of Radiology, UCLA, Los Angeles, CA, United States of America
11. Cooney C, Korik A, Folli R, Coyle D. Evaluation of Hyperparameter Optimization in Machine and Deep Learning Methods for Decoding Imagined Speech EEG. Sensors (Basel) 2020; 20:4629. [PMID: 32824559] [PMCID: PMC7472624] [DOI: 10.3390/s20164629]
Abstract
Classification of electroencephalography (EEG) signals corresponding to imagined speech production is important for the development of a direct-speech brain-computer interface (DS-BCI). Deep learning (DL) has been utilized with great success across several domains. However, it remains an open question whether DL methods provide significant advances over traditional machine learning (ML) approaches for classification of imagined speech. Furthermore, hyperparameter (HP) optimization has been neglected in DL-EEG studies, so the significance of its effects remains uncertain. In this study, we aim to improve classification of imagined speech EEG by employing DL methods while also statistically evaluating the impact of HP optimization on classifier performance. We trained three distinct convolutional neural networks (CNNs) on imagined speech EEG using a nested cross-validation approach to HP optimization. Each of the CNNs evaluated was designed specifically for EEG decoding. An imagined speech EEG dataset consisting of both words and vowels facilitated training on both sets independently. CNN results were compared with three benchmark ML methods: Support Vector Machine, Random Forest and regularized Linear Discriminant Analysis. Intra- and inter-subject methods of HP optimization were tested and the effects of HPs statistically analyzed. Accuracies obtained by the CNNs were significantly greater than the benchmark methods when trained on both datasets (words: 24.97%, p < 1 × 10-7, chance: 16.67%; vowels: 30.00%, p < 1 × 10-7, chance: 20%). The effects of varying HP values, and the interactions between HPs and the CNNs, were both statistically significant. The results demonstrate how critical HP optimization is when training CNNs to decode imagined speech.
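The nested cross-validation procedure above separates hyperparameter selection (inner folds) from performance estimation (outer folds) so that the reported accuracy is not biased by tuning. A minimal sketch of the scheme follows, with a toy shrunken-nearest-centroid classifier standing in for the CNNs; all names and the hyperparameter itself are illustrative, not the study's setup:

```python
import numpy as np

def nearest_centroid_acc(Xtr, ytr, Xte, yte, shrink):
    """Toy classifier: nearest centroid, with centroids scaled by `shrink`."""
    classes = np.unique(ytr)
    cents = np.array([Xtr[ytr == c].mean(axis=0) * shrink for c in classes])
    d2 = ((Xte[:, None, :] - cents[None]) ** 2).sum(axis=2)
    pred = classes[np.argmin(d2, axis=1)]
    return float((pred == yte).mean())

def nested_cv(X, y, grid, k_outer=3, k_inner=3, seed=0):
    """Inner folds pick the hyperparameter; outer folds estimate accuracy."""
    rng = np.random.default_rng(seed)
    outer = np.array_split(rng.permutation(len(y)), k_outer)
    accs = []
    for i in range(k_outer):
        test = outer[i]
        train = np.concatenate([outer[j] for j in range(k_outer) if j != i])
        inner = np.array_split(train, k_inner)
        best, best_score = None, -1.0
        for hp in grid:                      # inner loop: tune on train only
            scores = []
            for v in range(k_inner):
                tr = np.concatenate([inner[m] for m in range(k_inner) if m != v])
                scores.append(nearest_centroid_acc(X[tr], y[tr],
                                                   X[inner[v]], y[inner[v]], hp))
            if np.mean(scores) > best_score:
                best, best_score = hp, float(np.mean(scores))
        # Outer fold: refit with the chosen hyperparameter, score held-out data
        accs.append(nearest_centroid_acc(X[train], y[train], X[test], y[test], best))
    return float(np.mean(accs))
```

Because the outer test folds never influence the hyperparameter choice, the returned mean accuracy is an unbiased estimate of the tuned pipeline's performance.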
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Londonderry BT48 7JL, UK; (A.K.); (D.C.)
- Correspondence:
| | - Attila Korik
- Intelligent Systems Research Centre, Ulster University, Londonderry BT48 7JL, UK; (A.K.); (D.C.)
| | - Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown BT37 0QB, UK;
| | - Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Londonderry BT48 7JL, UK; (A.K.); (D.C.)
| |
12
|
Kristensen AB, Subhi Y, Puthusserypady S. Vocal Imagery vs Intention: Viability of Vocal-Based EEG-BCI Paradigms. IEEE Trans Neural Syst Rehabil Eng 2020; 28:1750-1759. [PMID: 32746304] [DOI: 10.1109/tnsre.2020.3004924]
Abstract
The viability of electroencephalogram (EEG) based vocal imagery (VIm) and vocal intention (VInt) brain-computer interface (BCI) systems has been investigated in this study. Four different types of experimental tasks related to humming have been designed and exploited here: (i) a non-task-specific (NTS) task, (ii) a motor task (MT), (iii) a VIm task, and (iv) a VInt task. EEG signals from seventeen participants were recorded for each task from 16 electrode locations on the scalp, and their features were extracted and analysed using a common spatial pattern (CSP) filter. These features were subsequently fed into a support vector machine (SVM) classifier. The analysis aimed to perform binary classification, predicting whether the subject was performing one task or the other. Results from an extensive analysis showed a mean classification accuracy of 88.9% for the VIm task and 91.1% for the VInt task. This study clearly shows that VIm can be classified with ease and is a viable paradigm to integrate into BCIs. Such systems are useful not only for people with speech problems but, more generally, for people who use BCI systems in their everyday life, giving them another dimension of system control.
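For readers unfamiliar with the CSP-plus-classifier pipeline this entry describes, it can be sketched in a few lines of NumPy/SciPy. This is an illustrative toy, not code from the cited study: the data are synthetic, the channel counts and trial counts are invented, and a nearest-class-mean rule stands in for the SVM used by the authors.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """Common spatial pattern filters from two classes of EEG trials.
    trials_*: arrays of shape (n_trials, n_channels, n_samples)."""
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over trials.
        return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)
    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigendecomposition: extremize class-A variance
    # relative to the pooled variance.
    evals, evecs = eigh(ca, ca + cb)
    order = np.argsort(evals)  # ascending eigenvalues in [0, 1]
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return evecs[:, picks].T   # (2 * n_pairs, n_channels)

def log_var_features(W, trials):
    """Log-variance of the spatially filtered trials: the usual CSP features."""
    return np.array([np.log(np.var(W @ t, axis=1)) for t in trials])

# Synthetic two-class data: class B has extra power on channel 0.
rng = np.random.default_rng(1)
a = rng.standard_normal((30, 8, 256))
b = rng.standard_normal((30, 8, 256))
b[:, 0, :] *= 3.0

W = csp_filters(a, b)
fa, fb = log_var_features(W, a), log_var_features(W, b)
# Nearest-class-mean on the features stands in for the SVM in the study.
ma, mb = fa.mean(axis=0), fb.mean(axis=0)
predict = lambda f: int(np.linalg.norm(f - mb) < np.linalg.norm(f - ma))
acc = np.mean([predict(f) == 0 for f in fa] + [predict(f) == 1 for f in fb])
print(acc)
```

The generalized eigendecomposition finds spatial filters whose output variance differs maximally between the two classes, which is why the log-variance of the filtered signals is discriminative.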
13.
Dash D, Ferrari P, Wang J. Decoding Imagined and Spoken Phrases From Non-invasive Neural (MEG) Signals. Front Neurosci 2020; 14:290. [PMID: 32317917] [PMCID: PMC7154084] [DOI: 10.3389/fnins.2020.00290]
Abstract
Speech production is a hierarchical mechanism involving the synchronization of the brain and the oral articulators, where the intention of linguistic concepts is transformed into meaningful sounds. Individuals with locked-in syndrome (fully paralyzed but aware) lose their motor ability completely, including articulation and even eyeball movement. The neural pathway may be the only option for these patients to resume a certain level of communication. Current brain-computer interfaces (BCIs) use patients' visual and attentional correlates to build communication, resulting in a slow communication rate (a few words per minute). Direct decoding of imagined speech from neural signals (and then driving a speech synthesizer) has the potential for a higher communication rate. In this study, we investigated the decoding of five imagined and spoken phrases from single-trial, non-invasive magnetoencephalography (MEG) signals collected from eight adult subjects. Two machine learning algorithms were used: an artificial neural network (ANN) with statistical features as the baseline approach, and convolutional neural networks (CNNs) applied to the spatial, spectral, and temporal features extracted from the MEG signals. Experimental results indicated the possibility of decoding imagined and spoken phrases directly from neuromagnetic signals. CNNs were found to be highly effective, with average decoding accuracies of up to 93% for the imagined and 96% for the spoken phrases.
Affiliation(s)
- Debadatta Dash
- Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, United States
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX, United States
- Paul Ferrari
- MEG Lab, Dell Children's Medical Center, Austin, TX, United States
- Department of Psychology, University of Texas at Austin, Austin, TX, United States
- Jun Wang
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX, United States
- Department of Communication Sciences and Disorders, University of Texas at Austin, Austin, TX, United States
14.
Dash D, Ferrari P, Malik S, Montillo A, Maldjian JA, Wang J. Determining the Optimal Number of MEG Trials: A Machine Learning and Speech Decoding Perspective. Brain Informatics: International Conference, BI 2018, Arlington, TX, USA, December 7-9, 2018, Proceedings 2019; 11309:163-172. [PMID: 31768504] [PMCID: PMC6876632] [DOI: 10.1007/978-3-030-05587-5_16]
Abstract
Advancing knowledge about neural speech mechanisms is critical for developing next-generation, faster brain-computer interfaces to assist in speech communication for patients with severe neurological conditions (e.g., locked-in syndrome). Among current neuroimaging techniques, magnetoencephalography (MEG) provides a direct representation of the large-scale neural dynamics of underlying cognitive processes owing to its optimal spatiotemporal resolution. However, MEG-measured neural signals are small in magnitude compared with the background noise, and hence MEG usually suffers from a low signal-to-noise ratio (SNR) at the single-trial level. To overcome this limitation, it is common to record many trials of the same event or task and use the time-locked average signal for analysis, which can be very time-consuming. In this study, we investigated the effect of the number of MEG recording trials required for speech decoding using a machine learning algorithm. We used a wavelet filter to generate denoised neural features for training an artificial neural network (ANN) for speech decoding. We found that wavelet-based denoising increased the SNR of the neural signal prior to analysis and facilitated accurate speech decoding using as few as 40 single trials. This study may open up the possibility of limiting the number of MEG trials in other task-evoked studies as well.
Affiliation(s)
- Debadatta Dash
- Department of Bioengineering, University of Texas at Dallas, Richardson, USA
- Paul Ferrari
- Department of Psychology, University of Texas at Austin, Austin, USA
- MEG Laboratory, Dell Children's Medical Center, Austin, USA
- Saleem Malik
- MEG Lab, Cook Children's Hospital, Fort Worth, TX, USA
- Albert Montillo
- Department of Radiology, UT Southwestern Medical Center, Dallas, USA
- Department of Bioinformatics, UT Southwestern Medical Center, Dallas, USA
- Joseph A Maldjian
- Department of Radiology, UT Southwestern Medical Center, Dallas, USA
- Jun Wang
- Department of Bioengineering, University of Texas at Dallas, Richardson, USA
- Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, USA
15.
Chengaiyan S, Retnapandian AS, Anandan K. Identification of vowels in consonant-vowel-consonant words from speech imagery based EEG signals. Cogn Neurodyn 2019; 14:1-19. [PMID: 32015764] [DOI: 10.1007/s11571-019-09558-5]
Abstract
Retrieval of unintelligible speech is a basic need for the speech-impaired and has been under research for several decades, but retrieval of random words from thoughts requires a substantial and consistent approach. This work focuses on the preliminary steps of retrieving vowels from electroencephalography (EEG) signals acquired while speaking and while imagining speaking a consonant-vowel-consonant (CVC) word. The process, referred to as speech imagery, is imagining speaking to oneself silently in the mind; speech imagery is a form of mental imagery. Brain connectivity estimators such as EEG coherence, partial directed coherence, directed transfer function, and transfer entropy have been used to estimate the concurrency and causal dependence (direction and strength) between different brain regions. The brain connectivity results showed that the left frontal and left temporal electrodes were activated during both speech and speech imagery. These brain connectivity estimators were then used to train recurrent neural networks (RNNs) and deep belief networks (DBNs) to identify the vowel from the subject's thought. Though the accuracy varied across vowels for both speaking and imagined speaking of the CVC word, the overall classification accuracy was 72% using RNNs, whereas a classification accuracy of 80% was observed using DBNs; DBNs outperformed RNNs in both the speech and speech imagery processes. Thus, the combination of brain connectivity estimators and deep learning techniques appears to be effective in identifying the vowel from EEG signals of a subject's thought.
Affiliation(s)
- Sandhya Chengaiyan
- Department of Biomedical Engineering, Centre for Healthcare Technologies, SSN College of Engineering, Chennai, Tamil Nadu, India
- Anandha Sree Retnapandian
- Department of Biomedical Engineering, Centre for Healthcare Technologies, SSN College of Engineering, Chennai, Tamil Nadu, India
- Kavitha Anandan
- Department of Biomedical Engineering, Centre for Healthcare Technologies, SSN College of Engineering, Chennai, Tamil Nadu, India
16.
Minati L, Yoshimura N, Frasca M, Drożdż S, Koike Y. Warped phase coherence: An empirical synchronization measure combining phase and amplitude information. Chaos 2019; 29:021102. [PMID: 30823716] [DOI: 10.1063/1.5082749]
Abstract
The entrainment between weakly coupled nonlinear oscillators, as well as between complex signals such as those representing physiological activity, is frequently assessed in terms of whether a stable relationship is detectable between the instantaneous phases extracted from the measured or simulated time-series via the analytic signal. Here, we demonstrate that adding a possibly complex constant value to this normally null-mean signal has a non-trivial warping effect. Among other consequences, this introduces a level of sensitivity to the amplitude fluctuations and average relative phase. By means of simulations of Rössler systems and experiments on single-transistor oscillator networks, it is shown that the resulting coherence measure may have an empirical value in improving the inference of the structural couplings from the dynamics. When tentatively applied to the electroencephalogram recorded while performing imaginary and real movements, this straightforward modification of the phase locking value substantially improved the classification accuracy. Hence, its possible practical relevance in brain-computer and brain-machine interfaces deserves consideration.
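The modification this entry describes, adding a constant to the otherwise null-mean analytic signal before computing the phase-locking value, is simple enough to sketch directly. The following is an illustrative toy, not the authors' code; the test signals, the noise level, and the constant c = 0.5 are invented for demonstration.

```python
import numpy as np
from scipy.signal import hilbert

def warped_plv(x, y, c=0.0):
    """Phase-locking value after adding a (possibly complex) constant c
    to the analytic signals. With c = 0 this reduces to the standard PLV;
    a nonzero c makes the measure sensitive to amplitude fluctuations."""
    zx = hilbert(x) + c  # warp the normally null-mean analytic signal
    zy = hilbert(y) + c
    dphi = np.angle(zx) - np.angle(zy)  # instantaneous phase difference
    return np.abs(np.mean(np.exp(1j * dphi)))

# Toy demonstration: two noisy oscillations sharing a common 10 Hz phase.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000, endpoint=False)
x = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)
y = np.sin(2 * np.pi * 10 * t) + 0.3 * rng.standard_normal(t.size)

print(warped_plv(x, y, c=0.0))  # standard phase locking value
print(warped_plv(x, y, c=0.5))  # warped variant
```

Both quantities lie in [0, 1]; the warped variant differs from the standard one exactly because the added constant couples the extracted phase to the signal amplitudes.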
Affiliation(s)
- Ludovico Minati
- Tokyo Tech World Research Hub Initiative-Institute of Innovative Research, Tokyo Institute of Technology, Yokohama 226-8503, Japan
- Natsue Yoshimura
- FIRST-Institute of Innovative Research, Tokyo Institute of Technology, Yokohama 226-8503, Japan
- Mattia Frasca
- Department of Electrical Electronic and Computer Engineering (DIEEI), University of Catania, 95131 Catania, Italy
- Stanisław Drożdż
- Complex Systems Theory Department, Institute of Nuclear Physics-Polish Academy of Sciences (IFJ-PAN), 31-342 Kraków, Poland
- Yasuharu Koike
- FIRST-Institute of Innovative Research, Tokyo Institute of Technology, Yokohama 226-8503, Japan
17.
Cooney C, Folli R, Coyle D. Neurolinguistics Research Advancing Development of a Direct-Speech Brain-Computer Interface. iScience 2018; 8:103-125. [PMID: 30296666] [PMCID: PMC6174918] [DOI: 10.1016/j.isci.2018.09.016]
Abstract
A direct-speech brain-computer interface (DS-BCI) acquires neural signals corresponding to imagined speech, then processes and decodes these signals to produce a linguistic output in the form of phonemes, words, or sentences. Recent research has shown the potential of neurolinguistics to enhance decoding approaches to imagined speech with the inclusion of semantics and phonology in experimental procedures. As neurolinguistics research findings are beginning to be incorporated within the scope of DS-BCI research, it is our view that a thorough understanding of imagined speech, and its relationship with overt speech, must be considered an integral feature of research in this field. With a focus on imagined speech, we provide a review of the most important neurolinguistics research informing the field of DS-BCI and suggest how this research may be utilized to improve current experimental protocols and decoding techniques. Our review of the literature supports a cross-disciplinary approach to DS-BCI research, in which neurolinguistics concepts and methods are utilized to aid development of a naturalistic mode of communication.
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Derry, UK
- Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown, UK
- Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Derry, UK
18.
Liu Y, Ayaz H. Speech Recognition via fNIRS Based Brain Signals. Front Neurosci 2018; 12:695. [PMID: 30356771] [PMCID: PMC6189799] [DOI: 10.3389/fnins.2018.00695]
Abstract
In this paper, we present the first evidence that perceived speech can be identified from listeners' brain signals measured via functional near-infrared spectroscopy (fNIRS), a non-invasive, portable, and wearable neuroimaging technique suitable for ecologically valid settings. In this study, participants listened to audio clips containing English stories while the prefrontal and parietal cortices were monitored with fNIRS. Machine learning was applied to train predictive models using fNIRS data from a subject pool to predict which part of a story was heard by a new subject not in the pool, based on the brain's hemodynamic response as measured by fNIRS. fNIRS signals can vary considerably from subject to subject due to differences in head size, head shape, and the spatial locations of brain functional regions. To overcome this difficulty, generalized canonical correlation analysis (GCCA) was adopted to extract latent variables shared among the listeners, before applying principal component analysis (PCA) for dimension reduction and logistic regression for classification. A 74.7% average accuracy was achieved for differentiating between two 50-s-long story segments, and a 43.6% average accuracy was achieved for differentiating four 25-s-long story segments. These results suggest the potential of an fNIRS-based approach for building a speech-decoding brain-computer interface for a new type of neural prosthetic system.
Affiliation(s)
- Yichuan Liu
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
- Cognitive Neuroengineering and Quantitative Experimental Research (CONQUER) Collaborative, Drexel University, Philadelphia, PA, United States
- Hasan Ayaz
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
- Cognitive Neuroengineering and Quantitative Experimental Research (CONQUER) Collaborative, Drexel University, Philadelphia, PA, United States
- Department of Family and Community Health, University of Pennsylvania, Philadelphia, PA, United States
- The Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, United States
19.
Mejia Tobar A, Hyoudou R, Kita K, Nakamura T, Kambara H, Ogata Y, Hanakawa T, Koike Y, Yoshimura N. Decoding of Ankle Flexion and Extension from Cortical Current Sources Estimated from Non-invasive Brain Activity Recording Methods. Front Neurosci 2018; 11:733. [PMID: 29358903] [PMCID: PMC5766671] [DOI: 10.3389/fnins.2017.00733]
Abstract
The classification of ankle movements from non-invasive brain recordings can be applied in a brain-computer interface (BCI) to control exoskeletons, prostheses, and functional electrical stimulators for the benefit of patients with walking impairments. In this research, ankle flexion and extension tasks at two force levels in both legs were classified from cortical current sources estimated by a hierarchical variational Bayesian method using electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) recordings. The hierarchical prior for current source estimation from EEG was obtained from the activated brain areas and their intensities in an fMRI group (second-level) analysis. The fMRI group analysis was performed on regions of interest defined over the primary motor cortex, the supplementary motor area, and the somatosensory area, which are well known to contribute to movement control. A sparse logistic regression method was applied for nine-class classification (eight active tasks and a resting control task), obtaining a mean accuracy of 65.64% for time series of current sources estimated from the EEG and fMRI signals using a variational Bayesian method, and a mean accuracy of 22.19% for classification of the pre-processed EEG sensor signals, with a chance level of 11.11%. The higher classification accuracy of current sources, compared with the EEG sensor accuracy, was attributed to the high number of sources and the different signal patterns obtained at the same vertex for different motor tasks. Since the inverse filter for current source estimation can be computed offline with the present method, the method is applicable to real-time BCIs.
Finally, due to the highly enhanced spatial distribution of current sources over the brain cortex, this method has the potential to identify activation patterns for designing BCIs to control an affected limb in patients with stroke, or BCIs based on motor imagery in patients with spinal cord injury.
Affiliation(s)
- Rikiya Hyoudou
- Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
- Kahori Kita
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan
- Department of Advanced Neuroimaging, Integrative Brain Imaging Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Tatsuhiro Nakamura
- Department of Advanced Neuroimaging, Integrative Brain Imaging Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Hiroyuki Kambara
- Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
- Yousuke Ogata
- Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
- Department of Advanced Neuroimaging, Integrative Brain Imaging Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Takashi Hanakawa
- Department of Advanced Neuroimaging, Integrative Brain Imaging Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Yasuharu Koike
- Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
- Department of Advanced Neuroimaging, Integrative Brain Imaging Center, National Center of Neurology and Psychiatry, Tokyo, Japan
- Natsue Yoshimura
- Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
- Department of Advanced Neuroimaging, Integrative Brain Imaging Center, National Center of Neurology and Psychiatry, Tokyo, Japan
20.
Okawa H, Suefusa K, Tanaka T. Neural Entrainment to Auditory Imagery of Rhythms. Front Hum Neurosci 2017; 11:493. [PMID: 29081742] [PMCID: PMC5645537] [DOI: 10.3389/fnhum.2017.00493]
Abstract
A method of reconstructing perceived or imagined music by analyzing brain activity has not yet been established. As a first step toward developing such a method, we aimed to reconstruct the imagery of rhythm, which is one element of music. It has been reported that a periodic electroencephalogram (EEG) response is elicited while a human imagines a binary or ternary meter on a musical beat. However, it is not clear whether or not brain activity synchronizes with fully imagined beat and meter without auditory stimuli. To investigate neural entrainment to imagined rhythm during auditory imagery of beat and meter, we recorded EEG while nine participants (eight males and one female) imagined three types of rhythm without auditory stimuli but with visual timing, and then we analyzed the amplitude spectra of the EEG. We also recorded EEG while the participants only gazed at the visual timing as a control condition to confirm the visual effect. Furthermore, we derived features of the EEG using canonical correlation analysis (CCA) and conducted an experiment to individually classify the three types of imagined rhythm from the EEG. The results showed that classification accuracies exceeded the chance level in all participants. These results suggest that auditory imagery of meter elicits a periodic EEG response that changes at the imagined beat and meter frequency even in the fully imagined conditions. This study represents the first step toward the realization of a method for reconstructing the imagined music from brain activity.
Affiliation(s)
- Haruki Okawa
- Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Kaori Suefusa
- Department of Electrical and Information Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
- Toshihisa Tanaka
- Department of Electrical and Electronic Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
- RIKEN Brain Science Institute, Saitama, Japan
21.
Herff C, Schultz T. Automatic Speech Recognition from Neural Signals: A Focused Review. Front Neurosci 2016; 10:429. [PMID: 27729844] [PMCID: PMC5037201] [DOI: 10.3389/fnins.2016.00429]
Abstract
Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices; they have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of disturbing bystanders, or an inability to produce speech (e.g., in patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying automatic speech recognition (ASR) technology. We argue that modalities based on metabolic processes, such as functional near-infrared spectroscopy and functional magnetic resonance imaging, are less suited to ASR from neural signals due to their low temporal resolution, but are very useful for investigating the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of automatic speech recognition from neural signals, we discuss the Brain-to-text system.
Affiliation(s)
- Christian Herff
- Cognitive Systems Lab, Department for Mathematics and Computer Science, University of Bremen, Bremen, Germany
- Tanja Schultz
- Cognitive Systems Lab, Department for Mathematics and Computer Science, University of Bremen, Bremen, Germany