1. Kwon J, Hwang J, Sung JE, Im CH. Speech synthesis from three-axis accelerometer signals using conformer-based deep neural network. Comput Biol Med 2024; 182:109090. [PMID: 39232406] [DOI: 10.1016/j.compbiomed.2024.109090]
Abstract
Silent speech interfaces (SSIs) have emerged as innovative non-acoustic communication methods, and our previous study demonstrated the significant potential of three-axis accelerometer-based SSIs to identify silently spoken words with high classification accuracy. The developed accelerometer-based SSI with only four accelerometers and a small training dataset outperformed a conventional surface electromyography (sEMG)-based SSI. In this study, motivated by the promising initial results, we investigated the feasibility of synthesizing spoken speech from three-axis accelerometer signals. This exploration aimed to assess the potential of accelerometer-based SSIs for practical silent communication applications. Nineteen healthy individuals participated in our experiments. Five accelerometers were attached to the face to acquire speech-related facial movements while the participants read 270 Korean sentences aloud. For the speech synthesis, we used a convolution-augmented Transformer (Conformer)-based deep neural network model to convert the accelerometer signals into a Mel spectrogram, from which an audio waveform was synthesized using HiFi-GAN. To evaluate the quality of the generated Mel spectrograms, ten-fold cross-validation was performed, and the Mel cepstral distortion (MCD) was chosen as the evaluation metric. As a result, an average MCD of 5.03 ± 0.65 was achieved using four optimized accelerometers based on our previous study. Furthermore, the quality of generated Mel spectrograms was significantly enhanced by adding one more accelerometer attached under the chin, achieving an average MCD of 4.86 ± 0.65 (p < 0.001, Wilcoxon signed-rank test). Although an objective comparison is difficult, these results surpass those obtained using conventional SSIs based on sEMG, electromagnetic articulography, and electropalatography with the fewest sensors and a similar or smaller number of sentences to train the model. Our proposed approach will contribute to the widespread adoption of accelerometer-based SSIs, leveraging the advantages of accelerometers like low power consumption, invulnerability to physiological artifacts, and high portability.
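As an editorial aid, the evaluation metric used above, Mel cepstral distortion (MCD), can be made concrete in a few lines of numpy. The sketch below assumes pre-computed, time-aligned mel-cepstra and the common convention of excluding the 0th (energy) coefficient; these details are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def mel_cepstral_distortion(mc_ref: np.ndarray, mc_syn: np.ndarray) -> float:
    """MCD in dB between aligned mel-cepstra of shape (frames, coefficients)."""
    diff = mc_ref[:, 1:] - mc_syn[:, 1:]          # drop the 0th (energy) term
    # Per frame: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), then average
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())
```

Lower values indicate a closer spectral match; the averages of 5.03 and 4.86 reported above would correspond to this quantity averaged over test utterances.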
Affiliation(s)
- Jinuk Kwon
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea.
- Jihun Hwang
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea.
- Jee Eun Sung
- Department of Communication Disorders, Ewha Womans University, Seoul, South Korea.
- Chang-Hwan Im
- Department of Electronic Engineering, Hanyang University, Seoul, South Korea; Department of Biomedical Engineering, Hanyang University, Seoul, South Korea; Department of Artificial Intelligence, Hanyang University, Seoul, South Korea; Department of HY-KIST Bio-Convergence, Hanyang University, Seoul, South Korea.
2. Dash D, Ferrari P, Wang J. Neural Decoding of Spontaneous Overt and Intended Speech. J Speech Lang Hear Res 2024:1-10. [PMID: 39106199] [DOI: 10.1044/2024_jslhr-24-00046]
Abstract
Purpose: The aim of this study was to decode intended and overt speech from neuromagnetic signals while the participants performed spontaneous overt speech tasks without cues or prompts (stimuli). Method: Magnetoencephalography (MEG), a noninvasive neuroimaging technique, was used to collect neural signals from seven healthy adult English speakers performing spontaneous, overt speech tasks. The participants randomly spoke the words yes or no at a self-paced rate without cues. Two machine learning models, linear discriminant analysis (LDA) and a one-dimensional convolutional neural network (1D CNN), were employed to classify the two words from the recorded MEG signals. Results: LDA and the 1D CNN achieved average decoding accuracies of 79.02% and 90.40%, respectively, in decoding overt speech, significantly surpassing the chance level (50%). The accuracy for decoding intended speech was 67.19% using the 1D CNN. Conclusions: This study showcases the possibility of decoding spontaneous overt and intended speech directly from neural signals in the absence of perceptual interference. These findings are a steady step toward future spontaneous-speech-based brain-computer interfaces.
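For illustration, the simpler of the two decoders can be sketched with scikit-learn's LDA on epoched trials. The trial dimensions, the flattening of channels x time into feature vectors, the shrinkage solver, and the cross-validation setup below are assumptions chosen to keep the toy example tractable, not the authors' pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 20, 50))   # stand-in MEG: trials x channels x samples
y = rng.integers(0, 2, size=120)         # 0 = "yes", 1 = "no"

X_flat = X.reshape(len(X), -1)           # one feature vector per trial
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
scores = cross_val_score(clf, X_flat, y, cv=5)
print(f"mean decoding accuracy: {scores.mean():.2%}")  # ~50% on random data
```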
Affiliation(s)
- Debadatta Dash
- Department of Neurology, The University of Texas at Austin
- Paul Ferrari
- Helen DeVos Children's Hospital, Corewell Health, Grand Rapids, MI
- Jun Wang
- Department of Neurology, The University of Texas at Austin
- Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin
3. Li J, Shi Y, Chen J, Huang Q, Ye M, Guo W. Flexible Self-Powered Low-Decibel Voice Recognition Mask. Sensors (Basel) 2024; 24:3007. [PMID: 38793860] [PMCID: PMC11124924] [DOI: 10.3390/s24103007]
Abstract
In environments where silent communication is essential, such as libraries and conference rooms, the need for a discreet means of interaction is paramount. Here, we present a single-electrode, contact-separated triboelectric nanogenerator (CS-TENG) characterized by robust high-frequency sensing capabilities and long-term stability. Integrating this TENG onto the inner surface of a mask allows for the capture of conversational speech signals through airflow vibrations, generating a comprehensive dataset. Employing advanced signal processing techniques, including short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCC), and deep learning neural networks, facilitates the accurate identification of speaker content and verification of their identity. The accuracy rates for each category of vocabulary and identity recognition exceed 92% and 90%, respectively. This system represents a pivotal advancement in facilitating secure and efficient unobtrusive communication in quiet settings, with promising implications for smart home applications, virtual assistant technology, and potential deployment in security and confidentiality-sensitive contexts.
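Among the processing steps named above, MFCC extraction is the most self-contained to illustrate. A minimal sketch with librosa follows; the sampling rate, frame sizes, and coefficient count are illustrative assumptions, not the mask system's actual parameters.

```python
import numpy as np
import librosa

sr = 8000                                            # assumed sampling rate (Hz)
signal = np.random.default_rng(1).standard_normal(2 * sr).astype(np.float32)

# 13 MFCCs per frame over short overlapping windows (librosa runs an STFT internally)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13, n_fft=256, hop_length=128)
print(mfcc.shape)  # (13, n_frames): a feature matrix for a downstream neural network
```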
Affiliation(s)
- Jianing Li
- Department of Physics, College of Physical Science and Technology, Research Institution for Biomimetics and Soft Matter, Xiamen University, Xiamen 361005, China
- Yating Shi
- Department of Physics, College of Physical Science and Technology, Research Institution for Biomimetics and Soft Matter, Xiamen University, Xiamen 361005, China
- Jianfeng Chen
- Department of Physics, College of Physical Science and Technology, Research Institution for Biomimetics and Soft Matter, Xiamen University, Xiamen 361005, China
- Qiaoling Huang
- Department of Physics, College of Physical Science and Technology, Research Institution for Biomimetics and Soft Matter, Xiamen University, Xiamen 361005, China
- Jiujiang Research Institute, Xiamen University, Jiujiang 332000, China
- Meidan Ye
- Department of Physics, College of Physical Science and Technology, Research Institution for Biomimetics and Soft Matter, Xiamen University, Xiamen 361005, China
- Wenxi Guo
- Department of Physics, College of Physical Science and Technology, Research Institution for Biomimetics and Soft Matter, Xiamen University, Xiamen 361005, China
- Jiujiang Research Institute, Xiamen University, Jiujiang 332000, China
4. Angrick M, Luo S, Rabbani Q, Candrea DN, Shah S, Milsap GW, Anderson WS, Gordon CR, Rosenblatt KR, Clawson L, Tippett DC, Maragakis N, Tenore FV, Fifer MS, Hermansky H, Ramsey NF, Crone NE. Online speech synthesis using a chronically implanted brain-computer interface in an individual with ALS. Sci Rep 2024; 14:9617. [PMID: 38671062] [PMCID: PMC11053081] [DOI: 10.1038/s41598-024-60277-2]
Abstract
Brain-computer interfaces (BCIs) that reconstruct and synthesize speech using brain activity recorded with intracranial electrodes may pave the way toward novel communication interfaces for people who have lost their ability to speak, or who are at high risk of losing this ability, due to neurological disorders. Here, we report online synthesis of intelligible words using a chronically implanted brain-computer interface (BCI) in a man with impaired articulation due to ALS, participating in a clinical trial (ClinicalTrials.gov, NCT03567213) exploring different strategies for BCI communication. The 3-stage approach reported here relies on recurrent neural networks to identify, decode and synthesize speech from electrocorticographic (ECoG) signals acquired across motor, premotor and somatosensory cortices. We demonstrate a reliable BCI that synthesizes commands freely chosen and spoken by the participant from a vocabulary of 6 keywords previously used for decoding commands to control a communication board. Evaluation of the intelligibility of the synthesized speech indicates that 80% of the words can be correctly recognized by human listeners. Our results show that a speech-impaired individual with ALS can use a chronically implanted BCI to reliably produce synthesized words while preserving the participant's voice profile, and provide further evidence for the stability of ECoG for speech-based BCIs.
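The three-stage approach relies on recurrent networks; as a loose illustration of just the decoding stage, the toy PyTorch module below maps ECoG feature frames to mel-spectrogram frames with a single GRU. The feature dimension, mel bin count, and one-layer architecture are assumptions for illustration, not the study's model.

```python
import torch
import torch.nn as nn

class EcogToMel(nn.Module):
    """Toy recurrent decoder: ECoG feature frames -> mel-spectrogram frames."""

    def __init__(self, n_features: int = 128, n_mels: int = 80, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(x)        # (batch, time, hidden)
        return self.out(h)        # (batch, time, n_mels)

mel = EcogToMel()(torch.randn(2, 100, 128))  # 2 windows, 100 frames, 128 features
print(mel.shape)                             # torch.Size([2, 100, 80])
```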
Affiliation(s)
- Miguel Angrick
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Shiyu Luo
- Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qinwan Rabbani
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Daniel N Candrea
- Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Samyak Shah
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Griffin W Milsap
- Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- William S Anderson
- Department of Neurosurgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Chad R Gordon
- Department of Neurosurgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Section of Neuroplastic and Reconstructive Surgery, Department of Plastic Surgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kathryn R Rosenblatt
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Anesthesiology & Critical Care Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Lora Clawson
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Donna C Tippett
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Otolaryngology-Head and Neck Surgery, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Physical Medicine and Rehabilitation, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nicholas Maragakis
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Francesco V Tenore
- Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Matthew S Fifer
- Research and Exploratory Development Department, Johns Hopkins Applied Physics Laboratory, Laurel, MD, USA
- Hynek Hermansky
- Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD, USA
- Human Language Technology Center of Excellence, The Johns Hopkins University, Baltimore, MD, USA
- Nick F Ramsey
- UMC Utrecht Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Nathan E Crone
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
5. Berezutskaya J, Freudenburg ZV, Vansteensel MJ, Aarnoutse EJ, Ramsey NF, van Gerven MAJ. Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. J Neural Eng 2023; 20:056010. [PMID: 37467739] [PMCID: PMC10510111] [DOI: 10.1088/1741-2552/ace8be]
Abstract
Objective. Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver the best and most directly applicable results is crucial for advancing the field. Approach. In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task. Main results. We show that (1) dedicated machine learning optimization of reconstruction models is key to achieving the best reconstruction performance; (2) individual word decoding in reconstructed speech achieves 92%-100% accuracy (chance level is 8%); (3) direct reconstruction from sensorimotor brain activity produces intelligible speech. Significance. These results underline the need for model optimization in achieving the best speech decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex offers for the development of next-generation BCI technology for communication.
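Main result (1) concerns dedicated optimization of the reconstruction model. The sketch below illustrates the general idea with scikit-learn's grid search over a ridge regression from neural feature frames to spectrogram frames; the model family, parameter grid, and data shapes are assumptions, since the paper optimizes deep learning models rather than this toy regressor.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 128))   # stand-in neural feature frames
Y = rng.standard_normal((500, 40))    # stand-in target spectrogram frames

search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0, 100.0]},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, Y)
print(search.best_params_)            # the selected regularization strength
```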
Affiliation(s)
- Julia Berezutskaya
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
- Zachary V Freudenburg
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Mariska J Vansteensel
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Erik J Aarnoutse
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Nick F Ramsey
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Marcel A J van Gerven
- Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
6. Chen X, Wang R, Khalilian-Gourtani A, Yu L, Dugan P, Friedman D, Doyle W, Devinsky O, Wang Y, Flinker A. A Neural Speech Decoding Framework Leveraging Deep Learning and Speech Synthesis. bioRxiv 2023:2023.09.16.558028. [PMID: 37745380] [PMCID: PMC10516019] [DOI: 10.1101/2023.09.16.558028]
Abstract
Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies restoring speech function in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarce availability of neural signals with corresponding speech, data complexity, and high dimensionality, and the limited publicly available source code. Here, we present a novel deep learning-based neural speech decoding framework that includes an ECoG Decoder that translates electrocorticographic (ECoG) signals from the cortex into interpretable speech parameters and a novel differentiable Speech Synthesizer that maps speech parameters to spectrograms. We develop a companion audio-to-audio auto-encoder consisting of a Speech Encoder and the same Speech Synthesizer to generate reference speech parameters to facilitate the ECoG Decoder training. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Among three neural network architectures for the ECoG Decoder, the 3D ResNet model has the best decoding performance (PCC=0.804) in predicting the original speech spectrogram, closely followed by the SWIN model (PCC=0.796). Our experimental results show that our models can decode speech with high correlation even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. We successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses in patients with speech deficits resulting from left hemisphere damage. Further, we use an occlusion analysis to identify cortical regions contributing to speech decoding across our models. Finally, we provide open-source code for our two-stage training pipeline along with associated preprocessing and visualization tools to enable reproducible research and drive research across the speech science and prostheses communities.
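The decoding performance above is reported as a Pearson correlation coefficient (PCC) between predicted and original spectrograms. A minimal sketch of the metric follows; correlating the flattened spectrograms is one common convention and an assumption here, as papers differ in whether they average per frequency bin or per frame.

```python
import numpy as np

def spectrogram_pcc(ref: np.ndarray, pred: np.ndarray) -> float:
    """Pearson correlation between two spectrograms of equal shape."""
    return float(np.corrcoef(ref.ravel(), pred.ravel())[0, 1])

rng = np.random.default_rng(3)
ref = rng.standard_normal((80, 100))               # mel bins x frames
pred = ref + 0.5 * rng.standard_normal(ref.shape)  # noisy reconstruction
print(f"PCC = {spectrogram_pcc(ref, pred):.3f}")
```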
7. Sen O, Sheehan AM, Raman PR, Khara KS, Khalifa A, Chatterjee B. Machine-Learning Methods for Speech and Handwriting Detection Using Neural Signals: A Review. Sensors (Basel) 2023; 23:5575. [PMID: 37420741] [DOI: 10.3390/s23125575]
Abstract
Brain-Computer Interfaces (BCIs) have become increasingly popular in recent years due to their potential applications in diverse fields, including the medical sector (people with motor and/or communication disabilities), cognitive training, gaming, and Augmented Reality/Virtual Reality (AR/VR). BCIs that can decode and recognize neural signals involved in speech and handwriting have the potential to greatly assist individuals with severe motor impairments in their communication and interaction needs. Innovative and cutting-edge advancements in this field could yield a highly accessible and interactive communication platform for these people. The purpose of this review paper is to analyze the existing research on handwriting and speech recognition from neural signals, so that new researchers interested in this field can gain a thorough knowledge of this research area. The current research on neural signal-based recognition of handwriting and speech is categorized into two main types: invasive and non-invasive studies. We examine the latest papers on converting speech-activity-based and handwriting-activity-based neural signals into text data, and we also discuss the methods of extracting data from the brain. Additionally, this review includes a brief summary of the datasets, preprocessing techniques, and methods used in these studies, which were published between 2014 and 2022. In essence, this article is intended to serve as a valuable resource for future researchers who wish to investigate neural signal-based machine-learning methods in their work.
Affiliation(s)
- Ovishake Sen
- Department of ECE, University of Florida, Gainesville, FL 32611, USA
- Anna M Sheehan
- Department of ECE, University of Florida, Gainesville, FL 32611, USA
- Pranay R Raman
- Department of ECE, University of Florida, Gainesville, FL 32611, USA
- Kabir S Khara
- Department of ECE, University of Florida, Gainesville, FL 32611, USA
- Adam Khalifa
- Department of ECE, University of Florida, Gainesville, FL 32611, USA
8. Nitta T, Horikawa J, Iribe Y, Taguchi R, Katsurada K, Shinohara S, Kawai G. Linguistic representation of vowels in speech imagery EEG. Front Hum Neurosci 2023; 17:1163578. [PMID: 37275343] [PMCID: PMC10237317] [DOI: 10.3389/fnhum.2023.1163578]
Abstract
Speech imagery recognition from electroencephalograms (EEGs) could potentially become a strong contender among non-invasive brain-computer interfaces (BCIs). In this report, we first extract linguistic representations as differences in the line spectra of phones by statistically analyzing many EEG signals from Broca's area. We then extract vowels by iterative search over hand-labeled short-syllable data. The iterative search process consists of principal component analysis (PCA), which visualizes the linguistic representation of vowels through eigenvectors φ(m), and a subspace method (SM), which searches for an optimum line spectrum for redesigning φ(m). The extracted linguistic representations of the Japanese vowels /i/ /e/ /a/ /o/ /u/ show two distinct spectral peaks (P1, P2) in the upper frequency range, and the five vowels are aligned on the P1-P2 chart. A five-vowel recognition experiment using a dataset of five subjects and a convolutional neural network (CNN) classifier gave a mean accuracy of 72.6%.
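The PCA stage of the iterative search reduces to projecting per-trial line spectra onto leading eigenvectors. A minimal scikit-learn sketch is below; the spectrum length, the choice of two retained components, and the random stand-in data are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
spectra = rng.standard_normal((200, 64))   # 200 trials x 64 line-spectrum bins

pca = PCA(n_components=2)
scores = pca.fit_transform(spectra)        # coordinates along the top eigenvectors
print(pca.components_.shape)               # (2, 64): eigenvectors like phi(m)
print(scores.shape)                        # (200, 2): e.g., a P1-P2-style chart
```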
Affiliation(s)
- Tsuneo Nitta
- Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Junsei Horikawa
- Graduate School of Engineering, Toyohashi University of Technology, Toyohashi, Japan
- Yurie Iribe
- Graduate School of Information Science and Technology, Aichi Prefectural University, Nagakute, Japan
- Ryo Taguchi
- Graduate School of Information, Nagoya Institute of Technology, Nagoya, Japan
- Kouichi Katsurada
- Faculty of Science and Technology, Tokyo University of Science, Noda, Japan
- Shuji Shinohara
- School of Science and Engineering, Tokyo Denki University, Saitama, Japan
- Goh Kawai
- Online Learning Support Team, Tokyo University of Foreign Studies, Tokyo, Japan
9. Wang S, Zhu G, Shi L, Zhang C, Wu B, Yang A, Meng F, Jiang Y, Zhang J. Closed-Loop Adaptive Deep Brain Stimulation in Parkinson's Disease: Procedures to Achieve It and Future Perspectives. J Parkinsons Dis 2023:JPD225053. [PMID: 37182899] [DOI: 10.3233/jpd-225053]
Abstract
Parkinson's disease (PD) is a neurodegenerative disease that places a heavy burden on patients, families, and society. Deep brain stimulation (DBS) can improve the symptoms of PD patients for whom medication is insufficient. However, current open-loop, uninterrupted conventional DBS (cDBS) has inherent limitations, such as adverse effects, rapid battery consumption, and a need for frequent parameter adjustment. To overcome these shortcomings, adaptive DBS (aDBS) was proposed to provide responsive, optimized stimulation for PD. This topic has attracted scientific interest, and a growing body of preclinical and clinical evidence has shown its benefits. However, both achievements and challenges have emerged in this novel field. To date, only a few reviews have comprehensively analyzed the full framework and procedures for aDBS implementation. Herein, we review current preclinical and clinical data on aDBS for PD, discuss the full procedures for its achievement, and provide future perspectives on this treatment.
Affiliation(s)
- Shu Wang
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Guanyu Zhu
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Lin Shi
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Chunkui Zhang
- Center of Cognition and Brain Science, Beijing Institute of Basic Medical Sciences, Beijing, China
- Bing Wu
- Center of Cognition and Brain Science, Beijing Institute of Basic Medical Sciences, Beijing, China
- Anchao Yang
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Fangang Meng
- Department of Functional Neurosurgery, Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Beijing Key Laboratory of Neurostimulation, Beijing, China
- Yin Jiang
- Department of Functional Neurosurgery, Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Beijing Key Laboratory of Neurostimulation, Beijing, China
- Jianguo Zhang
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Department of Functional Neurosurgery, Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Beijing Key Laboratory of Neurostimulation, Beijing, China
10. Gusein-zade NG, Slezkin AA, Allahyarov E. Statistical processing of time slices of electroencephalography signals during brain reaction to visual stimuli. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104656]
11. Zhang J, Wu J, Qiu Y, Song A, Li W, Li X, Liu Y. Intelligent speech technologies for transcription, disease diagnosis, and medical equipment interactive control in smart hospitals: A review. Comput Biol Med 2023; 153:106517. [PMID: 36623438] [PMCID: PMC9814440] [DOI: 10.1016/j.compbiomed.2022.106517]
Abstract
The growth and aging of the world population have driven a shortage of medical resources in recent years, especially during the COVID-19 pandemic. Fortunately, the rapid development of robotics and artificial intelligence technologies helps to address these challenges in the healthcare field. Among them, intelligent speech technology (IST) has served doctors and patients, improving the efficiency of medical care and alleviating the medical burden. However, problems like noise interference in complex medical scenarios and pronunciation differences between patients and healthy people hamper the broad application of IST in hospitals. In recent years, technologies such as machine learning have developed rapidly in intelligent speech recognition and are expected to solve these problems. This paper first introduces IST's procedure and system architecture and analyzes its application in medical scenarios. Secondly, we review existing IST applications in smart hospitals in detail, including electronic medical documentation, disease diagnosis and evaluation, and human-medical equipment interaction. In addition, we elaborate on an application case of IST in the early recognition, diagnosis, rehabilitation training, evaluation, and daily care of stroke patients. Finally, we discuss IST's limitations, challenges, and future directions in the medical field, and we propose a novel medical voice analysis system architecture that employs active hardware, active software, and human-computer interaction to realize intelligent and evolvable speech recognition. This comprehensive review and the proposed architecture offer directions for future studies on IST and its applications in smart hospitals.
Affiliation(s)
- Jun Zhang
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Jingyue Wu
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Yiyi Qiu
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Aiguo Song
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Weifeng Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Xin Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Yecheng Liu
- Emergency Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, 100730, China
12. Verwoert M, Ottenhoff MC, Goulis S, Colon AJ, Wagner L, Tousseyn S, van Dijk JP, Kubben PL, Herff C. Dataset of Speech Production in intracranial Electroencephalography. Sci Data 2022; 9:434. [PMID: 35869138] [PMCID: PMC9307753] [DOI: 10.1038/s41597-022-01542-9]
Abstract
Speech production is an intricate process involving a large number of muscles and cognitive processes. The neural processes underlying speech production are not completely understood. As speech is a uniquely human ability, it cannot be investigated in animal models. High-fidelity human data can only be obtained in clinical settings and are therefore not easily available to all researchers. Here, we provide a dataset of 10 participants reading out individual words while we measured intracranial EEG from a total of 1103 electrodes. The data, with their high temporal resolution and coverage of a large variety of cortical and sub-cortical brain regions, can help in understanding the speech production process better. Simultaneously, the data can be used to test speech decoding and synthesis approaches from neural data to develop speech brain-computer interfaces and speech neuroprostheses.
Measurement(s): Brain activity
Technology Type(s): Stereotactic electroencephalography
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Environment: Epilepsy monitoring center
Sample Characteristic - Location: The Netherlands
13. Guo Z, Chen F. Decoding lexical tones and vowels in imagined tonal monosyllables using fNIRS signals. J Neural Eng 2022; 19. [PMID: 36317255] [DOI: 10.1088/1741-2552/ac9e1d]
Abstract
Objective. Speech is a common way of communication. Decoding verbal intent could provide a naturalistic communication way for people with severe motor disabilities. The active brain-computer interface (BCI) speller is one of the most commonly used speech BCIs. To reduce the spelling time of Chinese words, identifying the vowels and tones that are embedded in imagined Chinese words is essential. Functional near-infrared spectroscopy (fNIRS) has been widely used in BCIs because it is portable, non-invasive, safe, low cost, and has a relatively high spatial resolution. Approach. In this study, an active BCI speller based on fNIRS is presented by covertly rehearsing tonal monosyllables with vowels (i.e. /a/, /i/, /o/, and /u/) and four lexical tones in Mandarin Chinese (i.e. tones 1, 2, 3, and 4) for 10 s. Main results. fNIRS results showed significant differences in the right superior temporal gyrus between imagined vowels with tone 2/3/4 and those with tone 1 (i.e. more activations and stronger connections to other brain regions for imagined vowels with tones 2/3/4 than for those with tone 1). Speech-related areas for tone imagery (i.e. the right hemisphere) provided the majority of information for identifying tones, while the left hemisphere had advantages in vowel identification. Having decoded both vowels and tones during the post-stimulus 15 s period, the average classification accuracies exceeded 40% and 70% in multiclass (i.e. four classes) and binary settings, respectively. To spell words more quickly, the time window size for decoding was reduced from 15 s to 2.5 s while the classification accuracies were not significantly reduced. Significance. For the first time, this work demonstrates the possibility of discriminating lexical tones and vowels in imagined tonal syllables simultaneously. In addition, the reduced time window for decoding indicates that the spelling time of Chinese words could be significantly reduced in fNIRS-based BCIs.
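The functional-connectivity analysis mentioned above amounts to Pearson correlations between channel time courses. A minimal sketch follows; the channel count, window length, and random stand-in data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
hbo = rng.standard_normal((24, 150))   # 24 fNIRS channels x 150 time samples

# Channel-by-channel Pearson correlation matrix (functional connectivity)
conn = np.corrcoef(hbo)
print(conn.shape)                      # (24, 24), symmetric, ones on the diagonal
```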
Affiliation(s)
- Zengzhi Guo
- School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, People's Republic of China
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
14. Shah U, Alzubaidi M, Mohsen F, Abd-Alrazaq A, Alam T, Househ M. The Role of Artificial Intelligence in Decoding Speech from EEG Signals: A Scoping Review. Sensors (Basel) 2022; 22:6975. [PMID: 36146323] [PMCID: PMC9505262] [DOI: 10.3390/s22186975]
Abstract
Background: Brain traumas, mental disorders, and vocal abuse can result in permanent or temporary speech impairment, significantly impairing one's quality of life and occasionally resulting in social isolation. Brain-computer interfaces (BCI) can support people who have issues with their speech or who have been paralyzed to communicate with their surroundings via brain signals. Therefore, EEG signal-based BCI has received significant attention in the last two decades for multiple reasons: (i) clinical research has yielded detailed knowledge of EEG signals, (ii) EEG devices are inexpensive, and (iii) the technology has applications in medical and social fields. Objective: This study explores the existing literature and summarizes EEG data acquisition, feature extraction, and artificial intelligence (AI) techniques for decoding speech from brain signals. Method: We followed the PRISMA-ScR guidelines to conduct this scoping review. We searched six electronic databases: PubMed, IEEE Xplore, the ACM Digital Library, Scopus, arXiv, and Google Scholar. We carefully selected search terms based on the target intervention (i.e., imagined speech and AI) and target data (EEG signals), and some of the search terms were derived from previous reviews. The study selection process was carried out in three phases: study identification, study selection, and data extraction. Two reviewers independently carried out study selection and data extraction. A narrative approach was adopted to synthesize the extracted data. Results: A total of 263 studies were evaluated; however, 34 met the eligibility criteria for inclusion in this review. We found 64-electrode EEG devices to be the most widely used in the included studies. The most common signal normalization and feature extraction methods in the included studies were the bandpass filter and wavelet-based feature extraction. We categorized the studies based on AI techniques, such as machine learning (ML) and deep learning (DL). The most prominent ML algorithm was the support vector machine, and the most prominent DL algorithm was the convolutional neural network. Conclusions: EEG signal-based BCI is a viable technology that can enable people with severe or temporary voice impairment to communicate with the world directly from their brain. However, the development of BCI technology is still in its infancy.
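To make the two most common preprocessing steps found by the review concrete, the sketch below bandpass-filters a single EEG channel with SciPy and decomposes it with PyWavelets; the passband, wavelet family, and decomposition level are assumptions chosen for illustration.

```python
import numpy as np
import pywt
from scipy.signal import butter, filtfilt

fs = 250                                                  # assumed sampling rate (Hz)
eeg = np.random.default_rng(6).standard_normal(10 * fs)   # one channel, 10 s

# Zero-phase Butterworth bandpass, 1-40 Hz
b, a = butter(4, [1, 40], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, eeg)

# Discrete wavelet decomposition as a simple feature set
coeffs = pywt.wavedec(filtered, wavelet="db4", level=4)
features = np.concatenate(coeffs)
print(features.shape)
```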
Affiliation(s)
- Uzair Shah
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Mahmood Alzubaidi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Farida Mohsen
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Alaa Abd-Alrazaq
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha P.O. Box 34110, Qatar
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Mowafa Househ
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
15. Wilson BS, Tucci DL, Moses DA, Chang EF, Young NM, Zeng FG, Lesica NA, Bur AM, Kavookjian H, Mussatto C, Penn J, Goodwin S, Kraft S, Wang G, Cohen JM, Ginsburg GS, Dawson G, Francis HW. Harnessing the Power of Artificial Intelligence in Otolaryngology and the Communication Sciences. J Assoc Res Otolaryngol 2022; 23:319-349. [PMID: 35441936] [PMCID: PMC9086071] [DOI: 10.1007/s10162-022-00846-2]
Abstract
Use of artificial intelligence (AI) is a burgeoning field in otolaryngology and the communication sciences. A virtual symposium on the topic was convened from Duke University on October 26, 2020, and was attended by more than 170 participants worldwide. This review presents summaries of all but one of the talks presented during the symposium; recordings of all the talks, along with the discussions for the talks, are available at https://www.youtube.com/watch?v=ktfewrXvEFg and https://www.youtube.com/watch?v=-gQ5qX2v3rg . Each of the summaries is about 2500 words in length and each summary includes two figures. This level of detail far exceeds the brief summaries presented in traditional reviews and thus provides a more-informed glimpse into the power and diversity of current AI applications in otolaryngology and the communication sciences and how to harness that power for future applications.
Affiliation(s)
- Blake S. Wilson
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
- Duke Hearing Center, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708 USA
- Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
- Department of Otolaryngology – Head & Neck Surgery, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599 USA
- Debara L. Tucci
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
- National Institute On Deafness and Other Communication Disorders, National Institutes of Health, Bethesda, MD 20892 USA
- David A. Moses
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143 USA
- UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94117 USA
- Edward F. Chang
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143 USA
- UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94117 USA
- Nancy M. Young
- Division of Otolaryngology, Ann and Robert H. Lurie Childrens Hospital of Chicago, Chicago, IL 60611 USA
- Department of Otolaryngology - Head and Neck Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL 60611 USA
- Department of Communication, Knowles Hearing Center, Northwestern University, Evanston, IL 60208 USA
- Fan-Gang Zeng
- Center for Hearing Research, University of California, Irvine, Irvine, CA 92697 USA
- Department of Anatomy and Neurobiology, University of California, Irvine, Irvine, CA 92697 USA
- Department of Biomedical Engineering, University of California, Irvine, Irvine, CA 92697 USA
- Department of Cognitive Sciences, University of California, Irvine, Irvine, CA 92697 USA
- Department of Otolaryngology - Head and Neck Surgery, University of California, Irvine, CA 92697 USA
- Andrés M. Bur
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
- Hannah Kavookjian
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
- Caroline Mussatto
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
- Joseph Penn
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
- Sara Goodwin
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
- Shannon Kraft
- Department of Otolaryngology - Head and Neck Surgery, Medical Center, University of Kansas, Kansas City, KS 66160 USA
- Guanghui Wang
- Department of Computer Science, Ryerson University, Toronto, ON M5B 2K3 Canada
- Jonathan M. Cohen
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
- ENT Department, Kaplan Medical Center, 7661041 Rehovot, Israel
- Geoffrey S. Ginsburg
- Department of Biomedical Engineering, Duke University, Durham, NC 27708 USA
- MEDx (Medicine & Engineering at Duke), Duke University, Durham, NC 27708 USA
- Center for Applied Genomics & Precision Medicine, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Medicine, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Pathology, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710 USA
- Geraldine Dawson
- Duke Institute for Brain Sciences, Duke University, Durham, NC 27710 USA
- Duke Center for Autism and Brain Development, Duke University School of Medicine and the Duke Institute for Brain Sciences, NIH Autism Center of Excellence, Durham, NC 27705 USA
- Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC 27701 USA
- Howard W. Francis
- Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710 USA
16. Lopez-Bernal D, Balderas D, Ponce P, Molina A. A State-of-the-Art Review of EEG-Based Imagined Speech Decoding. Front Hum Neurosci 2022; 16:867281. [PMID: 35558735] [PMCID: PMC9086783] [DOI: 10.3389/fnhum.2022.867281]
Abstract
Currently, the most used method to measure brain activity non-invasively is the electroencephalogram (EEG), because of its high temporal resolution, ease of use, and safety. These signals can be used within a Brain Computer Interface (BCI) framework to provide a new communication channel to people who are unable to speak due to motor disabilities or other neurological diseases. Nevertheless, EEG-based BCI systems for imagined speech recognition have proven challenging to implement in real-life situations due to the difficulty of interpreting EEG signals, which have a low signal-to-noise ratio (SNR). As a consequence, to help researchers make informed decisions when approaching this problem, we offer a review article that summarizes the main findings of the most relevant studies on this subject since 2009. This review focuses mainly on the pre-processing, feature extraction, and classification techniques used by several authors, as well as the target vocabulary. Furthermore, we propose ideas that may be useful for future work toward achieving a practical application of EEG-based BCI systems for imagined speech decoding.
Affiliation(s)
- Diego Lopez-Bernal
- Tecnologico de Monterrey, National Department of Research, Mexico City, Mexico
17. Abdul Nabi Ali A, Alam M, Klein SC, Behmann N, Krauss JK, Doll T, Blume H, Schwabe K. Predictive accuracy of CNN for cortical oscillatory activity in an acute rat model of parkinsonism. Neural Netw 2021; 146:334-340. [PMID: 34923220] [DOI: 10.1016/j.neunet.2021.11.025]
Abstract
In neurological and neuropsychiatric disorders, neuronal oscillatory activity between basal ganglia and cortical circuits is altered, which may be useful as a biomarker for adaptive deep brain stimulation. We investigated whether changes in the spectral power of oscillatory activity in the motor cortex (MCtx) and the sensorimotor cortex (SMCtx) of rats after injection of the dopamine (DA) receptor antagonist haloperidol (HALO) would be similar to those observed in Parkinson's disease. Thereafter, we tested whether a convolutional neural network (CNN) model would identify brain signal alterations in this acute model of parkinsonism. A sixteen-channel surface micro-electrocorticogram (ECoG) recording array was placed under the dura above the MCtx and SMCtx areas of one hemisphere under general anaesthesia in rats. Seven days after surgery, micro-ECoG was recorded in individual freely moving rats in three conditions: (1) basal activity, (2) after injection of HALO (0.5 mg/kg), and (3) with additional injection of apomorphine (APO) (1 mg/kg). Furthermore, a CNN-based classifier consisting of 23,530 parameters was applied to the raw data. HALO injection decreased oscillatory theta band activity (4-8 Hz) and enhanced beta (12-30 Hz) and gamma (30-100 Hz) activity in MCtx and SMCtx, which was compensated after APO injection (P < 0.001). Evaluation of the classification performance of the CNN model yielded an accuracy of 92%, a sensitivity of 90%, and a specificity of 93% on one-dimensional signals. The proposed CNN model requires a minimum of sensor hardware and may be integrated into future research on therapeutic devices for Parkinson's disease, such as adaptive closed-loop stimulation, thus contributing to a more efficient way of treatment.
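The theta, beta, and gamma power changes reported above are typically derived from a power spectral density estimate. A minimal sketch using SciPy's Welch method follows; the sampling rate, segment length, and stand-in signal are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

fs = 1000                                                  # assumed sampling rate (Hz)
ecog = np.random.default_rng(7).standard_normal(30 * fs)   # one channel, 30 s

freqs, psd = welch(ecog, fs=fs, nperseg=2 * fs)
df = freqs[1] - freqs[0]

def band_power(lo: float, hi: float) -> float:
    """Approximate integral of the PSD over [lo, hi] Hz."""
    mask = (freqs >= lo) & (freqs <= hi)
    return float(psd[mask].sum() * df)

print({"theta": band_power(4, 8), "beta": band_power(12, 30),
       "gamma": band_power(30, 100)})
```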
Affiliation(s)
- Ali Abdul Nabi Ali
- Institute of Microelectronic Systems, Architectures and Systems, Leibniz University Hannover, Hannover, D-30167, Lower Saxony, Germany
- Mesbah Alam
- Department of Neurosurgery, Hannover Medical School, Hannover, D-30625, Lower Saxony, Germany.
- Simon C Klein
- Institute of Microelectronic Systems, Architectures and Systems, Leibniz University Hannover, Hannover, D-30167, Lower Saxony, Germany
- Nicolai Behmann
- Institute of Microelectronic Systems, Architectures and Systems, Leibniz University Hannover, Hannover, D-30167, Lower Saxony, Germany
- Joachim K Krauss
- Department of Neurosurgery, Hannover Medical School, Hannover, D-30625, Lower Saxony, Germany
- Theodor Doll
- Biomaterial Engineering, Hannover Medical School and Translational Medical Engineering Fraunhofer ITEM, Hannover, D-30625, Lower Saxony, Germany
- Holger Blume
- Institute of Microelectronic Systems, Architectures and Systems, Leibniz University Hannover, Hannover, D-30167, Lower Saxony, Germany
- Kerstin Schwabe
- Department of Neurosurgery, Hannover Medical School, Hannover, D-30625, Lower Saxony, Germany
18. Valeriani D, Ayaz H, Kosmyna N, Poli R, Maes P. Editorial: Neurotechnologies for Human Augmentation. Front Neurosci 2021; 15:789868. [PMID: 34858136] [PMCID: PMC8631818] [DOI: 10.3389/fnins.2021.789868]
Affiliation(s)
- Hasan Ayaz
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States
- Nataliya Kosmyna
- Media Lab, Massachusetts Institute of Technology, Cambridge, MA, United States
- Riccardo Poli
- School of Computer Science and Electronic Engineering, University of Essex, Colchester, United Kingdom
- Pattie Maes
- Media Lab, Massachusetts Institute of Technology, Cambridge, MA, United States
19. Angrick M, Ottenhoff M, Goulis S, Colon AJ, Wagner L, Krusienski DJ, Kubben PL, Schultz T, Herff C. Speech Synthesis from Stereotactic EEG using an Electrode Shaft Dependent Multi-Input Convolutional Neural Network Approach. Annu Int Conf IEEE Eng Med Biol Soc 2021; 2021:6045-6048. [PMID: 34892495] [DOI: 10.1109/embc46164.2021.9629711]
Abstract
Neurological disorders can lead to significant impairments in speech communication and, in severe cases, cause the complete loss of the ability to speak. Brain-Computer Interfaces have shown promise as an alternative communication modality by directly transforming neural activity of speech processes into textual or audible representations. Previous studies investigating such speech neuroprostheses relied on electrocorticography (ECoG) or microelectrode arrays that acquire neural signals from superficial areas on the cortex. While both measurement methods have demonstrated successful speech decoding, they do not capture activity from deeper brain structures, and this activity has therefore not been harnessed for speech-related BCIs. In this study, we bridge this gap by adapting a previously presented decoding pipeline for speech synthesis based on ECoG signals to implanted depth electrodes (sEEG). For this purpose, we propose a multi-input convolutional neural network that extracts speech-related activity separately for each electrode shaft and estimates spectral coefficients to reconstruct an audible waveform. We evaluate our approach on open-loop data from 5 patients who performed a recitation task with Dutch utterances. We achieve correlations of up to 0.80 between original and reconstructed speech spectrograms, which are significantly above chance level for all patients (p < 0.001). Our results indicate that sEEG can yield similar speech decoding performance to prior ECoG studies and is a promising modality for speech BCIs.
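As a rough illustration of the shaft-wise multi-input idea, the toy PyTorch model below applies a separate 1D convolution to each electrode shaft's channels and concatenates the results before a linear readout to spectral coefficients; the shaft counts, channel counts, and layer sizes are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultiShaftCNN(nn.Module):
    """Toy multi-input CNN: one conv branch per sEEG electrode shaft."""

    def __init__(self, shaft_channels: list[int], n_out: int = 40):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv1d(c, 16, kernel_size=5, padding=2),
                          nn.ReLU(), nn.AdaptiveAvgPool1d(1))
            for c in shaft_channels
        )
        self.readout = nn.Linear(16 * len(shaft_channels), n_out)

    def forward(self, shafts: list[torch.Tensor]) -> torch.Tensor:
        feats = [b(x).squeeze(-1) for b, x in zip(self.branches, shafts)]
        return self.readout(torch.cat(feats, dim=1))

model = MultiShaftCNN([8, 10, 12])                   # three shafts
xs = [torch.randn(4, c, 200) for c in (8, 10, 12)]   # batch of 4 signal windows
print(model(xs).shape)                               # torch.Size([4, 40])
```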
20. Si X, Li S, Xiang S, Yu J, Ming D. Imagined speech increases the hemodynamic response and functional connectivity of the dorsal motor cortex. J Neural Eng 2021; 18. [PMID: 34507311] [DOI: 10.1088/1741-2552/ac25d9]
Abstract
Objective. Decoding imagined speech from brain signals could provide a more natural, user-friendly way to develop the next generation of brain-computer interfaces (BCIs). With the advantages of being non-invasive, portable, relatively high in spatial resolution, and insensitive to motion artifacts, functional near-infrared spectroscopy (fNIRS) shows great potential for developing non-invasive speech BCIs. However, there is a lack of fNIRS evidence uncovering the neural mechanism of imagined speech. Our goal is to investigate the specific brain regions and the corresponding cortico-cortical functional connectivity features during imagined speech with fNIRS. Approach. fNIRS signals were recorded from 13 subjects' bilateral motor and prefrontal cortex during overtly and covertly repeating words. Cortical activation was determined through the mean oxygen-hemoglobin concentration changes, and functional connectivity was calculated by Pearson's correlation coefficient. Main results. (a) The bilateral dorsal motor cortex was significantly activated during covert speech, whereas the bilateral ventral motor cortex was significantly activated during overt speech. (b) As a subregion of the motor cortex, the sensorimotor cortex (SMC) showed a dominant dorsal response to the covert speech condition and a dominant ventral response to the overt speech condition. (c) Broca's area was deactivated during covert speech but activated during overt speech. (d) Compared to overt speech, dorsal SMC (dSMC)-related functional connections were enhanced during covert speech. Significance. We provide fNIRS evidence for the involvement of dSMC in speech imagery. dSMC is the speech imagery network's key hub and is probably involved in sensorimotor information processing during covert speech. This study could inspire the BCI community to focus on the potential contribution of dSMC during speech imagery.
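Cortical activation here is quantified as the mean oxy-hemoglobin change in a task window relative to baseline, tested for significance. A minimal per-channel sketch using a paired t-test from SciPy is below; the trial counts, window summaries, and stand-in data are illustrative assumptions about how such an analysis is typically set up, not the authors' exact statistics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# Stand-in per-trial HbO means for one channel: baseline window vs task window
baseline = rng.normal(0.0, 1.0, size=30)
task = rng.normal(0.4, 1.0, size=30)      # simulated task-related increase

t, p = stats.ttest_rel(task, baseline)    # paired t-test across trials
print(f"mean change = {np.mean(task - baseline):.2f}, t = {t:.2f}, p = {p:.3g}")
```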
Affiliation(s)
- Xiaopeng Si
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, People's Republic of China
- Tianjin Key Laboratory of Brain Science and Neural Engineering, Tianjin University, Tianjin 300072, People's Republic of China
- Tianjin International Engineering Institute, Tianjin University, Tianjin 300072, People's Republic of China
- Institute of Applied Psychology, Tianjin University, Tianjin 300350, People's Republic of China
- Sicheng Li
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, People's Republic of China
- Tianjin Key Laboratory of Brain Science and Neural Engineering, Tianjin University, Tianjin 300072, People's Republic of China
- Shaoxin Xiang
- Tianjin Key Laboratory of Brain Science and Neural Engineering, Tianjin University, Tianjin 300072, People's Republic of China
- Tianjin International Engineering Institute, Tianjin University, Tianjin 300072, People's Republic of China
- Jiayue Yu
- Tianjin Key Laboratory of Brain Science and Neural Engineering, Tianjin University, Tianjin 300072, People's Republic of China
- Tianjin International Engineering Institute, Tianjin University, Tianjin 300072, People's Republic of China
- Dong Ming
- Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, People's Republic of China
- Tianjin Key Laboratory of Brain Science and Neural Engineering, Tianjin University, Tianjin 300072, People's Republic of China
21
Local field potentials in a pre-motor region predict learned vocal sequences. PLoS Comput Biol 2021; 17:e1008100. PMID: 34555020; PMCID: PMC8460039; DOI: 10.1371/journal.pcbi.1008100.
Abstract
Neuronal activity within the premotor region HVC is tightly synchronized to, and crucial for, the articulate production of learned song in birds. Characterizations of this neural activity detail patterns of sequential bursting in small, carefully identified subsets of neurons in the HVC population. The dynamics of HVC are well described by these characterizations but have not been verified beyond this scale of measurement. There is a rich history of using local field potentials (LFP) to extract information about behavior that extends beyond the contribution of individual cells. These signals have the advantage of being stable over longer periods of time, and they have been used to study and decode human speech and other complex motor behaviors. Here we characterize LFP signals recorded presumptively from the HVC of freely behaving male zebra finches during song production, to determine whether population activity yields similar insights into the mechanisms underlying complex motor-vocal behavior. Following the initial observation that structured changes in the LFP were distinct for every vocalization during song, we show that it is possible to extract time-varying features from multiple frequency bands to decode the identity of specific vocalization elements (syllables) and to predict their temporal onsets within the motif. This demonstrates the utility of LFP for studying vocal behavior in songbirds. Surprisingly, the time-frequency structure of HVC LFP is qualitatively similar to well-established oscillations found in both human and non-human mammalian motor areas. This physiological similarity, despite distinct anatomical structures, may point to common computational principles for learning and/or generating complex motor-vocal behaviors. Vocalizations such as speech and song are a motor process that requires the coordination of numerous muscle groups receiving instructions from specific brain regions. In songbirds, HVC is a premotor brain region required for singing; it is populated by a set of neurons that fire sparsely during song. How HVC enables song generation is not well understood. Here we describe network activity, presumptively from HVC, that precedes the initiation of each vocal element during singing. This network activity can be used to predict both the identity of each vocal element (syllable) and when it will occur during song. In addition, this network activity is similar to activity documented in human, non-human primate, and other mammalian premotor regions tied to muscle movements. These similarities add to a growing body of literature that finds parallels between songbirds and humans with respect to the motor control of vocal organs. Furthermore, given the similarities of the songbird and human motor-vocal systems, these results suggest that the songbird model could be leveraged to accelerate the development of clinically translatable speech prostheses.
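Decoding syllable identity from LFP begins with time-varying features from multiple frequency bands, as described above. A hedged sketch of such a feature-extraction step (the band edges, window length, and toy data are illustrative choices, not the paper's exact settings):

```python
import numpy as np
from scipy.signal import stft

def band_features(lfp, fs, bands=((4, 12), (25, 35), (30, 80))):
    """Time-varying log band-power features from one LFP channel.

    lfp : 1-D array of raw local field potential samples.
    fs  : sampling rate in Hz.
    Returns an array of shape (n_bands, n_frames).
    """
    f, t, Z = stft(lfp, fs=fs, nperseg=int(0.128 * fs))
    power = np.abs(Z) ** 2
    feats = []
    for lo, hi in bands:
        mask = (f >= lo) & (f < hi)
        feats.append(np.log(power[mask].mean(axis=0) + 1e-12))
    return np.array(feats)

# Hypothetical usage: 5 s of LFP sampled at 1 kHz
fs = 1000
lfp = np.random.randn(5 * fs)
print(band_features(lfp, fs).shape)  # (3, n_frames)
```

Feature matrices of this kind can then be fed to any classifier to predict syllable identity or onset times.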
22
Panachakel JT, Ramakrishnan AG. Decoding Covert Speech From EEG-A Comprehensive Review. Front Neurosci 2021; 15:642251. PMID: 33994922; PMCID: PMC8116487; DOI: 10.3389/fnins.2021.642251.
Abstract
Over the past decade, many researchers have proposed systems for decoding covert or imagined speech from the electroencephalogram (EEG). These implementations differ from each other in several aspects, from data acquisition to machine learning algorithms, which makes comparing them difficult. This review article brings all the relevant work on decoding imagined speech from EEG published in the last decade into a single framework. Every important aspect of designing such a system, such as the selection of words to be imagined, the number of electrodes to record, temporal and spatial filtering, feature extraction, and the classifier, is reviewed. This helps a researcher compare the relative merits and demerits of the different approaches and choose the most suitable one. Because speech is the most natural form of communication, one which humans acquire even without formal education, imagined speech is an ideal prompt for evoking brain activity patterns for a BCI (brain-computer interface) system, although research on real-time (online) speech-imagery-based BCI systems is still in its infancy. Covert-speech-based BCIs can help people with disabilities improve their quality of life and can also be used for covert communication in environments that do not support vocal communication. This paper also discusses some future directions that will aid the deployment of speech-imagery-based BCIs in practical applications rather than only in laboratory experiments.
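The design stages the review enumerates (filtering, feature extraction, classification) map naturally onto a standard scikit-learn pipeline. A minimal sketch with synthetic band-power features standing in for real EEG trials (all dimensions and the choice of LDA are illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical features: per-trial band powers flattened across
# channels (e.g., 64 channels x 5 bands = 320 features per trial).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 320))   # 200 imagined-speech trials
y = rng.integers(0, 5, size=200)      # 5 imagined words

clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
print(cross_val_score(clf, X, y, cv=5).mean())  # chance is ~0.2 here
```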
Affiliation(s)
- Jerrin Thomas Panachakel: Medical Intelligence and Language Engineering Laboratory, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India
23
Yang Y, Ahmadipour P, Shanechi MM. Adaptive latent state modeling of brain network dynamics with real-time learning rate optimization. J Neural Eng 2021; 18. PMID: 33254159; DOI: 10.1088/1741-2552/abcefd.
Abstract
Objective. Dynamic latent state models are widely used to characterize the dynamics of brain network activity for various neural signal types. To date, dynamic latent state models have largely been developed for stationary brain network dynamics. However, brain network dynamics can be non-stationary, for example due to learning, plasticity, or recording instability. To enable modeling these non-stationarities, two problems need to be resolved. First, novel methods should be developed that can adaptively update the parameters of latent state models, which is difficult because the state is latent. Second, new methods are needed to optimize the adaptation learning rate, which specifies how fast new neural observations update the model parameters and can significantly influence adaptation accuracy. Approach. We develop a Rate Optimized-adaptive Linear State-Space Modeling (RO-adaptive LSSM) algorithm that solves these two problems. First, to enable adaptation, we derive a computation- and memory-efficient adaptive LSSM fitting algorithm that updates the LSSM parameters recursively and in real time in the presence of the latent state. Second, we develop a real-time learning rate optimization algorithm. We use comprehensive simulations of a broad range of non-stationary brain network dynamics to validate both algorithms, which together constitute the RO-adaptive LSSM. Main results. We show that the adaptive LSSM fitting algorithm can accurately track the broad simulated non-stationary brain network dynamics. We also find that the learning rate significantly affects the LSSM fitting accuracy. Finally, we show that the real-time learning rate optimization algorithm can run in parallel with the adaptive LSSM fitting algorithm. Doing so, the combined RO-adaptive LSSM algorithm rapidly converges to the optimal learning rate and accurately tracks non-stationarities. Significance. These algorithms can be used to study time-varying neural dynamics underlying various brain functions and to enhance future neurotechnologies such as brain-machine interfaces and closed-loop brain stimulation systems.
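The role of the adaptation learning rate is easiest to see in a simpler recursive estimator. The sketch below is generic recursive least squares with a forgetting factor, not the authors' RO-adaptive LSSM; it only illustrates the trade-off the paper optimizes, and all shapes and values are hypothetical:

```python
import numpy as np

def adaptive_linear_fit(X, Y, lam=0.99):
    """Recursive least squares with forgetting factor `lam`.
    `lam` plays the role of a learning rate: values near 1 adapt slowly
    but average out noise; smaller values track non-stationarity faster
    at the cost of higher variance. Estimates W in y_t ~ W x_t.
    """
    n_y, n_x = Y.shape[1], X.shape[1]
    W = np.zeros((n_y, n_x))
    P = np.eye(n_x) * 1e3            # inverse input-covariance estimate
    for x, y in zip(X, Y):
        Px = P @ x
        k = Px / (lam + x @ Px)      # gain vector
        W += np.outer(y - W @ x, k)  # error-driven parameter update
        P = (P - np.outer(k, Px)) / lam
    return W

# Hypothetical demo: recover a static linear map from noisy samples
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
W_true = np.array([[1.0, -0.5, 0.2, 0.0]])
Y = X @ W_true.T + 0.1 * rng.standard_normal((500, 1))
print(adaptive_linear_fit(X, Y, lam=0.98).round(2))
```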
Affiliation(s)
- Yuxiao Yang: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States of America (these authors contributed equally to this work)
- Parima Ahmadipour: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States of America (these authors contributed equally to this work)
- Maryam M Shanechi: Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States of America; Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, United States of America
24
Latif S, Qadir J, Qayyum A, Usama M, Younis S. Speech Technology for Healthcare: Opportunities, Challenges, and State of the Art. IEEE Rev Biomed Eng 2021; 14:342-356. PMID: 32746367; DOI: 10.1109/rbme.2020.3006860.
Abstract
Speech technology remains under-explored in healthcare, even though modern advances in speech technology, especially those driven by deep learning (DL), offer unprecedented opportunities for transforming the healthcare industry. In this paper, we focus on the enormous potential of speech technology for revolutionising the healthcare domain. More specifically, we review state-of-the-art approaches in automatic speech recognition (ASR), speech synthesis or text-to-speech (TTS), and health detection and monitoring using speech signals. We also present a comprehensive overview of the challenges hindering the growth of speech-based services in healthcare. To make speech-based healthcare solutions more prevalent, we discuss open issues and suggest possible research directions aimed at fully leveraging the advantages of other technologies to make speech-based healthcare solutions more effective.
25
Standardization-refinement domain adaptation method for cross-subject EEG-based classification in imagined speech recognition. Pattern Recognit Lett 2021. DOI: 10.1016/j.patrec.2020.11.013.
26
Livezey JA, Glaser JI. Deep learning approaches for neural decoding across architectures and recording modalities. Brief Bioinform 2020; 22:1577-1591. PMID: 33372958; DOI: 10.1093/bib/bbaa355.
Abstract
Decoding behavior, perception or cognitive state directly from neural signals is critical for brain-computer interface research and an important tool for systems neuroscience. In the last decade, deep learning has become the state-of-the-art method in many machine learning tasks ranging from speech recognition to image segmentation. The success of deep networks in other domains has led to a new wave of applications in neuroscience. In this article, we review deep learning approaches to neural decoding. We describe the architectures used for extracting useful features from neural recording modalities ranging from spikes to functional magnetic resonance imaging. Furthermore, we explore how deep learning has been leveraged to predict common outputs including movement, speech and vision, with a focus on how pretrained deep networks can be incorporated as priors for complex decoding targets like acoustic speech or images. Deep learning has been shown to be a useful tool for improving the accuracy and flexibility of neural decoding across a wide range of tasks, and we point out areas for future scientific development.
Affiliation(s)
- Jesse A Livezey: Neural Systems and Data Science Laboratory, Lawrence Berkeley National Laboratory; PhD in Physics, University of California, Berkeley
- Joshua I Glaser: Center for Theoretical Neuroscience and Department of Statistics, Columbia University; PhD in Neuroscience, Northwestern University
27
Wilson GH, Stavisky SD, Willett FR, Avansino DT, Kelemen JN, Hochberg LR, Henderson JM, Druckmann S, Shenoy KV. Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus. J Neural Eng 2020; 17:066007. PMID: 33236720; PMCID: PMC8293867; DOI: 10.1088/1741-2552/abbfef.
Abstract
OBJECTIVE To evaluate the potential of intracortical electrode array signals for brain-computer interfaces (BCIs) to restore lost speech, we measured the performance of decoders trained to discriminate a comprehensive basis set of 39 English phonemes and to synthesize speech sounds via a neural pattern matching method. We decoded neural correlates of spoken-out-loud words in the 'hand knob' area of precentral gyrus, a step toward the eventual goal of decoding attempted speech from ventral speech areas in patients who are unable to speak. APPROACH Neural and audio data were recorded while two BrainGate2 pilot clinical trial participants, each with two chronically-implanted 96-electrode arrays, spoke 420 different words that broadly sampled English phonemes. Phoneme onsets were identified from audio recordings, and their identities were then classified from neural features consisting of each electrode's binned action potential counts or high-frequency local field potential power. Speech synthesis was performed using the 'Brain-to-Speech' pattern matching method. We also examined two potential confounds specific to decoding overt speech: acoustic contamination of neural signals and systematic differences in labeling different phonemes' onset times. MAIN RESULTS A linear decoder achieved up to 29.3% classification accuracy (chance = 6%) across 39 phonemes, while an RNN classifier achieved 33.9% accuracy. Parameter sweeps indicated that performance did not saturate when adding more electrodes or more training data, and that accuracy improved when utilizing time-varying structure in the data. Microphonic contamination and phoneme onset differences modestly increased decoding accuracy, but could be mitigated by acoustic artifact subtraction and using a neural speech onset marker, respectively. Speech synthesis achieved r = 0.523 correlation between true and reconstructed audio. SIGNIFICANCE The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.
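As a flavor of the offline classification analysis, the sketch below bins spike counts around each phoneme onset and trains a linear classifier. All shapes, bin widths, and the Poisson toy data are hypothetical; the study also used high-frequency LFP power features and an RNN classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bin_spikes(spike_times, t0, t1, bin_ms=20):
    """Bin one electrode's spike times (in seconds) into counts."""
    edges = np.arange(t0, t1 + 1e-9, bin_ms / 1000.0)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts

rng = np.random.default_rng(1)
spikes = np.sort(rng.uniform(0.0, 0.5, size=30))  # one trial, one electrode
print(bin_spikes(spikes, 0.0, 0.5).shape)         # (25,) bins of 20 ms

# Per-trial feature vectors = binned counts concatenated across
# electrodes; labels = phoneme identities (39 classes here).
X = rng.poisson(2.0, size=(400, 192 * 25)).astype(float)
y = rng.integers(0, 39, size=400)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```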
Affiliation(s)
- Guy H Wilson: Neurosciences Graduate Program, Stanford University, Stanford, CA, United States of America
- Sergey D Stavisky: Department of Neurosurgery, Wu Tsai Neurosciences Institute and Bio-X Institute, and Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America
- Francis R Willett: Department of Neurosurgery and Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America; Howard Hughes Medical Institute at Stanford University, Stanford, CA, United States of America
- Donald T Avansino: Department of Neurosurgery, Stanford University, Stanford, CA, United States of America
- Jessica N Kelemen: Department of Neurology, Harvard Medical School, Boston, MA, United States of America
- Leigh R Hochberg: Department of Neurology, Harvard Medical School, Boston, MA, United States of America; Center for Neurotechnology and Neurorecovery, Dept. of Neurology, Massachusetts General Hospital, Boston, MA, United States of America; VA RR&D Center for Neurorestoration and Neurotechnology, Rehabilitation R&D Service, Providence VA Medical Center, Providence, RI, United States of America; Carney Institute for Brain Science and School of Engineering, Brown University, Providence, RI, United States of America
- Jaimie M Henderson: Department of Neurosurgery and Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America
- Shaul Druckmann: Wu Tsai Neurosciences Institute and Bio-X Institute and Department of Neurobiology, Stanford University, Stanford, CA, United States of America
- Krishna V Shenoy: Wu Tsai Neurosciences Institute and Bio-X Institute; Departments of Electrical Engineering, Neurobiology, and Bioengineering; and Howard Hughes Medical Institute at Stanford University, Stanford, CA, United States of America
28
Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Nat Neurosci 2020; 24:140-149. PMID: 33169030; DOI: 10.1038/s41593-020-00733-0.
Abstract
Neural activity exhibits complex dynamics related to various brain functions, internal states, and behaviors. Understanding how neural dynamics explain specific measured behaviors requires dissociating behaviorally relevant and irrelevant dynamics, which is not achieved with current neural dynamic models, as they are learned without considering behavior. We develop preferential subspace identification (PSID), an algorithm that models neural activity while dissociating and prioritizing its behaviorally relevant dynamics. Modeling data in two monkeys performing three-dimensional reach and grasp tasks, PSID revealed that the behaviorally relevant dynamics are significantly lower-dimensional than otherwise implied. Moreover, PSID discovered distinct rotational dynamics that were more predictive of behavior. Furthermore, PSID more accurately learned behaviorally relevant dynamics for each joint and recording channel. Finally, modeling data in two monkeys performing saccades demonstrated the generalization of PSID across behaviors, brain regions, and neural signal types. PSID provides a general new tool to reveal behaviorally relevant neural dynamics that can otherwise go unnoticed.
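PSID itself is a subspace identification method; as rough intuition for "prioritizing behaviorally relevant dynamics", the sketch below uses reduced-rank regression to find the low-dimensional neural directions most predictive of behavior. This is a simplified stand-in, not the PSID algorithm, and it ignores temporal dynamics entirely:

```python
import numpy as np

def behaviorally_relevant_subspace(Y_neural, Z_behavior, rank=2):
    """Reduced-rank regression: a projection of neural activity that
    best predicts behavior. Y_neural: (T, n); Z_behavior: (T, m).
    Returns an (n, rank) matrix of neural-space directions.
    """
    B, *_ = np.linalg.lstsq(Y_neural, Z_behavior, rcond=None)  # OLS map
    Zhat = Y_neural @ B
    _, _, Vt = np.linalg.svd(Zhat, full_matrices=False)
    return B @ Vt[:rank].T  # keep only the top-rank predictive directions

# Hypothetical demo: behavior driven by 3 of 30 neural dimensions
rng = np.random.default_rng(4)
Y = rng.standard_normal((1000, 30))
Z = Y[:, :3] @ rng.standard_normal((3, 2))
print(behaviorally_relevant_subspace(Y, Z, rank=2).shape)  # (30, 2)
```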
29
Herff C, Krusienski DJ, Kubben P. The Potential of Stereotactic-EEG for Brain-Computer Interfaces: Current Progress and Future Directions. Front Neurosci 2020; 14:123. PMID: 32174810; PMCID: PMC7056827; DOI: 10.3389/fnins.2020.00123.
Abstract
Stereotactic electroencephalography (sEEG) utilizes localized, penetrating depth electrodes to measure electrophysiological brain activity. It is most commonly used to identify epileptogenic zones in cases of refractory epilepsy. The implanted electrodes generally provide a sparse sampling of a unique set of brain regions, including deeper brain structures such as the hippocampus, amygdala, and insula, that cannot be captured by superficial measurement modalities such as electrocorticography (ECoG). Despite the overlapping clinical application and recent progress in decoding ECoG for brain-computer interfaces (BCIs), sEEG has thus far received comparatively little attention for BCI decoding. Additionally, the success of the related deep brain stimulation (DBS) implants bodes well for the potential of chronic sEEG applications. This article provides an overview of sEEG technology, BCI-related research, and prospective future directions of sEEG for long-term BCI applications.
Affiliation(s)
- Christian Herff: Department of Neurosurgery, School of Mental Health and Neurosciences, Maastricht University, Maastricht, Netherlands
- Dean J Krusienski: ASPEN Lab, Biomedical Engineering Department, Virginia Commonwealth University, Richmond, VA, United States
- Pieter Kubben: Department of Neurosurgery, Maastricht University Medical Center, Maastricht, Netherlands
30
Electrocorticogram (ECoG) Is Highly Informative in Primate Visual Cortex. J Neurosci 2020; 40:2430-2444. PMID: 32066581; DOI: 10.1523/jneurosci.1368-19.2020.
Abstract
Neural signals recorded at different scales contain information about the environment and behavior and have been used to control brain-machine interfaces with varying degrees of success. However, a direct comparison of their efficacy has not been possible due to different recording setups, tasks, species, etc. To address this, we implanted customized arrays containing both microelectrodes and electrocorticogram (ECoG) electrodes in the primary visual cortex of two female macaque monkeys, and also recorded electroencephalogram (EEG), while they viewed a variety of naturalistic images and parametric gratings. Surprisingly, ECoG had higher information and decodability than all other signals. Combining a few ECoG electrodes allowed more accurate decoding than combining a much larger number of microelectrodes. Control analyses showed that the higher decoding accuracy of ECoG compared with local field potential was not because of differences in the low-level visual features captured by them but because of the larger spatial summation of the ECoG. Information was high in the 30-80 Hz range and at lower frequencies. Information in different frequencies and scales was nonredundant. These results have strong implications for brain-machine interface applications and for the study of population representation of visual stimuli. SIGNIFICANCE STATEMENT Electrophysiological signals captured across scales by different recording electrodes are regularly used for brain-machine interfaces, but the information content varies due to electrode size and location. A systematic comparison of their efficiency for brain-machine interfaces is important but technically challenging. Here, we recorded simultaneous signals across four scales: spikes, local field potential, electrocorticogram (ECoG), and EEG, and compared their information and decoding accuracy for a large variety of naturalistic stimuli. We found that ECoG was highly informative and outperformed the other signals in information content and decoding accuracy.
31
Schultz T, Angrick M, Diener L, Kuster D, Meier M, Krusienski DJ, Herff C, Brumberg JS. Towards Restoration of Articulatory Movements: Functional Electrical Stimulation of Orofacial Muscles. Annu Int Conf IEEE Eng Med Biol Soc 2019; 2019:3111-3114. PMID: 31946546; DOI: 10.1109/embc.2019.8857670.
Abstract
Millions of individuals suffer from impairments that significantly disrupt or completely eliminate their ability to speak. An ideal intervention would restore one's natural ability to physically produce speech. Recent progress has been made in decoding speech-related brain activity to generate synthesized speech. Our vision is to extend these recent advances toward the goal of restoring physical speech production, using decoded speech-related brain activity to modulate the electrical stimulation of the orofacial musculature involved in speech. In this pilot study we take a step toward this vision by investigating the feasibility of stimulating orofacial muscles during vocalization in order to alter acoustic production. The results of our study provide the necessary foundation for eventual orofacial stimulation controlled directly by decoded speech-related brain activity.
32
Stavisky SD, Willett FR, Wilson GH, Murphy BA, Rezaii P, Avansino DT, Memberg WD, Miller JP, Kirsch RF, Hochberg LR, Ajiboye AB, Druckmann S, Shenoy KV, Henderson JM. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 2019; 8:e46015. PMID: 31820736; PMCID: PMC6954053; DOI: 10.7554/eLife.46015.
Abstract
Speaking is a sensorimotor behavior whose neural basis is difficult to study with single neuron resolution due to the scarcity of human intracortical measurements. We used electrode arrays to record from the motor cortex 'hand knob' in two people with tetraplegia, an area not previously implicated in speech. Neurons modulated during speaking and during non-speaking movements of the tongue, lips, and jaw. This challenges whether the conventional model of a 'motor homunculus' division by major body regions extends to the single-neuron scale. Spoken words and syllables could be decoded from single trials, demonstrating the potential of intracortical recordings for brain-computer interfaces to restore speech. Two neural population dynamics features previously reported for arm movements were also present during speaking: a component that was mostly invariant across initiating different words, followed by rotatory dynamics during speaking. This suggests that common neural dynamical motifs may underlie movement of arm and speech articulators.
Affiliation(s)
- Sergey D Stavisky: Department of Neurosurgery and Department of Electrical Engineering, Stanford University, Stanford, United States
- Francis R Willett: Department of Neurosurgery and Department of Electrical Engineering, Stanford University, Stanford, United States
- Guy H Wilson: Neurosciences Program, Stanford University, Stanford, United States
- Brian A Murphy: Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Paymon Rezaii: Department of Neurosurgery, Stanford University, Stanford, United States
- William D Memberg: Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Jonathan P Miller: FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States; Department of Neurosurgery, University Hospitals Cleveland Medical Center, Cleveland, United States
- Robert F Kirsch: Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Leigh R Hochberg: VA RR&D Center for Neurorestoration and Neurotechnology, Rehabilitation R&D Service, Providence VA Medical Center, Providence, United States; Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, United States; School of Engineering and Robert J. & Nandy D. Carney Institute for Brain Science, Brown University, Providence, United States
- A Bolu Ajiboye: Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Shaul Druckmann: Department of Neurobiology, Stanford University, Stanford, United States
- Krishna V Shenoy: Departments of Electrical Engineering, Neurobiology, and Bioengineering; Howard Hughes Medical Institute; Wu Tsai Neurosciences Institute; and Bio-X Program, Stanford University, Stanford, United States
- Jaimie M Henderson: Department of Neurosurgery, Wu Tsai Neurosciences Institute, and Bio-X Program, Stanford University, Stanford, United States
33
Huggins JE, Guger C, Aarnoutse E, Allison B, Anderson CW, Bedrick S, Besio W, Chavarriaga R, Collinger JL, Do AH, Herff C, Hohmann M, Kinsella M, Lee K, Lotte F, Müller-Putz G, Nijholt A, Pels E, Peters B, Putze F, Rupp R, Schalk G, Scott S, Tangermann M, Tubig P, Zander T. Workshops of the Seventh International Brain-Computer Interface Meeting: Not Getting Lost in Translation. Brain Comput Interfaces 2019; 6:71-101. PMID: 33033729; PMCID: PMC7539697; DOI: 10.1080/2326263X.2019.1697163.
Abstract
The Seventh International Brain-Computer Interface (BCI) Meeting was held May 21-25, 2018, at the Asilomar Conference Grounds, Pacific Grove, California, United States. The interactive nature of this conference was embodied by 25 workshops covering topics in BCI (also called brain-machine interface) research. Workshops covered foundational topics such as hardware development and signal analysis algorithms, new and imaginative topics such as BCIs for virtual reality and multi-brain BCIs, and translational topics such as clinical applications and the ethical assumptions of BCI development. BCI research is expanding in the diversity of its applications and of the populations for whom those applications are being developed. BCI applications are moving toward clinical readiness as researchers grapple with the practical considerations needed to make BCI translational efforts successful. This paper summarizes each workshop, providing an overview of the topic of discussion, references for additional information, and future issues for research and development that emerged from the interactions and discussions at the workshop.
Affiliation(s)
- Jane E Huggins: Department of Physical Medicine and Rehabilitation, Department of Biomedical Engineering, and Neuroscience Graduate Program, University of Michigan, 325 East Eisenhower, Room 3017, Ann Arbor, Michigan 48108-5744, United States
- Christoph Guger: g.tec medical engineering GmbH/Guger Technologies OG, Sierningstrasse 14, 4521 Schiedlberg, Austria
- Erik Aarnoutse: UMC Utrecht Brain Center, Department of Neurology & Neurosurgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
- Brendan Allison: Dept. of Cognitive Science, Mail Code 0515, University of California at San Diego, La Jolla, United States
- Charles W Anderson: Department of Computer Science and Molecular, Cellular and Integrative Neuroscience Program, Colorado State University, Fort Collins, CO 80523
- Steven Bedrick: Center for Spoken Language Understanding, Oregon Health & Science University, Portland, OR 97239
- Walter Besio: Department of Electrical, Computer, & Biomedical Engineering and Interdisciplinary Neuroscience Program, University of Rhode Island, Kingston, Rhode Island, USA; CREmedical Corp., Kingston, Rhode Island, USA
- Ricardo Chavarriaga: Defitech Chair in Brain-Machine Interface (CNBI), Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
- Jennifer L Collinger: Department of Physical Medicine and Rehabilitation, University of Pittsburgh; VA Pittsburgh Healthcare System, Department of Veterans Affairs, 3520 5th Ave, Pittsburgh, PA 15213
- An H Do: UC Irvine Brain Computer Interface Lab, Department of Neurology, University of California, Irvine
- Christian Herff: School of Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Matthias Hohmann: Max Planck Institute for Intelligent Systems, Department for Empirical Inference, Max-Planck-Ring 4, 72074 Tübingen, Germany
- Michelle Kinsella: Oregon Health & Science University, Institute on Development & Disability, 707 SW Gaines St, #1290, Portland, OR 97239
- Kyuhwa Lee: Swiss Federal Institute of Technology in Lausanne (EPFL)
- Fabien Lotte: Inria Bordeaux Sud-Ouest, LaBRI (Univ. Bordeaux/CNRS/Bordeaux INP), 200 avenue de la vieille tour, 33405 Talence Cedex, France
- Anton Nijholt: Faculty EEMCS, University of Twente, Enschede, The Netherlands
- Elmar Pels: UMC Utrecht Brain Center, Department of Neurology & Neurosurgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
- Betts Peters: Oregon Health & Science University, Institute on Development & Disability, 707 SW Gaines St, #1290, Portland, OR 97239
- Felix Putze: Cognitive Systems Lab, University of Bremen, Enrique-Schmidt-Straße 5 (Cartesium), 28359 Bremen, Germany
- Rüdiger Rupp: Spinal Cord Injury Center, Heidelberg University Hospital
- Gerwin Schalk: National Center for Adaptive Neurotechnologies, Wadsworth Center, NYS Dept. of Health; Dept. of Neurology, Albany Medical College; Dept. of Biomed. Sci., State Univ. of New York at Albany, Center for Medical Sciences 2003, 150 New Scotland Avenue, Albany, New York 12208
- Stephanie Scott: Department of Media Communications, Colorado State University, Fort Collins, CO 80523
- Michael Tangermann: Brain State Decoding Lab, Cluster of Excellence BrainLinks-BrainTools, Computer Science Dept., University of Freiburg, Germany; Autonomous Intelligent Systems Lab, Computer Science Dept., University of Freiburg, Germany
- Paul Tubig: Department of Philosophy, Center for Neurotechnology, University of Washington, Savery Hall, Room 361, Seattle, WA 98195
- Thorsten Zander: Team PhyPA, Biological Psychology and Neuroergonomics, Technische Universität Berlin, Berlin, Germany; Zander Laboratories B.V., Amsterdam, The Netherlands
34
Dash D, Ferrari P, Malik S, Montillo A, Maldjian JA, Wang J. Determining the Optimal Number of MEG Trials: A Machine Learning and Speech Decoding Perspective. Brain Informatics (BI 2018), Lecture Notes in Computer Science 2019; 11309:163-172. PMID: 31768504; PMCID: PMC6876632; DOI: 10.1007/978-3-030-05587-5_16.
Abstract
Advancing knowledge of neural speech mechanisms is critical for developing next-generation, faster brain-computer interfaces to assist in speech communication for patients with severe neurological conditions (e.g., locked-in syndrome). Among current neuroimaging techniques, magnetoencephalography (MEG) provides a direct representation of the large-scale neural dynamics of underlying cognitive processes owing to its optimal spatiotemporal resolution. However, MEG-measured neural signals are smaller in magnitude than the background noise, so MEG usually suffers from a low signal-to-noise ratio (SNR) at the single-trial level. To overcome this limitation, it is common to record many trials of the same event or task and use the time-locked average signal for analysis, which can be very time consuming. In this study, we investigated the effect of the number of MEG recording trials required for speech decoding using a machine learning algorithm. We used a wavelet filter to generate denoised neural features to train an artificial neural network (ANN) for speech decoding. We found that wavelet-based denoising increased the SNR of the neural signal prior to analysis and enabled accurate speech decoding with as few as 40 single trials. This study may open up the possibility of limiting the number of MEG trials in other task-evoked studies as well.
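The denoising step described here is a standard wavelet-shrinkage operation. A sketch using PyWavelets, where the wavelet family, decomposition level, and threshold rule are illustrative choices rather than the paper's exact settings:

```python
import numpy as np
import pywt

def wavelet_denoise(sig, wavelet="db4", level=4):
    """Soft-threshold wavelet denoising of one MEG channel."""
    coeffs = pywt.wavedec(sig, wavelet, level=level)
    # universal threshold, with the noise scale estimated from the
    # finest detail coefficients (median absolute deviation)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(sig)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(sig)]

# Hypothetical single-trial signal: a slow component buried in noise
t = np.linspace(0, 1, 1000)
noisy = np.sin(2 * np.pi * 5 * t) + 0.8 * np.random.randn(t.size)
clean = wavelet_denoise(noisy)
print(clean.shape)  # (1000,)
```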
Affiliation(s)
- Debadatta Dash: Department of Bioengineering, University of Texas at Dallas, Richardson, USA
- Paul Ferrari: Department of Psychology, University of Texas at Austin, Austin, USA; MEG Laboratory, Dell Children's Medical Center, Austin, USA
- Saleem Malik: MEG Lab, Cook Children's Hospital, Fort Worth, TX, USA
- Albert Montillo: Department of Radiology and Department of Bioinformatics, UT Southwestern Medical Center, Dallas, USA
- Joseph A Maldjian: Department of Radiology, UT Southwestern Medical Center, Dallas, USA
- Jun Wang: Department of Bioengineering and Callier Center for Communication Disorders, University of Texas at Dallas, Richardson, USA
35
Herff C, Diener L, Angrick M, Mugler E, Tate MC, Goldrick MA, Krusienski DJ, Slutzky MW, Schultz T. Generating Natural, Intelligible Speech From Brain Activity in Motor, Premotor, and Inferior Frontal Cortices. Front Neurosci 2019; 13:1267. PMID: 31824257; PMCID: PMC6882773; DOI: 10.3389/fnins.2019.01267.
Abstract
Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that can preserve conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, we chose subsequent units of speech based on the measured ECoG activity to generate audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.
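Unit selection reduces, in its simplest form, to a nearest-neighbor search over paired (neural feature, audio unit) training examples. The toy sketch below omits the transition (concatenation) cost that real unit-selection synthesis uses to smooth joins between units; all shapes are hypothetical:

```python
import numpy as np

def brain_to_speech_toy(ecog_feats, train_neural, train_audio_units):
    """For each incoming ECoG feature frame, emit the audio unit whose
    training frame has the most similar neural features.

    ecog_feats        : (T, d) neural features to synthesize from.
    train_neural      : (N, d) neural features with paired audio.
    train_audio_units : (N, L) audio samples per unit.
    """
    out = []
    for frame in ecog_feats:
        idx = np.argmin(np.linalg.norm(train_neural - frame, axis=1))
        out.append(train_audio_units[idx])
    return np.concatenate(out)

# Hypothetical demo: 10 frames decoded against 100 training units
rng = np.random.default_rng(5)
audio = brain_to_speech_toy(rng.standard_normal((10, 8)),
                            rng.standard_normal((100, 8)),
                            rng.standard_normal((100, 160)))
print(audio.shape)  # (1600,) = 10 units of 160 samples each
```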
Affiliation(s)
- Christian Herff: School of Mental Health & Neuroscience, Maastricht University, Maastricht, Netherlands; Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Lorenz Diener: Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Miguel Angrick: Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Emily Mugler: Department of Neurology, Northwestern University, Chicago, IL, United States
- Matthew C. Tate: Department of Neurosurgery, Northwestern University, Chicago, IL, United States
- Matthew A. Goldrick: Department of Linguistics, Northwestern University, Chicago, IL, United States
- Dean J. Krusienski: Biomedical Engineering Department, Virginia Commonwealth University, Richmond, VA, United States
- Marc W. Slutzky: Departments of Neurology, Physiology, and Physical Medicine & Rehabilitation, Northwestern University, Chicago, IL, United States
- Tanja Schultz: Cognitive Systems Lab, University of Bremen, Bremen, Germany
36
Stavisky SD, Rezaii P, Willett FR, Hochberg LR, Shenoy KV, Henderson JM. Decoding Speech from Intracortical Multielectrode Arrays in Dorsal "Arm/Hand Areas" of Human Motor Cortex. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:93-97. PMID: 30440349; DOI: 10.1109/embc.2018.8512199.
Abstract
Neural prostheses are being developed to restore speech to people with neurological injury or disease. A key design consideration is where and how to access neural correlates of intended speech. Most prior work has examined cortical field potentials at a coarse resolution using electroencephalography (EEG) or at medium resolution using electrocorticography (ECoG). The few studies of speech with single-neuron resolution recorded from ventral areas known to be part of the speech network. Here, we recorded from two 96-electrode arrays chronically implanted into the 'hand knob' area of motor cortex while a person with tetraplegia spoke. Despite being located in an area previously demonstrated to modulate during attempted arm movements, many electrodes' neuronal firing rates responded to speech production. In offline analyses, we could classify which of 9 phonemes (plus silence) was spoken with 81% single-trial accuracy using a combination of spike rate and local field potential (LFP) power. This suggests that high-fidelity speech prostheses may be possible using large-scale intracortical recordings in motor cortical areas involved in controlling the speech articulators.
37
Tam WK, Wu T, Zhao Q, Keefer E, Yang Z. Human motor decoding from neural signals: a review. BMC Biomed Eng 2019; 1:22. PMID: 32903354; PMCID: PMC7422484; DOI: 10.1186/s42490-019-0022-z.
Abstract
Many people suffer from movement disability due to amputation or neurological disease. Fortunately, with modern neurotechnology it is now possible to intercept motor control signals at various points along the neural transduction pathway and use them to drive external devices for communication or control. Here we review the latest developments in human motor decoding. We examine the various strategies for decoding motor intention from humans and their respective advantages and challenges. Neural control signals can be intercepted at various points in the neural signal transduction pathway, including the brain (electroencephalography, electrocorticography, intracortical recordings), the nerves (peripheral nerve recordings), and the muscles (electromyography). We systematically discuss the sites of signal acquisition, the available neural features, signal processing techniques, and decoding algorithms at each of these potential interception points. Examples of applications and the current state-of-the-art performance are also reviewed. Although great strides have been made in human motor decoding, we are still far from achieving naturalistic and dexterous control like that of our native limbs. Concerted efforts from materials scientists, electrical engineers, and healthcare professionals are needed to further advance the field and make the technology widely available for clinical use.
Affiliation(s)
- Wing-kin Tam: Department of Biomedical Engineering, University of Minnesota Twin Cities, 7-105 Hasselmo Hall, 312 Church St. SE, Minnesota 55455, USA
- Tong Wu: Department of Biomedical Engineering, University of Minnesota Twin Cities, 7-105 Hasselmo Hall, 312 Church St. SE, Minnesota 55455, USA
- Qi Zhao: Department of Computer Science and Engineering, University of Minnesota Twin Cities, 4-192 Keller Hall, 200 Union Street SE, Minnesota 55455, USA
- Edward Keefer: Nerves Incorporated, P.O. Box 141295, Dallas, TX, USA
- Zhi Yang: Department of Biomedical Engineering, University of Minnesota Twin Cities, 7-105 Hasselmo Hall, 312 Church St. SE, Minnesota 55455, USA
38
Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng 2019; 16:036019. PMID: 30831567; PMCID: PMC6822609; DOI: 10.1088/1741-2552/ab0c59.
Abstract
OBJECTIVE Direct synthesis of speech from neural signals could provide a fast and natural way of communication for people with neurological diseases. Invasively measured brain activity (electrocorticography; ECoG) supplies the necessary temporal and spatial resolution to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the complex dynamics are still not fully understood, and it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech. APPROACH Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology that is well suited to the small amount of data available from each participant. MAIN RESULTS In a study with six participants, we achieved correlations of up to r = 0.69 between the reconstructed and original logMel spectrograms. We transferred our prediction back into an audible waveform by applying a WaveNet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output. SIGNIFICANCE To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
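The regression target here is a logMel spectrogram computed from the recorded audio. A sketch of that target-extraction step using librosa, with illustrative parameter values (not necessarily those used in the paper):

```python
import numpy as np
import librosa

def logmel(audio, sr=16000, n_mels=40):
    """logMel spectrogram: the intermediate acoustic representation a
    network can be trained to predict from neural data."""
    m = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=1024,
                                       hop_length=256, n_mels=n_mels)
    return librosa.power_to_db(m)  # shape (n_mels, n_frames)

# Hypothetical demo on one second of noise
sr = 16000
audio = np.random.randn(sr).astype(np.float32)
print(logmel(audio, sr).shape)
```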
Affiliation(s)
- Miguel Angrick: Cognitive Systems Lab, University of Bremen, Bremen, Germany
39
Angrick M, Herff C, Johnson G, Shih J, Krusienski D, Schultz T. Interpretation of convolutional neural networks for speech spectrogram regression from intracranial recordings. Neurocomputing 2019. DOI: 10.1016/j.neucom.2018.10.080.
40
Towards reconstructing intelligible speech from the human auditory cortex. Sci Rep 2019; 9:874. PMID: 30696881; PMCID: PMC6351601; DOI: 10.1038/s41598-018-37359-z.
Abstract
Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic to establish a direct communication with the brain and has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications. To advance the state-of-the-art in speech neuroprosthesis, we combined the recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and the acoustic representation that is used as the target of reconstruction, including auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which not only can restore communications for paralyzed patients but also have the potential to transform human-computer interaction technologies.
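The baseline that this work improves upon is linear regression from neural features to the auditory spectrogram. A synthetic-data sketch of such a linear reconstruction baseline (all dimensions, the features, and the ridge penalty are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
T, n_elec, n_bins = 2000, 64, 32
X = rng.standard_normal((T, n_elec))   # neural features per audio frame
S = rng.standard_normal((T, n_bins))   # target spectrogram frames

model = Ridge(alpha=1.0).fit(X[:1500], S[:1500])  # train on early frames
S_hat = model.predict(X[1500:])                   # reconstruct the rest
print(S_hat.shape)  # (500, 32)
```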
41
Habets JGV, Heijmans M, Kuijf ML, Janssen MLF, Temel Y, Kubben PL. An update on adaptive deep brain stimulation in Parkinson's disease. Mov Disord 2018; 33:1834-1843. PMID: 30357911; PMCID: PMC6587997; DOI: 10.1002/mds.115.
Abstract
Advancing conventional open-loop DBS as a therapy for PD is crucial for overcoming important issues such as the delicate balance between beneficial and adverse effects and the limited battery longevity currently associated with treatment. Closed-loop or adaptive DBS aims to overcome these limitations by real-time adjustment of stimulation parameters based on continuous feedback input signals that are representative of the patient's clinical state. The focus of this update is to discuss the most recent developments regarding potential input signals and possible stimulation parameter modulation for adaptive DBS in PD. Potential input signals for adaptive DBS include basal ganglia local field potentials, cortical recordings (electrocorticography), wearable sensors, and eHealth and mHealth devices. Furthermore, adaptive DBS can be applied with different approaches of stimulation parameter modulation, the feasibility of which can be adapted depending on specific PD phenotypes. Implementation of technological developments like machine learning shows potential in the design of such approaches; however, energy consumption deserves further attention. Furthermore, we discuss future considerations regarding the clinical implementation of adaptive DBS in PD.
Affiliation(s)
- Jeroen G V Habets: Department of Neurosurgery, Maastricht University Medical Center, Maastricht, The Netherlands; School of Mental Health and Neuroscience, Maastricht University Medical Center, Maastricht, The Netherlands
- Margot Heijmans: Department of Neurosurgery, Maastricht University Medical Center, Maastricht, The Netherlands; School of Mental Health and Neuroscience, Maastricht University Medical Center, Maastricht, The Netherlands
- Mark L Kuijf: Department of Neurology, Maastricht University Medical Center, Maastricht, The Netherlands
- Marcus L F Janssen: Department of Neurology, Maastricht University Medical Center, Maastricht, The Netherlands; Department of Clinical Neurophysiology, Maastricht University Medical Center, Maastricht, The Netherlands; School of Mental Health and Neuroscience, Maastricht University Medical Center, Maastricht, The Netherlands
- Yasin Temel: Department of Neurosurgery, Maastricht University Medical Center, Maastricht, The Netherlands; School of Mental Health and Neuroscience, Maastricht University Medical Center, Maastricht, The Netherlands
- Pieter L Kubben: Department of Neurosurgery, Maastricht University Medical Center, Maastricht, The Netherlands; School of Mental Health and Neuroscience, Maastricht University Medical Center, Maastricht, The Netherlands
42
Liu Y, Ayaz H. Speech Recognition via fNIRS Based Brain Signals. Front Neurosci 2018; 12:695. PMID: 30356771; PMCID: PMC6189799; DOI: 10.3389/fnins.2018.00695.
Abstract
In this paper, we present the first evidence that perceived speech can be identified from listeners' brain signals measured via functional near-infrared spectroscopy (fNIRS), a non-invasive, portable, and wearable neuroimaging technique suitable for ecologically valid settings. In this study, participants listened to audio clips containing English stories while the prefrontal and parietal cortices were monitored with fNIRS. Machine learning was applied to train predictive models on fNIRS data from a subject pool in order to predict which part of a story a new subject, not in the pool, was listening to, based on the brain's hemodynamic response as measured by fNIRS. fNIRS signals can vary considerably from subject to subject due to differences in head size, head shape, and the spatial locations of brain functional regions. To overcome this difficulty, generalized canonical correlation analysis (GCCA) was adopted to extract latent variables shared among the listeners before applying principal component analysis (PCA) for dimension reduction and logistic regression for classification. An average accuracy of 74.7% was achieved for differentiating between two 50-second story segments, and an average accuracy of 43.6% was achieved for differentiating four 25-second story segments. These results suggest the potential of an fNIRS-based approach for building a speech-decoding brain-computer interface as a new type of neural prosthetic system.
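The back end of this pipeline (dimension reduction plus a linear classifier) is easy to sketch; the GCCA step that aligns subjects is the substantive part and is omitted here. A hedged stand-in on synthetic features:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical per-segment fNIRS feature vectors; in the paper these
# would be GCCA-aligned latent variables shared across listeners.
rng = np.random.default_rng(3)
X = rng.standard_normal((120, 500))
y = rng.integers(0, 4, size=120)  # which of four 25 s story segments

clf = make_pipeline(PCA(n_components=20),
                    LogisticRegression(max_iter=500))
print(cross_val_score(clf, X, y, cv=5).mean())  # chance is ~0.25 here
```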
Affiliation(s)
- Yichuan Liu
- School of Biomedical Engineering, Drexel University, Science and Health Systems, Philadelphia, PA, United States.,Cognitive Neuroengineering and Quantitative Experimental Research (CONQUER) Collaborative, Drexel University, Philadelphia, PA, United States
| | - Hasan Ayaz
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, United States; Cognitive Neuroengineering and Quantitative Experimental Research (CONQUER) Collaborative, Drexel University, Philadelphia, PA, United States; Department of Family and Community Health, University of Pennsylvania, Philadelphia, PA, United States; The Division of General Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA, United States
| |
Collapse
|
43
|
Nguyen CH, Karavas GK, Artemiadis P. Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features. J Neural Eng 2017; 15:016002. [DOI: 10.1088/1741-2552/aa8235] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
|
44
|
45
|
Iljina O, Derix J, Schirrmeister RT, Schulze-Bonhage A, Auer P, Aertsen A, Ball T. Neurolinguistic and machine-learning perspectives on direct speech BCIs for restoration of naturalistic communication. BRAIN-COMPUTER INTERFACES 2017. [DOI: 10.1080/2326263x.2017.1330611] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Affiliation(s)
- Olga Iljina
- GRK 1624 ‘Frequency effects in language’, University of Freiburg, Freiburg, Germany
- Department of German Linguistics, University of Freiburg, Freiburg, Germany
- Hermann Paul School of Linguistics, University of Freiburg, Germany
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Neurobiology and Biophysics, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | - Johanna Derix
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Robin Tibor Schirrmeister
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Andreas Schulze-Bonhage
- Epilepsy Center, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
| | - Peter Auer
- GRK 1624 ‘Frequency effects in language’, University of Freiburg, Freiburg, Germany
- Department of German Linguistics, University of Freiburg, Freiburg, Germany
- Hermann Paul School of Linguistics, University of Freiburg, Germany
- Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Freiburg, Germany
| | - Ad Aertsen
- Neurobiology and Biophysics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Bernstein Center Freiburg, University of Freiburg, Germany
| | - Tonio Ball
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| |
Collapse
|
46
|
Vowel Imagery Decoding toward Silent Speech BCI Using Extreme Learning Machine with Electroencephalogram. BIOMED RESEARCH INTERNATIONAL 2016; 2016:2618265. [PMID: 28097128 PMCID: PMC5206788 DOI: 10.1155/2016/2618265] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Revised: 11/04/2016] [Accepted: 11/17/2016] [Indexed: 01/07/2023]
Abstract
The purpose of this study was to classify EEG data on imagined speech in single trials. We recorded EEG data while five subjects imagined different vowels: /a/, /e/, /i/, /o/, and /u/. We divided each single-trial dataset into thirty segments and extracted features (mean, variance, standard deviation, and skewness) from each segment. To reduce the dimension of the feature vector, we applied a feature selection algorithm based on a sparse regression model. These features were classified using a support vector machine with a radial basis function kernel, an extreme learning machine, and two variants of the extreme learning machine with different kernels. Because each single trial consisted of thirty segments, our algorithm decided the label of a trial by selecting the most frequent output among the outputs of its thirty segments. As a result, we observed that the extreme learning machine and its variants achieved better classification rates than the support vector machine with a radial basis function kernel and linear discriminant analysis. Our results therefore suggest that EEG responses to imagined speech can be successfully classified in single trials using an extreme learning machine with radial basis function and linear kernels. This work on the classification of imagined speech may contribute to the development of silent speech BCI systems.
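As a rough illustration of the segment-and-vote scheme described above, here is a minimal Python sketch: each trial is split into thirty segments, simple statistics are extracted per segment, segments are classified with a basic extreme learning machine (a random hidden layer with a ridge-regularized linear readout), and the trial label is the majority vote. All shapes, hyperparameters, and the synthetic data are assumptions for illustration; the paper's sparse-regression feature selection and kernel ELM variants are omitted.
```python
# Minimal sketch of segment-level features + basic ELM + majority voting.
# Synthetic data; dimensions and hyperparameters are illustrative assumptions.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

def segment_features(trial, n_segments=30):
    """Per-segment mean, variance, std, and skewness, per channel."""
    feats = []
    for seg in np.array_split(trial, n_segments, axis=0):
        feats.append(np.concatenate([seg.mean(0), seg.var(0),
                                     seg.std(0), skew(seg, axis=0)]))
    return np.array(feats)

class ELM:
    """Basic ELM: fixed random hidden layer, least-squares output weights."""
    def __init__(self, n_hidden=200, reg=1e-2):
        self.n_hidden, self.reg = n_hidden, reg

    def fit(self, X, y):
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        T = np.eye(y.max() + 1)[y]                      # one-hot targets
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden),
                                    H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

# Toy data: 5 vowel classes, 20 trials each, 3000 samples x 4 EEG channels.
X_train, y_seg = [], []
for label in range(5):
    for _ in range(20):
        trial = rng.standard_normal((3000, 4)) + 0.1 * label
        X_train.append(segment_features(trial))
        y_seg.append(np.full(30, label))
elm = ELM().fit(np.vstack(X_train), np.concatenate(y_seg))

# Trial-level decision: the most frequent label among its 30 segment outputs.
test_trial = rng.standard_normal((3000, 4)) + 0.1 * 3
votes = elm.predict(segment_features(test_trial))
print("predicted vowel class:", np.bincount(votes).argmax())
```
The voting step is what turns thirty noisy per-segment decisions into one trial-level label, which is the mechanism the abstract credits for its single-trial performance.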
Collapse
|