1
Rabbani Q, Shah S, Milsap G, Fifer M, Hermansky H, Crone N. Iterative alignment discovery of speech-associated neural activity. J Neural Eng 2024; 21:046056. PMID: 39194182. DOI: 10.1088/1741-2552/ad663c.
Abstract
Objective. Brain-computer interfaces (BCIs) have the potential to preserve or restore speech in patients with neurological disorders that weaken the muscles involved in speech production. However, successful training of low-latency speech synthesis and recognition models requires alignment of neural activity with intended phonetic or acoustic output with high temporal precision. This is particularly challenging in patients who cannot produce audible speech, as ground truth with which to pinpoint neural activity synchronized with speech is not available.
Approach. In this study, we present a new iterative algorithm for neural voice activity detection (nVAD) called iterative alignment discovery dynamic time warping (IAD-DTW), which integrates DTW into the loss function of a deep neural network (DNN). The algorithm is designed to discover the alignment between a patient's electrocorticographic (ECoG) neural responses and their attempts to speak during collection of data for training BCI decoders for speech synthesis and recognition.
Main results. To demonstrate the effectiveness of the algorithm, we tested its accuracy in predicting the onset and duration of acoustic signals produced by able-bodied patients with intact speech undergoing short-term diagnostic ECoG recordings for epilepsy surgery. We simulated a lack of ground truth by randomly perturbing the temporal correspondence between neural activity and an initial single estimate for all speech onsets and durations, and we examined the model's ability to overcome these perturbations and recover the ground truth. IAD-DTW showed no notable degradation (<1% absolute decrease in accuracy) in these simulations, even in the case of maximal misalignments between speech and silence.
Significance. IAD-DTW is computationally inexpensive and can be easily integrated into existing DNN-based nVAD approaches, as it pertains only to the final loss computation. This approach makes it possible to train speech BCI algorithms using ECoG data from patients who are unable to produce audible speech, including those with locked-in syndrome.
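The core idea here — scoring a network's frame-wise predictions against labels through a DTW alignment rather than frame-by-frame — can be sketched with a generic differentiable ("soft") DTW loss. This is a toy illustration of that general technique, not the authors' IAD-DTW implementation; the function name, the `gamma` smoothing parameter, and the squared-Euclidean frame distance are assumptions.

```python
import torch

def soft_dtw_loss(pred, target, gamma=1.0):
    """Soft-DTW between predictions (T1, D) and labels (T2, D).

    A smoothed min, -gamma * logsumexp(-x / gamma), replaces the hard min of
    classic DTW so the optimal-alignment cost is differentiable and can be
    used directly as a DNN training loss.
    """
    T1, T2 = pred.shape[0], target.shape[0]
    # Pairwise squared-Euclidean distances between frames.
    dist = torch.cdist(pred.unsqueeze(0), target.unsqueeze(0)).squeeze(0) ** 2
    # R[i, j] = soft-minimal cumulative cost of aligning pred[:i] to target[:j].
    R = torch.full((T1 + 1, T2 + 1), float("inf"), device=pred.device)
    R[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            prev = torch.stack([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            soft_min = -gamma * torch.logsumexp(-prev / gamma, dim=0)
            R[i, j] = dist[i - 1, j - 1] + soft_min
    return R[T1, T2]
```

In an iterative scheme of the kind the abstract describes, one would alternate between training the nVAD network under such an alignment-tolerant loss and re-estimating the speech onsets and durations implied by the discovered alignment.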
Affiliation(s)
- Qinwan Rabbani: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Samyak Shah: Department of Neurology, Johns Hopkins Medicine, Baltimore, MD 21287, USA
- Griffin Milsap: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- Matthew Fifer: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- Hynek Hermansky: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Nathan Crone: Department of Neurology, Johns Hopkins Medicine, Baltimore, MD 21287, USA
2
de Borman A, Wittevrongel B, Dauwe I, Carrette E, Meurs A, Van Roost D, Boon P, Van Hulle MM. Imagined speech event detection from electrocorticography and its transfer between speech modes and subjects. Commun Biol 2024; 7:818. PMID: 38969758; PMCID: PMC11226700. DOI: 10.1038/s42003-024-06518-6.
Abstract
Speech brain-computer interfaces aim to support communication-impaired patients by translating neural signals into speech. While impressive progress has been achieved in decoding performed, perceived, and attempted speech, imagined speech remains elusive, mainly due to the absence of behavioral output. Nevertheless, imagined speech is advantageous since it does not depend on any articulator movements that might become impaired or even lost over the course of a neurodegenerative disease. In this study, we analyzed electrocorticography data recorded from 16 participants in response to 3 speech modes: performed, perceived (listening), and imagined speech. We used a linear model to detect speech events and examined the contributions of each frequency band, from delta to high gamma, given the speech mode and electrode location. For imagined speech detection, we observed a strong contribution of gamma bands in the motor cortex, whereas lower frequencies were more prominent in the temporal lobe, particularly in the left hemisphere. Based on the similarities in frequency patterns, we were able to transfer models between speech modes and participants with similar electrode locations.
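The detection step described here — a linear model over band-limited power features — can be illustrated with a short, self-contained sketch. Everything below (synthetic signals, band edges, logistic regression as the linear detector) is an assumption for illustration, not the authors' pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from sklearn.linear_model import LogisticRegression

FS = 1000  # assumed sampling rate (Hz)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (13, 30), "gamma": (30, 70), "high_gamma": (70, 170)}

def band_power_features(ecog, fs=FS):
    """(n_samples, n_channels) -> (n_samples, n_channels * n_bands)."""
    feats = []
    for lo, hi in BANDS.values():
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        # Analytic-signal magnitude as the band-power envelope.
        feats.append(np.abs(hilbert(sosfiltfilt(sos, ecog, axis=0), axis=0)))
    return np.concatenate(feats, axis=1)

# Synthetic stand-in for recorded ECoG: 60 s, 32 channels, random event labels.
rng = np.random.default_rng(0)
ecog = rng.standard_normal((60 * FS, 32))
labels = (rng.random(60 * FS) > 0.5).astype(int)  # 1 = speech event

X = band_power_features(ecog)
clf = LogisticRegression(max_iter=1000).fit(X[::100], labels[::100])
print("training accuracy:", clf.score(X[::100], labels[::100]))
```

Because the model is linear, its weights decompose by band and electrode, which is what makes a per-band, per-location contribution analysis (and transfer between speech modes) straightforward to read off.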
Affiliation(s)
- Aurélie de Borman: Laboratory for Neuro- and Psychophysiology, KU Leuven, Leuven, Belgium
- Ine Dauwe: Department of Neurology, Ghent University Hospital, Ghent, Belgium
- Evelien Carrette: Department of Neurology, Ghent University Hospital, Ghent, Belgium
- Alfred Meurs: Department of Neurology, Ghent University Hospital, Ghent, Belgium
- Dirk Van Roost: Department of Neurosurgery, Ghent University Hospital, Ghent, Belgium
- Paul Boon: Department of Neurology, Ghent University Hospital, Ghent, Belgium
- Marc M Van Hulle: Laboratory for Neuro- and Psychophysiology, KU Leuven, Leuven, Belgium; Leuven Brain Institute (LBI), Leuven, Belgium; Leuven Institute for Artificial Intelligence (Leuven.AI), Leuven, Belgium
3
Luo S, Angrick M, Coogan C, Candrea DN, Wyse-Sookoo K, Shah S, Rabbani Q, Milsap GW, Weiss AR, Anderson WS, Tippett DC, Maragakis NJ, Clawson LL, Vansteensel MJ, Wester BA, Tenore FV, Hermansky H, Fifer MS, Ramsey NF, Crone NE. Stable Decoding from a Speech BCI Enables Control for an Individual with ALS without Recalibration for 3 Months. Adv Sci (Weinh) 2023; 10:e2304853. PMID: 37875404; PMCID: PMC10724434. DOI: 10.1002/advs.202304853.
Abstract
Brain-computer interfaces (BCIs) can be used to control assistive devices by patients with neurological disorders like amyotrophic lateral sclerosis (ALS) that limit speech and movement. For assistive control, it is desirable for BCI systems to be accurate and reliable, preferably with minimal setup time. In this study, a participant with severe dysarthria due to ALS operates computer applications with six intuitive speech commands via a chronic electrocorticographic (ECoG) implant over the ventral sensorimotor cortex. Speech commands are accurately detected and decoded (median accuracy: 90.59%) throughout a 3-month study period without model retraining or recalibration. Use of the BCI does not require exogenous timing cues, enabling the participant to issue self-paced commands at will. These results demonstrate that a chronically implanted ECoG-based speech BCI can reliably control assistive devices over long time periods with only initial model training and calibration, supporting the feasibility of unassisted home use.
Affiliation(s)
- Shiyu Luo: Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Miguel Angrick: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Christopher Coogan: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Daniel N. Candrea: Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Kimberley Wyse-Sookoo: Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Samyak Shah: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Qinwan Rabbani: Department of Electrical and Computer Engineering and Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, USA
- Griffin W. Milsap: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- Alexander R. Weiss: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- William S. Anderson: Department of Neurosurgery, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Donna C. Tippett: Departments of Neurology, Otolaryngology-Head and Neck Surgery, and Physical Medicine and Rehabilitation, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nicholas J. Maragakis: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Lora L. Clawson: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Mariska J. Vansteensel: Department of Neurology and Neurosurgery, UMC Utrecht Brain Center, Utrecht 3584, The Netherlands
- Brock A. Wester: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- Francesco V. Tenore: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- Hynek Hermansky: Department of Electrical and Computer Engineering and Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218, USA
- Matthew S. Fifer: Research and Exploratory Development Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723, USA
- Nick F. Ramsey: Department of Neurology and Neurosurgery, UMC Utrecht Brain Center, Utrecht 3584, The Netherlands
- Nathan E. Crone: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
4
Meier A, Kuzdeba S, Jackson L, Daliri A, Tourville JA, Guenther FH, Greenlee JDW. Lateralization and Time-Course of Cortical Phonological Representations during Syllable Production. eNeuro 2023; 10:ENEURO.0474-22.2023. PMID: 37739786; PMCID: PMC10561542. DOI: 10.1523/ENEURO.0474-22.2023.
Abstract
Spoken language contains information at a broad range of timescales, from phonetic distinctions on the order of milliseconds to semantic contexts which shift over seconds to minutes. It is not well understood how the brain's speech production systems combine features at these timescales into a coherent vocal output. We investigated the spatial and temporal representations in cerebral cortex of three phonological units with different durations: consonants, vowels, and syllables. Electrocorticography (ECoG) recordings were obtained from five participants while speaking single syllables. We developed a novel clustering and Kalman-filter-based trend analysis procedure to sort electrodes into temporal response profiles. A linear discriminant classifier was used to determine how strongly each electrode's response encoded phonological features. We found distinct time-courses of encoding depending on the duration of the phonological unit: consonants were represented more during speech preparation, vowels were represented evenly throughout trials, and syllables were represented more during production. Locations of strongly speech-encoding electrodes (the top 30% of electrodes) likewise depended on phonological unit duration, with consonant-encoding electrodes left-lateralized, vowel-encoding electrodes hemispherically balanced, and syllable-encoding electrodes right-lateralized. The lateralization of speech-encoding electrodes also depended on onset time, with electrodes active before or after speech production favoring the left hemisphere and those active during speech favoring the right. Single-electrode speech classification revealed cortical areas with preferential encoding of particular phonemic elements, including consonant encoding in the left precentral and postcentral gyri and syllable encoding in the right middle frontal gyrus. Our findings support neurolinguistic theories of left-hemisphere specialization for processing short-timescale linguistic units and right-hemisphere processing of longer-duration units.
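The single-electrode classification analysis lends itself to a compact sketch: score each electrode by the cross-validated accuracy of a linear discriminant classifier predicting a phonological label from that electrode's response. The synthetic data and array shapes below are assumptions, not the study's data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_trials, n_timepoints = 200, 50
labels = rng.integers(0, 4, size=n_trials)                  # e.g., 4 consonant classes
responses = rng.standard_normal((n_trials, n_timepoints))   # one electrode's responses
responses += labels[:, None] * 0.3                          # inject a weak class signal

# Cross-validated accuracy serves as the electrode's encoding strength.
score = cross_val_score(LinearDiscriminantAnalysis(), responses, labels, cv=5).mean()
print(f"electrode encoding score: {score:.2f} (chance = 0.25)")
```

Ranking all electrodes by such a score and keeping the top 30% yields the "strongly speech-encoding" subset whose lateralization the study examines.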
Affiliation(s)
- Andrew Meier: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215
- Scott Kuzdeba: Graduate Program for Neuroscience, Boston University, Boston, MA 02215
- Liam Jackson: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215
- Ayoub Daliri: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215; College of Health Solutions, Arizona State University, Tempe, AZ 85004
- Jason A Tourville: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215
- Frank H Guenther: Department of Speech, Language, and Hearing Sciences and Department of Biomedical Engineering, Boston University, Boston, MA 02215; Department of Radiology, Massachusetts General Hospital, Boston, MA; Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA
- Jeremy D W Greenlee: Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, IA 52242
5
Verwoert M, Ottenhoff MC, Goulis S, Colon AJ, Wagner L, Tousseyn S, van Dijk JP, Kubben PL, Herff C. Dataset of Speech Production in Intracranial Electroencephalography. Sci Data 2022; 9:434. PMID: 35869138; PMCID: PMC9307753. DOI: 10.1038/s41597-022-01542-9.
Abstract
Speech production is an intricate process involving a large number of muscles and cognitive processes. The neural processes underlying speech production are not completely understood. As speech is a uniquely human ability, it cannot be investigated in animal models. High-fidelity human data can only be obtained in clinical settings and are therefore not easily available to all researchers. Here, we provide a dataset of 10 participants reading out individual words while we measured intracranial EEG from a total of 1103 electrodes. The data, with their high temporal resolution and coverage of a large variety of cortical and sub-cortical brain regions, can help in understanding the speech production process better. Simultaneously, the data can be used to test speech decoding and synthesis approaches from neural data to develop speech brain-computer interfaces and speech neuroprostheses.
- Measurement(s): Brain activity
- Technology Type(s): Stereotactic electroencephalography
- Sample Characteristic (Organism): Homo sapiens
- Sample Characteristic (Environment): Epilepsy monitoring center
- Sample Characteristic (Location): The Netherlands
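As a rough illustration of how such a dataset might be used, the sketch below aligns intracranial recordings with a simultaneously recorded audio track to derive frame-wise speech labels. File names, sampling rates, and array layout are all hypothetical, not the published dataset's actual format.

```python
import numpy as np

eeg = np.load("sub-01_ieeg.npy")      # assumed shape: (n_samples, n_channels)
audio = np.load("sub-01_audio.npy")   # assumed shape: (n_audio_samples,)
EEG_FS, AUDIO_FS = 1024, 48000        # assumed sampling rates

# Window both streams into 50 ms frames for frame-wise decoding targets.
eeg_frame, audio_frame = int(0.05 * EEG_FS), int(0.05 * AUDIO_FS)
n_frames = min(eeg.shape[0] // eeg_frame, audio.shape[0] // audio_frame)
eeg_frames = eeg[: n_frames * eeg_frame].reshape(n_frames, eeg_frame, -1)

# Label frames as speech/silence with a simple audio-energy threshold.
energy = np.array([np.mean(audio[i * audio_frame:(i + 1) * audio_frame] ** 2)
                   for i in range(n_frames)])
speech_mask = energy > 10 * np.median(energy)
print(f"{speech_mask.mean():.0%} of frames labeled as speech")
```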
6
Luo S, Rabbani Q, Crone NE. Brain-Computer Interface: Applications to Speech Decoding and Synthesis to Augment Communication. Neurotherapeutics 2022; 19:263-273. PMID: 35099768; PMCID: PMC9130409. DOI: 10.1007/s13311-022-01190-2.
Abstract
Damage or degeneration of motor pathways necessary for speech and other movements, as in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient communication without affecting brain structures responsible for language or cognition. In the worst-case scenario, this can result in locked-in syndrome (LIS), a condition in which individuals cannot initiate communication and can only express themselves by answering yes/no questions with eye blinks or other rudimentary movements. Existing augmentative and alternative communication (AAC) devices that rely on eye tracking can improve the quality of life for people with this condition, but brain-computer interfaces (BCIs) are also increasingly being investigated as AAC devices, particularly when eye tracking is too slow or unreliable. Moreover, with recent and ongoing advances in machine learning and neural recording technologies, BCIs may offer the only means to go beyond cursor control and text generation on a computer, to allow real-time synthesis of speech, which would arguably offer the most efficient and expressive channel for communication. The potential for BCI speech synthesis has only recently been realized because of seminal studies of the neuroanatomical and neurophysiological underpinnings of speech production using intracranial electrocorticographic (ECoG) recordings in patients undergoing epilepsy surgery. These studies have shown that cortical areas responsible for vocalization and articulation are distributed over a large area of ventral sensorimotor cortex, and that it is possible to decode speech and reconstruct its acoustics from ECoG if these areas are recorded with sufficiently dense and comprehensive electrode arrays. In this article, we review these advances, including the latest neural decoding strategies that range from deep learning models to the direct concatenation of speech units. We also discuss state-of-the-art vocoders that are integral in constructing natural-sounding audio waveforms for speech BCIs. Finally, this review outlines some of the challenges ahead in directly synthesizing speech for patients with LIS.
Affiliation(s)
- Shiyu Luo: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qinwan Rabbani: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Nathan E Crone: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
7
CyberEye: New Eye-Tracking Interfaces for Assessment and Modulation of Cognitive Functions beyond the Brain. Sensors (Basel) 2021; 21:7605. PMID: 34833681; PMCID: PMC8617901. DOI: 10.3390/s21227605.
Abstract
The emergence of innovative neurotechnologies in global brain projects has accelerated research and clinical applications of BCIs beyond sensory and motor functions. Both invasive and noninvasive sensors are being developed to interface with cognitive functions engaged in thinking, communication, or remembering. Camera-based detection of eye movements offers a particularly attractive external sensor for computer interfaces to monitor, assess, and control these higher brain functions without acquiring signals from the brain. Features of gaze position and pupil dilation can be effectively used to track our attention in healthy mental processes, to enable interaction in disorders of consciousness, or even to predict memory performance in various brain diseases. In this perspective article, we propose the term ‘CyberEye’ to encompass emerging cognitive applications of eye-tracking interfaces for neuroscience research, clinical practice, and the biomedical industry. As CyberEye technologies continue to develop, we expect BCIs to become less dependent on brain activity, less invasive, and thus more widely applicable.
8
Angrick M, et al. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun Biol 2021; 4:1055. PMID: 34556793; PMCID: PMC8460739. DOI: 10.1038/s42003-021-02578-0.
Abstract
Speech neuroprosthetics aim to provide a natural communication channel to individuals who are unable to speak due to physical or neurological impairments. Real-time synthesis of acoustic speech directly from measured neural activity could enable natural conversations and notably improve quality of life, particularly for individuals who have severely limited means of communication. Recent advances in decoding approaches have led to high-quality reconstructions of acoustic speech from invasively measured neural activity. However, most prior research utilizes data collected during open-loop experiments of articulated speech, which might not directly translate to imagined speech processes. Here, we present an approach that synthesizes audible speech in real time for both imagined and whispered speech conditions. Using a participant implanted with stereotactic depth electrodes, we were able to reliably generate audible speech in real time. The decoding models rely predominantly on frontal activity, suggesting that speech processes have similar representations when vocalized, whispered, or imagined. While reconstructed audio is not yet intelligible, our real-time synthesis approach represents an essential step towards investigating how patients will learn to operate a closed-loop speech neuroprosthesis based on imagined speech.
Miguel Angrick et al. develop an intracranial EEG-based method to decode imagined speech from a human patient and translate it into audible speech in real time. This report presents an important proof of concept that acoustic output can be reconstructed on the basis of neural signals, and serves as a valuable step in the development of neuroprostheses to help nonverbal patients interact with their environment.
9
Dash D, Wisler A, Ferrari P, Davenport EM, Maldjian J, Wang J. MEG Sensor Selection for Neural Speech Decoding. IEEE Access 2020; 8:182320-182337. PMID: 33204579; PMCID: PMC7668411. DOI: 10.1109/ACCESS.2020.3028831.
Abstract
Direct decoding of speech from the brain is a faster alternative to current electroencephalography (EEG) speller-based brain-computer interfaces (BCIs) for providing communication assistance to locked-in patients. Magnetoencephalography (MEG) has recently shown great potential as a non-invasive neuroimaging modality for neural speech decoding, owing in part to its spatial selectivity over other high-temporal-resolution devices. Standard MEG systems have a large number of cryogenically cooled channels/sensors (200-300) encapsulated within a fixed liquid-helium dewar, precluding their use as wearable BCI devices. Fortunately, recently developed optically pumped magnetometers (OPMs) do not require cryogens and have the potential to be wearable and movable, making them more suitable for BCI applications. This design is also modular, allowing customized montages that include only the sensors necessary for a particular task. As the number of sensors bears heavily on the cost, size, and weight of MEG systems, minimizing the number of sensors is critical for designing practical MEG-based BCIs. In this study, we sought to identify an optimal set of MEG channels for decoding imagined and spoken phrases from MEG signals. Using a forward selection algorithm with a support vector machine classifier, we found that nine optimally located MEG gradiometers provided higher decoding accuracy than using all channels. Additionally, the forward selection algorithm achieved performance similar to dimensionality reduction using a stacked sparse autoencoder. Analysis of the spatial dynamics of speech decoding suggested that both left- and right-hemisphere sensors contribute to speech decoding. Sensors located near Broca's area were commonly among the higher-ranked sensors across all subjects.
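The channel-selection procedure — greedy forward selection scored by a support vector machine — is easy to sketch. The toy data, stopping rule, and linear kernel below are assumptions; the sketch follows the general algorithm the abstract names rather than the authors' exact protocol.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_channels, n_feats = 120, 20, 5
X = rng.standard_normal((n_trials, n_channels, n_feats))
y = rng.integers(0, 4, size=n_trials)   # e.g., 4 spoken/imagined phrases
X[:, 3, :] += y[:, None] * 0.5          # make two channels informative
X[:, 11, :] += y[:, None] * 0.4

def score(channels):
    """Cross-validated SVM accuracy using only the given channels."""
    feats = X[:, channels, :].reshape(n_trials, -1)
    return cross_val_score(SVC(kernel="linear"), feats, y, cv=5).mean()

selected, remaining, best = [], list(range(n_channels)), 0.0
while remaining:
    gains = {ch: score(selected + [ch]) for ch in remaining}
    ch, acc = max(gains.items(), key=lambda kv: kv[1])
    if acc <= best:          # stop once adding a channel no longer helps
        break
    selected.append(ch)
    remaining.remove(ch)
    best = acc
print("selected channels:", selected, "accuracy:", round(best, 2))
```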
Affiliation(s)
- Debadatta Dash
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
| | - Alan Wisler
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
| | - Paul Ferrari
- MEG Laboratory, Dell Children's Medical Center, Austin, TX 78723, USA
- Department of Psychology, The University of Texas at Austin, Austin, TX 78712, USA
| | | | - Joseph Maldjian
- Department of Radiology, University of Texas at Southwestern, Dallas, TX 75390, USA
| | - Jun Wang
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX 78712, USA
| |
10
Farrokhi B, Erfanian A. A state-based probabilistic method for decoding hand position during movement from ECoG signals in non-human primate. J Neural Eng 2020; 17:026042. PMID: 32224511. DOI: 10.1088/1741-2552/ab848b.
Abstract
Objective. In this study, we propose a state-based probabilistic method for decoding hand position during unilateral and bilateral movements using ECoG signals recorded from the brain of a Rhesus monkey.
Approach. A customized electrode array was implanted subdurally over the right hemisphere, covering cortex from the primary motor cortex to the frontal cortex. Three experimental paradigms were considered: ipsilateral, contralateral, and bilateral movements. During unilateral movement, the monkey was trained to get food with one hand; during bilateral movement, the monkey used its left and right hands alternately to get food. To estimate hand positions, a state-based probabilistic method was introduced, based on the conditional probability of the hand movement state (idle, right-hand movement, or left-hand movement) and the conditional expectation of the hand position for each state. Moreover, a hybrid feature extraction method based on linear discriminant analysis and partial least squares (PLS) was introduced.
Main results. The proposed method successfully decoded hand positions during ipsilateral, contralateral, and bilateral movements and significantly improved decoding performance compared with conventional Kalman and PLS regression methods. The proposed hybrid feature extraction method also outperformed both the PLS and PCA methods. Examining the kinematic information in each frequency band showed that the most informative bands were β (15-30 Hz) and γ1 (50-100 Hz) for ipsilateral movements, and β and γ2 (100-200 Hz) for contralateral movements. Ipsilateral movement was decoded better than contralateral movement in the μ (5-15 Hz) and β bands, while contralateral movement was decoded better in the γ (30-200 Hz) and high-frequency ECoG (hfECoG, 200-400 Hz) bands.
Significance. Accurately decoding bilateral movement using ECoG recorded from one brain hemisphere is an important step toward real-life applications of brain-machine interface technologies.
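The decoder described combines a state classifier with state-conditional position estimators; the sketch below shows that structure on toy data, using a posterior-weighted mixture over states. The per-state linear regressors and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n, d = 3000, 16
X = rng.standard_normal((n, d))           # neural features per frame
states = rng.integers(0, 3, size=n)       # 0 idle, 1 right hand, 2 left hand
pos = X @ rng.standard_normal((d, 2)) * 0.2
pos[states == 0] = 0.0                    # idle frames: hand at rest

# P(state | features): a classifier over movement states.
state_clf = LinearDiscriminantAnalysis().fit(X, states)
# E[position | features, state]: one regressor per state.
regs = {s: LinearRegression().fit(X[states == s], pos[states == s])
        for s in (0, 1, 2)}

def decode(x):
    """Posterior-weighted mixture of state-conditional position estimates."""
    x = x.reshape(1, -1)
    posterior = state_clf.predict_proba(x)[0]   # ordered by state_clf.classes_
    return sum(p * regs[s].predict(x)[0]
               for s, p in zip(state_clf.classes_, posterior))

print("decoded position:", decode(X[0]), "true position:", pos[0])
```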
Affiliation(s)
- Behraz Farrokhi: Department of Biomedical Engineering, School of Electrical Engineering, Iran University of Science and Technology (IUST), and Iran Neural Technology Research Centre, Tehran, Iran
11
Herff C, Diener L, Angrick M, Mugler E, Tate MC, Goldrick MA, Krusienski DJ, Slutzky MW, Schultz T. Generating Natural, Intelligible Speech From Brain Activity in Motor, Premotor, and Inferior Frontal Cortices. Front Neurosci 2019; 13:1267. PMID: 31824257; PMCID: PMC6882773. DOI: 10.3389/fnins.2019.01267.
Abstract
Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that can preserve conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, we chose subsequent units of speech based on the measured ECoG activity to generate audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.
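Unit selection, the synthesis method the study adapts, can be caricatured in a few lines: for each window of neural activity, retrieve the training speech unit whose paired neural features are nearest, then concatenate the retrieved audio. Toy data below; a real system would also smooth unit boundaries and weigh concatenation costs.

```python
import numpy as np

rng = np.random.default_rng(4)
n_units, feat_dim, unit_len = 500, 64, 800   # e.g., 50 ms units at 16 kHz
train_neural = rng.standard_normal((n_units, feat_dim))  # paired corpus
train_audio = rng.standard_normal((n_units, unit_len))   # one waveform per unit

def brain_to_speech(neural_stream):
    """(n_windows, feat_dim) -> concatenated waveform."""
    out = []
    for x in neural_stream:
        # Nearest neighbor in neural-feature space selects the speech unit.
        idx = np.argmin(np.sum((train_neural - x) ** 2, axis=1))
        out.append(train_audio[idx])
    return np.concatenate(out)

waveform = brain_to_speech(rng.standard_normal((20, feat_dim)))
print("synthesized samples:", waveform.shape[0])   # 20 units x 800 samples
```

Because the output is stitched from the user's own recorded speech, prosody and voice quality come largely for free, which matches the abstract's observation that the result sounds natural.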
Affiliation(s)
- Christian Herff: School of Mental Health & Neuroscience, Maastricht University, Maastricht, Netherlands; Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Lorenz Diener: Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Miguel Angrick: Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Emily Mugler: Department of Neurology, Northwestern University, Chicago, IL, USA
- Matthew C. Tate: Department of Neurosurgery, Northwestern University, Chicago, IL, USA
- Matthew A. Goldrick: Department of Linguistics, Northwestern University, Chicago, IL, USA
- Dean J. Krusienski: Biomedical Engineering Department, Virginia Commonwealth University, Richmond, VA, USA
- Marc W. Slutzky: Departments of Neurology, Physiology, and Physical Medicine & Rehabilitation, Northwestern University, Chicago, IL, USA
- Tanja Schultz: Cognitive Systems Lab, University of Bremen, Bremen, Germany
12
Rabbani Q, Milsap G, Crone NE. The Potential for a Speech Brain-Computer Interface Using Chronic Electrocorticography. Neurotherapeutics 2019; 16:144-165. PMID: 30617653; PMCID: PMC6361062. DOI: 10.1007/s13311-018-00692-2.
Abstract
A brain-computer interface (BCI) is a technology that uses neural features to restore or augment the capabilities of its user. A BCI for speech would enable communication in real time via neural correlates of attempted or imagined speech. Such a technology would potentially restore communication and improve quality of life for locked-in patients and other patients with severe communication disorders. There have been many recent developments in neural decoders, neural feature extraction, and brain recording modalities that facilitate BCIs for the control of prosthetics, as well as in automatic speech recognition (ASR). Indeed, ASR and related fields have developed significantly over the past years and lend many insights into the requirements, goals, and strategies for speech BCI. Neural speech decoding is a comparatively new field but has shown much promise, with recent studies demonstrating semantic, auditory, and articulatory decoding using electrocorticography (ECoG) and other neural recording modalities. Because the neural representations for speech and language are widely distributed over cortical regions spanning the frontal, parietal, and temporal lobes, the mesoscopic scale of population activity captured by ECoG surface electrode arrays may have distinct advantages for speech BCI, in contrast to the advantages of microelectrode arrays for upper-limb BCI. Nevertheless, there remain many challenges for the translation of speech BCIs to clinical populations. This review discusses and outlines the current state of the art for speech BCI and explores what a speech BCI using chronic ECoG might entail.
Affiliation(s)
- Qinwan Rabbani: Department of Electrical Engineering, The Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA
- Griffin Milsap: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nathan E Crone: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA