1
|
Dash D, Ferrari P, Wang J. Neural Decoding of Spontaneous Overt and Intended Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2024:1-10. [PMID: 39106199 DOI: 10.1044/2024_jslhr-24-00046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/09/2024]
Abstract
PURPOSE The aim of this study was to decode intended and overt speech from neuromagnetic signals while the participants performed spontaneous overt speech tasks without cues or prompts (stimuli). METHOD Magnetoencephalography (MEG), a noninvasive neuroimaging technique, was used to collect neural signals from seven healthy adult English speakers performing spontaneous, overt speech tasks. The participants randomly spoke the words yes or no at a self-paced rate without cues. Two machine learning models, namely, linear discriminant analysis (LDA) and one-dimensional convolutional neural network (1D CNN), were employed to classify the two words from the recorded MEG signals. RESULTS LDA and 1D CNN achieved average decoding accuracies of 79.02% and 90.40%, respectively, in decoding overt speech, significantly surpassing the chance level (50%). The accuracy for decoding intended speech was 67.19% using 1D CNN. CONCLUSIONS This study showcases the possibility of decoding spontaneous overt and intended speech directly from neural signals in the absence of perceptual interference. We believe that these findings make a steady step toward the future spontaneous speech-based brain-computer interface.
Collapse
Affiliation(s)
- Debadatta Dash
- Department of Neurology, The University of Texas at Austin
| | - Paul Ferrari
- Helen DeVos Children's Hospital, Corewell Health, Grand Rapids, MI
| | - Jun Wang
- Department of Neurology, The University of Texas at Austin
- Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin
| |
Collapse
|
2
|
Jeong JH, Cho JH, Lee BH, Lee SW. Real-Time Deep Neurolinguistic Learning Enhances Noninvasive Neural Language Decoding for Brain-Machine Interaction. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:7469-7482. [PMID: 36251899 DOI: 10.1109/tcyb.2022.3211694] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Electroencephalogram (EEG)-based brain-machine interface (BMI) has been utilized to help patients regain motor function and has recently been validated for its use in healthy people because of its ability to directly decipher human intentions. In particular, neurolinguistic research using EEGs has been investigated as an intuitive and naturalistic communication tool between humans and machines. In this study, the human mind directly decoded the neural languages based on speech imagery using the proposed deep neurolinguistic learning. Through real-time experiments, we evaluated whether BMI-based cooperative tasks between multiple users could be accomplished using a variety of neural languages. We successfully demonstrated a BMI system that allows a variety of scenarios, such as essential activity, collaborative play, and emotional interaction. This outcome presents a novel BMI frontier that can interact at the level of human-like intelligence in real time and extends the boundaries of the communication paradigm.
Collapse
|
3
|
Berezutskaya J, Freudenburg ZV, Vansteensel MJ, Aarnoutse EJ, Ramsey NF, van Gerven MAJ. Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. J Neural Eng 2023; 20:056010. [PMID: 37467739 PMCID: PMC10510111 DOI: 10.1088/1741-2552/ace8be] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2022] [Revised: 07/12/2023] [Accepted: 07/19/2023] [Indexed: 07/21/2023]
Abstract
Objective.Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver best and directly applicable results is crucial for advancing the field.Approach.In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task.Main results.We show that (1) dedicated machine learning optimization of reconstruction models is key for achieving the best reconstruction performance; (2) individual word decoding in reconstructed speech achieves 92%-100% accuracy (chance level is 8%); (3) direct reconstruction from sensorimotor brain activity produces intelligible speech.Significance.These results underline the need for model optimization in achieving best speech decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex can offer for development of next-generation BCI technology for communication.
Collapse
Affiliation(s)
- Julia Berezutskaya
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
| | - Zachary V Freudenburg
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Mariska J Vansteensel
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Erik J Aarnoutse
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Nick F Ramsey
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
| | - Marcel A J van Gerven
- Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
| |
Collapse
|
4
|
Cooney C, Folli R, Coyle D. Opportunities, pitfalls and trade-offs in designing protocols for measuring the neural correlates of speech. Neurosci Biobehav Rev 2022; 140:104783. [PMID: 35907491 DOI: 10.1016/j.neubiorev.2022.104783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Revised: 07/12/2022] [Accepted: 07/15/2022] [Indexed: 11/25/2022]
Abstract
Decoding speech and speech-related processes directly from the human brain has intensified in studies over recent years as such a decoder has the potential to positively impact people with limited communication capacity due to disease or injury. Additionally, it can present entirely new forms of human-computer interaction and human-machine communication in general and facilitate better neuroscientific understanding of speech processes. Here, we synthesize the literature on neural speech decoding pertaining to how speech decoding experiments have been conducted, coalescing around a necessity for thoughtful experimental design aimed at specific research goals, and robust procedures for evaluating speech decoding paradigms. We examine the use of different modalities for presenting stimuli to participants, methods for construction of paradigms including timings and speech rhythms, and possible linguistic considerations. In addition, novel methods for eliciting naturalistic speech and validating imagined speech task performance in experimental settings are presented based on recent research. We also describe the multitude of terms used to instruct participants on how to produce imagined speech during experiments and propose methods for investigating the effect of these terms on imagined speech decoding. We demonstrate that the range of experimental procedures used in neural speech decoding studies can have unintended consequences which can impact upon the efficacy of the knowledge obtained. The review delineates the strengths and weaknesses of present approaches and poses methodological advances which we anticipate will enhance experimental design, and progress toward the optimal design of movement independent direct speech brain-computer interfaces.
Collapse
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Derry, UK.
| | - Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown, UK
| | - Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Derry, UK
| |
Collapse
|
5
|
Berezutskaya J, Vansteensel MJ, Aarnoutse EJ, Freudenburg ZV, Piantoni G, Branco MP, Ramsey NF. Open multimodal iEEG-fMRI dataset from naturalistic stimulation with a short audiovisual film. Sci Data 2022; 9:91. [PMID: 35314718 PMCID: PMC8938409 DOI: 10.1038/s41597-022-01173-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2021] [Accepted: 01/24/2022] [Indexed: 12/19/2022] Open
Abstract
Intracranial human recordings are a valuable and rare resource of information about the brain. Making such data publicly available not only helps tackle reproducibility issues in science, it helps make more use of these valuable data. This is especially true for data collected using naturalistic tasks. Here, we describe a dataset collected from a large group of human subjects while they watched a short audiovisual film. The dataset has several unique features. First, it includes a large amount of intracranial electroencephalography (iEEG) data (51 participants, age range of 5-55 years, who all performed the same task). Second, it includes functional magnetic resonance imaging (fMRI) recordings (30 participants, age range of 7-47) during the same task. Eighteen participants performed both iEEG and fMRI versions of the task, non-simultaneously. Third, the data were acquired using a rich audiovisual stimulus, for which we provide detailed speech and video annotations. This dataset can be used to study neural mechanisms of multimodal perception and language comprehension, and similarity of neural signals across brain recording modalities.
Collapse
Affiliation(s)
- Julia Berezutskaya
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, the Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands.
| | - Mariska J Vansteensel
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Erik J Aarnoutse
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Zachary V Freudenburg
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Giovanni Piantoni
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Mariana P Branco
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Nick F Ramsey
- Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, the Netherlands
| |
Collapse
|
6
|
Glanz O, Hader M, Schulze-Bonhage A, Auer P, Ball T. A Study of Word Complexity Under Conditions of Non-experimental, Natural Overt Speech Production Using ECoG. Front Hum Neurosci 2022; 15:711886. [PMID: 35185491 PMCID: PMC8854223 DOI: 10.3389/fnhum.2021.711886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2021] [Accepted: 12/15/2021] [Indexed: 11/25/2022] Open
Abstract
The linguistic complexity of words has largely been studied on the behavioral level and in experimental settings. Only little is known about the neural processes underlying it in uninstructed, spontaneous conversations. We built up a multimodal neurolinguistic corpus composed of synchronized audio, video, and electrocorticographic (ECoG) recordings from the fronto-temporo-parietal cortex to address this phenomenon based on uninstructed, spontaneous speech production. We performed extensive linguistic annotations of the language material and calculated word complexity using several numeric parameters. We orthogonalized the parameters with the help of a linear regression model. Then, we correlated the spectral components of neural activity with the individual linguistic parameters and with the residuals of the linear regression model, and compared the results. The proportional relation between the number of consonants and vowels, which was the most informative parameter with regard to the neural representation of word complexity, showed effects in two areas: the frontal one was at the junction of the premotor cortex, the prefrontal cortex, and Brodmann area 44. The postcentral one lay directly above the lateral sulcus and comprised the ventral central sulcus, the parietal operculum and the adjacent inferior parietal cortex. Beyond the physiological findings summarized here, our methods may be useful for those interested in ways of studying neural effects related to natural language production and in surmounting the intrinsic problem of collinearity between multiple features of spontaneously spoken material.
Collapse
Affiliation(s)
- Olga Glanz
- GRK 1624 “Frequency Effects in Language,” University of Freiburg, Freiburg, Germany
- Department of German Linguistics, University of Freiburg, Freiburg, Germany
- The Hermann Paul School of Linguistics, University of Freiburg, Freiburg, Germany
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Neurobiology and Biophysics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, Faculty of Medicine, Medical Center—University of Freiburg, University of Freiburg, Freiburg, Germany
- Olga Glanz (Iljina),
| | - Marina Hader
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, Faculty of Medicine, Medical Center—University of Freiburg, University of Freiburg, Freiburg, Germany
| | - Andreas Schulze-Bonhage
- Department of Neurosurgery, Faculty of Medicine, Epilepsy Center, Medical Center—University of Freiburg, University of Freiburg, Freiburg, Germany
- Bernstein Center Freiburg, University of Freiburg, Freiburg, Germany
| | - Peter Auer
- GRK 1624 “Frequency Effects in Language,” University of Freiburg, Freiburg, Germany
- Department of German Linguistics, University of Freiburg, Freiburg, Germany
- The Hermann Paul School of Linguistics, University of Freiburg, Freiburg, Germany
| | - Tonio Ball
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, Faculty of Medicine, Medical Center—University of Freiburg, University of Freiburg, Freiburg, Germany
- Bernstein Center Freiburg, University of Freiburg, Freiburg, Germany
- *Correspondence: Tonio Ball,
| |
Collapse
|
7
|
Cooney C, Folli R, Coyle D. A bimodal deep learning architecture for EEG-fNIRS decoding of overt and imagined speech. IEEE Trans Biomed Eng 2021; 69:1983-1994. [PMID: 34874850 DOI: 10.1109/tbme.2021.3132861] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
OBJECTIVE Brain-computer interfaces (BCI) studies are increasingly leveraging different attributes of multiple signal modalities simultaneously. Bimodal data acquisition protocols combining the temporal resolution of electroencephalography (EEG) with the spatial resolution of functional near-infrared spectroscopy (fNIRS) require novel approaches to decoding. METHODS We present an EEG-fNIRS Hybrid BCI that employs a new bimodal deep neural network architecture consisting of two convolutional sub-networks (subnets) to decode overt and imagined speech. Features from each subnet are fused before further feature extraction and classification. Nineteen participants performed overt and imagined speech in a novel cue-based paradigm enabling investigation of stimulus and linguistic effects on decoding. RESULTS Using the hybrid approach, classification accuracies (46.31% and 34.29% for overt and imagined speech, respectively (chance: 25%)) indicated a significant improvement on EEG used independently for imagined speech (p=0.020) while tending towards significance for overt speech (p=0.098). In comparison with fNIRS, significant improvements for both speech-types were achieved with bimodal decoding (p<0.001). There was a mean difference of ~12.02% between overt and imagined speech with accuracies as high as 87.18% and 53%. Deeper subnets enhanced performance while stimulus effected overt and imagined speech in significantly different ways. CONCLUSION The bimodal approach was a significant improvement on unimodal results for several tasks. Results indicate the potential of multi-modal deep learning for enhancing neural signal decoding. SIGNIFICANCE This novel architecture can be used to enhance speech decoding from bimodal neural signals.
Collapse
|
8
|
Cooney C, Korik A, Folli R, Coyle D. Evaluation of Hyperparameter Optimization in Machine and Deep Learning Methods for Decoding Imagined Speech EEG. SENSORS 2020; 20:s20164629. [PMID: 32824559 PMCID: PMC7472624 DOI: 10.3390/s20164629] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Revised: 08/10/2020] [Accepted: 08/13/2020] [Indexed: 01/12/2023]
Abstract
Classification of electroencephalography (EEG) signals corresponding to imagined speech production is important for the development of a direct-speech brain-computer interface (DS-BCI). Deep learning (DL) has been utilized with great success across several domains. However, it remains an open question whether DL methods provide significant advances over traditional machine learning (ML) approaches for classification of imagined speech. Furthermore, hyperparameter (HP) optimization has been neglected in DL-EEG studies, resulting in the significance of its effects remaining uncertain. In this study, we aim to improve classification of imagined speech EEG by employing DL methods while also statistically evaluating the impact of HP optimization on classifier performance. We trained three distinct convolutional neural networks (CNN) on imagined speech EEG using a nested cross-validation approach to HP optimization. Each of the CNNs evaluated was designed specifically for EEG decoding. An imagined speech EEG dataset consisting of both words and vowels facilitated training on both sets independently. CNN results were compared with three benchmark ML methods: Support Vector Machine, Random Forest and regularized Linear Discriminant Analysis. Intra- and inter-subject methods of HP optimization were tested and the effects of HPs statistically analyzed. Accuracies obtained by the CNNs were significantly greater than the benchmark methods when trained on both datasets (words: 24.97%, p < 1 × 10-7, chance: 16.67%; vowels: 30.00%, p < 1 × 10-7, chance: 20%). The effects of varying HP values, and interactions between HPs and the CNNs were both statistically significant. The results of HP optimization demonstrate how critical it is for training CNNs to decode imagined speech.
Collapse
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Londonderry BT48 7JL, UK; (A.K.); (D.C.)
- Correspondence:
| | - Attila Korik
- Intelligent Systems Research Centre, Ulster University, Londonderry BT48 7JL, UK; (A.K.); (D.C.)
| | - Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown BT37 0QB, UK;
| | - Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Londonderry BT48 7JL, UK; (A.K.); (D.C.)
| |
Collapse
|
9
|
Dash D, Ferrari P, Wang J. Decoding Imagined and Spoken Phrases From Non-invasive Neural (MEG) Signals. Front Neurosci 2020; 14:290. [PMID: 32317917 PMCID: PMC7154084 DOI: 10.3389/fnins.2020.00290] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2019] [Accepted: 03/13/2020] [Indexed: 11/16/2022] Open
Abstract
Speech production is a hierarchical mechanism involving the synchronization of the brain and the oral articulators, where the intention of linguistic concepts is transformed into meaningful sounds. Individuals with locked-in syndrome (fully paralyzed but aware) lose their motor ability completely including articulation and even eyeball movement. The neural pathway may be the only option to resume a certain level of communication for these patients. Current brain-computer interfaces (BCIs) use patients' visual and attentional correlates to build communication, resulting in a slow communication rate (a few words per minute). Direct decoding of imagined speech from the neural signals (and then driving a speech synthesizer) has the potential for a higher communication rate. In this study, we investigated the decoding of five imagined and spoken phrases from single-trial, non-invasive magnetoencephalography (MEG) signals collected from eight adult subjects. Two machine learning algorithms were used. One was an artificial neural network (ANN) with statistical features as the baseline approach. The other was convolutional neural networks (CNNs) applied on the spatial, spectral and temporal features extracted from the MEG signals. Experimental results indicated the possibility to decode imagined and spoken phrases directly from neuromagnetic signals. CNNs were found to be highly effective with an average decoding accuracy of up to 93% for the imagined and 96% for the spoken phrases.
Collapse
Affiliation(s)
- Debadatta Dash
- Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, United States
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX, United States
| | - Paul Ferrari
- MEG Lab, Dell Children's Medical Center, Austin, TX, United States
- Department of Psychology, University of Texas at Austin, Austin, TX, United States
| | - Jun Wang
- Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX, United States
- Department of Communication Sciences and Disorders, University of Texas at Austin, Austin, TX, United States
| |
Collapse
|
10
|
Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng 2019; 16:036019. [PMID: 30831567 PMCID: PMC6822609 DOI: 10.1088/1741-2552/ab0c59] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
OBJECTIVE Direct synthesis of speech from neural signals could provide a fast and natural way of communication to people with neurological diseases. Invasively-measured brain activity (electrocorticography; ECoG) supplies the necessary temporal and spatial resolution to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the complex dynamics are still not fully understood. However, it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech. APPROACH Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology which is well-suited to work with the small amount of data available from each participant. MAIN RESULTS In a study with six participants, we achieved correlations up to r = 0.69 between the reconstructed and original logMel spectrograms. We transfered our prediction back into an audible waveform by applying a Wavenet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output. SIGNIFICANCE To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
Collapse
Affiliation(s)
- Miguel Angrick
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
| | | | | | | | | | | | | |
Collapse
|
11
|
Towards reconstructing intelligible speech from the human auditory cortex. Sci Rep 2019; 9:874. [PMID: 30696881 PMCID: PMC6351601 DOI: 10.1038/s41598-018-37359-z] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2018] [Accepted: 11/30/2018] [Indexed: 11/08/2022] Open
Abstract
Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic to establish a direct communication with the brain and has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications. To advance the state-of-the-art in speech neuroprosthesis, we combined the recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and the acoustic representation that is used as the target of reconstruction, including auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which not only can restore communications for paralyzed patients but also have the potential to transform human-computer interaction technologies.
Collapse
|
12
|
Cooney C, Folli R, Coyle D. Neurolinguistics Research Advancing Development of a Direct-Speech Brain-Computer Interface. iScience 2018; 8:103-125. [PMID: 30296666 PMCID: PMC6174918 DOI: 10.1016/j.isci.2018.09.016] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2018] [Revised: 09/04/2018] [Accepted: 09/18/2018] [Indexed: 01/09/2023] Open
Abstract
A direct-speech brain-computer interface (DS-BCI) acquires neural signals corresponding to imagined speech, then processes and decodes these signals to produce a linguistic output in the form of phonemes, words, or sentences. Recent research has shown the potential of neurolinguistics to enhance decoding approaches to imagined speech with the inclusion of semantics and phonology in experimental procedures. As neurolinguistics research findings are beginning to be incorporated within the scope of DS-BCI research, it is our view that a thorough understanding of imagined speech, and its relationship with overt speech, must be considered an integral feature of research in this field. With a focus on imagined speech, we provide a review of the most important neurolinguistics research informing the field of DS-BCI and suggest how this research may be utilized to improve current experimental protocols and decoding techniques. Our review of the literature supports a cross-disciplinary approach to DS-BCI research, in which neurolinguistics concepts and methods are utilized to aid development of a naturalistic mode of communication.
Collapse
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Derry, UK.
| | - Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown, UK
| | - Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Derry, UK
| |
Collapse
|