1. Thomas TM, Singh A, Bullock LP, Liang D, Morse CW, Scherschligt X, Seymour JP, Tandon N. Decoding articulatory and phonetic components of naturalistic continuous speech from the distributed language network. J Neural Eng 2023; 20:046030. [PMID: 37487487] [DOI: 10.1088/1741-2552/ace9fb]
Abstract
Objective. Speech production relies on a widely distributed brain network. However, research and development of speech brain-computer interfaces (speech-BCIs) has typically focused on decoding speech only from superficial subregions readily accessible by subdural grid arrays, typically placed over the sensorimotor cortex. Alternatively, the technique of stereo-electroencephalography (sEEG) enables access to distributed brain regions using multiple depth electrodes with lower surgical risks, especially in patients with brain injuries resulting in aphasia and other speech disorders. Approach. To investigate the decoding potential of widespread electrode coverage in multiple cortical sites, we used a naturalistic continuous speech production task. We obtained neural recordings using sEEG from eight participants while they read aloud sentences. We trained linear classifiers to decode distinct speech components (articulatory components and phonemes) solely based on broadband gamma activity and evaluated the decoding performance using nested five-fold cross-validation. Main Results. We achieved an average classification accuracy of 18.7% across 9 places of articulation (e.g. bilabials, palatals), 26.5% across 5 manner of articulation (MOA) labels (e.g. affricates, fricatives), and 4.81% across 38 phonemes. The highest classification accuracies achieved with a single large dataset were 26.3% for place of articulation, 35.7% for MOA, and 9.88% for phonemes. Electrodes that contributed high decoding power were distributed across multiple sulcal and gyral sites in both dominant and non-dominant hemispheres, including ventral sensorimotor, inferior frontal, superior temporal, and fusiform cortices. Rather than finding a distinct cortical locus for each speech component, we observed neural correlates of both articulatory and phonetic components in multiple hubs of a widespread language production network. Significance. These results reveal distributed cortical representations whose activity can enable decoding of speech components during continuous speech through this minimally invasive recording method, elucidating language neurobiology and identifying neural targets for future speech-BCIs.
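For readers unfamiliar with the evaluation scheme described above (linear classifiers on broadband gamma features, scored with nested five-fold cross-validation), a minimal sketch in Python with scikit-learn is shown below; the feature matrix, five-class MOA labels, and regularization grid are illustrative stand-ins, not the authors' data or settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in data: one broadband gamma feature vector per produced phone
# (trials x features, e.g. electrodes x time bins flattened), with hypothetical MOA labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 200))
y = rng.integers(0, 5, size=600)      # 5 manner-of-articulation classes

# Inner loop tunes the regularization strength; outer loop estimates accuracy.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(clf, param_grid, cv=inner_cv, scoring="accuracy")
scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print(f"nested 5-fold accuracy: {scores.mean():.3f} (chance ~ {1/5:.3f})")
```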
Affiliation(s)
- Tessy M Thomas
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Aditya Singh
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Latané P Bullock
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Daniel Liang
- Department of Computer Science, Rice University, Houston, TX 77005, United States of America
- Cale W Morse
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Xavier Scherschligt
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- John P Seymour
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Department of Electrical & Computer Engineering, Rice University, Houston, TX 77005, United States of America
- Nitin Tandon
- Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America
- Memorial Hermann Hospital, Texas Medical Center, Houston, TX 77030, United States of America
2. Meng K, Goodarzy F, Kim E, Park YJ, Kim JS, Cook MJ, Chung CK, Grayden DB. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J Neural Eng 2023; 20:046019. [PMID: 37459853] [DOI: 10.1088/1741-2552/ace7f6]
Abstract
Objective. Brain-computer interfaces can restore various forms of communication in paralyzed patients who have lost their ability to articulate intelligible speech. This study aimed to demonstrate the feasibility of closed-loop synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. Approach. Ten participants with intractable epilepsy were temporarily implanted with intracranial electrode arrays over cortical surfaces. A decoding model that predicted audible outputs directly from patient-specific neural feature inputs was trained during overt word reading and immediately tested with overt, mimed and imagined word reading. Predicted outputs were later assessed objectively against corresponding voice recordings and subjectively through human perceptual judgments. Main results. Artificial speech sounds were successfully synthesized during overt and mimed utterances by two participants with some coverage of the precentral gyrus. About a third of these sounds were correctly identified by naïve listeners in two-alternative forced-choice tasks. A similar outcome could not be achieved during imagined utterances by any of the participants. However, neural feature contribution analyses suggested the presence of exploitable activation patterns during imagined speech in the postcentral gyrus and the superior temporal gyrus. In future work, a more comprehensive coverage of cortical surfaces, including posterior parts of the middle frontal gyrus and the inferior frontal gyrus, could improve synthesis performance during imagined speech. Significance. As the field of speech neuroprostheses is rapidly moving toward clinical trials, this study addressed important considerations about task instructions and brain coverage when conducting research on silent speech with non-target participants.
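The core decoding step (a model that predicts audible output features directly from neural feature inputs, later scored against the true voice recordings) can be approximated with a linear sketch like the one below; the lag window, spectrogram target, and ridge penalty are assumptions for illustration rather than the study's implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_frames, n_channels, n_mel, n_lags = 5000, 64, 40, 10

neural = rng.standard_normal((n_frames, n_channels))   # hypothetical cortical features per frame
spec = rng.standard_normal((n_frames, n_mel))          # hypothetical target spectrogram frames

# Stack a short window of past neural frames as predictors for each audio frame.
def lagged(x, lags):
    parts = [np.roll(x, lag, axis=0) for lag in range(lags)]
    return np.concatenate(parts, axis=1)[lags:]

X, Y = lagged(neural, n_lags), spec[n_lags:]
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, shuffle=False)

model = Ridge(alpha=10.0).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# Objective evaluation: correlation between predicted and true spectrogram bins.
r = [np.corrcoef(Y_hat[:, k], Y_te[:, k])[0, 1] for k in range(n_mel)]
print(f"mean spectral correlation: {np.mean(r):.3f}")
```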
Affiliation(s)
- Kevin Meng
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Farhad Goodarzy
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
- EuiYoung Kim
- Interdisciplinary Program in Neuroscience, Seoul National University, Seoul, Republic of Korea
- Ye Jin Park
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
- June Sic Kim
- Research Institute of Basic Sciences, Seoul National University, Seoul, Republic of Korea
- Mark J Cook
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
- Chun Kee Chung
- Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
- Department of Neurosurgery, Seoul National University Hospital, Seoul, Republic of Korea
- David B Grayden
- Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
3. Prinsloo KD, Lalor EC. General Auditory and Speech-Specific Contributions to Cortical Envelope Tracking Revealed Using Auditory Chimeras. J Neurosci 2022; 42:7782-7798. [PMID: 36041853] [PMCID: PMC9581567] [DOI: 10.1523/jneurosci.2735-20.2022]
Abstract
In recent years, research on natural speech processing has benefited from recognizing that low-frequency cortical activity tracks the amplitude envelope of natural speech. However, it remains unclear to what extent this tracking reflects speech-specific processing beyond the analysis of the stimulus acoustics. In the present study, we aimed to disentangle contributions to cortical envelope tracking that reflect general acoustic processing from those that are functionally related to processing speech. To do so, we recorded EEG from subjects as they listened to auditory chimeras, stimuli composed of the temporal fine structure of one speech stimulus modulated by the amplitude envelope (ENV) of another speech stimulus. By varying the number of frequency bands used in making the chimeras, we obtained some control over which speech stimulus was recognized by the listener. No matter which stimulus was recognized, envelope tracking was always strongest for the ENV stimulus, indicating a dominant contribution from acoustic processing. However, there was also a positive relationship between intelligibility and the tracking of the perceived speech, indicating a contribution from speech-specific processing. These findings were supported by a follow-up analysis that assessed envelope tracking as a function of the (estimated) output of the cochlea rather than the original stimuli used in creating the chimeras. Finally, we sought to isolate the speech-specific contribution to envelope tracking using forward encoding models and found that indices of phonetic feature processing tracked reliably with intelligibility. Together these results show that cortical speech tracking is dominated by acoustic processing but also reflects speech-specific processing. SIGNIFICANCE STATEMENT Activity in auditory cortex is known to dynamically track the energy fluctuations, or amplitude envelope, of speech. Measures of this tracking are now widely used in research on hearing and language and have had a substantial influence on theories of how auditory cortex parses and processes speech. But how much of this speech tracking is actually driven by speech-specific processing rather than general acoustic processing is unclear, limiting its interpretability and its usefulness. Here, by merging two speech stimuli together to form so-called auditory chimeras, we show that EEG tracking of the speech envelope is dominated by acoustic processing but also reflects linguistic analysis. This has important implications for theories of cortical speech tracking and for using measures of that tracking in applied research.
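Envelope tracking of the kind quantified here is commonly estimated with a temporal response function, i.e. a regularized regression from the time-lagged stimulus envelope to EEG whose cross-validated prediction correlation serves as the tracking measure. A minimal sketch is given below; the synthetic data, lag range, and ridge penalty are assumptions, not the study's analysis settings.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
fs, dur = 64, 600                                # 64 Hz features, 10 minutes of listening
n = fs * dur
envelope = np.abs(rng.standard_normal(n))        # hypothetical speech amplitude envelope
eeg = rng.standard_normal((n, 32))               # hypothetical 32-channel EEG

# Forward (encoding) model: predict EEG from the time-lagged envelope (a simple TRF).
lags = np.arange(0, int(0.4 * fs))               # 0-400 ms stimulus-to-response lags
X = np.stack([np.roll(envelope, lag) for lag in lags], axis=1)[max(lags):]
Y = eeg[max(lags):]

split = int(0.8 * len(X))
trf = Ridge(alpha=1.0).fit(X[:split], Y[:split])
pred = trf.predict(X[split:])

# "Envelope tracking" is quantified as the prediction correlation per channel.
track = [np.corrcoef(pred[:, ch], Y[split:, ch])[0, 1] for ch in range(Y.shape[1])]
print(f"mean envelope tracking r: {np.mean(track):.3f}")
```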
Affiliation(s)
- Kevin D Prinsloo
- Departments of Biomedical Engineering and Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
- Edmund C Lalor
- Departments of Biomedical Engineering and Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
4. Shah U, Alzubaidi M, Mohsen F, Abd-Alrazaq A, Alam T, Househ M. The Role of Artificial Intelligence in Decoding Speech from EEG Signals: A Scoping Review. Sensors (Basel) 2022; 22:6975. [PMID: 36146323] [PMCID: PMC9505262] [DOI: 10.3390/s22186975]
Abstract
Background: Brain traumas, mental disorders, and vocal abuse can result in permanent or temporary speech impairment, significantly reducing quality of life and occasionally resulting in social isolation. Brain-computer interfaces (BCIs) can help people with speech impairments or paralysis communicate with their surroundings via brain signals. EEG-based BCIs have therefore received significant attention in the last two decades for several reasons: (i) clinical research has accumulated detailed knowledge of EEG signals, (ii) EEG devices are inexpensive, and (iii) the technology has applications in medical and social fields. Objective: This study explores the existing literature and summarizes EEG data acquisition, feature extraction, and artificial intelligence (AI) techniques for decoding speech from brain signals. Method: We followed the PRISMA-ScR guidelines to conduct this scoping review. We searched six electronic databases: PubMed, IEEE Xplore, the ACM Digital Library, Scopus, arXiv, and Google Scholar. We carefully selected search terms based on the target intervention (i.e., imagined speech and AI) and target data (EEG signals), and some of the search terms were derived from previous reviews. The study selection process was carried out in three phases: study identification, study selection, and data extraction. Two reviewers independently carried out study selection and data extraction. A narrative approach was adopted to synthesize the extracted data. Results: A total of 263 studies were evaluated; however, 34 met the eligibility criteria for inclusion in this review. We found 64-electrode EEG devices to be the most widely used in the included studies. The most common signal normalization and feature extraction approaches in the included studies were band-pass filtering and wavelet-based feature extraction. We categorized the studies based on AI techniques, such as machine learning (ML) and deep learning (DL). The most prominent ML algorithm was the support vector machine, and the most prominent DL algorithm was the convolutional neural network. Conclusions: EEG-based BCIs are a viable technology that can enable people with severe or temporary voice impairment to communicate with the world directly from their brain. However, the development of BCI technology is still in its infancy.
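The processing chain the review identifies as most common (band-pass filtering, wavelet-based features, and a support vector machine classifier) might look roughly like this; the epoch dimensions, wavelet family, and two-class labels are assumptions chosen only to make the sketch runnable, not a reconstruction of any reviewed study.

```python
import numpy as np
import pywt                                        # PyWavelets
from scipy.signal import butter, filtfilt
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
fs = 256
trials = rng.standard_normal((200, 64, 2 * fs))    # hypothetical imagined-speech EEG epochs
labels = rng.integers(0, 2, size=200)               # e.g. two imagined words

def features(epoch):
    b, a = butter(4, [1, 40], btype="band", fs=fs)   # common band-pass normalization step
    filt = filtfilt(b, a, epoch, axis=-1)
    feats = []
    for ch in filt:                                   # wavelet-based features per channel
        coeffs = pywt.wavedec(ch, "db4", level=4)
        feats.extend(np.log(np.var(c) + 1e-12) for c in coeffs)
    return np.array(feats)

X = np.array([features(ep) for ep in trials])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print("5-fold accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```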
Affiliation(s)
- Uzair Shah
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Mahmood Alzubaidi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Farida Mohsen
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Alaa Abd-Alrazaq
- AI Center for Precision Health, Weill Cornell Medicine-Qatar, Doha P.O. Box 34110, Qatar
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Mowafa Househ
- College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
5. Wandelt SK, Kellis S, Bjånes DA, Pejsa K, Lee B, Liu C, Andersen RA. Decoding grasp and speech signals from the cortical grasp circuit in a tetraplegic human. Neuron 2022; 110:1777-1787.e3. [PMID: 35364014] [DOI: 10.1016/j.neuron.2022.03.009]
Abstract
The cortical grasp network encodes planning and execution of grasps and processes spoken and written aspects of language. High-level cortical areas within this network are attractive implant sites for brain-machine interfaces (BMIs). While a tetraplegic patient performed grasp motor imagery and vocalized speech, neural activity was recorded from the supramarginal gyrus (SMG), ventral premotor cortex (PMv), and somatosensory cortex (S1). In SMG and PMv, five imagined grasps were well represented by firing rates of neuronal populations during visual cue presentation. During motor imagery, these grasps were significantly decodable from all brain areas. During speech production, SMG encoded both spoken grasp types and the names of five colors. Whereas PMv neurons significantly modulated their activity during grasping, SMG's neural population broadly encoded features of both motor imagery and speech. Together, these results indicate that brain signals from high-level areas of the human cortex could be used for grasping and speech BMI applications.
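A simplified version of this kind of analysis (cross-validated classification of grasp type from trial-averaged population firing rates, evaluated separately in different task epochs) is sketched below; the unit counts, bin sizes, and epoch windows are hypothetical rather than taken from the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_units, n_bins = 250, 120, 60            # hypothetical sorted units, 50 ms bins
spikes = rng.poisson(2.0, size=(n_trials, n_units, n_bins))
grasp = rng.integers(0, 5, size=n_trials)            # five grasp types

# Hypothetical task epochs expressed as bin ranges within each trial.
epochs = {"cue": slice(0, 20), "motor_imagery": slice(30, 60)}

for name, window in epochs.items():
    # Average firing rate per unit within the epoch -> one feature vector per trial.
    X = spikes[:, :, window].mean(axis=2)
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, grasp, cv=5).mean()
    print(f"{name}: decoding accuracy {acc:.2f} (chance 0.20)")
```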
Affiliation(s)
- Sarah K Wandelt
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA.
- Spencer Kellis
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA; Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA 90033, USA
- David A Bjånes
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA
- Kelsie Pejsa
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA
- Brian Lee
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA 90033, USA
- Charles Liu
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA 90033, USA; Rancho Los Amigos National Rehabilitation Center, Downey, CA 90242, USA
- Richard A Andersen
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA; T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA 91125, USA
6. Wilson GH, Stavisky SD, Willett FR, Avansino DT, Kelemen JN, Hochberg LR, Henderson JM, Druckmann S, Shenoy KV. Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus. J Neural Eng 2020; 17:066007. [PMID: 33236720] [PMCID: PMC8293867] [DOI: 10.1088/1741-2552/abbfef]
Abstract
OBJECTIVE To evaluate the potential of intracortical electrode array signals for brain-computer interfaces (BCIs) to restore lost speech, we measured the performance of decoders trained to discriminate a comprehensive basis set of 39 English phonemes and to synthesize speech sounds via a neural pattern matching method. We decoded neural correlates of spoken-out-loud words in the 'hand knob' area of precentral gyrus, a step toward the eventual goal of decoding attempted speech from ventral speech areas in patients who are unable to speak. APPROACH Neural and audio data were recorded while two BrainGate2 pilot clinical trial participants, each with two chronically implanted 96-electrode arrays, spoke 420 different words that broadly sampled English phonemes. Phoneme onsets were identified from audio recordings, and their identities were then classified from neural features consisting of each electrode's binned action potential counts or high-frequency local field potential power. Speech synthesis was performed using the 'Brain-to-Speech' pattern matching method. We also examined two potential confounds specific to decoding overt speech: acoustic contamination of neural signals and systematic differences in labeling different phonemes' onset times. MAIN RESULTS A linear decoder achieved up to 29.3% classification accuracy (chance = 6%) across 39 phonemes, while a recurrent neural network (RNN) classifier achieved 33.9% accuracy. Parameter sweeps indicated that performance did not saturate when adding more electrodes or more training data, and that accuracy improved when utilizing time-varying structure in the data. Microphonic contamination and phoneme onset differences modestly increased decoding accuracy, but these effects could be mitigated by acoustic artifact subtraction and by using a neural speech onset marker, respectively. Speech synthesis achieved r = 0.523 correlation between true and reconstructed audio. SIGNIFICANCE The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.
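The classification stage (one feature vector of binned neural activity per audio-derived phoneme onset, fed to a linear decoder over 39 classes) can be sketched as follows; the array sizes, window lengths, and regularization value are placeholders, not the study's parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
fs = 50                                             # 20 ms bins
neural = rng.standard_normal((60_000, 64))          # hypothetical binned spike counts per electrode
onsets = np.sort(rng.integers(5, 59_985, size=2000))  # phoneme onsets derived from the audio track
phonemes = rng.integers(0, 39, size=2000)            # 39-phoneme label set

# One feature vector per phoneme: activity in a window around its (audio-derived) onset.
pre, post = 5, 15                                    # -100 ms to +300 ms around onset
X = np.stack([neural[t - pre:t + post].reshape(-1) for t in onsets])

clf = LogisticRegression(max_iter=1000, C=0.1)
acc = cross_val_score(clf, X, phonemes, cv=5).mean()
print(f"39-phoneme classification accuracy: {acc:.3f}")
```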
Affiliation(s)
- Guy H Wilson
- Neurosciences Graduate Program, Stanford University, Stanford, CA, United States of America
- Sergey D Stavisky
- Department of Neurosurgery, Stanford University, Stanford, CA, United States of America
- Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America
- Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America
- Francis R Willett
- Department of Neurosurgery, Stanford University, Stanford, CA, United States of America
- Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America
- Howard Hughes Medical Institute at Stanford University, Stanford, CA, United States of America
- Donald T Avansino
- Department of Neurosurgery, Stanford University, Stanford, CA, United States of America
- Jessica N Kelemen
- Department of Neurology, Harvard Medical School, Boston, MA, United States of America
- Leigh R Hochberg
- Department of Neurology, Harvard Medical School, Boston, MA, United States of America
- Center for Neurotechnology and Neurorecovery, Dept. of Neurology, Massachusetts General Hospital, Boston, MA, United States of America
- VA RR&D Center for Neurorestoration and Neurotechnology, Rehabilitation R&D Service, Providence VA Medical Center, Providence, RI, United States of America
- Carney Institute for Brain Science and School of Engineering, Brown University, Providence, RI, United States of America
- Jaimie M Henderson
- Department of Neurosurgery, Stanford University, Stanford, CA, United States of America
- Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America
- Shaul Druckmann
- Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America
- Department of Neurobiology, Stanford University, Stanford, CA, United States of America
- Krishna V Shenoy
- Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America
- Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America
- Howard Hughes Medical Institute at Stanford University, Stanford, CA, United States of America
- Department of Neurobiology, Stanford University, Stanford, CA, United States of America
- Department of Bioengineering, Stanford University, Stanford, CA, United States of America
7. Keitel A, Gross J, Kayser C. Shared and modality-specific brain regions that mediate auditory and visual word comprehension. eLife 2020; 9:e56972. [PMID: 32831168] [PMCID: PMC7470824] [DOI: 10.7554/elife.56972]
Abstract
Visual speech carried by lip movements is an integral part of communication. Yet, it remains unclear to what extent visual and acoustic speech comprehension are mediated by the same brain regions. Using multivariate classification of full-brain MEG data, we first probed where the brain represents acoustically and visually conveyed word identities. We then tested where these sensory-driven representations are predictive of participants' trial-wise comprehension. The comprehension-relevant representations of auditory and visual speech converged only in anterior angular and inferior frontal regions and were spatially dissociated from those representations that best reflected the sensory-driven word identity. These results provide a neural explanation for the behavioural dissociation of acoustic and visual speech comprehension and suggest that cerebral representations encoding word identities may be more modality-specific than often assumed.
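One way to implement an analysis of this kind is to classify word identity separately for each cortical parcel and then relate the classifier's single-trial evidence for the correct word to trial-wise comprehension; the sketch below uses synthetic data and a point-biserial correlation as a stand-in for the study's actual statistics and source-space pipeline.

```python
import numpy as np
from scipy.stats import pointbiserialr
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_trials, n_parcels, n_feat = 400, 50, 30
meg = rng.standard_normal((n_trials, n_parcels, n_feat))   # hypothetical parcel-wise MEG patterns
word = rng.integers(0, 4, size=n_trials)                    # word identity per trial
understood = rng.integers(0, 2, size=n_trials)              # trial-wise comprehension (0/1)

for parcel in range(3):                                      # a few parcels shown; the full analysis loops over all
    X = meg[:, parcel, :]
    # Sensory-driven representation: can word identity be classified from this parcel?
    proba = cross_val_predict(LinearDiscriminantAnalysis(), X, word, cv=5, method="predict_proba")
    acc = (proba.argmax(axis=1) == word).mean()
    # Comprehension-relevant representation: does classifier evidence for the true word
    # predict whether the participant understood the trial?
    evidence = proba[np.arange(n_trials), word]
    r, _ = pointbiserialr(understood, evidence)
    print(f"parcel {parcel}: word decoding {acc:.2f}, evidence-comprehension r = {r:.2f}")
```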
Affiliation(s)
- Anne Keitel
- Psychology, University of Dundee, Dundee, United Kingdom
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignalanalysis, University of Münster, Münster, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld, Germany
8. Nogueira W, Dolhopiatenko H, Schierholz I, Büchner A, Mirkovic B, Bleichner MG, Debener S. Decoding Selective Attention in Normal Hearing Listeners and Bilateral Cochlear Implant Users With Concealed Ear EEG. Front Neurosci 2019; 13:720. [PMID: 31379479] [PMCID: PMC6657402] [DOI: 10.3389/fnins.2019.00720]
Abstract
Electroencephalography (EEG) data can be used to decode an attended speech source in normal-hearing (NH) listeners using high-density EEG caps, as well as around-the-ear EEG devices. The technology may find application in identifying the target speaker in cocktail-party-like scenarios and in steering speech enhancement algorithms in cochlear implants (CIs). However, the poorer spectral resolution and the electrical artifacts introduced by a CI may limit the applicability of this approach to CI users. The goal of this study was to investigate whether selective attention can be decoded in CI users using an around-the-ear EEG system (cEEGrid). The performance of high-density cap EEG recordings and cEEGrid EEG recordings was compared in a selective attention paradigm using an envelope tracking algorithm. Speech from two audio books was presented through insert earphones to NH listeners and via direct audio cable to the CI users. Ten NH listeners and 10 bilateral CI users participated in the study. Participants were instructed to attend to one of the two concurrent speech streams while data were recorded simultaneously with a 96-channel scalp EEG and an 18-channel cEEGrid setup. Reconstruction performance was evaluated by means of parametric correlations between the reconstructed speech and the envelopes of both the attended and the unattended speech streams. The results confirm the feasibility of decoding selective attention from single-trial EEG data in NH listeners and CI users using high-density EEG. All NH listeners and 9 out of 10 CI users achieved high decoding accuracies. The cEEGrid was successful in decoding selective attention in 5 out of 10 NH listeners; the same result was obtained for CI users.
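Envelope-tracking-based attention decoding of this sort is typically implemented as a backward (stimulus reconstruction) model: the attended envelope is reconstructed from lagged EEG, and a trial is scored correct if the reconstruction correlates more strongly with the attended than with the unattended envelope. A leave-one-trial-out sketch with synthetic data and arbitrary hyperparameters is given below; it is not the study's pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
fs, trial_len, n_trials, n_ch = 64, 30 * 64, 20, 18      # e.g. 30 s trials, cEEGrid-like montage

eeg = rng.standard_normal((n_trials, trial_len, n_ch))
env_att = np.abs(rng.standard_normal((n_trials, trial_len)))   # attended-speech envelopes
env_un = np.abs(rng.standard_normal((n_trials, trial_len)))    # unattended-speech envelopes

def lagged(x, lags=16):                                         # ~0-250 ms of EEG context per sample
    return np.concatenate([np.roll(x, -l, axis=0) for l in range(lags)], axis=1)

# Backward model: reconstruct the attended envelope from lagged EEG, leave-one-trial-out.
correct = 0
for test in range(n_trials):
    train = [t for t in range(n_trials) if t != test]
    X_tr = np.concatenate([lagged(eeg[t]) for t in train])
    y_tr = np.concatenate([env_att[t] for t in train])
    dec = Ridge(alpha=1000.0).fit(X_tr, y_tr)
    rec = dec.predict(lagged(eeg[test]))
    r_att = np.corrcoef(rec, env_att[test])[0, 1]
    r_un = np.corrcoef(rec, env_un[test])[0, 1]
    correct += r_att > r_un                   # attended speaker = higher reconstruction correlation
print(f"attention decoding accuracy: {correct / n_trials:.2f}")
```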
Affiliation(s)
- Waldo Nogueira
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Hanna Dolhopiatenko
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Irina Schierholz
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Andreas Büchner
- Department of Otolaryngology, Hearing4all, Hannover Medical School, Hanover, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, Hearing4all, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
| |
Collapse
|
9
|
Wong DDE, Fuglsang SA, Hjortkjær J, Ceolini E, Slaney M, de Cheveigné A. A Comparison of Regularization Methods in Forward and Backward Models for Auditory Attention Decoding. Front Neurosci 2018; 12:531. [PMID: 30131670 PMCID: PMC6090837 DOI: 10.3389/fnins.2018.00531] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2018] [Accepted: 07/16/2018] [Indexed: 11/17/2022] Open
Abstract
The decoding of selective auditory attention from noninvasive electroencephalogram (EEG) data is of interest in brain-computer interface and auditory perception research. The current state-of-the-art approaches for decoding the attentional selection of listeners are based on linear mappings between features of sound streams and EEG responses (forward model), or vice versa (backward model). It has been shown that when the envelope of attended speech and EEG responses are used to derive such mapping functions, the model estimates can be used to discriminate between attended and unattended talkers. However, the predictive/reconstructive performance of the models depends on how the model parameters are estimated. A number of model estimation methods have been published, along with a variety of datasets. It is currently unclear whether any of these methods perform better than others, as they have not yet been compared side by side on a single standardized dataset in a controlled fashion. Here, we present a comparative study of the ability of different estimation methods to classify attended speakers from multi-channel EEG data. The performance of the model estimation methods is evaluated using different performance metrics on a set of labeled EEG data from 18 subjects listening to mixtures of two speech streams. We find that when forward models predict the EEG from the attended audio, regularized models do not improve regression or classification accuracies. When backward models decode the attended speech from the EEG, regularization provides higher regression and classification accuracies.
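The kind of comparison reported here can be illustrated by sweeping the regularization strength of a backward model and scoring each setting by cross-validated reconstruction correlation; the data, lag handling, and alpha grid below are placeholders, not the study's protocol, which compared several estimation methods beyond plain ridge.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
eeg = rng.standard_normal((20_000, 64))            # hypothetical 64-channel EEG (lag expansion omitted)
envelope = np.abs(rng.standard_normal(20_000))     # attended-speech envelope

# Backward model: compare regularization strengths by cross-validated reconstruction accuracy.
for alpha in [1e-2, 1e0, 1e2, 1e4, 1e6]:
    rs = []
    for tr, te in KFold(n_splits=5).split(eeg):
        model = Ridge(alpha=alpha).fit(eeg[tr], envelope[tr])
        rec = model.predict(eeg[te])
        rs.append(np.corrcoef(rec, envelope[te])[0, 1])
    print(f"alpha={alpha:g}: mean reconstruction r = {np.mean(rs):.3f}")
```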
Affiliation(s)
- Daniel D. E. Wong
- Laboratoire des Systèmes Perceptifs, CNRS, UMR 8248, Paris, France
- Département d'Études Cognitives, École Normale Supérieure, PSL Research University, Paris, France
- Søren A. Fuglsang
- Department of Electrical Engineering, Danmarks Tekniske Universitet, Kongens Lyngby, Denmark
- Jens Hjortkjær
- Department of Electrical Engineering, Danmarks Tekniske Universitet, Kongens Lyngby, Denmark
- Danish Research Centre for Magnetic Resonance, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
- Enea Ceolini
- Institute of Neuroinformatics, University of Zürich, Zurich, Switzerland
- Malcolm Slaney
- AI Machine Perception, Google, Mountain View, CA, United States
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, CNRS, UMR 8248, Paris, France
- Département d'Études Cognitives, École Normale Supérieure, PSL Research University, Paris, France
- Ear Institute, University College London, London, United Kingdom
10. Yi HG, Xie Z, Reetzke R, Dimakis AG, Chandrasekaran B. Vowel decoding from single-trial speech-evoked electrophysiological responses: A feature-based machine learning approach. Brain Behav 2017; 7:e00665. [PMID: 28638700] [PMCID: PMC5474698] [DOI: 10.1002/brb3.665]
Abstract
INTRODUCTION Scalp-recorded electrophysiological responses to complex, periodic auditory signals reflect phase-locked activity from neural ensembles within the auditory system. These responses, referred to as frequency-following responses (FFRs), have been widely utilized to index typical and atypical representation of speech signals in the auditory system. One of the major limitations of the FFR is the low signal-to-noise ratio at the level of single trials. For this reason, analysis relies on averaging across thousands of trials. The ability to examine the quality of single-trial FFRs would allow investigation of trial-by-trial dynamics of the FFR, which has been impossible with the averaging approach. METHODS In a novel, data-driven approach, we used machine learning principles to decode information related to the speech signal from single-trial FFRs. FFRs were collected from participants while they listened to two vowels produced by two speakers. Scalp-recorded electrophysiological responses were projected onto a low-dimensional spectral feature space independently derived from the same two vowels produced by 40 speakers whose productions were not presented to the participants. A supervised machine learning classifier was trained to discriminate vowel tokens on a subset of FFRs from each participant and tested on the remaining subset. RESULTS We demonstrate reliable decoding of speech signals at the level of single trials by decomposing the raw FFR based on information-bearing spectral features in the speech signal that were independently derived. CONCLUSIONS Taken together, the ability to extract interpretable features at the level of single trials in a data-driven manner offers uncharted possibilities in the noninvasive assessment of human auditory function.
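The two-stage approach (derive a low-dimensional spectral feature space from independent vowel recordings, project single-trial FFR spectra onto it, then train a classifier) might be sketched as follows; the sampling rate, trial counts, PCA dimensionality, and SVM choice are assumptions for illustration rather than the study's feature-derivation method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
fs, n_trials = 8000, 1000
trials = rng.standard_normal((n_trials, fs // 4))        # hypothetical single-trial FFRs (250 ms)
vowel = rng.integers(0, 2, size=n_trials)                 # two vowel tokens

# Independent reference set: spectra of the same two vowels from many other speakers,
# used only to define a low-dimensional spectral feature space (never presented to participants).
reference = np.abs(np.fft.rfft(rng.standard_normal((80, fs // 4)), axis=1))
basis = PCA(n_components=10).fit(reference)

# Project single-trial FFR spectra onto the externally derived basis, then classify.
spectra = np.abs(np.fft.rfft(trials, axis=1))
X = basis.transform(spectra)
acc = cross_val_score(SVC(kernel="linear"), X, vowel, cv=5).mean()
print(f"single-trial vowel decoding accuracy: {acc:.2f}")
```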
Affiliation(s)
- Han G Yi
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Zilong Xie
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Rachel Reetzke
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- Alexandros G Dimakis
- Department of Electrical and Computer Engineering, Cockrell School of Engineering, The University of Texas at Austin, Austin, TX, USA
- Bharath Chandrasekaran
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA; Department of Psychology, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA; Department of Linguistics, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA; Institute of Mental Health Research, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA; Institute for Neuroscience, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA
11. Mirkovic B, Bleichner MG, De Vos M, Debener S. Target Speaker Detection with Concealed EEG Around the Ear. Front Hum Neurosci 2016; 10:349.
Abstract
Target speaker identification is essential for speech enhancement algorithms in assistive devices aimed at helping the hearing impaired. Several recent studies have reported that target speaker identification is possible through electroencephalography (EEG) recordings. If the EEG system could be reduced to an acceptable size while retaining signal quality, hearing aids could benefit from integration with concealed EEG. To compare the performance of a multichannel around-the-ear EEG system with high-density cap EEG recordings, an envelope tracking algorithm was applied in a competitive speaker paradigm. Data from 20 normal-hearing listeners were collected concurrently from a traditional state-of-the-art laboratory wired EEG system and a wireless mobile EEG system with two bilaterally placed around-the-ear electrode arrays (cEEGrids). The results show that the cEEGrid ear-EEG technology captured neural signals that allowed identification of the attended speaker above chance level, with 69.3% accuracy, while cap-EEG signals resulted in an accuracy of 84.8%. Further analyses investigated the influence of ear-EEG signal quality and revealed that the envelope tracking procedure was unaffected by variability in channel impedances. We conclude that the quality of concealed ear-EEG recordings as acquired with the cEEGrid array has the potential to be used for brain-computer interface steering of hearing aids.
Affiliation(s)
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", Oldenburg, Germany
- Maarten De Vos
- Department of Engineering, Institute of Biomedical Engineering, University of Oxford, Oxford, UK
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", Oldenburg, Germany; Research Center Neurosensory Science, University of Oldenburg, Oldenburg, Germany
|
Herff C, Heger D, de Pesters A, Telaar D, Brunner P, Schalk G, Schultz T. Brain-to-text: decoding spoken phrases from phone representations in the brain. Front Neurosci 2015; 9:217. [PMID: 26124702 PMCID: PMC4464168 DOI: 10.3389/fnins.2015.00217] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2015] [Accepted: 05/18/2015] [Indexed: 11/24/2022] Open
Abstract
It has long been speculated whether communication between humans and machines based on natural-speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
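The Brain-To-Text pipeline couples phone-level neural models with ASR-style decoding over a pronunciation dictionary. The heavily simplified sketch below replaces the ASR machinery with a frame-wise phone classifier and a crude uniform alignment over a toy lexicon, so every component (features, labels, words) is an invented placeholder meant only to convey the idea, not the published system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_frames, n_feat, n_phones = 12_000, 100, 40
ecog = rng.standard_normal((n_frames, n_feat))           # hypothetical ECoG feature frames
frame_phone = rng.integers(0, n_phones, size=n_frames)   # frame-level phone labels (as from forced alignment)

# 1) Phone model: frame-wise phone posteriors from ECoG features (the ASR "acoustic model" analogue).
phone_model = LogisticRegression(max_iter=1000).fit(ecog[:10_000], frame_phone[:10_000])

# 2) Toy lexicon: each word is a phone-index sequence (stand-in for a pronunciation dictionary).
lexicon = {"yes": [35, 11, 28], "no": [22, 30], "stop": [28, 33, 1, 24]}

def score_word(log_post, phones):
    """Crude stand-in for ASR decoding: stretch the phone sequence uniformly over the segment."""
    idx = np.linspace(0, len(phones) - 1, num=len(log_post)).round().astype(int)
    return log_post[np.arange(len(log_post)), np.array(phones)[idx]].sum()

segment = ecog[10_000:10_040]                             # a 40-frame spoken segment to decode
log_post = phone_model.predict_log_proba(segment)
best = max(lexicon, key=lambda w: score_word(log_post, lexicon[w]))
print("decoded word:", best)
```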
Affiliation(s)
- Christian Herff
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Dominic Heger
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Adriana de Pesters
- New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA
- Dominic Telaar
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Peter Brunner
- New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Neurology, Albany Medical College, Albany, NY, USA
- Gerwin Schalk
- New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA; Department of Neurology, Albany Medical College, Albany, NY, USA
- Tanja Schultz
- Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany