1. Topolinski S, Vogel T, Ingendahl M. Can sequencing of articulation ease explain the in-out effect? A preregistered test. Cogn Emot 2024:1-11. PMID: 38465892. DOI: 10.1080/02699931.2024.2326072.
Abstract
Words whose consonantal articulation places move from the front of the mouth to the back (e.g. BADAKA; inward) receive more positive evaluations than words whose consonantal articulation places move from the back of the mouth to the front (e.g. KADABA; outward). This in-out effect has a variety of affective, cognitive, and even behavioural consequences, but its underlying mechanisms remain elusive. Most recently, a linguistic explanation has been proposed that applies the easy-first account and the so-called labial-coronal effect from developmental speech research and phonology to the in-out effect: labials (front) are easier to process than coronals (middle), and people prefer easy motor components followed by harder ones. Disentangling consonantal articulation direction and articulation place, the present three preregistered experiments (total N = 1012) found in-out effects for coronal-dorsal (back) and labial-dorsal articulation places. Critically, no in-out effect emerged for labial-coronal articulation places. Thus, the in-out effect is unlikely to be an instantiation of easy-first.
Affiliation(s)
- Tobias Vogel: Department of Social Sciences, Darmstadt University of Applied Sciences, Darmstadt, Germany
- Moritz Ingendahl: Department of Psychology, Ruhr University Bochum, Bochum, Germany

2. Meier A, Kuzdeba S, Jackson L, Daliri A, Tourville JA, Guenther FH, Greenlee JDW. Lateralization and Time-Course of Cortical Phonological Representations during Syllable Production. eNeuro 2023; 10:ENEURO.0474-22.2023. PMID: 37739786. PMCID: PMC10561542. DOI: 10.1523/eneuro.0474-22.2023.
Abstract
Spoken language contains information at a broad range of timescales, from phonetic distinctions on the order of milliseconds to semantic contexts that shift over seconds to minutes. It is not well understood how the brain's speech production systems combine features at these timescales into a coherent vocal output. We investigated the spatial and temporal representations in cerebral cortex of three phonological units with different durations: consonants, vowels, and syllables. Electrocorticography (ECoG) recordings were obtained from five participants while they spoke single syllables. We developed a novel clustering and Kalman filter-based trend analysis procedure to sort electrodes into temporal response profiles. A linear discriminant classifier was used to determine how strongly each electrode's response encoded phonological features. We found distinct time courses of phonological encoding that depended on unit duration: consonants were represented most strongly during speech preparation, vowels evenly throughout trials, and syllables during production. Locations of strongly speech-encoding electrodes (the top 30% of electrodes) likewise depended on phonological element duration, with consonant-encoding electrodes left-lateralized, vowel-encoding electrodes hemispherically balanced, and syllable-encoding electrodes right-lateralized. The lateralization of speech-encoding electrodes also depended on onset time, with electrodes active before or after speech production favoring the left hemisphere and those active during speech favoring the right. Single-electrode speech classification revealed cortical areas with preferential encoding of particular phonemic elements, including consonant encoding in the left precentral and postcentral gyri and syllable encoding in the right middle frontal gyrus. Our findings support neurolinguistic theories of left-hemisphere specialization for processing short-timescale linguistic units and right-hemisphere processing of longer-duration units.
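
As a concrete illustration of the per-electrode encoding analysis described above, the sketch below scores how strongly one electrode's response encodes a phonological label with a cross-validated linear discriminant classifier. The shapes, labels, and random data are placeholder assumptions, not the study's actual pipeline.

```python
# Hedged sketch: cross-validated LDA accuracy as an index of how strongly
# a single electrode's response encodes a phonological feature.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_timepoints = 200, 50
X = rng.normal(size=(n_trials, n_timepoints))  # one electrode's response time course per trial
y = rng.integers(0, 3, size=n_trials)          # assumed phonological label (e.g., consonant class)

# Accuracy above chance (here 1/3) suggests the electrode encodes the label.
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"electrode encoding score: {acc:.2f}")
```
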
Affiliation(s)
- Andrew Meier: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215
- Scott Kuzdeba: Graduate Program for Neuroscience, Boston University, Boston, MA 02215
- Liam Jackson: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215
- Ayoub Daliri: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215; College of Health Solutions, Arizona State University, Tempe, AZ 85004
- Jason A Tourville: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215
- Frank H Guenther: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215; Department of Biomedical Engineering, Boston University, Boston, MA 02215; Department of Radiology, Massachusetts General Hospital, Boston, MA; Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA
- Jeremy D W Greenlee: Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, IA 52242

3. Berezutskaya J, Freudenburg ZV, Vansteensel MJ, Aarnoutse EJ, Ramsey NF, van Gerven MAJ. Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. J Neural Eng 2023; 20:056010. PMID: 37467739. PMCID: PMC10510111. DOI: 10.1088/1741-2552/ace8be.
Abstract
Objective. Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver the best and most directly applicable results is crucial for advancing the field. Approach. In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task. Main results. We show that (1) dedicated machine learning optimization of reconstruction models is key for achieving the best reconstruction performance; (2) individual word decoding in reconstructed speech achieves 92%-100% accuracy (chance level is 8%); (3) direct reconstruction from sensorimotor brain activity produces intelligible speech. Significance. These results underline the need for model optimization in achieving the best speech-decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex can offer for the development of next-generation BCI technology for communication.
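
To make the reported word-identification evaluation concrete, here is a hedged sketch of one simple way to identify words in reconstructed speech: correlate each reconstructed spectrogram against per-word reference templates. The 12-word vocabulary matches the stated 8% chance level; everything else (shapes, data, the template-matching rule) is an illustrative assumption, not the authors' optimized deep-learning pipeline.

```python
# Hedged sketch: nearest-template word identification in reconstructed speech.
import numpy as np

rng = np.random.default_rng(1)
n_words, n_feats = 12, 40 * 100                  # 12-word vocabulary -> ~8% chance level
templates = rng.normal(size=(n_words, n_feats))  # reference spectrograms, flattened
recon = templates[3] + 0.5 * rng.normal(size=n_feats)  # noisy reconstruction of word 3

corrs = [np.corrcoef(recon, t)[0, 1] for t in templates]
print("decoded word index:", int(np.argmax(corrs)))  # 3 if the reconstruction is good enough
```
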
Affiliation(s)
- Julia Berezutskaya: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands; Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands
- Zachary V Freudenburg: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Mariska J Vansteensel: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Erik J Aarnoutse: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Nick F Ramsey: Brain Center, Department of Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht 3584 CX, The Netherlands
- Marcel A J van Gerven: Donders Center for Brain, Cognition and Behaviour, Nijmegen 6525 GD, The Netherlands

4. Thomas TM, Singh A, Bullock LP, Liang D, Morse CW, Scherschligt X, Seymour JP, Tandon N. Decoding articulatory and phonetic components of naturalistic continuous speech from the distributed language network. J Neural Eng 2023; 20:046030. PMID: 37487487. DOI: 10.1088/1741-2552/ace9fb.
Abstract
Objective. Speech production relies on a widely distributed brain network. However, research and development of speech brain-computer interfaces (speech-BCIs) has typically focused on decoding speech only from superficial subregions readily accessible by subdural grid arrays, typically placed over the sensorimotor cortex. Alternatively, stereo-electroencephalography (sEEG) enables access to distributed brain regions using multiple depth electrodes with lower surgical risks, especially in patients with brain injuries resulting in aphasia and other speech disorders. Approach. To investigate the decoding potential of widespread electrode coverage in multiple cortical sites, we used a naturalistic continuous speech production task. We obtained neural recordings using sEEG from eight participants while they read sentences aloud. We trained linear classifiers to decode distinct speech components (articulatory components and phonemes) solely from broadband gamma activity and evaluated decoding performance using nested five-fold cross-validation. Main results. We achieved an average classification accuracy of 18.7% across nine places of articulation (e.g. bilabials, palatals), 26.5% across five manner-of-articulation (MOA) labels (e.g. affricates, fricatives), and 4.81% across 38 phonemes. The highest classification accuracies achieved with a single large dataset were 26.3% for place of articulation, 35.7% for MOA, and 9.88% for phonemes. Electrodes that contributed high decoding power were distributed across multiple sulcal and gyral sites in both dominant and non-dominant hemispheres, including ventral sensorimotor, inferior frontal, superior temporal, and fusiform cortices. Rather than finding a distinct cortical locus for each speech component, we observed neural correlates of both articulatory and phonetic components in multiple hubs of a widespread language production network. Significance. These results reveal distributed cortical representations whose activity can enable decoding of speech components during continuous speech with this minimally invasive recording method, elucidating language neurobiology and neural targets for future speech-BCIs.
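
The nested five-fold cross-validation scheme mentioned above can be sketched compactly: an inner loop tunes the regularization of a linear classifier on broadband-gamma features, and an outer loop estimates decoding accuracy on held-out folds. The feature matrix, labels, and choice of logistic regression are illustrative assumptions, not the study's exact setup.

```python
# Hedged sketch of nested five-fold cross-validation for a linear decoder.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(450, 128))   # trials x (electrodes * broadband-gamma features)
y = rng.integers(0, 9, size=450)  # nine place-of-articulation labels -> ~11% chance

inner = GridSearchCV(LogisticRegression(max_iter=1000),
                     {"C": [0.01, 0.1, 1.0]}, cv=KFold(5))  # inner loop: hyperparameter tuning
acc = cross_val_score(inner, X, y, cv=KFold(5)).mean()      # outer loop: performance estimate
print(f"nested-CV accuracy: {acc:.3f}")
```
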
Affiliation(s)
- Tessy M Thomas: Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Aditya Singh: Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Latané P Bullock: Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Daniel Liang: Department of Computer Science, Rice University, Houston, TX 77005, USA
- Cale W Morse: Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Xavier Scherschligt: Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- John P Seymour: Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Department of Electrical & Computer Engineering, Rice University, Houston, TX 77005, USA
- Nitin Tandon: Vivian L. Smith Department of Neurosurgery, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Memorial Hermann Hospital, Texas Medical Center, Houston, TX 77030, USA

5. Meng K, Goodarzy F, Kim E, Park YJ, Kim JS, Cook MJ, Chung CK, Grayden DB. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J Neural Eng 2023; 20:046019. PMID: 37459853. DOI: 10.1088/1741-2552/ace7f6.
Abstract
Objective. Brain-computer interfaces can restore various forms of communication in paralyzed patients who have lost their ability to articulate intelligible speech. This study aimed to demonstrate the feasibility of closed-loop synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. Approach. Ten participants with intractable epilepsy were temporarily implanted with intracranial electrode arrays over cortical surfaces. A decoding model that predicted audible outputs directly from patient-specific neural feature inputs was trained during overt word reading and immediately tested with overt, mimed and imagined word reading. Predicted outputs were later assessed objectively against corresponding voice recordings and subjectively through human perceptual judgments. Main results. Artificial speech sounds were successfully synthesized during overt and mimed utterances by two participants with some coverage of the precentral gyrus. About a third of these sounds were correctly identified by naïve listeners in two-alternative forced-choice tasks. A similar outcome could not be achieved during imagined utterances by any of the participants. However, neural feature contribution analyses suggested the presence of exploitable activation patterns during imagined speech in the postcentral gyrus and the superior temporal gyrus. In future work, more comprehensive coverage of cortical surfaces, including posterior parts of the middle frontal gyrus and the inferior frontal gyrus, could improve synthesis performance during imagined speech. Significance. As the field of speech neuroprostheses moves rapidly toward clinical trials, this study addressed important considerations about task instructions and brain coverage when conducting research on silent speech with non-target participants.
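
The train-on-overt, test-on-silent logic of this design can be illustrated with a minimal regression sketch: fit a mapping from neural feature frames to spectrogram frames on overt speech, then apply it to mimed trials whose output would be vocoded into audio. The ridge regressor, shapes, and random data are placeholder assumptions, not the study's decoding model.

```python
# Hedged sketch: regression from neural features to audio features,
# trained on overt speech and applied to silent (mimed) trials.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X_overt = rng.normal(size=(5000, 64))  # neural feature frames during overt reading
Y_overt = rng.normal(size=(5000, 40))  # time-aligned mel-spectrogram frames
X_mimed = rng.normal(size=(1000, 64))  # neural frames from mimed trials

model = Ridge(alpha=1.0).fit(X_overt, Y_overt)
Y_pred = model.predict(X_mimed)        # frames to be vocoded into audible output
print(Y_pred.shape)
```
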
Affiliation(s)
- Kevin Meng: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia; Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia
- Farhad Goodarzy: Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
- EuiYoung Kim: Interdisciplinary Program in Neuroscience, Seoul National University, Seoul, Republic of Korea
- Ye Jin Park: Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea
- June Sic Kim: Research Institute of Basic Sciences, Seoul National University, Seoul, Republic of Korea
- Mark J Cook: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia; Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia; Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia
- Chun Kee Chung: Department of Brain and Cognitive Sciences, Seoul National University, Seoul, Republic of Korea; Department of Neurosurgery, Seoul National University Hospital, Seoul, Republic of Korea
- David B Grayden: Department of Biomedical Engineering, The University of Melbourne, Melbourne, Australia; Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Australia; Department of Medicine, St Vincent's Hospital, The University of Melbourne, Melbourne, Australia

6. Verwoert M, Ottenhoff MC, Goulis S, Colon AJ, Wagner L, Tousseyn S, van Dijk JP, Kubben PL, Herff C. Dataset of Speech Production in intracranial Electroencephalography. Sci Data 2022; 9:434. PMID: 35869138. PMCID: PMC9307753. DOI: 10.1038/s41597-022-01542-9.
Abstract
Speech production is an intricate process involving a large number of muscles and cognitive processes. The neural processes underlying speech production are not completely understood. As speech is a uniquely human ability, it cannot be investigated in animal models. High-fidelity human data can only be obtained in clinical settings and are therefore not easily available to all researchers. Here, we provide a dataset of 10 participants reading out individual words while we measured intracranial EEG from a total of 1103 electrodes. The data, with their high temporal resolution and coverage of a large variety of cortical and sub-cortical brain regions, can help in better understanding the speech production process. The data can also be used to test speech decoding and synthesis approaches from neural data to develop speech brain-computer interfaces and speech neuroprostheses.
Measurement(s): Brain activity
Technology Type(s): Stereotactic electroencephalography
Sample Characteristic - Organism: Homo sapiens
Sample Characteristic - Environment: Epilepsy monitoring center
Sample Characteristic - Location: The Netherlands
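
A typical first step with such an intracranial dataset is extracting the high-gamma envelope on which most speech-decoding work builds. The sketch below shows that step under stated assumptions: the sampling rate, band edges, channel count, and random signals are illustrative and not taken from the dataset's documentation.

```python
# Hedged sketch: band-pass to the high-gamma range, then take the
# analytic-signal envelope as a decoding feature.
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

fs = 1024                            # assumed sampling rate (Hz)
ieeg = np.random.randn(16, fs * 10)  # placeholder: channels x samples (full dataset has 1103 electrodes)

sos = butter(4, [70, 170], btype="bandpass", fs=fs, output="sos")
high_gamma = np.abs(hilbert(sosfiltfilt(sos, ieeg, axis=1), axis=1))
print(high_gamma.shape)              # one envelope per channel
```
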

7. Metzger SL, Liu JR, Moses DA, Dougherty ME, Seaton MP, Littlejohn KT, Chartier J, Anumanchipalli GK, Tu-Chan A, Ganguly K, Chang EF. Generalizable spelling using a speech neuroprosthesis in an individual with severe limb and vocal paralysis. Nat Commun 2022; 13:6510. PMID: 36347863. PMCID: PMC9643551. DOI: 10.1038/s41467-022-33611-3.
Abstract
Neuroprostheses have the potential to restore communication to people who cannot speak or type due to paralysis. However, it is unclear if silent attempts to speak can be used to control a communication neuroprosthesis. Here, we translated direct cortical signals in a clinical-trial participant (ClinicalTrials.gov; NCT03698149) with severe limb and vocal-tract paralysis into single letters to spell out full sentences in real time. We used deep-learning and language-modeling techniques to decode letter sequences as the participant attempted to silently spell using code words that represented the 26 English letters (e.g. "alpha" for "a"). We leveraged broad electrode coverage beyond speech-motor cortex to include supplemental control signals from hand cortex and complementary information from low- and high-frequency signal components to improve decoding accuracy. We decoded sentences using words from a 1,152-word vocabulary at a median character error rate of 6.13% and speed of 29.4 characters per minute. In offline simulations, we showed that our approach generalized to large vocabularies containing over 9,000 words (median character error rate of 8.23%). These results illustrate the clinical viability of a silently controlled speech neuroprosthesis to generate sentences from a large vocabulary through a spelling-based approach, complementing previous demonstrations of direct full-word decoding.
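
The character error rate (CER) reported above is the Levenshtein edit distance between decoded and reference character sequences, normalized by reference length. A minimal implementation of this standard metric (not code from the study) follows.

```python
# Standard character error rate via Levenshtein edit distance.
def cer(ref: str, hyp: str) -> float:
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(cer("hello world", "hxllo world"))  # 1 substitution / 11 chars = 0.0909...
```
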
Affiliation(s)
- Sean L. Metzger: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA; University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA
- Jessie R. Liu: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA; University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA
- David A. Moses: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Maximilian E. Dougherty: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Margaret P. Seaton: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Kaylo T. Littlejohn: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Josh Chartier: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Gopala K. Anumanchipalli: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Adelyn Tu-Chan: Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Karunesh Ganguly: Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Edward F. Chang: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA; Weill Institute for Neuroscience, University of California, San Francisco, San Francisco, CA, USA; University of California, Berkeley - University of California, San Francisco Graduate Program in Bioengineering, Berkeley, CA, USA

8. Cooney C, Folli R, Coyle D. Opportunities, pitfalls and trade-offs in designing protocols for measuring the neural correlates of speech. Neurosci Biobehav Rev 2022; 140:104783. PMID: 35907491. DOI: 10.1016/j.neubiorev.2022.104783.
Abstract
Research on decoding speech and speech-related processes directly from the human brain has intensified in recent years, as such a decoder has the potential to positively impact people with limited communication capacity due to disease or injury. Additionally, it can enable entirely new forms of human-computer interaction and human-machine communication in general and facilitate better neuroscientific understanding of speech processes. Here, we synthesize the literature on neural speech decoding pertaining to how speech decoding experiments have been conducted, coalescing around a necessity for thoughtful experimental design aimed at specific research goals and robust procedures for evaluating speech decoding paradigms. We examine the use of different modalities for presenting stimuli to participants, methods for constructing paradigms including timings and speech rhythms, and possible linguistic considerations. In addition, novel methods for eliciting naturalistic speech and validating imagined speech task performance in experimental settings are presented based on recent research. We also describe the multitude of terms used to instruct participants on how to produce imagined speech during experiments and propose methods for investigating the effect of these terms on imagined speech decoding. We demonstrate that the range of experimental procedures used in neural speech decoding studies can have unintended consequences that affect the validity of the knowledge obtained. The review delineates the strengths and weaknesses of present approaches and proposes methodological advances that we anticipate will enhance experimental design and progress toward the optimal design of movement-independent direct speech brain-computer interfaces.
Affiliation(s)
- Ciaran Cooney: Intelligent Systems Research Centre, Ulster University, Derry, UK
- Raffaella Folli: Institute for Research in Social Sciences, Ulster University, Jordanstown, UK
- Damien Coyle: Intelligent Systems Research Centre, Ulster University, Derry, UK

9. Jeong JH, Cho JH, Lee YE, Lee SH, Shin GH, Kweon YS, Millán JDR, Müller KR, Lee SW. 2020 International brain-computer interface competition: A review. Front Hum Neurosci 2022; 16:898300. PMID: 35937679. PMCID: PMC9354666. DOI: 10.3389/fnhum.2022.898300.
Abstract
The brain-computer interface (BCI) has been investigated as a communication tool between the brain and external devices, and BCIs have been extended beyond communication and control over the years. The 2020 international BCI competition aimed to provide high-quality, openly accessible neuroscientific data that could be used to evaluate the current degree of technical advances in BCI. Although a variety of challenges remain for future BCI advances, we discuss some of the more recent application directions: (i) few-shot EEG learning, (ii) micro-sleep detection, (iii) imagined speech decoding, (iv) cross-session classification, and (v) EEG (+ ear-EEG) detection in an ambulatory environment. Not only did scientists from the BCI field compete; scholars with a broad variety of backgrounds and nationalities also participated to address these challenges. Each dataset was prepared and split into three parts, released to the competitors as training and validation sets followed by a test set. Remarkable BCI advances were identified through the 2020 competition, indicating several trends of interest to BCI researchers.
Affiliation(s)
- Ji-Hoon Jeong: School of Computer Science, Chungbuk National University, Cheongju, South Korea
- Jeong-Hyun Cho: Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea
- Young-Eun Lee: Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea
- Seo-Hyun Lee: Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea
- Gi-Hwan Shin: Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea
- Young-Seok Kweon: Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea
- José del R. Millán: Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA
- Klaus-Robert Müller: Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea; Machine Learning Group, Department of Computer Science, Berlin Institute of Technology, Berlin, Germany; Max Planck Institute for Informatics, Saarbrücken, Germany; Department of Artificial Intelligence, Korea University, Seoul, South Korea
- Seong-Whan Lee: Department of Brain and Cognitive Engineering, Korea University, Seoul, South Korea; Department of Artificial Intelligence, Korea University, Seoul, South Korea

10. Liu YP, Gong AM, Ding P, Zhao L, Qian Q, Zhou JH, Su L, Fu YF. [Key technology of brain-computer interaction based on speech imagery]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi (Journal of Biomedical Engineering) 2022; 39:596-611. PMID: 35788530. PMCID: PMC10950764. DOI: 10.7507/1001-5515.202107018.
Abstract
Speech expression is an important high-level cognitive behavior of human beings, and its realization is closely related to brain activity. Both overt speech and speech imagery activate some of the same brain areas; speech imagery has therefore become a new paradigm of brain-computer interaction. Brain-computer interfaces (BCIs) based on speech imagery have the advantages of spontaneous generation, no training requirement, and friendliness to subjects, so they have attracted the attention of many scholars. However, this interaction technology is not yet mature in the design of experimental paradigms and the choice of imagery materials, and many issues remain to be addressed. In response to these problems, this article first expounds the neural mechanism of speech imagery. Then, by reviewing previous BCI research on speech imagery, the mainstream methods and core technologies of experimental paradigms, imagery materials, data processing, and so on are systematically analyzed. Finally, the key problems and main challenges that restrict the development of this type of BCI are discussed, and the future development and application prospects of speech imagery BCI systems are considered.
Affiliation(s)
- Liu Yanpeng: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Gong Anmin: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Ding Peng: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Zhao Lei: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Qian Qian: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Zhou Jianhua: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Su Lei: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China
- Fu Yunfa: School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, P. R. China; Brain Cognition and Brain-Computer Intelligence Integration Group, Kunming University of Science and Technology, Kunming 650500, P. R. China; College of Information Engineering, Engineering University of PAP, Xi'an 710000, P. R. China; Faculty of Science, Kunming University of Science and Technology, Kunming 650500, P. R. China

11. Wilson BS, Tucci DL, Moses DA, Chang EF, Young NM, Zeng FG, Lesica NA, Bur AM, Kavookjian H, Mussatto C, Penn J, Goodwin S, Kraft S, Wang G, Cohen JM, Ginsburg GS, Dawson G, Francis HW. Harnessing the Power of Artificial Intelligence in Otolaryngology and the Communication Sciences. J Assoc Res Otolaryngol 2022; 23:319-349. PMID: 35441936. PMCID: PMC9086071. DOI: 10.1007/s10162-022-00846-2.
Abstract
Use of artificial intelligence (AI) is a burgeoning field in otolaryngology and the communication sciences. A virtual symposium on the topic was convened from Duke University on October 26, 2020, and was attended by more than 170 participants worldwide. This review presents summaries of all but one of the talks presented during the symposium; recordings of all the talks, along with their discussions, are available at https://www.youtube.com/watch?v=ktfewrXvEFg and https://www.youtube.com/watch?v=-gQ5qX2v3rg. Each summary is about 2,500 words in length and includes two figures. This level of detail far exceeds the brief summaries presented in traditional reviews and thus provides a more informed glimpse into the power and diversity of current AI applications in otolaryngology and the communication sciences, and into how to harness that power for future applications.
Affiliation(s)
- Blake S. Wilson: Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710, USA; Duke Hearing Center, Duke University School of Medicine, Durham, NC 27710, USA; Department of Electrical & Computer Engineering, Duke University, Durham, NC 27708, USA; Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA; Department of Otolaryngology - Head & Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Debara L. Tucci: Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710, USA; National Institute on Deafness and Other Communication Disorders, National Institutes of Health, Bethesda, MD 20892, USA
- David A. Moses: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143, USA; UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94117, USA
- Edward F. Chang: Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143, USA; UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94117, USA
- Nancy M. Young: Division of Otolaryngology, Ann and Robert H. Lurie Children's Hospital of Chicago, Chicago, IL 60611, USA; Department of Otolaryngology - Head and Neck Surgery, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA; Department of Communication, Knowles Hearing Center, Northwestern University, Evanston, IL 60208, USA
- Fan-Gang Zeng: Center for Hearing Research, University of California, Irvine, Irvine, CA 92697, USA; Department of Anatomy and Neurobiology, University of California, Irvine, Irvine, CA 92697, USA; Department of Biomedical Engineering, University of California, Irvine, Irvine, CA 92697, USA; Department of Cognitive Sciences, University of California, Irvine, Irvine, CA 92697, USA; Department of Otolaryngology - Head and Neck Surgery, University of California, Irvine, CA 92697, USA
- Nicholas A. Lesica: UCL Ear Institute, University College London, London WC1X 8EE, UK
- Andrés M. Bur: Department of Otolaryngology - Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Hannah Kavookjian: Department of Otolaryngology - Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Caroline Mussatto: Department of Otolaryngology - Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Joseph Penn: Department of Otolaryngology - Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Sara Goodwin: Department of Otolaryngology - Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Shannon Kraft: Department of Otolaryngology - Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Guanghui Wang: Department of Computer Science, Ryerson University, Toronto, ON M5B 2K3, Canada
- Jonathan M. Cohen: Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710, USA; ENT Department, Kaplan Medical Center, Rehovot 7661041, Israel
- Geoffrey S. Ginsburg: Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA; MEDx (Medicine & Engineering at Duke), Duke University, Durham, NC 27708, USA; Center for Applied Genomics & Precision Medicine, Duke University School of Medicine, Durham, NC 27710, USA; Department of Medicine, Duke University School of Medicine, Durham, NC 27710, USA; Department of Pathology, Duke University School of Medicine, Durham, NC 27710, USA; Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Geraldine Dawson: Duke Institute for Brain Sciences, Duke University, Durham, NC 27710, USA; Duke Center for Autism and Brain Development, Duke University School of Medicine and the Duke Institute for Brain Sciences, NIH Autism Center of Excellence, Durham, NC 27705, USA; Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC 27701, USA
- Howard W. Francis: Department of Head and Neck Surgery & Communication Sciences, Duke University School of Medicine, Durham, NC 27710, USA

12. Zhang L, Du Y. Lip movements enhance speech representations and effective connectivity in auditory dorsal stream. Neuroimage 2022; 257:119311. PMID: 35589000. DOI: 10.1016/j.neuroimage.2022.119311.
Abstract
Viewing a speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed the question by quantifying regional multivariate representations and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and the supramarginal gyrus (SMG). Moreover, neural representations of place-of-articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area and place of articulation better encoded in left ventral premotor cortex and SMG. Dynamic causal modeling (DCM) analysis then showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Moreover, the neurite orientation dispersion of the left arcuate fasciculus, the structural backbone of the auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide novel insight for speech science: lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway, and this functional enhancement is mediated by the microstructural architecture of the circuit.
Affiliation(s)
- Lei Zhang: CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
- Yi Du: CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China; Chinese Institute for Brain Research, Beijing 102206, China

13. Tamura S, Hirose N, Mitsudo T, Hoaki N, Nakamura I, Onitsuka T, Hirano Y. Multi-modal imaging of the auditory-larynx motor network for voicing perception. Neuroimage 2022; 251:118981. PMID: 35150835. DOI: 10.1016/j.neuroimage.2022.118981.
Abstract
Voicing is one of the most important characteristics of phonetic speech sounds. Despite its importance, voicing perception mechanisms remain largely unknown. To explore auditory-motor networks associated with voicing perception, we first examined the brain regions that showed common activity during voicing production and perception using functional magnetic resonance imaging. Results indicated that the auditory and speech motor areas, together with the operculum parietale 4 (OP4), were activated during both voicing production and perception. Second, we used magnetoencephalography to examine the dynamic functional connectivity of the auditory-motor networks during a perceptual categorization task on a /da/-/ta/ continuum varying in voice onset time (VOT) from 0 to 40 ms in 10 ms steps. Significant functional connectivity from the auditory cortical regions to the larynx motor area via OP4 was observed only when perceiving the stimulus with a VOT of 30 ms. In addition, regional activity analysis showed that the neural representation of VOT in the auditory cortical regions correlated closely with categorical perception of voicing but did not reflect the perception of the stimulus with a VOT of 30 ms. We suggest that the larynx motor area, which is considered to play a crucial role in voicing production, contributes to categorical perception of voicing by complementing the temporal processing in the auditory cortical regions.
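
Categorical perception along a VOT continuum, as probed here, is conventionally summarized by fitting a logistic psychometric function to the categorization rates and reading off the category boundary. The sketch below does this with invented response proportions; only the 0-40 ms VOT steps come from the abstract.

```python
# Hedged sketch: logistic psychometric fit to /ta/-response rates over VOT.
import numpy as np
from scipy.optimize import curve_fit

vot = np.array([0.0, 10.0, 20.0, 30.0, 40.0])    # ms, from the stimulus continuum
p_ta = np.array([0.02, 0.08, 0.45, 0.90, 0.98])  # invented proportions of /ta/ responses

def psychometric(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))   # boundary x0, slope k

(x0, k), _ = curve_fit(psychometric, vot, p_ta, p0=[20.0, 0.3])
print(f"category boundary ~ {x0:.1f} ms VOT")
```
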
Affiliation(s)
- Shunsuke Tamura: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Nobuyuki Hirose: Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
- Takako Mitsudo: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Itta Nakamura: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Toshiaki Onitsuka: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan
- Yoji Hirano: Department of Neuropsychiatry, Graduate School of Medical Sciences, Kyushu University, 3-1-1 Maidashi, Higashiku, Fukuoka 812-8582, Japan; Neural Dynamics Laboratory, Research Service, VA Boston Healthcare System, and Department of Psychiatry, Harvard Medical School, Boston, MA, USA

14. Huggins JE, Krusienski D, Vansteensel MJ, Valeriani D, Thelen A, Stavisky S, Norton JJS, Nijholt A, Müller-Putz G, Kosmyna N, Korczowski L, Kapeller C, Herff C, Halder S, Guger C, Grosse-Wentrup M, Gaunt R, Dusang AN, Clisson P, Chavarriaga R, Anderson CW, Allison BZ, Aksenova T, Aarnoutse E. Workshops of the Eighth International Brain-Computer Interface Meeting: BCIs: The Next Frontier. Brain-Computer Interfaces 2022; 9:69-101. PMID: 36908334. PMCID: PMC9997957. DOI: 10.1080/2326263x.2021.2009654.
Abstract
The Eighth International Brain-Computer Interface (BCI) Meeting was held June 7-9, 2021 in a virtual format. The conference continued the BCI Meeting series' interactive nature with 21 workshops covering topics in BCI (also called brain-machine interface) research. As in the past, the workshops covered the breadth of topics in BCI. Some provided detailed examinations of specific methods, hardware, or processes. Others focused on specific BCI applications or user groups. Several workshops continued consensus-building efforts designed to create BCI standards and to increase the ease of comparison between studies and the potential for meta-analysis and large multi-site clinical trials. Ethical and translational considerations were the primary topic of some workshops and an important secondary consideration for others. The range of BCI applications continues to expand, with more workshops focusing on approaches that can extend beyond the needs of those with physical impairments. This paper summarizes each workshop, provides background information and references for further study, presents an overview of the discussion topics, and describes the conclusions, challenges, or initiatives that resulted from the interactions and discussion at the workshops.
Affiliation(s)
- Jane E Huggins: Department of Physical Medicine and Rehabilitation, Department of Biomedical Engineering, and Neuroscience Graduate Program, University of Michigan, Ann Arbor, MI, USA
- Dean Krusienski: Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA 23219, USA
- Mariska J Vansteensel: UMC Utrecht Brain Center, Department of Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
- Antonia Thelen: eemagine Medical Imaging Solutions GmbH, Berlin, Germany
- James J S Norton: National Center for Adaptive Neurotechnologies, US Department of Veterans Affairs, 113 Holland Ave, Albany, NY 12208, USA
- Anton Nijholt: Faculty EEMCS, University of Twente, Enschede, The Netherlands
- Gernot Müller-Putz: Institute of Neural Engineering, GrazBCI Lab, Graz University of Technology, Stremayrgasse 16/4, 8010 Graz, Austria
- Nataliya Kosmyna: Massachusetts Institute of Technology (MIT), Media Lab, Cambridge, MA 02139, USA
- Christian Herff: School of Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Christoph Guger: g.tec medical engineering GmbH / Guger Technologies OG, Sierningstrasse 14, 4521 Schiedlberg, Austria
- Moritz Grosse-Wentrup: Research Group Neuroinformatics, Faculty of Computer Science, Vienna Cognitive Science Hub, Data Science @ Uni Vienna, University of Vienna, Vienna, Austria
- Robert Gaunt: Rehab Neural Engineering Labs, Department of Physical Medicine and Rehabilitation, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Aliceson Nicole Dusang: Department of Electrical and Computer Engineering, School of Engineering, and Carney Institute for Brain Science, Brown University, Providence, RI, USA; Center for Neurorestoration and Neurotechnology, Rehabilitation R&D Service, Department of Veterans Affairs Medical Center, Providence, RI, USA; Center for Neurotechnology and Neurorecovery, Neurology, Massachusetts General Hospital, Boston, MA, USA
- Ricardo Chavarriaga: IEEE Standards Association Industry Connections group on neurotechnologies for brain-machine interface; Center for Artificial Intelligence, School of Engineering, ZHAW-Zurich University of Applied Sciences, Switzerland
- Charles W Anderson: Department of Computer Science and Molecular, Cellular and Integrative Neuroscience Program, Colorado State University, Fort Collins, CO 80523, USA
- Brendan Z Allison: Department of Cognitive Science, University of California at San Diego, La Jolla, CA, USA
- Tetiana Aksenova: University Grenoble Alpes, CEA, LETI, Clinatec, Grenoble 38000, France
- Erik Aarnoutse: UMC Utrecht Brain Center, Department of Neurology & Neurosurgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands

15. Luo S, Rabbani Q, Crone NE. Brain-Computer Interface: Applications to Speech Decoding and Synthesis to Augment Communication. Neurotherapeutics 2022; 19:263-273. PMID: 35099768. PMCID: PMC9130409. DOI: 10.1007/s13311-022-01190-2.
Abstract
Damage or degeneration of motor pathways necessary for speech and other movements, as in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient communication without affecting brain structures responsible for language or cognition. In the worst-case scenario, this can result in locked-in syndrome (LIS), a condition in which individuals cannot initiate communication and can express themselves only by answering yes/no questions with eye blinks or other rudimentary movements. Existing augmentative and alternative communication (AAC) devices that rely on eye tracking can improve the quality of life for people with this condition, but brain-computer interfaces (BCIs) are also increasingly being investigated as AAC devices, particularly when eye tracking is too slow or unreliable. Moreover, with recent and ongoing advances in machine learning and neural recording technologies, BCIs may offer the only means to go beyond cursor control and text generation on a computer and allow real-time synthesis of speech, which would arguably offer the most efficient and expressive channel for communication. The potential for BCI speech synthesis has only recently been realized because of seminal studies of the neuroanatomical and neurophysiological underpinnings of speech production using intracranial electrocorticographic (ECoG) recordings in patients undergoing epilepsy surgery. These studies have shown that cortical areas responsible for vocalization and articulation are distributed over a large area of ventral sensorimotor cortex, and that it is possible to decode speech and reconstruct its acoustics from ECoG if these areas are recorded with sufficiently dense and comprehensive electrode arrays. In this article, we review these advances, including the latest neural decoding strategies, which range from deep-learning models to the direct concatenation of speech units. We also discuss state-of-the-art vocoders that are integral to constructing natural-sounding audio waveforms for speech BCIs. Finally, this review outlines some of the challenges ahead in directly synthesizing speech for patients with LIS.
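
As a toy illustration of the vocoding step this review highlights, the sketch below converts a magnitude spectrogram back into a waveform with Griffin-Lim, a classical stand-in for the neural vocoders discussed; in a speech BCI the spectrogram would come from the neural decoder rather than from real audio. All signals here are synthetic placeholders.

```python
# Hedged sketch: spectrogram-to-waveform conversion with Griffin-Lim.
import numpy as np
import librosa

sr, n_fft, hop = 16000, 512, 128
wave = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # placeholder "speech" (a pure tone)
spec = np.abs(librosa.stft(wave, n_fft=n_fft, hop_length=hop))

# In a speech BCI, `spec` would be produced by the neural decoder instead.
audio = librosa.griffinlim(spec, n_iter=32, hop_length=hop)
print(audio.shape)
```
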
Affiliation(s)
- Shiyu Luo: Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Qinwan Rabbani: Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Nathan E Crone: Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA

16. Moses DA, Metzger SL, Liu JR, Anumanchipalli GK, Makin JG, Sun PF, Chartier J, Dougherty ME, Liu PM, Abrams GM, Tu-Chan A, Ganguly K, Chang EF. Neuroprosthesis for Decoding Speech in a Paralyzed Person with Anarthria. N Engl J Med 2021; 385:217-227. PMID: 34260835. PMCID: PMC8972947. DOI: 10.1056/nejmoa2027540.
Abstract
BACKGROUND Technology to restore the ability to communicate in paralyzed persons who cannot speak has the potential to improve autonomy and quality of life. An approach that decodes words and sentences directly from the cerebral cortical activity of such patients may represent an advancement over existing methods for assisted communication. METHODS We implanted a subdural, high-density, multielectrode array over the area of the sensorimotor cortex that controls speech in a person with anarthria (the loss of the ability to articulate speech) and spastic quadriparesis caused by a brain-stem stroke. Over the course of 48 sessions, we recorded 22 hours of cortical activity while the participant attempted to say individual words from a vocabulary set of 50 words. We used deep-learning algorithms to create computational models for the detection and classification of words from patterns in the recorded cortical activity. We applied these computational models, as well as a natural-language model that yielded next-word probabilities given the preceding words in a sequence, to decode full sentences as the participant attempted to say them. RESULTS We decoded sentences from the participant's cortical activity in real time at a median rate of 15.2 words per minute, with a median word error rate of 25.6%. In post hoc analyses, we detected 98% of the attempts by the participant to produce individual words, and we classified words with 47.1% accuracy using cortical signals that were stable throughout the 81-week study period. CONCLUSIONS In a person with anarthria and spastic quadriparesis caused by a brain-stem stroke, words and sentences were decoded directly from cortical activity during attempted speech with the use of deep-learning models and a natural-language model. (Funded by Facebook and others; ClinicalTrials.gov number, NCT03698149.).
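The study's combination of a word classifier with a natural-language model can be illustrated with a toy sketch: per-word classifier log-likelihoods are combined with a bigram language model's next-word log-probabilities. This is a greedy simplification (the actual system searched over whole sentences), and the vocabulary and probabilities below are hypothetical.

```python
# Toy sketch of combining word-classifier likelihoods with a language
# model's next-word probabilities (not the authors' implementation).
import numpy as np

rng = np.random.default_rng(1)
vocab = ["i", "am", "thirsty", "hello", "goodbye"]
V = len(vocab)

# Stand-in for the natural-language model: log P(next word | previous word).
bigram = rng.dirichlet(np.ones(V), size=V)   # rows sum to 1
log_bigram = np.log(bigram)

def decode_sentence(neural_logps):
    """Greedily combine per-word classifier log-likelihoods with the
    language model's next-word log-probabilities (Bayes rule in log space)."""
    words, prev = [], None
    for logp in neural_logps:                # one vector per attempted word
        prior = np.zeros(V) if prev is None else log_bigram[prev]
        prev = int(np.argmax(logp + prior))
        words.append(vocab[prev])
    return " ".join(words)

fake_neural_logps = rng.standard_normal((3, V))   # 3 attempted words (synthetic)
print(decode_sentence(fake_neural_logps))
```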
Affiliation(s)
- David A Moses, Sean L Metzger, Jessie R Liu, Gopala K Anumanchipalli, Joseph G Makin, Pengfei F Sun, Josh Chartier, Maximilian E Dougherty, Patricia M Liu, Gary M Abrams, Adelyn Tu-Chan, Karunesh Ganguly, Edward F Chang
- From the Department of Neurological Surgery (D.A.M., S.L.M., J.R.L., G.K.A., J.G.M., P.F.S., J.C., M.E.D., E.F.C.), the Weill Institute for Neuroscience (D.A.M., S.L.M., J.R.L., G.K.A., J.G.M., P.F.S., J.C., K.G., E.F.C.), and the Departments of Rehabilitation Services (P.M.L.) and Neurology (G.M.A., A.T.-C., K.G.), University of California, San Francisco (UCSF), San Francisco, and the Graduate Program in Bioengineering, University of California, Berkeley-UCSF, Berkeley (S.L.M., J.R.L., E.F.C.)

17
Wilson GH, Stavisky SD, Willett FR, Avansino DT, Kelemen JN, Hochberg LR, Henderson JM, Druckmann S, Shenoy KV. Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus. J Neural Eng 2020; 17:066007. [PMID: 33236720 PMCID: PMC8293867 DOI: 10.1088/1741-2552/abbfef] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Abstract
OBJECTIVE To evaluate the potential of intracortical electrode array signals for brain-computer interfaces (BCIs) to restore lost speech, we measured the performance of decoders trained to discriminate a comprehensive basis set of 39 English phonemes and to synthesize speech sounds via a neural pattern matching method. We decoded neural correlates of spoken-out-loud words in the 'hand knob' area of precentral gyrus, a step toward the eventual goal of decoding attempted speech from ventral speech areas in patients who are unable to speak. APPROACH Neural and audio data were recorded while two BrainGate2 pilot clinical trial participants, each with two chronically-implanted 96-electrode arrays, spoke 420 different words that broadly sampled English phonemes. Phoneme onsets were identified from audio recordings, and their identities were then classified from neural features consisting of each electrode's binned action potential counts or high-frequency local field potential power. Speech synthesis was performed using the 'Brain-to-Speech' pattern matching method. We also examined two potential confounds specific to decoding overt speech: acoustic contamination of neural signals and systematic differences in labeling different phonemes' onset times. MAIN RESULTS A linear decoder achieved up to 29.3% classification accuracy (chance = 6%) across 39 phonemes, while an RNN classifier achieved 33.9% accuracy. Parameter sweeps indicated that performance did not saturate when adding more electrodes or more training data, and that accuracy improved when utilizing time-varying structure in the data. Microphonic contamination and phoneme onset differences modestly increased decoding accuracy, but could be mitigated by acoustic artifact subtraction and using a neural speech onset marker, respectively. Speech synthesis achieved r = 0.523 correlation between true and reconstructed audio. SIGNIFICANCE The ability to decode speech using intracortical electrode array signals from a nontraditional speech area suggests that placing electrode arrays in ventral speech areas is a promising direction for speech BCIs.
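As a rough illustration of the linear-decoder setup described above (not the authors' code), the sketch below trains a linear discriminant classifier on synthetic "binned neural features" for a 39-class phoneme problem; real inputs would be binned threshold crossings or high-frequency LFP power per electrode.

```python
# Minimal sketch of phoneme classification from binned neural features
# (synthetic data; dimensions are illustrative assumptions).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_phonemes, trials_per_class = 39, 10
n_electrodes, n_bins = 96, 5

labels = np.repeat(np.arange(n_phonemes), trials_per_class)
# Weak class-specific means so decoding lands above chance.
class_means = 0.5 * rng.standard_normal((n_phonemes, n_electrodes * n_bins))
X = class_means[labels] + rng.standard_normal((labels.size, n_electrodes * n_bins))

clf = LinearDiscriminantAnalysis()
acc = cross_val_score(clf, X, labels, cv=5).mean()
print(f"cross-validated accuracy: {acc:.1%} (chance ~{1 / n_phonemes:.1%})")
```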
Affiliation(s)
- Guy H Wilson, Neurosciences Graduate Program, Stanford University, Stanford, CA, United States of America
- Sergey D Stavisky, Department of Neurosurgery, Stanford University, Stanford, CA, United States of America; Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America; Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America
- Francis R Willett, Department of Neurosurgery, Stanford University, Stanford, CA, United States of America; Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America; Howard Hughes Medical Institute at Stanford University, Stanford, CA, United States of America
- Donald T Avansino, Department of Neurosurgery, Stanford University, Stanford, CA, United States of America
- Jessica N Kelemen, Department of Neurology, Harvard Medical School, Boston, MA, United States of America
- Leigh R Hochberg, Department of Neurology, Harvard Medical School, Boston, MA, United States of America; Center for Neurotechnology and Neurorecovery, Dept. of Neurology, Massachusetts General Hospital, Boston, MA, United States of America; VA RR&D Center for Neurorestoration and Neurotechnology, Rehabilitation R&D Service, Providence VA Medical Center, Providence, RI, United States of America; Carney Institute for Brain Science and School of Engineering, Brown University, Providence, RI, United States of America
- Jaimie M Henderson, Department of Neurosurgery, Stanford University, Stanford, CA, United States of America; Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America
- Shaul Druckmann, Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America; Department of Neurobiology, Stanford University, Stanford, CA, United States of America
- Krishna V Shenoy, Wu Tsai Neurosciences Institute and Bio-X Institute, Stanford University, Stanford, CA, United States of America; Department of Electrical Engineering, Stanford University, Stanford, CA, United States of America; Howard Hughes Medical Institute at Stanford University, Stanford, CA, United States of America; Department of Neurobiology, Stanford University, Stanford, CA, United States of America; Department of Bioengineering, Stanford University, Stanford, CA, United States of America

18
Stavisky SD, Willett FR, Wilson GH, Murphy BA, Rezaii P, Avansino DT, Memberg WD, Miller JP, Kirsch RF, Hochberg LR, Ajiboye AB, Druckmann S, Shenoy KV, Henderson JM. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 2019; 8:e46015. [PMID: 31820736 PMCID: PMC6954053 DOI: 10.7554/elife.46015] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2019] [Accepted: 11/14/2019] [Indexed: 01/20/2023] Open
Abstract
Speaking is a sensorimotor behavior whose neural basis is difficult to study with single neuron resolution due to the scarcity of human intracortical measurements. We used electrode arrays to record from the motor cortex 'hand knob' in two people with tetraplegia, an area not previously implicated in speech. Neurons modulated during speaking and during non-speaking movements of the tongue, lips, and jaw. This challenges whether the conventional model of a 'motor homunculus' division by major body regions extends to the single-neuron scale. Spoken words and syllables could be decoded from single trials, demonstrating the potential of intracortical recordings for brain-computer interfaces to restore speech. Two neural population dynamics features previously reported for arm movements were also present during speaking: a component that was mostly invariant across initiating different words, followed by rotatory dynamics during speaking. This suggests that common neural dynamical motifs may underlie movement of arm and speech articulators.
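One of the reported population-dynamics features, a component largely invariant across initiating different words, can be approximated very simply: average trial-averaged rates across conditions and treat the residual as word-specific structure. The sketch below does exactly that on synthetic data; the paper itself uses more sophisticated population analyses.

```python
# Illustrative separation of condition-invariant vs. word-specific
# activity in trial-averaged firing rates (synthetic data only).
import numpy as np

rng = np.random.default_rng(0)
n_words, n_time, n_neurons = 10, 100, 40
t = np.linspace(0, 1, n_time)

# Shared (condition-invariant) trajectory plus word-specific structure.
shared = np.outer(np.sin(np.pi * t), rng.standard_normal(n_neurons))
specific = 0.3 * rng.standard_normal((n_words, n_time, n_neurons))
rates = shared[None, :, :] + specific            # words x time x neurons

cis = rates.mean(axis=0)                         # condition-invariant estimate
residual = rates - cis[None, :, :]               # word-specific remainder
share = 1 - residual.var() / rates.var()         # rough variance share of the invariant part
print(f"approximate variance share of the invariant component: {share:.0%}")
```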
Affiliation(s)
- Sergey D Stavisky, Department of Neurosurgery, Stanford University, Stanford, United States; Department of Electrical Engineering, Stanford University, Stanford, United States
- Francis R Willett, Department of Neurosurgery, Stanford University, Stanford, United States; Department of Electrical Engineering, Stanford University, Stanford, United States
- Guy H Wilson, Neurosciences Program, Stanford University, Stanford, United States
- Brian A Murphy, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Paymon Rezaii, Department of Neurosurgery, Stanford University, Stanford, United States
- Donald T Avansino
- William D Memberg, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Jonathan P Miller, FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States; Department of Neurosurgery, University Hospitals Cleveland Medical Center, Cleveland, United States
- Robert F Kirsch, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Leigh R Hochberg, VA RR&D Center for Neurorestoration and Neurotechnology, Rehabilitation R&D Service, Providence VA Medical Center, Providence, United States; Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, United States; School of Engineering and Robert J. & Nandy D. Carney Institute for Brain Science, Brown University, Providence, United States
- A Bolu Ajiboye, Department of Biomedical Engineering, Case Western Reserve University, Cleveland, United States; FES Center, Rehab R&D Service, Louis Stokes Cleveland Department of Veterans Affairs Medical Center, Cleveland, United States
- Shaul Druckmann, Department of Neurobiology, Stanford University, Stanford, United States
- Krishna V Shenoy, Department of Electrical Engineering, Stanford University, Stanford, United States; Department of Neurobiology, Stanford University, Stanford, United States; Department of Bioengineering, Stanford University, Stanford, United States; Howard Hughes Medical Institute, Stanford University, Stanford, United States; Wu Tsai Neurosciences Institute, Stanford University, Stanford, United States; Bio-X Program, Stanford University, Stanford, United States
- Jaimie M Henderson, Department of Neurosurgery, Stanford University, Stanford, United States; Wu Tsai Neurosciences Institute, Stanford University, Stanford, United States; Bio-X Program, Stanford University, Stanford, United States

19
Huggins JE, Guger C, Aarnoutse E, Allison B, Anderson CW, Bedrick S, Besio W, Chavarriaga R, Collinger JL, Do AH, Herff C, Hohmann M, Kinsella M, Lee K, Lotte F, Müller-Putz G, Nijholt A, Pels E, Peters B, Putze F, Rupp R, Schalk G, Scott S, Tangermann M, Tubig P, Zander T. Workshops of the Seventh International Brain-Computer Interface Meeting: Not Getting Lost in Translation. BRAIN-COMPUTER INTERFACES 2019; 6:71-101. [PMID: 33033729 PMCID: PMC7539697 DOI: 10.1080/2326263x.2019.1697163] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2019] [Accepted: 10/30/2019] [Indexed: 12/11/2022]
Abstract
The Seventh International Brain-Computer Interface (BCI) Meeting was held May 21-25th, 2018 at the Asilomar Conference Grounds, Pacific Grove, California, United States. The interactive nature of this conference was embodied by 25 workshops covering topics in BCI (also called brain-machine interface) research. Workshops covered foundational topics such as hardware development and signal analysis algorithms, new and imaginative topics such as BCI for virtual reality and multi-brain BCIs, and translational topics such as clinical applications and ethical assumptions of BCI development. BCI research is expanding in the diversity of applications and populations for whom those applications are being developed. BCI applications are moving toward clinical readiness as researchers grapple with the practical considerations needed to ensure that BCI translational efforts succeed. This paper summarizes each workshop, providing an overview of the topic of discussion, references for additional information, and future issues for research and development that emerged from the interactions and discussions at each workshop.
Affiliation(s)
- Jane E Huggins, Department of Physical Medicine and Rehabilitation, Department of Biomedical Engineering, Neuroscience Graduate Program, University of Michigan, 325 East Eisenhower, Room 3017, Ann Arbor, Michigan 48108-5744, United States
- Christoph Guger, g.tec medical engineering GmbH/Guger Technologies OG, Sierningstrasse 14, 4521 Schiedlberg, Austria
- Erik Aarnoutse, UMC Utrecht Brain Center, Department of Neurology & Neurosurgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
- Brendan Allison, Dept. of Cognitive Science, Mail Code 0515, University of California at San Diego, La Jolla, United States
- Charles W Anderson, Department of Computer Science, Molecular, Cellular and Integrative Neuroscience Program, Colorado State University, Fort Collins, CO 80523
- Steven Bedrick, Center for Spoken Language Understanding, Oregon Health & Science University, Portland, OR 97239
- Walter Besio, Department of Electrical, Computer, & Biomedical Engineering and Interdisciplinary Neuroscience Program, University of Rhode Island, Kingston, Rhode Island, USA; CREmedical Corp., Kingston, Rhode Island, USA
- Ricardo Chavarriaga, Defitech Chair in Brain-Machine Interface (CNBI), Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne - EPFL, Switzerland
- Jennifer L Collinger, Department of Physical Medicine and Rehabilitation, University of Pittsburgh; VA Pittsburgh Healthcare System, Department of Veterans Affairs, 3520 5th Ave, Pittsburgh, PA 15213
- An H Do, UC Irvine Brain Computer Interface Lab, Department of Neurology, University of California, Irvine
- Christian Herff, School of Mental Health and Neuroscience, Maastricht University, Maastricht, The Netherlands
- Matthias Hohmann, Max Planck Institute for Intelligent Systems, Department for Empirical Inference, Max-Planck-Ring 4, 72074 Tübingen, Germany
- Michelle Kinsella, Oregon Health & Science University, Institute on Development & Disability, 707 SW Gaines St, #1290, Portland, OR 97239
- Kyuhwa Lee, Swiss Federal Institute of Technology in Lausanne-EPFL
- Fabien Lotte, Inria Bordeaux Sud-Ouest, LaBRI (Univ. Bordeaux/CNRS/Bordeaux INP), 200 avenue de la vieille tour, 33405 Talence Cedex, France
- Gernot Müller-Putz
- Anton Nijholt, Faculty EEMCS, University of Twente, Enschede, The Netherlands
- Elmar Pels, UMC Utrecht Brain Center, Department of Neurology & Neurosurgery, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
- Betts Peters, Oregon Health & Science University, Institute on Development & Disability, 707 SW Gaines St, #1290, Portland, OR 97239
- Felix Putze, Cognitive Systems Lab, University of Bremen, Enrique-Schmidt-Straße 5 (Cartesium), 28359 Bremen, Germany
- Rüdiger Rupp, Spinal Cord Injury Center, Heidelberg University Hospital
- Gerwin Schalk, National Center for Adaptive Neurotechnologies, Wadsworth Center, NYS Dept. of Health; Dept. of Neurology, Albany Medical College; Dept. of Biomed. Sci., State Univ. of New York at Albany, Center for Medical Sciences 2003, 150 New Scotland Avenue, Albany, New York 12208
- Stephanie Scott, Department of Media Communications, Colorado State University, Fort Collins, CO 80523
- Michael Tangermann, Brain State Decoding Lab, Cluster of Excellence BrainLinks-BrainTools, Computer Science Dept., University of Freiburg, Germany; Autonomous Intelligent Systems Lab, Computer Science Dept., University of Freiburg, Germany
- Paul Tubig, Department of Philosophy, Center for Neurotechnology, University of Washington, Savery Hall, Room 361, Seattle, WA 98195
- Thorsten Zander, Team PhyPA, Biological Psychology and Neuroergonomics, Technische Universität Berlin, Berlin, Germany; Zander Laboratories B.V., Amsterdam, The Netherlands

20
Herff C, Diener L, Angrick M, Mugler E, Tate MC, Goldrick MA, Krusienski DJ, Slutzky MW, Schultz T. Generating Natural, Intelligible Speech From Brain Activity in Motor, Premotor, and Inferior Frontal Cortices. Front Neurosci 2019; 13:1267. [PMID: 31824257 PMCID: PMC6882773 DOI: 10.3389/fnins.2019.01267] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2019] [Accepted: 11/07/2019] [Indexed: 12/17/2022] Open
Abstract
Neural interfaces that directly produce intelligible speech from brain activity would allow people with severe impairment from neurological disorders to communicate more naturally. Here, we record neural population activity in motor, premotor and inferior frontal cortices during speech production using electrocorticography (ECoG) and show that ECoG signals alone can be used to generate intelligible speech output that can preserve conversational cues. To produce speech directly from neural data, we adapted a method from the field of speech synthesis called unit selection, in which units of speech are concatenated to form audible output. In our approach, which we call Brain-To-Speech, we chose subsequent units of speech based on the measured ECoG activity to generate audio waveforms directly from the neural recordings. Brain-To-Speech employed the user's own voice to generate speech that sounded very natural and included features such as prosody and accentuation. By investigating the brain areas involved in speech production separately, we found that speech motor cortex provided more information for the reconstruction process than the other cortical areas.
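The unit-selection idea is simple enough to sketch directly: each incoming neural frame selects the stored training frame with the most similar neural activity, and that frame's audio unit is appended to the output. Everything below (shapes, similarity metric, data) is a synthetic simplification of the Brain-To-Speech approach, not the authors' implementation.

```python
# Minimal sketch of unit-selection synthesis driven by neural features.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features, unit_len = 1000, 50, 64, 160  # 160 samples ~ 10 ms at 16 kHz

train_neural = rng.standard_normal((n_train, n_features))
train_audio_units = rng.standard_normal((n_train, unit_len))   # stored speech units
test_neural = train_neural[:n_test] + 0.1 * rng.standard_normal((n_test, n_features))

def normalize(a):
    return a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-9)

sim = normalize(test_neural) @ normalize(train_neural).T       # cosine similarity
best = sim.argmax(axis=1)                                      # best-matching unit per frame
synthesized = np.concatenate(train_audio_units[best])          # concatenated waveform
print(synthesized.shape, (best == np.arange(n_test)).mean())
```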
Affiliation(s)
- Christian Herff, School of Mental Health & Neuroscience, Maastricht University, Maastricht, Netherlands; Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Lorenz Diener, Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Miguel Angrick, Cognitive Systems Lab, University of Bremen, Bremen, Germany
- Emily Mugler, Department of Neurology, Northwestern University, Chicago, IL, United States
- Matthew C. Tate, Department of Neurosurgery, Northwestern University, Chicago, IL, United States
- Matthew A. Goldrick, Department of Linguistics, Northwestern University, Chicago, IL, United States
- Dean J. Krusienski, Biomedical Engineering Department, Virginia Commonwealth University, Richmond, VA, United States
- Marc W. Slutzky, Department of Neurology, Northwestern University, Chicago, IL, United States; Department of Physiology, Northwestern University, Chicago, IL, United States; Department of Physical Medicine & Rehabilitation, Northwestern University, Chicago, IL, United States
- Tanja Schultz, Cognitive Systems Lab, University of Bremen, Bremen, Germany

21
Martin S, Mikutta C, Leonard MK, Hungate D, Koelsch S, Shamma S, Chang EF, Millán JDR, Knight RT, Pasley BN. Neural Encoding of Auditory Features during Music Perception and Imagery. Cereb Cortex 2019; 28:4222-4233. [PMID: 29088345 DOI: 10.1093/cercor/bhx277] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2017] [Indexed: 11/12/2022] Open
Abstract
Despite many behavioral and neuroimaging investigations, it remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in 2 conditions. First, the participant played 2 piano pieces on an electronic piano with the sound volume of the digital keyboard on. Second, the participant replayed the same piano pieces, but without auditory feedback, and the participant was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. This novel task design provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory mental imagery. In both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust spectrotemporal receptive fields during auditory imagery with substantial, but not complete overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception.
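The encoding-model logic, predicting high-gamma activity from a time-lagged spectrogram, amounts to a regularized linear regression whose weights form the spectrotemporal receptive field (STRF). A minimal sketch on synthetic data follows; the lag count and dimensions are arbitrary assumptions, not the study's parameters.

```python
# Minimal sketch of an STRF encoding model: ridge regression predicts one
# electrode's high-gamma envelope from lagged spectrogram features.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_time, n_freq, n_lags = 2000, 32, 10

spec = rng.standard_normal((n_time, n_freq))                   # stimulus spectrogram
true_strf = 0.2 * rng.standard_normal((n_lags, n_freq))
# Lagged design matrix: each row holds the previous n_lags spectrogram frames.
X = np.stack([np.roll(spec, lag, axis=0) for lag in range(n_lags)], axis=1)
X = X.reshape(n_time, -1)
y = X @ true_strf.ravel() + 0.5 * rng.standard_normal(n_time)  # high-gamma envelope

model = Ridge(alpha=10.0).fit(X[:1500], y[:1500])
r = np.corrcoef(model.predict(X[1500:]), y[1500:])[0, 1]
strf_hat = model.coef_.reshape(n_lags, n_freq)                 # fitted STRF weights
print(f"held-out prediction r = {r:.2f}")
```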
Affiliation(s)
- Stephanie Martin, Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Christian Mikutta, Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA; Translational Research Center and Division of Clinical Research Support, Psychiatric Services University of Bern (UPD), University Hospital of Psychiatry, Bern, Switzerland; Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Matthew K Leonard, Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA, USA
- Dylan Hungate, Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA, USA
- Stefan Koelsch
- Shihab Shamma, Département d'études cognitives, École normale supérieure, PSL Research University, Paris, France; Electrical and Computer Engineering & Institute for Systems Research, University of Maryland in College Park, MD, USA
- Edward F Chang, Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA, USA
- José Del R Millán, Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Robert T Knight, Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA; Department of Psychology, University of California, Berkeley, CA, USA
- Brian N Pasley, Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA

22
Stavisky SD, Rezaii P, Willett FR, Hochberg LR, Shenoy KV, Henderson JM. Decoding Speech from Intracortical Multielectrode Arrays in Dorsal "Arm/Hand Areas" of Human Motor Cortex. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2019; 2018:93-97. [PMID: 30440349 DOI: 10.1109/embc.2018.8512199] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Neural prostheses are being developed to restore speech to people with neurological injury or disease. A key design consideration is where and how to access neural correlates of intended speech. Most prior work has examined cortical field potentials at a coarse resolution using electroencephalography (EEG) or medium resolution using electrocorticography (ECoG). The few studies of speech with single-neuron resolution recorded from ventral areas known to be part of the speech network. Here, we recorded from two 96-electrode arrays chronically implanted into the 'hand knob' area of motor cortex while a person with tetraplegia spoke. Despite being located in an area previously demonstrated to modulate during attempted arm movements, many electrodes' neuronal firing rates responded to speech production. In offline analyses, we could classify which of 9 phonemes (plus silence) was spoken with 81% single-trial accuracy using a combination of spike rate and local field potential (LFP) power. This suggests that high-fidelity speech prostheses may be possible using large-scale intracortical recordings in motor cortical areas involved in controlling speech articulators.
23
Tam WK, Wu T, Zhao Q, Keefer E, Yang Z. Human motor decoding from neural signals: a review. BMC Biomed Eng 2019; 1:22. [PMID: 32903354 PMCID: PMC7422484 DOI: 10.1186/s42490-019-0022-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2019] [Accepted: 07/21/2019] [Indexed: 01/24/2023] Open
Abstract
Many people suffer from movement disability due to amputation or neurological diseases. Fortunately, with modern neurotechnology it is now possible to intercept motor control signals at various points along the neural transduction pathway and use them to drive external devices for communication or control. Here we review the latest developments in human motor decoding: the various strategies for decoding motor intention from humans and their respective advantages and challenges. Neural control signals can be intercepted at various points in the neural signal transduction pathway, including the brain (electroencephalography, electrocorticography, intracortical recordings), the nerves (peripheral nerve recordings) and the muscles (electromyography). We systematically discuss the sites of signal acquisition, available neural features, signal processing techniques and decoding algorithms at each of these potential interception points. Examples of applications and the current state-of-the-art performance are also reviewed. Although great strides have been made in human motor decoding, we are still far from achieving naturalistic and dexterous control like that of our native limbs. Concerted efforts from materials scientists, electrical engineers, and healthcare professionals are needed to further advance the field and make the technology widely available for clinical use.
Affiliation(s)
- Wing-kin Tam, Department of Biomedical Engineering, University of Minnesota Twin Cities, 7-105 Hasselmo Hall, 312 Church St. SE, Minneapolis, MN 55455, USA
- Tong Wu, Department of Biomedical Engineering, University of Minnesota Twin Cities, 7-105 Hasselmo Hall, 312 Church St. SE, Minneapolis, MN 55455, USA
- Qi Zhao, Department of Computer Science and Engineering, University of Minnesota Twin Cities, 4-192 Keller Hall, 200 Union Street SE, Minneapolis, MN 55455, USA
- Edward Keefer, Nerves Incorporated, P.O. Box 141295, Dallas, TX, USA
- Zhi Yang, Department of Biomedical Engineering, University of Minnesota Twin Cities, 7-105 Hasselmo Hall, 312 Church St. SE, Minneapolis, MN 55455, USA

24
Livezey JA, Bouchard KE, Chang EF. Deep learning as a tool for neural data analysis: Speech classification and cross-frequency coupling in human sensorimotor cortex. PLoS Comput Biol 2019; 15:e1007091. [PMID: 31525179 PMCID: PMC6762206 DOI: 10.1371/journal.pcbi.1007091] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2018] [Revised: 09/26/2019] [Accepted: 05/10/2019] [Indexed: 11/26/2022] Open
Abstract
A fundamental challenge in neuroscience is to understand what structure in the world is represented in spatially distributed patterns of neural activity from multiple single-trial measurements. This is often accomplished by learning simple, linear transformations between neural features and features of the sensory stimuli or motor task. While successful in some early sensory processing areas, linear mappings are unlikely to be ideal tools for elucidating nonlinear, hierarchical representations of higher-order brain areas during complex tasks, such as the production of speech by humans. Here, we apply deep networks to predict produced speech syllables from a dataset of high gamma cortical surface electric potentials recorded from human sensorimotor cortex. We find that deep networks had higher decoding prediction accuracy compared to baseline models. Having established that deep networks extract more task-relevant information from neural data sets relative to linear models (i.e., higher predictive accuracy), we next sought to demonstrate their utility as a data analysis tool for neuroscience. We first show that the deep networks' confusions revealed hierarchical latent structure in the neural data, which recapitulated the underlying articulatory nature of speech motor control. We next broadened the frequency features beyond high-gamma and identified a novel high-gamma-to-beta coupling during speech production. Finally, we used deep networks to compare task-relevant information in different neural frequency bands, and found that the high-gamma band contains the vast majority of information relevant for the speech prediction task, with little-to-no additional contribution from lower-frequency amplitudes. Together, these results demonstrate the utility of deep networks as a data analysis tool for basic and applied neuroscience.
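For orientation, a deep-network decoder of the general kind evaluated above can be as simple as a small multilayer perceptron over high-gamma features. The sketch below (PyTorch assumed available; all sizes and data synthetic) is illustrative only and far smaller than the models analyzed in the paper.

```python
# Minimal sketch of a deep-network syllable classifier on high-gamma features.
import torch
from torch import nn

torch.manual_seed(0)
n_trials, n_features, n_syllables = 512, 128, 57   # synthetic sizes
X = torch.randn(n_trials, n_features)
y = torch.randint(0, n_syllables, (n_trials,))

model = nn.Sequential(
    nn.Linear(n_features, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, n_syllables),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):                            # short training loop
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print(f"final training loss: {loss.item():.3f}")
```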
Affiliation(s)
- Jesse A. Livezey, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America; Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, California, United States of America
- Kristofer E. Bouchard, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America; Redwood Center for Theoretical Neuroscience, University of California, Berkeley, Berkeley, California, United States of America; Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, California, United States of America
- Edward F. Chang, Department of Neurological Surgery and Department of Physiology, University of California, San Francisco, San Francisco, California, United States of America; Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, California, United States of America; UCSF Epilepsy Center, University of California, San Francisco, San Francisco, California, United States of America

25
Gehrig J, Michalareas G, Forster MT, Lei J, Hok P, Laufs H, Senft C, Seifert V, Schoffelen JM, Hanslmayr S, Kell CA. Low-Frequency Oscillations Code Speech during Verbal Working Memory. J Neurosci 2019; 39:6498-6512. [PMID: 31196933 PMCID: PMC6697399 DOI: 10.1523/jneurosci.0018-19.2019] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2019] [Revised: 05/09/2019] [Accepted: 05/10/2019] [Indexed: 11/21/2022] Open
Abstract
The way the human brain represents speech in memory is still unknown. An obvious characteristic of speech is that it unfolds over time. During speech processing, neural oscillations are modulated by the temporal properties of the acoustic speech signal, but acquired knowledge of the temporal structure of language also influences speech perception-related brain activity. This suggests that speech could be represented in the temporal domain, a form of representation that the brain also uses to encode autobiographic memories. Empirical evidence for such a memory code is lacking. We investigated the nature of speech memory representations using direct cortical recordings in the left perisylvian cortex during delayed sentence reproduction in female and male patients undergoing awake tumor surgery. Our results reveal that the brain endogenously represents speech in the temporal domain. Temporal pattern similarity analyses revealed that the phase of frontotemporal low-frequency oscillations, primarily in the beta range, represents sentence identity in working memory. The positive relationship between beta power during working memory and task performance suggests that working memory representations benefit from increased phase separation. SIGNIFICANCE STATEMENT Memory is an endogenous source of information based on experience. While neural oscillations encode autobiographic memories in the temporal domain, little is known about their contribution to memory representations of human speech. Our electrocortical recordings in participants who maintain sentences in memory identify the phase of left frontotemporal beta oscillations as the most prominent information carrier of sentence identity. These observations provide evidence for a theoretical model of speech memory representations and explain why interfering with beta oscillations in the left inferior frontal cortex diminishes verbal working memory capacity. The lack of sentence identity coding at the syllabic rate suggests that sentences are represented in memory in a more abstract form compared with speech coding during speech perception and production.
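A temporal pattern similarity analysis on band-limited phase can be sketched in a few lines: band-pass in the beta range, take the instantaneous phase via the Hilbert transform, and score similarity between trials as the mean resultant length of the phase difference. The data below are synthetic and the filter settings are assumptions, not the authors' parameters.

```python
# Minimal sketch of phase-based temporal pattern similarity (synthetic data).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

rng = np.random.default_rng(0)
fs, n_samp = 500, 2500                        # 5 s at 500 Hz
b, a = butter(4, [13, 30], btype="bandpass", fs=fs)

def beta_phase(x):
    """Instantaneous phase of the beta-band component."""
    return np.angle(hilbert(filtfilt(b, a, x)))

template = rng.standard_normal(n_samp)
same = template + 0.5 * rng.standard_normal(n_samp)   # repeat of the same sentence
diff = rng.standard_normal(n_samp)                    # a different sentence

def phase_similarity(p, q):
    """Mean resultant length of the phase difference (1 = identical patterns)."""
    return np.abs(np.mean(np.exp(1j * (p - q))))

print(phase_similarity(beta_phase(template), beta_phase(same)))
print(phase_similarity(beta_phase(template), beta_phase(diff)))
```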
Affiliation(s)
- Johannes Gehrig, Department of Neurology, Goethe University, 60528 Frankfurt, Germany
- Georgios Michalareas
- Marie-Therese Forster
- Juan Lei, Department of Neurology, Goethe University, 60528 Frankfurt, Germany; Institute for Cell Biology and Neuroscience, Goethe University, 60438 Frankfurt, Germany
- Pavel Hok, Department of Neurology, Goethe University, 60528 Frankfurt, Germany; Department of Neurology, Palacky University and University Hospital Olomouc, 77147 Olomouc, Czech Republic
- Helmut Laufs, Department of Neurology, Goethe University, 60528 Frankfurt, Germany; Department of Neurology, Christian-Albrechts-University, 24105 Kiel, Germany
- Christian Senft, Department of Neurosurgery, Goethe University, 60528 Frankfurt, Germany
- Volker Seifert, Department of Neurosurgery, Goethe University, 60528 Frankfurt, Germany
- Jan-Mathijs Schoffelen, Radboud University, Donders Institute for Brain, Cognition and Behaviour, 6525 HR Nijmegen, The Netherlands
- Simon Hanslmayr, School of Psychology, University of Birmingham, B15 2TT Birmingham, United Kingdom
- Christian A Kell, Department of Neurology, Goethe University, 60528 Frankfurt, Germany

26
Moses DA, Leonard MK, Makin JG, Chang EF. Real-time decoding of question-and-answer speech dialogue using human cortical activity. Nat Commun 2019; 10:3096. [PMID: 31363096 PMCID: PMC6667454 DOI: 10.1038/s41467-019-10994-4] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2018] [Accepted: 06/06/2019] [Indexed: 01/15/2023] Open
Abstract
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate.
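The contextual-integration step has a compact Bayesian core: decoded question likelihoods induce a prior over answers, which multiplies the answer likelihoods decoded during speech production. A toy sketch with hypothetical utterances and probabilities (not the authors' values):

```python
# Toy sketch of context integration: question likelihoods update answer priors.
import numpy as np

questions = ["How are you?", "What do you want?"]
answers = ["fine", "water", "music"]
# P(answer | question): which answers plausibly follow which questions (hypothetical).
plaus = np.array([[0.8, 0.1, 0.1],    # "How are you?"       -> mostly "fine"
                  [0.1, 0.5, 0.4]])   # "What do you want?"  -> "water" / "music"

q_likelihood = np.array([0.2, 0.8])   # decoded from neural data while listening
answer_prior = q_likelihood @ plaus   # marginalize over decoded questions

a_likelihood = np.array([0.4, 0.45, 0.15])  # decoded while speaking
posterior = a_likelihood * answer_prior     # Bayes rule (unnormalized), then normalize
posterior /= posterior.sum()
print(dict(zip(answers, posterior.round(3))))
```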
Affiliation(s)
- David A Moses, Matthew K Leonard, Joseph G Makin, and Edward F Chang, Department of Neurological Surgery and the Center for Integrative Neuroscience at UC San Francisco, 675 Nelson Rising Lane, San Francisco, CA, 94158, USA

27
Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J Neural Eng 2019; 16:036019. [PMID: 30831567 PMCID: PMC6822609 DOI: 10.1088/1741-2552/ab0c59] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
OBJECTIVE Direct synthesis of speech from neural signals could provide a fast and natural way of communication for people with neurological diseases. Invasively measured brain activity (electrocorticography; ECoG) supplies the necessary temporal and spatial resolution to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the complex dynamics are still not fully understood, and it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech. APPROACH Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology which is well-suited to work with the small amount of data available from each participant. MAIN RESULTS In a study with six participants, we achieved correlations up to r = 0.69 between the reconstructed and original logMel spectrograms. We transferred our predictions back into an audible waveform by applying a Wavenet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output. SIGNIFICANCE To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
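The mapping from a spatiotemporal ECoG window to logMel frames via densely connected convolutions can be sketched with a deliberately tiny network (PyTorch assumed; shapes, growth rates, and the pooling head are arbitrary assumptions, much simpler than the published architecture):

```python
# Minimal sketch of regressing logMel frames from an ECoG window with a
# small densely connected 3D convolutional network (synthetic shapes).
import torch
from torch import nn

class DenseBlock3d(nn.Module):
    """Two 3D conv layers whose inputs are concatenated feature maps."""
    def __init__(self, ch, growth):
        super().__init__()
        self.c1 = nn.Conv3d(ch, growth, kernel_size=3, padding=1)
        self.c2 = nn.Conv3d(ch + growth, growth, kernel_size=3, padding=1)

    def forward(self, x):
        y1 = torch.relu(self.c1(x))
        y2 = torch.relu(self.c2(torch.cat([x, y1], dim=1)))
        return torch.cat([x, y1, y2], dim=1)     # dense connectivity

n_mels = 40
net = nn.Sequential(
    DenseBlock3d(1, 8),
    nn.AdaptiveAvgPool3d(1),        # pool over time and the electrode grid
    nn.Flatten(),
    nn.Linear(1 + 8 + 8, n_mels),   # one logMel frame per ECoG window
)
ecog_window = torch.randn(2, 1, 9, 8, 8)  # batch x ch x time x grid_h x grid_w
print(net(ecog_window).shape)             # -> (2, 40)
```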
Affiliation(s)
- Miguel Angrick, Cognitive Systems Lab, University of Bremen, Bremen, Germany

28
Angrick M, Herff C, Johnson G, Shih J, Krusienski D, Schultz T. Interpretation of convolutional neural networks for speech spectrogram regression from intracranial recordings. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.10.080] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
29
Brumberg JS, Pitt KM, Burnison JD. A Noninvasive Brain-Computer Interface for Real-Time Speech Synthesis: The Importance of Multimodal Feedback. IEEE Trans Neural Syst Rehabil Eng 2019; 26:874-881. [PMID: 29641392 DOI: 10.1109/tnsre.2018.2808425] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
We conducted a study of a motor imagery brain-computer interface (BCI) using electroencephalography to continuously control a formant frequency speech synthesizer with instantaneous auditory and visual feedback. Over a three-session training period, sixteen participants learned to control the BCI for production of three vowel sounds (/i/ [heed], /ɑ/ [hot], and /u/ [who'd]) and were split into three groups: those receiving unimodal auditory feedback of synthesized speech, those receiving unimodal visual feedback of formant frequencies, and those receiving multimodal, audio-visual (AV) feedback. Audio feedback was provided by a formant frequency artificial speech synthesizer, and visual feedback was given as a 2-D cursor on a graphical representation of the plane defined by the first two formant frequencies. We found that combined AV feedback led to the greatest performance in terms of percent accuracy, distance to target, and movement time to target compared with either unimodal feedback of auditory or visual information. These results indicate that performance is enhanced when multimodal feedback is meaningful for the BCI task goals, rather than as a generic biofeedback signal of BCI progress.
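A formant-frequency synthesizer of the kind such a BCI drives can be sketched as a pulse train exciting two-pole resonators at F1 and F2. The code below is a generic textbook construction (classic Peterson-Barney-style formant values; bandwidths and source model are simplifying assumptions), not the study's synthesizer.

```python
# Minimal sketch of a two-formant vowel synthesizer (illustrative only).
import numpy as np
from scipy.signal import lfilter

fs = 16000

def resonator(x, freq, bw):
    """Second-order all-pole resonator at `freq` Hz with bandwidth `bw` Hz."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2.0 * r * np.cos(theta), r * r]
    return lfilter([1.0 - r], a, x)

def synthesize_vowel(f1, f2, f0=120, dur=0.5):
    n = int(fs * dur)
    src = np.zeros(n)
    src[:: fs // f0] = 1.0                       # glottal pulse train at f0
    out = resonator(resonator(src, f1, 80), f2, 120)
    return out / np.max(np.abs(out))

vowel_i = synthesize_vowel(270, 2290)            # roughly /i/ ("heed")
vowel_a = synthesize_vowel(730, 1090)            # roughly /ɑ/ ("hot")
print(vowel_i.shape, vowel_a.shape)
```

In a BCI like the one above, decoded cursor position in the F1-F2 plane would set the `f1` and `f2` arguments continuously.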
30
Chrabaszcz A, Neumann WJ, Stretcu O, Lipski WJ, Bush A, Dastolfo-Hromack CA, Wang D, Crammond DJ, Shaiman S, Dickey MW, Holt LL, Turner RS, Fiez JA, Richardson RM. Subthalamic Nucleus and Sensorimotor Cortex Activity During Speech Production. J Neurosci 2019; 39:2698-2708. [PMID: 30700532 PMCID: PMC6445998 DOI: 10.1523/jneurosci.2842-18.2019] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2018] [Revised: 01/11/2019] [Accepted: 01/18/2019] [Indexed: 11/21/2022] Open
Abstract
The sensorimotor cortex is somatotopically organized to represent the vocal tract articulators such as lips, tongue, larynx, and jaw. How speech and articulatory features are encoded at the subcortical level, however, remains largely unknown. We analyzed LFP recordings from the subthalamic nucleus (STN) and simultaneous electrocorticography recordings from the sensorimotor cortex of 11 human subjects (1 female) with Parkinson's disease during implantation of deep-brain stimulation (DBS) electrodes while they read aloud three-phoneme words. The initial phonemes involved either articulation primarily with the tongue (coronal consonants) or the lips (labial consonants). We observed significant increases in high-gamma (60-150 Hz) power in both the STN and the sensorimotor cortex that began before speech onset and persisted for the duration of speech articulation. As expected from previous reports, in the sensorimotor cortex, the primary articulators involved in the production of the initial consonants were topographically represented by high-gamma activity. We found that STN high-gamma activity also demonstrated specificity for the primary articulator, although no clear topography was observed. In general, subthalamic high-gamma activity varied along the ventral-dorsal trajectory of the electrodes, with greater high-gamma power recorded in the dorsal locations of the STN. Interestingly, the majority of significant articulator-discriminative activity in the STN occurred before that in sensorimotor cortex. These results demonstrate that articulator-specific speech information is contained within high-gamma activity of the STN, but with different spatial and temporal organization compared with similar information encoded in the sensorimotor cortex.SIGNIFICANCE STATEMENT Clinical and electrophysiological evidence suggest that the subthalamic nucleus (STN) is involved in speech; however, this important basal ganglia node is ignored in current models of speech production. We previously showed that STN neurons differentially encode early and late aspects of speech production, but no previous studies have examined subthalamic functional organization for speech articulators. Using simultaneous LFP recordings from the sensorimotor cortex and the STN in patients with Parkinson's disease undergoing deep-brain stimulation surgery, we discovered that STN high-gamma activity tracks speech production at the level of vocal tract articulators before the onset of vocalization and often before related cortical encoding.
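The high-gamma (60-150 Hz) power measure used in analyses like this one is commonly computed by band-pass filtering and taking the Hilbert envelope. A minimal single-channel sketch on synthetic data (exact filters and normalization vary across labs):

```python
# Minimal sketch of a high-gamma (60-150 Hz) power envelope (synthetic data).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000
t = np.arange(0, 3, 1 / fs)
# Synthetic LFP: an 80 Hz burst during "speech" (1-2 s) on top of noise.
lfp = np.random.default_rng(0).standard_normal(t.size)
lfp += 2 * np.sin(2 * np.pi * 80 * t) * ((t > 1) & (t < 2))

b, a = butter(4, [60, 150], btype="bandpass", fs=fs)
envelope = np.abs(hilbert(filtfilt(b, a, lfp)))   # instantaneous high-gamma amplitude

speech = envelope[(t > 1) & (t < 2)].mean()
baseline = envelope[t < 1].mean()
print(f"high-gamma increase during 'speech': {speech / baseline:.1f}x")
```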
Affiliation(s)
- Anna Chrabaszcz, Department of Psychology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Wolf-Julian Neumann, Movement Disorder and Neuromodulation Unit, Department of Neurology, Campus Mitte, Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
- Otilia Stretcu, Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Witold J Lipski, Brain Modulation Laboratory, Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213
- Alan Bush, Brain Modulation Laboratory, Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213; Department of Physics, FCEN, University of Buenos Aires and IFIBA-CONICET, Buenos Aires, Argentina 1428
- Christina A Dastolfo-Hromack, Brain Modulation Laboratory, Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213
- Dengyu Wang, Brain Modulation Laboratory, Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213; School of Medicine, Tsinghua University, Beijing, China 100084
- Donald J Crammond, Brain Modulation Laboratory, Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213
- Susan Shaiman, Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Michael W Dickey, Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
- Lori L Holt, Department of Psychology, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Robert S Turner, Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213; University of Pittsburgh Brain Institute, Pittsburgh, Pennsylvania 15213
- Julie A Fiez, Department of Psychology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213; Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15213; University of Pittsburgh Brain Institute, Pittsburgh, Pennsylvania 15213
- R Mark Richardson, Brain Modulation Laboratory, Department of Neurological Surgery, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213; Department of Neurobiology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213; University of Pittsburgh Brain Institute, Pittsburgh, Pennsylvania 15213

31
Slutzky MW. Brain-Machine Interfaces: Powerful Tools for Clinical Treatment and Neuroscientific Investigations. Neuroscientist 2019; 25:139-154. [PMID: 29772957 PMCID: PMC6611552 DOI: 10.1177/1073858418775355] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
Brain-machine interfaces (BMIs) have exploded in popularity in the past decade. BMIs, also called brain-computer interfaces, provide a direct link between the brain and a computer, usually to control an external device. BMIs have a wide array of potential clinical applications, ranging from restoring communication to people unable to speak due to amyotrophic lateral sclerosis or a stroke, to restoring movement to people with paralysis from spinal cord injury or motor neuron disease, to restoring memory to people with cognitive impairment. Because BMIs are controlled directly by the activity of prespecified neurons or cortical areas, they also provide a powerful paradigm with which to investigate fundamental questions about brain physiology, including neuronal behavior, learning, and the role of oscillations. This article reviews the clinical and neuroscientific applications of BMIs, with a primary focus on motor BMIs.
Collapse
Affiliation(s)
- Marc W Slutzky
- Departments of Neurology, Physiology, and Physical Medicine & Rehabilitation, Northwestern University, Chicago, IL, USA
| |
Collapse
|
32
|
Milsap G, Collard M, Coogan C, Rabbani Q, Wang Y, Crone NE. Keyword Spotting Using Human Electrocorticographic Recordings. Front Neurosci 2019; 13:60. [PMID: 30837823 PMCID: PMC6389788 DOI: 10.3389/fnins.2019.00060] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2018] [Accepted: 01/21/2019] [Indexed: 11/13/2022] Open
Abstract
Neural keyword spotting could form the basis of a speech brain-computer interface for menu navigation if it can be done with low latency and high specificity, comparable to the "wake-word" functionality of modern voice-activated AI assistant technologies. This study investigated neural keyword spotting using motor representations of speech via invasively recorded electrocorticographic signals as a proof of concept. Neural matched filters were created from monosyllabic consonant-vowel utterances: one keyword utterance, and 11 similar non-keyword utterances. These filters were used in an analog to the acoustic keyword spotting problem, applied for the first time to neural data. The filter templates were cross-correlated with the neural signal, capturing temporal dynamics of neural activation across cortical sites. Neural vocal activity detection (VAD) was used to identify utterance times, and a discriminative classifier was used to determine whether these utterances were the keyword or non-keyword speech. Model performance appeared to be highly related to electrode placement and spatial density. Vowel height (/a/ vs /i/) was poorly discriminated in recordings from sensorimotor cortex, but was highly discriminable using neural features from superior temporal gyrus during self-monitoring. The best-performing neural keyword detection (five keyword detections with two false positives across 60 utterances) and neural VAD (100% sensitivity, ~1 false detection per 10 utterances) came from high-density (2 mm electrode diameter and 5 mm pitch) recordings from ventral sensorimotor cortex, suggesting the spatial fidelity and extent of high-density ECoG arrays may be sufficient for speech brain-computer interfaces.
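A minimal sketch of the matched-filter idea described in this abstract, on simulated data: a keyword template is cross-correlated with a continuous neural feature stream on each electrode and the per-electrode scores are summed. All names, dimensions, and the injected-event setup below are illustrative assumptions, not the study's parameters.

import numpy as np
from scipy.signal import correlate

rng = np.random.default_rng(0)
n_electrodes, template_len, signal_len = 64, 50, 2000  # feature frames

# Keyword template: e.g. a mean high-gamma trajectory over training utterances.
template = rng.standard_normal((n_electrodes, template_len))

# Continuous neural feature stream with one embedded keyword instance.
stream = 0.5 * rng.standard_normal((n_electrodes, signal_len))
onset = 700
stream[:, onset:onset + template_len] += template  # inject the keyword

# Cross-correlate the template with the stream per electrode and sum,
# capturing temporal dynamics of activation across cortical sites.
score = sum(
    correlate(stream[e], template[e], mode="valid") for e in range(n_electrodes)
)
detected = int(np.argmax(score))
print(f"peak matched-filter score at frame {detected} (true onset {onset})")

Real neural data would additionally require the vocal activity detection and discriminative keyword/non-keyword classification stages described above.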
Collapse
Affiliation(s)
- Griffin Milsap
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States
| | - Maxwell Collard
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
| | - Christopher Coogan
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
| | - Qinwan Rabbani
- Department of Electrical Engineering, Johns Hopkins University, Baltimore, MD, United States
| | - Yujing Wang
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, United States.,Fischell Department of Bioengineering, University of Maryland College Park, College Park, MD, United States
| | - Nathan E Crone
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
| |
Collapse
|
33
|
Towards reconstructing intelligible speech from the human auditory cortex. Sci Rep 2019; 9:874. [PMID: 30696881 PMCID: PMC6351601 DOI: 10.1038/s41598-018-37359-z] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2018] [Accepted: 11/30/2018] [Indexed: 11/08/2022] Open
Abstract
Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic to establish direct communication with the brain, and has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications. To advance the state of the art in speech neuroprosthesis, we combined recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and on the acoustic representation used as the target of reconstruction, including the auditory spectrogram and speech synthesis parameters. In addition, we compared reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving intelligibility by 65% over the baseline method, which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which not only can restore communication for paralyzed patients but also have the potential to transform human-computer interaction technologies.
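A hedged sketch of the linear-regression baseline mentioned above: ridge-regress spectrogram bins on neural features and score held-out reconstruction by correlation. Everything below is simulated and illustrative; the study's best system replaced this baseline with a deep network estimating speech-synthesizer parameters.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_frames, n_neural, n_spec_bins = 5000, 128, 32

X = rng.standard_normal((n_frames, n_neural))          # neural features per frame
W_true = rng.standard_normal((n_neural, n_spec_bins))  # unknown mixing
Y = X @ W_true + 0.1 * rng.standard_normal((n_frames, n_spec_bins))  # spectrogram

# Fit on the first 80% of frames, reconstruct the rest.
split = int(0.8 * n_frames)
model = Ridge(alpha=1.0).fit(X[:split], Y[:split])
Y_hat = model.predict(X[split:])

# Report reconstruction accuracy as the mean correlation across frequency bins.
r = [np.corrcoef(Y[split:, b], Y_hat[:, b])[0, 1] for b in range(n_spec_bins)]
print(f"mean reconstruction correlation: {np.mean(r):.3f}")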
Collapse
|
34
|
Rabbani Q, Milsap G, Crone NE. The Potential for a Speech Brain-Computer Interface Using Chronic Electrocorticography. Neurotherapeutics 2019; 16:144-165. [PMID: 30617653 PMCID: PMC6361062 DOI: 10.1007/s13311-018-00692-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
A brain-computer interface (BCI) is a technology that uses neural features to restore or augment the capabilities of its user. A BCI for speech would enable communication in real time via neural correlates of attempted or imagined speech. Such a technology would potentially restore communication and improve quality of life for locked-in patients and other patients with severe communication disorders. There have been many recent developments in neural decoders, neural feature extraction, and brain recording modalities facilitating BCI for the control of prosthetics and in automatic speech recognition (ASR). Indeed, ASR and related fields have developed significantly over the past years and lend many insights into the requirements, goals, and strategies for speech BCI. Neural speech decoding is a comparatively new field but has shown much promise, with recent studies demonstrating semantic, auditory, and articulatory decoding using electrocorticography (ECoG) and other neural recording modalities. Because the neural representations for speech and language are widely distributed over cortical regions spanning the frontal, parietal, and temporal lobes, the mesoscopic scale of population activity captured by ECoG surface electrode arrays may have distinct advantages for speech BCI, in contrast to the advantages of microelectrode arrays for upper-limb BCI. Nevertheless, there remain many challenges for the translation of speech BCIs to clinical populations. This review discusses and outlines the current state of the art for speech BCI and explores what a speech BCI using chronic ECoG might entail.
Collapse
Affiliation(s)
- Qinwan Rabbani
- Department of Electrical Engineering, The Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA.
| | - Griffin Milsap
- Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Nathan E Crone
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| |
Collapse
|
35
|
Mugler EM, Tate MC, Livescu K, Templer JW, Goldrick MA, Slutzky MW. Differential Representation of Articulatory Gestures and Phonemes in Precentral and Inferior Frontal Gyri. J Neurosci 2018; 38:9803-9813. [PMID: 30257858 PMCID: PMC6234299 DOI: 10.1523/jneurosci.1206-18.2018] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2018] [Revised: 09/09/2018] [Accepted: 09/10/2018] [Indexed: 11/21/2022] Open
Abstract
Speech is a critical form of human communication and is central to our daily lives. Yet, despite decades of study, an understanding of the fundamental neural control of speech production remains incomplete. Current theories model speech production as a hierarchy from sentences and phrases down to words, syllables, speech sounds (phonemes), and the actions of vocal tract articulators used to produce speech sounds (articulatory gestures). Here, we investigate the cortical representation of articulatory gestures and phonemes in ventral precentral and inferior frontal gyri in men and women. Our results indicate that ventral precentral cortex represents gestures to a greater extent than phonemes, while inferior frontal cortex represents both gestures and phonemes. These findings suggest that speech production shares a common cortical representation with that of other types of movement, such as arm and hand movements. This has important implications both for our understanding of speech production and for the design of brain-machine interfaces to restore communication to people who cannot speak. SIGNIFICANCE STATEMENT: Despite being studied for decades, the production of speech by the brain is not fully understood. In particular, the most elemental parts of speech, speech sounds (phonemes) and the movements of vocal tract articulators used to produce these sounds (articulatory gestures), have both been hypothesized to be encoded in motor cortex. Using direct cortical recordings, we found evidence that primary motor and premotor cortices represent gestures to a greater extent than phonemes. Inferior frontal cortex (part of Broca's area) appears to represent both gestures and phonemes. These findings suggest that speech production shares a similar cortical organizational structure with the movement of other body parts.
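The gesture-versus-phoneme comparison can be pictured with a small discriminant analysis on simulated trials: the same features are decoded against two competing label sets, and the label set that actually structures the features wins. This is a toy sketch under stated assumptions (gesture labels drive the simulated features), not the authors' analysis pipeline.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_trials, n_features, n_gestures, n_phonemes = 400, 60, 5, 8

gesture = rng.integers(0, n_gestures, n_trials)
phoneme = rng.integers(0, n_phonemes, n_trials)  # independent of features here

# High-gamma features carry gesture information plus noise.
centers = rng.standard_normal((n_gestures, n_features))
X = centers[gesture] + rng.standard_normal((n_trials, n_features))

# Decode each label set from the identical features and compare accuracy.
for name, y in [("gesture", gesture), ("phoneme", phoneme)]:
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
    print(f"{name} decoding accuracy: {acc:.2f}")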
Collapse
Affiliation(s)
| | | | - Karen Livescu
- Toyota Technological Institute at Chicago, Chicago, Illinois 60637
| | | | | | - Marc W Slutzky
- Departments of Neurology, Physiology, and Physical Medicine & Rehabilitation, Northwestern University, Chicago, Illinois 60611
| |
Collapse
|
36
|
Cooney C, Folli R, Coyle D. Neurolinguistics Research Advancing Development of a Direct-Speech Brain-Computer Interface. iScience 2018; 8:103-125. [PMID: 30296666 PMCID: PMC6174918 DOI: 10.1016/j.isci.2018.09.016] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2018] [Revised: 09/04/2018] [Accepted: 09/18/2018] [Indexed: 01/09/2023] Open
Abstract
A direct-speech brain-computer interface (DS-BCI) acquires neural signals corresponding to imagined speech, then processes and decodes these signals to produce a linguistic output in the form of phonemes, words, or sentences. Recent research has shown the potential of neurolinguistics to enhance decoding approaches to imagined speech with the inclusion of semantics and phonology in experimental procedures. As neurolinguistics research findings are beginning to be incorporated within the scope of DS-BCI research, it is our view that a thorough understanding of imagined speech, and its relationship with overt speech, must be considered an integral feature of research in this field. With a focus on imagined speech, we provide a review of the most important neurolinguistics research informing the field of DS-BCI and suggest how this research may be utilized to improve current experimental protocols and decoding techniques. Our review of the literature supports a cross-disciplinary approach to DS-BCI research, in which neurolinguistics concepts and methods are utilized to aid development of a naturalistic mode of communication.
Collapse
Affiliation(s)
- Ciaran Cooney
- Intelligent Systems Research Centre, Ulster University, Derry, UK.
| | - Raffaella Folli
- Institute for Research in Social Sciences, Ulster University, Jordanstown, UK
| | - Damien Coyle
- Intelligent Systems Research Centre, Ulster University, Derry, UK
| |
Collapse
|
37
|
Rudner M, Lyberg-Åhlander V, Brännström J, Nirme J, Pichora-Fuller MK, Sahlén B. Listening Comprehension and Listening Effort in the Primary School Classroom. Front Psychol 2018; 9:1193. [PMID: 30050489 PMCID: PMC6052349 DOI: 10.3389/fpsyg.2018.01193] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2018] [Accepted: 06/20/2018] [Indexed: 11/13/2022] Open
Abstract
In the primary school classroom, children are exposed to multiple factors that combine to create adverse conditions for listening to and understanding what the teacher is saying. Despite the ubiquity of these conditions, there is little knowledge concerning the way in which various factors combine to influence listening comprehension and the effortfulness of listening. The aim of the present study was to investigate the combined effects of background noise, voice quality, and visual cues on children's listening comprehension and effort. To achieve this aim, we performed a set of four well-controlled, yet ecologically valid, experiments with 245 eight-year-old participants. Classroom listening conditions were simulated using a digitally animated talker with a dysphonic (hoarse) voice and background babble noise composed of several children talking. Results show that even low levels of babble noise interfere with listening comprehension, and there was some evidence that this effect was reduced by seeing the talker's face. Dysphonia did not significantly reduce listening comprehension scores, but it was considered unpleasant and made listening seem difficult, probably by reducing motivation to listen. We found some evidence that listening comprehension performance under adverse conditions is positively associated with individual differences in executive function. Overall, these results suggest that multiple factors combine to influence listening comprehension and effort for child listeners in the primary school classroom. The constellation of these room, talker, modality, and listener factors should be taken into account in the planning and design of educational and learning activities.
Collapse
Affiliation(s)
- Mary Rudner
- Linnaeus Centre HEAD, Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden
| | - Viveka Lyberg-Åhlander
- Department of Clinical Sciences, Logopedics, Phoniatrics and Audiology, Lund University, Lund, Sweden
| | - Jonas Brännström
- Department of Clinical Sciences, Logopedics, Phoniatrics and Audiology, Lund University, Lund, Sweden
| | - Jens Nirme
- Lund University Cognitive Science (LUCS), Lund University, Lund, Sweden
| | | | - Birgitta Sahlén
- Department of Clinical Sciences, Logopedics, Phoniatrics and Audiology, Lund University, Lund, Sweden
| |
Collapse
|
38
|
Chartier J, Anumanchipalli GK, Johnson K, Chang EF. Encoding of Articulatory Kinematic Trajectories in Human Speech Sensorimotor Cortex. Neuron 2018; 98:1042-1054.e4. [PMID: 29779940 PMCID: PMC5992088 DOI: 10.1016/j.neuron.2018.04.031] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2017] [Revised: 12/16/2017] [Accepted: 04/22/2018] [Indexed: 11/19/2022]
Abstract
When speaking, we dynamically coordinate movements of our jaw, tongue, lips, and larynx. To investigate the neural mechanisms underlying articulation, we used direct cortical recordings from human sensorimotor cortex while participants spoke natural sentences that included sounds spanning the entire English phonetic inventory. We used deep neural networks to infer speakers' articulator movements from produced speech acoustics. Individual electrodes encoded a diversity of articulatory kinematic trajectories (AKTs), each revealing coordinated articulator movements toward specific vocal tract shapes. AKTs captured a wide range of movement types, yet they could be differentiated by the place of vocal tract constriction. Additionally, AKTs manifested out-and-back trajectories with harmonic oscillator dynamics. While AKTs were functionally stereotyped across different sentences, context-dependent encoding of preceding and following movements during production of the same phoneme demonstrated the cortical representation of coarticulation. Articulatory movements encoded in sensorimotor cortex give rise to the complex kinematics underlying continuous speech production. VIDEO ABSTRACT.
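One way to picture the per-electrode encoding analysis is a lagged linear model predicting an electrode's activity from articulator kinematics; the fitted weight matrix (lags x articulators) plays the role of that electrode's kinematic trajectory filter. The sketch below is simulated and illustrative, not the authors' code.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n_samples, n_articulators, n_lags = 4000, 6, 10

kin = rng.standard_normal((n_samples, n_articulators))   # articulator traces
true_filter = rng.standard_normal((n_lags, n_articulators))

# Lagged design matrix: each row stacks the preceding n_lags kinematic frames.
X = np.asarray([kin[t - n_lags:t].ravel() for t in range(n_lags, n_samples)])
y = X @ true_filter.ravel() + 0.2 * rng.standard_normal(len(X))  # "high gamma"

# Fit the encoding model and reshape the weights into a kinematic filter.
model = Ridge(alpha=1.0).fit(X, y)
akt = model.coef_.reshape(n_lags, n_articulators)
r = np.corrcoef(akt.ravel(), true_filter.ravel())[0, 1]
print(f"filter recovery correlation: {r:.3f}")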
Collapse
Affiliation(s)
- Josh Chartier
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143, USA; Joint Program in Bioengineering, University of California, Berkeley and University of California, San Francisco, Berkeley, CA 94720, USA
| | - Gopala K Anumanchipalli
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Keith Johnson
- Department of Linguistics, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Edward F Chang
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94143, USA.
| |
Collapse
|
39
|
de Cheveigné A, Wong DD, Di Liberto GM, Hjortkjær J, Slaney M, Lalor E. Decoding the auditory brain with canonical component analysis. Neuroimage 2018; 172:206-216. [DOI: 10.1016/j.neuroimage.2018.01.033] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2017] [Revised: 12/11/2017] [Accepted: 01/15/2018] [Indexed: 11/28/2022] Open
|
40
|
Kapeller C, Ogawa H, Schalk G, Kunii N, Coon WG, Scharinger J, Guger C, Kamada K. Real-time detection and discrimination of visual perception using electrocorticographic signals. J Neural Eng 2018; 15:036001. [PMID: 29359711 DOI: 10.1088/1741-2552/aaa9f6] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]
Abstract
OBJECTIVE: Several neuroimaging studies have demonstrated that the ventral temporal cortex contains specialized regions that process visual stimuli. This study investigated the spatial and temporal dynamics of electrocorticographic (ECoG) responses to different types and colors of visual stimulation presented to four human participants, and demonstrated a real-time decoder that detects and discriminates responses to untrained natural images. APPROACH: ECoG signals from the participants were recorded while they were shown colored and greyscale versions of seven types of visual stimuli (images of faces, objects, bodies, line drawings, digits, and kanji and hiragana characters), resulting in 14 classes for discrimination (experiment I). Additionally, a real-time system asynchronously classified ECoG responses to faces, kanji, and black screens presented via a monitor (experiment II), or to natural scenes (i.e., the face of an experimenter, natural images of faces and kanji, and a mirror) (experiment III). Outcome measures in all experiments included discrimination performance across types based on broadband γ activity. MAIN RESULTS: Experiment I demonstrated an offline classification accuracy of 72.9% when discriminating among the seven types (without color separation). Further discrimination of grey versus colored images reached an accuracy of 67.1%. Discriminating all colors and types (14 classes) yielded an accuracy of 52.1%. In experiments II and III, the real-time decoder correctly detected 73.7% of responses to face, kanji, and black computer stimuli and 74.8% of responses to presented natural scenes. SIGNIFICANCE: Seven different types and their color information (either grey or color) could be detected and discriminated using broadband γ activity. Discrimination performance was maximized when spatial and temporal information were combined. The discrimination of stimulus color information provided the first ECoG-based evidence for color-related population-level cortical broadband γ responses in humans. Stimulus categories can be detected from their ECoG responses in real time within 500 ms of stimulus onset.
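A toy sketch of an asynchronous real-time decoding loop of the kind described above: train a classifier on labeled broadband-gamma windows, then slide over an incoming stream and report only confident detections. Window sizes, thresholds, and the two-class setup are illustrative assumptions, not the study's parameters.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_ch, win = 32, 20

# Train on labeled windows for two stimulus classes (e.g. face vs kanji).
proto = rng.standard_normal((2, n_ch * win))
X_train = np.vstack([proto[c] + 0.5 * rng.standard_normal((100, n_ch * win))
                     for c in (0, 1)])
y_train = np.repeat([0, 1], 100)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Streaming phase: classify each new window; report only confident hits.
# (In this toy, background noise may occasionally trigger false positives.)
stream = 0.5 * rng.standard_normal((n_ch, 500))
stream[:, 200:200 + win] += proto[1].reshape(n_ch, win)  # embedded class-1 event
for t in range(0, stream.shape[1] - win, 5):
    window = stream[:, t:t + win].ravel()[None, :]
    p = clf.predict_proba(window)[0]
    if p.max() > 0.95:
        print(f"t={t}: detected class {p.argmax()} (p={p.max():.2f})")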
Collapse
Affiliation(s)
- C Kapeller
- Guger Technologies OG, Graz, Austria; Department of Computational Perception, Johannes Kepler University, Linz, Austria
| | | | | | | | | | | | | | | |
Collapse
|
41
|
Decoding spoken phonemes from sensorimotor cortex with high-density ECoG grids. Neuroimage 2017; 180:301-311. [PMID: 28993231 DOI: 10.1016/j.neuroimage.2017.10.011] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2017] [Revised: 10/04/2017] [Accepted: 10/06/2017] [Indexed: 12/19/2022] Open
Abstract
For people who cannot communicate due to severe paralysis or involuntary movements, technology that decodes intended speech from the brain may offer an alternative means of communication. If decoding proves to be feasible, intracranial brain-computer interface systems can be developed that translate decoded speech into computer-generated speech or into instructions for controlling assistive devices. Recent advances suggest that such decoding may be feasible from sensorimotor cortex, but it is not clear how this challenge can best be approached. One approach is to identify and discriminate elements of spoken language, such as phonemes. We investigated the feasibility of decoding four spoken phonemes from the sensorimotor face area, using electrocorticographic signals obtained with high-density electrode grids. Several decoding algorithms, including spatiotemporal matched filters, spatial matched filters, and support vector machines, were compared. Phonemes could be classified correctly at a level of over 75% with spatiotemporal matched filters. Support vector machine analysis reached a similar level, but spatial matched filters yielded significantly lower scores. The most informative electrodes were clustered along the central sulcus. The highest scores were achieved from time windows centered around voice onset time, but a 500 ms window before onset time could also be classified significantly above chance. The results suggest that phoneme production involves a sequence of robust and reproducible activity patterns on the cortical surface. Importantly, decoding requires inclusion of temporal information to capture the rapid shifts of robust patterns associated with articulator muscle group contraction during production of a phoneme. The high classification scores are likely enabled by the use of high-density grids and of discrete phonemes. Implications for use in brain-computer interfaces are discussed.
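A minimal sketch of one decoder family compared above: flatten an (electrodes x time) window around voice onset and train a support vector machine on simulated trials. The dimensions are hypothetical; the point is that the classifier sees spatial and temporal structure jointly, which the abstract identifies as essential.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(5)
n_trials, n_elec, n_time, n_phonemes = 240, 64, 40, 4

labels = rng.integers(0, n_phonemes, n_trials)
patterns = rng.standard_normal((n_phonemes, n_elec, n_time))

# Each trial: the phoneme's spatiotemporal pattern plus noise, flattened.
X = (patterns[labels]
     + rng.standard_normal((n_trials, n_elec, n_time))).reshape(n_trials, -1)

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
acc = cross_val_score(clf, X, labels, cv=5).mean()
print(f"4-class phoneme accuracy: {acc:.2f} (chance 0.25)")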
Collapse
|
42
|
Holdgraf CR, Rieger JW, Micheli C, Martin S, Knight RT, Theunissen FE. Encoding and Decoding Models in Cognitive Electrophysiology. Front Syst Neurosci 2017; 11:61. [PMID: 29018336 PMCID: PMC5623038 DOI: 10.3389/fnsys.2017.00061] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022] Open
Abstract
Cognitive neuroscience has seen rapid growth in the size and complexity of data recorded from the human brain, as well as in the computational tools available to analyze these data. This data explosion has resulted in an increased use of multivariate, model-based methods for asking neuroscience questions, allowing scientists to investigate multiple hypotheses with a single dataset, to use complex, time-varying stimuli, and to study the human brain under more naturalistic conditions. These tools come in the form of "Encoding" models, in which stimulus features are used to model brain activity, and "Decoding" models, in which neural features are used to generate a stimulus output. Here we review the current state of encoding and decoding models in cognitive electrophysiology and provide a practical guide toward conducting experiments and analyses in this emerging field. Our examples focus on using linear models in the study of human language and audition. We show how to calculate auditory receptive fields from natural sounds as well as how to decode neural recordings to predict speech. The paper aims to be a useful tutorial to these approaches, and a practical introduction to using machine learning and applied statistics to build models of neural activity. The data analytic approaches we discuss may also be applied to other sensory modalities, motor systems, and cognitive systems, and we cover some examples in these areas. In addition, a collection of Jupyter notebooks is publicly available as a complement to the material covered in this paper, providing code examples and tutorials for predictive modeling in Python. The aim is to provide a practical understanding of predictive modeling of human brain data and to propose best practices in conducting these analyses.
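As a flavor of the encoding-model recipe this review covers (the authors' own tutorials are in Python), the sketch below ridge-regresses a simulated neural response on time-lagged stimulus features and scores held-out predictions. All sizes are arbitrary assumptions; real analyses would use natural-sound spectrograms and cross-validated regularization.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
n_t, n_freq, n_lags = 3000, 16, 12

spec = rng.standard_normal((n_t, n_freq))          # stimulus features
strf_true = rng.standard_normal((n_lags, n_freq))  # ground-truth receptive field

# Time-lagged design matrix: each row stacks the preceding n_lags frames.
X = np.asarray([spec[t - n_lags:t].ravel() for t in range(n_lags, n_t)])
y = X @ strf_true.ravel() + 0.5 * rng.standard_normal(len(X))

# Fit on the first 80% of samples, evaluate prediction on the rest.
split = int(0.8 * len(X))
enc = Ridge(alpha=10.0).fit(X[:split], y[:split])
r = np.corrcoef(enc.predict(X[split:]), y[split:])[0, 1]
print(f"held-out prediction correlation: {r:.3f}")

Swapping the roles of stimulus and response in this design matrix yields the corresponding decoding model.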
Collapse
Affiliation(s)
- Christopher R. Holdgraf
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Office of the Vice Chancellor for Research, Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, United States
| | - Jochem W. Rieger
- Department of Psychology, Carl-von-Ossietzky University, Oldenburg, Germany
| | - Cristiano Micheli
- Department of Psychology, Carl-von-Ossietzky University, Oldenburg, Germany
- Institut des Sciences Cognitives Marc Jeannerod, Lyon, France
| | - Stephanie Martin
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Robert T. Knight
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
| | - Frederic E. Theunissen
- Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA, United States
- Department of Psychology, University of California, Berkeley, Berkeley, CA, United States
| |
Collapse
|
43
|
Bocquelet F, Hueber T, Girin L, Chabardès S, Yvert B. Key considerations in designing a speech brain-computer interface. J Physiol Paris 2017; 110:392-401. [PMID: 28756027 DOI: 10.1016/j.jphysparis.2017.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2017] [Revised: 06/21/2017] [Accepted: 07/19/2017] [Indexed: 01/08/2023]
Abstract
Restoring communication in cases of aphasia is a key challenge for neurotechnologies. To this end, brain-computer strategies can be envisioned to allow artificial speech synthesis from the continuous decoding of neural signals underlying speech imagination. Such speech brain-computer interfaces do not yet exist, and their design involves three key choices: the choice of appropriate brain regions to record neural activity from, the choice of an appropriate recording technique, and the choice of a neural decoding scheme in association with an appropriate speech synthesis method. These key considerations are discussed here in light of (1) the current understanding of the functional neuroanatomy of cortical areas underlying overt and covert speech production, (2) the available literature making use of a variety of brain recording techniques to better characterize and address the challenge of decoding cortical speech signals, and (3) the different speech synthesis approaches that can be considered depending on the level of speech representation (phonetic, acoustic or articulatory) envisioned to be decoded at the core of a speech BCI paradigm.
Collapse
Affiliation(s)
- Florent Bocquelet
- INSERM, BrainTech Laboratory U1205, F-38000 Grenoble, France; Univ. Grenoble Alpes, BrainTech Laboratory U1205, F-38000 Grenoble, France
| | - Thomas Hueber
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
| | - Laurent Girin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France
| | | | - Blaise Yvert
- INSERM, BrainTech Laboratory U1205, F-38000 Grenoble, France; Univ. Grenoble Alpes, BrainTech Laboratory U1205, F-38000 Grenoble, France.
| |
Collapse
|
44
|
Iljina O, Derix J, Schirrmeister RT, Schulze-Bonhage A, Auer P, Aertsen A, Ball T. Neurolinguistic and machine-learning perspectives on direct speech BCIs for restoration of naturalistic communication. BRAIN-COMPUTER INTERFACES 2017. [DOI: 10.1080/2326263x.2017.1330611] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Affiliation(s)
- Olga Iljina
- GRK 1624 ‘Frequency effects in language’, University of Freiburg, Freiburg, Germany
- Department of German Linguistics, University of Freiburg, Freiburg, Germany
- Hermann Paul School of Linguistics, University of Freiburg, Germany
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Neurobiology and Biophysics, Faculty of Biology, University of Freiburg, Freiburg, Germany
| | - Johanna Derix
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Robin Tibor Schirrmeister
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Andreas Schulze-Bonhage
- Epilepsy Center, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
| | - Peter Auer
- GRK 1624 ‘Frequency effects in language’, University of Freiburg, Freiburg, Germany
- Department of German Linguistics, University of Freiburg, Freiburg, Germany
- Hermann Paul School of Linguistics, University of Freiburg, Germany
- Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Freiburg, Germany
| | - Ad Aertsen
- Neurobiology and Biophysics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Bernstein Center Freiburg, University of Freiburg, Germany
| | - Tonio Ball
- BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany
- Translational Neurotechnology Lab, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| |
Collapse
|
45
|
Huggins JE, Guger C, Ziat M, Zander TO, Taylor D, Tangermann M, Soria-Frisch A, Simeral J, Scherer R, Rupp R, Ruffini G, Robinson DKR, Ramsey NF, Nijholt A, Müller-Putz G, McFarland DJ, Mattia D, Lance BJ, Kindermans PJ, Iturrate I, Herff C, Gupta D, Do AH, Collinger JL, Chavarriaga R, Chase SM, Bleichner MG, Batista A, Anderson CW, Aarnoutse EJ. Workshops of the Sixth International Brain-Computer Interface Meeting: brain-computer interfaces past, present, and future. BRAIN-COMPUTER INTERFACES 2017; 4:3-36. [PMID: 29152523 PMCID: PMC5693371 DOI: 10.1080/2326263x.2016.1275488] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
The Sixth International Brain-Computer Interface (BCI) Meeting was held 30 May-3 June 2016 at the Asilomar Conference Grounds, Pacific Grove, California, USA. The conference included 28 workshops covering topics in BCI and brain-machine interface research. Topics included BCI for specific populations or applications, advancing BCI research through use of specific signals or technological advances, and translational and commercial issues to bring both implanted and non-invasive BCIs to market. BCI research is growing and expanding in the breadth of its applications, the depth of knowledge it can produce, and the practical benefit it can provide both for those with physical impairments and the general public. Here we provide summaries of each workshop, illustrating the breadth and depth of BCI research and highlighting important issues and calls for action to support future research and development.
Collapse
Affiliation(s)
- Jane E. Huggins
- Department of Physical Medicine and Rehabilitation, Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA
| | - Christoph Guger
- G.Tec Medical Engineering GmbH, Guger Technologies OG, Schiedlberg, Austria
| | - Mounia Ziat
- Psychology Department, Northern Michigan University, Marquette, MI, USA
| | - Thorsten O. Zander
- Team PhyPA, Biological Psychology and Neuroergonomics, Technical University of Berlin, Berlin, Germany
| | | | - Michael Tangermann
- Cluster of Excellence BrainLinks-BrainTools, University of Freiburg, Germany
| | | | - John Simeral
- Center for Neurorestoration and Neurotechnology, Rehabilitation R&D Service, Department of Veterans Affairs Medical Center, School of Engineering, Brown University, Providence, RI, USA
| | - Reinhold Scherer
- Institute of Neural Engineering, BCI- Lab, Graz University of Technology, Graz, Austria
| | - Rüdiger Rupp
- Section Experimental Neurorehabilitation, Spinal Cord Injury Center, University Hospital in Heidelberg, Heidelberg, Germany
| | - Giulio Ruffini
- Neuroscience Business Unit, Starlab Barcelona SLU, Barcelona, Spain
- Neuroelectrics Inc., Boston, USA
| | - Douglas K. R. Robinson
- Laboratoire Interdisciplinaire Sciences Innovations Sociétés (LISIS), Université Paris-Est Marne-la-Vallée, Marne-la-Vallée, France
| | - Nick F. Ramsey
- Dept Neurology & Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, University of Utrecht, Utrecht, Netherlands
| | - Anton Nijholt
- Faculty EEMCS, University of Twente, Enschede, The Netherlands; Imagineering Institute, Iskandar, Malaysia
| | - Gernot Müller-Putz
- Institute of Neural Engineering, BCI- Lab, Graz University of Technology, Graz, Austria
| | - Dennis J. McFarland
- New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, New York USA
| | - Donatella Mattia
- Clinical Neurophysiology, Fondazione Santa Lucia, Neuroelectrical Imaging and BCI Lab, IRCCS, Rome, Italy
| | - Brent J. Lance
- Human Research and Engineering Directorate, U.S. Army Research Laboratory, Aberdeen Proving Ground, Aberdeen, MD USA
| | | | - Iñaki Iturrate
- Defitech Chair in Brain–machine Interface (CNBI), Center for Neuroprosthetics, École Polytechnique Fédérale de Lausanne, EPFL-STI-CNBI, Campus Biotech H4, Geneva, Switzerland
| | - Christian Herff
- Cognitive Systems Lab, University of Bremen, Bremen, Germany
| | - Disha Gupta
- Brain Mind Research Institute, Weill Cornell Medical College, Early Brain Injury and Recovery Lab, Burke Medical Research Institute, White Plains, New York, USA
| | - An H. Do
- Department of Neurology, UC Irvine Brain Computer Interface Lab, University of California, Irvine, CA, USA
| | - Jennifer L. Collinger
- Department of Physical Medicine and Rehabilitation, Department of Veterans Affairs, VA Pittsburgh Healthcare System, University of Pittsburgh, Pittsburgh, PA, USA
| | - Ricardo Chavarriaga
- Defitech Chair in Brain–machine Interface (CNBI), Center for Neuroprosthetics, École Polytechnique Fédérale de Lausanne, EPFL-STI-CNBI, Campus Biotech H4, Geneva, Switzerland
| | - Steven M. Chase
- Center for the Neural Basis of Cognition and Department Biomedical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Martin G. Bleichner
- Neuropsychology Lab, Department of Psychology, European Medical School, Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
| | - Aaron Batista
- Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA USA
| | - Charles W. Anderson
- Department of Computer Science, Colorado State University, Fort Collins, CO USA
| | - Erik J. Aarnoutse
- Brain Center Rudolf Magnus, Dept Neurology and Neurosurgery, University Medical Center Utrecht, Utrecht, The Netherlands
| |
Collapse
|
46
|
Skipper JI, Devlin JT, Lametti DR. The hearing ear is always found close to the speaking tongue: Review of the role of the motor system in speech perception. BRAIN AND LANGUAGE 2017; 164:77-105. [PMID: 27821280 DOI: 10.1016/j.bandl.2016.10.004] [Citation(s) in RCA: 117] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/17/2016] [Accepted: 10/24/2016] [Indexed: 06/06/2023]
Abstract
Does "the motor system" play "a role" in speech perception? If so, where, how, and when? We conducted a systematic review that addresses these questions using both qualitative and quantitative methods. The qualitative review of behavioural, computational modelling, non-human animal, brain damage/disorder, electrical stimulation/recording, and neuroimaging research suggests that distributed brain regions involved in producing speech play specific, dynamic, and contextually determined roles in speech perception. The quantitative review employed region and network based neuroimaging meta-analyses and a novel text mining method to describe relative contributions of nodes in distributed brain networks. Supporting the qualitative review, results show a specific functional correspondence between regions involved in non-linguistic movement of the articulators, covertly and overtly producing speech, and the perception of both nonword and word sounds. This distributed set of cortical and subcortical speech production regions are ubiquitously active and form multiple networks whose topologies dynamically change with listening context. Results are inconsistent with motor and acoustic only models of speech perception and classical and contemporary dual-stream models of the organization of language and the brain. Instead, results are more consistent with complex network models in which multiple speech production related networks and subnetworks dynamically self-organize to constrain interpretation of indeterminant acoustic patterns as listening context requires.
Collapse
Affiliation(s)
- Jeremy I Skipper
- Experimental Psychology, University College London, United Kingdom.
| | - Joseph T Devlin
- Experimental Psychology, University College London, United Kingdom
| | - Daniel R Lametti
- Experimental Psychology, University College London, United Kingdom; Department of Experimental Psychology, University of Oxford, United Kingdom
| |
Collapse
|
47
|
Brumberg JS, Krusienski DJ, Chakrabarti S, Gunduz A, Brunner P, Ritaccio AL, Schalk G. Spatio-Temporal Progression of Cortical Activity Related to Continuous Overt and Covert Speech Production in a Reading Task. PLoS One 2016; 11:e0166872. [PMID: 27875590 PMCID: PMC5119784 DOI: 10.1371/journal.pone.0166872] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2016] [Accepted: 11/04/2016] [Indexed: 11/18/2022] Open
Abstract
How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain.
Collapse
Affiliation(s)
- Jonathan S. Brumberg
- Department of Speech-Language-Hearing: Sciences & Disorders, University of Kansas, Lawrence, KS, United States of America
| | - Dean J. Krusienski
- Department of Electrical & Computer Engineering, Old Dominion University, Norfolk, VA, United States of America
| | - Shreya Chakrabarti
- Department of Electrical & Computer Engineering, Old Dominion University, Norfolk, VA, United States of America
| | - Aysegul Gunduz
- J. Crayton Pruitt Family Dept. of Biomedical Engineering, University of Florida, Gainesville, FL, United States of America
| | - Peter Brunner
- National Center for Adaptive Neurotechnologies, Wadsworth Center, New York State Department of Health, Albany, NY, United States of America
- Department of Neurology, Albany Medical College, Albany, NY, United States of America
| | - Anthony L. Ritaccio
- Department of Neurology, Albany Medical College, Albany, NY, United States of America
| | - Gerwin Schalk
- National Center for Adaptive Neurotechnologies, Wadsworth Center, New York State Department of Health, Albany, NY, United States of America
- Department of Neurology, Albany Medical College, Albany, NY, United States of America
| |
Collapse
|
48
|
Herff C, Schultz T. Automatic Speech Recognition from Neural Signals: A Focused Review. Front Neurosci 2016; 10:429. [PMID: 27729844 PMCID: PMC5037201 DOI: 10.3389/fnins.2016.00429] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2016] [Accepted: 09/05/2016] [Indexed: 11/13/2022] Open
Abstract
Speech interfaces have become widely accepted and are nowadays integrated into various real-life applications and devices; they have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which may be impossible because of loud environments, the risk of disturbing bystanders, or an inability to produce speech (e.g., in patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but simply to imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to their low temporal resolution, but are very useful for investigating the underlying neural mechanisms involved in speech processes. In contrast, electrophysiological activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.
Collapse
Affiliation(s)
- Christian Herff
- Cognitive Systems Lab, Department for Mathematics and Computer Science, University of Bremen, Bremen, Germany
| | - Tanja Schultz
- Cognitive Systems Lab, Department for Mathematics and Computer Science, University of Bremen, Bremen, Germany
| |
Collapse
|
49
|
Bouchard KE, Conant DF, Anumanchipalli GK, Dichter B, Chaisanguanthum KS, Johnson K, Chang EF. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings. PLoS One 2016; 11:e0151327. [PMID: 27019106 PMCID: PMC4809489 DOI: 10.1371/journal.pone.0151327] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2015] [Accepted: 02/27/2016] [Indexed: 11/29/2022] Open
Abstract
A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analyses of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes, allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification of the synthesized speech. We also demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.
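The factorization step can be sketched as follows: non-negative matrix factorization decomposes simulated vocal-tract shape measurements into a small basis set, and vowels are then classified from the basis weights. Component counts, feature sizes, and the simulated data are illustrative assumptions, not the study's values.

import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n_tokens, n_points, n_vowels, n_basis = 540, 100, 9, 8

# Non-negative vocal-tract shape measurements, one prototype per vowel.
vowel = rng.integers(0, n_vowels, n_tokens)
shapes = rng.random((n_vowels, n_points))
V = shapes[vowel] + 0.1 * rng.random((n_tokens, n_points))

# Factorize V ~ W @ H: H holds basis vocal-tract shapes, W the weights.
nmf = NMF(n_components=n_basis, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(V)

# Classify vowels from the low-dimensional basis weights.
acc = cross_val_score(LogisticRegression(max_iter=1000), W, vowel, cv=5).mean()
print(f"vowel classification from NMF weights: {acc:.2f} "
      f"(chance {1 / n_vowels:.2f})")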
Collapse
Affiliation(s)
- Kristofer E. Bouchard
- Biological Systems and Engineering Division & Computational Research Division, Lawrence Berkeley National Laboratories (LBNL), Berkeley, California, United States of America
- Department of Neurological Surgery, University of California San Francisco (UCSF), San Francisco, California, United States of America
| | - David F. Conant
- Department of Neurological Surgery, University of California San Francisco (UCSF), San Francisco, California, United States of America
- Center for Integrative Neuroscience, UCSF, San Francisco, California, United States of America
| | - Gopala K. Anumanchipalli
- Department of Neurological Surgery, University of California San Francisco (UCSF), San Francisco, California, United States of America
- Center for Integrative Neuroscience, UCSF, San Francisco, California, United States of America
| | - Benjamin Dichter
- Department of Neurological Surgery, University of California San Francisco (UCSF), San Francisco, California, United States of America
- Center for Integrative Neuroscience, UCSF, San Francisco, California, United States of America
| | - Kris S. Chaisanguanthum
- Department of Neurological Surgery, University of California San Francisco (UCSF), San Francisco, California, United States of America
- Center for Integrative Neuroscience, UCSF, San Francisco, California, United States of America
| | - Keith Johnson
- Department of Linguistics, University of California (UCB), Berkeley, California, United States of America
| | - Edward F. Chang
- Department of Neurological Surgery, University of California San Francisco (UCSF), San Francisco, California, United States of America
- Center for Integrative Neuroscience, UCSF, San Francisco, California, United States of America
| |
Collapse
|
50
|
Ritaccio A, Matsumoto R, Morrell M, Kamada K, Koubeissi M, Poeppel D, Lachaux JP, Yanagisawa Y, Hirata M, Guger C, Schalk G. Proceedings of the Seventh International Workshop on Advances in Electrocorticography. Epilepsy Behav 2015; 51:312-20. [PMID: 26322594 PMCID: PMC4593746 DOI: 10.1016/j.yebeh.2015.08.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/31/2015] [Accepted: 08/01/2015] [Indexed: 10/23/2022]
Abstract
The Seventh International Workshop on Advances in Electrocorticography (ECoG) convened in Washington, DC, on November 13-14, 2014. Electrocorticography-based research continues to proliferate widely across basic science and clinical disciplines. The 2014 workshop highlighted advances in neurolinguistics, brain-computer interface, functional mapping, and seizure termination facilitated by advances in the recording and analysis of the ECoG signal. The following proceedings document summarizes the content of this successful multidisciplinary gathering.
Collapse
Affiliation(s)
| | - Riki Matsumoto
- Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | | | | | | | - David Poeppel
- Max-Planck-Institute, Frankfurt, Germany; New York University, New York, NY, USA
| | - Jean-Philippe Lachaux
- Lyon Neuroscience Research Center, INSERM U1028, CNRS UMR5292, University Lyon I, Lyon, France
| | - Takufumi Yanagisawa
- Graduate School of Medicine, Osaka University, Osaka, Japan; ATR Computational Neuroscience Laboratories, Kyoto, Japan
| | | | | | - Gerwin Schalk
- Albany Medical College, Albany, NY, USA; Wadsworth Center, New York State Department of Health, Albany, NY, USA
| |
Collapse
|