1. Hong Y, Ryun S, Chung CK. Evoking artificial speech perception through invasive brain stimulation for brain-computer interfaces: current challenges and future perspectives. Front Neurosci 2024; 18:1428256. PMID: 38988764; PMCID: PMC11234843; DOI: 10.3389/fnins.2024.1428256.
Abstract
Encoding artificial perceptions through brain stimulation, especially that of higher cognitive functions such as speech perception, is one of the most formidable challenges in brain-computer interfaces (BCI). Brain stimulation has been used for functional mapping in clinical practice for the last 70 years to treat various disorders affecting the nervous system, including epilepsy, Parkinson's disease, essential tremor, and dystonia. Recently, direct electrical stimulation has been used to evoke various forms of perception in humans, ranging from sensorimotor, auditory, and visual to speech cognition. Successfully evoking and fine-tuning artificial perceptions could revolutionize communication for individuals with speech disorders and significantly enhance the capabilities of brain-computer interface technologies. However, despite the extensive literature on encoding various perceptions and the rising popularity of speech BCIs, inducing artificial speech perception is still largely unexplored, and its potential has yet to be determined. In this paper, we examine the various stimulation techniques used to evoke complex percepts and the target brain areas for the input of speech-like information. Finally, we discuss strategies to address the challenges of speech encoding and consider the prospects of these approaches.
Affiliation(s)
- Yirye Hong
- Department of Brain and Cognitive Sciences, College of Natural Sciences, Seoul National University, Seoul, Republic of Korea
- Seokyun Ryun
- Neuroscience Research Institute, Seoul National University Medical Research Center, Seoul, Republic of Korea
- Chun Kee Chung
- Neuroscience Research Institute, Seoul National University Medical Research Center, Seoul, Republic of Korea
2. Bai Y, Liu S, Zhu M, Wang B, Li S, Meng L, Shi X, Chen F, Jiang H, Jiang C. Perceptual Pattern of Cleft-Related Speech: A Task-fMRI Study on Typical Mandarin-Speaking Adults. Brain Sci 2023; 13:1506. PMID: 38002467; PMCID: PMC10669275; DOI: 10.3390/brainsci13111506.
Abstract
Congenital cleft lip and palate is one of the common deformities in the craniomaxillofacial region. The current study aimed to explore the perceptual pattern of cleft-related speech produced by Mandarin-speaking patients with repaired cleft palate using the task-based functional magnetic resonance imaging (task-fMRI) technique. Three blocks of speech stimuli, including hypernasal speech, the glottal stop, and typical speech, were played to 30 typical adult listeners with no prior experience of cleft palate speech. Using a randomized block design paradigm, the participants were instructed to assess the intelligibility of the stimuli. Simultaneously, fMRI data were collected. Brain activation was compared among the three types of speech stimuli. Results revealed that greater blood-oxygen-level-dependent (BOLD) responses to the cleft-related glottal stop than to typical speech were localized in the right fusiform gyrus and the left inferior occipital gyrus. The regions responding to the contrast between the glottal stop and cleft-related hypernasal speech were located in the right fusiform gyrus. More significant BOLD responses to hypernasal speech than to the glottal stop were localized in the left orbital part of the inferior frontal gyrus and middle temporal gyrus. More significant BOLD responses to typical speech than to the glottal stop were localized in the left inferior temporal gyrus, left superior temporal gyrus, left medial superior frontal gyrus, and right angular gyrus. Furthermore, there was no significant difference between hypernasal speech and typical speech. In conclusion, typical listeners engage different neural processes when perceiving cleft-related speech. Our findings lay a foundation for exploring the perceptual pattern of patients with repaired cleft palate.
Affiliation(s)
- Yun Bai
- Department of Oral and Maxillofacial Surgery, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing 210029, China
- Shaowei Liu
- Department of Radiology, Jiangsu Province Hospital of Chinese Medicine, Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing 210004, China
- Mengxian Zhu
- Department of Oral and Maxillofacial Surgery, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing 210029, China
- Binbing Wang
- Department of Oral and Maxillofacial Surgery, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing 210029, China
- Sheng Li
- Department of Oral and Maxillofacial Surgery, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing 210029, China
- Liping Meng
- Department of Children’s Healthcare, Women’s Hospital of Nanjing Medical University, Nanjing Maternity and Child Health Care Hospital, Nanjing 210004, China
- Xinghui Shi
- Department of Oral and Maxillofacial Surgery, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing 210029, China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Hongbing Jiang
- Department of Oral and Maxillofacial Surgery, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing 210029, China
- Chenghui Jiang
- Department of Oral and Maxillofacial Surgery, The Affiliated Stomatological Hospital of Nanjing Medical University, Nanjing 210029, China
- Jiangsu Province Key Laboratory of Oral Diseases, Nanjing 210029, China
- Jiangsu Province Engineering Research Center of Stomatological Translational Medicine, Nanjing 210029, China
3. Ylinen A, Wikman P, Leminen M, Alho K. Task-dependent cortical activations during selective attention to audiovisual speech. Brain Res 2022; 1775:147739. PMID: 34843702; DOI: 10.1016/j.brainres.2021.147739.
Abstract
Selective listening to speech depends on widespread networks of the brain, but how the involvement of different neural systems in speech processing is affected by factors such as the task performed by a listener and speech intelligibility remains poorly understood. We used functional magnetic resonance imaging to systematically examine the effects that performing different tasks has on neural activations during selective attention to continuous audiovisual speech in the presence of task-irrelevant speech. Participants viewed audiovisual dialogues and attended either to the semantic or the phonological content of speech, or ignored speech altogether and performed a visual control task. The tasks were factorially combined with good and poor auditory and visual speech qualities. Selective attention to speech engaged superior temporal regions and the left inferior frontal gyrus regardless of the task. Frontoparietal regions implicated in selective auditory attention to simple sounds (e.g., tones, syllables) were not engaged by the semantic task, suggesting that this network may not be as crucial when attending to continuous speech. The medial orbitofrontal cortex, implicated in social cognition, was most activated by the semantic task. Activity levels during the phonological task in the left prefrontal, premotor, and secondary somatosensory regions had a distinct temporal profile as well as the highest overall activity, possibly relating to the role of the dorsal speech processing stream in sub-lexical processing. Our results demonstrate that the task type influences neural activations during selective attention to speech, and emphasize the importance of ecologically valid experimental designs.
Affiliation(s)
- Artturi Ylinen
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland.
- Patrik Wikman
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland; Department of Neuroscience, Georgetown University, Washington D.C., USA
- Miika Leminen
- Analytics and Data Services, HUS Helsinki University Hospital, Helsinki, Finland
- Kimmo Alho
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland; Advanced Magnetic Imaging Centre, Aalto NeuroImaging, Aalto University, Espoo, Finland
4. Venezia JH, Richards VM, Hickok G. Speech-Driven Spectrotemporal Receptive Fields Beyond the Auditory Cortex. Hear Res 2021; 408:108307. PMID: 34311190; PMCID: PMC8378265; DOI: 10.1016/j.heares.2021.108307.
Abstract
We recently developed a method to estimate speech-driven spectrotemporal receptive fields (STRFs) using fMRI. The method uses spectrotemporal modulation filtering, a form of acoustic distortion that renders speech sometimes intelligible and sometimes unintelligible. Using this method, we found significant STRF responses only in classic auditory regions throughout the superior temporal lobes. However, our analysis was not optimized to detect small clusters of STRFs as might be expected in non-auditory regions. Here, we re-analyze our data using a more sensitive multivariate statistical test for cross-subject alignment of STRFs, and we identify STRF responses in non-auditory regions including the left dorsal premotor cortex (dPM), left inferior frontal gyrus (IFG), and bilateral calcarine sulcus (calcS). All three regions responded more to intelligible than unintelligible speech, but left dPM and calcS responded significantly to vocal pitch and demonstrated strong functional connectivity with early auditory regions. Left dPM's STRF generated the best predictions of activation on trials rated as unintelligible by listeners, a hallmark auditory profile. IFG, on the other hand, responded almost exclusively to intelligible speech and was functionally connected with classic speech-language regions in the superior temporal sulcus and middle temporal gyrus. IFG's STRF was also (weakly) able to predict activation on unintelligible trials, suggesting the presence of a partial 'acoustic trace' in the region. We conclude that left dPM is part of the human dorsal laryngeal motor cortex, a region previously shown to be capable of operating in an 'auditory mode' to encode vocal pitch. Further, given previous observations that IFG is involved in syntactic working memory and/or processing of linear order, we conclude that IFG is part of a higher-order speech circuit that exerts a top-down influence on processing of speech acoustics. Finally, because calcS is modulated by emotion, we speculate that changes in the quality of vocal pitch may have contributed to its response.
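As an illustration of the kind of linear encoding model that underlies STRF estimation, the sketch below fits a spectrotemporal filter with ridge regression on simulated data. The feature dimensionality, penalty, and variable names are hypothetical; this is not the study's stimulus set, fMRI data, or statistical procedure.

```python
# Minimal sketch of a linear STRF-style encoding model (illustrative only; the
# stimulus features and responses are simulated, not taken from the cited study).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_features = 200, 64                      # hypothetical trials x modulation bins
X = rng.standard_normal((n_trials, n_features))     # spectrotemporal modulation energy per trial
true_strf = rng.standard_normal(n_features)         # ground-truth filter for the simulation
y = X @ true_strf + 0.5 * rng.standard_normal(n_trials)   # simulated voxel/ROI response

model = Ridge(alpha=10.0)                           # L2 penalty stabilises the filter estimate
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
model.fit(X, y)
strf_estimate = model.coef_.reshape(8, 8)           # e.g., 8 temporal x 8 spectral modulation bins

print(f"cross-validated R^2: {scores.mean():.2f}")
print("estimated STRF shape:", strf_estimate.shape)
```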
Affiliation(s)
- Jonathan H Venezia
- VA Loma Linda Healthcare System, Loma Linda, CA, United States; Dept. of Otolaryngology, Loma Linda University School of Medicine, Loma Linda, CA, United States.
- Virginia M Richards
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, United States
- Gregory Hickok
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, United States
5. Liu L, Zhang Y, Zhou Q, Garrett DD, Lu C, Chen A, Qiu J, Ding G. Auditory-Articulatory Neural Alignment between Listener and Speaker during Verbal Communication. Cereb Cortex 2021; 30:942-951. PMID: 31318013; DOI: 10.1093/cercor/bhz138.
Abstract
Whether auditory processing of speech relies on reference to the articulatory motor information of the speaker remains elusive. Here, we addressed this issue under a two-brain framework. Functional magnetic resonance imaging was applied to record the brain activities of speakers when telling real-life stories and later of listeners when listening to the audio recordings of these stories. Based on between-brain seed-to-voxel correlation analyses, we revealed that neural dynamics in listeners' auditory temporal cortex are temporally coupled with the dynamics in the speaker's larynx/phonation area. Moreover, the coupling response in the listener's left auditory temporal cortex follows the hierarchical organization for speech processing, with response lags in A1+, STG/STS, and MTG increasing linearly. Further, listeners showing greater coupling responses understand the speech better. When comprehension fails, such interbrain auditory-articulation coupling vanishes substantially. These findings suggest that a listener's auditory system and a speaker's articulatory system are inherently aligned during naturalistic verbal interaction, and such alignment is associated with high-level information transfer from the speaker to the listener. Our study provides reliable evidence that reference to the articulatory motor information of the speaker facilitates speech comprehension in a naturalistic setting.
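The coupling measure described here can be illustrated with a toy lagged-correlation computation between a simulated speaker seed time course and a listener time course; the lag, noise level, and variable names below are assumptions for illustration, not the study's seed-to-voxel pipeline.

```python
# Toy sketch of speaker-listener neural coupling as a lagged Pearson correlation
# (illustration only; time courses are simulated, not data from the cited study).
import numpy as np

rng = np.random.default_rng(1)
n_tr = 300                                       # hypothetical number of fMRI volumes
speaker_seed = rng.standard_normal(n_tr)         # speaker seed-region time course

true_lag = 3                                     # simulated listener lags the speaker by 3 TRs
listener = np.roll(speaker_seed, true_lag) + 0.8 * rng.standard_normal(n_tr)

def lagged_correlation(x, y, max_lag):
    """Correlate x with y for listener lags of -max_lag..max_lag samples."""
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = []
    for lag in lags:
        if lag >= 0:
            r.append(np.corrcoef(x[: n - lag], y[lag:])[0, 1])
        else:
            r.append(np.corrcoef(x[-lag:], y[: n + lag])[0, 1])
    return lags, np.array(r)

lags, r = lagged_correlation(speaker_seed, listener, max_lag=8)
print("best lag (TRs):", lags[np.argmax(r)], "| peak r:", round(float(r.max()), 2))
```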
Affiliation(s)
- Lanfang Liu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China; Department of Psychology, Sun Yat-sen University, Guangzhou 510006, People's Republic of China
- Yuxuan Zhang
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
- Qi Zhou
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
- Douglas D Garrett
- Max Planck UCL Centre for Computational Psychiatry and Ageing Research, Max Planck Institute for Human Development, Lentzeallee 94, Berlin 14195, Germany
- Chunming Lu
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
- Antao Chen
- Key Laboratory of Cognition and Personality (SWU), Ministry of Education & Department of Psychology, Southwest University, Chongqing 400715, People's Republic of China
- Jiang Qiu
- Key Laboratory of Cognition and Personality (SWU), Ministry of Education & Department of Psychology, Southwest University, Chongqing 400715, People's Republic of China
- Guosheng Ding
- State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, People's Republic of China
6. Shamma S, Patel P, Mukherjee S, Marion G, Khalighinejad B, Han C, Herrero J, Bickel S, Mehta A, Mesgarani N. Learning Speech Production and Perception through Sensorimotor Interactions. Cereb Cortex Commun 2020; 2:tgaa091. PMID: 33506209; PMCID: PMC7811190; DOI: 10.1093/texcom/tgaa091.
Abstract
Action and perception are closely linked in many behaviors necessitating a close coordination between sensory and motor neural processes so as to achieve a well-integrated smoothly evolving task performance. To investigate the detailed nature of these sensorimotor interactions, and their role in learning and executing the skilled motor task of speaking, we analyzed ECoG recordings of responses in the high-γ band (70-150 Hz) in human subjects while they listened to, spoke, or silently articulated speech. We found elaborate spectrotemporally modulated neural activity projecting in both "forward" (motor-to-sensory) and "inverse" directions between the higher-auditory and motor cortical regions engaged during speaking. Furthermore, mathematical simulations demonstrate a key role for the forward projection in "learning" to control the vocal tract, beyond its commonly postulated predictive role during execution. These results therefore offer a broader view of the functional role of the ubiquitous forward projection as an important ingredient in learning, rather than just control, of skilled sensorimotor tasks.
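A common first step in analyses like this is extracting the high-γ (70-150 Hz) amplitude envelope from each ECoG channel. The sketch below shows one standard way to do that (zero-phase bandpass plus Hilbert transform) on a simulated trace; the sampling rate, filter order, and signal are assumptions, not details taken from the study.

```python
# Standard way to extract a high-gamma (70-150 Hz) amplitude envelope from one
# ECoG channel: bandpass filter plus Hilbert transform. Illustration only; the
# sampling rate, filter order, and simulated signal are assumptions, not the
# cited study's recording or processing parameters.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 1000.0                                          # hypothetical sampling rate (Hz)
t = np.arange(0, 5.0, 1.0 / fs)
rng = np.random.default_rng(2)
ecog = rng.standard_normal(t.size)                   # stand-in for a raw ECoG trace
ecog += 2.0 * np.sin(2 * np.pi * 110 * t) * (t > 2.0)   # injected "high-gamma" burst after 2 s

sos = butter(4, [70.0, 150.0], btype="bandpass", fs=fs, output="sos")
narrowband = sosfiltfilt(sos, ecog)                  # zero-phase bandpass filtering

envelope = np.abs(hilbert(narrowband))               # analytic-signal magnitude = amplitude envelope

print("mean envelope before 2 s:", round(float(envelope[t < 2.0].mean()), 2))
print("mean envelope after 2 s: ", round(float(envelope[t > 2.0].mean()), 2))
```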
Affiliation(s)
- Shihab Shamma
- Department of Electrical and Computer Engineering, Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
- Laboratoire des Systèmes Perceptifs, Department des Etudes Cognitive, École Normale Supérieure, PSL University, 75005 Paris, France
- Prachi Patel
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Shoutik Mukherjee
- Department of Electrical and Computer Engineering, Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
- Guilhem Marion
- Laboratoire des Systèmes Perceptifs, Department des Etudes Cognitive, École Normale Supérieure, PSL University, 75005 Paris, France
- Bahar Khalighinejad
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Cong Han
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
- Jose Herrero
- Neurosurgery, Hofstra Northwell School of Medicine, Manhasset, NY, USA
- Stephan Bickel
- Neurosurgery, Hofstra Northwell School of Medicine, Manhasset, NY, USA
- Ashesh Mehta
- Neurosurgery, Hofstra Northwell School of Medicine, Manhasset, NY, USA
- The Feinstein Institutes for Medical Research, Manhasset, NY 11030, USA
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, USA
7. Saltzman DI, Myers EB. Neural Representation of Articulable and Inarticulable Novel Sound Contrasts: The Role of the Dorsal Stream. Neurobiology of Language 2020; 1:339-364. PMID: 35784619; PMCID: PMC9248853; DOI: 10.1162/nol_a_00016.
Abstract
The extent to which articulatory information embedded in incoming speech contributes to the formation of new perceptual categories for speech sounds has been a matter of discourse for decades. It has been theorized that the acquisition of new speech sound categories requires a network of sensory and speech motor cortical areas (the "dorsal stream") to successfully integrate auditory and articulatory information. However, it is possible that these brain regions are not sensitive specifically to articulatory information, but instead are sensitive to the abstract phonological categories being learned. We tested this hypothesis by training participants over the course of several days on an articulable non-native speech contrast and acoustically matched inarticulable nonspeech analogues. After reaching comparable levels of proficiency with the two sets of stimuli, activation was measured in fMRI as participants passively listened to both sound types. Decoding of category membership for the articulable speech contrast alone revealed a series of left and right hemisphere regions outside of the dorsal stream that have previously been implicated in the emergence of non-native speech sound categories, while no regions could successfully decode the inarticulable nonspeech contrast. Although activation patterns in the left inferior frontal gyrus, the middle temporal gyrus, and the supplementary motor area provided better information for decoding articulable (speech) sounds compared to the inarticulable (sine wave) sounds, the finding that dorsal stream regions do not emerge as good decoders of the articulable contrast alone suggests that other factors, including the strength and structure of the emerging speech categories, are more likely drivers of dorsal stream activation for novel sound learning.
8. Plass J, Brang D, Suzuki S, Grabowecky M. Vision perceptually restores auditory spectral dynamics in speech. Proc Natl Acad Sci U S A 2020; 117:16920-16927. PMID: 32632010; PMCID: PMC7382243; DOI: 10.1073/pnas.2002887117.
Abstract
Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
Affiliation(s)
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Department of Psychology, Northwestern University, Evanston, IL 60208
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109
- Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
- Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, IL 60208
- Interdepartmental Neuroscience Program, Northwestern University, Chicago, IL 60611
9. Heinrich A, Knight S. Reproducibility in Cognitive Hearing Research: Theoretical Considerations and Their Practical Application in Multi-Lab Studies. Front Psychol 2020; 11:1590. PMID: 32765364; PMCID: PMC7378399; DOI: 10.3389/fpsyg.2020.01590.
Abstract
In this article, we consider the issue of reproducibility within the field of cognitive hearing science. First, we examine how retest reliability can provide useful information for the generality of results and intervention effectiveness. Second, we provide an overview of retest reliability coefficients within three areas of cognitive hearing science (cognition, speech perception, and self-reported measures of communication) and show how the reporting of these coefficients differs between fields. We argue that practices surrounding the provision of retest coefficients are currently most rigorous in clinical assessment and that basic science research would benefit from adopting similar standards. Finally, based on a distinction between direct replications (which aim to keep materials as close to the original study as possible) and conceptual replications (which test the same purported mechanism using different materials), we discuss new initiatives which address the need for both. Using the example of the auditory Stroop task, we provide practical illustrations of how these theoretical issues can be addressed within the context of a multi-lab replication study. By illustrating how theoretical concepts can be put into practice in empirical research, we hope to encourage others to set up and participate in a wide variety of reproducibility-related studies.
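Retest reliability coefficients such as those surveyed here are often reported as intraclass correlations. As a generic illustration (simulated scores, not data from any cited study), the sketch below computes ICC(2,1) from a subjects-by-sessions matrix alongside a simple retest Pearson correlation.

```python
# Generic illustration of a retest reliability coefficient: ICC(2,1) (two-way
# random effects, absolute agreement, single measures; Shrout & Fleiss), computed
# from a simulated subjects-by-sessions score matrix. Not data from any cited study.
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_sessions = 30, 2
trait = rng.normal(50, 10, size=(n_subjects, 1))                  # stable individual differences
scores = trait + rng.normal(0, 5, size=(n_subjects, n_sessions))  # two noisy test sessions

def icc_2_1(y):
    n, k = y.shape
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()    # between-subject sum of squares
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()    # between-session sum of squares
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

print("retest ICC(2,1):", round(float(icc_2_1(scores)), 2))
print("retest Pearson r:", round(float(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]), 2))
```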
Affiliation(s)
- Antje Heinrich
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, United Kingdom
- Sarah Knight
- Department of Psychology, University of York, York, United Kingdom
10. Notter MP, Hanke M, Murray MM, Geiser E. Encoding of Auditory Temporal Gestalt in the Human Brain. Cereb Cortex 2020; 29:475-484. PMID: 29365070; DOI: 10.1093/cercor/bhx328.
Abstract
The perception of an acoustic rhythm is invariant to the absolute temporal intervals constituting a sound sequence. It is unknown where in the brain temporal Gestalt, the percept emerging from the relative temporal proximity between acoustic events, is encoded. Two different relative temporal patterns, each induced by three experimental conditions with different absolute temporal patterns as sensory basis, were presented to participants. A linear support vector machine classifier was trained to differentiate the functional magnetic resonance imaging activation patterns corresponding to the two different percepts. Across the sensory constituents, the classifier decoded which percept was perceived. A searchlight analysis localized activation patterns specific to the temporal Gestalt bilaterally to the temporoparietal junction, including the planum temporale and supramarginal gyrus, and unilaterally to the right inferior frontal gyrus (pars opercularis). We show that auditory areas not only process absolute temporal intervals, but also integrate them into percepts of Gestalt and that encoding of these percepts persists in high-level associative areas. The findings extend existing knowledge regarding the processing of absolute temporal patterns to the processing of relative temporal patterns relevant to the sequential binding of perceptual elements into Gestalt.
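The decoding logic described in the abstract, training a linear classifier on patterns from some stimulus conditions and testing whether it generalizes to others, can be illustrated with the following sketch; the data are simulated and the dimensions, regularization, and condition labels are hypothetical rather than drawn from the study.

```python
# Generic sketch of linear SVM decoding of two percepts from multivoxel patterns,
# training on two "sensory constituent" conditions and testing generalization to a
# held-out third condition. Simulated data with hypothetical dimensions; this is an
# illustration of the decoding logic, not the cited study's searchlight analysis.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
n_per_cell, n_voxels = 20, 100                     # trials per (percept x condition) cell, voxels
percept_pattern = rng.standard_normal(n_voxels)    # percept-related pattern shared across conditions
cond_patterns = {c: rng.standard_normal(n_voxels) for c in "ABC"}   # condition-specific baselines

def simulate(percept, cond, n):
    """n trials of one percept (+1 or -1) presented in one sensory condition."""
    return percept * 0.5 * percept_pattern + cond_patterns[cond] + rng.standard_normal((n, n_voxels))

X_train = np.vstack([simulate(p, c, n_per_cell) for p in (-1, 1) for c in "AB"])
y_train = np.repeat([-1, -1, 1, 1], n_per_cell)
X_test = np.vstack([simulate(p, "C", n_per_cell) for p in (-1, 1)])
y_test = np.repeat([-1, 1], n_per_cell)

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=5000))
clf.fit(X_train, y_train)
print("cross-condition decoding accuracy:", round(clf.score(X_test, y_test), 2))
```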
Affiliation(s)
- Michael P Notter
- Department of Radiology; Neuropsychology and Neurorehabilitation Service; EEG Brain Mapping Core, Center for Biomedical Imaging (CIBM), Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
- Michael Hanke
- Institute of Psychology, Otto-von-Guericke-University; Center for Behavioral Brain Sciences, Magdeburg, Germany
- Micah M Murray
- Department of Radiology; Neuropsychology and Neurorehabilitation Service; EEG Brain Mapping Core, Center for Biomedical Imaging (CIBM), Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland; Ophthalmology Department, University of Lausanne and Fondation Asile des Aveugles, Lausanne, Switzerland; Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, TN, USA
- Eveline Geiser
- Department of Radiology; Neuropsychology and Neurorehabilitation Service; McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
11. Müsch K, Himberger K, Tan KM, Valiante TA, Honey CJ. Transformation of speech sequences in human sensorimotor circuits. Proc Natl Acad Sci U S A 2020; 117:3203-3213. PMID: 31996476; PMCID: PMC7022155; DOI: 10.1073/pnas.1910939117.
Abstract
After we listen to a series of words, we can silently replay them in our mind. Does this mental replay involve a reactivation of our original perceptual dynamics? We recorded electrocorticographic (ECoG) activity across the lateral cerebral cortex as people heard and then mentally rehearsed spoken sentences. For each region, we tested whether silent rehearsal of sentences involved reactivation of sentence-specific representations established during perception or transformation to a distinct representation. In sensorimotor and premotor cortex, we observed reliable and temporally precise responses to speech; these patterns transformed to distinct sentence-specific representations during mental rehearsal. In contrast, we observed less reliable and less temporally precise responses in prefrontal and temporoparietal cortex; these higher-order representations, which were sensitive to sentence semantics, were shared across perception and rehearsal of the same sentence. The mental rehearsal of natural speech involves the transformation of stimulus-locked speech representations in sensorimotor and premotor cortex, combined with diffuse reactivation of higher-order semantic representations.
Affiliation(s)
- Kathrin Müsch
- Department of Psychological & Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
- Kevin Himberger
- Department of Psychological & Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
- Kean Ming Tan
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109
- Taufik A Valiante
- Krembil Research Institute, Toronto Western Hospital, Toronto, ON M5T 2S8, Canada
- Christopher J Honey
- Department of Psychological & Brain Sciences, Johns Hopkins University, Baltimore, MD 21218
12. Feng G, Gan Z, Wang S, Wong PCM, Chandrasekaran B. Task-General and Acoustic-Invariant Neural Representation of Speech Categories in the Human Brain. Cereb Cortex 2019; 28:3241-3254. PMID: 28968658; DOI: 10.1093/cercor/bhx195.
Abstract
A significant neural challenge in speech perception includes extracting discrete phonetic categories from continuous and multidimensional signals despite varying task demands and surface-acoustic variability. While neural representations of speech categories have been previously identified in frontal and posterior temporal-parietal regions, the task dependency and dimensional specificity of these neural representations are still unclear. Here, we asked native Mandarin participants to listen to speech syllables carrying 4 distinct lexical tone categories across passive listening, repetition, and categorization tasks while they underwent functional magnetic resonance imaging (fMRI). We used searchlight classification and representational similarity analysis (RSA) to identify the dimensional structure underlying neural representation across tasks and surface-acoustic properties. Searchlight classification analyses revealed significant "cross-task" lexical tone decoding within the bilateral superior temporal gyrus (STG) and left inferior parietal lobule (LIPL). RSA revealed that the LIPL and LSTG, in contrast to the RSTG, relate to 2 critical dimensions (pitch height, pitch direction) underlying tone perception. Outside this core representational network, we found greater activation in the inferior frontal and parietal regions for stimuli that are more perceptually similar during tone categorization. Our findings reveal the specific characteristics of fronto-tempo-parietal regions that support speech representation and categorization processing.
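The representational similarity analysis step can be illustrated schematically: compute a neural representational dissimilarity matrix (RDM) over the four tone categories and correlate it with candidate model RDMs. The sketch below does this on simulated patterns; the model RDM values for pitch height and pitch direction are placeholders, not the matrices used in the study.

```python
# Minimal representational similarity analysis (RSA) sketch: correlate a neural
# representational dissimilarity matrix (RDM) over four tone categories with two
# model RDMs. Patterns are simulated and the model RDM values are placeholders,
# not the pitch-height/pitch-direction matrices used in the cited study.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
n_tones, n_voxels = 4, 80
patterns = rng.standard_normal((n_tones, n_voxels))   # mean multivoxel pattern per lexical tone

neural_rdm = pdist(patterns, metric="correlation")    # condensed RDM: 6 pairwise dissimilarities

# Hypothetical model RDMs over the same 6 tone pairs (pair ordering follows pdist)
pitch_height_rdm = np.array([1.0, 2.0, 1.0, 1.0, 2.0, 1.0])
pitch_direction_rdm = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 2.0])

for name, model_rdm in [("pitch height", pitch_height_rdm),
                        ("pitch direction", pitch_direction_rdm)]:
    rho, p = spearmanr(neural_rdm, model_rdm)
    print(f"{name}: Spearman rho = {rho:.2f} (p = {p:.2f})")
```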
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, 2504A Whitis Avenue (A1100), Austin, TX, USA
- Zhenzhong Gan
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou, China
- Suiping Wang
- Center for the Study of Applied Psychology and School of Psychology, South China Normal University, Guangzhou, China; Guangdong Provincial Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Patrick C M Wong
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China; Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR, China
- Bharath Chandrasekaran
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, 2504A Whitis Avenue (A1100), Austin, TX, USA; Department of Psychology, The University of Texas at Austin, 108 E. Dean Keeton Stop, Austin, TX, USA; Department of Linguistics, The University of Texas at Austin, 305 E. 23rd Street STOP, Austin, TX, USA; Institute for Mental Health Research, College of Liberal Arts, The University of Texas at Austin, 305 E. 23rd St. Stop, Austin, TX, USA; The Institute for Neuroscience, The University of Texas at Austin, 1 University Station Stop, Austin, TX, USA
13. Popp M, Trumpp NM, Kiefer M. Processing of Action and Sound Verbs in Context: An FMRI Study. Transl Neurosci 2019; 10:200-222. PMID: 31637047; PMCID: PMC6795028; DOI: 10.1515/tnsci-2019-0035.
Abstract
Recent theories propose a flexible recruitment of sensory and motor brain regions during conceptual processing depending on context and task. The present functional magnetic resonance imaging study investigated the influence of context and task on conceptual processing of action and sound verbs. Participants first performed an explicit semantic context decision task, in which action and sound verbs were presented together with a context noun. The same verbs were repeatedly presented in a subsequent implicit lexical decision task together with new action and sound verbs. Thereafter, motor and acoustic localizer tasks were administered to identify brain regions involved in perception and action. During the explicit task, we found differential activations to action and sound verbs near corresponding sensorimotor brain regions. During the implicit lexical decision task, differences between action and sound verbs were absent. However, feature-specific repetition effects were observed near corresponding sensorimotor brain regions. The present results suggest flexible conceptual representations depending on context and task. Feature-specific effects were observed only near, but not within corresponding sensorimotor brain regions, as defined by the localizer tasks. Our results therefore only provide limited evidence in favor of grounded cognition theories assuming a close link between the conceptual and the sensorimotor systems.
Affiliation(s)
- Margot Popp
- Ulm University, Department of Psychiatry, Ulm, Germany
- Markus Kiefer
- Ulm University, Department of Psychiatry, Ulm, Germany
14.
Abstract
Recent evidence suggests that the motor system may have a facilitatory role in speech perception during noisy listening conditions. Studies clearly show an association between activity in auditory and motor speech systems, but also hint at a causal role for the motor system in noisy speech perception. However, in the most compelling "causal" studies performance was only measured at a single signal-to-noise ratio (SNR). If listening conditions must be noisy to invoke causal motor involvement, then effects will be contingent on the SNR at which they are tested. We used articulatory suppression to disrupt motor-speech areas while measuring phonemic identification across a range of SNRs. As controls, we also measured phoneme identification during passive listening, mandible gesturing, and foot-tapping conditions. Two-parameter (threshold, slope) psychometric functions were fit to the data in each condition. Our findings indicate: (1) no effect of experimental task on psychometric function slopes; (2) a small effect of articulatory suppression, in particular, on psychometric function thresholds. The size of the latter effect was 1 dB (~5% correct) on average, suggesting, at best, a minor modulatory role of the speech motor system in perception.
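The two-parameter psychometric fit mentioned in the abstract can be sketched as follows: a logistic function with threshold and slope parameters fitted to proportion-correct data across SNRs. The data, chance level, and starting values below are simulated assumptions, not the study's measurements.

```python
# Sketch of fitting a two-parameter (threshold, slope) psychometric function to
# phoneme-identification accuracy across SNRs. The SNRs, chance level, trial counts,
# and parameter values are simulated assumptions, not the cited study's data.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, threshold, slope, chance=0.25):
    """Logistic function rising from chance performance to 1.0."""
    return chance + (1.0 - chance) / (1.0 + np.exp(-slope * (snr - threshold)))

snrs = np.array([-12.0, -9.0, -6.0, -3.0, 0.0, 3.0])     # hypothetical SNRs in dB
true_threshold, true_slope = -6.0, 0.8                    # simulated ground truth
n_trials = 40
rng = np.random.default_rng(6)
p_correct = rng.binomial(n_trials, psychometric(snrs, true_threshold, true_slope)) / n_trials

(threshold, slope), _ = curve_fit(psychometric, snrs, p_correct, p0=(-5.0, 1.0))
print(f"fitted threshold = {threshold:.1f} dB, slope = {slope:.2f}")
```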
Affiliation(s)
- Ryan C Stokes
- Department of Cognitive Sciences, Social and Behavioral Sciences Gateway, University of California - Irvine, Irvine, CA, 92697-5100, USA
- Jonathan H Venezia
- Department of Cognitive Sciences, Social and Behavioral Sciences Gateway, University of California - Irvine, Irvine, CA, 92697-5100, USA
- Gregory Hickok
- Department of Cognitive Sciences, Social and Behavioral Sciences Gateway, University of California - Irvine, Irvine, CA, 92697-5100, USA
15. Rabbani Q, Milsap G, Crone NE. The Potential for a Speech Brain-Computer Interface Using Chronic Electrocorticography. Neurotherapeutics 2019; 16:144-165. PMID: 30617653; PMCID: PMC6361062; DOI: 10.1007/s13311-018-00692-2.
Abstract
A brain-computer interface (BCI) is a technology that uses neural features to restore or augment the capabilities of its user. A BCI for speech would enable communication in real time via neural correlates of attempted or imagined speech. Such a technology would potentially restore communication and improve quality of life for locked-in patients and other patients with severe communication disorders. There have been many recent developments in neural decoders, neural feature extraction, and brain recording modalities facilitating BCI for the control of prosthetics and in automatic speech recognition (ASR). Indeed, ASR and related fields have developed significantly over the past years and lend many insights into the requirements, goals, and strategies for speech BCI. Neural speech decoding is a comparatively new field but has shown much promise with recent studies demonstrating semantic, auditory, and articulatory decoding using electrocorticography (ECoG) and other neural recording modalities. Because the neural representations for speech and language are widely distributed over cortical regions spanning the frontal, parietal, and temporal lobes, the mesoscopic scale of population activity captured by ECoG surface electrode arrays may have distinct advantages for speech BCI, in contrast to the advantages of microelectrode arrays for upper-limb BCI. Nevertheless, there remain many challenges for the translation of speech BCIs to clinical populations. This review discusses and outlines the current state-of-the-art for speech BCI and explores what a speech BCI using chronic ECoG might entail.
Affiliation(s)
- Qinwan Rabbani
- Department of Electrical Engineering, The Johns Hopkins University Whiting School of Engineering, Baltimore, MD, USA.
- Griffin Milsap
- Department of Biomedical Engineering, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nathan E Crone
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
16. Saltuklaroglu T, Bowers A, Harkrider AW, Casenhiser D, Reilly KJ, Jenson DE, Thornton D. EEG mu rhythms: Rich sources of sensorimotor information in speech processing. Brain and Language 2018; 187:41-61. PMID: 30509381; DOI: 10.1016/j.bandl.2018.09.005.
Affiliation(s)
- Tim Saltuklaroglu
- Department of Audiology and Speech-Language Pathology, University of Tennessee Health Sciences, Knoxville, TN 37996, USA.
- Andrew Bowers
- University of Arkansas, Epley Center for Health Professions, 606 N. Razorback Road, Fayetteville, AR 72701, USA
- Ashley W Harkrider
- Department of Audiology and Speech-Language Pathology, University of Tennessee Health Sciences, Knoxville, TN 37996, USA
- Devin Casenhiser
- Department of Audiology and Speech-Language Pathology, University of Tennessee Health Sciences, Knoxville, TN 37996, USA
- Kevin J Reilly
- Department of Audiology and Speech-Language Pathology, University of Tennessee Health Sciences, Knoxville, TN 37996, USA
- David E Jenson
- Department of Speech and Hearing Sciences, Elson S. Floyd College of Medicine, Spokane, WA 99210-1495, USA
- David Thornton
- Department of Hearing, Speech, and Language Sciences, Gallaudet University, 800 Florida Avenue NE, Washington, DC 20002, USA
17. Shen G, Meltzoff AN, Marshall PJ. Touching lips and hearing fingers: effector-specific congruency between tactile and auditory stimulation modulates N1 amplitude and alpha desynchronization. Exp Brain Res 2018; 236:13-29. PMID: 29038847; PMCID: PMC5976883; DOI: 10.1007/s00221-017-5104-3.
Abstract
Understanding the interactions between audition and sensorimotor processes is of theoretical importance, particularly in relation to speech processing. Although one current focus in this area is on interactions between auditory perception and the motor system, there has been less research on connections between the auditory and somatosensory modalities. The current study takes a novel approach to this omission by examining specific auditory-tactile interactions in the context of speech and non-speech sound production. Electroencephalography was used to examine brain responses when participants were presented with speech syllables (a bilabial sound /pa/ and a non-labial sound /ka/) or finger-snapping sounds that were simultaneously paired with tactile stimulation of either the lower lip or the right middle finger. Analyses focused on the sensory-evoked N1 in the event-related potential and the extent of alpha band desynchronization elicited by the stimuli. N1 amplitude over fronto-central sites was significantly enhanced when the bilabial /pa/ sound was paired with tactile lip stimulation and when the finger-snapping sound was paired with tactile stimulation of the finger. Post-stimulus alpha desynchronization at central sites was also enhanced when the /pa/ sound was accompanied by tactile stimulation of the lip. These novel findings indicate that neural aspects of somatosensory-auditory interactions are influenced by the congruency between the location of the bodily touch and the bodily origin of a perceived sound.
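The alpha-desynchronization measure referred to here is typically quantified as a percentage power change relative to a pre-stimulus baseline. The sketch below computes such an event-related desynchronization index for one simulated EEG channel; the sampling rate, band limits, and time windows are illustrative assumptions, not the study's parameters.

```python
# Sketch of quantifying alpha-band (8-13 Hz) event-related desynchronization (ERD)
# as a percentage power change after stimulus onset relative to a pre-stimulus
# baseline, for one simulated EEG channel. Sampling rate, band limits, and windows
# are illustrative assumptions, not the cited study's parameters.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 500.0
t = np.arange(-0.5, 1.0, 1.0 / fs)                   # epoch from -500 ms to +1000 ms
rng = np.random.default_rng(7)
n_epochs = 60

# Simulated epochs: ongoing 10 Hz alpha whose amplitude drops after stimulus onset (t > 0)
alpha_amp = np.where(t < 0, 2.0, 1.0)
epochs = np.array([alpha_amp * np.sin(2 * np.pi * 10 * t + rng.uniform(0, 2 * np.pi))
                   + rng.standard_normal(t.size) for _ in range(n_epochs)])

sos = butter(4, [8.0, 13.0], btype="bandpass", fs=fs, output="sos")
alpha_power = np.abs(hilbert(sosfiltfilt(sos, epochs, axis=1), axis=1)) ** 2
mean_power = alpha_power.mean(axis=0)                # average over epochs

baseline = mean_power[t < 0].mean()
post = mean_power[(t > 0.2) & (t < 0.8)].mean()
erd_percent = 100.0 * (post - baseline) / baseline   # negative values = desynchronization
print(f"alpha ERD: {erd_percent:.0f}%")
```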
Affiliation(s)
- Guannan Shen
- Department of Psychology, 1701 N 13th Street, Philadelphia, PA, 19122, USA.
- Andrew N Meltzoff
- Department of Psychology, 1701 N 13th Street, Philadelphia, PA, 19122, USA
- Institute for Learning and Brain Sciences, University of Washington, Seattle, USA
- Peter J Marshall
- Department of Psychology, 1701 N 13th Street, Philadelphia, PA, 19122, USA
18. Evans S. What Has Replication Ever Done for Us? Insights from Neuroimaging of Speech Perception. Front Hum Neurosci 2017; 11:41. PMID: 28203154; PMCID: PMC5285370; DOI: 10.3389/fnhum.2017.00041.
Affiliation(s)
- Samuel Evans
- Institute of Cognitive Neuroscience, University College London, London, UK; Department of Psychology, University of Westminster, London, UK
19. Keane A. A common neural mechanism for speech perception and movement initiation specialized for place of articulation. Cogent Psychology 2016. DOI: 10.1080/23311908.2016.1233649.
Affiliation(s)
- A.M. Keane
- Psychology, National University of Ireland, Galway, Ireland
20. Schomers MR, Pulvermüller F. Is the Sensorimotor Cortex Relevant for Speech Perception and Understanding? An Integrative Review. Front Hum Neurosci 2016; 10:435. PMID: 27708566; PMCID: PMC5030253; DOI: 10.3389/fnhum.2016.00435.
Abstract
In the neuroscience of language, phonemes are frequently described as multimodal units whose neuronal representations are distributed across perisylvian cortical regions, including auditory and sensorimotor areas. A different position views phonemes primarily as acoustic entities with posterior temporal localization, which are functionally independent from frontoparietal articulatory programs. To address this current controversy, we here discuss experimental results from functional magnetic resonance imaging (fMRI) as well as transcranial magnetic stimulation (TMS) studies. On first glance, a mixed picture emerges, with earlier research documenting neurofunctional distinctions between phonemes in both temporal and frontoparietal sensorimotor systems, but some recent work seemingly failing to replicate the latter. Detailed analysis of methodological differences between studies reveals that the way experiments are set up explains whether sensorimotor cortex maps phonological information during speech perception or not. In particular, acoustic noise during the experiment and ‘motor noise’ caused by button press tasks work against the frontoparietal manifestation of phonemes. We highlight recent studies using sparse imaging and passive speech perception tasks along with multivariate pattern analysis (MVPA) and especially representational similarity analysis (RSA), which succeeded in separating acoustic-phonological from general-acoustic processes and in mapping specific phonological information on temporal and frontoparietal regions. The question about a causal role of sensorimotor cortex on speech perception and understanding is addressed by reviewing recent TMS studies. We conclude that frontoparietal cortices, including ventral motor and somatosensory areas, reflect phonological information during speech perception and exert a causal influence on language understanding.
Affiliation(s)
- Malte R Schomers
- Brain Language Laboratory, Department of Philosophy and Humanities, Freie Universität Berlin, Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany
- Friedemann Pulvermüller
- Brain Language Laboratory, Department of Philosophy and Humanities, Freie Universität Berlin, Berlin, Germany; Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany