1. Zhao Y, Chen Y, Cheng K, Huang W. Artificial intelligence based multimodal language decoding from brain activity: A review. Brain Res Bull 2023; 201:110713. [PMID: 37487829] [DOI: 10.1016/j.brainresbull.2023.110713]
Abstract
Decoding brain activity is conducive to breakthroughs in brain-computer interface (BCI) technology, and the development of artificial intelligence (AI) continually promotes progress in brain language decoding. Existing research has mainly focused on a single modality and paid insufficient attention to AI methods. Therefore, our objective is to provide an overview of relevant decoding research from the perspective of different modalities and methodologies. The modalities involve text, speech, image, and video, whereas the core method is using AI-built decoders to translate brain signals induced by multimodal stimuli into text or vocal language. The semantic information of brain activity can be successfully decoded into language at various levels, ranging from words through sentences to discourses. However, the decoding effect is affected by various factors, such as the decoding model, vector representation model, and brain regions. Challenges and future directions are also discussed. The advances in brain language decoding and BCI technology will potentially assist patients with clinical aphasia in regaining the ability to communicate.
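The decoder-plus-vector-representation pipeline summarized above can be made concrete with a minimal sketch: ridge regression maps brain-activity features onto word-embedding vectors, and candidate words are ranked by cosine similarity to the prediction. This is one classic approach from the decoding literature, not the specific method of any paper listed here; all data, shapes, and names below are synthetic and illustrative.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n_trials, n_voxels, emb_dim = 200, 500, 300

X = rng.standard_normal((n_trials, n_voxels))  # brain features per trial (synthetic)
Y = rng.standard_normal((n_trials, emb_dim))   # word embedding per trial (synthetic)

# fit a regularized linear decoder on training trials
decoder = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X[:150], Y[:150])
Y_pred = decoder.predict(X[150:])              # predicted embeddings, held-out trials

# rank a toy vocabulary by cosine similarity to each predicted embedding
vocab = rng.standard_normal((1000, emb_dim))   # stand-in embedding matrix
sims = (Y_pred @ vocab.T) / (
    np.linalg.norm(Y_pred, axis=1, keepdims=True) * np.linalg.norm(vocab, axis=1)
)
decoded = sims.argmax(axis=1)                  # decoded word index per test trial
```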
Affiliation(s)
- Yuhao Zhao
- College of Language Intelligence, Sichuan International Studies University, Chongqing 400031, PR China
- Yu Chen
- Technical College for the Deaf, Tianjin University of Technology, Tianjin 300384, PR China
- Kaiwen Cheng
- College of Language Intelligence, Sichuan International Studies University, Chongqing 400031, PR China
- Wei Huang
- Sichuan Provincial Key Laboratory for Human Disease Gene Study, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, PR China
2. Suess N, Hauswald A, Reisinger P, Rösch S, Keitel A, Weisz N. Cortical tracking of formant modulations derived from silently presented lip movements and its decline with age. Cereb Cortex 2022; 32:4818-4833. [PMID: 35062025] [PMCID: PMC9627034] [DOI: 10.1093/cercor/bhab518]
Abstract
The integration of visual and auditory cues is crucial for successful speech processing, especially under adverse conditions. Recent reports have shown that when participants watch muted videos of speakers, the phonological information about the acoustic speech envelope, which is associated with but independent of the speakers' lip movements, is tracked by the visual cortex. However, the speech signal also carries richer acoustic details, for example, about the fundamental frequency and the resonant frequencies, whose visuo-phonological transformation could aid speech processing. Here, we investigated the neural basis of the visuo-phonological transformation of these more fine-grained acoustic details and assessed how it changes as a function of age. We recorded whole-head magnetoencephalographic (MEG) data while participants watched silent normal (i.e., natural) and reversed videos of a speaker and paid attention to their lip movements. We found that the visual cortex is able to track the unheard natural modulations of resonant frequencies (or formants) and the pitch (or fundamental frequency) linked to lip movements. Importantly, only the processing of natural unheard formants decreases significantly with age, in the visual and also in the cingulate cortex. This is not the case for the processing of the unheard speech envelope, the fundamental frequency, or the purely visual information carried by lip movements. These results show that unheard spectral fine details (along with the unheard acoustic envelope) are transformed from a mere visual to a phonological representation. Aging especially affects the ability to derive spectral dynamics at formant frequencies. As listening in noisy environments should capitalize on the ability to track spectral fine details, our results provide a novel focus on compensatory processes in such challenging situations.
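Cortical "tracking" of this kind is commonly quantified as spectral coherence between a brain signal and a stimulus feature in the low-frequency range. A minimal sketch on synthetic signals, using scipy's coherence estimator as a stand-in for the paper's source-space MEG analysis; the sampling rate and frequencies are illustrative.

```python
import numpy as np
from scipy.signal import coherence

fs = 250.0                                   # sampling rate in Hz (illustrative)
t = np.arange(0, 60, 1 / fs)                 # 60 s of signal
formant = np.sin(2 * np.pi * 3 * t)          # stand-in for a formant time course
meg = 0.4 * formant + np.random.default_rng(1).standard_normal(t.size)

# magnitude-squared coherence between the brain signal and the stimulus feature
f, cxy = coherence(meg, formant, fs=fs, nperseg=int(4 * fs))
tracking = cxy[(f >= 1) & (f <= 7)].mean()   # mean coherence in the 1-7 Hz band
```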
Affiliation(s)
- Nina Suess
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Anne Hauswald
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Patrick Reisinger
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Sebastian Rösch
- Department of Otorhinolaryngology, Head and Neck Surgery, Paracelsus Medical University Salzburg, University Hospital Salzburg, Salzburg 5020, Austria
- Anne Keitel
- School of Social Sciences, University of Dundee, Dundee DD1 4HN, UK
- Nathan Weisz
- Department of Psychology, Centre for Cognitive Neuroscience, University of Salzburg, Salzburg 5020, Austria
- Department of Psychology, Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg 5020, Austria
3. Auditory dominance in processing Chinese semantic abnormalities in response to competing audio-visual stimuli. Neuroscience 2022; 502:1-9. [PMID: 36031089] [DOI: 10.1016/j.neuroscience.2022.08.017]
Abstract
Language is a remarkable cognitive ability that can be expressed through visual (written language) or auditory (spoken language) modalities. When visual characters and auditory speech convey conflicting information, individuals may selectively attend to either one of them. However, which modality dominates in such a competing situation, and the neural mechanism underlying it, are still unclear. Here, we presented participants with Chinese sentences in which the visual characters and auditory speech conveyed conflicting information while behavioral and electroencephalographic (EEG) responses were recorded. Results showed a prominent auditory dominance when audio-visual competition occurred. Specifically, higher accuracy (ACC), larger N400 amplitudes, and more linkages in posterior occipital-parietal areas were found in the auditory mismatch condition than in the visual mismatch condition. Our research illustrates the superiority of auditory speech over visual characters, extending our understanding of the neural mechanisms of audio-visual competition in Chinese.
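The reported N400 contrast reduces to comparing mean amplitudes in a post-stimulus window across conditions. A toy sketch with random data, assuming a paired test across subjects; the window, sampling rate, and subject count are illustrative, not the study's parameters.

```python
import numpy as np
from scipy.stats import ttest_rel

fs = 500                                      # Hz; epochs span -0.2 to 0.8 s
times = np.arange(-0.2, 0.8, 1 / fs)
win = (times >= 0.3) & (times <= 0.5)         # classic N400 window

rng = np.random.default_rng(2)
erp_aud = rng.standard_normal((24, times.size))  # subjects x time, auditory mismatch
erp_vis = rng.standard_normal((24, times.size))  # subjects x time, visual mismatch

n400_aud = erp_aud[:, win].mean(axis=1)       # mean window amplitude per subject
n400_vis = erp_vis[:, win].mean(axis=1)
t_val, p_val = ttest_rel(n400_aud, n400_vis)  # paired comparison across subjects
```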
4. Zhang L, Du Y. Lip movements enhance speech representations and effective connectivity in auditory dorsal stream. Neuroimage 2022; 257:119311. [PMID: 35589000] [DOI: 10.1016/j.neuroimage.2022.119311]
Abstract
Viewing a speaker's lip movements facilitates speech perception, especially under adverse listening conditions, but the neural mechanisms of this perceptual benefit at the phonemic and feature levels remain unclear. This fMRI study addressed the question by quantifying regional multivariate representations and network organization underlying audiovisual speech-in-noise perception. Behaviorally, valid lip movements improved recognition of place of articulation to aid phoneme identification. Meanwhile, lip movements enhanced neural representations of phonemes in left auditory dorsal stream regions, including frontal speech motor areas and the supramarginal gyrus (SMG). Moreover, neural representations of place-of-articulation and voicing features were promoted differentially by lip movements in these regions, with voicing enhanced in Broca's area and place of articulation better encoded in left ventral premotor cortex and SMG. Dynamic causal modeling (DCM) analysis further showed that such local changes were accompanied by strengthened effective connectivity along the dorsal stream. Furthermore, the neurite orientation dispersion of the left arcuate fasciculus, the structural backbone of the auditory dorsal stream, predicted the visual enhancements of neural representations and effective connectivity. Our findings provide novel insight for speech science: lip movements promote both local phonemic and feature encoding and network connectivity in the dorsal pathway, and this functional enhancement is mediated by the microstructural architecture of the circuit.
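The feature-decoding component of such analyses is typically a cross-validated multivariate classifier run on ROI activity patterns, compared between audio-only and audiovisual conditions. A minimal stand-in with a linear SVM on synthetic data; the region name, labels, and trial counts are illustrative, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_trials, n_voxels = 160, 120
roi_patterns = rng.standard_normal((n_trials, n_voxels))  # e.g. SMG voxel patterns
place_labels = rng.integers(0, 2, n_trials)               # binary place-of-articulation label

# cross-validated decoding accuracy; rerun per condition to test for enhancement
acc = cross_val_score(SVC(kernel="linear"), roi_patterns, place_labels, cv=5)
print(f"decoding accuracy: {acc.mean():.2f}")
```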
Affiliation(s)
- Lei Zhang
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China
- Yi Du
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Shanghai 200031, China; Chinese Institute for Brain Research, Beijing 102206, China
5. Bröhl F, Keitel A, Kayser C. MEG Activity in Visual and Auditory Cortices Represents Acoustic Speech-Related Information during Silent Lip Reading. eNeuro 2022; 9:ENEURO.0209-22.2022. [PMID: 35728955] [PMCID: PMC9239847] [DOI: 10.1523/eneuro.0209-22.2022]
Abstract
Speech is an intrinsically multisensory signal, and seeing the speaker's lips forms a cornerstone of communication in acoustically impoverished environments. Still, it remains unclear how the brain exploits visual speech for comprehension. Previous work debated whether lip signals are mainly processed along the auditory pathways or whether the visual system directly implements speech-related processes. To probe this, we systematically characterized dynamic representations of multiple acoustic and visual speech-derived features in source-localized MEG recordings that were obtained while participants listened to speech or viewed silent speech. Using a mutual-information framework, we provide a comprehensive assessment of how well temporal and occipital cortices reflect the physically presented signals, as well as unique aspects of acoustic features that were physically absent but may be critical for comprehension. Our results demonstrate that both cortices feature a functionally specific form of multisensory restoration: during lip reading, they reflect unheard acoustic features, independent of co-existing representations of the visible lip movements. This restoration emphasizes the unheard pitch signature in occipital cortex and the speech envelope in temporal cortex and is predictive of lip-reading performance. These findings suggest that when seeing the speaker's lips, the brain engages both visual and auditory pathways to support comprehension by exploiting multisensory correspondences between lip movements and spectro-temporal acoustic cues.
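A mutual-information analysis of this kind can be approximated by discretizing a cortical time series and a speech feature and computing histogram-based MI (the authors' framework uses a more refined estimator; this is only a toy stand-in on synthetic signals).

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def binned_mi(x, y, bins=8):
    """Histogram-based mutual information (in nats) between two 1-D signals."""
    xb = np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))
    yb = np.digitize(y, np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]))
    return mutual_info_score(xb, yb)

rng = np.random.default_rng(4)
envelope = rng.standard_normal(5000)                   # unheard acoustic envelope
lip = rng.standard_normal(5000)                        # lip-aperture time series
meg_src = 0.3 * envelope + rng.standard_normal(5000)   # source-localized MEG signal

# compare how much the cortical signal reflects the absent acoustic feature
# versus the physically presented visual signal
print(binned_mi(meg_src, envelope), binned_mi(meg_src, lip))
```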
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
- Anne Keitel
- Psychology, University of Dundee, Dundee DD1 4HN, United Kingdom
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
6. Pfeffer T, Keitel C, Kluger DS, Keitel A, Russmann A, Thut G, Donner TH, Gross J. Coupling of pupil- and neuronal population dynamics reveals diverse influences of arousal on cortical processing. eLife 2022; 11:e71890. [PMID: 35133276] [PMCID: PMC8853659] [DOI: 10.7554/elife.71890]
Abstract
Fluctuations in arousal, controlled by subcortical neuromodulatory systems, continuously shape cortical state, with profound consequences for information processing. Yet, how arousal signals influence cortical population activity in detail has so far only been characterized for a few selected brain regions. Traditional accounts conceptualize arousal as a homogeneous modulator of neural population activity across the cerebral cortex. Recent insights, however, point to a higher specificity of arousal effects on different components of neural activity and across cortical regions. Here, we provide a comprehensive account of the relationships between fluctuations in arousal and neuronal population activity across the human brain. Exploiting the established link between pupil size and central arousal systems, we performed concurrent magnetoencephalographic (MEG) and pupillographic recordings in a large number of participants, pooled across three laboratories. We found a cascade of effects relative to the peak timing of spontaneous pupil dilations: decreases in low-frequency (2-8 Hz) activity in temporal and lateral frontal cortex, followed by increased high-frequency (>64 Hz) activity in mid-frontal regions, followed by monotonic and inverted-U relationships with intermediate frequency-range activity (8-32 Hz) in occipito-parietal regions. Pupil-linked arousal also coincided with widespread changes in the structure of the aperiodic component of cortical population activity, indicative of changes in the excitation-inhibition balance in underlying microcircuits. Our results provide a novel basis for studying the arousal modulation of cognitive computations in cortical circuits.
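Pupil-linked modulation of band-limited activity is often assessed by cross-correlating the pupil time course with power envelopes at a range of lags around pupil dilation. A minimal sketch on synthetic, downsampled signals; the sampling rate and lag window are illustrative, not the study's parameters.

```python
import numpy as np

rng = np.random.default_rng(5)
fs = 10.0                                  # both signals downsampled to 10 Hz
pupil = rng.standard_normal(6000)          # pupil diameter time course
power = rng.standard_normal(6000)          # band-limited power envelope (e.g. 8-32 Hz)

def xcorr(a, b, max_lag):
    """Correlation of z-scored signals at integer lags -max_lag..max_lag."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.array([np.mean(a[max(0, l):len(a) + min(0, l)] *
                          b[max(0, -l):len(b) + min(0, -l)]) for l in lags])
    return lags / fs, r

lags_s, r = xcorr(pupil, power, int(5 * fs))   # lags up to +/- 5 s
peak_lag_s = lags_s[np.abs(r).argmax()]        # lag at which coupling peaks
```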
Affiliation(s)
- Thomas Pfeffer
- Universitat Pompeu Fabra, Center for Brain and Cognition, Computational Neuroscience Group, Barcelona, Spain
- University Medical Center Hamburg-Eppendorf, Department of Neurophysiology and Pathophysiology, Hamburg, Germany
- Christian Keitel
- University of Stirling, Psychology, Stirling, United Kingdom
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Daniel S Kluger
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg, Münster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Anne Keitel
- University of Dundee, Psychology, Dundee, United Kingdom
- Alena Russmann
- University Medical Center Hamburg-Eppendorf, Department of Neurophysiology and Pathophysiology, Hamburg, Germany
- Gregor Thut
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Tobias H Donner
- University Medical Center Hamburg-Eppendorf, Department of Neurophysiology and Pathophysiology, Hamburg, Germany
- Joachim Gross
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Malmedyweg, Münster, Germany
- Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
7. Multivariate Analysis of Evoked Responses during the Rubber Hand Illusion Suggests a Temporal Parcellation into Manipulation and Illusion-Specific Correlates. eNeuro 2022; 9:ENEURO.0355-21.2021. [PMID: 34980661] [PMCID: PMC8805188] [DOI: 10.1523/eneuro.0355-21.2021]
Abstract
The neurophysiological processes reflecting body illusions such as the rubber hand remain debated. Previous studies investigating the neural responses evoked by the illusion-inducing stimulation have provided diverging reports as to when these responses reflect the illusory state of the artificial limb becoming embodied. One reason for these diverging reports may be that different studies contrasted different experimental conditions to isolate potential correlates of the illusion, but individual contrasts may reflect multiple facets of the adopted experimental paradigm and not just the illusory state. To resolve these controversies, we recorded EEG responses in human participants and combined multivariate (cross-)classification with multiple illusion and non-illusion conditions. These conditions were designed to probe for markers of the illusory state that generalize across the spatial arrangements of limbs or the specific nature of the control object (a rubber hand or the participant's real hand) and hence are independent of the precise experimental conditions used as a contrast for the illusion. Our results reveal a parcellation of evoked responses into a temporal sequence of events. Around 125 and 275 ms following stimulus onset, the neurophysiological signals reliably differentiate the illusory state from non-illusion epochs. These results consolidate previous work by demonstrating multiple neurophysiological correlates of the rubber hand illusion and illustrate how multivariate approaches can help pinpoint those that are independent of the precise experimental configuration used to induce the illusion.
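The core of the (cross-)classification approach is to train a classifier on one illusion contrast at each time point and test whether it transfers to another contrast; above-chance transfer marks a correlate that generalizes across experimental configurations. The sketch below uses synthetic epochs and an LDA classifier; shapes and labels are illustrative, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
n_trials, n_chan, n_times = 120, 64, 100
X_a = rng.standard_normal((n_trials, n_chan, n_times))  # epochs, contrast A
y_a = rng.integers(0, 2, n_trials)                      # illusion vs control, A
X_b = rng.standard_normal((n_trials, n_chan, n_times))  # epochs, contrast B
y_b = rng.integers(0, 2, n_trials)                      # illusion vs control, B

# train on contrast A and test on contrast B at every time point
acc = np.empty(n_times)
for ti in range(n_times):
    clf = LinearDiscriminantAnalysis().fit(X_a[:, :, ti], y_a)
    acc[ti] = clf.score(X_b[:, :, ti], y_b)
# peaks in acc (e.g. around 125 and 275 ms) would mark generalizing correlates
```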
8. Keitel A, Gross J, Kayser C. Shared and modality-specific brain regions that mediate auditory and visual word comprehension. eLife 2020; 9:e56972. [PMID: 32831168] [PMCID: PMC7470824] [DOI: 10.7554/elife.56972]
Abstract
Visual speech carried by lip movements is an integral part of communication. Yet, it remains unclear to what extent visual and acoustic speech comprehension are mediated by the same brain regions. Using multivariate classification of full-brain MEG data, we first probed where the brain represents acoustically and visually conveyed word identities. We then tested where these sensory-driven representations are predictive of participants' trial-wise comprehension. The comprehension-relevant representations of auditory and visual speech converged only in anterior angular and inferior frontal regions and were spatially dissociated from those representations that best reflected the sensory-driven word identity. These results provide a neural explanation for the behavioural dissociation of acoustic and visual speech comprehension and suggest that cerebral representations encoding word identities may be more modality-specific than often assumed.
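Linking sensory-driven representations to behavior can be sketched in two steps: obtain cross-validated classifier evidence for the correct word from one parcel's activity patterns, then test whether that evidence predicts trial-wise comprehension. Everything below (binary word labels, the logistic classifier, the point-biserial test) is an illustrative simplification on synthetic data, not the paper's exact method.

```python
import numpy as np
from scipy.stats import pointbiserialr
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n_trials, n_feat = 240, 50
parcel = rng.standard_normal((n_trials, n_feat))  # one parcel's MEG patterns
word = rng.integers(0, 2, n_trials)               # word identity (binary toy case)
comprehended = rng.integers(0, 2, n_trials)       # trial-wise comprehension

# cross-validated probability assigned to the correct word on each trial
proba = cross_val_predict(LogisticRegression(max_iter=1000), parcel, word,
                          cv=5, method="predict_proba")
evidence = proba[np.arange(n_trials), word]

# does stronger word-identity evidence in this parcel predict comprehension?
r, p = pointbiserialr(comprehended, evidence)
```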
Affiliation(s)
- Anne Keitel
- Psychology, University of Dundee, Dundee, United Kingdom
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld, Germany