1. Maguinness C, Schall S, Mathias B, Schoemann M, von Kriegstein K. Prior multisensory learning can facilitate auditory-only voice-identity and speech recognition in noise. Q J Exp Psychol (Hove) 2024:17470218241278649. [PMID: 39164830] [DOI: 10.1177/17470218241278649]
Abstract
Seeing the visual articulatory movements of a speaker while hearing their voice helps with understanding what is said. This multisensory enhancement is particularly evident in noisy listening conditions. Multisensory enhancement also occurs in auditory-only conditions: auditory-only speech and voice-identity recognition are superior for speakers previously learned with their face, compared to control learning; an effect termed the "face-benefit." Whether the face-benefit can assist in maintaining robust perception in increasingly noisy listening conditions, similar to concurrent multisensory input, is unknown. Here, in two behavioural experiments, we examined this hypothesis. In each experiment, participants learned a series of speakers' voices together with their dynamic face or a control image. Following learning, participants listened to auditory-only sentences spoken by the same speakers and recognised the content of the sentences (speech recognition, Experiment 1) or the voice-identity of the speaker (Experiment 2) in increasing levels of auditory noise. For speech recognition, 14 of 30 participants (47%) showed a face-benefit; for voice-identity recognition, 19 of 25 participants (76%) did. For those participants who demonstrated a face-benefit, the face-benefit increased with auditory noise levels. Taken together, the results support an audio-visual model of auditory communication and suggest that the brain can develop a flexible system in which learned facial characteristics are used to deal with varying auditory uncertainty.
Affiliation(s)
- Corrina Maguinness
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Sonja Schall
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Brian Mathias
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- School of Psychology, University of Aberdeen, Aberdeen, United Kingdom
- Martin Schoemann
- Chair of Psychological Methods and Cognitive Modelling, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
2. Lifshitz-Ben-Basat A, Taitelbaum-Swead R, Fostick L. Speech perception following transcranial direct current stimulation (tDCS) over left superior temporal gyrus (STG) (including Wernicke's area) versus inferior frontal gyrus (IFG) (including Broca's area). Neuropsychologia 2024; 202:108959. [PMID: 39029652] [DOI: 10.1016/j.neuropsychologia.2024.108959]
Abstract
Imaging and neurocognitive studies have searched for the brain areas involved in speech perception, particularly when speech is accompanied by noise, in an attempt to identify the underlying neural mechanism(s). Transcranial direct current stimulation (tDCS), a noninvasive, painless cortical neuromodulation technique, has been used to either excite or inhibit brain activity in order to better understand the neural mechanism underlying speech perception in noise. In the present study, anodal (excitatory) and cathodal (inhibitory) stimulation was delivered to 48 participants, either over the left inferior frontal gyrus (IFG), which includes Broca's area (n = 10 anodal, n = 10 cathodal), or over the left superior temporal gyrus (STG), which includes Wernicke's area (n = 13 anodal, n = 15 cathodal). Speech perception was measured using a sentence recognition task accompanied by white noise with a signal-to-noise ratio of -10 dB. Speech perception performance was measured four times: at baseline, after each of two stimulation sessions (one active and one sham, in an order randomized between participants), and at a two-week follow-up session. Groups receiving anodal and cathodal stimulation over the left IFG showed no effect of stimulation type. For groups receiving left STG stimulation, anodal stimulation resulted in higher scores, regardless of whether it was given before or after sham stimulation; cathodal stimulation showed an effect only when active stimulation was applied following sham stimulation. These results show that tDCS directly improved speech perception only when applied over the left STG. Furthermore, while anodal stimulation was effective in whatever order it was given, cathodal stimulation was effective only following sham stimulation, which allowed some amount of training. These findings carry both theoretical and clinical implications for the roles of the left IFG and left STG during speech perception accompanied by background noise.
Affiliation(s)
- Leah Fostick
- Department of Communication Disorders, Ariel University, Israel
3. Jeschke L, Mathias B, von Kriegstein K. Inhibitory TMS over Visual Area V5/MT Disrupts Visual Speech Recognition. J Neurosci 2023; 43:7690-7699. [PMID: 37848284] [PMCID: PMC10634547] [DOI: 10.1523/jneurosci.0975-23.2023]
Abstract
During face-to-face communication, the perception and recognition of facial movements can facilitate individuals' understanding of what is said. Facial movements are a form of complex biological motion. Separate neural pathways are thought to process (1) simple, nonbiological motion, with an obligatory waypoint in the motion-sensitive visual middle temporal area (V5/MT); and (2) complex biological motion. Here, we present findings that challenge this dichotomy. Neuronavigated offline transcranial magnetic stimulation (TMS) over V5/MT in 24 participants (17 females and 7 males) led to increased response times in the recognition of simple, nonbiological motion as well as visual speech recognition, compared with TMS over the vertex, an active control region. TMS of area V5/MT also reduced the practice effects on response times that are typically observed over time in both visual speech and motion recognition tasks. Our findings provide the first indication that area V5/MT causally influences the recognition of visual speech. SIGNIFICANCE STATEMENT: In everyday face-to-face communication, speech comprehension is often facilitated by viewing a speaker's facial movements. Several brain areas contribute to the recognition of visual speech. One area of interest is the motion-sensitive visual middle temporal area (V5/MT), which has been associated with the perception of simple, nonbiological motion, such as moving dots, as well as more complex, biological motion such as visual speech. Here, we demonstrate using noninvasive brain stimulation that area V5/MT is causally relevant for recognizing visual speech. This finding provides new insights into the neural mechanisms that support the perception of human communication signals, which will help guide future research in typically developed individuals and populations with communication difficulties.
Affiliation(s)
- Lisa Jeschke
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, 01069 Dresden, Germany
- Brian Mathias
- School of Psychology, University of Aberdeen, Aberdeen AB243FX, United Kingdom
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, 01069 Dresden, Germany
4. Mathias B, von Kriegstein K. Enriched learning: behavior, brain, and computation. Trends Cogn Sci 2023; 27:81-97. [PMID: 36456401] [DOI: 10.1016/j.tics.2022.10.007]
Abstract
The presence of complementary information across multiple sensory or motor modalities during learning, referred to as multimodal enrichment, can markedly benefit learning outcomes. Why is this? Here, we integrate cognitive, neuroscientific, and computational approaches to understanding the effectiveness of enrichment and discuss recent neuroscience findings indicating that crossmodal responses in sensory and motor brain regions causally contribute to the behavioral benefits of enrichment. The findings provide novel evidence for multimodal theories of enriched learning, challenge assumptions of longstanding cognitive theories, and provide counterevidence to unimodal neurobiologically inspired theories. Enriched educational methods are likely effective not only because they may engage greater levels of attention or deeper levels of processing, but also because multimodal interactions in the brain can enhance learning and memory.
Affiliation(s)
- Brian Mathias
- School of Psychology, University of Aberdeen, Aberdeen, UK; Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany.
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany.
5. Maguinness C, von Kriegstein K. Visual mechanisms for voice-identity recognition flexibly adjust to auditory noise level. Hum Brain Mapp 2021; 42:3963-3982. [PMID: 34043249] [PMCID: PMC8288083] [DOI: 10.1002/hbm.25532]
Abstract
Recognising the identity of voices is a key ingredient of communication. Visual mechanisms support this ability: recognition is better for voices previously learned with their corresponding face (compared to a control condition). This so‐called 'face‐benefit' is supported by the fusiform face area (FFA), a region sensitive to facial form and identity. Behavioural findings indicate that the face‐benefit increases in noisy listening conditions. The neural mechanisms for this increase are unknown. Here, using functional magnetic resonance imaging, we examined responses in face‐sensitive regions while participants recognised the identity of auditory‐only speakers (previously learned by face) in high (SNR −4 dB) and low (SNR +4 dB) levels of auditory noise. We observed a face‐benefit at both noise levels for most participants (16 of 21). In high noise, the recognition of face‐learned speakers engaged the right posterior superior temporal sulcus motion‐sensitive face area (pSTS‐mFA), a region implicated in the processing of dynamic facial cues. The face‐benefit in high noise also correlated positively with increased functional connectivity between this region and voice‐sensitive regions in the temporal lobe in the group of 16 participants with a behavioural face‐benefit. In low noise, the face‐benefit was robustly associated with increased responses in the FFA and, to a lesser extent, the right pSTS‐mFA. The findings highlight the remarkably adaptive nature of the visual network supporting voice‐identity recognition in auditory‐only listening conditions.
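The noise manipulations in studies like this are specified as signal-to-noise ratios in dB. As a generic illustration only (not the study's stimulus code; the function name and the use of white noise are our own assumptions), scaling a noise track so that the mix hits a target SNR can be sketched as:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Desired noise power is p_speech / 10^(snr_db / 10); solve for the amplitude scale.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

# Illustrative example: a tone mixed with white noise at SNR -4 dB (high noise)
# and +4 dB (low noise), the two levels used in this study.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
speech = np.sin(2 * np.pi * 220 * t)
noisy_hard = mix_at_snr(speech, rng.standard_normal(t.size), -4.0)
noisy_easy = mix_at_snr(speech, rng.standard_normal(t.size), 4.0)
```

Lowering the SNR by 8 dB roughly sextuples the noise power relative to the speech, which is why the -4 dB condition is the perceptually demanding one.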
Affiliation(s)
- Corrina Maguinness
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Katharina von Kriegstein
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
6. Thézé R, Giraud AL, Mégevand P. The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech. Sci Adv 2020; 6(45):eabc6348. [PMID: 33148648] [PMCID: PMC7673697] [DOI: 10.1126/sciadv.abc6348]
Abstract
When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them along with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, when the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli in which a mismatched phoneme-viseme pair in a key word of each sentence created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in the posterior temporal and occipital cortex of the right hemisphere was crucial in selecting whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
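Extracting the instantaneous phase angle of theta-band oscillations, as analyses like this require, is commonly done by band-pass filtering the recording and taking the angle of the analytic (Hilbert) signal. A generic sketch using SciPy, not the authors' pipeline (the 4-8 Hz band edges and the filter order are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_phase(eeg, fs, band=(4.0, 8.0)):
    """Instantaneous theta-band phase (radians) of a 1-D EEG trace."""
    # Zero-phase (forward-backward) filtering so the filter itself
    # does not shift the phase we are trying to measure.
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, eeg)
    # The angle of the analytic signal is the instantaneous phase.
    return np.angle(hilbert(filtered))

# Sanity check on synthetic data: a pure 6 Hz cosine sampled at 500 Hz
# should come back with phase ~0 at its peaks (e.g., at t = 1.0 s).
fs = 500
t = np.arange(0, 4.0, 1.0 / fs)
phase = theta_phase(np.cos(2 * np.pi * 6.0 * t), fs)
```

Trial-by-trial, the phase at stimulus onset can then be binned and related to which percept (auditory or visual) the participant reported.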
Affiliation(s)
- Raphaël Thézé
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Anne-Lise Giraud
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland
- Pierre Mégevand
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1202 Geneva, Switzerland.
- Division of Neurology, Department of Clinical Neurosciences, Geneva University Hospitals, 1205 Geneva, Switzerland
7. Processing communicative facial and vocal cues in the superior temporal sulcus. Neuroimage 2020; 221:117191. [PMID: 32711066] [DOI: 10.1016/j.neuroimage.2020.117191]
Abstract
Facial and vocal cues provide critical social information about other humans, including their emotional and attentional states and the content of their speech. Recent work has shown that the face-responsive region of posterior superior temporal sulcus ("fSTS") also responds strongly to vocal sounds. Here, we investigate the functional role of this region and the broader STS by measuring responses to a range of face movements, vocal sounds, and hand movements using fMRI. We find that the fSTS responds broadly to different types of audio and visual face action, including both richly social communicative actions, as well as minimally social noncommunicative actions, ruling out hypotheses of specialization for processing speech signals, or communicative signals more generally. Strikingly, however, responses to hand movements were very low, whether communicative or not, indicating a specific role in the analysis of face actions (facial and vocal), not a general role in the perception of any human action. Furthermore, spatial patterns of response in this region were able to decode communicative from noncommunicative face actions, both within and across modality (facial/vocal cues), indicating sensitivity to an abstract social dimension. These functional properties of the fSTS contrast with a region of middle STS that has a selective, largely unimodal auditory response to speech sounds over both communicative and noncommunicative vocal nonspeech sounds, and nonvocal sounds. Region of interest analyses were corroborated by a data-driven independent component analysis, identifying face-voice and auditory speech responses as dominant sources of voxelwise variance across the STS. These results suggest that the STS contains separate processing streams for the audiovisual analysis of face actions and auditory speech processing.
8. A flexible workflow for simulating transcranial electric stimulation in healthy and lesioned brains. PLoS One 2020; 15:e0228119. [PMID: 32407389] [PMCID: PMC7224502] [DOI: 10.1371/journal.pone.0228119]
Abstract
Simulating transcranial electric stimulation is actively researched, as knowledge about the distribution of the electric field is decisive for understanding the variability of the elicited stimulation effect. Several software pipelines comprehensively solve this task in an automated manner for standard use cases. However, simulations for non-standard applications, such as uncommon electrode shapes, the creation of head models from non-optimized T1-weighted imaging data, or the inclusion of irregular structures, are more difficult to accomplish. We address these limitations and suggest a comprehensive workflow to simulate transcranial electric stimulation based on open-source tools. The workflow covers head model creation from MRI data, electrode modeling, modeling of the anisotropic conductivity behavior of the white matter, numerical simulation, and visualization. Skin, skull, air cavities, cerebrospinal fluid, white matter, and gray matter are segmented semi-automatically from T1-weighted MR images. Electrodes of arbitrary number and shape can be modeled. The meshing of the head model preserves the feature edges of the electrodes and is free of topological restrictions on the considered structures of the head model. White matter anisotropy can be computed from diffusion-tensor imaging data. Our solver application was verified analytically and by contrasting its tDCS simulation results with those of other simulation pipelines (SimNIBS 3.0, ROAST 3.0). Agreement in both cases underlines the validity of our workflow. Our suggested solutions facilitate investigations of irregular structures in patients (e.g., lesions, implants) or of new electrode types. To enable coupled use of the described workflow, we provide documentation and disclose the full source code of the developed tools.
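The numerical core of such a workflow is solving the quasi-static potential equation div(sigma * grad(phi)) = 0 over the segmented head volume. As a deliberately minimal sketch, not the authors' finite-element solver (layer thicknesses and conductivities below are illustrative values only), the 1D analogue treats the head layers as resistors in series, where the current density J = sigma * dphi/dx is the same in every layer:

```python
# Minimal 1D analogue of a tES field computation: tissue layers in series
# behave like resistors, so the current density J = sigma * dV/dx is constant
# and each layer's voltage drop is proportional to thickness / conductivity.
# Conductivities (S/m) and thicknesses (m) are illustrative values only.
layers = [
    ("scalp", 0.005, 0.465),   # (name, thickness_m, conductivity_S_per_m)
    ("skull", 0.007, 0.010),
    ("csf",   0.003, 1.654),
    ("brain", 0.080, 0.275),
]

def field_per_layer(total_voltage, layers):
    """Voltage drop and electric field in each layer for layers in series (1D)."""
    # Series resistance per unit area: R_i = d_i / sigma_i
    resistances = [d / sigma for _, d, sigma in layers]
    current_density = total_voltage / sum(resistances)  # J (A/m^2), constant
    result = {}
    for (name, d, _), r in zip(layers, resistances):
        drop = current_density * r                      # voltage across this layer
        result[name] = {"drop_V": drop, "E_V_per_m": drop / d}
    return current_density, result

J, fields = field_per_layer(1.0, layers)
# The poorly conducting skull absorbs most of the applied voltage, which is
# one reason simulated brain fields are so sensitive to skull segmentation.
```

The full 3D problem replaces this series sum with a finite-element discretization over the mesh, and (as the abstract notes) sigma becomes a tensor in white matter when anisotropy is derived from diffusion imaging.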
9. Young AW, Frühholz S, Schweinberger SR. Face and Voice Perception: Understanding Commonalities and Differences. Trends Cogn Sci 2020; 24:398-410. [DOI: 10.1016/j.tics.2020.02.001]
10. Moradi S, Lidestam B, Ng EHN, Danielsson H, Rönnberg J. Perceptual Doping: An Audiovisual Facilitation Effect on Auditory Speech Processing, From Phonetic Feature Extraction to Sentence Identification in Noise. Ear Hear 2019; 40:312-327. [PMID: 29870521] [PMCID: PMC6400397] [DOI: 10.1097/aud.0000000000000616]
Abstract
OBJECTIVE: We have previously shown that the gain provided by prior audiovisual (AV) speech exposure for subsequent auditory (A) sentence identification in noise is larger than that provided by prior A speech exposure. We have called this effect "perceptual doping." Specifically, prior AV speech processing dopes (recalibrates) the phonological and lexical maps in the mental lexicon, which facilitates subsequent phonological and lexical access in the A modality, separately from other learning and priming effects. In this article, we use data from the n200 study and aim to replicate and extend the perceptual doping effect using two different A and two different AV speech tasks and a larger sample than in our previous studies. DESIGN: The participants were 200 hearing aid users with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. Four speech tasks in the n200 study were presented in both A and AV modalities (gated consonants, gated vowels, vowel duration discrimination, and sentence identification in noise). The modality order of speech presentation was counterbalanced across participants: half of the participants completed the A modality first and the AV modality second (A1-AV2), and the other half completed the AV modality first and then the A modality (AV1-A2). Based on the perceptual doping hypothesis, which assumes that the gain of prior AV exposure for subsequent processing of speech stimuli is larger than that of prior A exposure, we predicted that mean A scores in the AV1-A2 modality order would be better than mean A scores in the A1-AV2 modality order. We therefore expected a significant difference in the identification of A speech stimuli between the two modality orders (A1 versus A2). As prior A exposure provides a smaller gain than AV exposure, we also predicted that the difference in AV speech scores between the two modality orders (AV1 versus AV2) might not be statistically significant. RESULTS: In the gated consonant and vowel tasks and the vowel duration discrimination task, there were significant differences in A performance between the two modality orders: participants' mean A performance was better in the AV1-A2 than in the A1-AV2 order (i.e., after AV processing). In mean AV performance, no significant difference was observed between the two orders. In the sentence identification in noise task, significant differences between the two orders were observed both in A identification (A1 versus A2) and in AV identification (AV1 versus AV2). The latter finding was most likely due to a procedural learning effect arising from the greater complexity of the sentence materials, or to a combination of procedural learning and perceptual learning due to the presentation of sentential materials in noisy conditions. CONCLUSIONS: The findings of the present study support the perceptual doping hypothesis, as prior AV relative to A speech exposure resulted in a larger gain for the subsequent processing of speech stimuli. For complex speech stimuli presented in degraded listening conditions, a procedural learning effect (or a combination of procedural and perceptual learning effects) also facilitated identification, irrespective of whether the prior modality was A or AV.
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Björn Lidestam
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Elaine Hoi Ning Ng
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Oticon A/S, Smørum, Denmark
- Henrik Danielsson
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Jerker Rönnberg
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
11. Kaganovich N, Ancel E. Different neural processes underlie visual speech perception in school-age children and adults: An event-related potentials study. J Exp Child Psychol 2019; 184:98-122. [PMID: 31015101] [PMCID: PMC6857813] [DOI: 10.1016/j.jecp.2019.03.009]
Abstract
The ability to use visual speech cues does not fully develop until late adolescence. The cognitive and neural processes underlying this slow maturation are not yet understood. We examined electrophysiological responses of younger (8-9 years) and older (11-12 years) children as well as adults elicited by visually perceived articulations in an audiovisual word matching task and related them to the amount of benefit gained during a speech-in-noise (SIN) perception task when seeing the talker's face. On each trial, participants first heard a word and, after a short pause, saw a speaker silently articulate a word. In half of the trials the articulated word matched the auditory word (congruent trials), whereas in the other half it did not (incongruent trials). In all three age groups, incongruent articulations elicited the N400 component and congruent articulations elicited the late positive complex (LPC). Groups did not differ in the mean amplitude of N400. The mean amplitude of LPC was larger in younger children compared with older children and adults. Importantly, the relationship between event-related potential measures and SIN performance varied by group. In 8- and 9-year-olds, neither component was predictive of SIN gain. The LPC amplitude predicted the SIN gain in older children but not in adults. Conversely, the N400 amplitude predicted the SIN gain in adults. We argue that although all groups were able to detect correspondences between auditory and visual word onsets at the phonemic/syllabic level, only adults could use this information for lexical access.
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA; Department of Psychological Sciences, Purdue University, West Lafayette, IN 47907, USA.
- Elizabeth Ancel
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA
12. Origin and evolution of human speech: Emergence from a trimodal auditory, visual and vocal network. Prog Brain Res 2019; 250:345-371. [PMID: 31703907] [DOI: 10.1016/bs.pbr.2019.01.005]
Abstract
In recent years, there have been important additions to the classical model of speech processing as originally depicted by the Broca-Wernicke model, which consisted of an anterior, productive region and a posterior, perceptive region, connected via the arcuate fasciculus. The modern view implies a separation into a dorsal and a ventral pathway conveying different kinds of linguistic information, which parallels the organization of the visual system. Furthermore, this organization is highly conserved in evolution and can be seen as the neural scaffolding from which the speech networks originated. In this chapter we emphasize that the speech networks are embedded in a multimodal system encompassing audio-vocal and visuo-vocal connections, which can be related to an ancestral audio-visuo-motor pathway present in nonhuman primates. Likewise, we propose a trimodal repertoire for speech processing and acquisition involving auditory, visual and motor representations of the basic elements of speech: phonemes, observation of mouth movements, and articulatory processes. Finally, we discuss this proposal in the context of a scenario for early speech acquisition in infants and in human evolution.
13. Borowiak K, Schelinski S, von Kriegstein K. Recognizing visual speech: Reduced responses in visual-movement regions, but not other speech regions in autism. Neuroimage Clin 2018; 20:1078-1091. [PMID: 30368195] [PMCID: PMC6202694] [DOI: 10.1016/j.nicl.2018.09.019]
Abstract
Speech information inherent in face movements is important for understanding what is said in face-to-face communication. Individuals with autism spectrum disorders (ASD) have difficulties in extracting speech information from face movements, a process called visual-speech recognition. Currently, it is unknown which dysfunctional brain regions or networks underlie the visual-speech recognition deficit in ASD. We conducted a functional magnetic resonance imaging (fMRI) study with concurrent eye tracking to investigate visual-speech recognition in adults diagnosed with high-functioning autism and pairwise-matched typically developed controls. Compared to the control group (n = 17), the ASD group (n = 17) showed a decreased blood oxygenation level dependent (BOLD) response during visual-speech recognition in the right visual area 5 (V5/MT) and the left temporal visual speech area (TVSA), brain regions implicated in visual-movement perception. The response of the right V5/MT correlated positively with visual-speech task performance in the ASD group, but not in the control group. Psychophysiological interaction (PPI) analysis revealed that functional connectivity between the left TVSA and the bilateral V5/MT, and between the right V5/MT and the left inferior frontal gyrus (IFG), was lower in the ASD than in the control group. In contrast, responses in other speech-motor regions and their connectivity were at neurotypical levels. Reduced responses and network connectivity of the visual-movement regions, in conjunction with intact speech-related mechanisms, indicate that perceptual mechanisms might be at the core of the visual-speech recognition deficit in ASD. Communication deficits in ASD might thus at least partly stem from atypical sensory processing rather than from higher-order cognitive processing of socially relevant information.
Affiliation(s)
- Kamila Borowiak
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany; Berlin School of Mind and Brain, Humboldt University of Berlin, Luisenstraße 56, 10117 Berlin, Germany; Technische Universität Dresden, Bamberger Straße 7, 01187 Dresden, Germany.
| | - Stefanie Schelinski
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany; Technische Universität Dresden, Bamberger Straße 7, 01187 Dresden, Germany
| | - Katharina von Kriegstein
- Max Planck Institute for Human Cognitive and Brain Sciences, Stephanstraße 1a, 04103 Leipzig, Germany; Technische Universität Dresden, Bamberger Straße 7, 01187 Dresden, Germany
| |
Collapse
|
14
|
Jesse A, Bartoli M. Learning to recognize unfamiliar talkers: Listeners rapidly form representations of facial dynamic signatures. Cognition 2018; 176:195-208. [DOI: 10.1016/j.cognition.2018.03.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2017] [Revised: 03/13/2018] [Accepted: 03/21/2018] [Indexed: 11/25/2022]
|
15
|
Huang Q, Ou Y, Xiong H, Yang H, Zhang Z, Chen S, Ye Y, Zheng Y. The miR-34a/Bcl-2 Pathway Contributes to Auditory Cortex Neuron Apoptosis in Age-Related Hearing Loss. Audiol Neurootol 2017; 22:96-103. [DOI: 10.1159/000454874] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2016] [Accepted: 11/30/2016] [Indexed: 11/19/2022] Open
Abstract
Hypothesis: The miR-34a/Bcl-2 signaling pathway may play a role in the mechanisms related to age-related hearing loss (AHL) in the auditory cortex. Background: The auditory cortex plays a key role in the recognition and processing of complex sound. It is difficult to explain why patients with AHL have poor speech recognition, so a growing number of studies have focused on central auditory changes. Although micro (mi)RNAs in the central nervous system have recently been increasingly reported to be associated with age-related diseases, the molecular mechanisms of AHL in the auditory cortex are not fully understood. Methods: The auditory brainstem response was used to assess the hearing ability of C57BL/6 mice, and q-PCR, immunohistochemistry, and Western blotting were used to detect the expression levels of miR-34a and Bcl-2 in the mouse auditory cortex. TUNEL and DNA fragmentation assays were adopted to detect the apoptosis of neurons in the auditory cortex. To verify the relationship of miR-34a and Bcl-2, we transfected an miR-34a mimic or miR-34a inhibitor into primary auditory cortex neurons. Results: In this study, miR-34a/Bcl-2 signaling was examined in auditory cortex neurons during aging. miR-34a expression and apoptosis increased in the auditory cortex neurons of C57BL/6 mice with aging, whereas an age-related decrease in Bcl-2 was observed. In the primary neurons of the auditory cortex, miR-34a overexpression inhibited Bcl-2, leading to an increase in apoptosis. Moreover, miR-34a knockdown increased Bcl-2 expression and diminished apoptosis. Conclusion: Our results support a link between age-related apoptosis in auditory cortex neurons and miR-34a/Bcl-2 signaling, which may serve as a potential mechanism underlying AHL in the auditory cortex.
|
16
|
Zoefel B, Davis MH. Transcranial electric stimulation for the investigation of speech perception and comprehension. LANGUAGE, COGNITION AND NEUROSCIENCE 2017; 32:910-923. [PMID: 28670598 PMCID: PMC5470108 DOI: 10.1080/23273798.2016.1247970] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/10/2016] [Accepted: 10/04/2016] [Indexed: 05/24/2023]
Abstract
Transcranial electric stimulation (tES), comprising transcranial direct current stimulation (tDCS) and transcranial alternating current stimulation (tACS), involves applying weak electrical current to the scalp, which can be used to modulate membrane potentials and thereby modify neural activity. Critically, behavioural or perceptual consequences of this modulation provide evidence for a causal role of neural activity in the stimulated brain region for the observed outcome. We present tES as a tool for the investigation of which neural responses are necessary for successful speech perception and comprehension. We summarise existing studies, along with challenges that need to be overcome, potential solutions, and future directions. We conclude that, although standardised stimulation parameters still need to be established, tES is a promising tool for revealing the neural basis of speech processing. Future research can use this method to explore the causal role of brain regions and neural processes for the perception and comprehension of speech.
|
17
|
Giordano BL, Ince RAA, Gross J, Schyns PG, Panzeri S, Kayser C. Contributions of local speech encoding and functional connectivity to audio-visual speech perception. eLife 2017; 6. [PMID: 28590903 PMCID: PMC5462535 DOI: 10.7554/elife.24763] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2016] [Accepted: 05/07/2017] [Indexed: 11/13/2022] Open
Abstract
Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments. DOI:http://dx.doi.org/10.7554/eLife.24763.001 When listening to someone in a noisy environment, such as a cocktail party, we can understand the speaker more easily if we can also see his or her face. Movements of the lips and tongue convey additional information that helps the listener’s brain separate out syllables, words and sentences. However, exactly where in the brain this effect occurs and how it works remain unclear. To find out, Giordano et al. scanned the brains of healthy volunteers as they watched clips of people speaking. The clarity of the speech varied between clips. Furthermore, in some of the clips the lip movements of the speaker corresponded to the speech in question, whereas in others the lip movements were nonsense babble.
As expected, the volunteers performed better on a word recognition task when the speech was clear and when the lip movements agreed with the spoken dialogue. Watching the video clips stimulated rhythmic activity in multiple regions of the volunteers’ brains, including areas that process sound and areas that plan movements. Speech is itself rhythmic, and the volunteers’ brain activity synchronized with the rhythms of the speech they were listening to. Seeing the speaker’s face increased this degree of synchrony. However, it also made it easier for sound-processing regions within the listeners’ brains to transfer information to one another. Notably, only the latter effect predicted improved performance on the word recognition task. This suggests that seeing a person’s face makes it easier to understand his or her speech by boosting communication between brain regions, rather than through effects on individual areas. Further work is required to determine where and how the brain encodes lip movements and speech sounds. The next challenge will be to identify where these two sets of information interact, and how the brain merges them together to generate the impression of specific words. DOI:http://dx.doi.org/10.7554/eLife.24763.002
Affiliation(s)
- Bruno L Giordano
- Institut de Neurosciences de la Timone UMR 7289, Aix Marseille Université - Centre National de la Recherche Scientifique, Marseille, France; Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Stefano Panzeri
- Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems, Istituto Italiano di Tecnologia, Rovereto, Italy
- Christoph Kayser
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
|
18
|
Moradi S, Wahlin A, Hällgren M, Rönnberg J, Lidestam B. The Efficacy of Short-term Gated Audiovisual Speech Training for Improving Auditory Sentence Identification in Noise in Elderly Hearing Aid Users. Front Psychol 2017; 8:368. [PMID: 28348542 PMCID: PMC5346541 DOI: 10.3389/fpsyg.2017.00368] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2016] [Accepted: 02/27/2017] [Indexed: 11/13/2022] Open
Abstract
This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants’ auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, neither promptly after the training nor at the one-month follow-up. However, neither a significant between-groups difference nor a group-by-session interaction was observed. Conclusion: Audiovisual training may be considered in aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or an interaction between group and session calls for further research.
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Anna Wahlin
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Mathias Hällgren
- Department of Otorhinolaryngology and Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
- Jerker Rönnberg
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Björn Lidestam
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
|
19
|
Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus. J Neurosci 2017; 37:2697-2708. [PMID: 28179553 DOI: 10.1523/jneurosci.2914-16.2017] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2016] [Revised: 01/17/2017] [Accepted: 01/24/2017] [Indexed: 11/21/2022] Open
Abstract
Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS.
SIGNIFICANCE STATEMENT: Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli.
Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of the pSTS. Different pSTS regions prefer visually presented faces containing either a moving mouth or moving eyes, but only mouth-preferring regions respond strongly to voices.
|
20
|
Rosenblum LD, Dorsi J, Dias JW. The Impact and Status of Carol Fowler's Supramodal Theory of Multisensory Speech Perception. ECOLOGICAL PSYCHOLOGY 2016. [DOI: 10.1080/10407413.2016.1230373] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
|
21
|
Diamond E, Zhang Y. Cortical processing of phonetic and emotional information in speech: A cross-modal priming study. Neuropsychologia 2016; 82:110-122. [PMID: 26796714 DOI: 10.1016/j.neuropsychologia.2016.01.019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2015] [Revised: 01/06/2016] [Accepted: 01/16/2016] [Indexed: 10/22/2022]
Abstract
The current study employed behavioral and electrophysiological measures to investigate the timing, localization, and neural oscillation characteristics of cortical activities associated with phonetic and emotional information processing of speech. The experimental design used a cross-modal priming paradigm in which normal adult participants were presented with a visual prime followed by an auditory target. Primes were facial expressions that systematically varied in emotional content (happy or angry) and mouth shape (corresponding to /a/ or /i/ vowels). Targets were spoken words that varied by emotional prosody (happy or angry) and vowel (/a/ or /i/). In both the phonetic and prosodic conditions, participants were asked to judge the congruency of the visual prime and the auditory target. Behavioral results showed a congruency effect for both percent correct and reaction time. Two ERP responses, the N400 and late positive response (LPR), were identified in both conditions. Source localization and inter-trial phase coherence of the N400 and LPR components further revealed different cortical contributions and neural oscillation patterns for selective processing of phonetic and emotional information in speech. The results provide corroborating evidence for the necessity of differentiating brain mechanisms underlying the representation and processing of co-existing linguistic and paralinguistic information in spoken language, which has important implications for theoretical models of speech recognition as well as clinical studies on the neural bases of language and social communication deficits.
Affiliation(s)
- Erin Diamond
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- Yang Zhang
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN 55455, USA; Center for Neurobehavioral Development, University of Minnesota, Minneapolis, MN 55455, USA; School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240, China
|
22
|
Krishnamurthy V, Gopinath K, Brown GS, Hampstead BM. Resting-state fMRI reveals enhanced functional connectivity in spatial navigation networks after transcranial direct current stimulation. Neurosci Lett 2015; 604:80-5. [PMID: 26240994 DOI: 10.1016/j.neulet.2015.07.042] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2015] [Revised: 07/27/2015] [Accepted: 07/29/2015] [Indexed: 11/28/2022]
Abstract
A number of studies have established that transcranial direct current stimulation (tDCS) modulates cortical excitability. We previously demonstrated polarity-dependent changes in parietal lobe blood oxygen level dependent (BOLD) fMRI in a group of young adults during a spatial navigation task [15]. Here we used resting state functional connectivity (rsFC) to examine whether analogous changes were also evident during the resting state. Participants were randomized to either a parietal-anodal, frontal-cathodal (P+F-) or the opposite montage (P-F+) and received 20 min of tDCS (2 mA) before undergoing resting-state fMRI. rsFC was evaluated between the groups by placing a seed in the medial superior parietal lobule (mSPL), which was under the target electrode. rsFC between the mSPL and a number of other areas involved in spatial navigation, scene processing, and sensorimotor processing was significantly higher in the P+F- than the P-F+ group. Thus, the modulatory effects of tDCS were evident during rest and suggest that stimulation primes not just the underlying neocortex but an extended network that can be recruited as necessary during active task performance.
Affiliation(s)
- Kaundinya Gopinath
- Department of Radiology & Imaging Sciences, Emory University, Atlanta, GA, USA
- Gregory S Brown
- Department of Rehabilitation Medicine, Emory University, Atlanta, GA, USA
- Benjamin M Hampstead
- Rehabilitation R&D Center of Excellence, Atlanta VAMC, Decatur, GA, USA; Department of Rehabilitation Medicine, Emory University, Atlanta, GA, USA
|
23
|
Prediction in speech and language processing. Cortex 2015; 68:1-7. [PMID: 26048658 DOI: 10.1016/j.cortex.2015.05.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2015] [Revised: 05/03/2015] [Accepted: 05/03/2015] [Indexed: 11/20/2022]
|
24
|
Visual abilities are important for auditory-only speech recognition: Evidence from autism spectrum disorder. Neuropsychologia 2014; 65:1-11. [DOI: 10.1016/j.neuropsychologia.2014.09.031] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2014] [Revised: 08/25/2014] [Accepted: 09/18/2014] [Indexed: 11/22/2022]
|