1
Arya R, Ervin B, Greiner HM, Buroker J, Byars AW, Tenney JR, Arthur TM, Fong SL, Lin N, Frink C, Rozhkov L, Scholle C, Skoch J, Leach JL, Mangano FT, Glauser TA, Hickok G, Holland KD. Emotional facial expression and perioral motor functions of the human auditory cortex. Clin Neurophysiol 2024; 163:102-111. [PMID: 38729074 PMCID: PMC11176009 DOI: 10.1016/j.clinph.2024.04.017]
Abstract
OBJECTIVE We investigated the role of the transverse temporal gyrus and adjacent cortex (TTG+) in facial expressions and perioral movements. METHODS In 31 patients undergoing stereo-electroencephalography monitoring, we describe behavioral responses elicited by electrical stimulation within the TTG+. Task-induced high-gamma modulation (HGM), auditory evoked responses, and resting-state connectivity were used to investigate the cortical sites having different types of responses on electrical stimulation. RESULTS Changes in facial expressions and perioral movements were elicited on electrical stimulation within TTG+ in 9 (29%) and 10 (32%) patients, respectively, in addition to the more common language responses (naming interruptions, auditory hallucinations, paraphasic errors). All functional sites showed auditory task-induced HGM and evoked responses, validating their location within the auditory cortex; however, motor sites showed lower peak amplitudes and longer peak latencies compared to language sites. Significant first-degree connections for motor sites included precentral, anterior cingulate, parahippocampal, and anterior insular gyri, whereas those for language sites included posterior superior temporal, posterior middle temporal, inferior frontal, supramarginal, and angular gyri. CONCLUSIONS Multimodal data suggest that TTG+ may participate in auditory-motor integration. SIGNIFICANCE TTG+ likely participates in facial expressions in response to emotional cues during an auditory discourse.
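The task-induced high-gamma modulation used above to validate electrode locations is typically computed as a change in band-limited envelope power relative to a pre-stimulus baseline. Below is a minimal sketch of that general approach in Python (numpy/scipy); it is not the authors' pipeline, and the band limits, window lengths, and synthetic test signal are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def high_gamma_modulation(x, fs, band=(70.0, 150.0),
                          baseline=(-0.5, 0.0), task=(0.0, 1.0), t0=1.0):
    """Percent change in high-gamma envelope power, task window vs. baseline.

    x        : 1-D voltage trace for one electrode
    fs       : sampling rate in Hz
    band     : high-gamma pass band in Hz (illustrative, not the study's values)
    baseline : window relative to stimulus onset, in seconds
    task     : task window relative to stimulus onset, in seconds
    t0       : time of stimulus onset within x, in seconds
    """
    # Zero-phase band-pass filter in the high-gamma range.
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    xf = filtfilt(b, a, x)

    # Instantaneous power from the analytic-signal (Hilbert) envelope.
    power = np.abs(hilbert(xf)) ** 2

    def mean_power(win):
        i0, i1 = int((t0 + win[0]) * fs), int((t0 + win[1]) * fs)
        return power[i0:i1].mean()

    base = mean_power(baseline)
    return 100.0 * (mean_power(task) - base) / base

# Toy check: a trace with extra 100 Hz power injected after "stimulus onset".
fs = 1000
t = np.arange(0, 3, 1 / fs)
x = np.random.randn(t.size)
x[int(1.0 * fs):int(2.0 * fs)] += 2.0 * np.sin(2 * np.pi * 100 * t[:fs])
print(f"high-gamma modulation: {high_gamma_modulation(x, fs):.1f}%")
```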
Affiliation(s)
- Ravindra Arya
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA; Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH, USA
- Brian Ervin
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH, USA
- Hansel M Greiner
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Jason Buroker
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Anna W Byars
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Jeffrey R Tenney
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Todd M Arthur
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Susan L Fong
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Nan Lin
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Clayton Frink
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Leonid Rozhkov
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Craig Scholle
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Jesse Skoch
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA; Division of Pediatric Neurosurgery, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- James L Leach
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA; Division of Pediatric Neuro-radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Francesco T Mangano
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA; Division of Pediatric Neurosurgery, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Tracy A Glauser
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Gregory Hickok
- Department of Cognitive Sciences, Department of Language Science, University of California, Irvine, CA, USA
- Katherine D Holland
- Comprehensive Epilepsy Center, Division of Neurology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
2
Dong C, Noppeney U, Wang S. Perceptual uncertainty explains activation differences between audiovisual congruent speech and McGurk stimuli. Hum Brain Mapp 2024; 45:e26653. [PMID: 38488460 DOI: 10.1002/hbm.26653]
Abstract
Face-to-face communication relies on the integration of acoustic speech signals with the corresponding facial articulations. In the McGurk illusion, an auditory /ba/ phoneme presented simultaneously with a facial articulation of a /ga/ (i.e., viseme), is typically fused into an illusory 'da' percept. Despite its widespread use as an index of audiovisual speech integration, critics argue that it arises from perceptual processes that differ categorically from natural speech recognition. Conversely, Bayesian theoretical frameworks suggest that both the illusory McGurk and the veridical audiovisual congruent speech percepts result from probabilistic inference based on noisy sensory signals. According to these models, the inter-sensory conflict in McGurk stimuli may only increase observers' perceptual uncertainty. This functional magnetic resonance imaging (fMRI) study presented participants (20 male and 24 female) with audiovisual congruent, McGurk (i.e., auditory /ba/ + visual /ga/), and incongruent (i.e., auditory /ga/ + visual /ba/) stimuli along with their unisensory counterparts in a syllable categorization task. Behaviorally, observers' response entropy was greater for McGurk compared to congruent audiovisual stimuli. At the neural level, McGurk stimuli increased activations in a widespread neural system, extending from the inferior frontal sulci (IFS) to the pre-supplementary motor area (pre-SMA) and insulae, typically involved in cognitive control processes. Crucially, in line with Bayesian theories these activation increases were fully accounted for by observers' perceptual uncertainty as measured by their response entropy. Our findings suggest that McGurk and congruent speech processing rely on shared neural mechanisms, thereby supporting the McGurk illusion as a valid measure of natural audiovisual speech perception.
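The response-entropy measure referred to above is the Shannon entropy of an observer's distribution of categorization responses within a condition: the more evenly responses are spread across syllable categories, the higher the entropy and the greater the inferred perceptual uncertainty. A minimal sketch in Python; the trial counts and syllable labels are hypothetical, not the study's data.

```python
import numpy as np
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (bits) of the response distribution for one condition."""
    counts = np.array(list(Counter(responses).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical single-subject data: congruent trials are categorized consistently,
# while McGurk trials draw a mixture of /da/, /ba/ and /ga/ responses.
congruent = ["ba"] * 28 + ["da"] * 2
mcgurk = ["da"] * 16 + ["ba"] * 9 + ["ga"] * 5

print(response_entropy(congruent))  # low entropy -> low perceptual uncertainty
print(response_entropy(mcgurk))     # higher entropy -> greater uncertainty
```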
Affiliation(s)
- Chenjie Dong
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
- Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Uta Noppeney
- Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, the Netherlands
- Suiping Wang
- Philosophy and Social Science Laboratory of Reading and Development in Children and Adolescents (South China Normal University), Ministry of Education, Guangzhou, China
3
Loskutova E, Butler JS, Setti A, O'Brien C, Loughman J. Ability to Process Multisensory Information Is Impaired in Open Angle Glaucoma. J Glaucoma 2024; 33:78-86. [PMID: 37974328 DOI: 10.1097/ijg.0000000000002331]
Abstract
PRCIS Patients with glaucoma demonstrated deficiencies in their ability to process multisensory information when compared with controls, with those deficiencies being related to glaucoma severity. Impaired multisensory integration (MSI) may affect the quality of life in individuals with glaucoma and may contribute to the increased prevalence of falls and driving safety concerns. Therapeutic possibilities to influence cognition in glaucoma should be explored. PURPOSE Glaucoma is a neurodegenerative disease of the optic nerve that has also been linked to cognitive health decline. This study explored MSI as a function of glaucoma status and severity. METHODS MSI was assessed in 37 participants with open angle glaucoma relative to 18 age-matched healthy controls. The sound-induced flash illusion was used to assess MSI efficiency. Participants were presented with various combinations of simultaneous visual and/or auditory stimuli and were required to indicate the number of visual stimuli observed for each of the 96 total presentations. Central retinal sensitivity was assessed as an indicator of glaucoma severity (MAIA; CenterVue). RESULTS Participants with glaucoma performed with equivalent capacity to healthy controls on unisensory trials (F(1,53) = 2.222, P = 0.142). Both groups performed equivalently on congruent multisensory trials involving equal numbers of auditory and visual stimuli (F(1,53) = 1.032, P = 0.314). For incongruent presentations, that is, 2 beeps and 1 flash stimulus, individuals with glaucoma demonstrated a greater influence of the incongruent beeps when judging the number of flashes, indicating less efficient MSI relative to age-matched controls (F(1,53) = 11.45, P < 0.002). In addition, MSI performance was positively correlated with retinal sensitivity (F(3,49) = 4.042, P < 0.025, adjusted R² = 0.15). CONCLUSIONS Individuals with open angle glaucoma exhibited MSI deficiencies that relate to disease severity. The types of deficiencies observed were similar to those observed among older individuals with cognitive impairment and balance issues. Impaired MSI may, therefore, be relevant to the increased prevalence of falls observed among individuals with glaucoma, a concept that merits further investigation.
Affiliation(s)
- Ekaterina Loskutova
- Centre for Eye Research Ireland, School of Physics, Clinical & Optometric Sciences, Technological University Dublin, Dublin, Ireland
- John S Butler
- Centre for Eye Research Ireland, School of Mathematical Sciences, Technological University Dublin, Dublin, Ireland
- Annalisa Setti
- School of Applied Psychology, University College Cork, Cork, Ireland
- Colm O'Brien
- Department of Ophthalmology, Mater Misericordiae University Hospital, Dublin, Ireland
- James Loughman
- Centre for Eye Research Ireland, School of Physics, Clinical & Optometric Sciences, Technological University Dublin, Dublin, Ireland
4
Sato M. Competing influence of visual speech on auditory neural adaptation. Brain Lang 2023; 247:105359. [PMID: 37951157 DOI: 10.1016/j.bandl.2023.105359]
Abstract
Visual information from a speaker's face enhances auditory neural processing and speech recognition. To determine whether auditory memory can be influenced by visual speech, the degree of auditory neural adaptation of an auditory syllable preceded by an auditory, visual, or audiovisual syllable was examined using EEG. Consistent with previous findings and additional adaptation of auditory neurons tuned to acoustic features, stronger adaptation of N1, P2 and N2 auditory evoked responses was observed when the auditory syllable was preceded by an auditory compared to a visual syllable. However, although stronger than when preceded by a visual syllable, lower adaptation was observed when the auditory syllable was preceded by an audiovisual compared to an auditory syllable. In addition, longer N1 and P2 latencies were then observed. These results further demonstrate that visual speech acts on auditory memory but suggest competing visual influences in the case of audiovisual stimulation.
Affiliation(s)
- Marc Sato
- Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, UMR 7309 CNRS & Aix-Marseille Université, 5 avenue Pasteur, Aix-en-Provence, France
5
Nidiffer AR, Cao CZ, O'Sullivan A, Lalor EC. A representation of abstract linguistic categories in the visual system underlies successful lipreading. Neuroimage 2023; 282:120391. [PMID: 37757989 DOI: 10.1016/j.neuroimage.2023.120391]
Abstract
There is considerable debate over how visual speech is processed in the absence of sound and whether neural activity supporting lipreading occurs in visual brain areas. Much of the ambiguity stems from a lack of behavioral grounding and neurophysiological analyses that cannot disentangle high-level linguistic and phonetic/energetic contributions from visual speech. To address this, we recorded EEG from human observers as they watched silent videos, half of which were novel and half of which were previously rehearsed with the accompanying audio. We modeled how the EEG responses to novel and rehearsed silent speech reflected the processing of low-level visual features (motion, lip movements) and a higher-level categorical representation of linguistic units, known as visemes. The ability of these visemes to account for the EEG - beyond the motion and lip movements - was significantly enhanced for rehearsed videos in a way that correlated with participants' trial-by-trial ability to lipread that speech. Source localization of viseme processing showed clear contributions from visual cortex, with no strong evidence for the involvement of auditory areas. We interpret this as support for the idea that the visual system produces its own specialized representation of speech that is (1) well-described by categorical linguistic features, (2) dissociable from lip movements, and (3) predictive of lipreading ability. We also suggest a reinterpretation of previous findings of auditory cortical activation during silent speech that is consistent with hierarchical accounts of visual and audiovisual speech perception.
Affiliation(s)
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Cody Zhewei Cao
- Department of Psychology, University of Michigan, Ann Arbor, MI, USA
- Aisling O'Sullivan
- School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
- Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA; School of Engineering, Trinity College Institute of Neuroscience, Trinity Centre for Biomedical Engineering, Trinity College, Dublin, Ireland
6
Ahmed F, Nidiffer AR, O'Sullivan AE, Zuk NJ, Lalor EC. The integration of continuous audio and visual speech in a cocktail-party environment depends on attention. Neuroimage 2023; 274:120143. [PMID: 37121375 DOI: 10.1016/j.neuroimage.2023.120143]
Abstract
In noisy environments, our ability to understand speech benefits greatly from seeing the speaker's face. This is attributed to the brain's ability to integrate audio and visual information, a process known as multisensory integration. In addition, selective attention plays an enormous role in what we understand, the so-called cocktail-party phenomenon. But how attention and multisensory integration interact remains incompletely understood, particularly in the case of natural, continuous speech. Here, we addressed this issue by analyzing EEG data recorded from participants who undertook a multisensory cocktail-party task using natural speech. To assess multisensory integration, we modeled the EEG responses to the speech in two ways. The first assumed that audiovisual speech processing is simply a linear combination of audio speech processing and visual speech processing (i.e., an A + V model), while the second allows for the possibility of audiovisual interactions (i.e., an AV model). Applying these models to the data revealed that EEG responses to attended audiovisual speech were better explained by an AV model, providing evidence for multisensory integration. In contrast, unattended audiovisual speech responses were best captured using an A + V model, suggesting that multisensory integration is suppressed for unattended speech. Follow up analyses revealed some limited evidence for early multisensory integration of unattended AV speech, with no integration occurring at later levels of processing. We take these findings as evidence that the integration of natural audio and visual speech occurs at multiple levels of processing in the brain, each of which can be differentially affected by attention.
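One common way to operationalize the A + V versus AV comparison described above uses temporal response functions (TRFs): estimate TRFs from the unisensory conditions and sum their predictions (the A + V model), or estimate a TRF directly on the audiovisual condition (the AV model), then compare how well each predicts the audiovisual EEG. The sketch below illustrates that logic with ridge-regularized lagged regression on toy data; it is not the authors' implementation, real analyses would use cross-validation, and the toy signal contains no true interaction.

```python
import numpy as np

def lagged(x, n_lags):
    """Design matrix of time-lagged copies of a 1-D stimulus feature."""
    X = np.zeros((x.size, n_lags))
    for k in range(n_lags):
        X[k:, k] = x[:x.size - k]
    return X

def fit_trf(X, y, lam=1.0):
    """Ridge-regularized temporal response function weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def pred_corr(y, yhat):
    return np.corrcoef(y, yhat)[0, 1]

# Toy data: audio envelope, visual (lip) feature, and EEG recorded in
# audio-only, visual-only and audiovisual conditions.
rng = np.random.default_rng(0)
n, n_lags = 5000, 32
env, lips = rng.standard_normal(n), rng.standard_normal(n)
Xa, Xv = lagged(env, n_lags), lagged(lips, n_lags)
eeg_a = Xa @ rng.standard_normal(n_lags) + rng.standard_normal(n)
eeg_v = Xv @ rng.standard_normal(n_lags) + rng.standard_normal(n)
eeg_av = eeg_a + eeg_v + rng.standard_normal(n)  # purely additive toy AV EEG

# "A + V" model: TRFs from the unisensory conditions, predictions summed.
pred_sum = Xa @ fit_trf(Xa, eeg_a) + Xv @ fit_trf(Xv, eeg_v)
# "AV" model: TRF estimated directly on the audiovisual condition.
Xav = np.hstack([Xa, Xv])
pred_av = Xav @ fit_trf(Xav, eeg_av)

print("A+V model r:", pred_corr(eeg_av, pred_sum))
print("AV  model r:", pred_corr(eeg_av, pred_av))
```

With real recordings, a reliably better fit for the AV model on held-out audiovisual data is taken as evidence for interactions beyond the linear sum of unisensory responses.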
Affiliation(s)
- Farhin Ahmed
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
- Aaron R Nidiffer
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA
- Aisling E O'Sullivan
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Nathaniel J Zuk
- Edmond & Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
- Edmund C Lalor
- Department of Biomedical Engineering, Department of Neuroscience, and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY 14627, USA; School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
7
Saalasti S, Alho J, Lahnakoski JM, Bacha-Trams M, Glerean E, Jääskeläinen IP, Hasson U, Sams M. Lipreading a naturalistic narrative in a female population: Neural characteristics shared with listening and reading. Brain Behav 2023; 13:e2869. [PMID: 36579557 PMCID: PMC9927859 DOI: 10.1002/brb3.2869]
Abstract
INTRODUCTION Few of us are skilled lipreaders while most struggle with the task. Neural substrates that enable comprehension of connected natural speech via lipreading are not yet well understood. METHODS We used a data-driven approach to identify brain areas underlying the lipreading of an 8-min narrative with participants whose lipreading skills varied extensively (range 6-100%, mean = 50.7%). The participants also listened to and read the same narrative. The similarity between individual participants' brain activity during the whole narrative, within and between conditions, was estimated by a voxel-wise comparison of the Blood Oxygenation Level Dependent (BOLD) signal time courses. RESULTS Inter-subject correlation (ISC) of the time courses revealed that lipreading, listening to, and reading the narrative were largely supported by the same brain areas in the temporal, parietal and frontal cortices, precuneus, and cerebellum. Additionally, listening to and reading connected naturalistic speech particularly activated higher-level linguistic processing in the parietal and frontal cortices more consistently than lipreading, probably paralleling the limited understanding obtained via lipreading. Importantly, higher lipreading test scores and subjective estimates of comprehension of the lipread narrative were associated with activity in the superior and middle temporal cortex. CONCLUSIONS Our new data illustrate that findings from prior studies using well-controlled repetitive speech stimuli and stimulus-driven data analyses are also valid for naturalistic connected speech. Our results might suggest an efficient use of brain areas dealing with phonological processing in skilled lipreaders.
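Voxel-wise inter-subject correlation of this kind is commonly computed leave-one-out: each participant's BOLD time course is correlated with the average time course of the remaining participants, voxel by voxel. A minimal sketch in Python, assuming a (subjects x voxels x timepoints) array; the array sizes and noise level are illustrative, not the study's data.

```python
import numpy as np

def intersubject_correlation(bold):
    """Leave-one-out ISC.

    bold : array of shape (n_subjects, n_voxels, n_timepoints)
    Returns an (n_subjects, n_voxels) array of Pearson correlations between
    each subject's voxel time course and the mean of all other subjects.
    """
    n_sub, n_vox, _ = bold.shape
    isc = np.zeros((n_sub, n_vox))
    for s in range(n_sub):
        others = np.delete(bold, s, axis=0).mean(axis=0)  # leave-one-out group mean
        for v in range(n_vox):
            isc[s, v] = np.corrcoef(bold[s, v], others[v])[0, 1]
    return isc

# Toy example: 10 subjects, 50 voxels, 200 time points of synthetic data with
# a shared stimulus-driven component plus subject-specific noise.
rng = np.random.default_rng(1)
shared = rng.standard_normal((50, 200))
bold = shared + 0.8 * rng.standard_normal((10, 50, 200))
print(intersubject_correlation(bold).mean())  # average ISC across subjects and voxels
```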
Affiliation(s)
- Satu Saalasti
- Department of Psychology and Logopedics, University of Helsinki, Helsinki, Finland; Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Advanced Magnetic Imaging (AMI) Centre, Aalto NeuroImaging, School of Science, Aalto University, Espoo, Finland
- Jussi Alho
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Juha M Lahnakoski
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Independent Max Planck Research Group for Social Neuroscience, Max Planck Institute of Psychiatry, Munich, Germany; Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Center Jülich, Jülich, Germany; Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Mareike Bacha-Trams
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Enrico Glerean
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Iiro P Jääskeläinen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Uri Hasson
- Department of Psychology and the Neuroscience Institute, Princeton University, Princeton, USA
- Mikko Sams
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto Studios - MAGICS, Aalto University, Espoo, Finland
8
Inceoglu S. Language Experience and Subjective Word Familiarity on the Multimodal Perception of Non-native Vowels. Lang Speech 2022; 65:173-192. [PMID: 34463597 DOI: 10.1177/0023830921998723]
Abstract
The present study investigated native (L1) and non-native (L2) speakers' perception of the French vowels /ɔ̃, ɑ̃, ɛ̃, o/. Thirty-four American-English learners of French and 33 native speakers of Parisian French were asked to identify 60 monosyllabic words produced by a native speaker in three modalities of presentation: auditory-only (A-only); audiovisual (AV); and visual-only (V-only). The L2 participants also completed a vocabulary knowledge test of the words presented in the perception experiment that aimed to explore whether subjective word familiarity affected speech perception. Results showed that overall performance was better in the AV and A-only conditions for the two groups, with the pattern of confusion differing across modalities. The lack of audiovisual benefit was not due to insufficient visual salience of the vowel contrasts, as shown by the native group's performance in the V-only modality, but to the L2 group's weaker sensitivity to visual information. Additionally, a significant relationship was found between subjective word familiarity and AV and A-only (but not V-only) perception of non-native contrasts.
9
Skirzewski M, Molotchnikoff S, Hernandez LF, Maya-Vetencourt JF. Multisensory Integration: Is Medial Prefrontal Cortex Signaling Relevant for the Treatment of Higher-Order Visual Dysfunctions? Front Mol Neurosci 2022; 14:806376. [PMID: 35110996 PMCID: PMC8801884 DOI: 10.3389/fnmol.2021.806376]
Abstract
In the mammalian brain, information processing in sensory modalities and global mechanisms of multisensory integration facilitate perception. Emerging experimental evidence suggests that the contribution of multisensory integration to sensory perception is far more complex than previously expected. Here we review how associative areas such as the prefrontal cortex, which receive and integrate inputs from diverse sensory modalities, can affect information processing in unisensory systems via downstream signaling. We focus our attention on the influence of the medial prefrontal cortex on the processing of information in the visual system and whether this phenomenon can be clinically used to treat higher-order visual dysfunctions. We propose that non-invasive and multisensory stimulation strategies such as environmental enrichment and/or attention-related tasks could be of clinical relevance to fight cerebral visual impairment.
Affiliation(s)
- Miguel Skirzewski
- Rodent Cognition Research and Innovation Core, University of Western Ontario, London, ON, Canada
- Stéphane Molotchnikoff
- Département de Sciences Biologiques, Université de Montréal, Montreal, QC, Canada
- Département de Génie Electrique et Génie Informatique, Université de Sherbrooke, Sherbrooke, QC, Canada
- Luis F. Hernandez
- Knoebel Institute for Healthy Aging, University of Denver, Denver, CO, United States
- José Fernando Maya-Vetencourt
- Department of Biology, University of Pisa, Pisa, Italy
- Centre for Synaptic Neuroscience, Istituto Italiano di Tecnologia (IIT), Genova, Italy
10
Abstract
Coordination between different sensory systems is a necessary element of sensory processing. Where and how signals from different sense organs converge onto common neural circuitry have become topics of increasing interest in recent years. In this article, we focus specifically on visual-auditory interactions in areas of the mammalian brain that are commonly considered to be auditory in function. The auditory cortex and inferior colliculus are two key points of entry where visual signals reach the auditory pathway, and both contain visual- and/or eye movement-related signals in humans and other animals. The visual signals observed in these auditory structures reflect a mixture of visual modulation of auditory-evoked activity and visually driven responses that are selective for stimulus location or features. These key response attributes also appear in the classic visual pathway but may play a different role in the auditory pathway: to modify auditory rather than visual perception. Finally, while this review focuses on two particular areas of the auditory pathway where this question has been studied, robust descending as well as ascending connections within this pathway suggest that undiscovered visual signals may be present at other stages as well.
Affiliation(s)
- Meredith N Schmehl
- Department of Neurobiology, Duke University, Durham, North Carolina 27708, USA; Center for Cognitive Neuroscience, Duke University, Durham, North Carolina 27708, USA; Duke Institute for Brain Sciences, Duke University, Durham, North Carolina 27708, USA
- Jennifer M Groh
- Department of Neurobiology, Duke University, Durham, North Carolina 27708, USA; Department of Psychology & Neuroscience, Duke University, Durham, North Carolina 27708, USA; Department of Computer Science, Duke University, Durham, North Carolina 27708, USA; Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA; Center for Cognitive Neuroscience, Duke University, Durham, North Carolina 27708, USA; Duke Institute for Brain Sciences, Duke University, Durham, North Carolina 27708, USA
11
Pant R, Guerreiro MJS, Ley P, Bottari D, Shareef I, Kekunnaya R, Röder B. The size-weight illusion is unimpaired in individuals with a history of congenital visual deprivation. Sci Rep 2021; 11:6693. [PMID: 33758328 PMCID: PMC7988063 DOI: 10.1038/s41598-021-86227-w]
Abstract
Visual deprivation in childhood can lead to lifelong impairments in multisensory processing. Here, the Size-Weight Illusion (SWI) was used to test whether visuo-haptic integration recovers after early visual deprivation. Normally sighted individuals perceive larger objects to be lighter than smaller objects of the same weight. In Experiment 1, individuals treated for dense bilateral congenital cataracts (who had no patterned visual experience at birth), individuals treated for developmental cataracts (who had patterned visual experience at birth, but were visually impaired), congenitally blind individuals and normally sighted individuals had to rate the weight of manually explored cubes that differed in size (Small, Medium, Large) across two possible weights (350 g, 700 g). In Experiment 2, individuals treated for dense bilateral congenital cataracts were compared to sighted individuals in a similar task using a string set-up, which removed haptic size cues. In both experiments, indistinguishable SWI effects were observed across all groups. These results provide evidence that early aberrant vision does not interfere with the development of the SWI, and suggest a recovery of the integration of size and weight cues provided by the visual and haptic modality.
Affiliation(s)
- Rashi Pant
- Biological Psychology and Neuropsychology, University of Hamburg, 20146, Hamburg, Germany
- Maria J S Guerreiro
- Biological Psychology and Neuropsychology, University of Hamburg, 20146, Hamburg, Germany
- Pia Ley
- Biological Psychology and Neuropsychology, University of Hamburg, 20146, Hamburg, Germany
- Davide Bottari
- Biological Psychology and Neuropsychology, University of Hamburg, 20146, Hamburg, Germany; Molecular Mind Lab, IMT School for Advanced Studies, 55100, Lucca, Italy
- Idris Shareef
- Child Sight Institute, Jasti V Ramanamma Children's Eye Care Center, LV Prasad Eye Institute, Hyderabad, Telangana, 500034, India
- Ramesh Kekunnaya
- Child Sight Institute, Jasti V Ramanamma Children's Eye Care Center, LV Prasad Eye Institute, Hyderabad, Telangana, 500034, India
- Brigitte Röder
- Biological Psychology and Neuropsychology, University of Hamburg, 20146, Hamburg, Germany
12
Anwyl-Irvine AL, Dalmaijer ES, Quinn AJ, Johnson A, Astle DE. Subjective SES is Associated with Children's Neurophysiological Response to Auditory Oddballs. Cereb Cortex Commun 2020; 2:tgaa092. [PMID: 34296147 PMCID: PMC8152887 DOI: 10.1093/texcom/tgaa092]
Abstract
Language and reading acquisition are strongly associated with a child's socioeconomic status (SES). There are a number of potential explanations for this relationship. We explore one potential explanation: a child's SES is associated with how children discriminate word-like sounds (i.e., phonological processing), a foundational skill for reading acquisition. Magnetoencephalography data from a sample of 71 children (aged 6 years 11 months to 12 years 3 months), during a passive auditory oddball task containing word and nonword deviants, were used to test "where" (which sensors) and "when" (at what time) any association may occur. We also investigated associations between cognition, education, and this neurophysiological response. We report differences in the neural processing of word and nonword deviant tones at an early N200 component (likely representing early sensory processing) and a later P300 component (likely representing attentional and/or semantic processing). More interestingly, we found that "parental subjective" SES (the parents' rating of their own relative affluence) was convincingly associated with later responses, but there were no significant associations with equivalized income. This suggests that SES as rated by parents is associated with underlying phonological detection skills. Furthermore, this correlation likely occurs at a later time point in information processing, associated with semantic and attentional processes. In contrast, household income is not significantly associated with these skills. One possibility is that the subjective assessment of SES is more impactful on neural mechanisms of phonological processing than the less complex and more objective measure of household income.
Affiliation(s)
- Edwin S Dalmaijer
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- Andrew J Quinn
- Oxford Centre for Human Brain Activity, Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, University of Oxford, Oxford, OX3 7JX, UK
- Amy Johnson
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
- Duncan E Astle
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
13
Michon M, Boncompte G, López V. Electrophysiological Dynamics of Visual Speech Processing and the Role of Orofacial Effectors for Cross-Modal Predictions. Front Hum Neurosci 2020; 14:538619. [PMID: 33192386 PMCID: PMC7653187 DOI: 10.3389/fnhum.2020.538619]
Abstract
The human brain generates predictions about future events. During face-to-face conversations, visemic information is used to predict upcoming auditory input. Recent studies suggest that the speech motor system plays a role in these cross-modal predictions; however, these studies usually employ only audio-visual paradigms. Here we tested whether speech sounds can be predicted on the basis of visemic information only, and to what extent interfering with orofacial articulatory effectors can affect these predictions. We registered EEG and employed the N400 as an index of such predictions. Our results show that N400 amplitude was strongly modulated by visemic salience, consistent with cross-modal speech predictions. Additionally, the N400 ceased to be evoked when syllables' visemes were presented backwards, suggesting that predictions occur only when the observed viseme matches an existing articuleme in the observer's speech motor system (i.e., the articulatory neural sequence required to produce a particular phoneme/viseme). Importantly, we found that interfering with the motor articulatory system strongly disrupted cross-modal predictions. We also observed a late P1000 that was evoked only for syllable-related visual stimuli, but whose amplitude was not modulated by interfering with the motor system. The present study provides further evidence of the importance of the speech production system for predictions of speech sounds based on visemic information at the pre-lexical level. The implications of these results are discussed in the context of a hypothesized trimodal repertoire for speech, in which speech perception is conceived as a highly interactive process that involves not only your ears but also your eyes, lips and tongue.
Affiliation(s)
- Maëva Michon
- Laboratorio de Neurociencia Cognitiva y Evolutiva, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Laboratorio de Neurociencia Cognitiva y Social, Facultad de Psicología, Universidad Diego Portales, Santiago, Chile
- Gonzalo Boncompte
- Laboratorio de Neurodinámicas de la Cognición, Escuela de Medicina, Pontificia Universidad Católica de Chile, Santiago, Chile
- Vladimir López
- Laboratorio de Psicología Experimental, Escuela de Psicología, Pontificia Universidad Católica de Chile, Santiago, Chile
14
Michaelis K, Erickson LC, Fama ME, Skipper-Kallal LM, Xing S, Lacey EH, Anbari Z, Norato G, Rauschecker JP, Turkeltaub PE. Effects of age and left hemisphere lesions on audiovisual integration of speech. Brain Lang 2020; 206:104812. [PMID: 32447050 PMCID: PMC7379161 DOI: 10.1016/j.bandl.2020.104812]
Abstract
Neuroimaging studies have implicated left temporal lobe regions in audiovisual integration of speech and inferior parietal regions in temporal binding of incoming signals. However, it remains unclear which regions are necessary for audiovisual integration, especially when the auditory and visual signals are offset in time. Aging also influences integration, but the nature of this influence is unresolved. We used a McGurk task to test audiovisual integration and sensitivity to the timing of audiovisual signals in two older adult groups: left hemisphere stroke survivors and controls. We observed a positive relationship between age and audiovisual speech integration in both groups, and an interaction indicating that lesions reduce sensitivity to timing offsets between signals. Lesion-symptom mapping demonstrated that damage to the left supramarginal gyrus and planum temporale reduces temporal acuity in audiovisual speech perception. This suggests that a process mediated by these structures identifies asynchronous audiovisual signals that should not be integrated.
Affiliation(s)
- Kelly Michaelis
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Laura C Erickson
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Neuroscience Department, Georgetown University Medical Center, Washington DC, USA
- Mackenzie E Fama
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Department of Speech-Language Pathology & Audiology, Towson University, Towson, MD, USA
- Laura M Skipper-Kallal
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Shihui Xing
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Department of Neurology, First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Elizabeth H Lacey
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Research Division, MedStar National Rehabilitation Hospital, Washington DC, USA
- Zainab Anbari
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA
- Gina Norato
- Clinical Trials Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Josef P Rauschecker
- Neuroscience Department, Georgetown University Medical Center, Washington DC, USA
- Peter E Turkeltaub
- Neurology Department and Center for Brain Plasticity and Recovery, Georgetown University Medical Center, Washington DC, USA; Research Division, MedStar National Rehabilitation Hospital, Washington DC, USA
15
Randazzo M, Priefer R, Smith PJ, Nagler A, Avery T, Froud K. Neural Correlates of Modality-Sensitive Deviance Detection in the Audiovisual Oddball Paradigm. Brain Sci 2020; 10:brainsci10060328. [PMID: 32481538 PMCID: PMC7348766 DOI: 10.3390/brainsci10060328]
Abstract
The McGurk effect, an incongruent pairing of visual /ga/–acoustic /ba/, creates a fusion illusion /da/ and is the cornerstone of research in audiovisual speech perception. Combination illusions occur given reversal of the input modalities—auditory /ga/-visual /ba/, and percept /bga/. A robust literature shows that fusion illusions in an oddball paradigm evoke a mismatch negativity (MMN) in the auditory cortex, in absence of changes to acoustic stimuli. We compared fusion and combination illusions in a passive oddball paradigm to further examine the influence of visual and auditory aspects of incongruent speech stimuli on the audiovisual MMN. Participants viewed videos under two audiovisual illusion conditions: fusion with visual aspect of the stimulus changing, and combination with auditory aspect of the stimulus changing, as well as two unimodal auditory- and visual-only conditions. Fusion and combination deviants exerted similar influence in generating congruency predictions with significant differences between standards and deviants in the N100 time window. Presence of the MMN in early and late time windows differentiated fusion from combination deviants. When the visual signal changes, a new percept is created, but when the visual is held constant and the auditory changes, the response is suppressed, evoking a later MMN. In alignment with models of predictive processing in audiovisual speech perception, we interpreted our results to indicate that visual information can both predict and suppress auditory speech perception.
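The MMN in paradigms like this one is conventionally quantified from the deviant-minus-standard difference wave, averaged over an electrode cluster and a post-stimulus time window. A minimal sketch, assuming epoched arrays of shape (trials, channels, timepoints); the 150-250 ms window, channel selection, and synthetic data are illustrative assumptions rather than the study's parameters.

```python
import numpy as np

def mmn_amplitude(standard, deviant, times, window=(0.150, 0.250), channels=None):
    """Mean deviant-minus-standard amplitude in a time window.

    standard, deviant : arrays of shape (n_trials, n_channels, n_times), in volts
    times             : 1-D array of epoch time points in seconds
    window            : analysis window relative to stimulus onset (assumed)
    channels          : indices of the electrode cluster (default: all channels)
    """
    diff = deviant.mean(axis=0) - standard.mean(axis=0)  # difference wave (chan x time)
    if channels is not None:
        diff = diff[channels]
    mask = (times >= window[0]) & (times <= window[1])
    return diff[:, mask].mean()

# Toy epochs: deviants carry an extra negativity around 200 ms.
rng = np.random.default_rng(2)
times = np.linspace(-0.1, 0.5, 601)
standard = rng.standard_normal((100, 32, times.size)) * 1e-6
deviant = rng.standard_normal((80, 32, times.size)) * 1e-6
deviant[:, :, (times > 0.15) & (times < 0.25)] -= 2e-6
print(f"MMN amplitude: {mmn_amplitude(standard, deviant, times) * 1e6:.2f} microvolts")
```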
Affiliation(s)
- Melissa Randazzo
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Correspondence: Tel.: +1-516-877-4769
- Ryan Priefer
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Paul J. Smith
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Amanda Nagler
- Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY 11530, USA
- Trey Avery
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
- Karen Froud
- Neuroscience and Education, Department of Biobehavioral Sciences, Teachers College, Columbia University, New York, NY 10027, USA
16
Plumridge JMA, Barham MP, Foley DL, Ware AT, Clark GM, Albein-Urios N, Hayden MJ, Lum JAG. The Effect of Visual Articulatory Information on the Neural Correlates of Non-native Speech Sound Discrimination. Front Hum Neurosci 2020; 14:25. [PMID: 32116609 PMCID: PMC7019039 DOI: 10.3389/fnhum.2020.00025]
Abstract
Behavioral studies have shown that the ability to discriminate between non-native speech sounds improves after seeing how the sounds are articulated. This study examined the influence of visual articulatory information on the neural correlates of non-native speech sound discrimination. English speakers’ discrimination of the Hindi dental and retroflex sounds was measured using the mismatch negativity (MMN) event-related potential, before and after they completed one of three 8-min training conditions. In an audio-visual speech training condition (n = 14), each sound was presented with its corresponding visual articulation. In one control condition (n = 14), both sounds were presented with the same visual articulation, resulting in one congruent and one incongruent audio-visual pairing. In another control condition (n = 14), both sounds were presented with the same image of a still face. The control conditions aimed to rule out the possibility that the MMN is influenced by non-specific audio-visual pairings, or by general exposure to the dental and retroflex sounds over the course of the study. The results showed that audio-visual speech training reduced the latency of the MMN but did not affect MMN amplitude. No change in MMN amplitude or latency was observed for the two control conditions. The pattern of results suggests that a relatively short audio-visual speech training session (i.e., 8 min) may increase the speed with which the brain processes non-native speech sound contrasts. The absence of a training effect on MMN amplitude suggests a single session of audio-visual speech training does not lead to the formation of more discrete memory traces for non-native speech sounds. Longer and/or multiple sessions might be needed to influence the MMN amplitude.
Affiliation(s)
- James M A Plumridge
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
- Michael P Barham
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
- Denise L Foley
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
- Anna T Ware
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
- Gillian M Clark
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
- Natalia Albein-Urios
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
- Melissa J Hayden
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
- Jarrad A G Lum
- Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, VIC, Australia
17
Kolozsvári OB, Xu W, Leppänen PHT, Hämäläinen JA. Top-Down Predictions of Familiarity and Congruency in Audio-Visual Speech Perception at Neural Level. Front Hum Neurosci 2019; 13:243. [PMID: 31354459 PMCID: PMC6639789 DOI: 10.3389/fnhum.2019.00243]
Abstract
During speech perception, listeners rely on multimodal input and make use of both auditory and visual information. When listeners are presented with speech, for example syllables, the differences in brain responses to distinct stimuli are not, however, caused merely by the acoustic or visual features of the stimuli. The congruency of the auditory and visual information and the familiarity of a syllable, that is, whether it appears in the listener's native language or not, also modulate brain responses. We investigated how the congruency and familiarity of the presented stimuli affect brain responses to audio-visual (AV) speech in 12 adult Finnish native speakers and 12 adult Chinese native speakers. They watched videos of a Chinese speaker pronouncing syllables (/pa/, /pha/, /ta/, /tha/, /fa/) during a magnetoencephalography (MEG) measurement, where only /pa/ and /ta/ were part of Finnish phonology while all the stimuli were part of Chinese phonology. The stimuli were presented in audio-visual (congruent or incongruent), audio only, or visual only conditions. The brain responses were examined in five time-windows: 75-125, 150-200, 200-300, 300-400, and 400-600 ms. We found significant differences for the congruency comparison in the fourth time-window (300-400 ms) in both sensor and source level analysis. Larger responses were observed for the incongruent stimuli than for the congruent stimuli. For the familiarity comparisons, no significant differences were found. The results are in line with earlier studies reporting on the modulation of brain responses for audio-visual congruency around 250-500 ms. This suggests a much stronger process for the general detection of a mismatch between predictions based on lip movements and the auditory signal than for the top-down modulation of brain responses based on phonological information.
Affiliation(s)
- Orsolya B Kolozsvári
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
- Weiyong Xu
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
- Paavo H T Leppänen
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
- Jarmo A Hämäläinen
- Department of Psychology, University of Jyväskylä, Jyväskylä, Finland; Jyväskylä Centre for Interdisciplinary Brain Research (CIBR), University of Jyväskylä, Jyväskylä, Finland
18
Discussion of the Relation between Initial Time Delay Gap (ITDG) and Acoustical Intimacy: Leo Beranek's Final Thoughts on the Subject, Documented. Acoustics 2019. [DOI: 10.3390/acoustics1030032]
Abstract
Current discussions on the objective attributes contributing to concert hall quality started formally in 1962 with the publication of Leo Beranek’s book “Music, Acoustics, and Architecture”. From his consulting work in the late 1950s, Beranek determined that in narrow halls, the short early delay times were an important factor in quality. Needing a measurable acoustical factor, rather than a dimensional one, he chose to define the initial time delay gap (ITDG) for a specific location near the middle of the hall’s main floor. Many acousticians failed to understand the simplicity of this proposal. Beranek had learned that long first delays sounded “arena-like” and “remote”, and, thus, not “intimate”. This bolstered his belief that ITDG was an important objective factor he decided to call “intimacy”. Most acoustical parameters can be directly measured and sensed by the listener, such as reverberation decay, sound strength, clarity. “Intimacy” however is a feeling, and over the past two decades, it has become apparent that it is a multisensory attribute influenced by visual input and perhaps other factors. [J.R. Hyde, Proc. IOA, London, July 2002, Volume 24, Pt. 4, “Acoustical Intimacy in Concert Halls: Does Visual Input affect the Aural Experience”?] Beranek’s paper “Comments on “intimacy” and ITDG concepts in musical performing spaces”, [JASA 115, 2403 (2004)] finally acknowledged the multisensory aspects of “intimacy” and stated this choice of the word “may have been unfortunate”. He further separated the term “intimacy” from ITDG. Documentation of this pronouncement will be provided in the paper.
19
Lindborg A, Baart M, Stekelenburg JJ, Vroomen J, Andersen TS. Speech-specific audiovisual integration modulates induced theta-band oscillations. PLoS One 2019; 14:e0219744. [PMID: 31310616 PMCID: PMC6634411 DOI: 10.1371/journal.pone.0219744]
Abstract
Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech specific mismatch processing and that general mismatch processing modulates induced theta-band (4–8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity (MMN) paradigm, where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli, we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm violate both the general prediction of stimulus content and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
Affiliation(s)
- Alma Lindborg
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
- Martijn Baart
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain and Language, Donostia, Spain
- Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Tobias S Andersen
- Section for Cognitive Systems, DTU Compute, Technical University of Denmark, Lyngby, Denmark
20
Abstract
Speech research during recent years has moved progressively away from its traditional focus on audition toward a more multisensory approach. In addition to audition and vision, many somatosenses including proprioception, pressure, vibration and aerotactile sensation are all highly relevant modalities for experiencing and/or conveying speech. In this article, we review both long-standing cross-modal effects stemming from decades of audiovisual speech research as well as new findings related to somatosensory effects. Cross-modal effects in speech perception to date are found to be constrained by temporal congruence and signal relevance, but appear to be unconstrained by spatial congruence. Far from taking place in a one-, two- or even three-dimensional space, the literature reveals that speech occupies a highly multidimensional sensory space. We argue that future research in cross-modal effects should expand to consider each of these modalities both separately and in combination with other modalities in speech.
Affiliation(s)
- Megan Keough
- Interdisciplinary Speech Research Lab, Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Donald Derrick
- New Zealand Institute of Brain and Behaviour, University of Canterbury, Christchurch 8140, New Zealand
- MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- Bryan Gick
- Interdisciplinary Speech Research Lab, Department of Linguistics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Haskins Laboratories, Yale University, New Haven, CT 06511, USA
21
Modality-independent recruitment of inferior frontal cortex during speech processing in human infants. Dev Cogn Neurosci 2018; 34:130-138. [PMID: 30391756 PMCID: PMC6969291 DOI: 10.1016/j.dcn.2018.10.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2017] [Revised: 08/25/2018] [Accepted: 10/25/2018] [Indexed: 11/22/2022] Open
Abstract
Despite increasing interest in the development of audiovisual speech perception in infancy, the underlying mechanisms and neural processes are still only poorly understood. In addition to regions in temporal cortex associated with speech processing and multimodal integration, such as superior temporal sulcus, left inferior frontal cortex (IFC) has been suggested to be critically involved in mapping information from different modalities during speech perception. To further illuminate the role of IFC during infant language learning and speech perception, the current study examined the processing of auditory, visual and audiovisual speech in 6-month-old infants using functional near-infrared spectroscopy (fNIRS). Our results revealed that infants recruit speech-sensitive regions in frontal cortex including IFC regardless of whether they processed unimodal or multimodal speech. We argue that IFC may play an important role in associating multimodal speech information during the early steps of language learning.
22
Proverbio AM, Raso G, Zani A. Electrophysiological Indexes of Incongruent Audiovisual Phonemic Processing: Unraveling the McGurk Effect. Neuroscience 2018; 385:215-226. [PMID: 29932985 DOI: 10.1016/j.neuroscience.2018.06.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2017] [Revised: 06/11/2018] [Accepted: 06/12/2018] [Indexed: 11/15/2022]
Abstract
In this study the timing of electromagnetic signals recorded during incongruent and congruent audiovisual (AV) stimulation in 14 Italian healthy volunteers was examined. In a previous study (Proverbio et al., 2016) we investigated the McGurk effect in the Italian language and found out which visual and auditory inputs provided the most compelling illusory effects (e.g., bilabial phonemes presented acoustically and paired with non-labials, especially alveolar-nasal and velar-occlusive phonemes). In this study EEG was recorded from 128 scalp sites while participants observed a female and a male actor uttering 288 syllables selected on the basis of the previous investigation (lasting approximately 600 ms) and responded to rare targets (/re/, /ri/, /ro/, /ru/). In half of the cases the AV information was incongruent, except for targets that were always congruent. A pMMN (phonological Mismatch Negativity) to incongruent AV stimuli was identified 500 ms after voice onset time. This automatic response indexed the detection of an incongruity between the labial and phonetic information. SwLORETA (Low-Resolution Electromagnetic Tomography) analysis applied to the difference voltage incongruent-congruent in the same time window revealed that the strongest sources of this activity were the right superior temporal (STG) and superior frontal gyri, which supports their involvement in AV integration.
Affiliation(s)
- Alice Mado Proverbio
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
- Giulia Raso
- Neuro-Mi Center for Neuroscience, Dept. of Psychology, University of Milano-Bicocca, Italy
23
Zhang J, Meng Y, McBride C, Fan X, Yuan Z. Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers. Front Hum Neurosci 2018; 12:181. [PMID: 29780312 PMCID: PMC5945971 DOI: 10.3389/fnhum.2018.00181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2017] [Accepted: 04/17/2018] [Indexed: 11/13/2022] Open
Abstract
The present study investigated the impact of Chinese dialects on McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, intra-language comparison of McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration.
Affiliation(s)
- Juan Zhang
- Faculty of Education, University of Macau, Macau, China
- Yaxuan Meng
- Faculty of Education, University of Macau, Macau, China
- Catherine McBride
- Department of Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong
- Xitao Fan
- School of Humanities and Social Science, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China
- Zhen Yuan
- Faculty of Health Sciences, University of Macau, Macau, China
24
Hauswald A, Lithari C, Collignon O, Leonardelli E, Weisz N. A Visual Cortical Network for Deriving Phonological Information from Intelligible Lip Movements. Curr Biol 2018; 28:1453-1459.e3. [PMID: 29681475 PMCID: PMC5956463 DOI: 10.1016/j.cub.2018.03.044] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2018] [Revised: 02/25/2018] [Accepted: 03/20/2018] [Indexed: 11/26/2022]
Abstract
Successful lip-reading requires a mapping from visual to phonological information [1]. Recently, visual and motor cortices have been implicated in tracking lip movements (e.g., [2]). It remains unclear, however, whether visuo-phonological mapping occurs already at the level of the visual cortex; that is, whether this structure tracks the acoustic signal in a functionally relevant manner. To elucidate this, we investigated how the cortex tracks (i.e., entrains to) absent acoustic speech signals carried by silent lip movements. Crucially, we contrasted the entrainment to unheard forward (intelligible) and backward (unintelligible) acoustic speech. We observed that the visual cortex exhibited stronger entrainment to the unheard forward acoustic speech envelope compared to the unheard backward acoustic speech envelope. Supporting the notion of a visuo-phonological mapping process, this forward-backward difference of occipital entrainment was not present for actually observed lip movements. Importantly, the respective occipital region received more top-down input, especially from left premotor, primary motor, and somatosensory regions and, to a lesser extent, also from posterior temporal cortex. Strikingly, across participants, the extent of top-down modulation of the visual cortex stemming from these regions partially correlated with the strength of entrainment to absent acoustic forward speech envelope, but not to present forward lip movements. Our findings demonstrate that a distributed cortical network, including key dorsal stream auditory regions [3-5], influences how the visual cortex shows sensitivity to the intelligibility of speech while tracking silent lip movements.
Affiliation(s)
- Anne Hauswald
- Centre for Cognitive Neurosciences, University of Salzburg, Salzburg 5020, Austria; CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy
- Chrysa Lithari
- Centre for Cognitive Neurosciences, University of Salzburg, Salzburg 5020, Austria; CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy
- Olivier Collignon
- CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy; Institute of Research in Psychology & Institute of NeuroScience, Université catholique de Louvain, Louvain 1348, Belgium
- Elisa Leonardelli
- CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy
- Nathan Weisz
- Centre for Cognitive Neurosciences, University of Salzburg, Salzburg 5020, Austria; CIMeC, Center for Mind/Brain Sciences, Università degli studi di Trento, Trento 38123, Italy
25
Stekelenburg JJ, Keetels M, Vroomen J. Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity. Eur J Neurosci 2018. [PMID: 29537657 PMCID: PMC5969231 DOI: 10.1111/ejn.13908] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech.
Affiliation(s)
- Jeroen J Stekelenburg
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO box 90153, 5000 LE, Tilburg, the Netherlands
- Mirjam Keetels
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO box 90153, 5000 LE, Tilburg, the Netherlands
- Jean Vroomen
- Department of Cognitive Neuropsychology, Tilburg University, Warandelaan 2, PO box 90153, 5000 LE, Tilburg, the Netherlands
26
Abstract
While audiovisual integration is well known in speech perception, faces and speech are also informative with respect to speaker recognition. To date, audiovisual integration in the recognition of familiar people has never been demonstrated. Here we show systematic benefits and costs for the recognition of familiar voices when these are combined with time-synchronized articulating faces, of corresponding or noncorresponding speaker identity, respectively. While these effects were strong for familiar voices, they were smaller or nonsignificant for unfamiliar voices, suggesting that the effects depend on the previous creation of a multimodal representation of a person's identity. Moreover, the effects were reduced or eliminated when voices were combined with the same faces presented as static pictures, demonstrating that the effects do not simply reflect the use of facial identity as a “cue” for voice recognition. This is the first direct evidence for audiovisual integration in person recognition.
27
Neural Mechanisms Underlying Cross-Modal Phonetic Encoding. J Neurosci 2017; 38:1835-1849. [PMID: 29263241 DOI: 10.1523/jneurosci.1566-17.2017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2017] [Revised: 11/17/2017] [Accepted: 12/08/2017] [Indexed: 11/21/2022] Open
Abstract
Audiovisual (AV) integration is essential for speech comprehension, especially in adverse listening situations. Divergent, but not mutually exclusive, theories have been proposed to explain the neural mechanisms underlying AV integration. One theory advocates that this process occurs via interactions between the auditory and visual cortices, as opposed to fusion of AV percepts in a multisensory integrator. Building upon this idea, we proposed that AV integration in spoken language reflects visually induced weighting of phonetic representations at the auditory cortex. EEG was recorded while male and female human subjects watched and listened to videos of a speaker uttering consonant vowel (CV) syllables /ba/ and /fa/, presented in Auditory-only, AV congruent or incongruent contexts. Subjects reported whether they heard /ba/ or /fa/. We hypothesized that vision alters phonetic encoding by dynamically weighting which phonetic representation in the auditory cortex is strengthened or weakened. That is, when subjects are presented with visual /fa/ and acoustic /ba/ and hear /fa/ (illusion-fa), the visual input strengthens the weighting of the phone /f/ representation. When subjects are presented with visual /ba/ and acoustic /fa/ and hear /ba/ (illusion-ba), the visual input weakens the weighting of the phone /f/ representation. Indeed, we found an enlarged N1 auditory evoked potential when subjects perceived illusion-ba, and a reduced N1 when they perceived illusion-fa, mirroring the N1 behavior for /ba/ and /fa/ in Auditory-only settings. These effects were especially pronounced in individuals with more robust illusory perception. These findings provide evidence that visual speech modifies phonetic encoding at the auditory cortex.SIGNIFICANCE STATEMENT The current study presents evidence that audiovisual integration in spoken language occurs when one modality (vision) acts on representations of a second modality (audition). Using the McGurk illusion, we show that visual context primes phonetic representations at the auditory cortex, altering the auditory percept, evidenced by changes in the N1 auditory evoked potential. This finding reinforces the theory that audiovisual integration occurs via visual networks influencing phonetic representations in the auditory cortex. We believe that this will lead to the generation of new hypotheses regarding cross-modal mapping, particularly whether it occurs via direct or indirect routes (e.g., via a multisensory mediator).
28
Mismatch negativity (MMN) to speech sounds is modulated systematically by manual grip execution. Neurosci Lett 2017; 651:237-241. [PMID: 28504120 DOI: 10.1016/j.neulet.2017.05.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2017] [Revised: 04/21/2017] [Accepted: 05/10/2017] [Indexed: 11/23/2022]
Abstract
Manual actions and speech are connected: for example, grip execution can influence simultaneous vocalizations and vice versa. Our previous studies show that the consonant [k] is associated with the power grip and the consonant [t] with the precision grip. Here we studied whether the interaction between speech sounds and grips could operate already at a pre-attentive stage of auditory processing, reflected by the mismatch-negativity (MMN) component of the event-related potential (ERP). Participants executed power and precision grips according to visual cues while listening to syllable sequences consisting of [ke] and [te] utterances. The grips modulated the MMN amplitudes to these syllables in a systematic manner so that when the deviant was [ke], the MMN response was larger with a precision grip than with a power grip. There was a converse trend when the deviant was [te]. These results suggest that manual gestures and speech can interact already at a pre-attentive processing level of auditory perception, and show, for the first time that manual actions can systematically modulate the MMN.
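The MMN itself is obtained as a difference wave: the average response to standards is subtracted from the average response to deviants, and its size is typically quantified over a latency window. A minimal sketch of that step, with an assumed 150-250 ms window rather than the window used in this particular study:

```python
import numpy as np

def mmn_amplitude(standard_trials, deviant_trials, times, window=(0.15, 0.25)):
    """Mismatch negativity as the deviant-minus-standard difference wave.

    standard_trials, deviant_trials : arrays of shape (n_trials, n_times)
    times  : time axis in seconds, 0 = sound onset
    window : latency window over which the mean MMN amplitude is taken
             (150-250 ms here is an assumption for illustration)
    """
    difference = deviant_trials.mean(axis=0) - standard_trials.mean(axis=0)
    mask = (times >= window[0]) & (times <= window[1])
    return difference, difference[mask].mean()
```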
29
Irwin J, DiBlasi L. Audiovisual speech perception: A new approach and implications for clinical populations. LANGUAGE AND LINGUISTICS COMPASS 2017; 11:77-91. [PMID: 29520300 PMCID: PMC5839512 DOI: 10.1111/lnc3.12237] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/18/2015] [Accepted: 01/25/2017] [Indexed: 06/01/2023]
Abstract
This selected overview of audiovisual (AV) speech perception examines the influence of visible articulatory information on what is heard. Thought to be a cross-cultural phenomenon that emerges early in typical language development, variables that influence AV speech perception include properties of the visual and the auditory signal, attentional demands, and individual differences. A brief review of the existing neurobiological evidence on how visual information influences heard speech indicates potential loci, timing, and facilitatory effects of AV over auditory only speech. The current literature on AV speech in certain clinical populations (individuals with an autism spectrum disorder, developmental language disorder, or hearing loss) reveals differences in processing that may inform interventions. Finally, a new method of assessing AV speech that does not require obvious cross-category mismatch or auditory noise was presented as a novel approach for investigators.
Affiliation(s)
- Julia Irwin
- LEARN Center, Haskins Laboratories Inc., USA
30
Affiliation(s)
- Stefan R. Schweinberger
- Department of General Psychology, Friedrich Schiller University and DFG Research Unit Person Perception, Jena, Germany
- David M.C. Robertson
- Department of General Psychology, Friedrich Schiller University and DFG Research Unit Person Perception, Jena, Germany
31
O'Sullivan AE, Crosse MJ, Di Liberto GM, Lalor EC. Visual Cortical Entrainment to Motion and Categorical Speech Features during Silent Lipreading. Front Hum Neurosci 2017; 10:679. [PMID: 28123363 PMCID: PMC5225113 DOI: 10.3389/fnhum.2016.00679] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2016] [Accepted: 12/20/2016] [Indexed: 11/13/2022] Open
Abstract
Speech is a multisensory percept, comprising an auditory and visual component. While the content and processing pathways of audio speech have been well characterized, the visual component is less well understood. In this work, we expand current methodologies using system identification to introduce a framework that facilitates the study of visual speech in its natural, continuous form. Specifically, we use models based on the unheard acoustic envelope (E), the motion signal (M) and categorical visual speech features (V) to predict EEG activity during silent lipreading. Our results show that each of these models performs similarly at predicting EEG in visual regions and that respective combinations of the individual models (EV, MV, EM and EMV) provide an improved prediction of the neural activity over their constituent models. In comparing these different combinations, we find that the model incorporating all three types of features (EMV) outperforms the individual models, as well as both the EV and MV models, while it performs similarly to the EM model. Importantly, EM does not outperform EV and MV, which, considering the higher dimensionality of the V model, suggests that more data is needed to clarify this finding. Nevertheless, the performance of EMV, and comparisons of the subject performances for the three individual models, provides further evidence to suggest that visual regions are involved in both low-level processing of stimulus dynamics and categorical speech perception. This framework may prove useful for investigating modality-specific processing of visual speech under naturalistic conditions.
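In this kind of system-identification framework, EEG at each channel is modeled as a time-lagged linear function of the stimulus features, and model quality is the correlation between predicted and recorded EEG on held-out data. A rough sketch of such a forward (encoding) model using ridge regression; the lag span, regularization value, and variable names are assumptions for illustration, not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lag_matrix(stimulus, max_lag):
    """Stack time-lagged copies of stimulus features; stimulus: (n_times, n_features)."""
    n_times, n_features = stimulus.shape
    lagged = np.zeros((n_times, n_features * (max_lag + 1)))
    for lag in range(max_lag + 1):
        lagged[lag:, lag * n_features:(lag + 1) * n_features] = stimulus[:n_times - lag]
    return lagged

def encoding_model_score(stim_train, eeg_train, stim_test, eeg_test, fs, alpha=1e3):
    """Fit EEG at one channel from lagged stimulus features (e.g., envelope,
    motion, viseme features) and return the prediction correlation on test data.
    The 0.3 s lag span is an assumed value."""
    max_lag = int(0.3 * fs)
    model = Ridge(alpha=alpha).fit(lag_matrix(stim_train, max_lag), eeg_train)
    prediction = model.predict(lag_matrix(stim_test, max_lag))
    return np.corrcoef(prediction, eeg_test)[0, 1]
```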
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
- Michael J Crosse
- Department of Pediatrics and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Giovanni M Di Liberto
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
- Edmund C Lalor
- School of Engineering, Trinity College Dublin, Dublin, Ireland; Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland; Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, NY, USA
32
Salmi J, Koistinen OP, Glerean E, Jylänki P, Vehtari A, Jääskeläinen IP, Mäkelä S, Nummenmaa L, Nummi-Kuisma K, Nummi I, Sams M. Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex. Neuroimage 2016; 157:108-117. [PMID: 27932074 DOI: 10.1016/j.neuroimage.2016.12.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2016] [Revised: 11/02/2016] [Accepted: 12/03/2016] [Indexed: 11/25/2022] Open
Abstract
During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific types of visual information shape neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. The predictive performance of the classifiers was tested by leaving out one participant at a time for testing and training the model on the remaining 15 participants. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas was associated with acoustic features present in speech and music stimuli. Concurrent visual stimulation modulated activity in bilateral MTG (speech), the lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and that other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively.
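The leave-one-participant-out scheme described here tests generalization across subjects: the classifier is trained on all but one participant and evaluated on the held-out participant's voxel patterns. A minimal sketch using scikit-learn, with an L1-penalized logistic regression standing in for the paper's Bayesian sparsity-promoting priors (an analogy, not the authors' implementation):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def leave_one_subject_out_accuracy(X, y, groups):
    """X: voxel patterns (n_samples, n_voxels); y: condition labels
    (e.g., audiovisual vs. auditory); groups: participant ID per sample.
    Returns mean classification accuracy across held-out participants."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)  # sparse weights
    return cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut()).mean()
```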
Affiliation(s)
- Juha Salmi
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland; Advanced Magnetic Imaging (AMI) Centre, School of Science, Aalto University, Finland; Institute of Behavioural Sciences, Division of Cognitive and Neuropsychology, University of Helsinki, Finland
- Olli-Pekka Koistinen
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
- Enrico Glerean
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
- Pasi Jylänki
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
- Aki Vehtari
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
- Iiro P Jääskeläinen
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
- Sasu Mäkelä
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
- Lauri Nummenmaa
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland; Turku PET Centre, University of Turku, Finland
- Ilari Nummi
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
- Mikko Sams
- Department of Neuroscience and Biomedical Engineering (NBE), School of Science, Aalto University, Finland
33
Sheth BR, Young R. Two Visual Pathways in Primates Based on Sampling of Space: Exploitation and Exploration of Visual Information. Front Integr Neurosci 2016; 10:37. [PMID: 27920670 PMCID: PMC5118626 DOI: 10.3389/fnint.2016.00037] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2016] [Accepted: 10/25/2016] [Indexed: 11/14/2022] Open
Abstract
Evidence is strong that the visual pathway is segregated into two distinct streams—ventral and dorsal. Two proposals theorize that the pathways are segregated in function: The ventral stream processes information about object identity, whereas the dorsal stream, according to one model, processes information about either object location, and according to another, is responsible in executing movements under visual control. The models are influential; however recent experimental evidence challenges them, e.g., the ventral stream is not solely responsible for object recognition; conversely, its function is not strictly limited to object vision; the dorsal stream is not responsible by itself for spatial vision or visuomotor control; conversely, its function extends beyond vision or visuomotor control. In their place, we suggest a robust dichotomy consisting of a ventral stream selectively sampling high-resolution/focal spaces, and a dorsal stream sampling nearly all of space with reduced foveal bias. The proposal hews closely to the theme of embodied cognition: Function arises as a consequence of an extant sensory underpinning. A continuous, not sharp, segregation based on function emerges, and carries with it an undercurrent of an exploitation-exploration dichotomy. Under this interpretation, cells of the ventral stream, which individually have more punctate receptive fields that generally include the fovea or parafovea, provide detailed information about object shapes and features and lead to the systematic exploitation of said information; cells of the dorsal stream, which individually have large receptive fields, contribute to visuospatial perception, provide information about the presence/absence of salient objects and their locations for novel exploration and subsequent exploitation by the ventral stream or, under certain conditions, the dorsal stream. We leverage the dichotomy to unify neuropsychological cases under a common umbrella, account for the increased prevalence of multisensory integration in the dorsal stream under a Bayesian framework, predict conditions under which object recognition utilizes the ventral or dorsal stream, and explain why cells of the dorsal stream drive sensorimotor control and motion processing and have poorer feature selectivity. Finally, the model speculates on a dynamic interaction between the two streams that underscores a unified, seamless perception. Existing theories are subsumed under our proposal.
Affiliation(s)
- Bhavin R Sheth
- Department of Electrical and Computer Engineering, University of Houston, Houston, TX, USA; Center for NeuroEngineering and Cognitive Systems, University of Houston, Houston, TX, USA
- Ryan Young
- Department of Neuroscience, Brandeis University, Waltham, MA, USA
34
Rosenblum LD, Dorsi J, Dias JW. The Impact and Status of Carol Fowler's Supramodal Theory of Multisensory Speech Perception. ECOLOGICAL PSYCHOLOGY 2016. [DOI: 10.1080/10407413.2016.1230373] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
35
Skilled musicians are not subject to the McGurk effect. Sci Rep 2016; 6:30423. [PMID: 27453363 PMCID: PMC4958963 DOI: 10.1038/srep30423] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2016] [Accepted: 07/05/2016] [Indexed: 11/25/2022] Open
Abstract
The McGurk effect is a compelling illusion in which humans auditorily perceive mismatched audiovisual speech as a completely different syllable. In this study, evidence is provided that professional musicians are not subject to this illusion, possibly because of their finer auditory or attentional abilities. 80 healthy age-matched graduate students volunteered for the study; 40 were musicians from the Brescia Luca Marenzio Conservatory of Music with at least 8–13 years of academic musical training. The phonemes /la/, /da/, /ta/, /ga/, /ka/, /na/, /ba/ and /pa/ were presented to participants in audiovisual congruent and incongruent conditions, or in unimodal (only visual or only auditory) conditions, while they were engaged in syllable recognition tasks. Overall, musicians showed no significant McGurk effect for any of the phonemes. Controls showed a marked McGurk effect for several phonemes (including alveolar-nasal, velar-occlusive and bilabial ones). The results indicate that early and intensive musical training might affect the way the auditory cortex processes phonetic information.
36
Kaufmann JM, Schweinberger SR. Speaker Variations Influence Speechreading Speed for Dynamic Faces. Perception 2016; 34:595-610. [PMID: 15991696 DOI: 10.1068/p5104] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
We investigated the influence of task-irrelevant speaker variations on speechreading performance. In three experiments with video-digitised faces presented either in dynamic, static-sequential, or static mode, participants performed speeded classifications on vowel utterances (German vowels /u/ and /i/). A Garner interference paradigm was used, in which speaker identity was task-irrelevant but could be either correlated, constant, or orthogonal to the vowel uttered. Reaction times for facial speech classifications were slowed by task-irrelevant speaker variations for dynamic stimuli. The results are discussed with reference to distributed models of face perception (Haxby et al., 2000, Trends in Cognitive Sciences, 4, 223–233) and the relevance of both dynamic information and speaker characteristics for speechreading.
Affiliation(s)
- Jürgen M Kaufmann
- Department of Psychology, University of Glasgow, 58 Hillhead Street, Glasgow G12 8QB, Scotland, UK
37
Rosenblum LD, Dias JW, Dorsi J. The supramodal brain: implications for auditory perception. JOURNAL OF COGNITIVE PSYCHOLOGY 2016. [DOI: 10.1080/20445911.2016.1181691] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
38
Dias JW, Cook TC, Rosenblum LD. Influences of selective adaptation on perception of audiovisual speech. JOURNAL OF PHONETICS 2016; 56:75-84. [PMID: 27041781 PMCID: PMC4815035 DOI: 10.1016/j.wocn.2016.02.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers "heard" the audio-visual stimulus as an integrated "va" percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual "va" percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation.
39
van de Rijt LPH, van Opstal AJ, Mylanus EAM, Straatman LV, Hu HY, Snik AFM, van Wanrooij MM. Temporal Cortex Activation to Audiovisual Speech in Normal-Hearing and Cochlear Implant Users Measured with Functional Near-Infrared Spectroscopy. Front Hum Neurosci 2016; 10:48. [PMID: 26903848 PMCID: PMC4750083 DOI: 10.3389/fnhum.2016.00048] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2015] [Accepted: 01/29/2016] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Speech understanding may rely not only on auditory, but also on visual information. Non-invasive functional neuroimaging techniques can expose the neural processes underlying the integration of multisensory processes required for speech understanding in humans. Nevertheless, scanner noise limits the usefulness of functional MRI (fMRI) in auditory experiments, and electromagnetic artifacts caused by electronic implants worn by subjects can severely distort the scans (EEG, fMRI). Therefore, we assessed audio-visual activation of temporal cortex with a silent, optical neuroimaging technique: functional near-infrared spectroscopy (fNIRS). METHODS We studied temporal cortical activation, as represented by concentration changes of oxy- and deoxy-hemoglobin, in four easy-to-apply fNIRS optical channels of 33 normal-hearing adult subjects and five post-lingually deaf cochlear implant (CI) users in response to supra-threshold unisensory auditory and visual, as well as to congruent auditory-visual speech stimuli. RESULTS Activation effects were not visible from single fNIRS channels. However, by discounting physiological noise through reference channel subtraction (RCS), auditory, visual and audiovisual (AV) speech stimuli evoked concentration changes for all sensory modalities in both cohorts (p < 0.001). Auditory stimulation evoked larger concentration changes than visual stimuli (p < 0.001). A saturation effect was observed for the AV condition. CONCLUSIONS Physiological, systemic noise can be removed from fNIRS signals by RCS. The observed multisensory enhancement of an auditory cortical channel can be plausibly described by a simple addition of the auditory and visual signals with saturation.
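Reference channel subtraction, as used here, treats one optical channel as a recording of systemic (scalp and cardiovascular) signal and removes its best-fitting scaled copy from the channels of interest. A minimal single-channel sketch under that assumption; the exact scaling procedure used in the study may differ:

```python
import numpy as np

def reference_channel_subtraction(signal_channel, reference_channel):
    """Remove systemic physiological noise by regressing out a reference channel.

    Both inputs are 1-D hemoglobin-concentration time courses; the reference
    channel is assumed to carry mostly systemic (non-cortical) signal.
    """
    ref = reference_channel - reference_channel.mean()
    sig = signal_channel - signal_channel.mean()
    beta = np.dot(ref, sig) / np.dot(ref, ref)   # least-squares scaling factor
    return sig - beta * ref                      # cleaned signal channel
```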
Affiliation(s)
- Luuk P H van de Rijt
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen Medical Centre, Nijmegen, Netherlands; Department of Biophysics, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
- A John van Opstal
- Department of Biophysics, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
- Emmanuel A M Mylanus
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen Medical Centre, Nijmegen, Netherlands
- Louise V Straatman
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen Medical Centre, Nijmegen, Netherlands
- Hai Yin Hu
- Department of Biophysics, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
- Ad F M Snik
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen Medical Centre, Nijmegen, Netherlands
- Marc M van Wanrooij
- Department of Otorhinolaryngology, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen Medical Centre, Nijmegen, Netherlands; Department of Biophysics, Donders Institute for Brain, Cognition, and Behaviour, Radboud University Nijmegen, Nijmegen, Netherlands
40
Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions. J Neurosci 2016; 35:14195-204. [PMID: 26490860 DOI: 10.1523/jneurosci.1829-15.2015] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Congruent audiovisual speech enhances our ability to comprehend a speaker, even in noise-free conditions. When incongruent auditory and visual information is presented concurrently, it can hinder a listener's perception and even cause him or her to perceive information that was not presented in either modality. Efforts to investigate the neural basis of these effects have often focused on the special case of discrete audiovisual syllables that are spatially and temporally congruent, with less work done on the case of natural, continuous speech. Recent electrophysiological studies have demonstrated that cortical response measures to continuous auditory speech can be easily obtained using multivariate analysis methods. Here, we apply such methods to the case of audiovisual speech and, importantly, present a novel framework for indexing multisensory integration in the context of continuous speech. Specifically, we examine how the temporal and contextual congruency of ongoing audiovisual speech affects the cortical encoding of the speech envelope in humans using electroencephalography. We demonstrate that the cortical representation of the speech envelope is enhanced by the presentation of congruent audiovisual speech in noise-free conditions. Furthermore, we show that this is likely attributable to the contribution of neural generators that are not particularly active during unimodal stimulation and that it is most prominent at the temporal scale corresponding to syllabic rate (2-6 Hz). Finally, our data suggest that neural entrainment to the speech envelope is inhibited when the auditory and visual streams are incongruent both temporally and contextually. SIGNIFICANCE STATEMENT Seeing a speaker's face as he or she talks can greatly help in understanding what the speaker is saying. This is because the speaker's facial movements relay information about what the speaker is saying, but also, importantly, when the speaker is saying it. Studying how the brain uses this timing relationship to combine information from continuous auditory and visual speech has traditionally been methodologically difficult. Here we introduce a new approach for doing this using relatively inexpensive and noninvasive scalp recordings. Specifically, we show that the brain's representation of auditory speech is enhanced when the accompanying visual speech signal shares the same timing. Furthermore, we show that this enhancement is most pronounced at a time scale that corresponds to mean syllable length.
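The stimulus feature at the heart of this kind of entrainment analysis is the broadband amplitude envelope of the acoustic speech, band-limited around the syllabic rate and brought down to the EEG sampling rate. A small sketch of one common way to compute it; the band edges and output rate are assumptions, not necessarily the values used in the study:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def syllabic_rate_envelope(speech, fs_audio, fs_out=128, band=(2.0, 6.0)):
    """Broadband amplitude envelope of a speech waveform, band-limited to an
    assumed syllabic-rate band (2-6 Hz) and resampled to the EEG rate."""
    envelope = np.abs(hilbert(speech))                       # Hilbert amplitude envelope
    envelope = resample_poly(envelope, int(fs_out), int(fs_audio))  # match EEG rate
    b, a = butter(3, [band[0] / (fs_out / 2.0), band[1] / (fs_out / 2.0)], btype="band")
    return filtfilt(b, a, envelope)
```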
41
42
Fercho K, Baugh LA, Hanson EK. Effects of Alphabet-Supplemented Speech on Brain Activity of Listeners: An fMRI Study. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2015; 58:1452-1463. [PMID: 26254449 DOI: 10.1044/2015_jslhr-s-14-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/11/2014] [Accepted: 07/08/2015] [Indexed: 06/04/2023]
Abstract
PURPOSE The purpose of this article was to examine the neural mechanisms associated with increases in speech intelligibility brought about through alphabet supplementation. METHOD Neurotypical participants listened to dysarthric speech while watching an accompanying video of a hand pointing to the 1st letter spoken of each word on an alphabet display (treatment condition) or a scrambled display (control condition). Their hemodynamic response was measured with functional magnetic resonance imaging, using a sparse sampling event-related paradigm. Speech intelligibility was assessed via a forced-choice auditory identification task throughout the scanning session. RESULTS Alphabet supplementation was associated with significant increases in speech intelligibility. Further, alphabet supplementation increased activation in brain regions known to be involved in both auditory speech and visual letter perception above that seen with the scrambled display. Significant increases in functional activity were observed within the posterior to mid superior temporal sulcus/superior temporal gyrus during alphabet supplementation, regions known to be involved in speech processing and audiovisual integration. CONCLUSION Alphabet supplementation is an effective tool for increasing the intelligibility of degraded speech and is associated with changes in activity within audiovisual integration sites. Changes in activity within the superior temporal sulcus/superior temporal gyrus may be related to the behavioral increases in intelligibility brought about by this augmented communication method.
43
Ahveninen J, Huang S, Ahlfors SP, Hämäläinen M, Rossi S, Sams M, Jääskeläinen IP. Interacting parallel pathways associate sounds with visual identity in auditory cortices. Neuroimage 2015; 124:858-868. [PMID: 26419388 DOI: 10.1016/j.neuroimage.2015.09.044] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2015] [Revised: 08/26/2015] [Accepted: 09/20/2015] [Indexed: 10/23/2022] Open
Abstract
Spatial and non-spatial information of sound events is presumably processed in parallel auditory cortex (AC) "what" and "where" streams, which are modulated by inputs from the respective visual-cortex subsystems. How these parallel processes are integrated into perceptual objects that remain stable across time and the source agent's movements is unknown. We recorded magneto- and electroencephalography (MEG/EEG) data while subjects viewed animated video clips featuring two audiovisual objects, a black cat and a gray cat. Adaptor-probe events were either linked to the same object (the black cat meowed twice in a row in the same location) or included a visually conveyed identity change (the black and then the gray cat meowed with identical voices in the same location). In addition to effects in visual (including fusiform, middle temporal or MT areas) and frontoparietal association areas, the visually conveyed object-identity change was associated with a release from adaptation of early (50-150 ms) activity in posterior ACs, spreading to left anterior ACs at 250-450 ms in our combined MEG/EEG source estimates. Repetition of events belonging to the same object resulted in increased theta-band (4-8 Hz) synchronization within the "what" and "where" pathways (e.g., between anterior AC and fusiform areas). In contrast, the visually conveyed identity changes resulted in distributed synchronization at higher frequencies (alpha and beta bands, 8-32 Hz) across different auditory, visual, and association areas. The results suggest that sound events become initially linked to perceptual objects in posterior AC, followed by modulations of representations in anterior AC. Hierarchical "what" and "where" pathways seem to operate in parallel after repeating audiovisual associations, whereas the resetting of such associations engages a distributed network across auditory, visual, and multisensory areas.
Affiliation(s)
- Jyrki Ahveninen
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Charlestown, MA, USA
- Samantha Huang
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Charlestown, MA, USA
- Seppo P Ahlfors
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Charlestown, MA, USA
- Matti Hämäläinen
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Charlestown, MA, USA; Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA, USA; Department of Neuroscience and Biomedical Engineering, Aalto University, School of Science, Espoo, Finland
- Stephanie Rossi
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital/Harvard Medical School, Charlestown, MA, USA
- Mikko Sams
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, Espoo, Finland
- Iiro P Jääskeläinen
- Brain and Mind Laboratory, Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, Espoo, Finland
44
Abstract
Frequency modulation is critical to human speech. Evidence from psychophysics, neurophysiology, and neuroimaging suggests that there are neuronal populations tuned to this property of speech. Consistent with this, extended exposure to frequency change produces direction specific aftereffects in frequency change detection. We show that this aftereffect occurs extremely rapidly, requiring only a single trial of just 100-ms duration. We demonstrate this using a long, randomized series of frequency sweeps (both upward and downward, by varying amounts) and analyzing intertrial adaptation effects. We show the point of constant frequency is shifted systematically towards the previous trial's sweep direction (i.e., a frequency sweep aftereffect). Furthermore, the perception of glide direction is also independently influenced by the glide presented two trials previously. The aftereffect is frequency tuned, as exposure to a frequency sweep from a set centered on 1,000 Hz does not influence a subsequent trial drawn from a set centered on 400 Hz. More generally, the rapidity of adaptation suggests the auditory system is constantly adapting and "tuning" itself to the most recent environmental conditions.
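One way to quantify the aftereffect described here is to fit separate psychometric functions to trials sorted by the direction of the preceding sweep and compare their midpoints (the point of constant frequency). A rough sketch of that analysis under stated assumptions; the logistic form and all variable names are ours, not the authors':

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, mu, s):
    """Psychometric function: probability of an 'upward' response."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

def point_of_constant_frequency(extents, responded_up, prev_direction, condition_on):
    """Midpoint of the psychometric function, fitted only to trials whose
    preceding trial swept in `condition_on` direction (+1 up, -1 down).

    extents        : signed frequency change of each trial (Hz)
    responded_up   : 1 if the listener reported an upward glide, else 0
    prev_direction : sign of the preceding trial's sweep, per trial
    """
    keep = np.asarray(prev_direction) == condition_on
    x, y = np.asarray(extents)[keep], np.asarray(responded_up)[keep]
    levels = np.unique(x)
    p_up = np.array([y[x == lv].mean() for lv in levels])   # proportion "up" per level
    (mu, s), _ = curve_fit(logistic, levels, p_up, p0=[0.0, np.ptp(levels) / 4.0 + 1e-3])
    return mu   # the aftereffect appears as a shift in mu between conditions
```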
45
Tse CY, Gratton G, Garnsey SM, Novak MA, Fabiani M. Read My Lips: Brain Dynamics Associated with Audiovisual Integration and Deviance Detection. J Cogn Neurosci 2015; 27:1723-37. [DOI: 10.1162/jocn_a_00812] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Information from different modalities is initially processed in different brain areas, yet real-world perception often requires the integration of multisensory signals into a single percept. An example is the McGurk effect, in which people viewing a speaker whose lip movements do not match the utterance perceive the spoken sounds incorrectly, hearing them as more similar to those signaled by the visual rather than the auditory input. This indicates that audiovisual integration is important for generating the phoneme percept. Here we asked when and where the audiovisual integration process occurs, providing spatial and temporal boundaries for the processes generating phoneme perception. Specifically, we wanted to separate audiovisual integration from other processes, such as simple deviance detection. Building on previous work employing ERPs, we used an oddball paradigm in which task-irrelevant audiovisually deviant stimuli were embedded in strings of non-deviant stimuli. We also recorded the event-related optical signal, an imaging method combining spatial and temporal resolution, to investigate the time course and neuroanatomical substrate of audiovisual integration. We found that audiovisual deviants elicit a short duration response in the middle/superior temporal gyrus, whereas audiovisual integration elicits a more extended response involving also inferior frontal and occipital regions. Interactions between audiovisual integration and deviance detection processes were observed in the posterior/superior temporal gyrus. These data suggest that dynamic interactions between inferior frontal cortex and sensory regions play a significant role in multimodal integration.
Affiliation(s)
- Chun-Yu Tse
- University of Illinois at Urbana-Champaign
- The Chinese University of Hong Kong
46
Einarson KM, Trainor LJ. The Effect of Visual Information on Young Children’s Perceptual Sensitivity to Musical Beat Alignment. TIMING & TIME PERCEPTION 2015. [DOI: 10.1163/22134468-03002039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
Recent work examined five-year-old children’s perceptual sensitivity to musical beat alignment. In this work, children watched pairs of videos of puppets drumming to music with simple or complex metre, where one puppet’s drumming sounds (and movements) were synchronized with the beat of the music and the other drummed with incorrect tempo or phase. The videos were used to maintain children’s interest in the task. Five-year-olds were better able to detect beat misalignments in simple than complex metre music. However, adults can perform poorly when attempting to detect misalignment of sound and movement in audiovisual tasks, so it is possible that the moving stimuli actually hindered children’s performance. Here we compared children’s sensitivity to beat misalignment in conditions with dynamic visual movement versus still (static) visual images. Eighty-four five-year-old children performed either the same task as described above or a task that employed identical auditory stimuli accompanied by a motionless picture of the puppet with the drum. There was a significant main effect of metre type, replicating the finding that five-year-olds are better able to detect beat misalignment in simple metre music. There was no main effect of visual condition. These results suggest that, given identical auditory information, children’s ability to judge beat misalignment in this task is not affected by the presence or absence of dynamic visual stimuli. We conclude that at five years of age, children can tell if drumming is aligned to the musical beat when the music has simple metric structure.
Affiliation(s)
- Laurel J. Trainor
- McMaster University, Canada
- McMaster Institute for Music and the Mind, Canada
- Rotman Research Institute, Canada
47
Eskelund K, MacDonald EN, Andersen TS. Face configuration affects speech perception: Evidence from a McGurk mismatch negativity study. Neuropsychologia 2015; 66:48-54. [DOI: 10.1016/j.neuropsychologia.2014.10.021] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2014] [Revised: 09/23/2014] [Accepted: 10/14/2014] [Indexed: 10/24/2022]
48
Abstract
Central auditory processing disorders (CAPD) can affect children and adults of all ages due to a wide variety of causes. CAPD is a neurobiologic deficit in the central auditory nervous system (CANS) that affects those mechanisms that underlie fundamental auditory perception, including localization and lateralization; discrimination of speech and non-speech sounds; auditory pattern recognition; temporal aspects of audition, including integration, resolution, ordering, and masking; and auditory performance with competing and/or degraded acoustic signals (American Speech-Language-Hearing Association, 2005a, b). Although it is recognized that central auditory dysfunction may coexist with other disorders, CAPD is conceptualized as a sensory-based auditory disorder. Administration of behavioral and/or electrophysiologic audiologic tests that have been shown to be sensitive and specific to dysfunction of the CANS is critical for a proper diagnosis of CAPD, in addition to assessments and collaboration with a multidisciplinary team. Intervention recommendations for CAPD diagnosis are based on the demonstrated auditory processing deficits and related listening and related complaints. This chapter provides an overview of current definitions and conceptualizations, methods of diagnosis of, and intervention for, CAPD. The chapter culminates with a case study illustrating pre- and posttreatment behavioral and electrophysiologic diagnostic findings.
Collapse
|
49
|
Kaganovich N, Schumaker J. Audiovisual integration for speech during mid-childhood: electrophysiological evidence. Brain Lang 2014; 139:36-48. [PMID: 25463815 PMCID: PMC4363284 DOI: 10.1016/j.bandl.2014.09.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/22/2014] [Revised: 09/28/2014] [Accepted: 09/30/2014] [Indexed: 05/05/2023]
Abstract
Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7-8-year-olds and 10-11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception.
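As a rough illustration of the kind of measurement behind these findings (not the authors' pipeline), the sketch below simulates per-participant auditory-only and audiovisual ERP waveforms, extracts the N1 trough in an 80-150 ms window, and runs a paired test for audiovisual attenuation. All amplitudes, latencies, windows, and the sample size are invented for the example; only numpy and scipy are assumed.

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
fs = 500                         # sampling rate in Hz (hypothetical)
t = np.arange(0.0, 0.5, 1 / fs)  # 0-500 ms post-stimulus epoch

def simulated_erp(n1_amp, p2_amp, noise_sd):
    """Toy ERP: Gaussian N1 trough near 100 ms and P2 peak near 200 ms."""
    n1 = n1_amp * np.exp(-((t - 0.10) ** 2) / (2 * 0.015 ** 2))
    p2 = p2_amp * np.exp(-((t - 0.20) ** 2) / (2 * 0.025 ** 2))
    return n1 + p2 + rng.normal(0.0, noise_sd, t.size)

def peak_amplitude(wave, tmin, tmax, sign):
    """Most extreme value in a time window; sign=-1 for N1, sign=+1 for P2."""
    win = (t >= tmin) & (t <= tmax)
    return wave[win][np.argmax(sign * wave[win])]

n_participants = 20
n1_auditory, n1_audiovisual = [], []
for _ in range(n_participants):
    wave_a = simulated_erp(n1_amp=-4.0, p2_amp=5.0, noise_sd=0.5)   # auditory-only
    wave_av = simulated_erp(n1_amp=-3.0, p2_amp=4.0, noise_sd=0.5)  # audiovisual (attenuated)
    n1_auditory.append(peak_amplitude(wave_a, 0.08, 0.15, sign=-1))
    n1_audiovisual.append(peak_amplitude(wave_av, 0.08, 0.15, sign=-1))

# Paired comparison of N1 amplitude: attenuation = less negative N1 for audiovisual speech
print(ttest_rel(n1_auditory, n1_audiovisual))
```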
Collapse
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, Lyles Porter Hall, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States; Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, United States.
| | - Jennifer Schumaker
- Department of Speech, Language, and Hearing Sciences, Purdue University, Lyles Porter Hall, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
| |
Collapse
|
50
|
Bernstein LE, Liebenthal E. Neural pathways for visual speech perception. Front Neurosci 2014; 8:386. [PMID: 25520611 PMCID: PMC4248808 DOI: 10.3389/fnins.2014.00386] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2014] [Accepted: 11/10/2014] [Indexed: 12/03/2022] Open
Abstract
This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Collapse
Affiliation(s)
- Lynne E Bernstein
- Department of Speech and Hearing Sciences, George Washington University, Washington, DC, USA
| | - Einat Liebenthal
- Department of Neurology, Medical College of Wisconsin, Milwaukee, WI, USA; Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA
| |
Collapse
|