1
Brang D, Plass J, Sherman A, Stacey WC, Wasade VS, Grabowecky M, Ahn E, Towle VL, Tao JX, Wu S, Issa NP, Suzuki S. Visual cortex responds to sound onset and offset during passive listening. J Neurophysiol 2022; 127:1547-1563. PMID: 35507478. DOI: 10.1152/jn.00164.2021.
Abstract
Sounds enhance our ability to detect, localize, and respond to co-occurring visual targets. Research suggests that sounds improve visual processing by resetting the phase of ongoing oscillations in visual cortex. However, it remains unclear what information is relayed from the auditory system to visual areas and if sounds modulate visual activity even in the absence of visual stimuli (e.g., during passive listening). Using intracranial electroencephalography (iEEG) in humans, we examined the sensitivity of visual cortex to three forms of auditory information during a passive listening task: auditory onset responses, auditory offset responses, and rhythmic entrainment to sounds. Because some auditory neurons respond to both sound onsets and offsets, visual timing and duration processing may benefit from each. Additionally, if auditory entrainment information is relayed to visual cortex, it could support the processing of complex stimulus dynamics that are aligned between auditory and visual stimuli. Results demonstrate that in visual cortex, amplitude-modulated sounds elicited transient onset and offset responses in multiple areas, but no entrainment to sound modulation frequencies. These findings suggest that activity in visual cortex (as measured with iEEG in response to auditory stimuli) may not be affected by temporally fine-grained auditory stimulus dynamics during passive listening (though it remains possible that this signal may be observable with simultaneous auditory-visual stimuli). Moreover, auditory responses were maximal in low-level visual cortex, potentially implicating a direct pathway for rapid interactions between auditory and visual cortices. This mechanism may facilitate perception by time-locking visual computations to environmental events marked by auditory discontinuities.
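The distinction drawn here between transient onset/offset responses and entrainment at the sound's modulation rate can be made concrete with a small analysis sketch. The Python snippet below uses synthetic single-channel iEEG trials and assumed parameters (sampling rate, AM rate, analysis windows); it is not the authors' pipeline, only one way such signatures might be tested.

```python
# Sketch: test a channel for (a) onset/offset transients and (b) entrainment
# at the sound's amplitude-modulation rate. All data and settings are synthetic.
import numpy as np
from scipy.signal import welch

fs = 1000.0                      # sampling rate (Hz), assumed
am_freq = 8.0                    # sound AM rate (Hz), hypothetical
t = np.arange(0.0, 3.0, 1 / fs)  # 3 s trials: sound on at 0.5 s, off at 2.5 s
n_trials = 50
rng = np.random.default_rng(0)

def transient(t0):
    """Gaussian bump modelling a brief response at time t0."""
    return np.exp(-((t - t0) ** 2) / (2 * 0.05 ** 2))

# Synthetic high-gamma traces: onset and offset transients, but no entrainment.
trials = np.array([transient(0.5) + transient(2.5) + 0.5 * rng.standard_normal(t.size)
                   for _ in range(n_trials)])
evoked = trials.mean(axis=0)

# (a) Transient responses: amplitude just after sound onset/offset vs. baseline.
baseline = evoked[t < 0.4].mean()
onset = evoked[(t >= 0.5) & (t < 0.7)].mean() - baseline
offset = evoked[(t >= 2.5) & (t < 2.7)].mean() - baseline
print("onset response:", onset, "offset response:", offset)

# (b) Entrainment: spectral peak at the AM rate during the sustained sound (0.8-2.4 s).
sustained = trials[:, (t >= 0.8) & (t < 2.4)]
f, pxx = welch(sustained, fs=fs, nperseg=1024, axis=-1)
mean_pxx = pxx.mean(axis=0)
at_am = mean_pxx[np.argmin(np.abs(f - am_freq))]
nearby = mean_pxx[(np.abs(f - am_freq) > 1) & (np.abs(f - am_freq) < 4)].mean()
print("power at AM rate relative to neighboring frequencies:", at_am / nearby)
```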
Affiliation(s)
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI, United States
- John Plass
- Department of Psychology, University of Michigan, Ann Arbor, MI, United States
- Aleksandra Sherman
- Department of Cognitive Science, Occidental College, Los Angeles, CA, United States
- William C Stacey
- Department of Neurology, University of Michigan, Ann Arbor, MI, United States
- Marcia Grabowecky
- Department of Psychology, Northwestern University, Evanston, IL, United States
- EunSeon Ahn
- Department of Psychology, University of Michigan, Ann Arbor, MI, United States
- Vernon L Towle
- Department of Neurology, The University of Chicago, Chicago, IL, United States
- James X Tao
- Department of Neurology, The University of Chicago, Chicago, IL, United States
- Shasha Wu
- Department of Neurology, The University of Chicago, Chicago, IL, United States
- Naoum P Issa
- Department of Neurology, The University of Chicago, Chicago, IL, United States
- Satoru Suzuki
- Department of Psychology, Northwestern University, Evanston, IL, United States
2
Asymmetrical cross-modal influence on neural encoding of auditory and visual features in natural scenes. Neuroimage 2022; 255:119182. PMID: 35395403. DOI: 10.1016/j.neuroimage.2022.119182.
Abstract
Natural scenes contain multi-modal information, which is integrated to form a coherent perception. Previous studies have demonstrated that cross-modal information can modulate the neural encoding of low-level sensory features. These studies, however, mostly focus on the processing of single sensory events or rhythmic sensory sequences. Here, we investigate how the neural encoding of basic auditory and visual features is modulated by cross-modal information when participants watch movie clips composed primarily of non-rhythmic events. We presented audiovisual congruent and audiovisual incongruent movie clips, and because attention can modulate cross-modal interactions, we analyzed high- and low-arousal movie clips separately. We recorded neural responses using electroencephalography (EEG) and employed the temporal response function (TRF) to quantify the neural encoding of auditory and visual features. The neural encoding of the sound envelope is enhanced in the audiovisual congruent condition relative to the incongruent condition, but this effect is significant only for high-arousal movie clips. In contrast, audiovisual congruency does not significantly modulate the neural encoding of visual features, e.g., luminance or visual motion. In summary, our findings demonstrate asymmetrical cross-modal interactions during the processing of natural scenes that lack rhythmicity: congruent visual information enhances low-level auditory processing, while congruent auditory information does not significantly modulate low-level visual processing.
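As a rough illustration of the temporal response function approach mentioned above, the sketch below fits a ridge-regression TRF from a one-dimensional stimulus feature (a stand-in for the sound envelope) to a single EEG channel. All data and parameters are synthetic assumptions, not those of the study.

```python
# Minimal TRF sketch: ridge regression over time-lagged copies of a stimulus feature.
import numpy as np

def lagged_design(stim, lags):
    """Design matrix of time-lagged copies of a 1-D stimulus feature."""
    n = stim.size
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stim[:n - lag]
        else:
            X[:n + lag, j] = stim[-lag:]
    return X

def fit_trf(stim, eeg, fs, tmin=-0.1, tmax=0.4, lam=1e2):
    """Ridge-regression TRF mapping a stimulus feature to one EEG channel."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = lagged_design(stim, lags)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
    return lags / fs, w

# Toy example: the "EEG" is a delayed copy of the envelope plus noise.
fs, n = 128, 128 * 60
rng = np.random.default_rng(1)
env = np.abs(rng.standard_normal(n))              # stand-in for a sound envelope
eeg = np.roll(env, int(0.1 * fs)) + 0.5 * rng.standard_normal(n)
times, trf = fit_trf(env, eeg, fs)
print("peak TRF lag (s):", times[np.argmax(trf)])  # expected near 0.1 s
```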
3
Mégevand P, Mercier MR, Groppe DM, Zion Golumbic E, Mesgarani N, Beauchamp MS, Schroeder CE, Mehta AD. Crossmodal Phase Reset and Evoked Responses Provide Complementary Mechanisms for the Influence of Visual Speech in Auditory Cortex. J Neurosci 2020; 40:8530-8542. PMID: 33023923. PMCID: PMC7605423. DOI: 10.1523/jneurosci.0555-20.2020.
Abstract
Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs.
SIGNIFICANCE STATEMENT: Watching the speaker can facilitate our understanding of what is being said. The mechanisms responsible for this influence of visual cues on the processing of speech remain incompletely understood. We studied these mechanisms by recording the electrical activity of the human brain through electrodes implanted surgically inside the brain. We found that visual inputs can operate by directly activating auditory cortical areas, and also indirectly by modulating the strength of cortical responses to auditory input. Our results help to understand the mechanisms by which the brain merges auditory and visual speech into a unitary perception.
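Phase reset is typically quantified as an increase in phase alignment across trials around stimulus onset. The following sketch computes inter-trial phase coherence (ITC) in a low-frequency band on synthetic trials; the band limits, trial structure, and filter choices are illustrative assumptions, not the authors' methods.

```python
# Sketch: inter-trial phase coherence (ITC) as a marker of crossmodal phase reset.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0
t = np.arange(-0.5, 1.0, 1 / fs)        # time relative to (visual) onset, in s
n_trials = 60
rng = np.random.default_rng(2)

# Synthetic trials: a ~4 Hz oscillation whose phase is random before onset
# and reset to a common value at t = 0, plus noise.
pre_phase = rng.uniform(0, 2 * np.pi, n_trials)
trials = np.array([np.cos(2 * np.pi * 4 * t + np.where(t < 0, p, 0.0))
                   for p in pre_phase]) + 0.5 * rng.standard_normal((n_trials, t.size))

# Band-pass 2-8 Hz and take the analytic phase of each trial.
b, a = butter(3, [2 / (fs / 2), 8 / (fs / 2)], btype="band")
phase = np.angle(hilbert(filtfilt(b, a, trials, axis=-1), axis=-1))

# ITC: length of the mean unit phase vector across trials (0 = random, 1 = aligned).
itc = np.abs(np.exp(1j * phase).mean(axis=0))
print("ITC before onset:", itc[t < -0.1].mean())
print("ITC after onset:", itc[(t > 0.05) & (t < 0.4)].mean())
```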
Affiliation(s)
- Pierre Mégevand
- Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York 11549
- Feinstein Institutes for Medical Research, Manhasset, New York 11030
- Department of Basic Neurosciences, Faculty of Medicine, University of Geneva, 1211 Geneva, Switzerland
- Manuel R Mercier
- Department of Neurology, Montefiore Medical Center, Bronx, New York 10467
- Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Institut de Neurosciences des Systèmes, Aix Marseille University, INSERM, 13005 Marseille, France
- David M Groppe
- Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York 11549
- Feinstein Institutes for Medical Research, Manhasset, New York 11030
- The Krembil Neuroscience Centre, University Health Network, Toronto, Ontario M5T 1M8, Canada
- Elana Zion Golumbic
- The Gonda Brain Research Center, Bar Ilan University, Ramat Gan 5290002, Israel
- Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, New York 10027
- Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas 77030
- Charles E Schroeder
- Nathan S. Kline Institute, Orangeburg, New York 10962
- Department of Psychiatry, Columbia University, New York, New York 10032
- Ashesh D Mehta
- Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York 11549
- Feinstein Institutes for Medical Research, Manhasset, New York 11030
4
Responses to Visual Speech in Human Posterior Superior Temporal Gyrus Examined with iEEG Deconvolution. J Neurosci 2020; 40:6938-6948. PMID: 32727820. PMCID: PMC7470920. DOI: 10.1523/jneurosci.0279-20.2020.
Abstract
Experimentalists studying multisensory integration compare neural responses to multisensory stimuli with responses to the component modalities presented in isolation. This procedure is problematic for multisensory speech perception since audiovisual speech and auditory-only speech are easily intelligible but visual-only speech is not. To overcome this confound, we developed intracranial electroencephalography (iEEG) deconvolution. Individual stimuli always contained both auditory and visual speech, but jittering the onset asynchrony between modalities allowed the time course of the unisensory responses and the interaction between them to be independently estimated. We applied this procedure to electrodes implanted in human epilepsy patients (both male and female) over the posterior superior temporal gyrus (pSTG), a brain area known to be important for speech perception. iEEG deconvolution revealed sustained positive responses to visual-only speech and larger, phasic responses to auditory-only speech. Confirming results from scalp EEG, responses to audiovisual speech were weaker than responses to auditory-only speech, demonstrating a subadditive multisensory neural computation. Leveraging the spatial resolution of iEEG, we extended these results to show that subadditivity is most pronounced in more posterior aspects of the pSTG. Across electrodes, subadditivity correlated with visual responsiveness, supporting a model in which visual speech enhances the efficiency of auditory speech processing in pSTG. The ability to separate neural processes may make iEEG deconvolution useful for studying a variety of complex cognitive and perceptual tasks.
SIGNIFICANCE STATEMENT: Understanding speech is one of the most important human abilities. Speech perception uses information from both the auditory and visual modalities. It has been difficult to study neural responses to visual speech because visual-only speech is difficult or impossible to comprehend, unlike auditory-only and audiovisual speech. We used intracranial electroencephalography deconvolution to overcome this obstacle. We found that visual speech evokes a positive response in the human posterior superior temporal gyrus, enhancing the efficiency of auditory speech processing.
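The core idea of the deconvolution approach, that jittering the audiovisual onset asynchrony lets overlapping unisensory responses be separated by regression, can be sketched as follows. The code uses synthetic event trains and invented response shapes; it illustrates the general lagged-regression logic rather than the published analysis.

```python
# Sketch: separate overlapping auditory and visual responses by joint regression
# against lagged impulse regressors, exploiting jittered onset asynchrony.
import numpy as np

fs = 100                        # Hz, assumed
n = fs * 300                    # 5 minutes of synthetic recording
rng = np.random.default_rng(3)

# Event trains: visual onsets, each followed by an auditory onset with jitter.
vis_onsets = np.sort(rng.choice(np.arange(fs, n - 2 * fs, fs * 3), 80, replace=False))
aud_onsets = vis_onsets + rng.integers(int(0.05 * fs), int(0.5 * fs), vis_onsets.size)
vis = np.zeros(n); vis[vis_onsets] = 1
aud = np.zeros(n); aud[aud_onsets] = 1

# Ground-truth responses: sustained visual, phasic auditory.
k = np.arange(int(0.6 * fs))
h_vis = 0.5 * (k > 5) * np.exp(-k / 40.0)
h_aud = 1.5 * np.exp(-((k - 10) ** 2) / 30.0)
y = np.convolve(vis, h_vis)[:n] + np.convolve(aud, h_aud)[:n] + 0.3 * rng.standard_normal(n)

def lagged(x, n_lags):
    """One column per lag of the event train x."""
    return np.column_stack([np.roll(x, l) for l in range(n_lags)])

# Solve for both impulse responses jointly.
X = np.hstack([lagged(vis, k.size), lagged(aud, k.size)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
h_vis_est, h_aud_est = beta[:k.size], beta[k.size:]
print("recovered peak lags (s):", np.argmax(h_vis_est) / fs, np.argmax(h_aud_est) / fs)
```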
5
Micheli C, Schepers IM, Ozker M, Yoshor D, Beauchamp MS, Rieger JW. Electrocorticography reveals continuous auditory and visual speech tracking in temporal and occipital cortex. Eur J Neurosci 2020; 51:1364-1376. PMID: 29888819. PMCID: PMC6289876. DOI: 10.1111/ejn.13992.
Abstract
During natural speech perception, humans must parse temporally continuous auditory and visual speech signals into sequences of words. However, most studies of speech perception present only single words or syllables. We used electrocorticography (subdural electrodes implanted on the brains of epileptic patients) to investigate the neural mechanisms for processing continuous audiovisual speech signals consisting of individual sentences. Using partial correlation analysis, we found that posterior superior temporal gyrus (pSTG) and medial occipital cortex tracked both the auditory and the visual speech envelopes. These same regions, as well as inferior temporal cortex, responded more strongly to a dynamic video of a talking face compared to auditory speech paired with a static face. Occipital cortex and pSTG carry temporal information about both auditory and visual speech dynamics. Visual speech tracking in pSTG may be a mechanism for enhancing perception of degraded auditory speech.
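A partial correlation analysis of this kind asks how strongly a neural signal tracks one speech signal once the other, correlated signal has been regressed out. Below is a minimal sketch with synthetic signals and invented variable names, intended only to illustrate the logic, not the study's pipeline.

```python
# Sketch: partial correlation between a neural signal and the auditory envelope,
# controlling for a correlated visual (mouth-motion) signal, and vice versa.
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear contribution of z."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(4)
n = 5000
aud_env = rng.standard_normal(n).cumsum()                       # stand-in auditory envelope
vis_motion = 0.6 * aud_env + rng.standard_normal(n).cumsum()    # correlated visual signal
neural = 0.8 * aud_env + 0.2 * vis_motion + 5 * rng.standard_normal(n)

print("partial r (neural ~ audio | visual):", partial_corr(neural, aud_env, vis_motion))
print("partial r (neural ~ visual | audio):", partial_corr(neural, vis_motion, aud_env))
```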
Affiliation(s)
- Cristiano Micheli
- Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
- Donders Centre for Cognitive Neuroimaging, Radboud University, Nijmegen, The Netherlands
- Inga M Schepers
- Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
- Research Center Neurosensory Science, Carl von Ossietzky University, Oldenburg, Germany
- Müge Ozker
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas
- Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, Texas
- Michael E. DeBakey Veterans Affairs Medical Center, Houston, Texas
- Jochem W Rieger
- Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
- Research Center Neurosensory Science, Carl von Ossietzky University, Oldenburg, Germany
6
Pigdon L, Willmott C, Reilly S, Conti-Ramsden G, Gaser C, Connelly A, Morgan AT. Grey matter volume in developmental speech and language disorder. Brain Struct Funct 2019; 224:3387-3398. PMID: 31732792. DOI: 10.1007/s00429-019-01978-7.
Abstract
Developmental language disorder (DLD) and developmental speech disorder (DSD) are common, yet their etiologies are not well understood. Atypical volume of the inferior and posterior language regions and of the striatum has been reported in DLD; however, variability in both methodology and study findings limits interpretation. Imaging research in DSD, on the other hand, is scarce. The present study compared grey matter volume in children with DLD, children with DSD, and children with typically developing speech and language. Compared with typically developing controls, children with DLD had larger volume in the right cerebellum, possibly associated with the procedural learning deficits that have been proposed in DLD. Children with DSD showed larger volume in the left inferior occipital lobe compared with controls, which may indicate a compensatory role of visual processing regions due to sub-optimal auditory-perceptual processes. Overall, these findings suggest that different neural systems may be involved in the specific deficits related to DLD and DSD.
Affiliation(s)
- Lauren Pigdon
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia
- Turner Institute for Brain and Mental Health, Monash University, 18 Innovation Walk, Clayton, VIC, 3800, Australia
- Catherine Willmott
- Turner Institute for Brain and Mental Health, Monash University, 18 Innovation Walk, Clayton, VIC, 3800, Australia
- Monash-Epworth Rehabilitation Research Centre, Monash University, 18 Innovation Walk, Clayton, VIC, 3800, Australia
- Sheena Reilly
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia
- Menzies Health Institute Queensland, Griffith University, G40 Level 8.86, Mount Gravatt, QLD, 4222, Australia
- Gina Conti-Ramsden
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia
- The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Christian Gaser
- Jena University Hospital, Am Klinikum 1, 07747, Jena, Germany
- Alan Connelly
- Florey Institute of Neuroscience and Mental Health, 245 Burgundy Street, Heidelberg, VIC, 3084, Australia
- University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
- Angela T Morgan
- Murdoch Children's Research Institute, 50 Flemington Rd, Parkville, VIC, 3052, Australia
- University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
- Royal Children's Hospital, 50 Flemington Rd, Parkville, VIC, 3052, Australia
7
Karas PJ, Magnotti JF, Metzger BA, Zhu LL, Smith KB, Yoshor D, Beauchamp MS. The visual speech head start improves perception and reduces superior temporal cortex responses to auditory speech. eLife 2019; 8:e48116. PMID: 31393261. PMCID: PMC6687434. DOI: 10.7554/elife.48116.
Abstract
Visual information about speech content from the talker's mouth is often available before auditory information from the talker's voice. Here we examined perceptual and neural responses to words with and without this visual head start. For both types of words, perception was enhanced by viewing the talker's face, but the enhancement was significantly greater for words with a head start. Neural responses were measured from electrodes implanted over auditory association cortex in the posterior superior temporal gyrus (pSTG) of epileptic patients. The presence of visual speech suppressed responses to auditory speech, more so for words with a visual head start. We suggest that the head start inhibits representations of incompatible auditory phonemes, increasing perceptual accuracy and decreasing total neural responses. Together with previous work showing visual cortex modulation (Ozker et al., 2018b), these results from pSTG demonstrate that multisensory interactions are a powerful modulator of activity throughout the speech perception network.
Affiliation(s)
- Patrick J Karas
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- John F Magnotti
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Brian A Metzger
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Lin L Zhu
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Kristen B Smith
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
- Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Houston, United States
8
O'Sullivan AE, Lim CY, Lalor EC. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations. Eur J Neurosci 2019; 50:3282-3295. PMID: 31013361. DOI: 10.1111/ejn.14425.
Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope-based cocktail party decoding in two multisensory cocktail party situations: (a) congruent AV, facing the attended speaker while ignoring another speaker represented by the audio-only stream, and (b) incongruent AV (eavesdropping), attending to the audio-only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that we can successfully decode attention to congruent audiovisual speech and can also decode attention when listeners were eavesdropping, i.e., looking at the face of the unattended talker. In addition to this, we found alpha power to be a reliable measure of attention to the visual speech. Using parieto-occipital alpha power, we found that we can distinguish whether subjects are attending or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that with only six near-ear electrodes we can successfully determine the attended speech. This work extends the current framework for decoding attention to speech to more naturalistic scenarios, and in doing so provides additional neural measures which may be incorporated to improve decoding accuracy.
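Stimulus-reconstruction (backward-model) decoding of this sort maps multichannel EEG back onto a speech envelope and labels as attended the talker whose envelope best matches the reconstruction. The sketch below uses toy data, an assumed lag range, and an assumed regularization value; it is illustrative, not the study's decoder.

```python
# Sketch: backward model reconstructs the envelope from EEG; the attended talker
# is the one whose envelope correlates better with the reconstruction.
import numpy as np

def lagged_eeg(eeg, n_lags):
    """Stack time-lagged copies of all channels into one design matrix."""
    return np.hstack([np.roll(eeg, l, axis=0) for l in range(n_lags)])

def train_decoder(eeg, env, n_lags=32, lam=1e3):
    X = lagged_eeg(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env)

def decode_attention(eeg, env_a, env_b, w, n_lags=32):
    recon = lagged_eeg(eeg, n_lags) @ w
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return ("A" if r_a > r_b else "B"), (r_a, r_b)

rng = np.random.default_rng(5)
fs, n, n_ch = 64, 64 * 120, 16
env_a, env_b = np.abs(rng.standard_normal((2, n)))     # two talker envelopes
# Synthetic EEG tracks talker A's envelope (with a small delay) plus noise.
eeg = np.outer(np.roll(env_a, 8), rng.standard_normal(n_ch)) + rng.standard_normal((n, n_ch))

w = train_decoder(eeg[: n // 2], env_a[: n // 2])       # train on attended talker A
print(decode_attention(eeg[n // 2:], env_a[n // 2:], env_b[n // 2:], w))
```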
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Chantelle Y Lim
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, New York
- Department of Neuroscience, Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York
9
Ozker M, Yoshor D, Beauchamp MS. Frontal cortex selects representations of the talker's mouth to aid in speech perception. eLife 2018; 7:30387. PMID: 29485404. PMCID: PMC5828660. DOI: 10.7554/elife.30387.
Abstract
Human faces contain multiple sources of information. During speech perception, visual information from the talker’s mouth is integrated with auditory information from the talker's voice. By directly recording neural responses from small populations of neurons in patients implanted with subdural electrodes, we found enhanced visual cortex responses to speech when auditory speech was absent (rendering visual speech especially relevant). Receptive field mapping demonstrated that this enhancement was specific to regions of the visual cortex with retinotopic representations of the mouth of the talker. Connectivity between frontal cortex and other brain regions was measured with trial-by-trial power correlations. Strong connectivity was observed between frontal cortex and mouth regions of visual cortex; connectivity was weaker between frontal cortex and non-mouth regions of visual cortex or auditory cortex. These results suggest that top-down selection of visual information from the talker’s mouth by frontal cortex plays an important role in audiovisual speech perception.
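Trial-by-trial power correlation is a simple connectivity measure: each electrode's high-gamma power is summarized per trial, and the per-trial values are correlated across electrode pairs. The sketch below demonstrates the computation on synthetic trials; the electrode labels, band limits, and coupling strengths are illustrative assumptions, not the study's data.

```python
# Sketch: connectivity as trial-by-trial correlation of high-gamma power.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def trial_power(x, fs, band=(70, 150)):
    """Mean high-gamma power per trial; x has shape (n_trials, n_samples)."""
    b, a = butter(3, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    analytic = hilbert(filtfilt(b, a, x, axis=-1), axis=-1)
    return (np.abs(analytic) ** 2).mean(axis=-1)

rng = np.random.default_rng(6)
fs, n_trials, n_samp = 500, 100, 500
shared = rng.standard_normal(n_trials)                  # shared trial-by-trial gain

def make_electrode(coupling):
    """Broadband noise whose amplitude covaries (or not) with the shared gain."""
    gain = 1 + 0.5 * coupling * shared[:, None]
    return gain * rng.standard_normal((n_trials, n_samp))

frontal = make_electrode(1.0)
mouth_visual = make_electrode(1.0)      # coupled to frontal via the shared gain
auditory = make_electrode(0.0)          # not coupled

p_f, p_m, p_a = (trial_power(x, fs) for x in (frontal, mouth_visual, auditory))
print("frontal-mouth visual r:", np.corrcoef(p_f, p_m)[0, 1])
print("frontal-auditory r:", np.corrcoef(p_f, p_a)[0, 1])
```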
Affiliation(s)
- Muge Ozker
- Department of Neurosurgery, Baylor College of Medicine, Texas, United States
- Daniel Yoshor
- Department of Neurosurgery, Baylor College of Medicine, Texas, United States
- Michael E. DeBakey Veterans Affairs Medical Center, Texas, United States
- Michael S Beauchamp
- Department of Neurosurgery, Baylor College of Medicine, Texas, United States
10
Giroud N, Hirsiger S, Muri R, Kegel A, Dillier N, Meyer M. Neuroanatomical and resting state EEG power correlates of central hearing loss in older adults. Brain Struct Funct 2017; 223:145-163. DOI: 10.1007/s00429-017-1477-0.
11
Giordano BL, Ince RAA, Gross J, Schyns PG, Panzeri S, Kayser C. Contributions of local speech encoding and functional connectivity to audio-visual speech perception. eLife 2017; 6. PMID: 28590903. PMCID: PMC5462535. DOI: 10.7554/elife.24763.
Abstract
Seeing a speaker's face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker's face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.
When listening to someone in a noisy environment, such as a cocktail party, we can understand the speaker more easily if we can also see his or her face. Movements of the lips and tongue convey additional information that helps the listener's brain separate out syllables, words and sentences. However, exactly where in the brain this effect occurs and how it works remain unclear. To find out, Giordano et al. scanned the brains of healthy volunteers as they watched clips of people speaking. The clarity of the speech varied between clips. Furthermore, in some of the clips the lip movements of the speaker corresponded to the speech in question, whereas in others the lip movements were nonsense babble. As expected, the volunteers performed better on a word recognition task when the speech was clear and when the lip movements agreed with the spoken dialogue. Watching the video clips stimulated rhythmic activity in multiple regions of the volunteers' brains, including areas that process sound and areas that plan movements. Speech is itself rhythmic, and the volunteers' brain activity synchronized with the rhythms of the speech they were listening to. Seeing the speaker's face increased this degree of synchrony. However, it also made it easier for sound-processing regions within the listeners' brains to transfer information to one another. Notably, only the latter effect predicted improved performance on the word recognition task. This suggests that seeing a person's face makes it easier to understand his or her speech by boosting communication between brain regions, rather than through effects on individual areas. Further work is required to determine where and how the brain encodes lip movements and speech sounds. The next challenge will be to identify where these two sets of information interact, and how the brain merges them together to generate the impression of specific words.
Affiliation(s)
- Bruno L Giordano
- Institut de Neurosciences de la Timone UMR 7289, Aix Marseille Université - Centre National de la Recherche Scientifique, Marseille, France
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Philippe G Schyns
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
- Stefano Panzeri
- Neural Computation Laboratory, Center for Neuroscience and Cognitive Systems, Istituto Italiano di Tecnologia, Rovereto, Italy
- Christoph Kayser
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom
12
Ozker M, Schepers IM, Magnotti JF, Yoshor D, Beauchamp MS. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. J Cogn Neurosci 2017; 29:1044-1060. PMID: 28253074. DOI: 10.1162/jocn_a_01110.
Abstract
Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
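The Bayesian prediction tested here follows from optimal cue combination: if independent auditory and visual estimates are combined by reliability weighting, the combined estimate has lower variance than either input alone. The toy sketch below verifies the standard variance formula by simulation; the noise values are arbitrary assumptions, unrelated to the paper's data or model fits.

```python
# Sketch: reliability-weighted combination of two independent cues reduces variance.
import numpy as np

sigma_aud, sigma_vis = 2.0, 1.5          # assumed single-modality noise (arbitrary units)

# Optimal weighting and predicted combined standard deviation:
#   w_aud = (1/sigma_aud^2) / (1/sigma_aud^2 + 1/sigma_vis^2)
#   sigma_av^2 = 1 / (1/sigma_aud^2 + 1/sigma_vis^2)
w_aud = (1 / sigma_aud**2) / (1 / sigma_aud**2 + 1 / sigma_vis**2)
sigma_av_pred = np.sqrt(1 / (1 / sigma_aud**2 + 1 / sigma_vis**2))

rng = np.random.default_rng(7)
n = 100_000
aud = rng.normal(0, sigma_aud, n)
vis = rng.normal(0, sigma_vis, n)
combined = w_aud * aud + (1 - w_aud) * vis

print("predicted AV sigma:", sigma_av_pred)   # smaller than either unimodal sigma
print("simulated AV sigma:", combined.std())
```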
Affiliation(s)
- Muge Ozker
- University of Texas Graduate School of Biomedical Sciences at Houston
- Baylor College of Medicine
13
O'Sullivan AE, Crosse MJ, Di Liberto GM, Lalor EC. Visual Cortical Entrainment to Motion and Categorical Speech Features during Silent Lipreading. Front Hum Neurosci 2017; 10:679. PMID: 28123363. PMCID: PMC5225113. DOI: 10.3389/fnhum.2016.00679.
Abstract
Speech is a multisensory percept, comprising an auditory and visual component. While the content and processing pathways of audio speech have been well characterized, the visual component is less well understood. In this work, we expand current methodologies using system identification to introduce a framework that facilitates the study of visual speech in its natural, continuous form. Specifically, we use models based on the unheard acoustic envelope (E), the motion signal (M) and categorical visual speech features (V) to predict EEG activity during silent lipreading. Our results show that each of these models performs similarly at predicting EEG in visual regions and that respective combinations of the individual models (EV, MV, EM and EMV) provide an improved prediction of the neural activity over their constituent models. In comparing these different combinations, we find that the model incorporating all three types of features (EMV) outperforms the individual models, as well as both the EV and MV models, while it performs similarly to the EM model. Importantly, EM does not outperform EV and MV, which, considering the higher dimensionality of the V model, suggests that more data is needed to clarify this finding. Nevertheless, the performance of EMV, and comparisons of the subject performances for the three individual models, provides further evidence to suggest that visual regions are involved in both low-level processing of stimulus dynamics and categorical speech perception. This framework may prove useful for investigating modality-specific processing of visual speech under naturalistic conditions.
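The model-comparison logic, fitting forward models from different feature sets and comparing their held-out prediction of the EEG, can be sketched as follows. The feature dimensions, regularization, and synthetic data are assumptions made for illustration only, not the study's models.

```python
# Sketch: compare forward models built from different feature sets (E, M, V and
# their combinations) by cross-validated prediction of an EEG channel.
import numpy as np

def ridge_fit(X, y, lam=10.0):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_correlation(X, y, n_folds=5):
    """Mean held-out correlation between predicted and observed EEG."""
    idx = np.array_split(np.arange(len(y)), n_folds)
    rs = []
    for k in range(n_folds):
        test = idx[k]
        train = np.setdiff1d(np.arange(len(y)), test)
        w = ridge_fit(X[train], y[train])
        rs.append(np.corrcoef(X[test] @ w, y[test])[0, 1])
    return float(np.mean(rs))

rng = np.random.default_rng(8)
n = 6000
E = rng.standard_normal((n, 1))        # stand-in for the unheard acoustic envelope
M = rng.standard_normal((n, 1))        # stand-in for the visual motion signal
V = rng.standard_normal((n, 5))        # stand-in for categorical visual-speech features
eeg = 0.4 * E[:, 0] + 0.4 * M[:, 0] + 0.3 * V[:, 0] + rng.standard_normal(n)

models = {"E": E, "M": M, "EM": np.hstack([E, M]), "EMV": np.hstack([E, M, V])}
for name, X in models.items():
    print(name, round(cv_correlation(X, eeg), 3))
```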
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity College Dublin, Dublin, Ireland
- Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
- Michael J Crosse
- Department of Pediatrics and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Giovanni M Di Liberto
- School of Engineering, Trinity College Dublin, Dublin, Ireland
- Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
- Edmund C Lalor
- School of Engineering, Trinity College Dublin, Dublin, Ireland
- Trinity Centre for Bioengineering, Trinity College Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin, Ireland
- Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, NY, USA
14
Rhone AE, Nourski KV, Oya H, Kawasaki H, Howard MA, McMurray B. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex. Language, Cognition and Neuroscience 2015; 31:284-302. PMID: 27182530. PMCID: PMC4865257. DOI: 10.1080/23273798.2015.1101145.
Abstract
In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG), and motor cortex on precentral gyrus (PreC), were responsive to visual/gestural information prior to the onset of sound, and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
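Event-related band power of this kind is usually computed by band-pass filtering, extracting the analytic amplitude, and expressing power relative to a pre-stimulus baseline. The sketch below shows a high-gamma ERBP computation on synthetic trials; the band limits and baseline window are assumptions, not the study's settings.

```python
# Sketch: event-related band power (ERBP) in the high gamma band, in dB re baseline.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0
t = np.arange(-0.5, 1.0, 1 / fs)
rng = np.random.default_rng(9)

# Synthetic trials: broadband noise whose amplitude doubles after stimulus onset.
trials = rng.standard_normal((40, t.size)) * np.where(t >= 0, 2.0, 1.0)

# Band-pass 70-150 Hz, take analytic amplitude, square to get power.
b, a = butter(4, [70 / (fs / 2), 150 / (fs / 2)], btype="band")
power = np.abs(hilbert(filtfilt(b, a, trials, axis=-1), axis=-1)) ** 2

# Express power as dB change relative to the pre-stimulus baseline.
baseline = power[:, t < -0.1].mean(axis=-1, keepdims=True)
erbp_db = 10 * np.log10(power / baseline)
print("mean post-onset high-gamma ERBP (dB):", erbp_db[:, t > 0.05].mean())
```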
15
Murray MM, Thelen A, Thut G, Romei V, Martuzzi R, Matusz PJ. The multisensory function of the human primary visual cortex. Neuropsychologia 2015; 83:161-169. PMID: 26275965. DOI: 10.1016/j.neuropsychologia.2015.08.011.
Abstract
It has been nearly 10 years since Ghazanfar and Schroeder (2006) proposed that the neocortex is essentially multisensory in nature. However, it is only recently that sufficient hard evidence supporting this proposal has accrued. We review evidence that activity within the human primary visual cortex plays an active role in multisensory processes and directly impacts behavioural outcome. This evidence emerges from a full palette of human brain imaging and brain mapping methods with which multisensory processes are quantitatively assessed by taking advantage of particular strengths of each technique as well as advances in signal analyses. Several general conclusions about multisensory processes in the primary visual cortex of humans are supported relatively solidly. First, haemodynamic methods (fMRI/PET) show that both convergence and integration occur within primary visual cortex. Second, primary visual cortex is involved in multisensory processes during early post-stimulus stages (as revealed by EEG/ERP/ERFs as well as TMS). Third, multisensory effects in primary visual cortex directly impact behaviour and perception, as revealed by correlational (EEG/ERPs/ERFs) as well as more causal measures (TMS/tACS). While the provocative claim of Ghazanfar and Schroeder (2006) that the whole of neocortex is multisensory in function has yet to be demonstrated, this can now be considered established in the case of the human primary visual cortex.
Affiliation(s)
- Micah M Murray
- The Laboratory for Investigative Neurophysiology (The LINE), Neuropsychology and Neurorehabilitation Service and Department of Radiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland
- EEG Brain Mapping Core, Center for Biomedical Imaging (CIBM) of Lausanne and Geneva, Lausanne, Switzerland
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Antonia Thelen
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Gregor Thut
- Centre for Cognitive Neuroimaging, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow G12 8QB, United Kingdom
- Vincenzo Romei
- Centre for Brain Science, Department of Psychology, University of Essex, Colchester, United Kingdom
- Roberto Martuzzi
- Laboratory of Cognitive Neuroscience, Brain-Mind Institute, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Pawel J Matusz
- The Laboratory for Investigative Neurophysiology (The LINE), Neuropsychology and Neurorehabilitation Service and Department of Radiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland
- Attention, Brain, and Cognitive Development Group, Department of Experimental Psychology, University of Oxford, United Kingdom