1. Çetinçelik M, Rowland CF, Snijders TM. Does the speaker's eye gaze facilitate infants' word segmentation from continuous speech? An ERP study. Dev Sci 2024; 27:e13436. [PMID: 37551932] [DOI: 10.1111/desc.13436]
Abstract
The environment in which infants learn language is multimodal and rich with social cues. Yet, the effects of such cues, such as eye contact, on early speech perception have not been closely examined. This study assessed the role of ostensive speech, signalled through the speaker's eye gaze direction, on infants' word segmentation abilities. A familiarisation-then-test paradigm was used while electroencephalography (EEG) was recorded. Ten-month-old Dutch-learning infants were familiarised with audio-visual stories in which a speaker recited four sentences with one repeated target word, addressing them with either direct or averted gaze while speaking. In the test phase following each story, infants heard familiar and novel words presented in audio-only format. Infants' familiarity with the words was assessed using event-related potentials (ERPs). As predicted, infants showed a negative-going ERP familiarity effect to the isolated familiarised words relative to the novel words over the left-frontal region of interest during the test phase. While the word familiarity effect did not differ as a function of the speaker's gaze over the left-frontal region of interest, there was also a (not predicted) positive-going early ERP familiarity effect over right fronto-central and central electrodes in the direct gaze condition only. This study provides electrophysiological evidence that infants can segment words from audio-visual speech, regardless of the ostensiveness of the speaker's communication. However, the speaker's gaze direction seems to influence the processing of familiar words.
RESEARCH HIGHLIGHTS:
- We examined 10-month-old infants' ERP word familiarity response using audio-visual stories, in which a speaker addressed infants with direct or averted gaze while speaking.
- Ten-month-old infants can segment and recognise familiar words from audio-visual speech, indicated by their negative-going ERP response to familiar, relative to novel, words.
- This negative-going ERP word familiarity effect was present for isolated words over left-frontal electrodes regardless of whether the speaker offered eye contact while speaking.
- An additional positivity in response to familiar words was observed for direct gaze only, over right fronto-central and central electrodes.
Affiliation(s)
- Melis Çetinçelik
  - Max Planck Institute for Psycholinguistics, Nijmegen, Gelderland, The Netherlands
- Caroline F Rowland
  - Max Planck Institute for Psycholinguistics, Nijmegen, Gelderland, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Gelderland, The Netherlands
- Tineke M Snijders
  - Max Planck Institute for Psycholinguistics, Nijmegen, Gelderland, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Gelderland, The Netherlands
  - Cognitive Neuropsychology Department, Tilburg University, Tilburg, The Netherlands
2. Frei V, Schmitt R, Meyer M, Giroud N. Processing of Visual Speech Cues in Speech-in-Noise Comprehension Depends on Working Memory Capacity and Enhances Neural Speech Tracking in Older Adults With Hearing Impairment. Trends Hear 2024; 28:23312165241287622. [PMID: 39444375] [PMCID: PMC11520018] [DOI: 10.1177/23312165241287622]
Abstract
Comprehending speech in noise (SiN) poses a challenge for older hearing-impaired listeners, requiring auditory and working memory resources. Visual speech cues provide additional sensory information supporting speech understanding, but the extent of this visual benefit varies considerably across individuals, which might be accounted for by individual differences in working memory capacity (WMC). In the current study, we investigated behavioral and neurofunctional (i.e., neural speech tracking) correlates of auditory and audio-visual speech comprehension in babble noise and their associations with WMC. Healthy older adults with hearing impairment quantified by pure-tone hearing loss (threshold averages: 31.85-57 dB, N = 67) listened to sentences in babble noise in audio-only, visual-only, and audio-visual speech modalities and performed a pattern-matching and a comprehension task while electroencephalography (EEG) was recorded. Behaviorally, no significant difference in task performance was observed across modalities. However, we did find a significant association between individual working memory capacity and task performance, suggesting a more complex interplay between audio-visual speech cues, working memory capacity, and real-world listening tasks. Furthermore, we found that visual speech presentation was accompanied by increased cortical tracking of the speech envelope, particularly in a right-hemispheric auditory topographical cluster. Post hoc, we investigated potential relationships between behavioral performance and neural speech tracking but were not able to establish a significant association. Overall, our results show an increase in neurofunctional correlates of speech associated with congruent visual speech cues, specifically in a right auditory cluster, suggesting multisensory integration.
Affiliation(s)
- Vanessa Frei
  - Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
- Raffael Schmitt
  - Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
  - Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
- Martin Meyer
  - Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
  - University of Zurich, University Research Priority Program Dynamics of Healthy Aging, Zurich, Switzerland
  - Center for Neuroscience Zurich, University and ETH of Zurich, Zurich, Switzerland
  - Evolutionary Neuroscience of Language, Department of Comparative Language Science, University of Zurich, Zurich, Switzerland
  - Cognitive Psychology Unit, Alpen-Adria University, Klagenfurt, Austria
- Nathalie Giroud
  - Computational Neuroscience of Speech and Hearing, Department of Computational Linguistics, University of Zurich, Zurich, Switzerland
  - International Max Planck Research School for the Life Course: Evolutionary and Ontogenetic Dynamics (LIFE), Berlin, Germany
  - Competence Center Language & Medicine, Center of Medical Faculty and Faculty of Arts and Sciences, University of Zurich, Zurich, Switzerland
  - Center for Neuroscience Zurich, University and ETH of Zurich, Zurich, Switzerland
3. Zhang Y, Ding R, Frassinelli D, Tuomainen J, Klavinskis-Whiting S, Vigliocco G. The role of multimodal cues in second language comprehension. Sci Rep 2023; 13:20824. [PMID: 38012193] [PMCID: PMC10682458] [DOI: 10.1038/s41598-023-47643-2]
Abstract
In face-to-face communication, multimodal cues such as prosody, gestures, and mouth movements can play a crucial role in language processing. While several studies have addressed how these cues contribute to native (L1) language processing, their impact on non-native (L2) comprehension is largely unknown. Comprehension of naturalistic language by L2 comprehenders may be supported by the presence of (at least some) multimodal cues, as these provide correlated and convergent information that may aid linguistic processing. However, multimodal cues may also be used less by L2 comprehenders, because linguistic processing is more demanding for them than for L1 comprehenders, leaving more limited resources for processing multimodal cues. In this study, we investigated how L2 comprehenders use multimodal cues in naturalistic stimuli (while participants watched videos of a speaker), as measured by electrophysiological responses (N400) to words, and whether there are differences between L1 and L2 comprehenders. We found that prosody, gestures, and informative mouth movements each reduced the N400 in L2, indexing easier comprehension. Nevertheless, L2 participants showed weaker effects for each cue than L1 comprehenders, with the exception of meaningful gestures and informative mouth movements. These results show that L2 comprehenders focus on specific multimodal cues (meaningful gestures that support meaningful interpretation and mouth movements that enhance the acoustic signal) while using multimodal cues to a lesser extent than L1 comprehenders overall.
Affiliation(s)
- Ye Zhang
  - Experimental Psychology, University College London, London, UK
- Rong Ding
  - Language and Computation in Neural Systems, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Diego Frassinelli
  - Department of Linguistics, University of Konstanz, Konstanz, Germany
- Jyrki Tuomainen
  - Speech, Hearing and Phonetic Sciences, University College London, London, UK
4. Brown VA, Strand JF. Preregistration: Practical Considerations for Speech, Language, and Hearing Research. J Speech Lang Hear Res 2023; 66:1889-1898. [PMID: 36472937] [PMCID: PMC10465155] [DOI: 10.1044/2022_jslhr-22-00317]
Abstract
PURPOSE: In the last decade, psychology and other sciences have implemented numerous reforms to improve the robustness of our research, many of which are based on increasing transparency throughout the research process. Among these reforms is the practice of preregistration, in which researchers create a time-stamped and uneditable document before data collection that describes the methods of the study, how the data will be analyzed, the sample size, and many other decisions. The current article highlights the benefits of preregistration with a focus on the specific issues that speech, language, and hearing researchers are likely to encounter, and additionally provides a tutorial for writing preregistrations.
CONCLUSIONS: Although rates of preregistration have increased dramatically in recent years, the practice is still relatively uncommon in research on speech, language, and hearing. Low rates of adoption may be driven by a lack of understanding of the benefits of preregistration (either generally or for our discipline in particular) or uncertainty about how to proceed if it becomes necessary to deviate from the preregistered plan. Alternatively, researchers may see the benefits of preregistration but not know where to start, and gathering this information from a wide variety of sources is arduous and time consuming. This tutorial addresses each of these potential roadblocks to preregistration and equips readers with tools to facilitate writing preregistrations for research on speech, language, and hearing.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.21644843
Affiliation(s)
- Violet A. Brown
  - Department of Psychological & Brain Sciences, Washington University in St. Louis, MO
5. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022; 152:3216. [PMID: 36586857] [PMCID: PMC9894660] [DOI: 10.1121/10.0015262]
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.
Affiliation(s)
- Kristin J Van Engen
  - Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey
  - PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers
  - Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle
  - Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA
6. Xiu B, Paul BT, Chen JM, Le TN, Lin VY, Dimitrijevic A. Neural responses to naturalistic audiovisual speech are related to listening demand in cochlear implant users. Front Hum Neurosci 2022; 16:1043499. [DOI: 10.3389/fnhum.2022.1043499]
Abstract
There is a weak relationship between clinical and self-reported speech perception outcomes in cochlear implant (CI) listeners. Such poor correspondence may be due to differences between clinical and "real-world" listening environments and stimuli. Speech in the real world is often accompanied by visual cues and background environmental noise, and generally occurs in a conversational context, all factors that could affect listening demand. Thus, our objectives were to determine whether brain responses to naturalistic speech could index speech perception and listening demand in CI users. Accordingly, we recorded high-density electroencephalography (EEG) while CI users listened to/watched a naturalistic stimulus (the television show "The Office"). We used continuous EEG to quantify "speech neural tracking" (i.e., temporal response functions, TRFs) to the show's soundtrack and 8–12 Hz (alpha) brain rhythms commonly related to listening effort. Background noise at three different signal-to-noise ratios (SNRs), +5, +10, and +15 dB, was presented to vary the difficulty of following the television show, mimicking a natural noisy environment. The task also included an audio-only (no video) condition. After each condition, participants subjectively rated listening demand and the degree of words and conversations they felt they understood. Fifteen CI users reported progressively higher listening demand and fewer understood words and conversations with increasing background noise. Listening demand and conversation understanding in the audio-only condition were comparable to those in the highest noise condition (+5 dB). Increasing background noise affected speech neural tracking at the group level, in addition to eliciting strong individual differences. Mixed-effects modeling showed that listening demand and conversation understanding were correlated with early cortical speech tracking, such that high demand and low conversation understanding occurred with lower-amplitude TRFs. In the high-noise condition, greater listening demand was negatively correlated with parietal alpha power, such that higher demand was related to lower alpha power. No significant correlations were observed between TRF/alpha measures and clinical speech perception scores. These results are similar to previous findings showing little relationship between clinical speech perception and quality of life in CI users. However, physiological responses to complex natural speech may provide an objective measure of aspects of quality-of-life measures such as self-perceived listening demand.
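For readers unfamiliar with the temporal response functions (TRFs) mentioned in this abstract, the short Python sketch below illustrates the general idea behind "speech neural tracking": a time-lagged ridge regression relating a speech amplitude envelope to an EEG signal. This is a minimal illustration on synthetic data under arbitrary assumptions (sampling rate, lag window, regularisation strength), not the authors' analysis pipeline.

```python
# Illustrative sketch only: time-lagged ridge regression (the core of common
# "TRF" analyses) relating a stimulus envelope to a single simulated EEG channel.
import numpy as np

def lagged_design(stimulus, lags):
    """Build a design matrix whose columns are time-shifted copies of the stimulus."""
    n = len(stimulus)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stimulus[:n - lag]
        else:
            X[:n + lag, j] = stimulus[-lag:]
    return X

def fit_trf(stimulus, eeg, fs, tmin=0.0, tmax=0.4, alpha=1.0):
    """Ridge-regression TRF: eeg ~ X @ w, with lags from tmin to tmax seconds."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = lagged_design(stimulus, lags)
    w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ eeg)
    return lags / fs, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 100                                      # Hz (assumed sampling rate)
    env = np.abs(rng.standard_normal(fs * 60))    # stand-in for a speech envelope
    true_kernel = np.hanning(20)                  # fake 200-ms "brain response"
    eeg = np.convolve(env, true_kernel)[:len(env)] + rng.standard_normal(len(env))
    times, trf = fit_trf(env, eeg, fs)
    print(times[np.argmax(trf)])                  # peak latency of the recovered TRF
```

In real analyses such models are typically fit across all electrodes with cross-validated regularisation, but the core computation is the regression shown here.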
7. Hanulíková A. Do faces speak volumes? Social expectations in speech comprehension and evaluation across three age groups. PLoS One 2021; 16:e0259230. [PMID: 34710176] [PMCID: PMC8553087] [DOI: 10.1371/journal.pone.0259230]
Abstract
An unresolved issue in social perception concerns the effect of perceived ethnicity on speech processing. Bias-based accounts assume conscious misunderstanding of native speech when a speaker is classified as nonnative, resulting in negative ratings and poorer comprehension. In contrast, exemplar models of socially indexed speech perception suggest that such negative effects arise only when a contextual cue to social identity is misleading, i.e., when ethnicity and speech clash with listeners' expectations. To address these accounts, and to assess ethnicity effects across different age groups, three non-university populations (N = 172) were primed with photographs of Asian and white European women and asked to repeat and rate utterances spoken in three accents (Korean-accented German, a regional German accent, and standard German), all embedded in background noise. In line with exemplar models, repetition accuracy increased when the expected and perceived speech matched, but the effect was limited to the foreign accent and, at the group level, to teens and older adults. In contrast, Asian speakers received the most negative accent ratings across all accents, consistent with a bias-based view, but group distinctions again came into play here: the effect was most pronounced in older adults and limited to standard German for teens. Importantly, the effects varied across ages, with younger adults showing no effects of ethnicity in either task. The findings suggest that the theoretical contradictions are a consequence of methodological choices, which reflect distinct aspects of social information processing.
Affiliation(s)
- Adriana Hanulíková
  - Department of German – German Linguistics, Albert-Ludwigs-Universität Freiburg, University of Freiburg, Freiburg, Germany
  - Freiburg Institute of Advanced Studies (FRIAS), University of Freiburg, Freiburg, Germany
8. Graf S, Bungenstock A, Richter L, Unterhofer C, Gruner M, Hartmann P, Hoyer P. Acoustically Induced Vocal Training for Individuals With Impaired Hearing. J Voice 2021; 37:374-381. [PMID: 33632556] [DOI: 10.1016/j.jvoice.2021.01.020]
Abstract
OBJECTIVES/HYPOTHESIS: Articulation, phonation, and resonance disorders in the speech of hearing-impaired speakers reduce intelligibility. The study focuses on (1) whether nonacoustic feedback may facilitate the adjustment of the vocal tract, leading to increased vocal tract resonance, and (2) whether training with this feedback would be helpful for the subsequent formation of vowels.
STUDY DESIGN: Prospective.
METHODS: Seven profoundly hearing-impaired participants used acoustic sound waves in the frequency range of the first two vocal tract resonances, applied in front of the open mouth at intensities above 1 Pa. They were asked to amplify the sound by adjusting the vocal tract. The sound waves corresponded to the first and second resonance frequencies of the vowels [u], [o], and [a]. The self-assessment of the participants and a software-based/auditory analysis are reported.
RESULTS: The participants were able to enhance the acoustic signal by adjusting the vocal tract shape. The self-perception of the participants, the auditory voice analysis, and the acoustic analysis of vowels were consistent with each other. While the maximum sound pressure levels remained constant, the mean sound pressure levels increased. Breathiness and hoarseness declined during the exercises. The resonance/harmonic-to-noise ratio increased, especially for the vowels [u], [o], and [a]. Furthermore, the positively connoted feedback from the participants indicated easier sound production.
CONCLUSION: Nonauditory feedback based on acoustic waves could be suitable for improving the formation of vowels. The findings are in accordance with a reduction of acoustic losses within the vocal tract.
Affiliation(s)
- Simone Graf
  - Department of Otorhinolaryngology/Phoniatrics, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Anna Bungenstock
  - Department of Otorhinolaryngology/Phoniatrics, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Lena Richter
  - Department of Otorhinolaryngology/Phoniatrics, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Carmen Unterhofer
  - Department of Otorhinolaryngology/Phoniatrics, Klinikum Rechts der Isar, Technical University Munich, Munich, Germany
- Michael Gruner
  - Fraunhofer Application Center for Optical Metrology and Surface Technologies (AZOM), Zwickau, Germany and West Saxon University of Applied Sciences, Zwickau, Germany
- Peter Hartmann
  - Fraunhofer Application Center for Optical Metrology and Surface Technologies (AZOM), Zwickau, Germany and West Saxon University of Applied Sciences, Zwickau, Germany
9. Talking Points: A Modulating Circle Increases Listening Effort Without Improving Speech Recognition in Young Adults. Psychon Bull Rev 2020; 27:536-543. [PMID: 32128719] [DOI: 10.3758/s13423-020-01713-y]
Abstract
Speech recognition is improved when the acoustic input is accompanied by visual cues provided by a talking face (Erber in Journal of Speech and Hearing Research, 12(2), 423-425, 1969; Sumby & Pollack in The Journal of the Acoustical Society of America, 26(2), 212-215, 1954). One way that the visual signal facilitates speech recognition is by providing the listener with information about fine phonetic detail that complements information from the auditory signal. However, given that degraded face stimuli can still improve speech recognition accuracy (Munhall, Kroos, Jozan, & Vatikiotis-Bateson in Perception & Psychophysics, 66(4), 574-583, 2004), and static or moving shapes can improve speech detection accuracy (Bernstein, Auer, & Takayanagi in Speech Communication, 44(1-4), 5-18, 2004), aspects of the visual signal other than fine phonetic detail may also contribute to the perception of speech. In two experiments, we show that a modulating circle providing information about the onset, offset, and acoustic amplitude envelope of the speech does not improve recognition of spoken sentences (Experiment 1) or words (Experiment 2). Further, contrary to our hypothesis, the modulating circle increased listening effort despite subjective reports that it made the word recognition task seem easier to complete (Experiment 2). These results suggest that audiovisual speech processing, even when the visual stimulus only conveys temporal information about the acoustic signal, may be a cognitively demanding process.
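To make the stimulus manipulation described above more concrete, the small Python sketch below shows one generic way to derive an amplitude envelope from a speech recording and map it onto the radius of a circle, i.e., a visual signal that carries only temporal information about the acoustics. It is purely illustrative: the file name, filter cutoff, frame rate, and radius range are assumptions, not the parameters used in the study.

```python
# Illustrative sketch only: broadband amplitude envelope of a speech waveform,
# mapped to circle radii for a temporal-only visual cue.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt
from scipy.io import wavfile

fs, audio = wavfile.read("sentence.wav")        # hypothetical mono recording
audio = audio.astype(float)
audio /= np.max(np.abs(audio)) or 1.0           # normalise to +/-1

# Amplitude envelope: magnitude of the analytic signal, then low-pass filtered.
envelope = np.abs(hilbert(audio))
b, a = butter(2, 10 / (fs / 2), btype="low")    # ~10 Hz cutoff (assumed)
envelope = filtfilt(b, a, envelope)

# Map the smoothed envelope to circle radii for a ~60-fps visual display.
frame_env = envelope[:: max(1, fs // 60)]
radii = 20 + 80 * (frame_env - frame_env.min()) / (np.ptp(frame_env) or 1.0)
print(radii[:10])  # pixel radii that would drive the circle frame by frame
```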
10. Strand JF, Ray L, Dillman-Hasso NH, Villanueva J, Brown VA. Understanding Speech Amid the Jingle and Jangle: Recommendations for Improving Measurement Practices in Listening Effort Research. Audit Percept Cogn 2020; 3:169-188. [PMID: 34240011] [DOI: 10.1080/25742442.2021.1903293]
Abstract
The latent constructs psychologists study are typically not directly accessible, so researchers must design measurement instruments that are intended to provide insights about those constructs. Construct validation (assessing whether instruments measure what they intend to) is therefore critical for ensuring that the conclusions we draw actually reflect the intended phenomena. Insufficient construct validation can lead to the jingle fallacy (falsely assuming two instruments measure the same construct because the instruments share a name; Thorndike, 1904) and the jangle fallacy (falsely assuming two instruments measure different constructs because the instruments have different names; Kelley, 1927). In this paper, we examine construct validation practices in research on listening effort and identify patterns that strongly suggest the presence of jingle and jangle in the literature. We argue that the lack of construct validation for listening effort measures has led to inconsistent findings and hindered our understanding of the construct. We also provide specific recommendations for improving construct validation of listening effort instruments, drawing on the framework laid out in a recent paper on improving measurement practices (Flake & Fried, 2020). Although this paper addresses listening effort, the issues raised and recommendations presented are widely applicable to tasks used in research on auditory perception and cognitive psychology.
Affiliation(s)
- Lucia Ray
  - Carleton College, Department of Psychology
- Violet A Brown
  - Washington University in St. Louis, Department of Psychological & Brain Sciences