1. Krason A, Zhang Y, Man H, Vigliocco G. Mouth and facial informativeness norms for 2276 English words. Behav Res Methods 2024; 56:4786-4801. PMID: 37604959; PMCID: PMC11289175; DOI: 10.3758/s13428-023-02216-z
Abstract
Mouth and facial movements are part and parcel of face-to-face communication. The primary way of assessing their role in speech perception has been by manipulating their presence (e.g., by blurring the area of a speaker's lips) or by looking at how informative different mouth patterns are for the corresponding phonemes (or visemes; e.g., /b/ is visually more salient than /g/). However, moving beyond the informativeness of single phonemes is challenging due to coarticulation and language variations (to name just a few factors). Here, we present mouth and facial informativeness (MaFI) for words, i.e., how visually informative words are based on their corresponding mouth and facial movements. MaFI was quantified for 2276 English words, varying in length, frequency, and age of acquisition, using phonological distance between a word and participants' speechreading guesses. The results showed that the MaFI norms capture the dynamic nature of mouth and facial movements at the word level: words containing phonemes with roundness and frontness features, as well as visemes characterized by lower lip tuck, lip rounding, and lip closure, are visually more informative. We also showed that the more of these features a word contains, the more informative it is based on mouth and facial movements. Finally, we demonstrated that the MaFI norms generalize across different variants of English. The norms are freely accessible via Open Science Framework (https://osf.io/mna8j/) and can benefit any language researcher using audiovisual stimuli (e.g., to control for the effect of speech-linked mouth and facial movements).
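The scoring idea described above can be illustrated with a length-normalized edit distance over phoneme sequences. This is a minimal Python sketch, not the authors' code: the published norms use a phonologically weighted distance, and the example transcriptions below are hypothetical.

```python
def phoneme_distance(target, guess):
    """Length-normalized Levenshtein distance between two phoneme sequences.

    0 means the guess reproduces the target exactly; 1 means nothing aligns.
    Averaging such distances over many speechreading guesses (and flipping
    the sign) yields a word-level informativeness score of the MaFI kind.
    """
    m, n = len(target), len(guess)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # cost of deleting all target phonemes up to i
    for j in range(n + 1):
        d[0][j] = j  # cost of inserting all guess phonemes up to j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if target[i - 1] == guess[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[m][n] / max(m, n)

# Hypothetical ARPAbet-style transcriptions: target "bat", guess "pat"
print(phoneme_distance(["B", "AE", "T"], ["P", "AE", "T"]))  # 0.33...
```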

Affiliation(s)
- Anna Krason: Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Ye Zhang: Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Hillarie Man: Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Gabriella Vigliocco: Department of Experimental Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK

2. Weng Y, Rong Y, Peng G. The development of audiovisual speech perception in Mandarin-speaking children: Evidence from the McGurk paradigm. Child Dev 2024; 95:750-765. PMID: 37843038; DOI: 10.1111/cdev.14022
Abstract
The developmental trajectory of audiovisual speech perception in Mandarin-speaking children remains understudied. This cross-sectional study of Mandarin-speaking 3- to 4-year-old, 5- to 6-year-old, and 7- to 8-year-old children and adults from Xiamen, China (n = 87, 44 males) investigated this issue using the McGurk paradigm with three levels of auditory noise. For the identification of congruent stimuli, 3- to 4-year-olds underperformed the older groups, whose performances were comparable. For the perception of incongruent stimuli, a developmental shift was observed: 3- to 4-year-olds made significantly more audio-dominant but fewer audiovisually integrated responses than the older groups. With increasing auditory noise, the difference between children and adults widened in identifying congruent stimuli but narrowed in perceiving incongruent ones. The findings regarding noise effects agree with the statistically optimal hypothesis.

Affiliation(s)
- Yi Weng: Department of Chinese and Bilingual Studies, Research Centre for Language, Cognition, and Neuroscience, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Yicheng Rong: Department of Chinese and Bilingual Studies, Research Centre for Language, Cognition, and Neuroscience, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Gang Peng: Department of Chinese and Bilingual Studies, Research Centre for Language, Cognition, and Neuroscience, The Hong Kong Polytechnic University, Hong Kong SAR, China

3. Mutlu Aİ, Yüksel M. Listening effort, fatigue, and streamed voice quality during online university courses. Logoped Phoniatr Vocol 2024:1-8. PMID: 38440900; DOI: 10.1080/14015439.2024.2317789
Abstract
Understanding the impact of listening effort (LE) and fatigue has become increasingly crucial for optimizing the learning experience as online classrooms have grown more prevalent as a mode of instruction. The purpose of this study was to investigate the LE, fatigue, and voice quality experienced by students during online and face-to-face class sessions. A total of 110 participants (average age 20.76 years, range 18-28), first-year undergraduate students in Speech and Language Therapy and Audiology programs in Turkey, rated their LE during the 2022-2023 spring semester using the Listening Effort Screening Questionnaire (LESQ) and assessed their fatigue with the Multidimensional Fatigue Inventory (MFI-20). The voice quality of lecturers was assessed using smoothed cepstral peak prominence (CPPS) measurements. Data were collected from both online and face-to-face sessions. The results revealed that participants reported increased LE and fatigue during online sessions compared to face-to-face sessions, and the differences were statistically significant. Correlation analysis showed significant relationships (p < 0.05) between audio-video streaming quality and LE-related items in the LESQ, as well as MFI sub-scales and total scores. The findings revealed a relationship between an increased preference for face-to-face classrooms and higher levels of LE and fatigue, emphasizing the significance of these factors in shaping the learning experience. CPPS measurements indicated a dysphonic voice quality during online classroom audio streaming. These findings highlight the challenges of online classes in terms of increased LE, fatigue, and voice quality issues. Understanding these factors is crucial for improving online instruction and the student experience.
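Cepstral peak prominence measures how far the cepstral peak at the talker's vocal-period quefrency rises above a regression line through the cepstrum; higher values indicate a more periodic, less dysphonic voice. The sketch below computes a basic, unsmoothed variant for a single frame with NumPy. Real CPPS implementations (e.g., in Praat) also smooth across time and quefrency, and the frame length, pitch range, and regression region here are illustrative assumptions, not the study's settings.

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=330.0):
    """Simplified (unsmoothed) cepstral peak prominence for one frame.

    CPPS additionally averages the cepstrum across frames and quefrency
    bins before measuring the peak; that smoothing is omitted here.
    """
    windowed = frame * np.hanning(len(frame))
    log_spectrum = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.abs(np.fft.irfft(log_spectrum))  # real cepstrum (magnitude)
    quefrency = np.arange(len(cepstrum)) / fs
    lo, hi = int(fs / f0_max), int(fs / f0_min)    # plausible pitch periods
    peak = lo + int(np.argmax(cepstrum[lo:hi]))
    # Regression line through the searched region serves as the baseline
    slope, intercept = np.polyfit(quefrency[lo:hi], cepstrum[lo:hi], 1)
    return cepstrum[peak] - (slope * quefrency[peak] + intercept)

# Illustrative call on 40 ms of a crude 200 Hz pulse train (stand-in voice)
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
print(cepstral_peak_prominence(np.sign(np.sin(2 * np.pi * 200 * t)), fs))
```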

Affiliation(s)
- Ayşe İlayda Mutlu: School of Health Sciences, Department of Speech and Language Therapy, Lokman Hekim University, Ankara, Turkey
- Mustafa Yüksel: School of Health Sciences, Department of Audiology, Ankara Medipol University, Ankara, Turkey

4. Sewell K, Brown VA, Farwell G, Rogers M, Zhang X, Strand JF. The effects of temporal cues, point-light displays, and faces on speech identification and listening effort. PLoS One 2023; 18:e0290826. PMID: 38019831; PMCID: PMC10686424; DOI: 10.1371/journal.pone.0290826
Abstract
Among the most robust findings in speech research is that the presence of a talking face improves the intelligibility of spoken language. Talking faces supplement the auditory signal by providing fine phonetic cues based on the placement of the articulators, as well as temporal cues to when speech is occurring. In this study, we varied the amount of information contained in the visual signal, ranging from temporal information alone to a natural talking face. Participants were presented with spoken sentences in energetic or informational masking in four different visual conditions: audio-only, a modulating circle providing temporal cues to salient features of the speech, a digitally rendered point-light display showing lip movement, and a natural talking face. We assessed both sentence identification accuracy and self-reported listening effort. Audiovisual benefit for intelligibility was observed for the natural face in both informational and energetic masking, but the digitally rendered point-light display only provided benefit in energetic masking. Intelligibility for speech accompanied by the modulating circle did not differ from the audio-only conditions in either masker type. Thus, the temporal cues used here were insufficient to improve speech intelligibility in noise, but some types of digital point-light displays may contain enough phonetic detail to produce modest improvements in speech identification in noise.

Affiliation(s)
- Katrina Sewell: Department of Psychology, Carleton College, Northfield, MN, USA
- Violet A. Brown: Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
- Grace Farwell: Department of Psychology, Carleton College, Northfield, MN, USA
- Maya Rogers: Department of Psychology, Carleton College, Northfield, MN, USA
- Xingyi Zhang: Department of Psychology, Carleton College, Northfield, MN, USA
- Julia F. Strand: Department of Psychology, Carleton College, Northfield, MN, USA

5. Krason A, Vigliocco G, Mailend ML, Stoll H, Varley R, Buxbaum LJ. Benefit of visual speech information for word comprehension in post-stroke aphasia. Cortex 2023; 165:86-100. PMID: 37271014; PMCID: PMC10850036; DOI: 10.1016/j.cortex.2023.04.011
Abstract
Aphasia is a language disorder that often involves speech comprehension impairments affecting communication. In face-to-face settings, speech is accompanied by mouth and facial movements, but little is known about the extent to which they benefit aphasic comprehension. This study investigated the benefit of visual information accompanying speech for word comprehension in people with aphasia (PWA) and the neuroanatomic substrates of any benefit. Thirty-six PWA and 13 neurotypical matched control participants performed a picture-word verification task in which they indicated whether a picture of an animate/inanimate object matched a subsequent word produced by an actress in a video. Stimuli were either audiovisual (with visible mouth and facial movements) or auditory-only (a still picture of a silhouette), with audio that was either clear (unedited) or degraded (6-band noise-vocoding). We found that visual speech information was more beneficial for neurotypical participants than PWA, and more beneficial for both groups when speech was degraded. A multivariate lesion-symptom mapping analysis for the degraded speech condition showed that lesions to the superior temporal gyrus, underlying insula, primary and secondary somatosensory cortices, and inferior frontal gyrus were associated with reduced benefit of audiovisual compared to auditory-only speech, suggesting that the integrity of these fronto-temporo-parietal regions may facilitate cross-modal mapping. These findings provide initial insights into the impact of audiovisual information on comprehension in aphasia and the brain regions mediating any benefit.
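Noise vocoding of the kind used for the degraded condition splits the signal into frequency bands, extracts each band's amplitude envelope, and re-imposes the envelopes on band-limited noise, preserving temporal cues while degrading spectral detail. A minimal SciPy sketch follows; the band edges, envelope cutoff, filter orders, and sampling-rate assumptions are illustrative, not the study's settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands=6, f_lo=100.0, f_hi=7000.0):
    """Noise-vocode a 1-D speech signal with n_bands spectral channels.

    Assumes fs is high enough that f_hi is below the Nyquist frequency.
    """
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)               # log-spaced bands
    env_lp = butter(4, 30.0, btype="low", fs=fs, output="sos")  # envelope smoother
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)
        env = sosfiltfilt(env_lp, np.abs(hilbert(band)))        # amplitude envelope
        env = np.clip(env, 0.0, None)                           # filtering can undershoot
        carrier = sosfiltfilt(band_sos, np.random.randn(len(signal)))
        out += env * carrier                                    # modulated band noise
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    return out * rms(signal) / (rms(out) + 1e-12)               # match input level
```

Decreasing `n_bands` removes spectral detail while leaving the temporal envelope cues intact, which is what makes channel count a convenient knob for degrading intelligibility.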

Affiliation(s)
- Anna Krason: Experimental Psychology, University College London, UK; Moss Rehabilitation Research Institute, Elkins Park, PA, USA
- Gabriella Vigliocco: Experimental Psychology, University College London, UK; Moss Rehabilitation Research Institute, Elkins Park, PA, USA
- Marja-Liisa Mailend: Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Department of Special Education, University of Tartu, Tartu Linn, Estonia
- Harrison Stoll: Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Applied Cognitive and Brain Science, Drexel University, Philadelphia, PA, USA
- Laurel J Buxbaum: Moss Rehabilitation Research Institute, Elkins Park, PA, USA; Department of Rehabilitation Medicine, Thomas Jefferson University, Philadelphia, PA, USA

6. Suri KN, Whedon M, Lewis M. Perception of audio-visual synchrony in infants at elevated likelihood of developing autism spectrum disorder. Eur J Pediatr 2023; 182:2105-2117. PMID: 36820895; DOI: 10.1007/s00431-023-04871-y
Abstract
The inability to perceive audio-visual speech as a unified event may contribute to social impairments and language deficits in children with autism spectrum disorder (ASD). In this study, we examined and compared two groups of infants on their sensitivity to audio-visual asynchrony for a social event (a speaking face) and a non-social event (a bouncing ball) and assessed the relations between multisensory integration and language production. Infants at elevated likelihood of developing ASD were less sensitive to audio-visual synchrony for the social event than infants without elevated likelihood. Among infants without elevated likelihood, greater sensitivity to audio-visual synchrony for the social event was associated with a larger productive vocabulary. CONCLUSION: Findings suggest that early deficits in multisensory integration may impair language development among infants with elevated likelihood of developing ASD.
WHAT IS KNOWN:
• Perceptual integration of auditory and visual cues within speech is important for language development.
• Prior work suggests that children with ASD are less sensitive to the temporal synchrony within audio-visual speech.
WHAT IS NEW:
• In this study, infants at elevated likelihood of developing ASD showed a larger temporal binding window for a dynamic social event (speaking face) than typically developing (TD) infants, suggesting less efficient multisensory integration.

Affiliation(s)
- Kirin N Suri: Institute for the Study of Child Development, Rutgers Robert Wood Johnson Medical School, 89 French Street, New Brunswick, NJ 08901, USA; Children's Health at Hackensack Meridian, Hackensack, NJ 07601, USA
- Margaret Whedon: Institute for the Study of Child Development, Rutgers Robert Wood Johnson Medical School, 89 French Street, New Brunswick, NJ 08901, USA
- Michael Lewis: Institute for the Study of Child Development, Rutgers Robert Wood Johnson Medical School, 89 French Street, New Brunswick, NJ 08901, USA

7. Van Engen KJ, Dey A, Sommers MS, Peelle JE. Audiovisual speech perception: Moving beyond McGurk. J Acoust Soc Am 2022; 152:3216. PMID: 36586857; PMCID: PMC9894660; DOI: 10.1121/10.0015262
Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we join the voices of others in the field to argue that McGurk tasks are ill-suited for studying real-life multisensory speech perception: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility to McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and narratives with congruent auditory and visual speech cues.

Affiliation(s)
- Kristin J Van Engen: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Avanti Dey: PLOS ONE, 1265 Battery Street, San Francisco, California 94111, USA
- Mitchell S Sommers: Department of Psychological and Brain Sciences, Washington University, St. Louis, Missouri 63130, USA
- Jonathan E Peelle: Department of Otolaryngology, Washington University, St. Louis, Missouri 63130, USA

8. Crosse MJ, Foxe JJ, Tarrit K, Freedman EG, Molholm S. Resolution of impaired multisensory processing in autism and the cost of switching sensory modality. Commun Biol 2022; 5:601. PMID: 35773473; PMCID: PMC9246932; DOI: 10.1038/s42003-022-03519-1
Abstract
Children with autism spectrum disorders (ASD) exhibit alterations in multisensory processing, which may contribute to the prevalence of social and communicative deficits in this population. Resolution of multisensory deficits has been observed in teenagers with ASD for complex, social speech stimuli; however, whether this resolution extends to more basic multisensory processing deficits remains unclear. Here, in a cohort of 364 participants we show using simple, non-social audiovisual stimuli that deficits in multisensory processing observed in high-functioning children and teenagers with ASD are not evident in adults with the disorder. Computational modelling indicated that multisensory processing transitions from a default state of competition to one of facilitation, and that this transition is delayed in ASD. Further analysis revealed group differences in how sensory channels are weighted, and how this is impacted by preceding cross-sensory inputs. Our findings indicate that there is a complex and dynamic interplay among the sensory systems that differs considerably in individuals with ASD.
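A standard way to separate facilitation from competition in audiovisual reaction times, and a common tool in this literature (though not necessarily the authors' exact model), is Miller's (1982) race model inequality: multisensory responses faster than the bound given by summing the unisensory cumulative distributions cannot be explained by two independent channels racing. A minimal sketch with simulated reaction times:

```python
import numpy as np

def race_model_violation(rt_av, rt_a, rt_v, quantiles=np.linspace(0.05, 0.95, 19)):
    """Test audiovisual RTs against Miller's race model inequality.

    Positive values at fast quantiles indicate multisensory facilitation
    beyond what parallel unisensory 'races' can produce; values at or
    below zero are consistent with competition or independent channels.
    """
    t = np.quantile(rt_av, quantiles)            # probe times from AV quantiles
    cdf = lambda rts, probes: np.mean(rts[:, None] <= probes[None, :], axis=0)
    # Race model bound: G_A(t) + G_V(t), capped at 1
    bound = np.minimum(cdf(rt_a, t) + cdf(rt_v, t), 1.0)
    return cdf(rt_av, t) - bound                 # > 0 means the bound is violated

# Illustrative simulated RTs in seconds, not study data
rng = np.random.default_rng(0)
rt_a, rt_v = rng.gamma(9, 0.05, 500), rng.gamma(10, 0.05, 500)
rt_av = rng.gamma(7, 0.05, 500)
print(race_model_violation(rt_av, rt_a, rt_v).round(3))
```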

Affiliation(s)
- Michael J Crosse: The Cognitive Neurophysiology Laboratory, Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA; The Dominick P. Purpura Department of Neuroscience, Rose F. Kennedy Intellectual and Developmental Disabilities Research Center, Albert Einstein College of Medicine, Bronx, NY, USA; Trinity Centre for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- John J Foxe: The Cognitive Neurophysiology Laboratory, Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA; The Dominick P. Purpura Department of Neuroscience, Rose F. Kennedy Intellectual and Developmental Disabilities Research Center, Albert Einstein College of Medicine, Bronx, NY, USA; The Cognitive Neurophysiology Laboratory, Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA
- Katy Tarrit: The Cognitive Neurophysiology Laboratory, Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA
- Edward G Freedman: The Cognitive Neurophysiology Laboratory, Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA
- Sophie Molholm: The Cognitive Neurophysiology Laboratory, Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA; The Dominick P. Purpura Department of Neuroscience, Rose F. Kennedy Intellectual and Developmental Disabilities Research Center, Albert Einstein College of Medicine, Bronx, NY, USA; The Cognitive Neurophysiology Laboratory, Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, NY, USA

9. Krason A, Fenton R, Varley R, Vigliocco G. The role of iconic gestures and mouth movements in face-to-face communication. Psychon Bull Rev 2022; 29:600-612. PMID: 34671936; PMCID: PMC9038814; DOI: 10.3758/s13423-021-02009-5
Abstract
Human face-to-face communication is multimodal: it comprises speech as well as visual cues, such as articulatory and limb gestures. In the current study, we assess how iconic gestures and mouth movements influence audiovisual word recognition. We presented video clips of an actress uttering single words accompanied, or not, by more or less informative iconic gestures. For each word, we also measured the informativeness of the mouth movements from a separate lipreading task. We manipulated whether gestures were congruent or incongruent with the speech, and whether the words were audible or noise vocoded. The task was to decide whether the speech from the video matched a previously seen picture. We found that congruent iconic gestures aided word recognition, especially in the noise-vocoded condition, and the effect was larger (in terms of reaction times) for more informative gestures. Moreover, more informative mouth movements facilitated performance in challenging listening conditions when the speech was accompanied by gestures (either congruent or incongruent), suggesting an enhancement when both cues are present relative to just one. We also observed a trend for more informative mouth movements to speed up word recognition across clarity conditions, but only when gestures were absent. We conclude that listeners use and dynamically weight the informativeness of gestures and mouth movements available during face-to-face communication.

Affiliation(s)
- Anna Krason: Division of Psychology and Language Science, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Rebecca Fenton: Division of Psychology and Language Science, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Rosemary Varley: Division of Psychology and Language Science, University College London, 26 Bedford Way, London WC1H 0AP, UK
- Gabriella Vigliocco: Division of Psychology and Language Science, University College London, 26 Bedford Way, London WC1H 0AP, UK

10. Pattamadilok C, Sato M. How are visemes and graphemes integrated with speech sounds during spoken word recognition? ERP evidence for supra-additive responses during audiovisual compared to auditory speech processing. Brain Lang 2022; 225:105058. PMID: 34929531; DOI: 10.1016/j.bandl.2021.105058
Abstract
Both visual articulatory gestures and orthography provide information on the phonological content of speech. This EEG study investigated the integration between speech and these two visual inputs. A comparison of skilled readers' brain responses elicited by a spoken word presented alone versus synchronously with a static image of a viseme or a grapheme of the spoken word's onset showed that, while neither visual input induced audiovisual integration on the N1 acoustic component, both led to a supra-additive integration on P2, with a stronger integration between speech and graphemes on left-anterior electrodes. This pattern persisted in the P350 time-window and generalized to all electrodes. The finding suggests a strong impact of spelling knowledge on phonetic processing and lexical access. It also indirectly indicates that the dynamic and predictive value present in natural lip movements, but not in static visemes, is particularly critical to the contribution of visual articulatory gestures to speech processing.

Affiliation(s)
- Marc Sato: Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France

11. Basharat A, Thayanithy A, Barnett-Cowan M. A Scoping Review of Audiovisual Integration Methodology: Screening for Auditory and Visual Impairment in Younger and Older Adults. Front Aging Neurosci 2022; 13:772112. PMID: 35153716; PMCID: PMC8829696; DOI: 10.3389/fnagi.2021.772112
Abstract
With the rise of the aging population, many scientists studying multisensory integration have turned toward understanding how this process may change with age. This scoping review was conducted to understand and describe the scope and rigor with which researchers studying audiovisual sensory integration screen for hearing and vision impairment. A structured search in three licensed databases (Scopus, PubMed, and PsycINFO) using the key concepts of multisensory integration, audiovisual modality, and aging revealed 2,462 articles, which were screened for inclusion by two reviewers. Articles were included if they (1) tested healthy older adults (minimum mean or median age of 60) with younger adults as a comparison (mean or median age between 18 and 35), (2) measured auditory and visual integration, (3) were written in English, and (4) reported behavioral outcomes. Articles were excluded if they (1) tested taste exclusively, (2) tested olfaction exclusively, (3) tested somatosensation exclusively, (4) tested emotion perception, (5) were not written in English, (6) were clinical commentaries, editorials, interviews, letters, newspaper articles, abstracts only, or non-peer-reviewed literature (e.g., theses), or (7) focused on neuroimaging without a behavioral component. Data pertaining to the details of the study (e.g., country of publication, year of publication) were extracted; more importantly for our research question, data pertaining to screening measures used for hearing and vision impairment (e.g., type of test used, whether hearing and visual aids were worn, thresholds used) were extracted, collated, and summarized. Our search revealed that only 64% of studies screened for age-abnormal hearing impairment, only 51% screened for age-abnormal vision impairment, and consistent definitions of normal or abnormal vision and hearing were not used among the studies that screened for sensory abilities. A total of 1,624 younger adults and 4,778 older participants were included in the scoping review, with males composing approximately 44% and females 56% of the total sample, and most of the data came from only four countries. We recommend that studies investigating the effects of aging on multisensory integration screen for normal vision and hearing using the World Health Organization's (WHO) hearing loss and visual impairment cut-off scores in order to maintain consistency with other aging researchers. As mild cognitive impairment (MCI) has been defined as a "transitional" or "transitory" stage between normal aging and dementia, and because approximately 3-5% of the aging population will develop MCI each year, it is important that researchers aiming to study a healthy aging population screen appropriately for MCI. One of our secondary aims was to determine how often researchers screened for cognitive impairment and the types of tests that were used to do so. Our results revealed that only 55 out of 72 studies tested for neurological and cognitive function, and only a subset used standardized tests. Additionally, among the studies that used standardized tests, the cut-off scores used were not always adequate for screening out mild cognitive impairment.
An additional secondary aim of this scoping review was to determine the feasibility of conducting a future meta-analysis to further evaluate the results quantitatively (i.e., whether findings from studies using self-reported vision and hearing screening methods differ significantly from those measuring vision and hearing impairment in the lab) and to assess the scope of this problem. We found that it may not be feasible to conduct a meta-analysis with the entire dataset of this scoping review; however, a meta-analysis could be conducted with stricter parameters (e.g., focusing on accuracy or response time data only). Systematic Review Registration: https://doi.org/10.17605/OSF.IO/GTUHD

12. Myerson J, Tye-Murray N, Spehar B, Hale S, Sommers M. Predicting Audiovisual Word Recognition in Noisy Situations: Toward Precision Audiology. Ear Hear 2021; 42:1656-1667. PMID: 34320527; PMCID: PMC8545708; DOI: 10.1097/aud.0000000000001072
Abstract
OBJECTIVE: Spoken communication is better when one can see as well as hear the talker. Tye-Murray and colleagues found that even when age-related deficits in audiovisual (AV) speech perception were observed, AV performance could be accurately predicted from auditory-only (A-only) and visual-only (V-only) performance, and that knowing individuals' ages did not increase the accuracy of prediction. This finding contradicts conventional wisdom, according to which age-related differences in AV speech perception are due to deficits in the integration of auditory and visual information, and our primary goal was to determine whether Tye-Murray et al.'s finding with a closed-set test generalizes to situations more like those in everyday life. A second goal was to test a new predictive model that has important implications for audiological assessment. DESIGN: Participants (N = 109; ages 22-93 years), previously studied by Tye-Murray et al., were administered our new, open-set Lex-List test to assess their auditory, visual, and audiovisual perception of individual words. All testing was conducted in six-talker babble (three males and three females) presented at approximately 62 dB SPL. The level of the audio for the Lex-List items, when presented, was approximately 59 dB SPL because pilot testing suggested that this signal-to-noise ratio would avoid ceiling performance under the AV condition. RESULTS: Multiple linear regression analyses revealed that A-only and V-only performance accounted for 87.9% of the variance in AV speech perception, and that the contribution of age failed to reach significance. Our new parabolic model accounted for even more (92.8%) of the variance in AV performance, and again, the contribution of age was not significant. Bayesian analyses revealed that for both linear and parabolic models, the present data were almost 10 times as likely to occur with a reduced model (without age) than with a full model (with age as a predictor). Furthermore, comparison of the two reduced models revealed that the data were more than 100 times as likely to occur with the parabolic model than with the linear regression model. CONCLUSIONS: The present results strongly support Tye-Murray et al.'s hypothesis that AV performance can be accurately predicted from unimodal performance and that knowing individuals' ages does not increase the accuracy of that prediction. Our results represent an important initial step in extending Tye-Murray et al.'s findings to situations more like those encountered in everyday communication. The accuracy with which speech perception was predicted in this study foreshadows a form of precision audiology in which determining individual strengths and weaknesses in unimodal and multimodal speech perception facilitates identification of targets for rehabilitative efforts aimed at recovering and maintaining speech perception abilities critical to the quality of an older adult's life.
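The model comparison reported here (linear versus parabolic prediction of AV scores from unimodal scores) is straightforward to set up with ordinary least squares. The sketch below uses simulated placeholder data, and it takes "parabolic" to mean a full quadratic expansion of the two unimodal predictors, which is an assumption about the model's exact form rather than the authors' specification:

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on predictors X."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

# Simulated placeholder scores (proportion correct), not study data
rng = np.random.default_rng(1)
a_only = rng.uniform(0.2, 0.9, 109)              # N = 109, as in the study
v_only = rng.uniform(0.0, 0.5, 109)
av = np.clip(0.3 + 0.6 * a_only + 0.4 * v_only - 0.3 * a_only * v_only
             + rng.normal(0, 0.05, 109), 0, 1)

linear = r_squared(np.column_stack([a_only, v_only]), av)
parabolic = r_squared(np.column_stack([a_only, v_only, a_only ** 2,
                                       v_only ** 2, a_only * v_only]), av)
print(f"linear R^2 = {linear:.3f}, parabolic R^2 = {parabolic:.3f}")
```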

Affiliation(s)
- Joel Myerson: Department of Psychological and Brain Sciences, Washington University, Saint Louis, Missouri, USA
- Nancy Tye-Murray: Department of Otolaryngology, Washington University School of Medicine, Saint Louis, Missouri, USA
- Brent Spehar: Department of Otolaryngology, Washington University School of Medicine, Saint Louis, Missouri, USA
- Sandra Hale: Department of Psychological and Brain Sciences, Washington University, Saint Louis, Missouri, USA
- Mitchell Sommers: Department of Psychological and Brain Sciences, Washington University, Saint Louis, Missouri, USA

13. Lasfargues-Delannoy A, Strelnikov K, Deguine O, Marx M, Barone P. Supra-normal skills in processing of visuo-auditory prosodic information by cochlear-implanted deaf patients. Hear Res 2021; 410:108330. PMID: 34492444; DOI: 10.1016/j.heares.2021.108330
Abstract
Cochlear-implanted (CI) adults with acquired deafness are known to depend on multisensory integration (MSI) skills for speech comprehension, fusing speechreading with their deficient auditory perception. However, little is known about how CI patients perceive prosodic information relating to speech content. Our study aimed to identify how CI patients use MSI between visual and auditory information to process the paralinguistic prosodic information of multimodal speech, and which visual strategies they employ. A psychophysics assessment was developed in which CI patients and normal-hearing (NH) controls had to distinguish between a question and a statement. The controls were separated into two age groups (young and age-matched) to dissociate any effect of aging. In addition, the oculomotor strategies used when facing a speaker in this prosodic decision task were recorded using an eye-tracking device and compared to controls. This study confirmed that prosodic processing is multisensory, but it revealed that CI patients showed significant supra-normal audiovisual integration for prosodic information compared to hearing controls, irrespective of age: CI patients had a visuo-auditory gain more than three times larger than that observed in hearing controls. Furthermore, CI participants performed better in the visuo-auditory situation through a specific oculomotor exploration of the face, fixating the mouth region significantly more than young NH participants, who fixated the eyes, whereas the age-matched controls showed an intermediate exploration pattern divided equally between the eyes and mouth. To conclude, our study demonstrated that CI patients have supra-normal MSI skills when integrating visual and auditory linguistic prosodic information, and that they have developed a specific adaptive strategy that contributes directly to the comprehension of speech content.

Affiliation(s)
- Anne Lasfargues-Delannoy: Université Fédérale de Toulouse, Université Paul Sabatier (UPS), France; UMR 5549 CerCo, UPS CNRS, France; CHU Toulouse, Service d'Oto-Rhino-Laryngologie (ORL), Otoneurologie et ORL Pédiatrique, Hôpital Pierre Paul Riquet, site Purpan, France
- Kuzma Strelnikov: Université Fédérale de Toulouse, Université Paul Sabatier (UPS), France; UMR 5549 CerCo, UPS CNRS, France; CHU Toulouse, France
- Olivier Deguine: Université Fédérale de Toulouse, Université Paul Sabatier (UPS), France; UMR 5549 CerCo, UPS CNRS, France; CHU Toulouse, Service d'Oto-Rhino-Laryngologie (ORL), Otoneurologie et ORL Pédiatrique, Hôpital Pierre Paul Riquet, site Purpan, France
- Mathieu Marx: Université Fédérale de Toulouse, Université Paul Sabatier (UPS), France; UMR 5549 CerCo, UPS CNRS, France; CHU Toulouse, Service d'Oto-Rhino-Laryngologie (ORL), Otoneurologie et ORL Pédiatrique, Hôpital Pierre Paul Riquet, site Purpan, France
- Pascal Barone: Université Fédérale de Toulouse, Université Paul Sabatier (UPS), France; UMR 5549 CerCo, UPS CNRS, France

14. Lalonde K, Werner LA. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit. Brain Sci 2021; 11:49. PMID: 33466253; PMCID: PMC7824772; DOI: 10.3390/brainsci11010049
Abstract
The natural environments in which infants and children learn speech and language are noisy and multimodal. Adults rely on the multimodal nature of speech to compensate for noisy environments during speech communication. Multiple mechanisms underlie mature audiovisual benefit to speech perception, including reduced uncertainty as to when auditory speech will occur, use of correlations between the amplitude envelope of auditory and visual signals in fluent speech, and use of visual phonetic knowledge for lexical access. This paper reviews evidence regarding infants' and children's use of temporal and phonetic mechanisms in audiovisual speech perception benefit. The ability to use temporal cues for audiovisual speech perception benefit emerges in infancy. Although infants are sensitive to the correspondence between auditory and visual phonetic cues, the ability to use this correspondence for audiovisual benefit may not emerge until age four. A more cohesive account of the development of audiovisual speech perception may follow from a more thorough understanding of the development of sensitivity to and use of various temporal and phonetic cues.

Affiliation(s)
- Kaylah Lalonde: Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE 68131, USA
- Lynne A. Werner: Department of Speech and Hearing Sciences, University of Washington, Seattle, WA 98105, USA

15. Dorman MF, Natale S, Knickerbocker A. Bilateral Cochlear Implants Allow Listeners to Benefit from Visual Information When Talker Location is Varied. J Am Acad Audiol 2020; 31:547-550. PMID: 32340054; DOI: 10.1055/s-0040-1709444
Abstract
BACKGROUND: Previous research has found that when the location of a talker was varied and an auditory prompt indicated the location of the talker, the addition of visual information produced a significant and large improvement in speech understanding for listeners with bilateral cochlear implants (CIs) but not with a unilateral CI. Presumably, the sound-source localization ability of the bilateral CI listeners allowed them to orient to the auditory prompt and benefit from visual information for the subsequent target sentence. PURPOSE: The goal of this project was to assess the robustness of previous research by using a different test environment, a different CI, different test material, and a different response measure. RESEARCH DESIGN: Nine listeners fit with bilateral CIs were tested in a simulation of a crowded restaurant. Auditory-visual (AV) sentence material was presented from loudspeakers and video monitors at 0, +90, and -90 degrees. Each trial started with the presentation of an auditory alerting phrase from one of the three target loudspeakers, followed by an AV target sentence from that loudspeaker/monitor. On each trial, the two nontarget monitors showed the speaker mouthing a different sentence. Sentences were presented in noise in four test conditions: one CI, one CI plus vision, bilateral CIs, and bilateral CIs plus vision. RESULTS: Mean percent words correct for the four test conditions were: one CI, 43%; bilateral CI, 60%; one CI plus vision, 52%; and bilateral CI plus vision, 84%. Visual information did not significantly improve performance in the single CI conditions but did improve performance in the bilateral CI conditions. The magnitude of improvement for two CIs versus one CI in the AV condition was approximately twice that for two CIs versus one CI in the auditory condition. CONCLUSIONS: Our results are consistent with previous data showing the large value of bilateral implants in a complex AV listening environment. The results indicate that the value of bilateral CIs for speech understanding is significantly underestimated in standard, auditory-only, single-speaker test environments.

Affiliation(s)
- Michael F Dorman: Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona
- Sarah Natale: Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona
- Alissa Knickerbocker: Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona

16.

Abstract
It is widely accepted that seeing a talker improves a listener's ability to understand what a talker is saying in background noise (e.g., Erber, 1969; Sumby & Pollack, 1954). The literature is mixed, however, regarding the influence of the visual modality on the listening effort required to recognize speech (e.g., Fraser, Gagné, Alepins, & Dubois, 2010; Sommers & Phelps, 2016). Here, we present data showing that even when the visual modality robustly benefits recognition, processing audiovisual speech can still result in greater cognitive load than processing speech in the auditory modality alone. We show using a dual-task paradigm that the costs associated with audiovisual speech processing are more pronounced in easy listening conditions, in which speech can be recognized at high rates in the auditory modality alone; indeed, effort did not differ between audiovisual and audio-only conditions when the background noise was presented at a more difficult level. Further, we show that though these effects replicate with different stimuli and participants, they do not emerge when effort is assessed with a recall paradigm rather than a dual-task paradigm. Together, these results suggest that the widely cited audiovisual recognition benefit may come at a cost under more favorable listening conditions, and add to the growing body of research suggesting that various measures of effort may not be tapping into the same underlying construct (Strand et al., 2018).

17. Fu Z, Wu X, Chen J. Congruent audiovisual speech enhances auditory attention decoding with EEG. J Neural Eng 2019; 16:066033. PMID: 31505476; DOI: 10.1088/1741-2552/ab4340
Abstract
OBJECTIVE: The auditory attention decoding (AAD) approach can be used to determine the identity of the attended speaker during an auditory selective attention task by analyzing measurements of electroencephalography (EEG) data. The AAD approach has the potential to guide the design of speech enhancement algorithms in hearing aids, i.e., to identify the speech stream of the listener's interest so that hearing aid algorithms can amplify the target speech and attenuate other distracting sounds, resulting in improved speech understanding and communication and reduced cognitive load. The present work aimed to investigate whether additional visual input (i.e., lipreading) would enhance AAD performance for normal-hearing listeners. APPROACH: In a two-talker scenario, where auditory stimuli of audiobooks narrated by two speakers were presented, multi-channel EEG signals were recorded while participants selectively attended to one speaker and ignored the other. Speakers' mouth movements were recorded during narration to provide visual stimuli. Stimulus conditions included audio-only, visual input congruent with either (i.e., attended or unattended) speaker, and visual input incongruent with either speaker. The AAD approach was performed separately for each condition to evaluate the effect of additional visual input on AAD. MAIN RESULTS: Relative to the audio-only condition, AAD performance improved with visual input only when it was congruent with the attended speech stream, and the improvement was about 14 percentage points in decoding accuracy. Cortical envelope-tracking activities in both auditory and visual cortex were stronger for the congruent audiovisual speech condition than for the other conditions. In addition, higher AAD robustness was revealed for the congruent audiovisual condition, with reduced channel number and trial duration achieving higher accuracy than the audio-only condition. SIGNIFICANCE: The present work complements previous studies and further demonstrates the feasibility of AAD-guided design of hearing aids for daily face-to-face conversations. It also offers guidance for designing a low-density EEG setup for the AAD approach.
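In the stimulus-reconstruction variant of AAD, a linear decoder maps time-lagged EEG to an estimate of the speech envelope, and the attended talker is taken to be the one whose envelope correlates best with that estimate. A minimal ridge-regression sketch follows; the lag count, regularization strength, and array shapes are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of each EEG channel: (T, C) -> (T, C * n_lags)."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        X[k:, k * C:(k + 1) * C] = eeg[:T - k]
    return X

def train_decoder(eeg, attended_env, n_lags=32, ridge=1e3):
    """Ridge regression from lagged EEG to the attended speech envelope."""
    X = lag_matrix(eeg, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ attended_env)

def decode_attention(eeg, env_a, env_b, w, n_lags=32):
    """Return 0 if talker A's envelope matches the reconstruction better, else 1."""
    recon = lag_matrix(eeg, n_lags) @ w
    r = [np.corrcoef(recon, e)[0, 1] for e in (env_a, env_b)]
    return int(r[1] > r[0])
```

Decoding accuracy is then the proportion of held-out trials on which the returned label matches the instructed talker; the congruent-visual benefit reported above corresponds to that proportion rising by about 14 percentage points.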

Affiliation(s)
- Zhen Fu: Department of Machine Intelligence, Speech and Hearing Research Center, and Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, People's Republic of China

18. Lalonde K, Werner LA. Infants and Adults Use Visual Cues to Improve Detection and Discrimination of Speech in Noise. J Speech Lang Hear Res 2019; 62:3860-3875. PMID: 31618097; PMCID: PMC7201336; DOI: 10.1044/2019_jslhr-h-19-0106
Abstract
Purpose: This study assessed the extent to which 6- to 8.5-month-old infants and 18- to 30-year-old adults detect and discriminate auditory syllables in noise better in the presence of visual speech than in auditory-only conditions. In addition, we examined whether visual cues to the onset and offset of the auditory signal account for this benefit. Method: Sixty infants and 24 adults were randomly assigned to speech detection or discrimination tasks and were tested using a modified observer-based psychoacoustic procedure. Each participant completed 1-3 conditions: auditory-only, with visual speech, and with a visual signal that only cued the onset and offset of the auditory syllable. Results: Mixed linear modeling indicated that infants and adults benefited from visual speech on both tasks. Adults relied on the onset-offset cue for detection, but the same cue did not improve their discrimination. The onset-offset cue benefited infants for both detection and discrimination. Whereas the onset-offset cue improved detection similarly for infants and adults, the full visual speech signal benefited infants to a lesser extent than adults on the discrimination task. Conclusions: These results suggest that infants' use of visual onset-offset cues is mature, but their ability to use more complex visual speech cues is still developing. Additional research is needed to explore differences in audiovisual enhancement (a) of speech discrimination across speech targets and (b) with increasingly complex tasks and stimuli.

Affiliation(s)
- Kaylah Lalonde: Department of Speech & Hearing Sciences, University of Washington, Seattle
- Lynne A. Werner: Department of Speech & Hearing Sciences, University of Washington, Seattle

19. Myers BR, Lense MD, Gordon RL. Pushing the Envelope: Developments in Neural Entrainment to Speech and the Biological Underpinnings of Prosody Perception. Brain Sci 2019; 9(3):70. PMID: 30909454; PMCID: PMC6468669; DOI: 10.3390/brainsci9030070
Abstract
Prosodic cues in speech are indispensable for comprehending a speaker’s message, recognizing emphasis and emotion, parsing segmental units, and disambiguating syntactic structures. While it is commonly accepted that prosody provides a fundamental service to higher-level features of speech, the neural underpinnings of prosody processing are not clearly defined in the cognitive neuroscience literature. Many recent electrophysiological studies have examined speech comprehension by measuring neural entrainment to the speech amplitude envelope, using a variety of methods including phase-locking algorithms and stimulus reconstruction. Here we review recent evidence for neural tracking of the speech envelope and demonstrate the importance of prosodic contributions to the neural tracking of speech. Prosodic cues may offer a foundation for supporting neural synchronization to the speech envelope, which scaffolds linguistic processing. We argue that prosody has an inherent role in speech perception, and future research should fill the gap in our knowledge of how prosody contributes to speech envelope entrainment.

Affiliation(s)
- Brett R Myers: Department of Otolaryngology, Vanderbilt University Medical Center, 1215 21st Ave S, Nashville, TN 37232, USA; Department of Psychology and Human Development, Peabody College, 230 Appleton Place, Nashville, TN 37203, USA
- Miriam D Lense: Department of Otolaryngology, Vanderbilt University Medical Center, 1215 21st Ave S, Nashville, TN 37232, USA; Vanderbilt Kennedy Center, 110 Magnolia Circle, Nashville, TN 37203, USA; Vanderbilt Brain Institute, Vanderbilt University, 2215 Garland Ave, Nashville, TN 37232, USA; The Curb Center for Art, Enterprise, and Public Policy, Vanderbilt University, 1801 Edgehill Avenue, Nashville, TN 37212, USA
- Reyna L Gordon: Department of Otolaryngology, Vanderbilt University Medical Center, 1215 21st Ave S, Nashville, TN 37232, USA; Vanderbilt Brain Institute, Vanderbilt University, 2215 Garland Ave, Nashville, TN 37232, USA; The Curb Center for Art, Enterprise, and Public Policy, Vanderbilt University, 1801 Edgehill Avenue, Nashville, TN 37212, USA; Department of Psychology, Vanderbilt University, 2301 Vanderbilt Place, Nashville, TN 37240, USA

20. Shahin AJ, Shen S, Kerlin JR. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech. Lang Cogn Neurosci 2017; 32:1102-1118. PMID: 28966930; PMCID: PMC5617130; DOI: 10.1080/23273798.2017.1283428
Abstract
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that only varied in the temporal order of sensory modality, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4-, 8-, 16-, and 32-channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration promoting the fusion of temporally distant AV percepts.

Affiliation(s)
- Antoine J Shahin: Center for Mind and Brain, University of California, Davis, CA 95618
- Stanley Shen: Center for Mind and Brain, University of California, Davis, CA 95618
- Jess R Kerlin: Center for Mind and Brain, University of California, Davis, CA 95618

21. Havy M, Foroud A, Fais L, Werker JF. The Role of Auditory and Visual Speech in Word Learning at 18 Months and in Adulthood. Child Dev 2017; 88:2043-2059. PMID: 28124795; DOI: 10.1111/cdev.12715
Abstract
Visual information influences speech perception in both infants and adults. It is still unknown whether lexical representations are multisensory. To address this question, we exposed 18-month-old infants (n = 32) and adults (n = 32) to new word-object pairings: Participants either heard the acoustic form of the words or saw the talking face in silence. They were then tested on recognition in the same or the other modality. Both 18-month-old infants and adults learned the lexical mappings when the words were presented auditorily and recognized the mapping at test when the word was presented in either modality, but only adults learned new words in a visual-only presentation. These results suggest developmental changes in the sensory format of lexical representations.

Affiliation(s)
- Mélanie Havy: University of British Columbia; Université de Genève

22.

Abstract
Talkers automatically imitate aspects of perceived speech, a phenomenon known as phonetic convergence. Talkers have previously been found to converge to auditory and visual speech information. Furthermore, talkers converge more to the speech of a conversational partner who is seen and heard, relative to one who is just heard (Dias & Rosenblum, Perception, 40, 1457-1466, 2011). A question raised by this finding is what visual information facilitates the enhancement effect. In the following experiments, we investigated the possible contributions of visible speech articulation to visual enhancement of phonetic convergence within the noninteractive context of a shadowing task. In Experiment 1, we examined the influence of the visibility of a talker on phonetic convergence when shadowing auditory speech either in the clear or in low-level auditory noise. The results suggest that visual speech can compensate for convergence that is reduced by auditory noise masking. Experiment 2 further established the visibility of articulatory mouth movements as being important to the visual enhancement of phonetic convergence. Furthermore, the word frequency and phonological neighborhood density characteristics of the words shadowed were found to significantly predict phonetic convergence in both experiments. Consistent with previous findings (e.g., Goldinger, Psychological Review, 105, 251-279, 1998), phonetic convergence was greater when shadowing low-frequency words. Convergence was also found to be greater for low-density words, contrasting with previous predictions of the effect of phonological neighborhood density on auditory phonetic convergence (e.g., Pardo, Jordan, Mallari, Scanlon, & Lewandowski, Journal of Memory and Language, 69, 183-195, 2013). Implications of the results for a gestural account of phonetic convergence are discussed.
Collapse
|
23
|
Moradi S, Lidestam B, Rönnberg J. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli. Trends Hear 2016; 20:2331216516653355. [PMID: 27317667 PMCID: PMC5562342 DOI: 10.1177/2331216516653355] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context.
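The isolation-point measure can be made concrete with a short sketch: under one common scoring convention, the IP is the earliest gate at which the listener's response is correct and remains correct through all longer gates. The function below is an illustrative reconstruction under that assumption, not the authors' scoring code.

```python
# Illustrative isolation-point (IP) scoring for a gating task.
# responses: list of (gate_duration_ms, response) pairs, ordered by gate
# duration. The IP is the shortest gate at which the response is correct
# and remains correct for all longer gates (one common convention).
def isolation_point(responses, target):
    ip = None
    for duration, response in responses:
        if response == target:
            if ip is None:
                ip = duration          # candidate IP: first correct gate
        else:
            ip = None                  # a later error resets the candidate
    return ip                          # None if never stably identified

trials = [(50, "sun"), (100, "sock"), (150, "sum"), (200, "sun"), (250, "sun")]
print(isolation_point(trials, "sun"))  # -> 200 (ms)
```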
Collapse
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Sweden
| | - Björn Lidestam
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
| | - Jerker Rönnberg
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Sweden
| |
Collapse
|
24
|
Tye-Murray N, Spehar B, Myerson J, Hale S, Sommers M. Lipreading and audiovisual speech recognition across the adult lifespan: Implications for audiovisual integration. Psychol Aging 2016; 31:380-9. [PMID: 27294718 PMCID: PMC4910521 DOI: 10.1037/pag0000094] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
In this study of visual (V-only) and audiovisual (AV) speech recognition in adults aged 22-92 years, the rate of age-related decrease in V-only performance was more than twice that in AV performance. Both auditory-only (A-only) and V-only performance were significant predictors of AV speech recognition, but age did not account for additional (unique) variance. Blurring the visual speech signal decreased speech recognition, and in AV conditions involving stimuli associated with equivalent unimodal performance for each participant, speech recognition remained constant from 22 to 92 years of age. Finally, principal components analysis revealed separate visual and auditory factors, but no evidence of an AV integration factor. Taken together, these results suggest that the benefit that comes from being able to see as well as hear a talker remains constant throughout adulthood and that changes in this AV advantage are entirely driven by age-related changes in unimodal visual and auditory speech recognition.
Collapse
Affiliation(s)
| | - Brent Spehar
- Washington University in St Louis School of Medicine
| | | | | | | |
Collapse
|
25
|
Kaganovich N, Schumaker J, Rowland C. Matching heard and seen speech: An ERP study of audiovisual word recognition. BRAIN AND LANGUAGE 2016; 157-158:14-24. [PMID: 27155219 PMCID: PMC4915735 DOI: 10.1016/j.bandl.2016.04.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/27/2015] [Revised: 03/23/2016] [Accepted: 04/10/2016] [Indexed: 06/05/2023]
Abstract
Seeing articulatory gestures while listening to speech-in-noise (SIN) significantly improves speech understanding. However, the degree of this improvement varies greatly among individuals. We examined the relationship between two distinct stages of visual articulatory processing and SIN accuracy by combining a cross-modal repetition priming task with ERP recordings. Participants first heard a word referring to a common object (e.g., pumpkin) and then decided whether the subsequently presented visual silent articulation matched the word they had just heard. Incongruent articulations elicited a significantly enhanced N400, indicative of a mismatch detection at the pre-lexical level. Congruent articulations elicited a significantly larger LPC, indexing articulatory word recognition. Only the N400 difference between incongruent and congruent trials was significantly correlated with individuals' SIN accuracy improvement in the presence of the talker's face.
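The reported brain-behavior analysis can be sketched in outline: average each participant's ERP within an N400 window for congruent and incongruent trials, take the difference, and correlate it with the SIN benefit. The 300-500 ms window, the single-channel simplification, and the simulated arrays below are assumptions for illustration, not the study's parameters.

```python
# Sketch of relating an ERP congruency effect to a behavioral benefit.
# Assumed shapes: ERP arrays are (participants, timepoints) at 1-ms sampling,
# epoch starting at stimulus onset; the 300-500 ms window is illustrative.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_subj, n_time = 24, 800
erp_congruent = rng.standard_normal((n_subj, n_time))
erp_incongruent = rng.standard_normal((n_subj, n_time)) - 0.3  # more negative
sin_benefit = rng.standard_normal(n_subj)  # per-subject accuracy gain with face

window = slice(300, 500)  # assumed N400 window, in ms
n400_effect = (erp_incongruent[:, window].mean(axis=1)
               - erp_congruent[:, window].mean(axis=1))

r, p = pearsonr(n400_effect, sin_benefit)  # brain-behavior correlation
print(f"r = {r:.2f}, p = {p:.3f}")
```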
Collapse
Affiliation(s)
- Natalya Kaganovich
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States; Department of Psychological Sciences, Purdue University, 703 Third Street, West Lafayette, IN 47907-2038, United States.
| | - Jennifer Schumaker
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
| | - Courtney Rowland
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, IN 47907-2038, United States
| |
Collapse
|
26
|
Giezen MR, Emmorey K. Semantic Integration and Age of Acquisition Effects in Code-Blend Comprehension. JOURNAL OF DEAF STUDIES AND DEAF EDUCATION 2016; 21:213-221. [PMID: 26657077 PMCID: PMC4886315 DOI: 10.1093/deafed/env056] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/18/2015] [Revised: 11/09/2015] [Accepted: 11/09/2015] [Indexed: 06/05/2023]
Abstract
Semantic and lexical decision tasks were used to investigate the mechanisms underlying code-blend facilitation: the finding that hearing bimodal bilinguals comprehend signs in American Sign Language (ASL) and spoken English words more quickly when they are presented together simultaneously than when each is presented alone. More robust facilitation effects were observed for semantic decision than for lexical decision, suggesting that lexical integration of signs and words within a code-blend occurs primarily at the semantic level, rather than at the level of form. Early bilinguals exhibited greater facilitation effects than late bilinguals for English (the dominant language) in the semantic decision task, possibly because early bilinguals are better able to process early visual cues from ASL signs and use these to constrain English word recognition. Comprehension facilitation via semantic integration of words and signs is consistent with co-speech gesture research demonstrating facilitative effects of gesture integration on language comprehension.
Collapse
|
27
|
Prediction and constraint in audiovisual speech perception. Cortex 2015; 68:169-81. [PMID: 25890390 DOI: 10.1016/j.cortex.2015.03.006] [Citation(s) in RCA: 127] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2014] [Revised: 01/28/2015] [Accepted: 03/08/2015] [Indexed: 11/23/2022]
Abstract
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms.
Collapse
|
28
|
Lalonde K, Holt RF. Preschoolers benefit from visually salient speech cues. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2015; 58:135-50. [PMID: 25322336 PMCID: PMC4712850 DOI: 10.1044/2014_jslhr-h-13-0343] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/28/2013] [Revised: 06/20/2014] [Accepted: 09/23/2014] [Indexed: 06/04/2023]
Abstract
PURPOSE This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. The authors also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. METHOD Twelve adults and 27 typically developing 3- and 4-year-old children completed 3 audiovisual (AV) speech integration tasks: matching, discrimination, and recognition. The authors compared AV benefit for visually salient and less visually salient speech discrimination contrasts and assessed the visual saliency of consonant confusions in auditory-only and AV word recognition. RESULTS Four-year-olds and adults demonstrated visual influence on all measures. Three-year-olds demonstrated visual influence on speech discrimination and recognition measures. All groups demonstrated greater AV benefit for the visually salient discrimination contrasts. AV recognition benefit in 4-year-olds and adults depended on the visual saliency of speech sounds. CONCLUSIONS Preschoolers can demonstrate AV speech integration. Their AV benefit results from efficient use of visually salient speech cues. Four-year-olds, but not 3-year-olds, used visual phonological knowledge to take advantage of visually salient speech cues, suggesting possible developmental differences in the mechanisms of AV benefit.
Collapse
|
29
|
Phi-square Lexical Competition Database (Phi-Lex): an online tool for quantifying auditory and visual lexical competition. Behav Res Methods 2014; 46:148-58. [PMID: 23754576 DOI: 10.3758/s13428-013-0356-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
A widely agreed-upon feature of spoken word recognition is that multiple lexical candidates in memory are simultaneously activated in parallel when a listener hears a word, and that those candidates compete for recognition (Luce, Goldinger, Auer, & Vitevitch, Perception & Psychophysics 62:615-625, 2000; Luce & Pisoni, Ear and Hearing 19:1-36, 1998; McClelland & Elman, Cognitive Psychology 18:1-86, 1986). Because the presence of those competitors influences word recognition, much research has sought to quantify the processes of lexical competition. Metrics that quantify lexical competition continuously are more effective predictors of auditory and visual (lipread) spoken word recognition than are the categorical metrics traditionally used (Feld & Sommers, Speech Communication 53:220-228, 2011; Strand & Sommers, Journal of the Acoustical Society of America 130:1663-1672, 2011). A limitation of the continuous metrics is that they are somewhat computationally cumbersome and require access to existing speech databases. This article describes the Phi-square Lexical Competition Database (Phi-Lex): an online, searchable database that provides access to multiple metrics of auditory and visual (lipread) lexical competition for English words, available at www.juliastrand.com/phi-lex.
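As background, the phi-square statistic itself is the chi-square statistic normalized by the number of observations. The sketch below applies it to toy phoneme confusion counts; how Phi-Lex aggregates such values into word-level competition metrics is documented with the database and may differ from this illustration.

```python
# Illustration of the phi-square statistic (chi-square / N) applied to
# phoneme confusion counts. How Phi-Lex combines such values into word-level
# competition scores may differ; this shows only the core statistic.
import numpy as np
from scipy.stats import chi2_contingency

# Toy confusion counts from a hypothetical lipreading task:
# rows = stimuli /b/ and /g/, columns = response categories.
table = np.array([
    [90,  5,  5],   # stimulus /b/: visually salient, mostly identified
    [30, 30, 40],   # stimulus /g/: responses spread across categories
])

chi2, p, dof, expected = chi2_contingency(table)
phi_square = chi2 / table.sum()   # phi-square: chi-square normalized by N
print(f"phi-square = {phi_square:.3f}")
```

Near-zero phi-square means the two stimuli elicit nearly identical response distributions (highly confusable); larger values mean more distinct, less confusable patterns.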
Collapse
|
30
|
Moradi S, Lidestam B, Rönnberg J. Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy. Front Psychol 2013; 4:359. [PMID: 23801980 PMCID: PMC3685792 DOI: 10.3389/fpsyg.2013.00359] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2013] [Accepted: 05/31/2013] [Indexed: 11/15/2022] Open
Abstract
This study investigated the degree to which audiovisual presentation (compared to auditory-only presentation) affected isolation point (IPs, the amount of time required for the correct identification of speech stimuli using a gating paradigm) in silence and noise conditions. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli, but presented in an audiovisual instead of an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. In comparison with the previous study by Moradi et al., it can be concluded that the provision of visual cues expedited IPs and increased the accuracy of speech stimuli identification in both silence and noise. The implication of the results is discussed in terms of models for speech understanding.
Collapse
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University Linköping, Sweden
| | | | | |
Collapse
|
31
|
Tye-Murray N, Spehar BP, Myerson J, Hale S, Sommers MS. Reading your own lips: common-coding theory and visual speech perception. Psychon Bull Rev 2013; 20:115-9. [PMID: 23132604 PMCID: PMC3558632 DOI: 10.3758/s13423-012-0328-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Common-coding theory posits that (1) perceiving an action activates the same representations of motor plans that are activated by actually performing that action, and (2) because of individual differences in the ways that actions are performed, observing recordings of one's own previous behavior activates motor plans to an even greater degree than does observing someone else's behavior. We hypothesized that if observing oneself activates motor plans to a greater degree than does observing others, and if these activated plans contribute to perception, then people should be able to lipread silent video clips of their own previous utterances more accurately than they can lipread video clips of other talkers. As predicted, two groups of participants were able to lipread video clips of themselves, recorded more than two weeks earlier, significantly more accurately than video clips of others. These results suggest that visual input activates speech motor activity that links to word representations in the mental lexicon.
Collapse
Affiliation(s)
- Nancy Tye-Murray
- Department of Otolaryngology, Washington University School of Medicine, Campus Box 8115, 660 South Euclid Avenue, St. Louis, MO 63124, USA.
| | | | | | | | | |
Collapse
|
32
|
Strand JF, Sommers MS. Sizing up the competition: quantifying the influence of the mental lexicon on auditory and visual spoken word recognition. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:1663-72. [PMID: 21895103 PMCID: PMC3188976 DOI: 10.1121/1.3613930] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/28/2010] [Revised: 06/27/2011] [Accepted: 06/28/2011] [Indexed: 05/22/2023]
Abstract
Much research has explored how spoken word recognition is influenced by the architecture and dynamics of the mental lexicon (e.g., Luce and Pisoni, 1998; McClelland and Elman, 1986). A more recent question is whether the processes underlying word recognition are unique to the auditory domain, or whether visually perceived (lipread) speech may also be sensitive to the structure of the mental lexicon (Auer, 2002; Mattys, Bernstein, and Auer, 2002). The current research was designed to test the hypothesis that both aurally and visually perceived spoken words are isolated in the mental lexicon as a function of their modality-specific perceptual similarity to other words. Lexical competition (the extent to which perceptually similar words influence recognition of a stimulus word) was quantified using metrics that are well-established in the literature, as well as a statistical method for calculating perceptual confusability based on the phi-square statistic. Both auditory and visual spoken word recognition were influenced by modality-specific lexical competition as well as stimulus word frequency. These findings extend the scope of activation-competition models of spoken word recognition and reinforce the hypothesis (Auer, 2002; Mattys et al., 2002) that perceptual and cognitive properties underlying spoken word recognition are not specific to the auditory domain. In addition, the results support the use of the phi-square statistic as a better predictor of lexical competition than metrics currently used in models of spoken word recognition.
Collapse
Affiliation(s)
- Julia F Strand
- Washington University in Saint Louis, Department of Psychology, Campus Box 1125, Saint Louis, Missouri 63108, USA.
| | | |
Collapse
|
33
|
Holt RF, Kirk KI, Hay-McCutcheon M. Assessing multimodal spoken word-in-sentence recognition in children with normal hearing and children with cochlear implants. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2011; 54:632-657. [PMID: 20689028 PMCID: PMC3056932 DOI: 10.1044/1092-4388(2010/09-0148)] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
PURPOSE To examine multimodal spoken word-in-sentence recognition in children. METHOD Two experiments were undertaken. In Experiment 1, the youngest age with which the multimodal sentence recognition materials could be used was evaluated. In Experiment 2, lexical difficulty and presentation modality effects were examined, along with test-retest reliability and validity in normal-hearing children and those with cochlear implants. RESULTS Normal-hearing children as young as 3.25 years, and children with cochlear implants just under 4 years of age who had used their device for at least 1 year, were able to complete the multimodal sentence testing. Both groups identified lexically easy words in sentences more accurately than lexically hard words across modalities, although the largest effects occurred in the auditory-only modality. Both groups displayed audiovisual integration with the highest scores achieved in the audiovisual modality, followed sequentially by auditory-only and visual-only modalities. Recognition of words in sentences was correlated with recognition of words in isolation. Preliminary results suggest fair-to-good test-retest reliability. CONCLUSIONS The results suggest that children's audiovisual word-in-sentence recognition can be assessed using the materials developed for this investigation. With further development, the materials hold promise for becoming a test of multimodal sentence recognition for children with hearing loss.
Collapse
|
34
|
Feld J, Sommers M. There Goes the Neighborhood: Lipreading and the Structure of the Mental Lexicon. SPEECH COMMUNICATION 2011; 53:220-228. [PMID: 21170172 PMCID: PMC3002260 DOI: 10.1016/j.specom.2010.09.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
A central question in spoken word recognition research is whether words are recognized relationally, in the context of other words in the mental lexicon [1, 2]. The current research evaluated metrics for measuring the influence of the mental lexicon on visually perceived (lipread) spoken word recognition. Lexical competition (the extent to which perceptually similar words influence recognition of a stimulus word) was quantified using metrics that are well-established in the literature, as well as a novel statistical method for calculating perceptual confusability, based on the Phi-square statistic. The Phi-square statistic proved an effective measure for assessing lexical competition and explained significant variance in visual spoken word recognition beyond that accounted for by traditional metrics. Because these values include the influence of all words in the lexicon (rather than only perceptually very similar words), it suggests that even perceptually distant words may receive some activation, and therefore provide competition, during spoken word recognition. This work supports and extends earlier research [3] that proposed a common recognition system underlying auditory and visual spoken word recognition and provides support for the use of the Phi-square statistic for quantifying lexical competition.
Collapse
|
35
|
Tye-Murray N, Sommers M, Spehar B, Myerson J, Hale S. Aging, audiovisual integration, and the principle of inverse effectiveness. Ear Hear 2010; 31:636-44. [PMID: 20473178 PMCID: PMC2924437 DOI: 10.1097/aud.0b013e3181ddf7ff] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVE The purpose of this investigation was to compare the ability of young and older adults to integrate auditory and visual sentence materials under conditions of good and poor signal clarity. The principle of inverse effectiveness (PoIE), which characterizes many neuronal and behavioral phenomena related to multisensory integration, asserts that as unimodal performance declines, integration is enhanced. Thus, the PoIE predicts that both young and older adults will show enhanced integration of auditory and visual speech stimuli when these stimuli are degraded. More importantly, because older adults' unimodal speech recognition skills decline in both the auditory and visual domains, the PoIE predicts that older adults will show enhanced integration during audiovisual speech recognition relative to younger adults. This study provides a test of these predictions. DESIGN Fifty-three young and 53 older adults with normal hearing completed the closed-set Build-A-Sentence test and the CUNY Sentence test in a total of eight conditions; four unimodal and four audiovisual. In the unimodal conditions, stimuli were either auditory or visual and either easier or harder to perceive; the audiovisual conditions were formed from all the combinations of the unimodal signals. The hard visual signals were created by degrading video contrast, and the hard auditory signals were created by decreasing the signal to noise ratio. Scores from the unimodal and bimodal conditions were used to compute auditory enhancement and integration enhancement measures. RESULTS Contrary to the PoIE, neither the auditory enhancement nor the integration enhancement measure increased when signal clarity in the auditory or visual channel of audiovisual speech stimuli was decreased, nor was either measure higher for older adults than for young adults. In audiovisual conditions with easy visual stimuli, the integration enhancement measure for older adults was equivalent to that for young adults. However, in conditions with hard visual stimuli, integration enhancement for older adults was significantly lower than that for young adults. CONCLUSIONS The present findings do not support extension of the PoIE to audiovisual speech recognition. Our results are not consistent with either the prediction that integration would be enhanced under conditions of poor signal clarity or the prediction that older adults would show enhanced integration, relative to young adults. Although there is considerable controversy regarding the best way to measure audiovisual integration, the fact that two of the most prominent measures, auditory enhancement and integration enhancement, both yielded results inconsistent with the PoIE strongly suggests that the integration of audiovisual speech stimuli differs in some fundamental way from the integration of other bimodal stimuli. The results also suggest that aging does not impair integration enhancement when the visual speech signal has good clarity, but may affect it when the visual speech signal has poor clarity.
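For concreteness, enhancement measures of this kind are typically built from unimodal and bimodal proportion-correct scores. The sketch below uses the classic normalized-gain form (gain relative to the available headroom above unimodal performance); the paper's exact definitions of auditory enhancement and integration enhancement may differ, so treat this as a hedged illustration.

```python
# Hedged sketch of enhancement measures built from proportion-correct scores.
# The normalized-gain form (gain over available headroom) is a classic
# convention; the paper's precise measures may be defined differently.
def auditory_enhancement(av: float, v: float) -> float:
    """Gain from adding audition to vision, normalized by headroom above V."""
    return (av - v) / (1.0 - v) if v < 1.0 else 0.0

def visual_enhancement(av: float, a: float) -> float:
    """Gain from adding vision to audition, normalized by headroom above A."""
    return (av - a) / (1.0 - a) if a < 1.0 else 0.0

# Example: hard auditory signal (A = .40), easy visual signal (V = .25), AV = .70
print(visual_enhancement(av=0.70, a=0.40))    # -> 0.5
print(auditory_enhancement(av=0.70, v=0.25))  # -> 0.6
```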
Collapse
Affiliation(s)
- Nancy Tye-Murray
- Department of Otolaryngology, Washington University School of Medicine, St. Louis, Missouri 63124, USA.
| | | | | | | | | |
Collapse
|
36
|
Abstract
BACKGROUND The visual speech signal can provide sufficient information to support successful communication. However, individual differences in the ability to appreciate that information are large, and relatively little is known about their sources. PURPOSE Here a body of research is reviewed regarding the development of a theoretical framework in which to study speechreading and individual differences in that ability. Based on the hypothesis that visual speech is processed via the same perceptual-cognitive machinery as auditory speech, a theoretical framework was developed by adapting a theoretical framework originally developed for auditory spoken word recognition. CONCLUSION The evidence to date is consistent with the conclusion that visual spoken word recognition is achieved via a process similar to auditory word recognition provided differences in perceptual similarity are taken into account. Words perceptually similar to many other words and that occur infrequently in the input stream are at a distinct disadvantage within this process. The results to date are also consistent with the conclusion that deaf individuals, regardless of speechreading ability, recognize spoken words via a process similar to individuals with hearing.
Collapse
Affiliation(s)
- Edward T Auer
- Department of Speech-Language-Hearing, University of Kansas, USA.
| |
Collapse
|
37
|
Abstract
Deafblind readers are heterogeneous in reading skill acquisition. This qualitative study uses in-depth interviews and protocol analyses, asking the three deafblind adult participants to describe the metacomprehension, metacognitive, and metalinguistic strategies they use when reading different types of text. Using retrospective analysis, the three adults describe and reflect on how they learned language and how they learned to read as children. The participants also describe the technology that assists them in reading print. Data suggest that deafblind adults use a variety of auditory, visual, and tactile-kinesthetic strategies (i.e., braille, large print, and raised print) in decoding English. Some also use ASL, Signed English, tactile ASL, and tactile Signed English.
Collapse
|
38
|
Abstract
Spoken word recognition is thought to be achieved via competition in the mental lexicon between perceptually similar word forms. A review of the development and initial behavioral validations of computational models of visual spoken word recognition is presented, followed by a report of new empirical evidence. Specifically, a replication and extension of Mattys, Bernstein & Auer's (2002) study was conducted with 20 deaf participants who varied widely in speechreading ability. Participants visually identified isolated spoken words. Accuracy of visual spoken word recognition was influenced by the number of visually similar words in the lexicon and by the frequency of occurrence of the stimulus words. The results are consistent with the common view held within auditory word recognition that this task is accomplished via a process of activation and competition in which frequently occurring units are favored. Finally, future directions for visual spoken word recognition are discussed.
Collapse
Affiliation(s)
- Edward T Auer
- Department of Speech-Language-Hearing, University of Kansas, Lawrence, KS 66045, USA.
| |
Collapse
|