1
So C, Jung K. Approachability and Credibility of Virtual Character Faces: The Role of the Horizontal Viewing Angle. Hum Factors 2024; 66:1450-1474. [PMID: 36840518] [DOI: 10.1177/00187208231153492]
Abstract
OBJECTIVE: The present work explores how the horizontal viewing angle of a virtual character's face influences perceptions of credibility and approachability.
BACKGROUND: When encountering virtual characters, people rely on both credibility and approachability judgments to form a first impression of the depicted character. Prior research shows that certain perceptions are preferred for either frontal or tilted faces, but has not examined how approachability and credibility judgments relate to horizontal viewing angles at finer granularity between 0° and 45°.
METHOD: Fifty-two participants performed a two-alternative forced-choice (2AFC) task, rating 240 pairwise comparisons of 20 virtual character faces shown at four horizontal viewing angles (0°, 15°, 30°, and 45°) on approachability and credibility. They also completed individual-difference scales based on the BIS-BAS framework (behavioral inhibition system, drive, and reward responsiveness), self-esteem, and personality traits (neuroticism, loneliness).
RESULTS: Both approachability and credibility were negatively related to the horizontal viewing angle, but this negative relationship was less pronounced for approachability. Notably, 15° tilted faces were judged more approachable than frontal faces by people scoring high in reward responsiveness, drive, and self-esteem, and low in neuroticism and loneliness.
CONCLUSION: Our findings highlight the conditions under which showing a virtual character's face at a 15° horizontal tilt is preferred over a frontal position.
APPLICATION: The differential impact of the horizontal viewing angle on approachability and credibility should be considered when displaying virtual character faces.
Affiliation(s)
- Chaehan So
- Virtual Friend, Los Angeles, California, and Department of Information and Interaction Design, Yonsei University, Seoul, Korea
- Kyuha Jung
- Department of Communication, Seoul National University, Seoul, Korea
2
Datta Choudhary Z, Bruder G, Welch GF. Visual Facial Enhancements Can Significantly Improve Speech Perception in the Presence of Noise. IEEE Trans Vis Comput Graph 2023; 29:4751-4760. [PMID: 37782611] [DOI: 10.1109/tvcg.2023.3320247]
Abstract
Human speech perception is generally optimal in quiet environments; however, it becomes more difficult and error-prone in the presence of noise, such as other humans speaking nearby or ambient noise. In such situations, human speech perception is improved by speech reading, i.e., watching the movements of a speaker's mouth and face, either consciously, as done by people with hearing loss, or subconsciously by other humans. While previous work focused largely on speech perception of two-dimensional videos of faces, there is a gap in the research field concerning facial features as seen in head-mounted displays, including the impact of display resolution and the effectiveness of visually enhancing a virtual human face on speech perception in the presence of noise. In this paper, we present a comparative user study (N = 21) in which we investigated an audio-only condition compared to two levels of head-mounted display resolution (1832×1920 or 916×960 pixels per eye) and two levels of the appearance of a virtual human (native or visually enhanced), the latter consisting of an up-scaled facial representation and simulated lipstick (lip coloring) added to increase contrast. To understand effects on speech perception in noise, we measured participants' speech reception thresholds (SRTs) for each audio-visual stimulus condition. These thresholds indicate the decibel level of the speech signal that is necessary for a listener to receive the speech correctly 50% of the time. First, we show that display resolution significantly affected participants' ability to perceive the speech signal in noise, which has practical implications for the field, especially in social virtual environments. Second, we show that our visual enhancement method was able to compensate for limited display resolution and was generally preferred by participants. Specifically, our participants indicated that they benefited more from the head scaling than from the added facial contrast of the simulated lipstick. We discuss relationships, implications, and guidelines for applications that aim to leverage such enhancements.
3
Abstract
A perceiver's ability to accurately predict target sounds in a forward-gated AV speech task indexes the strength and scope of anticipatory coarticulation in adult speech (Redford et al., JASA, 144, 2447-2461, 2018). This suggests a perception-based method for studying coarticulation in populations who may poorly tolerate the more invasive or restrictive techniques used to measure speech movements directly. But using perception to measure production raises the question of confounding influences on perceiver performance, and thus on the reliability and generalizability of the proposed method. The present study was therefore designed to test whether a gated AV speech method for measuring coarticulation provides reliable results across different study populations (child versus adult), different task environments (in-lab versus online), and different coarticulatory directions (forward/anticipatory versus backward/carryover). The results indicated excellent measurement reliability across age groups in the forward/anticipatory measurement direction, though more perceivers are needed to achieve the same levels of agreement and consistency when the task is completed online. Accuracy was lower in the backward/carryover direction, and although agreement and consistency were still reasonably high across perceivers, the effect of age group differed between the laboratory and online environments, suggesting measurement error in one or both environments. Overall, the results support using in-lab or online perceptual judgments to measure anticipatory coarticulation in developmental studies of speech production. Further validation is needed before the method can be extended to measure carryover coarticulation.
4
Redford MA, Kallay JE, Bogdanov SV, Vatikiotis-Bateson E. Leveraging audiovisual speech perception to measure anticipatory coarticulation. J Acoust Soc Am 2018; 144:2447-2461. [PMID: 30404498] [PMCID: PMC6205840] [DOI: 10.1121/1.5064783]
Abstract
A noninvasive method for accurately measuring anticipatory coarticulation at experimentally defined temporal locations is introduced. The method leverages work in audiovisual (AV) speech perception to provide a synthetic and robust measure that can be used to inform psycholinguistic theory. In this validation study, speakers were audio-video recorded while producing simple subject-verb-object sentences with contrasting object noun rhymes. The coarticulatory resistance of target noun onsets was manipulated, as was the metrical context for the determiner that modified the noun. Individual sentences were then gated from the verb to sentence end at segmental landmarks. These stimuli were presented to perceivers who were tasked with guessing the sentence-final rhyme. An audio-only condition was included to estimate the contribution of visual information to perceivers' performance. Perceivers accurately identified rhymes earlier in the AV condition than in the audio-only condition (i.e., at determiner onset vs determiner vowel). Effects of coarticulatory resistance and metrical context were similar across conditions and consistent with previous work on coarticulation. These findings were further validated with acoustic measurement of the determiner vowel and a cumulative video-based measure of perioral movement. Overall, gated AV speech perception can be used to test specific hypotheses regarding coarticulatory scope and strength in running speech.
Affiliation(s)
- Melissa A Redford
- Department of Linguistics, University of Oregon, Eugene, Oregon 97403, USA
- Jeffrey E Kallay
- Department of Linguistics, University of Oregon, Eugene, Oregon 97403, USA
- Sergei V Bogdanov
- Department of Linguistics, University of Oregon, Eugene, Oregon 97403, USA
- Eric Vatikiotis-Bateson
- Department of Linguistics, University of British Columbia, Vancouver, British Columbia, Canada
5
Alsius A, Paré M, Munhall KG. Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited. Multisens Res 2018; 31:111-144. [PMID: 31264597] [DOI: 10.1163/22134808-00002565]
Abstract
Since its discovery 40 years ago, the McGurk illusion has usually been cited as a paradigmatic case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both the phenomenological and neural levels. This calls into question the suitability of the illusion as a tool for quantifying the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be especially cautious when generalizing data generated with McGurk stimuli to matching audiovisual speech events.
Affiliation(s)
- Agnès Alsius
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Martin Paré
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
- Kevin G Munhall
- Psychology Department, Queen's University, Humphrey Hall, 62 Arch St., Kingston, Ontario, K7L 3N6 Canada
6
Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss. Ear Hear 2017; 39:783-794. [PMID: 29252979] [DOI: 10.1097/aud.0000000000000534]
Abstract
OBJECTIVES: Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist speech understanding in adverse acoustics. It was also of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior.
DESIGN: Eighteen children with NH, eight children with MBHL, and ten children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task.
RESULTS: Behavioral task performance was higher for children with NH than for either group of children with hearing loss. There were no differences in performance between children with UHL and children with MBHL. Eye-tracker analysis revealed that children with NH looked more at the screens overall than did children with MBHL or UHL, though individual differences were greater in the groups with hearing loss. Listeners in all groups spent only a small proportion of time looking at relevant screens as talkers spoke. Although looking was distributed across all screens, there was a bias toward the right side of the display. There was no relationship between overall looking behavior and performance on the task.
CONCLUSIONS: The present study examined the processing of audiovisual speech in the context of a naturalistic task. Results demonstrated that children distributed their looking across a variety of sources during the task, but that children with NH were more likely to look at the screens than were those with MBHL/UHL. However, all groups looked at the relevant talkers as they were speaking only a small proportion of the time. Despite variability in looking behavior, listeners were able to follow the audiovisual instructions, and children with NH demonstrated better performance than children with MBHL/UHL. These results suggest that performance on some challenging multi-talker audiovisual tasks does not depend on visual fixation of relevant talkers for children with NH or with MBHL/UHL.
7
McCotter MV, Jordan TR. The Role of Facial Colour and Luminance in Visual and Audiovisual Speech Perception. Perception 2003; 32:921-36. [PMID: 14580139] [DOI: 10.1068/p3316]
Abstract
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In Experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In Experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
Affiliation(s)
- Maxine V McCotter
- School of Psychology, University of Nottingham, University Park, Nottingham NG7 2RD, UK.
8
Jordan TR, Sheen M, Abedipour L, Paterson KB. Visual speech perception in foveal and extrafoveal vision: further implications for divisions in hemispheric projections. PLoS One 2014; 9:e98273. [PMID: 25032950] [PMCID: PMC4102446] [DOI: 10.1371/journal.pone.0098273]
Abstract
It has often been argued that, when observing a talking face, visual speech to the left and right of fixation may produce differences in performance due to divided projections to the two cerebral hemispheres. However, while it seems likely that such a division in hemispheric projections exists for areas away from fixation, the nature and existence of a functional division in visual speech perception at the foveal midline remain to be determined. We investigated this issue by presenting visual speech in matched hemiface displays to the left and right of a central fixation point, either exactly abutting the foveal midline or else located away from the midline in extrafoveal vision. The location of displays relative to the foveal midline was controlled precisely using an automated, gaze-contingent eye-tracking procedure. Visual speech perception showed a clear right hemifield advantage when presented in extrafoveal locations but no hemifield advantage (left or right) when presented abutting the foveal midline. Thus, while visual speech observed in extrafoveal vision appears to benefit from unilateral projections to left-hemisphere processes, no evidence was obtained to indicate that a functional division exists when visual speech is observed around the point of fixation. Implications of these findings for understanding visual speech perception and the nature of functional divisions in hemispheric projection are discussed.
Affiliation(s)
- Lily Abedipour
- School of Psychology, University of Leicester, Leicester, United Kingdom
- Kevin B. Paterson
- School of Psychology, University of Leicester, Leicester, United Kingdom
9
Reed P, McCarthy J. Cross-modal attention-switching is impaired in autism spectrum disorders. J Autism Dev Disord 2012; 42:947-53. [PMID: 21720723] [DOI: 10.1007/s10803-011-1324-8]
Abstract
This investigation aimed to determine whether children with ASD are impaired in their ability to switch attention between different tasks, and whether performance is further impaired when they are required to switch across two separate modalities (visual and auditory). Eighteen children with ASD (9-13 years old) were compared with 18 typically developing children matched with the ASD group for mental age, and with 18 subjects with learning difficulties matched with the ASD group for mental and chronological age. Individuals alternated between two different visual tasks, and between a different visual task and an auditory task. Children with ASD performed worse than both comparison groups on both switching tasks. Moreover, compared with participants matched for mental and chronological age, children with ASD had greater difficulty when the switching task required two modalities than when it required only one.
Affiliation(s)
- Phil Reed
- Department of Psychology, Swansea University, Singleton Park, Swansea SA2 8PP, UK.
10
When half a face is as good as a whole: Effects of simple substantial occlusion on visual and audiovisual speech perception. Atten Percept Psychophys 2011; 73:2270-85. [DOI: 10.3758/s13414-011-0152-4]
11
Jordan TR, Abedipour L. The importance of laughing in your face: influences of visual laughter on auditory laughter perception. Perception 2010; 39:1283-5. [PMID: 21125954] [DOI: 10.1068/p6752]
Abstract
Hearing the sound of laughter is important for social communication, but the processes contributing to the audibility of laughter remain to be determined. Production of laughter resembles production of speech in that both involve visible facial movements accompanying socially significant auditory signals. However, while it is known that speech is more audible when the facial movements producing it can be seen, whether the audibility of laughter is similarly enhanced by visual information has remained unknown. To address this issue, spontaneously occurring laughter was edited to produce stimuli comprising visual laughter, auditory laughter, visual and auditory laughter combined, and no laughter at all (either visual or auditory), all presented in four levels of background noise. Visual laughter and no-laughter stimuli produced very few reports of auditory laughter. However, visual laughter consistently made auditory laughter more audible, compared to the same auditory signal presented without visual laughter, resembling findings reported previously for speech.
Affiliation(s)
- Timothy R Jordan
- College of Medicine, Biological Sciences and Psychology, Henry Wellcome Building, University of Leicester, Leicester LE1 7RH, UK.
12
Similarity structure in visual speech perception and optical phonetic signals. Percept Psychophys 2007; 69:1070-83. [PMID: 18038946] [DOI: 10.3758/bf03193945]
Abstract
A complete understanding of visual phonetic perception (lipreading) requires linking perceptual effects to physical stimulus properties. However, the talking face is a highly complex stimulus, affording innumerable possible physical measurements. In the search for isomorphism between stimulus properties and phonetic effects, second-order isomorphism was examined between the perceptual similarities of video-recorded, perceptually identified speech syllables and the physical similarities among the stimuli. Four talkers produced the stimulus syllables, comprising 23 initial consonants followed by one of three vowels. Six normal-hearing participants identified the syllables in a visual-only condition. Perceptual stimulus dissimilarity was quantified using the Euclidean distances between stimuli in perceptual spaces obtained via multidimensional scaling. Physical stimulus dissimilarity was quantified using face points recorded in three dimensions by an optical motion capture system. The variance accounted for in the relationship between the perceptual and the physical dissimilarities was evaluated using both the raw dissimilarities and the weighted dissimilarities. With weighting and the full set of 3-D optical data, the variance accounted for ranged between 46% and 66% across talkers and between 49% and 64% across vowels. The robust second-order relationship between the sparse 3-D point representation of visible speech and the perceptual effects suggests that the 3-D point representation is a viable basis for controlled studies of first-order relationships between visual phonetic perception and physical stimulus attributes.
13
14
Thomas SM, Jordan TR. Contributions of oral and extraoral facial movement to visual and audiovisual speech perception. J Exp Psychol Hum Percept Perform 2004; 30:873-88. [PMID: 15462626] [DOI: 10.1037/0096-1523.30.5.873]
Abstract
Seeing a talker's face influences auditory speech recognition, but the visible input essential for this influence has yet to be established. Using a new seamless editing technique, the authors examined effects of restricting visible movement to oral or extraoral areas of a talking face. In Experiment 1, visual speech identification and visual influences on identifying auditory speech were compared across displays in which the whole face moved, the oral area moved, or the extraoral area moved. Visual speech influences on auditory speech recognition were substantial and unchanging across whole-face and oral-movement displays. However, extraoral movement also influenced identification of visual and audiovisual speech. Experiments 2 and 3 demonstrated that these results are dependent on intact and upright facial contexts, but only with extraoral movement displays.
Affiliation(s)
- Sharon M Thomas
- MRC Institute of Hearing Research, University Park, Nottingham, England.
15
Thomas SM, Jordan TR. Determining the influence of Gaussian blurring on inversion effects with talking faces. Percept Psychophys 2002; 64:932-44. [PMID: 12269300] [DOI: 10.3758/bf03196797]
Abstract
Perception of visual speech and the influence of visual speech on auditory speech perception are affected by the orientation of a talker's face, but the nature of the visual information underlying this effect has yet to be established. Here, we examine the contributions of visually coarse (configural) and fine (featural) facial movement information to inversion effects in the perception of visual and audiovisual speech. We describe two experiments in which we disrupted perception of fine facial detail by decreasing spatial frequency (blurring) and disrupted perception of coarse configural information by facial inversion. For normal, unblurred talking faces, facial inversion had no influence on visual speech identification or on the effects of congruent or incongruent visual speech movements on perception of auditory speech. However, for blurred faces, facial inversion reduced identification of unimodal visual speech and reduced the effects of visual speech on perception of congruent and incongruent auditory speech. These effects were more pronounced for words whose appearance may be defined by fine featural detail. Implications for the nature of inversion effects in visual and audiovisual speech are discussed.