1
Akcay E, Aydın Ö, Zagvozdkina V, Aycan Z, Caglar E, Oztop DB. Pupillary dilation response to the auditory food words in adolescents with obesity without binge eating disorder. Biol Psychol 2024; 193:108874. [PMID: 39313180] [DOI: 10.1016/j.biopsycho.2024.108874]
Abstract
Childhood obesity is a growing global public health problem. Studies suggest that environmental cues contribute to developing and maintaining obesity. We aimed to evaluate pupillary changes to auditory food words vs. nonfood words and to conduct a dynamic temporal analysis of pupil size changes in adolescents with obesity without binge eating disorder, comparing them with healthy-weight adolescents. In this study, a total of 63 adolescents aged 12-18 years (n = 32, obesity group (OG); n = 31, control group (CG)) were included. In an auditory paradigm, participants were presented with a series of high- and low-calorie food and nonfood words. A binocular remote eye-tracking device was used to measure pupil diameter. Generalized additive mixed models (GAMMs) were used for dynamic temporal analysis of the pupillometry data. The GAMM analysis indicated that the CG had larger pupil dilation than the OG while listening to auditory food words. The CG had larger pupil dilation for food words than for nonfood words, whereas the OG had a similar pupillary response to food and nonfood words. The pupil dilation response to higher-calorie foods extended over the later stages of the time period (after 2000 ms) in the OG. In summary, our findings indicated that individuals with obesity had lower pupil dilation to auditory food words than their normal-weight peers, and that adolescents with obesity showed prolonged pupillary dilation to higher-calorie food words. The individual psychological factors affecting the dynamic changes of pupil responses to food cues in adolescents with obesity should be examined in further studies.
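For readers who want to see what such a dynamic temporal analysis looks like in code, below is a minimal sketch of a GAM of pupil size over trial time with a separate difference smooth per group. It is an illustration only: the data are simulated, pygam is just one Python option (the authors' GAMMs were presumably fit with dedicated mixed-model tooling such as R's mgcv), and by-participant random effects are omitted for brevity.

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
n = 2000
time_ms = rng.uniform(0, 3000, n)            # time since word onset (ms)
group = rng.integers(0, 2, n).astype(float)  # 0 = control, 1 = obesity group (toy coding)
pupil = 0.1 * np.sin(time_ms / 500) * (1 - 0.5 * group) + rng.normal(0, 0.05, n)

X = np.column_stack([time_ms, group])
# Reference smooth of time plus a 'difference' smooth for the second group,
# a common way to test whether the two pupil time courses diverge.
gam = LinearGAM(s(0) + s(0, by=1)).fit(X, pupil)
gam.summary()
```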
Affiliation(s)
- Elif Akcay
- Ankara Bilkent City Hospital, Department of Child and Adolescent Psychiatry, Ankara, Turkey; University of Health Sciences, Department of Child and Adolescent Psychiatry, Ankara, Turkey.
- Özgür Aydın
- Ankara University, Department of Linguistics, Ankara, Turkey; Ankara University Institute of Health Sciences, Department of Interdisciplinary Neuroscience, Ankara, Turkey; Neuroscience and Neurotechnology Center of Excellence (NÖROM), Ankara, Turkey.
- Veronika Zagvozdkina
- University of Health Sciences, Department of Child and Adolescent Psychiatry, Ankara, Turkey.
- Zehra Aycan
- Ankara University Medical School, Department of Pediatric Endocrinology, Ankara, Turkey.
- Elcin Caglar
- Ankara University Medical School, Department of Child and Adolescent Psychiatry, Ankara, Turkey.
- Didem Behice Oztop
- Ankara University Medical School, Department of Child and Adolescent Psychiatry, Ankara, Turkey.
2
Rühlemann C, Barthel M. Word frequency and cognitive effort in turns-at-talk: turn structure affects processing load in natural conversation. Front Psychol 2024; 15:1208029. [PMID: 38899128] [PMCID: PMC11186443] [DOI: 10.3389/fpsyg.2024.1208029]
Abstract
Frequency distributions are known to widely affect psycholinguistic processes. The effects of word frequency in turns-at-talk, the nucleus of social action in conversation, have, by contrast, been largely neglected. This study probes into this gap by applying corpus-linguistic methods to the conversational component of the British National Corpus (BNC) and the Freiburg Multimodal Interaction Corpus (FreMIC). The latter includes continuous pupil size measures of participants in the recorded conversations, allowing patterns in the contained speech and language to be related systematically to the concurrent processing costs they may incur in speakers and recipients. We test a first hypothesis in this vein, analyzing whether word frequency distributions within turns-at-talk are correlated with interlocutors' processing effort during the production and reception of these turns. Turns are found to generally show a regular distribution pattern of word frequency, with highly frequent words in turn-initial positions, mid-range frequency words in turn-medial positions, and low-frequency words in turn-final positions. Speakers' pupil size tends to increase over the course of a turn at talk, reaching a climax toward the turn end. Notably, the observed decrease in word frequency within turns is inversely correlated with the observed increase in pupil size in speakers, but not in recipients, with steeper decreases in word frequency going along with steeper increases in pupil size in speakers. We discuss the implications of these findings for theories of speech processing, turn structure, and information packaging. Crucially, we propose that the intensification of processing effort in speakers during a turn at talk is attributable to an informational climax, which entails a progression from high-frequency, low-information words through intermediate levels to low-frequency, high-information words. At least in English conversation, interlocutors seem to make use of this pattern as one way to achieve efficiency in conversational interaction, creating a regularly recurring distribution of processing load across speaking turns, which aids smooth turn transitions, content prediction, and effective information transfer.
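As an illustration of the kind of turn-position analysis described here, the sketch below computes mean log word frequency in turn-initial, turn-medial, and turn-final thirds of each turn. The toy turns and the corpus-internal frequency counts are assumptions for illustration, not the authors' BNC/FreMIC pipeline.

```python
import math
from collections import Counter, defaultdict

turns = [
    ["well", "i", "saw", "a", "kestrel"],
    ["yeah", "that", "sounds", "remarkable"],
]
freq = Counter(w for turn in turns for w in turn)  # stand-in frequency counts

bins = defaultdict(list)
for turn in turns:
    for i, word in enumerate(turn):
        rel = i / max(len(turn) - 1, 1)   # 0 = turn-initial, 1 = turn-final
        bin_id = min(int(rel * 3), 2)     # initial / medial / final thirds
        bins[bin_id].append(math.log(freq[word]))

for bin_id, label in enumerate(["initial", "medial", "final"]):
    vals = bins[bin_id]
    print(label, sum(vals) / len(vals))   # expect frequency to fall toward turn end
```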
Affiliation(s)
- Mathias Barthel
- Pragmatics Department, Leibniz Institute for the German Language (IDS), Mannheim, Germany
3
Mechtenberg H, Giorio C, Myers EB. Pupil Dilation Reflects Perceptual Priorities During a Receptive Speech Task. Ear Hear 2024; 45:425-440. [PMID: 37882091] [PMCID: PMC10868674] [DOI: 10.1097/aud.0000000000001438]
Abstract
OBJECTIVES The listening demand incurred by speech perception fluctuates in normal conversation. At the acoustic-phonetic level, natural variation in pronunciation acts as a speedbump to accurate lexical selection. Any given utterance may be more or less phonetically ambiguous, a problem that the listener must resolve to choose the correct word. This becomes especially apparent when considering two common speech registers, clear and casual, that have characteristically different levels of phonetic ambiguity. Clear speech prioritizes intelligibility through hyperarticulation, which results in less ambiguity at the phonetic level, while casual speech tends to have a more collapsed acoustic space. We hypothesized that listeners would invest greater cognitive resources while listening to casual speech to resolve the increased amount of phonetic ambiguity, as compared with clear speech. To this end, we used pupillometry as an online measure of listening effort during perception of clear and casual continuous speech in two background conditions: quiet and noise. DESIGN Forty-eight participants performed a probe detection task while listening to spoken, nonsensical sentences (masked and unmasked) as pupil size was recorded. Pupil size was modeled using growth curve analysis to capture the dynamics of the pupil response as the sentence unfolded. RESULTS Pupil size during listening was sensitive to the presence of noise and to speech register (clear/casual). Unsurprisingly, listeners had overall larger pupil dilations during speech perception in noise, replicating earlier work. The pupil dilation pattern for clear and casual sentences was considerably more complex. Pupil dilation during clear speech trials was slightly larger than for casual speech, across quiet and noisy backgrounds. CONCLUSIONS We suggest that listener motivation could explain the larger pupil dilations to clearly spoken speech. We propose that, bounded by the context of this task, listeners devoted more resources to perceiving the speech signal with the greatest acoustic/phonetic fidelity. Further, we unexpectedly found systematic differences in pupil dilation preceding the onset of the spoken sentences. Together, these data demonstrate that the pupillary system is not merely reactive but also adaptive, sensitive to both task structure and listener motivation to maximize accurate perception in a limited-resource system.
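A minimal sketch of growth curve analysis in the spirit described above: the pupil time course is modelled with orthogonal linear and quadratic time terms in a mixed model with by-subject random intercepts. The data, column names, and random-effects structure are illustrative assumptions, not the authors' exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "subject": np.repeat([f"s{i}" for i in range(8)], 50),
    "time": np.tile(np.arange(50), 8),
})
# Toy rise-and-fall pupil trace plus noise.
df["pupil"] = 0.02 * df["time"] - 0.0003 * df["time"] ** 2 + rng.normal(0, 0.1, len(df))

# Build orthogonal linear and quadratic time terms by Gram-Schmidt.
t = df["time"].to_numpy(float)
ot1 = (t - t.mean()) / np.linalg.norm(t - t.mean())
ot2_raw = t ** 2 - (t ** 2 @ ot1) * ot1 - (t ** 2).mean()
df["ot1"], df["ot2"] = ot1, ot2_raw / np.linalg.norm(ot2_raw)

# Random intercept per subject; GCA papers typically add random slopes too.
model = smf.mixedlm("pupil ~ ot1 + ot2", df, groups=df["subject"]).fit()
print(model.summary())
```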
Affiliation(s)
- Hannah Mechtenberg
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, USA
- Cristal Giorio
- Department of Psychology, Pennsylvania State University, State College, Pennsylvania, USA
- Emily B. Myers
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, USA
- Department of Speech, Language and Hearing Sciences, University of Connecticut, Storrs, Connecticut, USA
4
Simantiraki O, Wagner AE, Cooke M. The impact of speech type on listening effort and intelligibility for native and non-native listeners. Front Neurosci 2023; 17:1235911. [PMID: 37841688] [PMCID: PMC10568627] [DOI: 10.3389/fnins.2023.1235911]
Abstract
Listeners are routinely exposed to many different types of speech, including artificially-enhanced and synthetic speech, styles which deviate to a greater or lesser extent from naturally-spoken exemplars. While the impact of differing speech types on intelligibility is well-studied, it is less clear how such types affect cognitive processing demands, and in particular whether those speech forms with the greatest intelligibility in noise have a commensurately lower listening effort. The current study measured intelligibility, self-reported listening effort, and a pupillometry-based measure of cognitive load for four distinct types of speech: (i) plain speech, i.e., natural unmodified speech; (ii) Lombard speech, a naturally-enhanced form which occurs when speaking in the presence of noise; (iii) artificially-enhanced speech, which involves spectral shaping and dynamic range compression; and (iv) speech synthesized from text. In the first experiment, a cohort of 26 native listeners responded to the four speech types in three levels of speech-shaped noise. In a second experiment, 31 non-native listeners underwent the same procedure at more favorable signal-to-noise ratios, chosen since second-language listening in noise has a more detrimental effect on intelligibility than listening in a first language. For both native and non-native listeners, artificially-enhanced speech was the most intelligible and led to the lowest subjective effort ratings, while the reverse was true for synthetic speech. However, pupil data suggested that Lombard speech elicited the lowest processing demands overall. These outcomes indicate that the relationship between intelligibility and cognitive processing demands is not a simple inverse, but is mediated by speech type. The findings of the current study motivate the search for speech modification algorithms that are optimized for both intelligibility and listening effort.
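To make the "artificially-enhanced speech" condition concrete, the sketch below implements a bare-bones dynamic range compressor (the spectral-shaping half of such enhancement is omitted). The threshold and ratio are illustrative assumptions, and a real compressor would smooth the gain with attack and release time constants.

```python
import numpy as np

def compress(x, threshold_db=-20.0, ratio=4.0, eps=1e-9):
    """Reduce level above the threshold by `ratio`, sample-wise on the envelope."""
    level_db = 20 * np.log10(np.abs(x) + eps)
    over = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over * (1 - 1 / ratio)        # attenuate only the loud samples
    return x * 10 ** (gain_db / 20)

tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
print(np.max(np.abs(tone)), np.max(np.abs(compress(tone))))  # peak is reduced
```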
Affiliation(s)
- Olympia Simantiraki
- Institute of Applied and Computational Mathematics, Foundation for Research & Technology-Hellas, Heraklion, Greece
- Anita E. Wagner
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Martin Cooke
- Ikerbasque (Basque Science Foundation), Vitoria-Gasteiz, Spain
5
Relaño-Iborra H, Wendt D, Neagu MB, Kressner AA, Dau T, Bækgaard P. Baseline pupil size encodes task-related information and modulates the task-evoked response in a speech-in-noise task. Trends Hear 2022; 26:23312165221134003. [PMID: 36426573] [PMCID: PMC9703509] [DOI: 10.1177/23312165221134003]
Abstract
Pupillometry data are commonly reported relative to a baseline value recorded in a controlled pre-task condition. In this study, the influence of the experimental design and of preparatory processing related to task difficulty on the baseline pupil size was investigated in a speech-in-noise intelligibility paradigm. Furthermore, the relationship between the baseline pupil size and the temporal dynamics of the pupil response was assessed. The analysis revealed strong effects of block presentation order, within-block sentence order, and task difficulty on the baseline values. An interaction between signal-to-noise ratio and block order was found, indicating that baseline values reflect listener expectations arising from the order in which the different blocks were presented. Furthermore, the baseline pupil size was found to affect the slope, delay, and curvature of the pupillary response as well as the peak pupil dilation. This suggests that baseline correction might be sufficient when reporting pupillometry results in terms of mean pupil dilation only, but not when a more complex characterization of the temporal dynamics of the response is considered. By clarifying which factors affect baseline pupil size and how baseline values interact with the task-evoked response, the results from the present study can contribute to a better interpretation of the pupillary response as a marker of cognitive processing.
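For illustration, a minimal sketch of the subtractive baseline correction discussed above, with the baseline kept as a variable of interest rather than discarded. The sampling rate, window length, and array shapes are assumptions.

```python
import numpy as np

fs = 60                               # assumed eye-tracker sampling rate (Hz)
baseline_ms = 500                     # pre-stimulus window used as baseline
trials = np.random.default_rng(2).normal(4.0, 0.2, size=(40, 6 * fs))  # toy traces, mm

n_base = int(baseline_ms / 1000 * fs)
baseline = trials[:, :n_base].mean(axis=1, keepdims=True)  # per-trial baseline
corrected = trials - baseline                              # subtractive correction

# The study's point: the baseline itself carries task information, so it is
# worth analysing `baseline` alongside `corrected`, not discarding it.
print(baseline.ravel()[:5], corrected.mean())
```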
Affiliation(s)
- Helia Relaño-Iborra
- Cognitive Systems Section, Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark; Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Dorothea Wendt
- Eriksholm Research Center, Oticon, 3070 Snekkersten, Denmark
- Mihaela Beatrice Neagu
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Abigail Anne Kressner
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark; Copenhagen Hearing and Balance Center, Rigshospitalet, 2100 Copenhagen, Denmark
- Torsten Dau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Per Bækgaard
- Cognitive Systems Section, Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
6
Modelling Human Word Learning and Recognition Using Visually Grounded Speech. Cognit Comput 2022. [DOI: 10.1007/s12559-022-10059-7]
Abstract
Many computational models of speech recognition assume that the set of target words is already given. This implies that these models learn to recognise speech in a biologically unrealistic manner, i.e. with prior lexical knowledge and explicit supervision. In contrast, visually grounded speech models learn to recognise speech without prior lexical knowledge by exploiting statistical dependencies between spoken and visual input. While it has previously been shown that visually grounded speech models learn to recognise the presence of words in the input, we explicitly investigate such a model as a model of human speech recognition. We investigate the time course of noun and verb recognition as simulated by the model using a gating paradigm to test whether its recognition is affected by well-known word competition effects in human speech processing. We furthermore investigate whether vector quantisation, a technique for discrete representation learning, aids the model in the discovery and recognition of words. Our experiments show that the model is able to recognise nouns in isolation and even learns to properly differentiate between plural and singular nouns. We also find that recognition is influenced by word competition from the word-initial cohort and neighbourhood density, mirroring word competition effects in human speech comprehension. Lastly, we find no evidence that vector quantisation is helpful in discovering and recognising words, though our gating experiment does show that the LSTM-VQ model is able to recognise the target words earlier.
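As a small illustration of the vector quantisation component examined above, the sketch below snaps continuous frame embeddings to their nearest codebook entries. The codebook size, dimensionality, and random initialization are illustrative assumptions; the paper's LSTM-VQ model learns these jointly with the rest of the network.

```python
import numpy as np

rng = np.random.default_rng(6)
codebook = rng.normal(size=(32, 8))   # 32 discrete codes, 8-dimensional
frames = rng.normal(size=(100, 8))    # continuous speech representations

# Squared Euclidean distance from every frame to every code.
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)          # discrete code index per frame
quantised = codebook[codes]           # quantised vectors passed downstream

print(codes[:10])
```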
7
Rainey R, Theiss L, Lopez E, Wood T, Wood L, Marques I, Cannon JA, Kennedy GD, Morris MS, Hollis R, Davis T, Chu DI. Characterizing the impact of verbal communication and health literacy in the patient-surgeon encounter. Am J Surg 2022; 224:943-948. [PMID: 35527045] [DOI: 10.1016/j.amjsurg.2022.04.034]
Abstract
BACKGROUND Patients with limited health literacy (HL) have difficulty understanding written/verbal information. The quality of verbal communication is not well understood. Therefore, our aim was to characterize patient-surgeon conversations and identify opportunities for improvement. METHODS New colorectal patient-surgeon encounters were audio-recorded and transcribed. HL was measured. Primary outcomes were rates-of-speech, understandability of words, patient-reported understanding, and usage of medical jargon/statistics. Secondary outcomes included length-of-visit (LOV), conversation possession time, patient-surgeon exchanges, and speech interruptions. RESULTS Significant variations existed between surgeons in rates-of-speech and understandability of words (p < 0.05). Faster rates-of-speech were associated with significantly less understandable words (p < 0.05). Patient-reported understanding varied by HL and by surgeon. Conversation possession time and usage of medical jargon/statistics varied significantly by surgeon (p < 0.05) in addition to patient-surgeon exchanges and interruptions. Patients with limited HL had shorter LOV. CONCLUSIONS Significant variations exist in how surgeons talk to patients. Opportunities to improve verbal communication include slowing speech and using more understandable words.
Affiliation(s)
- Rachael Rainey
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Lauren Theiss
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Elizabeth Lopez
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Tara Wood
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Lauren Wood
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Isabel Marques
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Jamie A Cannon
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Gregory D Kennedy
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Melanie S Morris
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Robert Hollis
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
- Terry Davis
- Departments of Medicine and Pediatrics, Louisiana State University Health, Shreveport, LA, USA
- Daniel I Chu
- Department of Surgery, The University of Alabama at Birmingham, Division of Gastrointestinal Surgery, USA
8
Dingemanse G, Goedegebure A. Listening Effort in Cochlear Implant Users: The Effect of Speech Intelligibility, Noise Reduction Processing, and Working Memory Capacity on the Pupil Dilation Response. J Speech Lang Hear Res 2022; 65:392-404. [PMID: 34898265] [DOI: 10.1044/2021_jslhr-21-00230]
Abstract
PURPOSE This study aimed to evaluate the effect of speech recognition performance, working memory capacity (WMC), and a noise reduction algorithm (NRA) on listening effort as measured with pupillometry in cochlear implant (CI) users while listening to speech in noise. METHOD Speech recognition and pupil responses (peak dilation, peak latency, and release of dilation) were measured during a speech recognition task at three speech-to-noise ratios (SNRs) with an NRA in both on and off conditions. WMC was measured with a reading span task. Twenty experienced CI users participated in this study. RESULTS With increasing SNR and speech recognition performance, (a) the peak pupil dilation decreased by only a small amount, (b) the peak latency decreased, and (c) the release of dilation after the sentences increased. The NRA had no effect on speech recognition in noise or on the peak or latency values of the pupil response but caused less release of dilation after the end of the sentences. A lower reading span score was associated with higher peak pupil dilation but was not associated with peak latency, release of dilation, or speech recognition in noise. CONCLUSIONS In CI users, speech perception is effortful, even at higher speech recognition scores and high SNRs, indicating that CI users are in a chronic state of increased effort in communication situations. The application of a clinically used NRA did not improve speech perception, nor did it reduce listening effort. Participants with a relatively low WMC exerted relatively more listening effort but did not have better speech reception thresholds in noise.
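For illustration, a minimal sketch of the three pupil measures analyzed above (peak pupil dilation, peak latency, and release of dilation), computed from a single baseline-corrected trace. The sampling rate, the toy trace, and the release window are assumptions.

```python
import numpy as np

fs = 120                                              # assumed sampling rate (Hz)
trace = np.sin(np.linspace(0, np.pi, 4 * fs)) * 0.3   # toy baseline-corrected trace, mm

peak_dilation = trace.max()
peak_latency_s = trace.argmax() / fs
# Release of dilation: how far the pupil has re-constricted from its peak
# by the end of the analysis window (here, the last 500 ms).
release = peak_dilation - trace[-fs // 2:].mean()

print(f"peak={peak_dilation:.2f} mm, latency={peak_latency_s:.2f} s, release={release:.2f} mm")
```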
Affiliation(s)
- Gertjan Dingemanse
- Department of Otorhinolaryngology, Head and Neck Surgery, Erasmus University Medical Center, Rotterdam, the Netherlands
- André Goedegebure
- Department of Otorhinolaryngology, Head and Neck Surgery, Erasmus University Medical Center, Rotterdam, the Netherlands
9
Morett LM, Roche JM, Fraundorf SH, McPartland JC. Contrast Is in the Eye of the Beholder: Infelicitous Beat Gesture Increases Cognitive Load During Online Spoken Discourse Comprehension. Cogn Sci 2021; 44:e12912. [PMID: 33073404] [DOI: 10.1111/cogs.12912]
Abstract
We investigated how two cues to contrast, beat gesture and contrastive pitch accenting, affect comprehenders' cognitive load during processing of spoken referring expressions. In two visual-world experiments, we orthogonally manipulated the presence of these cues and their felicity, or fit, with the local (sentence-level) referential context in critical referring expressions while comprehenders' task-evoked pupillary responses (TEPRs) were examined. In Experiment 1, beat gesture and contrastive accenting always matched the referential context of filler referring expressions and were therefore relatively felicitous on the global (experiment) level, whereas in Experiment 2, beat gesture and contrastive accenting never fit the referential context of filler referring expressions and were therefore infelicitous on the global level. The results revealed that both beat gesture and contrastive accenting increased comprehenders' cognitive load. For beat gesture, this increase in cognitive load was driven by both local and global infelicity. For contrastive accenting, this increase in cognitive load was unaffected when cues were globally felicitous but exacerbated when cues were globally infelicitous. Together, these results suggest that comprehenders' cognitive resources are taxed by processing infelicitous use of beat gesture and contrastive accenting to convey contrast on both the local and global levels.
Affiliation(s)
- Laura M Morett
- Department of Educational Studies in Psychology, Research Methodology, and Counseling, University of Alabama
- Jennifer M Roche
- Department of Speech Pathology and Audiology, Kent State University
- Scott H Fraundorf
- Department of Psychology, Learning Research and Development Center, University of Pittsburgh
10
Pupillometry reveals cognitive demands of lexical competition during spoken word recognition in young and older adults. Psychon Bull Rev 2021; 29:268-280. [PMID: 34405386] [DOI: 10.3758/s13423-021-01991-0]
Abstract
In most contemporary activation-competition frameworks for spoken word recognition, candidate words compete against phonological "neighbors" with similar acoustic properties (e.g., "cap" vs. "cat"). Thus, recognizing words with more competitors should come at a greater cognitive cost relative to recognizing words with fewer competitors, due to increased demands for selecting the correct item and inhibiting incorrect candidates. Importantly, these processes should operate even in the absence of differences in accuracy. In the present study, we tested this proposal by examining differences in processing costs associated with neighborhood density for highly intelligible items presented in quiet. A second goal was to examine whether the cognitive demands associated with increased neighborhood density were greater for older adults compared with young adults. Using pupillometry as an index of cognitive processing load, we compared the cognitive demands associated with spoken word recognition for words with many or fewer neighbors, presented in quiet, for young (n = 67) and older (n = 69) adult listeners. Growth curve analysis of the pupil data indicated that older adults showed a greater evoked pupil response for spoken words than did young adults, consistent with increased cognitive load during spoken word recognition. Words from dense neighborhoods were marginally more demanding to process than words from sparse neighborhoods. There was also an interaction between age and neighborhood density, indicating larger effects of density in young adult listeners. These results highlight the importance of assessing both cognitive demands and accuracy when investigating the mechanisms underlying spoken word recognition.
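A minimal sketch of one standard neighborhood-density definition used in this literature: words reachable from the target by a single phoneme substitution, insertion, or deletion. The toy phoneme-tuple lexicon is an illustrative assumption; real counts come from a full pronunciation dictionary.

```python
def is_neighbor(a: tuple, b: tuple) -> bool:
    """True if b differs from a by one substitution, insertion, or deletion."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        short, long = sorted((a, b), key=len)
        # Deleting exactly one segment of the longer form must yield the shorter.
        return any(long[:i] + long[i + 1:] == short for i in range(len(long)))
    return False

lexicon = [("k", "ae", "t"), ("k", "ae", "p"), ("b", "ae", "t"), ("k", "ae", "t", "s")]
target = ("k", "ae", "t")
density = sum(is_neighbor(target, w) for w in lexicon if w != target)
print(density)  # "cap", "bat", and "cats" are all neighbors of "cat" -> 3
```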
11
Kontogiorgos D, Gustafson J. Measuring Collaboration Load With Pupillary Responses - Implications for the Design of Instructions in Task-Oriented HRI. Front Psychol 2021; 12:623657. [PMID: 34354623] [PMCID: PMC8329026] [DOI: 10.3389/fpsyg.2021.623657]
Abstract
In face-to-face interaction, speakers incrementally establish common ground, the mutual belief of understanding. Instead of constructing “one-shot” complete utterances, speakers tend to package pieces of information in smaller fragments (what Clark calls “installments”). The aim of this paper was to investigate how speakers' fragmented construction of utterances affects the cognitive load of the conversational partners during utterance production and comprehension. In a collaborative furniture assembly, participants instructed each other how to build an IKEA stool. Pupil diameter was measured as an index of effort and cognitive processing in the collaborative task. Pupillometry data and eye-gaze behaviour indicated that more cognitive resources were required by speakers to construct fragmented rather than non-fragmented utterances. Such construction of utterances by audience design was associated with higher cognitive load for speakers. We also found that listeners required fewer cognitive resources with each new speaker utterance, suggesting that speakers' efforts in the fragmented construction of utterances were successful in resolving ambiguities. The results indicated that speaking in fragments is beneficial for minimising collaboration load; however, adapting to listeners is a demanding task. We discuss implications for future empirical research on the design of task-oriented human-robot interactions, and how assistive social robots may benefit from the production of fragmented instructions.
Affiliation(s)
- Dimosthenis Kontogiorgos
- Division of Speech, Music and Hearing, Department of Intelligent Systems, KTH Royal Institute of Technology, Stockholm, Sweden
- Joakim Gustafson
- Division of Speech, Music and Hearing, Department of Intelligent Systems, KTH Royal Institute of Technology, Stockholm, Sweden
12
Age Differences in the Effects of Speaking Rate on Auditory, Visual, and Auditory-Visual Speech Perception. Ear Hear 2021; 41:549-560. [PMID: 31453875] [DOI: 10.1097/aud.0000000000000776]
Abstract
OBJECTIVES This study was designed to examine how speaking rate affects auditory-only, visual-only, and auditory-visual speech perception across the adult lifespan. In addition, the study examined the extent to which unimodal (auditory-only and visual-only) performance predicts auditory-visual performance across a range of speaking rates. The authors hypothesized significant Age × Rate interactions in all three modalities and that unimodal performance would account for a majority of the variance in auditory-visual speech perception for speaking rates that are both slower and faster than normal. DESIGN Participants (N = 145), ranging in age from 22 to 92, were tested in conditions with auditory-only, visual-only, and auditory-visual presentations using a closed-set speech perception test. Five different speaking rates were presented in each modality: an unmodified (normal) rate, two rates slower than normal, and two rates faster than normal. Signal-to-noise ratios were set individually to produce approximately 30% correct identification in the auditory-only condition, and this signal-to-noise ratio was then used in the auditory-only and auditory-visual conditions. RESULTS Age × Rate interactions were observed for the fastest speaking rates in both the visual-only and auditory-visual conditions. Unimodal performance accounted for at least 60% of the variance in auditory-visual performance for all five speaking rates. CONCLUSIONS The findings demonstrate that the disproportionate difficulty that older adults have with rapid speech in auditory-only presentations can also be observed with visual-only and auditory-visual presentations. Taken together, the present analyses of age and individual differences indicate a generalized age-related decline in the ability to understand speech produced at fast speaking rates. The finding that auditory-visual speech performance was almost entirely predicted by unimodal performance across all five speaking rates has important clinical implications for auditory-visual speech perception and the ability of older adults to use visual speech information to compensate for age-related hearing loss.
13
Schubotz L, Holler J, Drijvers L, Özyürek A. Aging and working memory modulate the ability to benefit from visible speech and iconic gestures during speech-in-noise comprehension. Psychol Res 2021; 85:1997-2011. [PMID: 32627053] [PMCID: PMC8289811] [DOI: 10.1007/s00426-020-01363-8]
Abstract
When comprehending speech-in-noise (SiN), younger and older adults benefit from seeing the speaker's mouth, i.e., visible speech. Younger adults additionally benefit from manual iconic co-speech gestures. Here, we investigate to what extent younger and older adults benefit from perceiving both visual articulators while comprehending SiN, and whether this is modulated by working memory and inhibitory control. Twenty-eight younger and 28 older adults performed a word recognition task in three visual contexts: mouth blurred (speech-only), visible speech, or visible speech + iconic gesture. The speech signal was either clear or embedded in multitalker babble. Additionally, there were two visual-only conditions (visible speech, visible speech + gesture). Accuracy levels for both age groups were higher when both visual articulators were present than with either one or none. However, older adults received a significantly smaller benefit than younger adults, although they performed equally well in speech-only and visual-only word recognition. Individual differences in verbal working memory and inhibitory control partly accounted for age-related performance differences. To conclude, perceiving iconic gestures in addition to visible speech improves younger and older adults' comprehension of SiN. Yet, the ability to benefit from this additional visual information is modulated by age and verbal working memory. Future research will have to show whether these findings extend beyond the single-word level.
Affiliation(s)
- Louise Schubotz
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Judith Holler
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
- Linda Drijvers
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
- Aslı Özyürek
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
- Centre for Language Studies, Radboud University Nijmegen, P.O. Box 9103, 6500 HD, Nijmegen, The Netherlands
14
Randolph AB, Petter SC, Storey VC, Jackson MM. Context-aware user profiles to improve media synchronicity for individuals with severe motor disabilities. Inf Syst J 2021. [DOI: 10.1111/isj.12337]
Affiliation(s)
- Adriane B. Randolph
- Information Systems and Security, Kennesaw State University, Kennesaw, Georgia, USA
- Veda C. Storey
- Computer Information Systems, Georgia State University, Atlanta, Georgia, USA
- Melody M. Jackson
- College of Computing, Georgia Institute of Technology, Atlanta, Georgia, USA
15
Tucker BV, Ford C, Hedges S. Speech aging: Production and perception. Wiley Interdiscip Rev Cogn Sci 2021; 12:e1557. [PMID: 33651922] [DOI: 10.1002/wcs.1557]
Abstract
In this overview, we describe the literature on how speech production and speech perception change in healthy or normal aging across the adult lifespan. In the production section, we review acoustic characteristics that have been investigated as potentially distinguishing younger and older adults. In the speech perception section, we address studies concerning speaker age estimation and those investigating older listeners' perception. Our discussion focuses on major themes and other fruitful areas for future research. This article is categorized under: Linguistics > Language in Mind and Brain; Linguistics > Linguistic Theory; Psychology > Development and Aging.
Affiliation(s)
- Benjamin V Tucker
- Department of Linguistics, University of Alberta, Edmonton, Alberta, Canada
- Catherine Ford
- Department of Linguistics, University of Alberta, Edmonton, Alberta, Canada
- Stephanie Hedges
- Department of Linguistics, University of Alberta, Edmonton, Alberta, Canada
16
Borghini G, Hazan V. Effects of acoustic and semantic cues on listening effort during native and non-native speech perception. J Acoust Soc Am 2020; 147:3783. [PMID: 32611155] [DOI: 10.1121/10.0001126]
Abstract
Relative to native listeners, non-native listeners who are immersed in a second-language environment experience increased listening effort and a reduced ability to successfully perform an additional task while listening. Previous research demonstrated that listeners can exploit a variety of intelligibility-enhancing cues to cope with adverse listening conditions. However, little is known about the implications of those speech perception strategies for listening effort. The current research aims to investigate, by means of pupillometry, how listening effort is modulated in native and non-native listeners by the availability of semantic context and acoustic enhancements during the comprehension of spoken sentences. For this purpose, semantic plausibility and speaking style were manipulated both separately and in combination during a speech perception task in noise. The signal-to-noise ratio was adjusted individually for each participant to target a 50% intelligibility level. Behavioural results indicated that native and non-native listeners were equally able to fruitfully exploit both semantic and acoustic cues to aid their comprehension. Pupil data indicated that listening effort was reduced for both groups of listeners when acoustic enhancements were available, while the presence of a plausible semantic context did not lead to a reduction in listening effort.
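For illustration, a minimal sketch of fixing a trial's speech-in-noise mixture at a target signal-to-noise ratio, the kind of per-participant adjustment described above. The signals are random arrays and the -4 dB target is an arbitrary illustrative value; in the study the level targeting 50% intelligibility was found adaptively per listener.

```python
import numpy as np

rng = np.random.default_rng(3)
speech = rng.normal(0, 0.1, 16000)   # stand-in speech signal
noise = rng.normal(0, 0.1, 16000)    # stand-in masker

def rms(x):
    return np.sqrt(np.mean(x ** 2))

target_snr_db = -4.0                 # assumed level targeting ~50% intelligibility
# Scale the noise so that 20*log10(rms(speech)/rms(noise)) equals the target.
gain = rms(speech) / (rms(noise) * 10 ** (target_snr_db / 20))
mixture = speech + gain * noise

print(20 * np.log10(rms(speech) / rms(gain * noise)))  # ~ -4.0 dB
```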
Affiliation(s)
- Giulia Borghini
- Department of Speech Hearing and Phonetic Sciences, Faculty of Brain Sciences, University College London, WC1N 1PF London, United Kingdom
- Valerie Hazan
- Department of Speech Hearing and Phonetic Sciences, Faculty of Brain Sciences, University College London, WC1N 1PF London, United Kingdom
17
Paulus M, Hazan V, Adank P. The relationship between talker acoustics, intelligibility, and effort in degraded listening conditions. J Acoust Soc Am 2020; 147:3348. [PMID: 32486777] [DOI: 10.1121/10.0001212]
Abstract
Listening to degraded speech is associated with decreased intelligibility and increased effort. However, listeners are generally able to adapt to certain types of degradations. While intelligibility of degraded speech is modulated by talker acoustics, it is unclear whether talker acoustics also affect effort and adaptation. Moreover, it has been demonstrated that talker differences are preserved across spectral degradations, but it is not known whether this effect extends to temporal degradations and which acoustic-phonetic characteristics are responsible. In a listening experiment combined with pupillometry, participants were presented with speech in quiet as well as masked, time-compressed, and noise-vocoded speech from 16 Southern British English speakers. Results showed that intelligibility, but not adaptation, was modulated by talker acoustics. Talkers who were more intelligible under noise-vocoding were also more intelligible under masking and time-compression. This effect was linked to acoustic-phonetic profiles with greater vowel space dispersion (VSD) and energy in mid-range frequencies, as well as a slower speaking rate. While pupil dilation indicated increasing effort with decreasing intelligibility, this study also linked reduced effort in quiet to talkers with greater VSD. The results emphasize the relevance of talker acoustics for intelligibility and effort in degraded listening conditions.
Affiliation(s)
- Maximillian Paulus
- Speech, Hearing and Phonetic Sciences, University College London, London, United Kingdom
- Valerie Hazan
- Speech, Hearing and Phonetic Sciences, University College London, London, United Kingdom
- Patti Adank
- Speech, Hearing and Phonetic Sciences, University College London, London, United Kingdom
18
Zhang M, Siegle GJ, McNeil MR, Pratt SR, Palmer C. The role of reward and task demand in value-based strategic allocation of auditory comprehension effort. Hear Res 2019; 381:107775. [DOI: 10.1016/j.heares.2019.107775]
19
Barthel M, Sauppe S. Speech Planning at Turn Transitions in Dialog Is Associated With Increased Processing Load. Cogn Sci 2019; 43:e12768. [DOI: 10.1111/cogs.12768]
Affiliation(s)
- Mathias Barthel
- Language and Cognition Department, Max Planck Institute for Psycholinguistics
20
Meng Q, Wang X, Cai Y, Kong F, Buck AN, Yu G, Zheng N, Schnupp JWH. Time-compression thresholds for Mandarin sentences in normal-hearing and cochlear implant listeners. Hear Res 2019; 374:58-68. [PMID: 30732921] [DOI: 10.1016/j.heares.2019.01.011]
Abstract
Faster speech may facilitate more efficient communication, but if speech is too fast it becomes unintelligible. The maximum speeds at which Mandarin words were intelligible in a sentence context were quantified for normal-hearing (NH) and cochlear implant (CI) listeners by measuring time-compression thresholds (TCTs) in an adaptive staircase procedure. In Experiment 1, both original and CI-vocoded time-compressed speech from the MSP (Mandarin speech perception) and MHINT (Mandarin hearing in noise test) corpora was presented to 10 NH subjects over headphones. In Experiment 2, original time-compressed speech was presented to 10 CI subjects and another 10 NH subjects through a loudspeaker in a soundproof room. Sentences were time-compressed without changing their spectral profile and were presented up to three times within a single trial. At the end of each trial, the number of correctly identified words in the sentence was scored, and a 50% word-recognition threshold was tracked in the psychophysical procedure. The observed median TCTs were very similar for MSP and MHINT speech. For NH listeners, median TCTs were around 16.7 syllables/s for normal speech, and 11.8 and 8.6 syllables/s, respectively, for 8- and 4-channel tone-carrier vocoded speech. For CI listeners, TCTs were only around 6.8 syllables/s. The interquartile range of the TCTs within each cohort was smaller than 3.0 syllables/s. Speech reception thresholds in noise were also measured in Experiment 2 and were found to be strongly correlated with TCTs for CI listeners. In conclusion, Mandarin sentence TCTs were around 16.7 syllables/s for most NH subjects, but rarely faster than 10.0 syllables/s for CI listeners, quantifying the upper limits of fast-speech processing with CIs.
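A minimal sketch of an adaptive staircase that tracks a 50% point, in the spirit of the TCT procedure above. The simulated listener, step-size schedule, and stopping rule are illustrative assumptions, not the authors' exact protocol.

```python
import numpy as np

rng = np.random.default_rng(4)
true_tct = 16.7                         # syllables/s at which this toy listener hits 50%

def listener_correct(rate):
    """Toy psychometric function: P(correct) falls as rate exceeds true_tct."""
    p = 1 / (1 + np.exp((rate - true_tct) / 1.5))
    return rng.random() < p

rate, step, reversals, last_up = 8.0, 2.0, [], None
while len(reversals) < 8:
    going_up = listener_correct(rate)   # 1-up/1-down tracks the 50% point
    if last_up is not None and going_up != last_up:
        reversals.append(rate)
        step = max(step * 0.7, 0.25)    # shrink step size after each reversal
    rate += step if going_up else -step
    last_up = going_up

print(np.mean(reversals[-6:]))          # threshold estimate, near 16.7
```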
Affiliation(s)
- Qinglin Meng
- Acoustics Lab of School of Physics and Optoelectronics and State Key Laboratory of Subtropical Building Science, South China University of Technology, China; Hearing Research Group, Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Xianren Wang
- Department of Otorhinolaryngology, The First Affiliated Hospital, Sun Yat-Sen University and Institute of Otorhinolaryngology, Sun Yat-Sen University, Guangzhou, China
- Yuexin Cai
- Department of Otolaryngology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University and Department of Hearing and Speech Science, Xin Hua College of Sun Yat-Sen University, Guangzhou, China
- Fanhui Kong
- The Guangdong Key Laboratory of Intelligent Information Processing, College of Information Engineering, Shenzhen University, China
- Alexa Nadezhda Buck
- Hearing Research Group, Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Guangzheng Yu
- Acoustics Lab of School of Physics and Optoelectronics and State Key Laboratory of Subtropical Building Science, South China University of Technology, China
- Nengheng Zheng
- The Guangdong Key Laboratory of Intelligent Information Processing, College of Information Engineering, Shenzhen University, China
- Jan W H Schnupp
- Hearing Research Group, Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
21
Zekveld AA, Koelewijn T, Kramer SE. The Pupil Dilation Response to Auditory Stimuli: Current State of Knowledge. Trends Hear 2019; 22:2331216518777174. [PMID: 30249172] [PMCID: PMC6156203] [DOI: 10.1177/2331216518777174]
Abstract
The measurement of cognitive resource allocation during listening, or listening effort, provides valuable insight into the factors influencing auditory processing. In recent years, many studies inside and outside the field of hearing science have measured the pupil response evoked by auditory stimuli. The aim of the current review was to provide an exhaustive overview of these studies. The 146 studies included in this review originated from multiple domains, including hearing science and linguistics, but the review also covers research into motivation, memory, and emotion. The present review provides a unique overview of these studies and is organized according to the components of the Framework for Understanding Effortful Listening. A summary table presents the sample characteristics, an outline of the study design, stimuli, the pupil parameters analyzed, and the main findings of each study. The results indicate that the pupil response is sensitive to various task manipulations as well as to interindividual differences. Many of the findings have been replicated. Frequent interactions between the independent factors affecting the pupil response have been reported, which indicates complex processes underlying cognitive resource allocation. This complexity should be taken into account in future studies, which should focus more on interindividual differences, also including older participants. This review facilitates the careful design of new studies by indicating the factors that should be controlled for. In conclusion, measuring the pupil dilation response to auditory stimuli has been demonstrated to be a sensitive method applicable to numerous research questions. The sensitivity of the measure calls for carefully designed stimuli.
Affiliation(s)
- Adriana A Zekveld
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery, Amsterdam Public Health Research Institute, VU University Medical Center, the Netherlands; Linnaeus Centre HEAD, The Swedish Institute for Disability Research, Sweden; Department of Behavioural Sciences and Learning, Linköping University, Sweden
- Thomas Koelewijn
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery, Amsterdam Public Health Research Institute, VU University Medical Center, the Netherlands
- Sophia E Kramer
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery, Amsterdam Public Health Research Institute, VU University Medical Center, the Netherlands
22
Dias JW, McClaskey CM, Harris KC. Time-Compressed Speech Identification Is Predicted by Auditory Neural Processing, Perceptuomotor Speed, and Executive Functioning in Younger and Older Listeners. J Assoc Res Otolaryngol 2019; 20:73-88. [PMID: 30456729] [PMCID: PMC6364265] [DOI: 10.1007/s10162-018-00703-1]
Abstract
Older adults typically have difficulty identifying speech that is temporally distorted, such as reverberant, accented, time-compressed, or interrupted speech. These difficulties occur even when hearing thresholds fall within a normal range. Auditory neural processing speed, which we have previously found to predict auditory temporal processing (auditory gap detection), may interfere with the ability to recognize phonetic features as they rapidly unfold over time in spoken speech. Further, declines in perceptuomotor processing speed and executive functioning may interfere with the ability to track, access, and process information. The current investigation examined the extent to which age-related differences in time-compressed speech identification were predicted by auditory neural processing speed, perceptuomotor processing speed, and executive functioning. Groups of younger and older adults with normal hearing (up to 3000 Hz) identified 40%, 50%, and 60% time-compressed sentences. Auditory neural processing speed was defined as the P1 and N1 latencies of click-induced auditory-evoked potentials. Perceptuomotor processing speed and executive functioning were measured behaviorally using the Connections Test. Compared to younger adults, older adults exhibited poorer time-compressed speech identification and slower perceptuomotor processing. Executive functioning, P1 latency, and N1 latency did not differ between age groups. Time-compressed speech identification was independently predicted by P1 latency, perceptuomotor processing speed, and executive functioning in younger and older listeners. Results of model testing suggested that declines in perceptuomotor processing speed mediated age-group differences in time-compressed speech identification. The current investigation joins a growing body of literature suggesting that the processing of temporally distorted speech is impacted by lower-level auditory neural processing and higher-level perceptuomotor and executive processes.
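For illustration of the mediation logic in the final results sentence, the sketch below simulates data and shows the classic regression signature of mediation: the age-group coefficient shrinks once the mediator (perceptuomotor speed) enters the model. The data are simulated, and a real analysis would test the indirect effect formally (e.g., by bootstrapping).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 120
age_group = rng.integers(0, 2, n)                      # 0 = younger, 1 = older
speed = 1.0 + 0.8 * age_group + rng.normal(0, 0.3, n)  # slower with age (toy)
ident = 0.9 - 0.25 * speed + rng.normal(0, 0.05, n)    # worse with slower speed (toy)
df = pd.DataFrame({"age_group": age_group, "speed": speed, "ident": ident})

total = smf.ols("ident ~ age_group", df).fit()
direct = smf.ols("ident ~ age_group + speed", df).fit()
# The age-group effect shrinks toward zero once the mediator is included.
print(total.params["age_group"], direct.params["age_group"])
```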
Affiliation(s)
- James W Dias
- Department of Otolaryngology, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC, 29425-5500, USA
- Carolyn M McClaskey
- Department of Otolaryngology, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC, 29425-5500, USA
- Kelly C Harris
- Department of Otolaryngology, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC, 29425-5500, USA
23
Visentin C, Prodi N. A Matrixed Speech-in-Noise Test to Discriminate Favorable Listening Conditions by Means of Intelligibility and Response Time Results. J Speech Lang Hear Res 2018; 61:1497-1516. [PMID: 29845187] [DOI: 10.1044/2018_jslhr-h-17-0418]
Abstract
PURPOSE The primary aim of this study was to develop and examine the potential of a new speech-in-noise test in discriminating the favorable listening conditions targeted in the acoustical design of communication spaces. The test is based on the recognition and recall of disyllabic word sequences. A secondary aim was to compare the test with current speech-in-noise tests, assessing its benefits and limitations. METHOD Young adults (19-40 years old), self-reporting normal hearing, were presented with the newly developed Words Sequence Test (WST; 16 participants, Experiment 1) and with a consonant confusion test and a sentence recognition test (Experiment 2, 36 participants randomly assigned to the 2 tests). Participants performing the WST were presented with word sequences of different lengths (from 2 up to 6 words). Two listening conditions were selected: (a) no noise and no reverberation, and (b) reverberant, steady-state noise (Speech Transmission Index: 0.47). The tests were presented in a closed-set format; data on the number of words correctly recognized (speech intelligibility, IS) and the response times (RTs) were collected (onset RT, single words' RT). RESULTS It was found that a sequence composed of 4 disyllabic words ensured both a full recognition score in quiet conditions and a significant decrease in IS when noise and reverberation degraded the speech signal. RTs increased with the worsening of the listening conditions and with the number of words in the sequence. The greatest onset RT variation was found when using a sequence of 4 words. In the comparison with current speech-in-noise tests, it was found that the WST maximized both the IS difference between the selected listening conditions and the RT increase. CONCLUSIONS Overall, the results suggest that the new speech-in-noise test has good potential for discriminating conditions with near-ceiling accuracy. Compared with current speech-in-noise tests, the WST with a 4-word sequence allows for a finer mapping of the acoustical design target conditions of public spaces through accuracy and onset RT data.
Affiliation(s)
- Nicola Prodi
- Department of Engineering, University of Ferrara, Italy