1. Ufer C, Blank H. Opposing serial effects of stimulus and choice in speech perception scale with context variability. iScience 2024;27:110611. PMID: 39252961; PMCID: PMC11382034; DOI: 10.1016/j.isci.2024.110611.
Abstract
In this study, we investigated serial effects on the perception of auditory vowel stimuli across three experimental setups with different degrees of context variability. Aligning with recent findings in visual perception, our results confirm the existence of two distinct processes in serial dependence: a repulsive sensory effect coupled with an attractive decisional effect. Importantly, our study extends these observations to the auditory domain, demonstrating parallel serial effects in audition. Furthermore, we uncover context variability effects, revealing a linear pattern for the repulsive perceptual effect and a quadratic pattern for the attractive decisional effect. These findings support the presence of adaptive sensory mechanisms underlying the repulsive effects, while higher-level mechanisms appear to govern the attractive decisional effect. The study provides valuable insights into the interplay of attractive and repulsive serial effects in auditory perception and contributes to our understanding of the underlying mechanisms.
Affiliation(s)
- Carina Ufer
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Hamburg Brain School, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Helen Blank
- Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Hamburg Brain School, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
2. Crinnion AM, Heffner CC, Myers EB. Individual differences in the use of top-down versus bottom-up cues to resolve phonetic ambiguity. Atten Percept Psychophys 2024. PMID: 38811489; DOI: 10.3758/s13414-024-02889-4.
Abstract
How listeners weight a wide variety of information to interpret ambiguities in the speech signal is a central question in speech perception, particularly for speech heard in the context of phrases or sentences. Dominant views of cue use for language comprehension posit that listeners integrate multiple sources of information to interpret ambiguities in the speech signal. Here, we study how semantic context, sentence rate, and vowel length all influence identification of word-final stops. We find that while at the group level all sources of information appear to influence how listeners interpret ambiguities in speech, at the level of the individual listener, we observe systematic differences in cue reliance, such that some individual listeners favor certain cues (e.g., speech rate and vowel length) to the exclusion of others (e.g., semantic context). While listeners exhibit a range of cue preferences, across participants we find a negative relationship between individuals' weighting of semantic and acoustic-phonetic (sentence rate, vowel length) cues. Additionally, we find that these weightings are stable within individuals over a period of 1 month. Taken as a whole, these findings suggest that theories of cue integration and speech processing may fail to capture the rich individual differences that exist between listeners, which could arise due to mechanistic differences between individuals in speech perception.
3. King CJ, Sharpe CM, Shorey AE, Stilp CE. The effects of variability on context effects and psychometric function slopes in speaking rate normalization. J Acoust Soc Am 2024;155:2099-2113. PMID: 38483206; PMCID: PMC10942802; DOI: 10.1121/10.0025292.
Abstract
Acoustic context influences speech perception, but contextual variability restricts this influence. Assgari and Stilp [J. Acoust. Soc. Am. 138, 3023-3032 (2015)] demonstrated that when categorizing vowels, variability in who spoke the preceding context sentence on each trial but not the sentence contents diminished the resulting spectral contrast effects (perceptual shifts in categorization stemming from spectral differences between sounds). Yet, how such contextual variability affects temporal contrast effects (TCEs) (also known as speaking rate normalization; categorization shifts stemming from temporal differences) is unknown. Here, stimuli were the same context sentences and conditions (one talker saying one sentence, one talker saying 200 sentences, 200 talkers saying 200 sentences) used in Assgari and Stilp [J. Acoust. Soc. Am. 138, 3023-3032 (2015)], but set to fast or slow speaking rates to encourage perception of target words as "tier" or "deer," respectively. In Experiment 1, sentence variability and talker variability each diminished TCE magnitudes; talker variability also produced shallower psychometric function slopes. In Experiment 2, when speaking rates were matched across the 200-sentences conditions, neither TCE magnitudes nor slopes differed across conditions. In Experiment 3, matching slow and fast rates across all conditions failed to produce equal TCEs and slopes everywhere. Results suggest a complex interplay between acoustic, talker, and sentence variability in shaping TCEs in speech perception.
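Psychometric function slopes like those compared across conditions here are commonly estimated by fitting a logistic function to listeners' categorization responses. A minimal sketch in Python with NumPy/SciPy, using made-up response proportions for a hypothetical seven-step "deer"-"tier" continuum (not the authors' stimuli or analysis code):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Two-parameter logistic: x0 = category boundary, k = slope."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Hypothetical proportions of "deer" responses at each continuum step
steps = np.arange(1, 8)
p_deer = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.95, 0.98])

(x0, k), _ = curve_fit(logistic, steps, p_deer, p0=[4.0, 1.0])
```

A context manipulation that shifts responses moves the fitted boundary x0 (a temporal contrast effect), while shallower categorization under stimulus variability corresponds to a smaller fitted slope k.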
Affiliation(s)
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Chloe M Sharpe
- School of Psychology, Xavier University, Cincinnati, Ohio 45207, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
4. Tsironis A, Vlahou E, Kontou P, Bagos P, Kopčo N. Adaptation to Reverberation for Speech Perception: A Systematic Review. Trends Hear 2024;28:23312165241273399. PMID: 39246212; PMCID: PMC11384524; DOI: 10.1177/23312165241273399.
Abstract
In everyday acoustic environments, reverberation alters the speech signal received at the ears. Normal-hearing listeners are robust to these distortions, quickly recalibrating to achieve accurate speech perception. Over the past two decades, multiple studies have investigated the various adaptation mechanisms that listeners use to mitigate the negative impacts of reverberation and improve speech intelligibility. Following the PRISMA guidelines, we performed a systematic review of these studies, with the aim to summarize existing research, identify open questions, and propose future directions. Two researchers independently assessed a total of 661 studies, ultimately including 23 in the review. Our results showed that adaptation to reverberant speech is robust across diverse environments, experimental setups, speech units, and tasks, in noise-masked or unmasked conditions. The time course of adaptation is rapid, sometimes occurring in less than 1 s, but this can vary depending on the reverberation and noise levels of the acoustic environment. Adaptation is stronger in moderately reverberant rooms and minimal in rooms with very intense reverberation. While the mechanisms underlying the recalibration are largely unknown, adaptation to the direct-to-reverberant ratio-related changes in amplitude modulation appears to be the predominant candidate. However, additional factors need to be explored to provide a unified theory for the effect and its applications.
Affiliation(s)
- Avgeris Tsironis
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Eleni Vlahou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Panagiota Kontou
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Pantelis Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Norbert Kopčo
- Institute of Computer Science, Faculty of Science, Pavol Jozef Šafárik University, Košice, Slovakia
5. Kreft HA, Oxenham AJ. Auditory enhancement in younger and older listeners with normal and impaired hearing. J Acoust Soc Am 2023;154:3821-3832. PMID: 38109406; PMCID: PMC10730236; DOI: 10.1121/10.0023937.
Abstract
Auditory enhancement is a spectral contrast aftereffect that can facilitate the detection of novel events in an ongoing background. A single-interval paradigm combined with roved frequency content between trials can yield as much as 20 dB enhancement in young normal-hearing listeners. This study compared such enhancement in 15 listeners with sensorineural hearing loss with that in 15 age-matched adults and 15 young adults with normal audiograms. All groups were presented with stimulus levels of 70 dB sound pressure level (SPL) per component. The two groups with normal hearing were also tested at 45 dB SPL per component. The hearing-impaired listeners showed very little enhancement overall. However, when tested at the same high (70-dB) level, both young and age-matched normal-hearing listeners also showed substantially reduced enhancement, relative to that found at 45 dB SPL. Some differences in enhancement emerged between young and older normal-hearing listeners at the lower sound level. The results suggest that enhancement is highly level-dependent and may also decrease somewhat with age or slight hearing loss. Implications for hearing-impaired listeners may include a poorer ability to adapt to real-world acoustic variability, due in part to the higher levels at which sound must be presented to be audible.
Affiliation(s)
- Heather A Kreft
- Department of Psychology, University of Minnesota, Elliott Hall, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Elliott Hall, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
6. Liu W, Pan X, Zhou X. The Temporal Dynamics of Stop Consonant Perception: Evidence from Context Effects. Lang Speech 2023;66:1046-1055. PMID: 36775903; DOI: 10.1177/00238309231153355.
Abstract
Empirical evidence and theoretical models suggest that phonetic category perception involves two stages of auditory and phonetic processing. However, few studies have examined the time course of these two processing stages. With brief stop consonant segments as context stimuli, this study examined the temporal dynamics of stop consonant perception by varying the inter-stimulus interval between context and target stimuli. The results suggest that phonetic category activation of stop consonants may appear before 100 ms of processing time. Furthermore, the activation of phonetic categories resulted in contrast context effects on identifying the target stop continuum; the auditory processing of stop consonants produced a context effect different from that caused by phonetic category activation. The findings provide further evidence for the two-stage model of speech perception and reveal the time course of auditory and phonetic processing.
Affiliation(s)
- Wenli Liu
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, China
- Xiaoguang Pan
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, China
- Xiang Zhou
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, China
7. Derawi H, Reinisch E, Gabay Y. Internal Cognitive Load Differentially Influences Acoustic and Lexical Context Effects in Speech Perception: Evidence From a Population With Attention-Deficit/Hyperactivity Disorder. J Speech Lang Hear Res 2023;66:3721-3734. PMID: 37696049; DOI: 10.1044/2023_jslhr-23-00188.
Abstract
Background: To overcome variability in spoken language, listeners utilize various types of context information for disambiguating speech sounds. Context effects have been shown to be affected by cognitive load, but previous results are mixed regarding its influence on the use of context information in speech perception. Purpose: We tested a population characterized by attention-deficit/hyperactivity disorder (ADHD) to better understand the relationship between attention (or internal cognitive load) and context effects. Method: The use of acoustic versus lexical properties of the surrounding signal to disambiguate speech sounds was examined in listeners with ADHD and neurotypical listeners. Results: Compared to neurotypicals, individuals with ADHD relied more strongly on lexical context for speech perception; however, reliance on acoustic context information from speech rate did not differ. Conclusion: These findings confirm that cognitive load impacts the use of high-level but not low-level context information in speech and imply that speech recognition deficits in ADHD likely arise due to impaired higher-order cognitive processes.
Affiliation(s)
- Hadeer Derawi
- Department of Special Education, University of Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Israel
- Eva Reinisch
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
- Yafit Gabay
- Department of Special Education, University of Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Israel
8. Gabay Y, Reinisch E, Even D, Binur N, Hadad BS. Intact Utilization of Contextual Information in Speech Categorization in Autism. J Autism Dev Disord 2023. PMID: 37787847; DOI: 10.1007/s10803-023-06106-3.
Abstract
Current theories of Autism Spectrum Disorder (ASD) suggest atypical use of context in ASD, but little is known about how these atypicalities influence speech perception. We examined the influence of contextual information (lexical, spectral, and temporal) on phoneme categorization of people with ASD and in typically developed (TD) people. Across three experiments, we found that people with ASD used all types of contextual information for disambiguating speech sounds to the same extent as TD; yet they exhibited a shallower identification curve when phoneme categorization required temporal processing. Overall, the results suggest that the observed atypicalities in speech perception in ASD, including the reduced sensitivity observed here, cannot be attributed merely to the limited ability to utilize context during speech perception.
Affiliation(s)
- Yafit Gabay
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Eva Reinisch
- Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, Vienna, 1040, Austria
- Dana Even
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Nahal Binur
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Bat-Sheva Hadad
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
9. Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023;166:377-424. PMID: 37506665; DOI: 10.1016/j.cortex.2023.05.003.
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic, signal normalization, (2) changes in/selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that, for the first time, implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that, at the level of data analysis presently employed by most studies in the field, the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
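The claim that signature results do not distinguish the three mechanisms can be illustrated with a toy two-category model: shifting the incoming cue (normalization), shifting both category representations, or biasing the post-perceptual decision can yield identical identification curves. A minimal sketch under simplifying assumptions (one acoustic cue, equal-variance Gaussian categories); this is illustrative only, not the ASP framework's implementation:

```python
import numpy as np

def p_category_b(cue, shift=0.0, mode="normalize", mu_a=0.0, mu_b=1.0, sd=0.5):
    """P(category B) for a 1-D cue under three hypothetical adaptation mechanisms:
    'normalize' - pre-linguistic shift of the incoming cue,
    'represent' - shift of both category means,
    'decide'    - bias added to the post-perceptual decision (log-odds)."""
    if mode == "normalize":
        cue = cue - shift
    elif mode == "represent":
        mu_a, mu_b = mu_a + shift, mu_b + shift
    # log-odds of B over A for equal-variance Gaussian categories
    log_odds = ((cue - mu_a) ** 2 - (cue - mu_b) ** 2) / (2 * sd ** 2)
    if mode == "decide":
        log_odds -= shift * (mu_b - mu_a) / sd ** 2
    return 1.0 / (1.0 + np.exp(-log_odds))

cues = np.linspace(-1.0, 2.0, 7)
curves = {m: p_category_b(cues, shift=0.3, mode=m)
          for m in ("normalize", "represent", "decide")}
```

All three curves coincide, so an identification function alone cannot tell the mechanisms apart; richer data and analyses are needed, which is the point the abstract makes.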
Affiliation(s)
- Xin Xie
- Language Science, University of California, Irvine, USA.
- T Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
10. Liu W, Wang T, Huang X. The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation? J Acoust Soc Am 2023;154:1903-1920. PMID: 37756574; DOI: 10.1121/10.0021077.
Abstract
The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.
Affiliation(s)
- Wenli Liu
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Tianyu Wang
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Xianjun Huang
- School of Psychology, Capital Normal University, 105 North West 3rd Ring Road, Beijing 100048, China
11. Stilp C, Chodroff E. "Please say what this word is": Linguistic experience and acoustic context interact in vowel categorization. JASA Express Lett 2023;3:085203. PMID: 37555773; PMCID: PMC10416185; DOI: 10.1121/10.0020558.
Abstract
Ladefoged and Broadbent [(1957). J. Acoust. Soc. Am. 29(1), 98-104] is a foundational study in speech perception research, demonstrating that acoustic properties of earlier sounds alter perception of subsequent sounds: a context sentence with a lowered first formant (F1) frequency promotes perception of a raised F1 in a target word, and vice versa. The present study replicated the original with U.K. and U.S. listeners. While the direction of the perceptual shift was consistent with the original study, neither sample replicated the large effect sizes. This invites consideration of how linguistic experience relates to the magnitudes of these context effects.
Affiliation(s)
- Christian Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Eleanor Chodroff
- Department of Language and Linguistic Science, University of York, York, YO10 5DD, United Kingdom
12. Persson A, Jaeger TF. Evaluating normalization accounts against the dense vowel space of Central Swedish. Front Psychol 2023;14:1165742. PMID: 37416548; PMCID: PMC10322199; DOI: 10.3389/fpsyg.2023.1165742.
Abstract
Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general-purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in predicted consequences for perception. The results indicate that the best-performing accounts either center or standardize formants by talker. The study also suggests that general-purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.
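"Standardize formants by talker" corresponds to Lobanov-style normalization: z-scoring each formant within a talker. A minimal sketch with invented formant values (the function name and data are illustrative, not from the study's database):

```python
import numpy as np

def standardize_by_talker(formants_by_talker):
    """Z-score each formant column (e.g., F1, F2) within each talker."""
    normalized = {}
    for talker, tokens in formants_by_talker.items():
        F = np.asarray(tokens, dtype=float)   # rows: vowel tokens; columns: formants
        normalized[talker] = (F - F.mean(axis=0)) / F.std(axis=0)
    return normalized

# Hypothetical F1/F2 values (Hz) for two talkers with different vocal tract lengths
raw = {
    "talker_a": [[300, 2200], [600, 1700], [750, 1100]],
    "talker_b": [[380, 2600], [720, 2050], [900, 1350]],
}
norm = standardize_by_talker(raw)
```

After normalization each talker's formants have mean 0 and standard deviation 1, removing talker-specific offset and scale; "centering" accounts subtract the talker mean but skip the division.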
Affiliation(s)
- Anna Persson
- Department of Swedish Language and Multilingualism, Stockholm University, Stockholm, Sweden
- T. Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States
- Computer Science, University of Rochester, Rochester, NY, United States
13. Kahloon L, Shorey AE, King CJ, Stilp CE. Clear speech promotes speaking rate normalization. JASA Express Lett 2023;3:055205. PMID: 37219432; PMCID: PMC11303017; DOI: 10.1121/10.0019499.
Abstract
When speaking in noisy conditions or to a hearing-impaired listener, talkers often use clear speech, which is typically slower than conversational speech. In other research, changes in speaking rate affect speech perception through speaking rate normalization: Slower context sounds encourage perception of subsequent sounds as faster, and vice versa. Here, on each trial, listeners heard a context sentence before the target word (which varied from "deer" to "tier"). Clear and slowed conversational context sentences elicited more "deer" responses than conversational sentences, consistent with rate normalization. Changing speaking styles aids speech intelligibility but might also produce other outcomes that alter sound/word recognition.
Affiliation(s)
- Lilah Kahloon
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
14. Shorey AE, Stilp CE. Short-term, not long-term, average spectra of preceding sentences bias consonant categorization. J Acoust Soc Am 2023;153:2426. PMID: 37092945; PMCID: PMC10119874; DOI: 10.1121/10.0017862.
Abstract
Speech sound perception is influenced by the spectral properties of surrounding sounds. For example, listeners perceive /g/ (lower F3 onset) more often after sounds with prominent high-F3 frequencies and perceive /d/ (higher F3 onset) more often after sounds with prominent low-F3 frequencies. These biases are known as spectral contrast effects (SCEs). Much of this work examined differences between long-term average spectra (LTAS) of preceding sounds and target speech sounds. Post hoc analyses by Stilp and Assgari [(2021) Atten. Percept. Psychophys. 83(6) 2694-2708] revealed that spectra of the last 475 ms of precursor sentences, not the entire LTAS, best predicted biases in consonant categorization. Here, the influences of proximal (last 500 ms) versus distal (before the last 500 ms) portions of precursor sentences on subsequent consonant categorization were compared. Sentences emphasized different frequency regions in each temporal window (e.g., distal low-F3 emphasis, proximal high-F3 emphasis, and vice versa) naturally or via filtering. In both cases, shifts in consonant categorization were produced in accordance with spectral properties of the proximal window. This was replicated when the distal window did not emphasize either frequency region, but the proximal window did. Results endorse closer consideration of patterns of spectral energy over time in preceding sounds, not just their LTAS.
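The proximal/distal comparison rests on computing an average spectrum over a time window of the precursor rather than over the whole sentence (the LTAS). A minimal sketch of such windowed spectral analysis; the placeholder sine wave stands in for a precursor sentence, and this is not the authors' analysis code:

```python
import numpy as np

def window_spectrum(signal, sr, window_s=0.5, proximal=True):
    """Magnitude spectrum of the last `window_s` seconds (proximal) or of
    everything before that window (distal)."""
    n = int(window_s * sr)
    seg = signal[-n:] if proximal else signal[:-n]
    spec = np.abs(np.fft.rfft(seg * np.hanning(len(seg))))
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / sr)
    return freqs, spec

sr = 16000
t = np.arange(0, 2.0, 1.0 / sr)
precursor = np.sin(2 * np.pi * 440.0 * t)          # placeholder "sentence"
freqs, spec = window_spectrum(precursor, sr)        # proximal (last 500 ms) spectrum
```

Comparing proximal and distal spectra of a real precursor would show which frequency region each window emphasizes, which is what predicts the direction of the spectral contrast effect here.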
Affiliation(s)
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
15. Reinisch E, Bosker HR. Encoding speech rate in challenging listening conditions: White noise and reverberation. Atten Percept Psychophys 2022;84:2303-2318. PMID: 35996057; PMCID: PMC9481500; DOI: 10.3758/s13414-022-02554-8.
Abstract
Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often referred to as "rate-dependent speech perception," has been suggested to be the result of a robust, low-level perceptual process, typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening conditions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to a clear listening condition. Specifically, we tested effects of white noise and reverberation, with the latter specifically distorting temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations, eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study contributes towards understanding the consequences of different types of listening environments on the functioning of low-level perceptual processes that listeners use during speech perception.
Affiliation(s)
- Eva Reinisch: Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, 1040 Vienna, Austria
- Hans Rutger Bosker: Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands

16
Andreeva IG, Ogorodnikova EA. Auditory Adaptation to Speech Signal Characteristics. J Evol Biochem Physiol 2022. [DOI: 10.1134/s0022093022050027]

17
Mills HE, Shorey AE, Theodore RM, Stilp CE. Context effects in perception of vowels differentiated by F1 are not influenced by variability in talkers' mean F1 or F3. J Acoust Soc Am 2022; 152:55. [PMID: 35931547] [DOI: 10.1121/10.0011920]
Abstract
Spectral properties of earlier sounds (context) influence recognition of later sounds (target). Acoustic variability in context stimuli can disrupt this process. When mean fundamental frequencies (f0's) of preceding context sentences were highly variable across trials, shifts in target vowel categorization [due to spectral contrast effects (SCEs)] were smaller than when sentence mean f0's were less variable; when sentences were rearranged to exhibit high or low variability in mean first formant frequencies (F1) in a given block, SCE magnitudes were equivalent [Assgari, Theodore, and Stilp (2019) J. Acoust. Soc. Am. 145(3), 1443-1454]. However, since sentences were originally chosen based on variability in mean f0, stimuli underrepresented the extent to which mean F1 could vary. Here, target vowels (/ɪ/-/ɛ/) were categorized following context sentences that varied substantially in mean F1 (experiment 1) or mean F3 (experiment 2) with variability in mean f0 held constant. In experiment 1, SCE magnitudes were equivalent whether context sentences had high or low variability in mean F1; the same pattern was observed in experiment 2 for new sentences with high or low variability in mean F3. Variability in some acoustic properties (mean f0) can be more perceptually consequential than others (mean F1, mean F3), but these results may be task-dependent.
Affiliation(s)
- Hannah E Mills: Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Anya E Shorey: Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Rachel M Theodore: Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269, USA
- Christian E Stilp: Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA

18
Yoon YS. Effect of the Target and Conflicting Frequency and Time Ranges on Consonant Enhancement in Normal-Hearing Listeners. Front Psychol 2021; 12:733100. [PMID: 34867614] [PMCID: PMC8634346] [DOI: 10.3389/fpsyg.2021.733100]
Abstract
In this paper, the effects of intensifying useful frequency and time regions (target frequency and time ranges) and of removing detrimental frequency and time regions (conflicting frequency and time ranges) on consonant enhancement were determined. Thirteen normal-hearing (NH) listeners participated in two experiments. In the first experiment, the target and conflicting frequency and time ranges for each consonant were identified under a quiet, dichotic listening condition by analyzing consonant confusion matrices. The target frequency range was defined as the frequency range that provided the highest performance and was decreased 40% from the peak performance from both high-pass filtering (HPF) and low-pass filtering (LPF) schemes. The conflicting frequency range was defined as the frequency range that yielded the peak errors of the most confused consonants and was 20% less than the peak error from both filtering schemes. The target time range was defined as a consonant segment that provided the highest performance and was decreased 40% from that peak performance when the duration of the consonant was systematically truncated from the onset. The conflicting time ranges were defined to coincide with the target time ranges because, if they temporally coincide, the conflicting frequency ranges would be the most detrimental factor affecting the target frequency ranges. In the second experiment, consonant recognition was binaurally measured in noise under three signal processing conditions: unprocessed, intensified target ranges by a 6-dB gain (target), and combined intensified target and removed conflicting ranges (target-conflicting). The results showed that consonant recognition improved significantly with the target condition but greatly deteriorated with the target-conflicting condition. The target condition helped transmit voicing and manner cues while the target-conflicting condition limited the transmission of these cues.
Confusion analyses showed that the effect of the signal processing on consonant improvement was consonant-specific: the unprocessed condition was the best for /da, pa, ma, sa/; the target condition was the best for /ga, fa, va, za, ʒa/; and the target-conflicting condition was the best for /na, ʃa/. Perception of /ba, ta, ka/ was independent of the signal processing. The results suggest that enhancing the target ranges is an efficient way to improve consonant recognition while the removal of conflicting ranges negatively impacts consonant recognition.
Affiliation(s)
- Yang-Soo Yoon: Laboratory of Translational Auditory Research, Department of Communication Sciences and Disorders, Baylor University, Waco, TX, United States

19
Tao R, Zhang K, Peng G. Music Does Not Facilitate Lexical Tone Normalization: A Speech-Specific Perceptual Process. Front Psychol 2021; 12:717110. [PMID: 34777097] [PMCID: PMC8585521] [DOI: 10.3389/fpsyg.2021.717110]
Abstract
Listeners utilize the immediate contexts to efficiently normalize variable vocal streams into standard phonological units. However, researchers have debated whether non-speech contexts can also serve as valid cues for speech normalization. Supporters of the two sides proposed a general-auditory hypothesis and a speech-specific hypothesis to explain the underlying mechanisms. A possible confounding factor in this inconsistency is the listeners' perceptual familiarity with the contexts, as the non-speech contexts were perceptually unfamiliar to listeners. In this study, we examined this confounding factor by recruiting a group of native Cantonese speakers with sufficient musical training experience and a control group with minimal musical training. Participants performed lexical tone judgment tasks in three contextual conditions, i.e., speech, non-speech, and music context conditions. Both groups were familiar with the speech context and unfamiliar with the non-speech context. The musician group was more familiar with the music context than the non-musician group. The results evidenced the lexical tone normalization process in the speech context but not in the non-speech or music contexts. More importantly, musicians did not outperform non-musicians in any contextual condition even though the musicians were experienced at pitch perception, indicating that there is no noticeable transfer in pitch perception from the music domain to the linguistic domain for tonal language speakers. The findings showed that even high familiarity with a non-linguistic context cannot elicit an effective lexical tone normalization process, supporting the speech-specific basis of the perceptual normalization process.
Affiliation(s)
- Gang Peng: Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China

20
Stilp CE. Parameterizing spectral contrast effects in vowel categorization using noise contexts. J Acoust Soc Am 2021; 150:2806. [PMID: 34717452] [DOI: 10.1121/10.0006657]
Abstract
When spectra differ between earlier (context) and later (target) sounds, listeners perceive larger spectral changes than are physically present. When context sounds (e.g., a sentence) possess relatively higher frequencies, the target sound (e.g., a vowel sound) is perceived as possessing relatively lower frequencies, and vice versa. These spectral contrast effects (SCEs) are pervasive in auditory perception, but studies traditionally employed contexts with high spectrotemporal variability that made it difficult to understand exactly when context spectral properties biased perception. Here, contexts were speech-shaped noise divided into four consecutive 500-ms epochs. Contexts were filtered to amplify low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequencies to encourage target perception of /ɛ/ ("bet") or /ɪ/ ("bit"), respectively, via SCEs. Spectral peaks in the context ranged from its initial epoch(s) to its entire duration (onset paradigm), ranged from its final epoch(s) to its entire duration (offset paradigm), or were present for only one epoch (single paradigm). SCE magnitudes increased as spectral-peak durations increased and/or occurred later in the context (closer to the target). Contrary to predictions, brief early spectral peaks still biased subsequent target categorization. Results are compared to related experiments using speech contexts, and physiological and/or psychoacoustic idiosyncrasies of the noise contexts are considered.
Affiliation(s)
- Christian E Stilp: Department of Psychological and Brain Sciences, 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA

21
Bhandari P, Demberg V, Kray J. Semantic Predictability Facilitates Comprehension of Degraded Speech in a Graded Manner. Front Psychol 2021; 12:714485. [PMID: 34566795] [PMCID: PMC8459870] [DOI: 10.3389/fpsyg.2021.714485]
Abstract
Previous studies have shown that at moderate levels of spectral degradation, semantic predictability facilitates language comprehension. It is argued that when speech is degraded, listeners have narrowed expectations about the sentence endings; i.e., semantic prediction may be limited to only the most highly predictable sentence completions. The main objectives of this study were to (i) examine whether listeners form narrowed expectations or whether they form predictions across a wide range of probable sentence endings, (ii) assess whether the facilitatory effect of semantic predictability is modulated by perceptual adaptation to degraded speech, and (iii) use and establish a sensitive metric for the measurement of language comprehension. For this, we created 360 German Subject-Verb-Object sentences that varied in semantic predictability of a sentence-final target word in a graded manner (high, medium, and low) and in levels of spectral degradation (1, 4, 6, and 8 channels of noise-vocoding). These sentences were presented auditorily to two groups: One group (n = 48) performed a listening task in an unpredictable channel context in which the degraded speech levels were randomized, while the other group (n = 50) performed the task in a predictable channel context in which the degraded speech levels were blocked. The results showed that at 4 channels of noise-vocoding, response accuracy was higher in high-predictability sentences than in medium-predictability sentences, which in turn was higher than in low-predictability sentences. This suggests that, in contrast to the narrowed expectations view, comprehension of moderately degraded speech is facilitated in a graded manner across low-, medium-, and high-predictability sentences; listeners probabilistically preactivate upcoming words from a wide range of semantic space, not limiting themselves to highly probable sentence endings.
Additionally, in both channel contexts, we did not observe learning effects; i.e., response accuracy did not increase over the course of experiment, and response accuracy was higher in the predictable than in the unpredictable channel context. We speculate from these observations that when there is no trial-by-trial variation of the levels of speech degradation, listeners adapt to speech quality at a long timescale; however, when there is a trial-by-trial variation of the high-level semantic feature (e.g., sentence predictability), listeners do not adapt to low-level perceptual property (e.g., speech quality) at a short timescale.
Affiliation(s)
- Pratik Bhandari: Department of Psychology, Saarland University, Saarbrücken, Germany; Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Vera Demberg: Department of Language Science and Technology, Saarland University, Saarbrücken, Germany; Department of Computer Science, Saarland University, Saarbrücken, Germany
- Jutta Kray: Department of Psychology, Saarland University, Saarbrücken, Germany

22
Rhythmic and speech rate effects in the perception of durational cues. Atten Percept Psychophys 2021; 83:3162-3182. [PMID: 34254267] [DOI: 10.3758/s13414-021-02334-w]
Abstract
Listeners' perception of temporal contrasts in spoken language is highly sensitive to contextual information, such as variation in speech rate. The present study tests how rate-dependent perception is also mediated by distal (i.e., temporally removed) rhythmic patterns. In four experiments the role of rhythmic alternations and their interaction with speech rate effects are tested. Experiment 1 shows proximal speech rate (contrast) effects obtain based on changes in local context. Experiment 2 shows that these effects disappear with the addition of distal rhythmic alternations, indicating that rhythmic grouping shifts listeners' perception, even when proximal context conflicts. Experiments 3 and 4 explore how orthogonal variation in overall speech rate impacts these effects and finds that trial-to-trial (i.e., global) speech rate variation eliminates rhythmic grouping effects, both with and without variation in proximal (immediately preceding) context. Together, these results suggest a role for rhythmic patterning in listeners' processing of durational cues in speech, which interacts in various ways with proximal, distal, and global rate contexts.

23
Mohan D, Maruthy S. Vowel Context Effect on the Perception of Stop Consonants in Malayalam and Its Role in Determining Syllable Frequency. J Audiol Otol 2021; 25:124-130. [PMID: 34185978] [PMCID: PMC8311060] [DOI: 10.7874/jao.2021.00087]
Abstract
Background and Objectives: The study investigated vowel context effects on the perception of stop consonants in Malayalam. It also probed the role of vowel context effects in determining the frequency of occurrence of various consonant-vowel (CV) syllables in Malayalam.
Subjects and Methods: The study used a cross-sectional pre-experimental post-test-only research design with 30 individuals with normal hearing who were native speakers of Malayalam. The stimuli included three stop consonants, each spoken in three different vowel contexts. The resultant nine syllables were presented in original form and in five gating conditions. The participants' consonant recognition in the different vowel contexts was assessed. The frequency of occurrence of the nine target syllables in the spoken corpus of Malayalam was also systematically derived.
Results: The consonant recognition score was better in the /u/ vowel context than in the /i/ and /a/ contexts. The frequency of occurrence of the target syllables derived from the spoken corpus of Malayalam showed that the three stop consonants occurred more frequently with the vowel /a/ than with /u/ and /i/.
Conclusions: The findings show a definite vowel context effect on the perception of Malayalam stop consonants, and this context effect differs from that observed in other languages. Stop consonants are perceived better in the context of /u/ than in the /a/ and /i/ contexts. Furthermore, the vowel context effects do not appear to determine the frequency of occurrence of different CV syllables in Malayalam.
Affiliation(s)
- Dhanya Mohan: Department of Audiology and Speech-Language Pathology, Baby Memorial College of Allied Medical Sciences, Kozhikode, India
- Sandeep Maruthy: Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, Mysuru, India

24
Contributions of natural signal statistics to spectral context effects in consonant categorization. Atten Percept Psychophys 2021; 83:2694-2708. [PMID: 33987821] [DOI: 10.3758/s13414-021-02310-4]
Abstract
Speech perception, like all perception, takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., a sentence with more energy at lower third formant [F3] frequencies) differs from that of a later (target) sound (e.g., consonant with intermediate F3 onset frequency), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F3-onset /d/). Historically, these studies used filters to force context stimuli to possess certain spectral compositions. Recently, these effects were produced using unfiltered context sounds that already possessed the desired spectral compositions (Stilp & Assgari, 2019, Attention, Perception, & Psychophysics, 81, 2037-2052). Here, this natural signal statistics approach is extended to consonant categorization (/g/-/d/). Context sentences were either unfiltered (already possessing the desired spectral composition) or filtered (to imbue specific spectral characteristics). Long-term spectral characteristics of unfiltered contexts were poor predictors of shifts in consonant categorization, but short-term characteristics (last 475 ms) were excellent predictors. This diverges from vowel data, where long-term and shorter-term intervals (last 1,000 ms) were equally strong predictors. Thus, time scale plays a critical role in how listeners attune to signal statistics in the acoustic environment.

25
Fogerty D, Sevich VA, Healy EW. Spectro-temporal glimpsing of speech in noise: Regularity and coherence of masking patterns reduces uncertainty and increases intelligibility. J Acoust Soc Am 2020; 148:1552. [PMID: 33003879] [PMCID: PMC7500957] [DOI: 10.1121/10.0001971]
Abstract
Adverse listening conditions involve glimpses of spectro-temporal speech information. This study investigated if the acoustic organization of the spectro-temporal masking pattern affects speech glimpsing in "checkerboard" noise. The regularity and coherence of the masking pattern was varied. Regularity was reduced by randomizing the spectral or temporal gating of the masking noise. Coherence involved the spectral alignment of frequency bands across time or the temporal alignment of gated onsets/offsets across frequency bands. Experiment 1 investigated the effect of spectral or temporal coherence. Experiment 2 investigated independent and combined factors of regularity and coherence. Performance was best in spectro-temporally modulated noise having larger glimpses. Generally, performance also improved as the regularity and coherence of masker fluctuations increased, with regularity having a stronger effect than coherence. An acoustic glimpsing model suggested that the effect of regularity (but not coherence) could be partially attributed to the availability of glimpses retained after energetic masking. Performance tended to be better with maskers that were spectrally coherent as compared to temporally coherent. Overall, performance was best when the spectro-temporal masking pattern imposed even spectral sampling and minimal temporal uncertainty, indicating that listeners use reliable masking patterns to aid in spectro-temporal speech glimpsing.
Affiliation(s)
- Daniel Fogerty: Department of Communication Sciences and Disorders, University of South Carolina, 1705 College Street, Columbia, South Carolina 29208, USA
- Victoria A Sevich: Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA
- Eric W Healy: Department of Speech and Hearing Science, The Ohio State University, 1070 Carmack Road, Columbus, Ohio 43210, USA

27
Stilp CE. Evaluating peripheral versus central contributions to spectral context effects in speech perception. Hear Res 2020; 392:107983. [PMID: 32464456] [DOI: 10.1016/j.heares.2020.107983]
Abstract
Perception of a sound is influenced by spectral properties of surrounding sounds. When frequencies are absent in a preceding acoustic context before being introduced in a subsequent target sound, detection of those frequencies is facilitated via an auditory enhancement effect (EE). When spectral composition differs across a preceding context and subsequent target sound, those differences are perceptually magnified and perception shifts via a spectral contrast effect (SCE). Each effect is thought to receive contributions from peripheral and central neural processing, but the relative contributions are unclear. The present experiments manipulated ear of presentation to elucidate the degrees to which peripheral and central processes contributed to each effect in speech perception. In Experiment 1, EE and SCE magnitudes in consonant categorization were substantially diminished through contralateral presentation of contexts and targets compared to ipsilateral or bilateral presentations. In Experiment 2, spectrally complementary contexts were presented dichotically followed by the target in only one ear. This arrangement was predicted to produce context effects peripherally and cancel them centrally, but the competing contralateral context minimally decreased effect magnitudes. Results confirm peripheral and central contributions to EEs and SCEs in speech perception, but both effects appear to be primarily due to peripheral processing.
Affiliation(s)
- Christian E Stilp: Department of Psychological and Brain Sciences, University of Louisville, Louisville, KY 40292, USA

28
Bosker HR, Sjerps MJ, Reinisch E. Temporal contrast effects in human speech perception are immune to selective attention. Sci Rep 2020; 10:5607. [PMID: 32221376] [PMCID: PMC7101381] [DOI: 10.1038/s41598-020-62613-8]
Abstract
Two fundamental properties of perception are selective attention and perceptual contrast, but how these two processes interact remains unknown. Does an attended stimulus history exert a larger contrastive influence on the perception of a following target than unattended stimuli? Dutch listeners categorized target sounds with a reduced prefix "ge-" marking tense (e.g., ambiguous between gegaan-gaan "gone-go"). In 'single talker' Experiments 1-2, participants perceived the reduced syllable (reporting gegaan) when the target was heard after a fast sentence, but not after a slow sentence (reporting gaan). In 'selective attention' Experiments 3-5, participants listened to two simultaneous sentences from two different talkers, followed by the same target sounds, with instructions to attend only one of the two talkers. Critically, the speech rates of attended and unattended talkers were found to equally influence target perception - even when participants could watch the attended talker speak. In fact, participants' target perception in 'selective attention' Experiments 3-5 did not differ from participants who were explicitly instructed to divide their attention equally across the two talkers (Experiment 6). This suggests that contrast effects of speech rate are immune to selective attention, largely operating prior to attentional stream segregation in the auditory processing hierarchy.
Affiliation(s)
- Hans Rutger Bosker: Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Matthias J Sjerps: Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Eva Reinisch: Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany; Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria

29
Abstract
Perception of sounds occurs in the context of surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, categorization of the later sounds becomes biased through spectral contrast effects (SCEs). Past research has shown that SCEs bias categorization of speech and music alike. Recent studies have extended SCEs to naturalistic listening conditions, in which the inherent spectral composition of (unfiltered) sentences biased speech categorization. Here, we tested whether natural (unfiltered) music would similarly bias categorization of French horn and tenor saxophone targets. Preceding contexts were either solo performances of the French horn or tenor saxophone (unfiltered; 1-second duration in Experiment 1, or 3-second duration in Experiment 2) or a string quintet processed to emphasize frequencies in the horn or saxophone (filtered; 1-second duration). Both approaches could produce SCEs, eliciting more "saxophone" responses following horn or horn-like contexts and vice versa. One-second filtered contexts produced SCEs as in previous studies, but 1-second unfiltered contexts did not. Three-second unfiltered contexts biased perception, but to a lesser degree than filtered contexts did. These results extend SCEs in musical instrument categorization to everyday listening conditions.