1. Liu W, Wang T, Huang X. The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation? J Acoust Soc Am 2023;154:1903-1920. PMID: 37756574. DOI: 10.1121/10.0021077.
Abstract
The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.
Affiliation(s)
- Wenli Liu
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Tianyu Wang
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Xianjun Huang
- School of Psychology, Capital Normal University, 105 North West 3rd Ring Road, Beijing 100048, China
2. Shorey AE, Stilp CE. Short-term, not long-term, average spectra of preceding sentences bias consonant categorization. J Acoust Soc Am 2023;153:2426. PMID: 37092945. PMCID: PMC10119874. DOI: 10.1121/10.0017862.
Abstract
Speech sound perception is influenced by the spectral properties of surrounding sounds. For example, listeners perceive /g/ (lower F3 onset) more often after sounds with prominent high-F3 frequencies and perceive /d/ (higher F3 onset) more often after sounds with prominent low-F3 frequencies. These biases are known as spectral contrast effects (SCEs). Much of this work examined differences between long-term average spectra (LTAS) of preceding sounds and target speech sounds. Post hoc analyses by Stilp and Assgari [(2021) Atten. Percept. Psychophys. 83(6) 2694-2708] revealed that spectra of the last 475 ms of precursor sentences, not the entire LTAS, best predicted biases in consonant categorization. Here, the influences of proximal (last 500 ms) versus distal (before the last 500 ms) portions of precursor sentences on subsequent consonant categorization were compared. Sentences emphasized different frequency regions in each temporal window (e.g., distal low-F3 emphasis, proximal high-F3 emphasis, and vice versa) naturally or via filtering. In both cases, shifts in consonant categorization were produced in accordance with spectral properties of the proximal window. This was replicated when the distal window did not emphasize either frequency region, but the proximal window did. Results endorse closer consideration of patterns of spectral energy over time in preceding sounds, not just their LTAS.
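The proximal/distal comparison above hinges on computing average spectra over different temporal windows of a precursor. As a minimal Python sketch of that computation (not the authors' analysis code; the synthetic two-band signal, sampling rate, and FFT size are assumptions for illustration):

```python
import numpy as np

def average_spectrum(signal, fs, n_fft=1024):
    """Mean magnitude spectrum across 50%-overlapping Hann-windowed frames."""
    hop = n_fft // 2
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)

fs = 16000
# Hypothetical precursor: 1.5 s emphasizing 2000 Hz (distal portion),
# then 0.5 s emphasizing 3000 Hz (proximal portion).
t_distal = np.arange(int(1.5 * fs)) / fs
t_prox = np.arange(int(0.5 * fs)) / fs
precursor = np.concatenate([np.sin(2 * np.pi * 2000 * t_distal),
                            np.sin(2 * np.pi * 3000 * t_prox)])

ltas = average_spectrum(precursor, fs)                   # whole precursor
stas = average_spectrum(precursor[-int(0.5 * fs):], fs)  # last 500 ms only

freqs = np.fft.rfftfreq(1024, 1 / fs)
print(freqs[np.argmax(ltas)])  # long-term peak follows the distal band (2000.0 Hz)
print(freqs[np.argmax(stas)])  # short-term peak follows the proximal band (3000.0 Hz)
```

Under the findings above, it is the short-term window's spectral peak, not the LTAS peak, that best predicts the contrastive bias on the following consonant.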
Affiliation(s)
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
3. Stilp CE. Parameterizing spectral contrast effects in vowel categorization using noise contexts. J Acoust Soc Am 2021;150:2806. PMID: 34717452. DOI: 10.1121/10.0006657.
Abstract
When spectra differ between earlier (context) and later (target) sounds, listeners perceive larger spectral changes than are physically present. When context sounds (e.g., a sentence) possess relatively higher frequencies, the target sound (e.g., a vowel sound) is perceived as possessing relatively lower frequencies, and vice versa. These spectral contrast effects (SCEs) are pervasive in auditory perception, but studies traditionally employed contexts with high spectrotemporal variability that made it difficult to understand exactly when context spectral properties biased perception. Here, contexts were speech-shaped noise divided into four consecutive 500-ms epochs. Contexts were filtered to amplify low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequencies to encourage target perception of /ɛ/ ("bet") or /ɪ/ ("bit"), respectively, via SCEs. Spectral peaks in the context ranged from its initial epoch(s) to its entire duration (onset paradigm), ranged from its final epoch(s) to its entire duration (offset paradigm), or were present for only one epoch (single paradigm). SCE magnitudes increased as spectral-peak durations increased and/or occurred later in the context (closer to the target). Contrary to predictions, brief early spectral peaks still biased subsequent target categorization. Results are compared to related experiments using speech contexts, and physiological and/or psychoacoustic idiosyncrasies of the noise contexts are considered.
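Imposing a spectral peak on a context sound by amplifying a frequency band, as in the filtering described above, can be sketched as follows. This is an illustrative reconstruction under assumed parameters (filter order, white-noise context, measurement band), not the study's stimulus-generation code:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def amplify_band(signal, fs, lo, hi, gain_db):
    """Boost the [lo, hi] Hz band by gain_db, leaving the rest untouched."""
    sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
    band = sosfiltfilt(sos, signal)  # zero-phase band extraction
    return signal + (10 ** (gain_db / 20) - 1) * band

fs = 16000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)                    # 1 s of white-noise context
boosted = amplify_band(noise, fs, 100, 400, 20.0)  # low-F1 peak, +20 dB

def band_power(x, fs, lo, hi):
    spec = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), 1 / fs)
    return spec[(f >= lo) & (f <= hi)].sum()

ratio_db = 10 * np.log10(band_power(boosted, fs, 150, 350)
                         / band_power(noise, fs, 150, 350))
print(round(ratio_db, 1))  # close to the nominal +20 dB inside the band
```

Swapping the band edges to 550-850 Hz would create the high-F1 peak that, per the paradigm above, biases target perception toward /ɪ/ instead.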
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA
4. Stilp CE, Assgari AA. Contributions of natural signal statistics to spectral context effects in consonant categorization. Atten Percept Psychophys 2021;83:2694-2708. PMID: 33987821. DOI: 10.3758/s13414-021-02310-4.
Abstract
Speech perception, like all perception, takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., a sentence with more energy at lower third formant [F3] frequencies) differs from that of a later (target) sound (e.g., consonant with intermediate F3 onset frequency), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F3-onset /d/). Historically, these studies used filters to force context stimuli to possess certain spectral compositions. Recently, these effects were produced using unfiltered context sounds that already possessed the desired spectral compositions (Stilp & Assgari, 2019, Attention, Perception, & Psychophysics, 81, 2037-2052). Here, this natural signal statistics approach is extended to consonant categorization (/g/-/d/). Context sentences were either unfiltered (already possessing the desired spectral composition) or filtered (to imbue specific spectral characteristics). Long-term spectral characteristics of unfiltered contexts were poor predictors of shifts in consonant categorization, but short-term characteristics (last 475 ms) were excellent predictors. This diverges from vowel data, where long-term and shorter-term intervals (last 1,000 ms) were equally strong predictors. Thus, time scale plays a critical role in how listeners attune to signal statistics in the acoustic environment.
5. Chodroff E, Wilson C. Acoustic-phonetic and auditory mechanisms of adaptation in the perception of sibilant fricatives. Atten Percept Psychophys 2020;82:2027-2048. PMID: 31875314. PMCID: PMC7297833. DOI: 10.3758/s13414-019-01894-2.
Abstract
Listeners are highly proficient at adapting to contextual variation when perceiving speech. In the present study, we examined the effects of brief speech and nonspeech contexts on the perception of sibilant fricatives. We explored three theoretically motivated accounts of contextual adaptation, based on phonetic cue calibration, phonetic covariation, and auditory contrast. Under the cue calibration account, listeners adapt by estimating a talker-specific average for each phonetic cue or dimension; under the cue covariation account, listeners adapt by exploiting consistencies in how the realization of speech sounds varies across talkers; under the auditory contrast account, adaptation results from (partial) masking of spectral components that are shared by adjacent stimuli. The spectral center of gravity, a phonetic cue to fricative identity, was manipulated for several types of context sound: /z/-initial syllables, /v/-initial syllables, and white noise matched in long-term average spectrum (LTAS) to the /z/-initial stimuli. Listeners' perception of the /s/-/ʃ/ contrast was significantly influenced by /z/-initial syllables and LTAS-matched white noise stimuli, but not by /v/-initial syllables. No significant difference in adaptation was observed between exposure to /z/-initial syllables and matched white noise stimuli, and speech did not have a considerable advantage over noise when the two were presented consecutively within a context. The pattern of findings is most consistent with the auditory contrast account of short-term perceptual adaptation. The cue covariation account makes accurate predictions for speech contexts, but not for nonspeech contexts or for the absence of a speech-versus-nonspeech difference.
Affiliation(s)
- Eleanor Chodroff
- Department of Language and Linguistic Science, University of York, Heslington, York, YO10 5DD, UK.
- Colin Wilson
- Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD, 21218, USA
6.
Abstract
Perception of sounds occurs in the context of surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, categorization of later sounds becomes biased through spectral contrast effects (SCEs). Past research has shown SCEs to bias categorization of speech and music alike. Recent studies have extended SCEs to naturalistic listening conditions, in which the inherent spectral composition of (unfiltered) sentences biased speech categorization. Here, we tested whether natural (unfiltered) music would similarly bias categorization of French horn and tenor saxophone targets. Preceding contexts were either solo performances of the French horn or tenor saxophone (unfiltered; 1-s duration in Experiment 1, 3-s duration in Experiment 2) or a string quintet processed to emphasize frequencies in the horn or saxophone (filtered; 1-s duration). Both approaches produced SCEs, eliciting more "saxophone" responses following horn/horn-like contexts and vice versa. One-second filtered contexts produced SCEs as in previous studies, but 1-s unfiltered contexts did not. Three-second unfiltered contexts biased perception, but to a lesser degree than filtered contexts did. These results extend SCEs in musical instrument categorization to everyday listening conditions.
7. Stilp C. Acoustic context effects in speech perception. Wiley Interdiscip Rev Cogn Sci 2019;11:e1517. PMID: 31453667. DOI: 10.1002/wcs.1517.
Abstract
The extreme acoustic variability of speech is well established, which makes the proficiency of human speech perception all the more impressive. Speech perception, like perception in any modality, is relative to context, and this provides a means to normalize the acoustic variability in the speech signal. Acoustic context effects in speech perception have been widely documented, but a clear understanding of how these effects relate to each other across stimuli, timescales, and acoustic domains is lacking. Here we review the influences that spectral context, temporal context, and spectrotemporal context have on speech perception. Studies are organized in terms of whether the context precedes the target (forward effects) or follows it (backward effects), and whether the context is adjacent to the target (proximal) or temporally removed from it (distal). Special cases where proximal and distal contexts have competing influences on perception are also considered. Across studies, a common theme emerges: acoustic differences between contexts and targets are perceptually magnified, producing contrast effects that facilitate perception of target sounds and words. This indicates enhanced sensitivity to changes in the acoustic environment, which maximizes the amount of potential information that can be transmitted to the perceiver. This article is categorized under: Linguistics > Language in Mind and Brain; Psychology > Perception and Psychophysics.
Affiliation(s)
- Christian Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky
8. Stilp CE. Auditory enhancement and spectral contrast effects in speech perception. J Acoust Soc Am 2019;146:1503. PMID: 31472539. DOI: 10.1121/1.5120181.
Abstract
The auditory system is remarkably sensitive to changes in the acoustic environment. This is exemplified by two classic effects of preceding spectral context on perception. In auditory enhancement effects (EEs), the absence and subsequent insertion of a frequency component increases its salience. In spectral contrast effects (SCEs), spectral differences between earlier and later (target) sounds are perceptually magnified, biasing target sound categorization. These effects have been suggested to be related, but have largely been studied separately. Here, EEs and SCEs are demonstrated using the same speech materials. In Experiment 1, listeners categorized vowels (/ɪ/-/ɛ/) or consonants (/d/-/g/) following a sentence processed by a bandpass or bandstop filter (vowel tasks: 100-400 or 550-850 Hz; consonant tasks: 1700-2700 or 2700-3700 Hz). Bandpass filtering produced SCEs and bandstop filtering produced EEs, with effect magnitudes significantly correlated at the individual differences level. In Experiment 2, context sentences were processed by variable-depth notch filters in these frequency regions (-5 to -20 dB). EE magnitudes increased at larger notch depths, growing linearly in consonant categorization. This parallels previous research where SCEs increased linearly for larger spectral peaks in the context sentence. These results link EEs and SCEs, as both shape speech categorization in orderly ways.
Affiliation(s)
- Christian E Stilp
- 317 Life Sciences Building, University of Louisville, Louisville, Kentucky 40292, USA
9. Perceptual sensitivity to spectral properties of earlier sounds during speech categorization. Atten Percept Psychophys 2018;80:1300-1310. PMID: 29492759. DOI: 10.3758/s13414-018-1488-9.
Abstract
Speech perception is heavily influenced by surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, this can produce spectral contrast effects (SCEs) that bias perception of later sounds. For example, when context sounds have more energy in low-F1 frequency regions, listeners report more high-F1 responses to a target vowel, and vice versa. SCEs have been reported using various approaches for a wide range of stimuli, but most often, large spectral peaks were added to the context to bias speech categorization. This obscures the lower limit of perceptual sensitivity to spectral properties of earlier sounds, i.e., when SCEs begin to bias speech categorization. Listeners categorized vowels (/ɪ/-/ɛ/, Experiment 1) or consonants (/d/-/g/, Experiment 2) following a context sentence with little spectral amplification (+1 to +4 dB) in frequency regions known to produce SCEs. In both experiments, +3 and +4 dB amplification in key frequency regions of the context produced SCEs, but lesser amplification was insufficient to bias performance. This establishes a lower limit of perceptual sensitivity where spectral differences across sounds can bias subsequent speech categorization. These results are consistent with proposed adaptation-based mechanisms that potentially underlie SCEs in auditory perception. SIGNIFICANCE STATEMENT: Recent sounds can change what speech sounds we hear later. This can occur when the average frequency composition of earlier sounds differs from that of later sounds, biasing how they are perceived. These "spectral contrast effects" are widely observed when sounds' frequency compositions differ substantially. We reveal the lower limit of these effects, as +3 dB amplification of key frequency regions in earlier sounds was enough to bias categorization of the following vowel or consonant sound. Speech categorization being biased by very small spectral differences across sounds suggests that spectral contrast effects occur frequently in everyday speech perception.
10.
11. Assgari AA, Theodore RM, Stilp CE. Variability in talkers' fundamental frequencies shapes context effects in speech perception. J Acoust Soc Am 2019;145:1443. PMID: 31067942. DOI: 10.1121/1.5093638.
Abstract
The perception of any given sound is influenced by surrounding sounds. When successive sounds differ in their spectral compositions, these differences may be perceptually magnified, resulting in spectral contrast effects (SCEs). For example, listeners are more likely to perceive /ɪ/ (low F1) following sentences with higher F1 frequencies; listeners are also more likely to perceive /ɛ/ (high F1) following sentences with lower F1 frequencies. Previous research showed that SCEs for vowel categorization were attenuated when sentence contexts were spoken by different talkers [Assgari and Stilp. (2015). J. Acoust. Soc. Am. 138(5), 3023-3032], but the locus of this diminished contextual influence was not specified. Here, three experiments examined implications of variable talker acoustics for SCEs in the categorization of /ɪ/ and /ɛ/. The results showed that SCEs were smaller when the mean fundamental frequency (f0) of context sentences was highly variable across talkers compared to when mean f0 was more consistent, even when talker gender was held constant. In contrast, SCE magnitudes were not influenced by variability in mean F1. These findings suggest that talker variability attenuates SCEs due to diminished consistency of f0 as a contextual influence. Connections between these results and talker normalization are considered.
Affiliation(s)
- Ashley A Assgari
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06828, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
12. Musical instrument categorization is highly sensitive to spectral properties of earlier sounds. Atten Percept Psychophys 2019;81:1119-1126. DOI: 10.3758/s13414-019-01675-x.
13. Stilp CE. Acoustic Context Alters Vowel Categorization in Perception of Noise-Vocoded Speech. J Assoc Res Otolaryngol 2017;18:465-481. PMID: 28281035. PMCID: PMC5418160. DOI: 10.1007/s10162-017-0615-y.
Abstract
Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
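Noise vocoding of the kind used above to simulate poor spectral resolution can be outlined as a generic channel vocoder. This is a sketch under assumed parameters (8 log-spaced channels, Hilbert-envelope extraction), not the study's processing chain:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_channels=8, f_lo=100, f_hi=7000):
    """Replace spectral fine structure with noise while preserving
    each channel's amplitude envelope."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    carrier = np.random.default_rng(1).standard_normal(len(signal))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(3, [lo, hi], btype='bandpass', fs=fs, output='sos')
        env = np.abs(hilbert(sosfiltfilt(sos, signal)))  # band envelope
        out += env * sosfiltfilt(sos, carrier)           # envelope-modulated noise
    return out

fs = 16000
t = np.arange(fs) / fs
# Hypothetical input: a 1 kHz tone with a slow 4 Hz amplitude envelope
signal = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
vocoded = noise_vocode(signal, fs)
print(len(vocoded) == len(signal))  # True: duration is preserved
```

Reducing n_channels coarsens spectral resolution, which is the manipulation cochlear-implant simulations of this kind rely on.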
Affiliation(s)
- Christian E Stilp
- University of Louisville, 317 Life Sciences Building, Louisville, KY, 40292, USA.