1. Viswanathan N, Rinzler A, Kelty-Stephen DG. Compensation for coarticulation despite a midway speaker change: Reassessing effects and implications. PLoS One 2024;19:e0291992. PMID: 38215074; PMCID: PMC10786362; DOI: 10.1371/journal.pone.0291992.
Abstract
Accounts of speech perception disagree on how listeners demonstrate perceptual constancy despite considerable variation in the speech signal due to speakers' coarticulation. According to the spectral contrast account, listeners' compensation for coarticulation (CfC) results from listeners perceiving the target-segment frequencies differently depending on the contrastive effects exerted by the preceding sound's frequencies. In this study, we reexamine a notable finding that listeners apparently demonstrate perceptual adjustments to coarticulation even when the identity of the speaker (i.e., the "source") changes midway between speech segments. We evaluated these apparent across-talker CfC effects on the rationale that such adjustments to coarticulation would likely be maladaptive for perceiving speech in multi-talker settings. In addition, we evaluated whether such cross-talker adaptations, if detected, were modulated by prior experience. We did so by manipulating the exposure phase of three groups of listeners: (a) merely exposing them to our stimuli, (b) explicitly alerting them to the talker change, or (c) implicitly alerting them to this change. All groups then completed identical test blocks in which we assessed their CfC patterns in within- and across-talker conditions. Our results uniformly demonstrated that, while all three groups showed robust CfC shifts in the within-talker conditions, no such shifts were detected in the across-talker condition. Our results call into question a speaker-neutral explanation for CfC. Broadly, this demonstrates the need to carefully examine the perceptual demands placed on listeners in constrained experimental tasks and to evaluate whether the accounts that derive from such settings scale up to the demands of real-world listening.
Affiliation(s)
- Navin Viswanathan
  - Department of Communication Sciences & Disorders, The Pennsylvania State University, State College, Pennsylvania, United States of America
  - Haskins Laboratories, New Haven, Connecticut, United States of America
- Ana Rinzler
  - Department of Psychology, Rutgers University, New Brunswick, New Jersey, United States of America
  - Department of Psychology, State University of New York-New Paltz, New Paltz, New York, United States of America
- Damian G. Kelty-Stephen
  - Department of Psychology, State University of New York-New Paltz, New Paltz, New York, United States of America
2. Liu W, Wang T, Huang X. The influences of forward context on stop-consonant perception: The combined effects of contrast and acoustic cue activation? J Acoust Soc Am 2023;154:1903-1920. PMID: 37756574; DOI: 10.1121/10.0021077.
Abstract
The perception of the /da/-/ga/ series, distinguished primarily by the third formant (F3) transition, is affected by many nonspeech and speech sounds. Previous studies mainly investigated the influences of context stimuli with frequency bands located in the F3 region and proposed the account of spectral contrast effects. This study examined the effects of context stimuli with bands not in the F3 region. The results revealed that these non-F3-region stimuli (whether with bands higher or lower than the F3 region) mainly facilitated the identification of /ga/; for example, the stimuli (including frequency-modulated glides, sine-wave tones, filtered sentences, and natural vowels) in the low-frequency band (500-1500 Hz) led to more /ga/ responses than those in the low-F3 region (1500-2500 Hz). It is suggested that in the F3 region, context stimuli may act through spectral contrast effects, while in non-F3 regions, context stimuli might activate the acoustic cues of /g/ and further facilitate the identification of /ga/. The combination of contrast and acoustic cue effects can explain more results concerning the forward context influences on the perception of the /da/-/ga/ series, including the effects of non-F3-region stimuli and the imbalanced influences of context stimuli on /da/ and /ga/ perception.
Affiliation(s)
- Wenli Liu
  - Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Tianyu Wang
  - Department of Social Psychology, Zhou Enlai School of Government, Nankai University, 38 Tongshuo Road, Tianjin 300350, China
- Xianjun Huang
  - School of Psychology, Capital Normal University, 105 North West 3rd Ring Road, Beijing 100048, China
3. Tao R, Zhang K, Peng G. Music Does Not Facilitate Lexical Tone Normalization: A Speech-Specific Perceptual Process. Front Psychol 2021;12:717110. PMID: 34777097; PMCID: PMC8585521; DOI: 10.3389/fpsyg.2021.717110.
Abstract
Listeners use the immediate context to efficiently normalize variable vocal streams into standard phonological units. However, researchers have debated whether non-speech contexts can also serve as valid cues for speech normalization. Supporters of the two sides have proposed a general-auditory hypothesis and a speech-specific hypothesis to explain the underlying mechanisms. A possible confounding factor in this inconsistency is listeners' perceptual familiarity with the contexts, as the non-speech contexts were perceptually unfamiliar to listeners. In this study, we examined this confounding factor by recruiting a group of native Cantonese speakers with sufficient musical training experience and a control group with minimal musical training. Participants performed lexical tone judgment tasks in three contextual conditions, i.e., speech, non-speech, and music context conditions. Both groups were familiar with the speech context and unfamiliar with the non-speech context. The musician group was more familiar with the music context than the non-musician group. The results evidenced a lexical tone normalization process in the speech context but not in the non-speech or music contexts. More importantly, musicians did not outperform non-musicians in any contextual condition even though the musicians were experienced at pitch perception, indicating that there is no noticeable transfer in pitch perception from the music domain to the linguistic domain for tonal language speakers. The findings show that even high familiarity with a non-linguistic context cannot elicit an effective lexical tone normalization process, supporting the speech-specific basis of the perceptual normalization process.
Affiliation(s)
- Gang Peng
  - Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
4. Feng L, Oxenham AJ. Spectral Contrast Effects Reveal Different Acoustic Cues for Vowel Recognition in Cochlear-Implant Users. Ear Hear 2021;41:990-997. PMID: 31815819; PMCID: PMC7874522; DOI: 10.1097/aud.0000000000000820.
Abstract
OBJECTIVES: The identity of a speech sound can be affected by the spectrum of a preceding stimulus in a contrastive manner. Although such aftereffects are often reduced in people with hearing loss and cochlear implants (CIs), one recent study demonstrated larger spectral contrast effects in CI users than in normal-hearing (NH) listeners. The present study aimed to shed light on this puzzling finding. We hypothesized that poorer spectral resolution leads CI users to rely on different acoustic cues not only to identify speech sounds but also to adapt to the context.

DESIGN: Thirteen postlingually deafened adult CI users and 33 NH participants (listening to either vocoded or unprocessed speech) participated in this study. Psychometric functions were estimated in a vowel categorization task along the /I/ to /ε/ (as in "bit" and "bet") continuum following a context sentence, the long-term average spectrum of which was manipulated at the level of either fine-grained local spectral cues or coarser global spectral cues.

RESULTS: In NH listeners with unprocessed speech, the aftereffect was determined solely by the fine-grained local spectral cues, resulting in a surprising insensitivity to the larger, global spectral cues utilized by CI users. Restricting the spectral resolution available to NH listeners via vocoding resulted in patterns of responses more similar to those found in CI users. However, the size of the contrast aftereffect remained smaller in NH listeners than in CI users.

CONCLUSIONS: Only the spectral contrasts used by listeners contributed to the spectral contrast effects in vowel identification. These results explain why CI users can experience larger-than-normal context effects under specific conditions. The results also suggest that adaptation to new spectral cues can be very rapid for vowel discrimination, but may follow a longer time course to influence spectral contrast effects.
Affiliation(s)
- Lei Feng
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, USA
5. Zhang K, Sjerps MJ, Peng G. Integral perception, but separate processing: The perceptual normalization of lexical tones and vowels. Neuropsychologia 2021;156:107839. PMID: 33798490; DOI: 10.1016/j.neuropsychologia.2021.107839.
Abstract
In tonal languages, speech variability arises in both lexical tone (i.e., suprasegmentally) and vowel quality (segmentally). Listeners can use surrounding speech context to overcome variability in both speech cues, a process known as extrinsic normalization. Although vowels are the main carriers of tones, it is still unknown whether the combined percept (lexical tone and vowel quality) is normalized integrally or in partly separate processes. Here we used electroencephalography (EEG) to investigate the time course of lexical tone normalization and vowel normalization to answer this question. Cantonese adults listened to synthesized three-syllable stimuli in which the identity of a target syllable - ambiguous between high vs. mid-tone (Tone condition) or between /o/ vs. /u/ (Vowel condition) - was dependent on either the tone range (Tone condition) or the formant range (Vowel condition) of the first two syllables. It was observed that the ambiguous tone was more often interpreted as a high-level tone when the context had a relatively low pitch than when it had a high pitch (Tone condition). Similarly, the ambiguous vowel was more often interpreted as /o/ when the context had a relatively low formant range than when it had a relatively high formant range (Vowel condition). These findings show the typical pattern of extrinsic tone and vowel normalization. Importantly, the EEG results of participants showing the contrastive normalization effect demonstrated that the effects of vowel normalization could already be observed within the N2 time window (190-350 ms), while the first reliable effect of lexical tone normalization on cortical processing was observable only from the P3 time window (220-500 ms) onwards. The ERP patterns demonstrate that the contrastive perceptual normalization of lexical tones and that of vowels occur in at least partially separate time windows. This suggests that extrinsic normalization can operate at the level of phonemes and tonemes separately, instead of operating on the whole syllable at once.
Affiliation(s)
- Kaile Zhang
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
- Matthias J Sjerps
  - Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen, 6525 EN, The Netherlands
- Gang Peng
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Shenzhen, 518055, China
6. Sjerps MJ, Fox NP, Johnson K, Chang EF. Speaker-normalized sound representations in the human auditory cortex. Nat Commun 2019;10:2465. PMID: 31165733; PMCID: PMC6549175; DOI: 10.1038/s41467-019-10365-z.
Abstract
The acoustic dimensions that distinguish speech sounds (like the vowel differences in "boot" and "boat") also differentiate speakers' voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners' perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener's perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
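The contrast-enhancement idea invoked in this abstract can be illustrated with a toy model: the effective spectrum of a target is the target spectrum minus a weighted long-term average of the preceding context. The function, weighting constant, and spectra below are illustrative assumptions for exposition, not the model fitted in the study.

```python
import numpy as np

def contrast_enhance(target_spec, context_specs, k=0.5):
    """Toy contrast-enhancement model: subtract a weighted long-term
    average of the preceding context frames from the target spectrum,
    so frequency regions emphasized by the context are attenuated and
    the target's differences from the context are exaggerated."""
    return target_spec - k * np.mean(context_specs, axis=0)

# The same flat target, heard after a low-frequency-heavy context,
# comes out relatively stronger in the high-frequency bins - the
# contrastive shift described behaviorally and neurally above.
target = np.array([1.0, 1.0, 1.0, 1.0])        # flat target spectrum
low_context = np.array([[2.0, 1.5, 0.5, 0.2],  # context frames with
                        [2.2, 1.3, 0.4, 0.3]]) # strong low frequencies
print(contrast_enhance(target, low_context))
```

Under this sketch, a context whose energy sits along the same dimension as the target (here, the low bins; in the study, the vocal tract's lowest resonance) pushes the perceived target away from the context, which is the speaker-dependent response pattern reported.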
Affiliation(s)
- Matthias J Sjerps
  - Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen, 6525 EN, The Netherlands
  - Max Planck Institute for Psycholinguistics, Wundtlaan 1, Nijmegen, 6525 XD, The Netherlands
- Neal P Fox
  - Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA
- Keith Johnson
  - Department of Linguistics, University of California, Berkeley, 1203 Dwinelle Hall #2650, Berkeley, California, 94720, USA
- Edward F Chang
  - Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA
  - Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA
7. Comparing speech and nonspeech context effects across timescales in coarticulatory contexts. Atten Percept Psychophys 2019;80:316-324. PMID: 29134576; DOI: 10.3758/s13414-017-1449-8.
Abstract
Context effects are ubiquitous in speech perception and reflect the ability of human listeners to successfully perceive highly variable speech signals. In the study of how listeners compensate for coarticulatory variability, past studies have used similar effects for speech and for tone analogues of speech as strong support for speech-neutral, general auditory mechanisms of compensation for coarticulation. In this manuscript, we revisit compensation for coarticulation by replacing standard button-press responses with mouse-tracking responses and examining both standard geometric measures of uncertainty and newer information-theoretic measures that separate fast from slow mouse movements. We found that when our analyses were restricted to end-state responses, tone and speech contexts appeared to produce similar effects. However, a more detailed time-course analysis revealed systematic differences between speech and tone contexts, such that listeners' responses to speech contexts, but not to tone contexts, changed across the experimental session. Analyses of the time course of effects within trials using mouse tracking indicated that speech contexts elicited fewer x-position flips but greater area under the curve (AUC) and maximum deviation (MD), and that they did so in the slower portions of mouse-tracking movements. Our results indicate critical differences between the time courses of speech and nonspeech context effects and suggest that general auditory explanations, motivated by their apparent similarity, should be reexamined.
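The trial-level mouse-tracking measures named above can be computed from a cursor trajectory roughly as follows. This is a generic sketch of the standard geometric definitions (MD as the largest perpendicular deviation from the straight start-to-end path, AUC as the deviation integrated along that path, x-flips as horizontal direction reversals), not the authors' analysis code; all names are illustrative.

```python
import numpy as np

def mouse_tracking_measures(xs, ys):
    """Compute common mouse-tracking measures for a single trial.

    xs, ys: cursor coordinates sampled from trial start to response.
    Returns (md, auc, xflips): maximum deviation, area under the
    curve, and the number of x-position flips.
    """
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)

    # Idealized straight path from the first to the last sample.
    start, end = np.array([xs[0], ys[0]]), np.array([xs[-1], ys[-1]])
    line = end - start
    line_len = np.hypot(line[0], line[1])

    # Perpendicular distance of every sample from the straight path
    # (2-D cross product divided by the path length).
    pts = np.stack([xs, ys], axis=1) - start
    dev = np.abs(line[0] * pts[:, 1] - line[1] * pts[:, 0]) / line_len
    md = float(np.max(dev))  # maximum deviation

    # AUC: trapezoidal integral of the deviation over progress
    # along the straight path.
    progress = pts @ line / line_len
    auc = float(np.sum((dev[1:] + dev[:-1]) / 2 * np.diff(progress)))

    # x-flips: sign changes in the horizontal movement direction.
    dx = np.diff(xs)
    dx = dx[dx != 0]
    xflips = int(np.sum(np.sign(dx[1:]) != np.sign(dx[:-1])))

    return md, auc, xflips
```

For example, a trajectory that wavers left-right while heading straight for the response (`xs=[0, 1, 0, 1]`, `ys=[0, 0, 0, 0]`) yields zero MD and AUC but two x-flips, whereas a trajectory that bows away from the straight path without reversing yields positive MD and AUC but zero x-flips.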
8. Viswanathan N, Magnuson JS, Fowler CA. Information for coarticulation: Static signal properties or formant dynamics? J Exp Psychol Hum Percept Perform 2014;40:1228-1236. PMID: 24730744; DOI: 10.1037/a0036214.
Abstract
Perception of a speech segment changes depending on properties of surrounding segments in a phenomenon called compensation for coarticulation (Mann, 1980). The nature of information that drives these perceptual changes is a matter of debate. One account attributes perceptual shifts to low-level auditory system contrast effects based on static portions of the signal (e.g., third formant [F3] center or average frequency; Lotto & Kluender, 1998). An alternative account is that listeners' perceptual shifts result from listeners attuning to the acoustic effects of gestural overlap and that this information for coarticulation is necessarily dynamic (Fowler, 2006). In a pair of experiments, we used sinewave speech precursors to investigate the nature of information for compensation for coarticulation. In Experiment 1, as expected by both accounts, we found that sinewave speech precursors produce shifts in following segments. In Experiment 2, we investigated whether effects in Experiment 1 were driven by static F3 offsets of sinewave speech precursors, or by dynamic relationships among their formants. We temporally reversed F1 and F2 in sinewave precursors, preserving static F3 offset and average F1, F2 and F3 frequencies, but disrupting dynamic formant relationships. Despite having identical F3s, selectively reversed precursors produced effects that were significantly smaller and restricted to only a small portion of the continuum. We conclude that dynamic formant relations rather than static properties of the precursor provide information for compensation for coarticulation.