51. Viswanathan N, Magnuson JS, Fowler CA. Information for coarticulation: Static signal properties or formant dynamics? J Exp Psychol Hum Percept Perform 2014;40:1228-36. PMID: 24730744. DOI: 10.1037/a0036214.
Abstract
Perception of a speech segment changes depending on properties of surrounding segments in a phenomenon called compensation for coarticulation (Mann, 1980). The nature of information that drives these perceptual changes is a matter of debate. One account attributes perceptual shifts to low-level auditory system contrast effects based on static portions of the signal (e.g., third formant [F3] center or average frequency; Lotto & Kluender, 1998). An alternative account is that listeners' perceptual shifts result from listeners attuning to the acoustic effects of gestural overlap and that this information for coarticulation is necessarily dynamic (Fowler, 2006). In a pair of experiments, we used sinewave speech precursors to investigate the nature of information for compensation for coarticulation. In Experiment 1, as expected by both accounts, we found that sinewave speech precursors produce shifts in following segments. In Experiment 2, we investigated whether effects in Experiment 1 were driven by static F3 offsets of sinewave speech precursors, or by dynamic relationships among their formants. We temporally reversed F1 and F2 in sinewave precursors, preserving static F3 offset and average F1, F2 and F3 frequencies, but disrupting dynamic formant relationships. Despite having identical F3s, selectively reversed precursors produced effects that were significantly smaller and restricted to only a small portion of the continuum. We conclude that dynamic formant relations rather than static properties of the precursor provide information for compensation for coarticulation.
53. Apfelbaum KS, Bullock-Rest N, Rhone AE, Jongman A, McMurray B. Contingent categorization in speech perception. Lang Cogn Neurosci 2014;29:1070-1082. PMID: 25157376. PMCID: PMC4141128. DOI: 10.1080/01690965.2013.824995.
Abstract
The speech signal is notoriously variable, with the same phoneme realized differently depending on factors like talker and phonetic context. Variance in the speech signal has led to a proliferation of theories of how listeners recognize speech. A promising approach, supported by computational modeling studies, is contingent categorization, wherein incoming acoustic cues are computed relative to expectations. We tested contingent encoding empirically. Listeners were asked to categorize fricatives in CV syllables constructed by splicing the fricative from one CV syllable with the vowel from another CV syllable. The two spliced syllables always contained the same fricative, providing consistent bottom-up cues; however, on some trials, the vowel and/or talker mismatched between these syllables, giving conflicting contextual information. Listeners were less accurate and slower at identifying the fricatives in mismatching splices. This suggests that listeners rely on context information beyond bottom-up acoustic cues during speech perception, providing support for contingent categorization.
Affiliation(s)
- Keith S. Apfelbaum
- Dept. of Psychology, University of Iowa, E11 SSH, Iowa City, IA 52242, (319)335-0692
- Natasha Bullock-Rest
- Dept. of Psychology, University of Iowa, E11 SSH, Iowa City, IA 52242, (319)335-0692
- Ariane E. Rhone
- Dept. of Neurosurgery, University of Iowa, 1825 JPP, Iowa City, IA 52242, (319)335-7049
- Allard Jongman
- Dept. of Linguistics, University of Kansas, 1541 Lilac Ln., Lawrence, KS 66044, (785)864-2384
- Bob McMurray
- Dept. of Psychology, Dept. of Communication Sciences and Disorders and Delta Center, University of Iowa, E11 SSH, Iowa City, IA 52242, (319)335-2408
54. Zäske R, Skuk VG, Kaufmann JM, Schweinberger SR. Perceiving vocal age and gender: an adaptation approach. Acta Psychol (Amst) 2013;144:583-93. PMID: 24140826. DOI: 10.1016/j.actpsy.2013.09.009.
Abstract
Aftereffects of adaptation have revealed both independent and interactive coding of facial signals including identity and expression or gender and age. By contrast, interactive processing of non-linguistic features in voices has rarely been investigated. Here we studied bidirectional cross-categorical aftereffects of adaptation to vocal age and gender. Prolonged exposure to young (~20yrs) or old (~70yrs) male or female voices biased perception of subsequent test voices away from the adapting age (Exp. 1) and the adapting gender (Exp. 2). Relative to gender-congruent adaptor-test pairings, vocal age aftereffects (VAAEs) were reduced but remained significant when voice gender changed between adaptation and test. This suggests that the VAAE relies on both gender-specific and gender-independent age representations for male and female voices. By contrast, voice gender aftereffects (VGAEs) were not modulated by age-congruency of adaptor and test voices (Exp. 2). Instead, young voice adaptors generally induced larger VGAEs than old voice adaptors. This suggests that young voices are particularly efficient gender adaptors, likely reflecting more pronounced sexual dimorphism in these voices. In sum, our findings demonstrate how high-level processing of vocal age and gender is partially intertwined.
Affiliation(s)
- Romi Zäske
- Department of General Psychology and Cognitive Neuroscience, Friedrich Schiller University of Jena, Germany; DFG Research Unit Person Perception, Friedrich Schiller University of Jena, Germany.
55. Vitela AD, Warner N, Lotto AJ. Perceptual compensation for differences in speaking style. Front Psychol 2013;4:399. PMID: 23847573. PMCID: PMC3698514. DOI: 10.3389/fpsyg.2013.00399.
Abstract
It is well-established that listeners will shift their categorization of a target vowel as a function of acoustic characteristics of a preceding carrier phrase (CP). These results have been interpreted as an example of perceptual normalization for variability resulting from differences in talker anatomy. The present study examined whether listeners would normalize for acoustic variability resulting from differences in speaking style within a single talker. Two vowel series were synthesized that varied between central and peripheral vowels (the vowels in "beat"-"bit" and "bod"-"bud"). Each member of the series was appended to one of four CPs that were spoken in either a "clear" or "reduced" speech style. Participants categorized vowels in these eight contexts. A reliable shift in categorization as a function of speaking style was obtained for three of four phrase sets. This demonstrates that phrase context effects can be obtained with a single talker. However, the directions of the obtained shifts are not reliably predicted on the basis of the speaking style of the talker. Instead, it appears that the effect is determined by an interaction of the average spectrum of the phrase with the target vowel.
Affiliation(s)
- A. Davi Vitela
- Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ, USA
- Natasha Warner
- Department of Linguistics, University of Arizona, Tucson, AZ, USA
- Andrew J. Lotto
- Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ, USA
56. Spectral information in nonspeech contexts influences children's categorization of ambiguous speech sounds. J Exp Child Psychol 2013;116:728-37. PMID: 23827642. DOI: 10.1016/j.jecp.2013.05.008.
Abstract
For both adults and children, acoustic context plays an important role in speech perception. For adults, both speech and nonspeech acoustic contexts influence perception of subsequent speech items, consistent with the argument that effects of context are due to domain-general auditory processes. However, prior research examining the effects of context on children's speech perception has focused on speech contexts; nonspeech contexts have not been explored previously. To better understand the developmental progression of children's use of contexts in speech perception and the mechanisms underlying that development, we created a novel experimental paradigm testing 5-year-old children's speech perception in several acoustic contexts. The results demonstrated that nonspeech context influences children's speech perception, consistent with claims that context effects arise from general auditory system properties rather than speech-specific mechanisms. This supports theoretical accounts of language development suggesting that domain-general processes play a role across the lifespan.
57. Piazza EA, Sweeny TD, Wessel D, Silver MA, Whitney D. Humans use summary statistics to perceive auditory sequences. Psychol Sci 2013;24:1389-97. PMID: 23761928. DOI: 10.1177/0956797612473759.
Abstract
In vision, humans use summary statistics (e.g., the average facial expression of a crowd) to efficiently perceive the gist of groups of features. Here, we present direct evidence that ensemble coding is also important for auditory processing. We found that listeners could accurately estimate the mean frequency of a set of logarithmically spaced pure tones presented in a temporal sequence (Experiment 1). Their performance was severely reduced when only a subset of tones from a given sequence was presented (Experiment 2), which demonstrates that ensemble coding is based on a substantial number of the tones in a sequence. This precise ensemble coding occurred despite very limited representation of individual tones from the sequence: Listeners were poor at identifying specific individual member tones (Experiment 3) and at determining their positions in the sequence (Experiment 4). Together, these results indicate that summary statistical coding is not limited to visual processing and is an important auditory mechanism for extracting ensemble frequency information from sequences of sounds.
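The ensemble statistic described in this abstract, the mean frequency of a set of logarithmically spaced tones, can be made concrete with a short sketch. For log-spaced tones the natural average is the mean of the log frequencies, i.e., a geometric mean. The tone values below are hypothetical illustrations, not the study's stimuli.

```python
import math

# Hypothetical log-spaced tone sequence (equal ratios between neighbors)
tones_hz = [220.0, 277.2, 349.2, 440.0, 554.4, 698.5]

# Average in log-frequency space, then map back to Hz (geometric mean)
log_mean = sum(math.log2(f) for f in tones_hz) / len(tones_hz)
ensemble_mean_hz = 2 ** log_mean
```

Because the tones are equally spaced in log frequency, the ensemble mean lands midway between the endpoints on a log axis rather than at the arithmetic mean.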
Affiliation(s)
- Elise A Piazza
- Vision Science Program, University of California, Berkeley 94720-2020, USA.
58. Viswanathan N, Magnuson JS, Fowler CA. Similar response patterns do not imply identical origins: an energetic masking account of nonspeech effects in compensation for coarticulation. J Exp Psychol Hum Percept Perform 2012;39:1181-92. PMID: 23148469. DOI: 10.1037/a0030735.
Abstract
Nonspeech materials are widely used to identify basic mechanisms underlying speech perception. For instance, they have been used to examine the origin of compensation for coarticulation, the observation that listeners' categorization of phonetic segments depends on neighboring segments (Mann, 1980). Specifically, nonspeech precursors matched to critical formant frequencies of speech precursors have been shown to produce categorization shifts similar to those produced by speech contexts. This observation has been interpreted to mean that spectrally contrastive frequency relations between neighboring segments underlie the categorization shifts observed after speech as well as nonspeech precursors (Lotto & Kluender, 1998). From the gestural perspective, however, categorization shifts in speech contexts occur because of listeners' sensitivity to acoustic information for coarticulatory gestural overlap in production; in nonspeech contexts, this occurs because of energetic masking of acoustic information for gestures. In 2 experiments, we distinguish the energetic masking and spectral contrast accounts. In Experiment 1, we investigated the effects of varying precursor tone frequency on speech categorization. Consistent only with the masking account, tonal effects were greater for frequencies close enough to those in the target syllables for masking to occur. In Experiment 2, we filtered the target stimuli to simulate effects of masking and obtained behavioral outcomes that closely resemble those with nonspeech tones. We conclude that masking provides the more plausible account of nonspeech context effects. More generally, we suggest that similar results from the use of speech and nonspeech materials do not automatically imply identical origins and that the use of nonspeech in speech studies entails careful examination of the nature of information in the nonspeech materials.
Affiliation(s)
- Navin Viswanathan
- Department of Psychology, State University of New York, New Paltz, NY 12561-2440, USA.
59. Perceptual averaging by eye and ear: computing summary statistics from multimodal stimuli. Atten Percept Psychophys 2012;74:810-5. PMID: 22565575. DOI: 10.3758/s13414-012-0293-0.
Abstract
Beyond perceiving the features of individual objects, we also have the intriguing ability to efficiently perceive average values of collections of objects across various dimensions. Over what features can perceptual averaging occur? Work to date has been limited to visual properties, but perceptual experience is intrinsically multimodal. In an initial exploration of how this process operates in multimodal environments, we explored statistical summarizing in audition (averaging pitch from a sequence of tones) and vision (averaging size from a sequence of discs), and their interaction. We observed two primary results. First, not only was auditory averaging robust, but if anything, it was more accurate than visual averaging in the present study. Second, when uncorrelated visual and auditory information were simultaneously present, observers showed little cost for averaging in either modality when they did not know until the end of each trial which average they had to report. These results illustrate that perceptual averaging can span different sensory modalities, and they also illustrate how vision and audition can both cooperate and compete for resources.
60. Wagner M, Shafer VL, Martin B, Steinschneider M. The phonotactic influence on the perception of a consonant cluster /pt/ by native English and native Polish listeners: a behavioral and event related potential (ERP) study. Brain Lang 2012;123:30-41. PMID: 22867752. PMCID: PMC3645296. DOI: 10.1016/j.bandl.2012.06.002.
Abstract
The effect of exposure to the contextual features of the /pt/ cluster was investigated in native-English and native-Polish listeners using behavioral and event-related potential (ERP) methodology. Both groups experience the /pt/ cluster in their languages, but only the Polish group experiences the cluster in the context of word onset examined in the current experiment. The /st/ cluster was used as an experimental control. ERPs were recorded while participants identified the number of syllables in the second word of nonsense word pairs. The results showed that only Polish listeners accurately perceived the /pt/ cluster, and this perception was reflected within a late positive component of the ERP waveform. Furthermore, evidence of discrimination of /pt/ and /pət/ onsets in the neural signal was found even for non-native listeners who could not perceive the difference. These findings suggest that exposure to phoneme sequences in highly specific contexts may be necessary for accurate perception.
Affiliation(s)
- Monica Wagner
- The City University of New York - Graduate School and University Center, Program in Speech-Language-Hearing Sciences, NY 10016, USA.
61. Zhang C, Peng G, Wang WSY. Unequal effects of speech and nonspeech contexts on the perceptual normalization of Cantonese level tones. J Acoust Soc Am 2012;132:1088-1099. PMID: 22894228. DOI: 10.1121/1.4731470.
Abstract
Context is important for recovering language information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the normalization mechanism of inter-talker variability in speaking F0. A further question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examines the effects of four context conditions (speech/nonspeech × F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts change the perception of identical stimuli from mid level tone to low and high level tone, whereas nonspeech contexts only mildly increase the identification preference. These results support a speech-specific mechanism of tone normalization. Moreover, speech context with flattened F0 trajectory, which neutralizes cues of a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception.
Affiliation(s)
- Caicai Zhang
- Language Engineering Laboratory, The Chinese University of Hong Kong, Hong Kong Special Administrative Region.
62. Latinus M, Belin P. Perceptual auditory aftereffects on voice identity using brief vowel stimuli. PLoS One 2012;7:e41384. PMID: 22844469. PMCID: PMC3402520. DOI: 10.1371/journal.pone.0041384.
Abstract
Humans can identify individuals from their voice, suggesting the existence of a perceptual representation of voice identity. We used perceptual aftereffects (shifts in perceived stimulus quality after brief exposure to a repeated adaptor stimulus) to further investigate the representation of voice identity in two experiments. Healthy adult listeners were familiarized with several voices until they reached a recognition criterion. They were then tested on identification tasks that used vowel stimuli generated by morphing between the different identities, presented either in isolation (baseline) or following short exposure to different types of voice adaptors (adaptation). Experiment 1 showed that adaptation to a given voice induced categorization shifts away from that adaptor's identity even when the adaptors consisted of vowels different from the probe stimuli. Moreover, original voices and caricatures resulted in comparable aftereffects, ruling out an explanation of identity aftereffects in terms of adaptation to low-level features. In Experiment 2, adaptors with a disrupted configuration (i.e., altered fundamental frequency or formant frequencies) failed to produce perceptual aftereffects, showing the importance of the preserved configuration of these acoustical cues in the representation of voices. These two experiments indicate a high-level, dynamic representation of voice identity based on the combination of several lower-level acoustical features into a specific voice configuration.
Affiliation(s)
- Marianne Latinus
- Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, United Kingdom.
63. Laing EJC, Liu R, Lotto AJ, Holt LL. Tuned with a Tune: Talker Normalization via General Auditory Processes. Front Psychol 2012;3:203. PMID: 22737140. PMCID: PMC3381219. DOI: 10.3389/fpsyg.2012.00203.
Abstract
Voices have unique acoustic signatures, contributing to the acoustic variability listeners must contend with in perceiving speech, and it has long been proposed that listeners normalize speech perception to information extracted from a talker's speech. Initial attempts to explain talker normalization relied on extraction of articulatory referents, but recent studies of context-dependent auditory perception suggest that general auditory referents such as the long-term average spectrum (LTAS) of a talker's speech similarly affect speech perception. The present study aimed to differentiate the contributions of articulatory/linguistic versus auditory referents for context-driven talker normalization effects and, more specifically, to identify the specific constraints under which such contexts impact speech perception. Synthesized sentences manipulated to sound like different talkers influenced categorization of a subsequent speech target only when differences in the sentences' LTAS were in the frequency range of the acoustic cues relevant for the target phonemic contrast. This effect was true both for speech targets preceded by spoken sentence contexts and for targets preceded by non-speech tone sequences that were LTAS-matched to the spoken sentence contexts. Specific LTAS characteristics, rather than perceived talker, predicted the results suggesting that general auditory mechanisms play an important role in effects considered to be instances of perceptual talker normalization.
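The long-term average spectrum (LTAS) central to this abstract is straightforward to compute: average the magnitude spectra of successive windowed frames of a signal. The sketch below uses a synthetic two-tone signal rather than the study's sentence materials, so the frequencies and frame size are illustrative assumptions.

```python
import numpy as np

# Synthetic 1-second signal: strong 500 Hz component plus weaker 1500 Hz
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

# Slice into fixed-size frames, window each, and average magnitude spectra
frame = 512
frames = signal[: len(signal) // frame * frame].reshape(-1, frame)
spectra = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
ltas = spectra.mean(axis=0)  # average magnitude per frequency bin

# The LTAS peak should sit at the dominant component (500 Hz)
peak_hz = int(np.argmax(ltas)) * sr / frame
```

In the study's logic, a context whose LTAS has energy concentrated near the cue frequencies of the target contrast is the kind of context predicted to shift categorization.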
Affiliation(s)
- Erika J. C. Laing
- Brain Mapping Center, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Ran Liu
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
- Andrew J. Lotto
- Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ, USA
- Lori L. Holt
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
64. Huang J, Holt LL. Listening for the norm: adaptive coding in speech categorization. Front Psychol 2012;3:10. PMID: 22347198. PMCID: PMC3272641. DOI: 10.3389/fpsyg.2012.00010.
Abstract
Perceptual aftereffects have been referred to as "the psychologist's microelectrode" because they can expose dimensions of representation through the residual effect of a context stimulus upon perception of a subsequent target. The present study uses such context-dependence to examine the dimensions of representation involved in a classic demonstration of "talker normalization" in speech perception. Whereas most accounts of talker normalization have emphasized the significance of talker-, speech-, or articulatory-specific dimensions, the present work tests an alternative hypothesis: that the long-term average spectrum (LTAS) of speech context is responsible for patterns of context-dependent perception considered to be evidence for talker normalization. In support of this hypothesis, listeners' vowel categorization was equivalently influenced by speech contexts manipulated to sound as though they were spoken by different talkers and non-speech analogs matched in LTAS to the speech contexts. Since the non-speech contexts did not possess talker, speech, or articulatory information, general perceptual mechanisms are implicated. Results are described in terms of adaptive perceptual coding.
Affiliation(s)
- Jingyuan Huang
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
- Lori L. Holt
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
65. Kidd G, Richards VM, Streeter T, Mason CR, Huang R. Contextual effects in the identification of nonspeech auditory patterns. J Acoust Soc Am 2011;130:3926-38. PMID: 22225048. PMCID: PMC3253596. DOI: 10.1121/1.3658442.
Abstract
This study investigated the benefit of a priori cues in a masked nonspeech pattern identification experiment. Targets were narrowband sequences of tone bursts forming six easily identifiable frequency patterns selected randomly on each trial. The frequency band containing the target was randomized. Maskers were also narrowband sequences of tone bursts chosen randomly on every trial. Targets and maskers were presented monaurally in mutually exclusive frequency bands, producing large amounts of informational masking. Cuing the masker produced a significant improvement in performance, while holding the target frequency band constant provided no benefit. The cue providing the greatest benefit was a copy of the masker presented ipsilaterally before the target-plus-masker. The masker cue presented contralaterally, and a notched-noise cue produced smaller benefits. One possible mechanism underlying these findings is auditory "enhancement" in which the neural response to the target is increased relative to the masker by differential prior stimulation of the target and masker frequency regions. A second possible mechanism provides a benefit to performance by comparing the spectrotemporal correspondence of the cue and target-plus-masker and is effective for either ipsilateral or contralateral cue presentation. These effects improve identification performance by emphasizing spectral contrasts in sequences or streams of sounds.
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
66. Sjerps MJ, Mitterer H, McQueen JM. Listening to different speakers: On the time-course of perceptual compensation for vocal-tract characteristics. Neuropsychologia 2011;49:3831-46. DOI: 10.1016/j.neuropsychologia.2011.09.044.
67. Latinus M, Belin P. Anti-voice adaptation suggests prototype-based coding of voice identity. Front Psychol 2011;2:175. PMID: 21847384. PMCID: PMC3147159. DOI: 10.3389/fpsyg.2011.00175.
Abstract
We used perceptual aftereffects induced by adaptation with anti-voice stimuli to investigate voice identity representations. Participants learned a set of voices and were then tested on a voice identification task with vowel stimuli morphed between identities, after different conditions of adaptation. In Experiment 1, participants chose the identity opposite to the adapting anti-voice significantly more often than the other two identities (e.g., after being adapted to anti-A, they identified the average voice as A). In Experiment 2, participants showed a bias for identities opposite to the adaptor specifically for anti-voice, but not for non-anti-voice adaptors. These results are strikingly similar to adaptation aftereffects observed for facial identity. They are compatible with a representation of individual voice identities in a multidimensional perceptual voice space referenced on a voice prototype.
Affiliation(s)
- Marianne Latinus
- Voice Neurocognition Laboratory, Social Interactions Research Centre, Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
68. Sjerps MJ, Mitterer H, McQueen JM. Constraints on the processes responsible for the extrinsic normalization of vowels. Atten Percept Psychophys 2011;73:1195-215. PMID: 21321794. PMCID: PMC3089724. DOI: 10.3758/s13414-011-0096-8.
Abstract
Listeners tune in to talkers' vowels through extrinsic normalization. We asked here whether this process could be based on compensation for the long-term average spectrum (LTAS) of preceding sounds and whether the mechanisms responsible for normalization are indifferent to the nature of those sounds. If so, normalization should apply to nonspeech stimuli. Previous findings were replicated with first-formant (F1) manipulations of speech. Targets on a [pɪt]-[pɛt] (low-high F1) continuum were labeled as [pɪt] more after high-F1 than after low-F1 precursors. Spectrally rotated nonspeech versions of these materials produced similar normalization. None occurred, however, with nonspeech stimuli that were less speechlike, even though precursor-target LTAS relations were equivalent to those used earlier. Additional experiments investigated the roles of pitch movement, amplitude variation, formant location, and the stimuli's perceived similarity to speech. It appears that normalization is not restricted to speech but that the nature of the preceding sounds does matter. Extrinsic normalization of vowels is due, at least in part, to an auditory process that may require familiarity with the spectrotemporal characteristics of speech.
Affiliation(s)
- Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
69
McMurray B, Jongman A. What information is necessary for speech categorization? Harnessing variability in the speech signal by integrating cues computed relative to expectations. Psychol Rev 2011; 118:219-46. [PMID: 21417542 PMCID: PMC3523696 DOI: 10.1037/a0022325] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the informational assumptions of several models of speech categorization, in particular, the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2,880 fricative productions (Jongman, Wayland, & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values and manipulated the information in the training set to contrast (a) models based on a small number of invariant cues, (b) models using all cues without compensation, and (c) models in which cues underwent compensation for contextual factors. Compensation was modeled by computing cues relative to expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved a similar accuracy to listeners and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed.
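The core of the C-CuRE idea described above, computing cues relative to expectations, can be illustrated with a minimal sketch. This is not the authors' model (which used 24 measured cues and logistic regression); it is a hypothetical two-talker example showing how subtracting a context-conditional expectation (here, each talker's mean) turns an ambiguous raw cue into a residual that separates categories while preserving fine-grained detail.

```python
# Minimal sketch of the C-CuRE principle (illustrative, not the
# published model): re-express each raw cue value as its deviation
# from the expected value for the current context (here, the talker),
# then categorize on the residual.

from collections import defaultdict

def relative_cues(observations):
    """observations: list of (talker, cue_value, category) tuples.
    Returns the same tuples with each talker's mean cue removed."""
    sums, counts = defaultdict(float), defaultdict(int)
    for talker, cue, _ in observations:
        sums[talker] += cue
        counts[talker] += 1
    means = {t: sums[t] / counts[t] for t in sums}
    return [(t, cue - means[t], cat) for t, cue, cat in observations]

# Hypothetical cue values: talker B's /s/ sits exactly where talker A's
# /sh/ does, so the raw cue is ambiguous across talkers.
data = [("A", 6500.0, "s"), ("A", 4500.0, "sh"),
        ("B", 4500.0, "s"), ("B", 2500.0, "sh")]

for talker, residual, category in relative_cues(data):
    # After removing each talker's mean, the sign of the residual
    # separates the categories for both talkers.
    print(talker, residual, category)
```

In the sketch, the raw value 4500.0 maps to different categories depending on the talker, but the residual (+1000 vs. -1000) classifies both talkers' productions with a single boundary at zero, which is the sense in which compensation "overcomes the variability in speech."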
Affiliation(s)
- Bob McMurray
- Department of Psychology, University of Iowa, Iowa City, IA 52240, USA.
70
Riecke L, Micheyl C, Vanbussel M, Schreiner CS, Mendelsohn D, Formisano E. Recalibration of the auditory continuity illusion: sensory and decisional effects. Hear Res 2011; 277:152-62. [PMID: 21276844 DOI: 10.1016/j.heares.2011.01.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/06/2010] [Revised: 01/17/2011] [Accepted: 01/19/2011] [Indexed: 12/01/2022]
Abstract
An interrupted sound can be perceived as continuous when noise masks the interruption, creating an illusion of continuity. Recent findings have shown that adaptor sounds preceding an ambiguous target sound can influence listeners' rating of target continuity. However, it remains unclear whether these aftereffects on perceived continuity influence sensory processes, decisional processes (i.e., criterion shifts), or both. The present study addressed this question. Results show that the target sound was more likely to be rated as 'continuous' when preceded by adaptors that were perceived as clearly discontinuous than when it was preceded by adaptors that were heard (illusorily or veridically) as continuous. Detection-theory analyses indicated that these contrastive aftereffects reflect a combination of sensory and decisional processes. The contrastive sensory aftereffect persisted even when adaptors and targets were presented to opposite ears, suggesting a neural origin in structures that receive binaural inputs. Finally, physically identical but perceptually ambiguous adaptors that were rated as 'continuous' induced more reports of target continuity than adaptors that were rated as 'discontinuous'. This assimilative aftereffect was purely decisional. These findings confirm that judgments of auditory continuity can be influenced by preceding events, and reveal that these aftereffects have both sensory and decisional components.
Affiliation(s)
- Lars Riecke
- Faculty of Psychology and Neuroscience, Maastricht University, Universiteitssingel 40, Maastricht, The Netherlands.
71
Affiliation(s)
- Arthur G. Samuel
- Department of Psychology, Stony Brook University, Stony Brook, New York 11794-2500, USA
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastian 20009, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao 48011, Spain
72
Viswanathan N, Magnuson JS, Fowler CA. Compensation for coarticulation: disentangling auditory and gestural theories of perception of coarticulatory effects in speech. J Exp Psychol Hum Percept Perform 2010; 36:1005-15. [PMID: 20695714 DOI: 10.1037/a0018391] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different explanations for the phenomenon of compensation for coarticulation (CfC). An example of CfC is that if a speaker produces a gesture with a front place of articulation, it may be pulled slightly backwards if it follows a back place of articulation, and listeners' category boundaries shift (compensate) accordingly. The gestural account appeals to direct attunement to coarticulation to explain CfC, whereas the auditory account explains it by spectral contrast. In previous studies, spectral contrast and gestural consequences of coarticulation have been correlated, such that both accounts made identical predictions. We identify a liquid context in Tamil that disentangles contrast and coarticulation, such that the two accounts make different predictions. In a standard CfC task in Experiment 1, gestural coarticulation rather than spectral contrast determined the direction of CfC. Experiments 2, 3, and 4 demonstrated that tone analogues of the speech precursors failed to produce the same effects observed in Experiment 1, suggesting that simple spectral contrast cannot account for the findings of Experiment 1.
Affiliation(s)
- Navin Viswanathan
- Department of Psychology, University of Connecticut and Haskins Laboratories, New Haven, Connecticut, USA.
73
Alexander JM, Kluender KR. Temporal properties of perceptual calibration to local and broad spectral characteristics of a listening context. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:3597-613. [PMID: 21218892 PMCID: PMC3037769 DOI: 10.1121/1.3500693] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/06/2009] [Revised: 08/06/2010] [Accepted: 09/20/2010] [Indexed: 05/28/2023]
Abstract
The auditory system calibrates to reliable properties of a listening environment in ways that enhance sensitivity to less predictable (more informative) aspects of sounds. These reliable properties may be spectrally local (e.g., peaks) or global (e.g., gross tilt), but the time course over which the auditory system registers and calibrates to these properties is unknown. Understanding temporal properties of this perceptual calibration is essential for revealing underlying mechanisms that serve to increase sensitivity to changing and informative properties of sounds. Relative influence of the second formant (F2) and spectral tilt was measured for identification of /u/ and /i/ following precursor contexts that were harmonic complexes with frequency-modulated resonances. Precursors filtered to match F2 or tilt of following vowels induced perceptual calibration (diminished influence) to F2 and tilt, respectively. Calibration to F2 was greatest for shorter duration precursors (250 ms), which implicates physiologic and/or perceptual mechanisms that are sensitive to onsets. In contrast, calibration to tilt was greatest for precursors with longer durations and higher repetition rates because greater opportunities to sample the spectrum result in more stable estimates of long-term global spectral properties. Possible mechanisms that promote sensitivity to change are discussed.
Affiliation(s)
- Joshua M Alexander
- Department of Speech, Language, and Hearing Science, Purdue University, West Lafayette, Indiana 47907, USA.
74
Stilp CE, Alexander JM, Kiefte M, Kluender KR. Auditory color constancy: calibration to reliable spectral properties across nonspeech context and targets. Atten Percept Psychophys 2010; 72:470-80. [PMID: 20139460 PMCID: PMC2829251 DOI: 10.3758/app.72.2.470] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Brief experience with reliable spectral characteristics of a listening context can markedly alter perception of subsequent speech sounds, and parallels have been drawn between auditory compensation for listening context and visual color constancy. In order to better evaluate such an analogy, the generality of acoustic context effects for sounds with spectral-temporal compositions distinct from speech was investigated. Listeners identified nonspeech sounds (extensively edited samples produced by a French horn and a tenor saxophone) following either resynthesized speech or a short passage of music. Preceding contexts were "colored" by spectral envelope difference filters, which were created to emphasize differences between French horn and saxophone spectra. Listeners were more likely to report hearing a saxophone when the stimulus followed a context filtered to emphasize spectral characteristics of the French horn, and vice versa. Despite clear changes in apparent acoustic source, the auditory system calibrated to relatively predictable spectral characteristics of filtered context, differentially affecting perception of subsequent target nonspeech sounds. This calibration to listening context and relative indifference to acoustic sources operates much like visual color constancy, for which reliable properties of the spectrum of illumination are factored out of perception of color.
75
Snyder JS, Carter OL, Hannon EE, Alain C. Adaptation reveals multiple levels of representation in auditory stream segregation. J Exp Psychol Hum Percept Perform 2009; 35:1232-44. [PMID: 19653761 DOI: 10.1037/a0012741] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
When presented with alternating low and high tones, listeners are more likely to perceive 2 separate streams of tones ("streaming") than a single coherent stream when the frequency separation (Δf) between tones is greater and the number of tone presentations is greater ("buildup"). However, the same large-Δf sequence reduces streaming for subsequent patterns presented after a gap of up to several seconds. Buildup occurs at a level of neural representation with sharp frequency tuning. The authors used adaptation to demonstrate that the contextual effect of prior Δf arose from a representation with broad frequency tuning, unlike buildup. Separate adaptation did not occur in a representation of Δf independent of frequency range, suggesting that any frequency-shift detectors undergoing adaptation are also frequency specific. A separate effect of prior perception was observed, dissociating stimulus-related (i.e., Δf) and perception-related (i.e., 1 stream vs. 2 streams) adaptation. Viewing a visual analogue to auditory streaming had no effect on subsequent perception of streaming, suggesting adaptation in auditory-specific brain circuits. These results, along with previous findings on buildup, suggest that processing in at least 3 levels of auditory neural representation underlies segregation and formation of auditory streams.
Affiliation(s)
- Joel S Snyder
- Department of Psychology, University of Nevada, 4505 South Maryland Parkway, Box 455030, Las Vegas, NV 89154-5030, USA
76
Zäske R, Schweinberger SR, Kaufmann JM, Kawahara H. In the ear of the beholder: neural correlates of adaptation to voice gender. Eur J Neurosci 2009; 30:527-34. [DOI: 10.1111/j.1460-9568.2009.06839.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
77
Huang J, Holt LL. General perceptual contributions to lexical tone normalization. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:3983-94. [PMID: 19507980 PMCID: PMC2806435 DOI: 10.1121/1.3125342] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/03/2008] [Revised: 04/01/2009] [Accepted: 04/05/2009] [Indexed: 05/27/2023]
Abstract
Within tone languages that use pitch variations to contrast meaning, large variability exists in the pitches produced by different speakers. Context-dependent perception may help to resolve this perceptual challenge. However, whether speakers rely on context in contour tone perception is unclear; previous studies have produced inconsistent results. The present study aimed to provide an unambiguous test of the effect of context on contour lexical tone perception and to explore its underlying mechanisms. In three experiments, Mandarin listeners' perception of Mandarin first and second (high-level and mid-rising) tones was investigated with preceding speech and non-speech contexts. Results indicate that the mean fundamental frequency (f0) of a preceding sentence affects perception of contour lexical tones and the effect is contrastive. Following a sentence with a higher-frequency mean f0, the following syllable is more likely to be perceived as a lower frequency lexical tone and vice versa. Moreover, non-speech precursors modeling the mean spectrum of f0 also elicit this effect, suggesting general perceptual processing rather than articulatory-based or speaker-identity-driven mechanisms.
Affiliation(s)
- Jingyuan Huang
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
78
Schweinberger SR, Casper C, Hauthal N, Kaufmann JM, Kawahara H, Kloth N, Robertson DM, Simpson AP, Zäske R. Auditory Adaptation in Voice Perception. Curr Biol 2008; 18:684-8. [DOI: 10.1016/j.cub.2008.04.015] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2008] [Revised: 03/11/2008] [Accepted: 04/03/2008] [Indexed: 11/28/2022]
79
Holt LL, Lotto AJ. Speech Perception Within an Auditory Cognitive Science Framework. CURRENT DIRECTIONS IN PSYCHOLOGICAL SCIENCE 2008; 17:42-46. [PMID: 19060961 DOI: 10.1111/j.1467-8721.2008.00545.x] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The complexities of the acoustic speech signal pose many significant challenges for listeners. Although perceiving speech begins with auditory processing, investigation of speech perception has progressed mostly independently of study of the auditory system. Nevertheless, a growing body of evidence demonstrates that cross-fertilization between the two areas of research can be productive. We briefly describe research bridging the study of general auditory processing and speech perception, showing that the latter is constrained and influenced by operating characteristics of the auditory system and that our understanding of the processes involved in speech perception is enhanced by study within a more general framework. The disconnect between the two areas of research has stunted the development of a truly interdisciplinary science, but there is an opportunity for great strides in understanding with the development of an integrated field of auditory cognitive science.
Affiliation(s)
- Lori L Holt
- Department of Psychology and Center for the Neural Basis of Cognition, Carnegie Mellon University