1. Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023;166:377-424. PMID: 37506665. DOI: 10.1016/j.cortex.2023.05.003.
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic signal normalization, (2) changes in, or selection of, linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that, for the first time, implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that, at the level of data analysis presently employed by most studies in the field, the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
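The paper's central point, that standard categorization data cannot distinguish the three mechanisms, can be illustrated with a toy model (a minimal sketch, not the authors' R implementation; all parameter values here are made up). In a one-dimensional logistic categorization model, normalizing the input, shifting the category representation, and biasing the decision all predict the same observed boundary shift:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

K = 2.0         # slope of the categorization function
BOUNDARY = 0.0  # pre-exposure category boundary on the cue axis (arbitrary units)
SHIFT = 1.5     # adaptation effect observed after exposure

def p_normalization(x):
    # (1) pre-linguistic normalization: the signal is shifted before categorization
    return sigmoid(K * ((x - SHIFT) - BOUNDARY))

def p_representation(x):
    # (2) change in linguistic representations: the category boundary itself moves
    return sigmoid(K * (x - (BOUNDARY + SHIFT)))

def p_decision(x):
    # (3) post-perceptual decision-making: a constant bias on the decision variable
    return sigmoid(K * (x - BOUNDARY) - K * SHIFT)

# All three mechanisms predict identical categorization responses:
for x in (-2.0, 0.0, 1.5, 3.0):
    assert abs(p_normalization(x) - p_representation(x)) < 1e-12
    assert abs(p_normalization(x) - p_decision(x)) < 1e-12
```

Distinguishing the mechanisms therefore requires analyses that go beyond the mean categorization function, which is why the authors call for changes to experimental paradigms and data analysis.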
Affiliation(s)
- Xin Xie
- Language Science, University of California, Irvine, USA.
- T Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
2. Causal inference in environmental sound recognition. Cognition 2021;214:104627. PMID: 34044231. DOI: 10.1016/j.cognition.2021.104627.
Abstract
Sound is caused by physical events in the world. Do humans infer these causes when recognizing sound sources? We tested whether the recognition of common environmental sounds depends on the inference of a basic physical variable - the source intensity (i.e., the power that produces a sound). A source's intensity can be inferred from the intensity it produces at the ear and its distance, which is normally conveyed by reverberation. Listeners could thus use intensity at the ear and reverberation to constrain recognition by inferring the underlying source intensity. Alternatively, listeners might separate these acoustic cues from their representation of a sound's identity in the interest of invariant recognition. We compared these two hypotheses by measuring recognition accuracy for sounds with typically low or high source intensity (e.g., pepper grinders vs. trucks) that were presented across a range of intensities at the ear or with reverberation cues to distance. The recognition of low-intensity sources (e.g., pepper grinders) was impaired by high presentation intensities or reverberation that conveyed distance, either of which imply high source intensity. Neither effect occurred for high-intensity sources. The results suggest that listeners implicitly use the intensity at the ear along with distance cues to infer a source's power and constrain its identity. The recognition of real-world sounds thus appears to depend upon the inference of their physical generative parameters, even generative parameters whose cues might otherwise be separated from the representation of a sound's identity.
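The inference described above can be illustrated with idealized free-field acoustics (a simplification for illustration, not the paper's model): under the inverse-square law, the same intensity at the ear implies a far more powerful source when distance cues indicate the source is far away.

```python
import math

def inferred_source_power(intensity_at_ear, distance_m):
    # Free-field inverse-square law: I = P / (4 * pi * d^2)  =>  P = I * 4 * pi * d^2
    return intensity_at_ear * 4.0 * math.pi * distance_m ** 2

# Identical intensity at the ear, but reverberation cues imply different distances:
near = inferred_source_power(1e-6, 1.0)   # e.g., a pepper grinder at arm's length
far = inferred_source_power(1e-6, 10.0)   # same ear-level intensity from 10 m away
assert abs(far / near - 100.0) < 1e-9     # 10x the distance -> 100x the inferred power
```

This is why, in the experiment, high presentation intensity or reverberation conveying distance both pushed listeners toward high-source-intensity interpretations, impairing recognition of typically quiet sources.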
3. Lehet M, Holt LL. Nevertheless, it persists: Dimension-based statistical learning and normalization of speech impact different levels of perceptual processing. Cognition 2020;202:104328. PMID: 32502867. DOI: 10.1016/j.cognition.2020.104328.
Abstract
Speech is notoriously variable, with no simple mapping from acoustics to linguistically meaningful units like words and phonemes. Empirical research on this theoretically central issue establishes at least two classes of perceptual phenomena that accommodate acoustic variability: normalization and perceptual learning. Intriguingly, perceptual learning is supported by learning across acoustic variability, whereas normalization is thought to counteract acoustic variability, leaving open questions about how these two phenomena might interact. Here, we examine the joint impact of normalization and perceptual learning on how acoustic dimensions map to vowel categories. As listeners categorized nonwords as setch or satch, they experienced a shift in short-term distributional regularities across the vowels' acoustic dimensions. Introduction of this 'artificial accent' resulted in a shift in the contribution of vowel duration to categorization. Although this dimension-based statistical learning impacted the influence of vowel duration on vowel categorization, the duration of these very same vowels nonetheless maintained a consistent influence on categorization of a subsequent consonant via duration contrast, a form of normalization. Thus, vowel duration had a duplex role, consistent with normalization and perceptual learning operating on distinct levels in the processing hierarchy. We posit that whereas normalization operates across auditory dimensions, dimension-based statistical learning impacts the connection weights among auditory dimensions and phonetic categories.
Affiliation(s)
- Matthew Lehet
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15232, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, USA
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15232, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA 15232, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15232, USA.
4. Sjerps MJ, Fox NP, Johnson K, Chang EF. Speaker-normalized sound representations in the human auditory cortex. Nat Commun 2019;10:2465. PMID: 31165733. PMCID: PMC6549175. DOI: 10.1038/s41467-019-10365-z.
Abstract
The acoustic dimensions that distinguish speech sounds (like the vowel differences in "boot" and "boat") also differentiate speakers' voices. Therefore, listeners must normalize across speakers without losing linguistic information. Past behavioral work suggests an important role for auditory contrast enhancement in normalization: preceding context affects listeners' perception of subsequent speech sounds. Here, using intracranial electrocorticography in humans, we investigate whether and how such context effects arise in auditory cortex. Participants identified speech sounds that were preceded by phrases from two different speakers whose voices differed along the same acoustic dimension as target words (the lowest resonance of the vocal tract). In every participant, target vowels evoke a speaker-dependent neural response that is consistent with the listener's perception, and which follows from a contrast enhancement model. Auditory cortex processing thus displays a critical feature of normalization, allowing listeners to extract meaningful content from the voices of diverse speakers.
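A minimal sketch of the kind of contrast-enhancement account described above (the function, gain value, and F1 values are illustrative assumptions, not the authors' fitted model): the target's effective F1 is repelled from the preceding context's average F1, so the same ambiguous token is perceived differently after high- versus low-F1 contexts.

```python
def contrast_enhanced_f1(target_f1_hz, context_mean_f1_hz, gain=0.2):
    # Toy contrast enhancement: the target's effective F1 is pushed away from
    # the preceding context's average F1 (gain is a free, made-up parameter).
    return target_f1_hz + gain * (target_f1_hz - context_mean_f1_hz)

ambiguous_f1 = 500.0  # Hz: a target midway between two vowel categories
after_high_f1_context = contrast_enhanced_f1(ambiguous_f1, 650.0)
after_low_f1_context = contrast_enhanced_f1(ambiguous_f1, 350.0)
# The same token is effectively lower-F1 after a high-F1 talker, and vice versa:
assert after_high_f1_context < ambiguous_f1 < after_low_f1_context
```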
Affiliation(s)
- Matthias J Sjerps
- Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Radboud University, Kapittelweg 29, Nijmegen, 6525 EN, The Netherlands
- Max Planck Institute for Psycholinguistics, Wundtlaan 1, Nijmegen, 6525 XD, Netherlands
- Neal P Fox
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA
- Keith Johnson
- Department of Linguistics, University of California, Berkeley, 1203 Dwinelle Hall #2650, Berkeley, California, 94720, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA.
- Weill Institute for Neurosciences, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California, 94158, USA.
5.

6. Holt LL, Tierney AT, Guerra G, Laffere A, Dick F. Dimension-selective attention as a possible driver of dynamic, context-dependent re-weighting in speech processing. Hear Res 2018;366:50-64. PMID: 30131109. PMCID: PMC6107307. DOI: 10.1016/j.heares.2018.06.014.
Abstract
The contribution of acoustic dimensions to an auditory percept is dynamically adjusted and reweighted based on prior experience about how informative these dimensions are across the long-term and short-term environment. This is especially evident in speech perception, where listeners differentially weight information across multiple acoustic dimensions, and use this information selectively to update expectations about future sounds. The dynamic and selective adjustment of how acoustic input dimensions contribute to perception has made it tempting to conceive of this as a form of non-spatial auditory selective attention. Here, we review several human speech perception phenomena that might be consistent with auditory selective attention although, as of yet, the literature does not definitively support a mechanistic tie. We relate these human perceptual phenomena to illustrative nonhuman animal neurobiological findings that offer informative guideposts in how to test mechanistic connections. We next present a novel empirical approach that can serve as a methodological bridge from human research to animal neurobiological studies. Finally, we describe four preliminary results that demonstrate its utility in advancing understanding of human non-spatial dimension-based auditory selective attention.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, 15213, USA; Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
- Adam T Tierney
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK
- Giada Guerra
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK
- Frederic Dick
- Department of Psychological Sciences, Birkbeck College, University of London, London, WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London, WC1E 7HX, UK; Department of Experimental Psychology, University College London, London, WC1H 0AP, UK
7. Gabay Y, Holt LL. Short-term adaptation to sound statistics is unimpaired in developmental dyslexia. PLoS One 2018;13:e0198146. PMID: 29879142. PMCID: PMC5991687. DOI: 10.1371/journal.pone.0198146.
Abstract
Developmental dyslexia is presumed to arise from phonological impairments. Accordingly, people with dyslexia show speech perception deficits taken as an indication of impoverished phonological representations. However, the nature of speech perception deficits in those with dyslexia remains elusive. Specifically, there is no agreement as to whether speech perception deficits arise from speech-specific processing impairments, or from general auditory impairments that might be either specific to temporal processing or more general. Recent studies show that general auditory referents such as the Long Term Average Spectrum (LTAS, the distribution of acoustic energy across the duration of a sound sequence) affect speech perception. Here we examine the impact of the LTAS of sounds preceding targets on phoneme categorization to assess the nature of putative general auditory impairments associated with dyslexia. Dyslexic and typical listeners categorized speech targets varying perceptually from /ga/ to /da/, preceded by speech and nonspeech tone contexts. Results revealed a spectrally contrastive influence of the preceding context LTAS on speech categorization, with a larger magnitude effect for nonspeech compared to speech precursors. Importantly, there was no difference in the presence or magnitude of the effects across dyslexia and control groups. These results demonstrate an aspect of general auditory processing that is spared in dyslexia, available to support phonemic processing when speech is presented in context.
Affiliation(s)
- Yafit Gabay
- Department of Special Education, University of Haifa, Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Haifa, Israel
- Lori L. Holt
- Carnegie Mellon University, Department of Psychology, Pittsburgh, United States of America
8. Malek S, Sperschneider K. Aftereffects of Spectrally Similar and Dissimilar Spectral Motion Adaptors in the Tritone Paradox. Front Psychol 2018;9:677. PMID: 29867653. PMCID: PMC5953344. DOI: 10.3389/fpsyg.2018.00677.
Affiliation(s)
- Stephanie Malek
- Psychology Department, Martin Luther University Halle-Wittenberg, Halle, Germany
9. Choi JY, Hu ER, Perrachione TK. Varying acoustic-phonemic ambiguity reveals that talker normalization is obligatory in speech processing. Atten Percept Psychophys 2018;80:784-797. PMID: 29417449. PMCID: PMC5840042. DOI: 10.3758/s13414-017-1395-5.
Abstract
The nondeterministic relationship between speech acoustics and abstract phonemic representations imposes a challenge for listeners to maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers compared to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.
Affiliation(s)
- Ja Young Choi
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave., Boston, MA, 02215, USA
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA
- Elly R Hu
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave., Boston, MA, 02215, USA
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Ave., Boston, MA, 02215, USA.
10. Wang N, Oxenham AJ. Effects of auditory enhancement on the loudness of masker and target components. Hear Res 2016;333:150-156. PMID: 26805025. DOI: 10.1016/j.heares.2016.01.012.
Abstract
Auditory enhancement refers to the observation that the salience of one spectral region (the "signal") of a broadband sound can be enhanced and can "pop out" from the remainder of the sound (the "masker") if it is preceded by the broadband sound without the signal. The present study investigated auditory enhancement as an effective change in loudness, to determine whether it reflects a change in the loudness of the signal, the masker, or both. In the first experiment, the 500-ms precursor, an inharmonic complex with logarithmically spaced components, was followed after a 50-ms gap by the 100-ms signal or masker alone, the loudness of which was compared with that of the same signal or masker presented 2 s later. In the second experiment, the loudness of the signal embedded in the masker was assessed with and without a precursor using the same method, as was the loudness of the entire signal-plus-masker complex. The results suggest that the precursor does not affect the loudness of the signal or the masker alone, but enhances the loudness of the signal in the presence of the masker, while leaving the loudness of the surrounding masker unaffected. The results are consistent with an explanation based on "adaptation of inhibition" [Viemeister and Bacon (1982). J. Acoust. Soc. Am. 71, 1502-1507].
Affiliation(s)
- Ningyuan Wang
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455, USA.
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, MN 55455, USA
11. Varnet L, Knoblauch K, Serniclaes W, Meunier F, Hoen M. A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images. PLoS One 2015;10:e0118009. PMID: 25781470. PMCID: PMC4364617. DOI: 10.1371/journal.pone.0118009.
Abstract
Although there is broad consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustic properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
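The classification-image approach is a form of reverse correlation. A minimal sketch (with a simulated observer and a flattened 1-D "time-frequency" field standing in for real trials; everything here is an illustrative assumption): averaging the noise fields that led to each response, then taking their difference, recovers the regions the observer actually relied on.

```python
import random

random.seed(0)

N_BINS = 16      # flattened time-frequency bins
N_TRIALS = 5000

def observer_response(noise):
    # Simulated observer who listens only through one "critical" bin (index 3).
    return 1 if noise[3] > 0 else 0

# Accumulate the noise fields separately for each response.
sums = {0: [0.0] * N_BINS, 1: [0.0] * N_BINS}
counts = {0: 0, 1: 0}
for _ in range(N_TRIALS):
    noise = [random.gauss(0, 1) for _ in range(N_BINS)]
    r = observer_response(noise)
    for i in range(N_BINS):
        sums[r][i] += noise[i]
    counts[r] += 1

# Classification image: mean noise given response 1 minus mean noise given response 0.
cimg = [sums[1][i] / counts[1] - sums[0][i] / counts[0] for i in range(N_BINS)]

# The critical bin dominates the recovered image.
assert max(range(N_BINS), key=lambda i: abs(cimg[i])) == 3
```

The paper's contribution is a statistically principled version of this idea (generalized linear models with smoothness priors and group-level cluster tests) applied to natural speech in noise.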
Affiliation(s)
- Léo Varnet
- Lyon Neuroscience Research Center, CNRS UMR 5292, Auditory Language Processing (ALP) research group, Lyon, France
- Laboratoire sur le Langage le Cerveau et la Cognition, CNRS UMR 5304, Auditory Language Processing (ALP) research group, Lyon, France
- Université de Lyon, Université Lyon 1, Lyon, France
- Kenneth Knoblauch
- Stem Cell and Brain Research Institute, INSERM U 846, Integrative Neuroscience Department, Bron, France
- Willy Serniclaes
- Université Libre de Bruxelles, UNESCOG, CP191, Bruxelles, Belgium
- Fanny Meunier
- Laboratoire sur le Langage le Cerveau et la Cognition, CNRS UMR 5304, Auditory Language Processing (ALP) research group, Lyon, France
- Université de Lyon, Université Lyon 1, Lyon, France
- Michel Hoen
- Lyon Neuroscience Research Center, CNRS UMR 5292, Auditory Language Processing (ALP) research group, Lyon, France
- INSERM U1028, Lyon Neuroscience Research Center, Brain Dynamics and Cognition Team, Lyon, France
- Université de Lyon, Université Lyon 1, Lyon, France
12. Altmann CF, Gaese BH. Representation of frequency-modulated sounds in the human brain. Hear Res 2013;307:74-85. PMID: 23933098. DOI: 10.1016/j.heares.2013.07.018.
Abstract
Frequency-modulation is a ubiquitous sound feature present in communicative sounds of various animal species and humans. Functional imaging of the human auditory system has seen remarkable advances in the last two decades and studies pertaining to frequency-modulation have centered around two major questions: a) are there dedicated feature-detectors encoding frequency-modulation in the brain and b) is there concurrent representation with amplitude-modulation, another temporal sound feature? In this review, we first describe how these two questions are motivated by psychophysical studies and neurophysiology in animal models. We then review how human non-invasive neuroimaging studies have furthered our understanding of the representation of frequency-modulated sounds in the brain. Finally, we conclude with some suggestions on how human neuroimaging could be used in future studies to address currently still open questions on this fundamental sound feature. This article is part of a Special Issue entitled Human Auditory Neuroimaging.
Affiliation(s)
- Christian F Altmann
- Human Brain Research Center, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan; Career-Path Promotion Unit for Young Life Scientists, Kyoto University, Kyoto 606-8501, Japan.
13. Zhang C, Peng G, Wang WSY. Achieving constancy in spoken word identification: time course of talker normalization. Brain Lang 2013;126:193-202. PMID: 23792769. DOI: 10.1016/j.bandl.2013.05.010.
Abstract
This event-related potential (ERP) study examines the time course of context-dependent talker normalization in spoken word identification. We found three ERP components, the N1 (100-220 ms), the N400 (250-500 ms) and the Late Positive Component (500-800 ms), which are conjectured to involve (a) auditory processing, (b) talker normalization and lexical retrieval, and (c) decisional process/lexical selection respectively. Talker normalization likely occurs in the time window of the N400 and overlaps with the lexical retrieval process. Compared with the nonspeech context, the speech contexts, no matter whether they have semantic content or not, enable listeners to tune to a talker's pitch range. In this way, speech contexts induce more efficient talker normalization during the activation of potential lexical candidates and lead to more accurate selection of the intended word in spoken word identification.
Affiliation(s)
- Caicai Zhang
- Language and Cognition Laboratory, Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong Special Administrative Region.
14.
Abstract
Inner speech is one of the most common, but least investigated, mental activities humans perform. It is an internal copy of one's external voice and so is similar to a well-established component of motor control: corollary discharge. Corollary discharge is a prediction of the sound of one's voice generated by the motor system. This prediction is normally used to filter self-caused sounds from perception, which segregates them from externally caused sounds and prevents the sensory confusion that would otherwise result. The similarity between inner speech and corollary discharge motivates the theory, tested here, that corollary discharge provides the sensory content of inner speech. The results reported here show that inner speech attenuates the impact of external sounds. This attenuation was measured using a context effect (an influence of contextual speech sounds on the perception of subsequent speech sounds), which weakens in the presence of speech imagery that matches the context sound. Results from a control experiment demonstrated this weakening in external speech as well. Such sensory attenuation is a hallmark of corollary discharge.
Affiliation(s)
- Mark Scott
- Department of Linguistics, University of British Columbia
15. Spectral information in nonspeech contexts influences children's categorization of ambiguous speech sounds. J Exp Child Psychol 2013;116:728-37. PMID: 23827642. DOI: 10.1016/j.jecp.2013.05.008.
Abstract
For both adults and children, acoustic context plays an important role in speech perception. For adults, both speech and nonspeech acoustic contexts influence perception of subsequent speech items, consistent with the argument that effects of context are due to domain-general auditory processes. However, prior research examining the effects of context on children's speech perception has focused on speech contexts; nonspeech contexts have not been explored previously. To better understand the developmental progression of children's use of contexts in speech perception and the mechanisms underlying that development, we created a novel experimental paradigm testing 5-year-old children's speech perception in several acoustic contexts. The results demonstrated that nonspeech context influences children's speech perception, consistent with claims that context effects arise from general auditory system properties rather than speech-specific mechanisms. This supports theoretical accounts of language development suggesting that domain-general processes play a role across the lifespan.
16. Zhang C, Peng G, Wang WSY. Unequal effects of speech and nonspeech contexts on the perceptual normalization of Cantonese level tones. J Acoust Soc Am 2012;132:1088-1099. PMID: 22894228. DOI: 10.1121/1.4731470.
Abstract
Context is important for recovering language information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the normalization mechanism of inter-talker variability in speaking F0. A further question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examines the effects of four context conditions (speech/nonspeech × F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts change the perception of identical stimuli from mid level tone to low and high level tone, whereas nonspeech contexts only mildly increase the identification preference. These results support a speech-specific mechanism of tone normalization. Moreover, speech context with a flattened F0 trajectory, which neutralizes cues to a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception.
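The normalization effect tested here can be sketched as classifying a level tone by its F0 relative to the talker's mean F0 estimated from context (a toy model; the margin and all F0 values are illustrative assumptions, not the study's stimuli): raising or lowering the context F0 flips the percept for an acoustically identical target.

```python
def classify_level_tone(target_f0_hz, context_f0s_hz, margin_hz=10.0):
    # Normalize the target's F0 against the talker's mean F0 estimated from
    # the preceding context (the margin is a free, made-up parameter).
    talker_mean = sum(context_f0s_hz) / len(context_f0s_hz)
    delta = target_f0_hz - talker_mean
    if delta > margin_hz:
        return "high"
    if delta < -margin_hz:
        return "low"
    return "mid"

target = 200.0  # Hz: the acoustically identical stimulus in every condition
assert classify_level_tone(target, [195.0, 200.0, 205.0]) == "mid"
assert classify_level_tone(target, [225.0, 230.0, 235.0]) == "low"   # raised-F0 context
assert classify_level_tone(target, [165.0, 170.0, 175.0]) == "high"  # lowered-F0 context
```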
Affiliation(s)
- Caicai Zhang
- Language Engineering Laboratory, The Chinese University of Hong Kong, Hong Kong Special Administrative Region.
17. Huang J, Holt LL. Listening for the norm: adaptive coding in speech categorization. Front Psychol 2012;3:10. PMID: 22347198. PMCID: PMC3272641. DOI: 10.3389/fpsyg.2012.00010.
Abstract
Perceptual aftereffects have been referred to as "the psychologist's microelectrode" because they can expose dimensions of representation through the residual effect of a context stimulus upon perception of a subsequent target. The present study uses such context-dependence to examine the dimensions of representation involved in a classic demonstration of "talker normalization" in speech perception. Whereas most accounts of talker normalization have emphasized talker-, speech-, or articulatory-specific dimensions' significance, the present work tests an alternative hypothesis: that the long-term average spectrum (LTAS) of speech context is responsible for patterns of context-dependent perception considered to be evidence for talker normalization. In support of this hypothesis, listeners' vowel categorization was equivalently influenced by speech contexts manipulated to sound as though they were spoken by different talkers and non-speech analogs matched in LTAS to the speech contexts. Since the non-speech contexts did not possess talker, speech, or articulatory information, general perceptual mechanisms are implicated. Results are described in terms of adaptive perceptual coding.
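The long-term average spectrum (LTAS) at the center of this hypothesis can be computed by averaging the magnitude spectra of overlapping windowed frames. A minimal sketch, with illustrative frame-length and hop-size choices that are not parameters from the study:

```python
import numpy as np

# Minimal LTAS sketch: average the magnitude spectra of overlapping
# Hann-windowed frames of a signal.
def ltas(signal, frame_len=512, hop=256):
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)

# Two signals with different fine structure but matched LTAS (e.g., a
# speech context and its non-speech analog) yield near-identical curves.
sr = 16000
t = np.arange(sr) / sr
context = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
spectrum = ltas(context)
print(spectrum.shape)       # (257,): one value per rfft frequency bin
print(np.argmax(spectrum))  # peak at bin 16, i.e., 500 Hz (16 * 31.25 Hz)
```

Matching non-speech analogs to speech in LTAS, as in the study, amounts to equating this averaged spectrum while discarding talker and articulatory information.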
Affiliation(s)
- Jingyuan Huang
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
- Lori L. Holt
- Department of Psychology, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA
18
Kidd G, Richards VM, Streeter T, Mason CR, Huang R. Contextual effects in the identification of nonspeech auditory patterns. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:3926-38. [PMID: 22225048 PMCID: PMC3253596 DOI: 10.1121/1.3658442] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/06/2011] [Revised: 10/05/2011] [Accepted: 10/07/2011] [Indexed: 05/31/2023]
Abstract
This study investigated the benefit of a priori cues in a masked nonspeech pattern identification experiment. Targets were narrowband sequences of tone bursts forming six easily identifiable frequency patterns selected randomly on each trial. The frequency band containing the target was randomized. Maskers were also narrowband sequences of tone bursts chosen randomly on every trial. Targets and maskers were presented monaurally in mutually exclusive frequency bands, producing large amounts of informational masking. Cuing the masker produced a significant improvement in performance, while holding the target frequency band constant provided no benefit. The cue providing the greatest benefit was a copy of the masker presented ipsilaterally before the target-plus-masker. The masker cue presented contralaterally, and a notched-noise cue produced smaller benefits. One possible mechanism underlying these findings is auditory "enhancement" in which the neural response to the target is increased relative to the masker by differential prior stimulation of the target and masker frequency regions. A second possible mechanism provides a benefit to performance by comparing the spectrotemporal correspondence of the cue and target-plus-masker and is effective for either ipsilateral or contralateral cue presentation. These effects improve identification performance by emphasizing spectral contrasts in sequences or streams of sounds.
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
19
Stilp CE, Alexander JM, Kiefte M, Kluender KR. Auditory color constancy: calibration to reliable spectral properties across nonspeech context and targets. Atten Percept Psychophys 2010; 72:470-80. [PMID: 20139460 PMCID: PMC2829251 DOI: 10.3758/app.72.2.470] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Brief experience with reliable spectral characteristics of a listening context can markedly alter perception of subsequent speech sounds, and parallels have been drawn between auditory compensation for listening context and visual color constancy. In order to better evaluate such an analogy, the generality of acoustic context effects was investigated for sounds with spectral-temporal compositions distinct from speech. Listeners identified nonspeech sounds (extensively edited samples produced by a French horn and a tenor saxophone) following either resynthesized speech or a short passage of music. Preceding contexts were "colored" by spectral envelope difference filters created to emphasize differences between French horn and saxophone spectra. Listeners were more likely to report hearing a saxophone when the stimulus followed a context filtered to emphasize spectral characteristics of the French horn, and vice versa. Despite clear changes in apparent acoustic source, the auditory system calibrated to the relatively predictable spectral characteristics of the filtered context, differentially affecting perception of subsequent nonspeech targets. This calibration to listening context, and relative indifference to acoustic sources, operates much like visual color constancy, in which reliable properties of the spectrum of illumination are factored out of the perception of color.
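A spectral envelope difference filter of the kind described can be sketched as the difference between two average log spectra: applying the (horn minus sax) difference to a context emphasizes horn-like energy, biasing subsequent targets toward the saxophone percept. The toy spectra below are assumptions for illustration only, not stimuli from the study.

```python
import numpy as np

# Hypothetical difference-filter sketch: the gain curve (in log-magnitude
# units) that "colors" a context toward instrument A is A's average log
# spectrum minus B's, optionally scaled.
def difference_filter(log_spectrum_a, log_spectrum_b, gain=1.0):
    return gain * (log_spectrum_a - log_spectrum_b)

horn = np.array([0.0, 6.0, 3.0, 1.0])  # toy average log spectra (dB)
sax = np.array([0.0, 2.0, 5.0, 4.0])
coloring = difference_filter(horn, sax)
print(coloring)  # [ 0.  4. -2. -3.]: positive where the horn dominates
```

Under contrastive calibration, a context colored with this curve would shift identification of an ambiguous target away from the horn and toward the saxophone.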
20
Huang J, Holt LL. General perceptual contributions to lexical tone normalization. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:3983-94. [PMID: 19507980 PMCID: PMC2806435 DOI: 10.1121/1.3125342] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/03/2008] [Revised: 04/01/2009] [Accepted: 04/05/2009] [Indexed: 05/27/2023]
Abstract
Within tone languages that use pitch variations to contrast meaning, large variability exists in the pitches produced by different speakers. Context-dependent perception may help to resolve this perceptual challenge. However, whether speakers rely on context in contour tone perception is unclear; previous studies have produced inconsistent results. The present study aimed to provide an unambiguous test of the effect of context on contour lexical tone perception and to explore its underlying mechanisms. In three experiments, Mandarin listeners' perception of Mandarin first and second (high-level and mid-rising) tones was investigated with preceding speech and non-speech contexts. Results indicate that the mean fundamental frequency (f0) of a preceding sentence affects perception of contour lexical tones and the effect is contrastive. Following a sentence with a higher-frequency mean f0, the following syllable is more likely to be perceived as a lower frequency lexical tone and vice versa. Moreover, non-speech precursors modeling the mean spectrum of f0 also elicit this effect, suggesting general perceptual processing rather than articulatory-based or speaker-identity-driven mechanisms.
Affiliation(s)
- Jingyuan Huang
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA.
21
Obleser J, Eisner F. Pre-lexical abstraction of speech in the auditory cortex. Trends Cogn Sci 2009; 13:14-9. [PMID: 19070534 DOI: 10.1016/j.tics.2008.09.005] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2008] [Revised: 09/10/2008] [Accepted: 09/11/2008] [Indexed: 10/21/2022]
22
Russ BE, Lee YS, Cohen YE. Neural and behavioral correlates of auditory categorization. Hear Res 2007; 229:204-12. [PMID: 17208397 DOI: 10.1016/j.heares.2006.10.010] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/30/2006] [Accepted: 10/28/2006] [Indexed: 10/23/2022]
Abstract
Goal-directed behavior is the essence of adaptation because it allows humans and other animals to respond dynamically to different environmental scenarios. Goal-directed behavior can be characterized as the formation of dynamic links between stimuli and actions. One important attribute of goal-directed behavior is that these links can be formed based on how a stimulus is categorized, that is, on the membership of a stimulus in a particular functional category. Here, we survey categorization with an emphasis on auditory categorization, focusing on the role of categorization in language and in non-human vocalizations. We present behavioral data indicating that non-human primates categorize and respond to vocalizations based on differences in their putative meaning rather than differences in their acoustics. Finally, we present evidence suggesting that the ventrolateral prefrontal cortex plays an important role in processing auditory objects and has a specific role in the representation of auditory categories.
Affiliation(s)
- Brian E Russ
- Department of Psychological and Brain Sciences and Center for Cognitive Neuroscience, Dartmouth College, Hanover, NH 03755, USA
23
Holt LL. The mean matters: effects of statistically defined nonspeech spectral distributions on speech categorization. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 120:2801-17. [PMID: 17091133 PMCID: PMC1635014 DOI: 10.1121/1.2354071] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
Adjacent speech, and even nonspeech, contexts influence phonetic categorization. Four experiments investigated how preceding sequences of sine-wave tones influence phonetic categorization. This experimental paradigm provides a means of investigating the statistical regularities of acoustic events that influence online speech categorization and, reciprocally, reveals regularities of the sound environment tracked by auditory processing. The tones comprising the sequences were drawn from distributions sampling different acoustic frequencies. Results indicate that whereas the mean of the distributions predicts contrastive shifts in speech categorization, variability of the distributions has little effect. Moreover, speech categorization is influenced by the global mean of the tone sequence, without significant influence of local statistical regularities within the tone sequence. Further arguing that the effect is strongly related to the average spectrum of the sequence, notched noise spectral complements of the tone sequences produce a complementary effect on speech categorization. Lastly, these effects are modulated by the number of tones in the acoustic history and the overall duration of the sequence, but not by the density with which the distribution defining the sequence is sampled. Results are discussed in light of stimulus-specific adaptation to statistical regularity in the acoustic input and a speculative link to talker normalization is postulated.
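The central result, that the global mean of the tone-sequence distribution predicts the contrastive shift while its variance does not, can be sketched as a one-parameter linear model. The neutral mean and slope below are illustrative assumptions, not values fit to the reported data.

```python
import numpy as np

# Toy "mean matters" model: the contrastive shift in categorization is a
# linear function of the global mean frequency of the preceding tone
# sequence, ignoring its variance. Neutral mean and slope are illustrative.
def boundary_shift(tone_freqs_hz, neutral_mean_hz=2300.0, slope=0.01):
    """Contrastive shift (arbitrary units): histories with a high mean push
    responses toward the lower-frequency category, and vice versa."""
    return -slope * (np.mean(tone_freqs_hz) - neutral_mean_hz)

# Two histories with the same mean but very different spread produce the
# same predicted shift, mirroring the reported insensitivity to variance:
narrow = np.array([2750.0, 2800.0, 2850.0])
wide = np.array([2000.0, 2800.0, 3600.0])
print(boundary_shift(narrow), boundary_shift(wide))  # -5.0 -5.0
```

Because only the global mean enters the model, local regularities within the sequence and the density of distribution sampling have no effect, consistent with the pattern of results described.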
Affiliation(s)
- Lori L Holt
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA.
24
Lotto AJ, Holt LL. Putting phonetic context effects into context: a commentary on Fowler (2006). PERCEPTION & PSYCHOPHYSICS 2006; 68:178-83. [PMID: 16773891 PMCID: PMC1770950 DOI: 10.3758/bf03193667] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
On the basis of a review of the literature and three new experiments, Fowler (2006) concludes that a contrast account for phonetic context effects is not tenable and is inferior to a gestural account. We believe that this conclusion is premature and that it is based on a restricted set of assumptions about a general perceptual account. Here, we briefly address the criticisms of Fowler (2006), with the intent of clarifying what a general auditory and learning approach to speech perception entails.
Affiliation(s)
- Andrew J Lotto
- Center for Perceptual Systems, University of Texas, Austin 78712, USA.