1
Lavan N. Left-handed voices? Examining the perceptual learning of novel person characteristics from the voice. Q J Exp Psychol (Hove) 2024; 77:2325-2338. PMID: 38229446. DOI: 10.1177/17470218241228849.
Abstract
We regularly form impressions of who a person is from their voice, such that we can readily categorise people as being female or male, child or adult, trustworthy or not, and can furthermore recognise who specifically is speaking. How we establish mental representations for such categories of person characteristics has, however, only been explored in detail for voice identity learning. In a series of experiments, we therefore set out to examine whether and how listeners can learn to recognise a novel person characteristic. We specifically asked how diagnostic acoustic properties underpinning category distinctions inform perceptual judgements. We manipulated recordings of voices to create acoustic signatures for a person's handedness (left-handed vs. right-handed) in their voice. After training, listeners were able to recognise handedness from voices with above-chance accuracy, although no significant differences in accuracy emerged between the different types of manipulation. Listeners were, furthermore, sensitive to the specific distributions of acoustic properties that underpinned the category distinctions. However, we also found evidence for perceptual biases that may reflect long-term prior exposure to how voices vary in naturalistic settings. These biases shape how listeners use acoustic information in voices when forming representations for distinguishing handedness. This study is thus a first step towards examining how representations for novel person characteristics are established outside of voice identity perception. We discuss our findings in light of theoretical accounts of voice perception and speculate about potential mechanisms that may underpin our results.
Affiliation(s)
- Nadine Lavan
- Department of Biological and Experimental Psychology, School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
2
Sorensen E, Oleson J, Kutlu E, McMurray B. A Bayesian hierarchical model for the analysis of visual analogue scaling tasks. Stat Methods Med Res 2024; 33:953-965. PMID: 38573790. DOI: 10.1177/09622802241242319.
Abstract
In psychophysics and psychometrics, a method integral to the discipline involves charting how a person's response pattern changes along a continuum of stimuli. For instance, in hearing science, Visual Analog Scaling tasks are experiments in which listeners hear sounds across a speech continuum and give a numeric rating between 0 and 100 conveying whether the sound they heard was more like word "a" or more like word "b" (i.e., each participant gives a continuous categorization response). By taking all the continuous categorization responses across the speech continuum, a parametric curve model can be fit to the data and used to analyze any individual's response pattern along the continuum. Standard statistical modeling techniques cannot accommodate all of the specific requirements needed to analyze these data. Thus, Bayesian hierarchical modeling techniques are employed to accommodate group-level non-linear curves, individual-specific non-linear curves, continuum-level random effects, and a subject-specific variance that is predicted by other model parameters. In this paper, a Bayesian hierarchical model is constructed to model the data from a Visual Analog Scaling task study of monolingual and bilingual participants. Any non-linear curve function could be used, and we demonstrate the technique using the 4-parameter logistic function. Overall, the model fit the data from the study particularly well, and results suggested that the magnitude of the slope was what most defined the differences in response patterns between continua.
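The 4-parameter logistic mentioned above can be sketched in code. This is only an illustration of the curve family, not the paper's fitted Bayesian hierarchical model; the parameter values and the 0-100 rating interpretation below are assumptions chosen for the example.

```python
import math

def four_pl(x, lower, upper, slope, midpoint):
    """4-parameter logistic: expected rating at continuum step x.

    lower/upper are the curve's asymptotes (floor and ceiling ratings),
    slope controls how sharply ratings switch between categories, and
    midpoint is the continuum step of the category boundary.
    """
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (x - midpoint)))

# Hypothetical listener on a 7-step "a"-to-"b" continuum, rating 0-100:
# floor ~5, ceiling ~95, boundary at step 4, moderately steep slope.
ratings = [four_pl(step, 5.0, 95.0, 1.8, 4.0) for step in range(1, 8)]
```

At the midpoint the predicted rating is exactly halfway between the asymptotes (here 50), and the slope parameter is the quantity the study found most distinguished response patterns between continua.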
Affiliation(s)
- Eldon Sorensen
- Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Jacob Oleson
- Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Ethan Kutlu
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
- Department of Linguistics, University of Iowa, Iowa City, IA, USA
- Bob McMurray
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
- Department of Linguistics, University of Iowa, Iowa City, IA, USA
3
Redford MA. Speech perception as information processing. J Acoust Soc Am 2024; 155:R7-R8. PMID: 38558083. DOI: 10.1121/10.0025396.
Abstract
The Reflections series takes a look back on historical articles from The Journal of the Acoustical Society of America that have had a significant impact on the science and practice of acoustics.
Affiliation(s)
- Melissa A Redford
- Department of Linguistics, University of Oregon, 1451 Onyx Street, Eugene, Oregon 97403-1290, USA
4
Crinnion AM, Luthra S, Gaston P, Magnuson JS. Resolving competing predictions in speech: How qualitatively different cues and cue reliability contribute to phoneme identification. Atten Percept Psychophys 2024; 86:942-961. PMID: 38383914. PMCID: PMC11233028. DOI: 10.3758/s13414-024-02849-y.
Abstract
Listeners have many sources of information available in interpreting speech. Numerous theoretical frameworks and paradigms have established that various constraints impact the processing of speech sounds, but it remains unclear how listeners might simultaneously consider multiple cues, especially those that differ qualitatively (i.e., with respect to timing and/or modality) or quantitatively (i.e., with respect to cue reliability). Here, we establish that cross-modal identity priming can influence the interpretation of ambiguous phonemes (Exp. 1, N = 40) and show that two qualitatively distinct cues - namely, cross-modal identity priming and auditory co-articulatory context - have additive effects on phoneme identification (Exp. 2, N = 40). However, we find no effect of quantitative variation in a cue - specifically, changes in the reliability of the priming cue did not influence phoneme identification (Exp. 3a, N = 40; Exp. 3b, N = 40). Overall, we find that qualitatively distinct cues can additively influence phoneme identification. While many existing theoretical frameworks address constraint integration to some degree, our results provide a step towards understanding how information that differs in both timing and modality is integrated in online speech perception.
Affiliation(s)
- James S Magnuson
- University of Connecticut, Storrs, CT, USA
- BCBL. Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Ikerbasque. Basque Foundation for Science, Bilbao, Spain
5
Hisaizumi M, Tantam D. Enhanced sensitivity to pitch perception and its possible relation to language acquisition in autism. Autism Dev Lang Impair 2024; 9:23969415241248618. PMID: 38817731. PMCID: PMC11138189. DOI: 10.1177/23969415241248618.
Abstract
Background and aims Fascinations for or aversions to particular sounds are a familiar feature of autism, as is an ability to reproduce another person's utterances, precisely copying the other person's prosody as well as their words. Such observations seem to indicate not only that autistic people can pay close attention to what they hear, but also that they have the ability to perceive the finer details of auditory stimuli. This is consistent with the previously reported consensus that absolute pitch is more common in autistic individuals than in neurotypicals. We take this to suggest that autistic people's perception allows them to attend to fine details. It is important to establish whether or not this is so, as autism is often presented as a deficit rather than a difference. We therefore undertook a narrative literature review of studies of auditory perception in autistic and nonautistic individuals, focussing on any differences in processing linguistic and nonlinguistic sounds. Main contributions We find persuasive evidence that nonlinguistic auditory perception in autistic children differs from that of nonautistic children. This is supported by the additional finding of a higher prevalence of absolute pitch and enhanced pitch discriminating abilities in autistic children compared to neurotypical children. Such abilities appear to stem from atypical perception, which is biased toward local-level information necessary for processing pitch and other prosodic features. Enhanced pitch discriminating abilities tend to be found in autistic individuals with a history of language delay, suggesting possible reciprocity. Research on various aspects of language development in autism also supports the hypothesis that atypical pitch perception may account for observed differences in language development in autism.
Conclusions The results of our review of previously published studies are consistent with the hypothesis that auditory perception, and particularly pitch perception, in autism is different from the norm but not always impaired. Detail-oriented pitch perception may be an advantage given the right environment. We speculate that unusually heightened sensitivity to pitch differences may come at the cost of the normal development of the perception of the sounds that contribute most to early language development. Implications The acquisition of speech and language may be a process that normally involves an enhanced perception of speech sounds at the expense of the processing of nonlinguistic sounds, but autistic children may not give speech sounds this same priority.
Affiliation(s)
- Digby Tantam
- Middlesex University, Existential Academy, London, UK
6
Schlinger HD. Contrasting Accounts of Early Speech Perception and Production. Perspect Behav Sci 2023; 46:561-583. PMID: 38144545. PMCID: PMC10733268. DOI: 10.1007/s40614-023-00371-4.
Abstract
Language researchers have historically either dismissed or ignored completely behavioral accounts of language acquisition while at the same time acknowledging the important role of experience in language learning. Many language researchers have also moved away from theories based on an innate generative universal grammar and promoted experience-dependent and usage-based theories of language. These theories suggest that hearing and using language in its context is critical for learning language. However, rather than appealing to empirically derived principles to explain the learning, these theories appeal to inferred cognitive mechanisms. In this article, I describe a usage-based theory of language acquisition as a recent example of a more general cognitive linguistic theory and note both logical and methodological problems. I then present a behavior-analytic theory of speech perception and production and contrast it with cognitive theories. Even though some researchers acknowledge the role of social feedback (they rarely call it reinforcement) in vocal learning, they omit the important role played by automatic reinforcement. I conclude by describing automatic reinforcement as the missing link in a parsimonious account of vocal development in human infants and making comparisons to vocal development in songbirds.
7
Gósy M, Bunta F, Pregitzer M. Speech processing performance of Hungarian-speaking twins and singletons. Clin Linguist Phon 2023; 37:979-995. PMID: 36052433. DOI: 10.1080/02699206.2022.2111274.
Abstract
Studying speech processing in twins versus their singleton peers provides opportunities to study both genetic and environmental effects on how children acquire speech processing skills and, by extension, their phonological systems. Our study focused on speech processing in typically developing Hungarian-speaking twins and their singleton peers between 5 and 9 years of age. Participants included 384 monolingual Hungarian-speaking children (192 twins and 192 singletons). Data from four tasks (repetition of synthesised monosyllables, nonsense words, well-formed noisy sentences, and well-formed phonologically complex sentences) were analysed. There was a main effect of birth status, and singletons outperformed their twin peers on the majority of the speech processing tasks. Age and task also had effects on the performance of the participants, and there was a three-way task-by-age-by-birth-status interaction, indicating that the speech processing performance of twins versus singletons is interdependent with the type of task and age. Our results also indicate that monolingual Hungarian-speaking twins may be at higher risk for developmental speech delays relative to their singleton peers.
Affiliation(s)
- Mária Gósy
- Department of Phonetics, Linguistics Institute ELKH and ELTE University, Budapest, Hungary
- Ferenc Bunta
- Department of Communication Sciences and Disorders, University of Houston, Houston, Texas, USA
8
Masapollo M, Nittrouer S. Interarticulator Speech Coordination: Timing Is of the Essence. J Speech Lang Hear Res 2023; 66:901-915. PMID: 36827516. DOI: 10.1044/2022_jslhr-22-00594.
Abstract
PURPOSE In skilled speech production, sets of articulators, such as the jaw, tongue, and lips, work cooperatively to achieve task-specific movement goals, despite rampant contextual variation. Efforts to understand these functional units, termed coordinative structures, have focused on identifying the essential control parameters responsible for allowing articulators to achieve these goals, with some research focusing on temporal parameters (relative timing of movements) and other research focusing on spatiotemporal parameters (phase angle of movement onset for one articulator, relative to another). Here, both types of parameters were investigated and compared in detail. METHOD Ten talkers recorded nonsense, disyllabic /tV#Cat/ utterances using electromagnetic articulography, with alternative V (/ɑ/-/ɛ/) and C (/t/-/d/), across variation in rate (fast-slow) and stress (first syllable stressed-unstressed). Two measures were obtained: (a) the timing of tongue-tip raising onset for medial C, relative to jaw opening-closing cycles and (b) the angle of tongue-tip raising onset, relative to the jaw phase plane. RESULTS Results showed that any manipulation that shortened the jaw opening-closing cycle reduced both the relative timing and phase angle of the tongue-tip movement onset, but relative timing of tongue-tip movement onset scaled more consistently with jaw opening-closing across rate and stress variation. CONCLUSION These findings suggest the existence of an intrinsic timing mechanism (or "central clock") that is the primary control parameter for coordinative structures, with online compensation then allowing these structures to achieve their goals spatially. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.22144259.
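As a rough illustration of the spatiotemporal parameter described above, a phase angle can be computed from the jaw's state in its position-velocity phase plane at the moment tongue-tip raising begins. This is a minimal sketch under stated assumptions: the function name, the zero-angle convention, and the use of a raw atan2 without axis normalisation are illustrative choices, not the study's exact electromagnetic-articulography processing pipeline.

```python
import math

def phase_angle(jaw_position, jaw_velocity):
    """Angle (degrees, 0-360) of a point in the jaw's position-velocity
    phase plane; by the convention used here, 0 degrees corresponds to
    peak jaw displacement with zero velocity."""
    return math.degrees(math.atan2(jaw_velocity, jaw_position)) % 360.0

# Tongue-tip raising that begins while the jaw is still moving lands at a
# different phase angle than raising that begins at a velocity zero-crossing.
onset_at_displacement_peak = phase_angle(1.0, 0.0)  # 0 degrees
onset_at_peak_velocity = phase_angle(0.0, 1.0)      # 90 degrees
```

The relative-timing measure, by contrast, would simply be the onset time of tongue-tip raising divided by the duration of the jaw opening-closing cycle.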
Affiliation(s)
- Matthew Masapollo
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville
- Susan Nittrouer
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville
9
Tardiff N, Suriya-Arunroj L, Cohen YE, Gold JI. Rule-based and stimulus-based cues bias auditory decisions via different computational and physiological mechanisms. PLoS Comput Biol 2022; 18:e1010601. PMID: 36206302. PMCID: PMC9581427. DOI: 10.1371/journal.pcbi.1010601.
Abstract
Expectations, such as those arising from either learned rules or recent stimulus regularities, can bias subsequent auditory perception in diverse ways. However, it is not well understood if and how these diverse effects depend on the source of the expectations. Further, it is unknown whether different sources of bias use the same or different computational and physiological mechanisms. We examined how rule-based and stimulus-based expectations influenced behavior and pupil-linked arousal, a marker of certain forms of expectation-based processing, of human subjects performing an auditory frequency-discrimination task. Rule-based cues consistently biased choices and response times (RTs) toward the more-probable stimulus. In contrast, stimulus-based cues had a complex combination of effects, including choice and RT biases toward and away from the frequency of recently presented stimuli. These different behavioral patterns also had: 1) distinct computational signatures, including different modulations of key components of a novel form of a drift-diffusion decision model and 2) distinct physiological signatures, including substantial bias-dependent modulations of pupil size in response to rule-based but not stimulus-based cues. These results imply that different sources of expectations can modulate auditory processing via distinct mechanisms: one that uses arousal-linked, rule-based information and another that uses arousal-independent, stimulus-based information to bias the speed and accuracy of auditory perceptual decisions. Prior information about upcoming stimuli can bias our perception of those stimuli. Whether different sources of prior information bias perception in similar or distinct ways is not well understood. 
We compared the influence of two kinds of prior information on tone-frequency discrimination: rule-based cues, in the form of explicit information about the most-likely identity of the upcoming tone; and stimulus-based cues, in the form of sequences of tones presented before the to-be-discriminated tone. Although both types of prior information biased auditory decision-making, they demonstrated distinct behavioral, computational, and physiological signatures. Our results suggest that the brain processes prior information in a form-specific manner rather than utilizing a general-purpose prior. Such form-specific processing has implications for understanding decision biases in real-world contexts, in which prior information comes from many different sources and modalities.
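The kind of bias mechanism discussed above can be sketched with a toy drift-diffusion simulation, in which a rule-based cue is modelled as a starting-point shift toward the cued bound. All parameter values here are illustrative assumptions; the paper's actual model is a novel, more elaborate form of the drift-diffusion model fit to behavioral data.

```python
import random

def ddm_trial(drift, start, bound=1.0, dt=0.01, sigma=1.0, rng=random):
    """Simulate one drift-diffusion trial; returns (choice, decision time).

    Evidence accumulates from `start` at mean rate `drift` plus Gaussian
    noise until it crosses +bound (choice +1) or -bound (choice -1).
    """
    x, t = start, 0.0
    while abs(x) < bound:
        x += drift * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return (1 if x >= bound else -1), t

rng = random.Random(42)
n = 400
# A cue toward the +1 response, modelled as starting nearer the +1 bound,
# versus a neutral condition starting midway between the bounds.
cued = [ddm_trial(0.5, 0.3, rng=rng)[0] for _ in range(n)]
neutral = [ddm_trial(0.5, 0.0, rng=rng)[0] for _ in range(n)]
p_cued = sum(c == 1 for c in cued) / n
p_neutral = sum(c == 1 for c in neutral) / n
```

In this toy version the starting-point shift raises the proportion of choices toward the cued bound relative to the neutral condition; other bias sources (e.g., a drift-rate modulation) would leave different signatures in the choice and response-time patterns, which is the kind of computational dissociation the study exploits.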
Affiliation(s)
- Nathan Tardiff
- Department of Otorhinolaryngology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Lalitta Suriya-Arunroj
- Department of Otorhinolaryngology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Yale E. Cohen
- Department of Otorhinolaryngology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Joshua I. Gold
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
10
McHaney JR, Tessmer R, Roark CL, Chandrasekaran B. Working memory relates to individual differences in speech category learning: Insights from computational modeling and pupillometry. Brain Lang 2021; 222:105010. PMID: 34454285. DOI: 10.1016/j.bandl.2021.105010.
Abstract
Across two experiments, we examine the relationship between individual differences in working memory (WM) and the acquisition of non-native speech categories in adulthood. While WM is associated with individual differences in a variety of learning tasks, successful acquisition of speech categories is argued to be contingent on WM-independent procedural-learning mechanisms. Thus, the role of WM in speech category learning is unclear. In Experiment 1, we show that individuals with higher WM acquire non-native speech categories faster and to a greater extent than those with lower WM. In Experiment 2, we replicate these results and show that individuals with higher WM use more optimal, procedural-based learning strategies and demonstrate more distinct speech-evoked pupillary responses for correct relative to incorrect trials. We propose that higher WM may allow for greater stimulus-related attention, resulting in more robust representations and optimal learning strategies. We discuss implications for neurobiological models of speech category learning.
Affiliation(s)
- Jacie R McHaney
- Department of Communication Science and Disorders, University of Pittsburgh, United States
- Rachel Tessmer
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, United States
- Casey L Roark
- Department of Communication Science and Disorders, University of Pittsburgh, United States; Center for the Neural Basis of Cognition, Pittsburgh, PA, United States
- Bharath Chandrasekaran
- Department of Communication Science and Disorders, University of Pittsburgh, United States.
11
Venezia JH, Richards VM, Hickok G. Speech-Driven Spectrotemporal Receptive Fields Beyond the Auditory Cortex. Hear Res 2021; 408:108307. PMID: 34311190. PMCID: PMC8378265. DOI: 10.1016/j.heares.2021.108307.
Abstract
We recently developed a method to estimate speech-driven spectrotemporal receptive fields (STRFs) using fMRI. The method uses spectrotemporal modulation filtering, a form of acoustic distortion that renders speech sometimes intelligible and sometimes unintelligible. Using this method, we found significant STRF responses only in classic auditory regions throughout the superior temporal lobes. However, our analysis was not optimized to detect small clusters of STRFs as might be expected in non-auditory regions. Here, we re-analyze our data using a more sensitive multivariate statistical test for cross-subject alignment of STRFs, and we identify STRF responses in non-auditory regions including the left dorsal premotor cortex (dPM), left inferior frontal gyrus (IFG), and bilateral calcarine sulcus (calcS). All three regions responded more to intelligible than unintelligible speech, but left dPM and calcS responded significantly to vocal pitch and demonstrated strong functional connectivity with early auditory regions. Left dPM's STRF generated the best predictions of activation on trials rated as unintelligible by listeners, a hallmark auditory profile. IFG, on the other hand, responded almost exclusively to intelligible speech and was functionally connected with classic speech-language regions in the superior temporal sulcus and middle temporal gyrus. IFG's STRF was also (weakly) able to predict activation on unintelligible trials, suggesting the presence of a partial 'acoustic trace' in the region. We conclude that left dPM is part of the human dorsal laryngeal motor cortex, a region previously shown to be capable of operating in an 'auditory mode' to encode vocal pitch. Further, given previous observations that IFG is involved in syntactic working memory and/or processing of linear order, we conclude that IFG is part of a higher-order speech circuit that exerts a top-down influence on processing of speech acoustics. 
Finally, because calcS is modulated by emotion, we speculate that changes in the quality of vocal pitch may have contributed to its response.
Affiliation(s)
- Jonathan H Venezia
- VA Loma Linda Healthcare System, Loma Linda, CA, United States; Dept. of Otolaryngology, Loma Linda University School of Medicine, Loma Linda, CA, United States.
- Virginia M Richards
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, United States
- Gregory Hickok
- Depts. of Cognitive Sciences and Language Science, University of California, Irvine, Irvine, CA, United States
12
Nagaraj NK. Effect of Auditory Distraction on Working Memory, Attention Switching, and Listening Comprehension. Audiol Res 2021; 11:227-243. PMID: 34071364. PMCID: PMC8161440. DOI: 10.3390/audiolres11020021.
Abstract
The effect of non-informational speech spectrum noise as a distractor on cognitive and listening comprehension ability was examined in fifty-three young, normal hearing adults. Time-controlled tasks were used to measure auditory working memory (WM) capacity and attention switching (AS) ability. Listening comprehension was measured using a lecture, interview, and spoken narratives test. Noise level was individually set to achieve at least 90% or higher speech intelligibility. Participants' listening comprehension in the presence of distracting noise was better on inference questions compared to listening in quiet. Their speed of information processing was also significantly faster in WM and AS tasks in noise. These results were consistent with the view that noise may enhance arousal levels leading to faster information processing during cognitive tasks. Whereas the speed of AS was faster in noise, this rapid switching of attention resulted in more errors in updating items. Participants who processed information faster in noise and did so accurately, more effectively switched their attention to refresh/rehearse recall items within WM. More efficient processing deployed in the presence of noise appeared to have led to improvements in WM performance and making inferences in a listening comprehension task. Additional research is required to examine these findings using background noise that can cause informational masking.
Affiliation(s)
- Naveen K Nagaraj
- Cognitive Hearing Science Lab, Communicative Disorders & Deaf Education, Utah State University, Logan, UT 84322, USA
13
Feng G, Yi HG, Chandrasekaran B. The Role of the Human Auditory Corticostriatal Network in Speech Learning. Cereb Cortex 2020; 29:4077-4089. PMID: 30535138. DOI: 10.1093/cercor/bhy289.
Abstract
We establish a mechanistic account of how the mature human brain functionally reorganizes to acquire and represent new speech sounds. Native speakers of English learned to categorize Mandarin lexical tone categories produced by multiple talkers using trial-by-trial feedback. We hypothesized that the corticostriatal system is a key intermediary in mediating temporal lobe plasticity and the acquisition of new speech categories in adulthood. We conducted a functional magnetic resonance imaging experiment in which participants underwent a sound-to-category mapping task. Diffusion tensor imaging data were collected, and probabilistic fiber tracking analysis was employed to assay the auditory corticostriatal pathways. Multivariate pattern analysis showed that talker-invariant novel tone category representations emerged in the left superior temporal gyrus (LSTG) within a few hundred training trials. Univariate analysis showed that the putamen, a subregion of the striatum, was sensitive to positive feedback in correctly categorized trials. With learning, functional coupling between the putamen and LSTG increased during error processing. Furthermore, fiber tractography demonstrated robust structural connectivity between the feedback-sensitive striatal regions and the LSTG regions that represent the newly learned tone categories. Our convergent findings highlight a critical role for the auditory corticostriatal circuitry in mediating the acquisition of new speech categories.
Affiliation(s)
- Gangyi Feng
- Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong SAR, China
- Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR, China
- Han Gyol Yi
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA 94158, USA
- Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
14
Jasmin K, Dick F, Holt LL, Tierney A. Tailored perception: Individuals' speech and music perception strategies fit their perceptual abilities. J Exp Psychol Gen 2020; 149:914-934. PMID: 31589067. PMCID: PMC7133494. DOI: 10.1037/xge0000688.
Abstract
Perception involves integration of multiple dimensions that often serve overlapping, redundant functions, for example, pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual strategies), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, we show that amusics place less importance on pitch and instead rely more on duration cues, even when pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits to perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments for specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal), but on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, all types of perception involving redundant cues, e.g., vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups.
Affiliation(s)
- Fred Dick
- Department of Psychological Sciences
15
Oxenham AJ. Spectral contrast effects and auditory enhancement under normal and impaired hearing. ACOUSTICAL SCIENCE AND TECHNOLOGY 2020; 41:108-112. [PMID: 32362758 PMCID: PMC7194197 DOI: 10.1250/ast.41.108] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We are generally able to identify sounds and understand speech with ease, despite the large variations in the acoustics of each sound, which occur due to factors such as different talkers, background noise, and room acoustics. This form of perceptual constancy is likely to be mediated in part by the auditory system's ability to adapt to the ongoing environment or context in which sounds are presented. Auditory context effects have been studied under different names, such as spectral contrast effects in speech and auditory enhancement effects in psychoacoustics, but they share some important properties and may be mediated by similar underlying neural mechanisms. This review provides a survey of recent studies from our laboratory that investigate the mechanisms of speech spectral contrast effects and auditory enhancement in people with normal hearing, hearing loss, and cochlear implants. We argue that a better understanding of such context effects in people with normal hearing may allow us to restore some of these important effects for people with hearing loss via signal processing in hearing aids and cochlear implants, thereby potentially improving auditory and speech perception in the complex and variable everyday acoustic backgrounds that surround us.
Affiliation(s)
- Andrew J. Oxenham
- Department of Psychology, University of Minnesota – Twin Cities, Elliott Hall N218, 75 East River Road, Minneapolis, Minnesota 55455, USA
16
Constraints on learning disjunctive, unidimensional auditory and phonetic categories. Atten Percept Psychophys 2019; 81:958-980. [PMID: 30761500 DOI: 10.3758/s13414-019-01683-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Phonetic categories must be learned, but the processes that allow that learning to unfold are still under debate. The current study investigates constraints on the structure of categories that can be learned and whether these constraints are speech-specific. Category structure constraints are a key difference between theories of category learning, which can roughly be divided into instance-based learning (i.e., exemplar only) and abstractionist learning (i.e., at least partly rule-based or prototype-based) theories. Abstractionist theories can relatively easily accommodate constraints on the structure of categories that can be learned, whereas instance-based theories cannot easily include such constraints. The current study included three groups to investigate these possible constraints as well as their speech specificity: English speakers learning German speech categories, German speakers learning German speech categories, and English speakers learning musical instrument categories, with each group including participants who learned different sets of categories. Both speech groups had greater difficulty learning disjunctive categories (ones that require an "or" statement) than nondisjunctive categories, which suggests that instance-based learning alone is insufficient to explain the learning of the participants learning phonetic categories. This fact was true for both novices (English speakers) and experts (German speakers), which implies that expertise with the materials used cannot explain the patterns observed. However, the same was not true for the musical instrument categories, suggesting a degree of domain-specificity in these constraints that cannot be explained through recourse to expertise alone.
17
Llompart M, Reinisch E. Imitation in a Second Language Relies on Phonological Categories but Does Not Reflect the Productive Usage of Difficult Sound Contrasts. LANGUAGE AND SPEECH 2019; 62:594-622. [PMID: 30319031 DOI: 10.1177/0023830918803978] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This study investigated the relationship between imitation and both the perception and production abilities of second language (L2) learners for two non-native contrasts differing in their expected degree of difficulty. German learners of English were tested on perceptual categorization, imitation and a word reading task for the difficult English /ɛ/-/æ/ contrast, which tends not to be well encoded in the learners' phonological inventories, and the easy, near-native /i/-/ɪ/ contrast. As expected, within-task comparisons between contrasts revealed more robust perception and better differentiation during production for /i/-/ɪ/ than /ɛ/-/æ/. Imitation also followed this pattern, suggesting that imitation is modulated by the phonological encoding of L2 categories. Moreover, learners' ability to imitate /ɛ/ and /æ/ was related to their perception of that contrast, confirming a tight perception-production link at the phonological level for difficult L2 sound contrasts. However, no relationship was observed between acoustic measures for imitated and read-aloud tokens of /ɛ/ and /æ/. This dissociation is mostly attributed to the influence of inaccurate non-native lexical representations in the word reading task. We conclude that imitation is strongly related to the phonological representation of L2 sound contrasts, but does not need to reflect the learners' productive usage of such non-native distinctions.
18
Schreiber KE, McMurray B. Listeners can anticipate future segments before they identify the current one. Atten Percept Psychophys 2019; 81:1147-1166. [PMID: 31087271 PMCID: PMC6688751 DOI: 10.3758/s13414-019-01712-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Speech unfolds rapidly over time, and the information necessary to recognize even a single phoneme may not be available simultaneously. Consequently, listeners must both integrate prior acoustic cues and anticipate future segments. Prior work on stop consonants and vowels suggests that listeners integrate asynchronous cues by partially activating lexical entries as soon as any information is available, and then updating this when later cues arrive. However, a recent study suggests that for the voiceless sibilant fricatives (/s/ and /ʃ/), listeners wait to initiate lexical access until all cues have arrived at the onset of the vowel. Sibilants also contain coarticulatory cues that could be used to anticipate the upcoming vowel. However, given these results, it is unclear if listeners could use them fast enough to speed vowel recognition. The current study examines anticipation by asking when listeners use coarticulatory information in the frication to predict the upcoming vowel. A visual world paradigm experiment found that listeners do not wait: they anticipate the vowel immediately from the onset of the frication, even as they wait several hundred milliseconds to identify the fricative. This finding suggests listeners do not strictly process phonemes in the order that they appear; rather, the dynamics of language processing may be largely internal and only loosely coupled to the dynamics of the input.
Affiliation(s)
- Kayleen E Schreiber
- Interdisciplinary Graduate Program in Neuroscience, University of Iowa, Iowa City, IA, USA
- Bob McMurray
- Department of Psychological and Brain Sciences, Department of Communication Sciences and Disorders, Department of Linguistics, University of Iowa, W311 SSH, Iowa City, IA, 52242, USA
19
Barnaud ML, Schwartz JL, Bessière P, Diard J. Computer simulations of coupled idiosyncrasies in speech perception and speech production with COSMO, a perceptuo-motor Bayesian model of speech communication. PLoS One 2019; 14:e0210302. [PMID: 30633745 PMCID: PMC6329510 DOI: 10.1371/journal.pone.0210302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Accepted: 12/18/2018] [Indexed: 01/09/2023] Open
Abstract
The existence of a functional relationship between speech perception and production systems is now widely accepted, but the exact nature and role of this relationship remains quite unclear. The existence of idiosyncrasies in production and in perception sheds interesting light on the nature of the link. Indeed, a number of studies explore inter-individual variability in auditory and motor prototypes within a given language, and provide evidence for a link between both sets. In this paper, we attempt to simulate one study on coupled idiosyncrasies in the perception and production of French oral vowels, within COSMO, a Bayesian computational model of speech communication. First, we show that if the learning process in COSMO includes a communicative mechanism between a Learning Agent and a Master Agent, vowel production does display idiosyncrasies. Second, we implement within COSMO three models for speech perception that are, respectively, auditory, motor and perceptuo-motor. We show that no idiosyncrasy in perception can be obtained in the auditory model, since it is optimally tuned to the learning environment, which does not include the motor variability of the Learning Agent. On the contrary, motor and perceptuo-motor models provide perception idiosyncrasies correlated with idiosyncrasies in production. We draw conclusions about the role and importance of motor processes in speech perception, and propose a perceptuo-motor model in which auditory processing would enable optimal processing of learned sounds and motor processing would be helpful in unlearned adverse conditions.
Affiliation(s)
- Marie-Lou Barnaud
- Univ. Grenoble Alpes, Gipsa-lab, Grenoble, France; CNRS, Gipsa-lab, Grenoble, France; Univ. Grenoble Alpes, LPNC, Grenoble, France; CNRS, LPNC, Grenoble, France
- Jean-Luc Schwartz
- Univ. Grenoble Alpes, Gipsa-lab, Grenoble, France; CNRS, Gipsa-lab, Grenoble, France
- Julien Diard
- Univ. Grenoble Alpes, LPNC, Grenoble, France; CNRS, LPNC, Grenoble, France
20
Abstract
There is substantial evidence that two distinct learning systems are engaged in category learning. One is principally engaged when learning requires selective attention to a single dimension (rule-based), and the other is recruited by categories requiring integration across two or more dimensions (information-integration). This distinction has largely been drawn from studies of visual categories learned via overt category decisions and explicit feedback. Recent research has extended this model to auditory categories, the nature of which introduces new questions for research. With the present experiment, we addressed the influences of incidental versus overt training and category distribution sampling on learning information-integration and rule-based auditory categories. The results demonstrate that the training task influences category learning, with overt feedback generally outperforming incidental feedback. Additionally, distribution sampling (probabilistic or deterministic) and category type (information-integration or rule-based) both affect how well participants are able to learn. Specifically, rule-based categories are learned equivalently, regardless of distribution sampling, whereas information-integration categories are learned better with deterministic than with probabilistic sampling. The interactions of distribution sampling, category type, and kind of feedback impacted category-learning performance, but these interactions have not yet been integrated into existing category-learning models. These results suggest new dimensions for understanding category learning, inspired by the real-world properties of auditory categories.
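The rule-based versus information-integration distinction in this abstract can be sketched with toy stimulus distributions. This is an invented illustration, not the study's stimulus set: a rule-based category is separable along a single dimension, while an information-integration category is separable only along a combination of dimensions.

```python
# Illustrative sampling of two-dimensional category structures:
# rule-based (one diagnostic dimension) vs. information-integration
# (diagnostic only via a combination of dimensions). Values invented.
import numpy as np

rng = np.random.default_rng(1)

def rule_based(n=100):
    """Category label depends on dimension 1 alone."""
    x = rng.normal(0, 1, (n, 2))
    return x, (x[:, 0] > 0).astype(int)

def information_integration(n=100):
    """Label depends on a linear combination of both dimensions,
    so no single-dimension rule can separate the categories."""
    x = rng.normal(0, 1, (n, 2))
    return x, (x[:, 0] + x[:, 1] > 0).astype(int)

X_rb, y_rb = rule_based()
X_ii, y_ii = information_integration()
```

A verbalizable rule ("high values of dimension 1 are category A") solves the first structure; the second requires pre-decisional integration of the two dimensions, which dual-system accounts attribute to the implicit, procedural system.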
21
Knowles JM, Doupe AJ, Brainard MS. Zebra finches are sensitive to combinations of temporally distributed features in a model of word recognition. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:872. [PMID: 30180710 PMCID: PMC6103769 DOI: 10.1121/1.5050910] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/01/2018] [Accepted: 07/21/2018] [Indexed: 06/08/2023]
Abstract
Discrimination between spoken words composed of overlapping elements, such as "captain" and "captive," relies on sensitivity to unique combinations of prefix and suffix elements that span a "uniqueness point" where the word candidates diverge. To model such combinatorial processing, adult female zebra finches were trained to discriminate between target and distractor syllable sequences that shared overlapping "contextual" prefixes and differed only in their "informative" suffixes. The transition from contextual to informative syllables thus created a uniqueness point analogous to that present between overlapping word candidates, where targets and distractors diverged. It was found that target recognition depended not only on informative syllables, but also on contextual syllables that were shared with distractors. Moreover, the influence of each syllable depended on proximity to the uniqueness point. Birds were then trained with targets and distractors that shared both prefix and suffix sequences and could only be discriminated by recognizing unique combinations of those sequences. Birds learned to robustly discriminate target and distractor combinations and maintained significant discrimination when the local transitions from prefix to suffix were disrupted. These findings indicate that birds, like humans, combine information across temporally distributed features, spanning contextual and informative elements, in recognizing and discriminating word-like stimuli.
Affiliation(s)
- Jeffrey M Knowles
- Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94158, USA
- Allison J Doupe
- Center for Integrative Neuroscience, University of California, San Francisco, 675 Nelson Rising Lane, San Francisco, California 94158, USA
- Michael S Brainard
- Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, California 94158, USA
22
Magimairaj BM, Nagaraj NK. Working Memory and Auditory Processing in School-Age Children. Lang Speech Hear Serv Sch 2018; 49:409-423. [DOI: 10.1044/2018_lshss-17-0099] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2017] [Accepted: 03/28/2018] [Indexed: 11/09/2022] Open
Abstract
Purpose
Our goal is to present the relationships between working memory (WM) and auditory processing abilities in school-age children.
Review and Discussion
We begin with an overview of auditory processing, the conceptualization of auditory processing disorder, and the assessment of auditory processing abilities in children. Next, we describe a model of WM and a model of auditory processing followed by their comparison. Evidence for the relationships between WM and auditory processing abilities in school-age children follows. Specifically, we present evidence for the association (or lack thereof) between WM/attention and auditory processing test performance.
Clinical Implications
In conclusion, we describe a new framework for understanding auditory processing abilities in children based on integrated evidence from cognitive science, hearing science, and language science. We also discuss clinical implications for children that could inform future research.
Affiliation(s)
- Beula M. Magimairaj
- Cognition and Language Lab, Communication Sciences and Disorders, University of Central Arkansas, Conway
- Naveen K. Nagaraj
- Cognitive Hearing Science Lab, Audiology and Speech Pathology, University of Arkansas for Medical Sciences/University of Arkansas at Little Rock
23
Magimairaj BM, Nagaraj NK, Benafield NJ. Children's Speech Perception in Noise: Evidence for Dissociation From Language and Working Memory. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:1294-1305. [PMID: 29800354 DOI: 10.1044/2018_jslhr-h-17-0312] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/18/2017] [Accepted: 01/12/2018] [Indexed: 06/08/2023]
Abstract
PURPOSE We examined the association between speech perception in noise (SPIN), language abilities, and working memory (WM) capacity in school-age children. Existing studies supporting the Ease of Language Understanding (ELU) model suggest that WM capacity plays a significant role in adverse listening situations. METHOD Eighty-three children between 7 and 11 years of age participated. The sample represented a continuum of individual differences in attention, memory, and language abilities. All children had normal-range hearing and normal-range nonverbal IQ. Children completed the Bamford-Kowal-Bench Speech-in-Noise Test (BKB-SIN; Etymotic Research, 2005), a selective auditory attention task, and multiple measures of language and WM. RESULTS Partial correlations (controlling for age) showed significant positive associations among attention, memory, and language measures. However, BKB-SIN did not correlate significantly with any of the other measures. Principal component analysis revealed a distinct WM factor and a distinct language factor. BKB-SIN loaded robustly as a distinct 3rd factor with minimal secondary loading from sentence recall and short-term memory. Nonverbal IQ loaded as a 4th factor. CONCLUSIONS Results did not support an association between SPIN and WM capacity in children. However, in this study, a single SPIN measure was used. Future studies using multiple SPIN measures are warranted. Evidence from the current study supports the use of BKB-SIN as a clinical measure of speech perception ability because it was not influenced by variation in children's language and memory abilities. More large-scale studies in school-age children are needed to replicate the proposed role played by WM in adverse listening situations.
Affiliation(s)
- Beula M Magimairaj
- Cognition and Language Lab, Communication Sciences and Disorders, University of Central Arkansas, Conway
- Naveen K Nagaraj
- Cognitive Hearing Science Lab, University of Arkansas for Medical Sciences/University of Arkansas at Little Rock
- Natalie J Benafield
- Cognition and Language Lab, Communication Sciences and Disorders, University of Central Arkansas, Conway
24
Maas E. Speech and nonspeech: What are we talking about? INTERNATIONAL JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2017; 19:345-359. [PMID: 27701907 PMCID: PMC5380597 DOI: 10.1080/17549507.2016.1221995] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2015] [Revised: 08/03/2016] [Accepted: 08/05/2016] [Indexed: 05/29/2023]
Abstract
Understanding of the behavioural, cognitive and neural underpinnings of speech production is of interest theoretically, and is important for understanding disorders of speech production and how to assess and treat such disorders in the clinic. This paper addresses two claims about the neuromotor control of speech production: (1) speech is subserved by a distinct, specialised motor control system and (2) speech is holistic and cannot be decomposed into smaller primitives. Both claims have gained traction in recent literature, and are central to a task-dependent model of speech motor control. The purpose of this paper is to stimulate thinking about speech production, its disorders and the clinical implications of these claims. The paper poses several conceptual and empirical challenges for these claims, including the critical importance of defining speech. The emerging conclusion is that a task-dependent model is called into question as its two central claims are founded on ill-defined and inconsistently applied concepts. The paper concludes with discussion of methodological and clinical implications, including the potential utility of diadochokinetic (DDK) tasks in assessment of motor speech disorders and the contraindication of nonspeech oral motor exercises to improve speech function.
Affiliation(s)
- Edwin Maas
- Department of Communication Sciences and Disorders, Temple University, Philadelphia, PA, USA
25
Chambers C, Akram S, Adam V, Pelofi C, Sahani M, Shamma S, Pressnitzer D. Prior context in audition informs binding and shapes simple features. Nat Commun 2017; 8:15027. [PMID: 28425433 PMCID: PMC5411480 DOI: 10.1038/ncomms15027] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2016] [Accepted: 02/21/2017] [Indexed: 11/17/2022] Open
Abstract
A perceptual phenomenon is reported, whereby prior acoustic context has a large, rapid and long-lasting effect on a basic auditory judgement. Pairs of tones were devised to include ambiguous transitions between frequency components, such that listeners were equally likely to report an upward or downward 'pitch' shift between tones. We show that presenting context tones before the ambiguous pair almost fully determines the perceived direction of shift. The context effect generalizes to a wide range of temporal and spectral scales, encompassing the characteristics of most realistic auditory scenes. Magnetoencephalographic recordings show that a relative reduction in neural responsivity is correlated to the behavioural effect. Finally, a computational model reproduces behavioural results, by implementing a simple constraint of continuity for binding successive sounds in a probabilistic manner. Contextual processing, mediated by ubiquitous neural mechanisms such as adaptation, may be crucial to track complex sound sources over time.
Affiliation(s)
- Claire Chambers
- Laboratoire des Systèmes Perceptifs, CNRS UMR 8248, Paris 75005, France
- Département d'Etudes Cognitives, École Normale Supérieure (ENS), PSL Research University, Paris 75005, France
- Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, Illinois 60611, USA
- Sahar Akram
- Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
- Vincent Adam
- Gatsby Computational Neuroscience Unit, University College London, London WC1E 6BT, UK
- Claire Pelofi
- Laboratoire des Systèmes Perceptifs, CNRS UMR 8248, Paris 75005, France
- Département d'Etudes Cognitives, École Normale Supérieure (ENS), PSL Research University, Paris 75005, France
- Maneesh Sahani
- Gatsby Computational Neuroscience Unit, University College London, London WC1E 6BT, UK
- Shihab Shamma
- Laboratoire des Systèmes Perceptifs, CNRS UMR 8248, Paris 75005, France
- Département d'Etudes Cognitives, École Normale Supérieure (ENS), PSL Research University, Paris 75005, France
- Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
- Daniel Pressnitzer
- Laboratoire des Systèmes Perceptifs, CNRS UMR 8248, Paris 75005, France
- Département d'Etudes Cognitives, École Normale Supérieure (ENS), PSL Research University, Paris 75005, France
26
Maddox WT, Koslov S, Yi HG, Chandrasekaran B. Performance Pressure Enhances Speech Learning. APPLIED PSYCHOLINGUISTICS 2016; 37:1369-1396. [PMID: 28077883 PMCID: PMC5222599 DOI: 10.1017/s0142716415000600] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
Real-world speech learning often occurs in high pressure situations such as trying to communicate in a foreign country. However, the impact of pressure on speech learning success is largely unexplored. In this study, adult, native speakers of English learned non-native speech categories under pressure or no-pressure conditions. In the pressure conditions, participants were informed that they were paired with a (fictitious) partner, and that each had to independently exceed a performance criterion for both to receive a monetary bonus. They were then informed that their partner had exceeded the criterion and that the fate of both bonuses depended upon the participant's performance. Our results demonstrate that pressure significantly enhanced speech learning success. In addition, neurobiologically-inspired computational modeling revealed that the performance advantage was due to faster and more frequent use of procedural learning strategies. These results integrate two well-studied research domains and suggest a facilitatory role of motivational factors in speech learning performance that may not be captured in traditional training paradigms.
Affiliation(s)
- W Todd Maddox
- Department of Psychology, 1 University Station A8000, Austin, TX, USA, 78712
- Seth Koslov
- Department of Psychology, 1 University Station A8000, Austin, TX, USA, 78712
- Han-Gyol Yi
- Department of Communication Sciences and Disorders, 1 University Station A1100, Austin, TX, USA, 78712
- Bharath Chandrasekaran
- Department of Psychology, 1 University Station A8000, Austin, TX, USA, 78712; Department of Communication Sciences and Disorders, 1 University Station A1100, Austin, TX, USA, 78712
27
Abstract
Listeners possess a remarkable ability to adapt to acoustic variability in the realization of speech sound categories (e.g., different accents). The current work tests whether non-native listeners adapt their use of acoustic cues in phonetic categorization when they are confronted with changes in the distribution of cues in the input, as native listeners do, and examines to what extent these adaptation patterns are influenced by individual cue-weighting strategies. In line with previous work, native English listeners, who use voice onset time (VOT) as a primary cue to the stop voicing contrast (e.g., 'pa' vs. 'ba'), adjusted their use of f0 (a secondary cue to the contrast) when confronted with a noncanonical "accent" in which the two cues gave conflicting information about category membership. Native Korean listeners' adaptation strategies, while variable, were predictable based on their initial cue weighting strategies. In particular, listeners who used f0 as the primary cue to category membership adjusted their use of VOT (their secondary cue) in response to the noncanonical accent, mirroring the native pattern of "downweighting" a secondary cue. Results suggest that non-native listeners show native-like sensitivity to distributional information in the input and use this information to adjust categorization, just as native listeners do, with the specific trajectory of category adaptation governed by initial cue-weighting strategies.
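The "downweighting" of a secondary cue described in this abstract can be illustrated with a minimal update rule. This is a hypothetical sketch, not the study's model: the cue names (VOT as primary, f0 as secondary) follow the abstract, but the update rule, rates, and starting weights are invented.

```python
# Hypothetical sketch: a listener reduces reliance on a secondary cue
# when exposure trials make it conflict with the primary cue.
def adapt_weights(trials, w_primary=0.8, w_secondary=0.2, rate=0.05):
    """Each trial is (primary_cue, secondary_cue) with cue values in
    {-1, +1}. The secondary cue's weight shrinks whenever it disagrees
    with the category signalled by the primary cue; weights are then
    renormalized to sum to 1 (proportional reliance)."""
    for primary, secondary in trials:
        if primary * secondary < 0:          # cues conflict on this trial
            w_secondary *= (1 - rate)
        total = w_primary + w_secondary
        w_primary, w_secondary = w_primary / total, w_secondary / total
    return w_primary, w_secondary

# A noncanonical "accent": VOT (primary) and f0 (secondary) always
# give conflicting information about category membership.
accent = [(+1, -1), (-1, +1)] * 30
wp, ws = adapt_weights(accent)
```

After exposure, `ws` (reliance on the secondary cue) has shrunk well below its starting value, mirroring the qualitative pattern of secondary-cue downweighting; a listener whose initial weighting is reversed would, on this account, downweight the other cue instead.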
28
Atcherson SR, Nagaraj NK, Kennett SEW, Levisee M. Overview of Central Auditory Processing Deficits in Older Adults. Semin Hear 2016; 36:150-61. [PMID: 27516715 DOI: 10.1055/s-0035-1555118] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022] Open
Abstract
Although there are many reported age-related declines in the human body, the notion that a central auditory processing deficit exists in older adults has not always been clear. Hearing loss and both structural and functional central nervous system changes with advancing age are contributors to how we listen, hear, and process auditory information. Even older adults with normal or near normal hearing sensitivity may exhibit age-related central auditory processing deficits as measured behaviorally and/or electrophysiologically. The purpose of this article is to provide an overview of assessment and rehabilitative approaches for central auditory processing deficits in older adults. It is hoped that the information presented here will help clinicians serve older adult patients whose auditory processing differs from that of peers of the same age with comparable hearing sensitivity, in the absence of other health-related conditions.
Affiliation(s)
- Samuel R Atcherson
- University of Arkansas at Little Rock/University of Arkansas for Medical Sciences, Little Rock, Arkansas; Arkansas Consortium for the Ph.D. in Communication Sciences and Disorders, Little Rock, Arkansas
- Naveen K Nagaraj
- University of Arkansas at Little Rock/University of Arkansas for Medical Sciences, Little Rock, Arkansas; Arkansas Consortium for the Ph.D. in Communication Sciences and Disorders, Little Rock, Arkansas
- Sarah E W Kennett
- University of Arkansas at Little Rock/University of Arkansas for Medical Sciences, Little Rock, Arkansas; Arkansas Consortium for the Ph.D. in Communication Sciences and Disorders, Little Rock, Arkansas; Arkansas Children's Hospital, Little Rock, Arkansas
- Meredith Levisee
- University of Arkansas at Little Rock/University of Arkansas for Medical Sciences, Little Rock, Arkansas
29
Ou J, Law SP. Individual differences in processing pitch contour and rise time in adults: A behavioral and electrophysiological study of Cantonese tone merging. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 139:3226. [PMID: 27369146 DOI: 10.1121/1.4954252] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
One way to understand the relationship between speech perception and production is to examine cases where the two dissociate. This study investigates the hypothesis that perceptual acuity, reflected in event-related potentials (ERPs) to the rise time of the sound amplitude envelope and to pitch contour [reflected in the mismatch negativity (MMN)], may be associated with individual differences in production among speakers with otherwise comparable perceptual abilities. To test this hypothesis, we took advantage of an ongoing sound change, tone merging in Cantonese, and compared ERPs between two groups of typically developed native speakers who could discriminate the high rising and low rising tones with equivalent accuracy but differed in the distinctiveness of their production of these tones. Using a passive oddball paradigm, early positive-going EEG components to rise time and the MMN to pitch contour were elicited during perception of the two tones. Significant group differences were found in neural responses to rise time rather than pitch contour. More importantly, individual differences in the efficiency of tone discrimination (response latency) and in the magnitude of neural responses to rise time were correlated with acoustic measures of F0 offset and rise time differences in productions of the two rising tones.
Affiliation(s)
- Jinghua Ou
- Division of Speech and Hearing Science, the University of Hong Kong, Hong Kong Special Administrative Region
- Sam-Po Law
- Division of Speech and Hearing Science, the University of Hong Kong, Hong Kong Special Administrative Region

30
Abstract
Dual-system models of visual category learning posit the existence of an explicit, hypothesis-testing reflective system, as well as an implicit, procedural-based reflexive system. The reflective and reflexive learning systems are competitive and neurally dissociable. Relatively little is known about the role of these domain-general learning systems in speech category learning. Given the multidimensional, redundant, and variable nature of acoustic cues in speech categories, our working hypothesis is that speech categories are learned reflexively. To this end, we examined the relative contribution of these learning systems to speech learning in adults. Native English speakers learned to categorize Mandarin tone categories over 480 trials. The training protocol involved trial-by-trial feedback and multiple talkers. Experiments 1 and 2 examined the effect of manipulating the timing (immediate vs. delayed) and information content (full vs. minimal) of feedback. Dual-system models of visual category learning predict that delayed feedback and providing rich, informational feedback enhance reflective learning, while immediate and minimally informative feedback enhance reflexive learning. Across the two experiments, our results show that feedback manipulations that targeted reflexive learning enhanced category learning success. In Experiment 3, we examined the role of trial-to-trial talker information (mixed vs. blocked presentation) on speech category learning success. We hypothesized that the mixed condition would enhance reflexive learning by not allowing an association between talker-related acoustic cues and speech categories. Our results show that the mixed talker condition led to relatively greater accuracies. Our experiments demonstrate that speech categories are optimally learned by training methods that target the reflexive learning system.
31
Rhone AE, Nourski KV, Oya H, Kawasaki H, Howard MA, McMurray B. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex. LANGUAGE, COGNITION AND NEUROSCIENCE 2015; 31:284-302. [PMID: 27182530 PMCID: PMC4865257 DOI: 10.1080/23273798.2015.1101145] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
32
Gabay Y, Holt LL. Incidental learning of sound categories is impaired in developmental dyslexia. Cortex 2015; 73:131-43. [PMID: 26409017 DOI: 10.1016/j.cortex.2015.08.008] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2015] [Revised: 06/09/2015] [Accepted: 08/07/2015] [Indexed: 11/29/2022]
Abstract
Developmental dyslexia is commonly thought to arise from specific phonological impairments. However, recent evidence is consistent with the possibility that phonological impairments arise as symptoms of an underlying dysfunction of procedural learning. The nature of the link between impaired procedural learning and phonological dysfunction is unresolved. Motivated by the observation that speech processing involves the acquisition of procedural category knowledge, the present study investigates the possibility that procedural learning impairment may affect phonological processing by interfering with the typical course of phonetic category learning. The present study tests this hypothesis while controlling for linguistic experience and possible speech-specific deficits by comparing auditory category learning across artificial, nonlinguistic sounds among dyslexic adults and matched controls in a specialized first-person shooter videogame that has been shown to engage procedural learning. Nonspeech auditory category learning was assessed online via within-game measures and also with a post-training task involving overt categorization of familiar and novel sound exemplars. Each measure reveals that dyslexic participants do not acquire procedural category knowledge as effectively as age- and cognitive-ability matched controls. This difference cannot be explained by differences in perceptual acuity for the sounds. Moreover, poor nonspeech category learning is associated with slower phonological processing. Whereas phonological processing impairments have been emphasized as the cause of dyslexia, the current results suggest that impaired auditory category learning, general in nature and not specific to speech signals, could contribute to phonological deficits in dyslexia with subsequent negative effects on language acquisition and reading. Implications for the neuro-cognitive mechanisms of developmental dyslexia are discussed.
Affiliation(s)
- Yafit Gabay
- Carnegie Mellon University, Department of Psychology, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
- Lori L Holt
- Carnegie Mellon University, Department of Psychology, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA

33
Gallese V, Gernsbacher MA, Heyes C, Hickok G, Iacoboni M. Mirror Neuron Forum. PERSPECTIVES ON PSYCHOLOGICAL SCIENCE 2015; 6:369-407. [PMID: 25520744 DOI: 10.1177/1745691611413392] [Citation(s) in RCA: 106] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Vittorio Gallese
- Department of Neuroscience, University of Parma, and Italian Institute of Technology Brain Center for Social and Motor Cognition, Parma, Italy
- Cecilia Heyes
- All Souls College and Department of Experimental Psychology, University of Oxford, United Kingdom
- Gregory Hickok
- Center for Cognitive Neuroscience, Department of Cognitive Sciences, University of California, Irvine
- Marco Iacoboni
- Ahmanson-Lovelace Brain Mapping Center, Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Social Behavior, Brain Research Institute, David Geffen School of Medicine, University of California, Los Angeles

34
Gósy M, Horváth V. Speech processing in children with functional articulation disorders. CLINICAL LINGUISTICS & PHONETICS 2015; 29:185-200. [PMID: 25421354 DOI: 10.3109/02699206.2014.983615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
This study explored auditory speech processing and comprehension abilities in 5-8-year-old monolingual Hungarian children with functional articulation disorders (FADs) and their typically developing peers. Our main hypothesis was that children with FAD would show co-existing auditory speech processing disorders, with different levels of these skills depending on the nature of the receptive processes. The tasks included (i) sentence and non-word repetitions, (ii) non-word discrimination and (iii) sentence and story comprehension. Results suggest that the auditory speech processing of children with FAD is underdeveloped compared with that of typically developing children, and largely varies across task types. In addition, there are differences between children with FAD and controls in all age groups from 5 to 8 years. Our results have several clinical implications.
Affiliation(s)
- Mária Gósy
- Phonetics Department, Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary

35
Banai K, Amitay S. The effects of stimulus variability on the perceptual learning of speech and non-speech stimuli. PLoS One 2015; 10:e0118465. [PMID: 25714552 PMCID: PMC4340624 DOI: 10.1371/journal.pone.0118465] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2014] [Accepted: 01/17/2015] [Indexed: 11/18/2022] Open
Abstract
Previous studies suggest fundamental differences between the perceptual learning of speech and non-speech stimuli. One major difference is in the way variability in the training set affects learning and its generalization to untrained stimuli: training-set variability appears to facilitate speech learning, while slowing or altogether extinguishing non-speech auditory learning. We asked whether the reason for this apparent difference is a consequence of the very different methodologies used in speech and non-speech studies. We hypothesized that speech and non-speech training would result in a similar pattern of learning if they were trained using the same training regimen. We used a 2 (random vs. blocked pre- and post-testing) × 2 (random vs. blocked training) × 2 (speech vs. non-speech discrimination task) study design, yielding 8 training groups. A further 2 groups acted as untrained controls, tested with either random or blocked stimuli. The speech task required syllable discrimination along 4 minimal-pair continua (e.g., bee-dee), and the non-speech stimuli required duration discrimination around 4 base durations (e.g., 50 ms). Training and testing required listeners to pick the odd-one-out of three stimuli, two of which were the base duration or phoneme continuum endpoint and the third varied adaptively. Training was administered in 9 sessions of 640 trials each, spread over 4–8 weeks. Significant learning was only observed following speech training, with similar learning rates and full generalization regardless of whether training used random or blocked schedules. No learning was observed for duration discrimination with either training regimen. We therefore conclude that the two stimulus classes respond differently to the same training regimen. 
A reasonable interpretation of the findings is that speech is perceived categorically, enabling learning in either paradigm, while the different base durations are not well-enough differentiated to allow for categorization, resulting in disruption to learning.
Affiliation(s)
- Karen Banai
- Department of Communication Sciences and Disorders, University of Haifa, Haifa, Israel
- Sygal Amitay
- Medical Research Council Institute of Hearing Research, Nottingham, United Kingdom

36
Rimmele JM, Sussman E, Poeppel D. The role of temporal structure in the investigation of sensory memory, auditory scene analysis, and speech perception: a healthy-aging perspective. Int J Psychophysiol 2015; 95:175-83. [PMID: 24956028 PMCID: PMC4272684 DOI: 10.1016/j.ijpsycho.2014.06.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2013] [Revised: 06/13/2014] [Accepted: 06/15/2014] [Indexed: 01/08/2023]
Abstract
Listening situations with multiple talkers or background noise are common in everyday communication and are particularly demanding for older adults. Here we review current research on auditory perception in aging individuals in order to gain insights into the challenges of listening under noisy conditions. Informationally rich temporal structure in auditory signals, over a range of time scales from milliseconds to seconds, renders temporal processing central to perception in the auditory domain. We discuss the role of temporal structure in auditory processing, in particular from a perspective relevant for hearing in background noise, and focusing on sensory memory, auditory scene analysis, and speech perception. Interestingly, these auditory processes, usually studied in an independent manner, show considerable overlap of processing time scales, even though each has its own 'privileged' temporal regimes. By integrating perspectives on temporal structure processing in these three areas of investigation, we aim to highlight similarities typically not recognized.
Affiliation(s)
- Johanna Maria Rimmele
- Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Elyse Sussman
- Albert Einstein College of Medicine, Dominick P. Purpura Department of Neuroscience, Bronx, NY, United States
- David Poeppel
- Department of Psychology and Center for Neural Science, New York University, New York, NY, United States; Max-Planck Institute for Empirical Aesthetics, Frankfurt, Germany

37
Yi HG, Maddox WT, Mumford JA, Chandrasekaran B. The Role of Corticostriatal Systems in Speech Category Learning. Cereb Cortex 2014; 26:1409-1420. [PMID: 25331600 DOI: 10.1093/cercor/bhu236] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning.
Affiliation(s)
- Han-Gyol Yi
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA
- W Todd Maddox
- Department of Psychology, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA; Institute for Mental Health Research, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA; The Institute for Neuroscience, The University of Texas at Austin, Austin, TX, USA; Center for Perceptual Systems, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA
- Jeanette A Mumford
- Department of Psychology, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA
- Bharath Chandrasekaran
- Department of Communication Sciences & Disorders, Moody College of Communication, The University of Texas at Austin, Austin, TX, USA; Institute for Mental Health Research, College of Liberal Arts, The University of Texas at Austin, Austin, TX, USA; The Institute for Neuroscience, The University of Texas at Austin, Austin, TX, USA

38
Maddox WT, Chandrasekaran B. Tests of a Dual-systems Model of Speech Category Learning. BILINGUALISM (CAMBRIDGE, ENGLAND) 2014; 17:709-728. [PMID: 25264426 PMCID: PMC4171735 DOI: 10.1017/s1366728913000783] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
In the visual domain, more than two decades of work posits the existence of dual category learning systems. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion. The reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-systems models posit that in learning natural categories, learners initially use the reflective system and with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in second language (L2) speech learning has not been systematically examined. Here monolingual, native speakers of American English were trained to categorize Mandarin tones produced by multiple talkers. Our computational modeling approach demonstrates that learners use reflective and reflexive strategies during tone category learning. Successful learners use talker-dependent, reflective analysis early in training and reflexive strategies by the end of training. Our results demonstrate that dual-learning systems are operative in L2 speech learning. Critically, learner strategies directly relate to individual differences in category learning success.
39
Shen J, Mack ML, Palmeri TJ. Studying real-world perceptual expertise. Front Psychol 2014; 5:857. [PMID: 25147533 PMCID: PMC4123786 DOI: 10.3389/fpsyg.2014.00857] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2014] [Accepted: 07/19/2014] [Indexed: 12/04/2022] Open
Abstract
Significant insights into visual cognition have come from studying real-world perceptual expertise. Many have previously reviewed empirical findings and theoretical developments from this work. Here we instead provide a brief perspective on approaches, considerations, and challenges to studying real-world perceptual expertise. We discuss factors like choosing to use real-world versus artificial object domains of expertise, selecting a target domain of real-world perceptual expertise, recruiting experts, evaluating their level of expertise, and experimentally testing experts in the lab and online. Throughout our perspective, we highlight expert birding (also called birdwatching) as an example, as it has been used as a target domain for over two decades in the perceptual expertise literature.
Affiliation(s)
- Jianhong Shen
- Vanderbilt Vision Research Center, Department of Psychology, Vanderbilt University, Nashville, TN, USA
- Michael L. Mack
- Center for Learning and Department of Psychology, The University of Texas at Austin, Austin, TX, USA
- Thomas J. Palmeri
- Vanderbilt Vision Research Center, Department of Psychology, Vanderbilt University, Nashville, TN, USA

40
Chandrasekaran B, Koslov SR, Maddox WT. Toward a dual-learning systems model of speech category learning. Front Psychol 2014; 5:825. [PMID: 25132827 PMCID: PMC4116788 DOI: 10.3389/fpsyg.2014.00825] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2014] [Accepted: 07/10/2014] [Indexed: 11/15/2022] Open
Abstract
More than two decades of work in vision posits the existence of dual-learning systems of category learning. The reflective system uses working memory to develop and test rules for classifying in an explicit fashion, while the reflexive system operates by implicitly associating perception with actions that lead to reinforcement. Dual-learning systems models hypothesize that in learning natural categories, learners initially use the reflective system and, with practice, transfer control to the reflexive system. The role of reflective and reflexive systems in auditory category learning and more specifically in speech category learning has not been systematically examined. In this article, we describe a neurobiologically constrained dual-learning systems theoretical framework that is currently being developed in speech category learning and review recent applications of this framework. Using behavioral and computational modeling approaches, we provide evidence that speech category learning is predominantly mediated by the reflexive learning system. In one application, we explore the effects of normal aging on non-speech and speech category learning. Prominently, we find a large age-related deficit in speech learning. The computational modeling suggests that older adults are less likely to transition from simple, reflective, unidimensional rules to more complex, reflexive, multi-dimensional rules. In a second application, we summarize a recent study examining auditory category learning in individuals with elevated depressive symptoms. We find a deficit in reflective-optimal and an enhancement in reflexive-optimal auditory category learning. Interestingly, individuals with elevated depressive symptoms also show an advantage in learning speech categories. We end with a brief summary and description of a number of future directions.
Affiliation(s)
- Bharath Chandrasekaran
- SoundBrain Lab, Department of Communication Sciences and Disorders, The University of Texas at Austin, Austin, TX, USA
- Institute for Mental Health Research, The University of Texas at Austin, Austin, TX, USA
- Institute for Neuroscience, The University of Texas at Austin, Austin, TX, USA
- Department of Psychology, The University of Texas at Austin, Austin, TX, USA
- Seth R. Koslov
- Department of Psychology, The University of Texas at Austin, Austin, TX, USA
- W. T. Maddox
- Institute for Mental Health Research, The University of Texas at Austin, Austin, TX, USA
- Institute for Neuroscience, The University of Texas at Austin, Austin, TX, USA
- Department of Psychology, The University of Texas at Austin, Austin, TX, USA

41
Leonard MK, Chang EF. Dynamic speech representations in the human temporal lobe. Trends Cogn Sci 2014; 18:472-9. [PMID: 24906217 DOI: 10.1016/j.tics.2014.05.001] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2013] [Revised: 04/30/2014] [Accepted: 05/06/2014] [Indexed: 11/20/2022]
Abstract
Speech perception requires rapid integration of acoustic input with context-dependent knowledge. Recent methodological advances have allowed researchers to identify underlying information representations in primary and secondary auditory cortex and to examine how context modulates these representations. We review recent studies that focus on contextual modulations of neural activity in the superior temporal gyrus (STG), a major hub for spectrotemporal encoding. Recent findings suggest a highly interactive flow of information processing through the auditory ventral stream, including influences of higher-level linguistic and metalinguistic knowledge, even within individual areas. Such mechanisms may give rise to more abstract representations, such as those for words. We discuss the importance of characterizing representations of context-dependent and dynamic patterns of neural activity in the approach to speech perception research.
Affiliation(s)
- Matthew K Leonard
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, CA 94158, USA
- Edward F Chang
- Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, CA 94158, USA

42
Prior experience with negative spectral correlations promotes information integration during auditory category learning. Mem Cognit 2014; 41:752-68. [PMID: 23354998 DOI: 10.3758/s13421-013-0294-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Complex sounds vary along a number of acoustic dimensions. These dimensions may exhibit correlations that are familiar to listeners due to their frequent occurrence in natural sounds, namely speech. However, the precise mechanisms that enable the integration of these dimensions are not well understood. In this study, we examined the categorization of novel auditory stimuli that differed in the correlations of their acoustic dimensions, using decision bound theory. Decision bound theory assumes that stimuli are categorized on the basis of either a single dimension (rule based) or the combination of more than one dimension (information integration) and provides tools for assessing successful integration across multiple acoustic dimensions. In two experiments, we manipulated the stimulus distributions such that in Experiment 1, optimal categorization could be accomplished by either a rule-based or an information integration strategy, while in Experiment 2, optimal categorization was possible only by using an information integration strategy. In both experiments, the pattern of results demonstrated that unidimensional strategies were strongly preferred. Listeners focused on the acoustic dimension most closely related to pitch, suggesting that pitch-based categorization was given preference over timbre-based categorization. Importantly, in Experiment 2, listeners also relied on a two-dimensional information integration strategy, if there was immediate feedback. Furthermore, this strategy was used more often for distributions defined by a negative spectral correlation between stimulus dimensions, as compared with distributions with a positive correlation. These results suggest that prior experience with such correlations might shape short-term auditory category learning.
43
Remez RE, Thomas EF. Early recognition of speech. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2013; 4:213-223. [PMID: 23926454 PMCID: PMC3709124 DOI: 10.1002/wcs.1213] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems.
Affiliation(s)
- Robert E Remez
- Department of Psychology and Program in Neuroscience & Behavior, Barnard College, Columbia University, New York, NY, USA
- Emily F Thomas
- Department of Psychology and Program in Neuroscience & Behavior, Barnard College, Columbia University, New York, NY, USA

44
Anderson S, White-Schwoch T, Parbery-Clark A, Kraus N. A dynamic auditory-cognitive system supports speech-in-noise perception in older adults. Hear Res 2013; 300:18-32. [PMID: 23541911 DOI: 10.1016/j.heares.2013.03.006] [Citation(s) in RCA: 154] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/30/2012] [Revised: 03/06/2013] [Accepted: 03/12/2013] [Indexed: 11/16/2022]
Abstract
Understanding speech in noise is one of the most complex activities encountered in everyday life, relying on peripheral hearing, central auditory processing, and cognition. These abilities decline with age, and so older adults are often frustrated by a reduced ability to communicate effectively in noisy environments. Many studies have examined these factors independently; in the last decade, however, the idea of an auditory-cognitive system has emerged, recognizing the need to consider the processing of complex sounds in the context of dynamic neural circuits. Here, we used structural equation modeling to evaluate the interacting contributions of peripheral hearing, central processing, cognitive ability, and life experiences to understanding speech in noise. We recruited 120 older adults (ages 55-79) and evaluated their peripheral hearing status, cognitive skills, and central processing. We also collected demographic measures of life experiences, such as physical activity, intellectual engagement, and musical training. In our model, central processing and cognitive function predicted a significant proportion of variance in the ability to understand speech in noise. To a lesser extent, life experience predicted hearing-in-noise ability through modulation of brainstem function. Peripheral hearing levels did not significantly contribute to the model. Previous musical experience modulated the relative contributions of cognitive ability and lifestyle factors to hearing in noise. Our models demonstrate the complex interactions required to hear in noise and the importance of targeting cognitive function, lifestyle, and central auditory processing in the management of individuals who are having difficulty hearing in noise.
Affiliation(s)
- Samira Anderson
- Auditory Neuroscience Laboratory, Northwestern University, Evanston, IL 60208, USA

45
Wagner M, Shafer VL, Martin B, Steinschneider M. The phonotactic influence on the perception of a consonant cluster /pt/ by native English and native Polish listeners: a behavioral and event related potential (ERP) study. BRAIN AND LANGUAGE 2012; 123:30-41. [PMID: 22867752 PMCID: PMC3645296 DOI: 10.1016/j.bandl.2012.06.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/15/2011] [Revised: 06/02/2012] [Accepted: 06/15/2012] [Indexed: 06/01/2023]
Abstract
The effect of exposure to the contextual features of the /pt/ cluster was investigated in native-English and native-Polish listeners using behavioral and event-related potential (ERP) methods. Both groups experience the /pt/ cluster in their languages, but only the Polish group experiences it in the word-onset position examined in the current experiment. The /st/ cluster served as an experimental control. ERPs were recorded while participants identified the number of syllables in the second word of nonsense-word pairs. Results showed that only Polish listeners accurately perceived the /pt/ cluster, and this perception was reflected in a late positive component of the ERP waveform. Furthermore, evidence of discrimination between /pt/ and /pət/ onsets in the neural signal was found even for non-native listeners who could not perceive the difference. These findings suggest that exposure to phoneme sequences in highly specific contexts may be necessary for accurate perception.
Affiliation(s)
- Monica Wagner
- The City University of New York - Graduate School and University Center, Program in Speech-Language-Hearing Sciences, NY 10016, USA.
46
Mattys SL, Davis MH, Bradlow AR, Scott SK. Speech recognition in adverse conditions: A review. Lang Cogn Process 2012. [DOI: 10.1080/01690965.2012.705006]
47
Zhang C, Peng G, Wang WSY. Unequal effects of speech and nonspeech contexts on the perceptual normalization of Cantonese level tones. J Acoust Soc Am 2012; 132:1088-1099. [PMID: 22894228] [DOI: 10.1121/1.4731470]
Abstract
Context is important for recovering linguistic information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring how inter-talker variability in speaking F0 is normalized. A further question is whether a talker's full F0 range and mean F0 facilitate normalization equally. To answer these questions, this study examined the effects of four context conditions (speech/nonspeech × F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts shift the perception of identical stimuli from the mid level tone to the low and high level tones, respectively, whereas nonspeech contexts only mildly increase the identification preference. This supports a speech-specific mechanism of tone normalization. Moreover, a speech context with a flattened F0 trajectory, which neutralizes cues to a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception.
Affiliation(s)
- Caicai Zhang
- Language Engineering Laboratory, The Chinese University of Hong Kong, Hong Kong Special Administrative Region.
48
Adank P. The neural bases of difficult speech comprehension and speech production: Two Activation Likelihood Estimation (ALE) meta-analyses. Brain Lang 2012; 122:42-54. [PMID: 22633697] [DOI: 10.1016/j.bandl.2012.04.014]
Abstract
The role of speech production mechanisms in difficult speech comprehension is the subject of ongoing debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension of less intelligible/distorted speech with comprehension of more intelligible speech. Meta-analysis 2 (21 studies) identified areas associated with speech production. The results indicate that difficult comprehension involves, first, increased reliance on cortical regions in which comprehension and production overlap (bilateral anterior superior temporal sulcus (STS) and pre-supplementary motor area (pre-SMA)) and on an area associated with intelligibility processing (left posterior middle temporal gyrus (MTG)), and, second, increased reliance on cortical areas associated with general executive processes (bilateral anterior insulae). Comprehension of distorted speech may thus be supported by a hybrid neural mechanism combining increased involvement of areas associated with general executive processing and of areas shared between comprehension and production.
Affiliation(s)
- Patti Adank
- School of Psychological Sciences, University of Manchester, United Kingdom.
49
Lotto A, Holt L. Psychology of auditory perception. Wiley Interdiscip Rev Cogn Sci 2011; 2:479-489. [PMID: 26302301] [DOI: 10.1002/wcs.123]
Abstract
Audition is often treated as a 'secondary' sensory system behind vision in the study of cognitive science. In this review, we focus on three seemingly simple perceptual tasks to demonstrate the complexity of perceptual-cognitive processing involved in everyday audition. After providing a short overview of the characteristics of sound and their neural encoding, we present a description of the perceptual task of segregating multiple sound events that are mixed together in the signal reaching the ears. Then, we discuss the ability to localize the sound source in the environment. Finally, we provide some data and theory on how listeners categorize complex sounds, such as speech. In particular, we present research on how listeners weigh multiple acoustic cues in making a categorization decision. One conclusion of this review is that it is time for auditory cognitive science to be developed to match what has been done in vision in order for us to better understand how humans communicate with speech and music.
Affiliation(s)
- Andrew Lotto
- Department of Speech, Language, and Hearing Sciences, Tucson, AZ, USA
- Lori Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
50
Heimbauer LA, Beran MJ, Owren MJ. A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content. Curr Biol 2011; 21:1210-4. [PMID: 21723125] [PMCID: PMC3143218] [DOI: 10.1016/j.cub.2011.06.007]
Abstract
A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human.
Affiliation(s)
- Lisa A. Heimbauer
- Department of Psychology, Georgia State University, PO Box 5010, Atlanta, GA, 30302-5010, USA
- Language Research Center, Georgia State University, 3401 Panthersville Road, Decatur, GA, 30034, USA
- Michael J. Beran
- Department of Psychology, Georgia State University, PO Box 5010, Atlanta, GA, 30302-5010, USA
- Language Research Center, Georgia State University, 3401 Panthersville Road, Decatur, GA, 30034, USA
- Michael J. Owren
- Department of Psychology, Georgia State University, PO Box 5010, Atlanta, GA, 30302-5010, USA
- Language Research Center, Georgia State University, 3401 Panthersville Road, Decatur, GA, 30034, USA