1. Bröker F, Holt LL, Roads BD, Dayan P, Love BC. Demystifying unsupervised learning: how it helps and hurts. Trends Cogn Sci 2024:S1364-6613(24)00227-4. PMID: 39353836. DOI: 10.1016/j.tics.2024.09.005.
Abstract
Humans and machines rarely have access to explicit external feedback or supervision, yet manage to learn. Most modern machine learning systems succeed because they benefit from unsupervised data. Humans are also expected to benefit and yet, mysteriously, empirical results are mixed. Does unsupervised learning help humans or not? Here, we argue that the mixed results are not conflicting answers to this question, but reflect that humans self-reinforce their predictions in the absence of supervision, which can help or hurt depending on whether predictions and task align. We use this framework to synthesize empirical results across various domains to clarify when unsupervised learning will help or hurt. This provides new insights into the fundamentals of learning with implications for instruction and lifelong learning.
Affiliation(s)
- Franziska Bröker
- Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; Gatsby Computational Neuroscience Unit, University College London, London, UK; Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Lori L Holt
- Department of Psychology, University of Texas at Austin, Austin, TX, USA
- Brett D Roads
- Department of Experimental Psychology, University College London, London, UK
- Peter Dayan
- Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany; University of Tübingen, Tübingen, Germany
- Bradley C Love
- Department of Experimental Psychology, University College London, London, UK
2. Qi W, Zevin JD. Statistical learning of syllable sequences as trajectories through a perceptual similarity space. Cognition 2024;244:105689. PMID: 38219453. DOI: 10.1016/j.cognition.2023.105689.
Abstract
Learning from sequential statistics is a general capacity common across many cognitive domains and species. One form of statistical learning (SL) has been studied in great detail: learning to segment "words" from continuous streams of speech syllables in which the only segmentation cue is ostensibly the transitional (or conditional) probability from one syllable to the next. Typically, this phenomenon is modeled as the calculation of probabilities over discrete, featureless units. Here we present an alternative model, in which sequences are learned as trajectories through a similarity space. A simple recurrent network coding syllables with representations that capture the similarity relations among them correctly simulated the result of a classic SL study, as did a similar model that encoded syllables as three-dimensional points in a continuous similarity space. We then used the simulations to identify a sequence of "words" that produces the reverse of the typical SL effect, i.e., part-words are predicted to be more familiar than words. Results from two experiments with human participants are consistent with the simulation results. Additional analyses identified features that drive differences in what is learned from a set of artificial languages that have the same transitional probabilities among syllables.
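To make the transitional-probability cue concrete, here is a minimal sketch of the standard computation that this abstract contrasts with its trajectory account. The syllable inventory, the three "words", and the stream length are invented for illustration; they are not the study's materials.

```python
# Minimal sketch of the transitional-probability (TP) cue described above:
# TP(b | a) = count(a followed by b) / count(a). Within-word transitions
# are high; transitions that span a word boundary are low, which is the
# classic segmentation cue. The syllables and "words" here are invented.
import random
from collections import Counter, defaultdict

def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from bigram counts."""
    bigrams = Counter(zip(stream, stream[1:]))
    unigrams = Counter(stream[:-1])
    tp = defaultdict(dict)
    for (a, b), n in bigrams.items():
        tp[a][b] = n / unigrams[a]
    return tp

# Build a continuous stream by concatenating three tri-syllabic "words".
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"]]
stream = [syl for _ in range(300) for syl in random.choice(words)]

tp = transitional_probabilities(stream)
print(tp["tu"]["pi"])                                  # within-word TP: 1.0
print({b: round(p, 2) for b, p in tp["ro"].items()})   # boundary TPs: ~0.33 each
```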
Affiliation(s)
- Wendy Qi
- Department of Psychology, University of Southern California, 3620 S. McClintock Ave, Los Angeles, CA 90089, United States
- Jason D Zevin
- Department of Psychology, University of Southern California, 3620 S. McClintock Ave, Los Angeles, CA 90089, United States
3. Zhen LQ, Pratt SR. Perceptual, procedural, and task learning for an auditory temporal discrimination task. J Acoust Soc Am 2023;153:1823. PMID: 37002097. PMCID: PMC10257527. DOI: 10.1121/10.0017548.
Abstract
Perceptual learning reflects experience-driven improvements in the ability to detect changes in stimulus characteristics. The time course for perceptual learning overlaps with that for procedural learning (acquiring general skills and strategies) and task learning (learning the perceptual judgment specific to the task), making it difficult to isolate their individual effects. This study was conducted to examine the role of exposure to stimulus, procedure, and task information on learning for auditory temporal-interval discrimination. Eighty-three listeners completed five online sessions that required temporal-interval discrimination (target task). Before the initial session, listeners were differentially exposed to information about the target task's stimulus, procedure, or task characteristics. Learning occurred across sessions, but an exposure effect was not observed. Given the significant learning across sessions and variability within and across listeners, contributions from stimulus, procedure, and task exposure to overall learning cannot be discounted. These findings clarify the influence of experience on temporal perceptual learning and could inform designs of training paradigms that optimize perceptual improvements.
Affiliation(s)
- Leslie Q Zhen
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
- Sheila R Pratt
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
4. Incidental auditory category learning and visuomotor sequence learning do not compete for cognitive resources. Atten Percept Psychophys 2023;85:452-462. PMID: 36510102. DOI: 10.3758/s13414-022-02616-x.
Abstract
The environment provides multiple regularities that might be useful in guiding behavior if one were able to learn their structure. Statistical learning across simultaneous regularities is important but poorly understood. We investigate learning across two domains: visuomotor sequence learning through the serial reaction time (SRT) task, and incidental auditory category learning via the systematic multimodal association reaction time (SMART) task. Several commonalities raise the possibility that these two learning phenomena may draw on common cognitive resources and neural networks. In each, participants are uninformed of the regularities that they come to use to guide actions, the outcomes of which may provide a form of internal feedback. We used dual-task conditions to compare learning of the regularities in isolation versus when they were simultaneously available to support behavior on a seemingly orthogonal visuomotor task. Learning occurred across the simultaneous regularities, without attenuation even when the informational value of a regularity was reduced by the presence of the additional, convergent regularity. Thus, the simultaneous regularities do not compete for associative strength, as in overshadowing effects. Moreover, visuomotor sequence learning and incidental auditory category learning do not appear to compete for common cognitive resources; learning across the simultaneous regularities was comparable to learning each regularity in isolation.
5. Iverson P, Herrero BP, Katashima A. Memory-card vowel training for child and adult second-language learners: A first report. JASA Express Lett 2023;3:015202. PMID: 36725541. DOI: 10.1121/10.0016836.
Abstract
Japanese adults and Spanish-Catalan children received auditory phonetic training for English vowels using a novel paradigm, a version of the common children's card game Concentration. Individuals played a computer-based game in which they turned over pairs of cards to match spoken words, drawn from sets of vowel minimal pairs. The training was effective for adults, improving vowel recognition in a game that did not explicitly require identification. Children likewise improved over time on the memory card game, but not on the generalisation task used here. This gamified training method can serve as a platform for examining development and perceptual learning.
Affiliation(s)
- Paul Iverson
- Department of Speech Hearing and Phonetic Sciences, University College London, Chandler House, 4 Wakefield Street, London WC1N 1PF, United Kingdom
- Begoña Pericas Herrero
- Department of Speech Hearing and Phonetic Sciences, University College London, Chandler House, 4 Wakefield Street, London WC1N 1PF, United Kingdom
- Asano Katashima
- Department of Speech Hearing and Phonetic Sciences, University College London, Chandler House, 4 Wakefield Street, London WC1N 1PF, United Kingdom
6. Stilp CE, Shorey AE, King CJ. Nonspeech sounds are not all equally good at being nonspeech. J Acoust Soc Am 2022;152:1842. PMID: 36182316. DOI: 10.1121/10.0014174.
Abstract
Perception of speech sounds has a long history of being compared to perception of nonspeech sounds, with rich and enduring debates regarding how closely they share similar underlying processes. In many instances, perception of nonspeech sounds is directly compared to that of speech sounds without a clear explanation of how related these sounds are to the speech they are selected to mirror (or not mirror). While the extreme acoustic variability of speech sounds is well documented, this variability is bounded by the common source of a human vocal tract. Nonspeech sounds do not share a common source, and as such, exhibit even greater acoustic variability than that observed for speech. This increased variability raises important questions about how well perception of a given nonspeech sound might resemble or model perception of speech sounds. Here, we offer a brief review of extremely diverse nonspeech stimuli that have been used in the efforts to better understand perception of speech sounds. The review is organized according to increasing spectrotemporal complexity: random noise, pure tones, multitone complexes, environmental sounds, music, speech excerpts that are not recognized as speech, and sinewave speech. Considerations are offered for stimulus selection in nonspeech perception experiments moving forward.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
7. McMurray B. The acquisition of speech categories: Beyond perceptual narrowing, beyond unsupervised learning and beyond infancy. Lang Cogn Neurosci 2022;38:419-445. PMID: 38425732. PMCID: PMC10904032. DOI: 10.1080/23273798.2022.2105367.
Abstract
An early achievement in language is carving a variable acoustic space into categories. The canonical story is that infants accomplish this by the second year, when only unsupervised learning is plausible. I challenge this view, synthesizing five lines of developmental, phonetic and computational work. First, unsupervised learning may be insufficient given the statistics of speech (including infant-directed speech). Second, evidence that infants "have" speech categories rests on tenuous methodological assumptions. Third, the fact that the ecology of the learning environment is unsupervised does not rule out more powerful error-driven learning mechanisms. Fourth, several implicit supervisory signals are available to older infants. Finally, development is protracted through adolescence, enabling richer avenues for development. Infancy may be a time of organizing the auditory space, but true categorization only arises via complex developmental cascades later in life. This has implications for critical periods, second language acquisition, and our basic framing of speech perception.
Affiliation(s)
- Bob McMurray
- Department of Psychological and Brain Sciences, Department of Communication Sciences and Disorders, Department of Linguistics, University of Iowa and Haskins Laboratories
8. Long-term priors constrain category learning in the context of short-term statistical regularities. Psychon Bull Rev 2022;29:1925-1937. PMID: 35524011. DOI: 10.3758/s13423-022-02114-z.
Abstract
Cognitive systems face a constant tension between maintaining existing representations that have been fine-tuned to long-term input regularities and adapting representations to meet the needs of short-term input that may deviate from long-term norms. Systems must balance the stability of long-term representations with plasticity to accommodate novel contexts. We investigated the interaction between perceptual biases, or priors, acquired over the long term and sensitivity to statistical regularities introduced over the short term. Participants were first passively exposed to short-term acoustic regularities and then learned categories in a supervised training task that either conflicted or aligned with long-term perceptual priors. We found that the long-term priors had a robust and pervasive impact on categorization behavior. In contrast, behavior was not influenced by the nature of the short-term passive exposure. These results demonstrate that perceptual priors place strong constraints on the course of learning and that short-term passive exposure to acoustic regularities has limited impact on directing subsequent category learning.
9. de Larrea-Mancera ESL, Philipp MA, Stavropoulos T, Carrillo AA, Cheung S, Koerner TK, Molis MR, Gallun FJ, Seitz AR. Training with an auditory perceptual learning game transfers to speech in competition. J Cogn Enhanc 2021;6:47-66. PMID: 34568741. PMCID: PMC8453468. DOI: 10.1007/s41465-021-00224-5.
Abstract
Understanding speech in the presence of acoustical competition is a major complaint of those with hearing difficulties. Here, a novel perceptual learning game was tested for its effectiveness in reducing difficulties with hearing speech in competition. The game was designed to train a mixture of auditory processing skills thought to underlie speech in competition, such as spectral-temporal processing, sound localization, and auditory working memory. Training on these skills occurred both in quiet and in competition with noise. Thirty college-aged participants without any known hearing difficulties were assigned either to this mixed-training condition or to an active control consisting of frequency discrimination training within the same gamified setting. To assess training effectiveness, tests of speech in competition (primary outcome), as well as basic supra-threshold auditory processing and cognitive processing abilities (secondary outcomes), were administered before and after training. Results suggest modest improvements on speech in competition tests in the mixed-training condition compared to the frequency-discrimination control condition (Cohen's d = 0.68). While the sample was small and consisted of normally hearing individuals, these data suggest promise for future studies in populations with hearing difficulties.
Affiliation(s)
- E Sebastian Lelo de Larrea-Mancera
- Psychology Department, University of California, Riverside, Riverside, CA, USA; Brain Game Center, University of California, Riverside, Riverside, CA, USA
- Mark A Philipp
- Brain Game Center, University of California, Riverside, Riverside, CA, USA
- Sierra Cheung
- Brain Game Center, University of California, Riverside, Riverside, CA, USA
- Tess K Koerner
- Oregon Health and Science University, Portland, OR, USA; VA RR&D National Center for Rehabilitative Auditory Research, Portland, OR, USA
- Michelle R Molis
- Oregon Health and Science University, Portland, OR, USA; VA RR&D National Center for Rehabilitative Auditory Research, Portland, OR, USA
- Frederick J Gallun
- Oregon Health and Science University, Portland, OR, USA; VA RR&D National Center for Rehabilitative Auditory Research, Portland, OR, USA
- Aaron R Seitz
- Psychology Department, University of California, Riverside, Riverside, CA, USA; Brain Game Center, University of California, Riverside, Riverside, CA, USA
10. Desirable and undesirable difficulties: Influences of variability, training schedule, and aptitude on nonnative phonetic learning. Atten Percept Psychophys 2020;82:2049-2065. PMID: 31970707. DOI: 10.3758/s13414-019-01925-y.
Abstract
Adult listeners often struggle to learn to distinguish speech sounds not present in their native language. High-variability training sets (i.e., stimuli produced by multiple talkers or stimuli that occur in diverse phonological contexts) often result in better retention of the learned information, as well as increased generalization to new instances. However, high-variability training is also more challenging, and not every listener can take advantage of this kind of training. An open question is how variability should be introduced to the learner in order to capitalize on the benefits of such training without derailing the training process. The current study manipulated phonological variability as native English speakers learned a difficult nonnative (Hindi) contrast by presenting the nonnative contrast in the context of two different vowels (/i/ and /u/). In a between-subjects design, variability was manipulated during training and during test. Participants were trained in the evening hours and returned the next morning for reassessment to test for retention of the speech sounds. We found that blocked training was superior to interleaved training for both learning and retention, but for learners in the interleaved training group, higher pretraining aptitude predicted better identification performance. Further, pretraining discrimination aptitude positively predicted changes in phonetic discrimination after a period of off-line consolidation, regardless of the training manipulation. These findings add to a growing literature suggesting that variability may come at a cost in phonetic learning and that aptitude can affect both learning and retention of nonnative speech sounds.
11. Wang FH, Hutton EA, Zevin JD. Statistical Learning of Unfamiliar Sounds as Trajectories Through a Perceptual Similarity Space. Cogn Sci 2019;43:e12740. PMID: 31446661. DOI: 10.1111/cogs.12740.
Abstract
In typical statistical learning studies, researchers define sequences in terms of the probability of the next item in the sequence given the current item (or items), and they show that high probability sequences are treated as more familiar than low probability sequences. Existing accounts of these phenomena all assume that participants represent statistical regularities more or less as the experimenters define them: as sequential probabilities of symbols in a string. Here we offer an alternative, or possibly supplementary, hypothesis. Specifically, rather than identifying or labeling individual stimuli discretely in order to predict the next item in a sequence, we need only assume that the participant is able to represent the stimuli as evincing particular similarity relations to one another, with sequences represented as trajectories through this similarity space. We present experiments in which this hypothesis makes sharply different predictions from hypotheses based on the assumption that sequences are learned over discrete, labeled stimuli. We also present a series of simulation models that encode stimuli as positions in a continuous two-dimensional space and predict the next location from the current location. Although no model captures all of the data presented here, the results of three critical experiments are more consistent with the view that participants represent trajectories through similarity space rather than sequences of discrete labels under particular conditions.
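As a concrete illustration of the trajectory hypothesis, here is a minimal sketch in which stimuli are points in a 2-D space, "learning" stores observed location-to-location transitions, and familiarity is the negated one-step prediction error along a test sequence. The coordinates, the nearest-neighbor predictor, and the familiarity score are all assumptions made for illustration; the paper's own simulations included a simple recurrent network over similarity-based codes.

```python
# Sketch of sequences-as-trajectories: stimuli are points in a continuous
# 2-D similarity space, and predicting the next location from the current
# one stands in for sequence knowledge. All specifics here (coordinates,
# predictor, familiarity score) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
coords = {s: rng.uniform(-1, 1, 2) for s in "ABCDEF"}  # hypothetical space

def to_xy(seq):
    return np.array([coords[s] for s in seq])

# "Learning": store every observed (current location -> next location) pair.
train = to_xy("ABCDEF" * 10)
memory_cur, memory_next = train[:-1], train[1:]

def predict_next(x):
    """Predict the successor of the nearest previously visited location."""
    i = np.argmin(np.linalg.norm(memory_cur - x, axis=1))
    return memory_next[i]

def familiarity(seq):
    """Negative mean one-step prediction error; higher means more familiar."""
    xy = to_xy(seq)
    errors = [np.linalg.norm(predict_next(xy[t]) - xy[t + 1])
              for t in range(len(xy) - 1)]
    return -float(np.mean(errors))

print(familiarity("ABCDEF"))  # ~0.0: follows the learned trajectory
print(familiarity("ACEBDF"))  # clearly negative: violates learned transitions
```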
Affiliation(s)
- Felix Hao Wang
- Department of Psychology, University of Nevada Las Vegas
- Jason D Zevin
- Department of Psychology, University of Southern California; Department of Linguistics, University of Southern California; Haskins Laboratories, New Haven, CT
12. Martinez JS, Holt LL, Reed CM, Tan HZ. Incidental Categorization of Vibrotactile Stimuli. IEEE Trans Haptics 2020;13:73-79. PMID: 31940551. DOI: 10.1109/toh.2020.2965446.
Abstract
Past research has demonstrated incidental learning of task-irrelevant visual and auditory stimuli. Motivated by the possibility of similar evidence in the tactile domain and potential applications in tactile speech communication systems, we investigated incidental categorization of vibrotactile stimuli through a visuomotor task of shape identification. Two experiments were conducted where participants were exposed to position-based or movement-based vibrotactile stimuli prior to performing a speeded response to one of two targets. The two experiments differed only in the particular sets of such stimuli employed. Unbeknownst to the participants, the vibrotactile stimuli and visual targets were initially correlated perfectly to facilitate the incidental learning of their associations, briefly uncorrelated to check the cost in reaction time, and correlated again to re-establish the initial association. Finally, participants were asked to predict visual targets from novel position-based and movement-based stimuli. The results from both experiments provided evidence of incidental categorization of vibrotactile stimuli. The percent-correct scores and sensitivity indices for the overt categorization of novel stimuli from both experiments were well above chance, indicating generalization of learning. And while both experiments showed an increase in reaction time when the association between vibrotactile stimuli and visual targets was disrupted, this reaction time cost was significant only for the stimuli used in the second experiment. Our finding of incidental categorization in the tactile domain has important implications for the effective acquisition of speech in tactile speech communication systems.
13.
Abstract
Human category learning appears to be supported by dual learning systems. Previous research indicates the engagement of distinct neural systems in learning categories that require selective attention to dimensions versus those that require integration across dimensions. This evidence has largely come from studies of learning across perceptually separable visual dimensions, but recent research has applied dual system models to understanding auditory and speech categorization. Since differential engagement of the dual learning systems is closely related to selective attention to input dimensions, it may be important that acoustic dimensions are quite often perceptually integral and difficult to attend to selectively. We investigated this issue across artificial auditory categories defined by center frequency and modulation frequency acoustic dimensions. Learners demonstrated a bias to integrate across the dimensions, rather than to selectively attend, and the bias specifically reflected a positive correlation between the dimensions. Further, we found that the acoustic dimensions did not contribute equivalently to categorization decisions. These results demonstrate the need to reconsider the assumption that the orthogonal input dimensions used in designing an experiment are indeed orthogonal in perceptual space, as this has important implications for category learning.
14.
Abstract
Humans are born as "universal listeners" without a bias toward any particular language. However, over the first year of life, infants' perception is shaped by learning native speech categories. Acoustically different sounds, such as the same word produced by different speakers, come to be treated as functionally equivalent. In natural environments, these categories often emerge incidentally, without overt categorization or explicit feedback. However, the neural substrates of category learning have been investigated almost exclusively using overt categorization tasks with explicit feedback about categorization decisions. Here, we examined whether the striatum, previously implicated in category learning, contributes to incidental acquisition of sound categories. In the fMRI scanner, participants played a videogame in which sound category exemplars aligned with game actions and events, allowing sound categories to incidentally support successful game play. An experimental group heard nonspeech sound exemplars drawn from coherent category spaces, whereas a control group heard acoustically similar sounds drawn from a less structured space. Although the groups exhibited similar in-game performance, generalization of sound category learning and activation of the posterior striatum were significantly greater in the experimental than the control group. Moreover, the experimental group showed brain-behavior relationships related to the generalization of all categories, while in the control group these relationships were restricted to the categories with structured sound distributions. Together, these results demonstrate that the striatum, through its interactions with the left superior temporal sulcus, contributes to incidental acquisition of sound category representations emerging from naturalistic learning environments.
15. Reed CM, Tan HZ, Perez ZD, Wilson EC, Severgnini FM, Jung J, Martinez JS, Jiao Y, Israr A, Lau F, Klumb K, Turcott R, Abnousi F. A Phonemic-Based Tactile Display for Speech Communication. IEEE Trans Haptics 2019;12:2-17. PMID: 30059321. DOI: 10.1109/toh.2018.2861010.
Abstract
Despite a long history of research, the development of synthetic tactual aids to support the communication of speech has proven to be a difficult task. The current paper describes a new tactile speech device based on the presentation of phonemic-based tactile codes. The device consists of 24 tactors under independent control for stimulation at the forearm. Using properties that include frequency and waveform of stimulation, amplitude, spatial location, and movement characteristics, unique tactile codes were designed for 39 consonant and vowel phonemes of the English language. The strategy for mapping the phonemes to tactile symbols is described, and properties of the individual phonemic codes are provided. Results are reported for an exploratory study of the ability of 10 young adults to identify the tactile symbols. The participants were trained to identify sets of consonants and vowels, before being tested on the full set of 39 tactile codes. The results indicate a mean recognition rate of 86 percent correct within one to four hours of training across participants. Thus, these results support the viability of a phonemic-based approach for conveying speech information through the tactile sense.
16.
Abstract
There is substantial evidence that two distinct learning systems are engaged in category learning. One is principally engaged when learning requires selective attention to a single dimension (rule-based), and the other is drawn online by categories requiring integration across two or more dimensions (information-integration). This distinction has largely been drawn from studies of visual categories learned via overt category decisions and explicit feedback. Recent research has extended this model to auditory categories, the nature of which introduces new questions for research. With the present experiment, we addressed the influences of incidental versus overt training and category distribution sampling on learning information-integration and rule-based auditory categories. The results demonstrate that the training task influences category learning, with overt feedback generally outperforming incidental feedback. Additionally, distribution sampling (probabilistic or deterministic) and category type (information-integration or rule-based) both affect how well participants are able to learn. Specifically, rule-based categories are learned equivalently, regardless of distribution sampling, whereas information-integration categories are learned better with deterministic than with probabilistic sampling. The interactions of distribution sampling, category type, and kind of feedback impacted category-learning performance, but these interactions have not yet been integrated into existing category-learning models. These results suggest new dimensions for understanding category learning, inspired by the real-world properties of auditory categories.
17. Quam C, Wang A, Maddox WT, Golisch K, Lotto A. Procedural-Memory, Working-Memory, and Declarative-Memory Skills Are Each Associated With Dimensional Integration in Sound-Category Learning. Front Psychol 2018;9:1828. PMID: 30333772. PMCID: PMC6175975. DOI: 10.3389/fpsyg.2018.01828.
Abstract
This paper investigates relationships between procedural-memory, declarative-memory, and working-memory skills and adult native English speakers' novel sound-category learning. Participants completed a sound-categorization task that required integrating two dimensions: one native (vowel quality), one non-native (pitch). Similar information-integration category structures in the visual and auditory domains have been shown to be best learned implicitly (e.g., Maddox et al., 2006). Thus, we predicted that individuals with greater procedural-memory capacity would better learn sound categories, because procedural memory appears to support implicit learning of new information and integration of dimensions. Seventy undergraduates were tested across two experiments. Procedural memory was assessed using a linguistic adaptation of the serial-reaction-time task (Misyak et al., 2010a,b). Declarative memory was assessed using the logical-memory subtest of the Wechsler Memory Scale-4th edition (WMS-IV; Wechsler, 2009). Working memory was assessed using an auditory version of the reading-span task (Kane et al., 2004). Experiment 1 revealed contributions of only declarative memory to dimensional integration, which might indicate that participants had too little time or motivation to shift to a procedural/integrative strategy. Experiment 2 provided twice as much speech-sound training, distributed over 2 days, and also attempted to train at the category boundary. As predicted, effects of declarative memory were removed and effects of procedural memory emerged, but, unexpectedly, new effects of working memory surfaced. The results may be compatible with a multiple-systems account in which declarative and working memory facilitate transfer of control to the procedural system.
Affiliation(s)
- Carolyn Quam
- Department of Speech and Hearing Sciences, Portland State University, Portland, OR, United States; Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, United States; Department of Psychology, University of Arizona, Tucson, AZ, United States
- Alisa Wang
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, United States
- W. Todd Maddox
- Cognitive Design and Statistical Consulting, LLC, Austin, TX, United States
- Kimberly Golisch
- Department of Psychology, University of Arizona, Tucson, AZ, United States; College of Medicine-Tucson, University of Arizona, Tucson, AZ, United States
- Andrew Lotto
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, United States; Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States
18. Holt LL, Tierney AT, Guerra G, Laffere A, Dick F. Dimension-selective attention as a possible driver of dynamic, context-dependent re-weighting in speech processing. Hear Res 2018;366:50-64. PMID: 30131109. PMCID: PMC6107307. DOI: 10.1016/j.heares.2018.06.014.
Abstract
The contribution of acoustic dimensions to an auditory percept is dynamically adjusted and reweighted based on prior experience about how informative these dimensions are across the long-term and short-term environment. This is especially evident in speech perception, where listeners differentially weight information across multiple acoustic dimensions, and use this information selectively to update expectations about future sounds. The dynamic and selective adjustment of how acoustic input dimensions contribute to perception has made it tempting to conceive of this as a form of non-spatial auditory selective attention. Here, we review several human speech perception phenomena that might be consistent with auditory selective attention although, as yet, the literature does not definitively support a mechanistic tie. We relate these human perceptual phenomena to illustrative nonhuman animal neurobiological findings that offer informative guideposts for testing mechanistic connections. We next present a novel empirical approach that can serve as a methodological bridge from human research to animal neurobiological studies. Finally, we describe four preliminary results that demonstrate its utility in advancing understanding of human non-spatial, dimension-based auditory selective attention.
Affiliation(s)
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Adam T Tierney
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK
- Giada Guerra
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK
- Frederic Dick
- Department of Psychological Sciences, Birkbeck College, University of London, London WC1E 7HX, UK; Centre for Brain and Cognitive Development, Birkbeck College, London WC1E 7HX, UK; Department of Experimental Psychology, University College London, London WC1H 0AP, UK
19. Filippi P, Laaha S, Fitch WT. Utterance-final position and pitch marking aid word learning in school-age children. R Soc Open Sci 2017;4:161035. PMID: 28878961. PMCID: PMC5579076. DOI: 10.1098/rsos.161035.
Abstract
We investigated the effects of word order and prosody on word learning in school-age children. Third graders viewed photographs belonging to one of three semantic categories while hearing four-word nonsense utterances containing a target word. In the control condition, all words had the same pitch and, across trials, the position of the target word was varied systematically within each utterance. The only cue to word-meaning mapping was the co-occurrence of target words and referents. This cue was present in all conditions. In the Utterance-final condition, the target word always occurred in utterance-final position, and at the same fundamental frequency as all the other words of the utterance. In the Pitch peak condition, the position of the target word was varied systematically within each utterance across trials, and produced with pitch contrasts typical of infant-directed speech (IDS). In the Pitch peak + Utterance-final condition, the target word always occurred in utterance-final position, and was marked with a pitch contrast typical of IDS. Word learning occurred in all conditions except the control condition. Moreover, learning performance was significantly higher than that observed with simple co-occurrence (control condition) only for the Pitch peak + Utterance-final condition. We conclude that, for school-age children, the combination of words' utterance-final alignment and pitch enhancement boosts word learning.
Affiliation(s)
- Piera Filippi
- Department of Cognitive Biology, University of Vienna, Vienna, Austria
- Sabine Laaha
- Department of Linguistics, University of Vienna, Vienna, Austria
- W. Tecumseh Fitch
- Department of Cognitive Biology, University of Vienna, Vienna, Austria
20. Sun Y, Hickey TJ, Shinn-Cunningham B, Sekuler R. Catching Audiovisual Interactions With a First-Person Fisherman Video Game. Perception 2016. DOI: 10.1177/0301006616682755.
Abstract
The human brain is excellent at integrating information from different sources across multiple sensory modalities. To examine one particularly important form of multisensory interaction, we manipulated the temporal correlation between visual and auditory stimuli in a first-person fisherman video game. Subjects saw rapidly swimming fish whose size oscillated at either 6 or 8 Hz. Subjects categorized each fish according to its rate of size oscillation, while trying to ignore a concurrent broadband sound seemingly emitted by the fish. In three experiments, categorization was faster and more accurate when the rate at which a fish oscillated in size matched the rate at which the accompanying, task-irrelevant sound was amplitude modulated. Control conditions showed that the difference between responses to matched and mismatched audiovisual signals reflected a performance gain in the matched condition, rather than a cost from the mismatched condition. The performance advantage with matched audiovisual signals was remarkably robust over changes in task demands between experiments. Performance with matched or unmatched audiovisual signals improved over successive trials at about the same rate, emblematic of perceptual learning in which visual oscillation rate becomes more discriminable with experience. Finally, analysis at the level of individual subjects' performance pointed to differences in the rates at which subjects can extract information from audiovisual stimuli.
Affiliation(s)
- Yile Sun
- Volen Center for Complex Systems, Brandeis University, Waltham, MA, USA
- Timothy J. Hickey
- Department of Computer Science, Brandeis University, Waltham, MA, USA
- Robert Sekuler
- Volen Center for Complex Systems, Brandeis University, Waltham, MA, USA
21. Chandrasekaran B, Yi HG, Smayda KE, Maddox WT. Effect of explicit dimensional instruction on speech category learning. Atten Percept Psychophys 2016;78:566-582. PMID: 26542400. PMCID: PMC4744489. DOI: 10.3758/s13414-015-0999-x.
Abstract
Learning nonnative speech categories is often considered a challenging task in adulthood. This difficulty is driven by cross-language differences in weighting critical auditory dimensions that differentiate speech categories. For example, previous studies have shown that differentiating Mandarin tonal categories requires attending to dimensions related to pitch height and direction. Relative to native speakers of Mandarin, the pitch direction dimension is underweighted by native English speakers. In the current study, we examined the effect of explicit instructions (dimension instruction) on native English speakers' Mandarin tone category learning within the framework of a dual-learning systems (DLS) model. This model predicts that successful speech category learning is initially mediated by an explicit, reflective learning system that frequently utilizes unidimensional rules, with an eventual switch to a more implicit, reflexive learning system that utilizes multidimensional rules. Participants were explicitly instructed to focus on and/or ignore the pitch height dimension or the pitch direction dimension, or were given no explicit prime. Our results show that instructions directing participants to focus on pitch direction, and instructions diverting attention away from pitch height, resulted in enhanced tone categorization. Computational modeling of participant responses suggested that instruction related to pitch direction led to faster and more frequent use of multidimensional reflexive strategies and enhanced perceptual selectivity along the previously underweighted pitch direction dimension.
Affiliation(s)
- Bharath Chandrasekaran
- Department of Communication Sciences and Disorders, The University of Texas at Austin, 2504A Whitis Ave., Austin, TX 78712, USA; Department of Psychology, The University of Texas at Austin, 2504A Whitis Ave., Austin, TX 78712, USA
- Han-Gyol Yi
- Department of Communication Sciences and Disorders, The University of Texas at Austin, 2504A Whitis Ave., Austin, TX 78712, USA
- Kirsten E Smayda
- Department of Psychology, The University of Texas at Austin, 2504A Whitis Ave., Austin, TX 78712, USA
- W Todd Maddox
- Department of Psychology, The University of Texas at Austin, 2504A Whitis Ave., Austin, TX 78712, USA
22. Gabay Y, Holt LL. Incidental learning of sound categories is impaired in developmental dyslexia. Cortex 2015;73:131-143. PMID: 26409017. DOI: 10.1016/j.cortex.2015.08.008.
Abstract
Developmental dyslexia is commonly thought to arise from specific phonological impairments. However, recent evidence is consistent with the possibility that phonological impairments arise as symptoms of an underlying dysfunction of procedural learning. The nature of the link between impaired procedural learning and phonological dysfunction is unresolved. Motivated by the observation that speech processing involves the acquisition of procedural category knowledge, the present study investigates the possibility that procedural learning impairment may affect phonological processing by interfering with the typical course of phonetic category learning. The present study tests this hypothesis while controlling for linguistic experience and possible speech-specific deficits by comparing auditory category learning across artificial, nonlinguistic sounds among dyslexic adults and matched controls in a specialized first-person shooter videogame that has been shown to engage procedural learning. Nonspeech auditory category learning was assessed online via within-game measures and also with a post-training task involving overt categorization of familiar and novel sound exemplars. Each measure reveals that dyslexic participants do not acquire procedural category knowledge as effectively as age- and cognitive-ability matched controls. This difference cannot be explained by differences in perceptual acuity for the sounds. Moreover, poor nonspeech category learning is associated with slower phonological processing. Whereas phonological processing impairments have been emphasized as the cause of dyslexia, the current results suggest that impaired auditory category learning, general in nature and not specific to speech signals, could contribute to phonological deficits in dyslexia with subsequent negative effects on language acquisition and reading. Implications for the neuro-cognitive mechanisms of developmental dyslexia are discussed.
Affiliation(s)
- Yafit Gabay
- Carnegie Mellon University, Department of Psychology, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
- Lori L Holt
- Carnegie Mellon University, Department of Psychology, Pittsburgh, PA, USA; Center for the Neural Basis of Cognition, Pittsburgh, PA, USA
23.
Abstract
Language learning requires that listeners discover acoustically variable functional units like phonetic categories and words from an unfamiliar, continuous acoustic stream. Although many category learning studies have examined how listeners learn to generalize across the acoustic variability inherent in the signals that convey the functional units of language, these studies have tended to focus upon category learning across isolated sound exemplars. However, continuous input presents many additional learning challenges that may impact category learning. Listeners may not know the timescale of the functional unit, its relative position in the continuous input, or its relationship to other evolving input regularities. Moving laboratory-based studies of isolated category exemplars toward more natural input is important to modeling language learning, but very little is known about how listeners discover categories embedded in continuous sound. In 3 experiments, adult participants heard acoustically variable sound category instances embedded in acoustically variable and unfamiliar sound streams within a video game task. This task was inherently rich in multisensory regularities with the to-be-learned categories and likely to engage procedural learning without requiring explicit categorization, segmentation, or even attention to the sounds. After 100 min of game play, participants categorized familiar sound streams in which target words were embedded and generalized this learning to novel streams as well as isolated instances of the target words. The findings demonstrate that even without a priori knowledge, listeners can discover input regularities that have the best predictive control over the environment for both non-native speech and nonspeech signals, emphasizing the generality of the learning.
Affiliation(s)
- Sung-Joo Lim
- Department of Psychology, Carnegie Mellon University
- Lori L Holt
- Department of Psychology, Carnegie Mellon University
24. Jones AB, Farrall AJ, Belin P, Pernet CR. Hemispheric association and dissociation of voice and speech information processing in stroke. Cortex 2015;71:232-239. PMID: 26247409. DOI: 10.1016/j.cortex.2015.07.004.
Abstract
As we listen to someone speaking, we extract both linguistic and non-linguistic information. Knowing how these two sets of information are processed in the brain is fundamental for the general understanding of social communication, speech recognition and therapy of language impairments. We investigated the pattern of performance in phoneme versus gender categorization in left and right hemisphere stroke patients, and found an anatomo-functional dissociation in the right frontal cortex, establishing a new syndrome in voice discrimination abilities. In addition, phoneme and gender performances were more often associated than dissociated in the left hemisphere patients, suggesting common neural underpinnings.
Affiliation(s)
- Anna B Jones
- Brain Research Imaging Centre, The University of Edinburgh, UK; Centre for Clinical Brain Sciences, The University of Edinburgh, UK
- Andrew J Farrall
- Brain Research Imaging Centre, The University of Edinburgh, UK; Centre for Clinical Brain Sciences, The University of Edinburgh, UK
- Pascal Belin
- Institute of Neuroscience and Psychology, University of Glasgow, UK; Institut des Neurosciences de La Timone, UMR 7289, CNRS & Université Aix-Marseille, France
- Cyril R Pernet
- Brain Research Imaging Centre, The University of Edinburgh, UK; Centre for Clinical Brain Sciences, The University of Edinburgh, UK
25. Gabay Y, Dick FK, Zevin JD, Holt LL. Incidental auditory category learning. J Exp Psychol Hum Percept Perform 2015;41:1124-1138. PMID: 26010588. DOI: 10.1037/xhp0000073.
Abstract
Very little is known about how auditory categories are learned incidentally, without instructions to search for category-diagnostic dimensions, overt category decisions, or experimenter-provided feedback. This is an important gap because learning in the natural environment does not arise from explicit feedback and there is evidence that the learning systems engaged by traditional tasks are distinct from those recruited by incidental category learning. We examined incidental auditory category learning with a novel paradigm, the Systematic Multimodal Associations Reaction Time (SMART) task, in which participants rapidly detect and report the appearance of a visual target in 1 of 4 possible screen locations. Although the overt task is rapid visual detection, a brief sequence of sounds precedes each visual target. These sounds are drawn from 1 of 4 distinct sound categories that predict the location of the upcoming visual target. These many-to-one auditory-to-visuomotor correspondences support incidental auditory category learning. Participants incidentally learn categories of complex acoustic exemplars and generalize this learning to novel exemplars and tasks. Further, learning is facilitated when category exemplar variability is more tightly coupled to the visuomotor associations than when the same stimulus variability is experienced across trials. We relate these findings to phonetic category learning.
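To illustrate the many-to-one auditory-to-visuomotor structure the abstract describes, the following sketch generates SMART-like trial lists in which a brief sequence of category exemplars precedes each visual target. The category labels, exemplar counts, block lengths, and the brief disrupted block are hypothetical choices for illustration, not the published design.

```python
# Sketch of a SMART-style trial structure: each trial plays a short sequence
# of exemplars from one of four sound categories, and the category (not any
# individual sound) predicts which of four screen locations the visual
# target appears in. All specific values here are hypothetical.
import random

CATEGORIES = ["cat1", "cat2", "cat3", "cat4"]
LOCATION_OF = {c: i for i, c in enumerate(CATEGORIES)}  # many-to-one mapping

def make_trial(category, consistent=True, n_sounds=5):
    # Exemplar variability within a category; the category, not the token,
    # carries the predictive regularity.
    sounds = [f"{category}_ex{random.randint(1, 20)}" for _ in range(n_sounds)]
    location = LOCATION_OF[category] if consistent else random.randrange(4)
    return {"sounds": sounds, "target_location": location}

def make_block(n_trials, consistent=True):
    return [make_trial(random.choice(CATEGORIES), consistent)
            for _ in range(n_trials)]

# Learning blocks (sound category predicts location), a brief disrupted
# block (it does not), then the mapping is re-established; a reaction-time
# cost in the disrupted block would index incidental category learning.
trials = (make_block(96, consistent=True)
          + make_block(24, consistent=False)
          + make_block(96, consistent=True))
print(trials[0])
```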
Affiliation(s)
- Frederic K Dick
- Department of Psychological Sciences, Birkbeck College, University of London
- Jason D Zevin
- Department of Psychology, University of Southern California
26. Myers EB. Emergence of category-level sensitivities in non-native speech sound learning. Front Neurosci 2014;8:238. PMID: 25152708. PMCID: PMC4125857. DOI: 10.3389/fnins.2014.00238.
Abstract
Over the course of development, speech sounds that are contrastive in one's native language tend to become perceived categorically: that is, listeners are unaware of variation within phonetic categories while showing excellent sensitivity to speech sounds that span linguistically meaningful phonetic category boundaries. The end stage of this developmental process is that the perceptual systems that handle acoustic-phonetic information show special tuning to native language contrasts, and as such, category-level information appears to be present at even fairly low levels of the neural processing stream. Research on adults acquiring non-native speech categories offers an avenue for investigating the interplay of category-level information and perceptual sensitivities to these sounds as speech categories emerge. In particular, one can observe the neural changes that unfold as listeners learn not only to perceive acoustic distinctions that mark non-native speech sound contrasts, but also to map these distinctions onto category-level representations. An emergent literature on the neural basis of novel and non-native speech sound learning offers new insight into this question. In this review, I will examine this literature in order to answer two key questions. First, where in the neural pathway does sensitivity to category-level phonetic information first emerge over the trajectory of speech sound learning? Second, how do frontal and temporal brain areas work in concert over the course of non-native speech sound learning? Finally, in the context of this literature I will describe a model of speech sound learning in which rapidly-adapting access to categorical information in the frontal lobes modulates the sensitivity of stable, slowly-adapting responses in the temporal lobes.
Affiliation(s)
- Emily B Myers
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, CT, USA; Department of Psychology, University of Connecticut, Storrs, CT, USA; Haskins Laboratories, New Haven, CT, USA

27
Lim SJ, Fiez JA, Holt LL. How may the basal ganglia contribute to auditory categorization and speech perception? Front Neurosci 2014; 8:230. [PMID: 25136291 PMCID: PMC4117994 DOI: 10.3389/fnins.2014.00230] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2014] [Accepted: 07/13/2014] [Indexed: 02/01/2023] Open
Abstract
Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood.
Affiliation(s)
- Sung-Joo Lim
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Julie A Fiez
- Department of Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA; Department of Neuroscience, Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA; Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Neuroscience, Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA; Department of Neuroscience, Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA

28
Carbonell KM, Lotto AJ. Speech is not special… again. Front Psychol 2014; 5:427. [PMID: 24917830 PMCID: PMC4042079 DOI: 10.3389/fpsyg.2014.00427] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2014] [Accepted: 04/22/2014] [Indexed: 11/13/2022] Open
Affiliation(s)
- Andrew J. Lotto
- Department of Speech, Language and Hearing Sciences, University of Arizona, Tucson, AZ, USA

29
Meuwese JDI, Post RAG, Scholte HS, Lamme VAF. Does Perceptual Learning Require Consciousness or Attention? J Cogn Neurosci 2013; 25:1579-96. [PMID: 23691987 DOI: 10.1162/jocn_a_00424] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
It has been proposed that visual attention and consciousness are separate [Koch, C., & Tsuchiya, N. Attention and consciousness: Two distinct brain processes. Trends in Cognitive Sciences, 11, 16–22, 2007] and possibly even orthogonal processes [Lamme, V. A. F. Why visual attention and awareness are different. Trends in Cognitive Sciences, 7, 12–18, 2003]. Attention and consciousness converge when conscious visual percepts are attended and hence become available for conscious report. In such a view, a lack of reportability can have two causes: the absence of attention or the absence of a conscious percept. This raises an important question in the field of perceptual learning. It is known that learning can occur in the absence of reportability [Gutnisky, D. A., Hansen, B. J., Iliescu, B. F., & Dragoi, V. Attention alters visual plasticity during exposure-based learning. Current Biology, 19, 555–560, 2009; Seitz, A. R., Kim, D., & Watanabe, T. Rewards evoke learning of unconsciously processed visual stimuli in adult humans. Neuron, 61, 700–707, 2009; Seitz, A. R., & Watanabe, T. Is subliminal learning really passive? Nature, 422, 36, 2003; Watanabe, T., Náñez, J. E., & Sasaki, Y. Perceptual learning without perception. Nature, 413, 844–848, 2001], but it is unclear which of the two ingredients—consciousness or attention—is not necessary for learning. We presented textured figure-ground stimuli and manipulated reportability either by masking (which only interferes with consciousness) or with an inattention paradigm (which only interferes with attention). During the second session (24 hr later), learning was assessed neurally and behaviorally, via differences in figure-ground ERPs and via a detection task. Behavioral and neural learning effects were found for stimuli presented in the inattention paradigm and not for masked stimuli. Interestingly, the behavioral learning effect only became apparent when performance feedback was given on the task to measure learning, suggesting that the memory trace that is formed during inattention is latent until accessed. The results suggest that learning requires consciousness, and not attention, and further strengthen the idea that consciousness is separate from attention.
Affiliation(s)
- Julia D I Meuwese
- Department of Psychology, University of Amsterdam, Amsterdam, Netherlands.

30
Emberson LL, Liu R, Zevin JD. Is statistical learning constrained by lower level perceptual organization? Cognition 2013; 128:82-102. [PMID: 23618755 PMCID: PMC4020322 DOI: 10.1016/j.cognition.2012.12.006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2010] [Revised: 11/15/2012] [Accepted: 12/21/2012] [Indexed: 11/19/2022]
Abstract
In order for statistical information to aid in complex developmental processes such as language acquisition, learning from higher-order statistics (e.g. across successive syllables in a speech stream to support segmentation) must be possible while perceptual abilities (e.g. speech categorization) are still developing. The current study examines how perceptual organization interacts with statistical learning. Adult participants were presented with multiple exemplars from novel, complex sound categories designed to reflect some of the spectral complexity and variability of speech. These categories were organized into sequential pairs and presented such that higher-order statistics, defined based on sound categories, could support stream segmentation. Perceptual similarity judgments and multi-dimensional scaling revealed that participants only perceived three perceptual clusters of sounds and thus did not distinguish the four experimenter-defined categories, creating a tension between lower level perceptual organization and higher-order statistical information. We examined whether the resulting pattern of learning is more consistent with statistical learning being "bottom-up," constrained by the lower levels of organization, or "top-down," such that higher-order statistical information of the stimulus stream takes priority over perceptual organization and perhaps influences perceptual organization. We consistently find evidence that learning is constrained by perceptual organization. Moreover, participants generalize their learning to novel sounds that occupy a similar perceptual space, suggesting that statistical learning occurs based on regions of or clusters in perceptual space. Overall, these results reveal a constraint on learning of sound sequences such that statistical information is determined based on lower level organization. These findings have important implications for the role of statistical learning in language acquisition.
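The tension between statistics defined over experimenter categories and statistics defined over perceived clusters can be illustrated with a small sketch. This is schematic only: the token stream and the token-to-cluster mapping below are invented, not the study's stimuli or analysis code.

```python
from collections import Counter

def transitional_probabilities(stream, mapping=None):
    """Estimate first-order transitional probabilities P(next | current).
    If `mapping` is given, tokens are first collapsed into perceptual
    clusters, so the statistics are computed over clusters instead."""
    units = [mapping[t] for t in stream] if mapping else list(stream)
    pair_counts = Counter(zip(units, units[1:]))
    first_counts = Counter(units[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical case: four experimenter-defined categories (1-4) that
# listeners perceive as only three clusters (3 and 4 merge into "z").
stream = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 4, 3]
perceived = {1: "x", 2: "y", 3: "z", 4: "z"}
print(transitional_probabilities(stream))             # category-level TPs
print(transitional_probabilities(stream, perceived))  # cluster-level TPs
```

Collapsing categories 3 and 4 changes which transitions look predictable, which is the sense in which lower level perceptual organization constrains what statistical learning can extract.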
Affiliation(s)
- Lauren L Emberson
- Brain and Cognitive Sciences Department, University of Rochester, United States.

31
Spectral information in nonspeech contexts influences children's categorization of ambiguous speech sounds. J Exp Child Psychol 2013; 116:728-37. [PMID: 23827642 DOI: 10.1016/j.jecp.2013.05.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2012] [Revised: 05/20/2013] [Accepted: 05/26/2013] [Indexed: 11/20/2022]
Abstract
For both adults and children, acoustic context plays an important role in speech perception. For adults, both speech and nonspeech acoustic contexts influence perception of subsequent speech items, consistent with the argument that effects of context are due to domain-general auditory processes. However, prior research examining the effects of context on children's speech perception has focused on speech contexts; nonspeech contexts have not been explored previously. To better understand the developmental progression of children's use of contexts in speech perception and the mechanisms underlying that development, we created a novel experimental paradigm testing 5-year-old children's speech perception in several acoustic contexts. The results demonstrated that nonspeech context influences children's speech perception, consistent with claims that context effects arise from general auditory system properties rather than speech-specific mechanisms. This supports theoretical accounts of language development suggesting that domain-general processes play a role across the lifespan.
32
Categorical vowel perception enhances the effectiveness and generalization of auditory feedback in human-machine-interfaces. PLoS One 2013; 8:e59860. [PMID: 23527278 PMCID: PMC3602293 DOI: 10.1371/journal.pone.0059860] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2012] [Accepted: 02/19/2013] [Indexed: 11/19/2022] Open
Abstract
Human-machine interface (HMI) designs offer the possibility of improving quality of life for patient populations as well as augmenting normal user function. Despite these pragmatic benefits, auditory feedback for HMI control remains underutilized, in part due to observed limitations in effectiveness. The goal of this study was to determine the extent to which categorical speech perception could be used to improve an auditory HMI. Using surface electromyography, 24 healthy speakers of American English participated in 4 sessions to learn to control an HMI using auditory feedback (provided via vowel synthesis). Participants trained on 3 targets in sessions 1–3 and were tested on 3 novel targets in session 4. An “established categories with text cues” group of eight participants was trained and tested on auditory targets corresponding to standard American English vowels using auditory and text target cues. An “established categories without text cues” group of eight participants was trained and tested on the same targets using only auditory cuing of target vowel identity. A “new categories” group of eight participants was trained and tested on targets that corresponded to vowel-like sounds not part of American English. Analyses of user performance revealed significant effects of session and group (established categories groups and the new categories group), and a trend for an interaction between session and group. Results suggest that auditory feedback can be effectively used for HMI operation when paired with established categorical (native vowel) targets with an unambiguous cue.
33

34
Escudero P, Benders T, Wanrooij K. Enhanced bimodal distributions facilitate the learning of second language vowels. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:EL206-EL212. [PMID: 21974493 DOI: 10.1121/1.3629144] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/25/2011] [Accepted: 07/01/2011] [Indexed: 05/31/2023]
Abstract
This study addresses the questions of whether listening to a bimodal distribution of vowels improves adult learners' categorization of a difficult L2 vowel contrast and whether enhancing the acoustic differences between the vowels in the distribution yields better categorization performance. Spanish learners of Dutch were trained on a natural bimodal or an enhanced bimodal distribution of the Dutch vowels /ɑ/ and /aː/, with the average productions of the vowels or more extreme values as the endpoints, respectively. Categorization improved for learners who listened to the enhanced distribution, which suggests that adults profit from input with properties similar to infant-directed speech.
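As a rough illustration of the training manipulation, the sketch below draws tokens from a “natural” versus an “enhanced” bimodal distribution along a single formant dimension. The endpoint and spread values are invented placeholders; the study's actual Dutch /ɑ/-/aː/ stimulus values are not reproduced here.

```python
import random

def bimodal_sample(n, low_peak, high_peak, sd, rng):
    """Draw n values from a two-peaked distribution with modes at
    low_peak and high_peak (e.g., F1 in Hz for a vowel contrast)."""
    return [rng.gauss(rng.choice((low_peak, high_peak)), sd) for _ in range(n)]

rng = random.Random(1)
# Illustrative F1 endpoints only (Hz); not the study's stimulus values.
natural = bimodal_sample(200, low_peak=700, high_peak=1000, sd=60, rng=rng)
# "Enhanced" training pushes the two modes apart, exaggerating the contrast
# in a way reminiscent of infant-directed speech.
enhanced = bimodal_sample(200, low_peak=600, high_peak=1100, sd=60, rng=rng)
```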
Affiliation(s)
- Paola Escudero
- MARCS Auditory Laboratories, Building 1, University of Western Sydney, Bullecourt Avenue, Milperra, New South Wales 2214, Australia.

35
Ettlinger M, Margulis EH, Wong PCM. Implicit memory in music and language. Front Psychol 2011; 2:211. [PMID: 21927608 PMCID: PMC3170172 DOI: 10.3389/fpsyg.2011.00211] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2011] [Accepted: 08/15/2011] [Indexed: 11/13/2022] Open
Abstract
Research on music and language in recent decades has focused on their overlapping neurophysiological, perceptual, and cognitive underpinnings, ranging from the mechanism for encoding basic auditory cues to the mechanism for detecting violations in phrase structure. These overlaps have most often been identified in musicians with musical knowledge that was acquired explicitly, through formal training. In this paper, we review independent bodies of work in music and language that suggest an important role for implicitly acquired knowledge, implicit memory, and their associated neural structures in the acquisition of linguistic or musical grammar. These findings motivate potential new work that examines music and language comparatively in the context of the implicit memory system.
Affiliation(s)
- Marc Ettlinger
- Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA
- Patrick C. M. Wong
- Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA
- Department of Otolaryngology – Head and Neck Surgery, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA

36
Lim SJ, Holt LL. Learning foreign sounds in an alien world: videogame training improves non-native speech categorization. Cogn Sci 2011; 35:1390-405. [PMID: 21827533 DOI: 10.1111/j.1551-6709.2011.01192.x] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Although speech categories are defined by multiple acoustic dimensions, some are perceptually weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: Increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese speakers learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information, and players' responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5 h across 5 days exhibited improvements in /r/-/l/ perception on par with 2-4 weeks of explicit categorization training in previous research, along with a shift toward more native-like perceptual cue weights.
Affiliation(s)
- Sung-joo Lim
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA.

37
Abstract
Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has resulted from these challenges. We focus here on issues and experiments that define open research questions relevant to phoneme categorization, arguing that SP is best understood as perceptual categorization, a position that places SP in direct contact with research from other areas of perception and cognition.
Affiliation(s)
- Lori L Holt
- Department of Radiology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

38
Seitz AR, Protopapas A, Tsushima Y, Vlahou EL, Gori S, Grossberg S, Watanabe T. Unattended exposure to components of speech sounds yields same benefits as explicit auditory training. Cognition 2010; 115:435-43. [PMID: 20346448 DOI: 10.1016/j.cognition.2010.03.004] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2009] [Revised: 02/13/2010] [Accepted: 03/01/2010] [Indexed: 11/19/2022]
Abstract
Learning a second language as an adult is particularly effortful when new phonetic representations must be formed. Therefore the processes that allow learning of speech sounds are of great theoretical and practical interest. Here we examined whether perception of single formant transitions, that is, sound components critical in speech perception, can be enhanced through an implicit task-irrelevant learning procedure that has been shown to produce visual perceptual learning. The single-formant sounds were paired at subthreshold levels with the attended targets in an auditory identification task. Results showed that task-irrelevant learning occurred for the unattended stimuli. Surprisingly, the magnitude of this learning effect was similar to that following explicit training on auditory formant transition detection using discriminable stimuli in an adaptive procedure, whereas explicit training on the subthreshold stimuli produced no learning. These results suggest that in adults learning of speech parts can occur at least partially through implicit mechanisms.
Affiliation(s)
- Aaron R Seitz
- Center of Excellence for Learning in Education, Science and Technology, Boston, MA 02215, USA.

39
Liu R, Holt LL. Neural changes associated with nonspeech auditory category learning parallel those of speech category acquisition. J Cogn Neurosci 2009; 23:683-98. [PMID: 19929331 DOI: 10.1162/jocn.2009.21392] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Native language experience plays a critical role in shaping speech categorization, but the exact mechanisms by which it does so are not well understood. Investigating category learning of nonspeech sounds with which listeners have no prior experience allows their experience to be systematically controlled in a way that is impossible to achieve by studying natural speech acquisition, and it provides a means of probing the boundaries and constraints that general auditory perception and cognition bring to the task of speech category learning. In this study, we used a multimodal, video-game-based implicit learning paradigm to train participants to categorize acoustically complex, nonlinguistic sounds. Mismatch negativity (MMN) responses to the nonspeech stimuli were collected before and after training, and changes in MMN resulting from the nonspeech category learning closely resemble patterns of change typically observed during speech category learning. This suggests that the often-observed "specialized" neural responses to speech sounds may result, at least in part, from the expertise we develop with speech categories through experience rather than from properties unique to speech (e.g., linguistic or vocal tract gestural information). Furthermore, particular characteristics of the training paradigm may inform our understanding of mechanisms that support natural speech acquisition.
Affiliation(s)
- Ran Liu
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213, USA.

40
Abstract
Regions of the human temporal lobe show greater activation for speech than for other sounds. These differences may reflect intrinsically specialized domain-specific adaptations for processing speech, or they may be driven by the significant expertise we have in listening to the speech signal. To test the expertise hypothesis, we used a video-game-based paradigm that tacitly trained listeners to categorize acoustically complex, artificial nonlinguistic sounds. Before and after training, we used functional MRI to measure how expertise with these sounds modulated temporal lobe activation. Participants' ability to explicitly categorize the nonspeech sounds predicted the change in pretraining to posttraining activation in speech-sensitive regions of the left posterior superior temporal sulcus, suggesting that emergent auditory expertise may help drive this functional regionalization. Thus, seemingly domain-specific patterns of neural activation in higher cortical regions may be driven in part by experience-based restructuring of high-dimensional perceptual space.
41
Ménard L, Polak M, Denny M, Burton E, Lane H, Matthies ML, Marrone N, Perkell JS, Tiede M, Vick J. Interactions of speaking condition and auditory feedback on vowel production in postlingually deaf adults with cochlear implants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 121:3790-801. [PMID: 17552727 DOI: 10.1121/1.2710963] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
This study investigates the effects of speaking condition and auditory feedback on vowel production by postlingually deafened adults. Thirteen cochlear implant users produced repetitions of nine American English vowels prior to implantation, and at one month and one year after implantation. There were three speaking conditions (clear, normal, and fast), and two feedback conditions after implantation (implant processor turned on and off). Ten normal-hearing controls were also recorded once. Vowel contrasts in the formant space (expressed in mels) were larger in the clear than in the fast condition, both for controls and for implant users at all three time samples. Implant users also produced differences in duration between clear and fast conditions that were in the range of those obtained from the controls. In agreement with prior work, the implant users had contrast values lower than did the controls. The implant users' contrasts were larger with hearing on than off and improved from one month to one year postimplant. Because the controls and implant users responded similarly to a change in speaking condition, it is inferred that auditory feedback, although demonstrably important for maintaining normative values of vowel contrasts, is not needed to maintain the distinctiveness of those contrasts in different speaking conditions.
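The vowel-contrast measure rests on expressing formants in mels. A common conversion and a simple pairwise-distance contrast measure are sketched below; the mel formula is the standard O'Shaughnessy form, the formant values are hypothetical, and the paper's exact computation may differ.

```python
import math
from itertools import combinations

def hz_to_mel(f_hz: float) -> float:
    """Standard O'Shaughnessy conversion from frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def average_vowel_contrast(vowel_means_hz: dict) -> float:
    """Mean pairwise distance between vowel centers in (F1, F2) mel space;
    a simple stand-in for a vowel-contrast measure."""
    mels = {v: (hz_to_mel(f1), hz_to_mel(f2))
            for v, (f1, f2) in vowel_means_hz.items()}
    dists = [math.dist(mels[a], mels[b]) for a, b in combinations(mels, 2)]
    return sum(dists) / len(dists)

# Hypothetical formant means (Hz) for three vowels, for illustration only.
print(average_vowel_contrast({"i": (300, 2300), "a": (750, 1200), "u": (320, 900)}))
```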
Affiliation(s)
- Lucie Ménard
- Département de Linguistique, Université du Québec à Montréal, Montréal (Québec), H3C 3P8 Canada.

42
Perkell JS, Lane H, Denny M, Matthies ML, Tiede M, Zandipour M, Vick J, Burton E. Time course of speech changes in response to unanticipated short-term changes in hearing state. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 121:2296-311. [PMID: 17471743 DOI: 10.1121/1.2642349] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
The timing of changes in parameters of speech production was investigated in six cochlear implant users by switching their implant microphones off and on a number of times in a single experimental session. The subjects repeated four short, two-word utterances, /dV1n#SV2d/ (S = /s/ or /ʃ/), in quasi-random order. The changes between hearing and nonhearing states were introduced by a voice-activated switch at V1 onset. "Postural" measures were made of vowel sound pressure level (SPL), duration, F0; contrast measures were made of vowel separation (distance between pair members in the formant plane) and sibilant separation (difference in spectral means). Changes in parameter values were averaged over multiple utterances, lined up with respect to the switch. No matter whether prosthetic hearing was blocked or restored, contrast measures for vowels and sibilants did not change systematically. Some changes in duration, SPL and F0 were observed during the vowel within which hearing state was changed, V1, as well as during V2 and subsequent utterance repetitions. Thus, sound segment contrasts appear to be controlled differently from the postural parameters of speaking rate and average SPL and F0. These findings are interpreted in terms of the function of hypothesized feedback and feedforward mechanisms for speech motor control.
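The sibilant contrast measure, a difference in spectral means, can be computed as an amplitude-weighted mean frequency. The sketch below uses synthetic signals as crude stand-ins for real /s/ and /ʃ/ frications and is an assumed implementation, not the authors' analysis code.

```python
import numpy as np

def spectral_mean(signal, sample_rate):
    """Amplitude-weighted mean frequency (spectral centroid) of a signal,
    one common way to compute a 'spectral mean'."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

# Synthetic stand-ins: energy concentrated high (/s/-like) vs. lower
# (/ʃ/-like). Real measurements would use recorded frication noise.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
s_like = np.sin(2 * np.pi * 6000 * t) + 0.2 * rng.standard_normal(sr)
sh_like = np.sin(2 * np.pi * 3500 * t) + 0.2 * rng.standard_normal(sr)
print(spectral_mean(s_like, sr) - spectral_mean(sh_like, sr))  # sibilant separation
```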
Affiliation(s)
- Joseph S Perkell
- Speech Communication Group, Research Laboratory of Electronics, and Department of Brain and Cognitive Sciences, MIT Room 36-511, Cambridge, MA 02139, USA.

43
Holt LL. The mean matters: effects of statistically defined nonspeech spectral distributions on speech categorization. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 120:2801-17. [PMID: 17091133 PMCID: PMC1635014 DOI: 10.1121/1.2354071] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
Adjacent speech, and even nonspeech, contexts influence phonetic categorization. Four experiments investigated how preceding sequences of sine-wave tones influence phonetic categorization. This experimental paradigm provides a means of investigating the statistical regularities of acoustic events that influence online speech categorization and, reciprocally, reveals regularities of the sound environment tracked by auditory processing. The tones comprising the sequences were drawn from distributions sampling different acoustic frequencies. Results indicate that whereas the mean of the distributions predicts contrastive shifts in speech categorization, variability of the distributions has little effect. Moreover, speech categorization is influenced by the global mean of the tone sequence, without significant influence of local statistical regularities within the tone sequence. Further arguing that the effect is strongly related to the average spectrum of the sequence, notched noise spectral complements of the tone sequences produce a complementary effect on speech categorization. Lastly, these effects are modulated by the number of tones in the acoustic history and the overall duration of the sequence, but not by the density with which the distribution defining the sequence is sampled. Results are discussed in light of stimulus-specific adaptation to statistical regularity in the acoustic input and a speculative link to talker normalization is postulated.
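A toy version of the "acoustic history" manipulation: two preceding tone sequences with matched variability but different distribution means. Tone counts and frequencies are made-up values; per the abstract, it is the global mean of the history, not its variability or local structure, that should predict the contrastive shift in categorization.

```python
import random
import statistics

def tone_history(n_tones, mean_hz, sd_hz, rng):
    """Sample a preceding sine-tone sequence from a frequency distribution."""
    return [rng.gauss(mean_hz, sd_hz) for _ in range(n_tones)]

rng = random.Random(2)
# Matched variability, different means (values invented for illustration).
low_history = tone_history(21, mean_hz=1800, sd_hz=100, rng=rng)
high_history = tone_history(21, mean_hz=2800, sd_hz=100, rng=rng)
print(statistics.mean(low_history), statistics.mean(high_history))
```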
Affiliation(s)
- Lori L Holt
- Department of Psychology and the Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA.

44
Purcell DW, Munhall KG. Adaptive control of vowel formant frequency: evidence from real-time formant manipulation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 120:966-77. [PMID: 16938984 DOI: 10.1121/1.2217714] [Citation(s) in RCA: 144] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Auditory feedback during speech production is known to play a role in speech sound acquisition and is also important for the maintenance of accurate articulation. In two studies the first formant (F1) of monosyllabic consonant-vowel-consonant words (CVCs) was shifted electronically and fed back to the participant very quickly so that participants perceived the modified speech as their own productions. When feedback was shifted up (experiments 1 and 2) or down (experiment 1), participants compensated by producing F1 in the opposite frequency direction from baseline. The threshold size of manipulation that initiated a compensation in F1 was usually greater than 60 Hz. When normal feedback was returned, F1 did not return immediately to baseline but showed an exponential deadaptation pattern. Experiment 1 showed that this effect was not influenced by the direction of the F1 shift, with both raising and lowering of F1 exhibiting the same effects. Experiment 2 showed that manipulating the number of trials that F1 was held at the maximum shift in frequency (0, 15, 45 trials) did not influence the recovery from adaptation. There was a correlation between the lag-one autocorrelation of trial-to-trial changes in F1 in the baseline recordings and the magnitude of compensation. Some participants therefore appeared to more actively stabilize their productions from trial to trial. The results provide insight into the perceptual control of speech and the representations that govern sensorimotor coordination.
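The correlational result involves the lag-one autocorrelation of trial-to-trial baseline F1 values. A direct computation is sketched below; the baseline series is fabricated for illustration.

```python
import statistics

def lag_one_autocorrelation(series):
    """Correlation between a series and itself shifted by one trial:
    sum((x_t - m)(x_{t+1} - m)) / sum((x_t - m)^2), with m the series mean."""
    m = statistics.fmean(series)
    num = sum((x - m) * (y - m) for x, y in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den

# Invented baseline F1 productions (Hz) across successive trials.
f1_baseline = [702, 695, 708, 690, 701, 699, 711, 705, 694, 700]
print(lag_one_autocorrelation(f1_baseline))
```

A positive lag-one autocorrelation means consecutive productions drift together, whereas a negative value is consistent with trial-to-trial self-correction, the sense in which some speakers may actively stabilize their productions.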
Affiliation(s)
- David W Purcell
- Department of Psychology, Queen's University, Kingston, Ontario K7L 3N6, Canada.

45
Purcell DW, Munhall KG. Compensation following real-time manipulation of formants in isolated vowels. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 119:2288-97. [PMID: 16642842 DOI: 10.1121/1.2173514] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
Auditory feedback influences human speech production, as demonstrated by studies using rapid pitch and loudness changes. Feedback has also been investigated using the gradual manipulation of formants in adaptation studies with whispered speech. In the work reported here, the first formant of steady-state isolated vowels was unexpectedly altered within trials for voiced speech. This was achieved using a real-time formant tracking and filtering system developed for this purpose. The first formant of the vowel /ɛ/ was manipulated 100% toward either /æ/ or /ɪ/, and participants responded by altering their production with average F1 compensation as large as 16.3% and 10.6% of the applied formant shift, respectively. Compensation was estimated to begin <460 ms after stimulus onset. The rapid formant compensations found here suggest that auditory feedback control is similar for both F0 and formants.
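Compensation percentages like the 16.3% and 10.6% reported here are typically the production change expressed relative to the applied shift. A minimal sketch of that arithmetic, with invented numbers, follows.

```python
def compensation_percent(baseline_f1, produced_f1, shift_hz):
    """Production change opposite the perturbation, as a percentage of the
    applied F1 shift. Positive values mean the speaker opposed the shift."""
    return 100.0 * (baseline_f1 - produced_f1) / shift_hz

# Invented example: feedback F1 raised by 200 Hz; the speaker lowers
# produced F1 by ~33 Hz, i.e., ~16.3% compensation.
print(compensation_percent(baseline_f1=580.0, produced_f1=547.4, shift_hz=200.0))
```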
Affiliation(s)
- David W Purcell
- Department of Psychology, Queen's University, Kingston, Ontario, K7L 3N6, Canada.