1
Hayashi M, Kida T, Inui K. Segmentation window of speech information processing in the human auditory cortex. Sci Rep 2024; 14:25044. PMID: 39448758. PMCID: PMC11502806. DOI: 10.1038/s41598-024-76137-y.
Abstract
Humans perceive continuous speech signals as discrete sequences. To clarify the temporal segmentation window of speech information processing in the human auditory cortex, the relationship between speech perception and cortical responses was investigated using auditory evoked magnetic fields (AEFs). AEFs were measured while participants heard the synthetic Japanese word /atataka/, presented in eight versions with different speech rates; word durations ranged from 75 to 600 ms. The results revealed a clear correspondence between the AEFs and syllables. Specifically, when word durations were between 375 and 600 ms, the AEFs exhibited four clear M100 responses from the superior temporal area, corresponding not only to the onset of speech but also to each group of consonant/vowel syllable units. The number of evoked M100 responses was correlated with the duration of the stimulus as well as the number of perceived syllables. The limit of the temporal segmentation window of speech perception was estimated to lie between approximately 75 and 94 ms. This finding may contribute to optimizing the temporal performance of high-speed synthesized speech generation systems.
Affiliation(s)
- Minoru Hayashi: Department of Interdisciplinary Science and Engineering, School of Science and Engineering, Meisei University, Tokyo, 191-8506, Japan
- Tetsuo Kida: Department of Functioning and Disability, Institute for Developmental Research, Aichi Developmental Disability Center, Kasugai, Japan; Section of Brain Function Information, National Institute for Physiological Sciences, Okazaki, Japan
- Koji Inui: Department of Functioning and Disability, Institute for Developmental Research, Aichi Developmental Disability Center, Kasugai, Japan; Section of Brain Function Information, National Institute for Physiological Sciences, Okazaki, Japan
2
Clapp W, Sumner M. The episodic encoding of spoken words in Hindi. JASA Express Lett 2024; 4:035202. PMID: 38426889. DOI: 10.1121/10.0025134.
Abstract
The discovery that listeners more accurately identify words repeated in the same voice than in a different voice has had an enormous influence on models of representation and speech perception. Although this effect has been widely replicated in English, we understand little about whether and how it generalizes across languages. In a continuous recognition memory study with Hindi speakers and listeners (N = 178), we replicated the talker-specificity effect for accuracy-based measures (hit rate and D'), and found the latency advantage to be marginal (p = 0.06). These data help us better understand talker-specificity effects cross-linguistically and highlight the importance of expanding work to less studied languages.
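The accuracy-based sensitivity index D' (d-prime) mentioned above is computed from hit and false-alarm rates. The sketch below is a minimal illustration of that computation; the correction convention and the example rates are assumptions for illustration only, not values reported in the study.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float, correction: float = 0.001) -> float:
    """Compute d' = z(hit rate) - z(false-alarm rate).

    Rates of exactly 0 or 1 are nudged by `correction` so the inverse
    normal CDF stays finite (one common convention among several).
    """
    clamp = lambda p: min(max(p, correction), 1.0 - correction)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(clamp(hit_rate)) - z(clamp(false_alarm_rate))

# Hypothetical same-voice vs. changed-voice hit rates with a shared
# false-alarm rate, showing how a same-voice advantage surfaces in d'.
print(round(d_prime(0.80, 0.20), 2))  # same voice    -> 1.68
print(round(d_prime(0.72, 0.20), 2))  # changed voice -> 1.42
```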
Affiliation(s)
- William Clapp: Department of Linguistics, Stanford University, Stanford, California 94305, USA
- Meghan Sumner: Department of Linguistics, Stanford University, Stanford, California 94305, USA
3
Qi W, Zevin JD. Statistical learning of syllable sequences as trajectories through a perceptual similarity space. Cognition 2024; 244:105689. PMID: 38219453. DOI: 10.1016/j.cognition.2023.105689.
Abstract
Learning from sequential statistics is a general capacity common across many cognitive domains and species. One form of statistical learning (SL) - learning to segment "words" from continuous streams of speech syllables in which the only segmentation cue is ostensibly the transitional (or conditional) probability from one syllable to the next - has been studied in great detail. Typically, this phenomenon is modeled as the calculation of probabilities over discrete, featureless units. Here we present an alternative model, in which sequences are learned as trajectories through a similarity space. A simple recurrent network coding syllables with representations that capture the similarity relations among them correctly simulated the result of a classic SL study, as did a similar model that encoded syllables as three-dimensional points in a continuous similarity space. We then used the simulations to identify a sequence of "words" that produces the reverse of the typical SL effect, i.e., part-words are predicted to be more familiar than words. Results from two experiments with human participants are consistent with the simulation results. Additional analyses identified features that drive differences in what is learned from a set of artificial languages that have the same transitional probabilities among syllables.
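As background for the transitional-probability cue described above, the following minimal sketch (not the authors' similarity-space model) shows how forward transitional probabilities between adjacent syllables can be estimated from a continuous syllable stream; the toy "words" and stream are invented for illustration.

```python
import random
from collections import Counter

def transitional_probabilities(stream):
    """Estimate forward TP(next | current) = count(current, next) / count(current)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

# Toy stream: 300 tokens of three invented trisyllabic "words" in random order,
# so the only segmentation cue is the drop in TP at word boundaries.
words = [("tu", "pi", "ro"), ("go", "la", "bu"), ("da", "ko", "ti")]
stream = [syll for _ in range(300) for syll in random.choice(words)]

tps = transitional_probabilities(stream)
print(tps[("tu", "pi")])           # within-word transition, TP = 1.0
print(tps.get(("ro", "go"), 0.0))  # across a word boundary, TP near 1/3
```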
Affiliation(s)
- Wendy Qi: Department of Psychology, University of Southern California, 3620 S. McClintock Ave, Los Angeles, CA 90089, United States
- Jason D Zevin: Department of Psychology, University of Southern California, 3620 S. McClintock Ave, Los Angeles, CA 90089, United States
4
Kral A. Hearing and Cognition in Childhood. Laryngorhinootologie 2023; 102:S3-S11. PMID: 37130527. PMCID: PMC10184669. DOI: 10.1055/a-1973-5087.
Abstract
The human cerebral cortex develops extensively after birth. This development is profoundly altered by the absence of auditory input: the development of cortical synapses in the auditory system is delayed and their degradation is increased. Recent work shows that the synapses responsible for corticocortical processing of stimuli and their embedding into multisensory interactions and cognition are particularly affected. Since the brain is heavily reciprocally interconnected, inborn deafness manifests not only as deficits in auditory processing but also in cognitive (non-auditory) functions, which are affected differently across individuals. This calls for individualized approaches to the therapy of deafness in childhood.
Affiliation(s)
- Andrej Kral: Institut für AudioNeuroTechnologie (VIANNA) & Abt. für experimentelle Otologie, Exzellenzcluster Hearing4All, Medizinische Hochschule Hannover (head of department and institute: Prof. Dr. A. Kral); Australian Hearing Hub, School of Medicine and Health Sciences, Macquarie University, Sydney, Australia
5
Cychosz M, Newman RS. Perceptual normalization for speaking rate occurs below the level of the syllable. J Acoust Soc Am 2023; 153:1486. PMID: 37002071. PMCID: PMC10257529. DOI: 10.1121/10.0017360.
Abstract
Because speaking rates are highly variable, listeners must use cues like phoneme or sentence duration to normalize speech across different contexts. Scaling speech perception in this way allows listeners to distinguish between temporal contrasts, like voiced and voiceless stops, even at different speech speeds. It has long been assumed that this speaking rate normalization can occur over small units such as phonemes. However, phonemes lack clear boundaries in running speech, so it is not clear that listeners can rely on them for normalization. To evaluate this, we isolate two potential processing levels for speaking rate normalization, syllabic and sub-syllabic, by manipulating phoneme duration to cue speaking rate while holding syllable duration constant. In doing so, we show that changing the duration of phonemes both with unique spectro-temporal signatures (/kɑ/) and more overlapping spectro-temporal signatures (/wɪ/) results in a speaking rate normalization effect. These results suggest that when acoustic boundaries within syllables are less clear, listeners can normalize for rate differences on the basis of sub-syllabic units.
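To make the two levels concrete, here is a minimal illustrative sketch of the logic of cueing speaking rate with phoneme duration while holding syllable duration constant; the durations and proportions are invented placeholders, not the study's stimulus values.

```python
def split_syllable(total_ms: float, consonant_proportion: float):
    """Divide a fixed syllable duration between its consonant and vowel.

    Shrinking the consonant (and letting the vowel absorb the remainder)
    provides a sub-syllabic rate cue while the syllable-level duration
    cue stays identical across conditions.
    """
    consonant = total_ms * consonant_proportion
    return consonant, total_ms - consonant

TOTAL_MS = 300.0  # hypothetical constant syllable duration
for label, prop in [("shorter consonant", 0.20), ("longer consonant", 0.40)]:
    c, v = split_syllable(TOTAL_MS, prop)
    print(f"{label}: consonant {c:.0f} ms, vowel {v:.0f} ms, syllable {c + v:.0f} ms")
```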
Affiliation(s)
- Margaret Cychosz: Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Rochelle S Newman: Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
6
Baese-Berk MM, Chandrasekaran B, Roark CL. The nature of non-native speech sound representations. J Acoust Soc Am 2022; 152:3025. PMID: 36456300. PMCID: PMC9671621. DOI: 10.1121/10.0015230.
Abstract
Most current theories and models of second language speech perception are grounded in the notion that learners acquire speech sound categories in their target language. In this paper, this classic idea in speech perception is revisited, given that clear evidence for formation of such categories is lacking in previous research. To understand the debate on the nature of speech sound representations in a second language, an operational definition of "category" is presented, and the issues of categorical perception and current theories of second language learning are reviewed. Following this, behavioral and neuroimaging evidence for and against acquisition of categorical representations is described. Finally, recommendations for future work are discussed. The paper concludes with a recommendation for integration of behavioral and neuroimaging work and theory in this area.
Affiliation(s)
- Bharath Chandrasekaran: Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
- Casey L Roark: Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
7
Lee JJ, Perrachione TK. Implicit and explicit learning in talker identification. Atten Percept Psychophys 2022; 84:2002-2015. PMID: 35534783. PMCID: PMC10081569. DOI: 10.3758/s13414-022-02500-8.
Abstract
In the real world, listeners seem to implicitly learn talkers' vocal identities during interactions that prioritize attending to the content of talkers' speech. In contrast, most laboratory experiments of talker identification employ training paradigms that require listeners to explicitly practice identifying voices. Here, we investigated whether listeners become familiar with talkers' vocal identities during initial exposures that do not involve explicit talker identification. Participants were assigned to one of three exposure tasks, in which they heard identical stimuli but were differentially required to attend to the talkers' vocal identity or to the verbal content of their speech: (1) matching the talker to a concurrent visual cue (talker-matching); (2) discriminating whether the talker was the same as the prior trial (talker 1-back); or (3) discriminating whether speech content matched the previous trial (verbal 1-back). All participants were then tested on their ability to learn to identify talkers from novel speech content. Critically, we manipulated whether the talkers during this post-test differed from those heard during training. Compared to learning to identify novel talkers, listeners were significantly more accurate learning to identify the talkers they had previously been exposed to in the talker-matching and verbal 1-back tasks, but not the talker 1-back task. The correlation between talker identification test performance and exposure task performance was also greater when the talkers were the same in both tasks. These results suggest that listeners learn talkers' vocal identity implicitly during speech perception, even if they are not explicitly attending to the talkers' identity.
Affiliation(s)
- Jayden J Lee: Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
- Tyler K Perrachione: Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
8
Kelley MC, Tucker BV. Using acoustic distance and acoustic absement to quantify lexical competition. J Acoust Soc Am 2022; 151:1367. PMID: 35232063. DOI: 10.1121/10.0009584.
Abstract
Using phonological neighborhood density has been a common method to quantify lexical competition. It is useful and convenient but has shortcomings that are worth reconsidering. The present study quantifies the effects of lexical competition during spoken word recognition using acoustic distance and acoustic absement rather than phonological neighborhood density. A word's lexical competition is indexed by what is termed its acoustic distinctiveness, which is taken as its average acoustic absement to all words in the lexicon. A variety of acoustic representations for items in the lexicon are analyzed. Statistical modeling shows that acoustic distinctiveness has an effect trend similar to that of phonological neighborhood density. Additionally, acoustic distinctiveness consistently improves model fit more than phonological neighborhood density, regardless of which kind of acoustic representation is used. However, acoustic distinctiveness does not seem to explain all of the same things as phonological neighborhood density. The different areas that these two predictors explain are discussed, in addition to the potential theoretical implications of the usefulness of acoustic distinctiveness in the models. The paper concludes with some reasons why a researcher may want to use acoustic distinctiveness over phonological neighborhood density in future experiments.
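The core quantity described above, a word's acoustic distinctiveness, is its average acoustic absement (accumulated acoustic distance along a dynamic-time-warping alignment) to every other word in the lexicon. The sketch below is a minimal illustration under stated assumptions: the toy lexicon, random feature sequences, and plain Euclidean frame distance are placeholders, not the representations used in the study.

```python
import numpy as np

def dtw_absement(a: np.ndarray, b: np.ndarray) -> float:
    """Accumulated frame-by-frame distance along the best DTW alignment
    of two feature sequences (rows = frames, columns = feature dimensions)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def acoustic_distinctiveness(word: str, lexicon: dict) -> float:
    """Average absement from `word` to every other entry in the lexicon."""
    others = [w for w in lexicon if w != word]
    return sum(dtw_absement(lexicon[word], lexicon[w]) for w in others) / len(others)

# Toy lexicon: random sequences standing in for per-word acoustic representations.
rng = np.random.default_rng(0)
lexicon = {w: rng.normal(size=(frames, 12))
           for w, frames in [("cat", 20), ("cap", 22), ("dog", 30)]}
print({w: round(acoustic_distinctiveness(w, lexicon), 1) for w in lexicon})
```

On this definition, a word with a low average absement sits acoustically close to many other words and is therefore expected to face more lexical competition.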
Affiliation(s)
- Matthew C Kelley: Department of Linguistics, University of Alberta, Edmonton, Alberta T6G 2E7, Canada
- Benjamin V Tucker: Department of Linguistics, University of Alberta, Edmonton, Alberta T6G 2E7, Canada
9
Monahan PJ, Schertz J, Fu Z, Pérez A. Unified Coding of Spectral and Temporal Phonetic Cues: Electrophysiological Evidence for Abstract Phonological Features. J Cogn Neurosci 2022; 34:618-638. DOI: 10.1162/jocn_a_01817.
Abstract
Spoken word recognition models and phonological theory propose that abstract features play a central role in speech processing. It remains unknown, however, whether auditory cortex encodes linguistic features in a manner beyond the phonetic properties of the speech sounds themselves. We took advantage of the fact that English phonology functionally codes stops and fricatives as voiced or voiceless with two distinct phonetic cues: Fricatives use a spectral cue, whereas stops use a temporal cue. Evidence that these cues can be grouped together would indicate the disjunctive coding of distinct phonetic cues into a functionally defined abstract phonological feature. In English, the voicing feature, which distinguishes the consonants [s] and [t] from [z] and [d], respectively, is hypothesized to be specified only for voiceless consonants (e.g., [s t]). Here, participants listened to syllables in a many-to-one oddball design, while their EEG was recorded. In one block, both voiceless stops and fricatives were the standards. In the other block, both voiced stops and fricatives were the standards. A critical design element was the presence of intercategory variation within the standards. Therefore, a many-to-one relationship, which is necessary to elicit an MMN, existed only if the stop and fricative standards were grouped together. In addition to the ERPs, event-related spectral power was also analyzed. Results showed an MMN effect in the voiceless standards block—an asymmetric MMN—in a time window consistent with processing in auditory cortex, as well as increased prestimulus beta-band oscillatory power to voiceless standards. These findings suggest that (i) there is an auditory memory trace of the standards based on the shared (voiceless) feature, which is only functionally defined; (ii) voiced consonants are underspecified; and (iii) features can serve as a basis for predictive processing. Taken together, these results point toward auditory cortex's ability to functionally code distinct phonetic cues together and suggest that abstract features can be used to parse the continuous acoustic signal.
Affiliation(s)
- Zhanao Fu: Cambridge University, United Kingdom
- Alejandro Pérez: University of Toronto Scarborough, Ontario, Canada; Cambridge University, United Kingdom
10
Bhaya-Grossman I, Chang EF. Speech Computations of the Human Superior Temporal Gyrus. Annu Rev Psychol 2022.
Abstract
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
Affiliation(s)
- Ilina Bhaya-Grossman: Department of Neurological Surgery, University of California, San Francisco, California 94143, USA; Joint Graduate Program in Bioengineering, University of California, Berkeley and San Francisco, California 94720, USA
- Edward F Chang: Department of Neurological Surgery, University of California, San Francisco, California 94143, USA
11
Zhang K, Peng G. The time course of normalizing speech variability in vowels. Brain Lang 2021; 222:105028. PMID: 34597904. DOI: 10.1016/j.bandl.2021.105028.
Abstract
To achieve perceptual constancy, listeners utilize contextual cues to normalize speaker-related variability in speech. The present study tested the time course of this cognitive process with an event-related potential (ERP) experiment. The first neurophysiological evidence of speech normalization was observed in the P2 component (130-250 ms), which is functionally related to phonetic and phonological processes. Furthermore, the normalization process was found to ease lexical retrieval, as indexed by a smaller N400 (350-470 ms) following a larger P2. A cross-language vowel perception task was carried out to further specify whether normalization is processed in the phonetic and/or phonological stage(s). Both phonetic and phonological cues in the speech context were found to contribute to vowel normalization. The results suggest that vowel normalization in the speech context can be observed in the P2 time window and largely overlaps with phonetic and phonological processes.
Affiliation(s)
- Kaile Zhang: Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region
- Gang Peng: Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Shenzhen 518055, China
12
Chien YF, Yan H, Sereno JA. Investigating the Lexical Representation of Mandarin Tone 3 Phonological Alternations. J Psycholinguist Res 2021; 50:777-796. PMID: 33226518. DOI: 10.1007/s10936-020-09745-0.
Abstract
Phonological alternations pose challenges for models of spoken word recognition in how surface information is mapped onto stored representations in the lexicon. In the current study, an auditory-auditory priming lexical decision experiment was conducted to investigate the alternating representations of Mandarin Tone 3 in both half-third and third tone sandhi contexts. In Mandarin, a full Tone 3 (213) is reduced to an abridged tone (21) when followed by Tone 1, Tone 2, or Tone 4 (half-third tone sandhi), and Tone 3 is replaced by Tone 2 when followed by another Tone 3 (third tone sandhi). In the half-third sandhi block, disyllabic targets with a half-third (21) or full-third (213) tone FIRST syllable and a Tone 2 (35) or Tone 4 (51) second syllable were preceded by either a half-third prime, a full-third prime, or a control prime. In the third tone sandhi block, third-tone sandhi disyllabic targets with a half-third or full-third SECOND syllable were preceded by either a half-third prime, a full-third prime, or a control prime. Results showed that both half-third and full-third primes elicited significantly faster reaction times relative to the control Tone 1 condition. The size of the facilitation was not influenced by prime condition, target frequency, targets' first syllable tone or targets' second syllable tone. These data suggest that Mandarin T3 may be a more abstract tone and stored as the first syllable for both types of sandhi words.
Affiliation(s)
- Yu-Fu Chien: Department of Chinese Language and Literature, Fudan University, Shanghai, China; Department of Modern Languages, DePaul University, Chicago, IL, USA
- Hanbo Yan: School of Chinese Studies and Exchange, Shanghai International Studies University, Room 418, Building 2, Number 550 West Dalian Road, Hongkou District, Shanghai, 200083, China
- Joan A Sereno: Department of Linguistics, University of Kansas, Lawrence, KS, USA
13
Skipper JI, Lametti DR. Speech Perception under the Tent: A Domain-general Predictive Role for the Cerebellum. J Cogn Neurosci 2021; 33:1517-1534. PMID: 34496370. DOI: 10.1162/jocn_a_01729.
Abstract
The role of the cerebellum in speech perception remains a mystery. Given its uniform architecture, we tested the hypothesis that it implements a domain-general predictive mechanism whose role in speech is determined by connectivity. We collated all neuroimaging studies reporting cerebellar activity in the Neurosynth database (n = 8206). From this set, we found all studies involving passive speech and sound perception (n = 72; 64% speech, 12.5% sounds, 12.5% music, and 11% tones) and speech production and articulation (n = 175). Standard and coactivation neuroimaging meta-analyses were used to compare cerebellar and associated cortical activations between passive perception and production. We found distinct regions of perception- and production-related activity in the cerebellum, as well as regions of perception-production overlap. Each of these regions had distinct patterns of cortico-cerebellar connectivity. To test for domain-generality versus specificity, we identified all psychological and task-related terms in the Neurosynth database that predicted activity in cerebellar regions associated with passive perception and production. Regions in the cerebellum activated by speech perception were associated with domain-general terms related to prediction. One hallmark of predictive processing is metabolic savings (i.e., decreases in neural activity when events are predicted). To test the hypothesis that the cerebellum plays a predictive role in speech perception, we compared cortical activation between studies reporting cerebellar activation during speech perception and those not reporting it. When the cerebellum was active during speech perception, there was far less cortical activation than when it was inactive. The results suggest that the cerebellum implements a domain-general mechanism related to prediction during speech perception.
Affiliation(s)
- Daniel R Lametti: University College London; Acadia University, Wolfville, Nova Scotia, Canada
14
Kral A, Dorman MF, Wilson BS. Neuronal Development of Hearing and Language: Cochlear Implants and Critical Periods. Annu Rev Neurosci 2019; 42:47-65. DOI: 10.1146/annurev-neuro-080317-061513.
Abstract
The modern cochlear implant (CI) is the most successful neural prosthesis developed to date. CIs provide hearing to the profoundly hearing impaired and allow the acquisition of spoken language in children born deaf. Results from studies enabled by the CI have provided new insights into (a) minimal representations at the periphery for speech reception, (b) brain mechanisms for decoding speech presented in quiet and in acoustically adverse conditions, (c) the developmental neuroscience of language and hearing, and (d) the mechanisms and time courses of intramodal and cross-modal plasticity. Additionally, the results have underscored the interconnectedness of brain functions and the importance of top-down processes in perception and learning. The findings are described in this review with emphasis on the developing brain and the acquisition of hearing and spoken language.
Affiliation(s)
- Andrej Kral: Institute of AudioNeuroTechnology and Department of Experimental Otology, ENT Clinics, Hannover Medical University, 30625 Hannover, Germany; School of Behavioral and Brain Sciences, The University of Texas at Dallas, Dallas, Texas 75080, USA; School of Medicine and Health Sciences, Macquarie University, Sydney, New South Wales 2109, Australia
- Michael F. Dorman: Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona 85287, USA
- Blake S. Wilson: School of Behavioral and Brain Sciences, The University of Texas at Dallas, Dallas, Texas 75080, USA; School of Medicine and Pratt School of Engineering, Duke University, Durham, North Carolina 27708, USA
15
Zimmerer F, Scharinger M, Cornell S, Reetz H, Eulitz C. Neural mechanisms for coping with acoustically reduced speech. Brain Lang 2019; 191:46-57. PMID: 30822731. DOI: 10.1016/j.bandl.2019.02.001.
Abstract
In spoken language, reductions of word forms occur regularly and need to be accommodated by the listener. Intriguingly, this accommodation is usually achieved without any apparent effort. The neural bases of this cognitive skill are not yet fully understood. We presented participants with reduced words that were either preceded by a related or an unrelated visual prime and compared electric brain responses to reduced words with those to their full counterparts. In the time domain, we found a positivity between 400 and 600 ms that differed between reduced and full forms. A later positivity distinguished primed from unprimed words and was modulated by reduction. In the frequency domain, alpha suppression was stronger for reduced than for full words. The time- and frequency-domain reduction effects converge on the view that reduced words draw on attention and memory mechanisms. Our data demonstrate the importance of interactive processing of bottom-up and top-down information for the comprehension of reduced words.
Affiliation(s)
- Frank Zimmerer: Department of Language Science and Technology, Universität des Saarlandes, Germany; Department of Pediatric Neurology, Developmental Medicine and Social Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilian-Universität, Munich, Germany
- Mathias Scharinger: Phonetics Research Group, Philipps-Universität Marburg, Germany; Marburg Center for Mind, Brain and Behavior, Philipps-Universität Marburg, Germany
- Sonia Cornell: Department of Pediatric Neurology, Developmental Medicine and Social Pediatrics, Dr. von Hauner Children's Hospital, Ludwig-Maximilian-Universität, Munich, Germany; Department of Linguistics, Universität Konstanz, Germany
- Henning Reetz: Institute for Phonetics, Goethe-Universität, Frankfurt, Germany
- Carsten Eulitz: Department of Linguistics, Universität Konstanz, Germany
16
Lametti DR, Smith HJ, Watkins KE, Shiller DM. Robust Sensorimotor Learning during Variable Sentence-Level Speech. Curr Biol 2018; 28:3106-3113.e2. PMID: 30245103. DOI: 10.1016/j.cub.2018.07.030.
Abstract
Sensorimotor learning has been studied by altering the sound of the voice in real time as speech is produced. In response to voice alterations, learned changes in production reduce the perceived auditory error and persist for some time after the alteration is removed [1-5]. The results of such experiments have led to the development of prominent models of speech production. This work proposes that the control of speech relies on forward models to predict sensory outcomes of movements, and errors in these predictions drive sensorimotor learning [5-7]. However, sensorimotor learning in speech has only been observed following intensive training on a handful of discrete words or perceptually similar sentences. Stereotyped production does not capture the complex sensorimotor demands of fluid, real-world speech [8-11]. It remains unknown whether talkers predict the sensory consequences of variable sentence production to allow rapid and precise updating of speech motor plans when sensory prediction errors are encountered. Here, we used real-time alterations of speech feedback to test for sensorimotor learning during the production of 50 sentences that varied markedly in length, vocabulary, and grammar. Following baseline production, all vowels were simultaneously altered and played back through headphones in near real time. Robust feedforward changes in sentence production were observed that, on average, precisely countered the direction of the alteration. These changes occurred in every participant and transferred to the production of single words with varying vowel sounds. The results show that to maintain accurate sentence production, the brain actively predicts the auditory consequences of variable sentence-level speech.
Affiliation(s)
- Daniel R Lametti: Department of Experimental Psychology, University of Oxford, Oxford, UK; Department of Psychology, Acadia University, Wolfville, Nova Scotia, Canada
- Harriet J Smith: Department of Experimental Psychology, University of Oxford, Oxford, UK
- Kate E Watkins: Department of Experimental Psychology, University of Oxford, Oxford, UK
- Douglas M Shiller: École d'orthophonie et d'audiologie, Université de Montréal, Montreal, Canada; Sainte-Justine Hospital Research Center, Montreal, Canada; Centre for Research on Brain, Language & Music, Montreal, Canada
17
Abstract
We describe the performance of an aphasic individual, K.A., who showed a selective impairment affecting his ability to perceive spoken language, while largely sparing his ability to perceive written language and to produce spoken language. His speech perception impairment left him unable to distinguish words or nonwords that differed on a single phoneme, and he performed no better than chance at auditory lexical decision or at matching single spoken words to single pictures with phonological foils. Strikingly, despite this profound impairment, K.A. showed a selective sparing of his ability to perceive number words, which he was able to repeat and comprehend largely without error. This case adds to a growing literature demonstrating modality-specific dissociations between number word and non-number word processing. Because of the locus of K.A.'s speech perception deficit for non-number words, we argue that this distinction between number word and non-number word processing arises at a sublexical level of representation in speech perception, in a parallel fashion to what has previously been argued for in the organization of the sublexical level of representation for speech production.
Affiliation(s)
- Rachel Mis: Department of Psychology, Temple University, Philadelphia, PA, USA
- Heather Dial: Department of Communication Sciences and Disorders, University of Texas-Austin, Austin, TX, USA
18
19
Luthra S, Fox NP, Blumstein SE. Speaker information affects false recognition of unstudied lexical-semantic associates. Atten Percept Psychophys 2018; 80:894-912. PMID: 29473144. PMCID: PMC6003774. DOI: 10.3758/s13414-018-1485-z.
Abstract
Recognition of and memory for a spoken word can be facilitated by a prior presentation of that word spoken by the same talker. However, it is less clear whether this speaker congruency advantage generalizes to facilitate recognition of unheard related words. The present investigation employed a false memory paradigm to examine whether information about a speaker's identity in items heard by listeners could influence the recognition of novel items (critical intruders) phonologically or semantically related to the studied items. In Experiment 1, false recognition of semantically associated critical intruders was sensitive to speaker information, though only when subjects attended to talker identity during encoding. Results from Experiment 2 also provide some evidence that talker information affects the false recognition of critical intruders. Taken together, the present findings indicate that indexical information is able to contact the lexical-semantic network to affect the processing of unheard words.
Affiliation(s)
- Sahil Luthra: Department of Cognitive, Linguistic & Psychological Sciences, Brown University, 190 Thayer St., Box 1821, Providence, RI 02912, USA; Department of Psychological Sciences, University of Connecticut, 406 Babbidge Rd, Unit 1020, Storrs, CT 06269, USA
- Neal P Fox: Department of Cognitive, Linguistic & Psychological Sciences, Brown University, 190 Thayer St., Box 1821, Providence, RI 02912, USA; Department of Neurological Surgery, University of California, San Francisco, 675 Nelson Rising Lane, Room 535, San Francisco, CA 94143, USA
- Sheila E Blumstein: Department of Cognitive, Linguistic & Psychological Sciences, Brown University, 190 Thayer St., Box 1821, Providence, RI 02912, USA; Brown Institute for Brain Science, Brown University, 2 Stimson Ave, Providence, RI 02912, USA
20
Grossberg S. Desirability, availability, credit assignment, category learning, and attention: Cognitive-emotional and working memory dynamics of orbitofrontal, ventrolateral, and dorsolateral prefrontal cortices. Brain Neurosci Adv 2018; 2:2398212818772179. PMID: 32166139. PMCID: PMC7058233. DOI: 10.1177/2398212818772179.
Abstract
BACKGROUND: The prefrontal cortices play an essential role in cognitive-emotional and working memory processes through interactions with multiple brain regions. METHODS: This article further develops a unified neural architecture that explains many recent and classical data about prefrontal function and makes testable predictions. RESULTS: Prefrontal properties of desirability, availability, credit assignment, category learning, and feature-based attention are explained. These properties arise through interactions of orbitofrontal, ventrolateral prefrontal, and dorsolateral prefrontal cortices with the inferotemporal cortex, perirhinal cortex, parahippocampal cortices; ventral bank of the principal sulcus, ventral prearcuate gyrus, frontal eye fields, hippocampus, amygdala, basal ganglia, hypothalamus, and visual cortical areas V1, V2, V3A, V4, middle temporal cortex, medial superior temporal area, lateral intraparietal cortex, and posterior parietal cortex. Model explanations also include how the value of visual objects and events is computed, which objects and events cause desired consequences and which may be ignored as predictively irrelevant, and how to plan and act to realise these consequences, including how to selectively filter expected versus unexpected events, leading to movements towards, and conscious perception of, expected events. Modelled processes include reinforcement learning and incentive motivational learning; object and spatial working memory dynamics; and category learning, including the learning of object categories, value categories, object-value categories, and sequence categories, or list chunks. CONCLUSION: This article hereby proposes a unified neural theory of prefrontal cortex and its functions.
Affiliation(s)
- Stephen Grossberg: Center for Adaptive Systems, Graduate Program in Cognitive and Neural Systems, Departments of Mathematics & Statistics, Psychological & Brain Sciences, Biomedical Engineering, Boston University, Boston, MA, USA
21
Skipper JI, Devlin JT, Lametti DR. The hearing ear is always found close to the speaking tongue: Review of the role of the motor system in speech perception. Brain Lang 2017; 164:77-105. PMID: 27821280. DOI: 10.1016/j.bandl.2016.10.004.
Abstract
Does "the motor system" play "a role" in speech perception? If so, where, how, and when? We conducted a systematic review that addresses these questions using both qualitative and quantitative methods. The qualitative review of behavioural, computational modelling, non-human animal, brain damage/disorder, electrical stimulation/recording, and neuroimaging research suggests that distributed brain regions involved in producing speech play specific, dynamic, and contextually determined roles in speech perception. The quantitative review employed region and network based neuroimaging meta-analyses and a novel text mining method to describe relative contributions of nodes in distributed brain networks. Supporting the qualitative review, results show a specific functional correspondence between regions involved in non-linguistic movement of the articulators, covertly and overtly producing speech, and the perception of both nonword and word sounds. This distributed set of cortical and subcortical speech production regions are ubiquitously active and form multiple networks whose topologies dynamically change with listening context. Results are inconsistent with motor and acoustic only models of speech perception and classical and contemporary dual-stream models of the organization of language and the brain. Instead, results are more consistent with complex network models in which multiple speech production related networks and subnetworks dynamically self-organize to constrain interpretation of indeterminant acoustic patterns as listening context requires.
Affiliation(s)
- Jeremy I Skipper: Experimental Psychology, University College London, United Kingdom
- Joseph T Devlin: Experimental Psychology, University College London, United Kingdom
- Daniel R Lametti: Experimental Psychology, University College London, United Kingdom; Department of Experimental Psychology, University of Oxford, United Kingdom
22
Grossberg S. Towards solving the hard problem of consciousness: The varieties of brain resonances and the conscious experiences that they support. Neural Netw 2016; 87:38-95. PMID: 28088645. DOI: 10.1016/j.neunet.2016.11.003.
Abstract
The hard problem of consciousness is the problem of explaining how we experience qualia or phenomenal experiences, such as seeing, hearing, and feeling, and knowing what they are. To solve this problem, a theory of consciousness needs to link brain to mind by modeling how emergent properties of several brain mechanisms interacting together embody detailed properties of individual conscious psychological experiences. This article summarizes evidence that Adaptive Resonance Theory, or ART, accomplishes this goal. ART is a cognitive and neural theory of how advanced brains autonomously learn to attend, recognize, and predict objects and events in a changing world. ART has predicted that "all conscious states are resonant states" as part of its specification of mechanistic links between processes of consciousness, learning, expectation, attention, resonance, and synchrony. It hereby provides functional and mechanistic explanations of data ranging from individual spikes and their synchronization to the dynamics of conscious perceptual, cognitive, and cognitive-emotional experiences. ART has reached sufficient maturity to begin classifying the brain resonances that support conscious experiences of seeing, hearing, feeling, and knowing. Psychological and neurobiological data in both normal individuals and clinical patients are clarified by this classification. This analysis also explains why not all resonances become conscious, and why not all brain dynamics are resonant. The global organization of the brain into computationally complementary cortical processing streams (complementary computing), and the organization of the cerebral cortex into characteristic layers of cells (laminar computing), figure prominently in these explanations of conscious and unconscious processes. Alternative models of consciousness are also discussed.
Affiliation(s)
- Stephen Grossberg: Center for Adaptive Systems, Boston University, 677 Beacon Street, Boston, MA 02215, USA; Graduate Program in Cognitive and Neural Systems, Departments of Mathematics & Statistics, Psychological & Brain Sciences, and Biomedical Engineering, Boston University, 677 Beacon Street, Boston, MA 02215, USA
23
Papesh MH, Goldinger SD, Hout MC. Eye movements reveal fast, voice-specific priming. J Exp Psychol Gen 2016; 145:314-337. PMID: 26726911. DOI: 10.1037/xge0000135.
Abstract
In spoken word perception, voice specificity effects are well-documented: When people hear repeated words in some task, performance is generally better when repeated items are presented in their originally heard voices, relative to changed voices. A key theoretical question about voice specificity effects concerns their time-course: Some studies suggest that episodic traces exert their influence late in lexical processing (the time-course hypothesis; McLennan & Luce, 2005), whereas others suggest that episodic traces influence immediate, online processing. We report 2 eye-tracking studies investigating the time-course of voice-specific priming within and across cognitive tasks. In Experiment 1, participants performed modified lexical decision or semantic classification to words spoken by 4 speakers. The tasks required participants to click a red "x" or a blue "+" located randomly within separate visual half-fields, necessitating trial-by-trial visual search with consistent half-field response mapping. After a break, participants completed a second block with new and repeated items, half spoken in changed voices. Voice effects were robust very early, appearing in saccade initiation times. Experiment 2 replicated this pattern while changing tasks across blocks, ruling out a response priming account. In the General Discussion, we address the time-course hypothesis, focusing on the challenge it presents for empirical disconfirmation, and highlighting the broad importance of indexical effects, beyond studies of priming.
24
Lentz TO, Kager RWJ. Categorical phonotactic knowledge filters second language input, but probabilistic phonotactic knowledge can still be acquired. Lang Speech 2015; 58:387-413. PMID: 26529903. DOI: 10.1177/0023830914559572.
Abstract
Probabilistic phonotactic knowledge facilitates perception, but categorical phonotactic illegality can cause misperceptions, especially of non-native phoneme combinations. If misperceptions induced by first language (L1) knowledge filter second language input, access to second language (L2) probabilistic phonotactics is potentially blocked for L2 acquisition. The facilitatory effects of L2 probabilistic phonotactics and the categorical filtering effects of L1 phonotactics were compared and contrasted in a series of cross-modal priming experiments. Dutch native listeners and L1 Spanish and Japanese learners of Dutch performed a lexical decision task on Dutch words that started with /sC/ clusters of different degrees of probabilistic wellformedness in Dutch but that are illegal in Spanish and Japanese. Versions of the target words in which epenthesis resolved the clusters' illegality in Spanish primed the Spanish group, showing an L1 filter; a similar effect was not found for the Japanese group. In addition, words with wellformed /sC/ clusters were recognised faster, showing a positive effect of probabilistic wellformedness on processing. However, Spanish learners with higher proficiency were facilitated to a greater extent by wellformed but epenthesised clusters, showing that although probabilistic learning occurs in spite of the L1 filter, the acquired probabilistic knowledge is still affected by L1 categorical knowledge. Categorical phonotactic and probabilistic knowledge are of a different nature and interact in acquisition.
25
Erickson LC, Thiessen ED. Statistical learning of language: Theory, validity, and predictions of a statistical learning account of language acquisition. Dev Rev 2015. DOI: 10.1016/j.dr.2015.05.002.
26
Antoniou M, Wong PCM, Wang S. The Effect of Intensified Language Exposure on Accommodating Talker Variability. J Speech Lang Hear Res 2015; 58:722-727. PMID: 25811169. PMCID: PMC4610280. DOI: 10.1044/2015_jslhr-s-14-0259.
Abstract
PURPOSE: This study systematically examined the role of intensified exposure to a second language on accommodating talker variability. METHOD: English native listeners (n = 37) were compared with Mandarin listeners who had either lived in the United States for an extended period of time (n = 33) or had lived only in China (n = 44). Listeners responded to target words in an English word-monitoring task in which sequences of words were randomized. Half of the sequences were spoken by a single talker and the other half by multiple talkers. RESULTS: Mandarin listeners living in China were slower and less accurate than both English listeners and Mandarin listeners living in the United States. Mandarin listeners living in the United States were less accurate than English natives only in the more cognitively demanding mixed-talker condition. CONCLUSIONS: Mixed-talker speech affects processing in native and nonnative listeners alike, although the decrement is larger in nonnatives and further exaggerated in less proficient listeners. Language immersion improves listeners' ability to resolve talker variability, and this suggests that immersion may automatize nonnative processing, freeing cognitive resources that may play a crucial role in speech perception. These results lend support to the active control model of speech perception.
Affiliation(s)
- Suiping Wang: South China Normal University, Shipai, Guangzhou, People's Republic of China
27
Souza PE, Wright RA, Blackburn MC, Tatman R, Gallun FJ. Individual sensitivity to spectral and temporal cues in listeners with hearing impairment. J Speech Lang Hear Res 2015; 58:520-534. PMID: 25629388. PMCID: PMC4462137. DOI: 10.1044/2015_jslhr-h-14-0138.
Abstract
PURPOSE: The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. METHOD: Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal (amplitude envelope). Spectral and/or temporal dimensions of synthetic speech were systematically manipulated along a continuum, and recognition was measured using the manipulated stimuli. Level was controlled to ensure cue audibility. Discriminant function analysis was used to determine to what degree spectral and temporal information contributed to the identification of each stimulus. RESULTS: Listeners with normal hearing were influenced to a greater extent by spectral cues for all stimuli. Listeners with hearing impairment generally utilized spectral cues when the information was static (spectral shape) but used temporal cues when the information was dynamic (formant transition). The relative use of spectral and temporal dimensions varied among individuals, especially among listeners with hearing loss. CONCLUSION: Information about spectral and temporal cue use may aid in identifying listeners who rely to a greater extent on particular acoustic cues and applying that information toward therapeutic interventions.
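As an illustration of the discriminant-function logic described in the METHOD above (not the authors' code or data), the sketch below fits a linear discriminant that predicts simulated identification responses from a spectral cue and a temporal cue and then inspects the standardized coefficients to see which dimension carries more weight. All data are synthetic and the package choice (scikit-learn) is an assumption.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
n = 200

# Synthetic listener: identification responses driven mostly by the spectral
# cue, mimicking a "spectrally dominant" cue-weighting profile.
spectral = rng.uniform(0.0, 1.0, n)
temporal = rng.uniform(0.0, 1.0, n)
p_category_b = 1.0 / (1.0 + np.exp(-(6.0 * (spectral - 0.5) + 1.0 * (temporal - 0.5))))
response = rng.random(n) < p_category_b  # True = "category B" identification

X = np.column_stack([spectral, temporal])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so coefficients are comparable

lda = LinearDiscriminantAnalysis().fit(X_std, response)
print("standardized coefficients (spectral, temporal):", lda.coef_[0])
```

A larger absolute coefficient on one dimension indicates that this listener's identifications track that cue more closely, which is the sense in which the abstract speaks of relative spectral versus temporal cue use.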
Affiliation(s)
- Pamela E. Souza: Northwestern University, Evanston, IL; Knowles Hearing Center, Northwestern University, Evanston, IL
- Frederick J. Gallun: National Center for Rehabilitative Auditory Research, Portland VA Medical Center, OR; Oregon Health & Science University, Portland
28
Affiliation(s)
- Robin L. Peterson: Department of Rehabilitation Medicine, Children's Hospital Colorado, Aurora, Colorado 80045
29
Bernard A. An onset is an onset: Evidence from abstraction of newly-learned phonotactic constraints. J Mem Lang 2015; 78:18-32. PMID: 25378800. PMCID: PMC4217139. DOI: 10.1016/j.jml.2014.09.001.
Abstract
Phonotactic constraints are language-specific patterns in the sequencing of speech sounds. Are these constraints represented at the syllable level (ng cannot begin syllables in English) or at the word level (ng cannot begin words)? In a continuous recognition-memory task, participants more often falsely recognized novel test items that followed than violated the training constraints, whether training and test items matched in word structure (one or two syllables) or position of the restricted consonants (word-edge or word-medial position). For example, after learning that ps are onsets and fs are codas, participants generalized from pef (one syllable) to putvif (two syllables), and from putvif (word-edge positions) to bufpak (word-medial positions). These results suggest that newly-learned phonotactic constraints are represented at the syllable level. The syllable is a representational unit that is available and spontaneously used when learning speech-sound constraints. In the current experiments, an onset is an onset and a coda a coda, regardless of word structure or word position.
30
Trude AM, Duff MC, Brown-Schmidt S. Talker-specific learning in amnesia: Insight into mechanisms of adaptive speech perception. Cortex 2014; 54:117-123. PMID: 24657480. DOI: 10.1016/j.cortex.2014.01.015.
Abstract
A hallmark of human speech perception is the ability to comprehend speech quickly and effortlessly despite enormous variability across talkers. However, current theories of speech perception do not make specific claims about the memory mechanisms involved in this process. To examine whether declarative memory is necessary for talker-specific learning, we tested the ability of amnesic patients with severe declarative memory deficits to learn and distinguish the accents of two unfamiliar talkers by monitoring their eye gaze as they followed spoken instructions. Analyses of the time course of eye fixations showed that amnesic patients rapidly learned to distinguish these accents and tailored perceptual processes to the voice of each talker. These results demonstrate that declarative memory is not necessary for this ability and point to the involvement of non-declarative memory mechanisms. These results are consistent with findings that other social and accommodative behaviors are preserved in amnesia and contribute to our understanding of the interactions of multiple memory systems in the use and understanding of spoken language.
Affiliation(s)
- Alison M Trude
- Department of Psychology and Beckman Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
- Melissa C Duff
- Department of Communication Sciences and Disorders and Department of Neurology, University of Iowa, Iowa City, IA, USA.
- Sarah Brown-Schmidt
- Department of Psychology and Beckman Institute, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
31
Hickok G. The architecture of speech production and the role of the phoneme in speech processing. LANGUAGE AND COGNITIVE PROCESSES 2014; 29:2-20. [PMID: 24489420 PMCID: PMC3904400 DOI: 10.1080/01690965.2013.834370] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/11/2023]
Abstract
Speech production has been studied within a number of traditions including linguistics, psycholinguistics, motor control, neuropsychology, and neuroscience. These traditions have had limited interaction, ostensibly because they target different levels of speech production or different dimensions such as representation, processing, or implementation. However, closer examination reveals a substantial convergence of ideas across the traditions, and recent proposals have suggested that an integrated approach may help move the field forward. The present article reviews one such attempt at integration, the state feedback control model and its descendant, the hierarchical state feedback control model. Also considered is how phoneme-level representations might fit in the context of the model.
Affiliation(s)
- Gregory Hickok
- Department of Cognitive Sciences, University of California, Irvine, California, 92697, USA
32
Mitterer H, Scharenborg O, McQueen JM. Phonological abstraction without phonemes in speech perception. Cognition 2013; 129:356-61. [DOI: 10.1016/j.cognition.2013.07.011] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2012] [Revised: 07/08/2013] [Accepted: 07/12/2013] [Indexed: 11/29/2022]
34
Yu ACL, Abrego-Collier C, Sonderegger M. Phonetic imitation from an individual-difference perspective: subjective attitude, personality and "autistic" traits. PLoS One 2013; 8:e74746. [PMID: 24098665 PMCID: PMC3786990 DOI: 10.1371/journal.pone.0074746] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2013] [Accepted: 08/06/2013] [Indexed: 11/18/2022] Open
Abstract
Numerous studies have documented the phenomenon of phonetic imitation: the process by which the production patterns of an individual become more similar on some phonetic or acoustic dimension to those of her interlocutor. Though social factors have been suggested as a motivator for imitation, few studies have established a tight connection between language-external factors and a speaker's likelihood to imitate. The present study investigated the phenomenon of phonetic imitation using a within-subject design embedded in an individual-differences framework. Participants were administered a phonetic imitation task, which included two speech production tasks separated by a perceptual learning task, and a battery of measures assessing traits associated with Autism-Spectrum Condition, working memory, and personality. To examine the effects of subjective attitude on phonetic imitation, participants were randomly assigned to four experimental conditions, where the perceived sexual orientation of the narrator (homosexual vs. heterosexual) and the outcome (positive vs. negative) of the story depicted in the exposure materials differed. The extent of phonetic imitation by an individual is significantly modulated by the story outcome, as well as by the participant's subjective attitude toward the model talker, the participant's personality trait of openness, and the autistic-like trait associated with attention switching.
Affiliation(s)
- Alan C. L. Yu
- Phonology Laboratory, Department of Linguistics, University of Chicago, Chicago, Illinois, United States of America
- Carissa Abrego-Collier
- Phonology Laboratory, Department of Linguistics, University of Chicago, Chicago, Illinois, United States of America
35
Felty RA, Buchwald A, Gruenenfelder TM, Pisoni DB. Misperceptions of spoken words: data from a random sample of American English words. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 134:572-585. [PMID: 23862832 PMCID: PMC3724775 DOI: 10.1121/1.4809540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/17/2012] [Revised: 02/06/2013] [Accepted: 05/20/2013] [Indexed: 06/01/2023]
Abstract
This study reports a detailed analysis of incorrect responses from an open-set spoken word recognition experiment of 1428 words designed to be a random sample of the entire American English lexicon. The stimuli were presented in six-talker babble to 192 young, normal-hearing listeners at three signal-to-noise ratios (0, +5, and +10 dB). The results revealed several patterns: (1) errors tended to have a higher frequency of occurrence than did the corresponding target word, and frequency of occurrence of error responses was significantly correlated with target frequency of occurrence; (2) incorrect responses were close to the target words in terms of number of phonemes and syllables but had a mean edit distance of 3; (3) for syllables, substitutions were much more frequent than either deletions or additions; for phonemes, deletions were slightly more frequent than substitutions; both were more frequent than additions; and (4) for errors involving just a single segment, substitutions were more frequent than either deletions or additions. The raw data are being made available to other researchers as supplementary material to form the beginnings of a database of speech errors collected under controlled laboratory conditions.
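The edit-distance measure in point (2) is a standard Levenshtein distance computed over phoneme strings. A minimal sketch follows; the phoneme transcriptions are invented for illustration and are not drawn from the study's materials.

```python
# Levenshtein (edit) distance over phoneme sequences: the minimum number of
# substitutions, deletions, and additions needed to turn the target into the
# response. The example transcriptions below are invented for illustration.
def edit_distance(target, response):
    m, n = len(target), len(response)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # delete all remaining target phonemes
    for j in range(n + 1):
        d[0][j] = j                     # add all remaining response phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if target[i - 1] == response[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # addition
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]

# e.g. target /m ae t/ heard as /b ae t s/ -> distance 2 (one substitution, one addition)
print(edit_distance(["m", "ae", "t"], ["b", "ae", "t", "s"]))
```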
Affiliation(s)
- Robert Albert Felty
- Nuance Communications, 1198 East Arques Avenue, Sunnyvale, California 94085, USA.
36
Abstract
Vowel devoicing occurs in Japanese when a high vowel falls between voiceless consonants. The aim of this study was to investigate the lexical representation of vowel devoicing. A long-term repetition-priming experiment was conducted: participants shadowed words containing either a devoiced or a voiced vowel in three priming paradigms, and their shadowing responses were analyzed. Participants produced the phonologically appropriate allophone most of the time, based on the consonantal environment. Shadowing latencies for the voiced stimuli were faster than for the devoiced stimuli in the environment where the vowel should be voiced, while no significant RT difference was observed between the two forms in the environment where devoicing was expected. In addition, a priming effect between the devoiced and voiced stimuli emerged only in the devoicing environment. The results suggest that, because vowel devoicing is very common in spoken Japanese, the devoiced form may be stored in the lexicon. They also suggest a link between the two forms in the lexicon and direct access from the input to a lexical representation without intermediate levels that would incur additional processing.
Affiliation(s)
- Naomi Ogasawara
- Department of English, National Taiwan Normal University, No. 162, Section I, He-Ping East Road, Daan District, Taipei City 106, Taiwan.
37
Adaptive Resonance Theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Netw 2013; 37:1-47. [PMID: 23149242 DOI: 10.1016/j.neunet.2012.09.017] [Citation(s) in RCA: 183] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2012] [Revised: 08/24/2012] [Accepted: 09/24/2012] [Indexed: 11/17/2022]
38
Zevin JD, Datta H, Skipper JI. Sensitive periods for language and recovery from stroke: conceptual and practical parallels. Dev Psychobiol 2012; 54:332-42. [PMID: 22415920 DOI: 10.1002/dev.20626] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
In this review, we consider the literature on sensitive periods for language acquisition from the perspective of the stroke recovery literature treated in this Special Issue. Conceptually, the two areas of study are linked in a number of ways. For example, the fact that learning itself can set the stage for future failures to learn (in second language learning) or to remediate (as described in constraint therapy) is an important insight in both areas, as is the increasing awareness that limits on learning can be overcome by creating the appropriate environmental context. Similar practical issues, such as distinguishing native-like language acquisition or recovery of function from compensatory mechanisms, arise in both areas as well.
Affiliation(s)
- Jason D Zevin
- Sackler Institute for Developmental Psychobiology, Weill Cornell Medical College, 1300 York Ave., Box 140, New York, New York 10065, USA.
39
Abstract
Speech alignment describes the unconscious tendency to produce speech that shares characteristics with perceived speech (e.g., Goldinger, 1998, Psychological Review, 105, 251-279). In the present study we evaluated whether seeing a talker enhances alignment over just hearing a talker. Pairs of participants performed an interactive search task which required them to repeatedly utter a series of keywords. Half of the pairs performed the task while only hearing each other, while the other half could see and hear each other. Alignment was assessed by naive judges rating the similarity of interlocutors' keywords recorded before, during, and after the interactive task. Results showed that interlocutors aligned more when able to see one another, suggesting that visual information enhances speech alignment.
Affiliation(s)
- James W Dias
- Department of Psychology, University of California, Riverside, Riverside, CA 92521, USA
40
Gow DW. The cortical organization of lexical knowledge: a dual lexicon model of spoken language processing. BRAIN AND LANGUAGE 2012; 121:273-88. [PMID: 22498237 PMCID: PMC3348354 DOI: 10.1016/j.bandl.2012.03.005] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/20/2011] [Revised: 02/08/2012] [Accepted: 03/13/2012] [Indexed: 05/14/2023]
Abstract
Current accounts of spoken language assume the existence of a lexicon where wordforms are stored and interact during spoken language perception, understanding and production. Despite the theoretical importance of the wordform lexicon, the exact localization and function of the lexicon in the broader context of language use is not well understood. This review draws on evidence from aphasia, functional imaging, neuroanatomy, laboratory phonology and behavioral results to argue for the existence of parallel lexica that facilitate different processes in the dorsal and ventral speech pathways. The dorsal lexicon, localized in the inferior parietal region including the supramarginal gyrus, serves as an interface between phonetic and articulatory representations. The ventral lexicon, localized in the posterior superior temporal sulcus and middle temporal gyrus, serves as an interface between phonetic and semantic representations. In addition to their interface roles, the two lexica contribute to the robustness of speech processing.
Affiliation(s)
- David W Gow
- Neuropsychology Laboratory, Massachusetts General Hospital, Boston, MA 02114, USA.
41
Abstract
Dyslexia is a neurodevelopmental disorder that is characterised by slow and inaccurate word recognition. Dyslexia has been reported in every culture studied, and mounting evidence draws attention to cross-linguistic similarity in its neurobiological and neurocognitive bases. Much progress has been made across research specialties spanning the behavioural, neuropsychological, neurobiological, and causal levels of analysis in the past 5 years. From a neuropsychological perspective, the phonological theory remains the most compelling, although phonological problems also interact with other cognitive risk factors. Work confirms that, neurobiologically, dyslexia is characterised by dysfunction of the normal left-hemisphere language network and also implicates abnormal white matter development. Studies accounting for reading experience demonstrate that many recorded neural differences reflect causes rather than effects of dyslexia. Six predisposing candidate genes have been identified, and evidence shows gene-by-environment interaction.
Affiliation(s)
- Robin L Peterson
- Department of Psychology, University of Denver, Denver, CO 80208, USA.
42
Floccia C, Goslin J, Morais JJD, Kolinsky R. Syllable effects in a fragment-detection task in Italian listeners. Front Psychol 2012; 3:140. [PMID: 22590464 PMCID: PMC3349302 DOI: 10.3389/fpsyg.2012.00140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2011] [Accepted: 04/20/2012] [Indexed: 11/13/2022] Open
Abstract
In the line of the monitoring studies initiated by Mehler et al. (1981), a group of Italian listeners were asked to detect auditory CV and CVC targets in carrier words beginning with a CV, a CVC, or a CVG (G = geminate) syllable with variable initial-syllable stress. When participants' reaction times (RTs) were slowed by the use of both catch and foil trials, a syllable effect was found, partially modulated by participants' speed and by stress location. When catch trials were removed in a second experiment, the syllable effect was not observed, even though RTs were similar to those of the first experiment. We discuss these data in relation to the language transparency hypothesis, the nature of the pivotal consonant, and the resonance-based ART model for speech perception (Grossberg, 2003).
43
Berent I, Balaban E, Vaknin-Nusbaum V. How linguistic chickens help spot spoken-eggs: phonological constraints on speech identification. Front Psychol 2011; 2:182. [PMID: 21949509 PMCID: PMC3171785 DOI: 10.3389/fpsyg.2011.00182] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2011] [Accepted: 07/19/2011] [Indexed: 11/25/2022] Open
Abstract
It has long been known that the identification of aural stimuli as speech is context-dependent (Remez et al., 1981). Here, we demonstrate that the discrimination of speech stimuli from their non-speech transforms is further modulated by their linguistic structure. We gauge the effect of phonological structure on discrimination across different manifestations of well-formedness in two distinct languages. One case examines the restrictions on English syllables (e.g., the well-formed melif vs. ill-formed mlif); another investigates the constraints on Hebrew stems by comparing ill-formed AAB stems (e.g., TiTuG) with well-formed ABB and ABC controls (e.g., GiTuT, MiGuS). In both cases, non-speech stimuli that conform to well-formed structures are harder to discriminate from speech than stimuli that conform to ill-formed structures. Auxiliary experiments rule out alternative acoustic explanations for this phenomenon. In English, we show that acoustic manipulations that mimic the mlif–melif contrast do not impair the classification of non-speech stimuli whose structure is well-formed (i.e., disyllables with phonetically short vs. long tonic vowels). Similarly, non-speech stimuli that are ill-formed in Hebrew present no difficulties to English speakers. Thus, non-speech stimuli are harder to classify only when they are well-formed in the participants’ native language. We conclude that the classification of non-speech stimuli is modulated by their linguistic structure: inputs that support well-formed outputs are more readily classified as speech.
Affiliation(s)
- Iris Berent
- Department of Psychology, Northeastern University, Boston, MA, USA
44
Chambers KE, Onishi KH, Fisher C. A vowel is a vowel: generalizing newly learned phonotactic constraints to new contexts. J Exp Psychol Learn Mem Cogn 2010; 36:821-8. [PMID: 20438279 DOI: 10.1037/a0018991] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Adults can learn novel phonotactic constraints from brief listening experience. We investigated the representations underlying phonotactic learning by testing generalization to syllables containing new vowels. Adults heard consonant-vowel-consonant study syllables in which particular consonants were artificially restricted to the onset or coda position (e.g., /f/ is an onset, /s/ is a coda). Subjects were quicker to repeat novel constraint-following (legal) than constraint-violating (illegal) test syllables whether they contained a vowel used in the study syllables (training vowel) or a new (transfer) vowel. This effect emerged regardless of the acoustic similarity between training and transfer vowels. Listeners thus learned and generalized phonotactic constraints that can be characterized as simple first-order constraints on consonant position. Rapid generalization independent of vowel context provides evidence that vowels and consonants are represented independently by processes underlying phonotactic learning.
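A toy sketch of what a first-order positional constraint of this kind amounts to; the constraint sets and syllables below are invented for illustration and are not the study's stimuli.

```python
# Toy first-order phonotactic constraint: some consonants may only appear as
# onsets, others only as codas, independent of the vowel. All symbols invented.
ONSET_ONLY = {"f"}   # /f/ must be an onset
CODA_ONLY = {"s"}    # /s/ must be a coda

def is_legal(onset, vowel, coda):
    """True if neither consonant appears in a forbidden position."""
    return onset not in CODA_ONLY and coda not in ONSET_ONLY

print(is_legal("f", "ae", "s"))  # True: both consonants in their licensed positions
print(is_legal("s", "ae", "f"))  # False: both consonants out of position
print(is_legal("f", "i", "s"))   # True: the constraint ignores which vowel occurs
```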
Affiliation(s)
- Kyle E Chambers
- Department of Psychology, Gustavus Adolphus College, Saint Peter, MN 56082, USA.
45
Cochlea-scaled entropy, not consonants, vowels, or time, best predicts speech intelligibility. Proc Natl Acad Sci U S A 2010; 107:12387-92. [PMID: 20566842 DOI: 10.1073/pnas.0913625107] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Speech sounds are traditionally divided into consonants and vowels. When only vowels or only consonants are replaced by noise, listeners are more accurate at understanding sentences in which consonants are replaced but vowels remain. From such data, vowels have been suggested to be more important for understanding sentences; however, such conclusions are mitigated by the fact that replaced consonant segments were roughly one-third shorter than vowels. We report two experiments that demonstrate listener performance to be better predicted by simple psychoacoustic measures of cochlea-scaled spectral change across time. First, listeners identified sentences in which portions of consonants (C), vowels (V), CV transitions, or VC transitions were replaced by noise. Relative intelligibility was not well accounted for on the basis of Cs, Vs, or their transitions. In a second experiment, distinctions between Cs and Vs were abandoned. Instead, portions of sentences were replaced on the basis of cochlea-scaled spectral entropy (CSE). Sentence segments having relatively high, medium, or low entropy were replaced with noise. Intelligibility decreased linearly as the amount of replaced CSE increased. Duration of signal replaced and proportion of consonants/vowels replaced failed to account for listener data. CSE corresponds closely with the linguistic construct of sonority (or vowel-likeness) that is useful for describing phonological systematicity, especially syllable composition. Results challenge traditional distinctions between consonants and vowels. Speech intelligibility is better predicted by nonlinguistic sensory measures of uncertainty (potential information) than by orthodox physical acoustic measures or linguistic constructs.
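A rough sketch of a cochlea-scaled spectral-change measure in this spirit, not the authors' exact procedure: energy is summed in ERB-spaced bands for each short slice, and change is the Euclidean distance between successive band-energy vectors. The 16 ms slice length, band count, frequency range, and amplitude compression are assumptions.

```python
# Illustrative cochlea-scaled spectral-change measure: band energies on an
# ERB-spaced frequency grid per short slice, then the distance between
# successive slices. Slice length, band count, and range are assumptions.
import numpy as np

def erb_spaced_edges(f_lo, f_hi, n_bands):
    """Band edges equally spaced on the ERB-number scale (Glasberg & Moore)."""
    erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    erb_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return erb_inv(np.linspace(erb(f_lo), erb(f_hi), n_bands + 1))

def cochlea_scaled_change(signal, fs, slice_ms=16.0, n_bands=30, f_lo=80.0):
    """Spectral change between successive cochlea-scaled slices of a signal."""
    edges = erb_spaced_edges(f_lo, fs / 2.0, n_bands)
    hop = int(fs * slice_ms / 1000.0)
    window = np.hanning(hop)
    freqs = np.fft.rfftfreq(hop, 1.0 / fs)
    slices = []
    for start in range(0, len(signal) - hop + 1, hop):
        power = np.abs(np.fft.rfft(signal[start:start + hop] * window)) ** 2
        slices.append([power[(freqs >= lo) & (freqs < hi)].sum()
                       for lo, hi in zip(edges[:-1], edges[1:])])
    slices = np.sqrt(np.array(slices))               # rough amplitude compression
    return np.linalg.norm(np.diff(slices, axis=0), axis=1)
```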
46
Warner N, Otake T, Arai T. Intonational structure as a word-boundary cue in Tokyo Japanese. LANGUAGE AND SPEECH 2010; 53:107-131. [PMID: 20415004 DOI: 10.1177/0023830909351235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
While listeners are recognizing words from the connected speech stream, they are also parsing information from the intonational contour. This contour may contain cues to word boundaries, particularly if a language has boundary tones that occur at a large proportion of word onsets. We investigate how useful the pitch rise at the beginning of an accentual phrase (APR) would be as a potential word-boundary cue for Japanese listeners. A corpus study shows that it should allow listeners to locate approximately 40-60% of word onsets, while causing less than 1% false positives. We then present a word-spotting study which shows that Japanese listeners can, indeed, use accentual phrase boundary cues during segmentation. This work shows that the prosodic patterns that have been found in the production of Japanese also impact listeners' processing.
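A toy illustration of the two corpus quantities reported here, hit rate over word onsets and false-positive rate over non-onset positions; all positions below are invented.

```python
# Toy evaluation of a word-boundary cue: how many word onsets it marks (hits)
# and how often it fires where there is no onset (false positives). Made-up data.
word_onsets = {0, 4, 9, 15, 21}    # positions where words begin
cue_positions = {0, 7, 9, 21}      # positions where the cue (e.g., a pitch rise) occurs
all_positions = set(range(25))     # every candidate position in the toy corpus

hit_rate = len(cue_positions & word_onsets) / len(word_onsets)
false_positive_rate = len(cue_positions - word_onsets) / len(all_positions - word_onsets)

print(f"hit rate: {hit_rate:.0%}, false-positive rate: {false_positive_rate:.0%}")
# -> hit rate: 60%, false-positive rate: 5%
```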
Affiliation(s)
- Natasha Warner
- Department of Linguistics, University of Arizona, Tucson, AZ 85721-0028, USA.
47
Buchwald AB, Winters SJ, Pisoni DB. Visual speech primes open-set recognition of spoken words. Lang Cogn Process 2009; 24:580-610. [PMID: 21544260 DOI: 10.1080/01690960802536357] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
Visual speech perception has become a topic of considerable interest to speech researchers. Previous research has demonstrated that perceivers neurally encode and use speech information from the visual modality, and this information has been found to facilitate spoken word recognition in tasks such as lexical decision (Kim, Davis, & Krins, 2004). In this paper, we used a cross-modality repetition priming paradigm with visual speech lexical primes and auditory lexical targets to explore the nature of this priming effect. First, we report that participants identified spoken words mixed with noise more accurately when the words were preceded by a visual speech prime of the same word compared with a control condition. Second, analyses of the responses indicated that both correct and incorrect responses were constrained by the visual speech information in the prime. These complementary results suggest that the visual speech primes have an effect on lexical access by increasing the likelihood that words with certain phonetic properties are selected. Third, we found that the cross-modality repetition priming effect was maintained even when visual and auditory signals came from different speakers, and thus different instances of the same lexical item. We discuss implications of these results for current theories of speech perception.
Affiliation(s)
- Adam B Buchwald
- Department of Speech-Language Pathology and Audiology, New York University, New York, NY, and Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA
48
Ogasawara N, Warner N. Processing missing vowels: Allophonic processing in Japanese. Lang Cogn Process 2009. [DOI: 10.1080/01690960802084028] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
49
Ames H, Grossberg S. Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:3918-3936. [PMID: 19206817 DOI: 10.1121/1.2997478] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by adaptive resonance theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [Peterson, G. E., and Barney, H.L., J. Acoust. Soc. Am. 24, 175-184 (1952).] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
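For a concrete sense of the speaker-normalization problem, here is a far simpler scheme than the strip-map model described above, shown only as a contrast, not as part of that model: Lobanov per-speaker z-scoring of formant values, which removes per-speaker offset and scale. The formant values are invented.

```python
# Lobanov (per-speaker z-score) normalization of vowel formants: a much simpler
# speaker-normalization scheme than the strip-map model above, shown only to
# make the problem concrete. The formant values are invented.
import numpy as np

def lobanov(formants):
    """formants: (n_tokens, n_formants) array for a single speaker, in Hz."""
    f = np.asarray(formants, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)

# Two speakers producing the "same" three vowels at different absolute frequencies
# (speaker_b is speaker_a shifted and scaled, mimicking a different vocal tract).
speaker_a = [[300, 2300], [500, 1500], [700, 1200]]
speaker_b = [[410, 2810], [650, 1850], [890, 1490]]
print(np.allclose(lobanov(speaker_a), lobanov(speaker_b)))  # True: same normalized vowel space
```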
Affiliation(s)
- Heather Ames
- Department of Cognitive and Neural Systems, Center for Adaptive Systems, and Center of Excellence for Learning In Education, Science, and Technology, Boston University, Boston, Massachusetts 02215, USA
50
Francis AL, Driscoll C. Training to use voice onset time as a cue to talker identification induces a left-ear/right-hemisphere processing advantage. BRAIN AND LANGUAGE 2006; 98:310-8. [PMID: 16828153 PMCID: PMC2957907 DOI: 10.1016/j.bandl.2006.06.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/06/2006] [Revised: 05/11/2006] [Accepted: 06/01/2006] [Indexed: 05/10/2023]
Abstract
We examined the effect of perceptual training on a well-established hemispheric asymmetry in speech processing. Eighteen listeners were trained to use a within-category difference in voice onset time (VOT) to cue talker identity. Successful learners (n=8) showed faster response times for stimuli presented only to the left ear than for those presented only to the right. The development of a left-ear/right-hemisphere advantage for processing a prototypically phonetic cue supports a model of speech perception in which lateralization is driven by functional demands (talker identification vs. phonetic categorization) rather than by acoustic stimulus properties alone.
Affiliation(s)
- Alexander L Francis
- Department of Speech, Language and Hearing Sciences, Purdue University, 1353 Heavilon Hall, 500 Oval Drive, West Lafayette, IN 47907, USA.