1. Magnuson JS, Luthra S. Simple Recurrent Networks are Interactive. Psychon Bull Rev 2024. [PMID: 39537950] [DOI: 10.3758/s13423-024-02608-y]
Abstract
There is disagreement among cognitive scientists as to whether a key computational framework - the Simple Recurrent Network (SRN; Elman, Machine Learning, 7(2), 195-225, 1991; Elman, Cognitive Science, 14(2), 179-211, 1990) - is a feedforward system. SRNs have been essential tools in advancing theories of learning, development, and processing in cognitive science for more than three decades. If SRNs were feedforward systems, there would be pervasive theoretical implications: Anything an SRN can do would therefore be explainable without interaction (feedback). However, despite claims that SRNs (and by extension recurrent neural networks more generally) are feedforward (Norris, 1993), this is not the case. Feedforward networks by definition are acyclic graphs - they contain no loops. SRNs contain loops - from hidden units back to hidden units with a time delay - and are therefore cyclic graphs. As we demonstrate, they are interactive in the sense normally implied for networks with feedback connections between layers: In an SRN, bottom-up inputs are inextricably mixed with previous model-internal computations. Inputs are transmitted to hidden units by multiplying them by input-to-hidden weights. However, hidden units simultaneously receive their own previous activations as input via hidden-to-hidden connections with a one-step time delay (typically via context units). These are added to the input-to-hidden values, and the sums are transformed by an activation function. Thus, bottom-up inputs are mixed with the products of potentially many preceding transformations of inputs and model-internal states. We discuss theoretical implications through a key example from psycholinguistics where the status of SRNs as feedforward or interactive has crucial ramifications.
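To make the disputed architecture concrete, here is a minimal NumPy sketch of a single SRN time step (illustrative only; the sizes, weights, and names are assumptions of ours, not Elman's implementation). The hidden state at time t is computed from the current input summed with the previous hidden state delivered through the context units, which is exactly the cycle at issue.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 10, 20, 10                    # illustrative sizes, not from the paper
W_ih = rng.normal(scale=0.1, size=(n_hid, n_in))   # input -> hidden weights
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # context (hidden at t-1) -> hidden weights
W_ho = rng.normal(scale=0.1, size=(n_out, n_hid))  # hidden -> output weights

def srn_step(x_t, h_prev):
    """One SRN time step: bottom-up input is summed with the previous hidden
    state (copied into the context units) before the nonlinearity, so the two
    sources are inextricably mixed."""
    net = W_ih @ x_t + W_hh @ h_prev           # input mixed with prior internal state
    h_t = 1.0 / (1.0 + np.exp(-net))           # logistic activation
    y_t = 1.0 / (1.0 + np.exp(-(W_ho @ h_t)))
    return h_t, y_t

h = np.zeros(n_hid)                            # context starts empty
for x in rng.random((5, n_in)):                # a 5-step input sequence
    h, y = srn_step(x, h)                      # h at step t feeds back into step t+1
```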
Affiliation(s)
- James S Magnuson
- BCBL, Basque Center on Cognition Brain and Language, Donostia-San Sebastián, Spain.
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain.
- University of Connecticut, Storrs, CT, USA.
2. Sarrett ME, Toscano JC. Decoding speech sounds from neurophysiological data: Practical considerations and theoretical implications. Psychophysiology 2024; 61:e14475. [PMID: 37947235] [DOI: 10.1111/psyp.14475]
Abstract
Machine learning techniques have proven to be a useful tool in cognitive neuroscience. However, their implementation in scalp-recorded electroencephalography (EEG) is relatively limited. To address this, we present three analyses using data from a previous study that examined event-related potential (ERP) responses to a wide range of naturally-produced speech sounds. First, we explore which features of the EEG signal best maximize machine learning accuracy for a voicing distinction, using a support vector machine (SVM). We manipulate three dimensions of the EEG signal as input to the SVM: number of trials averaged, number of time points averaged, and polynomial fit. We discuss the trade-offs in using different feature sets and offer some recommendations for researchers using machine learning. Next, we use SVMs to classify specific pairs of phonemes, finding that we can detect differences in the EEG signal that are not otherwise detectable using conventional ERP analyses. Finally, we characterize the timecourse of phonetic feature decoding across three phonological dimensions (voicing, manner of articulation, and place of articulation), and find that voicing and manner are decodable from neural activity, whereas place of articulation is not. This set of analyses addresses both practical considerations in the application of machine learning to EEG, particularly for speech studies, and also sheds light on current issues regarding the nature of perceptual representations of speech.
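As a rough sketch of the kind of analysis the abstract describes (assumed data shapes and toy random data, not the authors' pipeline), one can average small groups of same-label trials, summarize each channel's waveform with polynomial-fit coefficients, and classify voicing with a linear SVM in scikit-learn:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Assumed shapes: eeg is (n_trials, n_channels, n_times); labels are 0/1 voicing codes.
eeg = rng.normal(size=(200, 32, 100))
labels = rng.integers(0, 2, size=200)

def featurize(epochs, labels, n_avg=5, poly_deg=3):
    """Average small groups of same-label trials, then summarize each channel's
    waveform with low-order polynomial coefficients."""
    feats, ys = [], []
    for lab in np.unique(labels):
        idx = np.where(labels == lab)[0]
        for start in range(0, len(idx) - n_avg + 1, n_avg):
            avg = epochs[idx[start:start + n_avg]].mean(axis=0)   # (channels, times)
            t = np.arange(avg.shape[1])
            coefs = np.array([np.polyfit(t, ch, poly_deg) for ch in avg])
            feats.append(coefs.ravel())
            ys.append(lab)
    return np.array(feats), np.array(ys)

X, y = featurize(eeg, labels)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
print(cross_val_score(clf, X, y, cv=5).mean())   # chance-level on random data
```

The trade-offs the authors discuss correspond to the `n_avg` and `poly_deg` choices here: more averaging and lower-order fits yield cleaner but fewer training samples.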
Affiliation(s)
- McCall E Sarrett
- Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
- Psychology Department, Gonzaga University, Spokane, Washington, USA
- Joseph C Toscano
- Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
3. Crinnion AM, Luthra S, Gaston P, Magnuson JS. Resolving competing predictions in speech: How qualitatively different cues and cue reliability contribute to phoneme identification. Atten Percept Psychophys 2024; 86:942-961. [PMID: 38383914] [PMCID: PMC11233028] [DOI: 10.3758/s13414-024-02849-y]
Abstract
Listeners have many sources of information available in interpreting speech. Numerous theoretical frameworks and paradigms have established that various constraints impact the processing of speech sounds, but it remains unclear how listeners might simultaneously consider multiple cues, especially those that differ qualitatively (i.e., with respect to timing and/or modality) or quantitatively (i.e., with respect to cue reliability). Here, we establish that cross-modal identity priming can influence the interpretation of ambiguous phonemes (Exp. 1, N = 40) and show that two qualitatively distinct cues - namely, cross-modal identity priming and auditory co-articulatory context - have additive effects on phoneme identification (Exp. 2, N = 40). However, we find no effect of quantitative variation in a cue - specifically, changes in the reliability of the priming cue did not influence phoneme identification (Exp. 3a, N = 40; Exp. 3b, N = 40). Overall, we find that qualitatively distinct cues can additively influence phoneme identification. While many existing theoretical frameworks address constraint integration to some degree, our results provide a step towards understanding how information that differs in both timing and modality is integrated in online speech perception.
Affiliation(s)
- James S Magnuson
- University of Connecticut, Storrs, CT, USA
- BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
4. Magnuson JS, Crinnion AM, Luthra S, Gaston P, Grubb S. Contra assertions, feedback improves word recognition: How feedback and lateral inhibition sharpen signals over noise. Cognition 2024; 242:105661. [PMID: 37944313] [PMCID: PMC11238470] [DOI: 10.1016/j.cognition.2023.105661]
Abstract
Whether top-down feedback modulates perception has deep implications for cognitive theories. Debate has been vigorous in the domain of spoken word recognition, where competing computational models and agreement on at least one diagnostic experimental paradigm suggest that the debate may eventually be resolvable. Norris and Cutler (2021) revisit arguments against lexical feedback in spoken word recognition models. They also incorrectly claim that recent computational demonstrations that feedback promotes accuracy and speed under noise (Magnuson et al., 2018) were due to the use of the Luce choice rule rather than adding noise to inputs (noise was in fact added directly to inputs). They also claim that feedback cannot improve word recognition because feedback cannot distinguish signal from noise. We have two goals in this paper. First, we correct the record about the simulations of Magnuson et al. (2018). Second, we explain how interactive activation models selectively sharpen signals via joint effects of feedback and lateral inhibition that boost lexically-coherent sublexical patterns over noise. We also review a growing body of behavioral and neural results consistent with feedback and inconsistent with autonomous (non-feedback) architectures, and conclude that parsimony supports feedback. We close by discussing the potential for synergy between autonomous and interactive approaches.
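To illustrate the mechanism under discussion, the toy update below (our own simplified parameters and two-word vocabulary, not the TRACE model or the simulations reported in the paper) lets word units feed activation back to their constituent phoneme units while units within each layer laterally inhibit their competitors, so lexically coherent phoneme patterns are boosted relative to noise:

```python
import numpy as np

# Toy vocabulary: two "words" defined over four phoneme units (assumed, illustrative).
# word 0 = phonemes {0, 1}; word 1 = phonemes {2, 3}
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])            # word-by-phoneme connection matrix

alpha_ff, alpha_fb, beta = 0.10, 0.05, 0.10     # bottom-up, feedback, lateral-inhibition gains

def step(phon, word, inp):
    """One update: phonemes receive bottom-up input plus top-down feedback from
    words; each layer also receives lateral inhibition from its competitors."""
    phon_net = alpha_ff * inp + alpha_fb * (W.T @ word) - beta * (phon.sum() - phon)
    word_net = alpha_ff * (W @ phon) - beta * (word.sum() - word)
    phon = np.clip(phon + phon_net, 0.0, 1.0)
    word = np.clip(word + word_net, 0.0, 1.0)
    return phon, word

rng = np.random.default_rng(2)
phon, word = np.zeros(4), np.zeros(2)
signal = np.array([1.0, 1.0, 0.0, 0.0])         # input favoring word 0
for _ in range(30):
    phon, word = step(phon, word, signal + 0.3 * rng.normal(size=4))

print(phon.round(2), word.round(2))             # word-0 phonemes end up sharpened over noise
```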
Affiliation(s)
- James S Magnuson
- University of Connecticut, Storrs, CT, USA; BCBL, Basque Center on Cognition Brain and Language, Donostia-San Sebastián, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain.
5. Bidelman GM, Carter JA. Continuous dynamics in behavior reveal interactions between perceptual warping in categorization and speech-in-noise perception. Front Neurosci 2023; 17:1032369. [PMID: 36937676] [PMCID: PMC10014819] [DOI: 10.3389/fnins.2023.1032369]
Abstract
Introduction: Spoken language comprehension requires listeners to map continuous features of the speech signal to discrete category labels. Categories are, however, malleable to surrounding context and stimulus precedence; listeners' percepts can dynamically shift depending on the sequencing of adjacent stimuli, resulting in a warping of the heard phonetic category. Here, we investigated whether such perceptual warping - which amplifies categorical hearing - might alter speech processing in noise-degraded listening scenarios. Methods: We measured continuous dynamics in perception and category judgments of an acoustic-phonetic vowel gradient via mouse tracking. Tokens were presented in serial vs. random orders to induce more or less perceptual warping while listeners categorized continua in clean and noise conditions. Results: Listeners' responses were faster, and their mouse trajectories closer to the ultimate behavioral selection (marked visually on the screen), in serial vs. random order, suggesting increased perceptual attraction to category exemplars. Interestingly, order effects emerged earlier and persisted later in the trial time course when categorizing speech in noise. Discussion: These data describe interactions between perceptual warping in categorization and speech-in-noise perception: warping strengthens the behavioral attraction to relevant speech categories, making listeners more decisive (though not necessarily more accurate) in their decisions about both clean and noise-degraded speech.
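A generic way to quantify how closely a cursor trajectory hugs the eventual response (a sketch of a standard mouse-tracking measure with made-up data, not necessarily the measure used in this study) is the maximum deviation from the straight line between the trajectory's start and end points:

```python
import numpy as np

def max_deviation(xy):
    """Maximum perpendicular distance of a 2-D mouse trajectory (n_samples x 2)
    from the straight line joining its start and end points; smaller values mean
    the cursor moved more directly toward the chosen response."""
    start, end = xy[0], xy[-1]
    direction = end - start
    norm = np.linalg.norm(direction)
    if norm == 0:
        return 0.0
    rel = xy - start
    # Perpendicular distance of each sample from the start->end line (2-D cross product).
    dists = np.abs(rel[:, 0] * direction[1] - rel[:, 1] * direction[0]) / norm
    return dists.max()

# Illustrative trajectory that bows toward a competitor before settling on the target.
t = np.linspace(0, 1, 50)
traj = np.column_stack([t, 0.4 * np.sin(np.pi * t)])
print(round(max_deviation(traj), 3))
```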
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
- Jared A. Carter
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Hearing Sciences – Scottish Section, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Glasgow, United Kingdom
6. Yu CH, Li M, Noe C, Fischer-Baum S, Vannucci M. Bayesian inference for stationary points in Gaussian process regression models for event-related potentials analysis. Biometrics 2022. [PMID: 34997758] [DOI: 10.1111/biom.13621]
Abstract
Stationary points embedded in the derivatives are often critical for a model to be interpretable and may be considered key features of interest in many applications. We propose a semiparametric Bayesian model to efficiently infer the locations of stationary points of a nonparametric function, which also produces an estimate of the function itself. We use Gaussian processes as a flexible prior for the underlying function and impose derivative constraints to control the function's shape via conditioning. We develop an inferential strategy that intentionally restricts estimation to the case of at least one stationary point, bypassing possible mis-specifications of the number of stationary points and avoiding the varying-dimension problem that often brings computational complexity. We illustrate the proposed methods using simulations and then apply the method to the estimation of event-related potentials (ERPs) derived from electroencephalography (EEG) signals. We show how the proposed method automatically identifies characteristic components and their latencies at the individual level, avoiding the excessive averaging across subjects that is routinely done in the field to obtain smooth curves. By applying this approach to EEG data collected from younger and older adults during a speech perception task, we demonstrate how the time course of speech perception processes changes with age.
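A heavily simplified, non-Bayesian sketch of the core idea: fit a Gaussian process to a noisy ERP-like waveform and read off stationary points from the derivative of the posterior mean. The paper's model instead imposes derivative constraints and returns full posterior uncertainty over the stationary-point locations; the simulated data and kernel settings below are our own assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)

# Simulated "ERP": a negative and a positive deflection plus noise (assumed data).
t = np.linspace(0, 0.6, 120)                       # seconds
erp = -2.0 * np.exp(-((t - 0.1) / 0.03) ** 2) + 3.0 * np.exp(-((t - 0.3) / 0.06) ** 2)
y = erp + 0.3 * rng.normal(size=t.size)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.05) + WhiteKernel(0.1),
                              normalize_y=True)
gp.fit(t[:, None], y)

grid = np.linspace(t.min(), t.max(), 1000)
mu = gp.predict(grid[:, None])
dmu = np.gradient(mu, grid)                        # numerical derivative of posterior mean
crossings = np.where(np.sign(dmu[:-1]) != np.sign(dmu[1:]))[0]
print("stationary points (s):", np.round(grid[crossings], 3))   # peaks expected near 0.1 and 0.3
```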
Affiliation(s)
- Cheng-Han Yu
- Department of Mathematical and Statistical Sciences, Marquette University, Milwaukee, WI, USA
- Meng Li
- Department of Statistics, Rice University, Houston, TX, USA
- Colin Noe
- Department of Psychological Science, Rice University, Houston, TX, USA
7. Kapnoula EC, McMurray B. Idiosyncratic use of bottom-up and top-down information leads to differences in speech perception flexibility: Converging evidence from ERPs and eye-tracking. Brain Lang 2021; 223:105031. [PMID: 34628259] [PMCID: PMC11251822] [DOI: 10.1016/j.bandl.2021.105031]
Abstract
Listeners generally categorize speech sounds in a gradient manner. However, recent work, using a visual analogue scaling (VAS) task, suggests that some listeners show more categorical performance, leading to less flexible cue integration and poorer recovery from misperceptions (Kapnoula et al., 2017, 2021). We asked how individual differences in speech gradiency can be reconciled with the well-established gradiency in the modal listener, showing how VAS performance relates to both Visual World Paradigm and EEG measures of gradiency. We also investigated three potential sources of these individual differences: inhibitory control; lexical inhibition; and early cue encoding. We used the N1 ERP component to track pre-categorical encoding of Voice Onset Time (VOT). The N1 linearly tracked VOT, reflecting a fundamentally gradient speech perception; however, for less gradient listeners, this linearity was disrupted near the boundary. Thus, while all listeners are gradient, they may show idiosyncratic encoding of specific cues, affecting downstream processing.
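One simple way to ask whether an N1-like amplitude tracks VOT linearly or shows a break near the category boundary (an illustrative sketch with simulated amplitudes and an assumed 20-ms boundary, not the study's analysis) is to compare a purely linear fit with a fit that adds a boundary-step predictor:

```python
import numpy as np

rng = np.random.default_rng(4)

vot = np.tile(np.arange(0.0, 45.0, 5.0), 20)                # ms; repeated "trials" (assumed design)
linear_listener = -0.05 * vot + rng.normal(0, 0.2, vot.size)
step_listener = -0.05 * vot + 0.6 * (vot > 20) + rng.normal(0, 0.2, vot.size)

def boundary_effect(amplitude, vot, boundary=20.0):
    """R^2 gain from adding a step at the category boundary to a linear fit;
    a large gain suggests linearity is disrupted near the boundary."""
    X_lin = np.column_stack([np.ones_like(vot), vot])
    X_step = np.column_stack([X_lin, vot > boundary])
    r2 = []
    for X in (X_lin, X_step):
        beta, *_ = np.linalg.lstsq(X, amplitude, rcond=None)
        resid = amplitude - X @ beta
        r2.append(1 - resid.var() / amplitude.var())
    return r2[1] - r2[0]

print(round(boundary_effect(linear_listener, vot), 3))      # near 0: linear tracking
print(round(boundary_effect(step_listener, vot), 3))        # clearly larger: disrupted linearity
```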
Affiliation(s)
- Efthymia C Kapnoula
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Basque Center on Cognition, Brain and Language, Spain.
- Bob McMurray
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Dept. of Communication Sciences and Disorders, DeLTA Center, University of Iowa, United States; Dept. of Linguistics, DeLTA Center, University of Iowa, United States
8. Luthra S, Peraza-Santiago G, Beeson K, Saltzman D, Crinnion AM, Magnuson JS. Robust Lexically Mediated Compensation for Coarticulation: Christmash Time Is Here Again. Cogn Sci 2021; 45:e12962. [PMID: 33877697] [PMCID: PMC8243960] [DOI: 10.1111/cogs.12962]
Abstract
A long-standing question in cognitive science is how high-level knowledge is integrated with sensory input. For example, listeners can leverage lexical knowledge to interpret an ambiguous speech sound, but do such effects reflect direct top-down influences on perception or merely postperceptual biases? A critical test case in the domain of spoken word recognition is lexically mediated compensation for coarticulation (LCfC). Previous LCfC studies have shown that a lexically restored context phoneme (e.g., /s/ in Christma#) can alter the perceived place of articulation of a subsequent target phoneme (e.g., the initial phoneme of a stimulus from a tapes-capes continuum), consistent with the influence of an unambiguous context phoneme in the same position. Because this phoneme-to-phoneme compensation for coarticulation is considered sublexical, scientists agree that evidence for LCfC would constitute strong support for top-down interaction. However, results from previous LCfC studies have been inconsistent, and positive effects have often been small. Here, we conducted extensive piloting of stimuli prior to testing for LCfC. Specifically, we ensured that context items elicited robust phoneme restoration (e.g., that the final phoneme of Christma# was reliably identified as /s/) and that unambiguous context-final segments (e.g., a clear /s/ at the end of Christmas) drove reliable compensation for coarticulation for a subsequent target phoneme. We observed robust LCfC in a well-powered, preregistered experiment with these pretested items (N = 40) as well as in a direct replication study (N = 40). These results provide strong evidence in favor of computational models of spoken word recognition that include top-down feedback.
Affiliation(s)
- James S. Magnuson
- Psychological Sciences, University of Connecticut
- BCBL, Basque Center on Cognition Brain and Language
- Ikerbasque, Basque Foundation for Science
9. Mahmud MS, Yeasin M, Bidelman GM. Data-driven machine learning models for decoding speech categorization from evoked brain responses. J Neural Eng 2021; 18. [PMID: 33690177] [PMCID: PMC8738965] [DOI: 10.1088/1741-2552/abecf0]
Abstract
Objective. Categorical perception (CP) of audio is critical to understanding how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e., differentiates phonetic prototypes from ambiguous speech sounds). Approach. We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials. Main results. We found that early (120 ms) whole-brain data decoded speech categories (i.e., prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%; F1-score 95.00%). Separate analyses of left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, and motor cortex) were necessary to describe later decision stages (300-800 ms) of categorization, and these areas were highly associated with the strength of listeners' categorical hearing (i.e., slope of behavioral identification functions). Significance. Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
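As a generic sketch of stability selection for identifying informative source regions (a simplified version built from scikit-learn primitives; the paper's feature definitions, resampling scheme, and thresholds differ), one can repeatedly fit a sparse classifier to random half-samples, keep the ROIs selected in most of them, and decode categories with an SVM:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(5)

# Assumed data: trial-wise source activity in 68 ROIs; labels code prototypical vs. ambiguous tokens.
n_trials, n_rois = 300, 68
X = rng.normal(size=(n_trials, n_rois))
y = rng.integers(0, 2, size=n_trials)
X[y == 1, :5] += 0.8                          # make the first 5 ROIs informative in this toy example

def stability_selection(X, y, n_resamples=100, threshold=0.7):
    """Fraction of random half-samples in which each ROI receives a nonzero
    L1-penalized logistic-regression weight; ROIs above threshold are 'stable'."""
    counts = np.zeros(X.shape[1])
    for _ in range(n_resamples):
        idx = rng.choice(len(y), size=len(y) // 2, replace=False)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        clf.fit(X[idx], y[idx])
        counts += (np.abs(clf.coef_[0]) > 1e-8)
    return np.where(counts / n_resamples >= threshold)[0]

print("stable ROIs:", stability_selection(X, y))               # should recover the informative ROIs
print("decoding accuracy:",
      cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean().round(3))
```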
Affiliation(s)
- Md Sultan Mahmud
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- Mohammed Yeasin
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States of America
- University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, United States of America
10. Bidelman GM, Pearson C, Harrison A. Lexical Influences on Categorical Speech Perception Are Driven by a Temporoparietal Circuit. J Cogn Neurosci 2021; 33:840-852. [PMID: 33464162] [DOI: 10.1162/jocn_a_01678]
Abstract
Categorical judgments of otherwise identical phonemes are biased toward hearing words (i.e., the "Ganong effect"), suggesting that lexical context influences perception of even basic speech primitives. Lexical biasing could manifest via late-stage postperceptual mechanisms related to decision processes or, alternatively, via top-down linguistic inference that acts on early perceptual coding. Here, we exploited the temporal sensitivity of EEG to resolve the spatiotemporal dynamics of these context-related influences on speech categorization. Listeners rapidly classified sounds from a /gɪ/-/kɪ/ gradient presented in opposing word-nonword contexts (GIFT-kift vs. giss-KISS), designed to bias perception toward lexical items. Phonetic perception shifted toward the direction of words, establishing a robust Ganong effect behaviorally. ERPs revealed a neural analog of lexical biasing emerging within ~200 msec. Source analyses uncovered a distributed neural network supporting the Ganong effect, including middle temporal gyrus, inferior parietal lobe, and middle frontal cortex. Yet, among Ganong-sensitive regions, only left middle temporal gyrus and inferior parietal lobe predicted behavioral susceptibility to lexical influence. Our findings confirm that lexical status rapidly constrains sublexical categorical representations for speech within several hundred milliseconds, but likely does so outside the purview of canonical auditory-sensory brain areas.
Affiliation(s)
- Gavin M Bidelman
- University of Memphis, Memphis, TN, USA
- University of Tennessee Health Sciences Center, Memphis, TN, USA
11. Getz LM, Toscano JC. The time-course of speech perception revealed by temporally-sensitive neural measures. Wiley Interdiscip Rev Cogn Sci 2020; 12:e1541. [PMID: 32767836] [DOI: 10.1002/wcs.1541]
Abstract
Recent advances in cognitive neuroscience have provided a detailed picture of the early time-course of speech perception. In this review, we highlight this work, placing it within the broader context of research on the neurobiology of speech processing, and discuss how these data point us toward new models of speech perception and spoken language comprehension. We focus, in particular, on temporally-sensitive measures that allow us to directly measure early perceptual processes. Overall, the data provide support for two key principles: (a) speech perception is based on gradient representations of speech sounds and (b) speech perception is interactive and receives input from higher-level linguistic context at the earliest stages of cortical processing. Implications for models of speech processing and the neurobiology of language more broadly are discussed. This article is categorized under: Psychology > Language; Psychology > Perception and Psychophysics; Neuroscience > Cognition.
Affiliation(s)
- Laura M Getz
- Department of Psychological Sciences, University of San Diego, San Diego, California, USA
- Joseph C Toscano
- Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA