1
Siedenburg K, Graves J, Pressnitzer D. A unitary model of auditory frequency change perception. PLoS Comput Biol 2023; 19:e1010307. PMID: 36634121; PMCID: PMC9876382; DOI: 10.1371/journal.pcbi.1010307.
Abstract
Changes in the frequency content of sounds over time are arguably the most basic form of information about the behavior of sound-emitting objects. In perceptual studies, such changes have mostly been investigated separately, as aspects of either pitch or timbre. Here, we propose a unitary account of "up" and "down" subjective judgments of frequency change, based on a model combining auditory correlates of acoustic cues in a sound-specific and listener-specific manner. To do so, we introduce a generalized version of so-called Shepard tones, allowing symmetric manipulations of spectral information on a fine scale, usually associated with pitch (spectral fine structure, SFS), and on a coarse scale, usually associated with timbre (spectral envelope, SE). In a series of behavioral experiments, listeners reported "up" or "down" shifts across pairs of generalized Shepard tones that differed in SFS, in SE, or in both. We observed the classic properties of Shepard tones for either SFS or SE shifts: subjective judgments followed the smallest log-frequency change direction, with cases of ambiguity and circularity. Interestingly, when both SFS and SE changes were applied concurrently (synergistically or antagonistically), we observed a trade-off between cues. Listeners were encouraged to report when they perceived "both" directions of change concurrently, but this rarely happened, suggesting a unitary percept. A computational model could accurately fit the behavioral data by combining different cues reflecting frequency changes after auditory filtering. The model revealed that cue weighting depended on the nature of the sound. When presented with harmonic sounds, listeners put more weight on SFS-related cues, whereas inharmonic sounds led to more weight on SE-related cues. Moreover, these stimulus-based factors were modulated by inter-individual differences, revealing variability across listeners in the detailed recipe for "up" and "down" judgments.
We argue that frequency changes are tracked perceptually via the adaptive combination of a diverse set of cues, in a manner that is in fact similar to the derivation of other basic auditory dimensions such as spatial location.
Affiliation(s)
- Kai Siedenburg
- Carl von Ossietzky University of Oldenburg, Dept. of Medical Physics and Acoustics, Oldenburg, Germany
- Jackson Graves
- Laboratoire des systèmes perceptifs, Dépt. d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
- Daniel Pressnitzer
- Laboratoire des systèmes perceptifs, Dépt. d’études cognitives, École normale supérieure, PSL University, CNRS, Paris, France
2
Long-term priors constrain category learning in the context of short-term statistical regularities. Psychon Bull Rev 2022; 29:1925-1937. PMID: 35524011; DOI: 10.3758/s13423-022-02114-z.
Abstract
Cognitive systems face a constant tension of maintaining existing representations that have been fine-tuned to long-term input regularities and adapting representations to meet the needs of short-term input that may deviate from long-term norms. Systems must balance the stability of long-term representations with plasticity to accommodate novel contexts. We investigated the interaction between perceptual biases or priors acquired across the long-term and sensitivity to statistical regularities introduced in the short-term. Participants were first passively exposed to short-term acoustic regularities and then learned categories in a supervised training task that either conflicted or aligned with long-term perceptual priors. We found that the long-term priors had robust and pervasive impact on categorization behavior. In contrast, behavior was not influenced by the nature of the short-term passive exposure. These results demonstrate that perceptual priors place strong constraints on the course of learning and that short-term passive exposure to acoustic regularities has limited impact on directing subsequent category learning.
3
Malaia EA, Borneman SC, Krebs J, Wilbur RB. Low-Frequency Entrainment to Visual Motion Underlies Sign Language Comprehension. IEEE Trans Neural Syst Rehabil Eng 2021; 29:2456-2463. PMID: 34762589; PMCID: PMC8720261; DOI: 10.1109/tnsre.2021.3127724.
Abstract
When people listen to speech, neural activity tracks the entropy fluctuations in the acoustic envelope of the signal. This signal-based entrainment has been shown to be the basis of speech parsing and comprehension. In this electroencephalography (EEG) study, we compute sign language users’ cortical tracking of changes in the visual dynamics of the communicative signal in time-direct videos of sign language and their time-reversed counterparts, and assess the relative contribution of response frequencies between 0.2 and 12.4 Hz to comprehension using a machine learning approach to brain state classification. Lower frequencies of the EEG response (0.2–4 Hz) yield 100% classification accuracy, while information about cortical tracking of the visual envelope at higher frequencies is less informative. This suggests that signers rely on lower-frequency visual data, such as the envelope of the visual signal, for sign language comprehension. In the context of real-time language processing, given the speed of comprehension responses, this suggests that fluent signers employ a predictive processing heuristic based on sign language knowledge.
4
Siedenburg K, Jacobsen S, Reuter C. Spectral envelope position and shape in sustained musical instrument sounds. J Acoust Soc Am 2021; 149:3715. PMID: 34241486; DOI: 10.1121/10.0005088.
Abstract
It has been argued that the relative position of spectral envelopes along the frequency axis serves as a cue for musical instrument size (e.g., violin vs viola) and that the shape of the spectral envelope encodes family identity (violin vs flute). It is further known that fundamental frequency (F0), F0-register for specific instruments, and dynamic level strongly affect spectral properties of acoustical instrument sounds. However, the associations between these factors have not been rigorously quantified for a representative set of musical instruments. Here, we analyzed 5640 sounds from 50 sustained orchestral instruments sampled across their entire range of F0s at three dynamic levels. Regression of spectral centroid (SC) values that index envelope position indicated that smaller instruments possessed higher SC values for a majority of instrument classes (families), but SC also correlated with F0 and was strongly and consistently affected by the dynamic level. Instrument classification using relatively low-dimensional cepstral audio descriptors allowed for discrimination between instrument classes with accuracies beyond 80%. Envelope shape became much less indicative of instrument class whenever the classification problem involved generalization to different dynamic levels or F0-registers. These analyses confirm that spectral envelopes encode information about instrument size and family identity and highlight their dependence on F0(-register) and dynamic level.
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, 26129 Oldenburg, Germany
- Simon Jacobsen
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, 26129 Oldenburg, Germany
- Christoph Reuter
- Department of Musicology, University of Vienna, 1090 Vienna, Austria
5
Contributions of natural signal statistics to spectral context effects in consonant categorization. Atten Percept Psychophys 2021; 83:2694-2708. PMID: 33987821; DOI: 10.3758/s13414-021-02310-4.
Abstract
Speech perception, like all perception, takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., a sentence with more energy at lower third formant [F3] frequencies) differs from that of a later (target) sound (e.g., consonant with intermediate F3 onset frequency), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F3-onset /d/). Historically, these studies used filters to force context stimuli to possess certain spectral compositions. Recently, these effects were produced using unfiltered context sounds that already possessed the desired spectral compositions (Stilp & Assgari, 2019, Attention, Perception, & Psychophysics, 81, 2037-2052). Here, this natural signal statistics approach is extended to consonant categorization (/g/-/d/). Context sentences were either unfiltered (already possessing the desired spectral composition) or filtered (to imbue specific spectral characteristics). Long-term spectral characteristics of unfiltered contexts were poor predictors of shifts in consonant categorization, but short-term characteristics (last 475 ms) were excellent predictors. This diverges from vowel data, where long-term and shorter-term intervals (last 1,000 ms) were equally strong predictors. Thus, time scale plays a critical role in how listeners attune to signal statistics in the acoustic environment.
6
Ford LKW, Borneman J, Krebs J, Malaia E, Ames B. Classification of visual comprehension based on EEG data using sparse optimal scoring. J Neural Eng 2021; 18. PMID: 33440368; DOI: 10.1088/1741-2552/abdb3b.
Abstract
OBJECTIVE: Understanding and differentiating brain states is an important task in the field of cognitive neuroscience with applications in health diagnostics (such as detecting neurotypical development vs. autism spectrum, or coma/vegetative state vs. locked-in state). Electroencephalography (EEG) analysis is a particularly useful tool for this task, as EEG data can detect millisecond-level changes in brain activity across a range of frequencies in a non-invasive and relatively inexpensive fashion. The goal of this study is to apply machine learning methods to EEG data in order to classify visual language comprehension across multiple participants.
APPROACH: 26-channel EEG was recorded for 24 Deaf participants while they watched videos of sign language sentences played in time-direct and time-reverse formats to simulate interpretable vs. uninterpretable sign language, respectively. Sparse Optimal Scoring (SOS) was applied to EEG data in order to classify which type of video a participant was watching, time-direct or time-reversed. The use of SOS also served to reduce the dimensionality of the features to improve model interpretability.
MAIN RESULTS: The analysis of frequency-domain EEG data resulted in an average out-of-sample classification accuracy of 98.89%, which was far superior to the time-domain analysis. This high classification accuracy suggests this model can accurately identify common neural responses to visual linguistic stimuli.
SIGNIFICANCE: The significance of this work is in determining necessary and sufficient neural features for classifying the high-level neural process of visual language comprehension across multiple participants.
Affiliation(s)
- Joshua Borneman
- Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, Indiana, 47907-2122, United States
- Julia Krebs
- Center for Cognitive Neuroscience, University of Salzburg, Hellbrunnerstraße 34, Salzburg, 5020, Austria
- Evguenia Malaia
- Communicative Disorders, The University of Alabama, Box 870242, Tuscaloosa, Alabama, 35487, United States
- Brendan Ames
- Mathematics, The University of Alabama, Box 870350, Tuscaloosa, Alabama, 35487-0350, United States
7
Adaptive Efficient Coding of Correlated Acoustic Properties. J Neurosci 2019; 39:8664-8678. PMID: 31519821; DOI: 10.1523/jneurosci.0141-19.2019.
Abstract
Natural sounds such as vocalizations often have covarying acoustic attributes, resulting in redundancy in neural coding. The efficient coding hypothesis proposes that sensory systems are able to detect such covariation and adapt to reduce redundancy, leading to more efficient neural coding. Recent psychoacoustic studies have shown the auditory system can rapidly adapt to efficiently encode two covarying dimensions as a single dimension, following passive exposure to sounds in which temporal and spectral attributes covaried in a correlated fashion. However, these studies observed a cost to this adaptation, which was a loss of sensitivity to the orthogonal dimension. Here we explore the neural basis of this psychophysical phenomenon by recording single-unit responses from the primary auditory cortex in awake ferrets exposed passively to stimuli with two correlated attributes, similar in stimulus design to the psychoacoustic experiments in humans. We found: (1) the signal-to-noise ratio of spike-rate coding of cortical responses driven by sounds with correlated attributes remained unchanged along the exposure dimension, but was reduced along the orthogonal dimension; (2) performance of a decoder trained with spike data to discriminate stimuli along the orthogonal dimension was equally reduced; (3) correlations between neurons tuned to the two covarying attributes decreased after exposure; and (4) these exposure effects still occurred if sounds were correlated along two acoustic dimensions, but varied randomly along a third dimension. 
These neurophysiological results are consistent with the efficient coding hypothesis and may help deepen our understanding of how the auditory system encodes and represents acoustic regularities and covariance.
SIGNIFICANCE STATEMENT: The efficient coding (EC) hypothesis (Attneave, 1954; Barlow, 1961) proposes that the neural code in sensory systems efficiently encodes natural stimuli by minimizing the number of spikes needed to transmit a sensory signal. Results of recent psychoacoustic studies in humans are consistent with the EC hypothesis in that, following passive exposure to stimuli with correlated attributes, the auditory system rapidly adapts so as to more efficiently encode the two covarying dimensions as a single dimension. In the current neurophysiological experiments, using a stimulus design and experimental paradigm similar to the psychoacoustic studies of Stilp et al. (2010) and Stilp and Kluender (2011, 2012, 2016), we recorded responses from single neurons in the auditory cortex of the awake ferret, showing adaptive efficient neural coding of two correlated acoustic attributes.
8
Malaia EA, Wilbur RB. Syllable as a unit of information transfer in linguistic communication: The entropy syllable parsing model. Wiley Interdiscip Rev Cogn Sci 2019; 11:e1518. PMID: 31505710; DOI: 10.1002/wcs.1518.
Abstract
To understand human language, both spoken and signed, the listener or viewer has to parse the continuous external signal into components. The question of what those components are (e.g., phrases, words, sounds, phonemes?) has been a subject of long-standing debate. We re-frame this question to ask: What properties of the incoming visual or auditory signal are indispensable to eliciting language comprehension? In this review, we assess the phenomenon of language parsing from a modality-independent viewpoint. We show that the interplay between dynamic changes in the entropy of the signal and neural entrainment to the signal at the syllable level (4-5 Hz range) is causally related to language comprehension in both speech and sign language. This modality-independent Entropy Syllable Parsing model for the linguistic signal offers insight into the mechanisms of language processing, suggesting common neurocomputational bases for syllables in speech and sign language.
Affiliation(s)
- Evie A Malaia
- Department of Communicative Disorders, University of Alabama, Tuscaloosa, Alabama
- Ronnie B Wilbur
- Department of Speech, Language, Hearing Sciences, College of Health and Human Sciences, Purdue University, West Lafayette, Indiana; Linguistics, School of Interdisciplinary Studies, College of Liberal Arts, Purdue University, West Lafayette, Indiana
9
Long-standing problems in speech perception dissolve within an information-theoretic perspective. Atten Percept Psychophys 2019; 81:861-883. PMID: 30937673; DOI: 10.3758/s13414-019-01702-x.
Abstract
An information-theoretic framework is proposed that has the potential to dissolve (rather than attempt to solve) multiple long-standing problems concerning speech perception. On this view, speech perception can be reframed as a series of processes through which sensitivity to information (that which changes and/or is unpredictable) becomes increasingly sophisticated and shaped by experience. Problems concerning appropriate objects of perception (gestures vs. sounds), rate normalization, variance consequent to articulation, and talker normalization are reframed, or even dissolved, within this information-theoretic framework. Application of discriminative models founded on information theory provides a productive approach to answering questions concerning perception of speech, and perception most broadly.
11
Blumenthal-Dramé A, Malaia E. Shared neural and cognitive mechanisms in action and language: The multiscale information transfer framework. Wiley Interdiscip Rev Cogn Sci 2018; 10:e1484. PMID: 30417551; DOI: 10.1002/wcs.1484.
Abstract
This review compares how humans process action and language sequences produced by other humans. On the one hand, we identify commonalities between action and language processing in terms of cognitive mechanisms (e.g., perceptual segmentation, predictive processing, integration across multiple temporal scales), neural resources (e.g., the left inferior frontal cortex), and processing algorithms (e.g., comprehension based on changes in signal entropy). On the other hand, drawing on sign language with its particularly strong motor component, we also highlight what differentiates (both oral and signed) linguistic communication from nonlinguistic action sequences. We propose the multiscale information transfer framework (MSIT) as a way of integrating these insights and highlight directions into which future empirical research inspired by the MSIT framework might fruitfully evolve.
Affiliation(s)
- Alice Blumenthal-Dramé
- Department of English, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany; Freiburg Institute for Advanced Studies, Freiburg, Germany
- Evie Malaia
- Department of Communicative Disorders, University of Alabama, Tuscaloosa, Alabama; Freiburg Institute for Advanced Studies, Freiburg, Germany
12
Stilp CE, Kiefte M, Kluender KR. Discovering acoustic structure of novel sounds. J Acoust Soc Am 2018; 143:2460. PMID: 29716264; PMCID: PMC5924381; DOI: 10.1121/1.5031018.
Abstract
Natural sounds have substantial acoustic structure (predictability, nonrandomness) in their spectral and temporal compositions. Listeners are expected to exploit this structure to distinguish simultaneous sound sources; however, previous studies confounded acoustic structure and listening experience. Here, sensitivity to acoustic structure in novel sounds was measured in discrimination and identification tasks. Complementary signal-processing strategies independently varied relative acoustic entropy (the inverse of acoustic structure) across frequency or time. In one condition, instantaneous frequency of low-pass-filtered 300-ms random noise was rescaled to 5 kHz bandwidth and resynthesized. In another condition, the instantaneous frequency of a short gated 5-kHz noise was resampled up to 300 ms. In both cases, entropy relative to full bandwidth or full duration was a fraction of that in 300-ms noise sampled at 10 kHz. Discrimination of sounds improved with less relative entropy. Listeners identified a probe sound as a target sound (1%, 3.2%, or 10% relative entropy) that repeated amidst distractor sounds (1%, 10%, or 100% relative entropy) at 0 dB SNR. Performance depended on differences in relative entropy between targets and background. Lower-relative-entropy targets were better identified against higher-relative-entropy distractors than lower-relative-entropy distractors; higher-relative-entropy targets were better identified amidst lower-relative-entropy distractors. Results were consistent across signal-processing strategies.
Affiliation(s)
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, Kentucky 40292, USA
- Michael Kiefte
- School of Communication Sciences and Disorders, Dalhousie University, Halifax, Nova Scotia, Canada
- Keith R Kluender
- Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
13
Yin P, Shamma SA, Fritz JB. Relative salience of spectral and temporal features in auditory long-term memory. J Acoust Soc Am 2016; 140:4046. PMID: 28040019; PMCID: PMC6910011; DOI: 10.1121/1.4968395.
Abstract
In order to explore the representation of sound features in auditory long-term memory, two groups of ferrets were trained on Go vs Nogo, 3-zone classification tasks. The sound stimuli differed primarily along the spectral and temporal dimensions. In Group 1, two ferrets were trained to (i) classify tones based on their frequency (Tone-task), and subsequently learned to (ii) classify white noise based on its amplitude modulation rate (AM-task). In Group 2, two ferrets were trained to classify tones based on correlated combinations of their frequency and AM rate (AM-Tone task). Both groups of ferrets learned their tasks and were able to generalize performance along the trained spectral (tone frequency) or temporal (AM rate) dimensions. Insights into stimulus representations in memory were gained when the animals were tested with a diverse set of untrained probes that mixed features from the two dimensions. Animals exhibited a complex pattern of responses to the probes reflecting primarily the probes' spectral similarity with the training stimuli, and secondarily the temporal features of the stimuli. These diverse behavioral decisions could be well accounted for by a nearest-neighbor classifier model that relied on a multiscale spectrotemporal cortical representation of the training and probe sounds.
Affiliation(s)
- Pingbo Yin
- Neural Systems Laboratory, Institute for Systems Research, 2207 A.V. Williams Building, University of Maryland, College Park, Maryland 20742, USA
- Shihab A Shamma
- Neural Systems Laboratory, Institute for Systems Research, Electrical and Computer Engineering Department, 2203 A.V. Williams Building, University of Maryland, College Park, Maryland 20742, USA
- Jonathan B Fritz
- Neural Systems Laboratory, Institute for Systems Research, 2207 A.V. Williams Building, University of Maryland, College Park, Maryland 20742, USA