1
Shamma S, Dutta K. Spectro-temporal templates unify the pitch percepts of resolved and unresolved harmonics. J Acoust Soc Am 2019; 145:615. [PMID: 30823787] [PMCID: PMC6910008] [DOI: 10.1121/1.5088504]
Abstract
Pitch is a fundamental attribute in auditory perception involved in source identification and segregation, music, and speech understanding. Pitch percepts are intimately related to the harmonic resolvability of a sound. When harmonics are well-resolved, the induced pitch is usually salient and precise, and several models relying on autocorrelations or harmonic spectral templates can account for these percepts. However, when harmonics are not completely resolved, the pitch percept becomes less salient and more poorly discriminated, its upper range is limited to a few hundred hertz, and spectral templates fail to convey the percept since only temporal cues are available. Here, a biologically-motivated model is presented that combines spectral and temporal cues to account for both percepts. The model explains how temporal analysis to estimate the pitch of the unresolved harmonics is performed by bandpass filters implemented by resonances in dendritic trees of neurons in the early auditory pathway. It is demonstrated that organizing and exploiting such dendritic tuning can occur spontaneously in response to white noise. This paper then shows how temporal cues of unresolved harmonics may be integrated with spectrally resolved harmonics, creating spectro-temporal harmonic templates for all pitch percepts. Finally, the model extends its account of monaural pitch percepts to pitches evoked by dichotic binaural stimuli.
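The autocorrelation models this abstract refers to can be made concrete with a minimal sketch. This is the generic textbook baseline (pick the lag that maximizes the signal's autocorrelation), not the spectro-temporal model proposed in the paper; function and parameter names are illustrative:

```python
import math

def autocorr_pitch(signal, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch as the lag maximizing the autocorrelation,
    searched over lags corresponding to the fmin..fmax pitch range."""
    n = len(signal)
    lag_min = int(fs / fmax)          # shortest period considered
    lag_max = int(fs / fmin)          # longest period considered
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, min(lag_max, n - 1) + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# A 200 Hz harmonic complex (harmonics 1-3) sampled at 8 kHz:
fs = 8000.0
sig = [sum(math.sin(2 * math.pi * 200 * h * t / fs) for h in (1, 2, 3))
       for t in range(800)]
print(round(autocorr_pitch(sig, fs)))  # → 200
```

For well-resolved harmonics a scheme like this recovers the fundamental from the waveform's periodicity; the paper's point is that such purely temporal readouts need to be combined with spectral templates to cover all pitch percepts.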
Affiliation(s)
- Shihab Shamma
- Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
- Kelsey Dutta
- Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
2
Joosten ERM, Shamma SA, Lorenzi C, Neri P. Dynamic reweighting of auditory modulation filters. PLoS Comput Biol 2016; 12:e1005019. [PMID: 27398600] [PMCID: PMC4939963] [DOI: 10.1371/journal.pcbi.1005019]
Abstract
Sound waveforms convey information largely via amplitude modulations (AM). A large body of experimental evidence has provided support for a modulation (bandpass) filterbank. Details of this model have varied over time, partly reflecting different experimental conditions and diverse datasets from distinct task strategies, contributing uncertainty to the bandwidth measurements and leaving important issues unresolved. We adopt here a solely data-driven measurement approach in which we first demonstrate how different models can be subsumed within a common 'cascade' framework, and then proceed to characterize the cascade via system identification analysis using a single stimulus/task specification and hence stable task rules largely unconstrained by any model or parameters. Observers were required to detect a brief change in level superimposed onto random level changes that served as AM noise; the relationship between trial-by-trial noisy fluctuations and corresponding human responses enables targeted identification of distinct cascade elements. The resulting measurements reveal a complex, dynamic picture in which human perception of auditory modulations appears adaptive in nature, evolving from an initial lowpass mode to bandpass modes (with broad tuning, Q∼1) following repeated stimulus exposure.
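The trial-by-trial system-identification idea, relating noisy stimulus fluctuations to the observer's responses, reduces in its simplest classification-image form to averaging the noise separately over 'yes' and 'no' trials. The sketch below is a toy illustration of that first-order kernel, not the authors' full cascade analysis; all names are illustrative:

```python
import random

def psychophysical_kernel(noise_trials, responses):
    """Classification-image-style kernel: mean noise profile on 'yes'
    trials minus the mean profile on 'no' trials, per stimulus bin."""
    nbins = len(noise_trials[0])
    yes = [t for t, r in zip(noise_trials, responses) if r]
    no = [t for t, r in zip(noise_trials, responses) if not r]
    col_mean = lambda rows, j: sum(row[j] for row in rows) / len(rows)
    return [col_mean(yes, j) - col_mean(no, j) for j in range(nbins)]

# Simulated observer who answers 'yes' whenever level-noise bin 3 is high:
random.seed(0)
trials = [[random.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
responses = [t[3] > 0 for t in trials]
kernel = psychophysical_kernel(trials, responses)
print(max(range(8), key=lambda j: kernel[j]))  # → 3
```

The kernel peaks at whichever noise bins actually drove the decisions, which is how the relationship between fluctuations and responses identifies the filtering stage.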
Affiliation(s)
- Eva R. M. Joosten
- Laboratoire Psychologie de la Perception (CNRS UMR 8242) and Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Shihab A. Shamma
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248) and Département d’études cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France
- Department of Electrical and Computer Engineering, Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Christian Lorenzi
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248) and Département d’études cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France
- Peter Neri
- Laboratoire des Systèmes Perceptifs (CNRS UMR 8248) and Département d’études cognitives, Ecole Normale Supérieure, PSL Research University, Paris, France
3
Shamma S, Lorenzi C. On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system. J Acoust Soc Am 2013; 133:2818-33. [PMID: 23654388] [PMCID: PMC3663870] [DOI: 10.1121/1.4795783]
Abstract
There is much debate on how the spectrotemporal modulations of speech (or its spectrogram) are encoded in the responses of the auditory nerve, and whether speech intelligibility is best conveyed via the "envelope" (E) or "temporal fine-structure" (TFS) of the neural responses. Wide use of vocoders to resolve this question has commonly assumed that manipulating the amplitude-modulation and frequency-modulation components of the vocoded signal alters the relative importance of E or TFS encoding on the nerve, thus facilitating assessment of their relative importance to intelligibility. Here we argue that this assumption is incorrect, and that the vocoder approach is ineffective in differentially altering the neural E and TFS. In fact, we demonstrate using a simplified model of early auditory processing that both neural E and TFS encode the speech spectrogram with constant and comparable relative effectiveness regardless of the vocoder manipulations. However, we also show that neural TFS cues are less vulnerable than their E counterparts under severe noisy conditions, and hence should play a more prominent role in cochlear stimulation strategies.
Affiliation(s)
- Shihab Shamma
- Electrical and Computer Engineering Department and Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA.
4
Wang GI, Delgutte B. Sensitivity of cochlear nucleus neurons to spatio-temporal changes in auditory nerve activity. J Neurophysiol 2012; 108:3172-95. [PMID: 22972956] [DOI: 10.1152/jn.00160.2012]
Abstract
The spatio-temporal pattern of auditory nerve (AN) activity, representing the relative timing of spikes across the tonotopic axis, contains cues to perceptual features of sounds such as pitch, loudness, timbre, and spatial location. These spatio-temporal cues may be extracted by neurons in the cochlear nucleus (CN) that are sensitive to relative timing of inputs from AN fibers innervating different cochlear regions. One possible mechanism for this extraction is "cross-frequency" coincidence detection (CD), in which a central neuron converts the degree of coincidence across the tonotopic axis into a rate code by preferentially firing when its AN inputs discharge in synchrony. We used Huffman stimuli (Carney LH. J Neurophysiol 64: 437-456, 1990), which have a flat power spectrum but differ in their phase spectra, to systematically manipulate relative timing of spikes across tonotopically neighboring AN fibers without changing overall firing rates. We compared responses of CN units to Huffman stimuli with responses of model CD cells operating on spatio-temporal patterns of AN activity derived from measured responses of AN fibers with the principle of cochlear scaling invariance. We used the maximum likelihood method to determine the CD model cell parameters most likely to produce the measured CN unit responses, and thereby could distinguish units behaving like cross-frequency CD cells from those consistent with same-frequency CD (in which all inputs would originate from the same tonotopic location). We find that certain CN unit types, especially those associated with globular bushy cells, have responses consistent with cross-frequency CD cells. A possible functional role of a cross-frequency CD mechanism in these CN units is to increase the dynamic range of binaural neurons that process cues for sound localization.
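A cross-frequency coincidence-detection cell of the kind modeled here can be sketched minimally: the cell fires when enough input fibers spike within a short coincidence window. This is a hypothetical bin-based toy, not the paper's maximum-likelihood CD model; the parameters are illustrative:

```python
def cd_output(inputs, threshold, window=0):
    """Cross-fiber coincidence detector: the model cell fires in bin t
    when at least `threshold` input fibers spike within +/- `window`
    bins of t. Inputs are binary spike trains of equal length."""
    nbins = len(inputs[0])
    out = []
    for t in range(nbins):
        lo, hi = max(0, t - window), min(nbins, t + window + 1)
        active = sum(1 for fiber in inputs if any(fiber[lo:hi]))
        out.append(1 if active >= threshold else 0)
    return out

# Three AN fibers; only bin 2 has all three spiking together:
fibers = [
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0],
]
print(cd_output(fibers, threshold=3))  # → [0, 0, 1, 0, 0]
```

Widening `window` relaxes the timing requirement, which is the knob that distinguishes tight cross-frequency coincidence from looser integration in a model fit like the one described.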
Affiliation(s)
- Grace I Wang
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
5
Perceptual learning evidence for tuning to spectrotemporal modulation in the human auditory system. J Neurosci 2012; 32:6542-9. [PMID: 22573676] [DOI: 10.1523/jneurosci.5732-11.2012]
Abstract
Natural sounds are characterized by complex patterns of sound intensity distributed across both frequency (spectral modulation) and time (temporal modulation). Perception of these patterns has been proposed to depend on a bank of modulation filters, each tuned to a unique combination of a spectral and a temporal modulation frequency. There is considerable physiological evidence for such combined spectrotemporal tuning. However, direct behavioral evidence is lacking. Here we examined the processing of spectrotemporal modulation behaviorally using a perceptual-learning paradigm. We trained human listeners for ∼1 h/d for 7 d to discriminate the depth of spectral (0.5 cyc/oct; 0 Hz), temporal (0 cyc/oct; 32 Hz), or upward spectrotemporal (0.5 cyc/oct; 32 Hz) modulation. Each trained group learned more on their respective trained condition than did controls who received no training. Critically, this depth-discrimination learning did not generalize to the trained stimuli of the other groups or to downward spectrotemporal (0.5 cyc/oct; -32 Hz) modulation. Learning on discrimination also led to worsening on modulation detection, but only when the same spectrotemporal modulation was used for both tasks. Thus, these influences of training were specific to the trained combination of spectral and temporal modulation frequencies, even when the trained and untrained stimuli had one modulation frequency in common. This specificity indicates that training modified circuitry that had combined spectrotemporal tuning, and therefore that circuits with such tuning can influence perception. These results are consistent with the possibility that the auditory system analyzes sounds through filters tuned to combined spectrotemporal modulation.
6
Macherey O, Carlyon RP. Temporal pitch percepts elicited by dual-channel stimulation of a cochlear implant. J Acoust Soc Am 2010; 127:339-49. [PMID: 20058981] [PMCID: PMC3000475] [DOI: 10.1121/1.3269042]
Abstract
McKay and McDermott [J. Acoust. Soc. Am. 100, 1081-1092 (1996)] found that when two different amplitude-modulated pulse trains are presented to two channels separated by <1.5 mm, some cochlear implant (CI) listeners perceive the aggregate temporal pattern. The present study attempted to extend this general finding and to test whether dual-electrode stimulation would increase the upper limit of temporal pitch perception in CIs. Six subjects were asked to rank 12 dual-channel stimuli differing in their rate [ranging from 92 to 516 pps (pulses per second) on each individual channel] and in their inter-channel delay (pulses on the two channels being either nearly simultaneous or delayed by half the period). The data showed that, for an electrode separation of 0.75 or 1.1 mm, (a) the perceived pitch was on average slightly higher for the long-delay than for the short-delay stimuli but never matched the pitch corresponding to the aggregate temporal pattern, (b) the upper limit of temporal pitch did not increase using long-delay stimuli, and (c) the pitch differences between short- and long-delay stimuli were largely insensitive to channel order and to electrode configuration. These results suggest that there may be more independence between CI channels than previously thought.
Affiliation(s)
- Olivier Macherey
- MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom.
7
Abstract
Adaptively optimizing experiments has the potential to significantly reduce the number of trials needed to build parametric statistical models of neural systems. However, application of adaptive methods to neurophysiology has been limited by severe computational challenges. Since most neurons are high-dimensional systems, optimizing neurophysiology experiments requires computing high-dimensional integrations and optimizations in real time. Here we present a fast algorithm for choosing the most informative stimulus by maximizing the mutual information between the data and the unknown parameters of a generalized linear model (GLM) that we want to fit to the neuron's activity. We rely on important log concavity and asymptotic normality properties of the posterior to facilitate the required computations. Our algorithm requires only low-rank matrix manipulations and a two-dimensional search to choose the optimal stimulus. The average running time of these operations scales quadratically with the dimensionality of the GLM, making real-time adaptive experimental design feasible even for high-dimensional stimulus and parameter spaces. For example, we require roughly 10 milliseconds on a desktop computer to optimize a 100-dimensional stimulus. Despite using some approximations to make the algorithm efficient, our algorithm asymptotically decreases the uncertainty about the model parameters at a rate equal to the maximum rate predicted by an asymptotic analysis. Simulation results show that picking stimuli by maximizing the mutual information can speed up convergence to the optimal values of the parameters by an order of magnitude compared to using random (nonadaptive) stimuli. 
Finally, applying our design procedure to real neurophysiology experiments requires addressing the nonstationarities that we would expect to see in neural responses; our algorithm can efficiently handle both fast adaptation due to spike history effects and slow, nonsystematic drifts in a neuron's activity.
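The core loop, choosing the stimulus that maximizes the mutual information between the upcoming response and the unknown parameters, can be illustrated with a deliberately tiny one-parameter Bernoulli GLM on a grid posterior. This toy is nothing like the low-rank, high-dimensional algorithm described in the abstract; all names and values are illustrative:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

def most_informative_stimulus(prior, thetas, candidates):
    """Choose x maximizing I(response; theta): entropy of the
    posterior-averaged response minus the average response entropy."""
    def info(x):
        p_spike = [sigmoid(th * x) for th in thetas]
        pbar = sum(w * p for w, p in zip(prior, p_spike))
        return entropy([pbar, 1 - pbar]) - sum(
            w * entropy([p, 1 - p]) for w, p in zip(prior, p_spike))
    return max(candidates, key=info)

def update_posterior(prior, thetas, x, spike):
    """Bayes step on the theta grid for one binary observation."""
    like = [sigmoid(th * x) if spike else 1.0 - sigmoid(th * x)
            for th in thetas]
    post = [w * l for w, l in zip(prior, like)]
    z = sum(post)
    return [p / z for p in post]

random.seed(1)
thetas = [i / 10.0 for i in range(-30, 31)]      # parameter grid
prior = [1.0 / len(thetas)] * len(thetas)
theta_true = 1.5                                 # simulated neuron
candidates = [i / 4.0 for i in range(-8, 9)]     # allowed stimuli
for _ in range(200):
    x = most_informative_stimulus(prior, thetas, candidates)
    spike = random.random() < sigmoid(theta_true * x)
    prior = update_posterior(prior, thetas, x, spike)
estimate = max(zip(prior, thetas))[1]            # MAP estimate of theta
```

The paper's contribution is making the analogous computation tractable when theta has hundreds of dimensions, via log concavity, Gaussian posterior approximations, and low-rank updates; the grid search above is only for intuition.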
Affiliation(s)
- Jeremy Lewi
- Bioengineering Graduate Program, Wallace H. Coulter Department of Biomedical Engineering, Laboratory for Neuroengineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
8
Krumbholz K, Magezi DA, Moore RC, Patterson RD. Binaural sluggishness precludes temporal pitch processing based on envelope cues in conditions of binaural unmasking. J Acoust Soc Am 2009; 125:1067-1074. [PMID: 19206881] [DOI: 10.1121/1.3056557]
Abstract
Binaural sluggishness refers to the binaural system's inability to follow fast changes in the interaural configuration of the incoming sound stream. Several studies have measured binaural sluggishness by measuring signal detection in conditions of binaural unmasking when the interaural configuration of the masker is changed over time. However, it has been shown that, in conditions of binaural unmasking, binaural sluggishness also affects the perception of temporal changes in the properties of the signal (i.e., its frequency or level) and not just in the interaural configuration of the masker. By measuring the temporal modulation transfer function for sinusoidally modulated noise presented in conditions of binaural unmasking, the first experiment of the current study showed that, due to binaural sluggishness, the internal representation of binaurally unmasked sounds conveys little or no information about envelope fluctuations with rates within the pitch range (i.e., above 30 Hz). The second experiment measured the masked detection threshold for musical interval recognition in binaurally unmasked harmonic tones and showed that, in conditions of binaural unmasking, pitch wanes when the harmonics become unresolved by the cochlear filters. These results suggest that binaural sluggishness precludes temporal pitch processing based on envelope cues in binaurally unmasked sounds.
Affiliation(s)
- Katrin Krumbholz
- MRC Institute of Hearing Research, University Park, Nottingham, United Kingdom.
9
Elhilali M, Shamma SA. A cocktail party with a cortical twist: how cortical mechanisms contribute to sound segregation. J Acoust Soc Am 2008; 124:3751-71. [PMID: 19206802] [PMCID: PMC2676630] [DOI: 10.1121/1.3001672]
Abstract
Sound systems and speech technologies can benefit greatly from a deeper understanding of how the auditory system, and particularly the auditory cortex, is able to parse complex acoustic scenes into meaningful auditory objects and streams under adverse conditions. In the current work, a biologically plausible model of this process is presented, where the role of cortical mechanisms in organizing complex auditory scenes is explored. The model consists of two stages: (i) a feature analysis stage that maps the acoustic input into a multidimensional cortical representation and (ii) an integrative stage that recursively builds up expectations of how streams evolve over time and reconciles its predictions with the incoming sensory input by sorting it into different clusters. This approach yields a robust computational scheme for speaker separation under conditions of speech or music interference. The model can also emulate the archetypal streaming percepts of tonal stimuli that have long been tested in human subjects. The implications of this model are discussed with respect to the physiological correlates of streaming in the cortex as well as the role of attention and other top-down influences in guiding sound organization.
Affiliation(s)
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Barton, Baltimore, Maryland 21218, USA.
10
Kumar S, Forster HM, Bailey P, Griffiths TD. Mapping unpleasantness of sounds to their auditory representation. J Acoust Soc Am 2008; 124:3810-3817. [PMID: 19206807] [DOI: 10.1121/1.3006380]
Abstract
Certain sounds, for example, the squeal of chalk on a blackboard, are typically perceived as highly unpleasant. This study addressed the question of what aspects of the auditory representation of such sounds are associated with judgments of unpleasantness. Participants rated the perceived unpleasantness of a large number of sounds that included "griding" and other unpleasant sounds. A multivariate partial least-squares (PLS) model was then built to relate the ratings of unpleasantness with an auditory representation derived from a model of processing in the auditory pathway. The "existence region" of unpleasantness in the auditory space of frequency-temporal modulation was determined after the PLS model had been validated by predicting the unpleasantness of novel sounds from the auditory representation. It was observed that the existence region corresponded to spectral frequencies between 2500 and 5500 Hz, and temporal modulations in the range 1-16 Hz.
Affiliation(s)
- Sukhbinder Kumar
- Auditory Group, Newcastle University Medical School, Framlington Place, Newcastle upon Tyne, United Kingdom
11
Magezi DA, Krumbholz K. Can the binaural system extract fine-structure interaural time differences from noncorresponding frequency channels? J Acoust Soc Am 2008; 124:3095-3107. [PMID: 19045795] [DOI: 10.1121/1.2980522]
Abstract
Due to the phase differences in the basilar membrane response between neighboring places along the cochlea, it is generally assumed that the processing of interaural time differences (ITDs) in the temporal fine structure relies on comparisons between corresponding frequency channels from the two ears. This study aimed to test whether the auditory system is capable of extracting fine-structure ITDs from noncorresponding channels. To this end, the ITD discrimination threshold was measured for a 500 Hz pure tone partially masked by a lowpass masker in one ear and a highpass masker in the other. The maskers were intended to obscure the apical or basal part of the tone's excitation pattern, respectively, and thus force the listener to extract ITDs from disparate channels. While the results did not allow any definite conclusions as to whether or not ITD processing in these conditions was based on cross-channel comparisons, some aspects of the data suggest that it was. Modeling simulations showed that any cross-channel comparisons would have to be limited to a fairly narrow frequency range of little more than one auditory-filter bandwidth. However, the between-channel phase differences within even such a narrow range would be sufficient to explain ITD sensitivity in neurophysiological data.
Affiliation(s)
- David A Magezi
- MRC Institute of Hearing Research, University Park, Nottingham NG7 2RD, United Kingdom and School of Psychology, University of Nottingham, Nottingham NG7 2RD, United Kingdom
12
Jepsen ML, Ewert SD, Dau T. A computational model of human auditory signal processing and perception. J Acoust Soc Am 2008; 124:422-438. [PMID: 18646987] [DOI: 10.1121/1.2924135]
Abstract
A model of computational auditory signal-processing and perception that accounts for various aspects of simultaneous and nonsimultaneous masking in human listeners is presented. The model is based on the modulation filterbank model described by Dau et al. [J. Acoust. Soc. Am. 102, 2892 (1997)] but includes major changes at the peripheral and more central stages of processing. The model contains outer- and middle-ear transformations, a nonlinear basilar-membrane processing stage, a hair-cell transduction stage, a squaring expansion, an adaptation stage, a 150-Hz lowpass modulation filter, a bandpass modulation filterbank, a constant-variance internal noise, and an optimal detector stage. The model was evaluated in experimental conditions that reflect, to a different degree, effects of compression as well as spectral and temporal resolution in auditory processing. The experiments include intensity discrimination with pure tones and broadband noise, tone-in-noise detection, spectral masking with narrow-band signals and maskers, forward masking with tone signals and tone or noise maskers, and amplitude-modulation detection with narrow- and wideband noise carriers. The model can account for most of the key properties of the data and is more powerful than the original model. The model might be useful as a front end in technical applications.
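Two of the listed stages, hair-cell transduction and the modulation lowpass filter, can be sketched as a minimal two-stage cascade. This uses half-wave rectification and a first-order IIR filter as stand-ins; the published model's stages are considerably more elaborate, and all names here are illustrative:

```python
import math

def halfwave(x):
    """Hair-cell transduction approximated by half-wave rectification."""
    return [max(v, 0.0) for v in x]

def lowpass(x, fs, fc):
    """First-order IIR lowpass, standing in for the 150-Hz modulation
    lowpass stage that removes fast fine-structure fluctuations."""
    a = math.exp(-2 * math.pi * fc / fs)
    y, state = [], 0.0
    for v in x:
        state = (1 - a) * v + a * state
        y.append(state)
    return y

# Cascade applied to a 1 kHz tone at 16 kHz sampling:
fs = 16000.0
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(1600)]
env = lowpass(halfwave(tone), fs, fc=150.0)
# After the filter settles, the output hovers near the mean of a
# half-wave rectified unit sine (about 1/pi), with small residual ripple.
tail = env[800:]
```

Chaining simple stages like this is the structural idea; the model's power comes from the specific nonlinear basilar-membrane stage, adaptation loops, and modulation filterbank that follow.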
Affiliation(s)
- Morten L Jepsen
- Centre for Applied Hearing Research, Acoustic Technology, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
13
Carlyon RP, Mahendran S, Deeks JM, Long CJ, Axon P, Baguley D, Bleeck S, Winter IM. Behavioral and physiological correlates of temporal pitch perception in electric and acoustic hearing. J Acoust Soc Am 2008; 123:973-985. [PMID: 18247900] [PMCID: PMC2279014] [DOI: 10.1121/1.2821986]
Abstract
In the "4-6" condition of experiment 1, normal-hearing (NH) listeners compared the pitch of a bandpass-filtered pulse train, whose inter-pulse intervals (IPIs) alternated between 4 and 6 ms, to that of isochronous pulse trains. Consistent with previous results obtained at a lower signal level, the pitch of the 4-6 stimulus corresponded to that of an isochronous pulse train having a period of 5.7 ms, longer than the mean IPI of 5 ms. In other conditions the IPI alternated between 3.5-5.5 and 4.5-6.5 ms. Experiment 2 was similar but presented electric pulse trains to one channel of a cochlear implant. In both cases, as overall IPI increased, the pitch of the alternating-interval stimulus approached that of an isochronous train having a period equal to the mean IPI. Experiment 3 measured compound action potentials (CAPs) to alternating-interval stimuli in guinea pigs and in NH listeners. The CAPs to pulses occurring after 4-ms intervals were smaller than responses to pulses occurring after 6-ms intervals, resulting in a modulated pattern that was independent of overall level. The results are compared to the predictions of a simple model incorporating auditory-nerve (AN) refractoriness, and where pitch is estimated from first-order intervals in the AN response.
Affiliation(s)
- Robert P Carlyon
- MRC Cognition & Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom.
14
Mesgarani N, Slaney M, Shamma S. Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations. IEEE Trans Audio Speech Lang Process 2006. [DOI: 10.1109/tsa.2005.858055]
15
Chi T, Ru P, Shamma SA. Multiresolution spectrotemporal analysis of complex sounds. J Acoust Soc Am 2005; 118:887-906. [PMID: 16158645] [DOI: 10.1121/1.1945807]
Abstract
A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound. Simplified, more specifically tailored versions of this model have already been validated by successful application in the assessment of speech intelligibility [Elhilali et al., Speech Commun. 41(2-3), 331-348 (2003); Chi et al., J. Acoust. Soc. Am. 106, 2719-2732 (1999)] and in explaining the perception of monaural phase sensitivity [R. Carlyon and S. Shamma, J. Acoust. Soc. Am. 114, 333-348 (2003)]. Here we provide a more complete mathematical formulation of the model, illustrating how complex signals are transformed through various stages of the model, and relating it to comparable existing models of auditory processing. Furthermore, we outline several reconstruction algorithms to resynthesize the sound from the model output so as to evaluate the fidelity of the representation and contribution of different features and cues to the sound percept.
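The representation described rests on filters tuned jointly to spectral modulation (cyc/oct) and temporal modulation (Hz). A single such filter's magnitude response to a spectrogram can be sketched as a complex ripple correlation; this is an idealized toy with illustrative names, not the model's actual filterbank:

```python
import math

def ripple_response(spectrogram, scale, rate, dt, dx):
    """Magnitude of the inner product between a spectrogram
    (time x frequency-in-octaves grid) and one spectrotemporal
    modulation pattern: `scale` in cyc/oct, `rate` in Hz; the sign
    of `rate` selects the sweep direction."""
    nt, nf = len(spectrogram), len(spectrogram[0])
    re = im = 0.0
    for t in range(nt):
        for f in range(nf):
            phase = 2 * math.pi * (rate * t * dt + scale * f * dx)
            re += spectrogram[t][f] * math.cos(phase)
            im += spectrogram[t][f] * math.sin(phase)
    return math.hypot(re, im)

# A moving ripple (0.5 cyc/oct, 4 Hz) excites the matched filter far
# more than the filter tuned to the opposite sweep direction:
dt, dx = 0.01, 0.1                    # 10 ms time bins, 0.1-oct bins
ripple = [[math.cos(2 * math.pi * (4 * t * dt + 0.5 * f * dx))
           for f in range(50)] for t in range(100)]
matched = ripple_response(ripple, 0.5, 4, dt, dx)
opposite = ripple_response(ripple, 0.5, -4, dt, dx)
```

A bank of such filters over many (scale, rate) pairs, with both sweep directions, is the multiresolution analysis the abstract describes; the reconstruction algorithms then invert that bank.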
Affiliation(s)
- Taishih Chi
- Center for Auditory and Acoustics Research, Institute for Systems Research Electrical and Computer Engineering Department, University of Maryland, College Park, Maryland 20742, USA
16
Bernstein JGW, Oxenham AJ. An autocorrelation model with place dependence to account for the effect of harmonic number on fundamental frequency discrimination. J Acoust Soc Am 2005; 117:3816-31. [PMID: 16018484] [PMCID: PMC1451417] [DOI: 10.1121/1.1904268]
Abstract
Fundamental frequency (f0) difference limens (DLs) were measured as a function of f0 for sine- and random-phase harmonic complexes, bandpass filtered with 3-dB cutoff frequencies of 2.5 and 3.5 kHz (low region) or 5 and 7 kHz (high region), and presented at an average 15 dB sensation level (approximately 48 dB SPL) per component in a wideband background noise. Fundamental frequencies ranged from 50 to 300 Hz and 100 to 600 Hz in the low and high spectral regions, respectively. In each spectral region, f0 DLs improved dramatically with increasing f0 as approximately the tenth harmonic appeared in the passband. Generally, f0 DLs for complexes with similar harmonic numbers were similar in the two spectral regions. The dependence of f0 discrimination on harmonic number presents a significant challenge to autocorrelation (AC) models of pitch, in which predictions generally depend more on spectral region than harmonic number. A modification involving a "lag window" is proposed and tested, restricting the AC representation to a limited range of lags relative to each channel's characteristic frequency. This modified unitary pitch model was able to account for the dependence of f0 DLs on harmonic number, although this correct behavior was not based on peripheral harmonic resolvability.
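The proposed "lag window", restricting autocorrelation lags to a limited range relative to each channel's characteristic frequency, can be sketched as follows. The window width `max_periods` and all other names are assumptions for illustration; the paper fits its own window:

```python
import math

def lag_windowed_ac(channel, cf, fs, max_periods=10):
    """Autocorrelation of one filterbank channel, with lags restricted
    to at most `max_periods` periods of the channel's characteristic
    frequency `cf` (the 'lag window' modification)."""
    max_lag = min(int(max_periods * fs / cf), len(channel) - 1)
    return [sum(channel[i] * channel[i + lag]
                for i in range(len(channel) - lag))
            for lag in range(max_lag + 1)]

fs = 20000.0
# A 100 Hz envelope (10 ms period, i.e. 200 samples at 20 kHz):
env = [1 + math.cos(2 * math.pi * 100 * t / fs) for t in range(2000)]

# Low-CF channel (low harmonic numbers): the 10 ms f0 lag lies inside
# the window, so the f0 periodicity is represented.
ac_low = lag_windowed_ac(env, cf=500, fs=fs)
# High-CF channel (high harmonic numbers): the window closes before
# 10 ms, so the f0 lag is excluded, mirroring the worse f0 DLs.
ac_high = lag_windowed_ac(env, cf=4000, fs=fs)
print(len(ac_low) > 200, len(ac_high) > 200)  # → True False
```

Tying the usable lag range to characteristic frequency is what lets the model depend on harmonic number rather than on spectral region alone.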
Affiliation(s)
- Joshua G W Bernstein
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.