1
Deloche F, Parida S, Sivaprakasam A, Heinz MG. Estimation of Cochlear Frequency Selectivity Using a Convolution Model of Forward-Masked Compound Action Potentials. J Assoc Res Otolaryngol 2024; 25:35-51. [PMID: 38278969 PMCID: PMC10907335 DOI: 10.1007/s10162-023-00922-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 12/09/2023] [Indexed: 01/28/2024]
Abstract
PURPOSE Frequency selectivity is a fundamental property of the peripheral auditory system; however, the invasiveness of auditory nerve (AN) experiments limits its study in the human ear. Compound action potentials (CAPs) associated with forward masking have been suggested as an alternative to assess cochlear frequency selectivity. Previous methods relied on an empirical comparison of AN and CAP tuning curves in animal models, arguably not taking full advantage of the information contained in forward-masked CAP waveforms. METHODS To improve the estimation of cochlear frequency selectivity based on the CAP, we introduce a convolution model to fit forward-masked CAP waveforms. The model generates masking patterns that, when convolved with a unitary response, can predict the masking of the CAP waveform induced by Gaussian noise maskers. Model parameters, including those characterizing frequency selectivity, are fine-tuned by minimizing waveform prediction errors across numerous masking conditions, yielding robust estimates. RESULTS The method was applied to click-evoked CAPs at the round window of anesthetized chinchillas using notched-noise maskers with various notch widths and attenuations. The estimated quality factor Q10 as a function of center frequency is shown to closely match the average quality factor obtained from AN fiber tuning curves, without the need for an empirical correction factor. CONCLUSION This study establishes a moderately invasive method for estimating cochlear frequency selectivity with potential applicability to other animal species or humans. Beyond the estimation of frequency selectivity, the proposed model proved to be remarkably accurate in fitting forward-masked CAP responses and could be extended to study more complex aspects of cochlear signal processing (e.g., compressive nonlinearities).
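Below is a minimal sketch of the convolution idea summarized above. It is not the authors' model. It assumes each latency bin contributes a copy of one unitary response, so the CAP is the excitation pattern over latency convolved with that response. A forward masker then removes a fraction of the excitation in the bins it affects. The unitary-response shape, bin counts, masking values, and function names are all hypothetical.

```python
import math

def convolve(x, h):
    """Full discrete convolution of two sequences (pure Python)."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def masked_cap(excitation, masking, unitary):
    """Hypothetical masked CAP: the excitation in each latency bin is scaled by
    (1 - masking) and the result is convolved with one unitary response."""
    remaining = [e * (1.0 - m) for e, m in zip(excitation, masking)]
    return convolve(remaining, unitary)

# Toy unitary response: a damped sinusoid over 12 samples (assumed shape).
unitary = [math.exp(-k / 3.0) * math.sin(2 * math.pi * k / 6.0) for k in range(12)]
excitation = [1.0] * 20                      # flat excitation over 20 latency bins
masking = [0.0] * 8 + [0.8] * 4 + [0.0] * 8  # masker removes 80% in bins 8-11

unmasked = masked_cap(excitation, [0.0] * 20, unitary)
masked = masked_cap(excitation, masking, unitary)
# Because this toy model is linear, the CAP reduction equals the masked portion
# of the excitation convolved with the same unitary response.
reduction = [u - m for u, m in zip(unmasked, masked)]
```

In this toy model, each masking condition yields a waveform difference that directly reflects the masked part of the excitation. That is why differences across many masking conditions can constrain the masking pattern.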
Affiliation(s)
- François Deloche
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, 47907, IN, USA.
- Satyabrata Parida
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, 47907, IN, USA
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, 47907, IN, USA
- Andrew Sivaprakasam
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, 47907, IN, USA
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, 715 Clinic Drive, West Lafayette, 47907, IN, USA
- Weldon School of Biomedical Engineering, Purdue University, 206 S. Martin Jischke Drive, West Lafayette, 47907, IN, USA
2
Irino T, Yokota K, Patterson RD. Improving Auditory Filter Estimation by Incorporating Absolute Threshold and a Level-dependent Internal Noise. Trends Hear 2023; 27:23312165231209750. [PMID: 37905400 PMCID: PMC10619342 DOI: 10.1177/23312165231209750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Accepted: 10/07/2023] [Indexed: 11/02/2023]
Abstract
Auditory filter (AF) shape has traditionally been estimated with a combination of a notched-noise (NN) masking experiment and a power spectrum model (PSM) of masking. However, there are several challenges that remain in both the simultaneous and forward masking paradigms. We hypothesized that AF shape estimation would be improved if absolute threshold (AT) and a level-dependent internal noise were explicitly represented in the PSM. To document the interaction between NN threshold and AT in normal hearing (NH) listeners, a large set of NN thresholds was measured at four center frequencies (500, 1000, 2000, and 4000 Hz) with an emphasis on low-level maskers. The proposed PSM, consisting of the compressive gammachirp (cGC) filter and three nonfilter parameters, allowed AF estimation over a wide range of frequencies and levels with fewer coefficients and less error than previous models. The results also provided new insights into the nonfilter parameters. The detector signal-to-noise ratio (K) was found to be constant across signal frequencies, suggesting that no frequency dependence hypothesis is required in the postfiltering process. The ANSI standard "Hearing Level-0dB" function, i.e., AT of NH listeners, could be applied to the frequency distribution of the noise floor for the best AF estimation. The introduction of a level-dependent internal noise could mitigate the nonlinear effects that occur in the simultaneous NN masking paradigm. The new PSM improves the applicability of the model, particularly when the sound pressure level of the NN threshold is close to AT.
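The power spectrum model referred to above predicts threshold as K times the masker power passed by the auditory filter. The code below is only a sketch. It substitutes the classic symmetric roex(p) filter for the compressive gammachirp used by Irino et al. It also adds a constant internal-noise power as a crude, hypothetical stand-in for the level-dependent noise floor. All parameter values are assumptions.

```python
import math

def roex_notched_noise_power(p, delta, n0, fc):
    """Noise power passed by a symmetric roex(p) filter W(g) = (1 + p*g) exp(-p*g),
    g = |f - fc| / fc, for a notched noise of spectrum level n0 (linear power
    per Hz) whose notch edges sit at normalized deviation delta on both sides.
    Each noise band is assumed wide enough to integrate g from delta to infinity."""
    one_side = (2.0 + p * delta) * math.exp(-p * delta) / p  # integral of W(g) dg
    return 2.0 * n0 * fc * one_side

def predicted_threshold_db(p, delta, n0, fc, k=1.0, internal_noise=0.0):
    """Power spectrum model: signal power at threshold = K times the power at the
    filter output, here masker power plus a hypothetical internal-noise power."""
    return 10.0 * math.log10(k * (roex_notched_noise_power(p, delta, n0, fc) + internal_noise))

fc, p = 1000.0, 30.0   # assumed centre frequency (Hz) and slope parameter
erb = 4.0 * fc / p     # ERB of the roex(p) filter, about 133 Hz here
thresholds = [predicted_threshold_db(p, d, n0=1e-3, fc=fc) for d in (0.0, 0.1, 0.2, 0.3)]
floored = [predicted_threshold_db(p, d, n0=1e-3, fc=fc, internal_noise=0.01)
           for d in (0.0, 0.1, 0.2, 0.3)]
```

Without the internal-noise term, predicted thresholds keep falling as the notch widens. With the term, they level off once the filtered masker power drops below the floor. That is the regime, thresholds near absolute threshold, that the abstract targets.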
Affiliation(s)
- Toshio Irino
- Faculty of Systems Engineering, Wakayama University, Japan
- Kenji Yokota
- Faculty of Systems Engineering, Wakayama University, Japan
- Roy D. Patterson
- Department of Physiology, Development and Neuroscience, University of Cambridge, UK
3
A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems. Appl Sci (Basel) 2022. [DOI: 10.3390/app12052398] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
Abstract
This paper presents a pathological voice identification system employing signal processing techniques through cochlear implant models. The fundamentals of the biological process for speech perception are investigated to develop this technique. Two cochlear implant models are considered in this work: one uses a conventional bank of bandpass filters, and the other one uses a bank of optimized gammatone filters. The critical center frequencies of those filters are selected to mimic the human cochlear vibration patterns caused by audio signals. The proposed system processes the speech samples and applies a CNN for final pathological voice identification. The results show that the two proposed models adopting bandpass and gammatone filterbanks can discriminate the pathological voices from healthy ones, resulting in F1 scores of 77.6% and 78.7%, respectively, with speech samples. The obtained results of this work are also compared with those of other related published works.
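The abstract does not specify the "optimized" gammatone design. For reference, here is a sketch of a conventional fourth-order gammatone filterbank. Its centre frequencies are spaced uniformly on the Glasberg and Moore (1990) ERB-number scale. The frequency range, channel count, sampling rate, and duration are assumptions, and the impulse responses are left unnormalised.

```python
import math

def erb_hz(f):
    """Glasberg and Moore (1990) equivalent rectangular bandwidth in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def erb_number(f):
    """ERB-number (Cam) scale value for frequency f in Hz."""
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

def inverse_erb_number(e):
    """Frequency in Hz corresponding to ERB-number e."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_spaced_centres(f_lo, f_hi, n):
    """n centre frequencies equally spaced on the ERB-number scale."""
    e_lo, e_hi = erb_number(f_lo), erb_number(f_hi)
    step = (e_hi - e_lo) / (n - 1)
    return [inverse_erb_number(e_lo + i * step) for i in range(n)]

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Unnormalised gammatone impulse response
    t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = 1.019 * ERB(fc)."""
    b = 1.019 * erb_hz(fc)
    n_samples = int(round(duration * fs))
    return [
        (k / fs) ** (order - 1)
        * math.exp(-2 * math.pi * b * k / fs)
        * math.cos(2 * math.pi * fc * k / fs)
        for k in range(n_samples)
    ]

centres = erb_spaced_centres(100.0, 8000.0, 16)  # assumed 16 channels, 100 Hz to 8 kHz
ir = gammatone_ir(centres[5], fs=16000)          # one channel's impulse response
```

In practice, each channel's impulse response would be convolved with the speech signal. The resulting band outputs would then feed the later processing stages, such as the CNN described in the abstract.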
4
Vinay, Sandhya, Moore BCJ. Effect of age, test frequency and level on thresholds for the TEN(HL) test for people with normal hearing. Int J Audiol 2020; 59:915-920. [DOI: 10.1080/14992027.2020.1783584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Vinay
- Department of Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Sandhya
- Department of Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Brian C. J. Moore
- Department of Experimental Psychology, University of Cambridge, Cambridge, UK
5
Auditory motion perception emerges from successive sound localizations integrated over time. Sci Rep 2019; 9:16437. [PMID: 31712688 PMCID: PMC6848124 DOI: 10.1038/s41598-019-52742-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/21/2019] [Accepted: 10/11/2019] [Indexed: 11/18/2022]
Abstract
Humans rely on auditory information to estimate the path of moving sound sources. But unlike in vision, the existence of motion-sensitive mechanisms in audition is still open to debate. Psychophysical studies indicate that auditory motion perception emerges from successive localization, but existing models fail to predict experimental results. However, these models do not account for any temporal integration. We propose a new model tracking motion using successive localization snapshots but integrated over time. This model is derived from psychophysical experiments on the upper limit for circular auditory motion perception (UL), defined as the speed above which humans no longer identify the direction of sounds spinning around them. Our model predicts ULs measured with different stimuli using solely static localization cues. The temporal integration blurs these localization cues rendering them unreliable at high speeds, which results in the UL. Our findings indicate that auditory motion perception does not require motion-sensitive mechanisms.
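Here is a toy illustration of why temporal integration can blur localization cues. It is not the authors' model. It averages the unit direction vectors of successive azimuth snapshots over an integration window. For a source rotating at angular speed ω over a window T, the mean vector length approaches |sin(x)/x| with x = ωT/2. The integrated cue therefore weakens as speed increases. The 200 ms window is hypothetical.

```python
import math

def integrated_cue(speed_deg_s, window_s, n_snapshots=200):
    """Length of the mean unit vector of n_snapshots azimuth samples taken
    evenly over one integration window, for a source rotating at a constant
    speed: 1 for a static source, near 0 if it sweeps a full circle."""
    omega = math.radians(speed_deg_s)
    xs = ys = 0.0
    for i in range(n_snapshots):
        t = window_s * i / n_snapshots
        xs += math.cos(omega * t)
        ys += math.sin(omega * t)
    return math.hypot(xs, ys) / n_snapshots

def analytic_cue(speed_deg_s, window_s):
    """Continuous-time limit |sin(x) / x| with x = omega * T / 2."""
    x = math.radians(speed_deg_s) * window_s / 2.0
    return 1.0 if x == 0 else abs(math.sin(x) / x)

window = 0.2  # hypothetical 200 ms integration window
cues = {speed: analytic_cue(speed, window) for speed in (90, 360, 900, 1800)}
```

With this window, the integrated cue falls from close to 1 at 90°/s to essentially 0 at 1800°/s, where the source completes a full circle within one window.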
6
Xu Y, Thakur CS, Singh RK, Hamilton TJ, Wang RM, van Schaik A. A FPGA Implementation of the CAR-FAC Cochlear Model. Front Neurosci 2018; 12:198. [PMID: 29692700 PMCID: PMC5902704 DOI: 10.3389/fnins.2018.00198] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 10/31/2017] [Accepted: 03/12/2018] [Indexed: 11/19/2022]
Abstract
This paper presents a digital implementation of the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model. The CAR part simulates the basilar membrane's (BM) response to sound. The FAC part models the outer hair cell (OHC), the inner hair cell (IHC), and the medial olivocochlear efferent system functions. The FAC feeds back to the CAR by moving the poles and zeros of the CAR resonators automatically. We have implemented a 70-section, 44.1 kHz sampling rate CAR-FAC system on an Altera Cyclone V Field Programmable Gate Array (FPGA) with 18% ALM utilization by using time-multiplexing and pipeline parallelizing techniques and present measurement results here. The fully digital reconfigurable CAR-FAC system is stable, scalable, easy to use, and provides an excellent input stage to more complex machine hearing tasks such as sound localization, sound segregation, speech recognition, and so on.
Affiliation(s)
- André van Schaik
- MARCS Institute, Western Sydney University, Sydney, NSW, Australia
7
Kates JM, Prabhu S. The dynamic gammawarp auditory filterbank. J Acoust Soc Am 2018; 143:1603. [PMID: 29604718 DOI: 10.1121/1.5027827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Auditory filterbanks are an integral part of many metrics designed to predict speech intelligibility and speech quality. Considerations in these applications include accurate reproduction of auditory filter shapes, the ability to reproduce the impact of hearing loss as well as normal hearing, and computational efficiency. This paper presents an alternative method for implementing a dynamic compressive gammachirp (dcGC) auditory filterbank [Irino and Patterson (2006). IEEE Trans. Audio Speech Lang. Proc. 14, 2222-2232]. Instead of using a cascade of second-order sections, this approach uses digital frequency warping to give the gammawarp filterbank. The set of warped finite impulse response filter coefficients is constrained to be symmetrical, which results in the same phase response for all filters in the filterbank. The identical phase responses allow the dynamic variation in the gammachirp filter magnitude response to be realized as a sum, using time-varying weights, of three filters that provide the responses for high-, mid-, and low-intensity input signals, respectively. The gammawarp filterbank offers a substantial improvement in execution speed compared to previous dcGC implementations; for a dcGC filterbank, the gammawarp implementation is 24 to 38 times faster than the dcGC Matlab code of Irino.
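Digital frequency warping is conventionally implemented by replacing each unit delay of an FIR filter with a first-order allpass section. The sketch below computes that allpass response and the resulting frequency mapping. It is standard background, not Kates and Prabhu's code. The warping parameter a = 0.5 is hypothetical.

```python
import cmath
import math

def allpass_response(omega, a):
    """Frequency response at omega (rad/sample) of the first-order allpass
    A(z) = (z^-1 - a) / (1 - a z^-1), with |a| < 1."""
    z_inv = cmath.exp(-1j * omega)
    return (z_inv - a) / (1 - a * z_inv)

def warped_frequency(omega, a):
    """Frequency to which omega is mapped when every unit delay of an FIR
    filter is replaced by A(z); the allpass phase equals minus this value."""
    return omega + 2.0 * math.atan2(a * math.sin(omega), 1.0 - a * math.cos(omega))

a = 0.5  # hypothetical warping parameter; a > 0 stretches low frequencies
grid = [0.1, 0.5, 1.0, 2.0, 3.0]
mapping = [(w, warped_frequency(w, a)) for w in grid]
```

A positive a maps low frequencies onto a wider span of the warped axis. As a result, a filter designed with uniform resolution in the warped domain has finer resolution at low frequencies, roughly as auditory filters do.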
Affiliation(s)
- James M Kates
- Department of Speech Language and Hearing Sciences, University of Colorado, Boulder, Colorado 80309, USA
- Shashidhar Prabhu
- Department of Electrical Computer and Energy Engineering, University of Colorado, Boulder, Colorado 80309, USA
8
English phonology and an acoustic language universal. Sci Rep 2017; 7:46049. [PMID: 28397801 PMCID: PMC5387398 DOI: 10.1038/srep46049] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/28/2016] [Accepted: 02/07/2017] [Indexed: 11/08/2022]
Abstract
Acoustic analyses of eight different languages/dialects had revealed a language universal: Three spectral factors consistently appeared in analyses of power fluctuations of spoken sentences divided by critical-band filters into narrow frequency bands. Examining linguistic implications of these factors seems important to understand how speech sounds carry linguistic information. Here we show the three general categories of the English phonemes, i.e., vowels, sonorant consonants, and obstruents, to be discriminable in the Cartesian space constructed by these factors: A factor related to frequency components above 3,300 Hz was associated only with obstruents (e.g., /k/ or /z/), and another factor related to frequency components around 1,100 Hz only with vowels (e.g., /a/ or /i/) and sonorant consonants (e.g., /w/, /r/, or /m/). The latter factor highly correlated with the hypothetical concept of sonority or aperture in phonology. These factors turned out to connect the linguistic and acoustic aspects of speech sounds systematically.
9
Hansen AS, Raen Ø, Moore BCJ. Reference thresholds for the TEN(HL) test for people with normal hearing. Int J Audiol 2017; 56:672-676. [PMID: 28394651 DOI: 10.1080/14992027.2017.1307531] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/19/2022]
Abstract
OBJECTIVE To estimate normative values and repeatability of thresholds for the TEN(HL) test for diagnosing dead regions in the cochlea, as a function of signal frequency, TEN(HL) level, age and gender. DESIGN The TEN(HL) test was administered twice for each ear of each participant using signal frequencies from 0.5 to 4 kHz and TEN(HL) levels of 30, 50 and 70 dB HL/ERBN. STUDY SAMPLE In all, 29 young participants and 8 older participants were tested. All had normal audiograms with no history of hearing problems. RESULTS There was good repeatability across sessions. There was no significant effect of ear, gender or age group. The average signal-to-TEN ratio (STR) at threshold was close to 0 dB. For low signal frequencies, the STR at threshold varied only slightly with TEN(HL) level, but for the signal frequencies of 3 and 4 kHz the STR at threshold increased to about +2.7 dB for the TEN(HL) level of 70 dB HL/ERBN. CONCLUSIONS For a high TEN(HL) level, the "normal" STR at threshold at 3 and 4 kHz is closer to +2 dB than to 0 dB. Further research is needed to assess whether the TEN(HL)-test criteria need to be modified when testing at high frequencies and high levels.
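The signal-to-TEN ratio (STR) at threshold is the masked threshold in dB HL minus the TEN(HL) level in dB HL/ERBN. The sketch below computes it and applies the commonly cited TEN(HL) dead-region criterion. Under that criterion, the masked threshold is at least 10 dB above both the TEN level and the absolute threshold. The criterion is stated here as an assumption, not taken from this abstract, and the example values are hypothetical.

```python
def signal_to_ten_ratio(masked_threshold_db_hl, ten_level_db_hl_per_erb):
    """STR at threshold in dB: masked threshold minus the TEN(HL) level per ERB_N."""
    return masked_threshold_db_hl - ten_level_db_hl_per_erb

def suggests_dead_region(masked_threshold, ten_level, absolute_threshold, margin_db=10.0):
    """Commonly cited TEN(HL) criterion (assumed here): the masked threshold is
    at least margin_db above both the TEN level and the absolute threshold."""
    return (masked_threshold - ten_level >= margin_db
            and masked_threshold - absolute_threshold >= margin_db)

# Hypothetical normal-hearing result at 4 kHz with a 70 dB HL/ERB_N TEN:
str_db = signal_to_ten_ratio(72.7, 70.0)
```

An STR near +2.7 dB, as reported for 3 and 4 kHz at the 70 dB HL/ERBN level, remains well below a 10 dB margin.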
Affiliation(s)
- Andreas S Hansen
- Audiology Program, Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Øyvind Raen
- Audiology Program, Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, Cambridge, UK
10
An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech. Sci Rep 2017; 7:42468. [PMID: 28198405 PMCID: PMC5309770 DOI: 10.1038/srep42468] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/24/2016] [Accepted: 01/11/2017] [Indexed: 11/08/2022]
Abstract
The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are yet unknown. We derived four common frequency bands—covering approximately 50–540, 540–1,700, 1,700–3,300, and above 3,300 Hz—from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all languages investigated—the low & mid-high factor related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz, the mid-low factor the range of 540–1,700 Hz, and the high factor the range above 3,300 Hz—in these different languages/dialects, suggesting a language universal.
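The code below sketches the first stage of such an analysis; it is not the authors' pipeline. It splits a signal into the four bands reported here and measures band power frame by frame. Correlations of these power fluctuations across bands are what a factor analysis would then operate on. The naive DFT in place of critical-band filters, the 16 kHz sampling rate, and the 10 ms frames are assumptions.

```python
import cmath
import math

# Band edges (Hz) from the abstract; the top band runs up to Nyquist here.
BANDS = [(50, 540), (540, 1700), (1700, 3300), (3300, None)]

def band_powers(frame, fs):
    """Power in each of the four bands for one frame, via a naive DFT."""
    n = len(frame)
    powers = [0.0] * len(BANDS)
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        for b, (lo, hi) in enumerate(BANDS):
            if freq >= lo and (hi is None or freq < hi):
                powers[b] += abs(coeff) ** 2
    return powers

def band_power_series(signal, fs, frame_len=160):
    """Per-frame band powers (160 samples = 10 ms at 16 kHz): the 'power
    fluctuations' whose cross-band correlations would be factor-analysed."""
    return [band_powers(signal[s:s + frame_len], fs)
            for s in range(0, len(signal) - frame_len + 1, frame_len)]
```

For example, a 1 kHz tone sampled at 16 kHz puts essentially all of its power in the second (540-1,700 Hz) band in every frame.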
11
Tabuchi H, Laback B, Necciari T, Majdak P. The role of compression in the simultaneous masker phase effect. J Acoust Soc Am 2016; 140:2680. [PMID: 27794305 PMCID: PMC5714264 DOI: 10.1121/1.4964328] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
Peripheral compression is believed to play a major role in the masker phase effect (MPE). While compression is almost instantaneous, activation of the efferent system reduces compression in a temporally evolving manner. To study the role of efferent-controlled compression in the MPE, in experiment 1, simultaneous masking of a 30-ms 4-kHz tone by 40-ms Schroeder-phase harmonic complexes was measured with on- and off-frequency precursors as a function of masker phase curvature for two masker levels (60 and 90 dB sound pressure level). The MPE was quantified by the threshold range [min/max difference (MMD)] across the phase curvatures. For the 60-dB condition, the presence of on-frequency precursor decreased the MMD from 10 to 5 dB. Experiment 2 studied the role of the precursor on the auditory filter's bandwidth. The on-frequency precursor was found to increase the bandwidth, an effect incorporated in the subsequent modeling. A model of the auditory periphery including cochlear filtering and basilar membrane compression generally underestimated the MMDs. A model based on two-step compression, including compression of inner hair cells, accounted for the MMDs across precursor and level conditions. Overall, the observed precursor effects and the model predictions suggest an important role of compression in the simultaneous MPE.
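A short sketch can illustrate the masker phase effect described above. It builds an equal-amplitude harmonic complex with Schroeder-type phases θn = Cπn(n+1)/N. This is one common parameterization, assumed here: n is the harmonic number, N the number of components, and C the phase curvature. The sketch also computes the min/max difference (MMD) used to quantify the effect. The threshold values are hypothetical.

```python
import math

def schroeder_complex(f0, harmonics, curvature, fs, duration):
    """Equal-amplitude harmonic complex with Schroeder-type phases
    theta_n = C * pi * n * (n + 1) / N (one common parameterization)."""
    harmonics = list(harmonics)
    n_comp = len(harmonics)
    n_samples = int(round(duration * fs))
    phases = [curvature * math.pi * n * (n + 1) / n_comp for n in harmonics]
    return [sum(math.cos(2 * math.pi * n * f0 * i / fs + ph) for n, ph in zip(harmonics, phases))
            for i in range(n_samples)]

def min_max_difference(thresholds_by_curvature):
    """Masker phase effect quantified as the threshold range across curvatures."""
    values = list(thresholds_by_curvature.values())
    return max(values) - min(values)

# Hypothetical thresholds (dB SPL) measured at three phase curvatures:
mmd = min_max_difference({-1.0: 62.0, 0.0: 55.0, 1.0: 65.0})
```

With C = 0 all components peak together, giving a highly peaked waveform. With C = ±1 the envelope is nearly flat. This difference in temporal envelope is what compression can act on.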
Affiliation(s)
- Hisaaki Tabuchi
- Austrian Academy of Sciences, Acoustics Research Institute, Wohllebengasse 12-14, 1040 Vienna, Austria
- Bernhard Laback
- Austrian Academy of Sciences, Acoustics Research Institute, Wohllebengasse 12-14, 1040 Vienna, Austria
- Thibaud Necciari
- Austrian Academy of Sciences, Acoustics Research Institute, Wohllebengasse 12-14, 1040 Vienna, Austria
- Piotr Majdak
- Austrian Academy of Sciences, Acoustics Research Institute, Wohllebengasse 12-14, 1040 Vienna, Austria
12
Saremi A, Beutelmann R, Dietz M, Ashida G, Kretzberg J, Verhulst S. A comparative study of seven human cochlear filter models. J Acoust Soc Am 2016; 140:1618. [PMID: 27914400 DOI: 10.1121/1.4960486] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 06/06/2023]
Abstract
Auditory models have been developed for decades to simulate characteristics of the human auditory system, but it is often unknown how well auditory models compare to each other or perform in tasks they were not primarily designed for. This study systematically analyzes predictions of seven publicly-available cochlear filter models in response to a fixed set of stimuli to assess their capabilities of reproducing key aspects of human cochlear mechanics. The following features were assessed at frequencies of 0.5, 1, 2, 4, and 8 kHz: cochlear excitation patterns, nonlinear response growth, frequency selectivity, group delays, signal-in-noise processing, and amplitude modulation representation. For each task, the simulations were compared to available physiological data recorded in guinea pigs and gerbils as well as to human psychoacoustics data. The presented results provide application-oriented users with comprehensive information on the advantages, limitations and computation costs of these seven mainstream cochlear filter models.
Affiliation(s)
- Amin Saremi
- Computational Neuroscience and Cluster of Excellence "Hearing4all," Department of Neuroscience, University of Oldenburg, Oldenburg, Germany
- Rainer Beutelmann
- Animal Physiology and Behavior and Cluster of Excellence "Hearing4all," Department of Neuroscience, University of Oldenburg, Oldenburg, Germany
- Mathias Dietz
- Medizinische Physik and Cluster of Excellence "Hearing4all," Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Go Ashida
- Computational Neuroscience and Cluster of Excellence "Hearing4all," Department of Neuroscience, University of Oldenburg, Oldenburg, Germany
- Jutta Kretzberg
- Computational Neuroscience and Cluster of Excellence "Hearing4all," Department of Neuroscience, University of Oldenburg, Oldenburg, Germany
- Sarah Verhulst
- Medizinische Physik and Cluster of Excellence "Hearing4all," Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
13
Bidelman GM, Jennings SG, Strickland EA. PsyAcoustX: A flexible MATLAB(®) package for psychoacoustics research. Front Psychol 2015; 6:1498. [PMID: 26528199 PMCID: PMC4601020 DOI: 10.3389/fpsyg.2015.01498] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 07/28/2015] [Accepted: 09/17/2015] [Indexed: 11/21/2022]
Abstract
The demands of modern psychophysical studies require precise stimulus delivery and flexible platforms for experimental control. Here, we describe PsyAcoustX, a new, freely available suite of software tools written in the MATLAB(®) environment to conduct psychoacoustics research on a standard PC. PsyAcoustX provides a flexible platform to generate and present auditory stimuli in real time and record users' behavioral responses. Data are automatically logged by stimulus condition and aggregated in an exported spreadsheet for offline analysis. Detection thresholds can be measured adaptively under basic and complex auditory masking tasks and other paradigms (e.g., amplitude modulation detection) within minutes. The flexibility of the module offers experimenters access to nearly every conceivable combination of stimulus parameters (e.g., probe-masker relations). Example behavioral applications are highlighted including the measurement of audiometric thresholds, basic simultaneous and non-simultaneous (i.e., forward and backward) masking paradigms, gap detection, and amplitude modulation detection. Examples of these measurements are provided including the psychoacoustic phenomena of temporal overshoot, psychophysical tuning curves, and temporal modulation transfer functions. Importantly, the core design of PsyAcoustX is easily modifiable, allowing users the ability to adapt its basic structure and create additional modules for measuring discrimination/detection thresholds for other auditory attributes (e.g., pitch, intensity, etc.) or binaural paradigms.
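PsyAcoustX's own tracking rules are not described in this abstract. As a generic sketch, a two-down/one-up transformed staircase (Levitt, 1971) could be written as follows; it converges on the 70.7%-correct point. The step size, reversal counts, and simulated listener are all hypothetical.

```python
import math
import random

def two_down_one_up(respond, start_level, step_db, n_reversals=8, n_average=6, max_trials=500):
    """Transformed up-down staircase: the level falls by step_db after two
    consecutive correct responses and rises by step_db after each incorrect
    one, which tracks the 70.7%-correct point. Returns the mean level of the
    last n_average reversals."""
    level = start_level
    correct_run = 0
    direction = None
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_run += 1
            if correct_run < 2:
                continue  # need a second consecutive correct response
            correct_run = 0
            new_direction = "down"
        else:
            correct_run = 0
            new_direction = "up"
        if direction is not None and new_direction != direction:
            reversals.append(level)  # level at which the track changed direction
        direction = new_direction
        level += -step_db if new_direction == "down" else step_db
    if not reversals:
        raise ValueError("no reversals within max_trials")
    tail = reversals[-n_average:]
    return sum(tail) / len(tail)

def simulated_listener(threshold_db, slope=0.5, seed=1):
    """Hypothetical 2AFC listener with a logistic psychometric function and a
    50% guessing floor."""
    rng = random.Random(seed)

    def respond(level):
        p_correct = 0.5 + 0.5 / (1.0 + math.exp(-slope * (level - threshold_db)))
        return rng.random() < p_correct

    return respond

estimate = two_down_one_up(simulated_listener(50.0), start_level=80.0, step_db=2.0)
```

Passing the listener as a callable keeps the tracking rule separate from stimulus delivery. A real experiment would replace `simulated_listener` with code that presents a trial and records the response.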
Affiliation(s)
- Gavin M. Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA
- Skyler G. Jennings
- Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, UT, USA
- Elizabeth A. Strickland
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, USA
14
Pham CQ, Bremen P, Shen W, Yang SM, Middlebrooks JC, Zeng FG, Mc Laughlin M. Central Auditory Processing of Temporal and Spectral-Variance Cues in Cochlear Implant Listeners. PLoS One 2015; 10:e0132423. [PMID: 26176553 PMCID: PMC4503639 DOI: 10.1371/journal.pone.0132423] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/04/2014] [Accepted: 06/13/2015] [Indexed: 11/25/2022]
Abstract
Cochlear implant (CI) listeners have difficulty understanding speech in complex listening environments. This deficit is thought to be largely due to peripheral encoding problems arising from current spread, which results in wide peripheral filters. In normal hearing (NH) listeners, central processing contributes to segregation of speech from competing sounds. We tested the hypothesis that basic central processing abilities are retained in post-lingually deaf CI listeners, but processing is hampered by degraded input from the periphery. In eight CI listeners, we measured auditory nerve compound action potentials to characterize peripheral filters. Then, we measured psychophysical detection thresholds in the presence of multi-electrode maskers placed either inside (peripheral masking) or outside (central masking) the peripheral filter. This was intended to distinguish peripheral from central contributions to signal detection. Introduction of temporal asynchrony between the signal and masker improved signal detection in both peripheral and central masking conditions for all CI listeners. Randomly varying components of the masker created spectral-variance cues, which seemed to benefit only two out of eight CI listeners. Contrastingly, the spectral-variance cues improved signal detection in all five NH listeners who listened to our CI simulation. Together these results indicate that widened peripheral filters significantly hamper central processing of spectral-variance cues but not of temporal cues in post-lingually deaf CI listeners. As indicated by two CI listeners in our study, however, post-lingually deaf CI listeners may retain some central processing abilities similar to NH listeners.
Affiliation(s)
- Carol Q. Pham
- Center for Hearing Research, University of California Irvine, Irvine, California, United States of America
- Department of Anatomy and Neurobiology, University of California Irvine, Irvine, California, United States of America
- Peter Bremen
- Center for Hearing Research, University of California Irvine, Irvine, California, United States of America
- Department of Otolaryngology-Head and Neck Surgery, University of California Irvine, Irvine, California, United States of America
- Weidong Shen
- Institute of Otolaryngology, Chinese PLA General Hospital, Beijing, China
- Shi-Ming Yang
- Institute of Otolaryngology, Chinese PLA General Hospital, Beijing, China
- John C. Middlebrooks
- Center for Hearing Research, University of California Irvine, Irvine, California, United States of America
- Department of Otolaryngology-Head and Neck Surgery, University of California Irvine, Irvine, California, United States of America
- Department of Neurobiology and Behavior, University of California Irvine, Irvine, California, United States of America
- Department of Biomedical Engineering, University of California Irvine, Irvine, California, United States of America
- Department of Cognitive Sciences, University of California Irvine, Irvine, California, United States of America
- Fan-Gang Zeng
- Center for Hearing Research, University of California Irvine, Irvine, California, United States of America
- Department of Anatomy and Neurobiology, University of California Irvine, Irvine, California, United States of America
- Department of Otolaryngology-Head and Neck Surgery, University of California Irvine, Irvine, California, United States of America
- Department of Biomedical Engineering, University of California Irvine, Irvine, California, United States of America
- Department of Cognitive Sciences, University of California Irvine, Irvine, California, United States of America
- Myles Mc Laughlin
- Center for Hearing Research, University of California Irvine, Irvine, California, United States of America
- Department of Otolaryngology-Head and Neck Surgery, University of California Irvine, Irvine, California, United States of America
15
Marmel F, Plack CJ, Hopkins K. The role of excitation-pattern cues in the detection of frequency shifts in bandpass-filtered complex tones. J Acoust Soc Am 2015; 137:2687-97. [PMID: 25994700 PMCID: PMC5044982 DOI: 10.1121/1.4919315] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/04/2023]
Abstract
One task intended to measure sensitivity to temporal fine structure (TFS) involves the discrimination of a harmonic complex tone from a tone in which all harmonics are shifted upwards by the same amount in hertz. Both tones are passed through a fixed bandpass filter centered on the high harmonics to reduce the availability of excitation-pattern cues and a background noise is used to mask combination tones. The role of frequency selectivity in this "TFS1" task was investigated by varying level. Experiment 1 showed that listeners performed more poorly at a high level than at a low level. Experiment 2 included intermediate levels and showed that performance deteriorated for levels above about 57 dB sound pressure level. Experiment 3 estimated the magnitude of excitation-pattern cues from the variation in forward masking of a pure tone as a function of frequency shift in the complex tones. There was negligible variation, except for the lowest level used. The results indicate that the changes in excitation level at threshold for the TFS1 task would be too small to be usable. The results are consistent with the TFS1 task being performed using TFS cues, and with frequency selectivity having an indirect effect on performance via its influence on TFS cues.
Affiliation(s)
- Frederic Marmel
- School of Psychological Sciences, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- Christopher J. Plack
- School of Psychological Sciences, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
- Kathryn Hopkins
- School of Psychological Sciences, Manchester Academic Health Science Centre, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
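The TFS1 stimulus pair described in the Marmel et al. abstract can be made concrete with a short sketch. It assumes an illustrative F0 of 200 Hz, harmonics 9 to 15 and a shift of 50 Hz. The helpers `tfs1_components` and `synthesize` are hypothetical, and the paper's bandpass filter and background noise are omitted. Every harmonic n·F0 moves to n·F0 + ΔF, so the component spacing stays at F0 while the components stop being integer multiples of F0.

```python
import numpy as np

def tfs1_components(f0, shift_hz, n_lo, n_hi):
    """Component frequencies (Hz) of a harmonic tone and its shifted version.

    Hypothetical helper: harmonics n_lo..n_hi of f0, plus the same harmonics
    all shifted upward by the same amount in hertz.
    """
    n = np.arange(n_lo, n_hi + 1)
    harmonic = n * f0
    shifted = harmonic + shift_hz
    return harmonic, shifted

def synthesize(freqs, fs=48000, dur=0.4):
    """Sum of equal-amplitude, sine-phase components (no filtering, no noise)."""
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * np.outer(freqs, t)).sum(axis=0)

# Illustrative values only: F0 = 200 Hz, harmonics 9-15, shift = 50 Hz.
h, s = tfs1_components(200.0, 50.0, 9, 15)
print(np.diff(h), np.diff(s))   # component spacing is 200 Hz in both tones
print(s / 200.0)                # shifted components are not integer multiples of F0
```

Because the spacing is unchanged, the envelope repetition rate is the same for both tones. This is why the task is thought to rely on fine-structure, rather than envelope, differences.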
16
Flamme GA, Geda K, McGregor K, Wyllys K, Deiters KK, Murphy WJ, Stephenson MR. Stimulus and transducer effects on threshold. Int J Audiol 2015; 54 Suppl 1:S19-29. [PMID: 25549164 PMCID: PMC4559258 DOI: 10.3109/14992027.2014.979300] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
OBJECTIVE This study examined differences in thresholds obtained under Sennheiser HDA200 circumaural earphones using pure tone, equivalent rectangular noise bands, and 1/3 octave noise bands relative to thresholds obtained using Telephonics TDH-39P supra-aural earphones. DESIGN Thresholds were obtained via each transducer and stimulus condition six times within a 10-day period. STUDY SAMPLE Forty-nine adults were selected from a prior study to represent low, moderate, and high threshold reliability. RESULTS The results suggested that (1) only small adjustments were needed to reach equivalent TDH-39P thresholds, (2) pure-tone thresholds obtained with HDA200 circumaural earphones had reliability equal to or better than those obtained using TDH-39P earphones, (3) the reliability of noise-band thresholds improved with broader stimulus bandwidth and was either equal to or better than pure-tone thresholds, and (4) frequency-specificity declined with stimulus bandwidths greater than one equivalent rectangular band, which could complicate early detection of hearing changes that occur within a narrow frequency range. CONCLUSIONS These data suggest that circumaural earphones such as the HDA200 headphones provide better reliability for audiometric testing as compared to the TDH-39P earphones. These data support the use of noise bands, preferably ERB noises, as stimuli for audiometric monitoring.
Affiliation(s)
- Gregory A. Flamme
- Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, MI, USA
- Kyle Geda
- Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, MI, USA
- Kara McGregor
- Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, MI, USA
- Krista Wyllys
- Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, MI, USA
- Kristy K. Deiters
- Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, MI, USA
- William J. Murphy
- Division of Applied Research and Technology, National Institute for Occupational Safety and Health, Cincinnati, Ohio, USA
- Mark R. Stephenson
- Division of Applied Research and Technology, National Institute for Occupational Safety and Health, Cincinnati, Ohio, USA
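The Flamme et al. abstract compares ERB-wide and 1/3-octave noise bands. The sketch below makes that comparison in hertz. It assumes the Glasberg and Moore (1990) formula ERB_N = 24.7(4.37F/1000 + 1) Hz and an exact 1/3-octave band with edges at F·2^(±1/6). Both formulas are standard, but neither is stated in the abstract.

```python
import math

def erb_n(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the normal auditory filter
    centred at f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def third_octave_bw(f_hz):
    """Bandwidth (Hz) of an exact 1/3-octave band geometrically centred on f_hz."""
    return f_hz * (2 ** (1 / 6) - 2 ** (-1 / 6))

for f in (500, 1000, 2000, 4000, 8000):
    print(f"{f:5d} Hz  ERB_N = {erb_n(f):7.1f} Hz  1/3 oct = {third_octave_bw(f):7.1f} Hz")
```

With these formulas, an ERB-wide band is narrower than a 1/3-octave band above roughly 200 Hz (about 133 Hz versus 232 Hz at 1 kHz). This is in line with the abstract's observation that frequency specificity declines once the stimulus is wider than one ERB.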
17
Zouhir Y, Ouni K. A bio-inspired feature extraction for robust speech recognition. SpringerPlus 2014; 3:651. [PMID: 25485194 PMCID: PMC4230714 DOI: 10.1186/2193-1801-3-651] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/29/2014] [Accepted: 10/24/2014] [Indexed: 12/03/2022]
Abstract
In this paper, a feature extraction method for robust speech recognition in noisy environments is proposed. The proposed method is motivated by a biologically inspired auditory model that simulates outer/middle-ear filtering with a low-pass filter and the spectral behaviour of the cochlea with the Gammachirp auditory filterbank (GcFB). The speech recognition performance of our method is tested on speech signals corrupted by real-world noises. The evaluation results show that the proposed method gives better recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian mixture densities (HMM-GM).
Affiliation(s)
- Youssef Zouhir
- Research Unit: Signals and Mechatronic Systems, SMS, Higher School of Technology and Computer Science (ESTI), University of Carthage, Carthage, Tunisia
- Kaïs Ouni
- Research Unit: Signals and Mechatronic Systems, SMS, Higher School of Technology and Computer Science (ESTI), University of Carthage, Carthage, Tunisia
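The Zouhir and Ouni abstract builds on the gammachirp auditory filter. The sketch below computes a single gammachirp channel's impulse response using the Irino and Patterson form g(t) = t^(n−1) exp(−2πb·ERB(f_r)t) cos(2πf_r t + c ln t + φ). It is not the authors' feature extractor. The defaults n = 4 and b = 1.019 are the usual gammatone values, and gammachirp fits often use other b and c values, so all parameter choices here are assumptions. The ERB formula is the Glasberg and Moore (1990) one.

```python
import numpy as np

def erb_n(f_hz):
    """ERB (Hz) of the normal auditory filter at f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammachirp(fr, fs=16000, dur=0.05, n=4, b=1.019, c=0.0, phase=0.0):
    """Peak-normalised gammachirp impulse response.

    c is the chirp coefficient; c = 0 reduces to a gammatone.
    Time starts one sample after zero because ln(t) is undefined at t = 0.
    """
    t = np.arange(1, int(fs * dur)) / fs
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb_n(fr) * t)
    g = env * np.cos(2 * np.pi * fr * t + c * np.log(t) + phase)
    return g / np.max(np.abs(g))

fs = 16000
for c in (0.0, -2.0):
    g = gammachirp(1000.0, fs=fs, c=c)
    spec = np.abs(np.fft.rfft(g, 16384))
    peak = np.fft.rfftfreq(16384, 1 / fs)[np.argmax(spec)]
    print(f"c = {c:+.1f}: spectral peak near {peak:.0f} Hz")
```

The c·ln t term makes the instantaneous frequency drift over the response, giving an asymmetric magnitude spectrum. A filterbank would repeat this across many centre frequencies f_r.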
18
Abstract
This article reviews the evolution of a series of models of loudness developed in Cambridge, UK. The first model, applicable to stationary sounds, was based on modifications of the model developed by Zwicker, including the introduction of a filter to allow for the effects of transfer of sound through the outer and middle ear prior to the calculation of an excitation pattern, and changes in the way that the excitation pattern was calculated. Later, modifications were introduced to the assumed middle-ear transfer function and to the way that specific loudness was calculated from excitation level. These modifications led to a finite calculated loudness at absolute threshold, which made it possible to predict accurately the absolute thresholds of broadband and narrowband sounds, based on the assumption that the absolute threshold corresponds to a fixed small loudness. The model was also modified to give predictions of partial loudness—the loudness of one sound in the presence of another. This allowed predictions of masked thresholds based on the assumption that the masked threshold corresponds to a fixed small partial loudness. Versions of the model for time-varying sounds were developed, which allowed prediction of the masked threshold of any sound in a background of any other sound. More recent extensions incorporate binaural processing to account for the summation of loudness across ears. In parallel, versions of the model for predicting loudness for hearing-impaired ears have been developed and have been applied to the development of methods for fitting multichannel compression hearing aids.
Affiliation(s)
- Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, UK
19
Hilkhuysen G, Macherey O. Optimizing pulse-spreading harmonic complexes to minimize intrinsic modulations after auditory filtering. J Acoust Soc Am 2014; 136:1281. [PMID: 25190401 DOI: 10.1121/1.4890642] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 06/03/2023]
Abstract
All signals, except sine waves, exhibit intrinsic modulations that affect perceptual masking. Reducing the physical intrinsic modulations of a broadband signal does not necessarily have a perceptual impact: auditory filtering can reintroduce modulations. Broadband signals with low intrinsic modulations after auditory filtering have proved difficult to design. To address this, this paper introduces a class of signals termed pulse-spreading harmonic complexes (PSHCs). PSHCs are generated by summing harmonically related components with phases chosen so that the resulting waveform exhibits pulses equally spaced within a repetition period. The order of a PSHC determines its pulse rate. Simulations with a gammatone filterbank suggest an optimal pulse rate at which, after auditory filtering, the PSHC's intrinsic modulations are lowest. These intrinsic modulations appear to be smaller than those for broadband pseudo-random (PR) or low-noise (LN) noise. This hypothesis was tested in a modulation-detection experiment involving five modulation rates ranging from 8 to 128 Hz and both broadband and narrowband carriers using PSHCs, PR, and LN noise. PSHCs showed the lowest thresholds of all broadband signals. The results imply that optimized PSHCs exhibit smaller intrinsic modulations after auditory filtering than any other broadband signal previously considered.
Affiliation(s)
- Gaston Hilkhuysen
- Laboratoire de Mecanique et d'Acoustique, Centre National de la Recherche Scientifique, Unité Propre de Recherche 7051, Aix-Marseille Université, Centrale Marseille, 31 Chemin Joseph Aiguier, F-13402 Marseille Cedex 20, France
- Olivier Macherey
- Laboratoire de Mecanique et d'Acoustique, Centre National de la Recherche Scientifique, Unité Propre de Recherche 7051, Aix-Marseille Université, Centrale Marseille, 31 Chemin Joseph Aiguier, F-13402 Marseille Cedex 20, France
20
Baumgartner R, Majdak P, Laback B. Modeling sound-source localization in sagittal planes for human listeners. J Acoust Soc Am 2014; 136:791-802. [PMID: 25096113 PMCID: PMC4582445 DOI: 10.1121/1.4887447] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 05/23/2023]
Abstract
Monaural spectral features are important for human sound-source localization in sagittal planes, including front-back discrimination and elevation perception. These directional features result from the acoustic filtering of incoming sounds by the listener's morphology and are described by listener-specific head-related transfer functions (HRTFs). This article proposes a probabilistic, functional model of sagittal-plane localization that is based on human listeners' HRTFs. The model approximates spectral auditory processing, accounts for acoustic and non-acoustic listener specificity, allows for predictions beyond the median plane, and directly predicts psychoacoustic measures of localization performance. The predictive power of the listener-specific modeling approach was verified under various experimental conditions: The model predicted effects on localization performance of band limitation, spectral warping, non-individualized HRTFs, spectral resolution, spectral ripples, and high-frequency attenuation in speech. The functionalities of vital model components were evaluated and discussed in detail. Positive spectral gradient extraction, sensorimotor mapping, and binaural weighting of monaural spatial information were addressed in particular. Potential applications of the model include predictions of psychophysical effects, for instance, in the context of virtual acoustics or hearing assistive devices.
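The Baumgartner et al. abstract names positive spectral gradient extraction as a key model stage. The sketch below assumes this stage amounts to first-order differences of band levels across frequency followed by half-wave rectification. The nearest-template matching by mean absolute difference is a hypothetical simplification, not the model's probabilistic similarity mapping or sensorimotor stage.

```python
import numpy as np

def positive_spectral_gradients(band_levels_db):
    """Half-wave rectified first-order differences across frequency bands.

    band_levels_db: array (..., n_bands) of band levels in dB, ordered by
    increasing centre frequency. Returns an array of shape (..., n_bands - 1).
    """
    grad = np.diff(band_levels_db, axis=-1)
    return np.maximum(grad, 0.0)

def nearest_template(target_db, templates_db):
    """Hypothetical matcher: index of the template whose positive gradients
    are closest (mean absolute difference) to those of the target."""
    t = positive_spectral_gradients(target_db)
    tpl = positive_spectral_gradients(templates_db)
    dist = np.mean(np.abs(tpl - t), axis=-1)
    return int(np.argmin(dist)), dist

# Two made-up "directional" spectra; the target is template 0 played 10 dB louder.
templates = np.array([[0.0, 3.0, 1.0, 4.0],
                      [4.0, 1.0, 3.0, 0.0]])
idx, dist = nearest_template(templates[0] + 10.0, templates)
print(idx, dist)
```

Differentiating across bands discards the overall level, so a louder or quieter presentation of the same spectral shape still matches the same template.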
21
Lina IA, Lauer AM. Rapid measurement of auditory filter shape in mice using the auditory brainstem response and notched noise. Hear Res 2013; 298:73-9. [PMID: 23347916 PMCID: PMC3639490 DOI: 10.1016/j.heares.2013.01.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 08/17/2012] [Revised: 12/27/2012] [Accepted: 01/07/2013] [Indexed: 11/21/2022]
Abstract
The notched-noise method is an effective procedure for measuring frequency resolution and auditory filter shapes in both human and animal models of hearing. Briefly, auditory filter shape and bandwidth estimates are derived from masked thresholds for tones presented in noise containing widening spectral notches. As the spectral notch widens, less of the noise falls within the auditory filter and the tone becomes more detectable, until the notch width exceeds the filter bandwidth. Behavioral procedures have been used to derive notched-noise auditory filter shapes in mice; however, the time and effort needed to train and test animals on these tasks constrain the widespread application of this testing method. As an alternative procedure, we combined relatively non-invasive auditory brainstem response (ABR) measurements and the notched-noise method to estimate auditory filters in normal-hearing mice at center frequencies of 8, 11.2, and 16 kHz. A complete set of simultaneous masked thresholds for a particular tone frequency was obtained in about an hour. ABR-derived filter bandwidths broadened with increasing frequency, consistent with previous studies. The ABR notched-noise procedure provides a fast alternative for estimating frequency selectivity in mice that is well suited to high-throughput or time-sensitive screening.
Affiliation(s)
- Ioan A. Lina
- Johns Hopkins University School of Medicine, Department of Otolaryngology – HNS, Center for Hearing and Balance, 515 Traylor, 720 Rutland Ave., Baltimore, MD 21205, United States
- Amanda M. Lauer
- Johns Hopkins University School of Medicine, Department of Otolaryngology – HNS, Center for Hearing and Balance, 515 Traylor, 720 Rutland Ave., Baltimore, MD 21205, United States
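The notched-noise logic in the Lina and Lauer abstract can be sketched with the textbook power-spectrum model. The model assumes a symmetric rounded-exponential filter, W(g) = (1 + pg)e^(−pg), where g = |f − fc|/fc, and a threshold proportional to the noise power passing through the filter. This is an assumed model, and the abstract does not describe the authors' fitting procedure. The filter value p = 30 at 16 kHz is purely illustrative.

```python
import numpy as np

def roex_p(g, p):
    """Rounded-exponential filter weight W(g) = (1 + p g) exp(-p g), g = |f - fc| / fc."""
    return (1 + p * g) * np.exp(-p * g)

def noise_through_filter(p, fc, notch_g, n0=1.0):
    """Noise power passed by a symmetric roex(p) filter for a symmetric notch.

    Each noise band runs from the notch edge (normalised half-width notch_g)
    to infinity with spectrum level n0 (linear units). This is the closed form
    of 2 * n0 * fc * integral_{notch_g}^{inf} W(g) dg.
    """
    return 2 * n0 * fc * (2 + p * notch_g) * np.exp(-p * notch_g) / p

def predicted_threshold_db(p, fc, notch_g, n0=1.0, k=1.0):
    """Power-spectrum model: signal threshold = k * noise power at the filter output."""
    return 10 * np.log10(k * noise_through_filter(p, fc, notch_g, n0))

def erb_from_p(p, fc):
    """Equivalent rectangular bandwidth of a symmetric roex(p) filter: 4 * fc / p."""
    return 4 * fc / p

# Illustrative filter only: fc = 16 kHz, p = 30.
for g0 in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"notch half-width {g0:.1f}: {predicted_threshold_db(30.0, 16000.0, g0):6.1f} dB")
print(f"ERB = {erb_from_p(30.0, 16000.0):.0f} Hz")
```

With no notch, the passed noise power equals n0 × ERB, which is what makes the ERB a useful summary of the filter. Fitting p to measured thresholds (here, ABR thresholds) yields the bandwidth estimate.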
22
Yamashita Y, Nakajima Y, Ueda K, Shimada Y, Hirsh D, Seno T, Smith BA. Acoustic analyses of speech sounds and rhythms in Japanese- and English-learning infants. Front Psychol 2013; 4:57. [PMID: 23450824 PMCID: PMC3584442 DOI: 10.3389/fpsyg.2013.00057] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/30/2012] [Accepted: 01/25/2013] [Indexed: 11/25/2022]
Abstract
The purpose of this study was to explore developmental changes in spectral fluctuations and temporal periodicity in the speech of Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected, because infants diversify their phonetic inventories with age. Natural speech of the infants was recorded. We used a critical-band filter bank that simulated the frequency resolution of the adult auditory periphery. First, correlations between the power fluctuations of the critical-band outputs were examined by factor analysis, to see how the critical bands should be grouped if a listener is to differentiate sounds in infants' speech. Next, we analyzed the temporal fluctuations of the factor scores by calculating autocorrelations. At 24 months of age, in both linguistic environments, the analysis identified three factors like those previously observed in adult speech. These three factors were shifted to a higher frequency range, corresponding to the smaller vocal tract size of the infants. The results suggest that the infants' vocal tract structures had developed an adult-like configuration by 24 months of age in both language environments. The number of utterances showing periodicity at shorter time scales increased with age in both environments; this trend was clearer in the Japanese environment.
Affiliation(s)
- Yuko Yamashita
- Graduate School of Design, Kyushu University, Fukuoka, Japan
- Yoshitaka Nakajima
- Department of Human Science, Center for Applied Perceptual Research, Kyushu University, Fukuoka, Japan
- Kazuo Ueda
- Department of Human Science, Center for Applied Perceptual Research, Kyushu University, Fukuoka, Japan
- Yohko Shimada
- Graduate School of Asian and African Studies, Kyoto University, Kyoto, Japan
- David Hirsh
- Faculty of Education and Social Work, University of Sydney, Sydney, NSW, Australia
- Takeharu Seno
- Faculty of Design, Institute for Advanced Study, Kyushu University, Fukuoka, Japan
23
Abstracts of the British Society of Audiology annual conference (incorporating the Experimental and Clinical Short papers meetings). Int J Audiol 2012. [DOI: 10.3109/14992027.2012.653103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
24
Lyon RF. Cascades of two-pole-two-zero asymmetric resonators are good models of peripheral auditory function. J Acoust Soc Am 2011; 130:3893-3904. [PMID: 22225045 DOI: 10.1121/1.3658470] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]
Abstract
A cascade of two-pole-two-zero filter stages is a good model of the auditory periphery in two distinct ways. First, in the form of the pole-zero filter cascade, it acts as an auditory filter model that provides an excellent fit to data on human detection of tones in masking noise, with fewer fitting parameters than previously reported filter models such as the roex and gammachirp models. Second, when extended to the form of the cascade of asymmetric resonators with fast-acting compression, it serves as an efficient front-end filterbank for machine-hearing applications, including dynamic nonlinear effects such as fast wide-dynamic-range compression. In their underlying linear approximations, these filters are described by their poles and zeros, that is, by rational transfer functions, which makes them simple to implement in analog or digital domains. Other advantages in these models derive from the close connection of the filter-cascade architecture to wave propagation in the cochlea. These models also reflect the automatic-gain-control function of the auditory system and can maintain approximately constant impulse-response zero-crossing times as the level-dependent parameters change.
Affiliation(s)
- Richard F Lyon
- Google Inc., 1600 Amphitheatre Parkway, Mountain View, California 94043, USA.
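The filter-cascade idea in Lyon's abstract can be sketched as a chain of digital two-pole-two-zero stages. In this chain, pole frequencies decrease along the cascade, like the cochlea from base to apex, and each tap is the running product of the stage responses. The pole/zero placement (zeros at 1.4 times the pole frequency), the radii and the stage spacing are illustrative assumptions, not Lyon's published parameters. The level-dependent automatic gain control is omitted.

```python
import numpy as np

def tptz_stage(f_pole, fs, r_pole=0.95, zero_ratio=1.4, r_zero=0.98):
    """Coefficients (b, a) of one two-pole-two-zero resonator stage.

    Poles at r_pole * exp(+/- j*theta_p); zeros at r_zero * exp(+/- j*theta_z),
    with theta_z = zero_ratio * theta_p (zeros above the poles steepen the
    high-frequency side). Numerator is scaled so the DC gain is exactly 1.
    """
    theta_p = 2 * np.pi * f_pole / fs
    theta_z = zero_ratio * theta_p
    b = np.array([1.0, -2 * r_zero * np.cos(theta_z), r_zero ** 2])
    a = np.array([1.0, -2 * r_pole * np.cos(theta_p), r_pole ** 2])
    b *= a.sum() / b.sum()  # H(z = 1) = sum(b) / sum(a) -> 1
    return b, a

def freq_response(b, a, freqs, fs):
    """Evaluate H = B(z) / A(z) on the unit circle at the given frequencies (Hz)."""
    z_inv = np.exp(-2j * np.pi * np.asarray(freqs) / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return num / den

def cascade_responses(pole_freqs, fs, freqs):
    """Complex response at each stage's output (running product of stages)."""
    total = np.ones(len(freqs), dtype=complex)
    outs = []
    for fp in pole_freqs:  # ordered high to low, like base to apex
        b, a = tptz_stage(fp, fs)
        total = total * freq_response(b, a, freqs, fs)
        outs.append(total.copy())
    return np.array(outs)

fs = 32000
pole_freqs = np.geomspace(8000, 500, 20)   # hypothetical stage spacing
freqs = np.linspace(50, 12000, 2000)
H = cascade_responses(pole_freqs, fs, freqs)
for k in (0, 9, 19):
    print(f"stage {k:2d} output peaks near {freqs[np.argmax(np.abs(H[k]))]:.0f} Hz")
```

Because each tap reuses all earlier stages, a full filterbank costs only one biquad per channel. This is one practical appeal of the cascade architecture.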
25
Sęk A, Moore BCJ. Implementation of two tests for measuring sensitivity to temporal fine structure. Int J Audiol 2011; 51:58-63. [PMID: 22050366 DOI: 10.3109/14992027.2011.605808] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
OBJECTIVE To implement two methods for measuring sensitivity to temporal fine structure (TFS) for use in assessing effects of hearing loss and age that may not be apparent from the audiogram. DESIGN The TFS1 test was described by Moore and Sek (2009). The task is to discriminate a harmonic complex tone from a tone in which all frequency components are shifted upwards by the same amount in Hz. The TFSLF test was described by Hopkins and Moore (2010a). The task is to detect changes in lateral position of a binaurally presented tone based on interaural phase difference (IPD). Both tests have been implemented in software that can be run on a PC with a good-quality sound card. The software includes a routine for measuring the absolute threshold at the test frequency. RESULTS For each test, an experimental run at a single frequency takes about three minutes. Practice tasks (frequency discrimination of pure tones for TFS1 and discrimination of changes in lateral position based on interaural level difference for TFSLF) are also implemented that are similar to the main task, but easier. CONCLUSIONS The software implementation allows sensitivity to TFS to be measured quickly without a requirement for specialized equipment.
Affiliation(s)
- Aleksander Sęk
- Institute of Acoustics, Adam Mickiewicz University, Poznań, Poland
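The TFSLF test in the Sęk and Moore abstract relies on a binaural tone whose ears differ only in interaural phase. The sketch below generates such a pair and converts an IPD to the equivalent interaural time difference. The function names, the 500-Hz frequency and the phase values are hypothetical. Levels, gating and the test's interval structure are omitted.

```python
import numpy as np

def ipd_tone_pair(freq_hz, ipd_rad, fs=48000, dur=0.4):
    """Left/right pure tones differing only in interaural phase (radians).

    The right channel leads the left by ipd_rad.
    """
    t = np.arange(int(round(fs * dur))) / fs
    left = np.sin(2 * np.pi * freq_hz * t)
    right = np.sin(2 * np.pi * freq_hz * t + ipd_rad)
    return left, right

def equivalent_itd_us(freq_hz, ipd_rad):
    """Interaural time difference (microseconds) equivalent to an IPD at freq_hz."""
    return ipd_rad / (2 * np.pi * freq_hz) * 1e6

left, right = ipd_tone_pair(500.0, np.pi / 4)
print(f"{equivalent_itd_us(500.0, np.pi / 4):.0f} us")  # 250 us
```

Expressing the cue as a phase rather than a delay ties it directly to phase locking, which is why the test is restricted to low frequencies.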
26
Chen Z, Hu G, Glasberg BR, Moore BCJ. A new method of calculating auditory excitation patterns and loudness for steady sounds. Hear Res 2011; 282:204-15. [PMID: 21851853 DOI: 10.1016/j.heares.2011.08.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 06/13/2011] [Revised: 07/20/2011] [Accepted: 08/03/2011] [Indexed: 11/26/2022]
Abstract
A new method for calculating auditory excitation patterns and loudness for steady sounds is described. The method is based on a nonlinear filterbank in which each filter is the sum of a broad passive filter and a sharp active filter. All filters have a rounded-exponential shape. For each center frequency (CF), the gain of the active filter is controlled by the output of the passive filter. The parameters of the model were derived from large sets of previously published notched-noise masking data obtained from human subjects. Excitation patterns derived using the new filterbank include the effects of basilar membrane compression. Loudness can be calculated as the area under the excitation pattern when plotted in intensity-like units on an ERB_N-number (Cam) scale; no transformation from excitation to specific loudness is required. The method predicts the standard equal-loudness contours and loudness as a function of bandwidth with good accuracy. With some additional assumptions, the method also gives reasonably accurate predictions of partial loudness.
Affiliation(s)
- Zhangli Chen
- Department of Biomedical Engineering, Medical School, Tsinghua University, Beijing 100084, China
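The loudness rule in the Chen et al. abstract (area under the excitation pattern on the Cam scale) can be sketched numerically. The sketch assumes the Glasberg and Moore (1990) ERB_N-number formula, Cam = 21.4 log10(4.37F/1000 + 1), and trapezoidal integration. It omits the paper's filterbank and any scaling constant, and the triangular excitation pattern is a made-up example.

```python
import numpy as np

def cam(f_hz):
    """ERB_N-number (Cam) for frequency f_hz (Glasberg & Moore, 1990)."""
    return 21.4 * np.log10(4.37 * np.asarray(f_hz) / 1000.0 + 1.0)

def cam_to_hz(c):
    """Inverse of cam()."""
    return (10 ** (np.asarray(c) / 21.4) - 1.0) * 1000.0 / 4.37

def area_on_cam_scale(freqs_hz, excitation_intensity):
    """Trapezoidal area under an excitation pattern plotted against Cam.

    excitation_intensity is in linear, intensity-like units; loudness is
    taken as proportional to this area (scaling constant omitted).
    """
    x = cam(freqs_hz)
    y = np.asarray(excitation_intensity, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Toy excitation pattern (hypothetical): a triangular bump centred on 18 Cams.
c = np.linspace(3, 33, 301)
f = cam_to_hz(c)
e = np.maximum(0.0, 1.0 - np.abs(c - 18.0) / 5.0)
print(area_on_cam_scale(f, e))  # close to 5.0, the area of the triangle
```

Integrating over Cams rather than hertz weights every auditory filter equally. A given spread of excitation therefore contributes similarly whether it lies at low or high frequencies.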
27
Moore BCJ, Sek A. Effect of level on the discrimination of harmonic and frequency-shifted complex tones at high frequencies. J Acoust Soc Am 2011; 129:3206-3212. [PMID: 21568422 DOI: 10.1121/1.3570958] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
Moore and Sęk [J. Acoust. Soc. Am. 125, 3186-3193 (2009)] measured discrimination of a harmonic complex tone and a tone in which all harmonics were shifted upwards by the same amount in Hertz. Both tones were passed through a fixed bandpass filter and a background noise was used to mask combination tones. Performance was well above chance when the fundamental frequency was 800 Hz, and all audible components were above 8000 Hz. Moore and Sęk argued that this suggested the use of temporal fine structure information at high frequencies. However, the task may have been performed using excitation-pattern cues. To test this idea, performance on a similar task was measured as a function of level. The auditory filters broaden with increasing level, so performance based on excitation-pattern cues would be expected to worsen as level increases. The results did not show such an effect, suggesting that the task was not performed using excitation-pattern cues.
Affiliation(s)
- Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, England.
28
Jurado C, Pedersen CS, Moore BCJ. Psychophysical tuning curves for frequencies below 100 Hz. J Acoust Soc Am 2011; 129:3166-3180. [PMID: 21568419 DOI: 10.1121/1.3560535] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 05/30/2023]
Abstract
Psychophysical tuning curves (PTCs) were measured for sinusoidal signals with frequency f_s = 31.5, 40, 50, 63, and 80 Hz, using sinusoidal and narrowband-noise maskers. For the former, conditions were included where a pair of beating tones was added to reduce the use of cues related to beats. Estimates of each subject's middle-ear transfer function (METF) were obtained from equal-loudness contours measured from 20 to 160 Hz. With decreasing f_s, the PTCs became progressively broadened and markedly asymmetrical, with shallow upper skirts and steep lower skirts. For the sinusoidal maskers, the tips were more irregular than for narrowband-noise maskers or when beating tones were added. For f_s = 31.5 and 40 Hz, the tips of the PTCs always fell above f_s. Allowing for the METF so as to infer underlying filter shapes resulted in flatter lower skirts, especially below 40 Hz, and reduced the frequency at the tips for f_s between 31.5 and 50 Hz; however, the tips did not fall below 40 to 50 Hz. The bandwidths of the PTCs increased with decreasing f_s below 80 Hz. However, bandwidths remained roughly constant if the METF was included as part of auditory filtering for frequencies below 40 Hz.
Affiliation(s)
- Carlos Jurado
- Section of Acoustics, Department of Electronic Systems, Aalborg University, Fredrik Bajers Vej 7-B5, Aalborg Ø 9220, Denmark.
29
Resolvability of components in complex tones and implications for theories of pitch perception. Hear Res 2011; 276:88-97. [PMID: 21236327 DOI: 10.1016/j.heares.2011.01.003] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 09/14/2010] [Revised: 12/03/2010] [Accepted: 01/04/2011] [Indexed: 11/20/2022]
Abstract
This paper reviews methods that have been used to estimate the resolvability of individual partials in harmonic and inharmonic complex tones and considers the implications of the results for theories of pitch perception. The methods include: requiring comparisons of the pitch of an isolated pure tone and a partial within a complex tone as a measure of the ability to "hear out" that partial; considering the magnitude of ripples in the calculated excitation pattern of a complex tone; using a complex tone as a forward masker and using ripples in the masking pattern to estimate resolvability; measuring sensitivity to the relative phase of the components within complex tones. The measures are broadly consistent in indicating that harmonics with numbers up to about five are well resolved, but that resolution decreases for higher harmonics. Most measures suggest that harmonics with numbers above eight are poorly, if at all, resolved. However, there are uncertainties associated with each method that make the exact upper limit of resolvability uncertain. Evidence is presented suggesting a partial dissociation between resolution in the excitation pattern and the ability to hear out a partial. It is proposed that the latter requires information from temporal fine structure (phase locking).
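Several of the measures reviewed above depend on how many harmonics fall within one auditory filter. As a rough illustration only, the sketch below counts harmonic spacings per ERB at harmonic n as ERB_N(n·F0)/F0. It assumes the Glasberg and Moore (1990) ERB_N formula, and it is not one of the methods in the review.

```python
def erb_n(f_hz):
    """ERB (Hz) of the normal auditory filter at f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def harmonics_per_erb(f0_hz, n):
    """Number of harmonic spacings (F0) that fit within the ERB_N centred on harmonic n."""
    return erb_n(n * f0_hz) / f0_hz

for n in range(1, 13):
    print(n, round(harmonics_per_erb(200.0, n), 2))
```

For F0 = 200 Hz, this ratio first exceeds 1 at harmonic 9. That is broadly in line with the review's conclusion that harmonics above about the eighth are poorly resolved, though the review stresses that the exact limit remains uncertain.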
30
Jurado C, Moore BCJ. Frequency selectivity for frequencies below 100 Hz: comparisons with mid-frequencies. J Acoust Soc Am 2010; 128:3585-3596. [PMID: 21218891 DOI: 10.1121/1.3504657] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 05/30/2023]
Abstract
Auditory filter shapes were derived for signal frequencies (f_s) between 50 and 1000 Hz, using the notched-noise method. The masker spectrum level (N_0) was 50 dB (re 20 μPa). For f_s = 63 and 50 Hz, measurements were also made with N_0 = 62 dB for the lower band. The data were fitted using a rounded-exponential filter model, with special consideration of the filtering effects of the middle-ear transfer function (METF) at low frequencies. The results showed: (1) for very low values of f_s, the lower skirts of the filters were only well defined when N_0 = 62 dB for the lower band; (2) the sharpness of both sides of the filters decreased with decreasing f_s; (3) the dynamic range of the filters decreased with decreasing f_s; (4) the equivalent rectangular bandwidth of the filters decreased with decreasing f_s down to f_s = 80 Hz, but increased for f_s below that; (5) the assumed METF, which includes the shunt effect of the helicotrema for frequencies below 50 Hz, increasingly influenced the low-frequency skirt of the filters as f_s decreased; and (6) detection efficiency worsened with decreasing f_s for f_s between 100 and 500 Hz, but improved slightly below that.
Affiliation(s)
- Carlos Jurado
- Section of Acoustics, Department of Electronic Systems, Aalborg University, Fredrik Bajersvej 7-A, Denmark.
31
Ernst SMA, Moore BCJ. Mechanisms underlying the detection of frequency modulation. J Acoust Soc Am 2010; 128:3642-3648. [PMID: 21218896 DOI: 10.1121/1.3506350] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 05/30/2023]
Abstract
Frequency modulation detection limens (FMDLs) were measured for carrier frequencies (f_c) of 1000, 4000, and 6000 Hz, using modulation frequencies (f_m) of 2 and 10 Hz and levels of 20 and 60 dB sensation level (SL), both with and without random amplitude modulation (AM), applied in all intervals of a forced-choice trial. The AM was intended to disrupt excitation-pattern cues. At 60 dB SL, the deleterious effect of the AM was smaller for f_m = 2 than for f_m = 10 Hz for f_c = 1000 and 4000 Hz, while for f_c = 6000 Hz the deleterious effect was large and similar for the two values of f_m. This is consistent with the idea that, for f_c below about 5000 Hz and f_m = 2 Hz, frequency modulation can be detected via changes in phase locking over time. However, at 20 dB SL, the deleterious effect of the added AM for f_c = 1000 and 4000 Hz was similar for the two values of f_m, while for f_c = 6000 Hz, the deleterious effect of the AM was greater for f_m = 10 than for f_m = 2 Hz. It is suggested that, at low SLs, the auditory filters become relatively sharp and phase locking weakens, so that excitation-pattern cues influence FMDLs even for low f_c and low f_m.
Affiliation(s)
- Stephan M A Ernst
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom.
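The Ernst and Moore stimuli (sinusoidal FM, optionally with random AM to disrupt excitation-pattern cues) can be sketched as follows. The FM phase term is standard: integrating fc + Δf·sin(2πf_m t) gives the phase used below. The random-AM scheme, a level that wanders within ±depth/2 dB with 10 breakpoints per second, is a hypothetical stand-in because the abstract does not specify the AM used. All parameter values are illustrative.

```python
import numpy as np

def fm_tone(fc, fm, delta_f, fs=48000, dur=1.0, am_depth_db=0.0, rng=None):
    """Sinusoidally frequency-modulated tone, optionally with random AM.

    Instantaneous frequency: fc + delta_f * sin(2*pi*fm*t).
    Random AM (hypothetical scheme): a level in dB drawn uniformly within
    +/- am_depth_db / 2 at 10 breakpoints per second, linearly interpolated.
    """
    t = np.arange(int(round(fs * dur))) / fs
    # Phase is the time integral of 2*pi times the instantaneous frequency.
    phase = 2 * np.pi * fc * t - (delta_f / fm) * np.cos(2 * np.pi * fm * t)
    x = np.sin(phase)
    if am_depth_db > 0:
        rng = np.random.default_rng() if rng is None else rng
        n_bp = int(10 * dur) + 2
        bp_t = np.linspace(0.0, dur, n_bp)
        bp_db = rng.uniform(-am_depth_db / 2, am_depth_db / 2, n_bp)
        x = x * 10 ** (np.interp(t, bp_t, bp_db) / 20)
    return t, x

# 1-kHz carrier, 2-Hz FM with 10-Hz peak deviation, plus 6 dB of random AM.
t, x = fm_tone(1000.0, 2.0, 10.0, am_depth_db=6.0, rng=np.random.default_rng(0))
print(x.shape, float(np.max(np.abs(x))))
```

Because the random level changes are applied in every interval, a listener who relies only on level changes at the output of one auditory filter would be disrupted. The study exploits this logic.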
32
Lyon RF, Rehn M, Bengio S, Walters TC, Chechik G. Sound Retrieval and Ranking Using Sparse Auditory Representations. Neural Comput 2010; 22:2390-416. [DOI: 10.1162/neco_a_00011] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/04/2022]
Abstract
To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole–zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting an 18% improvement over the best competing MFCC front end.
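The PAMIR learner mentioned in the Lyon et al. abstract is a passive-aggressive ranking method. The sketch below shows one update in that spirit. It assumes the usual bilinear score q·W·d, a hinge loss on a (relevant, irrelevant) document pair, and a PA-I step size capped by an aggressiveness constant C. It uses dense toy vectors, not the paper's sparse auditory features, and it is not the authors' code.

```python
import numpy as np

def pamir_update(W, q, d_pos, d_neg, C=0.1):
    """One passive-aggressive (PA-I) ranking update on a bilinear scorer.

    score(q, d) = q @ W @ d. Loss = max(0, 1 - score(q, d_pos) + score(q, d_neg)).
    If the loss is positive, W moves by tau * outer(q, d_pos - d_neg) with
    tau = min(C, loss / ||outer(q, d_pos - d_neg)||_F^2).
    """
    diff = d_pos - d_neg
    loss = max(0.0, 1.0 - q @ W @ diff)
    if loss == 0.0:
        return W, loss          # passive: the pair is already ranked with margin
    V = np.outer(q, diff)
    tau = min(C, loss / np.sum(V * V))
    return W + tau * V, loss    # aggressive: smallest step that fixes the margin (up to C)

rng = np.random.default_rng(0)
W = np.zeros((5, 8))            # 5 query terms x 8 audio features (toy sizes)
q, d_pos, d_neg = rng.random(5), rng.random(8), rng.random(8)
W, loss_before = pamir_update(W, q, d_pos, d_neg, C=1e9)
_, loss_after = pamir_update(W, q, d_pos, d_neg, C=1e9)
print(loss_before, loss_after)
```

With a large C, one step exactly restores a unit margin for that pair. A finite C limits how far a single noisy example can move W.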
33
Henry KS, Lucas JR. Habitat-related differences in the frequency selectivity of auditory filters in songbirds. Funct Ecol 2009. [DOI: 10.1111/j.1365-2435.2009.01674.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
34
Millman RE, Prendergast G, Kitterick PT, Woods WP, Green GGR. Spatiotemporal reconstruction of the auditory steady-state response to frequency modulation using magnetoencephalography. Neuroimage 2009; 49:745-58. [PMID: 19699806 DOI: 10.1016/j.neuroimage.2009.08.029] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 05/15/2009] [Revised: 07/09/2009] [Accepted: 08/13/2009] [Indexed: 11/28/2022]
Abstract
The aim of this study was to investigate the mechanisms involved in the perception of perceptually salient frequency modulation (FM) using auditory steady-state responses (ASSRs) measured with magnetoencephalography (MEG). Previous MEG studies using frequency-modulated amplitude modulation as stimuli (Luo et al., 2006, 2007) suggested that a phase modulation encoding mechanism exists for low (<5 Hz) FM modulation frequencies but additional amplitude modulation encoding is required for faster FM modulation frequencies. In this study, single-cycle sinusoidal FM stimuli were used to generate the ASSR. The stimulus was either an unmodulated 1-kHz sinusoid or a 1-kHz sinusoid that was frequency-modulated with a repetition rate of 4, 8, or 12 Hz. The fast Fourier transform (FFT) of each MEG channel was calculated to obtain the phase and magnitude of the ASSR in sensor-space, and multivariate Hotelling's T² statistics were used to determine the statistical significance of ASSRs. MEG beamformer analyses were used to localise the ASSR sources. Virtual electrode analyses were used to reconstruct the time series at each source. FFTs of the virtual electrode time series were calculated to obtain the amplitude and phase characteristics of each source identified in the beamforming analyses. Multivariate Hotelling's T² statistics were used to determine the statistical significance of these reconstructed ASSRs. The results suggest that the ability of auditory cortex to phase-lock to FM is dependent on the FM pulse rate and that the ASSR to FM is lateralised to the right hemisphere.
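The sensor-space analysis described above (phase and magnitude at the stimulation rate, then a multivariate significance test) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the test is a one-sample Hotelling's T² applied to the real and imaginary parts of the Fourier coefficient across epochs, it evaluates a single DFT bin instead of a full FFT, and the function names, sampling rate and simulated data are hypothetical.

```python
import cmath
import math
import random


def bin_coefficient(signal, fs, freq):
    """Complex amplitude of `signal` at `freq` Hz from a single DFT bin.

    Scaled by 2/N so a sinusoid of amplitude A gives |c| = A; assumes the
    epoch spans an integer number of cycles of `freq`.
    """
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs) for k, x in enumerate(signal))
    return 2.0 * acc / n


def hotelling_t2_zero_mean(coeffs):
    """One-sample Hotelling's T^2 on (Re, Im) pairs against a zero mean.

    Returns (T2, F); under the null, F follows an F(2, n - 2) distribution.
    """
    n = len(coeffs)
    xs = [c.real for c in coeffs]
    ys = [c.imag for c in coeffs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # mean^T S^-1 mean, using the closed-form inverse of the 2x2 covariance S
    quad = (syy * mx ** 2 - 2.0 * sxy * mx * my + sxx * my ** 2) / det
    t2 = n * quad
    p = 2
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat


def simulate_epochs(n_epochs, fs, freq, amp, noise_sd, seed=0):
    """Hypothetical data: 1-s epochs of a phase-locked sinusoid plus Gaussian noise."""
    rng = random.Random(seed)
    return [[amp * math.sin(2.0 * math.pi * freq * k / fs) + rng.gauss(0.0, noise_sd)
             for k in range(fs)]
            for _ in range(n_epochs)]


fs, rate = 250, 4  # assumed sampling rate (Hz) and a 4-Hz FM repetition rate
epochs = simulate_epochs(20, fs, rate, amp=1.0, noise_sd=1.0)
coeffs = [bin_coefficient(e, fs, rate) for e in epochs]
t2, f_stat = hotelling_t2_zero_mean(coeffs)
print(f"T2 = {t2:.1f}, F(2, {len(coeffs) - 2}) = {f_stat:.1f}")
```

Testing the (Re, Im) pair jointly uses both the magnitude and the phase consistency of the response across epochs, which a magnitude-only test would ignore.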
Affiliation(s)
- Rebecca E Millman
- York Neuroimaging Centre, The Biocentre, York Science Park, Heslington, UK.
35
Bernstein LR, Trahiotis C. How sensitivity to ongoing interaural temporal disparities is affected by manipulations of temporal features of the envelopes of high-frequency stimuli. J Acoust Soc Am 2009; 125:3234-42. [PMID: 19425666 PMCID: PMC2736741 DOI: 10.1121/1.3101454] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 05/16/2023]
Abstract
This study addressed how manipulating certain aspects of the envelopes of high-frequency stimuli affects sensitivity to envelope-based interaural temporal disparities (ITDs). Listeners' threshold ITDs were measured using an adaptive two-alternative paradigm employing "raised-sine" stimuli [John, M. S., et al. (2002). Ear Hear. 23, 106-117], which permit independent variation in their modulation frequency, modulation depth, and modulation exponent. Threshold ITDs were measured while manipulating the modulation exponent for stimuli having modulation frequencies between 32 and 256 Hz. The results indicated that graded increases in the exponent led to graded decreases in envelope-based threshold ITDs. Threshold ITDs were also measured while parametrically varying the modulation exponent and modulation depth. Overall, threshold ITDs decreased with increases in the modulation depth. Unexpectedly, increases in the exponent of the raised-sine led to especially large decreases in threshold ITD when the modulation depth was low. An interaural correlation-based model was generally able to capture changes in threshold ITD stemming from changes in the exponent, depth of modulation, and frequency of modulation of the raised-sine stimuli. The model (and several variations of it), however, could not account for the unexpected interaction between the value of the raised-sine exponent and its modulation depth.
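A raised-sine envelope can be sketched as below, assuming the form env(t) = (1 - m) + 2m·((1 + sin 2πf_m t)/2)^n with modulation frequency f_m, depth m and exponent n. Under that assumption the envelope reduces to ordinary sinusoidal AM at n = 1, and raising n sharpens the peaks without changing the peak (1 + m) or trough (1 - m) values. The correlation helper is a deliberately simplified stand-in for an interaural correlation-based model (no peripheral filtering, rectification or compression); it and all function names are hypothetical.

```python
import math


def raised_sine_envelope(t, fm, m, n):
    """Raised-sine envelope (assumed form):
    env(t) = (1 - m) + 2m * ((1 + sin(2*pi*fm*t)) / 2) ** n.
    With n = 1 this is ordinary sinusoidal AM, 1 + m*sin(2*pi*fm*t).
    """
    base = (1.0 + math.sin(2.0 * math.pi * fm * t)) / 2.0
    return (1.0 - m) + 2.0 * m * base ** n


def raised_sine_stimulus(fs, dur, fc, fm, m, n):
    """Samples of a sinusoidal carrier at fc (Hz) multiplied by the raised-sine envelope."""
    samples = int(round(fs * dur))
    return [math.sin(2.0 * math.pi * fc * k / fs) * raised_sine_envelope(k / fs, fm, m, n)
            for k in range(samples)]


def normalized_correlation(x, y):
    """Normalized correlation (no mean removal) of two equal-length sequences."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def envelope_correlation_at_delay(fs, fm, m, n, delay_samples):
    """Correlation between one envelope period and a circularly delayed copy.

    The 'right ear' envelope is the 'left ear' envelope delayed by an ITD of
    delay_samples / fs seconds; assumes fs / fm is an integer.
    """
    period = int(round(fs / fm))
    left = [raised_sine_envelope(k / fs, fm, m, n) for k in range(period)]
    right = left[-delay_samples:] + left[:-delay_samples] if delay_samples else left[:]
    return normalized_correlation(left, right)


fs, fm = 48000, 128  # assumed sampling rate; 128 Hz lies in the 32-256 Hz range studied
for n in (1, 2, 4, 8):
    rho = envelope_correlation_at_delay(fs, fm, 1.0, n, 46)  # about a 0.96-ms delay
    print(f"exponent {n}: envelope correlation {rho:.3f}")
```

In this toy setting, a larger exponent gives a lower envelope correlation for the same delay, which points in the same direction as the smaller threshold ITDs reported for larger exponents.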
Affiliation(s)
- Leslie R Bernstein
- Department of Neuroscience, University of Connecticut Health Center, Farmington, Connecticut 06030, USA
36
Ives DT, Patterson RD. Pitch strength decreases as F0 and harmonic resolution increase in complex tones composed exclusively of high harmonics. J Acoust Soc Am 2008; 123:2670-9. [PMID: 18529186 PMCID: PMC2423004 DOI: 10.1121/1.2890737] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/26/2023]
Abstract
A melodic pitch experiment was performed to demonstrate the importance of time-interval resolution for pitch strength. The results show that notes with a low fundamental (75 Hz) and relatively few resolved harmonics support better performance than comparable notes with a higher fundamental (300 Hz) and more resolved harmonics. Two four-note melodies were presented to listeners, and one note in the second melody was changed by one or two semitones. Listeners were required to identify the note that changed. There were three orthogonal stimulus dimensions: F0 (75 and 300 Hz); lowest harmonic number (3, 7, 11, or 15); and number of harmonics (4 or 8). Performance decreased as the frequency of the lowest component increased for both F0s, but performance was better for the lower F0. The spectral and temporal information in the stimuli was compared using a time-domain model of auditory perception. It is argued that the distribution of time intervals in the auditory nerve can explain the decrease in performance as F0 and spectral resolution increase. Excitation patterns based on the same time-interval information do not contain sufficient resolution to explain listeners' performance on the melody task.
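The stimulus design reduces to simple arithmetic. The sketch below enumerates the 2 × 4 × 2 = 16 conditions, treating 3, 7, 11 and 15 as the harmonic number of the lowest component, and assumes equal-tempered semitone steps (frequency ratio 2^(1/12)) for the changed note. The function names are hypothetical, and the sketch generates only component frequencies, not the authors' actual stimuli.

```python
from itertools import product


def complex_tone_components(f0, lowest, count):
    """Frequencies (Hz) of `count` consecutive harmonics of f0, starting at harmonic `lowest`."""
    return [f0 * h for h in range(lowest, lowest + count)]


def shift_semitones(f0, semitones):
    """Equal-tempered transposition (assumed): one semitone multiplies F0 by 2 ** (1 / 12)."""
    return f0 * 2.0 ** (semitones / 12.0)


# The three orthogonal stimulus dimensions listed in the abstract
F0S = (75, 300)
LOWEST_HARMONICS = (3, 7, 11, 15)
HARMONIC_COUNTS = (4, 8)

conditions = list(product(F0S, LOWEST_HARMONICS, HARMONIC_COUNTS))

for f0, lowest, count in conditions:
    comps = complex_tone_components(f0, lowest, count)
    period_ms = 1000.0 / f0  # fundamental period in milliseconds
    print(f"F0={f0:>3} Hz, harmonics {lowest}-{lowest + count - 1}: "
          f"{comps[0]}-{comps[-1]} Hz, period {period_ms:.2f} ms")
```

The printout makes the contrast concrete: at F0 = 300 Hz the highest conditions reach several kilohertz while the fundamental period shrinks to 3.33 ms, versus 13.33 ms at 75 Hz.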
Affiliation(s)
- D Timothy Ives
- Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom.
37
Patterson RD, Johnsrude IS. Functional imaging of the auditory processing applied to speech sounds. Philos Trans R Soc Lond B Biol Sci 2008; 363:1023-35. [PMID: 17827103 PMCID: PMC2606794 DOI: 10.1098/rstb.2007.2157] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
In this paper, we describe domain-general auditory processes that we believe are prerequisite to the linguistic analysis of speech. We discuss biological evidence for these processes and how they might relate to processes that are specific to human speech and language. We begin with a brief review of (i) the anatomy of the auditory system and (ii) the essential properties of speech sounds. Section 4 describes the general auditory mechanisms that we believe are applied to all communication sounds, and how functional neuroimaging is being used to map the brain networks associated with domain-general auditory processing. Section 5 discusses recent neuroimaging studies that explore where such general processes give way to those that are specific to human speech and language.
Affiliation(s)
- Roy D Patterson
- Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK.