1
Verschooten E, Shamma S, Oxenham AJ, Moore BCJ, Joris PX, Heinz MG, Plack CJ. The upper frequency limit for the use of phase locking to code temporal fine structure in humans: A compilation of viewpoints. Hear Res 2019; 377:109-121. [PMID: 30927686 PMCID: PMC6524635 DOI: 10.1016/j.heares.2019.03.011] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 11/30/2018] [Revised: 02/09/2019] [Accepted: 03/13/2019] [Indexed: 11/27/2022]
Abstract
The relative importance of neural temporal and place coding in auditory perception is still a matter of much debate. The current article is a compilation of viewpoints from leading auditory psychophysicists and physiologists regarding the upper frequency limit for the use of neural phase locking to code temporal fine structure in humans. While phase locking is used for binaural processing up to about 1500 Hz, there is disagreement regarding the use of monaural phase-locking information at higher frequencies. Estimates of the general upper limit proposed by the contributors range from 1500 to 10000 Hz. The arguments depend on whether or not phase locking is needed to explain psychophysical discrimination performance at frequencies above 1500 Hz, and whether or not the phase-locked neural representation is sufficiently robust at these frequencies to provide useable information. The contributors suggest key experiments that may help to resolve this issue, and experimental findings that may cause them to change their minds. This issue is of crucial importance to our understanding of the neural basis of auditory perception in general, and of pitch perception in particular.
Affiliation(s)
- Eric Verschooten
- Laboratory of Auditory Neurophysiology, KU Leuven, B-3000, Leuven, Belgium
- Shihab Shamma
- Institute for Systems Research and Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA; Laboratory of Sensory Perception, Department of Cognitive Studies, Ecole Normale Superieure, 29 Rue d'Ulm, Paris, 75005, France
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, N218 Elliott Hall, 75 E. River Road, Minneapolis, MN, 55455, USA
- Brian C J Moore
- Department of Psychology, University of Cambridge, Downing Street, Cambridge, CB2 3EB, UK
- Philip X Joris
- Laboratory of Auditory Neurophysiology, KU Leuven, B-3000, Leuven, Belgium
- Michael G Heinz
- Departments of Speech, Language, & Hearing Sciences and Biomedical Engineering, Purdue University, 715 Clinic Drive, West Lafayette, IN, 47907, USA
- Christopher J Plack
- Manchester Centre for Audiology and Deafness, The University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK; Department of Psychology, Lancaster University, Lancaster, LA1 4YF, UK.
2
Effect of sound level on virtual and free-field localization of brief sounds in the anterior median plane. Hear Res 2018; 365:28-35. [DOI: 10.1016/j.heares.2018.06.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/18/2017] [Revised: 05/31/2018] [Accepted: 06/08/2018] [Indexed: 11/19/2022]
3
Spagnol S. On distance dependence of pinna spectral patterns in head-related transfer functions. The Journal of the Acoustical Society of America 2015; 137:EL58-EL64. [PMID: 25618100 DOI: 10.1121/1.4903919] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/04/2023]
Abstract
The aim of this letter is to address a little understood question in sound source localization: Can the distance of a near sound source affect our own perception of its elevation? The issue is studied by means of an objective analysis of a database of distance-dependent head-related transfer functions (HRTFs) of a KEMAR (Knowles Electronic Manikin for Acoustic Research) mannequin with different pinnae on a dense spatial grid. Iso-directional HRTFs are compared through spectral error metrics; results indicate significant distance-dependent HRTF modifications due to the pinna occur when the source is close to the interaural axis.
Affiliation(s)
- Simone Spagnol
- Department of Information Engineering, University of Padova, Padova 35131, Italy
4
Spagnol S, Geronazzo M, Rocchesso D, Avanzini F. Synthetic individual binaural audio delivery by pinna image processing. International Journal of Pervasive Computing and Communications 2014. [DOI: 10.1108/ijpcc-06-2014-0035] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Purpose: The purpose of this paper is to present a system for customized binaural audio delivery based on the extraction of relevant features from a 2-D representation of the listener’s pinna.
Design/methodology/approach: The most significant pinna contours are extracted by means of multi-flash imaging, and they provide values for the parameters of a structural head-related transfer function (HRTF) model. The HRTF model spatializes a given sound file according to the listener’s head orientation, tracked by sensor-equipped headphones, with respect to the virtual sound source.
Findings: A preliminary localization test shows that the model is able to statically render the elevation of a virtual sound source better than non-individual HRTFs.
Research limitations/implications: Results encourage a deeper analysis of the psychoacoustic impact that the individualized HRTF model has on perceived elevation of virtual sound sources.
Practical implications: The model has low complexity and is suitable for implementation on mobile devices. The resulting hardware/software package will hopefully allow an easy and low-tech fruition of custom spatial audio to any user.
Originality/value: The authors show that custom binaural audio can be successfully deployed without the need of cumbersome subjective measurements.
5
Alves-Pinto A, Palmer AR, Lopez-Poveda EA. Perception and coding of high-frequency spectral notches: potential implications for sound localization. Front Neurosci 2014; 8:112. [PMID: 24904258 PMCID: PMC4034511 DOI: 10.3389/fnins.2014.00112] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/16/2013] [Accepted: 04/29/2014] [Indexed: 11/13/2022]
Abstract
The interaction of sound waves with the human pinna introduces high-frequency notches (5-10 kHz) in the stimulus spectrum that are thought to be useful for vertical sound localization. A common view is that these notches are encoded as rate profiles in the auditory nerve (AN). Here, we review previously published psychoacoustical evidence in humans and computer-model simulations of inner hair cell responses to noises with and without high-frequency spectral notches that dispute this view. We also present new recordings from guinea pig AN and "ideal observer" analyses of these recordings that suggest that discrimination between noises with and without high-frequency spectral notches is probably based on the information carried in the temporal pattern of AN discharges. The exact nature of the neural code involved remains nevertheless uncertain: computer model simulations suggest that high-frequency spectral notches are encoded in spike timing patterns that may be operant in the 4-7 kHz frequency regime, while "ideal observer" analysis of experimental neural responses suggests that an effective cue for high-frequency spectral discrimination may be based on sampling rates of spike arrivals of AN fibers using non-overlapping time binwidths of between 4 and 9 ms. Neural responses show that sensitivity to high-frequency notches is greater for fibers with low and medium spontaneous rates than for fibers with high spontaneous rates. Based on this evidence, we conjecture that inter-subject variability in high-frequency spectral notch detection and, consequently, in vertical sound localization may partly reflect individual differences in the available number of functional medium- and low-spontaneous-rate fibers.
Affiliation(s)
- Ana Alves-Pinto
- Klinikum rechts der Isar, Technische Universität München, Munich, Germany
- Alan R. Palmer
- Medical Research Council Institute of Hearing Research, University Park, Nottingham, UK
- Enrique A. Lopez-Poveda
- Departamento de Cirugía, Facultad de Medicina, Instituto de Neurociencias de Castilla y León, Instituto de Investigación Biomédica de Salamanca, Universidad de Salamanca, Salamanca, Spain
6
Suied C, Agus TR, Thorpe SJ, Mesgarani N, Pressnitzer D. Auditory gist: recognition of very short sounds from timbre cues. The Journal of the Acoustical Society of America 2014; 135:1380-1391. [PMID: 24606276 DOI: 10.1121/1.4863659] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 06/03/2023]
Abstract
Sounds such as the voice or musical instruments can be recognized on the basis of timbre alone. Here, sound recognition was investigated with severely reduced timbre cues. Short snippets of naturally recorded sounds were extracted from a large corpus. Listeners were asked to report a target category (e.g., sung voices) among other sounds (e.g., musical instruments). All sound categories covered the same pitch range, so the task had to be solved on timbre cues alone. The minimum duration for which performance was above chance was found to be short, on the order of a few milliseconds, with the best performance for voice targets. Performance was independent of pitch and was maintained when stimuli contained less than a full waveform cycle. Recognition was not generally better when the sound snippets were time-aligned with the sound onset compared to when they were extracted with a random starting time. Finally, performance did not depend on feedback or training, suggesting that the cues used by listeners in the artificial gating task were similar to those relevant for longer, more familiar sounds. The results show that timbre cues for sound recognition are available at a variety of time scales, including very short ones.
Affiliation(s)
- Clara Suied
- Institut de Recherche Biomédicale des Armées, Département Action et Cognition en Situation Opérationnelle, 91223 Brétigny sur Orge, France
- Trevor R Agus
- Sonic Arts Research Centre, School of Creative Arts, 1 Cloreen Park, Queen's University Belfast, Belfast, BT7 1NN, United Kingdom
- Simon J Thorpe
- Centre de Recherche Cerveau et Cognition, UMR 5549, CNRS and Université Paul Sabatier, Toulouse, France
- Nima Mesgarani
- Departments of Neurological Surgery and Physiology, UCSF Center for Integrative Neuroscience, University of California, San Francisco, California 94143
- Daniel Pressnitzer
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS and École normale supérieure, 29 rue d'Ulm, 75005 Paris, France
7
Macpherson EA, Sabin AT. Vertical-plane sound localization with distorted spectral cues. Hear Res 2013; 306:76-92. [PMID: 24076423 DOI: 10.1016/j.heares.2013.09.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/16/2013] [Revised: 09/11/2013] [Accepted: 09/17/2013] [Indexed: 10/26/2022]
Abstract
For human listeners, the primary cues for localization in the vertical plane are provided by the direction-dependent filtering of the pinnae, head, and upper body. Vertical-plane localization generally is accurate for broadband sounds, but when such sounds are presented at near-threshold levels or at high levels with short durations (<20 ms), the apparent location is biased toward the horizontal plane (i.e., elevation gain <1). We tested the hypothesis that these effects result in part from distorted peripheral representations of sound spectra. Human listeners indicated the apparent position of 100-ms, 50-60 dB SPL, wideband noise-burst targets by orienting their heads. The targets were synthesized in virtual auditory space and presented over headphones. Faithfully synthesized targets were interleaved with targets for which the directional transfer function spectral notches were filled in, peaks were leveled off, or the spectral contrast of the entire profile was reduced or expanded. As notches were filled in progressively or peaks leveled progressively, elevation gain decreased in a graded manner similar to that observed as sensation level is reduced below 30 dB or, for brief sounds, increased above 45 dB. As spectral contrast was reduced, gain dropped only at the most extreme reduction (25% of normal). Spectral contrast expansion had little effect. The results are consistent with the hypothesis that loss of representation of spectral features contributes to reduced elevation gain at low and high sound levels. The results also suggest that perceived location depends on a correlation-like spectral matching process that is sensitive to the relative, rather than absolute, across-frequency shape of the spectral profile.
Affiliation(s)
- Ewan A Macpherson
- Kresge Hearing Research Institute, University of Michigan Medical School, 1150 W. Medical Center Drive, Ann Arbor, MI 48109-5616, USA; National Centre for Audiology, Western University, 1201 Western Road, London, Ontario, Canada N6G 1H1.
8
Size does not matter: size-invariant echo-acoustic object classification. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2012. [DOI: 10.1007/s00359-012-0777-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
9
Dubno JR, Ahlstrom JB, Wang X, Horwitz AR. Level-dependent changes in perception of speech envelope cues. J Assoc Res Otolaryngol 2012; 13:835-52. [PMID: 22872414 DOI: 10.1007/s10162-012-0343-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/17/2012] [Accepted: 07/16/2012] [Indexed: 11/28/2022]
Abstract
Level-dependent changes in temporal envelope fluctuations in speech and related changes in speech recognition may reveal effects of basilar-membrane nonlinearities. As a result of compression in the basilar-membrane response, the "effective" magnitude of envelope fluctuations may be reduced as speech level increases from lower level (more linear) to mid-level (more compressive) regions. With further increases to a more linear region, speech envelope fluctuations may become more pronounced. To assess these effects, recognition of consonants and key words in sentences was measured as a function of speech level for younger adults with normal hearing. Consonant-vowel syllables and sentences were spectrally degraded using "noise vocoder" processing to maximize perceptual effects of changes to the speech envelope. Broadband noise at a fixed signal-to-noise ratio maintained constant audibility as speech level increased. Results revealed significant increases in scores and envelope-dependent feature transmission from 45 to 60 dB SPL and decreasing scores and feature transmission from 60 to 85 dB SPL. This quadratic pattern, with speech recognition maximized at mid levels and poorer at lower and higher levels, is consistent with a role of cochlear nonlinearities in perception of speech envelope cues.
Affiliation(s)
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC 29425-5500, USA.
10
Linnenschmidt M, Beedholm K, Wahlberg M, Højer-Kristensen J, Nachtigall PE. Keeping returns optimal: gain control exerted through sensitivity adjustments in the harbour porpoise auditory system. Proc Biol Sci 2012; 279:2237-45. [PMID: 22279169 DOI: 10.1098/rspb.2011.2465] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]
Abstract
Animals that use echolocation (biosonar) listen to acoustic signals with a large range of intensities, because echo levels vary with the fourth power of the animal's distance to the target. In man-made sonar, engineers apply automatic gain control to stabilize the echo energy levels, thereby rendering them independent of distance to the target. Both toothed whales and bats vary the level of their echolocation clicks to compensate for the distance-related energy loss. By monitoring the auditory brainstem response (ABR) during a psychophysical task, we found that a harbour porpoise (Phocoena phocoena), in addition to adjusting the sound level of the outgoing signals up to 5.4 dB, also reduces its ABR threshold by 6 dB when the target distance doubles. This self-induced threshold shift increases the dynamic range of the biosonar system and compensates for half of the variation of energy that is caused by changes in the distance to the target. In combination with an increased source level as a function of target range, this helps the porpoise to maintain a stable echo-evoked ABR amplitude irrespective of target range, and is therefore probably an important tool enabling porpoises to efficiently analyse and classify received echoes.
Affiliation(s)
- Meike Linnenschmidt
- Institute of Biology, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark.
11
Reiss LAJ, Ramachandran R, May BJ. Effects of signal level and background noise on spectral representations in the auditory nerve of the domestic cat. J Assoc Res Otolaryngol 2010; 12:71-88. [PMID: 20824483 DOI: 10.1007/s10162-010-0232-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 03/23/2009] [Accepted: 08/09/2010] [Indexed: 12/22/2022]
Abstract
Background noise poses a significant obstacle for auditory perception, especially among individuals with hearing loss. To better understand the physiological basis of this perceptual impediment, the present study evaluated the effects of background noise on the auditory nerve representation of head-related transfer functions (HRTFs). These complex spectral shapes describe the directional filtering effects of the head and torso. When a broadband sound passes through the outer ear en route to the tympanic membrane, the HRTF alters its spectrum in a manner that establishes the perceived location of the sound source. HRTF-shaped noise shares many of the acoustic features of human speech, while communicating biologically relevant localization cues that are generalized across mammalian species. Previous studies have used parametric manipulations of random spectral shapes to elucidate HRTF coding principles at various stages of the cat's auditory system. This study extended that body of work by examining the effects of sound level and background noise on the quality of spectral coding in the auditory nerve. When fibers were classified by their spontaneous rates, the coding properties of the more numerous low-threshold, high-spontaneous rate fibers were found to degrade at high presentation levels and in low signal-to-noise ratios. Because cats are known to maintain accurate directional hearing under these challenging listening conditions, behavioral performance may be disproportionally based on the enhanced dynamic range of the less common high-threshold, low-spontaneous rate fibers.
Affiliation(s)
- Lina A J Reiss
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21205, USA.
12
Malinina ES, Andreeva IG, Altman YA. Localization of virtual stimuli moving in the median plane by listeners of different age. J EVOL BIOCHEM PHYS+ 2009. [DOI: 10.1134/s0022093009020091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
13
Alves-Pinto A, Lopez-Poveda EA. Psychophysical assessment of the level-dependent representation of high-frequency spectral notches in the peripheral auditory system. The Journal of the Acoustical Society of America 2008; 124:409-421. [PMID: 18646986 DOI: 10.1121/1.2920957] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
To discriminate between broadband noises with and without a high-frequency spectral notch is more difficult at 70-80 dB sound pressure level than at lower or higher levels [Alves-Pinto, A. and Lopez-Poveda, E. A. (2005). "Detection of high-frequency spectral notches as a function of level," J. Acoust. Soc. Am. 118, 2458-2469]. One possible explanation is that the notch is less clearly represented internally at 70-80 dB SPL than at any other level. To test this hypothesis, forward-masking patterns were measured for flat-spectrum and notched noise maskers for masker levels of 50, 70, 80, and 90 dB SPL. Masking patterns were measured in two conditions: (1) fixing the masker-probe time interval at 2 ms and (2) varying the interval to achieve similar masked thresholds for different masker levels. The depth of the spectral notch remained approximately constant in the fixed-interval masking patterns and gradually decreased with increasing masker level in the variable-interval masking patterns. This difference probably reflects the effects of peripheral compression. These results are inconsistent with the nonmonotonic level-dependent performance in spectral discrimination. Assuming that a forward-masking pattern is a reasonable psychoacoustical correlate of the auditory-nerve rate-profile representation of the stimulus spectrum, these results undermine the common view that high-frequency spectral notches must be encoded in the rate-profile of auditory-nerve fibers.
Affiliation(s)
- Ana Alves-Pinto
- Unidad de Audición Computacional y Psicoacústica, Instituto de Neurociencias de Castilla y León, Universidad de Salamanca, Avenida Alfonso X "El Sabio" s/n, 37007 Salamanca, Spain.
14
Lopez-Poveda EA, Alves-Pinto A, Palmer AR, Eustaquio-Martín A. Rate versus time representation of high-frequency spectral notches in the peripheral auditory system: A computational modeling study. Neurocomputing 2008. [DOI: 10.1016/j.neucom.2007.07.030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/22/2022]
15
Lopez-Poveda EA. Spectral processing by the peripheral auditory system: facts and models. International Review of Neurobiology 2005; 70:7-48. [PMID: 16472630 DOI: 10.1016/s0074-7742(05)70001-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 12/02/2022]
Affiliation(s)
- Enrique A Lopez-Poveda
- Instituto de Neurociencias de Castilla y León, Universidad de Salamanca, Salamanca 37007, Spain