1
Chou KF, Boyd AD, Best V, Colburn HS, Sen K. A biologically oriented algorithm for spatial sound segregation. Front Neurosci 2022;16:1004071. PMID: 36312015; PMCID: PMC9614053; DOI: 10.3389/fnins.2022.1004071.
Abstract
Listening in an acoustically cluttered scene remains a difficult task for both machines and hearing-impaired listeners. Normal-hearing listeners accomplish this task with relative ease by segregating the scene into its constituent sound sources, then selecting and attending to a target source. An assistive listening device that mimics the biological mechanisms underlying this behavior may provide an effective solution for those with difficulty listening in acoustically cluttered environments (e.g., a cocktail party). Here, we present a binaural sound segregation algorithm based on a hierarchical network model of the auditory system. In the algorithm, binaural sound inputs first drive populations of neurons tuned to specific spatial locations and frequencies. The spiking responses of neurons in the output layer are then reconstructed into audible waveforms via a novel reconstruction method. We evaluate the performance of the algorithm with a speech-on-speech intelligibility task in normal-hearing listeners. This two-microphone-input algorithm is shown to provide listeners with perceptual benefit similar to that of a 16-microphone acoustic beamformer. These results demonstrate the promise of this biologically inspired algorithm for enhancing selective listening in challenging multi-talker scenes.
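The spatially tuned neurons in this kind of model derive their tuning from binaural cues such as the interaural time difference (ITD). As a minimal illustration of that cue, not the authors' network, an ITD can be estimated by cross-correlating the two microphone channels over physiologically plausible lags; the function name and test signal here are illustrative only:

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """Estimate the interaural time difference between two channels by
    locating the peak of their cross-correlation over plausible lags.
    A positive ITD means the right channel lags the left."""
    max_lag = int(round(max_itd_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    # Correlation at each lag; np.roll is acceptable here because the
    # test signal below is periodic over the whole buffer.
    xcorr = [np.sum(left * np.roll(right, -lag)) for lag in lags]
    return lags[int(np.argmax(xcorr))] / fs

# A 500 Hz tone delayed by 10 samples in the right channel.
fs = 16000
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 500 * t)
right = np.roll(left, 10)            # 10/16000 s = 625 us ITD
itd = estimate_itd(left, right, fs)
```

Restricting the lag search to about ±1 ms mirrors the largest ITD a human head produces, which also resolves the phase ambiguity of narrowband signals in this toy example.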
Affiliation(s)
- Kenny F. Chou
- Department of Biomedical Engineering, Boston University, Boston, MA, United States
- Alexander D. Boyd
- Department of Biomedical Engineering, Boston University, Boston, MA, United States
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, MA, United States
- H. Steven Colburn
- Department of Biomedical Engineering, Boston University, Boston, MA, United States
- Kamal Sen
- Department of Biomedical Engineering, Boston University, Boston, MA, United States
- *Correspondence: Kamal Sen
2
Yun D, Jennings TR, Kidd G, Goupell MJ. Benefits of triple acoustic beamforming during speech-on-speech masking and sound localization for bilateral cochlear-implant users. J Acoust Soc Am 2021;149:3052. PMID: 34241104; PMCID: PMC8102069; DOI: 10.1121/10.0003933.
Abstract
Bilateral cochlear-implant (CI) users struggle to understand speech in noisy environments despite receiving some spatial-hearing benefits. One potential solution is to provide acoustic beamforming. A headphone-based experiment was conducted to compare speech understanding under natural CI listening conditions and for two non-adaptive beamformers, one single beam and one binaural, called "triple beam," which provides an improved signal-to-noise ratio (beamforming benefit) and usable spatial cues by reintroducing interaural level differences. Speech reception thresholds (SRTs) for speech-on-speech masking were measured with target speech presented in front and two maskers in co-located or narrow/wide separations. Numerosity judgments and sound-localization performance also were measured. Natural spatial cues, single-beam, and triple-beam conditions were compared. For CI listeners, there was a negligible change in SRTs when comparing co-located to separated maskers for natural listening conditions. In contrast, there were 4.9- and 16.9-dB improvements in SRTs for the single beam and 3.5- and 12.3-dB improvements for triple beam (narrow and wide separations). Similar results were found for normal-hearing listeners presented with vocoded stimuli. Single beam improved speech-on-speech masking performance but yielded poor sound localization. Triple beam improved both speech-on-speech masking performance (albeit less than the single beam) and sound localization. Thus, triple beam was the most versatile across multiple spatial-hearing domains.
Affiliation(s)
- David Yun
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Todd R Jennings
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Matthew J Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
3
Wang L, Best V, Shinn-Cunningham BG. Benefits of Beamforming With Local Spatial-Cue Preservation for Speech Localization and Segregation. Trends Hear 2020;24:2331216519896908. PMID: 31931677; PMCID: PMC6961143; DOI: 10.1177/2331216519896908.
Abstract
A study was conducted to examine the benefits afforded by a signal-processing strategy that imposes the binaural cues present in a natural signal, calculated locally in time and frequency, on the output of a beamforming microphone array. Such a strategy has the potential to combine the signal-to-noise ratio advantage of beamforming with the perceptual benefit of spatialization to enhance performance in multitalker mixtures. Participants with normal hearing and with hearing loss were tested on both speech localization and speech-on-speech masking tasks. Performance for the spatialized beamformer was compared with that for three other conditions: a reference condition with no processing, a beamformer with no spatialization, and a hybrid beamformer that operates only in the high frequencies to preserve natural binaural cues in the low frequencies. Beamforming with full-bandwidth spatialization supported speech localization and produced better speech reception thresholds than the other conditions.
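The core operation, copying the locally measured interaural differences onto the monaural beamformer output, can be sketched per time-frequency bin. This is a simplified stand-in for the paper's strategy; the STFT variable names and the single-reference-ear simplification are assumptions for illustration:

```python
import numpy as np

def respatialize(beam_stft, left_stft, right_stft, eps=1e-12):
    """Impose the natural interaural level and phase differences of the
    raw left/right microphone signals onto a monaural beamformer output,
    independently in each time-frequency bin."""
    # Interaural transfer function per bin (right ear relative to left).
    itf = right_stft / (left_stft + eps)
    return beam_stft, beam_stft * itf    # (left output, right output)

# Toy STFTs: the raw channels differ by a fixed level and phase offset.
rng = np.random.default_rng(0)
shape = (4, 8)                                  # (frames, bins)
left = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
cue = 0.5 * np.exp(0.3j)     # level ratio of 0.5 (about -6 dB), 0.3 rad phase
right = left * cue
beam = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
out_l, out_r = respatialize(beam, left, right)
```

Because the cues are reapplied locally per bin, a listener receives both the beamformer's noise reduction and interaural differences that track the original scene, which is what supports localization in the spatialized condition.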
Affiliation(s)
- Le Wang
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Virginia Best
- Department of Speech, Language and Hearing Sciences, Boston University, Boston, MA, USA
4
Villard S, Kidd G. Assessing the benefit of acoustic beamforming for listeners with aphasia using modified psychoacoustic methods. J Acoust Soc Am 2020;148:2894. PMID: 33261373; PMCID: PMC8097716; DOI: 10.1121/10.0002454.
Abstract
Acoustic beamforming has been shown to improve identification of target speech in noisy listening environments for individuals with sensorineural hearing loss. This study examined whether beamforming would provide a similar benefit for individuals with aphasia (acquired neurological language impairment). The benefit of beamforming was examined for persons with aphasia (PWA) and age- and hearing-matched controls in both a speech masking condition and a speech-shaped, speech-modulated noise masking condition. Performance was measured when natural spatial cues were provided, as well as when the target speech level was enhanced via a single-channel beamformer. Because typical psychoacoustic methods may present substantial experimental confounds for PWA, clinically guided modifications of experimental procedures were determined individually for each PWA participant. Results indicated that the beamformer provided a significant overall benefit to listeners. On an individual level, both PWA and controls who exhibited poorer performance on the speech masking condition with spatial cues benefited from the beamformer, while those who achieved better performance with spatial cues did not. All participants benefited from the beamformer in the noise masking condition. The findings suggest that a spatially tuned hearing aid may be beneficial for older listeners with relatively mild hearing loss who have difficulty taking advantage of spatial cues.
Affiliation(s)
- Sarah Villard
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Gerald Kidd
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
5
6
Kidd G, Mason CR, Best V, Swaminathan J. Benefits of Acoustic Beamforming for Solving the Cocktail Party Problem. Trends Hear 2015;19:2331216515593385. PMID: 26126896; PMCID: PMC4509760; DOI: 10.1177/2331216515593385.
Abstract
The benefit provided to listeners with sensorineural hearing loss (SNHL) by an acoustic beamforming microphone array was determined in a speech-on-speech masking experiment. Normal-hearing controls were tested as well. For the SNHL listeners, prescription-determined gain was applied to the stimuli, and performance using the beamformer was compared with that obtained using bilateral amplification. The listener identified speech from a target talker located straight ahead (0° azimuth) in the presence of four competing talkers that were either colocated with, or spatially separated from, the target. The stimuli were spatialized using measured impulse responses and presented via earphones. In the spatially separated masker conditions, the four maskers were arranged symmetrically around the target at ±15° and ±30° or at ±45° and ±90°. Results revealed that masked speech reception thresholds for spatially separated maskers were higher (poorer) on average for the SNHL than for the normal-hearing listeners. For most SNHL listeners in the wider masker separation condition, lower thresholds were obtained through the microphone array than through bilateral amplification. Large intersubject differences were found in both listener groups. The best masked speech reception thresholds overall were found for a hybrid condition that combined natural and beamforming listening in order to preserve localization for broadband sources.
7
Kidd G, Favrot S, Desloge JG, Streeter TM, Mason CR. Design and preliminary testing of a visually guided hearing aid. J Acoust Soc Am 2013;133:EL202-EL207. PMID: 23464129; PMCID: PMC3585754; DOI: 10.1121/1.4791710.
Abstract
An approach to hearing aid design is described, and preliminary acoustical and perceptual measurements are reported, in which an acoustic beamforming microphone array is coupled to an eyeglasses-mounted eye-tracker. This visually guided hearing aid (VGHA), currently a laboratory-based prototype, senses direction of gaze using the eye tracker, and an interface converts those values into control signals that steer the acoustic beam accordingly. Preliminary speech intelligibility measurements with noise and speech maskers revealed near-normal or better-than-normal spatial release from masking with the VGHA. Although not yet a wearable prosthesis, the principle underlying the device is supported by these findings.
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA.
8
Speech/Music Classification Based on Distributed Evolutionary Fuzzy Logic for Intelligent Audio Coding. Pattern Recognition and Image Analysis 2007. DOI: 10.1007/978-3-540-72849-8_70.
9
Schwetz I, Gruhler G, Obermayer K. A cross-spectrum weighting algorithm for speech enhancement and array processing: combining phase-shift information and stationary signal properties. J Acoust Soc Am 2006;119:952-964. PMID: 16521757; DOI: 10.1121/1.2149767.
Abstract
In this paper, a gain function for noise cancellation with a two-channel microphone array is presented. This gain function combines ideas from one- and multichannel algorithms. It is developed using a minimum mean square error estimator for the amplitude of the speech signal from the cross spectrum between the two microphone signals. To account for speech pauses and the absence of spectral components of the speech, an extension of this gain function is presented. The performance of the overall gain function is shown in terms of the cancellation of (diffuse) driving noise as well as the cancellation of an interfering speech signal, both recorded in a car.
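A related, though simpler, two-channel gain in the same spirit is the magnitude-squared coherence between the microphones: components that are correlated across the pair (a coherent speech source) receive a gain near one, while diffuse noise, which decorrelates between microphones, is attenuated. This sketch is illustrative and is not the paper's MMSE estimator; the smoothing constant and variable names are assumptions:

```python
import numpy as np

def coherence_gain(x_stft, y_stft, alpha=0.8, eps=1e-12):
    """Per-bin magnitude-squared coherence between two microphone
    channels, tracked with first-order recursive smoothing and usable
    as a spectral gain in [0, 1]."""
    n_frames, n_bins = x_stft.shape
    sxx = np.full(n_bins, eps)             # auto-spectrum, channel x
    syy = np.full(n_bins, eps)             # auto-spectrum, channel y
    sxy = np.zeros(n_bins, dtype=complex)  # cross-spectrum
    gain = np.empty((n_frames, n_bins))
    for f in range(n_frames):
        sxx = alpha * sxx + (1 - alpha) * np.abs(x_stft[f]) ** 2
        syy = alpha * syy + (1 - alpha) * np.abs(y_stft[f]) ** 2
        sxy = alpha * sxy + (1 - alpha) * x_stft[f] * np.conj(y_stft[f])
        gain[f] = np.abs(sxy) ** 2 / (sxx * syy)
    return gain

# Coherent source (identical channels) vs. uncorrelated "diffuse" noise.
rng = np.random.default_rng(1)
speech = rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))
noise = rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))
g_coherent = coherence_gain(speech, speech)
g_diffuse = coherence_gain(speech, noise)
```

The recursive smoothing is what distinguishes stationary diffuse noise from speech: without averaging over frames, the single-frame coherence is identically one.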
10
Lockwood ME, Jones DL. Beamformer performance with acoustic vector sensors in air. J Acoust Soc Am 2006;119:608-619. PMID: 16454314; DOI: 10.1121/1.2139073.
Abstract
For some time, compact acoustic vector sensors (AVSs) capable of sensing particle velocity in three orthogonal directions have been used in underwater acoustic sensing applications. Potential advantages of using AVSs in air include substantial noise reduction with a very small aperture and few channels. For this study, a four-microphone array approximating a small (1 cm³) AVS in air was constructed using three gradient microphones and one omnidirectional microphone. This study evaluates the signal extraction performance of one nonadaptive and four adaptive beamforming algorithms. Test signals, consisting of two to five speech sources, were processed with each algorithm, and the signal extraction performance was quantified by calculating the signal-to-noise ratio (SNR) of the output. For a three-microphone array, robust and nonrobust versions of a frequency-domain minimum-variance (FMV) distortionless-response beamformer produced SNR improvements of 11 to 14 dB, and a generalized sidelobe canceller (GSC) produced improvements of 5.5 to 8.5 dB. In comparison, a two-microphone omnidirectional array with a spacing of 15 cm yielded slightly lower SNR improvements for similar multi-interferer speech signals.
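The SNR improvement figures quoted here are simply output SNR minus input SNR, each computed from the separated target and residual noise. A minimal sketch of that bookkeeping (the function names and the synthetic -6 dB example are illustrative, not the study's evaluation code):

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in dB from a target signal and the accompanying noise."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def snr_improvement_db(sig_out, noise_out, sig_in, noise_in):
    """Beamformer benefit: output SNR minus input SNR."""
    return snr_db(sig_out, noise_out) - snr_db(sig_in, noise_in)

# A beamformer that passes the target unchanged while attenuating the
# noise by 6 dB yields a 6 dB SNR improvement.
rng = np.random.default_rng(2)
sig = rng.standard_normal(1000)
noise = rng.standard_normal(1000)
atten = 10 ** (-6 / 20)          # -6 dB amplitude factor
imp = snr_improvement_db(sig, atten * noise, sig, noise)
```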
Affiliation(s)
- Michael E Lockwood
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 North Mathews Avenue, Urbana, Illinois 61801, USA.
11
Sato H, Bradley JS, Morimoto M. Using listening difficulty ratings of conditions for speech communication in rooms. J Acoust Soc Am 2005;117:1157-1167. PMID: 15807005; DOI: 10.1121/1.1849936.
Abstract
The use of listening difficulty ratings of speech communication in rooms is explored because, in common situations, word recognition scores do not discriminate well among conditions that are near to acceptable. In particular, the benefits of early reflections of speech sounds on listening difficulty were investigated and compared to the known benefits to word intelligibility scores. Listening tests were used to assess word intelligibility and perceived listening difficulty of speech in simulated sound fields. The experiments were conducted in three types of sound fields with constant levels of ambient noise: only direct sound, direct sound with early reflections, and direct sound with early reflections and reverberation. The results demonstrate that (1) listening difficulty can better discriminate among these conditions than can word recognition scores; (2) added early reflections increase the effective signal-to-noise ratio equivalent to the added energy in the conditions without reverberation; (3) the benefit of early reflections on difficulty scores is greater than expected from the simple increase in early arriving speech energy with reverberation; and (4) word intelligibility tests are most appropriate for conditions with signal-to-noise (S/N) ratios below 0 dBA, whereas for S/N ratios between 0 and 15 dBA, listening difficulty is the more appropriate evaluation tool.
Affiliation(s)
- Hiroshi Sato
- Institute for Research in Construction, National Research Council, Ottawa, K1A 0R6 Canada.
12
Lockwood ME, Jones DL, Bilger RC, Lansing CR, O'Brien WD, Wheeler BC, Feng AS. Performance of time- and frequency-domain binaural beamformers based on recorded signals from real rooms. J Acoust Soc Am 2004;115:379-391. PMID: 14759029; DOI: 10.1121/1.1624064.
Abstract
Extraction of a target sound source amidst multiple interfering sound sources is difficult when there are fewer sensors than sources, as is the case for human listeners in the classic cocktail-party situation. This study compares the signal extraction performance of five algorithms using recordings of speech sources made with three different two-microphone arrays in three rooms of varying reverberation time. Test signals, consisting of two to five speech sources, were constructed for each room and array. The signals were processed with each algorithm, and the signal extraction performance was quantified by calculating the signal-to-noise ratio of the output. A frequency-domain minimum-variance distortionless-response beamformer outperformed the time-domain based Frost beamformer and generalized sidelobe canceler for all tests with two or more interfering sound sources, and performed comparably or better than the time-domain algorithms for tests with one interfering sound source. The frequency-domain minimum-variance algorithm offered performance comparable to that of the Peissig-Kollmeier binaural frequency-domain algorithm, but with much less distortion of the target signal. Comparisons were also made to a simple beamformer. In addition, computer simulations illustrate that, when processing speech signals, the chosen implementation of the frequency-domain minimum-variance technique adapts more quickly and accurately than time-domain techniques.
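The frequency-domain minimum-variance distortionless-response (MVDR) beamformer that performs best here solves, per frequency bin, minimize w^H R w subject to w^H d = 1, giving w = R^-1 d / (d^H R^-1 d). A minimal per-bin sketch under assumed toy steering vectors (the diagonal loading is a standard robustness device, not a detail taken from this paper):

```python
import numpy as np

def mvdr_weights(R, d, load=1e-3):
    """MVDR weights for one frequency bin: minimize output power subject
    to unit (distortionless) gain toward the steering vector d.
    w = R^-1 d / (d^H R^-1 d), with diagonal loading for robustness."""
    n = len(d)
    Rl = R + load * (np.trace(R).real / n) * np.eye(n)
    ri_d = np.linalg.solve(Rl, d)           # R^-1 d without explicit inverse
    return ri_d / (np.conj(d) @ ri_d)

# Two microphones: target from broadside, one strong off-axis interferer.
d = np.array([1.0, 1.0], dtype=complex)     # target steering vector
v = np.array([1.0, 1j], dtype=complex)      # interferer steering vector
R = 100.0 * np.outer(v, np.conj(v)) + 0.01 * np.eye(2)  # interferer + noise
w = mvdr_weights(R, d)

target_gain = np.conj(w) @ d      # about 1 (distortionless constraint)
interferer_gain = np.conj(w) @ v  # strongly suppressed
```

Solving this independently per bin is what lets the frequency-domain variant adapt quickly to speech, whose interferers occupy different bins from moment to moment.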
Affiliation(s)
- Michael E Lockwood
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 North Mathews Ave., Urbana, Illinois 61801, USA.