1
2
A novel cascade structure for joint backward blind acoustic noise and echo cancellation systems. SN Applied Sciences 2020. [DOI: 10.1007/s42452-020-2942-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
3
Lee H. Simultaneous Blind Separation and Recognition of Speech Mixtures Using Two Microphones to Control a Robot Cleaner. Int J Adv Robot Syst 2017. [DOI: 10.5772/55408] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Affiliation(s)
- Heungkyu Lee
- Speech Group, Future IT R&D Lab, LG Electronics Advanced Research Institute, Seoul, Republic of Korea
4
Deleforge A, Forbes F, Horaud R. Acoustic Space Learning for Sound-Source Separation and Localization on Binaural Manifolds. Int J Neural Syst 2015; 25:1440003. [DOI: 10.1142/s0129065714400036] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Indexed: 11/18/2022]
Abstract
In this paper, we address the problems of modeling the acoustic space generated by a full-spectrum sound source and using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A nonlinear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound-source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound-source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL) yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.
Affiliation(s)
- Antoine Deleforge
- INRIA Grenoble Rhône-Alpes, 655 Avenue de l'Europe, Saint-Ismier, 38334, France
- Florence Forbes
- INRIA Grenoble Rhône-Alpes, 655 Avenue de l'Europe, Saint-Ismier, 38334, France
- Radu Horaud
- INRIA Grenoble Rhône-Alpes, 655 Avenue de l'Europe, Saint-Ismier, 38334, France
5
Reindl K, Zheng Y, Schwarz A, Meier S, Maas R, Sehr A, Kellermann W. A stereophonic acoustic signal extraction scheme for noisy and reverberant environments. Comput Speech Lang 2013. [DOI: 10.1016/j.csl.2012.07.011] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
6
Yoshioka T, Nakatani T. Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening. IEEE Transactions on Audio, Speech, and Language Processing 2012. [DOI: 10.1109/tasl.2012.2210879] [Citation(s) in RCA: 121] [Impact Index Per Article: 10.1] [Indexed: 11/09/2022]
7
Practically Efficient Blind Speech Separation Using Frequency Band Selection Based on Magnitude Squared Coherence and a Small Dodecahedral Microphone Array. Journal of Electrical and Computer Engineering 2012. [DOI: 10.1155/2012/324398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
Abstract
Small agglomerative microphone array systems have been proposed for use with speech communication and recognition systems. Blind source separation methods based on frequency-domain independent component analysis have shown significant separation performance, and the microphone arrays are small enough to make them portable. However, the computational complexity is very high because the conventional signal collection and processing method uses 60 microphones. In this paper, we propose a band selection method based on magnitude squared coherence. Frequency bands are selected based on the spatial and geometric characteristics of the microphone array device, which are strongly related to its dodecahedral shape, and the selected bands are nonuniformly spaced. The estimated reduction in computational complexity is 90%, with a 68% reduction in the number of frequency bands. Separation performance achieved during our experimental evaluation was 7.45 dB (signal-to-noise ratio) and 2.30 dB (cepstral distortion). These results show an improvement in performance compared to the use of uniformly spaced frequency bands.
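The magnitude squared coherence mentioned in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Welch-style estimator (Hann window, 50% overlap), the function names `magnitude_squared_coherence` and `select_bands`, and the threshold rule are all assumptions made for illustration, since the abstract does not state how the bands are chosen.

```python
import numpy as np

def magnitude_squared_coherence(x, y, nperseg=256):
    """Welch-style MSC estimate between two equal-length 1-D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hop = nperseg // 2                       # 50% overlap (assumed)
    window = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)
    # Windowed short-time spectra, one row per segment
    X = np.array([np.fft.rfft(window * x[s:s + nperseg]) for s in starts])
    Y = np.array([np.fft.rfft(window * y[s:s + nperseg]) for s in starts])
    pxx = np.mean(np.abs(X) ** 2, axis=0)    # auto-spectrum of x
    pyy = np.mean(np.abs(Y) ** 2, axis=0)    # auto-spectrum of y
    pxy = np.mean(X * np.conj(Y), axis=0)    # cross-spectrum
    # MSC = |Pxy|^2 / (Pxx * Pyy); small constant guards against 0/0
    return np.abs(pxy) ** 2 / (pxx * pyy + 1e-12)

def select_bands(msc, threshold=0.5):
    """Hypothetical placeholder rule: keep bins with MSC below threshold."""
    return np.flatnonzero(msc < threshold)
```

For two microphone signals `x` and `y`, `select_bands(magnitude_squared_coherence(x, y))` returns the indices of the frequency bins retained under this placeholder rule; the paper's actual criterion depends on the array geometry and is not reproduced here.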
8
Koldovsky Z, Tichavsky P. Time-Domain Blind Separation of Audio Sources on the Basis of a Complete ICA Decomposition of an Observation Space. IEEE Transactions on Audio, Speech, and Language Processing 2011. [DOI: 10.1109/tasl.2010.2049411] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 11/08/2022]
9
Woods WS, Merks I, Zhang T, Fitz K, Edwards B. Assessing the benefit of adaptive null-steering using real-world signals. Int J Audiol 2010; 49:434-43. [PMID: 20192874 DOI: 10.3109/14992020903518128] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
This study compared the noise reduction of adaptive null-steering and near-hypercardioid directional hearing-aid algorithms via performance on real-world signals. Using subject-individualized and generic (i.e. similar to current hearing aids), off-line frequency-domain implementations, we processed recordings made through two microphones of a BTE device worn by five subjects. Recording scenarios included homes, offices, cafés, streets, buses, and automobiles. We found that practically all (> 95% of recording time) of the adaptive noise-reduction benefit for generic implementations is below 1.2 dB, and that 96% and 92% is below 2 dB for 16- and 32-band individualized implementations, respectively. A 256-band, individualized implementation showed a majority of benefit between 1 and 4 dB. We found no extended (> 2 s) continuous periods of significant (> 2 dB) benefit for the generic adaptive implementations. The recordings, which have many independent and simultaneously active sources, spatially extended sources, significant reverberation, or combinations thereof, indicate an environment comprising few instances of high direct-to-diffuse energy situations. Combined with results from previous field trials, the evidence suggests that such an environment is common and represents a significant limitation on adaptive benefit.
10
Tonazzini A, Gerace I, Martinelli F. Multichannel blind separation and deconvolution of images for document analysis. IEEE Transactions on Image Processing 2010; 19:912-925. [PMID: 20028627 DOI: 10.1109/tip.2009.2038814] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/28/2023]
Abstract
In this paper, we apply Bayesian blind source separation (BSS) from noisy convolutive mixtures to jointly separate and restore source images degraded through unknown blur operators, and then linearly mixed. We found that this problem arises in several image processing applications, among which there are some interesting instances of degraded document analysis. In particular, the convolutive mixture model is proposed for describing multiple views of documents affected by the overlapping of two or more text patterns. We consider two different models, the interchannel model, where the data represent multispectral views of a single-sided document, and the intrachannel model, where the data are given by two sets of multispectral views of the recto and verso side of a document page. In both cases, the aim of the analysis is to recover clean maps of the main foreground text, and also to enhance and extract other document features, such as faint or masked patterns. We adopt Bayesian estimation for all the unknowns and describe the typical local correlation within the individual source images through the use of suitable Gibbs priors, accounting also for well-behaved edges in the images. This a priori information is particularly suitable for the kind of objects depicted in the images treated, i.e., homogeneous texts in homogeneous background, and, as such, is capable of stabilizing the ill-posed inverse problem considered. The method is validated through numerical and real experiments that are representative of various real scenarios.
Affiliation(s)
- Anna Tonazzini
- Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy.
11
Luts H, Eneman K, Wouters J, Schulte M, Vormann M, Buechler M, Dillier N, Houben R, Dreschler WA, Froehlich M, Puder H, Grimm G, Hohmann V, Leijon A, Lombard A, Mauler D, Spriet A. Multicenter evaluation of signal enhancement algorithms for hearing aids. The Journal of the Acoustical Society of America 2010; 127:1491-1505. [PMID: 20329849 DOI: 10.1121/1.3299168] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 05/26/2023]
Abstract
In the framework of the European HearCom project, promising signal enhancement algorithms were developed and evaluated for future use in hearing instruments. To assess the algorithms' performance, five of the algorithms were selected and implemented on a common real-time hardware/software platform. Four test centers in Belgium, The Netherlands, Germany, and Switzerland perceptually evaluated the algorithms. Listening tests were performed with large numbers of normal-hearing and hearing-impaired subjects. Three perceptual measures were used: speech reception threshold (SRT), listening effort scaling, and preference rating. Tests were carried out in two types of rooms. Speech was presented in multitalker babble arriving from one or three loudspeakers. In a pseudo-diffuse noise scenario, only one algorithm, the spatially preprocessed speech-distortion-weighted multi-channel Wiener filtering, provided a SRT improvement relative to the unprocessed condition. Despite the general lack of improvement in SRT, some algorithms were preferred over the unprocessed condition at all tested signal-to-noise ratios (SNRs). These effects were found across different subject groups and test sites. The listening effort scores were less consistent over test sites. For the algorithms that did not affect speech intelligibility, a reduction in listening effort was observed at 0 dB SNR.
Affiliation(s)
- Heleen Luts
- ExpORL, Department of Neurosciences, Katholieke Universiteit Leuven, Herestraat 49 bus 721, B-3000 Leuven, Belgium.
12
Mandel M, Weiss R, Ellis D. Model-Based Expectation-Maximization Source Separation and Localization. IEEE Transactions on Audio, Speech, and Language Processing 2010. [DOI: 10.1109/tasl.2009.2029711] [Citation(s) in RCA: 211] [Impact Index Per Article: 15.1] [Indexed: 11/07/2022]
13
Reju V, Koh SN, Soon IY. Underdetermined Convolutive Blind Source Separation via Time–Frequency Masking. IEEE Transactions on Audio, Speech, and Language Processing 2010. [DOI: 10.1109/tasl.2009.2024380] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 11/10/2022]
14
Gur MB, Niezrecki C. A source separation approach to enhancing marine mammal vocalizations. The Journal of the Acoustical Society of America 2009; 126:3062-3070. [PMID: 20000920 DOI: 10.1121/1.3257549] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
A common problem in passive acoustic based marine mammal monitoring is the contamination of vocalizations by a noise source, such as a surface vessel. The conventional approach in improving the vocalization signal to noise ratio (SNR) is to suppress the unwanted noise sources by beamforming the measurements made using an array. In this paper, an alternative approach to multi-channel underwater signal enhancement is proposed. Specifically, a blind source separation algorithm that extracts the vocalization signal from two-channel noisy measurements is derived and implemented. The proposed algorithm uses a robust decorrelation criterion to separate the vocalization from background noise, and hence is suitable for low SNR measurements. To overcome the convergence limitations resulting from temporally correlated recordings, the supervised affine projection filter update rule is adapted to the unsupervised source separation framework. The proposed method is evaluated using real West Indian manatee (Trichechus manatus latirostris) vocalizations and watercraft emitted noise measurements made within a typical manatee habitat in Florida. The results suggest that the proposed algorithm can improve the detection range of a passive acoustic detector five times on average (for input SNR between -10 and 5 dB) using only two receivers.
Affiliation(s)
- M Berke Gur
- Department of Mechatronics Engineering, Bahcesehir University, Besiktas, Istanbul 34353, Turkey.
15
Reju V, Koh SN, Soon IY. Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals. Neurocomputing 2008. [DOI: 10.1016/j.neucom.2007.08.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
16
He Z, Xie S, Ding S, Cichocki A. Convolutive Blind Source Separation in the Frequency Domain Based on Sparse Representation. IEEE Transactions on Audio, Speech, and Language Processing 2007. [DOI: 10.1109/tasl.2007.898457] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
17
Seltzer M, Stern R. Subband Likelihood-Maximizing Beamforming for Speech Recognition in Reverberant Environments. IEEE Transactions on Audio, Speech, and Language Processing 2006. [DOI: 10.1109/tasl.2006.872614] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Indexed: 11/05/2022]
18
Anemüller J, Duann JR, Sejnowski TJ, Makeig S. Spatio-temporal dynamics in fMRI recordings revealed with complex independent component analysis. Neurocomputing 2006; 69:1502-1512. [PMID: 20689619 PMCID: PMC2916201 DOI: 10.1016/j.neucom.2005.12.029] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 10/24/2022]
Abstract
Independent component analysis (ICA) of functional magnetic resonance imaging (fMRI) data is commonly carried out under the assumption that each source may be represented as a spatially fixed pattern of activation, which leads to the instantaneous mixing model. To allow modeling patterns of spatio-temporal dynamics, in particular, the flow of oxygenated blood, we have developed a convolutive ICA approach: spatial complex ICA applied to frequency-domain fMRI data. In several frequency-bands, we identify components pertaining to activity in primary visual cortex (V1) and blood supply vessels. One such component, obtained in the 0.10 Hz band, is analyzed in detail and found to likely reflect flow of oxygenated blood in V1.
Affiliation(s)
- Jörn Anemüller
- Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California San Diego, La Jolla, CA, USA