1
Luo X, Ke Y, Li X, Zheng C. On phase recovery and preserving early reflections for deep-learning speech dereverberation. The Journal of the Acoustical Society of America 2024; 155:436-451. [PMID: 38240664] [DOI: 10.1121/10.0024348]
Abstract
In indoor environments, reverberation often distorts clean speech. Although deep learning-based speech dereverberation approaches have shown much better performance than traditional ones, the inferior speech quality of the dereverberated speech caused by magnitude distortion and limited phase recovery is still a serious problem for practical applications. This paper improves the performance of deep learning-based speech dereverberation from the perspectives of both network design and mapping target optimization. Specifically, on the one hand, a bifurcated-and-fusion network and its guidance loss functions were designed to help reduce the magnitude distortion while enhancing the phase recovery. On the other hand, the time boundary between the early and late reflections in the mapped speech was investigated, so as to strike a balance between the reverberation tailing effect and the difficulty of magnitude/phase recovery. Mathematical derivations were provided to show the rationality of the specially designed loss functions. Geometric illustrations were given to explain the importance of preserving early reflections in reducing the difficulty of phase recovery. Ablation study results confirmed the validity of the proposed network topology and the importance of preserving 20 ms early reflections in the mapped speech. Objective and subjective test results showed that the proposed system outperformed other baselines in the speech dereverberation task.
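The mapping target described above keeps the direct path plus roughly 20 ms of early reflections rather than fully anechoic speech. A minimal sketch of how such a target could be constructed is given below; the function name, the argmax heuristic for locating the direct path, and the parameter defaults are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def early_reflection_target(clean, rir, sr=16000, boundary_ms=20.0):
    """Build a mapping target: clean speech convolved with only the
    direct path and the first `boundary_ms` of the room impulse
    response (RIR) after the direct-path arrival."""
    direct = int(np.argmax(np.abs(rir)))            # direct-path arrival index
    cut = direct + 1 + int(sr * boundary_ms / 1000.0)
    rir_early = rir[:cut]                           # early part of the RIR
    # full convolution, trimmed back to the clean-signal length
    return np.convolve(clean, rir_early)[: len(clean)]
```

With a pure-delta RIR the early part is the whole response, so the target reduces to the clean signal itself, which makes the construction easy to sanity-check.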
Affiliation(s)
- Xiaoxue Luo
- Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, 100190, Beijing, China
- Yuxuan Ke
- Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, 100190, Beijing, China
- Xiaodong Li
- Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, 100190, Beijing, China
- Chengshi Zheng
- Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, 100190, Beijing, China
2
Peracha FK, Khattak MI, Salem N, Saleem N. Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network. PLoS One 2023; 18:e0285629. [PMID: 37167227] [PMCID: PMC10174555] [DOI: 10.1371/journal.pone.0285629]
Abstract
Speech enhancement (SE) reduces background noise in target speech and is applied at the front end in various real-world applications, including robust ASR and real-time processing in mobile phone communications. SE systems are commonly integrated into mobile phones to increase quality and intelligibility. As a result, a low-latency system is required to operate in real-world applications. At the same time, these systems need efficient optimization. This research focuses on single-microphone SE operating in real-time systems with better optimization. We propose a causal data-driven model that uses an attention encoder-decoder long short-term memory (LSTM) network to estimate the time-frequency mask from noisy speech, producing clean speech for real-time applications that need low-latency causal processing. The encoder-decoder LSTM and a causal attention mechanism are used in the proposed model. Furthermore, a dynamical-weighted (DW) loss function is proposed to improve model learning by varying the weight loss values. Experiments demonstrated that the proposed model consistently improves voice quality, intelligibility, and noise suppression. In the causal processing mode, the LSTM-based estimated suppression time-frequency mask outperforms the baseline model for unseen noise types. The proposed SE improved STOI by 2.64% over the baseline LSTM-IRM, 6.6% over LSTM-KF, 4.18% over DeepXi-KF, and 3.58% over DeepResGRU-KF. In addition, we examine word error rates (WERs) using Google's Automatic Speech Recognition (ASR). The ASR results show that error rates decreased from 46.33% (noisy signals) to 13.11% (proposed), compared with 15.73% (LSTM) and 14.97% (LSTM-KF).
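The mask-based enhancement step that the model above relies on, i.e. applying an estimated time-frequency mask to the noisy STFT while keeping the noisy phase, can be sketched as follows (the function name is an assumption; this is the generic mask-application step, not the authors' full pipeline):

```python
import numpy as np

def apply_tf_mask(noisy_stft, mask):
    """Apply an estimated time-frequency mask to a noisy STFT:
    the mask scales the magnitude, the noisy phase is kept."""
    magnitude = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    # clip to [0, 1] as suppression masks are bounded ratios
    return np.clip(mask, 0.0, 1.0) * magnitude * np.exp(1j * phase)
```

An all-ones mask passes the noisy STFT through unchanged and an all-zeros mask silences it, which bounds the behavior of any estimated mask in between.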
Affiliation(s)
- Fahad Khalil Peracha
- Department of Electrical Engineering, University of Engineering and Technology, Peshawar, KPK, Pakistan
- Muhammad Irfan Khattak
- Department of Electrical Engineering, University of Engineering and Technology, Peshawar, KPK, Pakistan
- Nema Salem
- Electrical and Computer Engineering Department, Effat College of Engineering, Effat University, Jeddah, KSA
- Nasir Saleem
- Department of Electrical Engineering, University of Engineering and Technology, Peshawar, KPK, Pakistan
3
A context aware-based deep neural network approach for simultaneous speech denoising and dereverberation. Neural Computing and Applications 2022. [DOI: 10.1007/s00521-022-06968-1]
4
Hussain T, Siniscalchi SM, Wang HLS, Tsao Y, Salerno VM, Liao WH. Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation. IEEE Transactions on Cognitive and Developmental Systems 2020. [DOI: 10.1109/tcds.2019.2953620]
5
Saleem N, Khattak MI. Multi-scale decomposition based supervised single channel deep speech enhancement. Applied Soft Computing 2020. [DOI: 10.1016/j.asoc.2020.106666]
6
Zhao Y, Wang D, Xu B, Zhang T. Monaural Speech Dereverberation Using Temporal Convolutional Networks with Self Attention. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2020; 28:1598-1607. [PMID: 33748325] [PMCID: PMC7971181] [DOI: 10.1109/taslp.2020.2995273]
Abstract
In daily listening environments, human speech is often degraded by room reverberation, especially under highly reverberant conditions. Such degradation poses a challenge for many speech processing systems, where the performance becomes much worse than in anechoic environments. To combat the effect of reverberation, we propose a monaural (single-channel) speech dereverberation algorithm using temporal convolutional networks with self-attention. Specifically, the proposed system includes a self-attention module to produce dynamic representations given input features, a temporal convolutional network to learn a nonlinear mapping from such representations to the magnitude spectrum of anechoic speech, and a one-dimensional (1-D) convolution module to smooth the enhanced magnitude among adjacent frames. Systematic evaluations demonstrate that the proposed algorithm improves objective metrics of speech quality in a wide range of reverberant conditions. In addition, it generalizes well to untrained reverberation times, room sizes, measured room impulse responses, real-world recorded noisy-reverberant speech, and different speakers.
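The self-attention module described above weights every frame by its similarity to all other frames to produce a dynamic representation. A minimal NumPy sketch of scaled dot-product self-attention follows; using the input itself as query, key, and value is a simplifying assumption (the actual network would use learned projections):

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over time frames.
    X: (frames, features). For simplicity Q = K = V = X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # frame-to-frame similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ X                               # dynamic representation
```

Each output frame is a convex combination of all input frames, so the output keeps the input's shape and a single-frame input passes through unchanged.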
Affiliation(s)
- Yan Zhao
- Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH, 43210 USA
- DeLiang Wang
- Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH, 43210 USA. He also held a visiting appointment at the Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University, Xi'an, China
- Buye Xu
- Starkey Hearing Technologies, Eden Prairie, MN 55344 USA. He is now with Facebook Reality Labs, Facebook, Inc., Redmond, WA 98052 USA
- Tao Zhang
- Starkey Hearing Technologies, Eden Prairie, MN 55344 USA
7
Abstract
This paper presents a novel scheme for speech dereverberation. The core of our method is a two-stage single-channel speech enhancement scheme. Degraded speech obtains a sparser representation of the linear prediction residual in the first stage of our proposed scheme by applying orthogonal matching pursuit on overcomplete bases, trained by the K-SVD algorithm. Our method includes an estimation of reverberation and mixing time from a recorded hand clap or a simulated room impulse response, which are used to create a time-domain envelope. Late reverberation is suppressed at the second stage by estimating its energy from the previous envelope and removed with spectral subtraction. Further speech enhancement is applied on minimizing the background noise, based on optimal smoothing and minimum statistics. Experimental results indicate favorable quality, compared to two state-of-the-art methods, especially in real reverberant environments with increased reverberation and background noise.
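The second stage described above, estimating late-reverberation energy from a time-domain envelope and removing it with spectral subtraction, can be sketched as a per-frame gain rule. The function name, the gain formula, and the floor value below are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def suppress_late_reverb(reverberant_psd, late_psd, floor=0.05):
    """Spectral subtraction of an estimated late-reverberation power
    spectrum, with a spectral floor to limit musical-noise artifacts."""
    gain = 1.0 - late_psd / np.maximum(reverberant_psd, 1e-12)
    gain = np.maximum(gain, floor)   # never attenuate below the floor
    return gain * reverberant_psd
```

When the late-reverberation estimate is zero the gain is one and the input passes through unchanged; the floor bounds attenuation from below, which is the standard defense against over-subtraction.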
8
Wang D, Chen J. Supervised Speech Separation Based on Deep Learning: An Overview. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2018; 26:1702-1726. [PMID: 31223631] [PMCID: PMC6586438] [DOI: 10.1109/taslp.2018.2842159]
Abstract
Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
Affiliation(s)
- DeLiang Wang
- Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA, and also with the Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University, Xi'an 710072, China
- Jitong Chen
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA. He is now with Silicon Valley AI Lab, Baidu Research, Sunnyvale, CA 94089 USA
9
Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm. Sensors 2018; 18:1467. [PMID: 29738481] [PMCID: PMC5982214] [DOI: 10.3390/s18051467]
Abstract
Mobile devices are often used in our daily lives for the purposes of speech and communication. The speech quality of mobile devices is always degraded by the environmental noises surrounding mobile device users. However, an effective background-noise reduction solution is not easy to develop for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noises on the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H2 estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests show that the proposed method is robust to random background noises, and noiseless speech can be obtained after this denoising process.
10
Williamson DS, Wang D. Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2017; 25:1492-1501. [PMID: 30112422] [PMCID: PMC6089240] [DOI: 10.1109/taslp.2017.2696307]
Abstract
In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
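The complex ideal ratio mask (cIRM) defined above is the complex mask M such that the direct speech results when M is applied to the noisy-reverberant mixture, i.e. S = M * Y per time-frequency unit. Written out with the complex division made explicit (the function name is an assumption; the definition follows the abstract):

```python
import numpy as np

def complex_ideal_ratio_mask(S, Y, eps=1e-12):
    """cIRM: complex mask M with S = M * Y at each T-F unit.
    S: direct-speech STFT; Y: noisy-reverberant STFT.
    Equivalent to S * conj(Y) / |Y|^2, split into real/imaginary parts."""
    denom = Y.real**2 + Y.imag**2 + eps
    M_real = (Y.real * S.real + Y.imag * S.imag) / denom
    M_imag = (Y.real * S.imag - Y.imag * S.real) / denom
    return M_real + 1j * M_imag
```

By construction, multiplying the mixture STFT by the mask recovers the direct-speech STFT, which is exactly the property that makes the cIRM a suitable supervised training target for joint magnitude and phase enhancement.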
Affiliation(s)
- Donald S Williamson
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA
- DeLiang Wang
- Department of Computer Science and Engineering, Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA
11
Monaghan JJM, Seeber BU. A method to enhance the use of interaural time differences for cochlear implants in reverberant environments. The Journal of the Acoustical Society of America 2016; 140:1116. [PMID: 27586742] [PMCID: PMC5708523] [DOI: 10.1121/1.4960572]
Abstract
The ability of normal-hearing (NH) listeners to exploit interaural time difference (ITD) cues conveyed in the modulated envelopes of high-frequency sounds is poor compared to ITD cues transmitted in the temporal fine structure at low frequencies. Sensitivity to envelope ITDs is further degraded when envelopes become less steep, when modulation depth is reduced, and when envelopes become less similar between the ears, common factors when listening in reverberant environments. The vulnerability of envelope ITDs is particularly problematic for cochlear implant (CI) users, as they rely on information conveyed by slowly varying amplitude envelopes. Here, an approach to improve access to envelope ITDs for CIs is described in which, rather than attempting to reduce reverberation, the perceptual saliency of cues relating to the source is increased by selectively sharpening peaks in the amplitude envelope judged to contain reliable ITDs. Performance of the algorithm with room reverberation was assessed through simulating listening with bilateral CIs in headphone experiments with NH listeners. Relative to simulated standard CI processing, stimuli processed with the algorithm generated lower ITD discrimination thresholds and increased extents of laterality. Depending on parameterization, intelligibility was unchanged or somewhat reduced. The algorithm has the potential to improve spatial listening with CIs.
Affiliation(s)
- Jessica J M Monaghan
- Medical Research Council Institute of Hearing Research, Nottingham, United Kingdom
- Bernhard U Seeber
- Medical Research Council Institute of Hearing Research, Nottingham, United Kingdom
12
Hazrati O, Sadjadi SO, Loizou PC, Hansen JHL. Simultaneous suppression of noise and reverberation in cochlear implants using a ratio masking strategy. The Journal of the Acoustical Society of America 2013; 134:3759-3765. [PMID: 24180786] [PMCID: PMC3829893] [DOI: 10.1121/1.4823839]
Abstract
Cochlear implant (CI) recipients' ability to identify words is reduced in noisy or reverberant environments. The speech identification task for CI users becomes even more challenging in conditions where both reverberation and noise co-exist, as they mask the spectro-temporal cues of speech in a rather complementary fashion. Ideal channel selection (ICS) was found to result in significantly more intelligible speech when applied to noisy, reverberant, as well as noisy reverberant speech. In this study, a blind single-channel ratio masking strategy is presented to simultaneously suppress the negative effects of reverberation and noise on speech identification performance for CI users. In this strategy, the noise power spectrum is estimated from the non-speech segments of the utterance, while the reverberation spectral variance is computed as a delayed and scaled version of the reverberant speech spectrum. Based on the estimated noise and reverberation power spectra, a weight between 0 and 1 is assigned to each time-frequency unit to form the final mask. Listening experiments conducted with CI users in two reverberant conditions (T60 = 0.6 and 0.8 s) at a signal-to-noise ratio of 15 dB indicate substantial improvements in speech intelligibility in both the reverberation-only and noisy reverberant conditions considered.
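The blind ratio-masking strategy described above can be sketched directly from the abstract: model the reverberation PSD as a delayed, scaled copy of the observed PSD, add the noise PSD estimated from non-speech frames, and assign each T-F unit a weight in [0, 1]. The function name, the specific ratio formula, and the parameter values are assumptions for illustration:

```python
import numpy as np

def combined_ratio_mask(observed_psd, noise_psd, delay_frames, scale, eps=1e-12):
    """Blind ratio mask for joint noise + late-reverberation suppression.
    observed_psd, noise_psd: (frames, bins). delay_frames must be >= 1.
    Reverberation PSD = delayed, scaled copy of the observed PSD."""
    reverb_psd = np.zeros_like(observed_psd)
    reverb_psd[delay_frames:] = scale * observed_psd[:-delay_frames]
    interference = noise_psd + reverb_psd
    # crude speech-PSD estimate by subtraction, floored at zero
    speech_psd = np.maximum(observed_psd - interference, 0.0)
    # ratio mask: weight in [0, 1] per T-F unit
    return speech_psd / (speech_psd + interference + eps)
```

The leading `delay_frames` frames see no reverberation estimate, so their weights approach one; units dominated by estimated interference are driven toward zero.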
Affiliation(s)
- Oldooz Hazrati
- Department of Electrical Engineering, The University of Texas at Dallas, Richardson, Texas 75080
13
Ashwini JK, Kumaraswamy R. Single-Channel Speech Enhancement Techniques for Distant Speech Recognition. Journal of Intelligent Systems 2013. [DOI: 10.1515/jisys-2012-0051]
Abstract
This article presents an overview of single-channel dereverberation methods suitable for distant speech recognition (DSR) applications. The dereverberation methods are mainly classified based on the domain of enhancement of the speech signal captured by a distant microphone. Many single-channel speech enhancement methods focus on either denoising or dereverberating the distorted speech signal. There are very few methods that consider both noise and reverberation effects. Such methods are discussed under a multistage approach in this article. The article concludes with the hypothesis that methods that do not require an a priori reverberation impulse response are preferable under varying environmental conditions for DSR applications such as intelligent home and office environments, humanoid robots, and automobiles, rather than methods that require one.
Affiliation(s)
- Jaya Kumar Ashwini
- Department of Electronics and Communication, Siddaganga Institute of Technology, Tumkur, Karnataka, India
- Ramaswamy Kumaraswamy
- Department of Electronics and Communication, Siddaganga Institute of Technology, Tumkur, Karnataka, India
14
Mosayyebpour S, Esmaeili M, Gulliver TA. Single-Microphone Early and Late Reverberation Suppression in Noisy Speech. IEEE Transactions on Audio, Speech, and Language Processing 2013. [DOI: 10.1109/tasl.2012.2224341]
15
Tsilfidis A, Mporas I, Mourjopoulos J, Fakotakis N. Automatic speech recognition performance in different room acoustic environments with and without dereverberation preprocessing. Computer Speech & Language 2013. [DOI: 10.1016/j.csl.2012.07.004]
16
Yoshioka T, Nakatani T. Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening. IEEE Transactions on Audio, Speech, and Language Processing 2012. [DOI: 10.1109/tasl.2012.2210879]
17
Hazrati O, Loizou PC. Tackling the combined effects of reverberation and masking noise using ideal channel selection. Journal of Speech, Language, and Hearing Research 2012; 55:500-510. [PMID: 22232411] [PMCID: PMC3320694] [DOI: 10.1044/1092-4388(2011/11-0073)]
Abstract
Purpose: In this article, a new signal-processing algorithm is proposed and evaluated for the suppression of the combined effects of reverberation and noise. Method: The proposed algorithm decomposes, on a short-term basis (every 20 ms), the reverberant stimuli into a number of channels and retains only a subset of the channels satisfying a signal-to-reverberant ratio (SRR) criterion. The construction of this criterion assumes access to a priori knowledge of the target (anechoic) signal, and the aim of this study was to assess the full potential of the proposed channel-selection algorithm, assuming that this criterion could be estimated accurately. Listening tests with normal-hearing listeners were conducted to assess the performance of the proposed algorithm in highly reverberant conditions (T60 = 1.0 s), which included additive noise at 0 and 5 dB signal-to-noise ratios (SNRs). Results: A substantial gain in intelligibility was obtained in both reverberant and combined reverberant and noise conditions. The mean intelligibility scores improved by 44 and 33 percentage points in the 0 and 5 dB SNR reverberation + noise conditions, respectively. Feature analysis of the consonant confusion matrices revealed that the transmission of voicing information was most negatively affected, followed by manner and place of articulation. Conclusions: The proposed algorithm produced substantial gains in intelligibility, and this benefit was attributed to the ability of the proposed SRR criterion to accurately detect voiced/unvoiced boundaries. It was postulated that detection of those boundaries is critical for better perception of voicing information and manner of articulation.
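The channel-selection rule above (retain only channels whose short-term SRR exceeds a criterion, discard the rest) can be sketched as follows. The function name, the SRR estimate via energy subtraction, and the threshold value are illustrative assumptions; only the keep/discard logic is taken from the abstract:

```python
import numpy as np

def ideal_channel_selection(reverberant_env, target_env, threshold_db=-5.0, eps=1e-12):
    """Retain T-F units whose signal-to-reverberant ratio (SRR) exceeds
    `threshold_db`; zero out the rest.
    Inputs: (channels, frames) short-term energies; `target_env` is the
    a-priori anechoic-target energy the ICS criterion assumes access to."""
    residual = np.maximum(reverberant_env - target_env, eps)  # reverberant part
    srr_db = 10.0 * np.log10((target_env + eps) / residual)
    keep = srr_db > threshold_db
    return reverberant_env * keep, keep
```

Units where the target dominates survive; units dominated by reverberant energy are zeroed, which mirrors the binary retain/discard behavior evaluated in the listening tests.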
18
Mackersie CL, Dewey J, Guthrie LA. Effects of fundamental frequency and vocal-tract length cues on sentence segregation by listeners with hearing loss. The Journal of the Acoustical Society of America 2011; 130:1006-1019. [PMID: 21877813] [PMCID: PMC3190663] [DOI: 10.1121/1.3605548]
Abstract
The purpose was to determine the effect of hearing loss on the ability to separate competing talkers using talker differences in fundamental frequency (F0) and apparent vocal-tract length (VTL). Performance of 13 adults with hearing loss and 6 adults with normal hearing was measured using the Coordinate Response Measure. For listeners with hearing loss, the speech was amplified and filtered according to the NAL-RP hearing aid prescription. Target-to-competition ratios varied from 0 to 9 dB. The target sentence was randomly assigned to the higher or lower values of F0 or VTL on each trial. Performance improved for F0 differences up to 9 and 6 semitones for people with normal hearing and hearing loss, respectively, but only when the target talker had the higher F0. Recognition for the lower F0 target improved when trial-to-trial uncertainty was removed (9-semitone condition). Scores improved with increasing differences in VTL for the normal-hearing group. On average, hearing-impaired listeners did not benefit from VTL cues, but substantial inter-subject variability was observed. The amount of benefit from VTL cues was related to the average hearing loss in the 1-3-kHz region when the target talker had the shorter VTL.
19
Jeub M, Schafer M, Esch T, Vary P. Model-Based Dereverberation Preserving Binaural Cues. IEEE Transactions on Audio, Speech, and Language Processing 2010. [DOI: 10.1109/tasl.2010.2052156]
20
Kokkinakis K, Loizou PC. Selective-Tap Blind Dereverberation for Two-Microphone Enhancement of Reverberant Speech. IEEE Signal Processing Letters 2009; 16:961-964. [PMID: 19885386] [PMCID: PMC2747744] [DOI: 10.1109/lsp.2009.2027658]
Abstract
In this letter we propose a novel approach for two-microphone enhancement of speech corrupted by reverberation. Our approach steers computational resources to filter coefficients having the largest impact on the error surface and therefore only updates a subset of coefficients in every iteration. Experimental results carried out in a realistically reverberant setup indicate that the performance of the proposed algorithm is comparable to the performance of its full-update counterpart.
Affiliation(s)
- Kostas Kokkinakis
- Center for Robust Speech Systems, Department of Electrical Engineering, University of Texas at Dallas, Richardson, TX 75080, USA
21
Zarouchas T, Mourjopoulos J. Modeling perceptual effects of reverberation on stereophonic sound reproduction in rooms. The Journal of the Acoustical Society of America 2009; 126:229-242. [PMID: 19603880] [DOI: 10.1121/1.3129382]
Abstract
The proposed model derives time-frequency maps to estimate perceived alterations due to reverberation in stereo audio signals reproduced in rooms. These alterations relate to monaural masking due to reverberant decay, derived via a computational auditory masking model and to inter-channel cues for the formation of the spatial position of the aural objects, derived via an inter-channel cue mapping module. The maps illustrate in detail the varying nature of the perceptually-relevant alterations due to room reverberation. Quantitative metrics are also introduced which were found to be proportional to reverberation interference, to room-reverberation time and to depend on the specific audio signal. A statistical approach classifies room response properties via their histogram distributions. Corresponding distributions were also applied to the proposed signal-dependent perceptual maps. Such distributions were found to be useful for interpreting the perceived alterations with different kinds of signals, such as music or speech.
Affiliation(s)
- Thomas Zarouchas
- Department of Electrical Engineering and Computer Engineering, University of Patras, Achaia 26500, Greece
22
Jin Z, Wang D. A Supervised Learning Approach to Monaural Segregation of Reverberant Speech. IEEE Transactions on Audio, Speech, and Language Processing 2009. [DOI: 10.1109/tasl.2008.2010633]
23
Wolfel M. Enhanced Speech Features by Single-Channel Joint Compensation of Noise and Reverberation. IEEE Transactions on Audio, Speech, and Language Processing 2009. [DOI: 10.1109/tasl.2008.2009161]
24
System Identification in the Short-Time Fourier Transform Domain With Crossband Filtering. IEEE Transactions on Audio, Speech, and Language Processing 2007. [DOI: 10.1109/tasl.2006.889720]