1
Saba JN, Hansen JHL. The effects of Lombard perturbation on speech intelligibility in noise for normal hearing and cochlear implant listeners. J Acoust Soc Am 2022; 151:1007. [PMID: 35232065 PMCID: PMC8849642 DOI: 10.1121/10.0009377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/13/2021] [Revised: 01/09/2022] [Accepted: 01/09/2022] [Indexed: 06/02/2023]
Abstract
Natural compensation of speech production in challenging listening environments is referred to as the Lombard effect (LE). The resulting acoustic differences between neutral and Lombard speech have been shown to provide intelligibility benefits for normal hearing (NH) and cochlear implant (CI) listeners alike. Motivated by this outcome, three LE perturbation approaches consisting of pitch, duration, formant, intensity, and spectral contour modifications were designed specifically for CI listeners to combat speech-in-noise performance deficits. Experiment 1 analyzed the effects of loudness, quality, and distortion of these approaches on speech intelligibility with and without formant shifting. Significant improvements of +9.4% were observed in CI listeners for the approach without formant shifting at +5 dB signal-to-noise ratio (SNR) in large-crowd noise (LCN) when loudness was controlled; however, performance was found to be significantly lower for NH listeners. Experiment 2 evaluated the non-formant-shifting approach with additional spectral contour and high-pass filtering to reduce spectral smearing and decrease the distortion observed in Experiment 1. This resulted in significant intelligibility benefits of +30.2% for NH and +21.2% for CI listeners at 0 and +5 dB SNR LCN, respectively. These results suggest that LE perturbation may be useful as a front-end speech modification approach to improve intelligibility for CI users in noise.
Affiliation(s)
- Juliana N Saba
- Center for Robust Speech Systems-Cochlear Implant Processing Lab, University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- John H L Hansen
- Center for Robust Speech Systems-Cochlear Implant Processing Lab, University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
2
Kelly F, Hansen JHL. Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition. IEEE/ACM Trans Audio Speech Lang Process 2021; 29:927-942. [PMID: 35783572 PMCID: PMC9245507 DOI: 10.1109/taslp.2021.3053388] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/02/2023]
Abstract
Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly occurring forms of vocal effort variation that result in non-neutral speech, the first due to noise exposure and the second due to intentional adjustment on the part of the speaker. In this article, a comparative evaluation of speaker recognition performance in non-neutral conditions is presented using multiple Lombard effect and whisper corpora. The detrimental impact of these vocal effort variations on discrimination and calibration performance on global, per-corpus, and per-speaker levels is explored using conventional error metrics, along with visual representations of the model and score spaces. A non-neutral speech detector is subsequently introduced and used to inform score calibration in several ways. Two calibration approaches are proposed and shown to reduce error to the same level as an optimal calibration approach that relies on ground-truth vocal effort information. This article contributes a generalizable methodology towards detecting vocal effort variation and using this knowledge to inform and advance speaker recognition system behavior.
Affiliation(s)
- Finnian Kelly
- Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75083-0688 USA
- John H L Hansen
- Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75083-0688 USA
3
Conversation in small groups: Speaking and listening strategies depend on the complexities of the environment and group. Psychon Bull Rev 2020; 28:632-640. [PMID: 33051825 PMCID: PMC8062389 DOI: 10.3758/s13423-020-01821-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 09/22/2020] [Indexed: 11/29/2022]
Abstract
Many conversations in our day-to-day lives are held in noisy environments – impeding comprehension, and in groups – taxing auditory attention-switching processes. These situations are particularly challenging for older adults in cognitive and sensory decline. In noisy environments, a variety of extra-linguistic strategies are available to speakers and listeners to facilitate communication, but while models of language account for the impact of context on word choice, there has been little consideration of the impact of context on extra-linguistic behaviour. To address this issue, we investigate how the complexity of the acoustic environment and interaction situation impacts extra-linguistic conversation behaviour of older adults during face-to-face conversations. Specifically, we test whether the use of intelligibility-optimising strategies increases with complexity of the background noise (from quiet to loud, and in speech-shaped vs. babble noise), and with complexity of the conversing group (dyad vs. triad). While some communication strategies are enhanced in more complex background noise, with listeners orienting to talkers more optimally and moving closer to their partner in babble than speech-shaped noise, this is not the case with all strategies, as we find greater vocal level increases in the less complex speech-shaped noise condition. Other behaviours are enhanced in the more complex interaction situation, with listeners using more optimal head orientations, and taking longer turns when gaining the floor in triads compared to dyads. This study elucidates how different features of the conversation context impact individuals’ communication strategies, which is necessary to both develop a comprehensive cognitive model of multimodal conversation behaviour, and effectively support individuals that struggle conversing.
4
Understanding Lombard speech: a review of compensation techniques towards improving speech based recognition systems. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09907-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/23/2022]
5
Hansen JHL, Bokshi M, Khorram S. Speech variability: A cross-language study on acoustic variations of speaking versus untrained singing. J Acoust Soc Am 2020; 148:829. [PMID: 32873043 PMCID: PMC7438159 DOI: 10.1121/10.0001526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2019] [Revised: 06/17/2020] [Accepted: 06/19/2020] [Indexed: 06/11/2023]
Abstract
Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To better understand changes in speech production of the untrained singing voice, this study presents the first cross-language comparison between normal speaking and untrained karaoke singing of the same text content. Previous studies comparing professional singing versus speaking have shown deviations in both prosodic and spectral features. Some investigations also considered assigning the intrinsic activity of the singing. Motivated by these studies, a series of experiments investigating both prosodic and spectral variations of untrained karaoke singers in three languages (American English, Hindi, and Farsi) is considered. A comprehensive comparison of common prosodic features, including phoneme duration, mean fundamental frequency (F0), and formant center frequencies of vowels, was performed. Collective changes in the corresponding overall acoustic spaces were analyzed based on the Kullback-Leibler distance between Gaussian probability distribution models trained on spectral features. Finally, these models were used in a Gaussian mixture model-universal background model (GMM-UBM) SID evaluation to quantify speaker changes between speaking and singing when the audio text content is the same. The experiments showed that many acoustic characteristics of untrained singing are considerably different from speaking when the text content is the same. It is suggested that these results would help advance automatic speech production normalization/compensation to improve performance of speech processing applications (e.g., speaker ID, speech recognition, and language ID).
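The abstract does not state which form of the Kullback-Leibler distance was used. As an illustration only, the sketch below assumes one diagonal-covariance Gaussian is fitted per condition (e.g. speaking vs. singing feature frames) and that the "distance" is the symmetrized KL divergence computed in closed form. The function names are hypothetical and the input frames are made-up toy values, not data from the study.

```python
import math
from statistics import fmean, pvariance


def fit_diag_gaussian(frames):
    """Fit one diagonal-covariance Gaussian (per-dimension mean and variance)
    to a list of equal-length feature vectors, e.g. spectral feature frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [pvariance(d, mu=m) for d, m in zip(dims, means)]
    return means, variances


def kl_diag(p, q):
    """Closed-form KL(p || q) for two diagonal Gaussians given as
    (means, variances). All variances must be strictly positive."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized divergence KL(p||q) + KL(q||p), used as a distance-like score."""
    return kl_diag(p, q) + kl_diag(q, p)


# Toy 2-D frames for two hypothetical conditions (NOT data from the study).
speaking = fit_diag_gaussian([[1.0, 2.0], [3.0, 4.0], [2.0, 3.0]])
singing = fit_diag_gaussian([[2.0, 5.0], [4.0, 7.0], [3.0, 6.0]])
print(symmetric_kl(speaking, singing))
```

Identical models give a score of zero, and the score grows as the two acoustic spaces move apart. Symmetrizing is one common way to turn the asymmetric KL divergence into a distance-like quantity; a full GMM comparison would need an approximation, since KL between mixtures has no closed form.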
Affiliation(s)
- John H L Hansen
- Robust Speech Technologies Laboratory (RSTL), Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, Texas 75080, USA
- Marigona Bokshi
- Robust Speech Technologies Laboratory (RSTL), Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, Texas 75080, USA
- Soheil Khorram
- Robust Speech Technologies Laboratory (RSTL), Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, Texas 75080, USA
6
Whittico TH, Ortiz AJ, Marks KL, Toles LE, Van Stan JH, Hillman RE, Mehta DD. Ambulatory monitoring of Lombard-related vocal characteristics in vocally healthy female speakers. J Acoust Soc Am 2020; 147:EL552. [PMID: 32611177 PMCID: PMC7316514 DOI: 10.1121/10.0001446] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 05/12/2023]
Abstract
Speakers typically modify their voice in the presence of increased background noise levels, exhibiting the classic Lombard effect. Lombard-related characteristics during everyday activities were recorded from 17 vocally healthy women who wore an acoustic noise dosimeter and ambulatory voice monitor. The linear relationship between vocal sound pressure level and environmental noise level exhibited an average slope of 0.54 dB/dB and an average value of 72.8 dB SPL at 50 dBA when correlation coefficients were greater than 0.4. These results, coupled with analyses of spectral and cepstral vocal function measures, provide normative ambulatory Lombard characteristics for comparison with patients with voice-use related disorders.
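The linear relationship above can be written as L_v(N) ≈ 72.8 + 0.54 (N − 50) dB SPL, where N is the environmental noise level in dBA. The sketch below simply evaluates that line; it assumes the group-average slope and value apply directly, the constant and function names are illustrative rather than taken from the study, and the fit should not be trusted outside the noise range actually observed or for recordings with weak correlation (r ≤ 0.4).

```python
# Group-average Lombard fit reported in the abstract (when r > 0.4).
SLOPE_DB_PER_DB = 0.54   # change in vocal SPL (dB) per 1 dB change in noise level
VOICE_SPL_AT_REF = 72.8  # average vocal level, dB SPL, at the reference noise level
REF_NOISE_DBA = 50.0     # reference environmental noise level, dBA


def predicted_vocal_spl(noise_dba):
    """Linear Lombard function: predicted vocal SPL (dB) for a noise level in dBA."""
    return VOICE_SPL_AT_REF + SLOPE_DB_PER_DB * (noise_dba - REF_NOISE_DBA)


for noise in (50.0, 60.0, 70.0):
    print(f"{noise:.0f} dBA noise -> {predicted_vocal_spl(noise):.1f} dB SPL voice")
```

This prints 72.8, 78.2 and 83.6 dB SPL. So, on average, each 10 dB rise in background noise corresponds to about a 5.4 dB rise in vocal level, that is, speakers compensate for only about half of the noise increase.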
Affiliation(s)
- Thomas H Whittico
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Andrew J Ortiz
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Katherine L Marks
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Laura E Toles
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Jarrad H Van Stan
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Robert E Hillman
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Daryush D Mehta
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
7
Hansen JHL, Lee J, Ali H, Saba JN. A speech perturbation strategy based on "Lombard effect" for enhanced intelligibility for cochlear implant listeners. J Acoust Soc Am 2020; 147:1418. [PMID: 32237802 PMCID: PMC7054124 DOI: 10.1121/10.0000690] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/02/2019] [Revised: 12/09/2019] [Accepted: 01/21/2020] [Indexed: 06/02/2023]
Abstract
The goal of this study is to determine potential intelligibility benefits from Lombard speech for cochlear implant (CI) listeners in speech-in-noise conditions. "Lombard effect" (LE) is the natural response of adjusting speech production via auditory feedback due to noise exposure within acoustic environments. To evaluate intelligibility performance of natural and artificially induced Lombard speech, a corpus was generated to create natural LE from large crowd noise (LCN) exposure at 70, 80, and 90 dB sound pressure level (SPL). Clean speech was mixed with LCN at 15 and 10 dB SNR and presented to five CI users. First, speech intelligibility was analyzed as a function of increasing LE and decreasing SNR. Results indicate significant improvements (p < 0.05) in the intelligibility of Lombard speech in noise for the 80 and 90 dB SPL conditions. Next, an offline perturbation strategy was formulated to modify/perturb neutral speech so as to mimic LE through amplification of highly intelligible segments, uniform time stretching, and spectral mismatch filtering. This process effectively introduces aspects of LE into the neutral speech, with the hypothesis that this would benefit intelligibility for CI users. Significant (p < 0.01) intelligibility improvements of 13 and 16 percentage points were observed for the 15 and 10 dB SNR conditions, respectively, for CI users. The results indicate how LE and LE-inspired acoustic and frequency-based modifications can be leveraged within signal processing to improve intelligibility of speech for CI users.
Affiliation(s)
- John H L Hansen
- Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080, USA
- Jaewook Lee
- Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080, USA
- Hussnain Ali
- Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080, USA
- Juliana N Saba
- Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080, USA
8
Chennupati N, Kadiri SR, Yegnanarayana B. Spectral and temporal manipulations of SFF envelopes for enhancement of speech intelligibility in noise. Comput Speech Lang 2019. [DOI: 10.1016/j.csl.2018.09.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
9
Lee J, Ali H, Ziaei A, Tobey EA, Hansen JHL. The Lombard effect observed in speech produced by cochlear implant users in noisy environments: A naturalistic study. J Acoust Soc Am 2017; 141:2788. [PMID: 28464686 PMCID: PMC5398925 DOI: 10.1121/1.4979927] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/28/2015] [Revised: 03/25/2017] [Accepted: 03/27/2017] [Indexed: 06/02/2023]
Abstract
The Lombard effect is an involuntary response speakers experience in the presence of noise during voice communication. This phenomenon is known to cause changes in speech production, such as increased intensity and changes in pitch structure and formant characteristics, for enhanced audibility in noisy environments. Although well studied for normal hearing listeners, the Lombard effect has received little, if any, attention in the field of cochlear implants (CIs). The objective of this study is to analyze speech production of CI users who are postlingually deafened adults with respect to environmental context. A total of six adult CI users were recruited to produce spontaneous speech in various realistic environments. Acoustic-phonetic analysis was then carried out to characterize their speech production in these environments. The Lombard effect was observed in the speech production of all CI users who participated in this study in adverse listening environments. The results indicate that both suprasegmental (e.g., F0, glottal spectral tilt and vocal intensity) and segmental (e.g., F1 for /i/ and /u/) features were altered in such environments. The analysis from this study suggests that modification of speech production of CI users under the Lombard effect may contribute to some degree to intelligible communication in adverse noisy environments.
Affiliation(s)
- Jaewook Lee
- Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- Hussnain Ali
- Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- Ali Ziaei
- Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- Emily A Tobey
- Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
- John H L Hansen
- Center for Robust Speech Systems-Cochlear Implant Lab (CRSS-CIL), Department of Electrical Engineering, The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA
10
Hansen JHL, Nandwana MK, Shokouhi N. Analysis of human scream and its impact on text-independent speaker verification. J Acoust Soc Am 2017; 141:2957. [PMID: 28464689 DOI: 10.1121/1.4979337] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/07/2023]
Abstract
Scream is defined as sustained, high-energy vocalizations that lack phonological structure. Lack of phonological structure is what distinguishes scream from other forms of loud vocalization, such as "yell." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and Lombard effect contributes to degraded performance in automatic speech systems (e.g., speech recognition, speaker identification, diarization). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study considers a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture model-universal background model (GMM-UBM) framework is unreliable when evaluated with screams.
Affiliation(s)
- John H L Hansen
- Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080, USA
- Mahesh Kumar Nandwana
- Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080, USA
- Navid Shokouhi
- Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75080, USA
11
Bouserhal RE, Macdonald EN, Falk TH, Voix J. Variations in voice level and fundamental frequency with changing background noise level and talker-to-listener distance while wearing hearing protectors: A pilot study. Int J Audiol 2016; 55 Suppl 1:S13-20. [DOI: 10.3109/14992027.2015.1122240] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
12
Šimko J, Beňuš Š, Vainio M. Hyperarticulation in Lombard speech: Global coordination of the jaw, lips and the tongue. J Acoust Soc Am 2016; 139:151-62. [PMID: 26827013 DOI: 10.1121/1.4939495] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 05/15/2023]
Abstract
Over the last century, researchers have collected a considerable amount of data reflecting the properties of Lombard speech, i.e., speech in a noisy environment. The documented phenomena predominantly report effects on the speech signal produced in ambient noise. In comparison, relatively little is known about the underlying articulatory patterns of Lombard speech, in particular for lingual articulation. Here the authors present an analysis of articulatory recordings of speech material in babble noise of different intensity levels and in hypoarticulated speech and report quantitative differences in relative expansion of movement of different articulatory subsystems (the jaw, the lips and the tongue) as well as in relative expansion of utterance duration. The trajectory modifications for one articulator can be relatively reliably predicted by those for another one, but subsystems differ in the degree of continuity in trajectory expansion elicited across different noise levels. Regression analysis of articulatory modifications against durational expansion shows further qualitative differences between the subsystems, namely, the jaw and the tongue. The findings are discussed in terms of possible influences of a combination of prosodic, segmental, and physiological factors. In addition, the Lombard effect is put forward as a viable methodology for eliciting global articulatory variation in a controlled manner.
Affiliation(s)
- Juraj Šimko
- Institute of Behavioural Sciences, University of Helsinki, Siltavuorenpenger 3A - PL 9, 00014 Helsinki, Finland
- Štefan Beňuš
- Faculty of Arts, Constantine the Philosopher University, Štefánikova 67, 949 74 Nitra, Slovakia
- Martti Vainio
- Institute of Behavioural Sciences, University of Helsinki, Siltavuorenpenger 3A - PL 9, 00014 Helsinki, Finland
13
Poblete V, Espic F, King S, Stern RM, Huenupán F, Fredes J, Yoma NB. A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification. Comput Speech Lang 2015. [DOI: 10.1016/j.csl.2014.10.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
14
An adaptive post-filtering method producing an artificial Lombard-like effect for intelligibility enhancement of narrowband telephone speech. Comput Speech Lang 2014. [DOI: 10.1016/j.csl.2013.03.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
15
Pohjalainen J, Raitio T, Yrttiaho S, Alku P. Detection of shouted speech in noise: human and machine. J Acoust Soc Am 2013; 133:2377-2389. [PMID: 23556603 DOI: 10.1121/1.4794394] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
High vocal effort has characteristic acoustic effects on speech. This study focuses on the utilization of this information by human listeners and a machine-based detection system in the task of detecting shouted speech in the presence of noise. Both female and male speakers read Finnish sentences using normal and shouted voice in controlled conditions, with the sound pressure level recorded. The speech material was artificially corrupted by noise and supplemented with pure noise. The human performance level was statistically evaluated by a listening test, where the subjects labeled noisy samples according to whether shouting was heard or not. A Bayesian detection system was constructed and statistically evaluated. Its performance was compared against that of human listeners, substituting different spectrum analysis methods in the feature extraction stage. Using features capable of taking into account the spectral fine structure (i.e., the fundamental frequency and its harmonics), the machine reached the detection level of humans even in the noisiest conditions. In the listening test, male listeners detected shouted speech significantly better than female listeners, especially with speakers making a smaller vocal effort increase for shouting.
Affiliation(s)
- Jouni Pohjalainen
- Department of Signal Processing and Acoustics, Aalto University, P.O. Box 13000, FI-00076 AALTO, Espoo, Finland.
16
Cooke M, Lu Y. Spectral and temporal changes to speech produced in the presence of energetic and informational maskers. J Acoust Soc Am 2010; 128:2059-2069. [PMID: 20968376 DOI: 10.1121/1.3478775] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 05/30/2023]
Abstract
Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.
Affiliation(s)
- Martin Cooke
- Ikerbasque (Basque Science Foundation) and Language and Speech Laboratory, Facultad de Letras, Universidad del Pais Vasco, Paseo de la Universidad 5, Vitoria, Alava 01006, Spain.
17
Boril H, Hansen JHL. Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments. IEEE Trans Audio Speech Lang Process 2010. [DOI: 10.1109/tasl.2009.2034770] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 11/07/2022]