1
|
Elie B, Šimko J, Turk A. Optimization-based modeling of Lombard speech articulation: Supraglottal characteristics. JASA EXPRESS LETTERS 2024; 4:015204. [PMID: 38206126 DOI: 10.1121/10.0024364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 12/30/2023] [Indexed: 01/12/2024]
Abstract
This paper shows that a highly simplified model of speech production based on the optimization of articulatory effort versus intelligibility can account for some observed articulatory consequences of signal-to-noise ratio. Simulations of static vowels in the presence of various background noise levels show that the model predicts articulatory and acoustic modifications of the type observed in Lombard speech. These features were obtained only when the constraint applied to articulatory effort decreases as the level of background noise increases. These results support the hypothesis that Lombard speech is listener oriented and speakers adapt their articulation in noisy environments.
Affiliation(s)
- Benjamin Elie
- Linguistics and English Language, School of Philosophy, Psychology and Language Sciences, The University of Edinburgh, Edinburgh, Scotland, United Kingdom
- Juraj Šimko
- Department of Digital Humanities, Faculty of Arts, University of Helsinki, Helsinki
- Alice Turk
- Linguistics and English Language, School of Philosophy, Psychology and Language Sciences, The University of Edinburgh, Edinburgh, Scotland, United Kingdom
|
2
|
Rodríguez-Ferreiro M, Durán-Bouza M, Marrero-Aguiar V. Design and Development of a Spanish Hearing Test for Speech in Noise (PAHRE). Audiol Res 2022; 13:32-48. [PMID: 36648925 PMCID: PMC9844292 DOI: 10.3390/audiolres13010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 12/17/2022] [Accepted: 12/27/2022] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND There are few hearing tests in Spanish that assess speech discrimination in noise in the adult population and take into account the Lombard effect. This study presents the design and development of a Spanish hearing test for speech in noise (Prueba Auditiva de Habla en Ruido en Español, PAHRE). The pattern of the Quick Speech in Noise test was followed when drafting sentences with five key words each, grouped in lists of six sentences. It was necessary to take into account the differences between English and Spanish. METHODS A total of 61 people (24 men and 37 women) with an average age of 46.9 (range 18-84 years) participated in the study. The work was carried out in two phases. In the first phase, a list of Spanish sentences was drafted and subjected to a familiarity test based on the semantic and syntactic characteristics of the sentences; as a result, a list of sentences was selected for the final test. In the second phase, the selected sentences were recorded with and without the Lombard effect, the equivalence between both lists was analysed, and the test was applied to a first reference population. RESULTS The results obtained allow us to affirm that the test is representative of the variety of Spanish spoken in peninsular Spain. CONCLUSIONS In addition, these results point to the usefulness of the PAHRE test in assessing speech in noise by maintaining a fixed speech intensity while varying the intensity of the multi-speaker background noise. The incorporation of the Lombard effect in the test shows discrimination differences with the same signal-to-noise ratio compared to the test without the Lombard effect.
Affiliation(s)
- Montserrat Durán-Bouza
- Psychology Department, University of A Coruña, 15008 A Coruña, Spain
- Correspondence: Tel.: +34-654262068
- Victoria Marrero-Aguiar
- Spanish Language and General Linguistics Department, Universidad Nacional de Educación a Distancia, UNED, 28040 Madrid, Spain
|
3
|
Ma P, Petridis S, Pantic M. Visual speech recognition for multiple languages in the wild. NAT MACH INTELL 2022. [DOI: 10.1038/s42256-022-00550-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
4
|
Marklund E, Marklund U, Gustavsson L. An Association Between Phonetic Complexity of Infant Vocalizations and Parent Vowel Hyperarticulation. Front Psychol 2021; 12:693866. [PMID: 34354637 PMCID: PMC8329736 DOI: 10.3389/fpsyg.2021.693866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2021] [Accepted: 06/21/2021] [Indexed: 12/02/2022] Open
Abstract
Extreme or exaggerated articulation of vowels, or vowel hyperarticulation, is a characteristic commonly found in infant-directed speech (IDS). High degrees of vowel hyperarticulation in parent IDS have been tied to better speech sound category development and bigger vocabulary size in infants. In the present study, the relationship between vowel hyperarticulation in Swedish IDS to 12-month-olds and phonetic complexity of infant vocalizations is investigated. Articulatory adaptation toward hyperarticulation is quantified as the difference in vowel space area between IDS and adult-directed speech (ADS). Phonetic complexity is estimated using the Word Complexity Measure for Swedish (WCM-SE). The results show that vowels in IDS were more hyperarticulated than vowels in ADS, and that parents' articulatory adaptation in terms of hyperarticulation correlates with phonetic complexity of infant vocalizations. This can be explained either by the parents' articulatory behavior impacting the infants' vocalization behavior, the infants' social and communicative cues eliciting hyperarticulation in the parents' speech, or the two variables being impacted by a third, underlying variable such as parents' general communicative adaptiveness.
Affiliation(s)
- Ellen Marklund
- Phonetics Laboratory, Stockholm Babylab, Department of Linguistics, Stockholm University, Stockholm, Sweden
- Ulrika Marklund
- Division of Sensory Organs and Communication, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Speech and Language Clinic, Department of Neurology, Danderyd Hospital, Stockholm, Sweden
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Lisa Gustavsson
- Phonetics Laboratory, Stockholm Babylab, Department of Linguistics, Stockholm University, Stockholm, Sweden
- Division of Speech and Language Pathology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
|
5
|
Villegas J, Perkins J, Wilson I. Effects of task and language nativeness on the Lombard effect and on its onset and offset timing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:1855. [PMID: 33765802 DOI: 10.1121/10.0003772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Accepted: 02/24/2021] [Indexed: 06/12/2023]
Abstract
This study focuses on the differences in speech sound pressure levels (here, called speech loudness) of Lombard speech (i.e., speech produced in the presence of an energetic masker) associated with different tasks and language nativeness. Vocalizations were produced by native speakers of Japanese with normal hearing and limited English proficiency while performing four tasks: dialog, a competitive game (both communicative), soliloquy, and text passage reading (noncommunicative). Relative to the native language (L1), larger loudness increments were observed in the game and text reading when performed in the second language (L2). Communicative tasks yielded louder vocalizations and larger increments of speech loudness than did noncommunicative tasks regardless of the spoken language. The period in which speakers increased their loudness after the onset of the masker was about fourfold longer than the time in which they decreased their loudness after the offset of the masker. Results suggest that when relying on acoustic signals, speakers use similar vocalization strategies in L1 and L2, and these depend on the complexity of the task, the need for accurate pronunciation, and the presence of a listener. Results also suggest that speakers use different strategies depending on the onset or offset of an energetic masker.
Affiliation(s)
- Julián Villegas
- Computer Arts Laboratory, University of Aizu, Aizu-Wakamatsu, Fukushima, 965-8580, Japan
- Jeremy Perkins
- CLR Phonetics Laboratory, University of Aizu, Aizu-Wakamatsu, Fukushima, 965-8580, Japan
- Ian Wilson
- CLR Phonetics Laboratory, University of Aizu, Aizu-Wakamatsu, Fukushima, 965-8580, Japan
|
6
|
Xue Y, Marxen M, Akagi M, Birkholz P. Acoustic and articulatory analysis and synthesis of shouted vowels. COMPUT SPEECH LANG 2021. [DOI: 10.1016/j.csl.2020.101156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
|
7
|
Whittico TH, Ortiz AJ, Marks KL, Toles LE, Van Stan JH, Hillman RE, Mehta DD. Ambulatory monitoring of Lombard-related vocal characteristics in vocally healthy female speakers. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:EL552. [PMID: 32611177 PMCID: PMC7316514 DOI: 10.1121/10.0001446] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 05/12/2023]
Abstract
Speakers typically modify their voice in the presence of increased background noise levels, exhibiting the classic Lombard effect. Lombard-related characteristics during everyday activities were recorded from 17 vocally healthy women who wore an acoustic noise dosimeter and ambulatory voice monitor. The linear relationship between vocal sound pressure level and environmental noise level exhibited an average slope of 0.54 dB/dB and value of 72.8 dB SPL at 50 dBA when correlation coefficients were greater than 0.4. These results, coupled with analyses of spectral and cepstral vocal function measures, provide normative ambulatory Lombard characteristics for comparison with patients with voice-use related disorders.
Affiliation(s)
- Thomas H Whittico
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114
- Andrew J Ortiz
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114
- Katherine L Marks
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114
- Laura E Toles
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114
- Jarrad H Van Stan
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114
- Robert E Hillman
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114
- Daryush D Mehta
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts 02114
|
8
|
Niebuhr O, Nazaryan AN. Money Talks — But Less Well so over the Mobile Phone? The Persistence of the Telephone Voice in a 4G Technology Setting and the Resulting Implications for Business Communication and Mobile-Phone Innovation. INTERNATIONAL JOURNAL OF INNOVATION AND TECHNOLOGY MANAGEMENT 2019. [DOI: 10.1142/s0219877019500135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Our study is a first step toward the innovative further development of mobile phones with special emphasis on optimizing them for business communication. Traditional landline phones and mobile phones up to 3G technology are known to trigger the so-called “telephone voice”. The phonetic changes induced by the telephone voice (louder speech at a higher pitch level) can undermine the perceived competence, trustworthiness and charisma of a speaker and can, thus, negatively influence business actions over the mobile phone. In a speech production experiment with 20 speakers and a subsequent acoustic speech-signal analysis of almost 15 000 utterances, we tested, in comparison to a baseline face-to-face dialog condition, whether the telephone voice still exists in a technological setting of VoLTE 4G mobile-phone communication. In fact, we found that the typical characteristics of the telephone voice persist even under the currently best technological 4G standards and under silent communication conditions. Moreover, we identified further acoustic-phonetic parameters of the telephone voice, some of which (like a more monotonous intonation) further compound the problem of business communication over the mobile phone. In combination, the extended parametric picture and the persistent occurrence of the “telephone voice” even under quiet 4G conditions suggest that a speech-in-noise-like (i.e. Lombard) adaptation is not the only and perhaps not even the primary cause behind the telephone voice. Based on this, we propose a number of innovations and R&D activities for making mobile-phone technology more suitable for business communication.
Affiliation(s)
- Oliver Niebuhr
- CIE — Centre for Industrial Electronics, Mads Clausen Institute, University of Southern Denmark, Sønderborg, Denmark
- Anush Norika Nazaryan
- Department of General Linguistics, Institute of Scandinavian Studies, Frisian, and General Linguistics, Kiel University, Germany
|
9
|
Pittman AL, Daliri A, Meadows L. Vocal Biomarkers of Mild-to-Moderate Hearing Loss in Children and Adults: Voiceless Sibilants. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:2814-2826. [PMID: 30458528 DOI: 10.1044/2018_jslhr-h-17-0460] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/15/2017] [Accepted: 07/08/2018] [Indexed: 06/09/2023]
Abstract
PURPOSE The purpose of this study was to determine if an objective measure of speech production could serve as a vocal biomarker for the effects of high-frequency hearing loss on speech perception. It was hypothesized that production of voiceless sibilants is governed sufficiently by auditory feedback that high-frequency hearing loss results in subtle but significant shifts in the spectral characteristics of these sibilants. METHOD Sibilant production was examined in individuals with mild to moderately severe congenital (22 children; 8-17 years old) and acquired (23 adults; 55-80 years old) hearing losses. Measures of hearing level (pure-tone average thresholds at 4 and 8 kHz), speech perception (detection of nonsense words within sentences), and speech production (spectral center of gravity [COG] for /s/ and /ʃ/) were obtained in unaided and aided conditions. RESULTS For both children and adults, detection of nonsense words increased significantly as hearing thresholds improved. Spectral COG for /ʃ/ was unaffected by hearing loss in both listening conditions, whereas the spectral COG for /s/ significantly decreased as high-frequency hearing loss increased. The distance in spectral COG between /s/ and /ʃ/ decreased significantly with increasing hearing level. COG distance significantly predicted nonsense-word detection in children but not in adults. CONCLUSIONS At least one aspect of speech production (voiceless sibilants) is measurably affected by high-frequency hearing loss and is related to speech perception in children. Speech production did not predict speech perception in adults, suggesting a more complex relationship between auditory feedback and feedforward mechanisms with age. Even so, these results suggest that this vocal biomarker may be useful for identifying the presence of high-frequency hearing loss in adults and children and for predicting the impact of hearing loss in children.
Affiliation(s)
- Andrea L Pittman
- Department of Speech and Hearing Science, Arizona State University, Tempe
- Ayoub Daliri
- Department of Speech and Hearing Science, Arizona State University, Tempe
- Lauren Meadows
- Department of Speech and Hearing Science, Arizona State University, Tempe
|
10
|
Garnier M, Ménard L, Alexandre B. Hyper-articulation in Lombard speech: An active communicative strategy to enhance visible speech cues? THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:1059. [PMID: 30180713 DOI: 10.1121/1.5051321] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/22/2017] [Accepted: 08/02/2018] [Indexed: 06/08/2023]
Abstract
This study investigates the hypothesis that speakers make active use of the visual modality in production to improve their speech intelligibility in noisy conditions. Six native speakers of Canadian French produced speech in quiet conditions and in 85 dB of babble noise, in three situations: interacting face-to-face with the experimenter (AV), using the auditory modality only (AO), or reading aloud (NI, no interaction). The audio signal was recorded with the three-dimensional movements of their lips and tongue, using electromagnetic articulography. All the speakers reacted similarly to the presence vs absence of communicative interaction, showing significant speech modifications with noise exposure in both interactive and non-interactive conditions, not only for parameters directly related to voice intensity or for lip movements (very visible) but also for tongue movements (less visible); greater adaptation was observed in interactive conditions, though. However, speakers reacted differently to the availability or unavailability of visual information: only four speakers enhanced their visible articulatory movements more in the AV condition. These results support the idea that the Lombard effect is at least partly a listener-oriented adaptation. However, to clarify their speech in noisy conditions, only some speakers appear to make active use of the visual modality.
Affiliation(s)
- Maëva Garnier
- Centre National de la Recherche Scientifique, Laboratoire Grenoble Images Parole Signal Automatique, 11 rue des Mathématiques, Grenoble Campus, Boîte Postale 46, F-38402 Saint Martin d'Hères Cedex, France
- Lucie Ménard
- Département de Linguistique, Laboratoire de Phonétique, Center for Research on Brain, Language, and Music, Université du Québec à Montréal, 320, Ste-Catherine Est, Montréal, Quebec H2X 1L7, Canada
- Boris Alexandre
- Centre National de la Recherche Scientifique, Laboratoire Grenoble Images Parole Signal Automatique, 11 rue des Mathématiques, Grenoble Campus, Boîte Postale 46, F-38402 Saint Martin d'Hères Cedex, France
|
11
|
Alghamdi N, Maddock S, Marxer R, Barker J, Brown GJ. A corpus of audio-visual Lombard speech with frontal and profile views. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:EL523. [PMID: 29960497 DOI: 10.1121/1.5042758] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 06/08/2023]
Abstract
This paper presents a bi-view (front and side) audiovisual Lombard speech corpus, which is freely available for download. It contains 5400 utterances (2700 Lombard and 2700 plain reference utterances), produced by 54 talkers, with each utterance in the dataset following the same sentence format as the audiovisual "Grid" corpus [Cooke, Barker, Cunningham, and Shao (2006). J. Acoust. Soc. Am. 120(5), 2421-2424]. Analysis of this dataset confirms previous research, showing prominent acoustic, phonetic, and articulatory speech modifications in Lombard speech. In addition, gender differences are observed in the size of the Lombard effect. Specifically, female talkers exhibit a greater increase in estimated vowel duration and a greater reduction in F2 frequency.
Affiliation(s)
- Najwa Alghamdi
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Steve Maddock
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Ricard Marxer
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Jon Barker
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Guy J Brown
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
|
12
|
Benuš Š, Šimko J. Stability and Variability in Slovak Prosodic Boundaries. PHONETICA 2017; 73:163-193. [PMID: 28208129 DOI: 10.1159/000446350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2015] [Accepted: 04/15/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND/AIM Encoding intended meanings in the type and strength of prosodic boundaries and strategies for communicating these meanings in ambient noise use similar prosodic cues. We analyze how increasing the level of ambient noise affects the realization of Slovak prosodic boundaries. METHODS Five native speakers of Slovak read sentences, manipulating the boundary type (weak, rise, fall) and the location of pre-boundary pitch accent. Ambient noise of several levels was administered via headphones. Acoustic and articulatory data (electromagnetometry) were collected. RESULTS Under normal conditions, boundary strength is signaled with longer pre-boundary rhymes, more frequent pauses, greater cross-boundary f0 resets and jaw displacement. The strength of falls is realized in cross-boundary features (pauses, f0 reset), and rises in pre-boundary features (rhyme duration, f0 range). Pitch-accented rhymes are strengthened in all features but f0 range. In noise, the increase in boundary strength is weak, and falls strengthen more than rises. F0 targets for falls and rises are adjusted in addition to noise-induced global f0 scaling and lengthening. CONCLUSION Hyper-articulation of prosodic boundaries in ambient noise is not robust and uniform; rather, durational, f0 and jaw displacement features co-create complex prosodic patterns in a complementary and synergetic manner based on affordances in normal speech.
Affiliation(s)
- Štefan Benuš
- Constantine the Philosopher University, Nitra, Slovakia
|