1
Carmo Alves MD, Mancini PC, Teixeira LC. Use of Auditory Feedback Amplifier in Women Without Voice Complaints: A Comparison of Acoustic Measures, Self-Rated Vocal Effort, and Voice Intensity. J Voice 2024:S0892-1997(23)00347-8. [PMID: 38326173 DOI: 10.1016/j.jvoice.2023.10.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 02/09/2024]
Abstract
OBJECTIVE To compare the immediate effects of using MindVox during 1, 3, 5, and 7 minutes of reading in women without voice complaints on acoustic measures of the vocal signal at low, moderate, and high intensities; on self-rated vocal effort; and on the intensity of voice reception and production. METHODS Participants read one text using MindVox for 1, 3, 5, and 7 minutes. After each time point, self-rated vocal effort was collected (BORG CR10-BR Scale), as well as samples of the vowel /e/ at low (<70 dB), moderate (≥70 dB and ≤80 dB), and high (>80 dB) intensities. Acoustic measurements (F0, short-term acoustic measurements, and cepstral peak prominence measurements) were also collected before and after the procedure and subsequently analyzed in the CTS 5.0 Vox-Metria Program. Voice reception and production intensities were captured during the reading task using two decibel meters: one installed near the ear (average intensity received by the ear, EAVG) and the other near the lips (average intensity captured near the lips, LAVG). RESULTS The Cepstral Peak Prominence-Smoothed increased in the first minute, the Cepstral Peak Prominence increased in the third minute, and jitter decreased from the first minute. All these changes were observed at low intensity and were maintained at the other time points. For every 5 dB of amplification (EAVG), there was a 1 dB decrease in voice production (LAVG). CONCLUSION Using MindVox in women without voice complaints brings positive immediate effects on cepstral measures and jitter at low intensity. There is a connection between the intensity of the voice received by the ear and the intensity of voice production.
Affiliation(s)
- Moisés do Carmo Alves
  - Department of Speech-Language Pathology, Universidade Federal de Minas Gerais, UFMG, Belo Horizonte (MG), Brazil.
- Patrícia Cotta Mancini
  - Department of Speech-Language Pathology, Universidade Federal de Minas Gerais, UFMG, Belo Horizonte (MG), Brazil.
- Letícia Caldas Teixeira
  - Department of Speech-Language Pathology, Universidade Federal de Minas Gerais, UFMG, Belo Horizonte (MG), Brazil.
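The relation reported in the abstract of entry 1 (about 1 dB less voice production for every 5 dB of feedback amplification) can be read as a simple linear rule. The sketch below is only a worked illustration of that arithmetic: the function name `predicted_lavg_change` and the assumption that the relation stays linear outside the tested range are hypothetical and are not taken from the study.

```python
def predicted_lavg_change(eavg_gain_db: float, slope: float = -1.0 / 5.0) -> float:
    """Predicted change in voice production level (LAVG, dB) for a given
    feedback amplification at the ear (EAVG, dB).

    The default slope encodes the reported 1 dB decrease per 5 dB of
    amplification; treating it as a constant slope is an assumption.
    """
    return slope * eavg_gain_db


# Illustrative gains only; these values are not data from the study.
for gain in (5.0, 10.0, 15.0):
    print(f"EAVG +{gain:.0f} dB -> LAVG {predicted_lavg_change(gain):+.1f} dB")
```

The slope is a parameter so that a different fitted value can be substituted without changing the function.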
2
Dimos K, He L, Dellwo V. Shouting affects temporal properties of the speech amplitude envelope. JASA Express Letters 2024; 4:015202. [PMID: 38169314 DOI: 10.1121/10.0023995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 11/27/2023] [Indexed: 01/05/2024]
Abstract
Distinguishing shouted from non-shouted speech is crucial in communication. We examined how shouting affects temporal properties of the amplitude envelope (ENV) in a total of 720 sentences read by 18 Swiss German speakers in normal and shouted modes; shouting was defined as maintaining a sound pressure level of ≥80 dB SPL (C-weighted) at a distance of 1 m from the mouth. Generalized additive models revealed significant temporal alterations of ENV in shouted speech, marked by a steeper ascent, a delayed peak, and extended high levels. These findings offer potential cues for identifying shouting, which are particularly useful when fine-structure and dynamic range cues are absent, for example, in cochlear implant users.
Affiliation(s)
- Kostis Dimos
  - Department of Computational Linguistics, University of Zurich, Zurich
- Lei He
  - Department of Computational Linguistics, University of Zurich, Zurich
- Volker Dellwo
  - Department of Computational Linguistics, University of Zurich, Zurich
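The abstract of entry 2 describes temporal properties of the amplitude envelope, such as when the peak occurs and how long high levels are held. The abstract does not state how the envelope was extracted, so the sketch below swaps in a generic approach (full-wave rectification followed by a moving average) and applies it to synthetic tones. All function names, the 8000 Hz sample rate, the 161-sample window, and the 0.8 "high level" fraction are assumptions made for illustration.

```python
import math


def amplitude_envelope(signal, window):
    """Full-wave rectify the signal, then smooth it with a centered moving average."""
    rectified = [abs(x) for x in signal]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        lo = max(0, i - half)
        hi = min(len(rectified), i + half + 1)
        envelope.append(sum(rectified[lo:hi]) / (hi - lo))
    return envelope


def envelope_summary(envelope, high_fraction=0.8):
    """Return the relative time of the envelope peak and the share of samples near peak level."""
    peak = max(envelope)
    peak_index = envelope.index(peak)
    near_peak = sum(1 for v in envelope if v >= high_fraction * peak)
    return {
        "relative_peak_time": peak_index / (len(envelope) - 1),
        "high_level_proportion": near_peak / len(envelope),
    }


def synthetic_utterance(peak_position, sample_rate=8000, tone_hz=100.0):
    """One second of a tone whose amplitude rises linearly to a peak, then falls linearly."""
    samples = []
    for i in range(sample_rate):
        t = i / sample_rate
        if t < peak_position:
            gain = t / peak_position
        else:
            gain = (1.0 - t) / (1.0 - peak_position)
        samples.append(gain * math.sin(2.0 * math.pi * tone_hz * t))
    return samples


# Two synthetic contours: one peaking early (30% of duration), one later (60%).
early = envelope_summary(amplitude_envelope(synthetic_utterance(0.3), window=161))
late = envelope_summary(amplitude_envelope(synthetic_utterance(0.6), window=161))
print(early)
print(late)
```

The two summaries only show how a "delayed peak" would appear in such a measure; they say nothing about real shouted speech.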
3
Alves MDC, Mancini PC, Teixeira LC. Modifications of auditory feedback and its effects on the voice of adult subjects: a scoping review. Codas 2023; 36:e20220202. [PMID: 38126424 PMCID: PMC10750862 DOI: 10.1590/2317-1782/20232022202pt] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 05/29/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
INTRODUCTION The auditory perception of voice and its production involve auditory feedback, kinesthetic cues, and the feedforward system, which produce different effects on the voice. The Lombard, Sidetone, and Pitch-Shift-Reflex effects are the most studied. Mapping the scientific experiments on changes in auditory feedback for voice motor control makes it possible to examine the existing literature on the phenomenon and may contribute to voice training or therapy. PURPOSE To map experiments and research results on the manipulation of auditory feedback for voice motor control in adults. METHOD Scoping review following the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist to answer the question: "What are the investigation methods and main research findings on the manipulation of auditory feedback in voice self-monitoring of adults?". The search protocol was based on the Population, Concept, and Context (PCC) mnemonic strategy, in which the population is adult individuals, the concept is the manipulation of auditory feedback, and the context is voice motor control. Articles were searched in the databases BVS/Virtual Health Library, MEDLINE/Medical Literature Analysis and Retrieval System Online, COCHRANE, CINAHL/Cumulative Index to Nursing and Allied Health Literature, SCOPUS, and WEB OF SCIENCE. RESULTS Sixty articles were found: 19 on the Lombard effect, 25 on the Pitch-Shift-Reflex effect, 12 on the Sidetone effect, and four on the Sidetone/Lombard effect. The studies agree that inserting a noise that masks the auditory feedback increases the individual's speech intensity and that amplifying the auditory feedback reduces the sound pressure level of voice production. A reflex response to pitch changes in the auditory feedback is observed, although with particular characteristics in each study. CONCLUSION The materials and methods of the experiments differ, the tasks are not standardized, and the samples are varied and often small. This methodological diversity makes it difficult to generalize the results. The main findings of research on auditory feedback in voice motor control confirm that, when auditory feedback is suppressed, the individual tends to increase voice intensity. When auditory feedback is amplified, the individual decreases intensity and has greater control over the fundamental frequency, and in frequency manipulations the individual tends to correct the manipulation. The few studies with dysphonic individuals show that they behave differently from non-dysphonic individuals.
Affiliation(s)
- Moisés do Carmo Alves
  - Graduate Program in Speech-Language Pathology Sciences, Department of Speech-Language Pathology, School of Medicine, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte (MG), Brazil.
- Patrícia Cotta Mancini
  - Graduate Program in Speech-Language Pathology Sciences, Department of Speech-Language Pathology, School of Medicine, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte (MG), Brazil.
- Letícia Caldas Teixeira
  - Graduate Program in Speech-Language Pathology Sciences, Department of Speech-Language Pathology, School of Medicine, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte (MG), Brazil.
4
Kąkol K, Korvel G, Tamulevičius G, Kostek B. Detecting Lombard Speech Using Deep Learning Approach. Sensors (Basel, Switzerland) 2022; 23:315. [PMID: 36616913 PMCID: PMC9824848 DOI: 10.3390/s23010315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 06/17/2023]
Abstract
Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, the assumptions of the work performed for Lombard speech detection are outlined. The proposed framework combines convolutional neural networks (CNNs) and various two-dimensional (2D) speech signal representations. To reduce the computational cost without abandoning the 2D representation-based approach, a strategy for threshold-based averaging of the Lombard effect detection results is introduced. The pseudocode of the averaging process is also included. A series of experiments is performed to determine the most effective network structure and 2D speech signal representation. Investigations are carried out on German and Polish recordings containing Lombard speech. All 2D speech signal representations are tested with and without augmentation. Augmentation means using the alpha channel to store additional data: gender of the speaker, F0 frequency, and the first two MFCCs. The experimental results show that Lombard and neutral speech recordings can be clearly distinguished with high detection accuracy. It is also demonstrated that the proposed speech detection process is capable of working in near real time. These are the key contributions of this work.
Affiliation(s)
- Gražina Korvel
  - Institute of Data Science and Digital Technologies, Vilnius University, LT-08412 Vilnius, Lithuania
- Gintautas Tamulevičius
  - Institute of Data Science and Digital Technologies, Vilnius University, LT-08412 Vilnius, Lithuania
- Bożena Kostek
  - Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland
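The abstract of entry 4 mentions threshold-based averaging of detection results but does not reproduce the paper's pseudocode. The sketch below is therefore only one plausible reading of that idea: per-frame classifier scores are averaged over fixed, non-overlapping windows and each mean is compared with a threshold. The function name `averaged_decisions`, the window length, the threshold, and the example scores are all hypothetical.

```python
def averaged_decisions(frame_scores, window=5, threshold=0.5):
    """Average per-frame Lombard scores over non-overlapping windows and
    flag each window as Lombard speech when its mean reaches the threshold.

    Trailing frames that do not fill a complete window are ignored.
    """
    decisions = []
    for start in range(0, len(frame_scores) - window + 1, window):
        chunk = frame_scores[start:start + window]
        mean_score = sum(chunk) / window
        decisions.append(mean_score >= threshold)
    return decisions


# Made-up scores: the first window averages 0.75, the second 0.25.
scores = [0.9, 0.8, 0.4, 0.7, 0.95, 0.1, 0.2, 0.6, 0.3, 0.05]
print(averaged_decisions(scores))  # [True, False]
```

Averaging before thresholding means a single low-scoring frame (0.4 in the first window) does not flip the decision, which is the usual motivation for smoothing frame-level outputs.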
5
Castro C, Prado P, Espinoza VM, Testart A, Marfull D, Manriquez R, Stepp CE, Mehta DD, Hillman RE, Zañartu M. Lombard Effect in Individuals With Nonphonotraumatic Vocal Hyperfunction: Impact on Acoustic, Aerodynamic, and Vocal Fold Vibratory Parameters. Journal of Speech, Language, and Hearing Research: JSLHR 2022; 65:2881-2895. [PMID: 35930680 PMCID: PMC9913286 DOI: 10.1044/2022_jslhr-21-00508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2021] [Revised: 03/17/2022] [Accepted: 05/11/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE This exploratory study aims to investigate variations in voice production in the presence of background noise (Lombard effect) in individuals with nonphonotraumatic vocal hyperfunction (NPVH) and individuals with typical voices using acoustic, aerodynamic, and vocal fold vibratory measures of phonatory function. METHOD Nineteen participants with NPVH and 19 participants with typical voices produced simple vocal tasks in three sequential background conditions: baseline (in quiet), Lombard (in noise), and recovery (5 min after removing the noise). The Lombard condition consisted of speech-shaped noise at 80 dB SPL through audiometric headphones. Acoustic measures from a microphone, glottal aerodynamic parameters estimated from the oral airflow measured with a circumferentially vented pneumotachograph mask, and vocal fold vibratory parameters from high-speed videoendoscopy were analyzed. RESULTS During the Lombard condition, both groups exhibited a decrease in open quotient and increases in sound pressure level, peak-to-peak glottal airflow, maximum flow declination rate, and subglottal pressure. During the recovery condition, the acoustic and aerodynamic measures of individuals with typical voices returned to those of the baseline condition; however, recovery measures for individuals with NPVH did not return to baseline values. CONCLUSIONS As expected, individuals with NPVH and participants with typical voices exhibited a Lombard effect in the presence of elevated background noise levels. During the recovery condition, individuals with NPVH did not return to their baseline state, pointing to a persistence of the Lombard effect after noise removal. This behavior could be related to disruptions in laryngeal motor control and may play a role in the etiology of NPVH. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.20415600.
Affiliation(s)
- Christian Castro
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
  - Department of Speech and Language Pathology, Universidad de Valparaíso, Chile
  - Department of Speech and Language Pathology, Universidad de Chile, Santiago
- Pavel Prado
  - Latin American Brain Health Institute (BrainLat), Universidad Adolfo Ibáñez, Santiago, Chile
- Alba Testart
  - Department of Speech and Language Pathology, Universidad de Playa Ancha, Valparaíso, Chile
- Daphne Marfull
  - Department of Speech and Language Pathology, Universidad de Valparaíso, Chile
- Rodrigo Manriquez
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Cara E. Stepp
  - Department of Speech, Language, and Hearing Sciences, Boston University, MA
  - Department of Biomedical Engineering, Boston University, MA
  - Department of Otolaryngology-Head and Neck Surgery, Boston University, MA
- Daryush D. Mehta
  - Center for Laryngeal Surgery & Voice Rehabilitation, Massachusetts General Hospital, Boston
  - Department of Surgery, Harvard Medical School, Boston, MA
  - MGH Institute of Health Professions, Boston, MA
- Robert E. Hillman
  - Center for Laryngeal Surgery & Voice Rehabilitation, Massachusetts General Hospital, Boston
  - Department of Surgery, Harvard Medical School, Boston, MA
  - MGH Institute of Health Professions, Boston, MA
- Matías Zañartu
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
6
Abstract
Voice loss is a serious disorder that is highly associated with social isolation. The use of multimodal information sources, such as audiovisual information, is crucial since it can lead to the development of straightforward personalized word prediction models that can reproduce the patient's original voice. In this work, we designed a multimodal approach based on audiovisual information from patients before loss of voice to develop a system for automated lip-reading in the Greek language. Data pre-processing methods, such as lip segmentation and frame-level sampling techniques, were used to enhance the quality of the imaging data. Audio information was incorporated in the model to automatically annotate sets of frames as words. Recurrent neural networks were trained on four different video recordings to develop a robust word prediction model. The model was able to correctly identify test words in different time frames with 95% accuracy. To our knowledge, this is the first word prediction model that is trained to recognize words from video recordings in the Greek language.
7
Kelly F, Hansen JHL. Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021; 29:927-942. [PMID: 35783572 PMCID: PMC9245507 DOI: 10.1109/taslp.2021.3053388] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/02/2023]
Abstract
Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly-occurring forms of vocal effort variation that result in non-neutral speech, the first due to noise exposure and the second due to intentional adjustment on the part of the speaker. In this article, a comparative evaluation of speaker recognition performance in non-neutral conditions is presented using multiple Lombard effect and whisper corpora. The detrimental impact of these vocal effort variations on discrimination and calibration performance on global, per-corpus, and per-speaker levels is explored using conventional error metrics, along with visual representations of the model and score spaces. A non-neutral speech detector is subsequently introduced and used to inform score calibration in several ways. Two calibration approaches are proposed and shown to reduce error to the same level as an optimal calibration approach that relies on ground-truth vocal effort information. This article contributes a generalizable methodology towards detecting vocal effort variation and using this knowledge to inform and advance speaker recognition system behavior.
Affiliation(s)
- Finnian Kelly
  - Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75083-0688 USA
- John H L Hansen
  - Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75083-0688 USA
8
An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement. Electronics 2020. [DOI: 10.3390/electronics10010017] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Recent speech enhancement research has shown that deep learning techniques are very effective in removing background noise. Many deep neural networks are being proposed, showing promising results for improving overall speech perception. The Deep Multilayer Perceptron, Convolutional Neural Networks, and the Denoising Autoencoder are well-established architectures for speech enhancement; however, choosing between different deep learning models has been mainly empirical. Consequently, a comparative analysis of these three architecture types is needed to show the factors affecting their performance. In this paper, this analysis is presented by comparing seven deep learning models that belong to these three categories. The comparison covers the overall quality of the output speech, evaluated using five objective evaluation metrics and a subjective evaluation with 23 listeners; the ability to deal with challenging noise conditions; generalization ability; complexity; and processing time. Further analysis is then provided using two different approaches. The first approach investigates how the performance is affected by changing network hyperparameters and the structure of the data, including the Lombard effect. The second approach interprets the results by visualizing the spectrogram of the output layer of all the investigated models and the spectrograms of the hidden layers of the convolutional neural network architecture. Finally, a general evaluation of supervised deep learning-based speech enhancement is performed using SWOC analysis to discuss the technique's Strengths, Weaknesses, Opportunities, and Challenges. The results of this paper contribute to the understanding of how different deep neural networks perform the speech enhancement task, highlight the strengths and weaknesses of each architecture, and provide recommendations for achieving better performance. This work facilitates the development of better deep neural networks for speech enhancement in the future.
9
Saleem N, Khattak MI. Multi-scale decomposition based supervised single channel deep speech enhancement. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106666] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
10
Understanding Lombard speech: a review of compensation techniques towards improving speech based recognition systems. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09907-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/23/2022]