1
Konuma Y, Asakura T. Effects of microphone mounting location and gender on accuracy in speech recognition using a throat microphone. JASA EXPRESS LETTERS 2023; 3:095203. [PMID: 37725518 DOI: 10.1121/10.0020988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 08/26/2023] [Indexed: 09/21/2023]
Abstract
Speech recognition using air-conduction microphones is less accurate under high noise conditions and when the volume of the speaker's voice is relatively low. In this study, the effect of mounting location of throat microphones (which are less susceptible to ambient noise) on recognition accuracy was experimentally investigated. The results confirmed that mounting position and speaker gender affected recognition accuracy, regardless of any other factor or speech recognition system. In addition, relatively lower recognition accuracy was observed in the upper part of the neck near the mandibular angle for both males and females.
Affiliation(s)
- Y Konuma
- Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba, 278-0022
- T Asakura
- Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba, 278-0022
2
Bottalico P, Nudelman CJ. Do-It-Yourself Voice Dosimeter Device: A Tutorial and Performance Results. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023:1-15. [PMID: 37263017 DOI: 10.1044/2023_jslhr-23-00060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
PURPOSE Voice dosimeters gather voice production data in the daily lives of individuals with voice disorders. Additionally, voice dosimeters aid in understanding the pathophysiology of voice disorders. Previously, several voice dosimeters were commercially available. However, these devices have been discontinued and are not available to clinicians and researchers alike. In this tutorial, instructions for a low-cost, easy-to-assemble voice dosimeter are provided. This do-it-yourself (DIY) voice dosimeter is further validated based on performance results. METHOD Ten vocally healthy participants wore the DIY voice dosimeter. They produced a sustained /a/ vowel and read a text with three different vocal efforts. These tasks were recorded by the DIY voice dosimeter and a reference microphone simultaneously. The expanded uncertainty of the mean error in the estimation of four voice acoustic parameters as measured by the DIY dosimeter was performed by comparing the signals acquired through the reference microphone and the dosimeter. RESULTS For measures of sound pressure level, the DIY voice dosimeter had a mean error of -0.68 dB with an uncertainty of 0.56 dB. For fundamental frequency, the mean error was 1.56 Hz for female participants and 1.11 Hz for male participants, with an uncertainty of 0.62 Hz and 0.34 Hz for female and male participants, respectively. Cepstral peak prominence smoothed and L1 minus L2 had mean errors (uncertainty) of -0.06 dB (0.27 dB) and 2.20 dB (0.72 dB). CONCLUSION The mean error and uncertainties for the DIY voice dosimeter are comparable to those for the most accurate voice dosimeters that were previously on the market.
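The validation summarized in this abstract compares paired measurements from the device and a reference microphone. The Python sketch below illustrates that kind of comparison only; it is not the authors' procedure. The function name, the sample values, and the definition of expanded uncertainty as k = 2 times the standard error of the mean error are all assumptions made for illustration.

```python
import math
import statistics


def mean_error_and_expanded_uncertainty(device, reference, k=2.0):
    """Return (mean error, expanded uncertainty) for paired measurements.

    Assumption: expanded uncertainty = k * (sample std of errors / sqrt(n)),
    with coverage factor k = 2. The source abstract does not state its formula.
    """
    if len(device) != len(reference):
        raise ValueError("device and reference must have the same length")
    if len(device) < 2:
        raise ValueError("at least two paired measurements are required")

    # Signed error of the device relative to the reference, pair by pair.
    errors = [d - r for d, r in zip(device, reference)]
    mean_error = statistics.fmean(errors)

    # Standard uncertainty of the mean error (standard error of the mean).
    standard_uncertainty = statistics.stdev(errors) / math.sqrt(len(errors))
    return mean_error, k * standard_uncertainty


if __name__ == "__main__":
    # Hypothetical sound pressure levels in dB, invented for this example.
    device_spl = [70.0, 72.0, 74.0, 76.0]
    reference_spl = [71.0, 72.0, 75.0, 76.0]
    err, unc = mean_error_and_expanded_uncertainty(device_spl, reference_spl)
    print(f"mean error = {err:.2f} dB, expanded uncertainty = {unc:.2f} dB")
```

The same function could be applied per parameter (for example, fundamental frequency in Hz), since it only assumes that both lists share units and pairing order.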
Affiliation(s)
- Pasquale Bottalico
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign
- Charles J Nudelman
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign
3
Lee SH, Torng PC, Lee GS. Contributions of Forward-Focused Voice to Audio-Vocal Feedback Measured Using Nasal Accelerometry and Power Spectral Analysis of Vocal Fundamental Frequency. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:1751-1766. [PMID: 35353595 DOI: 10.1044/2022_jslhr-21-00443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
PURPOSE The spectral powers of the modulations of vocal fundamental frequency (fo) less than 3 Hz (low-frequency power, LFP) and between 3 and 8 Hz (middle-frequency power, MFP) have been established to indicate the audio-vocal feedback status and vocal efficiency of a speaker, and a resonant voice may enhance the auditory-vocal feedback. This study aims to determine whether the auditory feedback can be augmented by a forward and resonant voice and therefore contribute to the modulations of fo variability. METHOD Vocal signals and accelerometric signals of lateral nasal cartilage were obtained from 27 healthy adults who, respectively, sustained vowels /a/ and /i/ with their habitual speaking voice and with a forward-focused voice under three auditory conditions: natural hearing (N0), high-level noise exposure (N90), and low-level noise exposure (N60). Nasal skin vibrations were measured using nasal accelerometry to reflect voice resonance status. Vocal intensity and fo variability were also analyzed to show the auditory-vocal interactions under varied conditions of auditory feedback and voice resonance. RESULTS In both N0 and N90 conditions, forward-focused voice showed a significantly lower LFP than the speakers' habitual voice. In addition, LFP of fo increased significantly during natural voice production when the voice feedback was greatly masked by high-intensity noise; however, with a forward-focused voice, the noise-induced variation in LFP was significantly decreased. Under N90, MFP significantly decreased during forward-focused voice production compared with that measured during natural voice production. The stability of fo modulations was not adversely affected by N60. CONCLUSION The results support the idea that vocalizing with a forward-focused voice enhances the auditory feedback of the speaker's own voice and, thus, reduces the variability of fo during sustained phonation, especially when vocalizing in the high noise condition.
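The two band powers defined in this abstract (modulation below 3 Hz and between 3 and 8 Hz) can be illustrated with a small Python sketch. It is only an approximation of the idea, not the authors' analysis: the function name, the use of a plain one-sided periodogram of the mean-removed fundamental frequency contour, the contour sampling rate, and the handling of the band edges are all assumptions, and the synthetic contour is invented for the example.

```python
import cmath
import math


def f0_modulation_band_powers(f0_hz, contour_rate_hz,
                              low_edge_hz=3.0, high_edge_hz=8.0):
    """Return (LFP, MFP) in Hz^2 for an evenly sampled f0 contour.

    Assumption: LFP sums periodogram power for 0 < f < low_edge_hz and
    MFP sums it for low_edge_hz <= f <= high_edge_hz.
    """
    n = len(f0_hz)
    if n < 2:
        raise ValueError("the contour needs at least two samples")

    # Remove the mean so only the modulation around the average f0 remains.
    mean_f0 = sum(f0_hz) / n
    centered = [value - mean_f0 for value in f0_hz]

    lfp = 0.0
    mfp = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * contour_rate_hz / n
        if freq > high_edge_hz:
            break
        # Direct DFT coefficient for bin k (only low bins are needed).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2 / n ** 2
        # One-sided spectrum: double every bin except the Nyquist bin.
        if not (n % 2 == 0 and k == n // 2):
            power *= 2.0
        if freq < low_edge_hz:
            lfp += power
        else:
            mfp += power
    return lfp, mfp


if __name__ == "__main__":
    # Synthetic contour: 200 Hz mean, 1 Hz drift and 5 Hz tremor-like ripple.
    rate = 50.0  # contour samples per second (assumed)
    contour = [200.0
               + 2.0 * math.sin(2 * math.pi * 1.0 * i / rate)
               + 1.0 * math.sin(2 * math.pi * 5.0 * i / rate)
               for i in range(200)]
    lfp, mfp = f0_modulation_band_powers(contour, rate)
    print(f"LFP = {lfp:.3f} Hz^2, MFP = {mfp:.3f} Hz^2")
```

With this normalization, a sinusoidal modulation of amplitude A that falls exactly on a frequency bin contributes A²/2 to its band; published values may use a different scaling or windowing.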
Affiliation(s)
- Shao-Hsuan Lee
- Department of Speech Language Pathology and Audiology, National Taipei University of Nursing and Health Sciences, Taiwan
- Pao-Chuan Torng
- Department of Speech Language Pathology and Audiology, National Taipei University of Nursing and Health Sciences, Taiwan
- Guo-She Lee
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taiwan
- Department of Otorhinolaryngology, Taipei City Hospital, Renai Branch, Taiwan
4
Selosse G, Grandjean D, Ceravolo L. Influence of bodily resonances on emotional prosody perception. Front Psychol 2022; 13:1061930. [PMID: 36571062 PMCID: PMC9773097 DOI: 10.3389/fpsyg.2022.1061930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 11/22/2022] [Indexed: 12/13/2022]
Abstract
Introduction Emotional prosody is defined as suprasegmental and segmental changes in the human voice and related acoustic parameters that can inform the listener about the emotional state of the speaker. While the processing of emotional prosody is well represented in the literature, the mechanism of embodied cognition in emotional voice perception has been little studied. This study aimed to investigate the influence of induced bodily vibrations (through a vibrator placed close to the vocal cords) on the perception of emotional vocalizations. The main hypothesis was that induced body vibrations would constitute a potential interoceptive feedback that can influence the auditory perception of emotions. It was also expected that these effects would be greater for stimuli that are more ambiguous. Methods Participants were presented with emotional vocalizations expressing joy or anger which varied from low-intensity vocalizations, considered as ambiguous, to high-intensity ones, considered as non-ambiguous. Vibrations were induced simultaneously in half of the trials and expressed joy or anger congruently with the voice stimuli. Participants had to evaluate each voice stimulus using four visual analog scales (joy and anger, with surprise and sadness as control scales). Results A significant effect of the vibrations was observed on the three behavioral indexes (discrimination, confusion, and accuracy), with vibrations confusing rather than facilitating vocal emotion processing. Conclusion Overall, this study sheds new light on a poorly documented topic, namely the potential use of vocal cord vibrations as an interoceptive feedback allowing humans to modulate voice production and perception during social interactions.
Affiliation(s)
- Garance Selosse
- Neuroscience of Emotion and Affective Dynamics Lab, Department of Psychology, University of Geneva, Geneva, Switzerland
- Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
- *Correspondence: Garance Selosse,
- Didier Grandjean
- Neuroscience of Emotion and Affective Dynamics Lab, Department of Psychology, University of Geneva, Geneva, Switzerland
- Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
- Leonardo Ceravolo
- Neuroscience of Emotion and Affective Dynamics Lab, Department of Psychology, University of Geneva, Geneva, Switzerland
- Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
5
Frič M, Hruška V, Dlask P. Full-field face vibration measurement in singing—Case study. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
6
Patel RR, Lulich SM, Verdi A. Vocal tract shape and acoustic adjustments of children during phonation into narrow flow-resistant tubes. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:352. [PMID: 31370566 DOI: 10.1121/1.5116681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2018] [Accepted: 06/25/2019] [Indexed: 06/10/2023]
Abstract
The goal of the study is to quantify the salient vocal tract acoustic, subglottal acoustic, and vocal tract physiological characteristics during phonation into a narrow flow-resistant tube with 2.53 mm inner diameter and 124 mm length in typically developing vocally healthy children using simultaneous microphone, accelerometer, and 3D/4D ultrasound recordings. Acoustic measurements included fundamental frequency (fo), first formant frequency (F1), second formant frequency (F2), first subglottal resonance (FSg1), and peak-to-peak amplitude ratio (Pvt:Psg). Physiological measurements included posterior tongue height (D1), tongue dorsum height (D2), tongue tip height (D3), tongue length (D4), oral cavity width (D5), hyoid elevation (D6), pharynx width (D7). All measurements were made on eight boys and ten girls (6-9 years) during sustained /o:/ production at typical pitch and loudness, with and without flow-resistant tube. Phonation with the flow-resistant tube resulted in a significant decrease in F1, F2, and Pvt:Psg and a significant increase in D2, D3, and FSg1. A statistically significant gender effect was observed for D1, with D1 higher in boys. These findings agree well with reported findings from adults, suggesting common acoustic and articulatory mechanisms for narrow flow-resistant tube phonation. Theoretical implications of the findings are discussed.
Affiliation(s)
- Rita R Patel
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405-7002, USA
- Steven M Lulich
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405-7002, USA
- Alessandra Verdi
- Department of Speech and Hearing Sciences, Indiana University, 200 South Jordan Avenue, Bloomington, Indiana 47405-7002, USA
7
Combined Use of Standard and Throat Microphones for Measurement of Acoustic Voice Parameters and Voice Categorization. J Voice 2015; 29:552-9. [PMID: 25795349 DOI: 10.1016/j.jvoice.2014.10.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/28/2014] [Accepted: 10/14/2014] [Indexed: 11/20/2022]
Abstract
OBJECTIVE The aim of the present study was to evaluate the reliability of the measurements of acoustic voice parameters obtained simultaneously using oral and contact (throat) microphones and to investigate utility of combined use of these microphones for voice categorization. MATERIALS AND METHODS Voice samples of sustained vowel /a/ obtained from 157 subjects (105 healthy and 52 pathological voices) were recorded in a soundproof booth simultaneously through two microphones: oral AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria) and contact (throat) Triumph PC microphone (Clearer Communications, Inc, Burnaby, Canada) placed on the lamina of thyroid cartilage. Acoustic voice signal data were measured for fundamental frequency, percent of jitter and shimmer, normalized noise energy, signal-to-noise ratio, and harmonic-to-noise ratio using Dr. Speech software (Tiger Electronics, Seattle, WA). RESULTS The correlations of acoustic voice parameters in vocal performance were statistically significant and strong (r = 0.71-1.0) for the entire functional measurements obtained for the two microphones. When classifying into healthy-pathological voice classes, the oral-shimmer revealed the correct classification rate (CCR) of 75.2% and the throat-jitter revealed CCR of 70.7%. However, combination of both throat and oral microphones allowed identifying a set of three voice parameters: throat-signal-to-noise ratio, oral-shimmer, and oral-normalized noise energy, which provided the CCR of 80.3%. CONCLUSIONS The measurements of acoustic voice parameters using a combination of oral and throat microphones showed to be reliable in clinical settings and demonstrated high CCRs when distinguishing the healthy and pathological voice patient groups. Our study validates the suitability of the throat microphone signal for the task of automatic voice analysis for the purpose of voice screening.
8
Data dependent random forest applied to screening for laryngeal disorders through analysis of sustained phonation: Acoustic versus contact microphone. Med Eng Phys 2015; 37:210-8. [DOI: 10.1016/j.medengphy.2014.12.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/30/2014] [Revised: 10/31/2014] [Accepted: 12/31/2014] [Indexed: 11/21/2022]
9
Abstract
PURPOSE This study investigated whether resonant voice training would enhance facial bone vibration during resonant voice production. METHODS Twelve normal healthy participants undertook four sessions of resonant voice training, each lasting 30 minutes. A piezoelectric accelerometer was used to measure the vibratory level on the face (nasal bridge and upper lip) and the perilaryngeal area during the production of the nasal consonant /m/ and the vowels /a/, /i/, and /u/ before and after the resonant voice training. The extents of vibration of these four sounds among these three sites were compared. RESULTS A significant increase in facial bone vibration following resonant voice training was found. The nasal bridge showed a significantly larger magnitude of increase than the upper lip. Different sounds were also found to facilitate different magnitudes of facial bone vibration. A greater magnitude of facial bone vibration was found with the phonation of /m/, /i/, and /u/ than with the phonation of /a/. CONCLUSION Resonant voice training facilitated an increase in facial bone vibration, more so at the nasal bridge area than around the upper lip. This is hypothesized to contribute to the improved resonant voice production. Sounds that involve relatively restricted oropharyngeal cavities facilitated a greater extent of facial bone vibration during resonant voice production.
10
Kechichian P, Srinivasan S. Model-based speech enhancement using a bone-conducted signal. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:EL262-EL267. [PMID: 22423818 DOI: 10.1121/1.3687014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/31/2023]
Abstract
Codebook-based single-microphone noise suppressors, which exploit prior knowledge about speech and noise statistics, provide better performance in nonstationary noise. However, as the enhancement involves a joint optimization over speech and noise codebooks, this results in high computational complexity. A codebook-based method is proposed that uses a reference signal observed by a bone-conduction microphone, and a mapping between air- and bone-conduction codebook entries generated during an offline training phase. A smaller subset of air-conducted speech codebook entries that accurately models the clean speech signal is selected using this reference signal. Experiments support the expected improvement in performance at low computational complexity.
Affiliation(s)
- Patrick Kechichian
- Philips Research, High Tech Campus 36, 5656AE Eindhoven, The Netherlands.