1
Echternach M, Burk F, Kirsch J, Traser L, Birkholz P, Burdumy M, Richter B. Articulatory and acoustic differences between lyric and dramatic singing in Western classical music. J Acoust Soc Am 2024; 155:2659-2669. PMID: 38634661. DOI: 10.1121/10.0025751.
Abstract
Within the realm of voice classification, singers can be sub-categorized by the weight of their repertoire, the so-called singer's "Fach." However, the opposing terms "lyric" and "dramatic" singing are not yet well defined by their acoustic and articulatory characteristics. Nine professional singers of different Fächer were asked to sing a diatonic scale on the vowel /a/, first in what they considered lyric singing and second in what they considered dramatic singing. Images were recorded using real-time magnetic resonance imaging (MRI) at 25 frames/s, and the audio signal was recorded via an optical microphone system. The analysis covered sound pressure level (SPL), vibrato amplitude and frequency, and resonance frequencies, as well as articulatory settings of the vocal tract. It revealed three primary differences between dramatic and lyric singing: dramatic singing was associated with greater SPL, greater vibrato amplitude and frequency, and lower resonance frequencies. The higher SPL is an indication of voice source changes, and the lower resonance frequencies are probably caused by a lower larynx position. However, all these strategies showed considerable individual variability. The singer's Fach might contribute to perceptual differences even for the same singer with regard to the respective repertoire.
Affiliation(s)
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistr. 15, 81377 Munich, Germany
- Fabian Burk
- Department of Otorhinolaryngology and Plastic Surgery, SRH Wald-Klinikum Gera, Str. des Friedens 122, 07548 Gera, Germany
- Jonas Kirsch
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistr. 15, 81377 Munich, Germany
- Louisa Traser
- Institute of Musicians' Medicine, Faculty of Medicine, Freiburg University and Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, 01062 Dresden, Germany
- Michael Burdumy
- Institute of Musicians' Medicine, Faculty of Medicine, Freiburg University and Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
- Department of Medical Physics, Radiology, Faculty of Medicine, Freiburg University and Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
- Bernhard Richter
- Institute of Musicians' Medicine, Faculty of Medicine, Freiburg University and Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
2
Häsner P, Birkholz P. Manufacturing Process for Non-Adhesive Super-Soft Vocal Fold Models. J Vis Exp 2024. PMID: 38251763. DOI: 10.3791/66222.
Abstract
This study aims to develop super-soft, non-sticky vocal fold models for voice research. The conventional manufacturing process for silicone-based vocal fold models yields models with undesirable properties, such as surface stickiness and poor reproducibility. Such models are also prone to rapid aging, leading to poor comparability across measurements. In this study, we propose a modification to the manufacturing process, changing the order in which the silicone material is layered, which yields non-sticky and highly consistent vocal fold models. We also compare a model produced using this method with a conventionally manufactured vocal fold model that is adversely affected by its sticky surface. We detail the manufacturing process and characterize the properties of the models for potential applications. The outcomes demonstrate the efficacy of the modified fabrication method, highlighting the superior qualities of our non-sticky vocal fold models. The findings contribute to the development of realistic and reliable vocal fold models for research and clinical applications.
Affiliation(s)
- Patrick Häsner
- Institute of Acoustics and Speech Communication, Technische Universität Dresden
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden
3
Werner R, Fuchs S, Trouvain J, Kürbis S, Möbius B, Birkholz P. Acoustics of Breath Noises in Human Speech: Descriptive and Three-Dimensional Modeling Approaches. J Speech Lang Hear Res 2023:1-15. PMID: 37971432. DOI: 10.1044/2023_jslhr-23-00112.
Abstract
PURPOSE Breathing is ubiquitous in speech production, crucial for structuring speech, and a potential diagnostic indicator for respiratory diseases. However, the acoustic characteristics of speech breathing remain under-researched. This work aims to characterize the spectral properties of human inhalation noises in a large speaker sample and to explore their potential similarities with speech sounds. Speech sounds are mostly realized with egressive airflow. To account for this, we investigated the effect of airflow direction (inhalation vs. exhalation) on the acoustic properties of certain vocal tract (VT) configurations. METHOD To characterize human inhalation, we describe spectra of breath noises produced by speakers from two data sets comprising 34 female and 100 male participants. To investigate the effect of airflow direction, three-dimensional-printed VT models of a male and a female speaker with static VT configurations of four vowels and four fricatives were used. An airstream was directed through these VT configurations in both directions, and the spectral consequences were analyzed. RESULTS For human inhalations, we found spectra with a decreasing slope and several weak peaks below 3 kHz. These peaks show moderate (female) to strong (male) overlap with resonances found for participants inhaling with a VT configuration of a central vowel. Results for the VT models suggest that airflow direction is crucial for the spectral properties of sibilants, /ç/, and /i:/, but not for the other sounds we investigated. Inhalation noise is most similar to /ə/, where airflow direction does not play a role. CONCLUSIONS Inhalation is realized on ingressive airflow, and inhalation noises have specific resonance properties that are most similar to /ə/ but occur without phonation. Airflow direction does not play a role in this specific VT configuration, but subglottal resonances may. For future work, we suggest investigating the articulation of speech breathing and linking it to current work on pause postures. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.24520585
Affiliation(s)
- Raphael Werner
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Susanne Fuchs
- Leibniz-Centre General Linguistics (ZAS), Berlin, Germany
- Jürgen Trouvain
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Steffen Kürbis
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
- Bernd Möbius
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
4
Steiner P, Jalalvand A, Birkholz P. Cluster-Based Input Weight Initialization for Echo State Networks. IEEE Trans Neural Netw Learn Syst 2023; 34:7648-7659. PMID: 35120012. DOI: 10.1109/tnnls.2022.3145565.
Abstract
Echo state networks (ESNs) are a special type of recurrent neural network (RNN) in which the input and recurrent connections are traditionally generated randomly, and only the output weights are trained. Despite the recent success of ESNs in various audio, image, and radar recognition tasks, we postulate that a purely random initialization is not the ideal way of initializing ESNs. The aim of this work is to propose an unsupervised initialization of the input connections using the K-means algorithm on the training data. We show that for a large variety of datasets, this initialization performs on par with or better than a randomly initialized ESN while needing significantly fewer reservoir neurons. Furthermore, we discuss how this approach provides the opportunity to estimate a suitable reservoir size based on prior knowledge about the data.
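The kind of input weight initialization described in the abstract can be sketched as follows. This is a generic illustration, not the authors' implementation: the hand-rolled K-means, the unit-norm row scaling, and all sizes are assumptions for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def cluster_input_weights(X, n_reservoir, seed=0):
    """Initialize the ESN input weight matrix from K-means centroids
    of the training frames instead of random values."""
    W_in = kmeans(X, n_reservoir, seed=seed)
    # Scale rows to unit norm so every reservoir neuron receives a
    # comparably scaled input drive (an illustrative choice).
    return W_in / np.linalg.norm(W_in, axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))      # stand-in for training feature frames
W_in = cluster_input_weights(X, n_reservoir=20)
print(W_in.shape)                  # (20, 8)
```

Each reservoir neuron thus starts out tuned to a region of the input space that actually occurs in the training data, which is the intuition behind needing fewer neurons than with random weights.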
5
Birkholz P, Blandin R, Kürbis S. Bandwidths of vocal tract resonances in physical models compared to transmission-line simulations. J Acoust Soc Am 2023; 153:3281. PMID: 37307363. DOI: 10.1121/10.0019682.
Abstract
This study investigated how the bandwidths of resonances simulated by transmission-line models of the vocal tract compare to bandwidths measured from physical, three-dimensionally printed vowel resonators. Three types of physical resonators were examined: models with realistic vocal tract shapes based on Magnetic Resonance Imaging (MRI) data, straight axisymmetric tubes with varying cross-sectional areas, and two-tube approximations of the vocal tract with notched lips. All physical models had hard walls and a closed glottis, so the main loss mechanisms contributing to the bandwidths were sound radiation, viscosity, and heat conduction. These losses were accordingly included in the simulations, in two variants: a coarse approximation of the losses with frequency-independent lumped elements, and a detailed, theoretically more precise loss model. Across the examined frequency range from 0 to 5 kHz, the resonance bandwidths increased systematically from the simulations with the coarse loss model, to the simulations with the detailed loss model, to the tube-shaped physical resonators, and to the MRI-based resonators. This indicates that the simulated losses, especially the commonly used approximations, underestimate the real losses in physical resonators. Hence, more realistic acoustic simulations of the vocal tract require improved models for viscous and radiation losses.
Affiliation(s)
- Peter Birkholz
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, 01062, Germany
- Rémi Blandin
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, 01062, Germany
- Steffen Kürbis
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, 01062, Germany
6
Häsner P, Birkholz P. Reproducibility and Aging of Different Silicone Vocal Folds Models. J Voice 2023:S0892-1997(23)00085-1. PMID: 36966126. DOI: 10.1016/j.jvoice.2023.02.028.
Abstract
In this study, silicone vocal fold models with different geometries were manufactured using the common silicone brand EcoFlex 00-30 with typical oil mixing ratios. However, the oil proportions typically used exceed the manufacturer's recommended limit in order to attain the softness of human vocal folds. The additional oil directly affects the silicone, causing shrinkage, stickiness, evaporation, embrittlement, and uneven vulcanization. This study investigated the impact of these effects on the oscillation characteristics of silicone vocal fold models and how they change over time, with the goal of examining the comparability of such models and of the results obtained from experiments performed with them. For the manufactured models, the phonation onset pressure, offset pressure, mean volume velocity, pulmonary power, fundamental frequency, and measures of the glottal area waveform were collected over a period of up to 8 weeks. The data for the models were highly scattered. Furthermore, over time, the phonation onset/offset pressures increased, causing some models to stop oscillating, and the glottal area waveform also changed. In conclusion, the characteristics of over-thinned silicone vocal fold models depend strongly on the time of measurement. It is therefore recommended to carefully consider the effects of oil oversaturation and the timing of measurements when using silicone vocal fold models in experiments.
Affiliation(s)
- Patrick Häsner
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Saxony, Germany.
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Saxony, Germany.
7
Kleiner C, Häsner P, Birkholz P. Intrinsic velocity differences between larynx raising and larynx lowering. PLoS One 2023; 18:e0281877. PMID: 36795744. PMCID: PMC9934366. DOI: 10.1371/journal.pone.0281877.
Abstract
In this study, 23 subjects produced cyclic transitions between rounded and unrounded vowels, as in /o-i-o-i-o-…/, at two specified speaking rates. Rounded vowels are typically produced with a lower larynx position than unrounded vowels. This contrast in vertical larynx position was further amplified by producing the unrounded vowels with a higher pitch than the rounded vowels. The vertical larynx movements of each subject were measured by means of object tracking in laryngeal ultrasound videos. The results indicate that larynx lowering was on average 26% faster than larynx raising, and that this velocity difference was more pronounced in women than in men. Possible reasons for this are discussed with a focus on specific biomechanical properties. The results can help to interpret vertical larynx movements with regard to underlying neural control and aerodynamic conditions, and to improve movement models for articulatory speech synthesis.
Affiliation(s)
- Christian Kleiner
- Institute of Acoustics and Speech Communication, Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Dresden, Germany
- Patrick Häsner
- Institute of Acoustics and Speech Communication, Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Dresden, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Dresden, Germany
8
Wagner C, Schaffer P, Amini Digehsara P, Bärhold M, Plettemeier D, Birkholz P. Silent speech command word recognition using stepped frequency continuous wave radar. Sci Rep 2022; 12:4192. PMID: 35273225. PMCID: PMC8913675. DOI: 10.1038/s41598-022-07842-9.
Abstract
Recovering speech in the absence of the acoustic speech signal itself, i.e., silent speech, holds great potential for restoring or enhancing oral communication in those who have lost it. Radar is a relatively unexplored silent speech sensing modality, even though it has the advantage of being fully non-invasive. We therefore built custom stepped frequency continuous wave radar hardware to measure the changes during speech in the transmission spectra between three antennas, located on both cheeks and the chin, with a measurement update rate of 100 Hz. We then recorded a command word corpus of 40 phonetically balanced, two-syllable German words plus the German digits zero to nine for two individual speakers, and evaluated the speaker-dependent multi-session and inter-session recognition accuracies on this 50-word corpus using a bidirectional long short-term memory network. We obtained recognition accuracies of 99.17% and 88.87% for the multi-session and inter-session cases, respectively. These results show that the transmission spectra are very well suited to discriminating individual words from one another, even across different sessions, which is one of the key challenges for fully non-invasive silent speech interfaces.
Affiliation(s)
- Christoph Wagner
- Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069, Dresden, Germany.
- Petr Schaffer
- Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069 Dresden, Germany
- Pouriya Amini Digehsara
- Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069 Dresden, Germany
- Michael Bärhold
- Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069 Dresden, Germany
- Dirk Plettemeier
- Institute of Communication Technology, Chair of Radio Frequency and Photonics Engineering, Technische Universität Dresden, 01069 Dresden, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Chair for Speech Technology and Cognitive Systems, Technische Universität Dresden, 01069 Dresden, Germany
9
Kleiner C, Kainz MA, Echternach M, Birkholz P. Velocity differences in laryngeal adduction and abduction gestures. J Acoust Soc Am 2022; 151:45. PMID: 35105025. DOI: 10.1121/10.0009141.
Abstract
Sixteen subjects uttered periodic repetitions of laryngeal adduction and abduction gestures. The movement of the cuneiform tubercles was tracked over time in laryngoscopic recordings of these utterances. The adduction and abduction velocities were determined objectively by means of a piecewise linear model fitted to the cuneiform tubercle trajectories. Abduction was found to be significantly faster than adduction. This was interpreted in terms of biomechanics and active control by the nervous system. The biomechanical properties could be responsible for an abduction velocity up to 51% higher than the adduction velocity. Additionally, the adduction velocity may be actively limited to prevent an overshoot of the intended adduction degree when the vocal folds are approximated to initiate phonation.
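The kind of piecewise linear velocity estimation described can be sketched as follows. The breakpoint search, the independent per-segment fits, and the synthetic trajectory are illustrative assumptions, not the authors' exact model:

```python
import numpy as np

def piecewise_linear_velocities(t, y):
    """Fit two straight lines to one close-open trajectory by searching
    the breakpoint with the smallest total squared error; the two
    slopes estimate the movement velocities of the two phases."""
    best = None
    for b in range(2, len(t) - 2):       # need >= 3 points per segment
        c1 = np.polyfit(t[:b + 1], y[:b + 1], 1)
        c2 = np.polyfit(t[b:], y[b:], 1)
        sse = (np.sum((np.polyval(c1, t[:b + 1]) - y[:b + 1]) ** 2)
               + np.sum((np.polyval(c2, t[b:]) - y[b:]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, c1[0], c2[0])
    return best[1], best[2]

# Synthetic cycle: closing at -2 units/s for 0.5 s, then opening
# at +3 units/s (stand-in for a tracked tubercle trajectory).
t = np.linspace(0.0, 1.0, 101)
y = np.where(t < 0.5, 1.0 - 2.0 * t, 3.0 * (t - 0.5))
v_close, v_open = piecewise_linear_velocities(t, y)
print(round(v_close, 2), round(v_open, 2))   # -2.0 3.0
```

Fitting slopes over whole movement phases, rather than differentiating the noisy trajectory frame by frame, is what makes the velocity estimate objective and robust.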
Affiliation(s)
- Christian Kleiner
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Dresden, Germany
- Marie-Anne Kainz
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Dresden, Germany
10
Köberlein M, Birkholz P, Burdumy M, Richter B, Burk F, Traser L, Echternach M. Investigation of resonance strategies of high pitch singing sopranos using dynamic three-dimensional magnetic resonance imaging. J Acoust Soc Am 2021; 150:4191. PMID: 34972262. DOI: 10.1121/10.0008903.
Abstract
Resonance strategies with respect to vocal registers, i.e., frequency ranges of uniform, demarcated voice quality, are still not completely understood for the highest part of the female voice. The first and second vocal tract resonances usually determine vowels. If the fundamental frequency exceeds the vowel-shaping resonance frequencies of speech, vocal tract resonances are instead tuned to voice source partials. It has not yet been clarified whether such tuning applies across the entire voice range, particularly at the top pitches. We investigated professional sopranos who regularly sing pitches above C6 (1047 Hz). Dynamic three-dimensional (3D) magnetic resonance imaging was used to calculate resonances for pitches from C5 (523 Hz) to C7 (2093 Hz) with different vowel configurations ([a:], [i:], [u:]) and in different contexts (scales or octave jumps). A spectral analysis and an acoustic analysis of 3D-printed vocal tract models were conducted. The results suggest that there is no exclusive register-defining resonance strategy. The intersection of the fundamental frequency and the first vocal tract resonance was not found to necessarily indicate a register shift. Either the articulators and the vocal tract resonances were kept without significant adjustments, or fR1:fo tuning, wherein the first vocal tract resonance enhances the fundamental frequency, was applied up to F6 (1396 Hz). No fR2:fo tuning was observed.
Affiliation(s)
- Marie Köberlein
- Medical Faculty of the Albert-Ludwigs-University Freiburg, Freiburg Institute for Musicians' Medicine, University Medical Center Freiburg, University of Music Freiburg, Elsässer Straße 2m, 79110, Freiburg, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
- Michael Burdumy
- Department of Medical Physics, Radiology, Freiburg University Medical Center, Germany
- Bernhard Richter
- Medical Faculty of the Albert-Ludwigs-University Freiburg, Freiburg Institute for Musicians' Medicine, University Medical Center Freiburg, University of Music Freiburg, Elsässer Straße 2m, 79110, Freiburg, Germany
- Fabian Burk
- Department of Otorhinolaryngology, Head and Neck Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Louisa Traser
- Medical Faculty of the Albert-Ludwigs-University Freiburg, Freiburg Institute for Musicians' Medicine, University Medical Center Freiburg, University of Music Freiburg, Elsässer Straße 2m, 79110, Freiburg, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, University Hospital, LMU Munich, Germany
11
Krug PK, Gerazov B, van Niekerk DR, Xu A, Xu Y, Birkholz P. Modelling microprosodic effects can lead to an audible improvement in articulatory synthesis. J Acoust Soc Am 2021; 150:1209. PMID: 34470273. DOI: 10.1121/10.0005876.
Abstract
When pitch is explicitly modelled for parametric speech synthesis, microprosodic variations of the fundamental frequency (f0) are usually disregarded by current intonation models. While there are numerous studies on the nature and origin of microprosody, little research has been done on its audibility and its effect on the naturalness of synthetic speech. In this work, the influence of obstruent-related microprosodic variations on the perceived naturalness of articulatory speech synthesis was studied. A small corpus of 20 German words and sentences was re-synthesized using the state-of-the-art articulatory synthesizer VocalTractLab. The pitch contours of the real utterances were extracted and fitted with the Target Approximation Model. After the real microprosodic variations were removed from the obtained pitch contours, synthetic variations were applied based on a microprosody model. Subsequently, multiple stimuli with different microprosody amplitudes were synthesized and evaluated in a listening experiment. The results indicate that microprosodic variations are barely audible but can, in certain cases, lead to a greater perceived naturalness of the synthesized speech.
Affiliation(s)
- Paul Konstantin Krug
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
- Branislav Gerazov
- Faculty of Electrical Engineering and Information Technologies, Ss. Cyril and Methodius University in Skopje, Republic of North Macedonia
- Daniel R van Niekerk
- Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Anqi Xu
- Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Yi Xu
- Department of Speech, Hearing and Phonetic Sciences, University College London, United Kingdom
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
12
Cucchi M, Gruener C, Petrauskas L, Steiner P, Tseng H, Fischer A, Penkovsky B, Matthus C, Birkholz P, Kleemann H, Leo K. Reservoir computing with biocompatible organic electrochemical networks for brain-inspired biosignal classification. Sci Adv 2021; 7(34):eabh0693. PMID: 34407948. PMCID: PMC8373129. DOI: 10.1126/sciadv.abh0693.
Abstract
Early detection of malign patterns in patients' biological signals can save millions of lives. Despite the steady improvement of artificial intelligence-based techniques, the practical clinical application of these methods is mostly constrained to offline evaluation of patients' data. Previous studies have identified organic electrochemical devices as ideal candidates for biosignal monitoring, but their use for pattern recognition in real time had never been demonstrated. Here, we produce and characterize brain-inspired networks composed of organic electrochemical transistors and use them for time-series prediction and classification tasks with the reservoir computing approach. To show their potential for biofluid monitoring and biosignal analysis, we classify four classes of arrhythmic heartbeats with an accuracy of 88%. The results of this study introduce a previously unexplored paradigm for biocompatible computational platforms and may enable the development of ultralow-power, hardware-based artificial neural networks capable of interacting with body fluids and biological tissues.
Affiliation(s)
- Matteo Cucchi
- Dresden Integrated Center for Applied Physics and Photonic Materials (IAPP), Nöthnitzer Str. 61, 01187 Dresden, Germany.
- Christopher Gruener
- Dresden Integrated Center for Applied Physics and Photonic Materials (IAPP), Nöthnitzer Str. 61, 01187 Dresden, Germany
- Lautaro Petrauskas
- Dresden Integrated Center for Applied Physics and Photonic Materials (IAPP), Nöthnitzer Str. 61, 01187 Dresden, Germany
- Chair for Circuit Design and Network Theory (CCN), Technische Universität Dresden, Helmholtzstr. 18, 01069 Dresden, Germany
- Peter Steiner
- Institute for Acoustics and Speech Communication (IAS), Technische Universität Dresden, Helmholtzstr. 18, 01069 Dresden, Germany
- Hsin Tseng
- Dresden Integrated Center for Applied Physics and Photonic Materials (IAPP), Nöthnitzer Str. 61, 01187 Dresden, Germany
- Axel Fischer
- Dresden Integrated Center for Applied Physics and Photonic Materials (IAPP), Nöthnitzer Str. 61, 01187 Dresden, Germany
- Bogdan Penkovsky
- National University of Kyiv-Mohyla Academy, Skovorody Str. 2, 04655 Kyiv, Ukraine
- Alysophil SAS, Bio Parc, 850 Boulevard Sebastien Brant, BP 30170 F, 67405, Illkirch CEDEX, France
- Christian Matthus
- Chair for Circuit Design and Network Theory (CCN), Technische Universität Dresden, Helmholtzstr. 18, 01069 Dresden, Germany
- Peter Birkholz
- Institute for Acoustics and Speech Communication (IAS), Technische Universität Dresden, Helmholtzstr. 18, 01069 Dresden, Germany
- Hans Kleemann
- Dresden Integrated Center for Applied Physics and Photonic Materials (IAPP), Nöthnitzer Str. 61, 01187 Dresden, Germany
- Karl Leo
- Dresden Integrated Center for Applied Physics and Photonic Materials (IAPP), Nöthnitzer Str. 61, 01187 Dresden, Germany
13
Wagner C, Stappenbeck L, Wenzel H, Steiner P, Lehnert B, Birkholz P. Evaluation of a non-personalized optopalatographic device for prospective use in functional post-stroke dysphagia therapy. IEEE Trans Biomed Eng 2021; 69:356-365. PMID: 34214033. DOI: 10.1109/tbme.2021.3094415.
Abstract
OBJECTIVE Stroke survivors commonly suffer from dysphagia originating from oro-facial impairments that affect swallowing function. Functional therapy often employs tongue exercises that require the patient to perform short motion sequences. Evaluating the patient's performance on those exercises is difficult because there is no reliable form of visual feedback. METHODS We propose an optopalatographic device that does not require a personalized dental retainer and is capable of measuring tongue movement trajectories intraorally. The device features nine optical proximity sensors sampled at 100 Hz and is fixed against the hard palate with a specifically developed palatal adhesive. Its sensing capabilities were evaluated on a tongue gesture corpus recorded from nine healthy individuals, containing eight different tongue exercises commonly used in functional dysphagia therapy. RESULTS The measured tongue trajectories contained temporally and spatially resolved information about tongue movement and location during each exercise. Furthermore, a simple DTW-kNN classifier was able to distinguish the exercises from one another with average classification accuracies of 97.9% and 61.4% (cross-validation and inter-speaker test accuracy, respectively). CONCLUSION The device can provide real-time feedback on tongue motion, and we obtained promising gesture recognition results with relatively few sensors, even in the absence of a personalized dental retainer. SIGNIFICANCE Non-personalized optopalatography is readily available and could aid in improving functional dysphagia therapy by providing visual feedback to both the physician and the patient.
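A DTW-kNN classifier of the kind mentioned in the results can be sketched as follows. This is a generic implementation with synthetic toy "gestures", not the authors' code or data:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature
    vectors (shape: frames x channels), with Euclidean frame cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def knn_classify(query, templates, labels, k=3):
    """Assign the majority label among the k DTW-nearest templates."""
    dists = np.array([dtw_distance(query, t) for t in templates])
    nearest = [labels[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count)

# Toy "gestures": rising vs. falling single-channel trajectories of
# varying duration, standing in for multi-sensor tongue recordings.
def ramp(n, up=True):
    x = np.linspace(0.0, 1.0, n)
    return (x if up else x[::-1]).reshape(-1, 1)

templates = [ramp(20, True), ramp(25, True), ramp(18, True),
             ramp(20, False), ramp(25, False), ramp(18, False)]
labels = ["up", "up", "up", "down", "down", "down"]
print(knn_classify(ramp(22, True), templates, labels))   # up
```

Because DTW aligns sequences of different lengths before comparing them, the classifier tolerates variation in how quickly a patient performs an exercise, which matters in a therapy setting.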
Collapse
|
14
|
Gao Y, Ding H, Birkholz P, Lin Y. Comparing fundamental frequency of German vowels produced by German native speakers and Mandarin Chinese learners. JASA Express Lett 2021; 1:075203. [PMID: 36154640 DOI: 10.1121/10.0005593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
This study compared the f0 of 14 German vowels in monosyllabic words (/dVt/) embedded in carrier sentences, produced by 30 native speakers and 30 Mandarin Chinese learners. Appropriate techniques were employed to robustly measure f0 values and reliably analyze f0 profiles. The results showed that Mandarin learners produced the vowels bearing sentence stress with significantly larger f0 ranges and steeper f0 slopes, but comparable f0 mean and maximum, in comparison to German natives. Moreover, lax vowels produced by both groups demonstrated narrower f0 ranges with faster f0 changes than tense vowels, an effect that was stronger for Mandarin learners.
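The f0 profile measures named in the abstract (mean, maximum, range, slope) are standard contour summaries. As an illustration only, and not the authors' actual analysis pipeline, they might be computed from an extracted f0 contour as follows; the semitone range referenced to the contour minimum is an assumed convention:

```python
import numpy as np

def f0_profile(times, f0):
    """Summarize an f0 contour: mean and max (Hz), range (semitones),
    and linear slope (Hz/s). times and f0 are parallel sequences."""
    f0 = np.asarray(f0, dtype=float)
    semitones = 12 * np.log2(f0 / f0.min())   # range on a perceptual scale
    slope = np.polyfit(times, f0, 1)[0]       # least-squares f0 slope in Hz/s
    return {"mean": f0.mean(), "max": f0.max(),
            "range_st": semitones.max(), "slope_hz_per_s": slope}
```

A real pipeline would first need robust f0 extraction with octave-error and voicing-boundary handling, which is the part the abstract refers to as "appropriate techniques".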
Collapse
Affiliation(s)
- Yingming Gao
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden 01069, Germany
| | - Hongwei Ding
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Peter Birkholz
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden 01069, Germany
| | - Yi Lin
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240, China
| |
Collapse
|
15
|
|
16
|
Häsner P, Prescher A, Birkholz P. Effect of wavy trachea walls on the oscillation onset pressure of silicone vocal folds. J Acoust Soc Am 2021; 149:466. [PMID: 33514162 DOI: 10.1121/10.0003362] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/29/2020] [Accepted: 12/29/2020] [Indexed: 06/12/2023]
Abstract
The influence of non-smooth trachea walls on phonation onset and offset pressures and on the fundamental frequency of oscillation was experimentally investigated for three different synthetic vocal fold models. Three models of the trachea were compared: a cylindrical tube (smooth walls) and wavy-walled tubes with ripple depths of 1 and 2 mm. Threshold pressures for the onset and offset of phonation were measured at the lower and upper ends of each trachea tube. All measurements were performed both with and without a supraglottal resonator. While the fundamental frequency was not affected by non-smooth trachea walls, the phonation onset and offset pressures measured right below the glottis decreased with increasing ripple depth of the trachea walls (by up to 20% for 2 mm ripples). This effect was independent of the type of glottis model and the presence of a supraglottal resonator. The pressures at the lower end of the trachea and the average volume velocities also tended to decrease with increasing ripple depth, but to a much smaller extent. These results indicate that the subglottal geometry and the flow conditions in the trachea can substantially affect the oscillation of synthetic vocal folds.
Collapse
Affiliation(s)
- Patrick Häsner
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
| | - Andreas Prescher
- Institute of Molecular and Cellular Anatomy, Aachen University Hospital, Aachen, Germany
| | - Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
| |
Collapse
|
17
|
Toya T, Birkholz P, Unoki M. Measurements of Transmission Characteristics Related to Bone-Conducted Speech Using Excitation Signals in the Oral Cavity. J Speech Lang Hear Res 2020; 63:4252-4264. [PMID: 33170762 DOI: 10.1044/2020_jslhr-20-00097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Purpose Psychoacoustical studies on transmission characteristics related to bone-conducted (BC) speech, perceived by speakers during vocalization, are important for further understanding the relationship between speech production and perception, especially auditory feedback. To explore how the outer ear contributes to BC speech transmission, this article measures the transmission characteristics of bone conduction, focusing on the vibration of the regio temporalis (RT) and the sound radiation in the ear canal (EC) due to excitation in the oral cavity (OC). Method While an excitation signal was presented through a loudspeaker located in the enclosed cavity below the hard palate, transmitted signals were measured on the RT and in the EC. The transfer functions of the RT vibration and the EC sound pressure relative to the OC sound pressure were determined from the measured signals using the sweep-sine method. Results Our findings from measurements of five participants are as follows: (a) the transfer function of the RT vibration relative to the OC sound pressure attenuated the frequency components above 1 kHz, and (b) the transfer function of the EC relative to the OC sound pressure emphasized the frequency components between 2 and 3 kHz. Conclusions The vibration of the soft tissue or the skull bone acts as a low-pass filter, whereas the sound radiation in the EC acts as a 2-3 kHz band-pass filter. Considering the perceptual effect of low-pass filtering in BC speech, our findings suggest that transmission to the outer ear may not be a dominant contributor to BC speech perception during vocalization.
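The abstract describes the sweep-sine transfer-function measurement only at a high level. A simplified, division-based estimate of a transfer function from recorded excitation and response signals could look like the sketch below; this is an assumed generic approach (the actual sweep-sine method typically deconvolves an exponential sweep), and all names are illustrative:

```python
import numpy as np

def transfer_function(excitation, response, fs, n_fft=None):
    """Estimate H(f) = Response(f) / Excitation(f) by spectral division.
    Returns (frequencies in Hz, magnitude in dB)."""
    n = n_fft or max(len(excitation), len(response))
    X = np.fft.rfft(excitation, n)
    Y = np.fft.rfft(response, n)
    eps = 1e-12                       # regularization against empty bins
    H = Y / (X + eps)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    return freqs, 20 * np.log10(np.abs(H) + eps)
```

Note that the estimate is only meaningful in bins where the excitation actually carries energy, which is why broadband (sweep or noise) excitation is used in such measurements.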
Collapse
Affiliation(s)
- Teruki Toya
- Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa
| | - Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
| | - Masashi Unoki
- Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology, Ishikawa
| |
Collapse
|
18
|
Birkholz P, Kürbis S, Stone S, Häsner P, Blandin R, Fleischer M. Printable 3D vocal tract shapes from MRI data and their acoustic and aerodynamic properties. Sci Data 2020; 7:255. [PMID: 32759947 PMCID: PMC7406497 DOI: 10.1038/s41597-020-00597-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2020] [Accepted: 07/13/2020] [Indexed: 11/09/2022] Open
Abstract
A detailed understanding of how the acoustic patterns of speech sounds are generated by the complex 3D shapes of the vocal tract is a major goal in speech research. The Dresden Vocal Tract Dataset (DVTD) presented here contains geometric and (aero)acoustic data of the vocal tract of 22 German speech sounds (16 vowels, 5 fricatives, 1 lateral), each from one male and one female speaker. The data include the 3D Magnetic Resonance Imaging data of the vocal tracts, the corresponding 3D-printable and finite-element models, and their simulated and measured acoustic and aerodynamic properties. The dataset was evaluated in terms of the plausibility and the similarity of the resonance frequencies determined by the acoustic simulations and measurements, and in terms of the human identification rate of the vowels and fricatives synthesized by the artificially excited 3D-printed vocal tract models. According to both the acoustic and perceptual metrics, most models are accurate representations of the intended speech sounds and can be readily used for research and education.
Collapse
Affiliation(s)
- Peter Birkholz
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany.
| | - Steffen Kürbis
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany
| | - Simon Stone
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany
| | - Patrick Häsner
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany
| | - Rémi Blandin
- Institute of Acoustics and Speech Communication, TU Dresden, Dresden, Germany
| | - Mario Fleischer
- Charité - Universitätsmedizin Berlin, Department of Audiology and Phoniatrics, Berlin, Germany
| |
Collapse
|
19
|
Gao Y, Ding H, Birkholz P. An acoustic comparison of German tense and lax vowels produced by German native speakers and Mandarin Chinese learners. J Acoust Soc Am 2020; 148:EL112. [PMID: 32752753 DOI: 10.1121/10.0001628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/03/2020] [Accepted: 07/07/2020] [Indexed: 06/11/2023]
Abstract
This study analyzed the durational and spectral differences and their interaction in the production of seven German tense-lax vowel pairs between 30 German native speakers and 30 Mandarin learners of German. The results showed that Mandarin speakers differed significantly from the German speakers in producing the German tense-lax contrast. The general pattern was that Mandarin learners employed temporal features more strongly than spectral features to indicate the tense-lax contrast as compared to German speakers. The phonetic influences of the Mandarin language on the production of German tense and lax vowels are discussed.
Collapse
Affiliation(s)
- Yingming Gao
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Dresden 01069, Germany
| | - Hongwei Ding
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Dresden 01069, Germany
| |
Collapse
|
20
|
Birkholz P, Gabriel F, Kürbis S, Echternach M. How the peak glottal area affects linear predictive coding-based formant estimates of vowels. J Acoust Soc Am 2019; 146:223. [PMID: 31370636 DOI: 10.1121/1.5116137] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/23/2019] [Accepted: 06/20/2019] [Indexed: 06/10/2023]
Abstract
The estimation of formant frequencies from acoustic speech signals is mostly based on Linear Predictive Coding (LPC) algorithms. Since LPC is based on the source-filter model of speech production, the formant frequencies obtained are often implicitly regarded as those for an infinite glottal impedance, i.e., a closed glottis. However, previous studies have indicated that LPC-based formant estimates of vowels generated with a realistically varying glottal area may differ substantially from the resonances of the vocal tract with a closed glottis. In the present study, the deviation between closed-glottis resonances and LPC-estimated formants during phonation with different peak glottal areas was systematically examined, using both physical vocal tract models excited with a self-oscillating rubber model of the vocal folds and computer simulations of interacting source and filter models. Ten vocal tract resonators representing different vowels were analyzed. The results showed that F1 increased with the peak area of the time-varying glottis, while F2 and F3 were not systematically affected. The effect of the peak glottal area on F1 was strongest for close-mid to close vowels, and more moderate for mid to open vowels.
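As background on the LPC analysis this study examines, here is a minimal sketch of autocorrelation-method LPC formant estimation. The paper does not specify an implementation; windowing and pre-emphasis, which practical analyzers normally add, are omitted here for clarity:

```python
import numpy as np

def lpc_formants(signal, fs, order):
    """Estimate formants as the pole angles of an all-pole (LPC) model,
    fitted with the autocorrelation method."""
    x = np.asarray(signal, dtype=float)
    # empirical autocorrelation r[0..order]
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # solve the normal equations R a = -r[1:] for the predictor coefficients
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], a)))   # poles of 1/A(z)
    roots = roots[np.imag(roots) > 0]              # one of each conjugate pair
    freqs = np.sort(np.angle(roots)) * fs / (2 * np.pi)
    return [f for f in freqs if f > 90]            # drop near-DC roots
```

Run on the impulse response of a known all-pole filter, the pole angles (and hence the estimated formants) are recovered almost exactly, which is the closed-glottis idealization the abstract contrasts with realistic glottal excitation.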
Collapse
Affiliation(s)
- Peter Birkholz
- Institute of Acoustics and Speech Communication, TU Dresden, 01062 Dresden, Germany
| | - Falk Gabriel
- Institute of Acoustics and Speech Communication, TU Dresden, 01062 Dresden, Germany
| | - Steffen Kürbis
- Institute of Acoustics and Speech Communication, TU Dresden, 01062 Dresden, Germany
| | - Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital, LMU, Munich, Germany
| |
Collapse
|
21
|
Fleischer M, Mainka A, Kürbis S, Birkholz P. How to precisely measure the volume velocity transfer function of physical vocal tract models by external excitation. PLoS One 2018; 13:e0193708. [PMID: 29543829 PMCID: PMC5854283 DOI: 10.1371/journal.pone.0193708] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Accepted: 02/18/2018] [Indexed: 11/18/2022] Open
Abstract
Recently, 3D printing has been increasingly used to create physical models of the vocal tract with geometries obtained from magnetic resonance imaging. These printed models allow measuring the vocal tract transfer function, which is not reliably possible in vivo for the vocal tract of living humans. The transfer functions enable the detailed examination of the acoustic effects of specific articulatory strategies in speaking and singing, and the validation of acoustic plane-wave models for realistic vocal tract geometries in articulatory speech synthesis. To measure the acoustic transfer function of 3D-printed models, two techniques have been described: (1) excitation of the models with a broadband sound source at the glottis and measurement of the sound pressure radiated from the lips, and (2) excitation of the models with an external source in front of the lips and measurement of the sound pressure inside the models at the glottal end. The former method is more frequently used and more intuitive due to its similarity to speech production. However, the latter method avoids the intricate problem of constructing a suitable broadband glottal source and is therefore more effective. It has been shown to yield a transfer function similar, but not exactly equal, to the volume velocity transfer function between the glottis and the lips, which is usually used to characterize vocal tract acoustics. Here, we revisit this method and show, both theoretically and experimentally, how it can be extended to yield the precise volume velocity transfer function of the vocal tract.
Collapse
Affiliation(s)
- Mario Fleischer
- Division of Phoniatrics and Audiology, Department of Otorhinolaryngology, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
| | - Alexander Mainka
- Division of Phoniatrics and Audiology, Department of Otorhinolaryngology, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstrasse 74, 01307 Dresden, Germany
- Voice Research Laboratory, Hochschule für Musik Carl Maria von Weber Dresden, Wettiner Platz 13, 01067 Dresden, Germany
| | - Steffen Kürbis
- Institute of Acoustics and Speech Communication, Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Helmholtzstrasse 18, 01062 Dresden, Germany
| | - Peter Birkholz
- Institute of Acoustics and Speech Communication, Faculty of Electrical and Computer Engineering, Technische Universität Dresden, Helmholtzstrasse 18, 01062 Dresden, Germany
| |
Collapse
|
22
|
Traser L, Birkholz P, Flügge TV, Kamberger R, Burdumy M, Richter B, Korvink JG, Echternach M. Relevance of the Implementation of Teeth in Three-Dimensional Vocal Tract Models. J Speech Lang Hear Res 2017; 60:2379-2393. [PMID: 28898358 DOI: 10.1044/2017_jslhr-s-16-0395] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/10/2016] [Accepted: 02/23/2017] [Indexed: 06/07/2023]
Abstract
PURPOSE Recently, efforts have been made to investigate the vocal tract using magnetic resonance imaging (MRI). Due to technical limitations, teeth were omitted in many previous studies on vocal tract acoustics. However, knowing how teeth influence vocal tract acoustics is important in order to estimate the necessity of implementing teeth in vocal tract models. The aim of this study was therefore to estimate the effect of teeth on vocal tract acoustics. METHOD The acoustic properties of 18 solid (3-dimensional printed) vocal tract models without teeth were compared to the same 18 models including teeth in terms of resonance frequencies (fRn). The fRn were obtained from the transfer functions of these models excited by white noise at the glottis level. The models were derived from MRI data of 2 trained singers performing 3 different vowel conditions (/i/, /a/, and /u/) in speech and low-pitched and high-pitched singing. RESULTS Depending on the oral configuration, models exhibiting side cavities or side branches showed major changes in the transfer function when teeth were implemented, caused by the introduction of pole-zero pairs. CONCLUSIONS To avoid errors in modeling, teeth should be included in 3-dimensional vocal tract models for acoustic evaluation. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.5386771.
Collapse
Affiliation(s)
- Louisa Traser
- Institute of Musicians' Medicine, Freiburg University Medical Center, Germany
- Department of Otolaryngology, Freiburg University Medical Center, Germany
- Faculty of Medicine, University of Freiburg, Germany
| | - Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Germany
| | - Tabea Viktoria Flügge
- Faculty of Medicine, University of Freiburg, Germany
- Department of Craniomaxillofacial Surgery, Freiburg University Medical Center, Germany
| | - Robert Kamberger
- Laboratory of Simulation, Department of Microsystems Engineering-IMTEK, University of Freiburg, Germany
| | - Michael Burdumy
- Faculty of Medicine, University of Freiburg, Germany
- Department of Medical Physics, Radiology, Freiburg University Medical Center, Germany
| | - Bernhard Richter
- Institute of Musicians' Medicine, Freiburg University Medical Center, Germany
- Faculty of Medicine, University of Freiburg, Germany
| | - Jan Gerrit Korvink
- Institute of Microstructure Technology, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Matthias Echternach
- Institute of Musicians' Medicine, Freiburg University Medical Center, Germany
- Faculty of Medicine, University of Freiburg, Germany
| |
Collapse
|
23
|
Abstract
PURPOSE To explore possible effects of tongue piercing on perceived speech quality. METHODS Using a quasi-experimental design, we analyzed the effect of tongue piercing on speech in a perception experiment. Samples of spontaneous speech and read speech were recorded from 20 long-term pierced and 20 non-pierced individuals (10 males, 10 females each). The individuals with a tongue piercing were recorded both with the piercing attached and with it removed. The audio samples were blindly rated by 26 female and 20 male laypersons and by 5 female speech-language pathologists with regard to perceived speech quality along 5 dimensions: speech clarity, speech rate, prosody, rhythm, and fluency. RESULTS We found no statistically significant differences in any of the speech quality dimensions between the pierced and non-pierced individuals, neither for the read nor for the spontaneous speech. In addition, neither length nor position of the piercing had a significant effect on speech quality. The removal of tongue piercings had no effect on speech performance either. Rating differences between laypersons and speech-language pathologists did not depend on the presence of a tongue piercing. CONCLUSIONS People are able to adapt their articulation to long-term tongue piercings so well that their speech quality is not perceptually affected.
Collapse
Affiliation(s)
- Esther Heinen
- a Department of Phoniatrics, Pedaudiology and Communication Disorders , University Hospital and Medical Faculty of the RWTH Aachen University , Aachen , Germany
| | - Peter Birkholz
- b Institute of Acoustics and Speech Communication, TU Dresden , Dresden , Germany
| | - Klaus Willmes
- c Department of Neurology , University Hospital and Medical Faculty of the RWTH Aachen University , Aachen , Germany
| | - Christiane Neuschaefer-Rube
- a Department of Phoniatrics, Pedaudiology and Communication Disorders , University Hospital and Medical Faculty of the RWTH Aachen University , Aachen , Germany
| |
Collapse
|
24
|
Zaretsky E, Pluschinski P, Sader R, Birkholz P, Neuschaefer-Rube C, Hey C. Identification of the most significant electrode positions in electromyographic evaluation of swallowing-related movements in humans. Eur Arch Otorhinolaryngol 2016; 274:989-995. [PMID: 27581722 DOI: 10.1007/s00405-016-4288-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2016] [Accepted: 08/26/2016] [Indexed: 12/11/2022]
Abstract
Surface electromyography (sEMG) is a well-established procedure for recording swallowing-related muscle activities. Because the use of a large number of sEMG channels is time consuming and technically sophisticated, the aim of this study was to identify the most significant electrode positions associated with oropharyngeal swallowing activities. Healthy subjects (N = 16) were tested with a total of 42 channels placed over the M. masseter, the M. orbicularis oris, and the submental and paralaryngeal regions. Each test subject swallowed 10 ml of water five times. After identifying the 16 optimal electrode positions, that is, the positions with the strongest signals as quantified by the highest integral values, their differences from the other 26 positions were assessed with a Mann-Whitney U test. A Kruskal-Wallis H test was used to analyze differences between single subjects, subject subgroups, and single electrode positions. Factors associated with the sEMG signals were examined in a linear regression. The 16 electrode positions chosen by a simple ranking of integral values delivered significantly higher signals than the other 26 positions. Differences between single electrode positions and between test subjects were also significant. The 16 most significant positions identified represent swallowing-related muscle potentials in healthy subjects.
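The ranking-plus-test procedure can be illustrated with a short sketch. This is an assumed reading of the analysis (per-channel integrals of the rectified signal, top 16 vs. the remaining 26 compared with a one-sided Mann-Whitney U test using a normal approximation), not the authors' code:

```python
import math
import numpy as np

def mann_whitney_p(x, y):
    """One-sided Mann-Whitney U test (normal approximation, no tie
    correction): p-value for 'values in x tend to exceed those in y'."""
    nx, ny = len(x), len(y)
    ranks = np.argsort(np.argsort(np.concatenate([x, y]))) + 1  # ordinal ranks
    u = ranks[:nx].sum() - nx * (nx + 1) / 2
    z = (u - nx * ny / 2) / math.sqrt(nx * ny * (nx + ny + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))

def rank_channels(emg, fs, top_n=16):
    """emg: array of shape (channels, samples). Rank channels by the
    integral of the rectified signal; test the top group against the rest."""
    integrals = np.abs(emg).sum(axis=1) / fs      # rectified-signal integral
    order = np.argsort(integrals)[::-1]           # strongest channel first
    p = mann_whitney_p(integrals[order[:top_n]], integrals[order[top_n:]])
    return order, p
```

Because the top group is selected by the very statistic being tested, the p-value here is descriptive rather than confirmatory; the study's between-subject comparisons would need the independent Kruskal-Wallis test mentioned above.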
Collapse
Affiliation(s)
- E Zaretsky
- Department of Phoniatrics and Pediatric Audiology, University Hospital of Marburg, Baldingerstr. 1, 35032, Marburg, Germany
| | - P Pluschinski
- Department of Phoniatrics and Pediatric Audiology, University Hospital of Marburg, Baldingerstr. 1, 35032, Marburg, Germany
| | - R Sader
- Center of Surgery, Clinic for Oral, Dental and Cosmetic Facial Surgery, University Hospital of Frankfurt/Main, Theodor-Stern-Kai 7, 60590, Frankfurt/Main, Germany
| | - P Birkholz
- Institute for Acoustics and Speech Communication, Faculty for Electrical Engineering and Information Technology, Technische Universität Dresden, Helmholtzstr. 10, 01069, Dresden, Germany
| | - C Neuschaefer-Rube
- Department of Phoniatrics and Pediatric Audiology, University Hospital of Aachen, Pauwelsstraße 30, 52074, Aachen, Germany
| | - Christiane Hey
- Department of Phoniatrics and Pediatric Audiology, University Hospital of Marburg, Baldingerstr. 1, 35032, Marburg, Germany.
| |
Collapse
|
25
|
Echternach M, Birkholz P, Sundberg J, Traser L, Korvink JG, Richter B. Resonatory Properties in Professional Tenors Singing Above the Passaggio. Acta Acust United Acust 2016. [DOI: 10.3813/aaa.918945] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
26
|
Echternach M, Birkholz P, Traser L, Flügge TV, Kamberger R, Burk F, Burdumy M, Richter B. Articulation and vocal tract acoustics at soprano subject's high fundamental frequencies. J Acoust Soc Am 2015; 137:2586-2595. [PMID: 25994691 DOI: 10.1121/1.4919356] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
The role of the vocal tract for phonation at very high soprano fundamental frequencies (F0s) is not yet understood in detail. In this investigation, two experiments were carried out with a single professional high soprano subject. First, midsagittal and coronal vocal tract shapes were analyzed using two-dimensional (2D) dynamic real-time magnetic resonance imaging (MRI) (24 fps) while the subject sang a scale from Bb5 (932 Hz) to G6 (1568 Hz). In a second experiment, volumetric vocal tract MRI data were recorded during sustained phonations (13 s) at the pitches C6 (1047 Hz) and G6 (1568 Hz). Formant frequencies were measured in physical models created by 3D printing, and calculated from area functions obtained from the 3D vocal tract shapes. The data showed only minor modifications of the vocal tract shape, involving a decrease of the piriform sinus as well as small changes in tongue position. Neither F1 nor F3 exhibited major differences between C6 and G6; only F2 was slightly raised for G6. For G6, however, F2 is not excited by any voice source partial. Therefore, this investigation could not confirm that the analyzed professional soprano subject adjusted formants to voice source partials for the analyzed F0s.
Collapse
Affiliation(s)
- Matthias Echternach
- Institute of Musicians' Medicine, Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
| | - Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, 01062 Dresden, Germany
| | - Louisa Traser
- Institute of Musicians' Medicine, Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
| | - Tabea V Flügge
- Department of Craniomaxillofacial Surgery, Freiburg University Medical Center, Hugstetterstr. 55, 79106 Freiburg, Germany
| | - Robert Kamberger
- Laboratory of Simulation, Department of Microsystems Engineering-IMTEK, University of Freiburg, Georges-Köhler-Allee 102, 79110 Freiburg, Germany
| | - Fabian Burk
- Institute of Musicians' Medicine, Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
| | - Michael Burdumy
- Department of Radiology, Medical Physics, Freiburg University Medical Center, Breisacher Str. 60a, 79106 Freiburg, Germany
| | - Bernhard Richter
- Institute of Musicians' Medicine, Freiburg University Medical Center, Breisacher Str. 60, 79106 Freiburg, Germany
| |
Collapse
|
27
|
Birkholz P, Martin L, Willmes K, Kröger BJ, Neuschaefer-Rube C. The contribution of phonation type to the perception of vocal emotions in German: an articulatory synthesis study. J Acoust Soc Am 2015; 137:1503-1512. [PMID: 25786961 DOI: 10.1121/1.4906836] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Vocal emotions are signaled by specific patterns of prosodic parameters, most notably pitch, phone duration, intensity, and phonation type. Phonation type was so far the least accessible parameter in emotion research, because it was difficult to extract from speech signals and difficult to manipulate in natural or synthetic speech. The present study built on recent advances in articulatory speech synthesis to exclusively control phonation type in re-synthesized German sentences spoken with seven different emotions. The goal was to find out to what extent the sole change of phonation type affects the perception of these emotions. Therefore, portrayed emotional utterances were re-synthesized with their original phonation type, as well as with purely breathy, modal, and pressed phonation, and then rated by listeners with respect to the perceived emotions. Highly significant effects of phonation type on the recognition rates of the original emotions were found, except for disgust. While fear, anger, and the neutral emotion require specific phonation types for correct perception, sadness, happiness, boredom, and disgust primarily rely on other prosodic parameters. These results can help to improve the expression of emotions in synthesized speech and facilitate the robust automatic recognition of vocal emotions.
Collapse
Affiliation(s)
- Peter Birkholz
- Department of Phoniatrics, Pedaudiology and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Pauwelsstr. 30, 52074 Aachen, Germany
| | - Lucia Martin
- Department of Phoniatrics, Pedaudiology and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Pauwelsstr. 30, 52074 Aachen, Germany
| | - Klaus Willmes
- Section Neuropsychology, Department of Neurology, University Hospital Aachen and RWTH Aachen University, Pauwelsstr. 30, 52074 Aachen, Germany
| | - Bernd J Kröger
- Department of Phoniatrics, Pedaudiology and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Pauwelsstr. 30, 52074 Aachen, Germany
| | - Christiane Neuschaefer-Rube
- Department of Phoniatrics, Pedaudiology and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Pauwelsstr. 30, 52074 Aachen, Germany
| |
Collapse
|
28
|
Junger J, Habel U, Bröhr S, Neulen J, Neuschaefer-Rube C, Birkholz P, Kohler C, Schneider F, Derntl B, Pauly K. More than just two sexes: the neural correlates of voice gender perception in gender dysphoria. PLoS One 2014; 9:e111672. [PMID: 25375171 PMCID: PMC4222943 DOI: 10.1371/journal.pone.0111672] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2014] [Accepted: 10/03/2014] [Indexed: 01/28/2023] Open
Abstract
Gender dysphoria (also known as "transsexualism") is characterized as a discrepancy between anatomical sex and gender identity. Research points towards neurobiological influences. Due to the sexually dimorphic characteristics of the human voice, voice gender perception provides a biologically relevant function, e.g. in the context of mating selection. There is evidence for a better recognition of voices of the opposite sex and a differentiation of the sexes in its underlying functional cerebral correlates, namely the prefrontal and middle temporal areas. This fMRI study investigated the neural correlates of voice gender perception in 32 male-to-female gender dysphoric individuals (MtFs) compared to 20 non-gender dysphoric men and 19 non-gender dysphoric women. Participants indicated the sex of 240 voice stimuli modified in semitone steps in the direction of the other gender. Compared to men and women, MtFs showed differences in a neural network including the medial prefrontal gyrus, the insula, and the precuneus when responding to male vs. female voices. With increasing voice morphing, men recruited more prefrontal areas than women and MtFs, while MtFs showed a pattern more similar to that of women. On both the behavioral and the neural level, our results are consistent with MtFs' reports that they cannot identify with their assigned sex.
Affiliations
- Jessica Junger: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Ute Habel: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Sabine Bröhr: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany
- Josef Neulen: Department of Gynaecological Endocrinology and Reproductive Medicine, Medical School, RWTH Aachen University, Aachen, Germany
- Christiane Neuschaefer-Rube: Department of Phoniatrics, Pedaudiology and Communication Disorders, Medical School, RWTH Aachen University, Aachen, Germany
- Peter Birkholz: Department of Phoniatrics, Pedaudiology and Communication Disorders, Medical School, RWTH Aachen University, Aachen, Germany
- Christian Kohler: Department of Psychiatry, Neuropsychiatry Division, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Frank Schneider: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Birgit Derntl: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
- Katharina Pauly: Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany; Jülich Aachen Research Alliance-Translational Brain Medicine, Jülich, Germany
29
Junger J, Pauly K, Bröhr S, Birkholz P, Neuschaefer-Rube C, Kohler C, Schneider F, Derntl B, Habel U. Sex matters: neural correlates of voice gender perception. Neuroimage 2013; 79:275-287. PMID: 23660030; DOI: 10.1016/j.neuroimage.2013.04.105.
Abstract
The basis for different neural activations in response to male and female voices, as well as the question of whether men and women perceive such voices differently, has not been thoroughly investigated. The aim of the present study was therefore to examine the behavioral and neural correlates of gender-related voice perception in healthy male and female volunteers. fMRI data were collected while 39 participants (19 female) indicated the gender of 240 voice stimuli. These stimuli included recordings of 3-syllable nouns as well as the same recordings pitch-shifted in 2-, 4- and 6-semitone steps in the direction of the other gender. Data analysis revealed a) equal voice discrimination sensitivity in men and women but better performance in the categorization of opposite-sex stimuli, at least in men, b) increased responses to increasing gender ambiguity in the mid cingulate cortex and bilateral inferior frontal gyri, and c) stronger activation in a fronto-temporal neural network in response to voices of the opposite sex. Our results indicate gender-specific processing of male and female voices on a behavioral and neuronal level. We suggest that they reflect a heightened sensitivity, probably due to the evolutionary relevance of voice perception in mate selection.
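The 2-, 4- and 6-semitone shifts used to construct the stimuli correspond to fixed multiplicative frequency ratios, since a semitone is a factor of 2^(1/12) in equal temperament. A minimal sketch; the function name and the 120 Hz example F0 are my own illustration, not values from the study:

```python
# Minimal sketch: frequency ratio of an n-semitone pitch shift (12-tone
# equal temperament), as used to morph voice stimuli toward the other
# gender in 2-, 4- and 6-semitone steps.

def semitone_ratio(n: float) -> float:
    """Multiplicative frequency ratio for a shift of n semitones."""
    return 2.0 ** (n / 12.0)

f0 = 120.0  # an illustrative male fundamental frequency in Hz
for n in (2, 4, 6):
    print(f"+{n} st: {f0 * semitone_ratio(n):.1f} Hz")
# prints 134.7 Hz, 151.2 Hz and 169.7 Hz
```

A full 12-semitone shift doubles the frequency, so each step compounds multiplicatively rather than adding a fixed number of hertz.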
Affiliations
- Jessica Junger: Department of Psychiatry, Medical School, RWTH Aachen University, Aachen, Germany
30
Abstract
Voice, as a secondary sexual characteristic, is known to affect the perceived attractiveness of human individuals, but the underlying mechanism of vocal attractiveness has remained unclear. Here, we presented human listeners with acoustically altered natural sentences and fully synthetic sentences with systematically manipulated pitch, formants, and voice quality, based on a principle of body size projection reported for animal calls and emotional human vocal expressions. The results show that male listeners preferred a female voice that signals a small body size, with relatively high pitch, wide formant dispersion, and breathy voice, while female listeners preferred a male voice that signals a large body size, with low pitch and narrow formant dispersion. Interestingly, however, male vocal attractiveness was also enhanced by breathiness, which presumably softened the aggressiveness associated with a large body size. These results, together with the additional finding that the same vocal dimensions also affect emotion judgment, indicate that humans still employ a vocal interaction strategy used in animal calls despite the development of complex language.
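Formant dispersion, one of the cues manipulated here, is commonly defined as the average spacing between adjacent formant frequencies; wider spacing implies a shorter vocal tract and hence a smaller projected body size. A short sketch; all frequency values below are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch of formant dispersion: the mean spacing (in Hz)
# between successive formant frequencies F1..Fn. The formant values
# below are illustrative only.

def formant_dispersion(formants):
    """Average spacing between adjacent formants, given in ascending order."""
    gaps = [b - a for a, b in zip(formants, formants[1:])]
    return sum(gaps) / len(gaps)

small_body = [850.0, 2100.0, 3350.0, 4600.0]  # wide spacing: small projected body
large_body = [600.0, 1500.0, 2400.0, 3300.0]  # narrow spacing: large projected body
print(formant_dispersion(small_body))  # 1250.0
print(formant_dispersion(large_body))  # 900.0
```

Note that with evenly spaced formants the dispersion reduces to (Fn - F1) / (n - 1), which is why only the outermost formants and their count matter in that special case.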
Affiliations
- Yi Xu: Department of Speech, Hearing and Phonetic Sciences, Division of Psychology and Language Sciences, University College London, London, United Kingdom
31
Abstract
A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
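The weighted-average scheme described in this abstract can be sketched in a few lines. The helper names, the F1/F2 coordinates, and the use of a two-dimensional formant space as the vowel subspace are my assumptions for illustration; the paper's actual vocal-tract parameterization is not reproduced here:

```python
# Hypothetical sketch of the interpolation scheme from the abstract: the
# vocal-tract target of a consonant in a given vowel context is the weighted
# average of three reference shapes measured in /a/, /i/ and /u/ context,
# with weights obtained by expressing the context vowel in the subspace
# spanned by those corner vowels. All numbers below are invented.

def corner_vowel_weights(v, a, i, u):
    """Barycentric weights (w_a, w_i, w_u) of vowel v in the /a/-/i/-/u/
    triangle, with vowels given as 2-D points (e.g. F1/F2 in Hz)."""
    ax, ay = a[0] - u[0], a[1] - u[1]
    ix, iy = i[0] - u[0], i[1] - u[1]
    vx, vy = v[0] - u[0], v[1] - u[1]
    det = ax * iy - ix * ay            # determinant of the basis (a-u, i-u)
    w_a = (vx * iy - ix * vy) / det    # Cramer's rule for the 2x2 system
    w_i = (ax * vy - vx * ay) / det
    return (w_a, w_i, 1.0 - w_a - w_i)

def consonant_target(weights, shape_a, shape_i, shape_u):
    """Weighted average of the three reference vocal-tract shapes,
    each given as a vector of articulatory parameters."""
    w_a, w_i, w_u = weights
    return [w_a * x + w_i * y + w_u * z
            for x, y, z in zip(shape_a, shape_i, shape_u)]

# Corner vowels and a context vowel as illustrative (F1, F2) points:
a_v, i_v, u_v = (800.0, 1200.0), (300.0, 2300.0), (300.0, 800.0)
w = corner_vowel_weights((500.0, 1800.0), a_v, i_v, u_v)
print(w)  # weights sum to 1; here approximately (0.4, 0.56, 0.04)
```

For context vowels inside the /a/-/i/-/u/ triangle all three weights are non-negative, so the interpolated consonant shape stays within the convex hull of the three measured reference shapes.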
Affiliations
- Peter Birkholz: Department of Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Aachen, Germany
32
Birkholz P, Kröger BJ, Neuschaefer-Rube C. Model-based reproduction of articulatory trajectories for consonant-vowel sequences. IEEE Trans Audio Speech Lang Process 2011. DOI: 10.1109/tasl.2010.2091632.
33
Kröger BJ, Birkholz P, Kannampuzha J, Kaufmann E, Mittelberg I. Movements and holds in fluent sentence production of American Sign Language: the action-based approach. Cognit Comput 2010. DOI: 10.1007/s12559-010-9071-2.
34

35
Abstract
BACKGROUND: Detailed knowledge of the neurophysiology of speech acquisition is important for understanding the developmental aspects of speech perception and production, and for understanding developmental disorders of both.
METHOD: A computer-implemented neural model of the sensorimotor control of speech production was developed. The model can demonstrate in detail the neural functions of different cortical areas during speech production.
RESULTS: (i) Two sensory and two motor maps (neural representations) and the appertaining neural mappings (projections) establish the sensorimotor feedback control system; these maps and mappings are formed and trained during the prelinguistic phase of speech acquisition. (ii) The feedforward sensorimotor control system comprises the lexical map (representations of the sounds, syllables, and words of the first language) and the mappings from the lexical to the sensory and motor maps; the training of these mappings forms the linguistic phase of speech acquisition. (iii) Three prelinguistic learning phases, i.e. silent mouthing, quasi-stationary vocalic articulation, and realisation of articulatory protogestures, can be defined on the basis of our simulation studies with the computational neural model; these learning phases can be associated with temporal phases of prelinguistic speech acquisition obtained from natural data.
CONCLUSIONS: The neural model illuminates the detailed function of specific cortical areas during speech production. In particular, it can be shown that developmental disorders of speech production may result from a delayed or incorrect process within one of the prelinguistic learning phases defined by the neural model.
Affiliations
- B J Kröger: Klinik für Phoniatrie, Pädaudiologie und Kommunikationsstörungen, Universitätsklinikum Aachen