1
Iob NA, He L, Ternström S, Cai H, Brockmann-Bauser M. Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment. J Speech Lang Hear Res 2024; 67:1660-1681. [PMID: 38758676] [DOI: 10.1044/2024_jslhr-23-00253]
Abstract
PURPOSE The literature suggests that the acoustic metrics smoothed cepstral peak prominence (CPPS) and harmonics-to-noise ratio (HNR) depend on voice loudness and fundamental frequency (F0). Although this has been attributed to different oscillatory patterns of the vocal folds, it has not yet been specifically investigated. In the present work, the influence of elicitation level, calibrated sound pressure level (SPL), F0, and vowel on the electroglottographic (EGG) and time-differentiated EGG (dEGG) metrics hybrid open quotient (OQ), dEGG OQ, and peak dEGG, as well as on the acoustic metrics CPPS and HNR, was examined, and their suitability for voice assessment was evaluated. METHOD In a retrospective study, 29 women with a mean age of 25 years (±8.9, range: 18-53) diagnosed with structural vocal fold pathologies were examined before and after voice therapy or phonosurgery. Acoustic and EGG signals were recorded simultaneously during phonation of the sustained vowels /ɑ/, /i/, and /u/ at three elicited loudness levels (soft/comfortable/loud) under unconstrained F0 conditions. RESULTS A linear mixed-model analysis showed a significant effect of elicitation effort level on peak dEGG, HNR, and CPPS (all p < .01). Calibrated SPL significantly influenced HNR and CPPS (both p < .01). Furthermore, F0 had a significant effect on peak dEGG and CPPS (p < .0001). All metrics changed significantly with vowel (all p < .05). However, treatment had no effect on the examined metrics, regardless of treatment type (surgery vs. voice therapy). CONCLUSIONS The value of the investigated metrics for voice assessment is limited when they are sampled without sufficient control of SPL and F0, in that they are significantly influenced by the phonatory context, be it speech or elicited sustained vowels. Future studies should explore the diagnostic value of new data collation approaches, such as voice mapping, which take SPL and F0 effects into account.
Affiliation(s)
- Naomi Anna Iob: Division of Phoniatrics and Speech Pathology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Switzerland
- Lei He: Division of Phoniatrics and Speech Pathology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Switzerland; Department of Computational Linguistics, University of Zurich, Switzerland
- Sten Ternström: Division of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Huanchen Cai: Division of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Meike Brockmann-Bauser: Division of Phoniatrics and Speech Pathology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Switzerland
2
Wu HY. Uncovering Gender-Specific and Cross-Gender Features in Mandarin Deception: An Acoustic and Electroglottographic Approach. J Speech Lang Hear Res 2024:1-17. [PMID: 38820240] [DOI: 10.1044/2024_jslhr-23-00288]
Abstract
PURPOSE This study aimed to investigate the acoustic and electroglottographic (EGG) profiles of Mandarin deception, including global characteristics and the influence of gender. METHOD Thirty-six Mandarin speakers participated in an interactive interview game in which they provided both deceptive and truthful answers to 14 biographical questions. Acoustic and EGG signals of the participants' responses were simultaneously recorded; 20 acoustic and 14 EGG features were analyzed using binary logistic regression models. RESULTS Increases in fundamental frequency (F0) mean, intensity mean, first formant (F1), fifth formant (F5), contact quotient (CQ), decontacting-time quotient (DTQ), and contact index (CI) as well as decreases in jitter, shimmer, harmonics-to-noise ratio (HNR), and fourth formant (F4) were significantly correlated with global deception. Cross-gender features included increases in intensity mean and F5 and decreases in jitter, HNR, and F4, whereas gender-specific features encompassed increases in F0 mean, shimmer, F1, third formant, and DTQ, as well as decreases in F0 maximum and CQ for female deception, and increases in CQ and CI and decreases in shimmer for male deception. CONCLUSIONS The results suggest that Mandarin deception could be tied to underlying pragmatic functions, emotional arousal, decreased glottal contact skewness, and more pressed phonation. Disparities in gender-specific features lend support to differences in the use of pragmatics, levels of deception-induced emotional arousal, skewness of glottal contact patterns, and phonation types.
Affiliation(s)
- Hao-Yu Wu: Department of English, National Taiwan Normal University, Taipei City
3
Ternström S. Pragmatic De-Noising of Electroglottographic Signals. Bioengineering (Basel) 2024; 11:479. [PMID: 38790346] [PMCID: PMC11117636] [DOI: 10.3390/bioengineering11050479]
Abstract
In voice analysis, the electroglottographic (EGG) signal has long been recognized as a useful complement to the acoustic signal, but only when the vocal folds are actually contacting, such that the signal has an appreciable amplitude. However, phonation can also occur without the vocal folds contacting, as in breathy voice, in which case the EGG amplitude is low but not zero. Identifying the transition from non-contacting to contacting phonation is of great interest, because it substantially changes the nature of the vocal fold oscillations; yet the transition is not in itself audible. The magnitude of the cycle-normalized peak derivative of the EGG signal is a convenient indicator of vocal fold contacting, but no current EGG hardware provides a sufficient signal-to-noise ratio in the derivative. We show how the textbook techniques of spectral thresholding and static notch filtering are straightforward to implement, can run in real time, and can mitigate several noise problems in EGG hardware. This can be useful to researchers in vocology.
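As a rough illustration of the two textbook techniques named in this abstract (this is not the paper's implementation; it uses numpy-only stand-ins for standard DSP routines, and all function names and parameter values are invented for the example):

```python
import numpy as np

def notch_biquad(fs, f0, q=30.0):
    """Design a second-order IIR notch at f0 Hz (standard audio-cookbook form)."""
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1.0, -2 * np.cos(w0), 1.0])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def iir_filter(b, a, x):
    """Sample-by-sample IIR filtering (direct form II transposed)."""
    y = np.zeros(len(x))
    z1 = z2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + z1
        z1 = b[1] * xn - a[1] * yn + z2
        z2 = b[2] * xn - a[2] * yn
        y[n] = yn
    return y

def spectral_threshold(x, floor_db=-60.0):
    """Zero every FFT bin whose magnitude falls below floor_db re. the peak bin."""
    X = np.fft.rfft(x)
    mag = np.abs(X)
    X[mag < mag.max() * 10 ** (floor_db / 20)] = 0
    return np.fft.irfft(X, n=len(x))
```

The notch would target a known stationary noise component (e.g. mains hum), while the spectral threshold discards broadband bins far below the signal's spectral peak; real-time use of the spectral variant would additionally require block-wise processing with overlap, which is omitted here.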
Affiliation(s)
- Sten Ternström: Division of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden
4
Dornelas R, Casmerides MCB, da Silva RC, Victória Dos Anjos Souza M, Pereira LT, Ribeiro VV, Behlau M. Clinical Parameters of the Speech-Language Pathology Assessment of the Chronic Cough: A Scoping Review. J Voice 2024; 38:703-710. [PMID: 35012819] [DOI: 10.1016/j.jvoice.2021.12.012]
Abstract
OBJECTIVE To map the clinical parameters used in the speech-language pathology assessment of chronic cough. METHODS A scoping review was performed to answer the clinical question: "What are the clinical parameters included in the speech-language pathology assessment of patients with chronic cough?" Evidence was gathered by electronic and manual search. The electronic search included MEDLINE, Cochrane Library, EMBASE, Web of Science, SCOPUS, and LILACS, with a specific search strategy for each database. The manual search included the Journal of Voice, Chest, and Thorax, the Brazilian Library of Theses and Dissertations, Open Grey, and Clinical Trials, in addition to scanning the references of the included studies. The extracted data covered information regarding the publication, sample, assessment, and measures used when assessing chronic cough. RESULTS The electronic search found 289 studies and the manual search found 1036; 12 were selected for the present study. The most used assessments were self-assessment (75%), aerodynamic analysis (66.67%), auditory-perceptual judgment of voice quality (58.33%), acoustic analysis of the voice (41.67%), cough frequency and cough threshold (41.67%), and electroglottography (25%). CONCLUSIONS Subjective instruments were used more frequently, while specific objective instruments, which are more recent, were used less frequently. Complementary assessments, such as vocal assessment, were also frequently used without any other parameter. A lack of homogeneity was identified in the speech-language pathology assessment and measures for patients with chronic cough, which makes comparison among studies and clinical analysis difficult.
Affiliation(s)
- Rodrigo Dornelas: Speech-Language Pathology Department, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil; Centro de Estudos da Voz - CEV, São Paulo, Brazil
- Maria Christina Bussamara Casmerides: Postgraduate Program in Medicine (Otorhinolaryngology), Universidade Federal de São Paulo, São Paulo, Brazil; Centro de Estudos da Voz - CEV, São Paulo, Brazil
- Rebeca Cardoso da Silva: Speech-Language Pathology Department, Universidade Federal de Sergipe, Lagarto, Sergipe, Brazil; Centro de Estudos da Voz - CEV, São Paulo, Brazil
- Maria Victória Dos Anjos Souza: Speech-Language Pathology Department, Universidade Federal de Sergipe, Lagarto, Sergipe, Brazil; Centro de Estudos da Voz - CEV, São Paulo, Brazil
- Lucas Tito Pereira: Speech-Language Pathology Department, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil; Centro de Estudos da Voz - CEV, São Paulo, Brazil
- Vanessa Veis Ribeiro: Speech-Language Pathology Department, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil; Centro de Estudos da Voz - CEV, São Paulo, Brazil
- Mara Behlau: Speech-Language Pathology Department, Universidade Federal de São Paulo, São Paulo, Brazil; Centro de Estudos da Voz - CEV, São Paulo, Brazil
5
Yiu EML, Cheng LKH, Wang F. Frequency Transmission of Oscillation from External Whole-Body Vibration Platform to the Larynx. J Voice 2024:S0892-1997(24)00093-6. [PMID: 38614894] [DOI: 10.1016/j.jvoice.2024.03.024]
Abstract
PURPOSE This study investigated (1) the presence of frequency transmission of oscillation from an external whole-body vibration (WBV) platform to the larynx and (2) the factors that influence this transmission. METHODS Thirty participants (mean age = 22.3 years) with normal voice were exposed to four frequency-intensity levels of WBV (10 Hz-10%, 10 Hz-20%, 20 Hz-10%, 20 Hz-20%) and were instructed to produce the natural vowel /a/ three times during each WBV setting. The frequency was extracted from the middle 1 second of each electroglottographic (EGG) signal after passing it through a Hann band filter with a range of 6-24 Hz. Linear mixed-effects models were applied to determine the factors that influenced the absolute deviation of the frequency transmission. RESULTS All participants exhibited an extracted EGG frequency that aligned with the external WBV frequency, deviating by -0.6 to 1.2 Hz. The absolute deviation of WBV frequency transmission was consistent for both sexes across WBV settings, except at the 10 Hz-10% setting, where men tended to exhibit significantly higher deviations (P = 0.018). CONCLUSION Oscillations at a specific frequency are transmitted from an external WBV platform to the larynx. This study proposes the use of a "spring" system to investigate the effect of WBV on the larynx and recommends further research to explore the potential of WBV in managing voice disorders.
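The extraction step described in the method (band-limiting the EGG signal to 6-24 Hz, then reading off the transmitted frequency from the middle 1 second) can be sketched as follows. This is an illustrative reconstruction, not the study's code: the authors used a Hann band filter, whereas this sketch uses a crude zero-phase FFT mask, and the simulated signal parameters are invented.

```python
import numpy as np

def bandpass_fft(x, fs, lo=6.0, hi=24.0):
    """Crude zero-phase band-pass: zero all FFT bins outside [lo, hi] Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    X[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(X, n=len(x))

def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest-magnitude FFT bin."""
    mags = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), 1 / fs)[np.argmax(mags)]

# Simulated EGG: a 200 Hz voicing component plus a small 20 Hz component
# standing in for the transmitted WBV oscillation (values invented).
fs = 1000
t = np.arange(3 * fs) / fs
egg = np.sin(2 * np.pi * 200 * t) + 0.2 * np.sin(2 * np.pi * 20 * t)
low = bandpass_fft(egg, fs)
middle = low[fs:2 * fs]                  # middle 1 second
wbv_hz = dominant_frequency(middle, fs)  # recovers ~20 Hz
```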
Affiliation(s)
- E M-L Yiu: Voice Research Laboratory, The University of Hong Kong, Pokfulam, Hong Kong
- L K H Cheng: Voice Research Laboratory, The University of Hong Kong, Pokfulam, Hong Kong
- F Wang: School of Humanities, Shanghai Normal University, Shanghai, China
6
Shahid MS, French AP, Valstar MF, Yakubov GE. Research in methodologies for modelling the oral cavity. Biomed Phys Eng Express 2024; 10:032001. [PMID: 38350128] [DOI: 10.1088/2057-1976/ad28cc]
Abstract
The paper aims to explore the current state of understanding surrounding in silico oral modelling. This involves exploring methodologies, technologies and approaches pertaining to the modelling of the whole oral cavity: both internally and externally visible structures that may be relevant or appropriate to oral actions. Such a model could be referred to as a 'complete model', which includes consideration of a full set of facial features (i.e., not only the mouth) as well as synergistic stimuli such as audio and facial thermal data. 3D modelling technologies capable of accurately and efficiently capturing a complete representation of the mouth for an individual have broad applications in the study of oral actions, due to their cost-effectiveness and time efficiency. This review delves into the field of clinical phonetics to classify oral actions pertaining to both speech and non-speech movements, identifying how the various vocal organs play a role in the articulatory and masticatory processes. Vitally, it provides a summary of 12 articulatory recording methods, forming a tool that researchers can use to identify which recording method is appropriate for their work. After addressing the cost and resource-intensive limitations of existing methods, a new system of modelling is proposed that leverages external-to-internal correlation modelling techniques to create more efficient models of the oral cavity. The vision is that the outcomes will be applicable to a broad spectrum of oral functions related to physiology, health and wellbeing, including speech, oral processing of foods, and dental health. The applications may span from speech correction to designing foods for the aging population, while in the dental field information about a patient's oral actions could become part of creating a personalised dental treatment plan.
Affiliation(s)
- Andrew P French: School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom; School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
- Michel F Valstar: School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom
- Gleb E Yakubov: School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
7
Wang F, Yiu EML. Predicting Dysphonia by Measuring Surface Electromyographic Activity of the Supralaryngeal Muscles. J Speech Lang Hear Res 2024; 67:740-752. [PMID: 38315579] [DOI: 10.1044/2023_jslhr-23-00110]
Abstract
PURPOSE This study set out to investigate whether individuals with dysphonia, as determined by either self-assessment or clinician-based auditory-perceptual judgment, exhibited differences in perilaryngeal muscle activity using surface electromyography (sEMG) during various phonatory tasks. Additionally, the study aimed to assess the effectiveness of sEMG in identifying dysphonic cases. METHOD A total of 77 adults (44 women, 33 men, Mage = 30.4 years) participated in this study, with dysphonic cases identified separately using either the 10-item Voice Handicap Index (VHI-10) or clinician-based auditory-perceptual voice quality (APVQ) evaluation. sEMG activity was measured from the areas of the suprahyoid and sternocleidomastoid muscles during prolonged vowel /i/ phonations at different pitch and loudness levels. The normalized root-mean-square value against the maximal voluntary contraction (RMS %MVC) of the sEMG signals was obtained for each phonation and compared between subject groups and across phonatory tasks. Additionally, binary logistic regression analysis was performed to determine how the sEMG measures could predict the VHI-10-based or APVQ-based dysphonic cases. RESULTS Participants who scored above the criteria on either the VHI-10 (n = 29) or APVQ judgment (n = 17) exhibited significantly higher RMS %MVC in the right suprahyoid muscles compared to the corresponding control groups. Although the RMS %MVC value from the right suprahyoid muscles alone was not a significant predictor of self-evaluated dysphonic cases, a combination of the RMS %MVC values from both the right and left suprahyoid muscles significantly predicted APVQ-based dysphonic cases with fair accuracy (69.66%). CONCLUSIONS This study found that individuals with dysphonia, as determined by either self-assessment or APVQ judgment, displayed more imbalanced suprahyoid muscle activity during voice production than the nondysphonic groups. The combination of the sEMG measures from both the left and right suprahyoid muscles showed potential as a predictor of dysphonia with a fair level of confidence. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.25112804.
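The RMS %MVC normalization described in the method is a simple ratio of root-mean-square amplitudes; a minimal sketch (function names are illustrative, not the study's code):

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of a signal segment."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2))

def rms_percent_mvc(semg_task, semg_mvc):
    """Task sEMG RMS normalized to the maximal voluntary contraction (MVC), in percent."""
    return 100.0 * rms(semg_task) / rms(semg_mvc)
```

For example, a task segment whose amplitude is one quarter of the MVC segment's yields an RMS %MVC of 25.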
Affiliation(s)
- Feifan Wang: School of Humanities, Shanghai Normal University, Shanghai, China; Voice Research Laboratory, The University of Hong Kong, Pokfulam
- Edwin M-L Yiu: Voice Research Laboratory, The University of Hong Kong, Pokfulam
8
Portalete CR, Moraes DADO, Pagliarin KC, Keske-Soares M, Cielo CA. Acoustic and Physiological Voice Assessment and Maximum Phonation Time in Patients With Different Types of Dysarthria. J Voice 2024; 38:540.e1-540.e11. [PMID: 34895782] [DOI: 10.1016/j.jvoice.2021.09.034]
Abstract
OBJECTIVE To compare the maximum phonation time of /a/, acoustic glottal source parameters, and physiological measures in patients with dysarthria. METHOD Thirteen patients were classified according to dysarthria type and divided into functional profiles (hypofunctional, hyperfunctional, and mixed). Assessments of maximum phonation time of /a/, glottal source parameters, electroglottography, and nasometry were performed. Results were compared between groups using ANOVA and Tukey post hoc tests. RESULTS The highest fundamental frequency differed significantly between groups, with the hyperfunctional profile showing higher values than the other participant groups. Reductions in the maximum phonation time of /a/ and alterations in acoustic glottal source parameters and electroglottography measures were observed in all groups, with no significant differences between them. The remaining measures did not differ between groups. CONCLUSION The maximum phonation times for /a/ were reduced in all participant groups, suggesting air escape during phonation. The presence of alterations in several glottal source parameters in all participant groups is indicative of noise, tremor, and vocal instability. Lastly, the high fundamental frequency in patients with a hyperfunctional profile reinforces the presence of vocal instability. These findings suggest that, although the characteristics observed in the assessments were consistent with expectations for patients with dysarthria, it is difficult to perform a differential diagnosis of this condition based on acoustic and physiological parameters alone.
9
Ren Z, Shang F, Zheng Y, Wu N, Ma L, Zhou X. The Role of EGG in Identifying Prevocalic Glottal Stop. J Voice 2024:S0892-1997(24)00020-1. [PMID: 38402112] [DOI: 10.1016/j.jvoice.2024.01.017]
Abstract
OBJECTIVE The aim of the study was to investigate the use of the incidence and characteristics of the prevocalic electroglottographic signal (PVES), derived from electroglottography (EGG), in characterizing glottal stops (GS) in cleft palate speech. METHODS Mandarin nonaspirated monosyllabic first-tone words were used for the speech sampling procedure. A total of 1680 utterances (from 83 patients with repaired cleft palates) were divided into three categories based on the results of auditory-perceptual evaluation of the recorded speech sounds by three independent reviewers: Category A (absence of GS agreed by all three reviewers; n = 1192 tokens), Category B (two of three reviewers agreed on the presence of a GS; n = 181 tokens), and Category C (all three reviewers agreed on the presence of a GS; n = 307 tokens). The EGG signals of the 1680 utterances were analyzed using a MATLAB program that automatically marked the instances of PVES (amplitude and time interval) in the GS utterances. RESULTS The incidence of EGG PVES showed a good positive correlation with the auditory-perceptual evaluation (r = 0.703, P < 0.001). Statistical analysis revealed a significant difference in mean PVES amplitude among the groups (P < 0.05). There was a significant distinction in the time interval between groups A and B, as well as between groups A and C (P < 0.05). CONCLUSIONS The study suggests that PVES can serve as an objective means of identifying GS in cleft palate speech. It also indicates that the amplitude and time interval of PVES tend to be positively correlated with the subjective assessment.
Affiliation(s)
- Zhen Ren: Department of Oral & Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- Feifei Shang: Department of Oral & Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- Yafeng Zheng: Department of Oral & Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- Nankai Wu: Department of Chinese Language and Literature, Jinan University, Guangzhou, China
- Lian Ma: Department of Oral & Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
- Xia Zhou: Department of Oral & Maxillofacial Surgery, Peking University School and Hospital of Stomatology, Beijing, China
10
Lester-Smith RA, Derrick E, Larson CR. Characterization of Source-Filter Interactions in Vocal Vibrato Using a Neck-Surface Vibration Sensor: A Pilot Study. J Voice 2024; 38:1-9. [PMID: 34649740] [PMCID: PMC8995401] [DOI: 10.1016/j.jvoice.2021.08.004]
Abstract
PURPOSE Vocal vibrato is a singing technique that involves periodic modulation of fundamental frequency (fo) and intensity. The physiological sources of modulation within the speech mechanism and the interactions between the laryngeal source and vocal tract filter in vibrato are not fully understood. Therefore, the purpose of this study was to determine if differences in the rate and extent of fo and intensity modulation could be captured using simultaneously recorded signals from a neck-surface vibration sensor and a microphone, which represent features of the source before and after supraglottal vocal tract filtering. METHOD Nine classically-trained singers produced sustained vowels with vibrato while simultaneous signals were recorded using a vibration sensor and a microphone. Acoustical analyses were performed to measure the rate and extent of fo and intensity modulation for each trial. Paired-samples sign tests were used to analyze differences between the rate and extent of fo and intensity modulation in the vibration sensor and microphone signals. RESULTS The rate and extent of fo modulation and the extent of intensity modulation were equivalent in the vibration sensor and microphone signals, but the rate of intensity modulation was significantly higher in the microphone signal than in the vibration sensor signal. Larger differences in the rate of intensity modulation were seen with vowels that typically have smaller differences between the first and second formant frequencies. CONCLUSIONS This study demonstrated that the rate of intensity modulation at the source prior to supraglottal vocal tract filtering, as measured in neck-surface vibration sensor signals, was lower than the rate of intensity modulation after supraglottal vocal tract filtering, as measured in microphone signals. The difference in rate varied based on the vowel. These findings provide further support of the resonance-harmonics interaction in vocal vibrato. 
Further investigation is warranted to determine if differences in the physiological source(s) of vibrato account for inconsistent relationships between the extent of intensity modulation in neck-surface vibration sensor and microphone signals.
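The rate and extent of fo (or intensity) modulation measured in this study can be estimated from a sampled parameter contour. The sketch below shows one common approach (spectral-peak rate, half peak-to-peak extent); it is an illustrative stand-in, not the authors' analysis code, and the synthetic contour values are invented.

```python
import numpy as np

def modulation_rate_extent(track, fs_track):
    """Estimate vibrato modulation rate (Hz) and extent (half the peak-to-peak
    excursion) from a sampled parameter track such as an fo contour."""
    x = track - np.mean(track)                      # remove the carrier mean
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1 / fs_track)
    rate = freqs[np.argmax(spec[1:]) + 1]           # skip the DC bin
    extent = (np.max(track) - np.min(track)) / 2
    return rate, extent

# Synthetic fo contour: 220 Hz carrier with a 5.5 Hz, +/-6 Hz vibrato (values invented).
fs_track = 100
t = np.arange(4 * fs_track) / fs_track
fo = 220 + 6 * np.sin(2 * np.pi * 5.5 * t)
rate, extent = modulation_rate_extent(fo, fs_track)
```

Applied separately to a vibration-sensor fo track and a microphone intensity track, the same function would yield the rate and extent comparisons described in the abstract.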
Affiliation(s)
- Rosemary A Lester-Smith: Department of Physical Medicine & Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, Illinois
- Elaina Derrick: Department of Speech, Language and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas
- Charles R Larson: Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois
11
Tomaszewska JZ, Georgakis A. Electroglottography in Medical Diagnostics of Vocal Tract Pathologies: A Systematic Review. J Voice 2023:S0892-1997(23)00388-0. [PMID: 38143204] [DOI: 10.1016/j.jvoice.2023.12.004]
Abstract
Electroglottography (EGG) is a technology developed for measuring the vocal fold contact area during human voice production. Although considered subjective and unreliable as a sole diagnostic method, with the correct application of relevant computational methods it can constitute one of the most promising non-invasive voice disorder diagnostic tools, in the form of a digital vocal tract pathology classifier. The aim of the following study is to gather and evaluate currently existing digital voice quality assessment systems and vocal tract abnormality classification systems that rely on the use of electroglottographic bio-impedance signals. To fully comprehend the findings of this review, the subject of EGG is first introduced; for that, we summarise the most relevant existing research on EGG, with a particular focus on its application in diagnostics. We then move on to the focal point of this work: describing and comparing the existing EGG-based digital voice pathology classification systems. With the application of the PRISMA model, 13 articles were chosen and analysed in detail. Direct comparison between the chosen studies led us to pivotal conclusions, which are described in Section 5 of this report. Meanwhile, certain limitations arising from the literature were identified, such as a questionable understanding of the nature of EGG bio-impedance signals. Appropriate recommendations for future work were made, including the application of different methods for EGG feature extraction, as well as the need for continued development of EGG datasets containing signals gathered under various conditions and with different equipment.
12
D'Amario S, Ternström S, Goebl W, Bishop L. Body motion of choral singers. Front Psychol 2023; 14:1220904. [PMID: 38187406] [PMCID: PMC10771835] [DOI: 10.3389/fpsyg.2023.1220904]
Abstract
Recent investigations on music performances have shown the relevance of singers' body motion for pedagogical as well as performance purposes. However, little is known about how the perception of voice-matching or task complexity affects choristers' body motion during ensemble singing. This study focussed on the body motion of choral singers who perform in duo along with a pre-recorded tune presented over a loudspeaker. Specifically, we examined the effects of the perception of voice-matching, operationalized in terms of sound spectral envelope, and task complexity on choristers' body motion. Fifteen singers with advanced choral experience first manipulated the spectral components of a pre-recorded short tune composed for the study, by choosing the settings they felt most and least together with. Then, they performed the tune in unison (i.e., singing the same melody simultaneously) and in canon (i.e., singing the same melody but at a temporal delay) with the chosen filter settings. Motion data of the choristers' upper body and audio of the repeated performances were collected and analyzed. Results show that the settings perceived as least together relate to extreme differences between the spectral components of the sound. The singers' wrists and torso motion was more periodic, their upper body posture was more open, and their bodies were more distant from the music stand when singing in unison than in canon. These findings suggest that unison singing promotes an expressive-periodic motion of the upper body.
Collapse
Affiliation(s)
- Sara D'Amario
- Department of Music Acoustics, mdw – University of Music and Performing Arts Vienna, Vienna, Austria
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Musicology, University of Oslo, Oslo, Norway
- Sten Ternström
- Division of Speech, Music, and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Werner Goebl
- Department of Music Acoustics, mdw – University of Music and Performing Arts Vienna, Vienna, Austria
- Laura Bishop
- RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Oslo, Norway
- Department of Musicology, University of Oslo, Oslo, Norway
13
Herbst CT, Prigge T, Garcia M, Hampala V, Hofer R, Weissengruber GE, Svec JG, Fitch WT. Domestic cat larynges can produce purring frequencies without neural input. Curr Biol 2023; 33:4727-4732.e4. [PMID: 37794583 DOI: 10.1016/j.cub.2023.09.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Revised: 07/12/2023] [Accepted: 09/05/2023] [Indexed: 10/06/2023]
Abstract
Most mammals produce vocal sounds according to the myoelastic-aerodynamic (MEAD) principle, through self-sustaining oscillation of laryngeal tissues.1,2 In contrast, cats have long been believed to produce their low-frequency purr vocalizations through a radically different mechanism involving active muscle contractions (AMC), where neurally driven electromyographic burst patterns (typically at 20-30 Hz) cause the intrinsic laryngeal muscles to actively modulate the respiratory airflow. Direct empirical evidence for this AMC mechanism is sparse.3 Here, the fundamental frequency (fo) ranges of eight domestic cats (Felis silvestris catus) were investigated in an excised larynx setup, to test the prediction of the AMC hypothesis that vibration should be impossible without neuromuscular activity, and thus unattainable in excised larynx setups, which are based on MEAD principles. Surprisingly, all eight excised larynges produced self-sustained oscillations at typical cat purring rates. Histological analysis of cat larynges revealed the presence of connective tissue masses, up to 4 mm in diameter, embedded in the vocal fold.4 This vocal fold specialization appears to allow the unusually low fo values observed in purring. While our data do not fully reject the AMC hypothesis for purring, they show that cat larynges can easily produce sounds in the purr regime with fundamental frequencies of 25 to 30 Hz without neural input or muscular contraction. This strongly suggests that the physical and physiological basis of cat purring involves the same MEAD-based mechanisms as other cat vocalizations (e.g., meows) and most other vertebrate vocalizations but is potentially augmented by AMC.
Affiliation(s)
- Christian T Herbst
- Bioacoustics Laboratory, Department of Behavioral and Cognitive Biology, University of Vienna, Djerassiplatz 1, Vienna 1030, Austria; Janette Ogg Voice Research Center, Shenandoah Conservatory, 1460 University Drive, Winchester, VA 22601, USA
- Tamara Prigge
- Institute of Morphology, University of Veterinary Medicine, Veterinärplatz 1, Vienna 1210, Austria
- Maxime Garcia
- Department of Livestock Sciences, Research Institute of Organic Agriculture FiBL, Ackerstrasse 113, Box 219, 5070 Frick, Switzerland
- Vit Hampala
- Voice Research Lab, Department of Experimental Physics, Faculty of Science, Palacký University, 17. listopadu 1192/12, 779 00 Olomouc, Czechia
- Riccardo Hofer
- Bioacoustics Laboratory, Department of Behavioral and Cognitive Biology, University of Vienna, Djerassiplatz 1, Vienna 1030, Austria
- Gerald E Weissengruber
- Institute of Morphology, University of Veterinary Medicine, Veterinärplatz 1, Vienna 1210, Austria
- Jan G Svec
- Voice Research Lab, Department of Experimental Physics, Faculty of Science, Palacký University, 17. listopadu 1192/12, 779 00 Olomouc, Czechia
- W Tecumseh Fitch
- Bioacoustics Laboratory, Department of Behavioral and Cognitive Biology, University of Vienna, Djerassiplatz 1, Vienna 1030, Austria
14
Noufi C, Berger J, Frank M, Parker K, Bowling DL. Acoustically-Driven Phoneme Removal That Preserves Vocal Affect Cues. PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. ICASSP (CONFERENCE) 2023; 2023:10.1109/icassp49357.2023.10095942. [PMID: 37701064 PMCID: PMC10495117 DOI: 10.1109/icassp49357.2023.10095942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/14/2023]
Abstract
In this paper, we propose a method for removing linguistic information from speech for the purpose of isolating paralinguistic indicators of affect. The immediate utility of this method lies in clinical tests of sensitivity to vocal affect that are not confounded by language, which is impaired in a variety of clinical populations. The method is based on simultaneous recordings of speech audio and electroglottographic (EGG) signals. The speech audio signal is used to estimate the average vocal tract filter response and amplitude envelope. The EGG signal supplies a direct correlate of voice source activity that is mostly independent of phonetic articulation. These signals are used to create a third signal designed to capture as much paralinguistic information from the vocal production system as possible, maximizing the retention of bioacoustic cues to affect while eliminating phonetic cues to verbal meaning. To evaluate the success of this method, we studied the perception of corresponding speech audio and transformed EGG signals in an affect rating experiment with online listeners. The results show a high degree of similarity in the perceived affect of matched signals, indicating that our method is effective.
Affiliation(s)
- Camille Noufi
- Stanford University, Center for Computer Research in Music and Acoustics, Stanford, CA, USA
- Jonathan Berger
- Stanford University, Center for Computer Research in Music and Acoustics, Stanford, CA, USA
- Michael Frank
- Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, USA
- Karen Parker
- Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, USA
- Daniel L Bowling
- Stanford School of Medicine, Department of Psychiatry and Behavioral Sciences, Stanford, CA, USA
15
Donati E, Chousidis C, Ribeiro HDM, Russo N. Classification of Speaking and Singing Voices Using Bioimpedance Measurements and Deep Learning. J Voice 2023:S0892-1997(23)00120-0. [PMID: 37156686 DOI: 10.1016/j.jvoice.2023.03.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Revised: 03/29/2023] [Accepted: 03/29/2023] [Indexed: 05/10/2023]
Abstract
The acts of speaking and singing are different phenomena displaying distinct characteristics. The classification and distinction of these voice acts has mostly been approached using voice audio recordings captured with microphones. The use of audio recordings, however, can become challenging and computationally expensive due to the complexity of the voice signal. The research presented in this paper seeks to address this issue by implementing a deep learning classifier of speaking and singing voices based on bioimpedance measurements in place of audio recordings. In addition, the proposed research aims to develop real-time voice act classification for integration with voice-to-MIDI conversion. For such purposes, a system was designed, implemented, and tested using electroglottographic signals, Mel Frequency Cepstral Coefficients, and a deep neural network. The lack of datasets for training the model was tackled by creating a dedicated dataset of 7,200 bioimpedance measurements of both singing and speaking. The use of bioimpedance measurements delivers high classification accuracy while keeping computational needs low for both preprocessing and classification. These characteristics, in turn, allow fast deployment of the system for near-real-time applications. After training, the system was extensively tested, achieving a testing accuracy of 92% to 94%.
Affiliation(s)
- Eugenio Donati
- School of Computing and Engineering, University of West London, London, UK
- Christos Chousidis
- Department of Music and Media, Institute of Sound Recording, University of Surrey, Guildford, UK
- Nicola Russo
- School of Computing and Engineering, University of West London, London, UK
16
Masapollo M, Nittrouer S. Interarticulator Speech Coordination: Timing Is of the Essence. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:901-915. [PMID: 36827516 DOI: 10.1044/2022_jslhr-22-00594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
PURPOSE In skilled speech production, sets of articulators, such as the jaw, tongue, and lips, work cooperatively to achieve task-specific movement goals, despite rampant contextual variation. Efforts to understand these functional units, termed coordinative structures, have focused on identifying the essential control parameters responsible for allowing articulators to achieve these goals, with some research focusing on temporal parameters (relative timing of movements) and other research focusing on spatiotemporal parameters (phase angle of movement onset for one articulator, relative to another). Here, both types of parameters were investigated and compared in detail. METHOD Ten talkers recorded nonsense, disyllabic /tV#Cat/ utterances using electromagnetic articulography, with alternative V (/ɑ/-/ɛ/) and C (/t/-/d/), across variation in rate (fast-slow) and stress (first syllable stressed-unstressed). Two measures were obtained: (a) the timing of tongue-tip raising onset for medial C, relative to jaw opening-closing cycles and (b) the angle of tongue-tip raising onset, relative to the jaw phase plane. RESULTS Results showed that any manipulation that shortened the jaw opening-closing cycle reduced both the relative timing and phase angle of the tongue-tip movement onset, but relative timing of tongue-tip movement onset scaled more consistently with jaw opening-closing across rate and stress variation. CONCLUSION These findings suggest the existence of an intrinsic timing mechanism (or "central clock") that is the primary control parameter for coordinative structures, with online compensation then allowing these structures to achieve their goals spatially. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.22144259.
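The spatiotemporal parameter described above, the phase angle of one articulator's movement onset read off another articulator's phase plane, can be sketched numerically. The following is a hypothetical illustration under simplifying assumptions (not the authors' analysis code): the jaw trajectory is z-normalized, its velocity is estimated by finite differences, and the phase angle of an event is taken as `atan2(velocity, position)` at the event sample.

```python
import numpy as np

def jaw_phase_angle(jaw_pos: np.ndarray, onset_idx: int) -> float:
    """Phase angle (degrees) of an articulatory event (e.g., tongue-tip
    raising onset) within the jaw opening-closing cycle, read off the jaw
    phase plane (normalized position vs. normalized velocity).
    Hypothetical sketch only; sign/zero conventions vary across studies."""
    pos = (jaw_pos - jaw_pos.mean()) / jaw_pos.std()   # z-normalize position
    vel = np.gradient(pos)                             # finite-difference velocity
    vel = vel / vel.std()                              # normalize velocity axis
    return float(np.degrees(np.arctan2(vel[onset_idx], pos[onset_idx])))
```

With a sinusoidal jaw cycle, an event at the positional extreme maps to a phase angle near 0 degrees, and an event a quarter-cycle later maps to roughly -90 degrees, which is the behavior a phase-plane measure should exhibit.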
Affiliation(s)
- Matthew Masapollo
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville
- Susan Nittrouer
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville
17
Differences Among Mixed, Chest, and Falsetto Registers: A Multiparametric Study. J Voice 2023; 37:298.e11-298.e29. [PMID: 33518476 DOI: 10.1016/j.jvoice.2020.12.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Revised: 12/23/2020] [Accepted: 12/28/2020] [Indexed: 11/23/2022]
Abstract
INTRODUCTION Typical singing registers are the chest and falsetto; however, trained singers have an additional register, namely, the mixed register. The mixed register, also called "mixed voice" or "mix," is an important technique for singers, as it can help bridge from the chest voice to falsetto without noticeable voice breaks. OBJECTIVE The present study aims to reveal the voice-production mechanisms of the different registers (chest, mix, and falsetto) using high-speed digital imaging (HSDI), electroglottography (EGG), and acoustic and aerodynamic measurements. STUDY DESIGN Cross-sectional study. METHODS Aerodynamic measurements were acquired for twelve healthy singers (six men and six women) during the phonation of a variety of pitches in the three registers. HSDI and EGG devices were used simultaneously on three healthy singers (two men and one woman), from whose recordings the open quotient (OQ) and speed quotient (SQ) were derived. Audio signals were recorded for five sustained vowels, and a spectral analysis was conducted to determine the amplitude of each harmonic component. Furthermore, the absolute (not relative) value of the glottal volume flow was estimated by integrating data obtained from the HSDI and aerodynamic studies. RESULTS For all singers, the subglottal pressure (PSub) was highest for the chest of the three registers, and the mean flow rate (MFR) was highest for the falsetto. Conversely, the PSub of the mix was as low as that of the falsetto, and the MFR of the mix was as low as that of the chest. The HSDI analysis showed that the OQ differed significantly among the registers, even when the fundamental frequency was the same; the OQ of the mix was higher than that of the chest but lower than that of the falsetto. The acoustic analysis showed that, for the mix, the harmonic structure was intermediate between the chest and falsetto.
The results of the glottal volume-flow analysis revealed that the maximum volume velocity was the least for the mix register at every fundamental frequency. The first and second harmonic (H1-H2) difference of the voice source spectrum was the greatest for the falsetto, then the mix, and finally, the chest. CONCLUSIONS We found differences in the registers in terms of the aeromechanical mechanisms and vibration patterns of the vocal folds. The mixed register proved to have a distinct voice-production mechanism, which can be differentiated from those of the chest or falsetto registers.
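The open quotient reported above is conventionally estimated from the EGG signal's time derivative (dEGG): the positive dEGG peak marks glottal closing and the negative peak marks opening, and OQ is the open-phase duration divided by the period. A minimal sketch, assuming a clean periodic signal with known fundamental frequency (function name and simplifications are illustrative, not from the study):

```python
import numpy as np

def open_quotient(egg: np.ndarray, fs: float, f0: float) -> float:
    """Estimate the open quotient (OQ) of a steady EGG signal from its
    derivative (dEGG): the positive dEGG peak marks glottal closing, the
    negative peak marks opening; OQ = open-phase duration / period.
    Sketch only; real EGG needs cycle-by-cycle peak picking and denoising."""
    period = int(round(fs / f0))            # samples per glottal cycle
    degg = np.diff(egg)                     # first difference ~ dEGG
    cycle = degg[period:2 * period]         # one interior cycle, avoids edges
    closing = int(np.argmax(cycle))         # steepest rise in contact
    opening = int(np.argmin(cycle))         # steepest fall in contact
    open_phase = (closing - opening) % period
    return open_phase / period
```

On a synthetic pulse-shaped "EGG" whose contact phase occupies 40% of each cycle, this returns an OQ near 0.6, matching the construction.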
18
Pedersen M, Larsen CF, Madsen B, Eeg M. Localization and quantification of glottal gaps on deep learning segmentation of vocal folds. Sci Rep 2023; 13:878. [PMID: 36650265 PMCID: PMC9845318 DOI: 10.1038/s41598-023-27980-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 01/11/2023] [Indexed: 01/19/2023] Open
Abstract
The entire glottis has mostly been the focus in the tracking of the vocal folds, both manually and automatically. From a treatment point of view, however, the various regions of the glottis are of specific interest. The aim of the study was to test whether an existing convolutional neural network (CNN) could be supplemented with post-network calculations for the localization and quantification of posterior glottal gaps during phonation, usable for vocal fold function analysis, e.g., of laryngopharyngeal reflux findings. 30 subjects/videos with insufficient closure in the rear glottal area and 20 normal subjects/videos were selected from our database, recorded with a commercial high-speed video setup (HSV with 4000 frames per second), and segmented with an open-source CNN for voice function validation. We made post-network calculations to localize and quantify the glottal gap at the 10% and 50% distance lines from the rear part of the glottis. The results showed a significant difference between the two groups at the 10% distance line (p < 0.0001) and no difference at the 50% line. These novel results show that it is possible to use post-network calculations on CNN segmentations for the localization and quantification of posterior glottal gaps.
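A post-network calculation of this kind can be sketched on a per-frame binary segmentation mask: find the posterior-to-anterior extent of the segmented glottis, then measure the gap width on the row at a given fraction of that extent. This is a hypothetical illustration of the idea (function name, axis convention, and pixel-width measure are assumptions, not the authors' code):

```python
import numpy as np

def gap_at_fraction(mask: np.ndarray, frac: float) -> int:
    """Width (in pixels) of the segmented glottal gap on the line at
    `frac` of the glottal length from the posterior end. `mask` is a
    binary image whose rows run along the posterior-to-anterior axis."""
    rows = np.where(mask.any(axis=1))[0]        # rows containing glottis pixels
    if rows.size == 0:
        return 0                                # fully closed glottis
    posterior, anterior = rows[0], rows[-1]
    r = posterior + int(round(frac * (anterior - posterior)))
    return int(mask[r].sum())                   # gap width on that line
```

For a posteriorly open, anteriorly closing gap, the 10% line then yields a wider gap than the 50% line, mirroring the group difference the study localizes.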
19
Grouping Intrinsic Mode Functions and Residue for Pathological Classifications via Electroglottograms. Ing Rech Biomed 2022. [DOI: 10.1016/j.irbm.2022.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
20
Chi Y, Honda K, Wei J. Near-infrared photoglottography for measuring multiple glottal events. JASA EXPRESS LETTERS 2022; 2:105203. [PMID: 36319211 DOI: 10.1121/10.0014810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Near-infrared (NIR) photoglottography (PGG) is a non-invasive method for monitoring glottal activity that retains the functionality of conventional visible-light PGG while being more conveniently applicable. This paper investigates its performance in comparison with simultaneously recorded electroglottography (EGG) signals. Results showed that NIR PGG provides continuous transillumination measures of glottal aperture and vocal-fold contact. Glottal timing markers, known as the glottal closure and opening instants, are detectable and agree with the corresponding EGG-based instants. Further, it was inferred that variations of the NIR PGG glottal waveforms reflect vertical vocal-fold edge motions.
Affiliation(s)
- Yujie Chi
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Kiyoshi Honda
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Jianguo Wei
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
21
Oral and Laryngeal Articulation Control of Voicing in Children with and without Speech Sound Disorders. CHILDREN 2022; 9:children9050649. [PMID: 35626826 PMCID: PMC9139554 DOI: 10.3390/children9050649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/18/2022] [Revised: 04/07/2022] [Accepted: 04/28/2022] [Indexed: 11/25/2022]
Abstract
Voicing contrast is hard to master during speech motor development, and the phonological process of consonant devoicing is very frequent in children with Speech Sound Disorders (SSD). Therefore, the aim of this study was to characterise the oral and laryngeal articulation control strategies used by children with and without SSD as a function of place of articulation. The articulation rate and relative oral airflow amplitude (flow) were used to analyse how children controlled oral articulation; fundamental frequency (fo), open quotient (OQ), and a classification of voicing were used to explore laryngeal behaviour. Data from detailed speech and language assessments, oral airflow, and electroglottography signals were collected from 13 children with SSD and 17 children without SSD, aged 5;0 to 7;8 (years;months), using picture naming tasks. Articulation rate and flow in children with and without SSD were not significantly different, but a statistically reliable effect of place on flow was found. Children with and without SSD used different relative fo (which captures changes in fo during the consonant-vowel transition) and OQ values, and place of articulation had an effect on the strength of voicing. All children used very similar oral articulation control of voicing, but children with SSD used less efficient laryngeal articulation strategies (higher subglottal damping and more air from the lungs expelled in each glottal cycle) than children without SSD.
22
Abril-Rodríguez S, Herrero R. Biofeedback electromiográfico y electroglotográfico aplicado a la terapia vocal: una revisión sistemática. REVISTA DE INVESTIGACIÓN EN LOGOPEDIA 2022. [DOI: 10.5209/rlog.75581] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Electromyography and electroglottography are examination techniques which, combined with biofeedback, allow the speaker to make muscular adjustments that improve phonatory function. Our aims were therefore to determine the effects of electromyographic biofeedback in increasing or decreasing the tone of the muscles involved directly or indirectly in voice production, to identify the effects of electroglottographic biofeedback in producing changes in the vibratory pattern of the vocal folds and, finally, to determine the required frequency of biofeedback in vocal treatment, based on a systematic review of the studies published since the year 2000 in speech-language pathology and laryngology journals. The analysis of the studies retrieved in the documentation process suggests that the use of electromyographic and electroglottographic biofeedback can produce changes that endure over time in the vibratory pattern of the vocal folds and in the muscular activity of vocal production, such that it could be a useful instrument to add to evidence-based vocal intervention. The data regarding the required frequency of use of this instrument, however, do not appear conclusive.
23
Geng L, Shan H, Xiao Z, Wang W, Wei M. Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method. BIOMED ENG-BIOMED TE 2021; 66:613-625. [PMID: 34845886 DOI: 10.1515/bmt-2021-0112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Accepted: 11/12/2021] [Indexed: 11/15/2022]
Abstract
Automatic voice pathology detection and classification play an important role in the diagnosis and prevention of voice disorders. To accurately describe the pronunciation characteristics of patients with dysarthria and improve the effectiveness of pathological voice detection, this study proposes a pathological voice detection method based on a multi-modal network structure. First, speech signals and electroglottography (EGG) signals are mapped from the time domain to frequency-domain spectrograms via a short-time Fourier transform (STFT). A Mel filter bank is applied to the spectrograms to enhance the signals' harmonics and suppress noise. Second, a pre-trained convolutional neural network (CNN) is used as the backbone network to extract sound state features and vocal cord vibration features from the two signals. To obtain a better classification effect, the fused features are fed into a long short-term memory (LSTM) network for voice feature selection and enhancement. The proposed system achieves 95.73% accuracy, with a 96.10% F1-score and 96.73% recall, on the Saarbrucken Voice Database (SVD), thus enabling a new method for pathological speech detection.
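The Mel filter bank step mentioned above is a standard operation: a matrix of triangular filters, spaced uniformly on the mel scale, that maps an STFT magnitude spectrum to mel bands. A minimal numpy sketch of a generic textbook construction (not the study's implementation; the HTK-style mel formula and bin mapping are assumptions):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels: int, n_fft: int, fs: float) -> np.ndarray:
    """Triangular mel filter bank of shape (n_mels, n_fft // 2 + 1),
    to be matrix-multiplied with an STFT magnitude spectrum."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb
```

Applying `fb @ np.abs(stft_frame)` then gives the mel-band energies that such spectrogram front ends feed to the CNN.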
Affiliation(s)
- Lei Geng
- School of Life Sciences, Tiangong University, Tianjin, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China
- Hongfeng Shan
- School of Electronic and Information Engineering, Tiangong University, Tianjin, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China
- Zhitao Xiao
- School of Life Sciences, Tiangong University, Tianjin, China; Tianjin Key Laboratory of Optoelectronic Detection Technology and System, Tianjin, China
- Wei Wang
- Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; Otolaryngology Clinical Quality Control Centre, Tianjin, China
- Mei Wei
- Department of Otorhinolaryngology Head and Neck Surgery, Tianjin First Central Hospital, Tianjin, China; Institute of Otolaryngology of Tianjin, Tianjin, China; Key Laboratory of Auditory Speech and Balance Medicine, Tianjin, China; Key Clinical Discipline of Tianjin (Otolaryngology), Tianjin, China; Otolaryngology Clinical Quality Control Centre, Tianjin, China
24
Angelakis E, Kotsani N, Georgaki A. Towards a Singing Voice Multi-Sensor Analysis Tool: System Design, and Assessment Based on Vocal Breathiness. SENSORS 2021; 21:s21238006. [PMID: 34884019 PMCID: PMC8659512 DOI: 10.3390/s21238006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 11/14/2021] [Accepted: 11/19/2021] [Indexed: 11/16/2022]
Abstract
Singing voice is a human quality that requires the precise coordination of numerous kinetic functions and results in a perceptually variable auditory outcome. The use of multi-sensor systems can facilitate the study of correlations between the kinetic functions of the vocal mechanism and the voice output. This is directly relevant to vocal education, rehabilitation, and the prevention of vocal health issues in educators, professionals, and students of singing, music, and acting. In this work, we present the initial design of a modular multi-sensor system for singing voice analysis and describe its first assessment experiment, on the qualitative characteristic of ‘vocal breathiness’. A case study with two professional singers was conducted, utilizing signals from four sensors. Participants sang a protocol of vocal trials with various degrees of intended vocal breathiness. Their (i) vocal output, (ii) phonatory function, and (iii) respiratory behavior per condition were recorded through a condenser microphone (CM), an electroglottograph (EGG), and thoracic and abdominal respiratory effort transducers (RET), respectively. Participants' individual respiratory management strategies were studied through qualitative analysis of the RET data. The degree of breathiness in the microphone audio samples was rated perceptually, and correlation analysis was performed between the sample ratings and parameters extracted from the CM and EGG data. Smoothed Cepstral Peak Prominence (CPPS) and the vocal folds' Open Quotient (OQ), as computed with the Howard method (HOQ), demonstrated the highest correlation coefficients when analyzed individually. The DECOM-method-computed OQ (DOQ) was also examined. Interestingly, the correlation for the pitch difference between estimates from the CM and EGG signals appeared to be statistically insignificant (based on the Pearson correlation coefficient), a result that warrants investigation in larger populations.
The study of multi-variate models revealed even higher correlation coefficients. The models studied were the Acoustic Breathiness Index (ABI) and the proposed multiple regression model CDH (CPPS, DOQ, and HOQ), which was attempted in order to combine analysis results from the microphone and EGG signals. The combination of ABI and the proposed CDH appeared to yield the highest correlation with the perceptual breathiness ratings. The results suggest potential for the use of a completed version of the system in vocal pedagogy and research, as the case study indicated that the system is practical, revealed a number of pertinent correlations, and introduced topics for further research.
25
Fischer J, Özen AC, Ilbey S, Traser L, Echternach M, Richter B, Bock M. Sub-millisecond 2D MRI of the vocal fold oscillation using single-point imaging with rapid encoding. MAGNETIC RESONANCE MATERIALS IN PHYSICS BIOLOGY AND MEDICINE 2021; 35:301-310. [PMID: 34542771 PMCID: PMC8995286 DOI: 10.1007/s10334-021-00959-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Revised: 08/06/2021] [Accepted: 09/06/2021] [Indexed: 10/24/2022]
Abstract
OBJECTIVE The slow spatial encoding of MRI has precluded its application to rapid physiologic motion in the past. The purpose of this study is to introduce a new fast acquisition method and to demonstrate the feasibility of encoding the rapid two-dimensional motion of human vocal folds with sub-millisecond resolution. METHOD In our previous work, we achieved high temporal resolution by applying a rapidly switched phase-encoding gradient along the direction of motion. In this work, we extend phase encoding to the second image direction by using single-point imaging with rapid encoding (SPIRE) to image the two-dimensional vocal fold oscillation in the coronal view. Image data were gated using electroglottography (EGG) and motion corrected. An iterative reconstruction with a total variation (TV) constraint was used, and the sequence was also simulated using a motion phantom. RESULTS Dynamic images of the vocal folds during phonation at pitches of 150 and 165 Hz were acquired in two volunteers, showing the periodic motion of the vocal folds at a temporal resolution of about 600 µs. The simulations emphasize the necessity of SPIRE for two-dimensional motion encoding. DISCUSSION SPIRE is a new MRI method for imaging rapidly oscillating structures and, for the first time, provides dynamic images of vocal fold oscillations in the coronal plane.
Affiliation(s)
- Johannes Fischer
- Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Ali Caglar Özen
- Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Consortium for Translational Cancer Research Partner Site Freiburg, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Serhat Ilbey
- Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Louisa Traser
- Freiburg Institute for Musicians' Medicine, Freiburg University Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Ludwig-Maximilians-University, Munich, Germany
- Bernhard Richter
- Freiburg Institute for Musicians' Medicine, Freiburg University Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Michael Bock
- Department of Radiology, Medical Physics, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
26
Yılmaz G, Cangi ME, Yelken K. Receiver operating characteristic analysis of acoustic and electroglottographic parameters with different sustained vowels. LOGOP PHONIATR VOCO 2021; 47:284-291. [PMID: 34519593 DOI: 10.1080/14015439.2021.1974934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
OBJECTIVE To examine the power of the parameters obtained from different sustained vowels used in acoustic and electroglottographic (EGG) voice evaluation protocols to discriminate between dysphonic and non-dysphonic voice quality. METHODS Sixty non-dysphonic participants and 30 dysphonic participants were included in the study. In addition to the time-domain amplitude and frequency perturbation parameters obtained from the sustained phonation of the /ʌ/-/ɛ/-/i/-/u/ vowels, several frequency-domain spectral/cepstral parameters and EGG parameters were evaluated. The classification performance of the acoustic and electroglottographic measures was quantified using receiver operating characteristic (ROC) curve analysis. RESULTS The discriminative diagnostic performance (area under the curve, AUC) obtained for low-vowel (/ʌ/-/ɛ/) phonation was higher than that obtained for high-vowel (/i/-/u/) phonation. For the sustained vowels /ʌ/ and /ɛ/, the parameters exhibiting the highest discrimination were fundamental frequency standard deviation (fo/STD), cepstral peak prominence (CPP), relative average perturbation (RAP), pitch perturbation quotient (PPQ), and jitter percent (JITT). Among the EGG parameters, on the other hand, average jitter and periodicity parameters obtained from the front vowels (/ɛ/-/i/) were found to have higher AUC values than those from the back vowels (/ʌ/-/u/). CONCLUSIONS In acoustic analyses, the sustained vowels /ʌ/ and /ɛ/ give the highest diagnostic performance. In the electroglottographic evaluation, on the other hand, the vowels /ɛ/ and /i/, for which the tongue is positioned forward, have better classification performance than the vowels /ʌ/ and /u/, for which the tongue is positioned back.
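The AUC values compared above have a useful rank-statistic reading: the AUC equals the probability that a randomly drawn dysphonic sample scores higher on the parameter than a randomly drawn non-dysphonic one (the Mann-Whitney identity). A generic numpy sketch of that computation, independent of this study's data:

```python
import numpy as np

def auc(scores_pos, scores_neg) -> float:
    """Area under the ROC curve via its rank-statistic interpretation:
    P(positive score > negative score), with ties counted as 0.5.
    Generic sketch of the ROC/AUC analysis, not the study's code."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]   # column of positives
    neg = np.asarray(scores_neg, dtype=float)[None, :]   # row of negatives
    wins = (pos > neg).mean()                            # fraction of won pairs
    ties = (pos == neg).mean()                           # tied pairs count half
    return float(wins + 0.5 * ties)
```

An AUC of 1.0 means perfect separation of the two groups; 0.5 means chance-level discrimination, the baseline against which the vowel-dependent parameters are compared.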
Affiliation(s)
- Göksu Yılmaz, Department of Speech and Language Therapy, Uskudar University, İstanbul, Turkey
- M Emrah Cangi, Department of Speech and Language Therapy, Uskudar University, İstanbul, Turkey
- Kürşat Yelken, Department of Otolaryngology, Maltepe University Medicine Faculty, İstanbul, Turkey

27
Patel RR, Ternström S. Quantitative and Qualitative Electroglottographic Wave Shape Differences in Children and Adults Using Voice Map-Based Analysis. J Speech Lang Hear Res 2021; 64:2977-2995. [PMID: 34319772] [DOI: 10.1044/2021_jslhr-20-00717]
Abstract
Purpose The purpose of this study is to identify the extent to which various measurements of contacting parameters differ between children and adults during the habitual range and the overlapping vocal frequency/intensity region, using voice map-based assessment of noninvasive electroglottography (EGG). Method EGG voice maps were analyzed from 26 adults (22-45 years) and 22 children (4-8 years) during connected speech and the vowel /a/ over the habitual range, and during the vowel /a/ over the overlapping vocal frequency/intensity region from the voice range profile task. Means and standard deviations of the contact quotient by integration, normalized contacting speed, quotient of speed by integration, and cycle-rate sample entropy were obtained. Group differences were evaluated using linear mixed-model analysis for the habitual-range connected speech and vowel tasks, whereas analysis of covariance was conducted for the overlapping vocal frequency/intensity region from the voice range profile task. The presence of a "knee" on the EGG wave shape was determined by visual inspection for convexity along the decontacting slope of the EGG pulse and for a second-derivative zero crossing. Results The contact quotient by integration, normalized contacting speed, quotient of speed by integration, and cycle-rate sample entropy were significantly different in children compared to (a) adult males for the habitual range and (b) adult males and adult females for the overlapping vocal frequency/intensity region. None of the children had a "knee" on the decontacting slope of the EGG pulse.
Conclusion EGG parameters of contact quotient by integration, normalized contacting speed, quotient of speed by integration, cycle-rate sample entropy, and absence of a "knee" on the decontacting slope characterize the wave shape differences between children and adults, whereas the normalized contacting speed, quotient of speed by integration, cycle-rate sample entropy, and presence of a "knee" on the downward pulse slope characterize the wave shape differences between adult males and adult females. Supplemental Material https://doi.org/10.23641/asha.15057345.
Affiliation(s)
- Rita R Patel, Department of Speech, Language and Hearing Sciences, Indiana University Bloomington
- Sten Ternström, Division of Speech, Music, and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden

28
Analysis of localized bioimpedance from healthy young adults during activities of the vocal folds using Cole-impedance model representation. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102665]
29
Herbst CT. Performance Evaluation of Subharmonic-to-Harmonic Ratio (SHR) Computation. J Voice 2021; 35:365-375. [DOI: 10.1016/j.jvoice.2019.11.005]
30
Hirosaki M, Kanazawa T, Komazawa D, Konomi U, Sakaguchi Y, Katori Y, Watanabe Y. Predominant Vertical Location of Benign Vocal Fold Lesions by Sex and Music Genre: Implication for Pathogenesis. Laryngoscope 2021; 131:E2284-E2291. [PMID: 33421134] [DOI: 10.1002/lary.29378]
Abstract
OBJECTIVES/HYPOTHESIS Vertical locations of vocal fold mucosal lesions (VFMLs) vary along the free edge. As the vertical contact area of vocal folds (VFs) depends on the vocal register, lesions may occur in the contact area of more frequently used vocal registers. This study investigated the cause of location variations by comparing the vertical sites of VFMLs in singers of both sexes with different music genres. STUDY DESIGN Retrospective review. METHODS Sixty professional classical and rock singers (11 male classical [M-classical], 22 male rock [M-rock], 13 female classical [F-classical], and 14 female rock [F-rock] singers) who underwent microlaryngeal surgery for VF polyps and nodules and their 108 lesions were enrolled. The VF free edge was vertically divided into three equal parts and classified into the following four lesion sites: upper, middle, lower, and multiple sites. RESULTS Upper lesions were most common among F-classical singers (73.9%), whereas lower lesions were most common among M-classical (90.0%) and M-rock (60.6%) singers. Among lesions localized to a single site, lower lesions were most common among F-rock singers (37.0%). F-classical singers had significantly more upper lesions than the other groups (P < .001). M-classical singers had significantly more lower lesions than female singers of any genre (P < .001). CONCLUSION Upper lesions were most common among F-classical singers who mostly used the head voice. Lower lesions were most common among singers who mainly used the modal voice. This study suggests that sex, the dominant vocal register used for singing, and mechanical stress on VFs influence the vertical site of VFMLs. LEVEL OF EVIDENCE 4 Laryngoscope, 131:E2284-E2291, 2021.
Affiliation(s)
- Mayu Hirosaki, Tokyo Voice Center, International University of Health and Welfare, Tokyo, Japan; Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, Miyagi, Japan
- Takeharu Kanazawa, Tokyo Voice Center, International University of Health and Welfare, Tokyo, Japan; Department of Otolaryngology-Head and Neck Surgery, Jichi Medical University, Tochigi, Japan
- Daigo Komazawa, Tokyo Voice Center, International University of Health and Welfare, Tokyo, Japan; AKASAKA Voice Health Center, Tokyo, Japan
- Ujimoto Konomi, Tokyo Voice Center, International University of Health and Welfare, Tokyo, Japan; Voice and Dizziness Clinic Futakotamagawa Otolaryngology, Tokyo, Japan
- Yu Sakaguchi, Tokyo Voice Center, International University of Health and Welfare, Tokyo, Japan
- Yukio Katori, Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, Miyagi, Japan
- Yusuke Watanabe, Tokyo Voice Center, International University of Health and Welfare, Tokyo, Japan

31
Paroni A, Henrich Bernardoni N, Savariaux C, Lœvenbruck H, Calabrese P, Pellegrini T, Mouysset S, Gerber S. Vocal drum sounds in human beatboxing: An acoustic and articulatory exploration using electromagnetic articulography. J Acoust Soc Am 2021; 149:191. [PMID: 33514144] [DOI: 10.1121/10.0002921]
Abstract
Acoustic characteristics, lingual and labial articulatory dynamics, and ventilatory behaviors were studied in a beatboxer producing twelve drum sounds belonging to five main categories of his repertoire (kick, snare, hi-hat, rimshot, cymbal). Various types of experimental data were collected synchronously (respiratory inductance plethysmography, electroglottography, electromagnetic articulography, and acoustic recording). Automatic unsupervised classification was successfully applied to the acoustic data using the t-SNE spectral clustering technique. A cluster purity value of 94% was achieved, showing that each sound has a specific acoustic signature. The acoustic intensity of sounds produced with the humming technique was found to be significantly lower than that of their non-humming counterparts. For these sounds, a dissociation between articulation and breathing was observed. Overall, a wide range of articulatory gestures was observed, some of which were non-linguistic. The tongue was systematically involved in the articulation of the explored beatboxing sounds, either as the main articulator or as accompanying the lip dynamics. Two pulmonic and three non-pulmonic airstream mechanisms were identified. Ejectives were found in the production of all the sounds with bilabial occlusion or alveolar occlusion with egressive airstream. A phonetic annotation using the IPA alphabet was performed, highlighting the complexity of such sound production and the limits of speech-based annotation.
Affiliation(s)
- Annalisa Paroni, Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, F-38000 Grenoble, France
- Hélène Lœvenbruck, Univ. Grenoble Alpes, Univ. Savoie Mont-Blanc, CNRS, LPNC, F-38000 Grenoble, France
- Pascale Calabrese, Univ. Grenoble Alpes, CNRS, Grenoble INP, TIMC-IMAG, F-38000 Grenoble, France
- Silvain Gerber, Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, F-38000 Grenoble, France

32
Evaluation of the Electroglottographic Signal Variability in Organic and Functional Dysphonia. J Voice 2020; 36:881.e5-881.e16. [DOI: 10.1016/j.jvoice.2020.09.005]
33
Lã FM, Ternström S. Flow ball-assisted voice training: Immediate effects on vocal fold contacting. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2020.102064]
34
Hosbach-Cannon CJ, Lowell SY, Colton RH, Kelley RT, Bao X. Assessment of Tongue Position and Laryngeal Height in Two Professional Voice Populations. J Speech Lang Hear Res 2020; 63:109-124. [PMID: 31944876] [DOI: 10.1044/2019_jslhr-19-00164]
Abstract
Purpose To advance our current knowledge of singer physiology by using ultrasonography in combination with acoustic measures to compare physiological differences between musical theater (MT) and opera (OP) singers under controlled phonation conditions. Primary objectives addressed in this study were (a) to determine if differences in hyolaryngeal and vocal fold contact dynamics occur between two professional voice populations (MT and OP) during singing tasks and (b) to determine if differences occur between MT and OP singers in oral configuration and associated acoustic resonance during singing tasks. Method Twenty-one singers (10 MT and 11 OP) were included. All participants were currently enrolled in a music program. Experimental procedures consisted of sustained phonation on the vowels /i/ and /ɑ/ during both a low-pitch task and a high-pitch task. Measures of hyolaryngeal elevation, tongue height, and tongue advancement were assessed using ultrasonography. Vocal fold contact dynamics were measured using electroglottography. Simultaneous acoustic recordings were obtained during all ultrasonography procedures for analysis of the first two formant frequencies. Results Significant oral configuration differences, reflected by measures of tongue height and tongue advancement, were seen between groups. Measures of acoustic resonance also showed significant differences between groups during specific tasks. Both singer groups significantly raised their hyoid position when singing high-pitched vowels, but hyoid elevation was not statistically different between groups. Likewise, vocal fold contact dynamics did not significantly differentiate the two singer groups. Conclusions These findings suggest that, under controlled phonation conditions, MT singers alter their oral configuration and achieve differing resultant formants as compared with OP singers. 
Because singers are at a high risk of developing a voice disorder, understanding how these two groups of singers adjust their vocal tract configuration during their specific singing genre may help to identify risky vocal behavior and provide a basis for prevention of voice disorders.
Affiliation(s)
- Soren Y Lowell, Department of Communication Sciences and Disorders, Syracuse University, NY
- Raymond H Colton, Department of Communication Sciences and Disorders, Syracuse University, NY
- Richard T Kelley, Department of Otolaryngology, Upstate Medical University, Syracuse, NY
- Xue Bao, Department of Speech-Language Pathology, MGH-IHP, Boston, MA

35
Turkmen HI, Karsligil ME. Advanced computing solutions for analysis of laryngeal disorders. Med Biol Eng Comput 2019; 57:2535-2552. [DOI: 10.1007/s11517-019-02031-9]
36
Ning LH. The effects of age and pitch level on electroglottographic measures during sustained phonation. J Acoust Soc Am 2019; 146:640. [PMID: 31370629] [DOI: 10.1121/1.5119127]
Abstract
The aim of the present study was to use electroglottography (EGG) to explore the effects of age and pitch level on sustained vowel phonation. Thirty female individuals (10 young, 10 middle-aged, and 10 older speakers) without voice disorders or training in singing participated in this study. Eight EGG parameters were measured during sustained vowel production with a high, mid, or low pitch: fundamental frequency, contact quotient, contacting-time quotient, decontacting-time quotient, speed quotient with a midslope criterion (SQ-mid), jitter, shimmer, and the harmonics-to-noise ratio. Age was found to be a significant factor in fundamental frequency, contact quotient, contacting-time quotient, decontacting-time quotient, and SQ-mid. With increasing age, the mean fundamental frequency decreased while the contact quotient increased. The middle-aged and older speakers had more asymmetrical vocal fold vibratory patterns than the young speakers. As for pitch level, high-pitch phonation had a significantly smaller decontacting-time quotient and a greater SQ-mid than low- and mid-pitch phonation. The lack of a significant interaction between age and pitch level indicates that the effects of age and pitch level could be additive. Finally, the discriminant analyses show that contact quotient is an important factor in predicting the age of a voice.
Affiliation(s)
- Li-Hsin Ning, Department of English, National Taiwan Normal University, 162 Heping East Road, Daan District, Taipei City 106, Taiwan

37
Ternström S. Normalized time-domain parameters for electroglottographic waveforms. J Acoust Soc Am 2019; 146:EL65. [PMID: 31370590] [DOI: 10.1121/1.5117174]
Abstract
The electroglottographic waveform is of interest for characterizing phonation non-invasively. Existing parameterizations tend to give disparate results because they rely on somewhat arbitrary thresholds and/or contacting events. It is shown that neither is needed for formulating a normalized contact quotient and a normalized peak derivative. A heuristic combination of the two also resolves the ambiguity of a moderate contact quotient with regard to whether vocal fold contacting is firm versus weak or absent. As preliminaries, schemes for EGG signal preconditioning and time-domain period detection are described that improve somewhat on similar methods. The algorithms are simple and compute quickly.
Affiliation(s)
- Sten Ternström, Department of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden