1
Naghibolhosseini M, Henry TM, Zayernouri M, Zacharias SRC, Deliyski DD. Supraglottic Laryngeal Maneuvers in Adductor Laryngeal Dystonia During Connected Speech. J Voice 2024:S0892-1997(24)00257-1. [PMID: 39217084 DOI: 10.1016/j.jvoice.2024.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 08/05/2024] [Accepted: 08/06/2024] [Indexed: 09/04/2024]
Abstract
OBJECTIVE Adductor laryngeal dystonia (AdLD) disrupts fine motor movements of the vocal folds during speech, resulting in a strained, broken, and strangled voice. Laryngeal high-speed videoendoscopy (HSV) in connected speech enables direct visualization of detailed laryngeal dynamics; hence, it can be used effectively to study AdLD. The current study uses HSV to investigate supraglottic laryngeal tissue maneuvers that obstruct the view of the vocal folds in AdLD and normophonic speakers during connected speech. Characterizing the laryngeal maneuvers in these groups can facilitate a deeper understanding of normophonic voice physiology and AdLD voice pathophysiology. METHODS HSV data were obtained from six normophonic speakers and six patients with AdLD during production of connected speech. Three experienced raters visually analyzed the data to determine which laryngeal tissues led to obstructions of the vocal folds in HSV images. The raters recorded the duration of each obstruction and indicated the specific tissue(s) leading to the obstruction. After completing their individual visual analyses, the raters reached consensus about their observations and measurements. RESULTS Statistical analysis indicated that AdLD patients exhibited more frequent vocal fold obstructions and longer obstruction durations than the normophonic group. Similar obstruction types were found in both groups, with the epiglottis being the primary site of obstruction for both. Participants with AdLD displayed significantly more frequent sphincteric compression resulting in vocal fold obstruction. CONCLUSION HSV can be used to study the movements of laryngeal tissues in detail during connected speech. The analysis of supraglottic laryngeal tissue dynamics in speech can help us characterize AdLD pathophysiology.
The study's findings regarding the tissues implicated in obstructions may potentially inform the development of patient-specific therapeutic strategies targeting individual control over specific laryngeal muscles during phonation and speech production.
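Counting obstructions and measuring their durations, as the raters did above, amounts to grouping consecutive obstructed frames into events. The Python sketch below is a hypothetical illustration, not the authors' procedure: it assumes a boolean obstruction label per HSV frame (from raters or an automatic frame classifier) and a known frame rate; the 4000 fps in the usage line is an assumed value.

```python
import numpy as np

def obstruction_events(obstructed, fps):
    """Group consecutive obstructed frames into events.

    obstructed: hypothetical per-frame labels (True = vocal folds obstructed).
    fps: HSV frame rate in frames per second.
    Returns a list of (first_frame, last_frame, duration_ms), one per event.
    """
    x = np.asarray(obstructed, dtype=bool).astype(int)
    # Pad with zeros so events touching either end still have a rising and a falling edge.
    edges = np.diff(np.concatenate(([0], x, [0])))
    starts = np.flatnonzero(edges == 1)   # first obstructed frame of each run
    ends = np.flatnonzero(edges == -1)    # one past the last obstructed frame
    ms_per_frame = 1000.0 / fps
    return [(int(s), int(e) - 1, float((e - s) * ms_per_frame))
            for s, e in zip(starts, ends)]

labels = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]   # hypothetical frame labels
events = obstruction_events(labels, fps=4000)
print(len(events), sum(d for _, _, d in events))
```

The event count and the total or mean duration per speaker are the kind of summaries that can then be compared between groups.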
Affiliation(s)
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Trent M Henry
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Mohsen Zayernouri
- Department of Mechanical Engineering and Department of Statistics and Probability, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Dimitar D Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
2
Yousef AM, Deliyski DD, Zacharias SRC, Naghibolhosseini M. Detection of Vocal Fold Image Obstructions in High-Speed Videoendoscopy During Connected Speech in Adductor Spasmodic Dysphonia: A Convolutional Neural Networks Approach. J Voice 2024; 38:951-962. [PMID: 35304042 PMCID: PMC9474736 DOI: 10.1016/j.jvoice.2022.01.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/07/2021] [Revised: 01/30/2022] [Accepted: 01/30/2022] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Adductor spasmodic dysphonia (AdSD) is a neurogenic voice disorder that affects control of the intrinsic laryngeal muscles. AdSD leads to involuntary laryngeal spasms and manifests only during connected speech. Laryngeal high-speed videoendoscopy (HSV) coupled with a flexible fiberoptic endoscope provides a unique opportunity to study voice production and visualize vocal fold vibrations in AdSD during speech. The goal of this study is to automatically detect instances during which the image of the vocal folds is optically obstructed in HSV recordings obtained during connected speech. METHODS HSV data were recorded from vocally normal adults and patients with AdSD during reading of the "Rainbow Passage" and six CAPE-V sentences and during production of the vowel /i/. A convolutional neural network was developed and trained as a classifier to detect obstructed/unobstructed vocal folds in HSV frames. Manually labeled data were used for training, validating, and testing the network. Moreover, a comprehensive robustness evaluation was conducted to compare the performance of the developed classifier with visual analysis of HSV data. RESULTS The developed convolutional neural network was able to automatically detect vocal fold obstructions in HSV data from vocally normal participants and AdSD patients. The trained network was tested successfully and showed an overall classification accuracy of 94.18% on the testing dataset. The robustness evaluation showed an average overall accuracy of 94.81% on a large number of HSV frames, demonstrating the high robustness of the introduced technique while maintaining a high level of accuracy. CONCLUSIONS The proposed approach can be used for efficient analysis of HSV data to study laryngeal maneuvers in patients with AdSD during connected speech. Additionally, this method will facilitate development of vocal fold vibratory measures for HSV frames with an unobstructed view of the vocal folds.
Identifying the parts of connected speech that provide an unobstructed view of the vocal folds can help in developing optimal passages for precise HSV examination during connected speech and subject-specific clinical voice assessment protocols.
Affiliation(s)
- Ahmed M Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
3
Stager S, Maryn Y. A Retrospective Study of Acoustic Measures of Glottal Stop Production to Assess Vocal Function in Unilateral Vocal Fold Paresis/Paralysis Patients. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2024; 67:1643-1659. [PMID: 38683058 DOI: 10.1044/2024_jslhr-23-00576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
PURPOSE The aim of this study was to determine (a) diagnostic accuracy of acoustic measures of glottal stop production (GSP; intensity differences, slopes, complete voicing cessation) to distinguish between unilateral vocal fold paresis/paralysis (UVFP) patients and controls; (b) if acoustic measures of GSP significantly correlated with an acoustic measure of voice disorder severity, acoustic voice quality index (AVQI); and (c) if acoustic measures from another type of voicing cessation, voiceless consonant production, also significantly differed between groups. METHOD Ninety-seven patients with unilateral paresis/paralysis and 35 controls with normal laryngostroboscopic signs produced two sets of five repeated [i] and four repeated [isi]. Tokens were randomized by type between groups and analyzed blinded using a customized Praat program that computed intensity differences and slopes between vowel maxima and glottal stop minima for inter-[i] tokens and vowel maxima and voiceless consonant minima for intra-[isi] tokens. The number of voicing cessations for inter-[i] tokens was obtained. RESULTS Onset and offset intensity differences and number of voicing cessations from inter-[i] tokens had the greatest areas under the curve (.854, .856, and .835, respectively). Correlation coefficients were significant (p < .01) between AVQI and all GSP acoustic measures with weak/medium effect sizes. No significant differences were found between controls and participants with UVFP for acoustic measures from intra-[isi]. CONCLUSIONS Acoustic GSP measures demonstrated good diagnostic accuracy and some relationship to severity of voice disorder. No significant differences in acoustic measures for medial voiceless fricative consonants between controls and participants with UVFP suggested that voicing cessation for voiceless fricatives differs from voicing cessation for GSP.
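The onset and offset intensity measures can be illustrated on a single intensity contour. The study computed them with a customized Praat program; the Python function below is only a hypothetical sketch of the arithmetic, assuming the contour is already in dB, that the preceding vowel, the glottal stop, and the following vowel are given as index slices, and that the contour is sampled every 10 ms.

```python
import numpy as np

def glottal_stop_measures(times, intensity_db, vowel1, stop, vowel2):
    """Intensity differences (dB) and slopes (dB/s) around one glottal stop.

    times: sample times in seconds; intensity_db: intensity contour in dB.
    vowel1, stop, vowel2: index slices with an explicit start, marking the
    preceding vowel, the glottal stop, and the following vowel (hypothetical).
    """
    t = np.asarray(times, dtype=float)
    x = np.asarray(intensity_db, dtype=float)
    # Locate the extremum inside each window, then convert to an absolute index.
    i_max1 = vowel1.start + int(np.argmax(x[vowel1]))
    i_min = stop.start + int(np.argmin(x[stop]))
    i_max2 = vowel2.start + int(np.argmax(x[vowel2]))
    offset_diff = x[i_max1] - x[i_min]  # fall from the vowel maximum into the stop
    onset_diff = x[i_max2] - x[i_min]   # rise from the stop into the next vowel
    return {
        "offset_diff_db": float(offset_diff),
        "offset_slope_db_per_s": float(-offset_diff / (t[i_min] - t[i_max1])),
        "onset_diff_db": float(onset_diff),
        "onset_slope_db_per_s": float(onset_diff / (t[i_max2] - t[i_min])),
    }

times = np.arange(9) * 0.01  # assumed 10 ms analysis step
intensity = [70, 75, 72, 60, 50, 58, 71, 76, 73]  # hypothetical contour in dB
print(glottal_stop_measures(times, intensity, slice(0, 3), slice(3, 6), slice(6, 9)))
```

The offset slope is negative because intensity falls into the stop; the onset slope is positive because it rises into the next vowel.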
Affiliation(s)
- Sheila Stager
- Division of Otolaryngology, Department of Surgery, GW Medical Faculty Associates Voice Treatment Center, Washington, DC
- Youri Maryn
- ENT Department, GZA Sint-Augustinus, European Institute for ORL-HNS, Antwerp, Belgium
- Department of Rehabilitation Sciences, Faculty of Medicine and Health Sciences, University of Ghent, Belgium
- School of Speech Therapy, Faculty of Psychology and Educational Sciences, Université Catholique Louvain, Louvain-La-Neuve, Belgium
- Department of Speech-Language Therapy and Audiology, University College Ghent, Belgium
- Phonanium, Lokeren, Belgium
4
Malinowski J, Pietruszewska W, Kowalczyk M, Niebudek-Bogusz E. Value of high-speed videoendoscopy as an auxiliary tool in differentiation of benign and malignant unilateral vocal lesions. J Cancer Res Clin Oncol 2024; 150:10. [PMID: 38216796 PMCID: PMC10786956 DOI: 10.1007/s00432-023-05543-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/13/2023] [Indexed: 01/14/2024]
Abstract
PURPOSE The study aimed to assess the relevance of objective vibratory parameters derived from high-speed videolaryngoscopy (HSV) as a supporting tool to assist clinicians in establishing the initial diagnosis of benign and malignant glottal organic lesions. METHODS HSV examinations were conducted in 175 subjects: 50 normophonic subjects, 85 subjects with benign vocal fold lesions, and 40 with early glottic cancer; organic lesions were confirmed by histopathologic examination. The parameters derived from HSV kymography (amplitude, symmetry, and glottal dynamic characteristics) were compared statistically between the groups, followed by ROC analysis. RESULTS Of the 14 calculated parameters, 10 differed significantly between the groups. Four of them, the average resultant amplitude of the involved vocal fold (AmpInvolvedAvg), the average amplitude asymmetry for the whole glottis and for its middle third (AmplAsymAvg; AmplAsymAvg_2/3), and the absolute average phase difference (AbsPhaseDiffAvg), differed significantly between benign and malignant lesions. Amplitude values decreased, whereas asymmetry and phase difference values increased, with the risk of malignancy. In ROC analysis, the highest AUC was observed for AmplAsymAvg (0.719; p < 0.0001), followed by AmpInvolvedAvg (0.70; p = 0.0002). CONCLUSION The gold standard in the diagnosis of organic lesions of the glottis remains clinical examination with videolaryngoscopy, confirmed by histopathological examination. Our results showed that measures of vibratory amplitude, asymmetry, and phase in malignant vocal fold masses are significantly worse than in benign vocal lesions. High-speed videolaryngoscopy could aid their preliminary, noninvasive differentiation before histopathological examination; however, further research on larger groups is needed.
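An AUC such as the 0.719 reported for amplitude asymmetry can be read as the probability that a randomly chosen malignant case has a higher parameter value than a randomly chosen benign case. A minimal sketch of this nonparametric (Mann-Whitney) estimate follows; the two input arrays are hypothetical parameter values, and this is not the statistical software used in the study.

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """Empirical ROC AUC via pairwise comparison (Mann-Whitney statistic).

    scores_pos: parameter values for the positive class (e.g., malignant).
    scores_neg: parameter values for the negative class (e.g., benign).
    Ties between a positive and a negative value count as one half.
    """
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # Broadcasting compares every positive value with every negative value.
    wins = np.sum(pos > neg) + 0.5 * np.sum(pos == neg)
    return float(wins / (pos.size * neg.size))

print(roc_auc([0.4, 0.5, 0.7], [0.2, 0.5]))
```

For a parameter that decreases with malignancy, such as amplitude, the values would be negated (or the classes swapped) before computing the AUC. The pairwise comparison needs memory proportional to n_pos * n_neg, which is unproblematic for cohorts of this size.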
Affiliation(s)
- Jakub Malinowski
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
- Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
- Magdalena Kowalczyk
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
- Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
5
Yousef AM, Deliyski DD, Zayernouri M, Zacharias SRC, Naghibolhosseini M. Deep Learning-Based Analysis of Glottal Attack and Offset Times in Adductor Laryngeal Dystonia. J Voice 2023:S0892-1997(23)00319-3. [PMID: 37977969 PMCID: PMC11093885 DOI: 10.1016/j.jvoice.2023.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 11/19/2023]
Abstract
OBJECTIVE Diagnosis of adductor laryngeal dystonia (AdLD) is challenging because its voice features mimic those of other voice disorders. This can lead to misdiagnosis (or delayed diagnosis) and ineffective treatment of AdLD. This paper develops automated measurements of glottal attack time (GAT) and glottal offset time (GOT) from high-speed videoendoscopy (HSV) in connected speech as objective measures that could facilitate the diagnosis of this disorder in the future. METHODS HSV data were recorded from vocally normal adults and patients with AdLD during the reading of the "Rainbow Passage" and six CAPE-V (Consensus Auditory-Perceptual Evaluation of Voice) sentences. A deep learning framework was designed and trained to segment the glottal area and detect the vocal fold edges in the HSV dataset. This framework allowed us to automatically measure and quantify the GATs and GOTs for the participants, and the measurements were then compared between vocally normal speakers and those with AdLD. RESULTS The automated framework was successfully developed and accurately segmented the glottal area/edges. The automated GAT and GOT measurements showed only minor, nonsignificant differences from the results of manual analysis, with a strong correlation between the automated and manual measures. The results showed significant differences in GAT values between the vocally normal subjects and AdLD patients, with larger variability in both the GAT and GOT measures in the AdLD group. CONCLUSIONS The developed automated approach for GAT and GOT measurement can be valuable in clinical practice. These quantitative measurements can serve as meaningful biomarkers of impaired vocal function in AdLD and help its differential diagnosis in the future.
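Once the glottal area is segmented in every frame, GAT and GOT reduce to frame counting. The sketch below is hypothetical and not the authors' framework: it assumes GAT is the time from the onset of vocal fold oscillation to the first vocal fold contact and GOT the time from the last contact to the offset of oscillation, that the onset and offset frames are supplied externally, and that "contact" means a glottal area at or below a threshold.

```python
import numpy as np

def gat_got_ms(gaw, fps, osc_onset, osc_offset, contact_thresh=0.0):
    """Hypothetical GAT/GOT computation from a glottal area waveform (GAW).

    Assumed definitions: GAT = oscillation onset -> first vocal fold contact,
    GOT = last contact -> oscillation offset. A frame is a contact frame when
    its glottal area is <= contact_thresh. osc_onset/osc_offset are frame indices.
    """
    a = np.asarray(gaw, dtype=float)
    contacts = np.flatnonzero(a[osc_onset:osc_offset + 1] <= contact_thresh)
    if contacts.size == 0:
        raise ValueError("no vocal fold contact between onset and offset")
    first_contact = osc_onset + contacts[0]
    last_contact = osc_onset + contacts[-1]
    ms_per_frame = 1000.0 / fps
    gat_ms = float((first_contact - osc_onset) * ms_per_frame)
    got_ms = float((osc_offset - last_contact) * ms_per_frame)
    return gat_ms, got_ms

# Hypothetical GAW around one phonation (arbitrary units, 4000 fps assumed).
gaw = [5, 5, 4, 3, 2, 1, 0, 3, 0, 2, 3, 4, 5]
print(gat_got_ms(gaw, fps=4000, osc_onset=1, osc_offset=11))
```

Collecting these values over every onset and offset in a passage gives the per-speaker GAT and GOT distributions whose means and variability can then be compared between groups.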
Affiliation(s)
- Ahmed M Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Mohsen Zayernouri
- Departments of Mechanical Engineering & Statistics and Probability, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
6
Tsilivigkos C, Athanasopoulos M, Micco RD, Giotakis A, Mastronikolis NS, Mulita F, Verras GI, Maroulis I, Giotakis E. Deep Learning Techniques and Imaging in Otorhinolaryngology-A State-of-the-Art Review. J Clin Med 2023; 12:6973. [PMID: 38002588 PMCID: PMC10672270 DOI: 10.3390/jcm12226973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 11/02/2023] [Accepted: 11/06/2023] [Indexed: 11/26/2023]
Abstract
Over the last decades, the field of medicine has witnessed significant progress in artificial intelligence (AI), the Internet of Medical Things (IoMT), and deep learning (DL) systems. Otorhinolaryngology, and imaging in its various subspecialties, has not remained untouched by this transformative trend. As the medical landscape evolves, the integration of these technologies becomes imperative in augmenting patient care, fostering innovation, and actively participating in the ever-evolving synergy between computer vision techniques in otorhinolaryngology and AI. To that end, we conducted a thorough search on MEDLINE for papers published until June 2023, utilizing the keywords 'otorhinolaryngology', 'imaging', 'computer vision', 'artificial intelligence', and 'deep learning', and at the same time conducted manual searching in the references section of the articles included in our manuscript. Our search culminated in the retrieval of 121 related articles, which were subsequently subdivided into the following categories: imaging in head and neck, otology, and rhinology. Our objective is to provide a comprehensive introduction to this burgeoning field, tailored for both experienced specialists and aspiring residents in the domain of deep learning algorithms in imaging techniques in otorhinolaryngology.
Affiliation(s)
- Christos Tsilivigkos
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Michail Athanasopoulos
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Riccardo di Micco
- Department of Otolaryngology and Head and Neck Surgery, Medical School of Hannover, 30625 Hannover, Germany
- Aris Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Nicholas S. Mastronikolis
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Francesk Mulita
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Georgios-Ioannis Verras
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Ioannis Maroulis
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Evangelos Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
7
Naghibolhosseini M, Zacharias SRC, Zenas S, Levesque F, Deliyski DD. Laryngeal Imaging Study of Glottal Attack/Offset Time in Adductor Spasmodic Dysphonia during Connected Speech. APPLIED SCIENCES (BASEL, SWITZERLAND) 2023; 13:2979. [PMID: 37034315 PMCID: PMC10077958 DOI: 10.3390/app13052979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/12/2023]
Abstract
Adductor spasmodic dysphonia (AdSD) disrupts laryngeal muscle control during speech and, therefore, affects the onset and offset of phonation. The goal of this study was to use laryngeal high-speed videoendoscopy (HSV) to measure the glottal attack time (GAT) and glottal offset time (GOT) during connected speech for normophonic (vocally normal) and AdSD voices. A monochrome HSV system was used to record the participants' readings of six CAPE-V sentences and part of the "Rainbow Passage". Three raters visually analyzed the HSV data using playback software to measure the GAT and GOT. The results show that the GAT was greater in the AdSD group than in the normophonic group; however, the clinical significance of the size of this difference needs further study. More variability was observed in both the GATs and GOTs of the disorder group. Additionally, the GAT and GOT time series were found to be nonstationary for the AdSD group but stationary for the normophonic voices. This study shows that the GAT and GOT measures can potentially be used as objective markers to characterize AdSD. The findings may help in the development of standardized measures for voice evaluation and the accurate diagnosis of AdSD.
Affiliation(s)
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
- Stephanie R. C. Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, AZ 85259, USA
- Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, AZ 85054, USA
- Sarah Zenas
- Lyman Briggs College, Michigan State University, East Lansing, MI 48825, USA
- Farrah Levesque
- Warrington College of Business, University of Florida, Gainesville, FL 32611, USA
- Dimitar D. Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
8
Sakthivel S, Prabhu V. Optimal Deep Learning-Based Vocal Fold Disorder Detection and Classification Model on High-Speed Video Endoscopy. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:4248938. [PMID: 36353680 PMCID: PMC9640237 DOI: 10.1155/2022/4248938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2022] [Revised: 09/04/2022] [Accepted: 09/21/2022] [Indexed: 08/08/2023]
Abstract
The use of high-speed videoendoscopy (HSV) to study the phonatory processes underlying speech requires precise identification of vocal fold boundaries during vibration. HSV is a unique laryngeal imaging technology that captures intracycle vocal fold vibrations at a high frame rate without relying on acoustic input. HSV is also effective in identifying the vibratory characteristics of the vocal folds with increased temporal resolution during sustained phonation and running speech. Clinically significant vocal fold vibratory characteristics in running speech can be retrieved by developing automated algorithms for extracting HSV-based vocal fold vibration data. This paper presents an optimal deep learning-based vocal fold disorder detection and classification model for HSV (ODL-VFDDC). The proposed ODL-VFDDC technique starts with temporal segmentation and motion correction to identify vocalized regions in the HSV recording and gathers the positions of the moving vocal folds across frames. The extracted attributes are fed into a deep belief network (DBN) model. Furthermore, the farmland fertility algorithm (FFA) is used to optimize the hyperparameters of the DBN model, which improves classification results. In terms of vocal fold disorder classification, the test results demonstrated that the ODL-VFDDC technique outperforms other existing methods. The FFA is then used to accurately determine the glottal boundaries of the vibrating vocal folds. The proposed method successfully tracked the vocal fold boundaries across frames with minimal processing cost and high robustness to image noise. This method provides a fully automated way to examine vocal fold movement during connected speech.
Affiliation(s)
- S. Sakthivel
- Department of Computer Science and Engineering, Vel Tech High Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Avadi, Chennai, India
- V. Prabhu
- Department of Electronics and Communication Engineering, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, India
9
Yousef AM, Deliyski DD, Zacharias SRC, Naghibolhosseini M. Deep-Learning-Based Representation of Vocal Fold Dynamics in Adductor Spasmodic Dysphonia during Connected Speech in High-Speed Videoendoscopy. J Voice 2022:S0892-1997(22)00263-6. [PMID: 36154973 PMCID: PMC10030376 DOI: 10.1016/j.jvoice.2022.08.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/08/2022] [Revised: 08/14/2022] [Accepted: 08/17/2022] [Indexed: 11/28/2022]
Abstract
OBJECTIVE Adductor spasmodic dysphonia (AdSD) is a neurogenic dystonia that causes spasms of the laryngeal muscles. This disorder mainly affects the production of connected speech. To understand how AdSD affects vocal fold (VF) movements and, hence, the speech signal, it is necessary to study VF kinematics during running speech. This paper introduces an automated method for analyzing VF vibrations in AdSD using laryngeal high-speed videoendoscopy (HSV) in running speech. METHODS A monochrome HSV system was used to obtain video recordings from vocally normal individuals and AdSD patients during production of the six CAPE-V sentences and the "Rainbow Passage." A deep neural network was designed based on the UNet architecture. The network was developed for glottal area segmentation in HSV data, providing a tool for quantitative analysis of VF vibrations in both normal and AdSD voices. The network was trained and validated using manually labeled HSV frames. After training, the segmentation quality was quantitatively evaluated against visual analysis results on a test dataset that included individual HSV frames and a short sequence of VF vibrations in consecutive frames. RESULTS The developed convolutional network was successfully trained and demonstrated accurate segmentation on the testing dataset, with a mean Intersection over Union (IoU) of 0.81 and a mean Boundary-F1 score of 0.93. Moreover, visual assessment of the automated technique showed accurate detection of the glottal edges/area in the HSV data even with challenging image quality and the excessive laryngeal maneuvers of AdSD patients during running speech. CONCLUSION The introduced automated approach provides an accurate representation of the glottal edges/area during connected speech in HSV data for vocally normal participants and AdSD patients. This method facilitates the development of HSV-based measures to quantify VF dynamics in AdSD.
Using HSV to automatically analyze VF vibrations in AdSD can improve our understanding of AdSD vocal mechanisms and characteristics.
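The mean IoU compares each automatically segmented glottis mask with its manually labeled counterpart. A minimal numpy sketch, assuming same-sized binary masks; scoring a frame where both masks are empty (a closed glottis) as a perfect match is a convention chosen here for illustration, not necessarily the one used in the study.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over Union of two binary glottis masks."""
    p = np.asarray(pred, dtype=bool)
    t = np.asarray(truth, dtype=bool)
    union = np.logical_or(p, t).sum()
    if union == 0:
        # Both masks empty (e.g. closed glottis): treated as a perfect match (assumption).
        return 1.0
    return float(np.logical_and(p, t).sum() / union)

def mean_iou(preds, truths):
    """Average IoU over paired predicted and ground-truth frames."""
    return float(np.mean([iou(p, t) for p, t in zip(preds, truths)]))
```

Because closed-glottis frames are frequent during phonation, the choice of empty-mask convention can noticeably shift the mean, so it is worth stating explicitly when reporting the metric.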
Affiliation(s)
- Ahmed M Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
10
Yousef AM, Deliyski DD, Zacharias SRC, de Alarcon A, Orlikoff RF, Naghibolhosseini M. A Deep Learning Approach for Quantifying Vocal Fold Dynamics During Connected Speech Using Laryngeal High-Speed Videoendoscopy. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:2098-2113. [PMID: 35605603 PMCID: PMC9567340 DOI: 10.1044/2022_jslhr-21-00540] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/07/2021] [Revised: 01/30/2022] [Accepted: 02/28/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE Voice disorders are best assessed by examining vocal fold dynamics in connected speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV), which enables us to study vocal fold mechanics in high temporal detail. Analysis of vocal fold vibration using HSV requires accurate segmentation of the vocal fold edges. This article presents an automated deep-learning scheme that segments the glottal area in HSV during connected speech, from which the glottal edges are derived. METHOD Using a custom-built HSV system, data were obtained from a vocally healthy participant reciting the "Rainbow Passage." A deep neural network was designed for glottal area segmentation in the HSV data. A hybrid approach recently introduced by the authors was used as an automated labeling tool to train the network on a set of HSV frames in which the glottis region was automatically annotated during vocal fold vibrations. The network was then tested against manually segmented frames using two metrics, intersection over union (IoU) and Boundary F1 (BF) score, and its performance was assessed on various phonatory events in the HSV sequence. RESULTS The designed network was successfully trained using the hybrid approach, without the need for manual labeling, and tested on the manually labeled data. The performance metrics showed a mean IoU of 0.82 and a mean BF score of 0.96. In addition, assessment of the network's performance demonstrated accurate segmentation of the glottal edges/area even during complex nonstationary phonatory events and when the vocal folds were not vibrating, thus overcoming a limitation of the previous hybrid approach, which could be applied only to vibrating vocal folds. CONCLUSIONS The introduced automated scheme guarantees accurate glottis representation in challenging color HSV data with lower image quality and excessive laryngeal maneuvers during all instances of connected speech.
This facilitates the future development of HSV-based measures to assess the running vibratory characteristics of the vocal folds in speakers with and without voice disorder. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.19798864.
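Whereas IoU compares whole areas, the Boundary F1 score compares only contours: a predicted boundary pixel counts as correct if a ground-truth boundary pixel lies within a distance tolerance, and vice versa. The sketch below is a simplified version under stated assumptions (4-neighbour boundaries, a hypothetical default tolerance of 2 pixels, brute-force distances suited only to small masks); the tolerance used in the study is not given here.

```python
import numpy as np

def boundary(mask):
    """Boundary pixels: foreground pixels with at least one 4-neighbour in background."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, constant_values=False)
    interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior

def boundary_f1(pred, truth, tol=2.0):
    """Boundary F1 score with a pixel-distance tolerance (hypothetical default)."""
    pb = np.argwhere(boundary(pred)).astype(float)
    tb = np.argwhere(boundary(truth)).astype(float)
    if len(pb) == 0 and len(tb) == 0:
        return 1.0
    if len(pb) == 0 or len(tb) == 0:
        return 0.0
    # Pairwise Euclidean distances between predicted and ground-truth boundary pixels.
    d = np.sqrt(((pb[:, None, :] - tb[None, :, :]) ** 2).sum(axis=-1))
    precision = np.mean(d.min(axis=1) <= tol)  # predicted pixels near the true contour
    recall = np.mean(d.min(axis=0) <= tol)     # true pixels near the predicted contour
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

For full-resolution HSV frames, a distance transform of the ground-truth boundary would replace the pairwise distance matrix, which grows quadratically with contour length.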
Affiliation(s)
- Ahmed M. Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing
- Dimitar D. Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing
- Stephanie R. C. Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, AZ
- Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, AZ
- Alessandro de Alarcon
- Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, OH
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, OH
- Robert F. Orlikoff
- College of Allied Health Sciences, East Carolina University, Greenville, NC
11
Kopczynski B, Niebudek-Bogusz E, Pietruszewska W, Strumillo P. Segmentation of Glottal Images from High-Speed Videoendoscopy Optimized by Synchronous Acoustic Recordings. SENSORS (BASEL, SWITZERLAND) 2022; 22:1751. [PMID: 35270897 PMCID: PMC8915112 DOI: 10.3390/s22051751] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/30/2021] [Revised: 02/12/2022] [Accepted: 02/15/2022] [Indexed: 05/17/2023]
Abstract
Laryngeal high-speed videoendoscopy (LHSV) is an imaging technique offering novel visualization quality of the vibratory activity of the vocal folds. However, most image analysis methods require interaction with medical personnel and access to ground truth annotations to achieve accurate detection of vocal fold edges. In our fully automatic method, we combine video and acoustic data that are synchronously recorded during laryngeal endoscopy. We show that the image segmentation algorithm of the glottal area can be optimized by matching the Fourier spectra of the pre-processed video to the spectra of the acoustic recording during phonation of the sustained vowel /i:/. We verify our method on a set of LHSV recordings taken from subjects with normophonic voice and patients with voice disorders due to glottal insufficiency. We show that the computed geometric indices of the glottal area make it possible to discriminate between normal and pathologic voices. The median Open Quotient and Minimal Relative Glottal Area values were 0.69 and 0.06, respectively, for healthy subjects and 1 and 0.35, respectively, for dysphonic subjects. We also validate these results with independent phoniatrician experts.
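Both geometric indices can be sketched from a glottal area waveform (GAW) covering whole vibratory cycles. The definitions below are assumptions (open quotient as the fraction of frames with a glottal area above a threshold, minimal relative glottal area as the minimum area divided by the maximum); the paper's exact formulas may differ. They do reproduce the reported pattern: a glottis that never closes yields an open quotient of 1 and a minimal relative area well above 0.

```python
import numpy as np

def open_quotient(gaw, open_thresh=0.0):
    """Fraction of frames in which the glottis is open (area above open_thresh)."""
    a = np.asarray(gaw, dtype=float)
    return float(np.mean(a > open_thresh))

def minimal_relative_glottal_area(gaw):
    """Minimum glottal area relative to the maximum over the analysed frames."""
    a = np.asarray(gaw, dtype=float)
    return float(a.min() / a.max())  # assumes at least one open frame (max > 0)

complete_closure = [0, 0, 1, 3, 4, 3, 1, 0, 0, 0]   # hypothetical cycle with full closure
insufficient = [2, 3, 4, 5, 4, 3, 2, 2]             # hypothetical cycle with a glottal gap
print(open_quotient(complete_closure), minimal_relative_glottal_area(complete_closure))
print(open_quotient(insufficient), minimal_relative_glottal_area(insufficient))
```

In practice each index would be computed per cycle and then summarized (for example, by the median) across the recording.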
Affiliation(s)
- Bartosz Kopczynski
- Institute of Electronics, Lodz University of Technology, 90-924 Lodz, Poland
- Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland
- Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland
- Pawel Strumillo
- Institute of Electronics, Lodz University of Technology, 90-924 Lodz, Poland
12
Kist AM, Gómez P, Dubrovskiy D, Schlegel P, Kunduk M, Echternach M, Patel R, Semmler M, Bohr C, Dürr S, Schützenberger A, Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:1889-1903. [PMID: 34000199 DOI: 10.1044/2021_jslhr-20-00498] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 06/12/2023]
Abstract
Purpose High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because dedicated software to analyze the data is lacking. HSV allows vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results Our software, Glottis Analysis Tools (GAT), is freely available. GAT provides a general threshold-based region-growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among them the glottal area waveform, that is, the glottal area as it changes over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool for processing HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533.
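Threshold-based region growing and the glottal area waveform mentioned in this abstract can be illustrated with a small Python sketch. This is not GAT's code (GAT is written in C#). Assumptions: grayscale frames stored as nested lists, a single fixed seed pixel `(row, col)` inside the glottis for every frame, 4-connectivity, and one fixed intensity threshold. The function names are hypothetical. Counting the pixels of each frame's grown region yields a glottal area waveform, that is, the glottal area as it changes over time.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Grow a 4-connected region from `seed` over pixels with intensity <= threshold."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    if image[r0][c0] > threshold:
        return set()  # seed is not dark enough: treat the glottis as closed
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and (nr, nc) not in region
                and image[nr][nc] <= threshold
            ):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def glottal_area_waveform(frames, seed, threshold):
    """Glottal area (pixel count of the grown region) for each frame."""
    return [len(region_grow(frame, seed, threshold)) for frame in frames]
```

Because the region only grows through connected dark pixels, an isolated dark spot elsewhere in the frame (for example a shadow) is not counted as glottis. The choice of threshold directly changes the resulting area, which is why a tool like GAT lets the user adjust it.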
Affiliation(s)
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Denis Dubrovskiy
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Germany
- Rita Patel
- Department of Speech, Language and Hearing Sciences, College of Arts and Sciences, Indiana University, Bloomington
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Christopher Bohr
- Klinik und Poliklinik für Hals-Nasen-Ohren-Heilkunde, Universitätsklinikum Regensburg, Germany
- Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
13
Yousef AM, Deliyski DD, Zacharias SRC, de Alarcon A, Orlikoff RF, Naghibolhosseini M. A Hybrid Machine-Learning-Based Method for Analytic Representation of the Vocal Fold Edges during Connected Speech. APPLIED SCIENCES-BASEL 2021; 11. [PMID: 33717604 PMCID: PMC7954580 DOI: 10.3390/app11031179] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/17/2022]
Abstract
Investigating the phonatory processes in connected speech from high-speed videoendoscopy (HSV) demands accurate detection of the vocal fold edges during vibration. The present paper proposes a new spatiotemporal technique to automatically segment vocal fold edges in HSV data during running speech. The HSV data were recorded from a vocally normal adult during a reading of the “Rainbow Passage.” The introduced technique combines an unsupervised machine-learning (ML) approach with an active contour modeling (ACM) technique, together referred to as a hybrid approach. The hybrid method was implemented to capture the edges of the vocal folds on different HSV kymograms, extracted at various cross-sections of the vocal folds during vibration. The k-means clustering method, an ML approach, was first applied to the kymograms to identify the glottal area, which provided an initial contour for the ACM. The ACM algorithm was then used to precisely detect the glottal edges of the vibrating vocal folds. The developed algorithm was able to accurately track the vocal fold edges across frames with low computational cost and high robustness against image noise. This algorithm offers a fully automated tool for analyzing the vibratory features of vocal folds in connected speech.
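The first stage of the hybrid method, k-means clustering of a kymogram to obtain an initial glottal contour, can be sketched in Python as follows. This is an illustrative simplification rather than the authors' algorithm. Assumptions: raw pixel intensities are clustered into two groups (dark glottis versus bright tissue), each kymogram row holds the intensities along one cross-section at one time instant, and the leftmost and rightmost dark pixels of each row serve as initial edge estimates. The active contour refinement described in the abstract is not implemented, and the function names are hypothetical.

```python
def kmeans_two_clusters(values, max_iter=50):
    """Lloyd's k-means with k=2 on scalar intensities; returns (dark, bright) centroids."""
    lo, hi = float(min(values)), float(max(values))  # initialize at the extremes
    for _ in range(max_iter):
        dark = [v for v in values if abs(v - lo) <= abs(v - hi)]
        bright = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(dark) / len(dark) if dark else lo
        new_hi = sum(bright) / len(bright) if bright else hi
        if new_lo == lo and new_hi == hi:
            break  # assignments are stable, so the centroids no longer move
        lo, hi = new_lo, new_hi
    return lo, hi


def initial_edges(kymogram):
    """Per row, (left, right) column indices of the dark cluster, or None if closed."""
    lo, hi = kmeans_two_clusters([v for row in kymogram for v in row])
    edges = []
    for row in kymogram:
        dark_cols = [j for j, v in enumerate(row) if abs(v - lo) <= abs(v - hi)]
        edges.append((dark_cols[0], dark_cols[-1]) if dark_cols else None)
    return edges
```

In a full hybrid pipeline, these per-row edge estimates would only initialize the contour; an active contour model would then refine them against the image gradients, which is where the precision reported in the abstract comes from.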
Affiliation(s)
- Ahmed M. Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
- Dimitar D. Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
- Stephanie R. C. Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, AZ 85259, and Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, AZ 85054, USA
- Alessandro de Alarcon
- Division of Pediatric Otolaryngology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, and Department of Otolaryngology—Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Robert F. Orlikoff
- College of Allied Health Sciences, East Carolina University, Greenville, NC 27834, USA
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
- Correspondence: Tel.: +1-517-884-2256