1
Naghibolhosseini M, Henry TM, Zayernouri M, Zacharias SRC, Deliyski DD. Supraglottic Laryngeal Maneuvers in Adductor Laryngeal Dystonia During Connected Speech. J Voice 2024:S0892-1997(24)00257-1. [PMID: 39217084 DOI: 10.1016/j.jvoice.2024.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 08/05/2024] [Accepted: 08/06/2024] [Indexed: 09/04/2024]
Abstract
OBJECTIVE Adductor laryngeal dystonia (AdLD) disrupts fine motor movements of the vocal folds during speech, resulting in a strained, broken, and strangled voice. Laryngeal high-speed videoendoscopy (HSV) in connected speech enables direct visualization of detailed laryngeal dynamics; hence, it can be used effectively to study AdLD. The current study uses HSV to investigate supraglottic laryngeal tissue maneuvers that obstruct the view of the vocal folds in AdLD and normophonic speakers during connected speech. Characterizing the laryngeal maneuvers in these groups can facilitate a deeper understanding of normophonic voice physiology and AdLD voice pathophysiology. METHODS HSV data were obtained from six normophonic speakers and six patients with AdLD during production of connected speech. Three experienced raters visually analyzed the data to determine which laryngeal tissues obstructed the vocal folds in HSV images. The raters recorded the duration of each obstruction and indicated the specific tissue(s) leading to the obstruction. After completing their individual visual analyses, the raters reached consensus about their observations and measurements. RESULTS Statistical analysis indicated that AdLD patients exhibited more frequent vocal fold obstructions and longer obstruction durations than the normophonic group. Similar obstruction types were found in both groups, with the epiglottis being the primary site of obstruction for both. Participants with AdLD displayed significantly elevated occurrences of sphincteric compression resulting in vocal fold obstruction. CONCLUSION HSV can be used to study the movements of laryngeal tissues in detail during connected speech. The analysis of supraglottic laryngeal tissue dynamics in speech can help characterize AdLD pathophysiology. The study's findings regarding the tissues implicated in obstructions may inform the development of patient-specific therapeutic strategies targeting individual control over specific laryngeal muscles during phonation and speech production.
Affiliation(s)
- Maryam Naghibolhosseini
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Trent M Henry
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Mohsen Zayernouri
  - Department of Mechanical Engineering, and Statistics and Probability, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
  - Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Dimitar D Deliyski
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
2
Yousef AM, Deliyski DD, Zacharias SRC, Naghibolhosseini M. Detection of Vocal Fold Image Obstructions in High-Speed Videoendoscopy During Connected Speech in Adductor Spasmodic Dysphonia: A Convolutional Neural Networks Approach. J Voice 2024; 38:951-962. [PMID: 35304042 PMCID: PMC9474736 DOI: 10.1016/j.jvoice.2022.01.028] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 12/07/2021] [Revised: 01/30/2022] [Accepted: 01/30/2022] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Adductor spasmodic dysphonia (AdSD) is a neurogenic voice disorder that affects intrinsic laryngeal muscle control. AdSD leads to involuntary laryngeal spasms and manifests only during connected speech. Laryngeal high-speed videoendoscopy (HSV) coupled with a flexible fiberoptic endoscope provides a unique opportunity to study voice production and visualize the vocal fold vibrations in AdSD during speech. The goal of this study is to automatically detect instances during which the image of the vocal folds is optically obstructed in HSV recordings obtained during connected speech. METHODS HSV data were recorded from vocally normal adults and patients with AdSD during reading of the "Rainbow Passage", six CAPE-V sentences, and production of the vowel /i/. A convolutional neural network was developed and trained as a classifier to detect obstructed/unobstructed vocal folds in HSV frames. Manually labelled data were used for training, validating, and testing the network. Moreover, a comprehensive robustness evaluation was conducted to compare the performance of the developed classifier with visual analysis of HSV data. RESULTS The developed convolutional neural network was able to automatically detect the vocal fold obstructions in HSV data in vocally normal participants and AdSD patients. The trained network was tested successfully and showed an overall classification accuracy of 94.18% on the testing dataset. The robustness evaluation showed an average overall accuracy of 94.81% on a massive number of HSV frames, demonstrating the high robustness of the introduced technique while keeping a high level of accuracy. CONCLUSIONS The proposed approach can be used for efficient analysis of HSV data to study laryngeal maneuvers in patients with AdSD during connected speech. Additionally, this method will facilitate development of vocal fold vibratory measures for HSV frames with an unobstructed view of the vocal folds. Indicating parts of connected speech that provide an unobstructed view of the vocal folds can be used for developing optimal passages for precise HSV examination during connected speech and subject-specific clinical voice assessment protocols.
Affiliation(s)
- Ahmed M Yousef
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
  - Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
3
Mohd Khairuddin KA, Ahmad K, Proehoeman SC, Mohd Ibrahim H, Yan Y. Preliminary Findings of Vocal Fold Vibratory Characteristics of Singers Analyzed by Laryngeal High-Speed Videoendoscopy. J Voice 2024:S0892-1997(24)00173-5. [PMID: 38902142 DOI: 10.1016/j.jvoice.2024.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 05/31/2024] [Accepted: 06/01/2024] [Indexed: 06/22/2024]
Abstract
OBJECTIVES This study investigates the vocal fold vibratory dynamics of singers, which are postulated to differ from those of normal speakers due to the singers' regular vocal training. Vocal fold vibration was measured using laryngeal high-speed videoendoscopy (LHSV) and subsequent LHSV-based analysis. The focus of the present study is to characterize and compare the LHSV-based measures derived from the glottal area waveform (GAW), namely fundamental frequency (F0GAW), glottal perturbation (jitterGAW and shimmerGAW), open quotient (OQGAW), and Nyquist plots, between singers and normal speakers across genders. METHODS Participants comprised 13 singers from a local cultural and heritage academy and 56 normal speakers from a local university, all of whom were evaluated to have normal voices. Each participant underwent LHSV procedures to capture images of vocal fold vibration, which were subsequently analyzed to generate the LHSV-based measures. RESULTS Male singers exhibited lower F0GAW, jitterGAW, shimmerGAW, and OQGAW than female singers. When compared to normal speakers, male singers demonstrated higher F0GAW, and lower jitterGAW and shimmerGAW. No difference in OQGAW was found between male singers and normal speakers. Female singers exhibited lower jitterGAW compared to normal speakers, but no differences were observed in shimmerGAW and OQGAW. The results of Nyquist plots indicated no gender-related associations between types of rim width and among singers. However, for rim pattern, male singers were associated with a higher percentage of clustered rim, suggesting more regular vocal fold vibration, compared to female singers and normal male speakers. CONCLUSIONS Singers, particularly male singers, demonstrate distinct and potentially superior vocal fold vibrations compared to normal speakers, likely attributed to their regular vocal training, resulting in refined vocal fold configurations even during speaking. Despite the limited sample of singers, the study offers valuable insights into the vocal fold vibratory behaviors in singers analyzed using LHSV.
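The abstract does not state how jitterGAW and shimmerGAW were computed. As a rough illustration only, the sketch below uses the common "local" perturbation definition (mean absolute difference between consecutive cycles, divided by the mean, as a percentage), which is an assumption here and not necessarily the study's formula. The function name `local_perturbation` and the sample values are hypothetical.

```python
def local_perturbation(values):
    """Local perturbation in percent for a sequence of per-cycle values.

    Pass cycle periods to get a jitter-like measure, or per-cycle peak
    glottal areas to get a shimmer-like measure. This is the common
    'local' definition, assumed here for illustration.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    # Mean absolute difference between consecutive cycles
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical cycle periods in milliseconds (not data from the study)
periods_ms = [5.0, 5.1, 4.9, 5.0]
jitter_percent = local_perturbation(periods_ms)
print(f"jitter-like measure: {jitter_percent:.2f}%")
```

Lower values correspond to more regular cycle-to-cycle vibration, which is the direction the abstract reports for male singers.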
Affiliation(s)
- Khairy Anuar Mohd Khairuddin
  - Speech Pathology Program, School of Health Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia; Speech Sciences Program, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Kartini Ahmad
  - Speech Sciences Program, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Hasherah Mohd Ibrahim
  - Speech Sciences Program, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Yuling Yan
  - Department of Bioengineering, School of Engineering, Santa Clara University, Santa Clara, California
4
Schlegel P, Rhyn Chung H, Döllinger M, Chhetri DK. Reconstruction of Vocal Fold Medial Surface 3D Trajectories: Effects of Neuromuscular Stimulation and Airflow. Laryngoscope 2024; 134:1249-1257. [PMID: 37672673 PMCID: PMC10915101 DOI: 10.1002/lary.31029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 08/12/2023] [Accepted: 08/22/2023] [Indexed: 09/08/2023]
Abstract
INTRODUCTION Analysis of medial surface dynamics of the vocal folds (VF) is critical to understanding voice production and treatment of voice disorders. We analyzed VF medial surface vibratory dynamics, evaluating the effects of airflow and nerve stimulation using 3D reconstruction and empirical eigenfunctions (EEF). STUDY DESIGN In vivo canine hemilarynx phonation. METHODS An in vivo canine hemilarynx was phonated while graded stimulation of the recurrent and superior laryngeal nerves (RLN and SLN) was performed. For each phonatory condition, vibratory cycles were 3D reconstructed from tattooed landmarks on the VF medial surface at low, medium, and high airflows. Parameters describing medial surface trajectory shape were calculated, and underlying patterns were emphasized using EEFs. Fundamental frequency and smoothed cepstral peak prominence (CPPS) were calculated from acoustic data. RESULTS Convex-hull area of landmark trajectories increased with increasing flow and decreasing nerve activation level. Trajectory shapes observed included circular, ellipsoid, bent, and figure-eight. They were more circular on the superior and anterior VF, and more elliptical and line-like on the inferior and posterior VF. The EEFs capturing synchronal opening and closing (EEF1) and alternating convergent/divergent (EEF2) glottis shapes were mostly unaffected by flow and nerve stimulation levels. CPPS increased with higher airflow except for low RLN activation and very dominant SLN stimulation. CONCLUSION We analyzed VF vibration as a function of neuromuscular stimulation and airflow levels. Oscillation patterns such as figure-eight and bent trajectories were linked to high nerve activation and flow. Further studies investigating longer sections of 3D reconstructed oscillations are needed. LEVEL OF EVIDENCE N/A, Basic Science Laryngoscope, 134:1249-1257, 2024.
Affiliation(s)
- Patrick Schlegel
  - Department of Head and Neck Surgery, University of California, Los Angeles; Los Angeles, CA
- Hye Rhyn Chung
  - Department of Head and Neck Surgery, University of California, Los Angeles; Los Angeles, CA
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Head and Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Dinesh K. Chhetri
  - Department of Head and Neck Surgery, University of California, Los Angeles; Los Angeles, CA
5
Malinowski J, Pietruszewska W, Stawiski K, Kowalczyk M, Barańska M, Rycerz A, Niebudek-Bogusz E. High-Speed Videoendoscopy Enhances the Objective Assessment of Glottic Organic Lesions: A Case-Control Study with Multivariable Data-Mining Model Development. Cancers (Basel) 2023; 15:3716. [PMID: 37509377 PMCID: PMC10378075 DOI: 10.3390/cancers15143716] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/20/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
The aim of the study was to utilize a quantitative assessment of the vibratory characteristics of vocal folds in diagnosing benign and malignant lesions of the glottis using high-speed videolaryngoscopy (HSV). METHODS Case-control study including 100 patients with unilateral vocal fold lesions in comparison to 38 normophonic subjects. Quantitative assessment with the determination of vocal fold oscillation parameters was performed based on HSV kymography. Machine-learning predictive models were developed and validated. RESULTS All calculated parameters differed significantly between healthy subjects and patients with organic lesions. The first predictive model distinguishing any organic lesion patients from healthy subjects reached an area under the curve (AUC) equal to 0.983 and presented with 89.3% accuracy, 97.0% sensitivity, and 71.4% specificity on the testing set. The second model identifying malignancy among organic lesions reached an AUC equal to 0.85 and presented with 80.6% accuracy, 100% sensitivity, and 71.1% specificity on the training set. Important predictive factors for the models were frequency perturbation measures. CONCLUSIONS The standard protocol for distinguishing between benign and malignant lesions continues to be clinical evaluation by an experienced ENT specialist and confirmed by histopathological examination. Our findings did suggest that advanced machine learning models, which consider the complex interactions present in HSV data, could potentially indicate a heightened risk of malignancy. Therefore, this technology could prove pivotal in aiding in early cancer detection, thereby emphasizing the need for further investigation and validation.
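The accuracy, sensitivity, and specificity figures quoted above are standard confusion-matrix ratios. A minimal sketch of how they relate, assuming binary labels where 1 marks a lesion and 0 a healthy subject; the function name `binary_metrics` and the label vectors are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of true lesions that the model flags
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of healthy subjects that the model clears
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels for eight subjects
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

A model can reach 100% sensitivity with modest specificity, as in the second model reported above, when it flags every true positive at the cost of some false alarms.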
Affiliation(s)
- Jakub Malinowski
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Wioletta Pietruszewska
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Konrad Stawiski
  - Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
  - Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
- Magdalena Kowalczyk
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Magda Barańska
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Aleksander Rycerz
  - Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
- Ewa Niebudek-Bogusz
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
6
Yousef AM, Deliyski DD, Zacharias SRC, Naghibolhosseini M. Deep-Learning-Based Representation of Vocal Fold Dynamics in Adductor Spasmodic Dysphonia during Connected Speech in High-Speed Videoendoscopy. J Voice 2022:S0892-1997(22)00263-6. [PMID: 36154973 PMCID: PMC10030376 DOI: 10.1016/j.jvoice.2022.08.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/08/2022] [Revised: 08/14/2022] [Accepted: 08/17/2022] [Indexed: 11/28/2022]
Abstract
OBJECTIVE Adductor spasmodic dysphonia (AdSD) is a neurogenic dystonia that causes spasms of the laryngeal muscles. This disorder mainly affects production of connected speech. To understand how AdSD affects vocal fold (VF) movements, and hence the speech signal, it is necessary to study VF kinematics during running speech. This paper introduces an automated method for analysis of VF vibrations in AdSD using laryngeal high-speed videoendoscopy (HSV) in running speech. METHODS A monochrome HSV system was used to obtain video recordings from vocally normal individuals and AdSD patients during production of the six CAPE-V sentences and the "Rainbow Passage." A deep neural network was designed based on the UNet architecture. The network was developed for glottal area segmentation in HSV data, providing a tool for quantitative analysis of VF vibrations in both vocally normal speakers and AdSD patients. The network was trained and validated using manually labeled HSV frames. After training, the segmentation quality was quantitatively evaluated against visual analysis results of a test dataset including segregated HSV frames and a short sequence of VF vibrations in consecutive frames. RESULTS The developed convolutional network was successfully trained and demonstrated accurate segmentation on the testing dataset, with a mean Intersection over Union (IoU) of 0.81 and a mean Boundary-F1 score of 0.93. Moreover, the visual assessment of the automated technique showed accurate detection of the glottal edges/area in the HSV data even with challenging image quality and excessive laryngeal maneuvers of AdSD patients during running speech. CONCLUSION The introduced automated approach provides an accurate representation of the glottal edges/area during connected speech in HSV data for vocally normal speakers and AdSD patients. This method facilitates the development of HSV-based measures to quantify VF dynamics in AdSD. Using HSV to automatically analyze VF vibrations in AdSD can allow for understanding AdSD vocal mechanisms and characteristics.
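For readers unfamiliar with the Intersection over Union (IoU) score reported above, the sketch below shows the standard definition for two binary masks: the number of pixels set in both masks divided by the number set in either. The function name and the tiny 3x4 masks are hypothetical; returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def intersection_over_union(pred_mask, true_mask):
    """IoU between two same-sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly
    return intersection / union if union else 1.0


# Hypothetical predicted and manually labeled glottal-area masks
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]
true = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 1, 1, 0]]
iou = intersection_over_union(pred, true)
print(f"IoU = {iou:.3f}")
```

A mean IoU is then the average of this per-frame score over the test frames.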
Affiliation(s)
- Ahmed M Yousef
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
  - Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
7
Isolated Severe Dysphonia as a Presentation of Post-COVID-19 Syndrome. Diagnostics (Basel) 2022; 12:diagnostics12081839. [PMID: 36010188 PMCID: PMC9406942 DOI: 10.3390/diagnostics12081839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 07/26/2022] [Accepted: 07/26/2022] [Indexed: 11/18/2022] Open
Abstract
This is the first study assessing the clinical management of severe, isolated dysphonia during post-COVID-19 syndrome. One hundred and fifty-eight subjects met the inclusion criteria for the post-COVID-19 condition as specified by the WHO. Six patients were diagnosed with isolated severe dysphonia, constituting 3.8% of the initial group. The pre- and post-examination protocol consisted of subjective voice self-assessment and routine laryngological examination, followed by an instrumental examination by means of Laryngovideostroboscopy (LVS) and High-Speed Videolaryngoscopy (HSV). The treatment included short-term systemic steroids in decreasing doses, moisturizing inhalations with hyaluronic acid, and protective agents against Laryngopharyngeal Reflux. The kinematic imaging of the glottis performed by means of HSV before treatment showed deviations in the regularity and symmetry of vocal fold vibrations, absence of mucosal wave, and incomplete glottal closure. Improvement of the structural and functional state of the larynx was observed post-treatment. Kymographic sections and Glottal Width Waveform (GWW) graphs obtained from post-treatment HSV recordings showed improvement in vocal fold vibrations. A decrease in mean Jitter and Shimmer was observed: mean Jitter fell from 3.16 pre-treatment to 2.97 post-treatment, and mean Shimmer fell from 7.16 pre-treatment to 2.77 post-treatment. The post-treatment self-evaluation of voice showed considerable improvement in vocal function and voice quality in all the examined patients. Severe dysphonia in patients with post-COVID-19 syndrome requires urgent ENT diagnosis using instrumental assessment with the evaluation of laryngeal phonatory function and intensive comprehensive treatment.
8
Kopczynski B, Niebudek-Bogusz E, Pietruszewska W, Strumillo P. Segmentation of Glottal Images from High-Speed Videoendoscopy Optimized by Synchronous Acoustic Recordings. SENSORS (BASEL, SWITZERLAND) 2022; 22:1751. [PMID: 35270897 PMCID: PMC8915112 DOI: 10.3390/s22051751] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/30/2021] [Revised: 02/12/2022] [Accepted: 02/15/2022] [Indexed: 05/17/2023]
Abstract
Laryngeal high-speed videoendoscopy (LHSV) is an imaging technique offering novel visualization quality of the vibratory activity of the vocal folds. However, most image analysis methods require interaction by medical personnel and access to ground truth annotations to achieve accurate detection of vocal fold edges. In our fully automatic method, we combine video and acoustic data that are synchronously recorded during the laryngeal endoscopy. We show that the image segmentation algorithm of the glottal area can be optimized by matching the Fourier spectra of the pre-processed video and the spectra of the acoustic recording during the phonation of sustained vowel /i:/. We verify our method on a set of LHSV recordings taken from subjects with normophonic voice and patients with voice disorders due to glottal insufficiency. We show that the computed geometric indices of the glottal area make it possible to discriminate between normal and pathologic voices. The median Open Quotient and Minimal Relative Glottal Area values for healthy subjects were 0.69 and 0.06, respectively, while for dysphonic subjects they were 1 and 0.35, respectively. We also validate these results using independent phoniatrician experts.
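The abstract does not define its two geometric indices, so the sketch below rests on one plausible reading, stated as an assumption: the Open Quotient is taken as the fraction of samples in a vibratory cycle during which the glottal area exceeds a closure threshold, and the Minimal Relative Glottal Area as the minimum area divided by the maximum area. The function names, the threshold, and the sample waveforms are hypothetical.

```python
def open_quotient(area, closed_threshold=0.0):
    """Fraction of samples in one cycle where the glottis counts as open."""
    open_samples = sum(1 for a in area if a > closed_threshold)
    return open_samples / len(area)


def minimal_relative_area(area):
    """Minimum glottal area divided by the maximum (assumed definition)."""
    peak = max(area)
    return min(area) / peak if peak > 0 else 0.0


# Hypothetical glottal area waveforms for one cycle, in pixels
normal_cycle = [0, 0, 0, 2, 5, 8, 9, 7, 4, 1]  # glottis closes fully
leaky_cycle = [3, 3, 4, 6, 8, 9, 8, 6, 4, 3]   # glottis never closes

for name, cycle in (("normal", normal_cycle), ("leaky", leaky_cycle)):
    print(name, open_quotient(cycle), round(minimal_relative_area(cycle), 2))
```

Under this reading, an Open Quotient of 1 together with a nonzero minimal relative area describes a glottis that never closes completely, which matches the glottal insufficiency pattern reported for the dysphonic group.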
Affiliation(s)
- Bartosz Kopczynski
  - Institute of Electronics, Lodz University of Technology, 90-924 Lodz, Poland
- Ewa Niebudek-Bogusz
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland
- Wioletta Pietruszewska
  - Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland
- Pawel Strumillo
  - Institute of Electronics, Lodz University of Technology, 90-924 Lodz, Poland