1
|
Nobel SMN, Swapno SMMR, Islam MR, Safran M, Alfarhood S, Mridha MF. A machine learning approach for vocal fold segmentation and disorder classification based on ensemble method. Sci Rep 2024; 14:14435. [PMID: 38910146 DOI: 10.1038/s41598-024-64987-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2024] [Accepted: 06/14/2024] [Indexed: 06/25/2024] Open
Abstract
In the healthcare domain, the essential task is to understand and classify diseases affecting the vocal folds (VFs). The accurate identification of VF disease is the key issue in this domain. Integrating VF segmentation and disease classification into a single system is challenging but important for precise diagnostics. Our study addresses this challenge by combining VF illness categorization and VF segmentation into a single integrated system. We utilized two effective ensemble machine learning methods: ensemble EfficientNetV2L-LGBM and ensemble UNet-BiGRU. We utilized the EfficientNetV2L-LGBM model for classification, achieving a training accuracy of 98.88%, validation accuracy of 97.73%, and test accuracy of 97.88%. These exceptional outcomes highlight the system's ability to classify different VF illnesses precisely. In addition, we utilized the UNet-BiGRU model for segmentation, which attained a training accuracy of 92.55%, a validation accuracy of 89.87%, and a significant test accuracy of 91.47%. In the segmentation task, we examined some methods to improve our ability to divide data into segments, resulting in a testing accuracy score of 91.99% and an Intersection over Union (IOU) of 87.46%. These measures demonstrate skill of the model in accurately defining and separating VF. Our system's classification and segmentation results confirm its capacity to effectively identify and segment VF disorders, representing a significant advancement in enhancing diagnostic accuracy and healthcare in this specialized field. This study emphasizes the potential of machine learning to transform the medical field's capacity to categorize VF and segment VF, providing clinicians with a vital instrument to mitigate the profound impact of the condition. Implementing this innovative approach is expected to enhance medical procedures and provide a sense of optimism to those globally affected by VF disease.
Collapse
Affiliation(s)
- S M Nuruzzaman Nobel
- Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, 1216, Bangladesh
| | - S M Masfequier Rahman Swapno
- Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, 1216, Bangladesh
| | - Md Rajibul Islam
- Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong, China
| | - Mejdl Safran
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, 11543, Riyadh, Saudi Arabia.
| | - Sultan Alfarhood
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, 11543, Riyadh, Saudi Arabia
| | - M F Mridha
- Department of Computer Science, American International University-Bangladesh, Dhaka, 1229, Bangladesh
| |
Collapse
|
2
|
Liu GS, Jovanovic N, Sung CK, Doyle PC. A Scoping Review of Artificial Intelligence Detection of Voice Pathology: Challenges and Opportunities. Otolaryngol Head Neck Surg 2024. [PMID: 38738887 DOI: 10.1002/ohn.809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2023] [Revised: 04/05/2024] [Accepted: 04/19/2024] [Indexed: 05/14/2024]
Abstract
OBJECTIVE Survey the current literature on artificial intelligence (AI) applications for detecting and classifying vocal pathology using voice recordings, and identify challenges and opportunities for advancing the field forward. DATA SOURCES PubMed, EMBASE, CINAHL, and Scopus databases. REVIEW METHODS A comprehensive literature search was performed following the Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews guidelines. Peer-reviewed journal articles in the English language were included if they used an AI approach to detect or classify pathological voices using voice recordings from patients diagnosed with vocal pathologies. RESULTS Eighty-two studies were included in the review between the years 2000 and 2023, with an increase in publication rate from one study per year in 2012 to 10 per year in 2022. Seventy-two studies (88%) were aimed at detecting the presence of voice pathology, 24 (29%) at classifying the type of voice pathology present, and 4 (5%) at assessing pathological voice using the Grade, Roughness, Breathiness, Asthenia, and Strain scale. Thirty-six databases were used to collect and analyze speech samples. Fourteen articles (17%) did not provide information about their AI model validation methodology. Zero studies moved beyond the preclinical and offline AI model development stages. Zero studies specified following a reporting guideline for AI research. CONCLUSION There is rising interest in the potential of AI technology to aid the detection and classification of voice pathology. Three challenges-and areas of opportunities-for advancing this research are heterogeneity of databases, lack of clinical validation studies, and inconsistent reporting.
Collapse
Affiliation(s)
- George S Liu
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, California, USA
| | - Nedeljko Jovanovic
- Rehabilitation Sciences-Voice Production and Perception Laboratory, Western University, London, Ontario, Canada
| | - C Kwang Sung
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, California, USA
| | - Philip C Doyle
- Department of Otolaryngology-Head and Neck Surgery, Stanford University, Stanford, California, USA
| |
Collapse
|
3
|
Dadras AA, Aichinger P. Deep Learning-Based Detection of Glottis Segmentation Failures. Bioengineering (Basel) 2024; 11:443. [PMID: 38790311 PMCID: PMC11118004 DOI: 10.3390/bioengineering11050443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2024] [Revised: 04/23/2024] [Accepted: 04/26/2024] [Indexed: 05/26/2024] Open
Abstract
Medical image segmentation is crucial for clinical applications, but challenges persist due to noise and variability. In particular, accurate glottis segmentation from high-speed videos is vital for voice research and diagnostics. Manual searching for failed segmentations is labor-intensive, prompting interest in automated methods. This paper proposes the first deep learning approach for detecting faulty glottis segmentations. For this purpose, faulty segmentations are generated by applying both a poorly performing neural network and perturbation procedures to three public datasets. Heavy data augmentations are added to the input until the neural network's performance decreases to the desired mean intersection over union (IoU). Likewise, the perturbation procedure involves a series of image transformations to the original ground truth segmentations in a randomized manner. These data are then used to train a ResNet18 neural network with custom loss functions to predict the IoU scores of faulty segmentations. This value is then thresholded with a fixed IoU of 0.6 for classification, thereby achieving 88.27% classification accuracy with 91.54% specificity. Experimental results demonstrate the effectiveness of the presented approach. Contributions include: (i) a knowledge-driven perturbation procedure, (ii) a deep learning framework for scoring and detecting faulty glottis segmentations, and (iii) an evaluation of custom loss functions.
Collapse
Affiliation(s)
| | - Philipp Aichinger
- Speech and Hearing Science Lab, Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria;
| |
Collapse
|
4
|
Schlegel P, Berry DA, Moffatt C, Zhang Z, Chhetri DK. Register transitions in an in vivo canine model as a function of intrinsic laryngeal muscle stimulation, fundamental frequency, and sound pressure level. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 155:2139-2150. [PMID: 38498507 PMCID: PMC10954347 DOI: 10.1121/10.0025135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2023] [Revised: 01/09/2024] [Accepted: 02/16/2024] [Indexed: 03/20/2024]
Abstract
Phonatory instabilities and involuntary register transitions can occur during singing. However, little is known regarding the mechanisms which govern such transitions. To investigate this phenomenon, we systematically varied laryngeal muscle activation and airflow in an in vivo canine larynx model during phonation. We calculated voice range profiles showing average nerve activations for all combinations of fundamental frequency (F0) and sound pressure level (SPL). Further, we determined closed-quotient (CQ) and minimum-posterior-area (MPA) based on high-speed video recordings. While different combinations of muscle activation favored different combinations of F0 and SPL, in the investigated larynx there was a consistent region of instability at about 400 Hz which essentially precluded phonation. An explanation for this region may be a larynx specific coupling between sound source and subglottal tract or an effect based purely on larynx morphology. Register transitions crossed this region, with different combinations of cricothyroid and thyroarytenoid muscle (TA) activation stabilizing higher or lower neighboring frequencies. Observed patterns in CQ and MPA dependent on TA activation reproduced patterns found in singers in previous work. Lack of control of TA stimulation may result in phonation instabilities, and enhanced control of TA stimulation may help to avoid involuntary register transitions, especially in the singing voice.
Collapse
Affiliation(s)
- Patrick Schlegel
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
| | - David A Berry
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
| | - Clare Moffatt
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
| | - Zhaoyan Zhang
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
| | - Dinesh K Chhetri
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California-Los Angeles, Los Angeles, California 90095, USA
| |
Collapse
|
5
|
Malinowski J, Pietruszewska W, Kowalczyk M, Niebudek-Bogusz E. Value of high-speed videoendoscopy as an auxiliary tool in differentiation of benign and malignant unilateral vocal lesions. J Cancer Res Clin Oncol 2024; 150:10. [PMID: 38216796 PMCID: PMC10786956 DOI: 10.1007/s00432-023-05543-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Accepted: 12/13/2023] [Indexed: 01/14/2024]
Abstract
PURPOSE The study aimed to assess the relevance of objective vibratory parameters derived from high-speed videolaryngoscopy (HSV) as a supporting tool, to assist clinicians in establishing the initial diagnosis of benign and malignant glottal organic lesions. METHODS The HSV examinations were conducted in 175 subjects: 50 normophonic, 85 subjects with benign vocal fold lesions, and 40 with early glottic cancer; organic lesions were confirmed by histopathologic examination. The parameters, derived from HSV kymography: amplitude, symmetry, and glottal dynamic characteristics, were compared statistically between the groups with the following ROC analysis. RESULTS Among 14 calculated parameters, 10 differed significantly between the groups. Four of them, the average resultant amplitude of the involved vocal fold (AmpInvolvedAvg), average amplitude asymmetry for the whole glottis and its middle third part (AmplAsymAvg; AmplAsymAvg_2/3), and absolute average phase difference (AbsPhaseDiffAvg), showed significant differences between benign and malignant lesions. Amplitude values were decreasing, while asymmetry and phase difference values were increasing with the risk of malignancy. In ROC analysis, the highest AUC was observed for AmpAsymAvg (0.719; p < 0.0001), and next in order was AmpInvolvedAvg (0.70; p = 0.0002). CONCLUSION The golden standard in the diagnosis of organic lesions of glottis remains clinical examination with videolaryngoscopy, confirmed by histopathological examination. Our results showed that measurements of amplitude, asymmetry, and phase of vibrations in malignant vocal fold masses deteriorate significantly in comparison to benign vocal lesions. High-speed videolaryngoscopy could aid their preliminary differentiation noninvasively before histopathological examination; however, further research on larger groups is needed.
Collapse
Affiliation(s)
- Jakub Malinowski
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland.
| | - Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
| | - Magdalena Kowalczyk
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
| | - Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
| |
Collapse
|
6
|
Schraut T, Schützenberger A, Arias-Vergara T, Kunduk M, Echternach M, Döllinger M. Machine learning based estimation of hoarseness severity using sustained vowelsa). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 155:381-395. [PMID: 38240668 DOI: 10.1121/10.0024341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Accepted: 12/18/2023] [Indexed: 01/23/2024]
Abstract
Auditory perceptual evaluation is considered the gold standard for assessing voice quality, but its reliability is limited due to inter-rater variability and coarse rating scales. This study investigates a continuous, objective approach to evaluate hoarseness severity combining machine learning (ML) and sustained phonation. For this purpose, 635 acoustic recordings of the sustained vowel /a/ and subjective ratings based on the roughness, breathiness, and hoarseness scale were collected from 595 subjects. A total of 50 temporal, spectral, and cepstral features were extracted from each recording and used to identify suitable ML algorithms. Using variance and correlation analysis followed by backward elimination, a subset of relevant features was selected. Recordings were classified into two levels of hoarseness, H<2 and H≥2, yielding a continuous probability score ŷ∈[0,1]. An accuracy of 0.867 and a correlation of 0.805 between the model's predictions and subjective ratings was obtained using only five acoustic features and logistic regression (LR). Further examination of recordings pre- and post-treatment revealed high qualitative agreement with the change in subjectively determined hoarseness levels. Quantitatively, a moderate correlation of 0.567 was obtained. This quantitative approach to hoarseness severity estimation shows promising results and potential for improving the assessment of voice quality.
Collapse
Affiliation(s)
- Tobias Schraut
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Tomás Arias-Vergara
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA
| | - Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Munich, Ludwig-Maximilians-Universität München, 81377 Munich, Germany
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
| |
Collapse
|
7
|
Tur B, Gühring L, Wendler O, Schlicht S, Drummer D, Kniesburges S. Effect of Ligament Fibers on Dynamics of Synthetic, Self-Oscillating Vocal Folds in a Biomimetic Larynx Model. Bioengineering (Basel) 2023; 10:1130. [PMID: 37892860 PMCID: PMC10604794 DOI: 10.3390/bioengineering10101130] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2023] [Revised: 09/13/2023] [Accepted: 09/25/2023] [Indexed: 10/29/2023] Open
Abstract
Synthetic silicone larynx models are essential for understanding the biomechanics of physiological and pathological vocal fold vibrations. The aim of this study is to investigate the effects of artificial ligament fibers on vocal fold vibrations in a synthetic larynx model, which is capable of replicating physiological laryngeal functions such as elongation, abduction, and adduction. A multi-layer silicone model with different mechanical properties for the musculus vocalis and the lamina propria consisting of ligament and mucosa was used. Ligament fibers of various diameters and break resistances were cast into the vocal folds and tested at different tension levels. An electromechanical setup was developed to mimic laryngeal physiology. The measurements included high-speed video recordings of vocal fold vibrations, subglottal pressure and acoustic. For the evaluation of the vibration characteristics, all measured values were evaluated and compared with parameters from ex and in vivo studies. The fundamental frequency of the synthetic larynx model was found to be approximately 200-520 Hz depending on integrated fiber types and tension levels. This range of the fundamental frequency corresponds to the reproduction of a female normal and singing voice range. The investigated voice parameters from vocal fold vibration, acoustics, and subglottal pressure were within normal value ranges from ex and in vivo studies. The integration of ligament fibers leads to an increase in the fundamental frequency with increasing airflow, while the tensioning of the ligament fibers remains constant. In addition, a tension increase in the fibers also generates a rise in the fundamental frequency delivering the physiological expectation of the dynamic behavior of vocal folds.
Collapse
Affiliation(s)
- Bogac Tur
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
| | - Lucia Gühring
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
| | - Olaf Wendler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
| | - Samuel Schlicht
- Institute of Polymer Technology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Am Weichselgarten 10, 91058 Erlangen, Germany
| | - Dietmar Drummer
- Institute of Polymer Technology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Am Weichselgarten 10, 91058 Erlangen, Germany
| | - Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
| |
Collapse
|
8
|
Semmler M, Lasar S, Kremer F, Reinwald L, Wittig F, Peters G, Schraut T, Wendler O, Seyferth S, Schützenberger A, Dürr S. Extent and Effect of Covering Laryngeal Structures with Synthetic Laryngeal Mucus via Two Different Administration Techniques. J Voice 2023:S0892-1997(23)00228-X. [PMID: 37648625 DOI: 10.1016/j.jvoice.2023.07.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2023] [Revised: 07/20/2023] [Accepted: 07/21/2023] [Indexed: 09/01/2023]
Abstract
OBJECTIVE The first goal of this study was to investigate the coverage of laryngeal structures using two potential administration techniques for synthetic mucus: inhalation and lozenge ingestion. As a second research question, the study investigated the potential effects of these techniques on standardized voice assessment parameters. METHODS Fluorescein was added to throat lozenges and to an inhalation solution to visualize the coverage of laryngeal structures through blue light imaging. The study included 70 vocally healthy subjects. Fifty subjects underwent administration via lozenge ingestion and 20 subjects performed the inhalation process. For the first research question, the recordings from the blue light imaging system were categorized to compare the extent of coverage on individual laryngeal structures objectively. Secondly, a standardized voice evaluation protocol was performed before and after each administration to determine any measurable effects of typical voice parameters. RESULTS The administration via inhalation demonstrated complete coverage of all laryngeal structures, including the vocal folds, ventricular folds, and arytenoid cartilages, as visualized by the fluorescent dye. In contrast, the application of the lozenge predominantly covered the pharynx and laryngeal surface toward the aryepiglottic fold, but not the inferior structures. All in all, the comparison before and after administration showed no clear effect, although a minor deterioration of the acoustic signal was noted in the shimmer and cepstral peak prominence after the inhalation. CONCLUSIONS Our findings indicate that the inhalation process is a more effective technique for covering deeper laryngeal structures such as the vocal folds and ventricular folds with synthetic mucus. This knowledge enables further in vivo studies on the role of laryngeal mucus in phonation in general, and how it can be substituted or supplemented for patients with reduced glandular activity as well as for heavy voice users.
Collapse
Affiliation(s)
- Marion Semmler
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Sarina Lasar
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Franziska Kremer
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Laura Reinwald
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Fiori Wittig
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Gregor Peters
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Tobias Schraut
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Olaf Wendler
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Stefan Seyferth
- Department of Chemistry and Pharmacy, Chair of Pharmaceutics, Friedrich-Alexander-University Erlangen-Nürnberg, Cauerstr. 4, 91058 Erlangen, Germany.
| | - Anne Schützenberger
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
| | - Stephan Dürr
- University Hospital Regensburg, Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany.
| |
Collapse
|
9
|
Malinowski J, Pietruszewska W, Stawiski K, Kowalczyk M, Barańska M, Rycerz A, Niebudek-Bogusz E. High-Speed Videoendoscopy Enhances the Objective Assessment of Glottic Organic Lesions: A Case-Control Study with Multivariable Data-Mining Model Development. Cancers (Basel) 2023; 15:3716. [PMID: 37509377 PMCID: PMC10378075 DOI: 10.3390/cancers15143716] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
The aim of the study was to utilize a quantitative assessment of the vibratory characteristics of vocal folds in diagnosing benign and malignant lesions of the glottis using high-speed videolaryngoscopy (HSV). METHODS Case-control study including 100 patients with unilateral vocal fold lesions in comparison to 38 normophonic subjects. Quantitative assessment with the determination of vocal fold oscillation parameters was performed based on HSV kymography. Machine-learning predictive models were developed and validated. RESULTS All calculated parameters differed significantly between healthy subjects and patients with organic lesions. The first predictive model distinguishing any organic lesion patients from healthy subjects reached an area under the curve (AUC) equal to 0.983 and presented with 89.3% accuracy, 97.0% sensitivity, and 71.4% specificity on the testing set. The second model identifying malignancy among organic lesions reached an AUC equal to 0.85 and presented with 80.6% accuracy, 100% sensitivity, and 71.1% specificity on the training set. Important predictive factors for the models were frequency perturbation measures. CONCLUSIONS The standard protocol for distinguishing between benign and malignant lesions continues to be clinical evaluation by an experienced ENT specialist and confirmed by histopathological examination. Our findings did suggest that advanced machine learning models, which consider the complex interactions present in HSV data, could potentially indicate a heightened risk of malignancy. Therefore, this technology could prove pivotal in aiding in early cancer detection, thereby emphasizing the need for further investigation and validation.
Collapse
Affiliation(s)
- Jakub Malinowski
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
| | - Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
| | - Konrad Stawiski
- Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
| | - Magdalena Kowalczyk
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
| | - Magda Barańska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
| | - Aleksander Rycerz
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
| | - Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
| |
Collapse
|
10
|
Movahhedi M, Liu XY, Geng B, Elemans C, Xue Q, Wang JX, Zheng X. Predicting 3D soft tissue dynamics from 2D imaging using physics informed neural networks. Commun Biol 2023; 6:541. [PMID: 37208428 DOI: 10.1038/s42003-023-04914-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2022] [Accepted: 05/04/2023] [Indexed: 05/21/2023] Open
Abstract
Tissue dynamics play critical roles in many physiological functions and provide important metrics for clinical diagnosis. Capturing real-time high-resolution 3D images of tissue dynamics, however, remains a challenge. This study presents a hybrid physics-informed neural network algorithm that infers 3D flow-induced tissue dynamics and other physical quantities from sparse 2D images. The algorithm combines a recurrent neural network model of soft tissue with a differentiable fluid solver, leveraging prior knowledge in solid mechanics to project the governing equation on a discrete eigen space. The algorithm uses a Long-short-term memory-based recurrent encoder-decoder connected with a fully connected neural network to capture the temporal dependence of flow-structure-interaction. The effectiveness and merit of the proposed algorithm is demonstrated on synthetic data from a canine vocal fold model and experimental data from excised pigeon syringes. The results showed that the algorithm accurately reconstructs 3D vocal dynamics, aerodynamics, and acoustics from sparse 2D vibration profiles.
Collapse
Affiliation(s)
| | - Xin-Yang Liu
- Aerospace and Mechanical Engineering Department, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Biao Geng
- Mechanical Engineering Department, University of Maine, Orono, ME, 04469, USA
- Mechanical Engineering Department, Rochester Institute of Technology, Rochester, NY, 14623, USA
| | - Coen Elemans
- Department of Biology, University of Southern Denmark, Odense M, 5230, Denmark
| | - Qian Xue
- Mechanical Engineering Department, University of Maine, Orono, ME, 04469, USA
- Mechanical Engineering Department, Rochester Institute of Technology, Rochester, NY, 14623, USA
| | - Jian-Xun Wang
- Aerospace and Mechanical Engineering Department, University of Notre Dame, Notre Dame, IN, 46556, USA.
| | - Xudong Zheng
- Mechanical Engineering Department, University of Maine, Orono, ME, 04469, USA.
- Mechanical Engineering Department, Rochester Institute of Technology, Rochester, NY, 14623, USA.
| |
Collapse
|
11
|
Arias-Vergara T, Döllinger M, Schraut T, Mohd Khairuddin KA, Schützenberger A. Nyquist Plot Parametrization for Quantitative Analysis of Vibration of the Vocal Folds. J Voice 2023:S0892-1997(23)00014-0. [PMID: 36774264 DOI: 10.1016/j.jvoice.2023.01.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2022] [Revised: 01/12/2023] [Accepted: 01/12/2023] [Indexed: 02/11/2023]
Abstract
OBJECTIVES The Nyquist plot provides a graphical representation of the glottal cycles as elliptical trajectories in a 2D plane. This study proposes a methodology to parameterize the Nyquist plot with application to support the quantitative analysis of voice disorders. METHODS We considered high-speed videoendoscopy recordings of 33 functional dysphonia (FD) patients and 33 normophonic controls (NC). Quantitative analysis was performed by computing four shape-based parameters from the Nyquist plot: Variability, Size (Perimeter and Area), and Consistency. Additionally, we performed automatic classification using a linear support vector machine and feature importance analysis by combining the proposed features with state-of-the-art glottal area waveform (GAW) parameters. RESULTS We found that the inter-cycle variability was significantly higher in FD patients compared to NC. We achieved a classification accuracy of 83% when the top 30 most important features were used. Furthermore, the proposed Nyquist plot features were ranked in the top 12 most important features. CONCLUSIONS The Nyquist plot provides complementary information for subjective and objective assessment of voice disorders. On the one hand, with visual inspection it is possible to observe intra- and inter-glottal cycle irregularities during sustained phonation. On the other hand, shaped-based parameters allow quantifying such irregularities and provide complementary information to state-of-the-art GAW parameters.
Collapse
Affiliation(s)
- Tomás Arias-Vergara
- University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.
| | - Michael Döllinger
- University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
| | - Tobias Schraut
- University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
| | | | - Anne Schützenberger
- University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
| |
Collapse
|
12
|
Pedersen M, Larsen CF, Madsen B, Eeg M. Localization and quantification of glottal gaps on deep learning segmentation of vocal folds. Sci Rep 2023; 13:878. [PMID: 36650265 PMCID: PMC9845318 DOI: 10.1038/s41598-023-27980-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2022] [Accepted: 01/11/2023] [Indexed: 01/19/2023] Open
Abstract
The entire glottis has mostly been the focus in the tracking of the vocal folds, both manually and automatically. From a treatment point of view, the various regions of the glottis are of specific interest. The aim of the study was to test if it was possible to supplement an existing convolutional neural network (CNN) with post-network calculations for the localization and quantification of posterior glottal gaps during phonation, usable for vocal fold function analysis of e.g. laryngopharyngeal reflux findings. 30 subjects/videos with insufficient closure in the rear glottal area and 20 normal subjects/videos were selected from our database, recorded with a commercial high-speed video setup (HSV with 4000 frames per second), and segmented with an open-source CNN for validating voice function. We made post-network calculations to localize and quantify the 10% and 50% distance lines from the rear part of the glottis. The results showed a significant difference using the algorithm at the 10% line distance between the two groups of p < 0.0001 and no difference at 50%. These novel results show that it is possible to use post-network calculations on CNNs for the localization and quantification of posterior glottal gaps.
Collapse
|
13
|
Kaluza J, Niebudek-Bogusz E, Malinowski J, Strumillo P, Pietruszewska W. Assessment of Vocal Fold Stiffness by Means of High-Speed Videolaryngoscopy with Laryngotopography in Prediction of Early Glottic Malignancy: Preliminary Report. Cancers (Basel) 2022; 14:cancers14194697. [PMID: 36230618 PMCID: PMC9563419 DOI: 10.3390/cancers14194697] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2022] [Revised: 09/07/2022] [Accepted: 09/19/2022] [Indexed: 11/16/2022] Open
Abstract
Simple Summary The method described in our manuscript can help to objectively assess the vibration of each vocal fold using larygotopographic analysis of high-speed videoendoscopy (HSV) recordings. We have developed image processing and analysis procedures to detect vocal fold regions in HSV films and quantitatively analyze their shape and kinematics. We proposed the term Stiffness Asymmetry Index which can provide valuable information on the texture and kinematic properties of individual vocal fold tissues, which can be important in the diagnosis of early glottis cancer. Our study showed that a low value of SAI indicated large, non-vibrating vocal fold areas, characteristic of infiltrative lesions such as invasive carcinoma. This important clinical information can help to assess the depth of vocal fold invasion before direct histologic examination and discriminate benign from malignant lesions. Abstract One of the most important challenges in laryngological practice is the early diagnosis of laryngeal cancer. Detection of non-vibrating areas affected by neoplastic lesions of the vocal folds can be crucial in the recognition of early cancerogenous infiltration. Glottal pathologies associated with abnormal vibration patterns of the vocal folds can be detected and quantified using High-speed Videolaryngoscopy (HSV), also in subjects with severe voice disorders, and analyzed with the aid of computer image processing procedures. We present a method that enables the assessment of vocal fold pathologies with the use of HSV. The calculated laryngotopographic (LTG) maps of the vocal folds based on HSV allowed for a detailed characterization of vibration patterns and abnormalities in different regions of the vocal folds. We verified our methods with HSV recordings from 31 subjects with a normophonic voice and benign and malignant vocal fold lesions. We proposed the novel Stiffness Asymmetry Index (SAI) to differentiate between early glottis cancer (SAI = 0.65 ± 0.18) and benign vocal fold masses (SAI = 0.16 ± 0.13). Our results showed that these glottal pathologies might be noninvasively distinguished prior to histopathological examination. However, this needs to be confirmed by further research on larger groups of benign and malignant laryngeal lesions.
Collapse
Affiliation(s)
- Justyna Kaluza
- Institute of Electronics, Lodz University of Technology, 90-924 Lodz, Poland
| | - Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland
| | - Jakub Malinowski
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland
| | - Pawel Strumillo
- Institute of Electronics, Lodz University of Technology, 90-924 Lodz, Poland
| | - Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-001 Lodz, Poland
- Correspondence:
| |
Collapse
|
14
|
Paderno A, Gennarini F, Sordi A, Montenegro C, Lancini D, Villani FP, Moccia S, Piazza C. Artificial intelligence in clinical endoscopy: Insights in the field of videomics. Front Surg 2022; 9:933297. [PMID: 36171813 PMCID: PMC9510389 DOI: 10.3389/fsurg.2022.933297] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
Artificial intelligence is being increasingly seen as a useful tool in medicine. Specifically, these technologies have the objective to extract insights from complex datasets that cannot easily be analyzed by conventional statistical methods. While promising results have been obtained for various -omics datasets, radiological images, and histopathologic slides, analysis of videoendoscopic frames still represents a major challenge. In this context, videomics represents a burgeoning field wherein several methods of computer vision are systematically used to organize unstructured data from frames obtained during diagnostic videoendoscopy. Recent studies have focused on five broad tasks with increasing complexity: quality assessment of endoscopic images, classification of pathologic and nonpathologic frames, detection of lesions inside frames, segmentation of pathologic lesions, and in-depth characterization of neoplastic lesions. Herein, we present a broad overview of the field, with a focus on conceptual key points and future perspectives.
Collapse
Affiliation(s)
- Alberto Paderno
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
- Correspondence: Alberto Paderno
| | - Francesca Gennarini
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| | - Alessandra Sordi
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| | - Claudia Montenegro
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| | - Davide Lancini
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
| | - Francesca Pia Villani
- The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy
- Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
| | - Sara Moccia
- The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy
- Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
| | - Cesare Piazza
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| |
Collapse
|
15
|
Schlegel P, Berry DA, Chhetri DK. Analysis of vibratory mode changes in symmetric and asymmetric activation of the canine larynx. PLoS One 2022; 17:e0266910. [PMID: 35421159 PMCID: PMC9009716 DOI: 10.1371/journal.pone.0266910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2021] [Accepted: 03/29/2022] [Indexed: 12/02/2022] Open
Abstract
Investigations of neuromuscular control of voice production have primarily focused on the roles of muscle activation levels, posture, and stiffness at phonation onset. However, little work has been done investigating the stability of the phonation process in regards to spontaneous changes in vibratory mode of vocal fold oscillation as a function of neuromuscular activation. We evaluated 320 phonatory conditions representing combinations of superior and recurrent laryngeal nerve (SLN and RLN) activations in an in vivo canine model of phonation. At each combination of neuromuscular input, airflow was increased linearly to reach phonation onset and beyond from 300 to 1400 mL/s. High-speed video and acoustic data were recorded during phonation, and spectrograms and glottal-area-based parameters were calculated. Vibratory mode changes were detected based on sudden increases or drops of local fundamental frequency. Mode changes occurred only when SLNs were concurrently stimulated and were more frequent for higher, less asymmetric RLN stimulation. A slight increase in amplitude and cycle length perturbation usually preceded the changes in the vibratory mode. However, no inherent differences between signals with mode changes and signals without were found.
Collapse
Affiliation(s)
- Patrick Schlegel
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States of America
- * E-mail:
| | - David A. Berry
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States of America
| | - Dinesh K. Chhetri
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA, United States of America
| |
Collapse
|
16
|
Malinowski J, Niebudek-Bogusz E, Just M, Morawska J, Racino A, Hoffman J, Barańska M, Kowalczyk MM, Pietruszewska W. Laryngeal High-Speed Videoendoscopy with Laser Illumination: A Preliminary Report. Otolaryngol Pol 2021; 75:1-10. [PMID: 35175220 DOI: 10.5604/01.3001.0015.2575] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
<br><b>Introduction:</b> Advances in computer image analysis have enabled the use of new functional imaging methods in the diagnosis of laryngeal diseases. Particularly interesting techniques of dynamic laryngeal imaging involve High Speed Videoendoscopy (HSV). This still-developed technique allows to overcome the limitations of laryngovideostroboscopy (LVS) and a more detailed analysis of the glottal function based on the image of the actual vibrations of the vocal folds. It also enables the determination of objective coefficients parameterizing phonatory vibrations of the vocal folds.</br> <br><b>Aim:</b> The aim of this pilot study was to evaluate the use of a high-speed videoendoscopy set with laser illumination for the diagnosis of glottic pathology in ENT practice.</br> <br><b>Material and methods:</b> The study included 40 patients who underwent LVS followed by HSV. The modern HSV examination kit - Advanced Larynx Imager System (ALIS), used for the first time in a clinical setting in Poland, is characterized by significantly improved, compared to the previously used high-speed cameras, operational parameters - a light head, the possibility of continuous lighting operation without excessive heating of the head tip, registration of the image in full color scale. Thanks to such modernization, the safety and course of the examination do not differ from laryngoscopy conducted with commonly used recorders. The device owes some of these improvements to a laser illuminator which was used for the first time as the main light source in a high-speed camera. In the study, two cases were selected to present the results of HSV and the analysis of the generated kymograms - a woman with no glottic pathology and a man with a polyp of the right vocal fold. In the first case, the HSV examination compared with the LVS revealed a discrete glottis functional disorder in the form of a tendency to hyperphonation. The patient with an organic lesion had a clearly visible irregularity of vocal fold vibrations, which also allowed to trace mucosal wave disturbances related to its reflection from the pathological structure of the glottis and the formation of a return wave, both on the fold affected by the lesion and, to a lesser extent, contralaterally. The glottic dysfunctions observed in the studied patients were confirmed in the generated kymograms and the graphs of the glottal width waveform (GWW), as well as in the parameters calculated on their basis, assessing the frequency and amplitude of phonatory vibrations.</br> <br><b>Conclusions:</b> The use of high-speed videoendoscopy allows for a much more accurate assessment of the phonatory function of the glottis than in laryngovideostroboscopy. The presented HSV system allows for obtaining high quality kinematic images of the larynx, color fidelity, and contrast. The use of this technology in laryngological practice enables precise structural and functional assessment of the glottis and detection of discrete phonation disorders that elude the techniques used so far.</br>.
Collapse
Affiliation(s)
- Jakub Malinowski
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Poland
| | - Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Poland
| | - Marcin Just
- Diagnova Technologies, Wroclaw Technology Park, Wroclaw, Poland
| | - Joanna Morawska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Poland
| | - Anna Racino
- Diagnova Technologies, Wroclaw Technology Park, Wroclaw, Poland
| | - Joanna Hoffman
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Poland
| | - Magda Barańska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Poland
| | | | - Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Poland
| |
Collapse
|
17
|
Kist AM, Dürr S, Schützenberger A, Döllinger M. OpenHSV: an open platform for laryngeal high-speed videoendoscopy. Sci Rep 2021; 11:13760. [PMID: 34215788 PMCID: PMC8253769 DOI: 10.1038/s41598-021-93149-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Accepted: 06/03/2021] [Indexed: 11/22/2022] Open
Abstract
High-speed videoendoscopy is an important tool to study laryngeal dynamics, to quantify vocal fold oscillations, to diagnose voice impairments at laryngeal level and to monitor treatment progress. However, there is a significant lack of an open source, expandable research tool that features latest hardware and data analysis. In this work, we propose an open research platform termed OpenHSV that is based on state-of-the-art, commercially available equipment and features a fully automatic data analysis pipeline. A publicly available, user-friendly graphical user interface implemented in Python is used to interface the hardware. Video and audio data are recorded in synchrony and are subsequently fully automatically analyzed. Video segmentation of the glottal area is performed using efficient deep neural networks to derive glottal area waveform and glottal midline. Established quantitative, clinically relevant video and audio parameters were implemented and computed. In a preliminary clinical study, we recorded video and audio data from 28 healthy subjects. Analyzing these data in terms of image quality and derived quantitative parameters, we show the applicability, performance and usefulness of OpenHSV. Therefore, OpenHSV provides a valid, standardized access to high-speed videoendoscopy data acquisition and analysis for voice scientists, highlighting its use as a valuable research tool in understanding voice physiology. We envision that OpenHSV serves as basis for the next generation of clinical HSV systems.
Collapse
Affiliation(s)
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany. .,Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Henkestr. 91, 91054, Erlangen, Germany.
| | - Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
| |
Collapse
|
18
|
Kim Y, Oh J, Choi SH, Jung A, Lee JG, Lee YS, Kim JK. A Portable Smartphone-Based Laryngoscope System for High-Speed Vocal Cord Imaging of Patients With Throat Disorders: Instrument Validation Study. JMIR Mhealth Uhealth 2021; 9:e25816. [PMID: 34142978 PMCID: PMC8277344 DOI: 10.2196/25816] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2020] [Revised: 02/17/2021] [Accepted: 05/13/2021] [Indexed: 11/13/2022] Open
Abstract
Background Currently, high-speed digital imaging (HSDI), especially endoscopic HSDI, is routinely used for the diagnosis of vocal cord disorders. However, endoscopic HSDI devices are usually large and costly, which limits access to patients in underdeveloped countries and in regions with inadequate medical infrastructure. Modern smartphones have sufficient functionality to process the complex calculations that are required for processing high-resolution images and videos with a high frame rate. Recently, several attempts have been made to integrate medical endoscopes with smartphones to make them more accessible to people in underdeveloped countries. Objective This study aims to develop a smartphone adaptor for endoscopes, which enables smartphone-based vocal cord imaging, to demonstrate the feasibility of performing high-speed vocal cord imaging via the high-speed imaging functions of a high-performance smartphone camera, and to determine the acceptability of the smartphone-based high-speed vocal cord imaging system for clinical applications in developing countries. Methods A customized smartphone adaptor optical relay was designed for clinical endoscopy using selective laser melting–based 3D printing. A standard laryngoscope was attached to the smartphone adaptor to acquire high-speed vocal cord endoscopic images. Only existing basic functions of the smartphone camera were used for HSDI of the vocal cords. Extracted still frames were observed for qualitative glottal volume and shape. For image processing, segmented glottal and vocal cord areas were calculated from whole HSDI frames to characterize the amplitude of the vibrations on each side of the glottis, including the frequency, edge length, glottal areas, base cord, and lateral phase differences over the acquisition time. The device was incorporated into a preclinical videokymography diagnosis routine to compare functionality. Results Smartphone-based HSDI with the smartphone-endoscope adaptor could achieve 940 frames per second and a resolution of 1280 by 720 frames, which corresponds to the detection of 3 to 8 frames per vocal cycle at double the spatial resolution of existing devices. The device was used to image the vocal cords of 4 volunteers: 1 healthy individual and 3 patients with vocal cord paralysis, chronic laryngitis, or vocal cord polyps. The resultant image stacks were sufficient for most diagnostic purposes. The cost of the device including the smartphone was lower than that of existing HSDI devices. The image processing and analytics demonstrated the successful calculation of relevant diagnostic variables from the acquired images. Patients with vocal pathologies were easily differentiable in the quantitative data. Conclusions A smartphone-based HSDI endoscope system can function as a point-of-care clinical diagnostic device. The resulting analysis is of higher quality than that accessible by videostroboscopy and promises comparable quality and greater accessibility than HSDI. In particular, this system is suitable for use as an accessible diagnostic tool in underdeveloped areas with inadequate medical service infrastructure.
Collapse
Affiliation(s)
- Youngkyu Kim
- Biomedical Engineering Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea.,Department of Convergence Medicine, College of Medicine, University of Ulsan, Seoul, Republic of Korea
| | - Jeongmin Oh
- Biomedical Engineering Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea
| | - Seung-Ho Choi
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, Seoul, Republic of Korea
| | - Ahra Jung
- Department of Otorhinolaryngology-Head and Neck Surgery, Eulji Medical Center, Eulji University School of Medicine, Seoul, Republic of Korea
| | - June-Goo Lee
- Biomedical Engineering Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea.,Department of Convergence Medicine, College of Medicine, University of Ulsan, Seoul, Republic of Korea
| | - Yoon Se Lee
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, Seoul, Republic of Korea
| | - Jun Ki Kim
- Biomedical Engineering Research Center, Asan Institute for Life Sciences, Asan Medical Center, Seoul, Republic of Korea.,Department of Convergence Medicine, College of Medicine, University of Ulsan, Seoul, Republic of Korea
| |
Collapse
|
19
|
Echternach M, Herbst CT, Köberlein M, Story B, Döllinger M, Gellrich D. Are source-filter interactions detectable in classical singing during vowel glides? THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:4565. [PMID: 34241428 DOI: 10.1121/10.0005432] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/04/2020] [Accepted: 06/03/2021] [Indexed: 06/13/2023]
Abstract
In recent studies, it has been assumed that vocal tract formants (Fn) and the voice source could interact. However, there are only few studies analyzing this assumption in vivo. Here, the vowel transition /i/-/a/-/u/-/i/ of 12 professional classical singers (6 females, 6 males) when phonating on the pitch D4 [fundamental frequency (ƒo) ca. 294 Hz] were analyzed using transnasal high speed videoendoscopy (20.000 fps), electroglottography (EGG), and audio recordings. Fn data were calculated using a cepstral method. Source-filter interaction candidates (SFICs) were determined by (a) algorithmic detection of major intersections of Fn/nƒo and (b) perceptual assessment of the EGG signal. Although the open quotient showed some increase for the /i-a/ and /u-i/ transitions, there were no clear effects at the expected Fn/nƒo intersections. In contrast, ƒo adjustments and changes in the phonovibrogram occurred at perceptually derived SFICs, suggesting level-two interactions. In some cases, these were constituted by intersections between higher nƒo and Fn. The presented data partially corroborates that vowel transitions may result in level-two interactions also in professional singers. However, the lack of systematically detectable effects suggests either the absence of a strong interaction or existence of confounding factors, which may potentially counterbalance the level-two-interactions.
Collapse
Affiliation(s)
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
| | - Christian T Herbst
- Antonio Salieri Department of Vocal Studies and Vocal Research in Music Education, University of Music and Performing Arts Vienna, Vienna, Austria
| | - Marie Köberlein
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
| | - Brad Story
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School Waldstrasse 1, Erlangen, 91054, Germany
| | - Donata Gellrich
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
| |
Collapse
|
20
|
Schlegel P, Kist AM, Kunduk M, Dürr S, Döllinger M, Schützenberger A. Interdependencies between acoustic and high-speed videoendoscopy parameters. PLoS One 2021; 16:e0246136. [PMID: 33529244 PMCID: PMC7853476 DOI: 10.1371/journal.pone.0246136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2020] [Accepted: 01/13/2021] [Indexed: 02/06/2023] Open
Abstract
In voice research, uncovering relations between the oscillating vocal folds, being the sound source of phonation, and the resulting perceived acoustic signal are of great interest. This is especially the case in the context of voice disorders, such as functional dysphonia (FD). We investigated 250 high-speed videoendoscopy (HSV) recordings with simultaneously recorded acoustic signals (124 healthy females, 60 FD females, 44 healthy males, 22 FD males). 35 glottal area waveform (GAW) parameters and 14 acoustic parameters were calculated for each recording. Linear and non-linear relations between GAW and acoustic parameters were investigated using Pearson correlation coefficients (PCC) and distance correlation coefficients (DCC). Further, norm values for parameters obtained from 250 ms long sustained phonation data (vowel /i/) were provided. 26 PCCs in females (5.3%) and 8 in males (1.6%) were found to be statistically significant (|corr.| ≥ 0.3). Only minor differences were found between PCCs and DCCs, indicating presence of weak non-linear dependencies between parameters. Fundamental frequency was involved in the majority of all relevant PCCs between GAW and acoustic parameters (19 in females and 7 in males). The most distinct difference between correlations in females and males was found for the parameter Period Variability Index. The study shows only weak relations between investigated acoustic and GAW-parameters. This indicates that the reduction of the complex 3D glottal dynamics to the 1D-GAW may erase laryngeal dynamic characteristics that are reflected within the acoustic signal. Hence, other GAW parameters, 2D-, 3D-laryngeal dynamics and vocal tract parameters should be further investigated towards potential correlations to the acoustic signal.
Collapse
Affiliation(s)
- Patrick Schlegel
- Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California, United States of America
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- * E-mail:
| | - Andreas M. Kist
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Melda Kunduk
- Dep. of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, United States of America
| | - Stephan Dürr
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Michael Döllinger
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Anne Schützenberger
- Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| |
Collapse
|
21
|
Hsu CM, Yang MY, Fang TJ, Wu CY, Tsai YT, Chang GH, Tsai MS. Maximum and Minimum Phonatory Glottal Area before and after Treatment for Vocal Nodules. Healthcare (Basel) 2020; 8:healthcare8030326. [PMID: 32906704 PMCID: PMC7551475 DOI: 10.3390/healthcare8030326] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2020] [Revised: 08/29/2020] [Accepted: 09/04/2020] [Indexed: 11/30/2022] Open
Abstract
Background: Vocal fold nodules (VFNs) are a challenge for otolaryngologists. Glottal area (GA) waveform analysis is an examination method used for assessing vocal fold vibration and function. However, GA in patients with VFNs has rarely been studied. This study investigated the maximum and minimum GA in VFN patients using modern waveform analysis combining ImageJ software and videostroboscopy. Methods: This study enrolled 42 patients newly diagnosed with VFN, 15 of whom received voice therapy and 27 of whom underwent surgery. Acoustic parameters and maximum phonation time (MPT) were recorded, and patients completed the Chinese Voice Handicap Index-10 (VHI-C10) before and after treatment. After videostroboscopy examination, the maximum and minimum GAs were calculated using ImageJ software. The GAs of patients with VFNs before and after surgery or voice therapy were analyzed. Results: The MPTs of the patients before and after voice therapy or surgery did not change significantly. VHI-C10 scores decreased after voice therapy but the decrease was nonsignificant (14.0 ± 8.44 vs. 9.40 ± 10.24, p = 0.222); VHI-C10 scores were significantly decreased after surgery (22.53 ± 7.17 vs. 12.75 ± 9.84, p = 0.038). Voice therapy significantly increased the maximum GA (5.58 ± 2.41 vs. 8.65 ± 3.17, p = 0.012) and nonsignificantly decreased the minimum GA (0.60 ± 0.73 vs. 0.21 ± 0.46, p = 0.098). Surgery nonsignificantly increased the maximum GA (6.34 ± 3.82 vs. 8.73 ± 5.57, p = 0.118) and significantly decreased the minimum GA (0.30 ± 0.59 vs. 0.00 ± 0.00, p = 0.036). Conclusion: This study investigated the GA of patients with VFNs who received voice therapy or surgery. The findings indicated that voice therapy significantly increased maximum GA and surgery significantly decreased minimum GA. GA analysis could be applied to evaluate the efficacy of voice therapy, and it may help physicians to develop precise treatment for VFN patients (either by optimizing voice therapy or by performing surgery directly).
Collapse
Affiliation(s)
- Cheng-Ming Hsu
- Department of Otolaryngology–Head and Neck Surgery, Chiayi Chang Gung Memorial Hospital, Chiayi 613, Taiwan; (C.-M.H.); (Y.-T.T.); (G.-H.C.)
- Faculty of Medicine, College of Medicine, Chang Gung University, Taoyuan 333, Taiwan;
| | - Ming-Yu Yang
- Department of Otolaryngology, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung 833, Taiwan;
- Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan 333, Taiwan
| | - Tuan-Jen Fang
- Faculty of Medicine, College of Medicine, Chang Gung University, Taoyuan 333, Taiwan;
- Department of Otolaryngology–Head and Neck Surgery, Linkou Chang Gung Memorial Hospital, Taoyuan 333, Taiwan
| | - Ching-Yuan Wu
- Department of Traditional Chinese Medicine, Chiayi Chang Gung Memorial Hospital, Chiayi 613, Taiwan;
- School of Traditional Chinese Medicine, College of Medicine, Chang Gung University, Taoyuan 333, Taiwan
| | - Yao-Te Tsai
- Department of Otolaryngology–Head and Neck Surgery, Chiayi Chang Gung Memorial Hospital, Chiayi 613, Taiwan; (C.-M.H.); (Y.-T.T.); (G.-H.C.)
| | - Geng-He Chang
- Department of Otolaryngology–Head and Neck Surgery, Chiayi Chang Gung Memorial Hospital, Chiayi 613, Taiwan; (C.-M.H.); (Y.-T.T.); (G.-H.C.)
| | - Ming-Shao Tsai
- Department of Otolaryngology–Head and Neck Surgery, Chiayi Chang Gung Memorial Hospital, Chiayi 613, Taiwan; (C.-M.H.); (Y.-T.T.); (G.-H.C.)
- Faculty of Medicine, College of Medicine, Chang Gung University, Taoyuan 333, Taiwan;
- Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan 333, Taiwan
- Correspondence: ; Tel.: +886-53621000 (ext. 2076); Fax: +886-53623002
| |
Collapse
|