1
Kavak ÖT, Gündüz Ş, Vural C, Enver N. Artificial intelligence based diagnosis of sulcus: assesment of videostroboscopy via deep learning. Eur Arch Otorhinolaryngol 2024. PMID: 39001913; DOI: 10.1007/s00405-024-08801-y.
Abstract
PURPOSE To develop a convolutional neural network (CNN)-based model for classifying videostroboscopic images of patients with sulcus, benign vocal fold (VF) lesions, and healthy VFs to improve clinicians' accuracy in diagnosing sulcus during videostroboscopy. MATERIALS AND METHODS Videostroboscopies of 433 individuals who were diagnosed with sulcus (91), who were diagnosed with benign VF diseases (i.e., polyp, nodule, papilloma, cyst, or pseudocyst [311]), or who were healthy (33) were analyzed. After extracting 91,159 frames from the videostroboscopies, a CNN-based model was created and tested. The healthy and sulcus groups underwent binary classification. In the second phase of the study, benign VF lesions were added to the training set, and multiclassification was executed across all groups. The proposed CNN-based model's results were compared with five laryngology experts' assessments. RESULTS In the binary classification phase, the CNN-based model achieved 98% accuracy, 98% recall, 97% precision, and a 97% F1 score for classifying sulcus and healthy VFs. During the multiclassification phase, when evaluated on a subset of frames encompassing all included groups, the CNN-based model demonstrated greater accuracy than the five laryngologists (76% versus 72%, 68%, 72%, 63%, and 72%). CONCLUSION A CNN-based model can serve as a significant aid in the diagnosis of sulcus, a VF disease that presents notable challenges in the diagnostic process. Further research could assess the practicality of implementing this approach in real time in clinical practice.
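The frame-level metrics this entry reports (accuracy, recall, precision, F1) all derive from the binary confusion matrix; a minimal sketch of that computation (the label encoding 1 = sulcus, 0 = healthy is illustrative, not from the paper):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, recall, precision, and F1 for a binary task
    (1 = positive class, e.g. sulcus; 0 = negative, e.g. healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, recall, precision, f1
```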
Affiliation(s)
- Ömer Tarık Kavak
- Department of Otorhinolaryngology, Marmara University Faculty of Medicine, Pendik Training and Research Hospital, Fevzi Çakmak Muhsin Yazıcıoğlu Street, İstanbul, 34899, Turkey.
- Şevket Gündüz
- VRLab Academy, 32 Willoughby Rd, Harringay Ladder, London, N8 0JG, UK
- Cabir Vural
- Marmara University Faculty of Engineering, Electrical and Electronics Engineering, Başıbüyük, RTE Campus, İstanbul, 34854, Turkey
- Necati Enver
- Department of Otorhinolaryngology, Marmara University Faculty of Medicine, Pendik Training and Research Hospital, Fevzi Çakmak Muhsin Yazıcıoğlu Street, İstanbul, 34899, Turkey
2
Wang CT, Chen TM, Lee NT, Fang SH. AI Detection of Glottic Neoplasm Using Voice Signals, Demographics, and Structured Medical Records. Laryngoscope 2024. PMID: 38864282; DOI: 10.1002/lary.31563.
Abstract
OBJECTIVE This study investigated whether artificial intelligence (AI) models combining voice signals, demographics, and structured medical records can detect glottic neoplasm from benign voice disorders. METHODS We used a primary dataset containing 2-3 s of vowel "ah", demographics, and 26 items of structured medical records (e.g., symptoms, comorbidity, smoking and alcohol consumption, vocal demand) from 60 patients with pathology-proved glottic neoplasm (i.e., squamous cell carcinoma, carcinoma in situ, and dysplasia) and 1940 patients with benign voice disorders. The validation dataset comprised data from 23 patients with glottic neoplasm and 1331 patients with benign disorders. The AI model combined convolutional neural networks, gated recurrent units, and attention layers. We used 10-fold cross-validation (training-validation-testing: 8-1-1) and preserved the percentage between neoplasm and benign disorders in each fold. RESULTS Results from the AI model using voice signals reached an area under the ROC curve (AUC) value of 0.631, and additional demographics increased this to 0.807. The highest AUC of 0.878 was achieved when combining voice, demographics, and medical records (sensitivity: 0.783, specificity: 0.816, accuracy: 0.815). External validation yielded an AUC value of 0.785 (voice plus demographics; sensitivity: 0.739, specificity: 0.745, accuracy: 0.745). Subanalysis showed that AI had higher sensitivity but lower specificity than human assessment (p < 0.01). The accuracy of AI detection with additional medical records was comparable with human assessment (82% vs. 83%, p = 0.78). CONCLUSIONS Voice signal alone was insufficient for AI differentiation between glottic neoplasm and benign voice disorders, but additional demographics and medical records notably improved AI performance and approximated the prediction accuracy of humans. LEVEL OF EVIDENCE NA Laryngoscope, 2024.
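The 10-fold scheme described above, which preserves the neoplasm/benign percentage in each fold, is stratified cross-validation; a minimal sketch of the fold-assignment idea (pure Python; `k` and the label list are illustrative, not the study's data):

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds so every fold keeps
    roughly the same class proportions as the full dataset (the core
    idea behind stratified k-fold cross-validation)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # deal each class's samples round-robin across the folds
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds
```

With a class-imbalanced set (e.g., 20 benign vs. 10 neoplasm samples and k = 10), every fold receives two of the majority class and one of the minority class, mirroring the 2:1 overall ratio.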
Affiliation(s)
- Chi-Te Wang
- Department of Otolaryngology Head and Neck Surgery, Far Eastern Memorial Hospital, Taipei, Taiwan
- Center of Artificial Intelligence, Far Eastern Memorial Hospital, Taipei, Taiwan
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
- Tsai-Min Chen
- Graduate Program of Data Science, National Taiwan University and Academia Sinica, Taipei, Taiwan
- Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
- Nien-Ting Lee
- Center of Artificial Intelligence, Far Eastern Memorial Hospital, Taipei, Taiwan
- Shih-Hau Fang
- Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
- Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan
3
Barlow J, Sragi Z, Rivera-Rivera G, Al-Awady A, Daşdöğen Ü, Courey MS, Kirke DN. The Use of Deep Learning Software in the Detection of Voice Disorders: A Systematic Review. Otolaryngol Head Neck Surg 2024; 170:1531-1543. PMID: 38168017; DOI: 10.1002/ohn.636.
Abstract
OBJECTIVE To summarize the use of deep learning in the detection of voice disorders using acoustic and laryngoscopic input, compare specific neural networks in terms of accuracy, and assess their effectiveness compared to expert clinical visual examination. DATA SOURCES Embase, MEDLINE, and Cochrane Central. REVIEW METHODS Databases were screened through November 11, 2023 for relevant studies. The inclusion criteria required studies to utilize a specified deep learning method, use laryngoscopy or acoustic input, and measure accuracy of binary classification between healthy patients and those with voice disorders. RESULTS Thirty-four studies met the inclusion criteria, with 18 focusing on voice analysis, 15 on imaging analysis, and 1 on both. Across the 18 acoustic studies, 21 programs were used for identification of organic and functional voice disorders. These technologies included 10 convolutional neural networks (CNNs), 6 multilayer perceptrons (MLPs), and 5 other neural networks. The binary classification systems yielded a mean accuracy of 89.0% overall, including 93.7% for MLP programs and 84.5% for CNNs. Among the 15 imaging analysis studies, a total of 23 programs were utilized, resulting in a mean accuracy of 91.3%. Specifically, the 20 CNNs achieved a mean accuracy of 92.6% compared to 83.0% for the 3 MLPs. CONCLUSION Deep learning models were shown to be highly accurate in the detection of voice pathology, with CNNs most effective for assessing laryngoscopy images and MLPs most effective for assessing acoustic input. While deep learning methods outperformed expert clinical exam in limited comparisons, further studies integrating external validation are necessary.
Affiliation(s)
- Joshua Barlow
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Zara Sragi
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Gabriel Rivera-Rivera
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Abdurrahman Al-Awady
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Ümit Daşdöğen
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Mark S Courey
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Diana N Kirke
- Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
4
Mamidi IS, Dunham ME, Adkins LK, McWhorter AJ, Fang Z, Banh BT. Laryngeal Cancer Screening During Flexible Video Laryngoscopy Using Large Computer Vision Models. Ann Otol Rhinol Laryngol 2024. PMID: 38755974; DOI: 10.1177/00034894241253376.
Abstract
OBJECTIVE To develop an artificial intelligence-assisted computer vision model to screen for laryngeal cancer during flexible laryngoscopy. METHODS Using laryngeal images and flexible laryngoscopy video recordings, we developed computer vision models to classify video frames for usability and cancer screening. A separate model segments any identified lesions on the frames. We used these computer vision models to construct a video stream annotation system. This system classifies findings from flexible laryngoscopy as "potentially malignant" or "probably benign" and segments any detected lesions. Additionally, the model provides a confidence level for each classification. RESULTS The overall accuracy of the flexible laryngoscopy cancer screening model was 92%. For cancer screening, it achieved a sensitivity of 97.7% and a specificity of 76.9%. The segmentation model attained an average precision of 0.595 at a 0.50 intersection-over-union threshold. The confidence level for positive screening results can assist clinicians in counseling patients regarding the findings. CONCLUSION Our model is highly sensitive and adequately specific for laryngeal cancer screening. Segmentation helps endoscopists identify and describe potential lesions. Further optimization is required to enable the model's deployment in clinical settings for real-time annotation during flexible laryngoscopy.
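The "potentially malignant"/"probably benign" label plus a confidence level can be read off a softmax over the model's class scores; a minimal sketch (the class names follow the abstract, the logits and this particular readout are illustrative, not the study's implementation):

```python
import math

def screen(logits, classes=("probably benign", "potentially malignant")):
    """Convert raw model scores into a screening label plus a confidence
    value: the softmax probability of the chosen class."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]
```

Reporting the softmax probability alongside the label is one simple way a system like this could surface confidence for clinician counseling.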
Affiliation(s)
- Ishwarya S Mamidi
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Michael E Dunham
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Lacey K Adkins
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Andrew J McWhorter
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Zhide Fang
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Britney T Banh
- Our Lady of the Lake Voice Center, Our Lady of the Lake Regional Medical Center, Baton Rouge, LA, USA
5
Alter IL, Chan K, Lechien J, Rameau A. An introduction to machine learning and generative artificial intelligence for otolaryngologists-head and neck surgeons: a narrative review. Eur Arch Otorhinolaryngol 2024; 281:2723-2731. PMID: 38393353; DOI: 10.1007/s00405-024-08512-4.
Abstract
PURPOSE Despite the robust expansion of research surrounding artificial intelligence (AI) and machine learning (ML) and their applications to medicine, these methodologies often remain opaque and inaccessible to many otolaryngologists. In particular, with the increasing ubiquity of large language models (LLMs) such as ChatGPT and their potential implementation in clinical practice, clinicians may benefit from a baseline understanding of some aspects of AI. In this narrative review, we seek to clarify underlying concepts, illustrate applications to otolaryngology, and highlight future directions and limitations of these tools. METHODS Recent literature regarding AI principles and otolaryngologic applications of ML and LLMs was reviewed via searches in PubMed and Google Scholar. RESULTS Significant recent strides have been made in otolaryngology research utilizing AI and ML across all subspecialties, including neurotology, head and neck oncology, laryngology, rhinology, and sleep surgery. Potential applications suggested by recent publications include screening and diagnosis, predictive tools, clinical decision support, and clinical workflow improvement via LLMs. Ongoing concerns regarding AI in medicine include ethical concerns around bias and data sharing, as well as the "black box" problem and limitations in explainability. CONCLUSIONS Potential implementations of AI in otolaryngology are rapidly expanding. While implementation in clinical practice remains theoretical for most of these tools, their potential power to influence the practice of otolaryngology is substantial. LEVEL OF EVIDENCE 4
Affiliation(s)
- Isaac L Alter
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, New York, NY, 10022, USA
- Karly Chan
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, New York, NY, 10022, USA
- Jérôme Lechien
- Department of Otorhinolaryngology, Head and Neck Surgery, Hôpital Foch, School of Medicine, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France
- Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health and Sciences Technology, University of Mons (UMons), Mons, Belgium
- Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, New York, NY, 10022, USA
6
Evangelista E, Kale R, McCutcheon D, Rameau A, Gelbard A, Powell M, Johns M, Law A, Song P, Naunheim M, Watts S, Bryson PC, Crowson MG, Pinto J, Bensoussan Y. Current Practices in Voice Data Collection and Limitations to Voice AI Research: A National Survey. Laryngoscope 2024; 134:1333-1339. PMID: 38087983; DOI: 10.1002/lary.31052.
Abstract
INTRODUCTION The accuracy and validity of voice AI algorithms rely on substantial quantities of quality voice data. Although considerable amounts of voice data are captured daily in voice centers across North America, there is no standardized protocol for acoustic data management, which limits the usability of these datasets for voice artificial intelligence (AI) research. OBJECTIVE The aim was to capture current practices of voice data collection, storage, analysis, and perceived limitations to collaborative voice research. METHODS A 30-question online survey was developed with expert guidance from members of voicecollab.ai, an international collaborative of voice AI researchers. The survey was disseminated via REDCap to an estimated 200 practitioners at North American voice centers. Survey questions assessed respondents' current practices in terms of acoustic data collection, storage, and retrieval, as well as limitations to collaborative voice research. RESULTS Seventy-two respondents completed the survey, of which 81.7% were laryngologists and 18.3% were speech language pathologists (SLPs). Eighteen percent of respondents reported seeing 40-60 and 55% reported seeing >60 patients with voice disorders weekly (a conservative estimate of over 4000 patients/week). Only 28% of respondents reported utilizing standardized protocols for collection and storage of acoustic data. Although 87% of respondents conduct voice research, only 38% report doing so on a multi-institutional level. Perceived limitations to conducting collaborative voice research include lack of standardized methodology for collection (30%) and lack of human resources to prepare and label voice data adequately (55%). CONCLUSION To conduct large-scale multi-institutional voice research with AI, there is a pertinent need for standardization of acoustic data management, as well as an infrastructure for secure and efficient data sharing. LEVEL OF EVIDENCE 5
Affiliation(s)
- Emily Evangelista
- University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A
- Rohan Kale
- Department of Biology, University of South Florida, Tampa, Florida, U.S.A
- Anais Rameau
- Department of Otolaryngology, Head and Neck Surgery, Weill Cornell Medical College, Ithaca, New York, U.S.A
- Alexander Gelbard
- Department of Otolaryngology, Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, U.S.A
- Maria Powell
- Department of Otolaryngology, Head and Neck Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, U.S.A
- Michael Johns
- Department of Otolaryngology-Head and Neck Surgery, Keck College of Medicine, University of Southern California, Los Angeles, California, U.S.A
- Anthony Law
- Department of Otolaryngology, Emory University School of Medicine, Atlanta, Georgia, U.S.A
- Phillip Song
- Massachusetts Eye and Ear, Division of Laryngology, Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, U.S.A
- Matthew Naunheim
- Massachusetts Eye and Ear, Division of Laryngology, Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, U.S.A
- Stephanie Watts
- Department of Otolaryngology, Head and Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A
- Paul C Bryson
- Department of Otolaryngology, Head and Neck Surgery, Cleveland Clinic, Cleveland, Ohio, U.S.A
- Matthew G Crowson
- Massachusetts Eye and Ear, Otolaryngology-Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts, U.S.A
- Jeremy Pinto
- Mila Quebec Artificial Intelligence Institute, Montreal, Quebec, Canada
- Yael Bensoussan
- Division of Laryngology, Department of Otolaryngology, Head and Neck Surgery, University of South Florida Morsani College of Medicine, Tampa, Florida, U.S.A
7
You Z, Han B, Shi Z, Zhao M, Du S, Yan J, Liu H, Hei X, Ren X, Yan Y. Vocal cord leukoplakia classification using deep learning models in white light and narrow band imaging endoscopy images. Head Neck 2023; 45:3129-3145. PMID: 37837264; DOI: 10.1002/hed.27543.
Abstract
BACKGROUND Accurate vocal cord leukoplakia classification is critical for individualized treatment and the early detection of laryngeal cancer. Numerous deep learning techniques have been proposed, but it is unclear how to select one for laryngeal tasks. This article introduces and reliably evaluates existing deep learning models for vocal cord leukoplakia classification. METHODS We created white light and narrow band imaging (NBI) image datasets of vocal cord leukoplakia classified into six classes: normal tissues (NT), inflammatory keratosis (IK), mild dysplasia (MiD), moderate dysplasia (MoD), severe dysplasia (SD), and squamous cell carcinoma (SCC). Vocal cord leukoplakia classification was performed using six classical deep learning models: AlexNet, VGG, Google Inception, ResNet, DenseNet, and Vision Transformer. RESULTS GoogLeNet (i.e., Google Inception V1), DenseNet-121, and ResNet-152 achieved excellent classification performance. The highest overall accuracy of white light image classification was 0.9583, while the highest overall accuracy of NBI image classification was 0.9478. These three neural networks all provided very high sensitivity, specificity, and precision values. CONCLUSION GoogLeNet, ResNet, and DenseNet can provide accurate pathological classification of vocal cord leukoplakia. This facilitates early diagnosis, supports the choice between conservative and surgical treatment for different degrees of disease, and reduces the burden on endoscopists.
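The per-class sensitivity, specificity, and precision values cited above come from a one-vs-rest reading of the six-class confusion matrix; a minimal sketch (the 2x2 matrix in the test is only a sanity check, not the study's data):

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as j.
    Returns (sensitivity, specificity, precision) per class using
    one-vs-rest counts derived from the confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    out = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                      # missed class-c samples
        fp = sum(cm[r][c] for r in range(n)) - tp  # wrongly called class c
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        out.append((sens, spec, prec))
    return out
```

The same function works unchanged for the six-class leukoplakia setting (NT, IK, MiD, MoD, SD, SCC): pass a 6x6 matrix.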
Affiliation(s)
- Zhenzhen You
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Botao Han
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Zhenghao Shi
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Minghua Zhao
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Shuangli Du
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Jing Yan
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Haiqin Liu
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Xinhong Hei
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Xiaoyong Ren
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Yan Yan
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
8
Bur AM, Zhang T, Chen X, Kavookjian H, Kraft S, Karadaghy O, Farrokhian N, Mussatto C, Penn J, Wang G. Interpretable Computer Vision to Detect and Classify Structural Laryngeal Lesions in Digital Flexible Laryngoscopic Images. Otolaryngol Head Neck Surg 2023; 169:1564-1572. PMID: 37350279; DOI: 10.1002/ohn.411.
Abstract
OBJECTIVE To localize structural laryngeal lesions within digital flexible laryngoscopic images and to classify them as benign or suspicious for malignancy using state-of-the-art computer vision detection models. STUDY DESIGN Cross-sectional diagnostic study. SETTING Tertiary care voice clinic. METHODS Digital stroboscopic videos and demographic and clinical data were collected from patients evaluated for a structural laryngeal lesion. Laryngoscopic images were extracted from the videos and manually labeled with bounding boxes encompassing the lesion. Four detection models were employed to simultaneously localize and classify structural laryngeal lesions in laryngoscopic images. Classification accuracy, intersection over union (IoU), and mean average precision (mAP) were evaluated as measures of classification, localization, and overall performance, respectively. RESULTS In total, 8,172 images from 147 patients were included in the laryngeal image dataset. Classification accuracy was 88.5% for individual laryngeal images and increased to 92.0% when all images belonging to the same sequence (video) were considered. Mean average precision across all four detection models was 50.1%, using an IoU threshold of 0.5 to determine successful localization. CONCLUSION Results of this study showed that deep neural network-based detection models trained on a labeled dataset of digital laryngeal images have the potential to classify structural laryngeal lesions as benign or suspicious for malignancy and to localize them within an image. This approach provides valuable insight into which part of the image the model used to determine a diagnosis, allowing clinicians to independently evaluate the models' predictions.
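Localization success in the study above is judged by intersection over union (IoU) against a 0.5 threshold; a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) corner form:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A predicted box then counts as a successful localization when `iou(pred, truth) >= 0.5`, the threshold used in the study.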
Affiliation(s)
- Andrés M Bur
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS, USA
- Tianxiao Zhang
- Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA
- Xiangyu Chen
- Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA
- Hannah Kavookjian
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS, USA
- Shannon Kraft
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS, USA
- Omar Karadaghy
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS, USA
- Nathan Farrokhian
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS, USA
- Joseph Penn
- University of Kansas School of Medicine, Kansas City, KS, USA
- Guanghui Wang
- Department of Computer Science, Toronto Metropolitan University, Toronto, ON, Canada
9
Wu Q, Wang X, Liang G, Luo X, Zhou M, Deng H, Zhang Y, Huang X, Yang Q. Advances in Image-Based Artificial Intelligence in Otorhinolaryngology-Head and Neck Surgery: A Systematic Review. Otolaryngol Head Neck Surg 2023; 169:1132-1142. PMID: 37288505; DOI: 10.1002/ohn.391.
Abstract
OBJECTIVE To update the literature and provide a systematic review of image-based artificial intelligence (AI) applications in otolaryngology, highlight its advances, and propose future challenges. DATA SOURCES Web of Science, Embase, PubMed, and Cochrane Library. REVIEW METHODS Studies written in English, published between January 2020 and December 2022. Two independent authors screened the search results, extracted data, and assessed studies. RESULTS Overall, 686 studies were identified. After screening titles and abstracts, 325 full-text studies were assessed for eligibility, and 78 studies were included in this systematic review. The studies originated from 16 countries; the top three were China (n = 29), Korea (n = 8), and the United States and Japan (n = 7 each). The most common area was otology (n = 35), followed by rhinology (n = 20), pharyngology (n = 18), and head and neck surgery (n = 5). The most common applications of AI in otology, rhinology, pharyngology, and head and neck surgery were chronic otitis media (n = 9), nasal polyps (n = 4), laryngeal cancer (n = 12), and head and neck squamous cell carcinoma (n = 3), respectively. The overall performance of AI in accuracy, area under the curve, sensitivity, and specificity was 88.39 ± 9.78%, 91.91 ± 6.70%, 86.93 ± 11.59%, and 88.62 ± 14.03%, respectively. CONCLUSION This state-of-the-art review highlights the increasing applications of image-based AI in otorhinolaryngology-head and neck surgery. Next steps will entail multicentre collaboration to ensure data reliability, ongoing optimization of AI algorithms, and integration into real-world clinical practice. Future studies should consider 3-dimensional (3D)-based AI, such as 3D surgical AI.
Affiliation(s)
- Qingwu Wu
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Xinyue Wang
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Guixian Liang
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Xin Luo
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Min Zhou
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Huiyi Deng
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Yana Zhang
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Xuekun Huang
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Qintai Yang
- Department of Otorhinolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Department of Allergy, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
10
Korn GP, Gama ACC, Nascimento UND. Visual-perceptive assessment of glottic characteristics of vocal nodules by means of high-speed videoendoscopy. Braz J Otorhinolaryngol 2023; 89:101275. PMID: 37271116; PMCID: PMC10250930; DOI: 10.1016/j.bjorl.2023.05.002.
Abstract
OBJECTIVE Visual-perceptive assessment of glottic characteristics of vocal nodules by means of high-speed videoendoscopy. METHODS Descriptive observational research with a convenience sample of five laryngeal videos of women with an average age of 25 years. The diagnosis of vocal nodules was established by two otolaryngologists, with 100% intra-rater agreement and 53.40% inter-rater agreement, and five otolaryngologists acting as judges assessed the laryngeal videos based on an adapted protocol. The statistical analysis calculated measures of central tendency and dispersion, as well as percentages. The AC1 coefficient was used for agreement analysis. RESULTS In high-speed videoendoscopy imaging, vocal nodules are characterized by amplitude of the mucosal wave and muco-undulatory movement with magnitude between 50% and 60%. Non-vibrating segments of the vocal folds are scarce, and the glottal cycle does not show a predominant phase; it is symmetric and periodic. Glottal closure is characterized by the presence of a mid-posterior triangular chink (double chink or isolated mid-posterior triangular chink), without movement of supraglottic laryngeal structures, with irregular contour of the free edge of the vocal folds, which are vertically on-plane. CONCLUSION Vocal nodules present a mid-posterior triangular chink and irregular free-edge contour. Amplitude and mucosal wave were partially reduced. LEVEL OF EVIDENCE Level 4 (case series).
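The agreement analysis above relies on Gwet's AC1 coefficient; a minimal two-rater sketch, assuming the standard AC1 definition (observed agreement corrected by a prevalence-based chance term; the rating lists in the test are illustrative, not the study's data):

```python
def gwet_ac1(r1, r2):
    """Gwet's AC1 for two raters scoring the same items.
    pa = observed agreement; pe = chance agreement built from the
    average category prevalence pi_q: pe = sum_q pi_q*(1 - pi_q)/(Q - 1).
    AC1 = (pa - pe) / (1 - pe)."""
    items = list(zip(r1, r2))
    n = len(items)
    categories = sorted(set(r1) | set(r2))
    q = len(categories)
    pa = sum(1 for a, b in items if a == b) / n
    pe = 0.0
    for c in categories:
        pi = (r1.count(c) + r2.count(c)) / (2 * n)  # mean prevalence of c
        pe += pi * (1 - pi) / (q - 1)
    return (pa - pe) / (1 - pe)
```

Unlike Cohen's kappa, AC1 stays stable when one category dominates, which is why it is often preferred for perceptual rating studies with skewed prevalence.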
11. A Novel Framework of Manifold Learning Cascade-Clustering for the Informative Frame Selection. Diagnostics (Basel) 2023; 13:1151. PMID: 36980459; PMCID: PMC10047422; DOI: 10.3390/diagnostics13061151.
Abstract
Narrow band imaging is an established non-invasive tool for the early detection of laryngeal cancer in surveillance examinations. Many of the frames produced during an examination are uninformative: blurred, dominated by specular reflection, or underexposed. Removing the uninformative frames is vital to improve detection accuracy and speed up computer-aided diagnosis, yet manually inspecting recordings for informative frames costs the physician considerable time. This issue is commonly addressed by a classifier trained on task-specific categories of uninformative frames; however, the definition of the uninformative categories is ambiguous, and tedious labeling still cannot be avoided. Here, we show that a novel unsupervised scheme is comparable to the current benchmarks on the NBI-InfFrames dataset. We extract feature embeddings using a vanilla neural network (VGG16) and apply the dimensionality reduction method UMAP, which separates the feature embeddings in the lower-dimensional space. Combined with the proposed automatic cluster-labeling algorithm and a cost function for Bayesian optimization, the method coupled with UMAP achieves state-of-the-art performance, outperforming the baseline by 12% absolute. The overall median recall of the proposed method is currently the highest, at 96%. Our results demonstrate the effectiveness of the proposed scheme and its robustness in detecting informative frames, and suggest that patterns embedded in the data can help develop flexible algorithms that do not require manual labeling.
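The unsupervised pipeline this abstract describes (CNN feature embeddings, dimensionality reduction, then automatic cluster labeling) can be illustrated with stand-ins: synthetic 2-D embeddings replace VGG16 features, and a tiny k-means replaces the UMAP-plus-Bayesian-optimization stage. This is a hypothetical sketch of the computation's shape, not the authors' implementation:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means clustering for 2-D points; returns a label per point."""
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
        # Update: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels

# Hypothetical 2-D embeddings: informative frames cluster near (0, 0),
# uninformative (blurred/underexposed) frames cluster near (5, 5).
gen = random.Random(1)
frames = [(gen.gauss(0, 0.3), gen.gauss(0, 0.3)) for _ in range(20)]
frames += [(gen.gauss(5, 0.3), gen.gauss(5, 0.3)) for _ in range(20)]
labels = kmeans(frames, k=2)
```

In practice the embeddings would come from a pretrained CNN and the reduction from umap-learn; the point is that once frame quality separates in embedding space, no manual labels are needed to split informative from uninformative frames.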
12. Bensoussan Y, Vanstrum EB, Johns MM, Rameau A. Artificial Intelligence and Laryngeal Cancer: From Screening to Prognosis: A State of the Art Review. Otolaryngol Head Neck Surg 2023; 168:319-329. PMID: 35787073; DOI: 10.1177/01945998221110839.
Abstract
OBJECTIVE This state of the art review aims to examine contemporary advances in applications of artificial intelligence (AI) to the screening, detection, management, and prognostication of laryngeal cancer (LC). DATA SOURCES Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and IEEE. REVIEW METHODS A structured review of the current literature (up to January 2022) was performed. Search terms related to topics of AI in LC were identified and queried by 2 independent reviewers. Citations of selected studies and review articles were also evaluated to ensure comprehensiveness. CONCLUSIONS AI applications in LC have encompassed a variety of data modalities, including radiomics, genomics, acoustics, clinical data, and videomics, to support screening, diagnosis, therapeutic decision making, and prognosis. However, most studies remain at the proof-of-concept level, as AI algorithms are trained on single-institution databases with limited data sets and a single data modality. IMPLICATIONS FOR PRACTICE AI algorithms in LC will need to be trained on large multi-institutional data sets and integrate multimodal data for optimal performance and clinical utility from screening to prognosis. Of the data types reviewed, genomics has the most potential to provide generalizable models, thanks to available large multi-institutional open-access genomic data sets. Voice acoustic data represent an inexpensive and accurate biomarker that is easy and noninvasive to capture, offering a unique opportunity for screening and monitoring of LC, especially in low-resource settings.
Affiliation(s)
- Yael Bensoussan: Department of Otolaryngology-Head and Neck Surgery, University of South Florida, Tampa, Florida, USA
- Erik B Vanstrum: Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Michael M Johns: Department of Otolaryngology-Head and Neck Surgery, University of Southern California, Los Angeles, California, USA
- Anaïs Rameau: Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
13. Zhu JQ, Wang ML, Li Y, Zhang W, Li LJ, Liu L, Zhang Y, Han CJ, Tie CW, Wang SX, Wang GQ, Ni XG. Convolutional neural network based anatomical site identification for laryngoscopy quality control: A multicenter study. Am J Otolaryngol 2023; 44:103695. PMID: 36473265; DOI: 10.1016/j.amjoto.2022.103695.
Abstract
OBJECTIVES Video laryngoscopy is an important diagnostic tool for head and neck cancers, and artificial intelligence (AI) systems have been shown to monitor blind spots during esophagogastroduodenoscopy. This study aimed to test the performance of an AI-driven intelligent laryngoscopy monitoring assistant (ILMA) for identifying landmark anatomical sites on laryngoscopic images and videos, based on a convolutional neural network (CNN). MATERIALS AND METHODS Laryngoscopic images taken from January to December 2018 were retrospectively collected, and ILMA was developed using an Inception-ResNet-v2 + Squeeze-and-Excitation Networks (SENet) CNN model. A total of 16,000 laryngoscopic images were used for training, assigned to 20 landmark anatomical sites covering six major head and neck regions. In addition, the performance of ILMA in identifying anatomical sites was validated using 4000 laryngoscopic images and 25 videos provided by five other tertiary hospitals. RESULTS ILMA identified the 20 anatomical sites on the laryngoscopic images with a total accuracy of 97.60%, and the average sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 100%, 99.87%, 97.65%, and 99.87%, respectively. In addition, multicenter clinical verification showed that the accuracy of ILMA in identifying the 20 targeted anatomical sites in 25 laryngoscopic videos from five hospitals was ≥95%. CONCLUSION The proposed CNN-based ILMA model can rapidly and accurately identify anatomical sites on laryngoscopic images. The model can reflect the coverage of head and neck anatomical regions by laryngoscopy, showing potential for improving the quality of laryngoscopy.
Affiliation(s)
- Ji-Qing Zhu: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Mei-Ling Wang: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Ying Li: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Wei Zhang: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Li-Juan Li: Department of Otorhinolaryngology, The People's Hospital of Wenshan Prefecture, Wenshan, Yunnan, China
- Lin Liu: Department of Otolaryngology-Head and Neck Surgery, Dalian Municipal Friendship Hospital, Dalian, Liaoning, China
- Yan Zhang: Department of Otorhinolaryngology, Chongqing Traditional Chinese Medicine Hospital, Chongqing, China
- Cai-Juan Han: Department of Otolaryngology-Head and Neck Surgery, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, Qingdao, Shandong, China
- Cheng-Wei Tie: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Shi-Xu Wang: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Gui-Qi Wang: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiao-Guang Ni: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
14. Arias-Vergara T, Döllinger M, Schraut T, Mohd Khairuddin KA, Schützenberger A. Nyquist Plot Parametrization for Quantitative Analysis of Vibration of the Vocal Folds. J Voice 2023:S0892-1997(23)00014-0. PMID: 36774264; DOI: 10.1016/j.jvoice.2023.01.014.
Abstract
OBJECTIVES The Nyquist plot provides a graphical representation of glottal cycles as elliptical trajectories in a 2D plane. This study proposes a methodology for parameterizing the Nyquist plot to support the quantitative analysis of voice disorders. METHODS We considered high-speed videoendoscopy recordings of 33 functional dysphonia (FD) patients and 33 normophonic controls (NC). Quantitative analysis was performed by computing four shape-based parameters from the Nyquist plot: Variability, Size (Perimeter and Area), and Consistency. Additionally, we performed automatic classification using a linear support vector machine and feature importance analysis by combining the proposed features with state-of-the-art glottal area waveform (GAW) parameters. RESULTS We found that the inter-cycle variability was significantly higher in FD patients compared to NC. We achieved a classification accuracy of 83% when the top 30 most important features were used, and the proposed Nyquist plot features ranked among the top 12 most important features. CONCLUSIONS The Nyquist plot provides complementary information for subjective and objective assessment of voice disorders. On the one hand, visual inspection reveals intra- and inter-glottal-cycle irregularities during sustained phonation; on the other, the shape-based parameters quantify such irregularities and complement state-of-the-art GAW parameters.
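Size parameters such as Perimeter and Area can be approximated from any sampled closed 2-D trajectory. A generic sketch using the shoelace formula on an idealized elliptical trajectory; the study's exact parameter definitions may differ:

```python
import math

def polygon_area(xs, ys):
    """Shoelace formula for the area enclosed by a closed 2-D trajectory."""
    n = len(xs)
    s = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))
    return abs(s) / 2.0

def polygon_perimeter(xs, ys):
    """Sum of segment lengths around the closed trajectory."""
    n = len(xs)
    return sum(
        math.hypot(xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i])
        for i in range(n)
    )

# Hypothetical glottal-cycle trajectory: an ellipse with semi-axes 2 and 1,
# sampled at 360 points (a stand-in for one cycle of the Nyquist plot).
ts = [2 * math.pi * i / 360 for i in range(360)]
xs = [2 * math.cos(t) for t in ts]
ys = [math.sin(t) for t in ts]
area = polygon_area(xs, ys)
perimeter = polygon_perimeter(xs, ys)
```

For this ellipse the area converges to pi * 2 * 1 and the perimeter to about 9.69 (Ramanujan's approximation), so the discretized estimates can be sanity-checked against closed forms.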
Affiliation(s)
- Tomás Arias-Vergara: University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
- Michael Döllinger: University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
- Tobias Schraut: University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
- Anne Schützenberger: University Hospital Erlangen, Medical School Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
15. Lechien JR, Rameau A, De Marrez LG, Le Bosse G, Negro K, Sebestyen A, Baudouin R, Saussez S, Hans S. Usefulness, acceptation and feasibility of electronic medical history tool in reflux disease. Eur Arch Otorhinolaryngol 2023; 280:259-267. PMID: 35763082; DOI: 10.1007/s00405-022-07520-6.
Abstract
OBJECTIVES To investigate the usefulness, feasibility, and patient satisfaction of an electronic pre-consultation medical history tool (EPMH) in the laryngopharyngeal reflux (LPR) work-up. METHODS Seventy-five patients with LPR were invited to complete an electronic medical history assessment prior to the laryngology consultation. The EPMH collected the following parameters: demographic and epidemiological data, medication, medical and surgical histories, diet habits, stress, and symptom findings. Stress and symptoms were assessed with the perceived stress scale and the reflux symptom score. Duration of consultation and patient acceptance and satisfaction (feasibility, usefulness, effectiveness, understanding of questions) were evaluated through a 9-item patient-reported outcome questionnaire. RESULTS Seventy patients completed the evaluation (93% participation rate). The mean age of the cohort was 51.2 ± 15.6 years; there were 35 females and 35 males. The patients who refused to participate (N = 5) were > 65 years old. Consultation duration was significantly lower in patients who used the EPMH (11.3 ± 2.7 min) compared with a control group (18.1 ± 5.1 min; p = 0.001). Ninety percent of patients were satisfied with the ease of use and usefulness of the EPMH, while 97.1% thought that the EPMH may improve disease management. Patients would recommend a similar approach for otolaryngological or other specialty consultations in 98.6% and 92.8% of cases, respectively. CONCLUSION The use of the EPMH is associated with adequate usefulness, feasibility, and satisfaction outcomes in patients with LPR. This software is a preliminary step in the development of an AI-based diagnostic decision support tool to help laryngologists in their daily practice. Future randomized controlled studies are needed to investigate the gain of similar approaches over the traditional consultation format.
Affiliation(s)
- Jerome R Lechien: Department of Otolaryngology, Elsan Hospital, Paris, France; Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, University Paris Saclay, Suresnes, France; Department of Otolaryngology-Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium; Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Anaïs Rameau: Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, NY, USA
- Lisa G De Marrez: Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, University Paris Saclay, Suresnes, France
- Gautier Le Bosse: Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, University Paris Saclay, Suresnes, France; Department of Artificial Intelligence Applied to Medical Structure, Special School of Mechanic and Electricity (ESME) Sudria, Paris, France
- Karina Negro: Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, University Paris Saclay, Suresnes, France; Department of Artificial Intelligence Applied to Medical Structure, Special School of Mechanic and Electricity (ESME) Sudria, Paris, France
- Andra Sebestyen: Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, University Paris Saclay, Suresnes, France
- Robin Baudouin: Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, University Paris Saclay, Suresnes, France
- Sven Saussez: Department of Otolaryngology-Head and Neck Surgery, CHU Saint-Pierre, Brussels, Belgium; Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health Sciences and Technology, University of Mons (UMons), Mons, Belgium
- Stéphane Hans: Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, University Paris Saclay, Suresnes, France
16. Comparison of convolutional neural networks for classification of vocal fold nodules from high-speed video images. Eur Arch Otorhinolaryngol 2022; 280:2365-2371. PMID: 36357609; DOI: 10.1007/s00405-022-07736-6.
Abstract
OBJECTIVES In this study, deep learning with convolutional neural networks (CNNs) is applied to the detection of vocal fold nodules. Using high-speed video (HSV) images and computer-assisted tools, a comparison of convolutional neural network models and their accuracy is presented. METHODS The data were collected by an Ear, Nose and Throat (ENT) specialist with a 90° rigid scope between 2007 and 2019, yielding 15,732 high-speed videos from 7909 patients. A total of 4000 images were carefully selected: 2000 images of normal vocal folds and 2000 images of vocal folds with varying degrees of vocal fold nodules. These images were split into training, validation, and test sets for use with a five-layer CNN model (CNN5) and compared with other models: VGG19, MobileNetV2, and Inception-ResNetV2. To compare the neural network models, the following evaluation metrics were calculated: accuracy, sensitivity, specificity, precision, and negative predictive value. RESULTS All the trained CNN models showed high accuracy on the test set: 97.75%, 83.5%, 91.5%, and 89.75% for CNN5, VGG19, MobileNetV2, and Inception-ResNetV2, respectively. CONCLUSIONS Precision was identified as the most relevant performance metric for a study focused on the classification of vocal fold nodules. The highest performing model by this metric was MobileNetV2, with a precision of 97.7%. The average accuracy across all four neural networks was 90.63%, showing that neural networks can be used for classifying vocal fold nodules in a clinical setting.
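The evaluation metrics listed in the methods can all be read off a binary confusion matrix. A minimal sketch with illustrative counts, not the study's results:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (recall), specificity, precision (PPV), and NPV
    from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical test-set counts: nodule frames as positives, normal as negatives.
m = binary_metrics(tp=190, fp=6, fn=10, tn=194)
```

Reporting precision alongside accuracy matters here because, in a clinical screen, precision tells you how many flagged nodules are real, while accuracy alone can hide a high false-positive rate.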
17. Döllinger M, Schraut T, Henrich LA, Chhetri D, Echternach M, Johnson AM, Kunduk M, Maryn Y, Patel RR, Samlan R, Semmler M, Schützenberger A. Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos. Appl Sci (Basel) 2022; 12:9791. PMID: 37583544; PMCID: PMC10427138; DOI: 10.3390/app12199791.
Abstract
Endoscopic high-speed video (HSV) systems for visualization and assessment of vocal fold dynamics in the larynx are diverse and technically advancing. To accommodate the resulting "concept shifts" in neural network (NN)-based image processing, already trained and deployed NNs must be re-trained to maintain sufficiently accurate image processing for new recording modalities. We propose and discuss several re-training approaches for convolutional neural networks (CNNs) used for HSV image segmentation. Our baseline CNN was trained on the BAGLS data set (58,750 images). The new BAGLS-RT data set consists of an additional 21,050 images from previously unused HSV systems, light sources, and different spatial resolutions. Results showed that increasing data diversity through preprocessing already improves segmentation accuracy (mIoU + 6.35%); subsequent re-training increases segmentation performance further (mIoU + 2.81%). For re-training, fine-tuning with dynamic knowledge distillation showed the most promising results. Data variety for training and additional re-training is a helpful tool to boost HSV image segmentation quality. However, when performing re-training, the phenomenon of catastrophic forgetting, i.e., adapting to new data while forgetting already learned knowledge, should be kept in mind.
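The knowledge-distillation component of such re-training can be illustrated as a temperature-softened cross-entropy between teacher and student logits. A generic sketch with illustrative logits and temperature, not the paper's configuration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of student soft predictions against teacher soft targets,
    scaled by T^2 (conventional, keeps gradient magnitudes comparable)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -temperature ** 2 * sum(
        t * math.log(s) for t, s in zip(p_teacher, p_student)
    )

# Illustrative 3-class logits (hypothetical, not from the paper).
teacher = [3.0, 1.0, 0.2]
student = [2.5, 1.2, 0.4]
loss = distillation_loss(student, teacher)
```

Mixing this term with the ordinary supervised loss on new data is one standard way to adapt to a new recording modality while restraining catastrophic forgetting, since the teacher anchors the student to its previously learned outputs.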
Affiliation(s)
- Michael Döllinger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Tobias Schraut: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Lea A. Henrich: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Dinesh Chhetri: Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, CA 90095, USA
- Matthias Echternach: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), 80331 Munich, Germany
- Aaron M. Johnson: NYU Voice Center, Department of Otolaryngology-Head and Neck Surgery, New York University Grossman School of Medicine, New York, NY 10001, USA
- Melda Kunduk: Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, LA 70801, USA
- Youri Maryn: Department of Speech, Language and Hearing Sciences, University of Ghent, 9000 Ghent, Belgium
- Rita R. Patel: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47401, USA
- Robin Samlan: Department of Speech, Language, & Hearing Sciences, University of Arizona, Tucson, AZ 85641, USA
- Marion Semmler: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Anne Schützenberger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
18. Azam MA, Sampieri C, Ioppi A, Benzi P, Giordano GG, De Vecchi M, Campagnari V, Li S, Guastini L, Paderno A, Moccia S, Piazza C, Mattos LS, Peretti G. Videomics of the Upper Aero-Digestive Tract Cancer: Deep Learning Applied to White Light and Narrow Band Imaging for Automatic Segmentation of Endoscopic Images. Front Oncol 2022; 12:900451. PMID: 35719939; PMCID: PMC9198427; DOI: 10.3389/fonc.2022.900451.
Abstract
Introduction Narrow Band Imaging (NBI) is an endoscopic visualization technique useful for upper aero-digestive tract (UADT) cancer detection and margin evaluation. However, NBI analysis is strongly operator-dependent and requires high expertise, limiting its wider implementation. Recently, artificial intelligence (AI) has demonstrated potential for applications in UADT videoendoscopy. Among AI methods, deep learning (DL) algorithms, and especially convolutional neural networks (CNNs), are particularly suitable for delineating cancers on videoendoscopy. This study aimed to develop a CNN for automatic semantic segmentation of UADT cancer on endoscopic images. Materials and Methods A dataset of white light and NBI videoframes of laryngeal squamous cell carcinoma (LSCC) was collected and manually annotated. A novel DL segmentation model (SegMENT) was designed. SegMENT relies on the DeepLabV3+ CNN architecture, modified to use Xception as a backbone and to incorporate ensemble features from other CNNs. The performance of SegMENT was compared with state-of-the-art CNNs (UNet, ResUNet, and DeepLabv3). SegMENT was then validated on two external datasets of NBI images of oropharyngeal (OPSCC) and oral cavity (OCSCC) squamous cell carcinoma obtained from a previously published study. The impact of in-domain transfer learning through an ensemble technique was evaluated on the external datasets. Results 219 LSCC patients were retrospectively included in the study. A total of 683 videoframes composed the LSCC dataset, while the external validation cohorts of OPSCC and OCSCC contained 116 and 102 images, respectively. On the LSCC dataset, SegMENT outperformed the other DL models, obtaining the following median values: 0.68 intersection over union (IoU), 0.81 dice similarity coefficient (DSC), 0.95 recall, 0.78 precision, and 0.97 accuracy. On the OCSCC and OPSCC datasets, results were superior to previously published data; the median performance metrics improved, respectively, as follows: DSC +10.3% and +11.9%, recall +15.0% and +5.1%, precision +17.0% and +14.7%, accuracy +4.1% and +10.3%. Conclusion SegMENT achieved promising performance, showing that automatic tumor segmentation in endoscopic images is feasible even within the highly heterogeneous and complex UADT environment. SegMENT outperformed the previously published results on the external validation cohorts. The model demonstrated potential for improved detection of early tumors, more precise biopsies, and better selection of resection margins.
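The IoU and DSC values reported for segmentation models can be computed directly from binary masks. A minimal sketch on toy flattened masks, not the study's data:

```python
def iou_and_dice(pred, truth):
    """Intersection-over-union and Dice similarity for flat binary (0/1) masks."""
    inter = sum(p and t for p, t in zip(pred, truth))
    p_sum, t_sum = sum(pred), sum(truth)
    union = p_sum + t_sum - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    return iou, dice

# Toy 1-D masks standing in for flattened videoframe segmentations.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
truth = [0, 1, 1, 0, 0, 1, 1, 0]
iou, dice = iou_and_dice(pred, truth)
```

The two metrics are monotonically related (DSC = 2 * IoU / (1 + IoU)), so a model ranking by one usually matches the other; DSC simply weights the overlap more generously.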
Affiliation(s)
- Muhammad Adeel Azam: Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Claudio Sampieri: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alessandro Ioppi: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Pietro Benzi: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Giorgio Gregory Giordano: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Marta De Vecchi: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Valentina Campagnari: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Shunlei Li: Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Luca Guastini: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alberto Paderno: Unit of Otorhinolaryngology - Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy; Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Brescia, Italy
- Sara Moccia: The BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
- Cesare Piazza: Unit of Otorhinolaryngology - Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy; Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Brescia, Italy
- Leonardo S Mattos: Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Giorgio Peretti: Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
19. Otorhinolaryngological Advancements in Phoniatrics. Journal of Otorhinolaryngology, Hearing and Balance Medicine 2022. DOI: 10.3390/ohbm3010001.
Abstract
The production of voice is a powerful tool not only for communication, but also for artistic performances [...]