1
Nwosu OI, Naunheim MR. Artificial Intelligence in Laryngology, Broncho-Esophagology, and Sleep Surgery. Otolaryngol Clin North Am 2024; 57:821-829. PMID: 38719714. DOI: 10.1016/j.otc.2024.04.002.
Abstract
Technological advancements in laryngology, broncho-esophagology, and sleep surgery have enabled the collection of increasing amounts of complex data for diagnosis and treatment of voice, swallowing, and sleep disorders. Clinicians face challenges in efficiently synthesizing these data for personalized patient care. Artificial intelligence (AI), specifically machine learning and deep learning, offers innovative solutions for processing and interpreting these data, revolutionizing diagnosis and management in these fields, and making care more efficient and effective. In this study, we review recent AI-based innovations in the fields of laryngology, broncho-esophagology, and sleep surgery.
Affiliation(s)
- Obinna I Nwosu
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA
- Matthew R Naunheim
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA
2
Paderno A, Bedi N, Rau A, Holsinger CF. Computer Vision and Videomics in Otolaryngology-Head and Neck Surgery: Bridging the Gap Between Clinical Needs and the Promise of Artificial Intelligence. Otolaryngol Clin North Am 2024; 57:703-718. PMID: 38981809. DOI: 10.1016/j.otc.2024.05.005.
Abstract
This article discusses the role of computer vision in otolaryngology, particularly through endoscopy and surgery. It covers recent applications of artificial intelligence (AI) in nonradiologic imaging within otolaryngology, noting the benefits and challenges, such as improving diagnostic accuracy and optimizing therapeutic outcomes, while also pointing out the necessity for enhanced data curation and standardized research methodologies to advance clinical applications. Technical aspects are also covered, providing a detailed view of the progression from manual feature extraction to more complex AI models, including convolutional neural networks and vision transformers and their potential application in clinical settings.
Affiliation(s)
- Alberto Paderno
- IRCCS Humanitas Research Hospital, via Manzoni 56, Rozzano, Milan 20089, Italy; Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan 20072, Italy
- Nikita Bedi
- Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, CA, USA
- Anita Rau
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
3
Baldini C, Azam MA, Sampieri C, Ioppi A, Ruiz-Sevilla L, Vilaseca I, Alegre B, Tirrito A, Pennacchi A, Peretti G, Moccia S, Mattos LS. An automated approach for real-time informative frames classification in laryngeal endoscopy using deep learning. Eur Arch Otorhinolaryngol 2024; 281:4255-4264. PMID: 38698163. PMCID: PMC11266252. DOI: 10.1007/s00405-024-08676-z.
Abstract
PURPOSE Informative image selection in laryngoscopy has the potential for improving automatic data extraction alone, for selective data storage and a faster review process, or in combination with other artificial intelligence (AI) detection or diagnosis models. This paper aims to demonstrate the feasibility of AI in providing automatic informative laryngoscopy frame selection that is also capable of working in real time, providing visual feedback to guide the otolaryngologist during the examination. METHODS Several deep learning models were trained and tested on an internal dataset (n = 5147 images) and then tested on an external test set (n = 646 images) composed of both white light and narrow band images. Four videos were used to assess the real-time performance of the best-performing model. RESULTS ResNet-50, pre-trained with the pretext strategy, reached a precision of 95% vs. 97%, a recall of 97% vs. 89%, and an F1-score of 96% vs. 93% on the internal and external test sets, respectively (p = 0.062). The four testing videos are provided in the supplemental materials. CONCLUSION The deep learning model demonstrated excellent performance in identifying diagnostically relevant frames within laryngoscopic videos. With its solid accuracy and real-time capabilities, the system is promising for deployment in a clinical setting, either autonomously for objective quality control or in conjunction with other algorithms within a comprehensive AI toolset aimed at enhancing tumor detection and diagnosis.
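The precision, recall, and F1 figures reported above are related as follows; a minimal sketch with illustrative counts (the tp/fp/fn values below are hypothetical and not taken from the study):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: 95 true positives, 5 false positives, 3 false negatives
p, r, f = precision_recall_f1(tp=95, fp=5, fn=3)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.95 0.97 0.96
```

F1 is the harmonic mean of precision and recall, so a model cannot score well on one at the expense of the other.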
Affiliation(s)
- Chiara Baldini
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy
- Muhammad Adeel Azam
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Genoa, Italy
- Claudio Sampieri
- Department of Experimental Medicine (DIMES), University of Genoa, Genoa, Italy
- Department of Otolaryngology, Hospital Clínic, C. de Villarroel, 170, 08029, Barcelona, Spain
- Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Laura Ruiz-Sevilla
- Otorhinolaryngology Head-Neck Surgery Department, Hospital Universitari Joan XXIII de Tarragona, Tarragona, Spain
- Isabel Vilaseca
- Department of Otolaryngology, Hospital Clínic, C. de Villarroel, 170, 08029, Barcelona, Spain
- Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Translational Genomics and Target Therapies in Solid Tumors Group, Institut d'Investigacions Biomèdiques August Pi i Sunyer, IDIBAPS, Barcelona, Spain
- Faculty of Medicine, University of Barcelona, Barcelona, Spain
- Berta Alegre
- Department of Otolaryngology, Hospital Clínic, C. de Villarroel, 170, 08029, Barcelona, Spain
- Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Alessandro Tirrito
- Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alessia Pennacchi
- Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Giorgio Peretti
- Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Sara Moccia
- The BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
- Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
4
Mamidi IS, Dunham ME, Adkins LK, McWhorter AJ, Fang Z, Banh BT. Laryngeal Cancer Screening During Flexible Video Laryngoscopy Using Large Computer Vision Models. Ann Otol Rhinol Laryngol 2024; 133:720-728. PMID: 38755974. DOI: 10.1177/00034894241253376.
Abstract
OBJECTIVE Develop an artificial intelligence-assisted computer vision model to screen for laryngeal cancer during flexible laryngoscopy. METHODS Using laryngeal images and flexible laryngoscopy video recordings, we developed computer vision models to classify video frames for usability and cancer screening. A separate model segments any identified lesions on the frames. We used these computer vision models to construct a video stream annotation system. This system classifies findings from flexible laryngoscopy as "potentially malignant" or "probably benign" and segments any detected lesions. Additionally, the model provides a confidence level for each classification. RESULTS The overall accuracy of the flexible laryngoscopy cancer screening model was 92%. For cancer screening, it achieved a sensitivity of 97.7% and a specificity of 76.9%. The segmentation model attained an average precision at a 0.50 intersection-over-union threshold of 0.595. The confidence level for positive screening results can assist clinicians in counseling patients regarding the findings. CONCLUSION Our model is highly sensitive and adequately specific for laryngeal cancer screening. Segmentation helps endoscopists identify and describe potential lesions. Further optimization is required to enable the model's deployment in clinical settings for real-time annotation during flexible laryngoscopy.
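The segmentation metric above counts a predicted lesion region as correct only when its overlap with the ground truth reaches an intersection-over-union (IoU) of 0.50. A minimal sketch of IoU for axis-aligned boxes (the coordinates are illustrative, not data from the study):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# Half-overlapping boxes: intersection 50, union 150
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
print(round(iou, 3), iou >= 0.50)  # 0.333 False -> not counted as a match
```

The 0.50 threshold is deliberately lenient; stricter thresholds (e.g., 0.75) reward tighter lesion outlines.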
Affiliation(s)
- Ishwarya S Mamidi
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Michael E Dunham
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Lacey K Adkins
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Andrew J McWhorter
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Zhide Fang
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, USA
- Britney T Banh
- Our Lady of the Lake Voice Center, Our Lady of the Lake Regional Medical Center, Baton Rouge, LA, USA
5
Wu YH, Huang KY, Tseng ACC. Development of an Artificial Intelligence-Based Image Recognition System for Time-Sequence Analysis of Tracheal Intubation. Anesth Analg 2024; 139:357-365. PMID: 38381700. DOI: 10.1213/ane.0000000000006934.
Abstract
BACKGROUND Total intubation time (TIT) is an objective indicator of tracheal intubation (TI) difficulties. However, large variations in TIT because of diverse initial and end targets make it difficult to compare studies. A video laryngoscope (VLS) can capture images during the TI process. By using artificial intelligence (AI) to detect airway structures, the start and end points can be freely selected, thus eliminating the inconsistencies. Further deconstructing the process and establishing time-sequence analysis may aid in gaining further understanding of the TI process. METHODS We developed a time-sequencing system for analyzing TI performed using a #3 Macintosh VLS. This system was established on 30 easy TIs performed by specialists and further validated using TI videos performed by a postgraduate-year (PGY) physician. Thirty easy intubation videos were selected from a cohort approved by our institutional review board (B-ER-107-088), and 6 targets were labeled: the lip, epiglottis, laryngopharynx, glottic opening, tube tip, and a black line on the endotracheal tube. We used 887 captured images to develop an AI model trained using You Only Look Once, Version 3 (YOLOv3). Seven cut points were selected for phase division and were marked by 7 experts. The expert cut points were used to validate the AI-identified cut points and time-sequence data. After the removal of the tube tip and laryngopharynx targets, the durations between the 5 remaining cut points were calculated, sequentially yielding the durations of 4 intubation phases as well as TIT. RESULTS The average and total losses approached 0 within 150 cycles of model training for target identification. The identification rate for all cut points was 92.4% (194 of 210), which increased to 99.4% (179 of 180) after the removal of the tube tip target.
The 4 phase durations and TIT calculated by the AI model and those from the expert exhibited strong Pearson correlation (phase I, r = 0.914; phase II, r = 0.868; phase III, r = 0.964; and phase IV, r = 0.949; TIT, r = 0.99; all P < .001). Similar findings were obtained for the PGY's observations (r > 0.95; P < .01). CONCLUSIONS YOLOv3 is a powerful tool for analyzing images recorded by VLS. By using AI to detect the airway structures, the start and end points can be freely selected, resolving the heterogeneity resulting from the inconsistencies in the TIT cut points across studies. Time-sequence analysis involving the deconstruction of VLS-recorded TI images into several phases should be conducted in further TI research.
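The agreement between AI-derived and expert phase durations above is quantified with Pearson's r; a self-contained sketch (the duration values below are hypothetical, not from the study):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical AI-derived vs. expert phase durations (seconds)
ai = [2.1, 3.4, 1.8, 4.0, 2.9]
expert = [2.0, 3.5, 1.7, 4.2, 3.0]
print(round(pearson_r(ai, expert), 3))  # 0.999
```

Note that r measures linear association, not absolute agreement: a systematic offset between the two raters would leave r unchanged.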
Affiliation(s)
- Yu-Hwa Wu
- Department of Anesthesia, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Kun-Yi Huang
- Department of Computer Science and Information Engineering, Southern Taiwan University of Science and Technology, Tainan, Taiwan
- Alex Chia-Chih Tseng
- Department of Anesthesiology, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan
6
Torborg SR, Kim AYE, Rameau A. New developments in the application of artificial intelligence to laryngology. Curr Opin Otolaryngol Head Neck Surg 2024. PMID: 39146248. DOI: 10.1097/moo.0000000000000999.
Abstract
PURPOSE OF REVIEW The purpose of this review is to summarize the existing literature on artificial intelligence technology utilization in laryngology, highlighting recent advances and current barriers to implementation. RECENT FINDINGS The volume of publications studying applications of artificial intelligence in laryngology has rapidly increased, demonstrating a strong interest in utilizing this technology. Vocal biomarkers for disease screening, deep learning analysis of videolaryngoscopy for lesion identification, and auto-segmentation of videofluoroscopy for detection of aspiration are a few of the new ways in which artificial intelligence is poised to transform clinical care in laryngology. Increasing collaboration is ongoing to establish guidelines and standards for the field to ensure generalizability. SUMMARY Artificial intelligence tools have the potential to greatly advance laryngology care by creating novel screening methods, improving how data-heavy diagnostics of laryngology are analyzed, and standardizing outcome measures. However, physician and patient trust in artificial intelligence must improve for the technology to be successfully implemented. Additionally, most existing studies lack large and diverse datasets, external validation, and consistent ground-truth references necessary to produce generalizable results. Collaborative, large-scale studies will fuel technological innovation and bring artificial intelligence to the forefront of patient care in laryngology.
Affiliation(s)
- Stefan R Torborg
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine
- Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program, New York, New York, USA
- Ashley Yeo Eun Kim
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine
- Anaïs Rameau
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine
7
Dao TTP, Huynh TL, Pham MK, Le TN, Nguyen TC, Nguyen QT, Tran BA, Van BN, Ha CC, Tran MT. Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset. J Imaging Inform Med 2024. PMID: 38809338. DOI: 10.1007/s10278-024-01068-z.
Abstract
The diagnosis and treatment of vocal fold disorders heavily rely on the use of laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopy observation. However, existing approaches have yet to explore the joint optimization of the decision-making process, including object detection and image classification tasks simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy images. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To facilitate interpretability for clinicians, MEAL provides attention maps that visualize the learned regions most important to its predictions, supporting explainable artificial intelligence in clinical decision-making. We also analyze our model's effectiveness in simulated clinical scenarios where shaking occurs during laryngoscopy. The proposed model demonstrates outstanding performance on our VoFoCD dataset: the accuracy for image classification and the mean average precision at an intersection-over-union threshold of 0.5 (mAP50) for object detection are 0.951 and 0.874, respectively. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions, including benign and malignant lesions. Our contribution can effectively aid laryngologists in identifying benign or malignant lesions of the vocal folds and classifying images during laryngeal endoscopy.
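The mAP50 figure above averages, over object classes, the average precision (AP) of confidence-ranked detections that match ground truth at IoU ≥ 0.5. A minimal sketch of AP for one class (the detection list is hypothetical; a full mAP computation would also need the IoU matching step):

```python
def average_precision(detections, n_gt):
    """AP for one class. detections: (confidence, is_true_positive) pairs;
    n_gt: number of ground-truth objects. Computes the rectangular
    (non-interpolated) area under the precision-recall curve."""
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_tp in sorted(detections, key=lambda d: -d[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recall, precision = tp / n_gt, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

dets = [(0.9, True), (0.8, False), (0.7, True)]  # ranked by confidence
print(round(average_precision(dets, n_gt=2), 3))  # 0.833
```

Benchmark suites typically add precision-envelope interpolation on top of this, but the ranked-sweep structure is the same.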
Affiliation(s)
- Thao Thi Phuong Dao
- University of Science, Ho Chi Minh City, Vietnam
- John von Neumann Institute, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Department of Otolaryngology, Thong Nhat Hospital, Tan Binh District, Ho Chi Minh City, Vietnam
- Tuan-Luc Huynh
- University of Science, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Trung-Nghia Le
- University of Science, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Tan-Cong Nguyen
- University of Science, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- University of Social Sciences and Humanities, Ho Chi Minh City, Vietnam
- Quang-Thuc Nguyen
- University of Science, Ho Chi Minh City, Vietnam
- John von Neumann Institute, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Bich Anh Tran
- Otorhinolaryngology Department, Cho Ray Hospital, District 5, Ho Chi Minh City, Vietnam
- Boi Ngoc Van
- Department of Otolaryngology, Vinmec Central Park International Hospital, Binh Thanh District, Ho Chi Minh City, Vietnam
- Chanh Cong Ha
- Department of Otolaryngology, District 7 Hospital, District 7, Ho Chi Minh City, Vietnam
- Minh-Triet Tran
- University of Science, Ho Chi Minh City, Vietnam
- John von Neumann Institute, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
8
Yao P, Witte D, German A, Periyakoil P, Kim YE, Gimonet H, Sulica L, Born H, Elemento O, Barnes J, Rameau A. A deep learning pipeline for automated classification of vocal fold polyps in flexible laryngoscopy. Eur Arch Otorhinolaryngol 2024; 281:2055-2062. PMID: 37695363. DOI: 10.1007/s00405-023-08190-8.
Abstract
PURPOSE To develop and validate a deep learning model for distinguishing healthy vocal folds (HVF) and vocal fold polyps (VFP) on laryngoscopy videos, while demonstrating the ability of a previously developed informative frame classifier in facilitating deep learning development. METHODS Following retrospective extraction of image frames from 52 HVF and 77 unilateral VFP videos, two researchers manually labeled each frame as informative or uninformative. A previously developed informative frame classifier was used to extract informative frames from the same video set. Both sets of videos were independently divided into training (60%), validation (20%), and test (20%) by patient. Machine-labeled frames were independently verified by two researchers to assess the precision of the informative frame classifier. Two models, pre-trained on ResNet18, were trained to classify frames as containing HVF or VFP. The accuracy of the polyp classifier trained on machine-labeled frames was compared to that of the classifier trained on human-labeled frames. The performance was measured by accuracy and area under the receiver operating characteristic curve (AUROC). RESULTS When evaluated on a hold-out test set, the polyp classifier trained on machine-labeled frames achieved an accuracy of 85% and AUROC of 0.84, whereas the classifier trained on human-labeled frames achieved an accuracy of 69% and AUROC of 0.66. CONCLUSION An accurate deep learning classifier for vocal fold polyp identification was developed and validated with the assistance of a peer-reviewed informative frame classifier for dataset assembly. The classifier trained on machine-labeled frames demonstrates improved performance compared to the classifier trained on human-labeled frames. LEVEL OF EVIDENCE: 4
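The AUROC reported above equals the probability that a randomly chosen positive frame receives a higher score than a randomly chosen negative frame; a minimal rank-based sketch (the scores below are hypothetical model outputs, not data from the study):

```python
def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a random positive outranks a random
    negative (equivalent to the normalized Mann-Whitney U statistic)."""
    wins = ties = 0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

# Hypothetical polyp-probability outputs for polyp vs. healthy frames
print(round(auroc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2]), 3))  # 0.889
```

Unlike accuracy, AUROC is threshold-free, which makes it a fairer comparison between the machine-labeled and human-labeled classifiers above.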
Affiliation(s)
- Peter Yao
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Dan Witte
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Alexander German
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Preethi Periyakoil
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Yeo Eun Kim
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Hortense Gimonet
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Lucian Sulica
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Hayley Born
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Olivier Elemento
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- Josue Barnes
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
9
Pennington-FitzGerald W, Joshi A, Honzel E, Hernandez-Morato I, Pitman MJ, Moayedi Y. Development and Application of Automated Vocal Fold Tracking Software in a Rat Surgical Model. Laryngoscope 2024; 134:340-346. PMID: 37543969. DOI: 10.1002/lary.30930.
Abstract
OBJECTIVE The rat is a widely used model for studying vocal fold (VF) function after recurrent laryngeal nerve injury, but common techniques for evaluating rat VF motion remain subjective and imprecise. To address this, we developed a software package, called RatVocalTracker1.0 (RVT1.0), to quantify VF motion and tested it on rats with iatrogenic unilateral vocal fold paralysis (VFP). METHODS A deep neural network was trained to identify the positions of the VFs and arytenoid cartilages (ACs) in transoral laryngoscope videos of the rat glottis. Software was developed to estimate glottic midline, VF displacement, VF velocity, and AC angle. The software was applied to laryngoscope videos of adult rats before and after right recurrent and superior laryngeal nerve transection (N = 15; 6M, 9F). All software calculated metrics were compared before and after injury and validated against manually calculated metrics. RESULTS RVT1.0 accurately tracked and quantified VF displacement, VF velocity, and AC angle. Significant differences were found before and after surgery for all RVT1.0 calculated metrics. There was strong agreement between programmatically and manually calculated measures. Automated analysis was also more efficient than nearly all manual methods. CONCLUSION This approach provides fast, accurate assessment of VF motion in rats with minimal labor and allows for quantitative comparison of lateral differences in movement. Through this novel analysis method, we can differentiate healthy movement from unilateral VFP. RVT1.0 is open-source and will be a valuable tool for researchers using the rat model for laryngology research. LEVEL OF EVIDENCE NA Laryngoscope, 134:340-346, 2024.
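Given per-frame landmark positions from a tracker such as the one described, displacement and velocity follow directly; a minimal sketch (the coordinates and frame rate are illustrative, not the RVT1.0 implementation):

```python
import math

def displacement_and_velocity(track, fps):
    """Frame-to-frame displacement (pixels) and velocity (pixels/second)
    for a list of (x, y) landmark positions in consecutive video frames."""
    disp = [math.dist(track[i - 1], track[i]) for i in range(1, len(track))]
    return disp, [d * fps for d in disp]

track = [(0, 0), (3, 4), (3, 4)]  # e.g., a tracked vocal fold point
d, v = displacement_and_velocity(track, fps=30)
print(d, v)  # [5.0, 0.0] [150.0, 0.0]
```

Comparing such per-side trajectories is what allows lateral asymmetries, and hence unilateral paralysis, to be quantified.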
Affiliation(s)
- Abhinav Joshi
- The Center for Voice and Swallowing, Department of Otolaryngology-Head & Neck Surgery, Columbia University Irving Medical Center, New York, New York, U.S.A
- Emily Honzel
- College of Physicians and Surgeons, Columbia University, New York, New York, U.S.A
- Ignacio Hernandez-Morato
- The Center for Voice and Swallowing, Department of Otolaryngology-Head & Neck Surgery, Columbia University Irving Medical Center, New York, New York, U.S.A
- Michael J Pitman
- The Center for Voice and Swallowing, Department of Otolaryngology-Head & Neck Surgery, Columbia University Irving Medical Center, New York, New York, U.S.A
- Yalda Moayedi
- The Center for Voice and Swallowing, Department of Otolaryngology-Head & Neck Surgery, Columbia University Irving Medical Center, New York, New York, U.S.A
- Department of Neurology, Columbia University, New York, New York, U.S.A
10
Tsilivigkos C, Athanasopoulos M, Micco RD, Giotakis A, Mastronikolis NS, Mulita F, Verras GI, Maroulis I, Giotakis E. Deep Learning Techniques and Imaging in Otorhinolaryngology-A State-of-the-Art Review. J Clin Med 2023; 12:6973. PMID: 38002588. PMCID: PMC10672270. DOI: 10.3390/jcm12226973.
Abstract
Over the last decades, the field of medicine has witnessed significant progress in artificial intelligence (AI), the Internet of Medical Things (IoMT), and deep learning (DL) systems. Otorhinolaryngology, and imaging in its various subspecialties, has not remained untouched by this transformative trend. As the medical landscape evolves, the integration of these technologies becomes imperative in augmenting patient care, fostering innovation, and actively participating in the ever-evolving synergy between computer vision techniques in otorhinolaryngology and AI. To that end, we conducted a thorough search on MEDLINE for papers published until June 2023, utilizing the keywords 'otorhinolaryngology', 'imaging', 'computer vision', 'artificial intelligence', and 'deep learning', and at the same time conducted manual searching in the references section of the articles included in our manuscript. Our search culminated in the retrieval of 121 related articles, which were subsequently subdivided into the following categories: imaging in head and neck, otology, and rhinology. Our objective is to provide a comprehensive introduction to this burgeoning field, tailored for both experienced specialists and aspiring residents in the domain of deep learning algorithms in imaging techniques in otorhinolaryngology.
Affiliation(s)
- Christos Tsilivigkos
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Michail Athanasopoulos
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Riccardo di Micco
- Department of Otolaryngology and Head and Neck Surgery, Medical School of Hannover, 30625 Hannover, Germany
- Aris Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Nicholas S. Mastronikolis
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Francesk Mulita
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Georgios-Ioannis Verras
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Ioannis Maroulis
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Evangelos Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
11
Sampieri C, Baldini C, Azam MA, Moccia S, Mattos LS, Vilaseca I, Peretti G, Ioppi A. Artificial Intelligence for Upper Aerodigestive Tract Endoscopy and Laryngoscopy: A Guide for Physicians and State-of-the-Art Review. Otolaryngol Head Neck Surg 2023; 169:811-829. PMID: 37051892. DOI: 10.1002/ohn.343.
Abstract
OBJECTIVE The endoscopic and laryngoscopic examination is paramount for laryngeal, oropharyngeal, nasopharyngeal, nasal, and oral cavity benign lesions and cancer evaluation. Nevertheless, upper aerodigestive tract (UADT) endoscopy is intrinsically operator-dependent and lacks objective quality standards. At present, there has been an increased interest in artificial intelligence (AI) applications in this area to support physicians during the examination, thus enhancing diagnostic performances. The relative novelty of this research field poses a challenge both for the reviewers and readers as clinicians often lack a specific technical background. DATA SOURCES Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and Google Scholar. REVIEW METHODS A structured review of the current literature (up to September 2022) was performed. Search terms related to topics of AI, machine learning (ML), and deep learning (DL) in UADT endoscopy and laryngoscopy were identified and queried by 3 independent reviewers. Citations of selected studies were also evaluated to ensure comprehensiveness. CONCLUSIONS Forty-one studies were included in the review. AI and computer vision techniques were used to achieve 3 fundamental tasks in this field: classification, detection, and segmentation. All papers were summarized and reviewed. IMPLICATIONS FOR PRACTICE This article comprehensively reviews the latest developments in the application of ML and DL in UADT endoscopy and laryngoscopy, as well as their future clinical implications. The technical basis of AI is also explained, providing guidance for nonexpert readers to allow critical appraisal of the evaluation metrics and the most relevant quality requirements.
Collapse
Affiliation(s)
- Claudio Sampieri
- Department of Experimental Medicine (DIMES), University of Genoa, Genoa, Italy
- Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Otorhinolaryngology Department, Hospital Clínic, Barcelona, Spain
| | - Chiara Baldini
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi (DIBRIS), University of Genoa, Genoa, Italy
| | - Muhammad Adeel Azam
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi (DIBRIS), University of Genoa, Genoa, Italy
| | - Sara Moccia
- Department of Excellence in Robotics and AI, The BioRobotics Institute, Pisa, Italy
| | - Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
| | - Isabel Vilaseca
- Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Otorhinolaryngology Department, Hospital Clínic, Barcelona, Spain
- Head Neck Clínic, Agència de Gestió d'Ajuts Universitaris i de Recerca, Barcelona, Catalunya, Spain
- Surgery and Medical-Surgical Specialties Department, Faculty of Medicine and Health Sciences, Universitat de Barcelona, Barcelona, Spain
- Translational Genomics and Target Therapies in Solid Tumors Group, Faculty of Medicine, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- University of Barcelona, Barcelona, Spain
| | - Giorgio Peretti
- Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Alessandro Ioppi
- Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| |
Collapse
|
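The review above frames computer-vision work in UADT endoscopy around three fundamental tasks: classification, detection, and segmentation. A minimal sketch of what each task's output looks like for a single endoscopic frame; the frame size, class count, boxes, and score threshold are all invented for illustration, and random tensors stand in for a trained network's predictions:

```python
import numpy as np

# Hypothetical outputs of the three computer-vision tasks for one
# 256x256 RGB endoscopic frame.  Random tensors stand in for what a
# real classifier, detector, or segmentation network would emit.
rng = np.random.default_rng(0)
frame = rng.random((256, 256, 3))          # input frame (H, W, C)

# Classification: one probability per lesion class (assumed 4 classes).
class_probs = rng.random(4)
class_probs /= class_probs.sum()           # normalize to a distribution
predicted_class = int(np.argmax(class_probs))

# Detection: candidate boxes as (x_min, y_min, x_max, y_max, score).
boxes = np.array([[40.0, 60.0, 120.0, 140.0, 0.91],
                  [150.0, 30.0, 220.0, 100.0, 0.47]])
kept = boxes[boxes[:, 4] > 0.5]            # keep confident boxes only

# Segmentation: a per-pixel binary mask with the frame's spatial size.
mask = (rng.random((256, 256)) > 0.5).astype(np.uint8)

print(kept.shape, mask.shape)
```

The three output shapes (a class distribution, a list of scored boxes, a pixel-wise mask) are what distinguishes the tasks, regardless of the network architecture behind them.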
12
|
Multi-level classification of knee cartilage lesion in multimodal MRI based on deep learning. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/22/2023]
|
13
|
A Novel Framework of Manifold Learning Cascade-Clustering for the Informative Frame Selection. Diagnostics (Basel) 2023; 13:diagnostics13061151. [PMID: 36980459 PMCID: PMC10047422 DOI: 10.3390/diagnostics13061151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2023] [Revised: 03/05/2023] [Accepted: 03/10/2023] [Indexed: 03/19/2023] Open
Abstract
Narrow band imaging is an established non-invasive tool for the early detection of laryngeal cancer in surveillance examinations. Most frames produced during the examination are uninformative, being, for example, blurred, affected by specular reflection, or underexposed. Removing the uninformative frames is vital to improve detection accuracy and to speed up computer-aided diagnosis, yet manually inspecting recordings for informative frames costs the physician considerable time. This issue is commonly addressed by a classifier trained on task-specific categories of uninformative frames; however, the definition of these categories is ambiguous, and tedious labeling still cannot be avoided. Here, we show that a novel unsupervised scheme is comparable to the current benchmarks on the NBI-InfFrames dataset. We extract feature embeddings using a vanilla neural network (VGG16) and apply the dimensionality reduction method UMAP, which separates the embeddings in the lower-dimensional space. Combined with the proposed automatic cluster-labeling algorithm and a cost function for Bayesian optimization, the method achieves state-of-the-art performance, outperforming the baseline by 12% absolute; its overall median recall of 96% is currently the highest reported. Our results demonstrate the effectiveness of the proposed scheme and its robustness in detecting informative frames, and suggest that patterns embedded in the data can support flexible algorithms that do not require manual labeling.
Collapse
|
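The unsupervised frame-selection pipeline described in the entry above (embed each frame, reduce dimensionality, cluster, then label clusters automatically) can be sketched end to end on toy data. This is not the paper's implementation: VGG16 embeddings are replaced by synthetic vectors, UMAP by a plain PCA via SVD, and the Bayesian-optimization cluster labeling by a single known-informative reference frame:

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 fake 64-dim frame embeddings: two separated blobs standing in
# for informative vs. uninformative frames (blurred, reflective, ...).
informative = rng.normal(loc=0.0, scale=0.5, size=(100, 64))
uninformative = rng.normal(loc=3.0, scale=0.5, size=(100, 64))
X = np.vstack([informative, uninformative])

# Dimensionality reduction to 2-D: PCA via SVD (stand-in for UMAP).
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ vt[:2].T

# Two-cluster k-means with a few fixed iterations.
centers = Z[[0, -1]]                       # init from opposite blobs
for _ in range(10):
    d = np.linalg.norm(Z[:, None] - centers[None], axis=2)
    assign = d.argmin(axis=1)
    centers = np.array([Z[assign == k].mean(axis=0) for k in (0, 1)])

# Automatic labeling: the cluster holding a known-informative
# reference frame (frame 0) is kept for computer-aided diagnosis.
selected = np.flatnonzero(assign == assign[0])
print(len(selected))
```

On these well-separated blobs the informative cluster is recovered exactly; real embeddings overlap far more, which is why the paper tunes the pipeline with a cost function under Bayesian optimization rather than a single reference frame.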
14
|
Bensoussan Y, Vanstrum EB, Johns MM, Rameau A. Artificial Intelligence and Laryngeal Cancer: From Screening to Prognosis: A State of the Art Review. Otolaryngol Head Neck Surg 2023; 168:319-329. [PMID: 35787073 DOI: 10.1177/01945998221110839] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022]
Abstract
OBJECTIVE This state of the art review aims to examine contemporary advances in applications of artificial intelligence (AI) to the screening, detection, management, and prognostication of laryngeal cancer (LC). DATA SOURCES Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and IEEE. REVIEW METHODS A structured review of the current literature (up to January 2022) was performed. Search terms related to topics of AI in LC were identified and queried by 2 independent reviewers. Citations of selected studies and review articles were also evaluated to ensure comprehensiveness. CONCLUSIONS AI applications in LC have encompassed a variety of data modalities, including radiomics, genomics, acoustics, clinical data, and videomics, to support screening, diagnosis, therapeutic decision making, and prognosis. However, most studies remain at the proof-of-concept level, as AI algorithms are trained on single-institution databases with limited data sets and a single data modality. IMPLICATIONS FOR PRACTICE AI algorithms in LC will need to be trained on large multi-institutional data sets and to integrate multimodal data for optimal performance and clinical utility from screening to prognosis. Of the data types reviewed, genomics has the most potential to provide generalizable models, thanks to available large multi-institutional open-access genomic data sets. Voice acoustic data represent an inexpensive and accurate biomarker that is easy and noninvasive to capture, offering a unique opportunity for screening and monitoring of LC, especially in low-resource settings.
Collapse
Affiliation(s)
- Yael Bensoussan
- Department of Otolaryngology-Head and Neck Surgery, University of South Florida, Tampa, Florida, USA
| | - Erik B Vanstrum
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
| | - Michael M Johns
- Department of Otolaryngology-Head and Neck Surgery, University of Southern California, Los Angeles, California, USA
| | - Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
| |
Collapse
|
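The entry above argues that LC algorithms need to integrate multimodal data (radiomics, genomics, acoustics, clinical data, videomics). One common way to do that is late fusion: train a model per modality, then combine the per-modality risk scores. A minimal sketch under that assumption; the scores and weights are invented for illustration, not taken from any reviewed study:

```python
# Late fusion of per-modality risk scores into one prognostic score.
# In practice each score in [0, 1] would come from a model trained on
# that modality; weights would be tuned on validation data.
def fuse_risk(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-modality risk scores."""
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total

scores = {"radiomics": 0.62, "genomics": 0.55, "acoustics": 0.71, "clinical": 0.40}
weights = {"radiomics": 1.0, "genomics": 2.0, "acoustics": 1.0, "clinical": 1.0}

risk = fuse_risk(scores, weights)
print(round(risk, 3))  # → 0.566
```

Late fusion degrades gracefully when a modality is missing (drop it from both dicts), which matters for the single-modality, single-institution data sets the review identifies as the current norm.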