1
Cubero L, Tessier C, Castelli J, Robert K, de Crevoisier R, Jégoux F, Pascau J, Acosta O. Automated dysphagia characterization in head and neck cancer patients using videofluoroscopic swallowing studies. Comput Biol Med 2025; 187:109759. [PMID: 39914196] [DOI: 10.1016/j.compbiomed.2025.109759] [Received: 08/09/2024; Revised: 01/24/2025; Accepted: 01/27/2025; Indexed: 02/21/2025]
Abstract
BACKGROUND: Dysphagia is one of the most common toxicities following head and neck cancer (HNC) radiotherapy (RT). Videofluoroscopic swallowing studies (VFSS) are the gold standard for diagnosing and assessing dysphagia, but current evaluation methods are manual, subjective, and time-consuming. This study introduces a novel framework for the automated analysis of VFSS to characterize dysphagia in HNC patients.
METHOD: The proposed methodology integrates three key steps: (i) a deep learning-based labeling framework, trained iteratively to identify ten regions of interest; (ii) extraction of 23 swallowing dynamic parameters, followed by comparison across diverse cohorts; and (iii) machine learning (ML) classification of the extracted parameters into four dysphagia-related impairments.
RESULTS: The labeling framework achieved high accuracy, with a mean error of 1.6 pixels across the ten regions of interest in an independent test dataset. Analysis of the extracted parameters revealed significant differences in swallowing dynamics between healthy individuals, HNC patients before and after RT, and patients with non-HNC-related dysphagia. The ML classifiers achieved accuracies ranging from 0.60 to 0.87 for the four dysphagia-related impairments.
CONCLUSIONS: Despite challenges related to dataset size and VFSS variability, our framework demonstrates substantial potential for automatically identifying ten regions of interest and four dysphagia-related impairments from VFSS. This work sets the foundation for future research aimed at refining dysphagia analysis and characterization using VFSS, particularly in the context of HNC RT.
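As an illustration of step (iii), the sketch below classifies a matrix of swallowing-dynamics parameters into an impairment label. Everything here is assumed for illustration: the synthetic feature matrix, the random-forest classifier, and the cross-validation setup are not taken from the paper, which does not detail its ML pipeline in this abstract.

```python
# Hypothetical sketch of ML classification of extracted swallowing
# parameters into a dysphagia-related impairment (present/absent).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_patients, n_params = 120, 23            # 23 dynamic parameters per study
X = rng.normal(size=(n_patients, n_params))
y = rng.integers(0, 2, size=n_patients)   # impairment present / absent

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.2f}")
```

With synthetic labels the accuracy hovers near chance; the point is only the shape of the pipeline (per-patient feature vector in, impairment label out).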
Affiliation(s)
- Lucía Cubero
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France; Departamento de Bioingeniería, Universidad Carlos III de Madrid, Madrid, Spain
- Christophe Tessier
- Service d'ORL et Chirurgie Maxillo-Faciale, CHU Pontchaillou, Université Rennes, 35033, Rennes, France
- Joël Castelli
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France
- Kilian Robert
- Service d'ORL et Chirurgie Maxillo-Faciale, CHU Pontchaillou, Université Rennes, 35033, Rennes, France
- Renaud de Crevoisier
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France
- Franck Jégoux
- Service d'ORL et Chirurgie Maxillo-Faciale, CHU Pontchaillou, Université Rennes, 35033, Rennes, France
- Javier Pascau
- Departamento de Bioingeniería, Universidad Carlos III de Madrid, Madrid, Spain; Instituto de Investigación Sanitaria Gregorio Marañón, Madrid, Spain
- Oscar Acosta
- Université Rennes, CLCC Eugène Marquis, Inserm, LTSI - UMR 1099, F-35000, Rennes, France
2
Shu K, Mao S, Zhang Z, Coyle JL, Sejdić E. Recent advancements and future directions in automatic swallowing analysis via videofluoroscopy: A review. Comput Methods Programs Biomed 2025; 259:108505. [PMID: 39579458] [DOI: 10.1016/j.cmpb.2024.108505] [Received: 05/22/2024; Revised: 11/06/2024; Accepted: 11/06/2024; Indexed: 11/25/2024]
Abstract
Videofluoroscopic swallowing studies (VFSS) capture the complex anatomy and physiology contributing to bolus transport and airway protection during swallowing. Because clinical assessment of VFSS can be affected by evaluators' subjectivity and variability in evaluation protocols, many efforts have been dedicated to developing methods that ensure consistent measures and reliable analyses of swallowing physiology using advanced computer-assisted methods. The latest advances in computer vision, pattern recognition, and deep learning provide new paradigms for exploring and extracting information from VFSS recordings. The literature search was conducted in four bibliographic databases, focusing exclusively on automatic videofluoroscopic analyses. We identified 46 studies that employ state-of-the-art image processing techniques to solve VFSS analytical tasks, including anatomical structure detection, bolus contrast segmentation, and kinematic event recognition. Advanced computer vision and deep learning techniques have enabled fully automatic swallowing analysis and abnormality detection, resulting in improved accuracy and unprecedented efficiency in swallowing assessment. With this review of image processing techniques applied to automatic swallowing analysis, we intend to demonstrate the current challenges in VFSS analyses and provide insight into future directions in developing more accurate and clinically explainable algorithms.
Affiliation(s)
- Kechen Shu
- School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
- Shitong Mao
- Department of Head and Neck Surgery, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zhenwei Zhang
- Center for Advanced Analytics, Baptist Health South Florida, Miami, FL, USA
- James L Coyle
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Pittsburgh, PA, USA; Department of Otolaryngology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Department of Electrical and Computer Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, USA
- Ervin Sejdić
- Edward S. Rogers Department of Electrical and Computer Engineering, Faculty of Applied Science and Engineering, University of Toronto, Toronto, ON, Canada; North York General Hospital, Toronto, ON, Canada
3
Nwosu OI, Naunheim MR. Artificial Intelligence in Laryngology, Broncho-Esophagology, and Sleep Surgery. Otolaryngol Clin North Am 2024; 57:821-829. [PMID: 38719714] [DOI: 10.1016/j.otc.2024.04.002] [Indexed: 09/06/2024]
Abstract
Technological advancements in laryngology, broncho-esophagology, and sleep surgery have enabled the collection of increasing amounts of complex data for diagnosis and treatment of voice, swallowing, and sleep disorders. Clinicians face challenges in efficiently synthesizing these data for personalized patient care. Artificial intelligence (AI), specifically machine learning and deep learning, offers innovative solutions for processing and interpreting these data, revolutionizing diagnosis and management in these fields, and making care more efficient and effective. In this study, we review recent AI-based innovations in the fields of laryngology, broncho-esophagology, and sleep surgery.
Affiliation(s)
- Obinna I Nwosu
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA
- Matthew R Naunheim
- Department of Otolaryngology-Head & Neck Surgery, Massachusetts Eye & Ear, Boston, MA, USA; Department of Otolaryngology-Head & Neck Surgery, Harvard Medical School, Boston, MA, USA
4
Nam K, Lee C, Lee T, Shin M, Kim BH, Park JW. Automated Laryngeal Invasion Detector of Boluses in Videofluoroscopic Swallowing Study Videos Using Action Recognition-Based Networks. Diagnostics (Basel) 2024; 14:1444. [PMID: 39001334] [PMCID: PMC11241273] [DOI: 10.3390/diagnostics14131444] [Received: 05/23/2024; Revised: 07/01/2024; Accepted: 07/04/2024; Indexed: 07/16/2024]
Abstract
We aimed to develop an automated detector that determines laryngeal invasion during swallowing. Laryngeal invasion, which causes significant clinical problems, is defined as a score of two or more on the penetration-aspiration scale (PAS). To detect laryngeal invasion (PAS 2 or higher) in videofluoroscopic swallowing study (VFSS) videos, we employed two three-dimensional (3D) stream networks for action recognition. To establish the robustness of our model, we compared its performance with that of several current image classification architectures. The proposed model achieved an accuracy of 92.10%; precision, recall, and F1 scores for detecting laryngeal invasion were 0.9470 each. The accuracy of our model in identifying laryngeal invasion surpassed that of other updated image classification models (60.58% for ResNet101, 60.19% for Swin-Transformer, 63.33% for EfficientNet-B2, and 31.17% for HRNet-W32). Our model is the first automated detector of laryngeal invasion in VFSS videos based on video action recognition networks. Considering its high and balanced performance, it may serve as an effective screening tool before clinicians review VFSS videos, ultimately reducing their burden.
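The metrics reported above (accuracy, precision, recall, F1) can be computed for any binary laryngeal-invasion detector output with scikit-learn; the label vectors below are invented for illustration and are not the paper's test set.

```python
# Illustrative metric computation for binary laryngeal-invasion
# detection: 1 = invasion (PAS >= 2), 0 = no invasion (PAS 1).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # ground-truth PAS-based labels
y_pred = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical model predictions

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
# prints accuracy=0.80 precision=0.83 recall=0.83 f1=0.83
```

A balanced precision/recall pair, as reported in the paper, matters for a screening tool: recall bounds the missed invasions, precision bounds the false alarms clinicians must review.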
Affiliation(s)
- Kihwan Nam
- Graduate School of Management of Technology, Korea University, Seoul 02841, Republic of Korea
- Taeheon Lee
- Department of Physical Medicine and Rehabilitation, Dongguk University Ilsan Hospital, College of Medicine, 27 Dongguk-ro, Ilsandong-gu, Goyang 10326, Republic of Korea
- Munseop Shin
- Department of Physical Medicine and Rehabilitation, Dongguk University Ilsan Hospital, College of Medicine, 27 Dongguk-ro, Ilsandong-gu, Goyang 10326, Republic of Korea
- Bo Hae Kim
- Department of Otorhinolaryngology-Head and Neck Surgery, Dongguk University Ilsan Hospital, College of Medicine, 27 Dongguk-ro, Ilsandong-gu, Goyang 10326, Republic of Korea
- Jin-Woo Park
- Department of Physical Medicine and Rehabilitation, Dongguk University Ilsan Hospital, College of Medicine, 27 Dongguk-ro, Ilsandong-gu, Goyang 10326, Republic of Korea
5
Jeong CW, Lee CS, Lim DW, Noh SH, Moon HK, Park C, Kim MS. The Development of an Artificial Intelligence Video Analysis-Based Web Application to Diagnose Oropharyngeal Dysphagia: A Pilot Study. Brain Sci 2024; 14:546. [PMID: 38928546] [PMCID: PMC11201460] [DOI: 10.3390/brainsci14060546] [Received: 04/24/2024; Revised: 05/18/2024; Accepted: 05/26/2024; Indexed: 06/28/2024]
Abstract
The gold standard test for diagnosing dysphagia is the videofluoroscopic swallowing study (VFSS). However, the accuracy of this test varies with the specialist's skill level. We propose a VFSS-based artificial intelligence (AI) web application to diagnose dysphagia. A VFSS video consists of multiframe data containing approximately 300 images. During upload, the server separated the data into individual frames and stored them, together with the original video, for analysis. The separated frames were then loaded into a labeling tool for annotation. The labeled file was downloaded, and an AI model was trained with You Only Look Once (YOLOv7). Using the SplitFolders utility, the entire dataset was divided into training (70%), test (10%), and validation (20%) subsets. When a VFSS video file was uploaded to an application equipped with the trained model, each swallowing phase was automatically classified and labeled as oral, pharyngeal, or esophageal; dysphagia was categorized as either penetration or aspiration; and the final result was displayed to the viewer. The following labeled datasets were created for training: oral (n = 2355), pharyngeal (n = 2338), esophageal (n = 1480), penetration (n = 1856), and aspiration (n = 1320); the YOLO model predicted these classes with accuracies of 0.90, 0.82, 0.79, 0.92, and 0.96, respectively. This is expected to help clinicians more efficiently suggest proper dietary options for patients with oropharyngeal dysphagia.
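The 70/10/20 split described above can be sketched as follows. The abstract applies the SplitFolders utility to image folders on disk; to keep this example self-contained, the same ratios are applied to an in-memory list of hypothetical frame filenames instead.

```python
# Sketch of a 70% train / 10% test / 20% validation split over the
# ~300 frames extracted from one VFSS video. Filenames are made up.
import random

frames = [f"frame_{i:04d}.png" for i in range(300)]
random.Random(42).shuffle(frames)            # fixed seed for reproducibility

n = len(frames)
n_train, n_test = int(0.7 * n), int(0.1 * n)
train = frames[:n_train]
test = frames[n_train:n_train + n_test]
val = frames[n_train + n_test:]              # remaining 20%

print(len(train), len(test), len(val))       # prints: 210 30 60
```

Shuffling before slicing keeps the three subsets disjoint while preserving the requested ratios, which is the same guarantee SplitFolders provides for folder-based datasets.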
Affiliation(s)
- Chang-Won Jeong
- STSC Center, Wonkwang University, Iksan 54538, Republic of Korea
- Smart Team, Wonkwang University Hospital, Iksan 54538, Republic of Korea
- Chung-Sub Lee
- STSC Center, Wonkwang University, Iksan 54538, Republic of Korea
- Dong-Wook Lim
- STSC Center, Wonkwang University, Iksan 54538, Republic of Korea
- Si-Hyeong Noh
- STSC Center, Wonkwang University, Iksan 54538, Republic of Korea
- Hee-Kyung Moon
- Institute for Educational Innovation, Wonkwang University, Iksan 54538, Republic of Korea
- Chul Park
- Division of Pulmonology and Critical Care Medicine, Department of Internal Medicine, Ulsan University Hospital, Ulsan 44033, Republic of Korea
- Min-Su Kim
- Department of Regenerative Medicine, College of Medicine, Soonchunhyang University, Cheonan 31151, Republic of Korea
- Department of Rehabilitation Medicine, Soonchunhyang University Cheonan Hospital, Cheonan 31151, Republic of Korea
6
Srinivasan Y, Liu A, Rameau A. Machine learning in the evaluation of voice and swallowing in the head and neck cancer patient. Curr Opin Otolaryngol Head Neck Surg 2024; 32:105-112. [PMID: 38116798] [DOI: 10.1097/moo.0000000000000948] [Indexed: 12/21/2023]
Abstract
PURPOSE OF REVIEW: The purpose of this review is to present recent advances and limitations in machine learning applied to the evaluation of speech, voice, and swallowing in head and neck cancer.
RECENT FINDINGS: Novel machine learning models incorporating diverse data modalities with improved discriminatory capabilities have been developed to predict toxicities following head and neck cancer therapy, including dysphagia, dysphonia, xerostomia, and weight loss, and to guide treatment planning. Machine learning has been applied to the care of posttreatment voice and swallowing dysfunction by offering objective and standardized assessments and by aiding innovative technologies for functional restoration. Voice and speech are also being used in machine learning algorithms to screen for laryngeal cancer.
SUMMARY: Machine learning has the potential to help optimize, assess, predict, and rehabilitate voice and swallowing function in head and neck cancer patients, as well as aid in cancer screening. However, existing studies are limited by a lack of sufficient external validation and generalizability, insufficient transparency and reproducibility, and no clearly superior predictive modeling strategy. Algorithms and applications will need to be trained on large multi-institutional datasets, incorporate sociodemographic data to reduce bias, and be validated through clinical trials for optimal performance and utility.
Affiliation(s)
- Yashes Srinivasan
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Amy Liu
- University of California, San Diego, School of Medicine, San Diego, California, USA
- Anaïs Rameau
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA