1. Alazwari S, Maashi M, Alsamri J, Alamgeer M, Ebad SA, Alotaibi SS, Obayya M, Al Zanin S. Improving laryngeal cancer detection using chaotic metaheuristics integration with squeeze-and-excitation resnet model. Health Inf Sci Syst 2024; 12:38. [PMID: 39006830] [PMCID: PMC11239646] [DOI: 10.1007/s13755-024-00296-5] [Received: 03/26/2024] [Accepted: 06/26/2024]
Abstract
Laryngeal cancer (LC) represents a substantial world health problem, with diminished survival rates attributed to late-stage diagnoses. Correct treatment for LC is complex, particularly in the final stages, as this cancer is a complex malignancy of the head and neck region. Recently, researchers have developed various analysis methods and tools to help medical consultants recognize LC efficiently. However, these existing tools and techniques suffer from performance constraints, such as lower accuracy in detecting LC at early stages, additional computational complexity, and long patient-screening times. Deep learning (DL) approaches have been established that are effective in the recognition of LC. Therefore, this study develops an efficient LC Detection using the Chaotic Metaheuristics Integration with the DL (LCD-CMDL) technique. The LCD-CMDL technique focuses on detecting and classifying LC from throat region images. It uses the CLAHE approach for contrast enhancement, and for feature extraction it applies the Squeeze-and-Excitation ResNet (SE-ResNet) model to learn complex, intrinsic features from the preprocessed images. Moreover, hyperparameter tuning of the SE-ResNet approach is performed using a chaotic adaptive sparrow search algorithm (CSSA). Finally, an extreme learning machine (ELM) model is applied to detect and classify the LC. Performance was evaluated on a benchmark throat region image database, and the experimental results demonstrated the superior performance of the LCD-CMDL approach over recent state-of-the-art approaches.
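To make the preprocessing step concrete: CLAHE limits how much any one intensity bin can amplify contrast. The sketch below is a simplified, global (untiled) variant of contrast-limited histogram equalization; the function name and clip limit are illustrative, not from the paper, and a real pipeline would typically use OpenCV's `cv2.createCLAHE`, which adds the per-tile processing.

```python
import numpy as np

def clipped_hist_equalize(img, clip_limit=0.01, n_bins=256):
    """Simplified, global version of contrast-limited histogram
    equalization (CLAHE without tiling) for 8-bit grayscale images."""
    hist, _ = np.histogram(img.ravel(), bins=n_bins, range=(0, 256))
    # Clip the histogram and redistribute the excess uniformly, which
    # limits contrast amplification in near-uniform regions.
    limit = max(1, int(clip_limit * img.size))
    excess = np.maximum(hist - limit, 0).sum()
    hist = np.minimum(hist, limit) + excess // n_bins
    # Map intensities through the normalized cumulative distribution.
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[img]
```

Applied to a low-contrast laryngoscopic frame, the mapping stretches the occupied intensity range while the clip limit keeps flat regions from being over-amplified.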
Affiliation(s)
- Sana Alazwari, Department of Information Technology, College of Computers and Information Technology, Taif University, Taif P.O. Box 11099, 21944 Taif, Saudi Arabia
- Mashael Maashi, Department of Software Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 103786, 11543 Riyadh, Saudi Arabia
- Jamal Alsamri, Department of Biomedical Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
- Mohammad Alamgeer, Department of Information Systems, College of Science & Art at Mahayil, King Khalid University, Abha, Saudi Arabia
- Shouki A Ebad, Department of Computer Science, Faculty of Science, Northern Border University, 91431 Arar, Saudi Arabia
- Saud S Alotaibi, Department of Information Systems, College of Computing and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Marwa Obayya, Department of Biomedical Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
- Samah Al Zanin, Department of Computer Science, Applied College, Prince Sattam Bin Abdulaziz University, Kharj, Saudi Arabia
2. Paderno A, Bedi N, Rau A, Holsinger CF. Computer Vision and Videomics in Otolaryngology-Head and Neck Surgery: Bridging the Gap Between Clinical Needs and the Promise of Artificial Intelligence. Otolaryngol Clin North Am 2024; 57:703-718. [PMID: 38981809] [DOI: 10.1016/j.otc.2024.05.005]
Abstract
This article discusses the role of computer vision in otolaryngology, particularly through endoscopy and surgery. It covers recent applications of artificial intelligence (AI) in nonradiologic imaging within otolaryngology, noting the benefits and challenges, such as improving diagnostic accuracy and optimizing therapeutic outcomes, while also pointing out the necessity for enhanced data curation and standardized research methodologies to advance clinical applications. Technical aspects are also covered, providing a detailed view of the progression from manual feature extraction to more complex AI models, including convolutional neural networks and vision transformers and their potential application in clinical settings.
Affiliation(s)
- Alberto Paderno, IRCCS Humanitas Research Hospital, via Manzoni 56, Rozzano, Milan 20089, Italy; Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan 20072, Italy
- Nikita Bedi, Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, CA, USA
- Anita Rau, Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
3. Asya O, Kavak ÖT, Özden HÖ, Günal D, Enver N. Demographic and clinical characteristics of our patients diagnosed with laryngeal dystonia. Eur Arch Otorhinolaryngol 2024; 281:4265-4271. [PMID: 38710818] [PMCID: PMC11266236] [DOI: 10.1007/s00405-024-08688-9] [Received: 11/28/2023] [Accepted: 04/14/2024]
Abstract
PURPOSE Laryngeal dystonia (LD) is a focal dystonia affecting the laryngeal musculature with no known etiology or cure. The present study evaluated the sociodemographic and clinical features of patients diagnosed with LD. MATERIALS AND METHODS All patients diagnosed with LD at our University Hospital's Ear, Nose, and Throat Department between January 2017 and July 2023 were retrospectively analyzed. The study included 43 patients. RESULTS Out of the 43 patients, 19 (44%) were male. At the time of diagnosis, the mean age of the patients was 35.1 years (range 17-65 years). The mean elapsed time between first symptom onset and first diagnosis was 49.2 months (range 4-240 months). Of the participants, 94% had adductor-type LD. None of the patients had a family history of LD. Nine patients (20%) experienced a life-altering event or trauma just before the onset of symptoms. All patients who consumed alcohol reported symptom relief with alcohol intake. A total of 67.6% of patients stated that their symptoms were triggered by stress. All patients received at least one Botulinum toxin injection, with an average of 2.75 doses per patient. CONCLUSION The gender distribution was approximately even between males and females. Men tended to receive a diagnosis earlier than women following the manifestation of symptoms. A significant number of patients associated the emergence of their symptoms with a stressful event or traumatic experience. This study represents the first investigation into the sociodemographic characteristics of LD patients within the Turkish population.
Affiliation(s)
- Orhan Asya, Department of Otorhinolaryngology, Pendik Training and Research Hospital, Marmara University Faculty of Medicine, Fevzi Çakmak, Muhsin Yazıcıoğlu Street, 34899, Istanbul, Turkey
- Ömer Tarık Kavak, Department of Otorhinolaryngology, Pendik Training and Research Hospital, Marmara University Faculty of Medicine, Fevzi Çakmak, Muhsin Yazıcıoğlu Street, 34899, Istanbul, Turkey
- Hatice Ömercikoğlu Özden, Department of Neurology, Pendik Training and Research Hospital, Marmara University Faculty of Medicine, Fevzi Çakmak, Muhsin Yazıcıoğlu Street, 34899, Istanbul, Turkey
- Dilek Günal, Department of Neurology, Pendik Training and Research Hospital, Marmara University Faculty of Medicine, Fevzi Çakmak, Muhsin Yazıcıoğlu Street, 34899, Istanbul, Turkey
- Necati Enver, Department of Otorhinolaryngology, Pendik Training and Research Hospital, Marmara University Faculty of Medicine, Fevzi Çakmak, Muhsin Yazıcıoğlu Street, 34899, Istanbul, Turkey
4. Soe NN, Yu Z, Latt PM, Lee D, Ong JJ, Ge Z, Fairley CK, Zhang L. Evaluation of artificial intelligence-powered screening for sexually transmitted infections-related skin lesions using clinical images and metadata. BMC Med 2024; 22:296. [PMID: 39020355] [PMCID: PMC11256573] [DOI: 10.1186/s12916-024-03512-x] [Received: 01/15/2024] [Accepted: 07/02/2024]
Abstract
BACKGROUND Sexually transmitted infections (STIs) pose a significant global public health challenge. Early diagnosis and treatment reduce STI transmission, but rely on recognising symptoms and care-seeking behaviour of the individual. Digital health software that distinguishes STI skin conditions could improve health-seeking behaviour. We developed and evaluated a deep learning model to differentiate STIs from non-STIs based on clinical images and symptoms. METHODS We used 4913 clinical images of genital lesions and metadata from the Melbourne Sexual Health Centre collected during 2010-2023. We developed two binary classification models to distinguish STIs from non-STIs: (1) a convolutional neural network (CNN) using images only and (2) an integrated model combining both CNN and fully connected neural network (FCN) using images and metadata. We evaluated the model performance by the area under the ROC curve (AUC) and assessed metadata contributions to the Image-only model. RESULTS Our study included 1583 STI and 3330 non-STI images. Common STI diagnoses were syphilis (34.6%), genital warts (24.5%) and herpes (19.4%), while most non-STIs (80.3%) were conditions such as dermatitis, lichen sclerosis and balanitis. In both STI and non-STI groups, the most frequently observed groups were 25-34 years (48.6% and 38.2%, respectively) and heterosexual males (60.3% and 45.9%, respectively). The Image-only model showed a reasonable performance with an AUC of 0.859 (SD 0.013). The Image + Metadata model achieved a significantly higher AUC of 0.893 (SD 0.018) compared to the Image-only model (p < 0.01). Out of 21 metadata, the integration of demographic and dermatological metadata led to the most significant improvement in model performance, increasing AUC by 6.7% compared to the baseline Image-only model. CONCLUSIONS The Image + Metadata model outperformed the Image-only model in distinguishing STIs from other skin conditions. Using it as a screening tool in a clinical setting may require further development and evaluation with larger datasets.
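The Image + Metadata model fuses a CNN image branch with an FCN over metadata. A minimal numpy sketch of this late-fusion idea follows; all dimensions, weight initializations, and names here are illustrative stand-ins, not the authors' architecture (in the real model the weights are learned jointly end-to-end).

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, META_DIM, HIDDEN = 128, 21, 32   # 21 metadata fields, as in the study

# Illustrative random weights; in the paper's model these are trained.
W1 = rng.normal(0, 0.1, (HIDDEN, IMG_DIM + META_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, HIDDEN)
b2 = 0.0

def sti_probability(img_feat, meta_feat):
    """Concatenate CNN image features with encoded metadata, pass the
    fused vector through one ReLU layer, and squash to a probability."""
    x = np.concatenate([img_feat, meta_feat])
    h = np.maximum(W1 @ x + b1, 0.0)                  # hidden FCN layer
    return float(1.0 / (1.0 + np.exp(-(W2 @ h + b2))))  # sigmoid output
```

The design point the abstract reports (AUC 0.893 vs 0.859) is exactly this concatenation: the metadata vector enters the classifier head alongside the image features rather than being used separately.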
Affiliation(s)
- Nyi N Soe, Melbourne Sexual Health Centre, Alfred Health, 580 Swanston Street, Carlton, Melbourne, VIC, 3053, Australia; School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia
- Zhen Yu, School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia
- Phyu M Latt, Melbourne Sexual Health Centre, Alfred Health, 580 Swanston Street, Carlton, Melbourne, VIC, 3053, Australia; School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia
- David Lee, Melbourne Sexual Health Centre, Alfred Health, 580 Swanston Street, Carlton, Melbourne, VIC, 3053, Australia
- Jason J Ong, Melbourne Sexual Health Centre, Alfred Health, 580 Swanston Street, Carlton, Melbourne, VIC, 3053, Australia; School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia
- Zongyuan Ge, Augmented Intelligence and Multimodal analytics (AIM) for Health Lab, Faculty of Information Technology, Monash University, Melbourne, Australia
- Christopher K Fairley, Melbourne Sexual Health Centre, Alfred Health, 580 Swanston Street, Carlton, Melbourne, VIC, 3053, Australia; School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia
- Lei Zhang, Melbourne Sexual Health Centre, Alfred Health, 580 Swanston Street, Carlton, Melbourne, VIC, 3053, Australia; School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia; Clinical Medical Research Centre, Children's Hospital of Nanjing Medical University, Nanjing, Jiangsu Province, 210008, China
5. Kavak ÖT, Gündüz Ş, Vural C, Enver N. Artificial intelligence based diagnosis of sulcus: assesment of videostroboscopy via deep learning. Eur Arch Otorhinolaryngol 2024. [PMID: 39001913] [DOI: 10.1007/s00405-024-08801-y] [Received: 02/27/2024] [Accepted: 06/19/2024]
Abstract
PURPOSE To develop a convolutional neural network (CNN)-based model for classifying videostroboscopic images of patients with sulcus, benign vocal fold (VF) lesions, and healthy VFs to improve clinicians' accuracy in diagnosis during videostroboscopies when evaluating sulcus. MATERIALS AND METHODS Videostroboscopies of 433 individuals who were diagnosed with sulcus (91), who were diagnosed with benign VF diseases (i.e., polyp, nodule, papilloma, cyst, or pseudocyst [311]), or who were healthy (33) were analyzed. After extracting 91,159 frames from videostroboscopies, a CNN-based model was created and tested. The healthy and sulcus groups underwent binary classification. In the second phase of the study, benign VF lesions were added to the training set, and multiclassification was executed across all groups. The proposed CNN-based model results were compared with five laryngology experts' assessments. RESULTS In the binary classification phase, the CNN-based model achieved 98% accuracy, 98% recall, 97% precision, and a 97% F1 score for classifying sulcus and healthy VFs. During the multiclassification phase, when evaluated on a subset of frames encompassing all included groups, the CNN-based model demonstrated greater accuracy when compared with that of the five laryngologists (76% versus 72%, 68%, 72%, 63%, and 72%). CONCLUSION The utilization of a CNN-based model serves as a significant aid in the diagnosis of sulcus, a VF disease that presents notable challenges in the diagnostic process. Further research could be undertaken to assess the practicality of implementing this approach in real-time application in clinical practice.
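The accuracy, recall, precision, and F1 figures reported above follow from the standard confusion-matrix definitions. A small helper (illustrative, not the authors' code) makes the relationships explicit:

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix
    counts: true/false positives (tp/fp) and false/true negatives (fn/tn)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of predicted positives, how many real
    recall = tp / (tp + fn)             # of real positives, how many found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1
```

For example, 45 true positives and 45 true negatives with 5 errors of each kind yields accuracy, precision, recall, and F1 all equal to 0.9, which is why the four numbers reported for the binary sulcus/healthy task can sit so close together.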
Affiliation(s)
- Ömer Tarık Kavak, Department of Otorhinolaryngology, Marmara University Faculty of Medicine, Pendik Training and Research Hospital, Fevzi Çakmak Muhsin Yazıcıoğlu Street, İstanbul, 34899, Turkey
- Şevket Gündüz, VRLab Academy, 32 Willoughby Rd, Harringay Ladder, London, N8 0JG, UK
- Cabir Vural, Marmara University Faculty of Engineering, Electrical and Electronics Engineering, Başıbüyük, RTE Campus, İstanbul, 34854, Turkey
- Necati Enver, Department of Otorhinolaryngology, Marmara University Faculty of Medicine, Pendik Training and Research Hospital, Fevzi Çakmak Muhsin Yazıcıoğlu Street, İstanbul, 34899, Turkey
6. Yousef AM, Deliyski DD, Zacharias SRC, Naghibolhosseini M. Detection of Vocal Fold Image Obstructions in High-Speed Videoendoscopy During Connected Speech in Adductor Spasmodic Dysphonia: A Convolutional Neural Networks Approach. J Voice 2024; 38:951-962. [PMID: 35304042] [PMCID: PMC9474736] [DOI: 10.1016/j.jvoice.2022.01.028] [Received: 12/07/2021] [Revised: 01/30/2022] [Accepted: 01/30/2022]
Abstract
OBJECTIVE Adductor spasmodic dysphonia (AdSD) is a neurogenic voice disorder affecting intrinsic laryngeal muscle control. AdSD leads to involuntary laryngeal spasms that typically reveal themselves only during connected speech. Laryngeal high-speed videoendoscopy (HSV) coupled with a flexible fiberoptic endoscope provides a unique opportunity to study voice production and visualize the vocal fold vibrations in AdSD during speech. The goal of this study is to automatically detect instances during which the image of the vocal folds is optically obstructed in HSV recordings obtained during connected speech. METHODS HSV data were recorded from vocally normal adults and patients with AdSD during reading of the "Rainbow Passage", six CAPE-V sentences, and production of the vowel /i/. A convolutional neural network was developed and trained as a classifier to detect obstructed/unobstructed vocal folds in HSV frames. Manually labelled data were used for training, validating, and testing of the network. Moreover, a comprehensive robustness evaluation was conducted to compare the performance of the developed classifier and visual analysis of HSV data. RESULTS The developed convolutional neural network was able to automatically detect the vocal fold obstructions in HSV data in vocally normal participants and AdSD patients. The trained network was tested successfully and showed an overall classification accuracy of 94.18% on the testing dataset. The robustness evaluation showed an average overall accuracy of 94.81% on a large number of HSV frames, demonstrating the high robustness of the introduced technique while maintaining a high level of accuracy. CONCLUSIONS The proposed approach can be used for efficient analysis of HSV data to study laryngeal maneuvers in patients with AdSD during connected speech. Additionally, this method will facilitate development of vocal fold vibratory measures for HSV frames with an unobstructed view of the vocal folds. Indicating parts of connected speech that provide an unobstructed view of the vocal folds can be used for developing optimal passages for precise HSV examination during connected speech and subject-specific clinical voice assessment protocols.
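Once per-frame obstruction labels exist, the unobstructed segments usable for vibratory analysis can be located with a simple run-length scan. A sketch of that downstream step (the function name and minimum-length threshold are illustrative; the paper itself only provides the per-frame classifier):

```python
def unobstructed_runs(obstructed, min_len=10):
    """Return (start, end) half-open index ranges of consecutive frames the
    classifier marked unobstructed (False), at least min_len frames long."""
    runs, start = [], None
    for i, flag in enumerate(obstructed):
        if not flag and start is None:
            start = i                      # a clear-view run begins
        elif flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i))    # keep runs long enough to analyze
            start = None
    if start is not None and len(obstructed) - start >= min_len:
        runs.append((start, len(obstructed)))
    return runs
```

Feeding the CNN's frame-level predictions through a pass like this is one way to "indicate parts of connected speech that provide an unobstructed view", as the conclusion suggests.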
Affiliation(s)
- Ahmed M Yousef, Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski, Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias, Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini, Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
7. Wang ML, Tie CW, Wang JH, Zhu JQ, Chen BH, Li Y, Zhang S, Liu L, Guo L, Yang L, Yang LQ, Wei J, Jiang F, Zhao ZQ, Wang GQ, Zhang W, Zhang QM, Ni XG. Multi-instance learning based artificial intelligence model to assist vocal fold leukoplakia diagnosis: A multicentre diagnostic study. Am J Otolaryngol 2024; 45:104342. [PMID: 38703609] [DOI: 10.1016/j.amjoto.2024.104342] [Received: 02/28/2024] [Accepted: 04/23/2024]
Abstract
OBJECTIVE To develop a multi-instance learning (MIL) based artificial intelligence (AI)-assisted diagnosis model that uses laryngoscopic images to differentiate benign and malignant vocal fold leukoplakia (VFL). METHODS The AI system was developed, trained and validated on 5362 images of 551 patients from three hospitals. An automated region-of-interest (ROI) segmentation algorithm was used to construct image-level features. MIL was used to fuse image-level results into patient-level features, and the extracted features were then modeled by seven machine learning algorithms. Finally, we evaluated both the image-level and patient-level results. Additionally, 50 videos of VFL were prospectively gathered to assess the system's real-time diagnostic capabilities. A human-machine comparison database was also constructed to compare the diagnostic performance of otolaryngologists with and without AI assistance. RESULTS In internal and external validation sets, the maximum area under the curve (AUC) for image-level segmentation models was 0.775 (95% CI 0.740-0.811) and 0.720 (95% CI 0.684-0.756), respectively. Utilizing a MIL-based fusion strategy, the AUC at the patient level increased to 0.869 (95% CI 0.798-0.940) and 0.851 (95% CI 0.756-0.945). For real-time video diagnosis, the maximum AUC at the patient level reached 0.850 (95% CI 0.743-0.957). With AI assistance, the AUC improved from 0.720 (95% CI 0.682-0.755) to 0.808 (95% CI 0.775-0.839) for senior otolaryngologists and from 0.647 (95% CI 0.608-0.686) to 0.807 (95% CI 0.773-0.837) for junior otolaryngologists. CONCLUSIONS The MIL-based AI-assisted diagnosis system can significantly improve the diagnostic performance of otolaryngologists for VFL and help them make proper clinical decisions.
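The MIL fusion step treats each patient as a "bag" of laryngoscopic images whose image-level scores are aggregated into one patient-level prediction; that aggregation is what lifts the AUC from 0.775 to 0.869 in the internal set. One common aggregation is top-k mean pooling, sketched below as a generic illustration, not the paper's exact fusion strategy:

```python
def patient_level_score(image_scores, k=3):
    """Aggregate image-level malignancy probabilities for one patient
    (a MIL 'bag') by averaging the k highest scores, so a few confident
    images dominate while single-frame noise is damped."""
    top = sorted(image_scores, reverse=True)[:k]
    return sum(top) / len(top)
```

Compared with plain max pooling (k=1), averaging several of the strongest frames makes the patient-level score less sensitive to one spuriously confident frame.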
Affiliation(s)
- Mei-Ling Wang, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Shenzhen, China
- Cheng-Wei Tie, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Jian-Hui Wang, Department of Endoscopy, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Ji-Qing Zhu, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Bing-Hong Chen, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Shenzhen, China
- Ying Li, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Shenzhen, China
- Sen Zhang, Department of Otolaryngology Head and Neck Surgery, The First Hospital, Shanxi Medical University, Taiyuan, China
- Lin Liu, Department of Otolaryngology Head and Neck Surgery, Dalian Friendship Hospital, Dalian, China
- Li Guo, Department of Otolaryngology Head and Neck Surgery, the First Affiliated Hospital, College of Clinical Medicine of Henan University of Science and Technology, Luoyang, China
- Long Yang, Department of Otolaryngology, The Second People's Hospital of Baoshan City, Baoshan, China
- Li-Qun Yang, Department of Otolaryngology, The Second People's Hospital of Baoshan City, Baoshan, China
- Jiao Wei, Department of Otolaryngology, Qujing Second People's Hospital of Yunnan Province, Qujing, China
- Feng Jiang, Department of Otolaryngology, Kunming First People's Hospital, Kunming, China
- Zhi-Qiang Zhao, Department of Otolaryngology, Baoshan People's Hospital, Baoshan, China
- Gui-Qi Wang, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Wei Zhang, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Shenzhen, China
- Quan-Mao Zhang, Department of Endoscopy, Shanxi Province Cancer Hospital, Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences, Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Xiao-Guang Ni, Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
8. Paderno A, Rau A, Bedi N, Bossi P, Mercante G, Piazza C, Holsinger FC. Computer Vision Foundation Models in Endoscopy: Proof of Concept in Oropharyngeal Cancer. Laryngoscope 2024. [PMID: 38850247] [DOI: 10.1002/lary.31534] [Received: 02/25/2024] [Revised: 04/15/2024] [Accepted: 05/06/2024]
Abstract
OBJECTIVES To evaluate the performance of vision transformer-derived image embeddings for distinguishing between normal and neoplastic tissues in the oropharynx and to investigate the potential of computer vision (CV) foundation models in medical imaging. METHODS Computational study using endoscopic frames with a focus on the application of a self-supervised vision transformer model (DINOv2) for tissue classification. High-definition endoscopic images were used to extract image patches that were then normalized and processed using the DINOv2 model to obtain embeddings. These embeddings served as input for a standard support vector machine (SVM) to classify the tissues as neoplastic or normal. The model's discriminative performance was validated using an 80-20 train-validation split. RESULTS From 38 endoscopic NBI videos, 327 image patches were analyzed. The classification results in the validation cohort demonstrated high accuracy (92%) and precision (89%), with a perfect recall (100%) and an F1-score of 94%. The receiver operating characteristic (ROC) curve yielded an area under the curve (AUC) of 0.96. CONCLUSION The use of large vision model-derived embeddings effectively differentiated between neoplastic and normal oropharyngeal tissues. This study supports the feasibility of employing CV foundation models like DINOv2 in the endoscopic evaluation of mucosal lesions, potentially augmenting diagnostic precision in Otorhinolaryngology. LEVEL OF EVIDENCE 4 Laryngoscope, 2024.
Affiliation(s)
- Alberto Paderno, Otorhinolaryngology Unit, IRCCS Humanitas Research Hospital, Milan, Italy; Department of Biomedical Sciences, Humanitas University, Milan, Italy
- Anita Rau, Department of Biomedical Data Science, Stanford University, Palo Alto, California, U.S.A
- Nikita Bedi, Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, California, U.S.A
- Paolo Bossi, Department of Biomedical Sciences, Humanitas University, Milan, Italy; Oncology Unit, IRCCS Humanitas Research Hospital, Milan, Italy
- Giuseppe Mercante, Otorhinolaryngology Unit, IRCCS Humanitas Research Hospital, Milan, Italy; Department of Biomedical Sciences, Humanitas University, Milan, Italy
- Cesare Piazza, Unit of Otorhinolaryngology - Head and Neck Surgery, ASST Spedali Civili, Department of Surgical and Medical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Floyd Christopher Holsinger, Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, California, U.S.A
9. Barlow J, Sragi Z, Rivera-Rivera G, Al-Awady A, Daşdöğen Ü, Courey MS, Kirke DN. The Use of Deep Learning Software in the Detection of Voice Disorders: A Systematic Review. Otolaryngol Head Neck Surg 2024; 170:1531-1543. [PMID: 38168017] [DOI: 10.1002/ohn.636] [Received: 07/07/2023] [Revised: 11/30/2023] [Accepted: 12/07/2023]
Abstract
OBJECTIVE To summarize the use of deep learning in the detection of voice disorders using acoustic and laryngoscopic input, compare specific neural networks in terms of accuracy, and assess their effectiveness compared to expert clinical visual examination. DATA SOURCES Embase, MEDLINE, and Cochrane Central. REVIEW METHODS Databases were screened through November 11, 2023 for relevant studies. The inclusion criteria required studies to utilize a specified deep learning method, use laryngoscopy or acoustic input, and measure accuracy of binary classification between healthy patients and those with voice disorders. RESULTS Thirty-four studies met the inclusion criteria, with 18 focusing on voice analysis, 15 on imaging analysis, and 1 on both. Across the 18 acoustic studies, 21 programs were used for identification of organic and functional voice disorders. These technologies included 10 convolutional neural networks (CNNs), 6 multilayer perceptrons (MLPs), and 5 other neural networks. The binary classification systems yielded a mean accuracy of 89.0% overall, including 93.7% for MLP programs and 84.5% for CNNs. Among the 15 imaging analysis studies, a total of 23 programs were utilized, resulting in a mean accuracy of 91.3%. Specifically, the 20 CNNs achieved a mean accuracy of 92.6% compared to 83.0% for the 3 MLPs. CONCLUSION Deep learning models were shown to be highly accurate in the detection of voice pathology, with CNNs most effective for assessing laryngoscopy images and MLPs most effective for assessing acoustic input. While deep learning methods outperformed expert clinical exam in limited comparisons, further studies integrating external validation are necessary.
Affiliation(s)
- All authors: Department of Otolaryngology-Head and Neck Surgery, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
10. Dao TTP, Huynh TL, Pham MK, Le TN, Nguyen TC, Nguyen QT, Tran BA, Van BN, Ha CC, Tran MT. Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset. J Imaging Inform Med 2024. [PMID: 38809338] [DOI: 10.1007/s10278-024-01068-z] [Received: 11/07/2023] [Revised: 02/24/2024] [Accepted: 02/26/2024]
Abstract
The diagnosis and treatment of vocal fold disorders heavily rely on the use of laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopy observation. However, existing approaches have yet to explore the joint optimization of the decision-making process, including object detection and image classification tasks simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy images. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To further facilitate interpretability for clinicians, MEAL provides attention maps that visualize the learned regions it relies on, offering explainable artificial intelligence output to support clinical decision-making. We also analyze our model's effectiveness in simulated clinical scenarios where camera shake occurs during laryngoscopy. The proposed model demonstrates outstanding performance on our VoFoCD dataset. The accuracy for image classification and mean average precision at an intersection over union threshold of 0.5 (mAP50) for object detection are 0.951 and 0.874, respectively. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, into local features, which refer to distinct anatomical regions of the vocal fold, particularly abnormal regions, including benign and malignant lesions. Our contribution can aid laryngologists in visually identifying benign or malignant vocal fold lesions and classifying images during laryngeal endoscopy.
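The detection metric quoted above, mAP50, counts a predicted box as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. As a minimal illustration of that threshold test (not the authors' code; the boxes are hypothetical), the IoU computation can be sketched in Python:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A predicted lesion box counts toward mAP50 only when IoU >= 0.5.
pred = (10, 10, 50, 50)
truth = (20, 20, 60, 60)
print(round(iou(pred, truth), 3))  # 0.391 -> would NOT count at mAP50
```

Here the overlap is below 0.5, so this prediction would be scored as a miss at mAP50 even though the boxes partially overlap.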
Affiliation(s)
- Thao Thi Phuong Dao
- University of Science, Ho Chi Minh City, Vietnam
- John von Neumann Institute, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Department of Otolaryngology, Thong Nhat Hospital, Tan Binh District, Ho Chi Minh City, Vietnam
- Tuan-Luc Huynh
- University of Science, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Trung-Nghia Le
- University of Science, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Tan-Cong Nguyen
- University of Science, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- University of Social Sciences and Humanities, Ho Chi Minh City, Vietnam
- Quang-Thuc Nguyen
- University of Science, Ho Chi Minh City, Vietnam
- John von Neumann Institute, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Bich Anh Tran
- Otorhinolaryngology Department, Cho Ray Hospital, District 5, Ho Chi Minh City, Vietnam
- Boi Ngoc Van
- Department of Otolaryngology, Vinmec Central Park International Hospital, Binh Thanh District, Ho Chi Minh City, Vietnam
- Chanh Cong Ha
- Department of Otolaryngology, District 7 Hospital, District 7, Ho Chi Minh City, Vietnam
- Minh-Triet Tran
- University of Science, Ho Chi Minh City, Vietnam
- John von Neumann Institute, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
11
Tie CW, Li DY, Zhu JQ, Wang ML, Wang JH, Chen BH, Li Y, Zhang S, Liu L, Guo L, Yang L, Yang LQ, Wei J, Jiang F, Zhao ZQ, Wang GQ, Zhang W, Zhang QM, Ni XG. Multi-Instance Learning for Vocal Fold Leukoplakia Diagnosis Using White Light and Narrow-Band Imaging: A Multicenter Study. Laryngoscope 2024. [PMID: 38801129] [DOI: 10.1002/lary.31537]
Abstract
OBJECTIVES Vocal fold leukoplakia (VFL) is a precancerous lesion of laryngeal cancer, and its endoscopic diagnosis poses challenges. We aim to develop an artificial intelligence (AI) model using white light imaging (WLI) and narrow-band imaging (NBI) to distinguish benign from malignant VFL. METHODS A total of 7057 images from 426 patients were used for model development and internal validation. Additionally, 1617 images from two other hospitals were used for model external validation. Modeling learning based on WLI and NBI modalities was conducted using deep learning combined with a multi-instance learning approach (MIL). Furthermore, 50 prospectively collected videos were used to evaluate real-time model performance. A human-machine comparison involving 100 patients and 12 laryngologists assessed the real-world effectiveness of the model. RESULTS The model achieved the highest area under the receiver operating characteristic curve (AUC) values of 0.868 and 0.884 in the internal and external validation sets, respectively. AUC in the video validation set was 0.825 (95% CI: 0.704-0.946). In the human-machine comparison, AI significantly improved AUC and accuracy for all laryngologists (p < 0.05). With the assistance of AI, the diagnostic abilities and consistency of all laryngologists improved. CONCLUSIONS Our multicenter study developed an effective AI model using MIL and fusion of WLI and NBI images for VFL diagnosis, particularly aiding junior laryngologists. However, further optimization and validation are necessary to fully assess its potential impact in clinical settings. LEVEL OF EVIDENCE 3 Laryngoscope, 2024.
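The multi-instance learning (MIL) approach groups all frames or images from one examination into a "bag" and outputs a single patient-level score. The abstract does not state the exact aggregation rule, so the sketch below assumes the common max-pooling rule (a bag is positive if any instance is positive); the per-frame scores are hypothetical:

```python
import numpy as np

def bag_score(instance_scores):
    """Aggregate per-image malignancy scores into one patient-level score
    by max pooling, a standard multi-instance learning rule: the bag is
    as suspicious as its most suspicious instance."""
    return float(np.max(instance_scores))

# Hypothetical per-frame classifier scores for two patients
# (white-light and narrow-band imaging frames pooled into one bag each).
benign_bag = [0.12, 0.08, 0.20, 0.15]
suspicious_bag = [0.10, 0.91, 0.22]
print(bag_score(benign_bag), bag_score(suspicious_bag))
```

With a decision threshold of, say, 0.5, the first patient would be called benign and the second flagged for biopsy, even though only one of their frames is abnormal.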
Affiliation(s)
- Cheng-Wei Tie
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- De-Yang Li
- The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Ji-Qing Zhu
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Mei-Ling Wang
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Jian-Hui Wang
- Department of Endoscopy, Shanxi Province Cancer Hospital/Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences/Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Bing-Hong Chen
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Ying Li
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Sen Zhang
- Department of Otolaryngology Head and Neck Surgery, The First Hospital, Shanxi Medical University, Taiyuan, China
- Lin Liu
- Department of Otolaryngology Head and Neck Surgery, Dalian Friendship Hospital, Dalian, China
- Li Guo
- Department of Otolaryngology Head and Neck Surgery, The First Affiliated Hospital, College of Clinical Medicine of Henan University of Science and Technology, Luoyang, China
- Long Yang
- Department of Otolaryngology, The Second People's Hospital of Baoshan City, Baoshan, China
- Li-Qun Yang
- Department of Otolaryngology, The Second People's Hospital of Baoshan City, Baoshan, China
- Jiao Wei
- Department of Otolaryngology, Qujing Second People's Hospital of Yunnan Province, Qujing, China
- Feng Jiang
- Department of Otolaryngology, Kunming First People's Hospital, Kunming, China
- Zhi-Qiang Zhao
- Department of Otolaryngology, Baoshan People's Hospital, Baoshan, China
- Gui-Qi Wang
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Wei Zhang
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Quan-Mao Zhang
- Department of Endoscopy, Shanxi Province Cancer Hospital/Shanxi Hospital Affiliated to Cancer Hospital, Chinese Academy of Medical Sciences/Cancer Hospital Affiliated to Shanxi Medical University, Taiyuan, China
- Xiao-Guang Ni
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
12
Kim HB, Song J, Park S, Lee YO. Classification of laryngeal diseases including laryngeal cancer, benign mucosal disease, and vocal cord paralysis by artificial intelligence using voice analysis. Sci Rep 2024; 14:9297. [PMID: 38654036] [DOI: 10.1038/s41598-024-58817-x]
Abstract
Voice change is often the first sign of laryngeal cancer, leading to diagnosis through hospital laryngoscopy. Screening for laryngeal cancer based on voice alone could enhance early detection. However, identifying voice indicators specific to laryngeal cancer is challenging, especially when differentiating it from other laryngeal ailments. This study presents an artificial intelligence model designed to distinguish between healthy voices, laryngeal cancer voices, and voices affected by other laryngeal conditions. We gathered voice samples from individuals with laryngeal cancer, vocal cord paralysis, benign mucosal diseases, and healthy participants. Comprehensive testing was conducted to determine the best mel-frequency cepstral coefficient conversion and machine learning techniques, with results analyzed in depth. In our tests, distinguishing laryngeal diseases from healthy voices achieved an accuracy of 0.85-0.97; in multiclass classification, however, accuracy ranged from 0.75 to 0.83. These findings highlight the challenges of artificial intelligence-driven voice-based diagnosis due to overlaps with benign conditions, but also underscore its potential.
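As a rough illustration of the mel-frequency cepstral coefficient (MFCC) front end the abstract refers to, the sketch below implements only the cepstral core (framing, windowing, log power spectrum, DCT) in plain NumPy. A real MFCC pipeline inserts a mel filterbank before the DCT, and the "voice" signal here is synthetic; this is a toy, not the study's feature extractor:

```python
import numpy as np

def cepstral_features(signal, frame_len=256, n_coeffs=12):
    """Toy cepstral extraction: frame the signal, apply a Hann window,
    take the log power spectrum, then a type-II DCT. Real MFCCs apply a
    mel filterbank before the DCT; it is omitted here for brevity."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    log_power = np.log(power + 1e-10)
    n_bins = log_power.shape[1]
    k = np.arange(n_bins)
    # DCT-II basis, one row per kept cepstral coefficient.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * k + 1) / (2 * n_bins))
    return log_power @ basis.T  # shape: (n_frames, n_coeffs)

rng = np.random.default_rng(0)
voice = np.sin(2 * np.pi * 0.05 * np.arange(2048)) + 0.1 * rng.standard_normal(2048)
feats = cepstral_features(voice)
print(feats.shape)  # (8, 12): one 12-dimensional feature vector per frame
```

The resulting per-frame feature vectors are what a downstream classifier (the study compared several machine learning techniques) would consume.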
Affiliation(s)
- Hyun-Bum Kim
- Department of Otolaryngology-Head and Neck Surgery, The Catholic University of Korea, Seoul, South Korea
- Jaemin Song
- Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea
- Seho Park
- Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea
- Yong Oh Lee
- Department of Industrial and Data Engineering, Hongik University, Seoul, South Korea
13
You Z, Han B, Shi Z, Zhao M, Du S, Liu H, Hei X, Ren X, Yan Y. Vocal Cord Leukoplakia Classification Using Siamese Network Under Small Samples of White Light Endoscopy Images. Otolaryngol Head Neck Surg 2024; 170:1099-1108. [PMID: 38037413] [DOI: 10.1002/ohn.591]
Abstract
OBJECTIVE Accurate vocal cord leukoplakia classification is instructive for clinical diagnosis and surgical treatment. This article introduces a reliable very deep Siamese network for accurate vocal cord leukoplakia classification. STUDY DESIGN A study of a classification network based on a retrospective database. SETTING Academic university and hospital. METHODS The white light image datasets of vocal cord leukoplakia used in this article were classified into 6 classes: normal tissues, inflammatory keratosis, mild dysplasia, moderate dysplasia, severe dysplasia, and squamous cell carcinoma. The classification performance was assessed by comparing it with 6 classical deep learning models, including AlexNet, VGG Net, Google Inception, ResNet, DenseNet, and Vision Transformer. RESULTS Experiments show the superior classification performance of our proposed network compared to state-of-the-art methods. The overall accuracy is 0.9756. The values of sensitivity and specificity are very high as well. The confusion matrix provides information for the 6-class classification task and demonstrates the superiority of our proposed network. CONCLUSION Our very deep Siamese network can provide accurate classification results of vocal cord leukoplakia, which facilitates early detection, clinical diagnosis, and surgical treatment. The excellent performance obtained in white light images can reduce the cost for patients, especially those living in developing countries.
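The defining property of a Siamese network is that both inputs pass through the same weights before their embeddings are compared, which is what makes it suitable for small samples. A toy NumPy sketch of that weight sharing (a single shared linear layer with random weights, not the paper's very deep backbone):

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.standard_normal((8, 64))  # ONE weight matrix shared by both branches

def embed(x):
    """Shared-weight branch: every input passes through the same W,
    the defining property of a Siamese network."""
    return np.tanh(W @ x)

def pair_distance(x_a, x_b):
    """Euclidean distance between the two embeddings; small distance
    suggests the pair belongs to the same class."""
    return float(np.linalg.norm(embed(x_a) - embed(x_b)))

img_a = rng.standard_normal(64)  # stand-ins for flattened image features
img_b = rng.standard_normal(64)
# Identical inputs embed identically, so their distance is exactly zero.
print(pair_distance(img_a, img_a), pair_distance(img_a, img_b) > 0)
```

Training such a network adjusts W so that same-class pairs (e.g., two mild-dysplasia images) land close together and different-class pairs land far apart; classification then reduces to comparing distances.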
Affiliation(s)
- Zhenzhen You
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Botao Han
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Zhenghao Shi
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Minghua Zhao
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Shuangli Du
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Haiqin Liu
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Xinhong Hei
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Xiaoyong Ren
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Yan Yan
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
14
Yao P, Witte D, German A, Periyakoil P, Kim YE, Gimonet H, Sulica L, Born H, Elemento O, Barnes J, Rameau A. A deep learning pipeline for automated classification of vocal fold polyps in flexible laryngoscopy. Eur Arch Otorhinolaryngol 2024; 281:2055-2062. [PMID: 37695363] [DOI: 10.1007/s00405-023-08190-8]
Abstract
PURPOSE To develop and validate a deep learning model for distinguishing healthy vocal folds (HVF) and vocal fold polyps (VFP) on laryngoscopy videos, while demonstrating the ability of a previously developed informative frame classifier in facilitating deep learning development. METHODS Following retrospective extraction of image frames from 52 HVF and 77 unilateral VFP videos, two researchers manually labeled each frame as informative or uninformative. A previously developed informative frame classifier was used to extract informative frames from the same video set. Both sets of videos were independently divided into training (60%), validation (20%), and test (20%) by patient. Machine-labeled frames were independently verified by two researchers to assess the precision of the informative frame classifier. Two models, pre-trained on ResNet18, were trained to classify frames as containing HVF or VFP. The accuracy of the polyp classifier trained on machine-labeled frames was compared to that of the classifier trained on human-labeled frames. The performance was measured by accuracy and area under the receiver operating characteristic curve (AUROC). RESULTS When evaluated on a hold-out test set, the polyp classifier trained on machine-labeled frames achieved an accuracy of 85% and AUROC of 0.84, whereas the classifier trained on human-labeled frames achieved an accuracy of 69% and AUROC of 0.66. CONCLUSION An accurate deep learning classifier for vocal fold polyp identification was developed and validated with the assistance of a peer-reviewed informative frame classifier for dataset assembly. The classifier trained on machine-labeled frames demonstrates improved performance compared to the classifier trained on human-labeled frames. LEVEL OF EVIDENCE: 4
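The AUROC reported above can be computed directly from classifier scores via its rank interpretation: the probability that a randomly chosen positive frame outscores a randomly chosen negative one (ties count half). A small sketch with hypothetical held-out scores, not the study's data:

```python
import numpy as np

def auroc(labels, scores):
    """Area under the ROC curve via the rank formulation: the fraction
    of (positive, negative) pairs in which the positive scores higher,
    counting ties as half a win."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical polyp-classifier scores on six held-out frames.
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auroc(y, s))  # 8 of 9 positive/negative pairs are ranked correctly
```

This pairwise view explains why AUROC of 0.84 versus 0.66 is a meaningful gap: it is the difference in how often the model ranks a polyp frame above a healthy one.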
Affiliation(s)
- Peter Yao
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Dan Witte
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Alexander German
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Preethi Periyakoil
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Yeo Eun Kim
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Hortense Gimonet
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Lucian Sulica
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Hayley Born
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Olivier Elemento
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA
- Josue Barnes
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
- Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA
15
Kim J, Wang SG, Lee JC, Cheon YI, Shin SC, Lim DW, Jang DI, Bhattacharjee S, Hwang YB, Choi HK, Kwon I, Kim SJ, Kwon SB. Evaluation of Vertical Level Differences Between Left and Right Vocal Folds Using Artificial Intelligence System in Excised Canine Larynx. J Voice 2024:S0892-1997(23)00385-5. [PMID: 38216386] [DOI: 10.1016/j.jvoice.2023.11.025]
Abstract
OBJECTIVES This study aimed to establish an artificial intelligence (AI) system to classify vertical level differences between vocal folds during vocalization and to evaluate the accuracy of the classification. METHODS We designed models with different depths between the right and left vocal folds using an excised canine larynx. Video files for the data set were obtained using a high-speed camera system and a color complementary metal oxide semiconductor camera with global shutter. The data sets were divided into training, validation, and testing. We used 20,000 images for building the model and 8000 images for testing. To perform deep learning multiclass classification and to estimate the vertical level difference, we introduced DenseNet121-ConvLSTM. RESULTS The model was trained several times using different numbers of epochs. We achieved the most optimal results at 100 epochs, and the batch size used during training was 16. The proposed DenseNet121-ConvLSTM model achieved classification accuracies of 99.5% and 88.0% for training and testing, respectively. After verification using an external data set, the overall accuracy, precision, recall, and f1-score were 90.8%, 91.6%, 90.9%, and 91.2%, respectively. CONCLUSIONS The newly developed AI system may be an easy and accurate method for classifying superior and inferior vertical level differences between vocal folds. Thus, this AI system can be applied and may help in the assessment of vertical level differences in patients with unilateral vocal fold paralysis.
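The accuracy, precision, recall, and F1 figures above all follow from confusion-matrix counts. A minimal sketch of those formulas with hypothetical predictions (not the study's data):

```python
import numpy as np

def prf1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class, from the true/false
    positive and false negative counts of the confusion matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical predictions for one vertical-level class on ten clips.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred  = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
p, r, f = prf1(truth, pred)
print(round(p, 2), round(r, 2), round(f, 2))
```

For a multiclass model like the one described, these per-class values are typically averaged across classes to produce the overall precision, recall, and F1 reported in the abstract.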
Affiliation(s)
- Jaewon Kim
- Department of Cognitive Science, Pusan National University, Doctor's Course, Busan, South Korea; Department of Otorhinolaryngology, Head and Neck Surgery, Pusan National University Yangsan Hospital, Yangsan, Gyeongsangnam-do, South Korea
- Soo-Geun Wang
- Department of Otorhinolaryngology, Head and Neck Surgery, College of Medicine, Pusan National University and Medical Research Institute, Pusan National University Hospital, Busan, South Korea
- Jin-Choon Lee
- Department of Otorhinolaryngology, Head and Neck Surgery, Pusan National University School of Medicine, Pusan National University Yangsan Hospital, Yangsan, Gyeongsangnam-do, South Korea
- Yong-Il Cheon
- Department of Otorhinolaryngology, Head and Neck Surgery, Biomedical Research Institute, Pusan National University School of Medicine, Pusan National University Hospital, Busan, South Korea
- Sung-Chan Shin
- Department of Otorhinolaryngology, Head and Neck Surgery, Biomedical Research Institute, Pusan National University School of Medicine, Pusan National University Hospital, Busan, South Korea
- Dong-Won Lim
- Department of Otorhinolaryngology, Head and Neck Surgery, Pusan National University Hospital, Busan, South Korea
- Dae-Ik Jang
- Department of Otorhinolaryngology, Head and Neck Surgery, Kosin University Gospel Hospital, Kosin University College of Medicine, Busan, South Korea
- Yeong-Byn Hwang
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae, South Korea
- Heung-Kook Choi
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae, South Korea; Artificial Intelligence Research Center, JLK Inc., Seoul, South Korea
- Ickhwan Kwon
- Platform Development Headquarters, Autonomous A2Z, Daegu, South Korea
- Seon-Jong Kim
- Department of Applied IT and Engineering, Pusan National University, Miryang, Gyeongsangnam-do, South Korea
- Soon-Bok Kwon
- Department of Humanities, Language and Information, Pusan National University, Busan, South Korea
16
Kryukov AI, Sudarev PA, Romanenko SG, Kurbanova DI, Lesogorova EV, Krasilnikova EN, Pavlikhin OG, Ivanova AA, Osadchiy AP, Shevyrina NG. [Diagnosis of benign laryngeal tumors using neural network]. Vestn Otorinolaringol 2024; 89:24-28. [PMID: 39104269] [DOI: 10.17116/otorino20248903124]
Abstract
The article describes our experience developing and training an artificial neural network, based on artificial intelligence algorithms, to recognize the characteristic features of benign laryngeal tumors and normal laryngeal variants from laryngoscopy images obtained during patient examinations. To prepare the training data, a dataset of 1471 laryngeal images in digital formats (jpg, bmp) was collected, labeled, and loaded. The neural network was then trained and tested on recognizing images of the normal larynx and laryngeal neoplasms. The trained network demonstrated an accuracy of 86% in recognizing benign laryngeal tumors and normal laryngeal variants. The proposed technology can be applied in practical healthcare to monitor and improve the quality of diagnosis of laryngeal pathologies.
Affiliation(s)
- A I Kryukov
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- P A Sudarev
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- S G Romanenko
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- D I Kurbanova
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- E V Lesogorova
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- E N Krasilnikova
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
- O G Pavlikhin
- Sverzhevsky Research Clinical Institute of Otorhinolaryngology, Moscow, Russia
17
Xu ZH, Fan DG, Huang JQ, Wang JW, Wang Y, Li YZ. Computer-Aided Diagnosis of Laryngeal Cancer Based on Deep Learning with Laryngoscopic Images. Diagnostics (Basel) 2023; 13:3669. [PMID: 38132254] [PMCID: PMC10743023] [DOI: 10.3390/diagnostics13243669]
Abstract
Laryngeal cancer poses a significant global health burden, with late-stage diagnoses contributing to reduced survival rates. This study explores the application of deep convolutional neural networks (DCNNs), specifically the Densenet201 architecture, in the computer-aided diagnosis of laryngeal cancer using laryngoscopic images. Our dataset comprised images from two medical centers, including benign and malignant cases, and was divided into training, internal validation, and external validation groups. We compared the performance of Densenet201 with other commonly used DCNN models and clinical assessments by experienced clinicians. Densenet201 exhibited outstanding performance, with an accuracy of 98.5% in the training cohort, 92.0% in the internal validation cohort, and 86.3% in the external validation cohort. The area under the curve (AUC) values consistently exceeded 92%, signifying robust discriminatory ability. Remarkably, Densenet201 achieved high sensitivity (98.9%) and specificity (98.2%) in the training cohort, ensuring accurate detection of both positive and negative cases. In contrast, other DCNN models displayed varying degrees of performance degradation in the external validation cohort, indicating the superiority of Densenet201. Moreover, Densenet201's performance was comparable to that of an experienced clinician (Clinician A) and outperformed another clinician (Clinician B), particularly in the external validation cohort. Statistical analysis, including the DeLong test, confirmed the significance of these performance differences. Our study demonstrates that Densenet201 is a highly accurate and reliable tool for the computer-aided diagnosis of laryngeal cancer based on laryngoscopic images. The findings underscore the potential of deep learning as a complementary tool for clinicians and the importance of incorporating advanced technology in improving diagnostic accuracy and patient care in laryngeal cancer diagnosis. Future work will involve expanding the dataset and further optimizing the deep learning model.
Affiliation(s)
- Zhi-Hui Xu
- Department of Otolaryngology, The Second Affiliated Hospital, Fujian Medical University, 950 Donghai Street, Fengze District, Quanzhou 362000, China
- Da-Ge Fan
- Department of Pathology, The Second Affiliated Hospital, Fujian Medical University, 950 Donghai Street, Fengze District, Quanzhou 362000, China
- Jian-Qiang Huang
- Department of Otolaryngology, The Second Affiliated Hospital, Fujian Medical University, 950 Donghai Street, Fengze District, Quanzhou 362000, China
- Jia-Wei Wang
- Department of Emergency, The Second Affiliated Hospital, Fujian Medical University, 950 Donghai Street, Fengze District, Quanzhou 362000, China
- Yi Wang
- CT/MRI Department, The Second Affiliated Hospital, Fujian Medical University, 950 Donghai Street, Fengze District, Quanzhou 362000, China
- Yuan-Zhe Li
- CT/MRI Department, The Second Affiliated Hospital, Fujian Medical University, 950 Donghai Street, Fengze District, Quanzhou 362000, China
18
You Z, Han B, Shi Z, Zhao M, Du S, Yan J, Liu H, Hei X, Ren X, Yan Y. Vocal cord leukoplakia classification using deep learning models in white light and narrow band imaging endoscopy images. Head Neck 2023; 45:3129-3145. [PMID: 37837264] [DOI: 10.1002/hed.27543]
Abstract
BACKGROUND Accurate vocal cord leukoplakia classification is critical for the individualized treatment and early detection of laryngeal cancer. Numerous deep learning techniques have been proposed, but it is unclear how to select one to apply to laryngeal tasks. This article introduces and reliably evaluates existing deep learning models for vocal cord leukoplakia classification. METHODS We created white light and narrow band imaging (NBI) image datasets of vocal cord leukoplakia which were classified into six classes: normal tissues (NT), inflammatory keratosis (IK), mild dysplasia (MiD), moderate dysplasia (MoD), severe dysplasia (SD), and squamous cell carcinoma (SCC). Vocal cord leukoplakia classification was performed using six classical deep learning models: AlexNet, VGG, Google Inception, ResNet, DenseNet, and Vision Transformer. RESULTS GoogLeNet (i.e., Google Inception V1), DenseNet-121, and ResNet-152 perform excellent classification. The highest overall accuracy of white light image classification is 0.9583, while the highest overall accuracy of NBI image classification is 0.9478. These three neural networks all provide very high sensitivity, specificity, and precision values. CONCLUSION GoogLeNet, ResNet, and DenseNet can provide accurate pathological classification of vocal cord leukoplakia. This facilitates early diagnosis, informs the choice between conservative and surgical treatment for different grades of disease, and reduces the burden on endoscopists.
Affiliation(s)
- Zhenzhen You
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Botao Han
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Zhenghao Shi
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Minghua Zhao
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Shuangli Du
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Jing Yan
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Haiqin Liu
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Xinhong Hei
- Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Xiaoyong Ren
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
- Yan Yan
- Department of Otorhinolaryngology, Second Affiliated Hospital of Medical College, Xi'an Jiaotong University, Xi'an, China
19
Tsilivigkos C, Athanasopoulos M, Micco RD, Giotakis A, Mastronikolis NS, Mulita F, Verras GI, Maroulis I, Giotakis E. Deep Learning Techniques and Imaging in Otorhinolaryngology-A State-of-the-Art Review. J Clin Med 2023; 12:6973. PMID: 38002588; PMCID: PMC10672270; DOI: 10.3390/jcm12226973.
Abstract
Over the last decades, the field of medicine has witnessed significant progress in artificial intelligence (AI), the Internet of Medical Things (IoMT), and deep learning (DL) systems. Otorhinolaryngology, and imaging across its various subspecialties, has not remained untouched by this transformative trend. As the medical landscape evolves, integrating these technologies becomes imperative for augmenting patient care and fostering innovation at the intersection of computer vision and AI in otorhinolaryngology. To that end, we searched MEDLINE for papers published until June 2023, using the keywords 'otorhinolaryngology', 'imaging', 'computer vision', 'artificial intelligence', and 'deep learning', and additionally searched the reference sections of the included articles by hand. Our search retrieved 121 related articles, which were subdivided into the following categories: imaging in head and neck, otology, and rhinology. Our objective is to provide a comprehensive introduction to this burgeoning field, tailored for both experienced specialists and aspiring residents interested in deep learning algorithms for imaging in otorhinolaryngology.
Affiliation(s)
- Christos Tsilivigkos
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Michail Athanasopoulos
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Riccardo di Micco
- Department of Otolaryngology and Head and Neck Surgery, Medical School of Hannover, 30625 Hannover, Germany
- Aris Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
- Nicholas S. Mastronikolis
- Department of Otolaryngology, University Hospital of Patras, 265 04 Patras, Greece
- Francesk Mulita
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Georgios-Ioannis Verras
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Ioannis Maroulis
- Department of Surgery, University Hospital of Patras, 265 04 Patras, Greece
- Evangelos Giotakis
- 1st Department of Otolaryngology, National and Kapodistrian University of Athens, Hippocrateion Hospital, 115 27 Athens, Greece
20
Li Y, Gu W, Yue H, Lei G, Guo W, Wen Y, Tang H, Luo X, Tu W, Ye J, Hong R, Cai Q, Gu Q, Liu T, Miao B, Wang R, Ren J, Lei W. Real-time detection of laryngopharyngeal cancer using an artificial intelligence-assisted system with multimodal data. J Transl Med 2023; 21:698. PMID: 37805551; PMCID: PMC10559609; DOI: 10.1186/s12967-023-04572-y.
Abstract
BACKGROUND Laryngopharyngeal cancer (LPC) includes laryngeal and hypopharyngeal cancer, whose early diagnosis can significantly improve patient prognosis and quality of life. Pathological biopsy of suspicious tissue under laryngoscopic guidance is the gold standard for diagnosing LPC. However, this subjective examination largely depends on the skills and experience of the laryngologist, which increases the risk of missed diagnoses and repeated unnecessary biopsies. We aimed to develop and validate a deep convolutional neural network-based Laryngopharyngeal Artificial Intelligence Diagnostic System (LPAIDS) for automatically identifying LPC in real time in both white-light imaging (WLI) and narrow-band imaging (NBI) laryngoscopy images, improving diagnostic accuracy by reducing diagnostic variation among non-expert laryngologists. METHODS In total, 31,543 laryngoscopic images from 2382 patients were categorised into training, validation, and test sets to develop, validate, and internally test LPAIDS. Another 25,063 images from five other hospitals were used as external test sets. Overall, 551 videos were used to evaluate the real-time performance of the system, and 200 randomly selected videos were used to compare the diagnostic performance of LPAIDS with that of laryngologists. Two deep-learning models using either WLI (model W) or NBI (model N) images alone were constructed for comparison with LPAIDS. RESULTS LPAIDS had higher diagnostic performance than models W and N, with accuracies of 0.956 and 0.949 in the internal image and video tests, respectively. Its robustness and stability were validated on the external sets, with area under the receiver operating characteristic curve values of 0.965-0.987. In the laryngologist-machine comparison, LPAIDS achieved an accuracy of 0.940, comparable to expert laryngologists and superior to laryngologists with lesser qualifications.
CONCLUSIONS LPAIDS detected LPC in real time with high accuracy and stability, showing great potential to improve the diagnostic accuracy of LPC by reducing diagnostic variation among non-expert laryngologists.
Affiliation(s)
- Yun Li
- Otorhinolaryngology Hospital, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, Guangdong, China
- Wenxin Gu
- School of Computer Science and Engineering, Guangdong Province Key Lab of Computational Science, Sun Yat-Sen University, Guangzhou, 510006, Guangdong, China
- Huijun Yue
- Otorhinolaryngology Hospital, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, Guangdong, China
- Guoqing Lei
- Otorhinolaryngology Hospital, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, Guangdong, China
- Wenbin Guo
- Otorhinolaryngology Hospital, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, Guangdong, China
- Yihui Wen
- Otorhinolaryngology Hospital, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, Guangdong, China
- Haocheng Tang
- Department of Otolaryngology-Head and Neck Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, China
- Xin Luo
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, Guangdong, China
- Wenjuan Tu
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, Guangdong, China
- Jin Ye
- Department of Otolaryngology-Head and Neck Surgery, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, Guangdong, China
- Ruomei Hong
- Department of Otolaryngology-Head and Neck, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Qian Cai
- Department of Otolaryngology-Head and Neck, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Qingyu Gu
- Department of Otorhinolaryngology-Head and Neck Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, Guangdong, China
- Tianrun Liu
- Department of Otorhinolaryngology-Head and Neck Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, Guangdong, China
- Beiping Miao
- Department of Otolaryngology-Head and Neck Surgery, Shenzhen Secondary Hospital and First Affiliated Hospital of Shenzhen University, Shenzhen, Guangdong, China
- Ruxin Wang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Jiangtao Ren
- School of Computer Science and Engineering, Guangdong Province Key Lab of Computational Science, Sun Yat-Sen University, Guangzhou, 510006, Guangdong, China
- Wenbin Lei
- Otorhinolaryngology Hospital, The First Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, Guangdong, China
21
Sampieri C, Baldini C, Azam MA, Moccia S, Mattos LS, Vilaseca I, Peretti G, Ioppi A. Artificial Intelligence for Upper Aerodigestive Tract Endoscopy and Laryngoscopy: A Guide for Physicians and State-of-the-Art Review. Otolaryngol Head Neck Surg 2023; 169:811-829. PMID: 37051892; DOI: 10.1002/ohn.343.
Abstract
OBJECTIVE Endoscopic and laryngoscopic examination is paramount for the evaluation of benign lesions and cancer of the larynx, oropharynx, nasopharynx, nasal cavity, and oral cavity. Nevertheless, upper aerodigestive tract (UADT) endoscopy is intrinsically operator-dependent and lacks objective quality standards. Recently, there has been increasing interest in artificial intelligence (AI) applications in this area to support physicians during the examination, thus enhancing diagnostic performance. The relative novelty of this research field poses a challenge for both reviewers and readers, as clinicians often lack a specific technical background. DATA SOURCES Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and Google Scholar. REVIEW METHODS A structured review of the current literature (up to September 2022) was performed. Search terms related to topics of AI, machine learning (ML), and deep learning (DL) in UADT endoscopy and laryngoscopy were identified and queried by 3 independent reviewers. Citations of selected studies were also evaluated to ensure comprehensiveness. CONCLUSIONS Forty-one studies were included in the review. AI and computer vision techniques were used to achieve 3 fundamental tasks in this field: classification, detection, and segmentation. All papers were summarized and reviewed. IMPLICATIONS FOR PRACTICE This article comprehensively reviews the latest developments in the application of ML and DL in UADT endoscopy and laryngoscopy, as well as their future clinical implications. The technical basis of AI is also explained, providing guidance for nonexpert readers to allow critical appraisal of the evaluation metrics and the most relevant quality requirements.
Affiliation(s)
- Claudio Sampieri
- Department of Experimental Medicine (DIMES), University of Genoa, Genoa, Italy
- Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Otorhinolaryngology Department, Hospital Clínic, Barcelona, Spain
- Chiara Baldini
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi (DIBRIS), University of Genoa, Genoa, Italy
- Muhammad Adeel Azam
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi (DIBRIS), University of Genoa, Genoa, Italy
- Sara Moccia
- Department of Excellence in Robotics and AI, The BioRobotics Institute, Pisa, Italy
- Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Isabel Vilaseca
- Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Otorhinolaryngology Department, Hospital Clínic, Barcelona, Spain
- Head Neck Clínic, Agència de Gestió d'Ajuts Universitaris i de Recerca, Barcelona, Catalunya, Spain
- Surgery and Medical-Surgical Specialties Department, Faculty of Medicine and Health Sciences, Universitat de Barcelona, Barcelona, Spain
- Translational Genomics and Target Therapies in Solid Tumors Group, Faculty of Medicine, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- University of Barcelona, Barcelona, Spain
- Giorgio Peretti
- Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alessandro Ioppi
- Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
22
Zhong NN, Wang HQ, Huang XY, Li ZZ, Cao LM, Huo FY, Liu B, Bu LL. Enhancing head and neck tumor management with artificial intelligence: Integration and perspectives. Semin Cancer Biol 2023; 95:52-74. PMID: 37473825; DOI: 10.1016/j.semcancer.2023.07.002.
Abstract
Head and neck tumors (HNTs) constitute a multifaceted ensemble of pathologies that primarily involve regions such as the oral cavity, pharynx, and nasal cavity. The intricate anatomical structure of these regions poses considerable challenges to efficacious treatment strategies. Despite the availability of myriad treatment modalities, the overall therapeutic efficacy for HNTs continues to remain subdued. In recent years, the deployment of artificial intelligence (AI) in healthcare practices has garnered noteworthy attention. AI modalities, inclusive of machine learning (ML), neural networks (NNs), and deep learning (DL), when amalgamated into the holistic management of HNTs, promise to augment the precision, safety, and efficacy of treatment regimens. The integration of AI within HNT management is intricately intertwined with domains such as medical imaging, bioinformatics, and medical robotics. This article intends to scrutinize the cutting-edge advancements and prospective applications of AI in the realm of HNTs, elucidating AI's indispensable role in prevention, diagnosis, treatment, prognostication, research, and inter-sectoral integration. The overarching objective is to stimulate scholarly discourse and invigorate insights among medical practitioners and researchers to propel further exploration, thereby facilitating superior therapeutic alternatives for patients.
Affiliation(s)
- Nian-Nian Zhong
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Han-Qi Wang
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Xin-Yue Huang
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Zi-Zhan Li
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Lei-Ming Cao
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Fang-Yi Huo
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Bing Liu
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China; Department of Oral & Maxillofacial - Head Neck Oncology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
- Lin-Lin Bu
- State Key Laboratory of Oral & Maxillofacial Reconstruction and Regeneration, Key Laboratory of Oral Biomedicine Ministry of Education, Hubei Key Laboratory of Stomatology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China; Department of Oral & Maxillofacial - Head Neck Oncology, School & Hospital of Stomatology, Wuhan University, Wuhan 430079, China
23
Zhou L, Jiang H, Li G, Ding J, Lv C, Duan M, Wang W, Chen K, Shen N, Huang X. Point-wise spatial network for identifying carcinoma at the upper digestive and respiratory tract. BMC Med Imaging 2023; 23:140. PMID: 37749498; PMCID: PMC10521533; DOI: 10.1186/s12880-023-01076-5.
Abstract
PROBLEM Artificial intelligence has been widely investigated for diagnosis and treatment strategy design, with some models proposed for detecting oral pharyngeal, nasopharyngeal, or laryngeal carcinoma. However, no comprehensive model has been established for these regions. AIM Our hypothesis was that a common pattern in the cancerous appearance of these regions could be recognized and integrated into a single model, thus improving the efficacy of deep learning models. METHODS We utilized a point-wise spatial attention network model to perform semantic segmentation in these regions. RESULTS Our study demonstrated an excellent outcome, with an average mIoU of 86.3%, and an average pixel accuracy of 96.3%. CONCLUSION The research confirmed that the mucosa of oral pharyngeal, nasopharyngeal, and laryngeal regions may share a common appearance, including the appearance of tumors, which can be recognized by a single artificial intelligence model. Therefore, a deep learning model could be constructed to effectively recognize these tumors.
Affiliation(s)
- Lei Zhou
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, Xuhui District, 180 Fenglin Road, Shanghai, 200032, P. R. China
- Huaili Jiang
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, Xuhui District, 180 Fenglin Road, Shanghai, 200032, P. R. China
- Guangyao Li
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, Xuhui District, 180 Fenglin Road, Shanghai, 200032, P. R. China
- Jiaye Ding
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, Xuhui District, 180 Fenglin Road, Shanghai, 200032, P. R. China
- Cuicui Lv
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, Xuhui District, 180 Fenglin Road, Shanghai, 200032, P. R. China
- Maoli Duan
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Department of Otolaryngology Head and Neck Surgery, Karolinska University Hospital, 171 76, Stockholm, Sweden
- Wenfeng Wang
- Institute of Artificial Intelligence and Blockchain, Guangzhou University, Guangzhou, 510006, P. R. China
- Kongyang Chen
- Institute of Artificial Intelligence and Blockchain, Guangzhou University, Guangzhou, 510006, P. R. China
- Pazhou Lab, Guangzhou, 510330, P. R. China
- Na Shen
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, Xuhui District, 180 Fenglin Road, Shanghai, 200032, P. R. China
- Xinsheng Huang
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, Xuhui District, 180 Fenglin Road, Shanghai, 200032, P. R. China
24
Wellenstein DJ, Woodburn J, Marres HAM, van den Broek GB. Detection of laryngeal carcinoma during endoscopy using artificial intelligence. Head Neck 2023; 45:2217-2226. PMID: 37377069; DOI: 10.1002/hed.27441.
Abstract
BACKGROUND The objective of this study was to assess the performance and application of a self-developed deep learning (DL) algorithm for the real-time localization and classification of both vocal cord carcinoma and benign vocal cord lesions. METHODS The algorithm was trained and validated on a dataset of videos and photos collected from our own department, as well as an open-access dataset named "Laryngoscope8". RESULTS The algorithm correctly localizes and classifies vocal cord carcinoma on still images with a sensitivity between 71% and 78%, and benign vocal cord lesions with a sensitivity between 70% and 82%. Furthermore, the best algorithm ran at an average of 63 frames per second, making it suitable for real-time detection of laryngeal pathology in an outpatient clinic setting. CONCLUSION We have demonstrated that our DL algorithm can localize and classify benign and malignant laryngeal pathology during endoscopy.
Affiliation(s)
- David J Wellenstein
- Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Henri A M Marres
- Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Guido B van den Broek
- Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Information Management, Radboud University Medical Center, Nijmegen, The Netherlands
25
Petruzzi G, Coden E, Iocca O, di Maio P, Pichi B, Campo F, De Virgilio A, Francesco M, Vidiri A, Pellini R. Machine learning in laryngeal cancer: A pilot study to predict oncological outcomes and the role of adverse features. Head Neck 2023; 45:2068-2078. PMID: 37345573; DOI: 10.1002/hed.27434.
Abstract
BACKGROUND Laryngeal carcinoma (LC) remains a significant economic and emotional burden on the healthcare system and a source of severe social morbidity. New tools such as machine learning (ML) could allow clinicians to develop accurate and reproducible treatments. METHODS This study evaluates the performance of an ML algorithm in predicting 1- and 3-year overall survival (OS) in a cohort of patients surgically treated for LC, and investigates the impact of different adverse features on prognosis. Data were collected during the oncological follow-up of 132 patients, and a retrospective review was performed to create a dataset of 23 variables for each patient. RESULTS The decision-tree algorithm is highly effective in predicting prognosis, with 95% accuracy for 1-year survival and 82.5% for 3-year survival. The measured AUC is 0.886 (1-year) and 0.871 (3-year) on the test set, and 0.917 (1-year) and 0.964 (3-year) on the training set. Factors that affected 1-year OS were LNR, type of surgery, and subsite; the most significant variables for 3-year OS were number of metastases, perineural invasion, and grading. CONCLUSIONS The integration of ML into medical practice could revolutionize our approach to cancer pathology.
Affiliation(s)
- Gerardo Petruzzi
- Department of Otolaryngology and Head and Neck Surgery, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Elisa Coden
- Division of Otorhinolaryngology - Head and Neck Surgery, ASST Sette Laghi, Ospedale di Circolo e Fondazione Macchi, University of Insubria, Varese, Italy
- Oreste Iocca
- Division of Maxillofacial Surgery, Città della Salute e della Scienza, University of Torino, Torino, Italy
- Pasquale di Maio
- Department of Otolaryngology-Head and Neck Surgery, Giuseppe Fornaroli Hospital, ASST Ovest Milanese, Magenta, Italy
- Barbara Pichi
- Department of Otolaryngology and Head and Neck Surgery, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Flaminia Campo
- Department of Otolaryngology and Head and Neck Surgery, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Armando De Virgilio
- Department of Biomedical Sciences, Humanitas University, Milan, Italy
- Department of Otolaryngology and Head and Neck Surgery, IRCCS Humanitas Research Hospital, Milan, Italy
- Mazzola Francesco
- Department of Otolaryngology and Head and Neck Surgery, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Antonello Vidiri
- Department of Radiology and Diagnostic Imaging, IRCCS Regina Elena National Cancer Institute, Rome, Italy
- Raul Pellini
- Department of Otolaryngology and Head and Neck Surgery, IRCCS Regina Elena National Cancer Institute, Rome, Italy
26
Zhang T, Bur AM, Kraft S, Kavookjian H, Renslo B, Chen X, Luo B, Wang G. Gender, Smoking History, and Age Prediction from Laryngeal Images. J Imaging 2023; 9:109. PMID: 37367457; DOI: 10.3390/jimaging9060109.
Abstract
Flexible laryngoscopy is commonly performed by otolaryngologists to detect laryngeal diseases and to recognize potentially malignant lesions. Recently, researchers have introduced machine learning techniques to facilitate automated diagnosis using laryngeal images and achieved promising results. Diagnostic performance can be improved when patients' demographic information is incorporated into models; however, manual entry of patient data is time-consuming for clinicians. In this study, we made a first attempt to employ deep learning models to predict patient demographic information in order to improve the detector model's performance. The overall accuracy for gender, smoking history, and age was 85.5%, 65.2%, and 75.9%, respectively. We also created a new laryngoscopic image set for machine learning study and benchmarked the performance of eight classical deep learning models based on CNNs and Transformers. The results can be integrated into current learning models to improve their performance by incorporating the patient's demographic information.
Affiliation(s)
- Tianxiao Zhang
- Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
- Andrés M Bur
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Shannon Kraft
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Hannah Kavookjian
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Bryan Renslo
- Department of Otolaryngology-Head and Neck Surgery, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Xiangyu Chen
- Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
- Bo Luo
- Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
- Guanghui Wang
- Department of Computer Science, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada
27
A Novel Framework of Manifold Learning Cascade-Clustering for the Informative Frame Selection. Diagnostics (Basel) 2023; 13:1151. PMID: 36980459; PMCID: PMC10047422; DOI: 10.3390/diagnostics13061151.
Abstract
Narrow band imaging is an established non-invasive tool for the early detection of laryngeal cancer in surveillance examinations. Many frames produced during the examination are uninformative, e.g., blurred, dominated by specular reflection, or underexposed. Removing the uninformative frames is vital to improving detection accuracy and speeding up computer-aided diagnosis, yet manually inspecting for informative frames takes physicians considerable time. This issue is commonly addressed by a classifier with task-specific categories of uninformative frames; however, the definition of these categories is ambiguous, and tedious labeling still cannot be avoided. Here, we show that a novel unsupervised scheme is comparable to the current benchmarks on the NBI-InfFrames dataset. We extract feature embeddings using a vanilla neural network (VGG16) and apply the dimensionality reduction method UMAP, which separates the embeddings in a lower-dimensional space. Together with the proposed automatic cluster labeling algorithm and cost function in Bayesian optimization, the method coupled with UMAP achieves state-of-the-art performance, outperforming the baseline by 12% absolute. Its overall median recall of 96% is currently the highest. Our results demonstrate the effectiveness of the proposed scheme and its robustness in detecting informative frames, and suggest that the patterns embedded in the data can help develop flexible algorithms that do not require manual labeling.
28
Bensoussan Y, Vanstrum EB, Johns MM, Rameau A. Artificial Intelligence and Laryngeal Cancer: From Screening to Prognosis: A State of the Art Review. Otolaryngol Head Neck Surg 2023; 168:319-329. [PMID: 35787073 DOI: 10.1177/01945998221110839] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022]
Abstract
OBJECTIVE This state of the art review aims to examine contemporary advances in applications of artificial intelligence (AI) to the screening, detection, management, and prognostication of laryngeal cancer (LC). DATA SOURCES Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and IEEE. REVIEW METHODS A structured review of the current literature (up to January 2022) was performed. Search terms related to topics of AI in LC were identified and queried by 2 independent reviewers. Citations of selected studies and review articles were also evaluated to ensure comprehensiveness. CONCLUSIONS AI applications in LC have encompassed a variety of data modalities, including radiomics, genomics, acoustics, clinical data, and videomics, to support screening, diagnosis, therapeutic decision making, and prognosis. However, most studies remain at the proof-of-concept level, as AI algorithms are trained on single-institution databases with limited data sets and a single data modality. IMPLICATIONS FOR PRACTICE AI algorithms in LC will need to be trained on large multi-institutional data sets and integrate multimodal data for optimal performance and clinical utility from screening to prognosis. Of the data types reviewed, genomics has the most potential to provide generalizable models thanks to available large multi-institutional open-access genomic data sets. Voice acoustic data represent an inexpensive and accurate biomarker that is easy and noninvasive to capture, offering a unique opportunity for screening and monitoring of LC, especially in low-resource settings.
Collapse
Affiliation(s)
- Yael Bensoussan
- Department of Otolaryngology-Head and Neck Surgery, University of South Florida, Tampa, Florida, USA
- Erik B Vanstrum
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Michael M Johns
- Department of Otolaryngology-Head and Neck Surgery, University of Southern California, Los Angeles, California, USA
- Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, New York, New York, USA
29
Yang D, Wu Y, Wan Z, Xu Z, Li W, Yuan P, Shang Q, Peng J, Tao L, Chen Q, Dan H, Xu H. HISMD: A Novel Immune Subtyping System for HNSCC. J Dent Res 2023; 102:270-279. [PMID: 36333876 DOI: 10.1177/00220345221134605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022] Open
Abstract
Immune subtyping is an important way to reveal immune heterogeneity, which may contribute to the diversity of progression and treatment in head and neck squamous cell carcinoma (HNSCC). However, reported immune subtypes mainly focus on levels of immune infiltration and are mostly based on a mono-omics profile. This study aimed to identify a comprehensive immune subtype for HNSCC via multi-omics clustering and to build a novel subtype prediction system for clinical application. Data were obtained from The Cancer Genome Atlas database and our independent multicenter cohort. Multi-omics clustering was performed to identify 3 clusters of 499 patients in The Cancer Genome Atlas based on immune-related gene expression and somatic mutations. The immune characteristics and biological features of the obtained clusters were revealed by bioinformatics, and 3 immune subtypes were identified: 1) adaptive immune activation subtype predominantly enriched in T cells, 2) innate immune activation subtype predominantly enriched in macrophages, and 3) immune desert subtype. Subsequently, the clinical implications of each subtype were analyzed from a clinical epidemiology perspective. We found that the adaptive immune activation subtype showed better survival outcomes and had a chemotherapy response similar to that of the innate immune activation subtype, whereas the immune desert subtype might be relatively resistant to chemotherapy. Moreover, a subtype prediction system was developed by deep learning with whole slide images and named HISMD: HNSCC Immune Subtypes via Multi-omics and Deep Learning. We endowed HISMD with interpretability through image-based key feature extraction. The clinical implications, biological significance, and predictive stability of HISMD were successfully verified using our independent multicenter cohort data set. In summary, this study revealed the immune heterogeneity of HNSCC and obtained a novel, highly accurate, and interpretable immune subtyping prediction system. For clinical implementation in the future, additional validation and utility studies are warranted.
Affiliation(s)
- D Yang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Y Wu
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Z Wan
- Department of Pathology, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Z Xu
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- W Li
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- P Yuan
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Q Shang
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- J Peng
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- L Tao
- College of Mathematics, Sichuan University, Chengdu, China
- Q Chen
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China; Key Laboratory of Oral Biomedical Research of Zhejiang Province, Affiliated Stomatology Hospital, Zhejiang University School of Stomatology, Hangzhou, China
- H Dan
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- H Xu
- State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, Research Unit of Oral Carcinogenesis and Management, Chinese Academy of Medical Sciences, West China Hospital of Stomatology, Sichuan University, Chengdu, China
30
Zhu JQ, Wang ML, Li Y, Zhang W, Li LJ, Liu L, Zhang Y, Han CJ, Tie CW, Wang SX, Wang GQ, Ni XG. Convolutional neural network based anatomical site identification for laryngoscopy quality control: A multicenter study. Am J Otolaryngol 2023; 44:103695. [PMID: 36473265 DOI: 10.1016/j.amjoto.2022.103695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Revised: 09/26/2022] [Accepted: 11/19/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVES Video laryngoscopy is an important diagnostic tool for head and neck cancers. Artificial intelligence (AI) systems have been shown to monitor blind spots during esophagogastroduodenoscopy. This study aimed to test the performance of an AI-driven intelligent laryngoscopy monitoring assistant (ILMA) for landmark anatomical site identification on laryngoscopic images and videos based on a convolutional neural network (CNN). MATERIALS AND METHODS Laryngoscopic images taken from January to December 2018 were retrospectively collected, and ILMA was developed using the Inception-ResNet-v2 + Squeeze-and-Excitation Networks (SENet) CNN model. A total of 16,000 laryngoscopic images were used for training. These were assigned to 20 landmark anatomical sites covering six major head and neck regions. In addition, the performance of ILMA in identifying anatomical sites was validated using 4000 laryngoscopic images and 25 videos provided by five other tertiary hospitals. RESULTS ILMA identified the 20 anatomical sites on the laryngoscopic images with a total accuracy of 97.60 %, and the average sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 100 %, 99.87 %, 97.65 %, and 99.87 %, respectively. In addition, multicenter clinical verification showed that the accuracy of ILMA in identifying the 20 targeted anatomical sites in 25 laryngoscopic videos from five hospitals was ≥95 %. CONCLUSION The proposed CNN-based ILMA model can rapidly and accurately identify anatomical sites on laryngoscopic images. The model can reflect the coverage of head and neck anatomical regions by laryngoscopy, showing application potential for improving the quality of laryngoscopy.
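The per-site figures this abstract reports (sensitivity, specificity, PPV, NPV) follow from one-vs-rest confusion counts for each anatomical site. A minimal sketch on toy labels; the site names and data are invented for illustration and are not from the paper:

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Per-class sensitivity, specificity, PPV, and NPV, computed
    one-vs-rest as in multi-site classification reports.
    Assumes each denominator is nonzero for the given data."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return {
        "sensitivity": tp / (tp + fn),  # recall on the positive site
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Toy labels over three hypothetical anatomical sites
y_true = ["glottis", "glottis", "epiglottis", "pyriform", "glottis", "epiglottis"]
y_pred = ["glottis", "epiglottis", "epiglottis", "pyriform", "glottis", "epiglottis"]
m = one_vs_rest_metrics(y_true, y_pred, "glottis")
```

Averaging these per-site dictionaries over all 20 sites would give the kind of summary figures quoted in the abstract.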
Affiliation(s)
- Ji-Qing Zhu
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Mei-Ling Wang
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Ying Li
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Wei Zhang
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Li-Juan Li
- Department of Otorhinolaryngology, The People's Hospital of Wenshan Prefecture, Wenshan, Yunnan, China
- Lin Liu
- Department of Otolaryngology-Head and Neck Surgery, Dalian Municipal Friendship Hospital, Dalian, Liaoning, China
- Yan Zhang
- Department of Otorhinolaryngology, Chongqing Traditional Chinese Medicine Hospital, Chongqing, China
- Cai-Juan Han
- Department of Otolaryngology-Head and Neck Surgery, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, Qingdao, Shandong, China
- Cheng-Wei Tie
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Shi-Xu Wang
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Gui-Qi Wang
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
- Xiao-Guang Ni
- Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
31
Tran BA, Dao TTP, Dung HDQ, Van NB, Ha CC, Pham NH, Nguyen TCHTNC, Nguyen TC, Pham MK, Tran MK, Tran TM, Tran MT. Support of deep learning to classify vocal fold images in flexible laryngoscopy. Am J Otolaryngol 2023; 44:103800. [PMID: 36905912 DOI: 10.1016/j.amjoto.2023.103800] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2022] [Accepted: 02/19/2023] [Indexed: 02/26/2023]
Abstract
PURPOSE To collect a dataset with adequate laryngoscopy images and to identify the appearance of vocal folds and their lesions in flexible laryngoscopy images using objective deep learning models. METHODS We adopted a number of novel deep learning models to train on and classify 4549 flexible laryngoscopy images as no vocal fold, normal vocal folds, or abnormal vocal folds, helping these models recognize vocal folds and their lesions within the images. We then compared the results of state-of-the-art deep learning models with one another, and compared the computer-aided classification system with ENT doctors. RESULTS This study evaluated the performance of the deep learning models on laryngoscopy images collected from 876 patients. The performance of the Xception model was higher and more stable than that of almost all the other models. Its accuracy on no vocal fold, normal vocal folds, and vocal fold abnormalities was 98.90 %, 97.36 %, and 96.26 %, respectively. Compared with our ENT doctors, the Xception model produced better results than a junior doctor and was close to an expert. CONCLUSION Our results show that current deep learning models can classify vocal fold images well and effectively assist physicians in vocal fold identification and classification of normal or abnormal vocal folds.
Affiliation(s)
- Bich Anh Tran
- Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam.
- Thao Thi Phuong Dao
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam; Department of Otolaryngology, Thong Nhat Hospital, Ho Chi Minh City, Viet Nam.
- Ho Dang Quy Dung
- Department of Endoscopy, Cho Ray Hospital, Ho Chi Minh City, Viet Nam.
- Ngoc Boi Van
- Department of Otolaryngology, Vinmec Central Park International Hospital, Ho Chi Minh City, Viet Nam.
- Chanh Cong Ha
- Department of Otolaryngology, 7A Military Hospital, Ho Chi Minh City, Viet Nam.
- Nam Hoang Pham
- Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam.
- Tan-Cong Nguyen
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; University of Social Sciences and Humanities, VNUHCM, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Viet Nam.
- Minh-Khoi Pham
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam.
- Mai-Khiem Tran
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam.
- Truong Minh Tran
- Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam.
- Minh-Triet Tran
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam.
32
PISDGAN: Perceive image structure and details for laryngeal image enhancement. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
33
Wang ML, Zhu JQ, Li Y, Tie CW, Wang SX, Zhang W, Wang GQ, Ni XG. [Automatic anatomical site recognition of laryngoscopic images using convolutional neural network]. Lin Chuang Er Bi Yan Hou Tou Jing Wai Ke Za Zhi = Journal of Clinical Otorhinolaryngology, Head, and Neck Surgery 2023; 37:6-12. [PMID: 36597361 PMCID: PMC10128350 DOI: 10.13201/j.issn.2096-7993.2023.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 09/07/2022] [Indexed: 01/05/2023]
Abstract
Objective: To explore the automatic recognition and classification of 20 anatomical sites in laryngoscopy by an artificial intelligence (AI) quality control system using a convolutional neural network (CNN). Methods: Laryngoscopic image data archived from laryngoscopy examinations at the Department of Endoscopy, Cancer Hospital, Chinese Academy of Medical Sciences from January to December 2018 were collected retrospectively, and a CNN model was constructed using Inception-ResNet-V2 + SENet. Using 14,000 electronic laryngoscope images as the training set, images were classified into 20 specific anatomical sites covering the whole head and neck, and performance was tested on 2000 laryngoscope images and 10 laryngoscope videos. Results: The average time of the trained CNN model for recognition of each laryngoscopic image was (20.59 ± 1.55) ms, and the overall accuracy of recognition of the 20 anatomical sites in laryngoscopic images was 97.75% (1955/2000), with average sensitivity, specificity, positive predictive value, and negative predictive value of 100%, 99.88%, 97.76%, and 99.88%, respectively. The model had an accuracy of ≥99% for identification of the 20 anatomical sites in laryngoscopic videos. Conclusion: This study confirms that the CNN-based AI system can perform accurate and fast classification and identification of anatomical sites in laryngoscopic images and videos, which can be used for quality control of photo documentation in laryngoscopy and shows potential for monitoring the performance of laryngoscopy.
Affiliation(s)
- Mei-Ling Wang
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, 518116, China
- Ji-Qing Zhu
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ying Li
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, 518116, China
- Cheng-Wei Tie
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Shi-Xu Wang
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Wei Zhang
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, 518116, China
- Gui-Qi Wang
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiao-Guang Ni
- Department of Endoscopy, National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
34
Zang Q, Cui H, Guo X, Lu Y, Zou Z, Liu H. Clinical value of video-assisted single-lumen endotracheal intubation and application of artificial intelligence in it. Am J Transl Res 2022; 14:7643-7652. [PMID: 36505300 PMCID: PMC9730106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2022] [Accepted: 10/19/2022] [Indexed: 12/15/2022]
Abstract
Visualization techniques and artificial intelligence (AI) are currently used in intubation devices. By providing airway visualization during tracheal intubation, these technologies provide safe and accurate access to the trachea. The ability of AI to automatically identify airways from intubation device images makes it attractive for use in such devices. The purpose of this review is to describe the state of application of visualization techniques and AI in certain intubation devices. We reviewed the evidence on the clinical implications of video-assisted intubation devices for intubation time, first-attempt success rate, and intubation of the difficult airway. In particular, the VivaSight single-lumen tube, with incorporated optics, allows direct viewing of the airway and offers notable advantages in tracheal intubation. AI has been applied to fiberoptic bronchoscopy (FOB) and video laryngoscopes with automatic airway image recognition, and has achieved certain accomplishments. Finally, we discuss the possibility of applying AI to the VivaSight single-lumen tube and propose future directions for research and application.
Affiliation(s)
- Qinglai Zang
- Shanghai Institute for Minimally Invasive Therapy, University of Shanghai for Science and Technology, Shanghai 200093, PR China
- Haipo Cui
- Shanghai Institute for Minimally Invasive Therapy, University of Shanghai for Science and Technology, Shanghai 200093, PR China
- Xudong Guo
- Shanghai Institute for Minimally Invasive Therapy, University of Shanghai for Science and Technology, Shanghai 200093, PR China
- Yingxi Lu
- Shanghai Institute for Minimally Invasive Therapy, University of Shanghai for Science and Technology, Shanghai 200093, PR China
- Zui Zou
- School of Anesthesiology, Naval Medical University, Shanghai 200433, PR China
- Hong Liu
- Information Center, The Second Affiliated Hospital of Naval Medical University, No. 415, Fengyang Road, Huangpu District, Shanghai 200003, PR China
35
Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P. An Improvised Deep-Learning-Based Mask R-CNN Model for Laryngeal Cancer Detection Using CT Images. SENSORS (BASEL, SWITZERLAND) 2022; 22:8834. [PMID: 36433430 PMCID: PMC9697116 DOI: 10.3390/s22228834] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/05/2022] [Revised: 11/08/2022] [Accepted: 11/10/2022] [Indexed: 05/30/2023]
Abstract
Recently, laryngeal cancer cases have increased drastically across the globe. Accurate treatment for laryngeal cancer is intricate, especially in the later stages, as this type of cancer is a complex malignancy inside the head and neck area of patients. In recent years, researchers have developed diverse diagnostic approaches and tools to help clinical experts identify laryngeal cancer effectively. However, these existing tools and approaches suffer from performance constraints such as lower accuracy in identifying laryngeal cancer at the initial stage, high computational complexity, and large time consumption in patient screening. In this paper, the authors present a novel, enhanced deep-learning-based Mask R-CNN model for identifying laryngeal cancer and its related symptoms in real time using diverse image datasets and CT images. The suggested model can capture and detect minor malignancies of the larynx quickly and reliably during real-time patient screening, saving clinicians' time and allowing more patients to be screened each day. The suggested model obtained an accuracy of 98.99%, precision of 98.99%, F1 score of 97.99%, and recall of 96.79% on the ImageNet dataset. Several studies on laryngeal cancer detection using diverse approaches have been performed in recent years, and there remain ample opportunities for future research to investigate new approaches using diverse, large image datasets.
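Detection models such as Mask R-CNN are conventionally scored by the intersection-over-union (IoU) between predicted and ground-truth lesion regions. The abstract does not describe the authors' evaluation code, so the following is a generic, illustrative sketch for axis-aligned bounding boxes; the example boxes are invented:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # overlap rectangle (empty if the boxes do not intersect)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# A hypothetical predicted lesion box vs. a ground-truth annotation
pred = (10, 10, 30, 30)   # 20 x 20 box, area 400
truth = (20, 20, 40, 40)  # 20 x 20 box, area 400, partial overlap
score = iou(pred, truth)
```

A prediction is then typically counted as a true positive when its IoU with a ground-truth lesion exceeds a threshold such as 0.5, which is how accuracy, precision, and recall figures like those above are usually derived for detectors.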
Affiliation(s)
- Pravat Kumar Sahoo
- School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar 751024, India
- Sushruta Mishra
- School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar 751024, India
- Ranjit Panigrahi
- Department of Computer Applications, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Majitar, Rangpo 737136, India
- Akash Kumar Bhoi
- KIET Group of Institutions, Delhi-NCR, Ghaziabad 201206, India
- Directorate of Research, Sikkim Manipal University, Gangtok 737102, India
- Institute of Information Science and Technologies, National Research Council, 56124 Pisa, Italy
- Paolo Barsocchi
- Institute of Information Science and Technologies, National Research Council, 56124 Pisa, Italy
36
Paderno A, Gennarini F, Sordi A, Montenegro C, Lancini D, Villani FP, Moccia S, Piazza C. Artificial intelligence in clinical endoscopy: Insights in the field of videomics. Front Surg 2022; 9:933297. [PMID: 36171813 PMCID: PMC9510389 DOI: 10.3389/fsurg.2022.933297] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
Artificial intelligence is increasingly seen as a useful tool in medicine. Specifically, these technologies aim to extract insights from complex datasets that cannot easily be analyzed by conventional statistical methods. While promising results have been obtained for various -omics datasets, radiological images, and histopathologic slides, the analysis of videoendoscopic frames still represents a major challenge. In this context, videomics is a burgeoning field wherein several computer vision methods are systematically used to organize unstructured data from frames obtained during diagnostic videoendoscopy. Recent studies have focused on five broad tasks of increasing complexity: quality assessment of endoscopic images, classification of pathologic and nonpathologic frames, detection of lesions inside frames, segmentation of pathologic lesions, and in-depth characterization of neoplastic lesions. Herein, we present a broad overview of the field, with a focus on conceptual key points and future perspectives.
Affiliation(s)
- Alberto Paderno
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
- Correspondence: Alberto Paderno
- Francesca Gennarini
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
- Alessandra Sordi
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
- Claudia Montenegro
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
- Davide Lancini
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Francesca Pia Villani
- The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy
- Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
- Sara Moccia
- The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy
- Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
- Cesare Piazza
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
37
Kwon I, Wang SG, Shin SC, Cheon YI, Lee BJ, Lee JC, Lim DW, Jo C, Cho Y, Shin BJ. Diagnosis of Early Glottic Cancer Using Laryngeal Image and Voice Based on Ensemble Learning of Convolutional Neural Network Classifiers. J Voice 2022:S0892-1997(22)00209-0. [PMID: 36075802 DOI: 10.1016/j.jvoice.2022.07.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 10/14/2022]
Abstract
OBJECTIVES The purpose of study is to improve the classification accuracy by comparing the results obtained by applying decision tree ensemble learning, which is one of the methods to increase the classification accuracy for a relatively small dataset, with the results obtained by the convolutional neural network (CNN) algorithm for the diagnosis of glottal cancer. METHODS Pusan National University Hospital (PNUH) dataset were used to establish classifiers and Pusan National University Yangsan Hospital (PNUYH) dataset were used to verify the classifier's performance in the generated model. For the diagnosis of glottic cancer, deep learning-based CNN models were established and classified using laryngeal image and voice data. Classification accuracy was obtained by performing decision tree ensemble learning using probability through CNN classification algorithm. In this process, the classification and regression tree (CART) method was used. Then, we compared the classification accuracy of decision tree ensemble learning with CNN individual classifiers by fusing the laryngeal image with the voice decision tree classifier. RESULTS We obtained classification accuracy of 81.03 % and 99.18 % in the established laryngeal image and voice classification models using PNUH training dataset, respectively. However, the classification accuracy of CNN classifiers decreased to 73.88 % in voice and 68.92 % in laryngeal image when using an external dataset of PNUYH. To solve this problem, decision tree ensemble learning of laryngeal image and voice was used, and the classification accuracy was improved by integrating data of laryngeal image and voice of the same person. The classification accuracy was 87.88 % and 89.06 % for the individualized laryngeal image and voice decision tree model respectively, and the fusion of the laryngeal image and voice decision tree results represented a classification accuracy of 95.31 %. 
CONCLUSION The results of our study suggest that decision tree ensemble learning, which trains multiple classifiers, is useful for increasing classification accuracy despite a small dataset. Although a large amount of data is essential for AI analysis, high diagnostic classification accuracy can be expected when an integrated approach combines various input data.
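The late-fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the class labels and probability vectors are invented, and simple soft voting stands in for the CART decision trees the paper trains on the CNN output probabilities.

```python
# Minimal late-fusion sketch: combine per-modality CNN class probabilities
# for the same patient and take an argmax over the averaged distribution.
# The paper uses CART decision trees on these probabilities; weighted soft
# voting is shown here only as an illustration of the fusion idea.

def soft_vote(image_probs, voice_probs, w_image=0.5, w_voice=0.5):
    """Weighted average of two probability vectors over the same classes."""
    assert len(image_probs) == len(voice_probs)
    fused = [w_image * p + w_voice * q for p, q in zip(image_probs, voice_probs)]
    total = sum(fused)  # renormalize so the result is a distribution
    return [p / total for p in fused]

CLASSES = ["normal", "benign", "cancer"]  # hypothetical label set

# Hypothetical outputs of the laryngeal-image and voice CNNs for one patient.
image_probs = [0.20, 0.30, 0.50]
voice_probs = [0.05, 0.15, 0.80]

fused = soft_vote(image_probs, voice_probs)
prediction = CLASSES[fused.index(max(fused))]
print(prediction)  # both modalities favor the same class here
```

Fusing at the probability level, rather than at the raw-feature level, is what lets the two modality-specific classifiers be trained and validated independently before integration.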
Affiliation(s)
- Ickhwan Kwon
- Department of Applied IT and Engineering, Pusan National University, Miryang, Gyeongsangnam-do, South Korea
- Soo-Geun Wang
- Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Pusan National University and Medical Research Institute, Pusan National University Hospital, Busan, South Korea
- Sung-Chan Shin
- Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Pusan National University and Medical Research Institute, Pusan National University Hospital, Busan, South Korea
- Yong-Il Cheon
- Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Pusan National University and Medical Research Institute, Pusan National University Hospital, Busan, South Korea
- Byung-Joo Lee
- Department of Otorhinolaryngology-Head and Neck Surgery, College of Medicine, Pusan National University and Medical Research Institute, Pusan National University Hospital, Busan, South Korea
- Jin-Choon Lee
- Department of Otorhinolaryngology-Head and Neck Surgery, Pusan National University Yangsan Hospital, Yangsan, Gyeongsangnam-do, South Korea
- Dong-Won Lim
- Department of Otorhinolaryngology-Head and Neck Surgery, Pusan National University Hospital, Busan, South Korea
- Cheolwoo Jo
- School of Electrical, Electronics & Control Engineering, Changwon National University, Changwon, South Korea
- Youngseuk Cho
- Department of Statistics, College of Natural Sciences, Pusan National University, Busan, South Korea
- Bum-Joo Shin
- Department of Applied IT and Engineering, Pusan National University, Miryang, Gyeongsangnam-do, South Korea

38
Pan X, Bai W, Ma M, Zhang S. RANT: A cascade reverse attention segmentation framework with hybrid transformer for laryngeal endoscope images. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
39
Warin K, Limprasert W, Suebnukarn S, Jinaporntham S, Jantana P, Vicharueang S. AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer. PLoS One 2022; 17:e0273508. [PMID: 36001628 PMCID: PMC9401150 DOI: 10.1371/journal.pone.0273508] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2022] [Accepted: 08/09/2022] [Indexed: 11/18/2022] Open
Abstract
Artificial intelligence (AI) applications in oncology have developed rapidly, with reported successes in recent years. This work aims to evaluate the performance of deep convolutional neural network (CNN) algorithms for the classification and detection of oral potentially malignant disorders (OPMDs) and oral squamous cell carcinoma (OSCC) in oral photographic images. A dataset comprising 980 oral photographic images was divided into 365 images of OSCC, 315 images of OPMDs, and 300 non-pathological images. Multiclass image classification models were created using DenseNet-169, ResNet-101, SqueezeNet, and Swin-S. Multiclass object detection models were built using Faster R-CNN, YOLOv5, RetinaNet, and CenterNet2. The AUC of the best multiclass image classification model, DenseNet-169, was 1.00 on OSCC and 0.98 on OPMDs. The AUC of the best multiclass CNN-based object detection model, Faster R-CNN, was 0.88 on OSCC and 0.64 on OPMDs. DenseNet-169 thus yielded the best multiclass image classification performance, with AUCs in line with the performance of experts and superior to those of general practitioners (GPs). In conclusion, CNN-based models have potential for the identification of OSCC and OPMDs in oral photographic images and are expected to become a diagnostic tool to assist GPs in the early detection of oral cancer.
Affiliation(s)
- Kritsasith Warin
- Faculty of Dentistry, Thammasat University, Khlong Luang, Pathum Thani, Thailand
- Wasit Limprasert
- College of Interdisciplinary Studies, Thammasat University, Khlong Luang, Pathum Thani, Thailand
- Siriwan Suebnukarn
- Faculty of Dentistry, Thammasat University, Khlong Luang, Pathum Thani, Thailand

40
Wang S, Chen Y, Chen S, Zhong Q, Zhang K. Hierarchical dynamic convolutional neural network for laryngeal disease classification. Sci Rep 2022; 12:13914. [PMID: 35978109 PMCID: PMC9385650 DOI: 10.1038/s41598-022-18217-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2022] [Accepted: 08/08/2022] [Indexed: 11/09/2022] Open
Abstract
Laryngeal disease classification is a relatively hard task in medical image processing because of the larynx's complex structures and the varying viewpoints in data collection. Some existing methods tackle this task with convolutional neural networks, but they more or less ignore the intrinsic difficulty differences among input samples and suffer from high training complexity. To better resolve these problems, an end-to-end Hierarchical Dynamic Convolutional Network (HDCNet) is proposed, which dynamically processes input samples based on their difficulty. Easily classified samples are processed at a smaller resolution by a relatively small network, while difficult samples are passed to a larger network at a higher resolution for more accurate classification. Furthermore, a Feature Reuse Module (FRM) is designed to transfer the features learned by the small network to the corresponding block in the deep network, enhancing overall performance on rather complicated samples. To validate the effectiveness of the proposed HDCNet, comprehensive experiments were conducted on the publicly available laryngeal disease classification dataset, and HDCNet provides superior performance compared with other current state-of-the-art methods.
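The easy-versus-hard routing described above can be sketched as follows. This is a toy illustration, not the HDCNet implementation: `small_model`, `large_model`, the confidence threshold, and the inputs are hypothetical stand-ins for the small and large CNN branches operating at different resolutions.

```python
# Sketch of confidence-based dynamic routing: run a cheap model first and
# escalate to an expensive model only when the cheap model is unsure.
# Both "models" here are stand-in functions; HDCNet itself uses small and
# large CNNs at different input resolutions plus a feature-reuse module.

def small_model(x):
    """Cheap classifier returning (label, confidence); hypothetical."""
    return ("benign", 0.95) if x < 0.5 else ("benign", 0.55)

def large_model(x):
    """Expensive classifier, invoked only on hard samples; hypothetical."""
    return ("malignant", 0.90)

def route(x, threshold=0.8):
    label, conf = small_model(x)
    if conf >= threshold:              # easy sample: accept cheap prediction
        return label, "small"
    return large_model(x)[0], "large"  # hard sample: escalate

print(route(0.1))  # confident small-model call, stays on the cheap path
print(route(0.9))  # uncertain small-model call, escalated to the large path
```

The design saves computation on average because the expensive branch runs only on the fraction of samples the cheap branch cannot resolve confidently.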
Affiliation(s)
- Shaoli Wang
- Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Xiuhua Road, Hainan, China
- Yingying Chen
- Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Xiuhua Road, Hainan, China
- Siying Chen
- Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Xiuhua Road, Hainan, China
- Qionglei Zhong
- Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Xiuhua Road, Hainan, China
- Kaiyan Zhang
- Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Xiuhua Road, Hainan, China

41
Song Q, Li XM. [Application and development of voice analysis and endoscopic technology combined with artificial intelligence in the diagnosis and treatment of throat disease]. Lin Chuang Er Bi Yan Hou Tou Jing Wai Ke Za Zhi (Journal of Clinical Otorhinolaryngology, Head, and Neck Surgery) 2022; 36:647-650. [PMID: 35959588 PMCID: PMC10128196 DOI: 10.13201/j.issn.2096-7993.2022.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 12/14/2021] [Indexed: 06/15/2023]
Abstract
The application of voice analysis and endoscopic technology combined with artificial intelligence in the diagnosis and treatment of throat disease has developed rapidly. This paper reviews the history and principles of combining voice analysis or endoscopic technology with artificial intelligence, summarizes the current state of its application and development, and identifies its advantages: strong learning and interpretation ability, remarkable speed and tolerance, and stable replication and scalability. The key constraints on its development are the uncertainty of the machine learning process, errors caused by small samples, and ethical and philosophical concerns. In the future, otolaryngology-head and neck surgeons should build on their professional knowledge, learn the related fields of epidemiology and classical statistics, and strengthen exchange and cooperation with machine learning developers. In this way, advanced science and technology can be truly used in clinical practice to maximize the benefit to patients.
Affiliation(s)
- Qi Song
- Department of Otolaryngology Head and Neck Surgery, the 980th Hospital of the Joint Logistics Support Unit of the Chinese PLA, Shijiazhuang, 050082, China
- Xiaoming Li
- Department of Otolaryngology Head and Neck Surgery, the 980th Hospital of the Joint Logistics Support Unit of the Chinese PLA, Shijiazhuang, 050082, China

42
Żurek M, Jasak K, Niemczyk K, Rzepakowska A. Artificial Intelligence in Laryngeal Endoscopy: Systematic Review and Meta-Analysis. J Clin Med 2022; 11:jcm11102752. [PMID: 35628878 PMCID: PMC9144710 DOI: 10.3390/jcm11102752] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Revised: 04/24/2022] [Accepted: 05/08/2022] [Indexed: 12/24/2022] Open
Abstract
Background: Early diagnosis of laryngeal lesions is necessary to begin treating patients as soon as possible and preserve optimal organ function. Imaging examinations are often aided by artificial intelligence (AI) to improve quality and facilitate appropriate diagnosis. The aim of this study was to investigate the diagnostic utility of AI in laryngeal endoscopy. Methods: Five databases were searched for studies implementing AI-enhanced models to assess images of laryngeal lesions taken during laryngeal endoscopy. Outcomes were analyzed in terms of accuracy, sensitivity, and specificity. Results: All 11 included studies presented an overall low risk of bias. The overall accuracy of AI models was very high (from 0.806 to 0.997). Accuracy was significantly higher in studies using a larger database. The pooled sensitivity and specificity for identification of healthy laryngeal tissue were 0.91 and 0.97, respectively. The corresponding values for differentiation between benign and malignant lesions were 0.91 and 0.94. Comparison of the effectiveness of AI models assessing narrow band imaging and white light endoscopy images revealed no statistically significant differences (p = 0.409 and 0.914). Conclusion: In assessing images of laryngeal lesions, AI demonstrates extraordinarily high accuracy, sensitivity, and specificity.
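Pooled sensitivity and specificity of the kind reported above can be derived from per-study 2x2 counts. The sketch below uses a simple summed-counts estimate on invented study data; a formal meta-analysis would typically fit a bivariate random-effects model instead.

```python
# Pool sensitivity/specificity across studies by summing 2x2 cells.
# The study counts below are invented for illustration only.

studies = [
    # (true_pos, false_neg, true_neg, false_pos) per study - hypothetical
    (90, 10, 95, 5),
    (45, 5, 40, 10),
    (28, 2, 57, 3),
]

tp = sum(s[0] for s in studies)
fn = sum(s[1] for s in studies)
tn = sum(s[2] for s in studies)
fp = sum(s[3] for s in studies)

pooled_sensitivity = tp / (tp + fn)  # TP / (TP + FN)
pooled_specificity = tn / (tn + fp)  # TN / (TN + FP)

print(f"sensitivity={pooled_sensitivity:.3f} specificity={pooled_specificity:.3f}")
```

Summing cells weights each study by its size, which is why the review's observation that larger databases yield higher accuracy matters for the pooled figures.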
Affiliation(s)
- Michał Żurek
- Department of Otorhinolaryngology Head and Neck Surgery, Medical University of Warsaw, 1a Banacha Str., 02-097 Warsaw, Poland
- Doctoral School, Medical University of Warsaw, 61 Żwirki i Wigury Str., 02-091 Warsaw, Poland
- Correspondence: Tel.: +48-225992716
- Kamil Jasak
- Students Scientific Research Group, Department of Otorhinolaryngology Head and Neck Surgery, Medical University of Warsaw, 1a Banacha Str., 02-097 Warsaw, Poland
- Kazimierz Niemczyk
- Department of Otorhinolaryngology Head and Neck Surgery, Medical University of Warsaw, 1a Banacha Str., 02-097 Warsaw, Poland
- Anna Rzepakowska
- Department of Otorhinolaryngology Head and Neck Surgery, Medical University of Warsaw, 1a Banacha Str., 02-097 Warsaw, Poland

43
Russo S, Bonassi S. Prospects and Pitfalls of Machine Learning in Nutritional Epidemiology. Nutrients 2022; 14:1705. [PMID: 35565673 PMCID: PMC9105182 DOI: 10.3390/nu14091705] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2022] [Revised: 04/13/2022] [Accepted: 04/14/2022] [Indexed: 02/06/2023] Open
Abstract
Nutritional epidemiology employs observational data to discover associations between diet and disease risk. However, existing analytic methods of dietary data are often sub-optimal, with limited incorporation and analysis of the correlations between the studied variables and nonlinear behaviours in the data. Machine learning (ML) is an area of artificial intelligence that has the potential to improve modelling of nonlinear associations and confounding which are found in nutritional data. These opportunities notwithstanding, the applications of ML in nutritional epidemiology must be approached cautiously to safeguard the scientific quality of the results and provide accurate interpretations. Given the complex scenario around ML, judicious application of such tools is necessary to offer nutritional epidemiology a novel analytical resource for dietary measurement and assessment and a tool to model the complexity of dietary intake and its relation to health. This work describes the applications of ML in nutritional epidemiology and provides guidelines to avoid common pitfalls encountered in applying predictive statistical models to nutritional data. Furthermore, it helps unfamiliar readers better assess the significance of their results and provides new possible future directions in the field of ML in nutritional epidemiology.
Affiliation(s)
- Stefania Russo
- EcoVision Lab, Photogrammetry and Remote Sensing Group, ETH Zürich, 8092 Zurich, Switzerland
- Stefano Bonassi
- Department of Human Sciences and Quality of Life Promotion, San Raffaele University, 00166 Rome, Italy
- Unit of Clinical and Molecular Epidemiology, IRCCS San Raffaele Roma, 00163 Rome, Italy

44
Yao P, Witte D, Gimonet H, German A, Andreadis K, Cheng M, Sulica L, Elemento O, Barnes J, Rameau A. Automatic classification of informative laryngoscopic images using deep learning. Laryngoscope Investig Otolaryngol 2022; 7:460-466. [PMID: 35434326 PMCID: PMC9008155 DOI: 10.1002/lio2.754] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2021] [Revised: 01/05/2022] [Accepted: 01/31/2022] [Indexed: 12/23/2022] Open
Abstract
Objective This study aims to develop and validate a convolutional neural network (CNN)-based algorithm for automatic selection of informative frames in flexible laryngoscopic videos. The classifier has the potential to aid in the development of computer-aided diagnosis systems and reduce data processing time for clinician-computer scientist teams. Methods A dataset of 22,132 laryngoscopic frames was extracted from 137 flexible laryngostroboscopic videos from 115 patients. Fifty-five videos were from healthy patients with no laryngeal pathology, and 82 videos were from patients with vocal fold polyps. The extracted frames were manually labeled as informative or uninformative by two independent reviewers based on vocal fold visibility, lighting, focus, and camera distance, resulting in 18,114 informative frames and 4018 uninformative frames. The dataset was split into training and test sets. A pre-trained ResNet-18 model was trained using transfer learning to classify frames as informative or uninformative. Hyperparameters were set using cross-validation. The primary outcome was precision for the informative class, and secondary outcomes were precision, recall, and F1-score for all classes. The frame processing rates of the model and a human annotator were compared. Results The automated classifier achieved informative-frame precision, recall, and F1-score of 94.4%, 90.2%, and 92.3%, respectively, when evaluated on a hold-out test set of 4438 frames. The model processed frames 16 times faster than a human annotator. Conclusion The CNN-based classifier demonstrates high precision for classifying informative frames in flexible laryngostroboscopic videos. This model has the potential to aid researchers with dataset creation for computer-aided diagnosis systems by automatically extracting relevant frames from laryngoscopic videos.
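The precision, recall, and F1 figures reported above come from standard confusion-matrix counts; the sketch below computes them for a binary informative/uninformative classifier. The label arrays are invented for illustration.

```python
# Precision/recall/F1 for a binary "informative vs. uninformative" frame
# classifier, computed from true and predicted labels. Labels are made up.

def binary_metrics(y_true, y_pred, positive="informative"):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 12 hypothetical frames: 8 truly informative, 4 truly uninformative.
y_true = ["informative"] * 8 + ["uninformative"] * 4
# The classifier misses one informative frame and flags one uninformative one.
y_pred = (["informative"] * 7 + ["uninformative"]
          + ["uninformative"] * 3 + ["informative"])

p, r, f1 = binary_metrics(y_true, y_pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Precision was chosen as the primary outcome in the study because a frame-selection tool should above all avoid passing uninformative frames downstream.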
Affiliation(s)
- Peter Yao
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Dan Witte
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Hortense Gimonet
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Alexander German
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Katerina Andreadis
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Michael Cheng
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Lucian Sulica
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Olivier Elemento
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Josue Barnes
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA
- Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, USA

45
Esmaeili N, Sharaf E, Gomes Ataide EJ, Illanes A, Boese A, Davaris N, Arens C, Navab N, Friebe M. Deep Convolution Neural Network for Laryngeal Cancer Classification on Contact Endoscopy-Narrow Band Imaging. SENSORS 2021; 21:s21238157. [PMID: 34884166 PMCID: PMC8662427 DOI: 10.3390/s21238157] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2021] [Revised: 12/02/2021] [Accepted: 12/03/2021] [Indexed: 12/14/2022]
Abstract
(1) Background: Contact Endoscopy (CE) and Narrow Band Imaging (NBI) are optical imaging modalities that can provide enhanced and magnified visualization of the superficial vascular networks in the laryngeal mucosa. The similarity of vascular structures between benign and malignant lesions makes the visual assessment of CE-NBI images challenging. The main objective of this study is to use Deep Convolutional Neural Networks (DCNN) for the automatic classification of CE-NBI images into benign and malignant groups with minimal human intervention. (2) Methods: A pretrained ResNet50 model combined with the cut-off-layer technique was selected as the DCNN architecture. A dataset of 8181 CE-NBI images was used during the fine-tuning process in three experiments in which several models were generated and validated. Accuracy, sensitivity, and specificity were calculated as the performance metrics in each validation and testing scenario. (3) Results: Out of a total of 72 trained and tested models across all experiments, Model 5 showed the highest performance. This model is considerably smaller than the full ResNet50 architecture and achieved a testing accuracy of 0.835 on the unseen data during the last experiment. (4) Conclusion: The proposed fine-tuned ResNet50 model performed well in classifying CE-NBI images into benign and malignant groups and has the potential to be part of an assisted system for automatic laryngeal cancer detection.
Affiliation(s)
- Nazila Esmaeili
- INKA—Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- Chair for Computer Aided Medical Procedures and Augmented Reality, Technical University of Munich, 85748 Munich, Germany
- Esam Sharaf
- INKA—Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- Elmer Jeto Gomes Ataide
- INKA—Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- Department of Nuclear Medicine, Medical Faculty, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- Alfredo Illanes
- INKA—Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- Axel Boese
- INKA—Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- Nikolaos Davaris
- Department of Otorhinolaryngology, Head and Neck Surgery, Magdeburg University Hospital, 39120 Magdeburg, Germany
- Christoph Arens
- Department of Otorhinolaryngology, Head and Neck Surgery, Giessen University Hospital, 35392 Giessen, Germany
- Nassir Navab
- Chair for Computer Aided Medical Procedures and Augmented Reality, Technical University of Munich, 85748 Munich, Germany
- Michael Friebe
- INKA—Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany
- IDTM GmbH, 45657 Recklinghausen, Germany

46
Azam MA, Sampieri C, Ioppi A, Africano S, Vallin A, Mocellin D, Fragale M, Guastini L, Moccia S, Piazza C, Mattos LS, Peretti G. Deep Learning Applied to White Light and Narrow Band Imaging Videolaryngoscopy: Toward Real-Time Laryngeal Cancer Detection. Laryngoscope 2021; 132:1798-1806. [PMID: 34821396 PMCID: PMC9544863 DOI: 10.1002/lary.29960] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Revised: 11/05/2021] [Accepted: 11/15/2021] [Indexed: 12/23/2022]
Abstract
OBJECTIVES To assess a new application of artificial intelligence for real-time detection of laryngeal squamous cell carcinoma (LSCC) in both white light (WL) and narrow-band imaging (NBI) videolaryngoscopies, based on the You-Only-Look-Once (YOLO) deep learning convolutional neural network (CNN). STUDY DESIGN Experimental study with retrospective data. METHODS Recorded videos of LSCC were retrospectively collected from in-office transnasal videoendoscopies and intraoperative rigid endoscopies. LSCC videoframes were extracted for training, validation, and testing of various YOLO models. Different techniques were used to enhance the image analysis: contrast limited adaptive histogram equalization, data augmentation techniques, and test time augmentation (TTA). The best-performing model was used to assess the automatic detection of LSCC in six videolaryngoscopies. RESULTS Two hundred and nineteen patients were retrospectively enrolled. A total of 624 LSCC videoframes were extracted. The YOLO models were trained after random distribution of images into a training set (82.6%), validation set (8.2%), and testing set (9.2%). Among the various models, the ensemble algorithm (YOLOv5s with YOLOv5m-TTA) achieved the best LSCC detection results, with performance metrics on par with those reported by other state-of-the-art detection models: 0.66 Precision (positive predictive value), 0.62 Recall (sensitivity), and 0.63 mean Average Precision at 0.5 intersection over union. Tests on the six videolaryngoscopies demonstrated an average computation time per videoframe of 0.026 seconds. Three demonstration videos are provided. CONCLUSION This study identified a suitable CNN model for LSCC detection in WL and NBI videolaryngoscopies. Detection performance is highly promising. The limited complexity and quick computation times for LSCC detection make this model ideal for real-time processing. LEVEL OF EVIDENCE 3 Laryngoscope, 2021.
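The mAP@0.5 figure above hinges on the intersection-over-union (IoU) between predicted and ground-truth boxes: a detection counts as correct when IoU reaches at least 0.5. A minimal IoU computation is sketched below; the box coordinates are invented for illustration.

```python
# Intersection over Union (IoU) for axis-aligned boxes given as
# (x1, y1, x2, y2). At an mAP@0.5 threshold, a predicted box matches a
# ground-truth box of the same class when IoU >= 0.5. Boxes are made up.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

ground_truth = (10, 10, 50, 50)
prediction = (20, 20, 60, 60)
print(round(iou(ground_truth, prediction), 3))  # partial overlap
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))    # disjoint boxes give 0.0
```

mean Average Precision then averages, over classes, the area under each precision-recall curve obtained by sweeping the detector's confidence threshold with this IoU matching rule.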
Affiliation(s)
- Muhammad Adeel Azam
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Department of Informatics, Bioengineering, Robotics, and System Engineering, University of Genoa, Genoa, Italy
- Claudio Sampieri
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alessandro Ioppi
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Stefano Africano
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alberto Vallin
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Davide Mocellin
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Marco Fragale
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Luca Guastini
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Sara Moccia
- The BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
- Cesare Piazza
- Unit of Otorhinolaryngology - Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Brescia, Italy
- Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Department of Informatics, Bioengineering, Robotics, and System Engineering, University of Genoa, Genoa, Italy
- Giorgio Peretti
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy

47
Zhao Q, He Y, Wu Y, Huang D, Wang Y, Sun C, Ju J, Wang J, Mahr JJL. Vocal cord lesions classification based on deep convolutional neural network and transfer learning. Med Phys 2021; 49:432-442. [PMID: 34813114 DOI: 10.1002/mp.15371] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2021] [Revised: 09/12/2021] [Accepted: 09/29/2021] [Indexed: 01/01/2023] Open
Abstract
PURPOSE Laryngoscopy, the most common diagnostic method for vocal cord lesions (VCLs), is based mainly on the subjective visual inspection of otolaryngologists. This study aimed to establish a highly objective computer-aided VCL diagnosis system based on a deep convolutional neural network (DCNN) and transfer learning. METHODS To classify VCLs, our method combined a DCNN backbone with transfer learning on a system specifically fine-tuned for a laryngoscopy image dataset. A laryngoscopy image database was collected to train the proposed system. The diagnostic performance was compared with other DCNN-based models. Analyses of the F1 score and receiver operating characteristic curves were conducted to evaluate the performance of the system. RESULTS Improving on existing VCL diagnosis methods, the proposed system achieved an overall accuracy of 80.23%, an F1 score of 0.7836, and an area under the curve (AUC) of 0.9557 for four fine-grained classes of VCLs, namely normal, polyp, keratinization, and carcinoma. It also demonstrated robust classification capacity for distinguishing urgent (keratinization, carcinoma) from non-urgent (normal, polyp) cases, with an overall accuracy of 0.939, a sensitivity of 0.887, a specificity of 0.993, and an AUC of 0.9828. The proposed method also outperformed clinicians in the classification of normal, polyp, and carcinoma cases at an extremely low time cost. CONCLUSION The VCL diagnosis system succeeded in using a DCNN to distinguish the most common VCLs from normal cases, holding practical potential for improving overall diagnostic efficacy in VCL examinations. The proposed system could be appropriately integrated into the conventional workflow of VCL laryngoscopy as a highly objective auxiliary method.
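AUC values like those above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes AUC as this pairwise (Mann-Whitney) statistic on invented scores.

```python
# AUC as the Mann-Whitney statistic: the fraction of (positive, negative)
# pairs where the positive case is scored higher, with ties counting half.
# The model scores below are invented for illustration.

def auc(pos_scores, neg_scores):
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

carcinoma_scores = [0.9, 0.8, 0.7, 0.4]  # hypothetical scores, positive cases
normal_scores = [0.3, 0.5, 0.2, 0.1]     # hypothetical scores, negative cases
print(auc(carcinoma_scores, normal_scores))
```

Because AUC depends only on the ranking of scores, it is threshold-free, which is why it complements the accuracy, sensitivity, and specificity figures reported at a fixed operating point.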
Affiliation(s)
- Qian Zhao
- Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education, School of Optics and Photonics, Beijing Institute of Technology, Beijing, China
- Yuqing He
- Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education, School of Optics and Photonics, Beijing Institute of Technology, Beijing, China
- Yanda Wu
- Key Laboratory of Photoelectronic Imaging Technology and System, Ministry of Education, School of Optics and Photonics, Beijing Institute of Technology, Beijing, China
- Dongyan Huang
- National Clinical Research Center for Otolaryngologic Diseases, College of Otolaryngology-Head and Neck Surgery, Chinese PLA General Hospital, Beijing, China
- Yang Wang
- National Clinical Research Center for Otolaryngologic Diseases, College of Otolaryngology-Head and Neck Surgery, Chinese PLA General Hospital, Beijing, China
- Cai Sun
- National Clinical Research Center for Otolaryngologic Diseases, College of Otolaryngology-Head and Neck Surgery, Chinese PLA General Hospital, Beijing, China
- Jun Ju
- National Clinical Research Center for Otolaryngologic Diseases, College of Otolaryngology-Head and Neck Surgery, Chinese PLA General Hospital, Beijing, China
- Jiasen Wang
- National Clinical Research Center for Otolaryngologic Diseases, College of Otolaryngology-Head and Neck Surgery, Chinese PLA General Hospital, Beijing, China

48
Li GS, Yang LJ, Chen G, Huang SN, Fang YY, Huang WJ, Lu W, He J, Liu HC, Li LY, Mo BY, Lu HP. Laryngeal Squamous Cell Carcinoma: Clinical Significance and Potential Mechanism of Cell Division Cycle 45. Cancer Biother Radiopharm 2021; 37:300-312. [PMID: 34672813 DOI: 10.1089/cbr.2020.4314] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
Background: Cell division cycle 45 (CDC45) plays an important role in the occurrence and development of numerous carcinomas, but its effect in laryngeal squamous cell carcinoma (LSCC) remains unclear. Materials and Methods: The messenger RNA and protein expression levels of CDC45 in LSCC were evaluated with a t test and the standardized mean difference (SMD). The ability of CDC45 expression to distinguish LSCC was assessed through receiver operating characteristic (ROC) curves. Gene set enrichment analysis (GSEA), protein-protein interaction analysis, public databases, and online tools were used to explore the potential molecular mechanism of CDC45 in LSCC. Results: High expression of CDC45 was identified in LSCC (SMD = 2.61, 95% confidence interval [1.62-3.61]). ROC curves showed that CDC45 expression can distinguish the LSCC group from its non-LSCC counterpart. CDC45 was relevant to the progression-free interval of LSCC patients (log-rank p = 0.03). GSEA showed that CDC45 is related to the cell cycle. CDC45, CDC6, KIF2C, and AURKB were identified as hub genes of LSCC, and E2F1 may be the regulatory transcription factor of CDC45. Conclusions: High expression of CDC45 likely exerts carcinogenic effects in LSCC, making CDC45 a potential target in the screening and treatment of LSCC.
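The two headline statistics in this abstract, the standardized mean difference and the ROC AUC, are straightforward to compute. The sketch below uses hypothetical expression values invented for illustration (the actual CDC45 data are not reproduced here); the SMD is Cohen's d with a pooled standard deviation, and the AUC uses the Mann-Whitney rank formulation.

```python
import math
import statistics

def pooled_smd(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                   / (n1 + n2 - 2))
    return (m1 - m2) / sp

def roc_auc(cases, controls):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen case scores higher than a randomly chosen control."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical expression values (arbitrary units), for illustration only.
lscc = [8.1, 7.9, 8.4, 8.0, 7.7]   # tumour samples
ctrl = [6.2, 6.5, 6.0, 6.4, 6.3]   # non-LSCC controls

smd = pooled_smd(statistics.mean(lscc), statistics.stdev(lscc), len(lscc),
                 statistics.mean(ctrl), statistics.stdev(ctrl), len(ctrl))
auc = roc_auc(lscc, ctrl)
print(f"SMD = {smd:.2f}, AUC = {auc:.2f}")
```

With these toy values the groups separate completely, so the AUC is 1.0 and the SMD is large; the paper's pooled SMD of 2.61 reflects the same kind of strong but imperfect separation across studies.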
Affiliation(s)
- Guo-Sheng Li: Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- Lin-Jie Yang: Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- Gang Chen: Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- Su-Ning Huang: Department of Radiotherapy, Guangxi Medical University Cancer Hospital, Nanning, P.R. China
- Ye-Ying Fang: Department of Radiotherapy, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- Wei-Jian Huang: Department of Pathology, Redcross Hospital of Yulin, Yulin, P.R. China
- Wei Lu: Department of Pathology, Nanning Second People's Hospital, Third Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- Juan He: Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- He-Chuan Liu: Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- Lin-Yi Li: Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
- Bin-Yu Mo: Department of Otolaryngology, Liuzhou People's Hospital, Liuzhou, P.R. China
- Hui-Ping Lu: Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, P.R. China
49
Yin L, Liu Y, Pei M, Li J, Wu M, Jia Y. Laryngoscope8: Laryngeal image dataset and classification of laryngeal disease based on attention mechanism. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.06.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
50
Yao P, Usman M, Chen YH, German A, Andreadis K, Mages K, Rameau A. Applications of Artificial Intelligence to Office Laryngoscopy: A Scoping Review. Laryngoscope 2021; 132:1993-2016. [PMID: 34582043 DOI: 10.1002/lary.29886] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2021] [Revised: 09/15/2021] [Accepted: 09/17/2021] [Indexed: 01/16/2023]
Abstract
OBJECTIVES/HYPOTHESIS This scoping review aims to provide a broad overview of the applications of artificial intelligence (AI) to office laryngoscopy, to identify gaps in knowledge, and to guide future research. STUDY DESIGN Scoping review. METHODS Searches for studies on AI and office laryngoscopy were conducted in five databases. Title/abstract and then full-text screening were performed. Primary research studies published in English of any date were included. Studies were summarized by AI applications, targeted conditions, imaging modalities, author affiliations, and dataset characteristics. RESULTS Studies focused on vocal fold vibration analysis (43%), lesion recognition (24%), and vocal fold movement determination (19%). The most frequently automated tasks were recognition of vocal fold nodules (19%), polyps (14%), paralysis (11%), paresis (8%), and cysts (7%). Imaging modalities included high-speed laryngeal videos (45%), stroboscopy (29%), and narrow band imaging endoscopy (7%). The body of literature was primarily authored by science, technology, engineering, and math (STEM) specialists (76%), with only 30 studies (31%) involving co-authorship by STEM specialists and otolaryngologists. Datasets were mostly from a single institution (84%) and most commonly originated from Germany (23%), the USA (16%), Spain (9%), Italy (8%), and China (8%). Demographic information was reported in only 39 studies (40%), with age and sex being the most commonly reported, whereas race/ethnicity and gender were not reported in any studies. CONCLUSION More interdisciplinary collaboration between STEM and otolaryngology research teams, improved demographic reporting (especially of race and ethnicity) to ensure broad representation, and larger, more geographically diverse datasets will be crucial to future research on AI in office laryngoscopy. LEVEL OF EVIDENCE N/A Laryngoscope, 2021.
Affiliation(s)
- Peter Yao, Moon Usman, Yu H Chen, Alexander German, Katerina Andreadis, Keith Mages, and Anaïs Rameau: Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, U.S.A.