1. Kim YE, Serpedin A, Periyakoil P, German D, Rameau A. Sociodemographic reporting in videomics research: a review of practices in otolaryngology - head and neck surgery. Eur Arch Otorhinolaryngol 2024; 281:6047-6056. PMID: 38704768. DOI: 10.1007/s00405-024-08659-0.
Abstract
OBJECTIVE To assess reporting practices of sociodemographic data in Upper Aerodigestive Tract (UAT) videomics research in Otolaryngology-Head and Neck Surgery (OHNS). STUDY DESIGN Narrative review. METHODS Four online research databases were searched for peer-reviewed articles on videomics and UAT endoscopy in OHNS, published since January 1, 2017. Title and abstract screening, followed by full-text screening, was performed. Dataset audit criteria were determined by the MINIMAR reporting standards for patient demographic characteristics, in addition to gender and author affiliations. RESULTS Of the 57 studies included, 37% reported any sociodemographic information on their dataset. Among these studies, all reported age, most reported sex (86%), two (10%) reported race, and one (5%) reported ethnicity and socioeconomic status. No studies reported gender. Most studies (84%) included at least one female author, and more than half (53%) had female first/senior authors, with no significant difference in the rate of sociodemographic reporting between studies with and without female authors (any female author: p = 0.2664; first/senior female author: p > 0.9999). Most US-based studies reported at least one sociodemographic variable (79%), compared with 24% of studies based in Europe and 20% in Asia (p = 0.0012). The rates of sociodemographic reporting by journal category were: clinical OHNS, 44%; clinical non-OHNS, 40%; technical, 42%; interdisciplinary, 10%. CONCLUSIONS Sociodemographic information is widely underreported in OHNS videomics research utilizing UAT endoscopy. Routine reporting of sociodemographic information should be implemented for AI-based research to help minimize the algorithmic biases that have previously been demonstrated. LEVEL OF EVIDENCE: 4
Affiliation(s)
- Yeo Eun Kim: Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, Sean Parker Institute for the Voice, 240 East 59th St, New York, NY, 10022, USA
- Aisha Serpedin: Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, Sean Parker Institute for the Voice, 240 East 59th St, New York, NY, 10022, USA
- Preethi Periyakoil: Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, Sean Parker Institute for the Voice, 240 East 59th St, New York, NY, 10022, USA
- Daniel German: Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, Sean Parker Institute for the Voice, 240 East 59th St, New York, NY, 10022, USA
- Anaïs Rameau: Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, Sean Parker Institute for the Voice, 240 East 59th St, New York, NY, 10022, USA
2. Paderno A, Bedi N, Rau A, Holsinger CF. Computer Vision and Videomics in Otolaryngology-Head and Neck Surgery: Bridging the Gap Between Clinical Needs and the Promise of Artificial Intelligence. Otolaryngol Clin North Am 2024; 57:703-718. PMID: 38981809. DOI: 10.1016/j.otc.2024.05.005.
Abstract
This article discusses the role of computer vision in otolaryngology, particularly in endoscopy and surgery. It covers recent applications of artificial intelligence (AI) in nonradiologic imaging within otolaryngology, noting benefits such as improved diagnostic accuracy and optimized therapeutic outcomes, as well as challenges, including the need for enhanced data curation and standardized research methodologies to advance clinical applications. Technical aspects are also covered, tracing the progression from manual feature extraction to more complex AI models, including convolutional neural networks and vision transformers, and their potential application in clinical settings.
Affiliation(s)
- Alberto Paderno: IRCCS Humanitas Research Hospital, via Manzoni 56, Rozzano, Milan 20089, Italy; Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, Pieve Emanuele, Milan 20072, Italy
- Nikita Bedi: Division of Head and Neck Surgery, Department of Otolaryngology, Stanford University, Palo Alto, CA, USA
- Anita Rau: Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
3. Nobel SMN, Swapno SMMR, Islam MR, Safran M, Alfarhood S, Mridha MF. A machine learning approach for vocal fold segmentation and disorder classification based on ensemble method. Sci Rep 2024; 14:14435. PMID: 38910146. DOI: 10.1038/s41598-024-64987-5.
Abstract
In the healthcare domain, an essential task is to understand and classify diseases affecting the vocal folds (VFs); accurate identification of VF disease is the key issue. Integrating VF segmentation and disease classification into a single system is challenging but important for precise diagnostics. Our study addresses this challenge by combining VF disease categorization and VF segmentation in one integrated system, using two ensemble machine learning methods: EfficientNetV2L-LGBM and UNet-BiGRU. The EfficientNetV2L-LGBM model, used for classification, achieved a training accuracy of 98.88%, validation accuracy of 97.73%, and test accuracy of 97.88%, highlighting the system's ability to classify different VF illnesses precisely. The UNet-BiGRU model, used for segmentation, attained a training accuracy of 92.55%, a validation accuracy of 89.87%, and a test accuracy of 91.47%. In the segmentation task, we examined several methods to improve performance, resulting in a testing accuracy of 91.99% and an Intersection over Union (IoU) of 87.46%. These measures demonstrate the model's skill in accurately delineating and separating the VFs. The classification and segmentation results confirm the system's capacity to identify and segment VF disorders effectively, representing a significant advancement in diagnostic accuracy and healthcare in this specialized field. This study emphasizes the potential of machine learning to transform the classification and segmentation of VFs, providing clinicians with a vital instrument, and its implementation is expected to enhance medical procedures and offer hope to those affected by VF disease globally.
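A minimal sketch of the kind of CNN-plus-gradient-boosting ensemble the abstract describes, pairing an EfficientNetV2L feature extractor with a LightGBM classifier; the input size, class count, and toy data below are illustrative assumptions, not the authors' configuration:

```python
# Hedged sketch: deep features from a frozen EfficientNetV2L backbone are
# classified by LightGBM, in the spirit of the EfficientNetV2L-LGBM ensemble.
import numpy as np
import lightgbm as lgb
from tensorflow.keras.applications import EfficientNetV2L
from tensorflow.keras.applications.efficientnet_v2 import preprocess_input

backbone = EfficientNetV2L(include_top=False, pooling="avg", weights="imagenet")

def extract_features(images):
    """images: float array (n, 480, 480, 3) with pixel values in [0, 255]."""
    return backbone.predict(preprocess_input(images), verbose=0)

# Toy stand-in data; replace with real laryngoscopy frames and disorder labels.
x_train = np.random.rand(8, 480, 480, 3) * 255
y_train = np.random.randint(0, 3, size=8)  # hypothetical 3-class labels

clf = lgb.LGBMClassifier(n_estimators=200)
clf.fit(extract_features(x_train), y_train)
```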
Affiliation(s)
- S M Nuruzzaman Nobel: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, 1216, Bangladesh
- S M Masfequier Rahman Swapno: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, 1216, Bangladesh
- Md Rajibul Islam: Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- Mejdl Safran: Department of Computer Science, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, 11543, Riyadh, Saudi Arabia
- Sultan Alfarhood: Department of Computer Science, College of Computer and Information Sciences, King Saud University, P. O. Box 51178, 11543, Riyadh, Saudi Arabia
- M F Mridha: Department of Computer Science, American International University-Bangladesh, Dhaka, 1229, Bangladesh
4. Deng B, Zheng X, Chen X, Zhang M. A Swin transformer encoder-based StyleGAN for unbalanced endoscopic image enhancement. Comput Biol Med 2024; 175:108472. PMID: 38663349. DOI: 10.1016/j.compbiomed.2024.108472.
Abstract
With the rapid development of artificial intelligence, automated endoscopy-assisted diagnostic systems have become an effective tool for reducing diagnostic costs and shortening patients' treatment cycles. Typically, the performance of these systems depends on deep learning models pre-trained with large-scale labeled data, for example, endoscopic images of early gastric cancer. However, expensive annotation and annotator subjectivity lead to insufficient and class-imbalanced endoscopic image datasets, which are detrimental to the training of deep learning models. Therefore, we propose a Swin Transformer encoder-based StyleGAN (STE-StyleGAN) for unbalanced endoscopic image enhancement, composed of an adversarially learned encoder and generator. First, a pre-trained Swin Transformer is introduced into the encoder to extract multi-scale features layer by layer from endoscopic images. The features are subsequently fed into a mapping block for aggregation and recombination. Second, a self-attention mechanism is applied to the generator, which adds detailed image information layer by layer through the re-encoded features, enabling the generator to learn the coupling between different image regions autonomously. Finally, we conducted extensive experiments on a private intestinal metaplasia grading dataset from a Grade-A tertiary hospital. The results show that the images generated by STE-StyleGAN are closer to the initial image distribution, achieving a Fréchet Inception Distance (FID) of 100.4. These generated images were then used to augment the initial dataset and improve the robustness of the classification model, achieving a top accuracy of 86%.
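The study's headline metric, the Fréchet Inception Distance, has a closed form over embedding statistics: FID = ||μ_r − μ_f||² + Tr(Σ_r + Σ_f − 2(Σ_r Σ_f)^{1/2}). A small sketch, assuming precomputed Inception-style embeddings for real and generated images:

```python
# Hedged sketch: FID between two sets of image embeddings (feature extraction
# itself is assumed done elsewhere, e.g., with an Inception network).
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_fake):
    """feats_*: (n, d) arrays of image embeddings."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # tiny imaginary parts from numerical noise
        covmean = covmean.real
    return float(((mu_r - mu_f) ** 2).sum() + np.trace(cov_r + cov_f - 2 * covmean))

print(fid(np.random.rand(256, 64), np.random.rand(256, 64)))
```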
Affiliation(s)
- Bo Deng: School of Information Science and Engineering, Shandong Normal University, Jinan, 250352, China
- Xiangwei Zheng: School of Information Science and Engineering, Shandong Normal University, Jinan, 250352, China; Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, 250352, China; State Key Laboratory of High-end Server & Storage Technology, Jinan 250101, China
- Xuanchi Chen: School of Information Science and Engineering, Shandong Normal University, Jinan, 250352, China
- Mingzhe Zhang: School of Information Science and Engineering, Shandong Normal University, Jinan, 250352, China
5. Dao TTP, Huynh TL, Pham MK, Le TN, Nguyen TC, Nguyen QT, Tran BA, Van BN, Ha CC, Tran MT. Improving Laryngoscopy Image Analysis Through Integration of Global Information and Local Features in VoFoCD Dataset. J Imaging Inform Med 2024. PMID: 38809338. DOI: 10.1007/s10278-024-01068-z.
Abstract
The diagnosis and treatment of vocal fold disorders rely heavily on laryngoscopy. A comprehensive vocal fold diagnosis requires accurate identification of crucial anatomical structures and potential lesions during laryngoscopic observation. However, existing approaches have yet to explore joint optimization of the decision-making process, that is, performing object detection and image classification simultaneously. In this study, we provide a new dataset, VoFoCD, with 1724 laryngology images designed explicitly for object detection and image classification in laryngoscopy. Images in the VoFoCD dataset are categorized into four classes and comprise six glottic object types. Moreover, we propose a novel Multitask Efficient trAnsformer network for Laryngoscopy (MEAL) to classify vocal fold images and detect glottic landmarks and lesions. To facilitate interpretability for clinicians, MEAL provides attention maps that visualize important learned regions, supporting explainable artificial intelligence results for clinical decision-making. We also analyze the model's effectiveness in simulated clinical scenarios in which the laryngoscope shakes during the procedure. The proposed model demonstrates outstanding performance on the VoFoCD dataset: image classification accuracy is 0.951, and mean average precision at an intersection-over-union threshold of 0.5 (mAP50) for object detection is 0.874. Our MEAL method integrates global knowledge, encompassing general laryngoscopy image classification, with local features corresponding to distinct anatomical regions of the vocal fold, particularly abnormal regions including benign and malignant lesions. This contribution can effectively aid laryngologists in identifying benign or malignant vocal fold lesions and in classifying images during laryngeal endoscopy.
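A minimal sketch of a multitask network of the kind described, with one shared backbone and separate classification and detection heads; MEAL itself is not reproduced here, so the ResNet-50 backbone and head shapes are placeholder assumptions:

```python
# Hedged sketch: shared-backbone multitask model (image classification +
# dense detection head), illustrating the joint-optimization idea.
import torch
import torch.nn as nn
import torchvision

class MultitaskLaryngoscopy(nn.Module):
    def __init__(self, n_image_classes=4, n_object_types=6):
        super().__init__()
        resnet = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cls_head = nn.Linear(2048, n_image_classes)
        # Per-cell detection head: objectness + 4 box coords + object-type scores.
        self.det_head = nn.Conv2d(2048, 1 + 4 + n_object_types, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)                      # (B, 2048, H/32, W/32)
        cls_logits = self.cls_head(self.pool(feats).flatten(1))
        det_map = self.det_head(feats)
        return cls_logits, det_map

cls_logits, det_map = MultitaskLaryngoscopy()(torch.randn(1, 3, 512, 512))
```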
Affiliation(s)
- Thao Thi Phuong Dao: University of Science, Ho Chi Minh City, Vietnam; John von Neumann Institute, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam; Department of Otolaryngology, Thong Nhat Hospital, Tan Binh District, Ho Chi Minh City, Vietnam
- Tuan-Luc Huynh: University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
- Trung-Nghia Le: University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
- Tan-Cong Nguyen: University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam; University of Social Sciences and Humanities, Ho Chi Minh City, Vietnam
- Quang-Thuc Nguyen: University of Science, Ho Chi Minh City, Vietnam; John von Neumann Institute, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
- Bich Anh Tran: Otorhinolaryngology Department, Cho Ray Hospital, District 5, Ho Chi Minh City, Vietnam
- Boi Ngoc Van: Department of Otolaryngology, Vinmec Central Park International Hospital, Binh Thanh District, Ho Chi Minh City, Vietnam
- Chanh Cong Ha: Department of Otolaryngology, District 7 Hospital, District 7, Ho Chi Minh City, Vietnam
- Minh-Triet Tran: University of Science, Ho Chi Minh City, Vietnam; John von Neumann Institute, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
6. Kim J, Wang SG, Lee JC, Cheon YI, Shin SC, Lim DW, Jang DI, Bhattacharjee S, Hwang YB, Choi HK, Kwon I, Kim SJ, Kwon SB. Evaluation of Vertical Level Differences Between Left and Right Vocal Folds Using Artificial Intelligence System in Excised Canine Larynx. J Voice 2024:S0892-1997(23)00385-5. PMID: 38216386. DOI: 10.1016/j.jvoice.2023.11.025.
Abstract
OBJECTIVES This study aimed to establish an artificial intelligence (AI) system to classify vertical level differences between the vocal folds during vocalization and to evaluate its classification accuracy. METHODS We designed models with different depths between the right and left vocal folds using an excised canine larynx. Video files for the dataset were obtained using a high-speed camera system and a color complementary metal oxide semiconductor camera with a global shutter. The dataset was divided into training, validation, and testing sets; 20,000 images were used for building the model and 8000 images for testing. To perform deep learning multiclass classification and estimate the vertical level difference, we introduced DenseNet121-ConvLSTM. RESULTS The model was trained several times with different numbers of epochs; the best results were achieved at 100 epochs with a batch size of 16. The proposed DenseNet121-ConvLSTM model achieved classification accuracies of 99.5% and 88.0% for training and testing, respectively. After verification with an external dataset, the overall accuracy, precision, recall, and F1-score were 90.8%, 91.6%, 90.9%, and 91.2%, respectively. CONCLUSIONS The newly developed AI system may be an easy and accurate method for classifying superior and inferior vertical level differences between the vocal folds, and may thus help in assessing vertical level differences in patients with unilateral vocal fold paralysis.
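A minimal sketch of a DenseNet121-ConvLSTM pipeline of the kind named above: per-frame DenseNet121 features are aggregated over time by a ConvLSTM layer. Sequence length, input size, and the three-class head are illustrative assumptions:

```python
# Hedged sketch: TimeDistributed DenseNet121 features -> ConvLSTM2D -> classifier.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

frames, h, w = 16, 224, 224
backbone = DenseNet121(include_top=False, weights="imagenet", input_shape=(h, w, 3))
backbone.trainable = False

inp = layers.Input(shape=(frames, h, w, 3))
x = layers.TimeDistributed(backbone)(inp)       # (B, T, 7, 7, 1024)
x = layers.ConvLSTM2D(64, kernel_size=3, padding="same")(x)
x = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(3, activation="softmax")(x)  # hypothetical level-difference classes

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```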
Affiliation(s)
- Jaewon Kim: Department of Cognitive Science, Pusan National University, Doctor's Course, Busan, South Korea; Department of Otorhinolaryngology, Head and Neck Surgery, Pusan National University Yangsan Hospital, Yangsan, Gyeongsangnam-do, South Korea
- Soo-Geun Wang: Department of Otorhinolaryngology, Head and Neck Surgery, College of Medicine, Pusan National University and Medical Research Institute, Pusan National University Hospital, Busan, South Korea
- Jin-Choon Lee: Department of Otorhinolaryngology, Head and Neck Surgery, Pusan National University School of Medicine, Pusan National University Yangsan Hospital, Yangsan, Gyeongsangnam-do, South Korea
- Yong-Il Cheon: Department of Otorhinolaryngology, Head and Neck Surgery, Biomedical Research Institute, Pusan National University School of Medicine, Pusan National University Hospital, Busan, South Korea
- Sung-Chan Shin: Department of Otorhinolaryngology, Head and Neck Surgery, Biomedical Research Institute, Pusan National University School of Medicine, Pusan National University Hospital, Busan, South Korea
- Dong-Won Lim: Department of Otorhinolaryngology, Head and Neck Surgery, Pusan National University Hospital, Busan, South Korea
- Dae-Ik Jang: Department of Otorhinolaryngology, Head and Neck Surgery, Kosin University Gospel Hospital, Kosin University College of Medicine, Busan, South Korea
- Yeong-Byn Hwang: Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae, South Korea
- Heung-Kook Choi: Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae, South Korea; Artificial Intelligence Research Center, JLK Inc., Seoul, South Korea
- Ickhwan Kwon: Platform Development Headquarters, Autonomous A2Z, Daegu, South Korea
- Seon-Jong Kim: Department of Applied IT and Engineering, Pusan National University, Miryang, Gyeongsangnam-do, South Korea
- Soon-Bok Kwon: Department of Humanities, Language and Information, Pusan National University, Busan, South Korea
7. Wang SX, Li Y, Zhu JQ, Wang ML, Zhang W, Tie CW, Wang GQ, Ni XG. The Detection of Nasopharyngeal Carcinomas Using a Neural Network Based on Nasopharyngoscopic Images. Laryngoscope 2024; 134:127-135. PMID: 37254946. DOI: 10.1002/lary.30781.
Abstract
OBJECTIVE To construct and validate a deep convolutional neural network (DCNN)-based artificial intelligence (AI) system for the detection of nasopharyngeal carcinoma (NPC) using archived nasopharyngoscopic images. METHODS We retrospectively collected 14107 nasopharyngoscopic images (7108 NPCs and 6999 noncancers) to construct a DCNN model and prepared a validation dataset containing 3501 images (1744 NPCs and 1757 noncancers) from a single center between January 2009 and December 2020. The DCNN model was established using the You Only Look Once (YOLOv5) architecture. Four otolaryngologists were asked to review the images of the validation set to benchmark the DCNN model's performance. RESULTS The DCNN model analyzed the 3501 images in 69.35 s. On the validation dataset, the precision, recall, accuracy, and F1 score of the DCNN model in detecting NPCs were 0.845 ± 0.038, 0.942 ± 0.021, 0.920 ± 0.024, and 0.890 ± 0.045 on white light imaging (WLI), and 0.895 ± 0.045, 0.941 ± 0.018, 0.975 ± 0.013, and 0.918 ± 0.036 on narrow band imaging (NBI), respectively. The diagnostic outcome of the DCNN model on WLI and NBI images was significantly better than that of two junior otolaryngologists (p < 0.05). CONCLUSION The DCNN model showed better diagnostic outcomes for NPCs than junior otolaryngologists and could therefore assist them in improving their diagnostic performance and reducing missed diagnoses. LEVEL OF EVIDENCE 3 Laryngoscope, 134:127-135, 2024.
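The paper builds on the YOLOv5 architecture; a minimal sketch of running a stock YOLOv5 detector from the public ultralytics/yolov5 hub (the pretrained weights and image path here are stand-in assumptions, not the study's trained NPC model):

```python
# Hedged sketch: YOLOv5 inference via torch.hub; boxes come back as a DataFrame.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.25  # confidence threshold for reported detections

results = model(["nasopharynx_frame.jpg"])  # hypothetical endoscopic frame
print(results.pandas().xyxy[0])             # xmin, ymin, xmax, ymax, confidence, class
```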
Affiliation(s)
- Shi-Xu Wang: Department of Head and Neck Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Ying Li: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Ji-Qing Zhu: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Mei-Ling Wang: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Wei Zhang: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Cheng-Wei Tie: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Gui-Qi Wang: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xiao-Guang Ni: Department of Endoscopy, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
8. Esmaeili N, Davaris N, Boese A, Illanes A, Navab N, Friebe M, Arens C. Contact Endoscopy - Narrow Band Imaging (CE-NBI) data set for laryngeal lesion assessment. Sci Data 2023; 10:733. PMID: 37865668. PMCID: PMC10590430. DOI: 10.1038/s41597-023-02629-7.
Abstract
The endoscopic examination of subepithelial vascular patterns within the vocal fold is crucial for clinicians seeking to distinguish between benign lesions and laryngeal cancer. Among innovative techniques, Contact Endoscopy combined with Narrow Band Imaging (CE-NBI) offers real-time visualization of these vascular structures. Despite the advent of CE-NBI, concerns have arisen regarding the subjective interpretation of its images. As a result, several computer-based solutions have been developed to address this issue. This study introduces the CE-NBI data set, the first publicly accessible data set that features enhanced and magnified visualizations of subepithelial blood vessels within the vocal fold. This data set encompasses 11144 images from 210 adult patients with pathological vocal fold conditions, where CE-NBI images are annotated using three distinct label categories. The data set has proven invaluable for numerous clinical assessments geared toward diagnosing laryngeal cancer using Optical Biopsy. Furthermore, given its versatility for various image analysis tasks, we have devised and implemented diverse image classification scenarios using Machine Learning (ML) approaches to address critical clinical challenges in assessing laryngeal lesions.
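A minimal baseline sketch of the kind of ML classification scenario the authors describe for CE-NBI images; the directory layout, label scheme, and classifier choice are assumptions for illustration, not the dataset's actual structure:

```python
# Hedged sketch: load labeled images from per-class folders and fit a baseline.
import numpy as np
from pathlib import Path
from PIL import Image
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def load_images(root):
    xs, ys = [], []
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for img_path in label_dir.glob("*.png"):
            img = Image.open(img_path).convert("L").resize((64, 64))
            xs.append(np.asarray(img, np.float32).ravel() / 255.0)
            ys.append(label_dir.name)
    return np.stack(xs), np.array(ys)

X, y = load_images("CE-NBI/")  # hypothetical dataset path
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```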
Affiliation(s)
- Nazila Esmaeili: Department of Otorhinolaryngology, Head and Neck Surgery, Justus Liebig University of Giessen, 35392, Giessen, Germany; Chair for Computer Aided Medical Procedures and Augmented Reality, Technical University of Munich, 85748, Munich, Germany; SURAG Medical GmbH, 04103, Leipzig, Germany
- Nikolaos Davaris: Department of Otorhinolaryngology, Head and Neck Surgery, Giessen University Hospital, 35392, Giessen, Germany; Department of Otorhinolaryngology, Head and Neck Surgery, Magdeburg University Hospital, 39120, Magdeburg, Germany
- Axel Boese: INKA-Innovation Laboratory for Image Guided Therapy, Medical Faculty, Otto-von-Guericke University Magdeburg, 39120, Magdeburg, Germany
- Nassir Navab: Chair for Computer Aided Medical Procedures and Augmented Reality, Technical University of Munich, 85748, Munich, Germany
- Michael Friebe: INKA-Innovation Laboratory for Image Guided Therapy, Medical Faculty, Otto-von-Guericke University Magdeburg, 39120, Magdeburg, Germany; Department of Biocybernetics and Biomedical Engineering, AGH University Kraków, 30-059, Kraków, Poland; CIBE - Center for Innovation, Business Development & Entrepreneurship, FOM University of Applied Sciences, 45141, Essen, Germany
- Christoph Arens: Department of Otorhinolaryngology, Head and Neck Surgery, Giessen University Hospital, 35392, Giessen, Germany
9. Sampieri C, Baldini C, Azam MA, Moccia S, Mattos LS, Vilaseca I, Peretti G, Ioppi A. Artificial Intelligence for Upper Aerodigestive Tract Endoscopy and Laryngoscopy: A Guide for Physicians and State-of-the-Art Review. Otolaryngol Head Neck Surg 2023; 169:811-829. PMID: 37051892. DOI: 10.1002/ohn.343.
Abstract
OBJECTIVE Endoscopic and laryngoscopic examination is paramount for the evaluation of laryngeal, oropharyngeal, nasopharyngeal, nasal, and oral cavity benign lesions and cancer. Nevertheless, upper aerodigestive tract (UADT) endoscopy is intrinsically operator-dependent and lacks objective quality standards. There has recently been increased interest in artificial intelligence (AI) applications in this area to support physicians during the examination, thus enhancing diagnostic performance. The relative novelty of this research field poses a challenge for both reviewers and readers, as clinicians often lack a specific technical background. DATA SOURCES Four bibliographic databases were searched: PubMed, EMBASE, Cochrane, and Google Scholar. REVIEW METHODS A structured review of the current literature (up to September 2022) was performed. Search terms related to AI, machine learning (ML), and deep learning (DL) in UADT endoscopy and laryngoscopy were identified and queried by 3 independent reviewers. Citations of selected studies were also evaluated to ensure comprehensiveness. CONCLUSIONS Forty-one studies were included in the review. AI and computer vision techniques were used to achieve 3 fundamental tasks in this field: classification, detection, and segmentation. All papers were summarized and reviewed. IMPLICATIONS FOR PRACTICE This article comprehensively reviews the latest developments in the application of ML and DL in UADT endoscopy and laryngoscopy, as well as their future clinical implications. The technical basis of AI is also explained, providing guidance for nonexpert readers and allowing critical appraisal of the evaluation metrics and the most relevant quality requirements.
Affiliation(s)
- Claudio Sampieri: Department of Experimental Medicine (DIMES), University of Genoa, Genoa, Italy; Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain; Otorhinolaryngology Department, Hospital Clínic, Barcelona, Spain
- Chiara Baldini: Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy; Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi (DIBRIS), University of Genoa, Genoa, Italy
- Muhammad Adeel Azam: Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy; Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi (DIBRIS), University of Genoa, Genoa, Italy
- Sara Moccia: Department of Excellence in Robotics and AI, The BioRobotics Institute, Pisa, Italy
- Leonardo S Mattos: Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
- Isabel Vilaseca: Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain; Otorhinolaryngology Department, Hospital Clínic, Barcelona, Spain; Head Neck Clínic, Agència de Gestió d'Ajuts Universitaris i de Recerca, Barcelona, Catalunya, Spain; Surgery and Medical-Surgical Specialties Department, Faculty of Medicine and Health Sciences, Universitat de Barcelona, Barcelona, Spain; Translational Genomics and Target Therapies in Solid Tumors Group, Faculty of Medicine, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; University of Barcelona, Barcelona, Spain
- Giorgio Peretti: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
- Alessandro Ioppi: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
10. Wellenstein DJ, Woodburn J, Marres HAM, van den Broek GB. Detection of laryngeal carcinoma during endoscopy using artificial intelligence. Head Neck 2023; 45:2217-2226. PMID: 37377069. DOI: 10.1002/hed.27441.
Abstract
BACKGROUND The objective of this study was to assess the performance and application of a self-developed deep learning (DL) algorithm for real-time localization and classification of both vocal cord carcinoma and benign vocal cord lesions. METHODS The algorithm was trained and validated on a dataset of videos and photos collected from our own department, as well as an open-access dataset named "Laryngoscope8". RESULTS The algorithm correctly localizes and classifies vocal cord carcinoma on still images with a sensitivity between 71% and 78%, and benign vocal cord lesions with a sensitivity between 70% and 82%. Furthermore, the best algorithm ran at an average of 63 frames per second, making it suitable for real-time detection of laryngeal pathology in an outpatient clinic setting. CONCLUSION We have demonstrated that our DL algorithm can localize and classify benign and malignant laryngeal pathology during endoscopy.
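A minimal sketch of how one might verify a real-time claim like the one above (an average of 63 frames per second) by timing per-frame inference on a video stream; the hub model and capture source are placeholder assumptions, not the authors' self-developed algorithm:

```python
# Hedged sketch: measure average detector throughput on a live video source.
import time
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
cap = cv2.VideoCapture(0)  # endoscope/camera index is an assumption

n, t0 = 0, time.time()
while n < 100:
    ok, frame = cap.read()
    if not ok:
        break
    _ = model(frame[..., ::-1])  # BGR -> RGB before inference
    n += 1
cap.release()
print(f"average FPS: {n / (time.time() - t0):.1f}")
```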
Affiliation(s)
- David J Wellenstein: Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Henri A M Marres: Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands
- Guido B van den Broek: Department of Otorhinolaryngology and Head and Neck Surgery, Radboud University Medical Center, Nijmegen, The Netherlands; Department of Information Management, Radboud University Medical Center, Nijmegen, The Netherlands
11. Liu Z, Lv Q, Yang Z, Li Y, Lee CH, Shen L. Recent progress in transformer-based medical image analysis. Comput Biol Med 2023; 164:107268. PMID: 37494821. DOI: 10.1016/j.compbiomed.2023.107268.
Abstract
The transformer is primarily used in the field of natural language processing. Recently, it has been adopted and shows promise in the computer vision (CV) field. Medical image analysis (MIA), as a critical branch of CV, also greatly benefits from this state-of-the-art technique. In this review, we first recap the core component of the transformer, the attention mechanism, and the detailed structures of the transformer. After that, we depict the recent progress of the transformer in the field of MIA. We organize the applications in a sequence of different tasks, including classification, segmentation, captioning, registration, detection, enhancement, localization, and synthesis. The mainstream classification and segmentation tasks are further divided into eleven medical image modalities. A large number of experiments studied in this review illustrate that the transformer-based method outperforms existing methods through comparisons with multiple evaluation metrics. Finally, we discuss the open challenges and future opportunities in this field. This task-modality review with the latest contents, detailed information, and comprehensive comparison may greatly benefit the broad MIA community.
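The attention mechanism the review recaps is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; a minimal sketch:

```python
# Hedged sketch: scaled dot-product attention, the transformer's core operation.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (batch, heads, seq, d_k) tensors."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 16, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```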
Affiliation(s)
- Zhaoshan Liu: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Qiujie Lv: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore; School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China
- Ziduo Yang: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore; School of Intelligent Systems Engineering, Sun Yat-sen University, No. 66, Gongchang Road, Guangming District, 518107, China
- Yifan Li: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
- Chau Hung Lee: Department of Radiology, Tan Tock Seng Hospital, 11 Jalan Tan Tock Seng, Singapore, 308433, Singapore
- Lei Shen: Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore, 117575, Singapore
12. Paderno A, Villani FP, Fior M, Berretti G, Gennarini F, Zigliani G, Ulaj E, Montenegro C, Sordi A, Sampieri C, Peretti G, Moccia S, Piazza C. Instance segmentation of upper aerodigestive tract cancer: site-specific outcomes. Acta Otorhinolaryngol Ital 2023; 43:283-290. PMID: 37488992. PMCID: PMC10366566. DOI: 10.14639/0392-100x-n2336.
Abstract
Objective To achieve instance segmentation of upper aerodigestive tract (UADT) neoplasms using a deep learning (DL) algorithm, and to identify differences in its diagnostic performance in three different sites: larynx/hypopharynx, oral cavity and oropharynx. Methods A total of 1034 endoscopic images from 323 patients were examined under narrow band imaging (NBI). The Mask R-CNN algorithm was used for the analysis. The dataset split was: 935 training, 48 validation and 51 testing images. Dice Similarity Coefficient (Dsc) was the main outcome measure. Results Instance segmentation was effective in 76.5% of images. The mean Dsc was 0.90 ± 0.05. The algorithm correctly predicted 77.8%, 86.7% and 55.5% of lesions in the larynx/hypopharynx, oral cavity, and oropharynx, respectively. The mean Dsc was 0.90 ± 0.05 for the larynx/hypopharynx, 0.60 ± 0.26 for the oral cavity, and 0.81 ± 0.30 for the oropharynx. The analysis showed inferior diagnostic results in the oral cavity compared with the larynx/hypopharynx (p < 0.001). Conclusions The study confirms the feasibility of instance segmentation of UADT using DL algorithms and shows inferior diagnostic results in the oral cavity compared with other anatomic areas.
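The study's main outcome measure, the Dice Similarity Coefficient, compares a predicted mask against ground truth; a minimal sketch for binary masks:

```python
# Hedged sketch: Dice Similarity Coefficient (Dsc) = 2|A∩B| / (|A| + |B|).
import numpy as np

def dice(pred, gt, eps=1e-7):
    """pred, gt: same-shape boolean mask arrays."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

a = np.zeros((64, 64), bool); a[10:40, 10:40] = True
b = np.zeros((64, 64), bool); b[15:45, 15:45] = True
print(round(dice(a, b), 3))  # overlap of two offset squares
```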
Affiliation(s)
- Alberto Paderno: Unit of Otorhinolaryngology, Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy; Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Milena Fior: Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Giulia Berretti: Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Francesca Gennarini: Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Gabriele Zigliani: Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Emanuela Ulaj: Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Claudia Montenegro: Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Alessandra Sordi: Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
- Claudio Sampieri: Unit of Otorhinolaryngology, Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Giorgio Peretti: Unit of Otorhinolaryngology, Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Sara Moccia: The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy; Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
- Cesare Piazza: Unit of Otorhinolaryngology, Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy; Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, School of Medicine, Brescia, Italy
13. Ameen YA, Badary DM, Abonnoor AEI, Hussain KF, Sewisy AA. Which data subset should be augmented for deep learning? A simulation study using urothelial cell carcinoma histopathology images. BMC Bioinformatics 2023; 24:75. PMID: 36869300. PMCID: PMC9983182. DOI: 10.1186/s12859-023-05199-y.
Abstract
BACKGROUND Applying deep learning to digital histopathology is hindered by the scarcity of manually annotated datasets. While data augmentation can ameliorate this obstacle, its methods are far from standardized. Our aim was to systematically explore the effects of skipping data augmentation; applying data augmentation to different subsets of the whole dataset (training set, validation set, test set, two of them, or all of them); and applying data augmentation at different time points (before, during, or after dividing the dataset into three subsets). Different combinations of the above possibilities resulted in 11 ways to apply augmentation. The literature contains no such comprehensive systematic comparison of these augmentation ways. RESULTS Non-overlapping photographs of all tissues on 90 hematoxylin-and-eosin-stained urinary bladder slides were obtained. Then, they were manually classified as either inflammation (5948 images), urothelial cell carcinoma (5811 images), or invalid (3132 images; excluded). If done, augmentation was eight-fold by flipping and rotation. Four convolutional neural networks (Inception-v3, ResNet-101, GoogLeNet, and SqueezeNet), pre-trained on the ImageNet dataset, were fine-tuned to binary classify images of our dataset. This task was the benchmark for our experiments. Model testing performance was evaluated using accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve. Model validation accuracy was also estimated. The best testing performance was achieved when augmentation was done to the remaining data after test-set separation, but before division into training and validation sets. This leaked information between the training and the validation sets, as evidenced by the optimistic validation accuracy. However, this leakage did not cause the validation set to malfunction. Augmentation before test-set separation led to optimistic results. Test-set augmentation yielded more accurate evaluation metrics with less uncertainty. Inception-v3 had the best overall testing performance. CONCLUSIONS In digital histopathology, augmentation should include both the test set (after its allocation), and the remaining combined training/validation set (before being split into separate training and validation sets). Future research should try to generalize our results.
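A minimal sketch of the eight-fold flip/rotation augmentation described above: the four 90-degree rotations of an image plus the four rotations of its mirror (the dihedral group D4):

```python
# Hedged sketch: eight-fold augmentation by flipping and rotation.
import numpy as np

def eightfold_augment(img):
    """img: (H, W, C) array -> list of 8 augmented copies."""
    out = []
    for flipped in (img, np.fliplr(img)):
        for k in range(4):
            out.append(np.rot90(flipped, k))
    return out

tile = np.random.rand(128, 128, 3)
print(len(eightfold_augment(tile)))  # 8
```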
Affiliation(s)
- Yusra A Ameen: Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt
- Dalia M Badary: Department of Pathology, Faculty of Medicine, Assiut University, Asyut, Egypt
- Khaled F Hussain: Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt
- Adel A Sewisy: Department of Computer Science, Faculty of Computers and Information, Assiut University, Asyut, Egypt
14. Tran BA, Dao TTP, Dung HDQ, Van NB, Ha CC, Pham NH, Nguyen TCHTNC, Nguyen TC, Pham MK, Tran MK, Tran TM, Tran MT. Support of deep learning to classify vocal fold images in flexible laryngoscopy. Am J Otolaryngol 2023; 44:103800. PMID: 36905912. DOI: 10.1016/j.amjoto.2023.103800.
Abstract
PURPOSE To collect a dataset with adequate laryngoscopy images and to identify the appearance of vocal folds and their lesions in flexible laryngoscopy images using objective deep learning models. METHODS We adopted several novel deep learning models to train on and classify 4549 flexible laryngoscopy images as no vocal fold, normal vocal folds, or abnormal vocal folds, helping these models recognize vocal folds and their lesions within these images. Ultimately, we compared the results of state-of-the-art deep learning models with one another, and compared the computer-aided classification system against ENT doctors. RESULTS This study evaluated the performance of the deep learning models on laryngoscopy images collected from 876 patients. The Xception model was more accurate and more stable than almost all of the other models, with accuracies of 98.90%, 97.36%, and 96.26% for no vocal fold, normal vocal folds, and vocal fold abnormalities, respectively. Compared with our ENT doctors, the Xception model produced better results than a junior doctor and was close to an expert. CONCLUSION Our results show that current deep learning models can classify vocal fold images well and effectively assist physicians in vocal fold identification and in classification of normal or abnormal vocal folds.
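A minimal sketch of the three-class transfer-learning setup the abstract describes (no vocal fold / normal / abnormal), using a frozen ImageNet-pretrained Xception backbone; the head size and hyperparameters are illustrative assumptions:

```python
# Hedged sketch: Xception backbone + small softmax head for 3 classes.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception

base = Xception(include_top=False, weights="imagenet",
                input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # optionally unfreeze top blocks for fine-tuning

model = models.Sequential([
    base,
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```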
Affiliation(s)
- Bich Anh Tran: Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Thao Thi Phuong Dao: University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam; Department of Otolaryngology, Thong Nhat Hospital, Ho Chi Minh City, Viet Nam
- Ho Dang Quy Dung: Department of Endoscopy, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Ngoc Boi Van: Department of Otolaryngology, Vinmec Central Park International Hospital, Ho Chi Minh City, Viet Nam
- Chanh Cong Ha: Department of Otolaryngology, 7A Military Hospital, Ho Chi Minh City, Viet Nam
- Nam Hoang Pham: Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Tan-Cong Nguyen: University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; University of Social Sciences and Humanities, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Minh-Khoi Pham: University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Mai-Khiem Tran: University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Truong Minh Tran: Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Minh-Triet Tran: University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
15. Choi SJ, Kim DK, Kim BS, Cho M, Jeong J, Jo YH, Song KJ, Kim YJ, Kim S. Mask R-CNN based multiclass segmentation model for endotracheal intubation using video laryngoscope. Digit Health 2023; 9:20552076231211547. PMID: 38025115. PMCID: PMC10631336. DOI: 10.1177/20552076231211547.
Abstract
Objective Endotracheal intubation (ETI) is critical to secure the airway in emergent situations. Although artificial intelligence algorithms are frequently used to analyze medical images, their application to evaluating intraoral structures based on images captured during emergent ETI remains limited. The aim of this study is to develop an artificial intelligence model for segmenting structures in the oral cavity using video laryngoscope (VL) images. Methods From 54 VL videos, clinicians manually labeled images that include motion blur, foggy vision, blood, mucus, and vomitus. Anatomical structures of interest included the tongue, epiglottis, vocal cord, and corniculate cartilage. EfficientNet-B5 with DeepLabv3+, EfficientNet-B5 with U-Net, and a configured Mask R-CNN were used; EfficientNet-B5 was pretrained on ImageNet. The Dice similarity coefficient (DSC) was used to measure segmentation performance, and accuracy, recall, specificity, and F1 score were used to evaluate how well the model targeted each structure, based on the intersection over union between the ground truth and the prediction mask. Results The DSCs for the tongue, epiglottis, vocal cord, and corniculate cartilage obtained from the EfficientNet-B5 with DeepLabv3+, EfficientNet-B5 with U-Net, and configured Mask R-CNN models were 0.3351/0.7675/0.766/0.6539, 0.0/0.7581/0.7395/0.6906, and 0.1167/0.7677/0.7207/0.57, respectively. Furthermore, the processing speeds (frames per second) of the three models were 3, 24, and 32, respectively. Conclusions The algorithm developed in this study can assist medical providers performing ETI in emergent situations.
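A minimal sketch of one of the three compared configurations, an EfficientNet-B5 encoder with a DeepLabv3+ decoder; the segmentation_models_pytorch library is an assumed tooling choice, since the paper does not state its implementation:

```python
# Hedged sketch: EfficientNet-B5 + DeepLabV3+ for 4 intraoral structure classes.
import torch
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="efficientnet-b5",
    encoder_weights="imagenet",
    in_channels=3,
    classes=4,  # tongue, epiglottis, vocal cord, corniculate cartilage
)
logits = model(torch.randn(1, 3, 512, 512))  # (1, 4, 512, 512) per-class logits
```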
Affiliation(s)
- Seung Jae Choi: Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, Republic of Korea
- Dae Kon Kim: Department of Emergency Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea; Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Republic of Korea
- Byeong Soo Kim: Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, Republic of Korea
- Minwoo Cho: Transdisciplinary Department of Medicine and Advanced Technology, Seoul National University Hospital, Seoul, Republic of Korea
- Joo Jeong: Department of Emergency Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea; Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- You Hwan Jo: Department of Emergency Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea; Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Kyoung Jun Song: Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Emergency Medicine, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Seoul, Republic of Korea
- Yu Jin Kim: Department of Emergency Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea; Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Sungwan Kim: Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Republic of Korea; Institute of Bioengineering, Seoul National University, Seoul, Republic of Korea
16. Konradi J, Zajber M, Betz U, Drees P, Gerken A, Meine H. AI-Based Detection of Aspiration for Video-Endoscopy with Visual Aids in Meaningful Frames to Interpret the Model Outcome. Sensors (Basel) 2022; 22:9468. PMID: 36502169. PMCID: PMC9736280. DOI: 10.3390/s22239468.
Abstract
Disorders of swallowing often lead to pneumonia when material enters the airways (aspiration). Flexible Endoscopic Evaluation of Swallowing (FEES) plays a key role in the diagnosis of aspiration but is prone to human error, and an AI-based tool could facilitate this process. Recent non-endoscopic/non-radiologic attempts to detect aspiration using machine-learning approaches have achieved unsatisfying accuracy and show black-box characteristics, making it difficult for clinical users to trust these models' decisions. Our aim is to introduce an explainable artificial intelligence (XAI) approach to detect aspiration in FEES. Our approach is to teach the AI about the relevant anatomical structures, such as the vocal cords and the glottis, based on 92 annotated FEES videos, while simultaneously training it to detect boluses that pass the glottis and become aspirated. During testing, the AI successfully recognized the glottis and the vocal cords but could not yet achieve satisfying aspiration-detection quality. While detection performance must be optimized, our architecture yields a final model that explains its assessment by locating meaningful frames with relevant aspiration events and by highlighting suspected boluses. In contrast to comparable AI tools, our framework is verifiable and interpretable and, therefore, accountable to clinical users.
Affiliation(s)
- Jürgen Konradi: Institute of Physical Therapy, Prevention and Rehabilitation, University Medical Center of the Johannes Gutenberg-University Mainz, 55131 Mainz, Germany
- Milla Zajber: Department for Health Care & Nursing, Catholic University of Applied Sciences, 55122 Mainz, Germany
- Ulrich Betz: Institute of Physical Therapy, Prevention and Rehabilitation, University Medical Center of the Johannes Gutenberg-University Mainz, 55131 Mainz, Germany
- Philipp Drees: Department of Orthopedics and Trauma Surgery, University Medical Center of the Johannes Gutenberg-University Mainz, 55131 Mainz, Germany
- Annika Gerken: Fraunhofer Institute for Digital Medicine MEVIS, 28359 Bremen, Germany
- Hans Meine: Fraunhofer Institute for Digital Medicine MEVIS, 28359 Bremen, Germany
17. Sahoo PK, Mishra S, Panigrahi R, Bhoi AK, Barsocchi P. An Improvised Deep-Learning-Based Mask R-CNN Model for Laryngeal Cancer Detection Using CT Images. Sensors (Basel) 2022; 22:8834. PMID: 36433430. PMCID: PMC9697116. DOI: 10.3390/s22228834.
Abstract
Recently, laryngeal cancer cases have increased drastically across the globe. Accurate treatment for laryngeal cancer is intricate, especially in the later stages, as it is a complex malignancy of the head and neck area. In recent years, researchers have developed diverse diagnostic approaches and tools to help clinical experts identify laryngeal cancer effectively. However, existing tools and approaches suffer from performance constraints such as low accuracy in identifying laryngeal cancer at an early stage, high computational complexity, and long patient-screening times. In this paper, the authors present a novel, enhanced deep-learning-based Mask R-CNN model for identifying laryngeal cancer and its related symptoms in real time, utilizing diverse image datasets and CT images. Furthermore, the suggested model can capture and detect minor malignancies of the larynx quickly during real-time patient screening, saving clinicians time and allowing more patients to be screened each day. The suggested model obtained an accuracy of 98.99%, precision of 98.99%, F1 score of 97.99%, and recall of 96.79% on the ImageNet dataset. Several studies on laryngeal cancer detection using diverse approaches have been performed in recent years, and there remain vigorous opportunities for further research into new detection approaches utilizing diverse, large image datasets.
Affiliation(s)
- Pravat Kumar Sahoo: School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar 751024, India
- Sushruta Mishra: School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar 751024, India
- Ranjit Panigrahi: Department of Computer Applications, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Majitar, Rangpo 737136, India
- Akash Kumar Bhoi: KIET Group of Institutions, Delhi-NCR, Ghaziabad 201206, India; Directorate of Research, Sikkim Manipal University, Gangtok 737102, India; Institute of Information Science and Technologies, National Research Council, 56124 Pisa, Italy
- Paolo Barsocchi: Institute of Information Science and Technologies, National Research Council, 56124 Pisa, Italy
|
18
|
Peterson QA, Fei T, Sy LE, Froeschke LL, Mendelsohn AH, Berke GS, Peterson DA. Correlating Perceptual Voice Quality in Adductor Spasmodic Dysphonia With Computer Vision Assessment of Glottal Geometry Dynamics. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2022; 65:3695-3708. [PMID: 36130065 PMCID: PMC9927624 DOI: 10.1044/2022_jslhr-22-00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
PURPOSE This study examined the relationship between voice quality and glottal geometry dynamics in patients with adductor spasmodic dysphonia (ADSD). METHOD An objective computer vision and machine learning system was developed to extract glottal geometry dynamics from nasolaryngoscopic video recordings for 78 patients with ADSD. General regression models were used to examine the relationship between overall voice quality and 15 variables that capture glottal geometry dynamics derived from the computer vision system. Two experts in ADSD independently rated voice quality for two separate voice tasks for every patient, yielding four different voice quality rating models. RESULTS All four of the regression models exhibited positive correlations with clinical assessments of voice quality (R²s = .30-.34, Spearman ρ = .55-.61, all with p < .001). Seven to 10 variables were included in each model. There was high overlap in the variables included between the four models, and the sign of the correlation with voice quality was consistent for each variable across all four regression models. CONCLUSION We found specific glottal geometry dynamics that correspond to voice quality in ADSD.
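As an illustration of this kind of analysis, the sketch below fits a general regression of perceptual ratings on 15 geometry variables and reports R² and Spearman's ρ; the data here are synthetic stand-ins, not the study's measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
X = rng.normal(size=(78, 15))  # 15 glottal-geometry variables per patient (synthetic)
y = 0.8 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=1.0, size=78)  # synthetic ratings

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)                     # proportion of rating variance explained
rho, p = spearmanr(model.predict(X), y)    # rank correlation of fit vs. ratings
print(f"R^2 = {r2:.2f}, Spearman rho = {rho:.2f} (p = {p:.3g})")
```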
Affiliation(s)
- Quinn A. Peterson
- Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo
| | - Teng Fei
- Department of Cognitive Science, University of California, San Diego, La Jolla
| | - Lauren E. Sy
- Department of Cognitive Science, University of California, San Diego, La Jolla
| | | | - Abie H. Mendelsohn
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles
| | - Gerald S. Berke
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles
| | - David A. Peterson
- Institute for Neural Computation, University of California, San Diego, La Jolla
| |
|
19
|
Sakthivel S, Prabhu V. Optimal Deep Learning-Based Vocal Fold Disorder Detection and Classification Model on High-Speed Video Endoscopy. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:4248938. [PMID: 36353680 PMCID: PMC9640237 DOI: 10.1155/2022/4248938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/23/2022] [Revised: 09/04/2022] [Accepted: 09/21/2022] [Indexed: 08/08/2023]
Abstract
The use of high-speed videoendoscopy (HSV) to study the phonatory processes underlying speech requires precise identification of vocal fold boundaries during vibration. HSV is a unique laryngeal imaging technology that captures intracycle vocal fold vibrations at a high frame rate without requiring auditory input, and it can identify the vibratory characteristics of the vocal folds with increased temporal resolution during both sustained phonation and running speech. Clinically significant vocal fold vibratory characteristics in running speech can be retrieved by building automated algorithms that extract HSV-based vocal fold vibration data. This study presents an optimal deep-learning-based vocal fold disorder detection and classification technique for HSV (ODL-VFDDC). The suggested ODL-VFDDC technique begins with temporal segmentation and motion correction to identify voiced regions in the HSV recording and to track the position of the moving vocal folds across frames. The extracted attributes are fed into a deep belief network (DBN) model. Furthermore, the farmland fertility algorithm (FFA) is used to optimize the hyperparameter tuning of the DBN model, which improves classification results, and to accurately determine the glottal limits of the vibrating vocal folds. In terms of vocal fold disorder classification, the testing results demonstrated that the ODL-VFDDC technique outperforms existing methodologies. The suggested method tracked the vocal fold boundaries across frames with minimal processing cost and high resilience to image noise, providing a fully automatic way to analyze vocal fold motion during connected speech.
Affiliation(s)
- S. Sakthivel
- Department of Computer Science and Engineering, Vel Tech High Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Avadi, Chennai, India
| | - V. Prabhu
- Department of Electronics and Communication Engineering, Vel Tech Multi Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Chennai, India
| |
|
20
|
Long-term performance assessment of fully automatic biomedical glottis segmentation at the point of care. PLoS One 2022; 17:e0266989. [PMID: 36129922 PMCID: PMC9491538 DOI: 10.1371/journal.pone.0266989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2022] [Accepted: 07/25/2022] [Indexed: 12/04/2022] Open
Abstract
Deep learning has a large impact on medical image analysis and has lately been adopted for clinical use at the point of care. However, only a small number of long-term studies report the performance of deep neural networks (DNNs) in such an environment. In this study, we measured the long-term performance of a clinically optimized DNN for laryngeal glottis segmentation. We collected video footage over two years from an AI-powered laryngeal high-speed videoendoscopy imaging system and found that footage image quality was stable across time. Next, we determined the DNN's segmentation performance on lossy and lossless compressed data, revealing that only 9% of recordings contain segmentation artifacts. We found that lossy and lossless compression are on par for glottis segmentation; however, lossless compression provides significantly superior image quality. Lastly, we employed continual learning strategies to continuously incorporate new data into the DNN and remove the aforementioned segmentation artifacts. With modest manual intervention, we were able to reduce these segmentation artifacts by up to 81%. We believe that our deep learning-enhanced laryngeal imaging platform consistently provides clinically sound results and, together with our proposed continual learning scheme, will have a long-lasting impact on the future of laryngeal imaging.
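A minimal sketch of one continual-learning round of the kind described, assuming a PyTorch segmentation model and a data loader of newly annotated, artifact-prone recordings; the schedule and hyperparameters are illustrative assumptions, not the study's protocol.

```python
import torch
from torch import nn

def continual_update(model, flagged_loader, lr=1e-5, steps=200):
    """One modest continual-learning round: fine-tune the deployed
    segmentation DNN on newly annotated recordings that showed
    segmentation artifacts. Assumes the loader yields (images, masks)
    with masks as float binary tensors matching the model's output shape.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    model.train()
    it = iter(flagged_loader)
    for _ in range(steps):
        try:
            imgs, masks = next(it)
        except StopIteration:       # cycle the small flagged set
            it = iter(flagged_loader)
            imgs, masks = next(it)
        opt.zero_grad()
        loss = loss_fn(model(imgs), masks)
        loss.backward()
        opt.step()
    return model
```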
|
21
|
Paderno A, Gennarini F, Sordi A, Montenegro C, Lancini D, Villani FP, Moccia S, Piazza C. Artificial intelligence in clinical endoscopy: Insights in the field of videomics. Front Surg 2022; 9:933297. [PMID: 36171813 PMCID: PMC9510389 DOI: 10.3389/fsurg.2022.933297] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
Artificial intelligence is increasingly seen as a useful tool in medicine. Specifically, these technologies aim to extract insights from complex datasets that cannot easily be analyzed with conventional statistical methods. While promising results have been obtained for various -omics datasets, radiological images, and histopathologic slides, the analysis of videoendoscopic frames still represents a major challenge. In this context, videomics is a burgeoning field in which methods of computer vision are systematically used to organize unstructured data from frames obtained during diagnostic videoendoscopy. Recent studies have focused on five broad tasks of increasing complexity: quality assessment of endoscopic images, classification of pathologic and nonpathologic frames, detection of lesions within frames, segmentation of pathologic lesions, and in-depth characterization of neoplastic lesions. Herein, we present a broad overview of the field, with a focus on conceptual key points and future perspectives.
Affiliation(s)
- Alberto Paderno
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
- Correspondence: Alberto Paderno
| | - Francesca Gennarini
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| | - Alessandra Sordi
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| | - Claudia Montenegro
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| | - Davide Lancini
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
| | - Francesca Pia Villani
- The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy
- Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
| | - Sara Moccia
- The BioRobotics Institute, Scuola Superiore Sant’Anna, Pisa, Italy
- Department of Excellence in Robotics and AI, Scuola Superiore Sant’Anna, Pisa, Italy
| | - Cesare Piazza
- Unit of Otorhinolaryngology—Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, School of Medicine, University of Brescia, Brescia, Italy
| |
|
22
|
Pan X, Bai W, Ma M, Zhang S. RANT: A cascade reverse attention segmentation framework with hybrid transformer for laryngeal endoscope images. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
23
|
Kist AM, Breininger K, Dörrich M, Dürr S, Schützenberger A, Semmler M. A single latent channel is sufficient for biomedical glottis segmentation. Sci Rep 2022; 12:14292. [PMID: 35995933 PMCID: PMC9395348 DOI: 10.1038/s41598-022-17764-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Accepted: 07/30/2022] [Indexed: 11/23/2022] Open
Abstract
Glottis segmentation is a crucial step in quantifying endoscopic footage in laryngeal high-speed videoendoscopy. Recent advances in deep neural networks for glottis segmentation allow a fully automatic workflow. However, the inner workings of these deep segmentation networks remain largely opaque, and understanding them is crucial for acceptance in clinical practice. Here, we show through systematic ablations that a single latent channel as a bottleneck layer is sufficient for glottal area segmentation. We further demonstrate that the latent space is an abstraction of the glottal area segmentation relying on three spatially defined pixel subtypes, allowing a transparent interpretation. We further provide evidence that the latent space is highly correlated with the glottal area waveform, can be encoded with four bits, and can be decoded using lean decoders while maintaining high reconstruction accuracy. Our findings suggest that glottis segmentation is a task that can be highly optimized to yield very efficient and explainable deep neural networks, which is important for application in the clinic. In the future, we believe that online deep learning-assisted monitoring will be a game-changer in laryngeal examinations.
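The idea of a single-channel, few-bit latent can be illustrated with a toy encoder-decoder; the layer sizes and the 4-bit rounding below are illustrative assumptions, not the paper's architecture.

```python
import torch
from torch import nn

class OneChannelBottleneckSeg(nn.Module):
    """Toy encoder-decoder with a single latent channel, echoing the
    finding that one channel suffices for glottis segmentation."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, stride=2, padding=1),  # -> 1 latent channel
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x, n_bits=4):
        z = torch.sigmoid(self.enc(x))
        levels = 2 ** n_bits - 1                # 4-bit latent code
        z_q = torch.round(z * levels) / levels  # hard quantization; training
        return self.dec(z_q)                    # would need a straight-through estimator

model = OneChannelBottleneckSeg()
logits = model(torch.randn(2, 1, 256, 256))
print(logits.shape)  # torch.Size([2, 1, 256, 256])
```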
Affiliation(s)
- Andreas M Kist
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91052, Erlangen, Germany.
| | - Katharina Breininger
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91052, Erlangen, Germany
| | - Marion Dörrich
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, 91052, Erlangen, Germany
| | - Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054, Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054, Erlangen, Germany
| | - Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054, Erlangen, Germany
| |
|
24
|
Gumbs AA, Grasso V, Bourdel N, Croner R, Spolverato G, Frigerio I, Illanes A, Abu Hilal M, Park A, Elyan E. The Advances in Computer Vision That Are Enabling More Autonomous Actions in Surgery: A Systematic Review of the Literature. SENSORS (BASEL, SWITZERLAND) 2022; 22:4918. [PMID: 35808408 PMCID: PMC9269548 DOI: 10.3390/s22134918] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 12/28/2022]
Abstract
This review focuses on advances in and current limitations of computer vision (CV) and how CV can help us achieve more autonomous actions in surgery. It is a follow-up to an article we previously published in Sensors entitled "Artificial Intelligence Surgery: How Do We Get to Autonomous Actions in Surgery?" Whereas that article also discussed machine learning, deep learning, and natural language processing, this review delves deeper into the field of CV. Additionally, non-visual forms of data that can help computerized robots perform more autonomous actions, such as instrument priors and audio haptics, are highlighted. Furthermore, the current existential crisis for surgeons, endoscopists, and interventional radiologists regarding greater autonomy during procedures is discussed. In summary, this paper discusses how to harness the power of CV to keep doctors who perform interventions in the loop.
Affiliation(s)
- Andrew A. Gumbs
- Departement de Chirurgie Digestive, Centre Hospitalier Intercommunal de, Poissy/Saint-Germain-en-Laye, 78300 Poissy, France
- Department of Surgery, University of Magdeburg, 39106 Magdeburg, Germany;
| | - Vincent Grasso
- Family Christian Health Center, 31 West 155th St., Harvey, IL 60426, USA;
| | - Nicolas Bourdel
- Gynecological Surgery Department, CHU Clermont Ferrand, 1, Place Lucie-Aubrac Clermont-Ferrand, 63100 Clermont-Ferrand, France;
- EnCoV, Institut Pascal, UMR6602 CNRS, UCA, Clermont-Ferrand University Hospital, 63000 Clermont-Ferrand, France
- SurgAR-Surgical Augmented Reality, 63000 Clermont-Ferrand, France
| | - Roland Croner
- Department of Surgery, University of Magdeburg, 39106 Magdeburg, Germany;
| | - Gaya Spolverato
- Department of Surgical, Oncological and Gastroenterological Sciences, University of Padova, 35122 Padova, Italy;
| | - Isabella Frigerio
- Department of Hepato-Pancreato-Biliary Surgery, Pederzoli Hospital, 37019 Peschiera del Garda, Italy;
| | - Alfredo Illanes
- INKA-Innovation Laboratory for Image Guided Therapy, Otto-von-Guericke University Magdeburg, 39120 Magdeburg, Germany;
| | - Mohammad Abu Hilal
- Unità Chirurgia Epatobiliopancreatica, Robotica e Mininvasiva, Fondazione Poliambulanza Istituto Ospedaliero, Via Bissolati, 57, 25124 Brescia, Italy;
| | - Adrian Park
- Anne Arundel Medical Center, Johns Hopkins University, Annapolis, MD 21401, USA;
| | - Eyad Elyan
- School of Computing, Robert Gordon University, Aberdeen AB10 7JG, UK;
| |
|
25
|
Azam MA, Sampieri C, Ioppi A, Benzi P, Giordano GG, De Vecchi M, Campagnari V, Li S, Guastini L, Paderno A, Moccia S, Piazza C, Mattos LS, Peretti G. Videomics of the Upper Aero-Digestive Tract Cancer: Deep Learning Applied to White Light and Narrow Band Imaging for Automatic Segmentation of Endoscopic Images. Front Oncol 2022; 12:900451. [PMID: 35719939 PMCID: PMC9198427 DOI: 10.3389/fonc.2022.900451] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2022] [Accepted: 04/26/2022] [Indexed: 12/13/2022] Open
Abstract
Introduction Narrow Band Imaging (NBI) is an endoscopic visualization technique useful for upper aero-digestive tract (UADT) cancer detection and margin evaluation. However, NBI analysis is strongly operator-dependent and requires high expertise, thus limiting its wider implementation. Recently, artificial intelligence (AI) has demonstrated potential for applications in UADT videoendoscopy. Among AI methods, deep learning (DL) algorithms, and especially convolutional neural networks (CNNs), are particularly suitable for delineating cancers on videoendoscopy. This study aimed to develop a CNN for automatic semantic segmentation of UADT cancer in endoscopic images. Materials and Methods A dataset of white light and NBI videoframes of laryngeal squamous cell carcinoma (LSCC) was collected and manually annotated. A novel DL segmentation model (SegMENT) was designed. SegMENT relies on the DeepLabV3+ CNN architecture, modified to use Xception as a backbone and to incorporate ensemble features from other CNNs. The performance of SegMENT was compared to state-of-the-art CNNs (UNet, ResUNet, and DeepLabv3). SegMENT was then validated on two external datasets of NBI images of oropharyngeal (OPSCC) and oral cavity SCC (OCSCC) obtained from a previously published study. The impact of in-domain transfer learning through an ensemble technique was evaluated on the external datasets. Results 219 LSCC patients were retrospectively included in the study. A total of 683 videoframes composed the LSCC dataset, while the external validation cohorts of OPSCC and OCSCC contained 116 and 102 images, respectively. On the LSCC dataset, SegMENT outperformed the other DL models, obtaining the following median values: 0.68 intersection over union (IoU), 0.81 dice similarity coefficient (DSC), 0.95 recall, 0.78 precision, and 0.97 accuracy. On the OCSCC and OPSCC datasets, results were superior to previously published data, with the median performance metrics improved as follows: DSC by 10.3% and 11.9%, recall by 15.0% and 5.1%, precision by 17.0% and 14.7%, and accuracy by 4.1% and 10.3%, respectively. Conclusion SegMENT achieved promising performance, showing that automatic tumor segmentation in endoscopic images is feasible even within the highly heterogeneous and complex UADT environment. SegMENT outperformed the previously published results on the external validation cohorts. The model demonstrated potential for improved detection of early tumors, more precise biopsies, and better selection of resection margins.
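For reference, the two headline metrics reported for SegMENT (IoU and DSC) can be computed from binary masks as follows; this is a generic implementation, not the authors' evaluation code.

```python
import numpy as np

def iou_and_dice(pred, gt, eps=1e-7):
    """Intersection over Union and Dice similarity coefficient for
    binary segmentation masks of identical shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / (union + eps)
    dice = 2 * inter / (pred.sum() + gt.sum() + eps)
    return iou, dice

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(iou_and_dice(pred, gt))  # -> (~0.5, ~0.667)
```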
Affiliation(s)
- Muhammad Adeel Azam
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
| | - Claudio Sampieri
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Alessandro Ioppi
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Pietro Benzi
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Giorgio Gregory Giordano
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Marta De Vecchi
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Valentina Campagnari
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Shunlei Li
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
| | - Luca Guastini
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| | - Alberto Paderno
- Unit of Otorhinolaryngology - Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Brescia, Italy
| | - Sara Moccia
- The BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
| | - Cesare Piazza
- Unit of Otorhinolaryngology - Head and Neck Surgery, ASST Spedali Civili of Brescia, Brescia, Italy
- Department of Medical and Surgical Specialties, Radiological Sciences, and Public Health, University of Brescia, Brescia, Italy
| | - Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
| | - Giorgio Peretti
- Unit of Otorhinolaryngology - Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
- Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
| |
|
26
|
Seidlitz S, Sellner J, Odenthal J, Özdemir B, Studier-Fischer A, Knödler S, Ayala L, Adler TJ, Kenngott HG, Tizabi M, Wagner M, Nickel F, Müller-Stich BP, Maier-Hein L. Robust deep learning-based semantic organ segmentation in hyperspectral images. Med Image Anal 2022; 80:102488. [DOI: 10.1016/j.media.2022.102488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2021] [Revised: 03/28/2022] [Accepted: 05/20/2022] [Indexed: 12/15/2022]
|
27
|
Wang YP, Jheng YC, Sung KY, Lin HE, Hsin IF, Chen PH, Chu YC, Lu D, Wang YJ, Hou MC, Lee FY, Lu CL. Use of U-Net Convolutional Neural Networks for Automated Segmentation of Fecal Material for Objective Evaluation of Bowel Preparation Quality in Colonoscopy. Diagnostics (Basel) 2022; 12:diagnostics12030613. [PMID: 35328166 PMCID: PMC8947406 DOI: 10.3390/diagnostics12030613] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 02/22/2022] [Accepted: 02/24/2022] [Indexed: 12/29/2022] Open
Abstract
Background: Adequate bowel cleansing is important in evaluating colonoscopy performance. Current bowel cleansing evaluation scales are subjective, with wide variation in consistency among physicians and low reported rates of accuracy. We aimed to use machine learning to develop a fully automatic segmentation method for the objective evaluation of the adequacy of colon preparation. Methods: Colonoscopy videos were retrieved from a video data cohort and converted to qualified images, which were randomly divided into training, validation, and verification datasets. The fecal residue was manually segmented. A deep learning model based on the U-Net convolutional network architecture was developed to perform automatic segmentation. The performance of the automatic segmentation was evaluated by its overlap with the manual segmentation. Results: A total of 10,118 qualified images from 119 videos were obtained. The model took an average of 0.3634 s to segment one image automatically. The model's segmentations overlapped strongly with the manual ones: 94.7% ± 0.67% of the manually segmented area was predicted by our AI model, and the predicted area correlated well with the area measured manually (r = 0.915, p < 0.001). The AI system can be applied in real time, both qualitatively and quantitatively. Conclusions: We established a fully automatic segmentation method to rapidly and accurately mark the fecal residue-coated mucosa for the objective evaluation of colon preparation.
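A small sketch of the paper's evaluation logic: overlap with the manual segmentation, and correlation of predicted versus manually measured areas. The example arrays are synthetic, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

def overlap_fraction(pred, manual):
    """Fraction of the manually segmented fecal-residue area that the
    model also predicts (an approximation of the overlap criterion)."""
    pred, manual = pred.astype(bool), manual.astype(bool)
    return np.logical_and(pred, manual).sum() / max(manual.sum(), 1)

# Correlating predicted vs. manually measured areas across images (synthetic)
pred_areas = np.array([1200, 430, 980, 0, 2210], dtype=float)
manual_areas = np.array([1180, 500, 1010, 15, 2150], dtype=float)
r, p = pearsonr(pred_areas, manual_areas)
print(f"r = {r:.3f}, p = {p:.3g}")
```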
Affiliation(s)
- Yen-Po Wang
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Institute of Brain Science, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
| | - Ying-Chun Jheng
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
- Department of Medical Research, Taipei Veterans General Hospital, Taipei 112, Taiwan
| | - Kuang-Yi Sung
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
| | - Hung-En Lin
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
| | - I-Fang Hsin
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
| | - Ping-Hsien Chen
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
| | - Yuan-Chia Chu
- Information Management Office, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Big Data Center, Taipei Veterans General Hospital, Taipei 112, Taiwan
- Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112, Taiwan
| | - David Lu
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
| | - Yuan-Jen Wang
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
- Healthcare and Management Center, Taipei Veterans General Hospital, Taipei 112, Taiwan
| | - Ming-Chih Hou
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
| | - Fa-Yauh Lee
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
| | - Ching-Liang Lu
- Endoscopy Center for Diagnosis and Treatment, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan; (Y.-P.W.); (Y.-C.J.); (K.-Y.S.); (H.-E.L.); (I.-F.H.); (P.-H.C.); (D.L.); (M.-C.H.)
- Division of Gastroenterology, Department of Medicine, Taipei Veterans General Hospital, Taipei 112, Taiwan;
- Institute of Brain Science, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan
- Faculty of Medicine, National Yang Ming Chiao Tung University School of Medicine, Taipei 112, Taiwan;
- Correspondence: ; Tel.: +886-2-2875-7272
| |
|
28
|
Kim MS, Cha JH, Lee S, Han L, Park W, Ahn JS, Park SC. Deep-Learning-Based Cerebral Artery Semantic Segmentation in Neurosurgical Operating Microscope Vision Using Indocyanine Green Fluorescence Videoangiography. Front Neurorobot 2022; 15:735177. [PMID: 35095454 PMCID: PMC8790180 DOI: 10.3389/fnbot.2021.735177] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2021] [Accepted: 11/23/2021] [Indexed: 11/18/2022] Open
Abstract
There have been few studies of anatomical structure segmentation using deep learning; the numbers of training and ground-truth images used have been small, and the reported accuracies were low or inconsistent. Surgical video anatomy analysis faces various obstacles, including a fast-changing view, large deformations, occlusions, low illumination, and inadequate focus. In addition, it is difficult and costly to obtain a large and accurate dataset of anatomical structures, including arteries, from operative video. In this study, we investigated cerebral artery segmentation using an automatic ground-truth generation method. Indocyanine green (ICG) fluorescence intraoperative cerebral videoangiography was used to create a ground-truth dataset, mainly of cerebral arteries and partly of cerebral blood vessels including veins. Four different neural network models were trained on the dataset and compared. Before augmentation, 35,975 training images and 11,266 validation images were used; after augmentation, 260,499 training and 90,129 validation images were used. A Dice score of 79% for cerebral artery segmentation was achieved using the DeepLabv3+ model trained on the automatically generated dataset. Strict validation was conducted in different patient groups. Arteries were also discerned from veins using the phase of the ICG videoangiography. We achieved fair accuracy, which demonstrates the appropriateness of the methodology. This study proved the feasibility of cerebral artery segmentation in the operative field of view using deep learning, and the effectiveness of automatic blood vessel ground-truth generation from ICG fluorescence videoangiography. Using this method, computer vision can discern blood vessels, and distinguish arteries from veins, in a neurosurgical microscope field of view; this technique is therefore essential for neurosurgical vessel anatomy-based navigation. In addition, surgical assistance and safety systems and autonomous neurosurgical robotics that detect or manipulate cerebral vessels will require computer vision that can identify blood vessels and arteries.
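As a simplified stand-in for the automatic ground-truth generation step, a fluorescence frame can be thresholded to a vessel mask, for example with Otsu's method in OpenCV; the paper's actual pipeline is likely more elaborate than this.

```python
import cv2
import numpy as np

def icg_vessel_mask(frame_gray, blur_ksize=5):
    """Derive a rough vessel ground-truth mask from an 8-bit grayscale
    ICG fluorescence frame: smooth, then Otsu-threshold the bright
    (fluorescing) vasculature. Illustrative only."""
    blurred = cv2.GaussianBlur(frame_gray, (blur_ksize, blur_ksize), 0)
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

frame = (np.random.rand(512, 512) * 255).astype(np.uint8)  # synthetic frame
mask = icg_vessel_mask(frame)
print(mask.dtype, int(mask.max()))  # uint8 255
```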
Affiliation(s)
- Min-seok Kim
- Clinical Research Team, Deepnoid, Seoul, South Korea
| | - Joon Hyuk Cha
- Department of Internal Medicine, Inha University Hospital, Incheon, South Korea
| | - Seonhwa Lee
- Department of Bio-convergence Engineering, Korea University, Seoul, South Korea
| | - Lihong Han
- Clinical Research Team, Deepnoid, Seoul, South Korea
- Department of Computer Science and Engineering, Soongsil University, Seoul, South Korea
| | - Wonhyoung Park
- Department of Neurosurgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
| | - Jae Sung Ahn
- Department of Neurosurgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
| | - Seong-Cheol Park
- Clinical Research Team, Deepnoid, Seoul, South Korea
- Department of Neurosurgery, Gangneung Asan Hospital, University of Ulsan College of Medicine, Gangneung, South Korea
- Department of Neurosurgery, Seoul Metropolitan Government—Seoul National University Boramae Medical Center, Seoul, South Korea
- Department of Neurosurgery, Hallym Hospital, Incheon, South Korea
- *Correspondence: Seong-Cheol Park
| |
|
29
|
Yang CH, Ren JH, Huang HC, Chuang LY, Chang PY. Deep Hybrid Convolutional Neural Network for Segmentation of Melanoma Skin Lesion. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:9409508. [PMID: 34790232 PMCID: PMC8592765 DOI: 10.1155/2021/9409508] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/24/2021] [Revised: 09/05/2021] [Accepted: 10/11/2021] [Indexed: 11/17/2022]
Abstract
Melanoma is a type of skin cancer that often has poor prognosis and survival rates. It usually develops on the limbs, including the fingers, palms, and nail margins. When melanoma is detected early, surgical treatment can achieve a higher cure rate. Early diagnosis of melanoma depends on the manual segmentation of suspected lesions; however, manual segmentation can lead to misclassification and low efficiency. It is therefore essential to devise a method for automatic image segmentation that overcomes these issues. In this study, an improved algorithm, termed EfficientUNet++, is proposed, developed from the U-Net model. In EfficientUNet++, a pretrained EfficientNet model is added to the UNet++ model to accelerate the segmentation process, leading to more reliable and precise results in skin cancer image segmentation. Two skin lesion datasets were used to compare the performance of the proposed EfficientUNet++ algorithm with other common models. On the PH2 dataset, EfficientUNet++ achieved a better Dice coefficient (93% vs. 76%-91%), Intersection over Union (IoU, 96% vs. 74%-95%), and loss value (30% vs. 44%-32%) than the other models. On the International Skin Imaging Collaboration dataset, EfficientUNet++ obtained a similar Dice coefficient (96% vs. 94%-96%) but a better IoU (94% vs. 89%-93%) and loss value (11% vs. 13%-11%) than the other models. In conclusion, the EfficientUNet++ model efficiently detects skin lesions by improving composite coefficients and structurally expanding the size of the convolutional network. Moreover, the use of residual units deepens the network, further improving performance.
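A close approximation of the EfficientUNet++ idea can be assembled from off-the-shelf parts, for example U-Net++ with an EfficientNet encoder via the segmentation_models_pytorch package; this pairing is an assumption on our part, and the authors' exact implementation may differ.

```python
# pip install segmentation-models-pytorch
import segmentation_models_pytorch as smp
import torch

# U-Net++ decoder with an ImageNet-pretrained EfficientNet-B0 encoder,
# one output channel for the binary lesion mask.
model = smp.UnetPlusPlus(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=1,
)
with torch.no_grad():
    out = model(torch.randn(1, 3, 256, 256))  # input sides must be multiples of 32
print(out.shape)  # torch.Size([1, 1, 256, 256])
```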
Affiliation(s)
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
- Program in Biomedical Engineering, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- School of Dentistry, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
| | - Jai-Hong Ren
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
| | - Hsiu-Chen Huang
- Department of Community Health Physical Medicine and Rehabilitation Physician, Chia-Yi Christian Hospital, Chia-Yi City 60002, Taiwan
| | - Li-Yeh Chuang
- Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung 84001, Taiwan
| | - Po-Yin Chang
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
| |
|
30
|
Mattos LS, Acemoglu A, Geraldes A, Laborai A, Schoob A, Tamadazte B, Davies B, Wacogne B, Pieralli C, Barbalata C, Caldwell DG, Kundrat D, Pardo D, Grant E, Mora F, Barresi G, Peretti G, Ortiz J, Rabenorosoa K, Tavernier L, Pazart L, Fichera L, Guastini L, Kahrs LA, Rakotondrabe M, Andreff N, Deshpande N, Gaiffe O, Renevier R, Moccia S, Lescano S, Ortmaier T, Penza V. μRALP and Beyond: Micro-Technologies and Systems for Robot-Assisted Endoscopic Laser Microsurgery. Front Robot AI 2021; 8:664655. [PMID: 34568434 PMCID: PMC8455830 DOI: 10.3389/frobt.2021.664655] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Accepted: 07/14/2021] [Indexed: 01/05/2023] Open
Abstract
Laser microsurgery is the current gold-standard surgical technique for the treatment of selected diseases in delicate organs such as the larynx. However, the operations require great surgical expertise and dexterity and face significant limitations imposed by available technology, such as the requirement for a direct line of sight to the surgical field, restricted access, and direct manual control of the surgical instruments. To change this status quo, the European project μRALP pioneered research towards a complete redesign of current laser microsurgery systems, focusing on the development of robotic micro-technologies to enable endoscopic operations. This has fostered awareness and interest in this field, which presents a unique set of needs, requirements and constraints, leading to research and technological developments beyond μRALP and its research consortium. This paper reviews the achievements and key contributions of such research, providing an overview of the current state of the art in robot-assisted endoscopic laser microsurgery. The primary target application considered is phonomicrosurgery, a representative use case involving highly challenging microsurgical techniques for the treatment of glottic diseases. The paper starts by presenting the motivations and rationale for endoscopic laser microsurgery, which lead to the introduction of robotics as an enabling technology for improved surgical field accessibility, visualization and management. Then, the research goals, achievements, and current state of the different technologies that can build up to an effective robotic system for endoscopic laser microsurgery are presented. This includes research in micro-robotic laser steering, flexible robotic endoscopes, augmented imaging, assistive surgeon-robot interfaces, and cognitive surgical systems. Innovations in each of these areas are shown to provide sizable progress towards more precise, safer and higher-quality endoscopic laser microsurgeries. Yet, the major impact is expected from the full integration of such individual contributions into a complete clinical surgical robotic system, as illustrated at the end of this paper with a description of preliminary cadaver trials conducted with the integrated μRALP system. Overall, the contribution of this paper lies in outlining the current state of the art and open challenges in the area of robot-assisted endoscopic laser microsurgery, which has important clinical applications even beyond laryngology.
Affiliation(s)
| | | | | | - Andrea Laborai
- Department of Otorhinolaryngology, Guglielmo da Saliceto Hospital, Piacenza, Italy
| | | | - Brahim Tamadazte
- Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, Paris, France
| | | | - Bruno Wacogne
- FEMTO-ST Institute, Univ. Bourgogne Franche-Comte, CNRS, Besançon, France
- Centre Hospitalier Régional Universitaire, Besançon, France
| | - Christian Pieralli
- FEMTO-ST Institute, Univ. Bourgogne Franche-Comte, CNRS, Besançon, France
| | - Corina Barbalata
- Mechanical and Industrial Engineering Department, Louisiana State University, Baton Rouge, LA, United States
| | | | | | - Diego Pardo
- Istituto Italiano di Tecnologia, Genoa, Italy
| | - Edward Grant
- Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, United States
| | - Francesco Mora
- Clinica Otorinolaringoiatrica, IRCCS Policlinico San Martino, Genoa, Italy
- Dipartimento di Scienze Chirurgiche e Diagnostiche Integrate, Università Degli Studi di Genova, Genoa, Italy
| | | | - Giorgio Peretti
- Clinica Otorinolaringoiatrica, IRCCS Policlinico San Martino, Genoa, Italy
- Dipartimento di Scienze Chirurgiche e Diagnostiche Integrate, Università Degli Studi di Genova, Genoa, Italy
| | - Jesùs Ortiz
- Istituto Italiano di Tecnologia, Genoa, Italy
| | - Kanty Rabenorosoa
- FEMTO-ST Institute, Univ. Bourgogne Franche-Comte, CNRS, Besançon, France
| | | | - Lionel Pazart
- Centre Hospitalier Régional Universitaire, Besançon, France
| | - Loris Fichera
- Department of Robotics Engineering, Worcester Polytechnic Institute, Worcester, MA, United States
| | - Luca Guastini
- Clinica Otorinolaringoiatrica, IRCCS Policlinico San Martino, Genoa, Italy
- Dipartimento di Scienze Chirurgiche e Diagnostiche Integrate, Università Degli Studi di Genova, Genoa, Italy
| | - Lüder A Kahrs
- Department of Mathematical and Computational Sciences, University of Toronto, Mississauga, ON, Canada
| | - Micky Rakotondrabe
- National School of Engineering in Tarbes, University of Toulouse, Tarbes, France
| | - Nicolas Andreff
- FEMTO-ST Institute, Univ. Bourgogne Franche-Comte, CNRS, Besançon, France
| | | | - Olivier Gaiffe
- Centre Hospitalier Régional Universitaire, Besançon, France
| | - Rupert Renevier
- FEMTO-ST Institute, Univ. Bourgogne Franche-Comte, CNRS, Besançon, France
| | - Sara Moccia
- The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy
| | - Sergio Lescano
- FEMTO-ST Institute, Univ. Bourgogne Franche-Comte, CNRS, Besançon, France
| | - Tobias Ortmaier
- Institute of Mechatronic Systems, Leibniz Universität Hannover, Garbsen, Germany
| | | |
|
31
|
Su YH, Jiang W, Chitrakar D, Huang K, Peng H, Hannaford B. Local Style Preservation in Improved GAN-Driven Synthetic Image Generation for Endoscopic Tool Segmentation. SENSORS (BASEL, SWITZERLAND) 2021; 21:5163. [PMID: 34372398 PMCID: PMC8346972 DOI: 10.3390/s21155163] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Revised: 07/25/2021] [Accepted: 07/27/2021] [Indexed: 12/19/2022]
Abstract
Accurate semantic image segmentation from medical imaging can enable intelligent vision-based assistance in robot-assisted minimally invasive surgery. The human body and surgical procedures are highly dynamic, and while machine vision presents a promising approach, sufficiently large training image sets for robust performance are either costly or unavailable. This work examines three novel generative adversarial network (GAN) methods for producing usable synthetic tool images using only surgical background images and a few real tool images. The best of these three approaches generates realistic tool textures while preserving local background content by incorporating both a style-preservation and a content-loss component into the proposed multi-level loss function. The approach is quantitatively evaluated, and the results suggest that the synthetically generated training tool images enhance UNet tool segmentation performance. More specifically, on a random set of 100 cadaver and live endoscopic images from the University of Washington Sinus Dataset, the UNet trained with synthetically generated images using the presented method yielded 35.7% and 30.6% improvements over using purely real images in mean Dice coefficient and Intersection over Union scores, respectively. These results support the use of more widely available, routine screening endoscopy to preoperatively generate synthetic training tool images for intraoperative UNet tool segmentation.
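The style-preservation-plus-content-loss idea can be sketched as follows, with a Gram-matrix style term and a feature-space content term; the weighting and the choice of feature extractor (e.g., a fixed VGG) are assumptions, not the paper's exact multi-level loss.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    """Gram matrix of a feature map (B, C, H, W), the standard way to
    compare texture/style between feature activations."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_content_loss(feat_fake, feat_style, feat_content,
                       w_style=1.0, w_content=1.0):
    """Combined loss in the spirit of the paper's objective: match the
    real tool texture (style) while preserving the local surgical
    background (content). Feature maps are assumed to come from a
    fixed, pretrained CNN applied to fake, style, and content images."""
    loss_style = F.mse_loss(gram(feat_fake), gram(feat_style))
    loss_content = F.mse_loss(feat_fake, feat_content)
    return w_style * loss_style + w_content * loss_content

# Toy shapes: batch of 2, 64-channel feature maps
fake, style, content = (torch.randn(2, 64, 32, 32) for _ in range(3))
print(style_content_loss(fake, style, content).item())
```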
Affiliation(s)
- Yun-Hsuan Su
- Department of Computer Science, Mount Holyoke College, 50 College Street, South Hadley, MA 01075, USA;
| | - Wenfan Jiang
- Department of Computer Science, Mount Holyoke College, 50 College Street, South Hadley, MA 01075, USA;
| | - Digesh Chitrakar
- Department of Engineering, Trinity College, 300 Summit St., Hartford, CT 06106, USA; (D.C.); (K.H.)
| | - Kevin Huang
- Department of Engineering, Trinity College, 300 Summit St., Hartford, CT 06106, USA; (D.C.); (K.H.)
| | - Haonan Peng
- Department of Electrical and Computer Engineering, University of Washington, 185 Stevens Way, Paul Allen Center, Seattle, WA 98105, USA; (H.P.); (B.H.)
| | - Blake Hannaford
- Department of Electrical and Computer Engineering, University of Washington, 185 Stevens Way, Paul Allen Center, Seattle, WA 98105, USA; (H.P.); (B.H.)
| |
|
32
|
Using deep learning to identify the recurrent laryngeal nerve during thyroidectomy. Sci Rep 2021; 11:14306. [PMID: 34253767 PMCID: PMC8275665 DOI: 10.1038/s41598-021-93202-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2021] [Accepted: 06/22/2021] [Indexed: 11/16/2022] Open
Abstract
Surgeons must visually distinguish soft-tissues, such as nerves, from surrounding anatomy to prevent complications and optimize patient outcomes. An accurate nerve segmentation and analysis tool could provide useful insight for surgical decision-making. Here, we present an end-to-end, automatic deep learning computer vision algorithm to segment and measure nerves. Unlike traditional medical imaging, our unconstrained setup with accessible handheld digital cameras, along with the unstructured open surgery scene, makes this task uniquely challenging. We investigate one common procedure, thyroidectomy, during which surgeons must avoid damaging the recurrent laryngeal nerve (RLN), which is responsible for human speech. We evaluate our segmentation algorithm on a diverse dataset across varied and challenging settings of operating room image capture, and show strong segmentation performance in the optimal image capture condition. This work lays the foundation for future research in real-time tissue discrimination and integration of accessible, intelligent tools into open surgery to provide actionable insights.
|
33
|
Kist AM, Dürr S, Schützenberger A, Döllinger M. OpenHSV: an open platform for laryngeal high-speed videoendoscopy. Sci Rep 2021; 11:13760. [PMID: 34215788 PMCID: PMC8253769 DOI: 10.1038/s41598-021-93149-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Accepted: 06/03/2021] [Indexed: 11/22/2022] Open
Abstract
High-speed videoendoscopy is an important tool for studying laryngeal dynamics, quantifying vocal fold oscillations, diagnosing voice impairments at the laryngeal level, and monitoring treatment progress. However, there is a significant lack of an open-source, expandable research tool that features the latest hardware and data analysis. In this work, we propose an open research platform, termed OpenHSV, that is based on state-of-the-art, commercially available equipment and features a fully automatic data analysis pipeline. A publicly available, user-friendly graphical user interface implemented in Python is used to interface with the hardware. Video and audio data are recorded in synchrony and subsequently analyzed fully automatically. Video segmentation of the glottal area is performed using efficient deep neural networks to derive the glottal area waveform and glottal midline. Established quantitative, clinically relevant video and audio parameters were implemented and computed. In a preliminary clinical study, we recorded video and audio data from 28 healthy subjects. Analyzing these data in terms of image quality and the derived quantitative parameters, we show the applicability, performance and usefulness of OpenHSV. OpenHSV therefore provides valid, standardized access to high-speed videoendoscopy data acquisition and analysis for voice scientists, highlighting its value as a research tool for understanding voice physiology. We envision that OpenHSV will serve as the basis for the next generation of clinical HSV systems.
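Two of the core derived quantities, the glottal area waveform (GAW) and its dominant oscillation frequency, can be computed from per-frame glottis masks as in this minimal sketch; the masks here are synthetic, and this is not OpenHSV's implementation.

```python
import numpy as np

def glottal_area_waveform(masks):
    """Glottal area waveform (GAW): segmented glottal area per frame."""
    return masks.reshape(masks.shape[0], -1).sum(axis=1).astype(float)

def estimate_f0(gaw, fps):
    """Dominant oscillation frequency of the GAW via the FFT."""
    gaw = gaw - gaw.mean()
    spectrum = np.abs(np.fft.rfft(gaw))
    freqs = np.fft.rfftfreq(len(gaw), d=1.0 / fps)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

# Synthetic 4000 fps recording: 120 Hz oscillation, 0.1 s of footage
fps, f0 = 4000, 120.0
t = np.arange(400) / fps
masks = (np.sin(2 * np.pi * f0 * t)[:, None, None] > 0).astype(np.uint8)
masks = np.repeat(np.repeat(masks, 8, axis=1), 8, axis=2)  # 8x8 px "glottis"
print(estimate_f0(glottal_area_waveform(masks), fps))      # ~120.0 Hz
```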
Affiliation(s)
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Henkestr. 91, 91054, Erlangen, Germany
| | - Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
| |
|
34
|
Kist AM, Gómez P, Dubrovskiy D, Schlegel P, Kunduk M, Echternach M, Patel R, Semmler M, Bohr C, Dürr S, Schützenberger A, Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:1889-1903. [PMID: 34000199 DOI: 10.1044/2021_jslhr-20-00498] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Purpose High-speed videoendoscopy (HSV) is an emerging, but rarely used, endoscopy technique in the clinic for assessing and diagnosing voice disorders, owing to the lack of dedicated software to analyze the data. HSV allows vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software, Glottis Analysis Tools (GAT). GAT offers a general threshold-based region-growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for use by untrained personnel. GAT further evaluates video and audio data in parallel and can extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool for processing HSV and audio data to determine quantitative, clinically relevant parameters for the research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533.
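As one illustrative example of a GAW-derived parameter of the kind a tool like GAT reports, a simple open quotient can be computed as the fraction of frames with nonzero glottal area; GAT's actual 79 parameters are far richer than this sketch.

```python
import numpy as np

def open_quotient(gaw, thresh=0.0):
    """Fraction of frames in which the glottis is open, computed from a
    glottal area waveform (illustrative parameter, not GAT's code)."""
    gaw = np.asarray(gaw, dtype=float)
    return float(np.mean(gaw > thresh))

print(open_quotient([0, 0, 12, 30, 18, 0, 0, 10]))  # -> 0.5
```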
Affiliation(s)
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| | - Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| | - Denis Dubrovskiy
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| | - Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| | - Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge
| | - Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Germany
| | - Rita Patel
- Department of Speech, Language and Hearing Sciences, College of Arts and Sciences, Indiana University, Bloomington
| | - Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| | - Christopher Bohr
- Klinik und Poliklinik für Hals-Nasen-Ohren-Heilkunde Universitätsklinikum Regensburg, Germany
| | - Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
| |
|
35
|
Cho WK, Lee YJ, Joo HA, Jeong IS, Choi Y, Nam SY, Kim SY, Choi SH. Diagnostic Accuracies of Laryngeal Diseases Using a Convolutional Neural Network-Based Image Classification System. Laryngoscope 2021; 131:2558-2566. [PMID: 34000069 DOI: 10.1002/lary.29595] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Revised: 04/17/2021] [Accepted: 04/21/2021] [Indexed: 12/21/2022]
Abstract
OBJECTIVES/HYPOTHESIS There may be interobserver variation in the diagnosis of laryngeal disease based on laryngoscopic images, depending on clinical experience. This study therefore aimed to perform computer-assisted diagnosis of common laryngeal diseases using deep learning-based disease classification models. STUDY DESIGN Experimental study with retrospective data. METHODS A total of 4106 images (cysts, nodules, polyps, leukoplakia, papillomas, Reinke's edema, granulomas, palsies, and normal cases) were analyzed. After equal distribution of the diseases into nine folds, stratified eightfold cross-validation was performed for training and validation, and the remaining fold was used as the test dataset. The trained models were applied to the test sets, and model performance was assessed in terms of precision (positive predictive value), recall (sensitivity), accuracy, F1 score, the precision-recall (PR) curve, and the area under the PR curve (PR-AUC). Outcomes were compared to visual assessments by four trainees. RESULTS The trained deep neural networks (DNNs) outperformed the trainees' visual assessments in discriminating cysts, granulomas, nodules, normal cases, palsies, papillomas, and polyps according to the PR-AUC and F1 score. The lowest F1 scores and PR-AUCs of the DNNs were estimated for Reinke's edema (0.720, 0.800) and nodules (0.730, 0.780) but were comparable to the mean F1 scores of the two best-performing trainees (0.765 and 0.675, respectively). In discriminating papillomas, the F1 score was much higher for the DNNs (0.870) than for the trainees (0.685). Overall, the DNNs outperformed all trainees (micro-average PR-AUC = 0.95; macro-average PR-AUC = 0.91). CONCLUSIONS DNN technology could be applied to laryngoscopy to supplement examiners' clinical assessment by providing additional diagnostic clues and serving as a diagnostic reference. LEVEL OF EVIDENCE 3 Laryngoscope, 2021.
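The reported metrics (precision, recall, F1 score, and PR-AUC) are all standard and reproducible with common tooling. The following is a self-contained scikit-learn sketch using randomly generated placeholder predictions, since the study's data and model are not public; average precision is used as the usual computational stand-in for the area under the PR curve.

```python
import numpy as np
from sklearn.metrics import (precision_recall_fscore_support,
                             average_precision_score)

# y_true: integer class labels; y_score: per-class probabilities from a
# trained classifier (placeholder values for illustration only).
rng = np.random.default_rng(0)
n_classes = 9                                   # nine laryngeal categories
y_true = rng.integers(0, n_classes, size=500)
y_score = rng.dirichlet(np.ones(n_classes), size=500)
y_pred = y_score.argmax(axis=1)

# Per-class precision, recall, and F1 score.
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred)

# PR-AUC (average precision) per class via one-vs-rest binarization,
# plus micro and macro averages over all classes.
y_onehot = np.eye(n_classes)[y_true]
per_class_ap = average_precision_score(y_onehot, y_score, average=None)
micro_ap = average_precision_score(y_onehot, y_score, average="micro")
macro_ap = average_precision_score(y_onehot, y_score, average="macro")
```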
Collapse
Affiliation(s)
- Won Ki Cho
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Yeong Ju Lee
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Hye Ah Joo
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - In Seong Jeong
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Yeonjoo Choi
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Soon Yuhl Nam
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Sang Yoon Kim
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| | - Seung-Ho Choi
- Department of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
| |
Collapse
|
36
|
Paderno A, Piazza C, Del Bon F, Lancini D, Tanagli S, Deganello A, Peretti G, De Momi E, Patrini I, Ruperti M, Mattos LS, Moccia S. Deep Learning for Automatic Segmentation of Oral and Oropharyngeal Cancer Using Narrow Band Imaging: Preliminary Experience in a Clinical Perspective. Front Oncol 2021; 11:626602. [PMID: 33842330 PMCID: PMC8024583 DOI: 10.3389/fonc.2021.626602] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Accepted: 03/08/2021] [Indexed: 01/22/2023] Open
Abstract
Introduction Fully convolutional neural networks (FCNNs) applied to video analysis are of particular interest in the field of head and neck oncology, given that endoscopic examination is a crucial step in the diagnosis, staging, and follow-up of patients affected by upper aerodigestive tract cancers. The aim of this study was to test FCNN-based methods for semantic segmentation of squamous cell carcinoma (SCC) of the oral cavity (OC) and oropharynx (OP). Materials and Methods Two datasets were retrieved from the institutional registry of a tertiary academic hospital, analyzing 34 and 45 NBI endoscopic videos of OC and OP lesions, respectively. The OC dataset comprised 110 frames and the OP dataset 116 frames. Three FCNNs (U-Net, U-Net 3, and ResNet) were investigated for segmenting the neoplastic images. FCNN performance was evaluated for each tested network and compared to the gold standard, namely manual annotation by expert clinicians. Results In FCNN-based segmentation of the OC dataset, the best results in terms of the Dice Similarity Coefficient (Dsc) were achieved by ResNet with 5(×2) blocks and 16 filters, with a median value of 0.6559. In FCNN-based segmentation of the OP dataset, the best results in terms of Dsc were achieved by ResNet with 4(×2) blocks and 16 filters, with a median value of 0.7603. All tested FCNNs showed very high variance, leading to very low minimum values for all evaluated metrics. Conclusions FCNNs have promising potential in the analysis and segmentation of OC and OP video-endoscopic images. All tested FCNN architectures demonstrated satisfactory outcomes in terms of diagnostic accuracy. The inference times of the networks were particularly short, ranging between 14 and 115 ms, showing the potential for real-time application.
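The Dice Similarity Coefficient used to score these segmentations has a one-line definition: twice the overlap between prediction and ground truth, divided by the total size of both masks. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks:
    Dsc = 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)
```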
Collapse
Affiliation(s)
- Alberto Paderno
- Department of Otorhinolaryngology-Head and Neck Surgery, ASST-Spedali Civili of Brescia, University of Brescia, Brescia, Italy
| | - Cesare Piazza
- Department of Otorhinolaryngology-Head and Neck Surgery, ASST-Spedali Civili of Brescia, University of Brescia, Brescia, Italy
| | - Francesca Del Bon
- Department of Otorhinolaryngology-Head and Neck Surgery, ASST-Spedali Civili of Brescia, University of Brescia, Brescia, Italy
| | - Davide Lancini
- Department of Otorhinolaryngology-Head and Neck Surgery, ASST-Spedali Civili of Brescia, University of Brescia, Brescia, Italy
| | - Stefano Tanagli
- Department of Otorhinolaryngology-Head and Neck Surgery, ASST-Spedali Civili of Brescia, University of Brescia, Brescia, Italy
| | - Alberto Deganello
- Department of Otorhinolaryngology-Head and Neck Surgery, ASST-Spedali Civili of Brescia, University of Brescia, Brescia, Italy
| | - Giorgio Peretti
- Department of Otorhinolaryngology-Head and Neck Surgery, IRCCS San Martino Hospital, University of Genoa, Genoa, Italy
| | - Elena De Momi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
| | - Ilaria Patrini
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
| | - Michela Ruperti
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
| | - Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
| | - Sara Moccia
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy.,The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy.,Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
| |
Collapse
|
37
|
Abstract
PURPOSE OF REVIEW Machine learning (ML) algorithms have augmented human judgment in various fields of clinical medicine. However, little progress has been made in applying these tools to video-endoscopy. We reviewed the field of video analysis (herein termed 'Videomics' for the first time) as applied to diagnostic endoscopy, assessing its preliminary findings, potential, and limitations, and considering future developments. RECENT FINDINGS ML has been applied to diagnostic endoscopy with different aims: blind-spot detection, automatic quality control, lesion detection, classification, and characterization. The early experience in gastrointestinal endoscopy has recently been expanded to the upper aerodigestive tract, demonstrating promising results in both clinical fields. Throughout the aerodigestive tract, multispectral imaging (such as Narrow Band Imaging) appeared to provide significant additional information drawn from endoscopic images. SUMMARY Videomics is an emerging discipline that has the potential to significantly improve human detection and characterization of clinically significant lesions during endoscopy across medical and surgical disciplines. Research teams should focus on the standardization of data collection, the identification of common targets, and optimal reporting. With such a collaborative stepwise approach, Videomics is likely to soon augment clinical endoscopy, significantly impacting cancer patient outcomes.
Collapse
|
38
|
Jansen MJA, Kuijf HJ, Dhara AK, Weaver NA, Jan Biessels G, Strand R, Pluim JPW. Patient-specific fine-tuning of convolutional neural networks for follow-up lesion quantification. J Med Imaging (Bellingham) 2020; 7:064003. [PMID: 33344673 PMCID: PMC7744252 DOI: 10.1117/1.jmi.7.6.064003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2020] [Accepted: 11/16/2020] [Indexed: 11/17/2022] Open
Abstract
Purpose: Convolutional neural network (CNN) methods have been proposed to quantify lesions in medical imaging. Commonly, more than one imaging examination is available for a patient, but the serial information in these images often remains unused. CNN-based methods have the potential to extract valuable information from previously acquired imaging to better quantify lesions on current imaging of the same patient. Approach: A pretrained CNN can be updated with a patient’s previously acquired imaging: patient-specific fine-tuning (FT). In this work, we studied the improvement in performance of lesion quantification methods on magnetic resonance images after FT compared to a pretrained base CNN. We applied the method to two different approaches: the detection of liver metastases and the segmentation of brain white matter hyperintensities (WMH). Results: The patient-specific fine-tuned CNN has a better performance than the base CNN. For the liver metastases, the median true positive rate increases from 0.67 to 0.85. For the WMH segmentation, the mean Dice similarity coefficient increases from 0.82 to 0.87. Conclusions: We showed that patient-specific FT has the potential to improve the lesion quantification performance of general CNNs by exploiting a patient’s previously acquired imaging.
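The patient-specific fine-tuning recipe described above can be summarized in a few lines: copy the pretrained base CNN and briefly continue training on the patient's previously acquired imaging before quantifying the follow-up scan. Below is a minimal PyTorch sketch of that general recipe, not the authors' code; the loss function, optimizer, learning rate, and epoch count are illustrative assumptions.

```python
import copy
import torch
from torch import nn, optim
from torch.utils.data import DataLoader

def fine_tune_for_patient(base_model: nn.Module,
                          patient_prior_data: DataLoader,
                          epochs: int = 5,
                          lr: float = 1e-4) -> nn.Module:
    """Patient-specific fine-tuning (FT): copy the pretrained base CNN
    and update the copy briefly on the patient's earlier imaging, so it
    adapts to that patient's anatomy and scanner characteristics."""
    model = copy.deepcopy(base_model)      # keep the base CNN intact
    model.train()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()       # e.g. voxel-wise segmentation loss
    for _ in range(epochs):
        for images, labels in patient_prior_data:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    model.eval()                           # ready for the follow-up scan
    return model
```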
Collapse
Affiliation(s)
- Mariëlle J A Jansen
- University Medical Center Utrecht and Utrecht University, Image Sciences Institute, Utrecht, The Netherlands
| | - Hugo J Kuijf
- University Medical Center Utrecht and Utrecht University, Image Sciences Institute, Utrecht, The Netherlands
| | - Ashis K Dhara
- Uppsala University, Center for Image Analysis, Department of Information Technology, Uppsala, Sweden
| | - Nick A Weaver
- University Medical Center Utrecht, Brain Center Rudolf Magnus, Department of Neurology, Utrecht, The Netherlands
| | - Geert Jan Biessels
- University Medical Center Utrecht, Brain Center Rudolf Magnus, Department of Neurology, Utrecht, The Netherlands
| | - Robin Strand
- Uppsala University, Center for Image Analysis, Department of Information Technology, Uppsala, Sweden
| | - Josien P W Pluim
- University Medical Center Utrecht and Utrecht University, Image Sciences Institute, Utrecht, The Netherlands
| |
Collapse
|
39
|
Abstract
A healthy voice is crucial for verbal communication and hence for daily as well as professional life. The basis of a healthy voice is the sound-producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal fold. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation, which is evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research owing to the amount of data and the complex, semi-automatic analysis involved. In this study, we provide a comprehensive evaluation of methods that fully automatically detect the glottal midline. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset with manual annotations, used both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision algorithms perform well at detecting the glottal midline in glottis segmentation data but are outperformed by deep neural networks on this task. We further propose GlottisNet, a multi-task neural architecture that simultaneously predicts both the opening between the vocal folds and the symmetry axis. By fully automating segmentation and midline detection, this is a major step towards the clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy.
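The multi-task design suggested here, one shared encoder feeding two output heads, is a common pattern. The sketch below illustrates the general idea in PyTorch under assumptions of mine (layer sizes, a midline encoded as two endpoints); it mirrors the concept behind GlottisNet, not its published architecture.

```python
import torch
from torch import nn

class MultiTaskGlottisNet(nn.Module):
    """Illustrative multi-task CNN: a shared encoder feeds (a) a decoder
    that segments the glottal opening and (b) a regression head that
    predicts the glottal midline as two endpoints (x1, y1, x2, y2)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(          # per-pixel glottis mask
            nn.Upsample(scale_factor=2),
            nn.Conv2d(32, 1, 1),
        )
        self.midline_head = nn.Sequential(      # midline endpoints
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 4),
        )

    def forward(self, x):
        features = self.encoder(x)
        return self.seg_head(features), self.midline_head(features)

# Both tasks are trained jointly, typically with a weighted sum of a
# segmentation loss and an endpoint regression loss.
model = MultiTaskGlottisNet()
mask_logits, midline = model(torch.randn(1, 1, 256, 256))
```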
Collapse
|
40
|
Tama BA, Kim DH, Kim G, Kim SW, Lee S. Recent Advances in the Application of Artificial Intelligence in Otorhinolaryngology-Head and Neck Surgery. Clin Exp Otorhinolaryngol 2020; 13:326-339. [PMID: 32631041 PMCID: PMC7669308 DOI: 10.21053/ceo.2020.00654] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2020] [Revised: 05/24/2020] [Accepted: 06/09/2020] [Indexed: 12/12/2022] Open
Abstract
This study presents an up-to-date survey of the use of artificial intelligence (AI) in the field of otorhinolaryngology, considering opportunities, research challenges, and research directions. We searched PubMed, the Cochrane Central Register of Controlled Trials, Embase, and the Web of Science. We initially retrieved 458 articles. The exclusion of non-English publications and duplicates yielded a total of 90 remaining studies. These 90 studies were divided into those analyzing medical images, voice, medical devices, and clinical diagnoses and treatments. Most studies (42.2%, 38/90) used AI for image-based analysis, followed by clinical diagnoses and treatments (24 studies). Each of the remaining two subcategories included 14 studies. Machine learning and deep learning have been extensively applied in the field of otorhinolaryngology. However, the performance of AI models varies and research challenges remain.
Collapse
Affiliation(s)
- Bayu Adhi Tama
- Department of Mechanical Engineering, Pohang University of Science and Technology, Pohang, Korea
| | - Do Hyun Kim
- Department of Otolaryngology-Head and Neck Surgery, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Gyuwon Kim
- Department of Mechanical Engineering, Pohang University of Science and Technology, Pohang, Korea
| | - Soo Whan Kim
- Department of Otolaryngology-Head and Neck Surgery, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
| | - Seungchul Lee
- Department of Mechanical Engineering, Pohang University of Science and Technology, Pohang, Korea
- Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, Korea
| |
Collapse
|
41
|
Cho WK, Choi SH. Comparison of Convolutional Neural Network Models for Determination of Vocal Fold Normality in Laryngoscopic Images. J Voice 2020; 36:590-598. [PMID: 32873430 DOI: 10.1016/j.jvoice.2020.08.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2020] [Revised: 08/04/2020] [Accepted: 08/04/2020] [Indexed: 01/02/2023]
Abstract
OBJECTIVES Deep learning using convolutional neural networks (CNNs) is widely used in medical imaging research. This study investigated whether vocal fold normality in laryngoscopic images can be determined by CNN-based deep learning, compared the accuracy of several CNN models, and explored the feasibility of applying deep learning to laryngoscopy. METHODS Laryngoscopy videos were screen-captured, and each image was cropped to include the abducted vocal fold region. A total of 2216 images (899 normal, 1317 abnormal) were allocated to training, validation, and test sets. Augmented training sets were used to train a custom six-layer CNN (CNN6) as well as VGG16, Inception V3, and Xception models. The trained models were applied to the test set; for each model, receiver operating characteristic curves and cutoff values were obtained. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated. The best model was applied to video streams, and localization of features was attempted using Grad-CAM. RESULTS All of the trained models showed a high area under the receiver operating characteristic curve, and the most discriminative cutoff levels for the probability of normality were determined to be 35.6%, 61.8%, 13.5%, and 39.7% for the CNN6, VGG16, Inception V3, and Xception models, respectively. The accuracies of the CNN models in classifying vocal folds in the test set as normal or abnormal were 82.3%, 99.7%, 99.1%, and 83.8%, respectively. CONCLUSION All four models showed acceptable diagnostic accuracy. The performance of VGG16 and Inception V3 was better than that of the simple CNN6 model and the more recently published Xception model. Real-time classification on a video stream with a combination of the VGG16 model, OpenCV, and Grad-CAM showed the potential clinical applications of the deep learning model in laryngoscopy.
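The models compared here are all standard image classifiers adapted to a binary normal-vs-abnormal output. As one example, the sketch below adapts an ImageNet-pretrained VGG16 with torchvision; freezing the convolutional features and the single-logit head are illustrative assumptions, not the study's exact configuration.

```python
import torch
from torch import nn
from torchvision import models

# Load an ImageNet-pretrained VGG16 and replace its classifier head
# with a single logit for normal-vs-abnormal vocal folds.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for param in model.features.parameters():
    param.requires_grad = False                 # keep pretrained features
model.classifier[6] = nn.Linear(4096, 1)        # binary output logit

logit = model(torch.randn(1, 3, 224, 224))      # one cropped laryngoscopy frame
prob_normal = torch.sigmoid(logit)              # threshold at a tuned cutoff
```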
Collapse
Affiliation(s)
- Won Ki Cho
- Departments of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea
| | - Seung-Ho Choi
- Departments of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.
| |
Collapse
|
42
|
Parker F, Brodsky MB, Akst LM, Ali H. Machine Learning in Laryngoscopy Analysis: A Proof of Concept Observational Study for the Identification of Post-Extubation Ulcerations and Granulomas. Ann Otol Rhinol Laryngol 2020; 130:286-291. [PMID: 32795159 DOI: 10.1177/0003489420950364] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
OBJECTIVE Computer-aided analysis of laryngoscopy images has the potential to add objectivity to subjective evaluations. Automated classification of biomedical images is extremely challenging due to the precision required and the limited amount of annotated data available for training. Convolutional neural networks (CNNs) have the potential to improve image analysis and have demonstrated good performance in many settings. This study applied machine-learning technologies to laryngoscopy to determine the accuracy of computer recognition of known laryngeal lesions found in patients post-extubation. METHODS This is a proof-of-concept study that used a convenience sample of transnasal, flexible, distal-chip laryngoscopy images from post-extubation patients in the intensive care unit. After manually annotating images at the pixel level, we applied a CNN-based method for the analysis of granulomas and ulcerations to test potential machine-learning approaches for laryngoscopy analysis. RESULTS A total of 127 images from 25 patients were manually annotated for the presence and shape of these lesions: 100 for training and 27 for evaluating the system. There were 193 ulcerations (148 in the training set; 45 in the evaluation set) and 272 granulomas (208 in the training set; 64 in the evaluation set) identified. The time to annotate each image was approximately 3 minutes. Machine-based analysis demonstrated per-pixel sensitivities of 82.0% and 62.8% for granulomas and ulcerations, respectively; specificities were 99.0% and 99.6%. CONCLUSION This work demonstrates the feasibility of machine learning via CNN-based methods to add objectivity to laryngoscopy analysis, suggesting that CNNs may aid in laryngoscopy analysis for other conditions in the future.
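Per-pixel sensitivity and specificity, the two figures reported above, fall straight out of the confusion counts between predicted and annotated masks. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def per_pixel_sensitivity_specificity(pred: np.ndarray, truth: np.ndarray):
    """Per-pixel sensitivity and specificity for one lesion class, given
    binary predicted and ground-truth masks of equal shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    sensitivity = tp / (tp + fn)     # recall over lesion pixels
    specificity = tn / (tn + fp)     # correctness over background pixels
    return sensitivity, specificity
```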
Collapse
Affiliation(s)
- Felix Parker
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Martin B Brodsky
- Department of Physical Medicine and Rehabilitation, Johns Hopkins University, Baltimore, MD, USA.,Division Pulmonary and Critical Care Medicine, Johns Hopkins University, Baltimore, MD, USA.,Outcomes After Critical Illness and Surgery (OACIS) Research Group, Johns Hopkins University, Baltimore, MD, USA
| | - Lee M Akst
- Department of Otolaryngology - Head and Neck Surgery, Johns Hopkins University, Baltimore, MD, USA
| | - Haider Ali
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| |
Collapse
|
43
|
Belagali V, Rao M V A, Gopikishore P, Krishnamurthy R, Ghosh PK. Two step convolutional neural network for automatic glottis localization and segmentation in stroboscopic videos. Biomed Opt Express 2020; 11:4695-4713. [PMID: 32923072 PMCID: PMC7449707 DOI: 10.1364/boe.396252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/28/2020] [Revised: 07/16/2020] [Accepted: 07/16/2020] [Indexed: 06/11/2023]
Abstract
Precise analysis of the vocal fold vibratory pattern in a stroboscopic video plays a key role in the evaluation of voice disorders. Automatic glottis segmentation is one of the preliminary steps in such analysis. In this work, it is divided into two subproblems, namely glottis localization and glottis segmentation. A two-step convolutional neural network (CNN) approach is proposed for automatic glottis segmentation. Data augmentation is carried out using two techniques: (1) blind rotation (WB) and (2) rotation with respect to the glottis orientation (WO). The dataset used in this study contains stroboscopic videos of 18 subjects with sulcus vocalis, in which the glottis region was annotated by three speech-language pathologists (SLPs). The proposed two-step CNN approach achieves an average localization accuracy of 90.08% and a mean Dice score of 0.65.
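The two-step decomposition described here, first locate the glottis and then segment within a crop around it, can be expressed as a short pipeline. The sketch below illustrates that flow in PyTorch under my own assumptions (a localizer that outputs an (x, y) center, a fixed crop size); it is not the paper's implementation.

```python
import torch
from torch import nn

def two_step_glottis_segmentation(frame: torch.Tensor,
                                  localizer: nn.Module,
                                  segmenter: nn.Module,
                                  crop_size: int = 128) -> torch.Tensor:
    """Two-step pipeline: a first CNN predicts the glottis center, the
    frame is cropped around it, and a second CNN segments the glottis
    inside the crop. Names and crop handling are illustrative."""
    # Step 1: localization - predict the (x, y) center of the glottis.
    cx, cy = localizer(frame.unsqueeze(0))[0].round().long()
    half = crop_size // 2
    # Clamp so the crop stays inside the frame boundaries.
    cx = cx.clamp(half, frame.shape[-1] - half)
    cy = cy.clamp(half, frame.shape[-2] - half)
    crop = frame[..., cy - half:cy + half, cx - half:cx + half]
    # Step 2: segmentation - per-pixel glottis mask within the crop.
    return torch.sigmoid(segmenter(crop.unsqueeze(0)))[0]
```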
Collapse
Affiliation(s)
- Varun Belagali
- Computer Science and Engineering, RV College of Engineering, Bangalore 560059, India
| | - Achuth Rao M V
- Electrical Engineering, Indian Institute of Science, Bangalore 560012, India
| | | | - Rahul Krishnamurthy
- Department of Audiology and Speech Language Pathology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Manipal, India
| | | |
Collapse
|
44
|
Schlegel P, Kniesburges S, Dürr S, Schützenberger A, Döllinger M. Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings. Sci Rep 2020; 10:10517. [PMID: 32601277 PMCID: PMC7324600 DOI: 10.1038/s41598-020-66405-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2020] [Accepted: 05/20/2020] [Indexed: 11/13/2022] Open
Abstract
In voice research and clinical assessment, many objective parameters are in use. However, there is no commonly used set of parameters that reflects certain voice disorders, such as functional dysphonia (FD), i.e., disorders with no visible anatomical changes. Hence, 358 high-speed videoendoscopy (HSV) recordings (159 normal females (NF), 101 FD females (FDF), 66 normal males (NM), 32 FD males (FDM)) were analyzed. We investigated 91 quantitative HSV parameters with respect to their significance. First, 25 highly correlated parameters were discarded. Second, a further 54 parameters were discarded using a LogitBoost decision-stump approach. This yielded a subset of 12 parameters sufficient to reflect functional dysphonia. These parameters separated the groups NF vs. FDF and NM vs. FDM with fair accuracies of 0.745 and 0.768, respectively. Parameters computed solely from the changing glottal area between the vocal folds (a 1D function called the glottal area waveform, GAW) were less important than parameters describing the oscillation characteristics along the vocal folds (a 2D function called the phonovibrogram). The regularity of GAW phases and peak shape, the harmonic structure, and the phonovibrogram-based vocal fold opening and closing angles were the most important. This study showed the high degree of redundancy among HSV voice parameters but also affirms the need for multidimensional assessment of clinical data.
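The two-stage parameter reduction reported here, a correlation filter followed by boosted decision stumps, is straightforward to approximate with common tooling. In the sketch below, scikit-learn's gradient boosting with depth-1 trees stands in for LogitBoost (which scikit-learn does not ship), and the threshold, ranking step, and function name are illustrative assumptions rather than the study's exact procedure.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def select_parameters(X: pd.DataFrame, y: np.ndarray,
                      corr_threshold: float = 0.9,
                      keep: int = 12) -> list:
    """Two-stage selection mirroring the described workflow: drop one of
    each highly correlated parameter pair, then rank the survivors with
    a boosted decision-stump model and keep the top few."""
    # Stage 1: correlation filter over the upper triangle.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    X_filtered = X.drop(columns=drop)
    # Stage 2: boosting with decision stumps (max_depth=1), then keep
    # the most important remaining parameters.
    booster = GradientBoostingClassifier(max_depth=1).fit(X_filtered, y)
    ranked = X_filtered.columns[np.argsort(booster.feature_importances_)[::-1]]
    return list(ranked[:keep])
```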
Collapse
Affiliation(s)
- Patrick Schlegel
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany.
| | - Stefan Kniesburges
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Stephan Dürr
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Anne Schützenberger
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Michael Döllinger
- Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| |
Collapse
|
45
|
Automated Surgical Instrument Detection from Laparoscopic Gastrectomy Video Images Using an Open Source Convolutional Neural Network Platform. J Am Coll Surg 2020; 230:725-732.e1. [DOI: 10.1016/j.jamcollsurg.2020.01.037] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2019] [Revised: 01/31/2020] [Accepted: 01/31/2020] [Indexed: 11/24/2022]
|
46
|
Fehling MK, Grosch F, Schuster ME, Schick B, Lohscheller J. Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network. PLoS One 2020; 15:e0227791. [PMID: 32040514 PMCID: PMC7010264 DOI: 10.1371/journal.pone.0227791] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2019] [Accepted: 12/25/2019] [Indexed: 01/22/2023] Open
Abstract
The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and quantitative analysis of laryngeal high-speed video (HSV). Quantification of the vocal fold vibration patterns requires, as a first step, the segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation step. In this work we propose, for the first time, a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. As performance measures, the Dice Coefficient (DC) as well as the precision of four anatomical landmark positions were used. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold, respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. It thus also allows the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
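The key architectural ingredient, an LSTM whose gates are convolutions so that the memory keeps its spatial layout, is compact enough to sketch. The cell below is a generic convolutional LSTM in PyTorch under assumptions of mine (channel counts, a 1x1 output head, three output classes); it illustrates the technique in general, not any of the authors' eighteen evaluated configurations.

```python
import torch
from torch import nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell: standard LSTM gating, but with a
    convolution instead of dense layers, so the hidden state and cell
    memory remain spatial feature maps."""
    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        # One convolution yields all four gate pre-activations at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel, padding=kernel // 2)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Unroll over a sequence of frames; a 1x1 convolution turns each hidden
# state into per-pixel logits for glottis, left, and right vocal fold.
cell = ConvLSTMCell(in_ch=1, hid_ch=16)
to_logits = nn.Conv2d(16, 3, kernel_size=1)
frames = torch.randn(100, 1, 1, 64, 64)        # (time, batch, ch, H, W)
h = torch.zeros(1, 16, 64, 64)
c = torch.zeros_like(h)
for x in frames:                               # x: (batch, ch, H, W)
    h, c = cell(x, h, c)
    logits = to_logits(h)                      # (batch, 3, H, W)
```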
Collapse
Affiliation(s)
- Mona Kirstin Fehling
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
| | - Fabian Grosch
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
| | - Maria Elke Schuster
- Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, München, Germany
| | - Bernhard Schick
- Department of Otorhinolaryngology, Saarland University Hospital, Homburg/Saar, Germany
| | - Jörg Lohscheller
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
| |
Collapse
|
47
|
Ye S, Nedzvedz A, Ye F, Ablameyko S. Segmentation and Feature Extraction of Endoscopic Images for Making Diagnosis of Acute Appendicitis. Pattern Recognit Image Anal 2019. [DOI: 10.1134/s1054661819040205] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
48
|
Turkmen HI, Karsligil ME. Advanced computing solutions for analysis of laryngeal disorders. Med Biol Eng Comput 2019; 57:2535-2552. [DOI: 10.1007/s11517-019-02031-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2018] [Accepted: 08/13/2019] [Indexed: 11/29/2022]
|