1
Alter IL, Chan K, Lechien J, Rameau A. An introduction to machine learning and generative artificial intelligence for otolaryngologists-head and neck surgeons: a narrative review. Eur Arch Otorhinolaryngol 2024; 281:2723-2731. [PMID: 38393353] [DOI: 10.1007/s00405-024-08512-4]
Abstract
PURPOSE Despite the robust expansion of research surrounding artificial intelligence (AI) and machine learning (ML) and their applications to medicine, these methodologies often remain opaque and inaccessible to many otolaryngologists. In particular, with the increasing ubiquity of large language models (LLMs) such as ChatGPT, and their potential implementation in clinical practice, clinicians may benefit from a baseline understanding of some aspects of AI. In this narrative review, we seek to clarify underlying concepts, illustrate applications to otolaryngology, and highlight future directions and limitations of these tools. METHODS Recent literature regarding AI principles and otolaryngologic applications of ML and LLMs was reviewed via searches in PubMed and Google Scholar. RESULTS Significant recent strides have been made in otolaryngology research utilizing AI and ML across all subspecialties, including neurotology, head and neck oncology, laryngology, rhinology, and sleep surgery. Potential applications suggested by recent publications include screening and diagnosis, predictive tools, clinical decision support, and clinical workflow improvement via LLMs. Ongoing concerns regarding AI in medicine include ethical concerns around bias and data sharing, as well as the "black box" problem and limitations in explainability. CONCLUSIONS Potential implementations of AI in otolaryngology are rapidly expanding. While implementation in clinical practice remains theoretical for most of these tools, their potential power to influence the practice of otolaryngology is substantial. LEVEL OF EVIDENCE: 4
Affiliation(s)
- Isaac L Alter
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, New York, NY, 10022, USA
- Karly Chan
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, New York, NY, 10022, USA
- Jérome Lechien
- Department of Otorhinolaryngology, Head and Neck Surgery, Hôpital Foch, School of Medicine, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France
- Department of Human Anatomy and Experimental Oncology, Faculty of Medicine, UMONS Research Institute for Health and Sciences Technology, University of Mons (UMons), Mons, Belgium
- Anaïs Rameau
- Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medical College, 240 E 59 St, New York, NY, 10022, USA
2
Bélisle-Pipon JC, Powell M, English R, Malo MF, Ravitsky V, Bensoussan Y. Stakeholder perspectives on ethical and trustworthy voice AI in health care. Digit Health 2024; 10:20552076241260407. [PMID: 39055787] [PMCID: PMC11271113] [DOI: 10.1177/20552076241260407]
Abstract
Objective Voice as a health biomarker using artificial intelligence (AI) is gaining momentum in research. The noninvasiveness of voice data collection through accessible technology (such as smartphones, telehealth, and ambient recordings) or within clinical contexts means voice AI may help address health disparities and promote the inclusion of marginalized communities. However, the development of AI-ready voice datasets free from bias and discrimination is a complex task. The objective of this study is to better understand the perspectives of engaged and interested stakeholders regarding ethical and trustworthy voice AI, to inform both further ethical inquiry and technology innovation. Methods A questionnaire was administered to voice AI experts, clinicians, scholars, patients, trainees, and policy-makers who participated in the 2023 Voice AI Symposium organized by the Bridge2AI-Voice AI Consortium. The survey used a mix of Likert-scale, ranking, and open-ended questions. A total of 27 stakeholders participated in the study. Results The main results of the study are the identification of priorities in terms of ethical issues, an initial definition of ethically sourced data for voice AI, insights into the use of synthetic voice data, and proposals for acting on the trustworthiness of voice AI. The study shows a diversity of perspectives and adds nuance to the planning and development of ethical and trustworthy voice AI. Conclusions This study represents the first stakeholder survey related to voice as a biomarker of health published to date. It sheds light on the critical importance of ethics and trustworthiness in the development of voice AI technologies for health applications.
Affiliation(s)
- Maria Powell
- Vanderbilt University Medical Center, Department of Otolaryngology-Head & Neck Surgery, Nashville, TN, USA
- Renee English
- Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
- Vardit Ravitsky
- Hastings Center, Garrison, NY, USA
- Department of Global Health and Social Medicine, Harvard University, Cambridge, MA, USA
- Yael Bensoussan
- Department of Otolaryngology-Head & Neck Surgery, University of South Florida, Tampa, FL, USA
3
Chato L, Regentova E. Survey of Transfer Learning Approaches in the Machine Learning of Digital Health Sensing Data. J Pers Med 2023; 13:1703. [PMID: 38138930] [PMCID: PMC10744730] [DOI: 10.3390/jpm13121703]
Abstract
Machine learning and digital health sensing data have led to numerous research achievements aimed at improving digital health technology. However, using machine learning in digital health poses challenges related to data availability, such as incomplete, unstructured, and fragmented data, as well as issues related to data privacy, security, and data format standardization. Furthermore, there is a risk of bias and discrimination in machine learning models. Thus, developing an accurate prediction model from scratch can be an expensive and complicated task that often requires extensive experiments and complex computations. Transfer learning methods have emerged as a feasible solution to address these issues by transferring knowledge from a previously trained task to develop high-performance prediction models for a new task. This survey paper provides a comprehensive study of the effectiveness of transfer learning for digital health applications to enhance the accuracy and efficiency of diagnoses and prognoses, as well as to improve healthcare services. The first part of this survey paper presents and discusses the most common digital health sensing technologies as valuable data resources for machine learning applications, including transfer learning. The second part discusses the meaning of transfer learning, clarifying the categories and types of knowledge transfer. It also explains transfer learning methods and strategies, and their role in addressing the challenges in developing accurate machine learning models, specifically on digital health sensing data. These methods include feature extraction, fine-tuning, domain adaptation, multitask learning, federated learning, and few-/single-/zero-shot learning. This survey paper highlights the key features of each transfer learning method and strategy, and discusses the limitations and challenges of using transfer learning for digital health applications. 
Overall, this paper is a comprehensive survey of transfer learning methods for digital health sensing data, which aims to inspire researchers to gain knowledge of transfer learning approaches and their applications in digital health, enhance current transfer learning approaches in digital health, develop new transfer learning strategies to overcome current limitations, and apply them to a variety of digital health technologies.
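The transfer-learning strategies the survey catalogs (feature extraction, fine-tuning, few-shot learning) share one core move: reuse a representation learned on a source task, freeze it, and train only a small task-specific head on the target data. A minimal, self-contained sketch of that move follows; the fixed random projection standing in for a pretrained backbone, and all data, are synthetic and purely illustrative, not any system from the surveyed literature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a FIXED, frozen feature extractor.
# In practice this would be, e.g., a CNN trained on a large source dataset;
# here a random projection plus tanh plays that role for illustration only.
W_frozen = rng.normal(size=(10, 16)) / np.sqrt(10)

def extract_features(x):
    """Frozen 'backbone': map raw 10-dim signals to 16-dim features."""
    return np.tanh(x @ W_frozen)

# Small synthetic target-domain dataset: label depends on the first channel.
n = 200
X = rng.normal(size=(n, 10))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(float)

# "Transfer": train only a lightweight logistic-regression head on the
# frozen features, instead of learning a full model from scratch.
F = extract_features(X)
w = np.zeros(F.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid predictions
    w -= 0.5 * (F.T @ (p - y) / n)           # gradient step on the head only
    b -= 0.5 * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(F @ w + b))) > 0.5).astype(float)
accuracy = np.mean(pred == y)
print(f"head-only training accuracy: {accuracy:.2f}")
```

Because only the head's 17 parameters are trained, this is far cheaper than fitting a full model, which is exactly the economy the survey attributes to transfer learning when target-domain health data are scarce.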
Affiliation(s)
- Lina Chato
- Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, NV 89154, USA
4
Tran BA, Dao TTP, Dung HDQ, Van NB, Ha CC, Pham NH, Nguyen TCHTNC, Nguyen TC, Pham MK, Tran MK, Tran TM, Tran MT. Support of deep learning to classify vocal fold images in flexible laryngoscopy. Am J Otolaryngol 2023; 44:103800. [PMID: 36905912] [DOI: 10.1016/j.amjoto.2023.103800]
Abstract
PURPOSE To collect a dataset with adequate laryngoscopy images and identify the appearance of vocal folds and their lesions in flexible laryngoscopy images using objective deep learning models. METHODS We trained several novel deep learning models to classify 4549 flexible laryngoscopy images as no vocal fold, normal vocal folds, or abnormal vocal folds, helping these models recognize vocal folds and their lesions within the images. We then compared the results of the state-of-the-art deep learning models against one another, and compared the computer-aided classification system against ENT doctors. RESULTS This study evaluated the performance of the deep learning models on laryngoscopy images collected from 876 patients. The Xception model performed more accurately and consistently than almost all the other models: its accuracy for no vocal fold, normal vocal folds, and vocal fold abnormalities was 98.90 %, 97.36 %, and 96.26 %, respectively. Compared with our ENT doctors, the Xception model produced better results than a junior doctor and approached the performance of an expert. CONCLUSION Our results show that current deep learning models can classify vocal fold images well and can effectively assist physicians in identifying vocal folds and classifying them as normal or abnormal.
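The per-category accuracies reported above (one figure each for the no-vocal-fold, normal, and abnormal classes) amount to per-class recall: the fraction of images of each true class that the model labels correctly. A small sketch of that computation on made-up labels (the class encoding, helper name, and example arrays are assumptions for illustration, not the study's data):

```python
import numpy as np

# Hypothetical integer encoding for the three classes described in the study.
CLASSES = ["no vocal fold", "normal vocal folds", "abnormal vocal folds"]

def per_class_accuracy(y_true, y_pred, n_classes=3):
    """Fraction of correctly classified samples within each true class
    (i.e., per-class recall), one value per class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.array([np.mean(y_pred[y_true == c] == c) for c in range(n_classes)])

# Tiny synthetic example (NOT the study's 4549-image dataset):
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 2, 1]
acc = per_class_accuracy(y_true, y_pred)
for name, a in zip(CLASSES, acc):
    print(f"{name}: {a:.2%}")
```

Reporting recall per class, rather than a single overall accuracy, matters here because the classes are unlikely to be balanced in a clinical image set.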
Affiliation(s)
- Bich Anh Tran
- Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Thao Thi Phuong Dao
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam; Department of Otolaryngology, Thong Nhat Hospital, Ho Chi Minh City, Viet Nam
- Ho Dang Quy Dung
- Department of Endoscopy, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Ngoc Boi Van
- Department of Otolaryngology, Vinmec Central Park International Hospital, Ho Chi Minh City, Viet Nam
- Chanh Cong Ha
- Department of Otolaryngology, 7A Military Hospital, Ho Chi Minh City, Viet Nam
- Nam Hoang Pham
- Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Tan-Cong Nguyen
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; University of Social Sciences and Humanities, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Minh-Khoi Pham
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Mai-Khiem Tran
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
- Truong Minh Tran
- Otorhinolaryngology Department, Cho Ray Hospital, Ho Chi Minh City, Viet Nam
- Minh-Triet Tran
- University of Science, VNUHCM, Ho Chi Minh City, Viet Nam; John von Neumann Institute, VNUHCM, Ho Chi Minh City, Viet Nam; Vietnam National University, Ho Chi Minh City, Viet Nam
5
Peterson QA, Fei T, Sy LE, Froeschke LL, Mendelsohn AH, Berke GS, Peterson DA. Correlating Perceptual Voice Quality in Adductor Spasmodic Dysphonia With Computer Vision Assessment of Glottal Geometry Dynamics. J Speech Lang Hear Res 2022; 65:3695-3708. [PMID: 36130065] [PMCID: PMC9927624] [DOI: 10.1044/2022_jslhr-22-00053]
Abstract
PURPOSE This study examined the relationship between voice quality and glottal geometry dynamics in patients with adductor spasmodic dysphonia (ADSD). METHOD An objective computer vision and machine learning system was developed to extract glottal geometry dynamics from nasolaryngoscopic video recordings for 78 patients with ADSD. General regression models were used to examine the relationship between overall voice quality and 15 variables that capture glottal geometry dynamics derived from the computer vision system. Two experts in ADSD independently rated voice quality for two separate voice tasks for every patient, yielding four different voice quality rating models. RESULTS All four of the regression models exhibited positive correlations with clinical assessments of voice quality (R²s = .30-.34, Spearman rho = .55-.61, all with p < .001). Seven to 10 variables were included in each model. There was high overlap in the variables included between the four models, and the sign of the correlation with voice quality was consistent for each variable across all four regression models. CONCLUSION We found specific glottal geometry dynamics that correspond to voice quality in ADSD.
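The analysis pattern this abstract describes, regressing a perceptual quality rating on geometry-derived variables and then reporting R² alongside Spearman rho between predicted and observed ratings, can be sketched on synthetic data. Everything below (the toy generative model, variable count, and noise level) is an assumption for illustration; it is not the study's actual pipeline or data, though the sample size of 78 and the modest fit echo the reported figures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: 78 "patients", 4 glottal-geometry-style predictors,
# and a noisy "voice quality" rating built from two of them.
n = 78
X = rng.normal(size=(n, 4))                       # geometry dynamics variables
quality = 0.6 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=1.2, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, quality, rcond=None)
pred = A @ coef

# Coefficient of determination of the fitted model.
ss_res = np.sum((quality - pred) ** 2)
ss_tot = np.sum((quality - quality.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

def spearman(a, b):
    """Spearman rho: Pearson correlation of the ranks (no tie handling)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

rho = spearman(pred, quality)
print(f"R^2 = {r2:.2f}, Spearman rho = {rho:.2f}")
```

Reporting both statistics is complementary: R² measures the linear fit, while Spearman rho checks that the rank ordering of predicted and perceived quality agrees even if the relationship is not perfectly linear.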
Affiliation(s)
- Quinn A. Peterson
- Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo
- Teng Fei
- Department of Cognitive Science, University of California, San Diego, La Jolla
- Lauren E. Sy
- Department of Cognitive Science, University of California, San Diego, La Jolla
- Abie H. Mendelsohn
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles
- Gerald S. Berke
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles
- David A. Peterson
- Institute for Neural Computation, University of California, San Diego, La Jolla