1
Chotcomwongse P, Ruamviboonsuk P, Grzybowski A. Utilizing Large Language Models in Ophthalmology: The Current Landscape and Challenges. Ophthalmol Ther 2024; 13:2543-2558. [PMID: 39180701 PMCID: PMC11408418 DOI: 10.1007/s40123-024-01018-6]
Abstract
A large language model (LLM) is an artificial intelligence (AI) model that uses natural language processing (NLP) to understand, interpret, and generate human-like language responses from unstructured text input. Its real-time response capabilities and fluent dialogue enhance the interactive experience of human-AI communication as never before. By drawing on numerous internet sources, LLM chatbots can respond to a wide range of queries, including problem solving, text summarization, and the creation of informative notes. Since ophthalmology is one of the medical fields integrating image analysis, telemedicine, AI, and other technologies, LLMs are likely to play an important role in eye care in the near future. This review summarizes the performance and potential applicability of LLMs in ophthalmology according to currently available publications.
Affiliation(s)
- Peranut Chotcomwongse
- Vitreoretina Unit, Department of Ophthalmology, Rajavithi Hospital, Rungsit University, Bangkok, Thailand
- Paisan Ruamviboonsuk
- Vitreoretina Unit, Department of Ophthalmology, Rajavithi Hospital, Rungsit University, Bangkok, Thailand
- Andrzej Grzybowski
- University of Warmia and Mazury, Olsztyn, Poland
- Institute for Research in Ophthalmology, Foundation for Ophthalmology Development, 61-553, Poznan, Poland
2
AlSaad R, Abd-Alrazaq A, Boughorbel S, Ahmed A, Renault MA, Damseh R, Sheikh J. Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook. J Med Internet Res 2024; 26:e59505. [PMID: 39321458 DOI: 10.2196/59505]
Abstract
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data-driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.
Affiliation(s)
- Rawan AlSaad
- Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Sabri Boughorbel
- Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Arfan Ahmed
- Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
- Rafat Damseh
- Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain, United Arab Emirates
- Javaid Sheikh
- Weill Cornell Medicine-Qatar, Education City, Doha, Qatar
3
Yanagita Y, Yokokawa D, Uchida S, Li Y, Uehara T, Ikusaka M. Can AI-Generated Clinical Vignettes in Japanese Be Used Medically and Linguistically? J Gen Intern Med 2024:10.1007/s11606-024-09031-y. [PMID: 39313665 DOI: 10.1007/s11606-024-09031-y]
Abstract
BACKGROUND Creating clinical vignettes requires considerable effort. Recent developments in generative artificial intelligence (AI) for natural language processing have been remarkable and may allow for the easy and immediate creation of diverse clinical vignettes. OBJECTIVE In this study, we evaluated the medical accuracy and grammatical correctness of AI-generated clinical vignettes in Japanese and verified their usefulness. METHODS Clinical vignettes were created using the generative AI model GPT-4-0613. The input prompts for the clinical vignettes specified the following seven elements: (1) age, (2) sex, (3) chief complaint and time course since onset, (4) physical findings, (5) examination results, (6) diagnosis, and (7) treatment course. The list of diseases integrated into the vignettes was based on 202 cases considered in the management of diseases and symptoms in Japan's Primary Care Physicians Training Program. The clinical vignettes were evaluated for medical and Japanese-language accuracy by three physicians using a five-point scale. A total score of 13 points or above was defined as "sufficiently beneficial and immediately usable with minor revisions," a score between 10 and 12 points as "partly insufficient and in need of modifications," and a score of 9 points or below as "insufficient." RESULTS Regarding medical accuracy, of the 202 clinical vignettes, 118 scored 13 points or above, 78 scored between 10 and 12 points, and 6 scored 9 points or below. Regarding Japanese-language accuracy, 142 vignettes scored 13 points or above, 56 scored between 10 and 12 points, and 4 scored 9 points or below. Overall, 97% (196/202) of the vignettes were usable with some modification. CONCLUSION Overall, 97% of the clinical vignettes proved practically useful after confirmation and revision by Japanese physicians. Given the considerable effort physicians need to create vignettes without AI, using GPT is expected to greatly streamline this process.
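The generation-and-scoring workflow described in this abstract is concrete enough to sketch. The following is a hypothetical Python illustration, assuming a simple prompt template over the seven specified elements and the authors' three scoring bands; the prompt wording and helper names are illustrative assumptions, not the authors' actual code:

```python
# Hypothetical sketch of the study's workflow: a prompt built from the seven
# specified elements, and the three scoring bands applied to rater totals.
# Prompt text and function names are assumptions for illustration only.

ELEMENTS = [
    "age", "sex", "chief complaint and time course since onset",
    "physical findings", "examination results", "diagnosis", "treatment course",
]

def build_prompt(disease: str) -> str:
    """Compose a vignette-generation prompt listing the seven required elements."""
    numbered = "\n".join(f"({i}) {e}" for i, e in enumerate(ELEMENTS, start=1))
    return (f"Create a clinical vignette in Japanese for: {disease}.\n"
            f"Include the following elements:\n{numbered}")

def score_band(total: int) -> str:
    """Map a three-rater total (3-15 points) onto the study's three bands."""
    if total >= 13:
        return "usable with minor revisions"
    if total >= 10:
        return "needs modifications"
    return "insufficient"
```

With three physicians each scoring 1 to 5, totals range from 3 to 15, which is why 13 serves as the "minor revisions" threshold.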
Affiliation(s)
- Yasutaka Yanagita
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Daiki Yokokawa
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Shun Uchida
- Uchida Internal Medicine Clinic, Saitama, Japan
- Yu Li
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Takanori Uehara
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
- Masatomi Ikusaka
- Department of General Medicine, Chiba University Hospital, Chiba, Japan
4
Wang Z, Yang W, Li Z, Rong Z, Wang X, Han J, Ma L. A 25-Year Retrospective of the Use of AI for Diagnosing Acute Stroke: Systematic Review. J Med Internet Res 2024; 26:e59711. [PMID: 39255472 PMCID: PMC11422733 DOI: 10.2196/59711]
Abstract
BACKGROUND Stroke is a leading cause of death and disability worldwide. Rapid and accurate diagnosis is crucial for minimizing brain damage and optimizing treatment plans. OBJECTIVE This review aims to summarize the methods of artificial intelligence (AI)-assisted stroke diagnosis over the past 25 years, providing an overview of performance metrics and algorithm development trends. It also delves into existing issues and future prospects, intending to offer a comprehensive reference for clinical practice. METHODS A total of 50 representative articles published between 1999 and 2024 on using AI technology for stroke prevention and diagnosis were systematically selected and analyzed in detail. RESULTS AI-assisted stroke diagnosis has made significant advances in stroke lesion segmentation and classification, stroke risk prediction, and stroke prognosis. Before 2012, research mainly focused on segmentation using traditional thresholding and heuristic techniques. From 2012 to 2016, the focus shifted to machine learning (ML)-based approaches. After 2016, the emphasis moved to deep learning (DL), which brought significant improvements in accuracy. In stroke lesion segmentation and classification as well as stroke risk prediction, DL has shown superiority over ML. In stroke prognosis, both DL and ML have shown good performance. CONCLUSIONS Over the past 25 years, AI technology has shown promising performance in stroke diagnosis.
Affiliation(s)
- Ze Rong
- Nantong University, Nantong, China
- Lei Ma
- Nantong University, Nantong, China
5
Pool J, Indulska M, Sadiq S. Large language models and generative AI in telehealth: a responsible use lens. J Am Med Inform Assoc 2024; 31:2125-2136. [PMID: 38441296 PMCID: PMC11339524 DOI: 10.1093/jamia/ocae035]
Abstract
OBJECTIVE This scoping review aims to assess the current research landscape of the application and use of large language models (LLMs) and generative artificial intelligence (AI), through tools such as ChatGPT, in telehealth. Additionally, the review seeks to identify key areas for future research, with a particular focus on AI ethics considerations for responsible use and ensuring trustworthy AI. MATERIALS AND METHODS Following the scoping review methodological framework, a search strategy was conducted across 6 databases. To structure our review, we employed AI ethics guidelines and principles, constructing a concept matrix for investigating the responsible use of AI in telehealth. Using the concept matrix in our review enabled the identification of gaps in the literature and informed future research directions. RESULTS Twenty studies were included in the review. Among the included studies, 5 were empirical, and 15 were reviews and perspectives focusing on different telehealth applications and healthcare contexts. Benefit and reliability concepts were frequently discussed in these studies. Privacy, security, and accountability were peripheral themes, with transparency, explainability, human agency, and contestability lacking conceptual or empirical exploration. CONCLUSION The findings emphasized the potential of LLMs, especially ChatGPT, in telehealth. They provide insights into understanding the use of LLMs, enhancing telehealth services, and taking ethical considerations into account. By proposing three future research directions with a focus on responsible use, this review further contributes to the advancement of healthcare AI.
Affiliation(s)
- Javad Pool
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
- Marta Indulska
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- Business School, The University of Queensland, Brisbane 4072, Australia
- Shazia Sadiq
- ARC Industrial Transformation Training Centre for Information Resilience (CIRES), The University of Queensland, Brisbane 4072, Australia
- School of Electrical Engineering and Computer Science, The University of Queensland, Brisbane 4072, Australia
6
Luo MJ, Pang J, Bi S, Lai Y, Zhao J, Shang Y, Cui T, Yang Y, Lin Z, Zhao L, Wu X, Lin D, Chen J, Lin H. Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology. JAMA Ophthalmol 2024; 142:798-805. [PMID: 39023885 PMCID: PMC11258636 DOI: 10.1001/jamaophthalmol.2024.2513]
Abstract
Importance Although augmenting large language models (LLMs) with knowledge bases may improve medical domain-specific performance, practical methods are needed for local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals. Objective To develop an accurate, cost-effective local implementation of an LLM to mitigate privacy concerns and support its practical deployment in health care settings. Design, Setting, and Participants ChatZOC (Sun Yat-Sen University Zhongshan Ophthalmology Center), a retrieval-augmented LLM framework, was developed by enhancing a baseline LLM with a comprehensive ophthalmic dataset and evaluation framework (CODE), which includes over 30 000 pieces of ophthalmic knowledge. This LLM was benchmarked against 10 representative LLMs, including GPT-4 and GPT-3.5 Turbo (OpenAI), across 300 clinical questions in ophthalmology. The evaluation, involving a panel of medical experts and biomedical researchers, focused on accuracy, utility, and safety. A double-masked approach was used to minimize assessment bias across all models. The study used a comprehensive knowledge base derived from ophthalmic clinical practice, without directly involving clinical patients. Exposures LLM response to clinical questions. Main Outcomes and Measures Accuracy, utility, and safety of LLMs in responding to clinical questions. Results The baseline model achieved a human ranking score of 0.48. The retrieval-augmented LLM had a score of 0.60, a difference of 0.12 (95% CI, 0.02-0.22; P = .02) from baseline and not different from GPT-4, which scored 0.61 (difference = 0.01; 95% CI, -0.11 to 0.13; P = .89). For scientific consensus, the retrieval-augmented LLM reached 84.0% compared with 46.5% for the baseline model (difference = 37.5%; 95% CI, 29.0%-46.0%; P < .001) and was not different from GPT-4 at 79.2% (difference = 4.8%; 95% CI, -0.3% to 10.0%; P = .06).
Conclusions and Relevance Results of this quality improvement study suggest that the integration of high-quality knowledge bases improved the LLM's performance in medical domains. This study highlights the transformative potential of augmented LLMs in clinical practice by providing reliable, safe, and practical clinical information. Further research is needed to explore the broader application of such frameworks in the real world.
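The retrieval-augmented pattern behind frameworks like ChatZOC can be illustrated in miniature. The following is a hedged sketch of generic retrieval-augmented generation, assuming naive keyword-overlap retrieval over a toy knowledge base; the real system's retriever, embeddings, and 30 000-item CODE dataset are not reproduced here:

```python
# Minimal retrieval-augmented generation sketch. The snippets and the
# word-overlap scoring are illustrative assumptions, not the ChatZOC system.

KNOWLEDGE_BASE = [
    "Primary open-angle glaucoma is managed first-line with topical prostaglandin analogues.",
    "Acute angle-closure glaucoma is an ophthalmic emergency requiring urgent pressure lowering.",
    "Diabetic retinopathy screening is recommended annually for patients with diabetes.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank knowledge snippets by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def augment_prompt(question: str) -> str:
    """Prepend retrieved context to the question before it reaches the LLM."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using the context."
```

In a real deployment the overlap scorer would be replaced by a dense-embedding retriever, but the prompt-assembly step is structurally the same.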
Affiliation(s)
- Ming-Jie Luo
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jianyu Pang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Shaowei Bi
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Yunxi Lai
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jiaman Zhao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Yuanrui Shang
- The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
- Tingxin Cui
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Yahan Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Zhenzhe Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Lanqin Zhao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Xiaohang Wu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Duoru Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Jingjing Chen
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Haotian Lin
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Center for Precision Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China
- Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, China
7
Kikuchi T, Nakao T, Nakamura Y, Hanaoka S, Mori H, Yoshikawa T. Toward Improved Radiologic Diagnostics: Investigating the Utility and Limitations of GPT-3.5 Turbo and GPT-4 with Quiz Cases. AJNR Am J Neuroradiol 2024:ajnr.A8332. [PMID: 38719605 DOI: 10.3174/ajnr.a8332]
Abstract
BACKGROUND AND PURPOSE The rise of large language models such as generative pretrained transformers (GPTs) has sparked considerable interest in radiology, especially in interpreting radiologic reports and image findings. While existing research has focused on GPTs estimating diagnoses from radiologic descriptions, exploring alternative diagnostic information sources is also crucial. This study introduces the use of GPTs (GPT-3.5 Turbo and GPT-4) for information retrieval and summarization, searching relevant case reports via PubMed, and investigates their potential to aid diagnosis. MATERIALS AND METHODS From October 2021 to December 2023, we selected 115 cases from the "Case of the Week" series on the American Journal of Neuroradiology website. Their Description and Legend sections were presented to the GPTs for the 2 tasks. For the Direct Diagnosis task, the models provided 3 differential diagnoses that were considered correct if they matched the diagnosis in the diagnosis section. For the Case Report Search task, the models generated 2 keywords per case, creating PubMed search queries to extract up to 3 relevant reports. A response was considered correct if reports containing the disease name stated in the diagnosis section were extracted. The McNemar test was used to evaluate whether adding a Case Report Search to Direct Diagnosis improved overall accuracy. RESULTS In the Direct Diagnosis task, GPT-3.5 Turbo achieved a correct response rate of 26% (30/115 cases), whereas GPT-4 achieved 41% (47/115). For the Case Report Search task, GPT-3.5 Turbo scored 10% (11/115), and GPT-4 scored 7% (8/115). Correct responses totaled 32% (37/115) with 3 overlapping cases for GPT-3.5 Turbo, whereas GPT-4 had 43% (50/115) of correct responses with 5 overlapping cases. Adding Case Report Search improved GPT-3.5 Turbo's performance (P = .023) but not that of GPT-4 (P = .248). 
CONCLUSIONS The benefit of adding Case Report Search was particularly pronounced for GPT-3.5 Turbo, suggesting its potential as an alternative GPT-based diagnostic approach, particularly when a direct diagnosis from a GPT is not obtainable. Nevertheless, the overall performance of GPT models in both the direct diagnosis and case report retrieval tasks remains suboptimal, and users should be aware of their limitations.
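The Case Report Search task reduces to building a PubMed query from two model-generated keywords. Below is a minimal sketch using NCBI's E-utilities ESearch endpoint; the query shape (AND-joined keywords plus a Case Reports publication-type filter, capped at three results) is an assumption about the authors' setup, not their actual code:

```python
from urllib.parse import urlencode

# Illustrative sketch of the Case Report Search step: model-generated keywords
# are combined into a PubMed E-utilities ESearch URL capped at three results.
# The exact query format used in the study is an assumption.

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def case_report_query(keywords: list[str], retmax: int = 3) -> str:
    """Build an ESearch URL restricted to case reports for the given keywords."""
    term = " AND ".join(keywords + ['"Case Reports"[Publication Type]'])
    return f"{EUTILS}?{urlencode({'db': 'pubmed', 'term': term, 'retmax': retmax})}"
```

Fetching the resulting URL (for example with `urllib.request`) returns PMIDs whose records could then be checked against the case's diagnosis, as the study did.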
Affiliation(s)
- Tomohiro Kikuchi
- Departments of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
- Department of Radiology (T.K., H.M.), School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan
- Takahiro Nakao
- Departments of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
- Yuta Nakamura
- Departments of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
- Shouhei Hanaoka
- Department of Radiology (S.H.), The University of Tokyo Hospital, Tokyo, Japan
- Harushi Mori
- Department of Radiology (T.K., H.M.), School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan
- Takeharu Yoshikawa
- Departments of Computational Diagnostic Radiology and Preventive Medicine (T.K., T.N., Y.N., T.Y.), The University of Tokyo Hospital, Tokyo, Japan
8
Sallam M, Al-Mahzoum K, Alshuaib O, Alhajri H, Alotaibi F, Alkhurainej D, Al-Balwah MY, Barakat M, Egger J. Language discrepancies in the performance of generative artificial intelligence models: an examination of infectious disease queries in English and Arabic. BMC Infect Dis 2024; 24:799. [PMID: 39118057 PMCID: PMC11308449 DOI: 10.1186/s12879-024-09725-y]
Abstract
BACKGROUND Assessment of artificial intelligence (AI)-based models across languages is crucial to ensure equitable access to and accuracy of information in multilingual contexts. This study aimed to compare AI model efficiency in English and Arabic for infectious disease queries. METHODS The study employed the METRICS checklist for the design and reporting of AI-based studies in healthcare. The AI models tested included ChatGPT-3.5, ChatGPT-4, Bing, and Bard. The queries comprised 15 questions on HIV/AIDS, tuberculosis, malaria, COVID-19, and influenza. The AI-generated content was assessed by two bilingual experts using the validated CLEAR tool. RESULTS Performance varied between English and Arabic. English queries showed consistently superior performance, with Bard leading, followed by Bing, ChatGPT-4, and ChatGPT-3.5 (P = .012). The same trend was observed in Arabic, albeit without statistical significance (P = .082). Stratified analysis revealed higher scores for English in most CLEAR components, notably completeness, accuracy, appropriateness, and relevance, especially with ChatGPT-3.5 and Bard. Across the five infectious disease topics, English outperformed Arabic, except for influenza queries in Bing and Bard. The four AI models' performance in English was rated as "excellent," significantly outperforming their "above-average" Arabic counterparts (P = .002). CONCLUSIONS A disparity in AI model performance between English and Arabic was observed in response to infectious disease queries. This language gap can degrade the quality of health content delivered by AI models to native speakers of Arabic, and AI developers should address it, with the ultimate goal of enhancing health outcomes.
Affiliation(s)
- Malik Sallam
- Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Department of Translational Medicine, Faculty of Medicine, Lund University, Malmö, 22184, Sweden
- Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Queen Rania Al-Abdullah Street-Aljubeiha, P.O. Box: 13046, Amman, Jordan
- Omaima Alshuaib
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Hawajer Alhajri
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Fatmah Alotaibi
- School of Medicine, The University of Jordan, Amman, 11942, Jordan
- Muna Barakat
- Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, 11931, Jordan
- MEU Research Unit, Middle East University, Amman, 11831, Jordan
- Jan Egger
- Institute for AI in Medicine (IKIM), University Medicine Essen (AöR), Essen, Germany
9
Sridharan K, Sivaramakrishnan G. Enhancing readability of USFDA patient communications through large language models: a proof-of-concept study. Expert Rev Clin Pharmacol 2024; 17:731-741. [PMID: 38823007 DOI: 10.1080/17512433.2024.2363840]
Abstract
BACKGROUND The US Food and Drug Administration (USFDA) communicates new drug safety concerns through drug safety communications (DSCs) and medication guides (MGs), whose complexity often challenges patients with average reading abilities. This study assesses whether large language models (LLMs) can enhance the readability of these materials. METHODS We analyzed the latest DSCs and MGs, using ChatGPT 4.0© and Gemini© to simplify them to a sixth-grade reading level. Outputs were evaluated for readability, technical accuracy, and content inclusiveness. RESULTS Original materials were difficult to read (DSCs grade level 13, MGs 22). LLMs significantly improved readability, reducing grade levels to more accessible reading levels (single prompt - DSCs: ChatGPT 4.0© 10.1, Gemini© 8; MGs: ChatGPT 4.0© 7.1, Gemini© 6.5; multiple prompts - DSCs: ChatGPT 4.0© 10.3, Gemini© 7.5; MGs: ChatGPT 4.0© 8, Gemini© 6.8). LLM outputs retained technical accuracy and key messages. CONCLUSION LLMs can significantly simplify complex health-related information, making it more accessible to patients. Future research should extend these findings to other languages and patient groups in real-world settings.
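Grade-level figures like those reported are typically produced by readability formulas such as Flesch-Kincaid. Below is a hedged sketch of the Flesch-Kincaid Grade Level computation with a naive vowel-group syllable counter; the study's actual readability tooling is not specified, so this is illustrative only:

```python
import re

# Illustrative Flesch-Kincaid Grade Level computation. The vowel-group
# syllable counter is a rough approximation; production readability tools
# use dictionary-based syllabification.

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels; always at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```

Short, monosyllabic sentences score near (or below) grade 0, while long sentences dense with polysyllabic terminology, typical of regulatory prose, push the grade level well past 12.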
Affiliation(s)
- Kannan Sridharan
- Department of Pharmacology & Therapeutics, College of Medicine & Medical Sciences, Arabian Gulf University, Manama, Kingdom of Bahrain
- Gowri Sivaramakrishnan
- Speciality Dental Residency Program, Primary Health Care Centers, Manama, Kingdom of Bahrain
10
MohanaSundaram A, Emran TB. A commentary on 'ChatGPT in medicine: prospects and challenges: a review article' - correspondence. Int J Surg 2024; 110:5178-5179. [PMID: 38640507 PMCID: PMC11325996 DOI: 10.1097/js9.0000000000001487]
Affiliation(s)
- Talha Bin Emran
- Department of Pharmacy, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
11
Leypold T, Schäfer B, Boos AM, Beier JP. Artificial Intelligence-Powered Hand Surgery Consultation: GPT-4 as an Assistant in a Hand Surgery Outpatient Clinic. J Hand Surg Am 2024:S0363-5023(24)00261-2. [PMID: 39066762 DOI: 10.1016/j.jhsa.2024.06.002]
Abstract
PURPOSE Exploring the integration of artificial intelligence in clinical settings, this study examined the feasibility of using Generative Pretrained Transformer 4 (GPT-4), a large language model, as a consultation assistant in a hand surgery outpatient clinic. METHODS The study involved 10 simulated patient scenarios with common hand conditions, in which GPT-4, enhanced through specific prompt engineering techniques, conducted medical history interviews and assisted in diagnostic processes. A panel of expert hand surgeons, each board-certified in hand surgery, evaluated GPT-4's responses on a five-criterion Likert scale with scores ranging from 1 (lowest) to 5 (highest). RESULTS GPT-4 achieved an average score of 4.6, reflecting good performance in documenting a medical history, as evaluated by the hand surgeons. CONCLUSIONS These findings suggest that GPT-4 can document medical histories to the standards of hand surgeons in a simulated environment. The findings indicate potential for future application in patient care, but GPT-4's actual performance in real clinical settings remains to be investigated. CLINICAL RELEVANCE This study provides a preliminary indication that GPT-4 could be a useful consultation assistant in a hand surgery outpatient clinic, but further research is required to establish its reliability and practicality in actual practice.
Collapse
Affiliation(s)
- Tim Leypold
- Department of Plastic Surgery, Hand Surgery-Burn Center, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany.
| | - Benedikt Schäfer
- Department of Plastic Surgery, Hand Surgery-Burn Center, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany
| | - Anja M Boos
- Department of Plastic Surgery, Hand Surgery-Burn Center, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany
| | - Justus P Beier
- Department of Plastic Surgery, Hand Surgery-Burn Center, University Hospital Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Aachen, Germany
| |
|
12
|
Cheng J, Huang C, Zhang J, Wu B, Zhang W, Liu X, Zhang J, Tang Y, Zhou H, Zhang Q, Gu M, Dong J, Zhang X. Multimodal deep learning using on-chip diffractive optics with in situ training capability. Nat Commun 2024; 15:6189. [PMID: 39043669 PMCID: PMC11266606 DOI: 10.1038/s41467-024-50677-3]
Abstract
Multimodal deep learning plays a pivotal role in supporting the processing and learning of diverse data types within the realm of artificial intelligence generated content (AIGC). However, most photonic neuromorphic processors for deep learning can only handle a single data modality (either vision or audio) due to the lack of abundant parameter training in the optical domain. Here, we propose and demonstrate a trainable diffractive optical neural network (TDONN) chip based on on-chip diffractive optics with massive tunable elements to address these constraints. The TDONN chip includes one input layer, five hidden layers, and one output layer, and only one forward propagation is required to obtain the inference results without frequent optical-electrical conversion. A customized stochastic gradient descent algorithm and a drop-out mechanism were developed for photonic neurons to realize in situ training and fast convergence in the optical domain. The TDONN chip achieves a potential throughput of 217.6 tera-operations per second (TOPS) with high computing density (447.7 TOPS/mm²), high system-level energy efficiency (7.28 TOPS/W), and low optical latency (30.2 ps). The TDONN chip successfully implemented four-class classification in different modalities (vision, audio, and touch) and achieved 85.7% accuracy on multimodal test sets. Our work opens up a new avenue for multimodal deep learning with integrated photonic processors, providing a potential solution for low-power AI large models using photonic technology.
Affiliation(s)
- Junwei Cheng
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Chaoran Huang
- Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, 999077, China
| | - Jialong Zhang
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Bo Wu
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Wenkai Zhang
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Xinyu Liu
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Jiahui Zhang
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Yiyi Tang
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Hailong Zhou
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Qiming Zhang
- Institute of Photonic Chips, University of Shanghai for Science and Technology, Shanghai, 200093, China
| | - Min Gu
- Institute of Photonic Chips, University of Shanghai for Science and Technology, Shanghai, 200093, China
| | - Jianji Dong
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China.
- Optics Valley Laboratory, Wuhan, 430074, China.
| | - Xinliang Zhang
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, 430074, China
- Optics Valley Laboratory, Wuhan, 430074, China
| |
|
13
|
Haider SA, Pressman SM, Borna S, Gomez-Cabello CA, Sehgal A, Leibovich BC, Forte AJ. Evaluating Large Language Model (LLM) Performance on Established Breast Classification Systems. Diagnostics (Basel) 2024; 14:1491. [PMID: 39061628 PMCID: PMC11275570 DOI: 10.3390/diagnostics14141491]
Abstract
Medical researchers are increasingly utilizing advanced LLMs like ChatGPT-4 and Gemini to enhance diagnostic processes in the medical field. This research focuses on their ability to comprehend and apply complex medical classification systems for breast conditions, which can significantly aid plastic surgeons in making informed decisions for diagnosis and treatment, ultimately leading to improved patient outcomes. Fifty clinical scenarios were created to evaluate the classification accuracy of each LLM across five established breast-related classification systems. Scores from 0 to 2 were assigned to LLM responses to denote incorrect, partially correct, or completely correct classifications. Descriptive statistics were employed to compare the performances of ChatGPT-4 and Gemini. Gemini exhibited superior overall performance, achieving 98% accuracy compared to ChatGPT-4's 71%. While both models performed well in the Baker classification for capsular contracture and UTSW classification for gynecomastia, Gemini consistently outperformed ChatGPT-4 in other systems, such as the Fischer Grade Classification for gender-affirming mastectomy, Kajava Classification for ectopic breast tissue, and Regnault Classification for breast ptosis. With further development, integrating LLMs into plastic surgery practice will likely enhance diagnostic support and decision making.
Affiliation(s)
- Syed Ali Haider
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
| | | | - Sahar Borna
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
| | | | - Ajai Sehgal
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
| | - Bradley C. Leibovich
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
- Department of Urology, Mayo Clinic, Rochester, MN 55905, USA
| | - Antonio Jorge Forte
- Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA
- Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
| |
|
14
|
Yang Z, Wang D, Zhou F, Song D, Zhang Y, Jiang J, Kong K, Liu X, Qiao Y, Chang RT, Han Y, Li F, Tham CC, Zhang X. Understanding natural language: Potential application of large language models to ophthalmology. Asia Pac J Ophthalmol (Phila) 2024; 13:100085. [PMID: 39059558 DOI: 10.1016/j.apjo.2024.100085]
Abstract
Large language models (LLMs), a natural language processing technology based on deep learning, are currently in the spotlight. These models closely mimic natural language comprehension and generation. Their evolution has undergone several waves of innovation similar to convolutional neural networks. The transformer architecture advancement in generative artificial intelligence marks a monumental leap beyond early-stage pattern recognition via supervised learning. With the expansion of parameters and training data (terabytes), LLMs unveil remarkable human interactivity, encompassing capabilities such as memory retention and comprehension. These advances make LLMs particularly well-suited for roles in healthcare communication between medical practitioners and patients. In this comprehensive review, we discuss the trajectory of LLMs and their potential implications for clinicians and patients. For clinicians, LLMs can be used for automated medical documentation, and given better inputs and extensive validation, LLMs may be able to autonomously diagnose and treat in the future. For patient care, LLMs can be used for triage suggestions, summarization of medical documents, explanation of a patient's condition, and customizing patient education materials tailored to their comprehension level. The limitations of LLMs and possible solutions for real-world use are also presented. Given the rapid advancements in this area, this review attempts to briefly cover many roles that LLMs may play in the ophthalmic space, with a focus on improving the quality of healthcare delivery.
Affiliation(s)
- Zefeng Yang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Deming Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Fengqi Zhou
- Ophthalmology, Mayo Clinic Health System, Eau Claire, Wisconsin, USA
| | - Diping Song
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Yinhang Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Jiaxuan Jiang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Kangjie Kong
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Xiaoyi Liu
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
| | - Yu Qiao
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Robert T Chang
- Department of Ophthalmology, Byers Eye Institute at Stanford University, Palo Alto, CA, USA
| | - Ying Han
- Department of Ophthalmology, University of California, San Francisco, San Francisco, CA, USA
| | - Fei Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
| | - Clement C Tham
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China; Hong Kong Eye Hospital, Kowloon, Hong Kong SAR, China; Department of Ophthalmology and Visual Sciences, Prince of Wales Hospital, Shatin, Hong Kong SAR, China.
| | - Xiulan Zhang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China.
| |
|
15
|
Ong JCL, Chang SYH, William W, Butte AJ, Shah NH, Chew LST, Liu N, Doshi-Velez F, Lu W, Savulescu J, Ting DSW. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health 2024; 6:e428-e432. [PMID: 38658283 DOI: 10.1016/s2589-7500(24)00061-x]
Abstract
With the rapid growth of interest in and use of large language models (LLMs) across various industries, we are facing some crucial and profound ethical concerns, especially in the medical field. The unique technical architecture and purported emergent abilities of LLMs differentiate them substantially from other artificial intelligence (AI) models and natural language processing techniques used, necessitating a nuanced understanding of LLM ethics. In this Viewpoint, we highlight ethical concerns stemming from the perspectives of users, developers, and regulators, notably focusing on data privacy and rights of use, data provenance, intellectual property contamination, and broad applications and plasticity of LLMs. A comprehensive framework and mitigating strategies will be imperative for the responsible integration of LLMs into medical practice, ensuring alignment with ethical principles and safeguarding against potential societal risks.
Affiliation(s)
- Jasmine Chiat Ling Ong
- Division of Pharmacy, Singapore General Hospital, Singapore; Duke-NUS Medical School, National University of Singapore, Singapore
| | - Shelley Yin-Hsi Chang
- Department of Ophthalmology, Chang Gung Memorial Hospital, Linkou Medical Center, Taoyuan, Taiwan; College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - Wasswa William
- Department of Biomedical Sciences and Engineering, Mbarara University of Science and Technology, Mbarara, Uganda
| | - Atul J Butte
- Bakar Computational Health Sciences Institute, and Department of Pediatrics, University of California, San Francisco, San Francisco, CA, USA; Center for Data-Driven Insights and Innovation, University of California Health, Oakland, CA, USA
| | - Nigam H Shah
- Stanford Health Care, Palo Alto, CA, USA; Department of Medicine, and Clinical Excellence Research Center, School of Medicine, Stanford University, Stanford, CA, USA
| | - Lita Sui Tjien Chew
- Department of Pharmacy, National University of Singapore, Singapore; Singapore Health Services, Pharmacy and Therapeutics Council Office, Singapore; Department of Pharmacy, National Cancer Centre Singapore, Singapore
| | - Nan Liu
- Duke-NUS Medical School, National University of Singapore, Singapore
| | - Finale Doshi-Velez
- Harvard Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
| | - Wei Lu
- StatNLP Research Group, Singapore University of Technology and Design, Singapore
| | - Julian Savulescu
- Murdoch Children's Research Institute, Melbourne, VIC, Australia; Centre for Biomedical Ethics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Oxford Uehiro Centre for Practical Ethics, Faculty of Philosophy, University of Oxford, Oxford, UK
| | - Daniel Shu Wei Ting
- Duke-NUS Medical School, National University of Singapore, Singapore; Artificial Intelligence and Digital Innovation, Singapore Eye Research Institute, Singapore National Eye Center, Singapore Health Service, Singapore; Byers Eye Institute, Stanford University, Palo Alto, CA, USA.
| |
|
16
|
Tan S, Xin X, Wu D. ChatGPT in medicine: prospects and challenges: a review article. Int J Surg 2024; 110:3701-3706. [PMID: 38502861 PMCID: PMC11175750 DOI: 10.1097/js9.0000000000001312]
Abstract
It has been a year since the launch of Chat Generative Pre-trained Transformer (ChatGPT), a generative artificial intelligence (AI) program. The introduction of this cross-generational product initially brought a huge shock with its incredible potential and then aroused increasing concerns. In the field of medicine, researchers have extensively explored the possible applications of ChatGPT and achieved numerous satisfactory results. However, opportunities and issues always come together. Problems have also been exposed during the application of ChatGPT, requiring cautious handling, thorough consideration, and further guidelines for safe use. Here, the authors summarized the potential applications of ChatGPT in the medical field, including revolutionizing healthcare consultation, assisting patient management and treatment, transforming medical education, and facilitating clinical research. Meanwhile, the authors also enumerated researchers' concerns arising along with its broad and satisfactory applications. As AI will inevitably permeate every aspect of modern life, the authors hope that this review can not only promote people's understanding of the potential applications of ChatGPT in the future but also remind them to be more cautious about this "Pandora's Box" in the medical field. It is necessary to establish normative guidelines for its safe use in the medical field as soon as possible.
Affiliation(s)
| | | | - Di Wu
- Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shijingshan, Beijing, China
| |
|
17
|
Naqvi WM, Shaikh SZ, Mishra GV. Large language models in physical therapy: time to adapt and adept. Front Public Health 2024; 12:1364660. [PMID: 38887241 PMCID: PMC11182445 DOI: 10.3389/fpubh.2024.1364660]
Abstract
Healthcare is experiencing a transformative phase driven by artificial intelligence (AI) and machine learning (ML), and physical therapists (PTs) stand on the brink of a paradigm shift in education, practice, and research. Rather than a threat, AI presents an opportunity for revolution. This paper examines how large language models (LLMs), such as ChatGPT and BioMedLM, driven by deep ML, can offer human-like performance yet face accuracy challenges given the vast data involved in PT and rehabilitation practice. PTs can benefit from developing and training LLMs tailored to streamlining administrative tasks, connecting globally, and customizing treatments; however, the human touch and creativity remain invaluable. This paper urges PTs to engage in learning and shaping AI models, highlighting the need for ethical use and human supervision to address potential biases. Embracing AI as a contributor, and not just a user, is crucial: by integrating AI and fostering collaboration, the PT field can be enriched by AI, provided data accuracy and the challenges associated with feeding the AI model are sensitively addressed.
Affiliation(s)
- Waqar M. Naqvi
- Department of Interdisciplinary Sciences, Datta Meghe Institute of Higher Education and Research, Wardha, India
- Department of Physiotherapy, College of Health Sciences, Gulf Medical University, Ajman, United Arab Emirates
- NKP Salve Institute of Medical Sciences and Research Center, Nagpur, India
| | - Summaiya Zareen Shaikh
- Department of Neuro-Physiotherapy, The SIA College of Health Sciences, College of Physiotherapy, Thane, India
| | - Gaurav V. Mishra
- Department of Radiodiagnosis, Datta Meghe Institute of Higher Education and Research, Wardha, India
| |
|
18
|
Leypold T, Lingens LF, Beier JP, Boos AM. Integrating AI in Lipedema Management: Assessing the Efficacy of GPT-4 as a Consultation Assistant. Life (Basel) 2024; 14:646. [PMID: 38792666 PMCID: PMC11122530 DOI: 10.3390/life14050646]
Abstract
The role of artificial intelligence (AI) in healthcare is evolving, offering promising avenues for enhancing clinical decision making and patient management. Limited knowledge about lipedema often leads to patients being frequently misdiagnosed with conditions like lymphedema or obesity rather than correctly identifying lipedema. Furthermore, patients with lipedema often present with intricate and extensive medical histories, resulting in significant time consumption during consultations. AI could, therefore, improve the management of these patients. This research investigates the utilization of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4), a sophisticated large language model (LLM), as an assistant in consultations for lipedema patients. Six simulated scenarios were designed to mirror typical patient consultations commonly encountered in a lipedema clinic. GPT-4 was tasked with conducting patient interviews to gather medical histories, presenting its findings, making preliminary diagnoses, and recommending further diagnostic and therapeutic actions. Advanced prompt engineering techniques were employed to refine the efficacy, relevance, and accuracy of GPT-4's responses. A panel of experts in lipedema treatment, using a Likert Scale, evaluated GPT-4's responses across six key criteria. Scoring ranged from 1 (lowest) to 5 (highest), with GPT-4 achieving an average score of 4.24, indicating good reliability and applicability in a clinical setting. This study is one of the initial forays into applying large language models like GPT-4 in specific clinical scenarios, such as lipedema consultations. It demonstrates the potential of AI in supporting clinical practices and emphasizes the continuing importance of human expertise in the medical field, despite ongoing technological advancements.
Affiliation(s)
- Tim Leypold
- Department of Plastic Surgery, Hand Surgery–Burn Center, University Hospital RWTH Aachen, Pauwelsstraße 30, 52074 Aachen, Germany; (L.F.L.); (J.P.B.); (A.M.B.)
| | | | | | | |
|
19
|
Bitterman DS, Downing A, Maués J, Lustberg M. Promise and Perils of Large Language Models for Cancer Survivorship and Supportive Care. J Clin Oncol 2024; 42:1607-1611. [PMID: 38452323 PMCID: PMC11095890 DOI: 10.1200/jco.23.02439]
Abstract
A call to action to bring stakeholders together to plan for the future of LLM-enhanced cancer survivorship.
Affiliation(s)
- Danielle S. Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
| | | | | | - Maryam Lustberg
- Department of Medical Oncology, Yale School of Medicine, New Haven, CT
| |
|
20
|
Esmaeilzadeh P. Challenges and strategies for wide-scale artificial intelligence (AI) deployment in healthcare practices: A perspective for healthcare organizations. Artif Intell Med 2024; 151:102861. [PMID: 38555850 DOI: 10.1016/j.artmed.2024.102861]
Abstract
Healthcare organizations have realized that Artificial intelligence (AI) can provide a competitive edge through personalized patient experiences, improved patient outcomes, early diagnosis, augmented clinician capabilities, enhanced operational efficiencies, or improved medical service accessibility. However, deploying AI-driven tools in the healthcare ecosystem could be challenging. This paper categorizes AI applications in healthcare and comprehensively examines the challenges associated with deploying AI in medical practices at scale. As AI continues to make strides in healthcare, its integration presents various challenges, including production timelines, trust generation, privacy concerns, algorithmic biases, and data scarcity. The paper highlights that flawed business models and wrong workflows in healthcare practices cannot be rectified merely by deploying AI-driven tools. Healthcare organizations should re-evaluate root problems such as misaligned financial incentives (e.g., fee-for-service models), dysfunctional medical workflows (e.g., high rates of patient readmissions), poor care coordination between different providers, fragmented electronic health records systems, and inadequate patient education and engagement models in tandem with AI adoption. This study also explores the need for a cultural shift in viewing AI not as a threat but as an enabler that can enhance healthcare delivery and create new employment opportunities while emphasizing the importance of addressing underlying operational issues. The necessity of investments beyond finance is discussed, emphasizing the importance of human capital, continuous learning, and a supportive environment for AI integration. The paper also highlights the crucial role of clear regulations in building trust, ensuring safety, and guiding the ethical use of AI, calling for coherent frameworks addressing transparency, model accuracy, data quality control, liability, and ethics. 
Furthermore, this paper underscores the importance of advancing AI literacy within academia to prepare future healthcare professionals for an AI-driven landscape. Through careful navigation and proactive measures addressing these challenges, the healthcare community can harness AI's transformative power responsibly and effectively, revolutionizing healthcare delivery and patient care. The paper concludes with a vision and strategic suggestions for the future of healthcare with AI, emphasizing thoughtful, responsible, and innovative engagement as the pathway to realizing its full potential to unlock immense benefits for healthcare organizations, physicians, nurses, and patients while proactively mitigating risks.
Affiliation(s)
- Pouyan Esmaeilzadeh
- Department of Information Systems and Business Analytics, College of Business, Florida International University (FIU), Modesto A. Maidique Campus, 11200 S.W. 8th St, RB 261B, Miami, FL 33199, United States.
| |
|
21
|
Sheikh MS, Thongprayoon C, Suppadungsuk S, Miao J, Qureshi F, Kashani K, Cheungpasitporn W. Evaluating ChatGPT's Accuracy in Responding to Patient Education Questions on Acute Kidney Injury and Continuous Renal Replacement Therapy. Blood Purif 2024; 53:725-731. [PMID: 38679000 DOI: 10.1159/000539065]
Abstract
INTRODUCTION Acute kidney injury (AKI) and continuous renal replacement therapy (CRRT) are critical areas in nephrology. The effectiveness of ChatGPT in simpler, patient education-oriented questions has not been thoroughly assessed. This study evaluates the proficiency of ChatGPT 4.0 in responding to such questions, subjected to various linguistic alterations. METHODS Eighty-nine questions were sourced from the Mayo Clinic Handbook for educating patients on AKI and CRRT. These questions were categorized as original, paraphrased with different interrogative adverbs, paraphrased resulting in incomplete sentences, and paraphrased containing misspelled words. Two nephrologists verified the questions for medical accuracy. A χ2 test was conducted to ascertain notable discrepancies in ChatGPT 4.0's performance across these formats. RESULTS ChatGPT showed notable accuracy in handling a variety of question formats for patient education in AKI and CRRT. Across all question types, ChatGPT demonstrated an accuracy of 97% for both original and adverb-altered questions and 98% for questions with incomplete sentences or misspellings. Specifically for AKI-related questions, the accuracy was consistently maintained at 97% for all versions. In the subset of CRRT-related questions, the tool achieved a 96% accuracy for original and adverb-altered questions, and this increased to 98% for questions with incomplete sentences or misspellings. The statistical analysis revealed no significant difference in performance across these varied question types (p value: 1.00 for AKI and 1.00 for CRRT), and there was no notable disparity between the artificial intelligence (AI)'s responses to AKI and CRRT questions (p value: 0.71). CONCLUSION ChatGPT 4.0 demonstrates consistent and high accuracy in interpreting and responding to queries related to AKI and CRRT, irrespective of linguistic modifications. These findings suggest that ChatGPT 4.0 has the potential to be a reliable support tool in the delivery of patient education, by accurately providing information across a range of question formats. Further research is needed to explore the direct impact of AI-generated responses on patient understanding and education outcomes.
Affiliation(s)
- Mohammad Salman Sheikh
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Charat Thongprayoon
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Supawadee Suppadungsuk
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Chakri Naruebodindra Medical Institute, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Salaya, Thailand
| | - Jing Miao
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Fawad Qureshi
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Kianoush Kashani
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Wisit Cheungpasitporn
- Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota, USA
| |
|
22
|
Araújo R, Ramalhete L, Viegas A, Von Rekowski CP, Fonseca TAH, Calado CRC, Bento L. Simplifying Data Analysis in Biomedical Research: An Automated, User-Friendly Tool. Methods Protoc 2024; 7:36. [PMID: 38804330 PMCID: PMC11130801 DOI: 10.3390/mps7030036]
Abstract
Robust data normalization and analysis are pivotal in biomedical research to ensure that observed differences in populations are directly attributable to the target variable, rather than disparities between control and study groups. ArsHive addresses this challenge using advanced algorithms to normalize populations (e.g., control and study groups) and perform statistical evaluations between demographic, clinical, and other variables within biomedical datasets, resulting in more balanced and unbiased analyses. The tool's functionality extends to comprehensive data reporting, which elucidates the effects of data processing, while maintaining dataset integrity. Additionally, ArsHive is complemented by A.D.A. (Autonomous Digital Assistant), which employs OpenAI's GPT-4 model to assist researchers with inquiries, enhancing the decision-making process. In this proof-of-concept study, we tested ArsHive on three different datasets derived from proprietary data, demonstrating its effectiveness in managing complex clinical and therapeutic information and highlighting its versatility for diverse research fields.
Collapse
Affiliation(s)
- Rúben Araújo
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
| | - Luís Ramalhete
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- Blood and Transplantation Center of Lisbon, IPST—Instituto Português do Sangue e da Transplantação, Alameda das Linhas de Torres 117, 1769-001 Lisbon, Portugal
- iNOVA4Health—Advancing Precision Medicine, RG11: Reno-Vascular Diseases Group, NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, 1169-056 Lisbon, Portugal
| | - Ana Viegas
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ESTeSL—Escola Superior de Tecnologia da Saúde de Lisboa, Instituto Politécnico de Lisboa, Avenida D. João II, Lote 4.69.01, 1990-096 Lisbon, Portugal
- Neurosciences Area, Clinical Neurophysiology Unit, ULSSJ—Unidade Local de Saúde São José, Rua José António Serrano, 1150-199 Lisbon, Portugal
| | - Cristiana P. Von Rekowski
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
| | - Tiago A. H. Fonseca
- NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
- CHRC—Comprehensive Health Research Centre, Universidade NOVA de Lisboa, 1150-082 Lisbon, Portugal
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
| | - Cecília R. C. Calado
- ISEL—Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisbon, Portugal
- Institute for Bioengineering and Biosciences (iBB), The Associate Laboratory Institute for Health and Bioeconomy–i4HB, Instituto Superior Técnico (IST), Universidade de Lisboa (UL), Av. Rovisco Pais, 1049-001 Lisboa, Portugal
| | - Luís Bento
- Intensive Care Department, ULSSJ—Unidade Local de Saúde São José, Rua José António Serrano, 1150-199 Lisbon, Portugal;
- Integrated Pathophysiological Mechanisms, CHRC—Comprehensive Health Research Centre, NMS—NOVA Medical School, FCM—Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Campo Mártires da Pátria 130, 1169-056 Lisbon, Portugal
| |
Collapse
|
23
|
Ostrowska M, Kacała P, Onolememen D, Vaughan-Lane K, Sisily Joseph A, Ostrowski A, Pietruszewska W, Banaszewski J, Wróbel MJ. To trust or not to trust: evaluating the reliability and safety of AI responses to laryngeal cancer queries. Eur Arch Otorhinolaryngol 2024:10.1007/s00405-024-08643-8. [PMID: 38652298 DOI: 10.1007/s00405-024-08643-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2024] [Accepted: 03/26/2024] [Indexed: 04/25/2024]
Abstract
PURPOSE As online health information-seeking surges, concerns mount over the quality and safety of accessible content, which can lead to patient harm through misinformation. On one hand, the emergence of Artificial Intelligence (AI) in healthcare could help prevent such harm; on the other hand, questions arise regarding the quality and safety of the medical information provided. As laryngeal cancer is a prevalent head and neck malignancy, this study aims to evaluate the utility and safety of three large language models (LLMs) as sources of patient information about laryngeal cancer. METHODS A cross-sectional study was conducted using three LLMs (ChatGPT 3.5, ChatGPT 4.0, and Bard). A questionnaire comprising 36 inquiries about laryngeal cancer was categorised into diagnosis (11 questions), treatment (9 questions), novelties and upcoming treatments (4 questions), controversies (8 questions), and sources of information (4 questions). Reviewers comprised three groups: ENT specialists, junior physicians, and non-medical raters, who graded the responses. Each physician evaluated each question twice for each model, while non-medical raters evaluated each question once. All reviewers were blinded to the model type, and the question order was shuffled. Outcome evaluations were based on a safety score (1-3) and a Global Quality Score (GQS, 1-5). Results were compared between LLMs. The study included iterative assessments and statistical validations. RESULTS Analysis revealed that ChatGPT 3.5 scored highest in both safety (mean: 2.70) and GQS (mean: 3.95). ChatGPT 4.0 and Bard had lower safety scores of 2.56 and 2.42, respectively, with corresponding quality scores of 3.65 and 3.38. Inter-rater reliability was consistent, with less than 3% discrepancy. About 4.2% of responses fell into the lowest safety category (1), particularly in the novelty category. Non-medical reviewers' quality assessments correlated moderately (r = 0.67) with response length.
CONCLUSIONS LLMs can be valuable resources for patients seeking information on laryngeal cancer. ChatGPT 3.5 provided the most reliable and safe responses among the models evaluated.
Collapse
Affiliation(s)
- Magdalena Ostrowska
- Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Paulina Kacała
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Deborah Onolememen
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Katie Vaughan-Lane
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Anitta Sisily Joseph
- ENT Scientific Club, Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Adam Ostrowski
- Department of Urology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| | - Wioletta Pietruszewska
- Department of Otolaryngology, Laryngological Oncology, Audiology and Phoniatrics, Medical University of Lodz, ul Żeromskiego 113, 90-549, Lodz, Poland
| | - Jacek Banaszewski
- Department of Otolaryngology, Head and Neck Oncology, Poznan University of Medical Science, ul Przybyszewskiego 49, 60-355, Poznań, Poland
| | - Maciej J Wróbel
- Department of Otolaryngology and Laryngological Oncology, Collegium Medicum, Nicolaus Copernicus University in Torun, ul. Marie Sklodowskiej-Curie 9, 85-094, Bydgoszcz, Poland
| |
Collapse
|
24
|
Mu Y, He D. The Potential Applications and Challenges of ChatGPT in the Medical Field. Int J Gen Med 2024; 17:817-826. [PMID: 38476626 PMCID: PMC10929156 DOI: 10.2147/ijgm.s456659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
ChatGPT, an AI-driven conversational large language model (LLM), has garnered significant scholarly attention since its inception, owing to its manifold applications in the realm of medical science. This study primarily examines the merits, limitations, anticipated developments, and practical applications of ChatGPT in clinical practice, healthcare, medical education, and medical research. It underscores the necessity for further research and development to enhance its performance and deployment. Moreover, future research avenues encompass ongoing enhancements and standardization of ChatGPT, mitigating its limitations, and exploring its integration and applicability in translational and personalized medicine. Reflecting the narrative nature of this review, a focused literature search was performed to identify relevant publications on ChatGPT's use in medicine. This process was aimed at gathering a broad spectrum of insights to provide a comprehensive overview of the current state and future prospects of ChatGPT in the medical domain. The objective is to aid healthcare professionals in understanding the groundbreaking advancements associated with the latest artificial intelligence tools, while also acknowledging the opportunities and challenges presented by ChatGPT.
Collapse
Affiliation(s)
- Yonglin Mu
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
| | - Dawei He
- Department of Urology, Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
| |
Collapse
|
25
|
Denecke K, May R, Rivera-Romero O. Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks. J Med Syst 2024; 48:23. [PMID: 38367119 PMCID: PMC10874304 DOI: 10.1007/s10916-024-02043-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2023] [Accepted: 02/10/2024] [Indexed: 02/19/2024]
Abstract
Large Language Models (LLMs) such as Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT), which use transformer model architectures, have significantly advanced artificial intelligence and natural language processing. Recognized for their ability to capture associative relationships between words based on shared context, these models are poised to transform healthcare by improving diagnostic accuracy, tailoring treatment plans, and predicting patient outcomes. However, there are multiple risks and potentially unintended consequences associated with their use in healthcare applications. This study, conducted with 28 participants using a qualitative approach, explores the benefits, shortcomings, and risks of using transformer models in healthcare. It analyses responses to seven open-ended questions using a simplified thematic analysis. Our research reveals seven benefits, including improved operational efficiency, optimized processes and refined clinical documentation. Despite these benefits, there are significant concerns about the introduction of bias, auditability issues and privacy risks. Challenges include the need for specialized expertise, the emergence of ethical dilemmas and the potential reduction in the human element of patient care. For the medical profession, risks include the impact on employment, changes in the patient-doctor dynamic, and the need for extensive training in both system operation and data interpretation.
Collapse
Affiliation(s)
- Kerstin Denecke
- Institute Patient-centered Digital Health, Bern University of Applied Sciences, Quellgasse 21, Biel, 2502, Switzerland.
| | - Richard May
- Harz University of Applied Sciences, Friedrichstraße 57-59, 38855, Wernigerode, Germany
| | - Octavio Rivera-Romero
- Instituto de Ingeniería Informática (I3US), Universidad de Sevilla, Sevilla, Spain
- Department of Electronic Technology, Universidad de Sevilla, Avda Reina Mercedes s/n, ETSI Informática, G1.43, Sevilla, 41012, Spain
| |
Collapse
|
26
|
Maki S, Furuya T, Inoue M, Shiga Y, Inage K, Eguchi Y, Orita S, Ohtori S. Machine Learning and Deep Learning in Spinal Injury: A Narrative Review of Algorithms in Diagnosis and Prognosis. J Clin Med 2024; 13:705. [PMID: 38337399 PMCID: PMC10856760 DOI: 10.3390/jcm13030705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2023] [Revised: 01/14/2024] [Accepted: 01/18/2024] [Indexed: 02/12/2024] Open
Abstract
Spinal injuries, including cervical and thoracolumbar fractures, continue to be a major public health concern. Recent advancements in machine learning and deep learning technologies offer exciting prospects for improving both diagnostic and prognostic approaches in spinal injury care. This narrative review systematically explores the practical utility of these computational methods, with a focus on their application in imaging techniques such as computed tomography (CT) and magnetic resonance imaging (MRI), as well as in structured clinical data. Of the 39 studies included, 34 were focused on diagnostic applications, chiefly using deep learning to carry out tasks like vertebral fracture identification, differentiation between benign and malignant fractures, and AO fracture classification. The remaining five were prognostic, using machine learning to analyze parameters for predicting outcomes such as vertebral collapse and future fracture risk. This review highlights the potential benefit of machine learning and deep learning in spinal injury care, especially their roles in enhancing diagnostic capabilities, detailed fracture characterization, risk assessments, and individualized treatment planning.
Collapse
Affiliation(s)
- Satoshi Maki
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
- Center for Frontier Medical Engineering, Chiba University, Chiba 263-8522, Japan
| | - Takeo Furuya
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
| | - Masahiro Inoue
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
| | - Yasuhiro Shiga
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
| | - Kazuhide Inage
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
| | - Yawara Eguchi
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
| | - Sumihisa Orita
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
- Center for Frontier Medical Engineering, Chiba University, Chiba 263-8522, Japan
| | - Seiji Ohtori
- Department of Orthopaedic Surgery, Graduate School of Medicine, Chiba University, Chiba 260-8670, Japan
| |
Collapse
|
27
|
Larson HJ, Lin L. Generative artificial intelligence can have a role in combating vaccine hesitancy. BMJ 2024; 384:q69. [PMID: 38228351 PMCID: PMC10789191 DOI: 10.1136/bmj.q69] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/18/2024]
Affiliation(s)
- Heidi J Larson
- London School of Hygiene and Tropical Medicine, London, UK
- Institute for Health Metrics and Evaluation, University of Washington, Seattle, USA
| | - Leesa Lin
- London School of Hygiene and Tropical Medicine, London, UK
- Laboratory of Data Discovery for Health, Science Park, Hong Kong
| |
Collapse
|
28
|
Shaban-Nejad A, Michalowski M, Bianco S. Creative and generative artificial intelligence for personalized medicine and healthcare: Hype, reality, or hyperreality? Exp Biol Med (Maywood) 2023; 248:2497-2499. [PMID: 38311873 PMCID: PMC10854468 DOI: 10.1177/15353702241226801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2024] Open
Affiliation(s)
- Arash Shaban-Nejad
- UTHSC-ORNL Center for Biomedical Informatics and Department of Pediatrics, College of Medicine, The University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Martin Michalowski
- School of Nursing, University of Minnesota—Twin Cities, Minneapolis, MN 55455, USA
| | - Simone Bianco
- Altos Labs Bay Area Institute of Science, Redwood City, CA 94065, USA
| |
Collapse
|