1
Li X, Guo H, Li D, Zheng Y. Engine of Innovation in Hospital Pharmacy: Applications and Reflections of ChatGPT. J Med Internet Res 2024;26:e51635. PMID: 39365643. DOI: 10.2196/51635.
Abstract
Hospital pharmacy plays an important role in ensuring the quality and safety of medical care, especially in the areas of drug information retrieval, therapy guidance, and drug-drug interaction management. ChatGPT is a powerful artificial intelligence language model that can generate natural-language text. Here, we explored the applications and reflections of ChatGPT in hospital pharmacy, where it may enhance the quality and efficiency of pharmaceutical care. We also explored ChatGPT's prospects in hospital pharmacy and discussed its working principle, diverse applications, and practical cases in daily operations and scientific research. The challenges and limitations of ChatGPT, such as data privacy, ethical issues, bias and discrimination, and the need for human oversight, are also discussed. ChatGPT is a promising tool for hospital pharmacy, but it requires careful evaluation and validation before it can be integrated into clinical practice. Suggestions for future research and development of ChatGPT in hospital pharmacy are provided.
Affiliation(s)
- Xingang Li: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Heng Guo: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Dandan Li: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Yingming Zheng: Department of Pharmacy, Beijing Friendship Hospital, Capital Medical University, Beijing, China
2
Finch L, Broach V, Feinberg J, Al-Niaimi A, Abu-Rustum NR, Zhou Q, Iasonos A, Chi DS. ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? A Memorial Sloan Kettering Cancer Center Team Ovary study. Gynecol Oncol 2024;189:75-79. PMID: 39042956. PMCID: PMC11402584. DOI: 10.1016/j.ygyno.2024.07.007.
Abstract
OBJECTIVES: We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer.
METHODS: Using NCCN Guidelines, we generated 10 questions and answers regarding management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate and incomplete, and 2 as accurate and complete. Evaluations were compared among NCCN, unprompted GPT, and prompted GPT answers.
RESULTS: Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate.
CONCLUSIONS: GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.
Affiliation(s)
- Lindsey Finch: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Vance Broach: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Jacqueline Feinberg: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Ahmed Al-Niaimi: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Nadeem R Abu-Rustum: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
- Qin Zhou: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Alexia Iasonos: Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dennis S Chi: Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA
3
McClymont H, Lambert SB, Barr I, Vardoulakis S, Bambrick H, Hu W. Internet-based Surveillance Systems and Infectious Diseases Prediction: An Updated Review of the Last 10 Years and Lessons from the COVID-19 Pandemic. J Epidemiol Glob Health 2024;14:645-657. PMID: 39141074. PMCID: PMC11442909. DOI: 10.1007/s44197-024-00272-y.
Abstract
The last decade has seen major advances in internet-based surveillance for infectious diseases, driven by greater computational capacity, the growing adoption of smart devices, and the increased availability of artificial intelligence (AI), alongside environmental pressures, including climate and land-use change, that have contributed to the increased threat and spread of pandemics and emerging infectious diseases. With the increasing burden of infectious diseases and the experience of the COVID-19 pandemic, the need to develop novel technologies and integrate internet-based data approaches into infectious disease surveillance is greater than ever. In this systematic review, we searched the scientific literature for research on internet-based or digital surveillance for influenza, dengue fever, and COVID-19 from 2013 to 2023. We provide an overview of recent internet-based surveillance research for emerging infectious diseases (EIDs), describe changes in the digital landscape, and offer recommendations for future research directed at public health policymakers, healthcare providers, and government health departments to enhance traditional surveillance for detecting, monitoring, reporting, and responding to influenza, dengue, and COVID-19.
Affiliation(s)
- Hannah McClymont: Ecosystem Change and Population Health (ECAPH) Research Group, School of Public Health and Social Work, Queensland University of Technology (QUT), Brisbane, Australia
- Stephen B Lambert: Communicable Diseases Branch, Queensland Health, Brisbane, Australia; National Centre for Immunisation Research and Surveillance, Sydney Children's Hospitals Network, Westmead, Australia
- Ian Barr: WHO Collaborating Centre for Reference and Research on Influenza, The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia; Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
- Sotiris Vardoulakis: Health Research Institute, University of Canberra, Canberra, Australia; Healthy Environments and Lives (HEAL) National Research Network, Canberra, Australia
- Hilary Bambrick: National Centre for Epidemiology and Population Health, College of Health and Medicine, The Australian National University, Canberra, Australia
- Wenbiao Hu: Ecosystem Change and Population Health (ECAPH) Research Group, School of Public Health and Social Work, Queensland University of Technology (QUT), Brisbane, Australia; Healthy Environments and Lives (HEAL) National Research Network, Canberra, Australia
4
Dashti M, Ghasemi S, Ghadimi N, Hefzi D, Karimian A, Zare N, Fahimipour A, Khurshid Z, Chafjiri MM, Ghaedsharaf S. Performance of ChatGPT 3.5 and 4 on U.S. dental examinations: the INBDE, ADAT, and DAT. Imaging Sci Dent 2024;54:271-275. PMID: 39371301. PMCID: PMC11450412. DOI: 10.5624/isd.20240037.
Abstract
Purpose: Recent advancements in artificial intelligence (AI), particularly tools such as ChatGPT developed by OpenAI, a U.S.-based AI research organization, have transformed the healthcare and education sectors. This study investigated the effectiveness of ChatGPT in answering dentistry exam questions, demonstrating its potential to enhance professional practice and patient care.
Materials and Methods: This study assessed the performance of ChatGPT 3.5 and 4 on U.S. dental exams - specifically, the Integrated National Board Dental Examination (INBDE), Dental Admission Test (DAT), and Advanced Dental Admission Test (ADAT) - excluding image-based questions. Using customized prompts, ChatGPT's answers were evaluated against official answer sheets.
Results: ChatGPT 3.5 and 4 were tested with 253 questions from the INBDE, ADAT, and DAT exams. For the INBDE, both versions achieved 80% accuracy in knowledge-based questions and 66-69% in case history questions. In ADAT, they scored 66-83% in knowledge-based and 76% in case history questions. ChatGPT 4 excelled on the DAT, with 94% accuracy in knowledge-based questions, 57% in mathematical analysis items, and 100% in comprehension questions, surpassing ChatGPT 3.5's rates of 83%, 31%, and 82%, respectively. The difference was significant for knowledge-based questions (P=0.009). Both versions showed similar patterns in incorrect responses.
Conclusion: Both ChatGPT 3.5 and 4 effectively handled knowledge-based, case history, and comprehension questions, with ChatGPT 4 being more reliable and surpassing the performance of 3.5. ChatGPT 4's perfect score in comprehension questions underscores its trainability in specific subjects. However, both versions exhibited weaker performance in mathematical analysis, suggesting this as an area for improvement.
Affiliation(s)
- Mahmood Dashti: Dentofacial Deformities Research Center, Research Institute of Dental Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Shohreh Ghasemi: Department of Trauma and Craniofacial Reconstruction, Queen Mary College, London, England
- Niloofar Ghadimi: Department of Oral and Maxillofacial Radiology, Dental School, Islamic Azad University of Medical Sciences, Tehran, Iran
- Delband Hefzi: School of Dentistry, Tehran University of Medical Science, Tehran, Iran
- Azizeh Karimian: Department of Biostatistics, Dental Research Center, Golestan University of Medical Sciences, Gorgan, Iran
- Niusha Zare: Department of Operative Dentistry, University of Southern California, CA, USA
- Amir Fahimipour: Discipline of Oral Surgery, Medicine and Diagnostics, School of Dentistry, Faculty of Medicine and Health, Westmead Centre for Oral Health, The University of Sydney, Sydney, Australia
- Zohaib Khurshid: Department of Prosthodontics and Dental Implantology, King Faisal University, Al Ahsa, Kingdom of Saudi Arabia
- Maryam Mohammadalizadeh Chafjiri: Department of Oral and Maxillofacial Pathology, School of Dentistry, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Sahar Ghaedsharaf: Department of Oral and Maxillofacial Radiology, School of Dentistry, Shahid Beheshti University of Medical Sciences, Tehran, Iran
5
Luo MJ, Pang J, Bi S, Lai Y, Zhao J, Shang Y, Cui T, Yang Y, Lin Z, Zhao L, Wu X, Lin D, Chen J, Lin H. Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology. JAMA Ophthalmol 2024;142:798-805. PMID: 39023885. PMCID: PMC11258636. DOI: 10.1001/jamaophthalmol.2024.2513.
Abstract
Importance: Although augmenting large language models (LLMs) with knowledge bases may improve medical domain-specific performance, practical methods are needed for local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals.
Objective: To develop an accurate, cost-effective local implementation of an LLM to mitigate privacy concerns and support practical deployment in health care settings.
Design, Setting, and Participants: ChatZOC (Sun Yat-Sen University Zhongshan Ophthalmology Center), a retrieval-augmented LLM framework, was developed by enhancing a baseline LLM with a comprehensive ophthalmic dataset and evaluation framework (CODE), which includes over 30,000 pieces of ophthalmic knowledge. This LLM was benchmarked against 10 representative LLMs, including GPT-4 and GPT-3.5 Turbo (OpenAI), across 300 clinical questions in ophthalmology. The evaluation, involving a panel of medical experts and biomedical researchers, focused on accuracy, utility, and safety. A double-masked approach was used to minimize assessment bias across all models. The study used a comprehensive knowledge base derived from ophthalmic clinical practice, without directly involving clinical patients.
Exposures: LLM responses to clinical questions.
Main Outcomes and Measures: Accuracy, utility, and safety of LLMs in responding to clinical questions.
Results: The baseline model achieved a human ranking score of 0.48. The retrieval-augmented LLM scored 0.60, a difference of 0.12 (95% CI, 0.02-0.22; P = .02) from baseline and not different from GPT-4, which scored 0.61 (difference, 0.01; 95% CI, -0.11 to 0.13; P = .89). For scientific consensus, the retrieval-augmented LLM reached 84.0% versus 46.5% for the baseline model (difference, 37.5%; 95% CI, 29.0%-46.0%; P < .001) and was not different from GPT-4 at 79.2% (difference, 4.8%; 95% CI, -0.3% to 10.0%; P = .06).
Conclusions and Relevance: Results of this quality improvement study suggest that integrating high-quality knowledge bases improved the LLM's performance in medical domains. This study highlights the transformative potential of augmented LLMs in clinical practice by providing reliable, safe, and practical clinical information. Further research is needed to explore the broader application of such frameworks in the real world.
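The retrieval-augmented pattern evaluated in this study can be sketched in miniature. The snippet below is an illustrative sketch only: it uses a toy keyword-overlap retriever and hypothetical ophthalmic knowledge snippets, not the actual ChatZOC knowledge base or model, and the `retrieve` and `build_prompt` helpers are invented for illustration.

```python
import re

def _tokens(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Rank knowledge snippets by token overlap with the query (toy retriever)."""
    q = _tokens(query)
    ranked = sorted(knowledge_base, key=lambda doc: len(q & _tokens(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, knowledge_base: list[str]) -> str:
    """Prepend the retrieved snippets so the model answers from curated knowledge."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, knowledge_base))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical snippets standing in for a curated ophthalmic dataset
kb = [
    "Primary open-angle glaucoma is managed by lowering intraocular pressure.",
    "Cataract surgery replaces the clouded lens with an intraocular lens.",
    "Diabetic retinopathy screening uses dilated fundus examination.",
]
prompt = build_prompt("How is glaucoma managed?", kb)
```

In a real deployment, the overlap scorer would be replaced by a dense-embedding retriever and the assembled prompt would be passed to the locally hosted LLM.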
Affiliation(s)
- Ming-Jie Luo, Jianyu Pang, Shaowei Bi, Yunxi Lai, Jiaman Zhao, Tingxin Cui, Yahan Yang, Zhenzhe Lin, Lanqin Zhao, Xiaohang Wu, Duoru Lin, Jingjing Chen: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China
- Yuanrui Shang: The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
- Haotian Lin: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, China; Center for Precision Medicine and Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, Guangdong, China; Hainan Eye Hospital and Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Haikou, China
6
Zhang Q, Wu Z, Song J, Luo S, Chai Z. Comprehensiveness of Large Language Models in Patient Queries on Gingival and Endodontic Health. Int Dent J 2024:S0020-6539(24)00195-3. PMID: 39147663. DOI: 10.1016/j.identj.2024.06.022.
Abstract
AIM: Given the increasing interest in using large language models (LLMs) for self-diagnosis, this study aimed to evaluate the comprehensiveness of two prominent LLMs, ChatGPT-3.5 and ChatGPT-4, in addressing common queries related to gingival and endodontic health across different language contexts and query types.
METHODS: We assembled a set of 33 common real-life questions related to gingival and endodontic healthcare, including 17 common-sense questions and 16 expert questions. Each question was presented to the LLMs in both English and Chinese. Three specialists were invited to evaluate the comprehensiveness of the responses on a five-point Likert scale, where a higher score indicated a higher-quality response.
RESULTS: The LLMs performed significantly better in English, with an average score of 4.53, compared to 3.95 in Chinese (Mann-Whitney U test, P < .05). Responses to common-sense questions received higher scores than those to expert questions, with averages of 4.46 and 4.02 (Mann-Whitney U test, P < .05). Between the LLMs, ChatGPT-4 consistently outperformed ChatGPT-3.5, achieving average scores of 4.45 and 4.03 (Mann-Whitney U test, P < .05).
CONCLUSIONS: ChatGPT-4 provides more comprehensive responses than ChatGPT-3.5 for queries related to gingival and endodontic health. Both LLMs perform better in English and on common-sense questions. However, the performance discrepancies across language contexts and the presence of inaccurate responses suggest that further evaluation and understanding of their limitations are crucial to avoid potential misunderstandings.
CLINICAL RELEVANCE: This study revealed performance differences between ChatGPT-3.5 and ChatGPT-4 in handling gingival and endodontic health issues across different language contexts, providing insights into the comprehensiveness and limitations of LLMs in addressing common oral healthcare queries.
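The Mann-Whitney U test used for these comparisons is a rank-based test suited to ordinal Likert scores. As a minimal sketch, assuming hypothetical ratings rather than the study's data, the U statistic can be computed by pairwise comparison; a real analysis would use `scipy.stats.mannwhitneyu`, which also handles tie corrections and the p-value.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> float:
    """U statistic for sample x: count pairs (xi, yj) with xi > yj,
    counting ties as 0.5. No distributional assumptions are needed."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical five-point Likert ratings for responses in two languages
english_scores = [5, 5, 4, 5, 4]
chinese_scores = [4, 3, 4, 5, 3]
u_stat = mann_whitney_u(english_scores, chinese_scores)
```

The two U statistics for complementary orderings always sum to the number of pairs, n times m, which gives a quick sanity check on any implementation.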
Affiliation(s)
- Qian Zhang: College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Zhengyu Wu: College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Jinlin Song: College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
- Shuicai Luo: Quanzhou Institute of Equipment Manufacturing, Haixi Institute, Chinese Academy of Sciences, Quanzhou, China
- Zhaowu Chai: College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China
7
Mutschler E, Roloff T, Neves A, Vangstein Aamot H, Rodriguez-Sanchez B, Ramirez M, Rossen J, Couto N, Novais Â, Howden BP, Brisse S, Reuter S, Nolte O, Egli A, Seth-Smith HMB. Towards unified reporting of genome sequencing results in clinical microbiology. PeerJ 2024;12:e17673. PMID: 39131622. PMCID: PMC11317035. DOI: 10.7717/peerj.17673.
Abstract
Whole genome sequencing (WGS) has become a vital tool in clinical microbiology, playing an important role in outbreak investigations, molecular surveillance, and the identification of bacterial species, resistance mechanisms, and virulence factors. However, the complexity of WGS data presents challenges in interpretation and reporting, requiring tailored strategies to enhance efficiency and impact. This study explores the diverse needs of key stakeholders in healthcare, including clinical management, laboratory work, public surveillance and epidemiology, infection prevention and control, and academic research, regarding WGS-based reporting of clinically relevant bacterial species. To determine preferences regarding WGS reports, a human-centered design approach was employed, involving an online survey and a subsequent workshop with stakeholders. The survey gathered responses from 64 participants representing these healthcare sectors across geographical regions. Key findings include the identification of barriers related to data accessibility, integration with patient records, and the complexity of interpreting WGS results. As the participants designed their ideal report using nine pre-defined sections of a typical WGS report, differences in needs regarding report structure and content across stakeholders became evident. The workshop discussions further highlighted the need to feature critical findings and quality metrics prominently in reports, as well as the demand for flexible report designs. Commonalities were observed across stakeholder-specific reporting templates, such as the uniform ranking of certain report sections, but preferences regarding the depth of content within these sections varied. Using these findings, we suggest stakeholder-specific structures that should be considered when designing customized reporting templates.
In conclusion, this study underscores the importance of tailoring WGS-based reports of clinically relevant bacteria to meet the distinct needs of diverse healthcare stakeholders. The evolving landscape of digital reporting increases the opportunities for WGS reporting and its utility in managing infectious diseases and public health surveillance.
Affiliation(s)
- Eugenio Mutschler: Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
- Tim Roloff: Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
- Aitana Neves: Swiss Institute of Bioinformatics, Geneva, Switzerland
- Mario Ramirez: Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- John Rossen: University Medical Center Groningen, Zwolle, Netherlands
- Natacha Couto: Centre for Genomic Pathogen Surveillance, Pandemic Sciences Institute, University of Oxford, Oxford, United Kingdom
- Ângela Novais: UCIBIO, Applied Molecular Biosciences Unit, Department of Biological Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal; Associate Laboratory i4HB-Institute for Health and Bioeconomy, Faculty of Pharmacy, University of Porto, Porto, Portugal
- Sandra Reuter: Medical Center, University of Freiburg, Freiburg, Germany
- Oliver Nolte: Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
- Adrian Egli: Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
- The ESCMID Study Group for Epidemiological Markers (ESGEM) and ESCMID Study Group for Genomic and Molecular Diagnostics (ESGMD): Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland; Akershus University Hospital, Lorenskog, Norway; Hospital Gregorio Marañon, Madrid, Spain; Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal; University Medical Center Groningen, Zwolle, Netherlands; Centre for Genomic Pathogen Surveillance, Pandemic Sciences Institute, University of Oxford, Oxford, United Kingdom; UCIBIO, Applied Molecular Biosciences Unit, Department of Biological Sciences, Faculty of Pharmacy, University of Porto, Porto, Portugal; Associate Laboratory i4HB-Institute for Health and Bioeconomy, Faculty of Pharmacy, University of Porto, Porto, Portugal; University of Melbourne, Parkville, Australia; Institut Pasteur, Paris, France; Medical Center, University of Freiburg, Freiburg, Germany
8
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024;16:261-288. PMID: 38955920. DOI: 10.1007/s12539-024-00626-x.
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structures is critical to understanding those functions. In many cases, it is not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicted interchain residue contacts, experimental data from sources such as cryo-EM, and known protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics for protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work provides comprehensive knowledge of complex structure prediction and contributes to future advances.
Affiliation(s)
- Nan Zhao: Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China; School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu: Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China; School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang: Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China; School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang: School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong: Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China; School of Mathematics, Renmin University of China, Beijing, 100872, China; Beijing Academy of Artificial Intelligence, Beijing, 100084, China
| |
Collapse
|
9
|
Grimm DR, Lee YJ, Hu K, Liu L, Garcia O, Balakrishnan K, Ayoub NF. The utility of ChatGPT as a generative medical translator. Eur Arch Otorhinolaryngol 2024:10.1007/s00405-024-08708-8. [PMID: 38705894 DOI: 10.1007/s00405-024-08708-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Accepted: 04/24/2024] [Indexed: 05/07/2024]
Abstract
PURPOSE Large language models continue to dramatically change the medical landscape. We aimed to explore the utility of ChatGPT in providing accurate, actionable, and understandable generative medical translations in English, Spanish, and Mandarin pertaining to otolaryngology. METHODS Responses of GPT-4 to commonly asked patient questions listed in official otolaryngology clinical practice guidelines (CPGs) were evaluated with the Patient Education Materials Assessment Tool-Printable (PEMAT-P). Additional critical elements were identified a priori to evaluate ChatGPT's accuracy and thoroughness in its responses. Multiple fluent speakers of English, Mandarin, and Spanish evaluated each response generated by ChatGPT. RESULTS Total PEMAT-P scores differed between English, Mandarin, and Spanish GPT-4-generated responses, depicting a moderate effect size of language (eta-squared 0.07), with scores ranging from 73 to 77 (P = .03). Overall understandability scores did not differ between English, Mandarin, and Spanish, depicting a small effect size of language (eta-squared 0.02), with scores ranging from 76 to 79 (P = .17); nor did overall actionability scores (eta-squared 0), with scores ranging from 66 to 73 (P = .44). Overall a priori procedure-specific responses similarly did not differ between English, Spanish, and Mandarin (eta-squared 0.02), with scores ranging from 61 to 78 (P = .22). CONCLUSION GPT-4 produces accurate, understandable, and actionable outputs in English, Spanish, and Mandarin. Responses generated by GPT-4 in Spanish and Mandarin are comparable to their English counterparts, indicating a novel use for these models within otolaryngology and implications for bridging healthcare access and literacy gaps. LEVEL OF EVIDENCE IV.
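The moderate and small effect sizes reported above are eta-squared values from a one-way ANOVA: the between-group sum of squares divided by the total sum of squares. A minimal sketch of that computation; the per-rater scores below are hypothetical, not the study's data:

```python
# Eta-squared effect size: SS_between / SS_total.
# The rater scores are hypothetical, for illustration only.
def eta_squared(groups):
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total

english = [77, 75, 76]
mandarin = [73, 74, 72]
spanish = [74, 73, 75]
print(round(eta_squared([english, mandarin, spanish]), 2))
```

Values near 0.01 are conventionally read as small effects and values near 0.06 as moderate, which matches how the abstract characterizes its 0.02 and 0.07 results.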
Collapse
Affiliation(s)
- David R Grimm
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Yu-Jin Lee
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Katherine Hu
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Longsha Liu
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Omar Garcia
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Karthik Balakrishnan
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Noel F Ayoub
- Division of Pediatric Otolaryngology, Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, Stanford, CA, 94305, USA.
- Division of Rhinology and Skull Base Surgery, Department of Otolaryngology-Head and Neck Surgery, Mass Eye and Ear, 243 Charles Street, Boston, MA, 02114, USA.
| |
Collapse
|
10
|
Ranjan J, Ahmad A, Subudhi M, Kumar A. Assessment of Artificial Intelligence Platforms With Regard to Medical Microbiology Knowledge: An Analysis of ChatGPT and Gemini. Cureus 2024; 16:e60675. [PMID: 38770053 PMCID: PMC11104281 DOI: 10.7759/cureus.60675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/20/2024] [Indexed: 05/22/2024] Open
Abstract
The performance of two artificial intelligence (AI) platforms, ChatGPT 3.5 (OpenAI, California, United States) and Gemini (Google AI, California, United States) was assessed by answering 200 questions of microbiology drawn from validated sources. The questions were selected from topics such as General Microbiology, Immunology, and Microbiology Applied to Infectious Diseases. The study was conducted from December 2023 to March 2024, and the responses of the different AI platforms were compared with an answer key. Statistical analysis was performed to assess accuracy. ChatGPT 3.5 and Gemini had comparable accuracy with correct response scores of 71% and 70.5%, respectively. Their performance varied across different sections. Gemini performed better in General Microbiology and Immunology, and ChatGPT 3.5 had a better score in the Applied Microbiology section. The study's findings highlight that AI platforms such as ChatGPT and Gemini can be utilized in microbiology and medical education. The evolution and continuous updating of AI platforms are required to improve their performance.
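The near-identical accuracies above (71% vs 70.5% on 200 questions each) can be compared with a standard two-proportion z-test. A sketch under hypothetical correct-answer counts chosen to match the reported percentages:

```python
from math import sqrt, erf

# Hypothetical counts matching the reported accuracies on 200 questions:
# ChatGPT 3.5: 142/200 = 71%; Gemini: 141/200 = 70.5%.
def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_two_sided = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_two_sided

z, p = two_proportion_z(142, 200, 141, 200)
print(f"z = {z:.3f}, p = {p:.3f}")  # tiny difference -> not statistically significant
```

A 0.5-point accuracy gap at this sample size is well within sampling noise, consistent with the abstract's description of the two platforms as comparable.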
Collapse
Affiliation(s)
- Jai Ranjan
- Microbiology, All India Institute of Medical Sciences, Bathinda, Bathinda, IND
| | - Absar Ahmad
- Animal Genetics and Breeding, Faculty of Veterinary Science and Animal Husbandry, Birsa Agricultural University, Ranchi, IND
| | - Monalisa Subudhi
- Microbiology, Institute of Medical Sciences and SUM-II Hospital, Bhubaneswar, IND
| | - Ajay Kumar
- Microbiology, Manipal Tata Medical College, Manipal Academy of Higher Education, Manipal, IND
| |
Collapse
|
11
|
Mastrokostas PG, Mastrokostas LE, Emara AK, Wellington IJ, Ginalis E, Houten JK, Khalsa AS, Saleh A, Razi AE, Ng MK. GPT-4 as a Source of Patient Information for Anterior Cervical Discectomy and Fusion: A Comparative Analysis Against Google Web Search. Global Spine J 2024:21925682241241241. [PMID: 38513636 DOI: 10.1177/21925682241241241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 03/23/2024] Open
Abstract
STUDY DESIGN Comparative study. OBJECTIVES This study aims to compare Google and GPT-4 in terms of (1) question types, (2) response readability, (3) source quality, and (4) numerical response accuracy for the top 10 most frequently asked questions (FAQs) about anterior cervical discectomy and fusion (ACDF). METHODS "Anterior cervical discectomy and fusion" was searched on Google and GPT-4 on December 18, 2023. Top 10 FAQs were classified according to the Rothwell system. Source quality was evaluated using JAMA benchmark criteria and readability was assessed using Flesch Reading Ease and Flesch-Kincaid grade level. Differences in JAMA scores, Flesch-Kincaid grade level, Flesch Reading Ease, and word count between platforms were analyzed using Student's t-tests. Statistical significance was set at the .05 level. RESULTS Frequently asked questions from Google were varied, while GPT-4 focused on technical details and indications/management. GPT-4 showed a higher Flesch-Kincaid grade level (12.96 vs 9.28, P = .003), lower Flesch Reading Ease score (37.07 vs 54.85, P = .005), and higher JAMA scores for source quality (3.333 vs 1.800, P = .016). Numerically, 6 out of 10 responses varied between platforms, with GPT-4 providing broader recovery timelines for ACDF. CONCLUSIONS This study demonstrates GPT-4's ability to elevate patient education by providing high-quality, diverse information tailored to those with advanced literacy levels. As AI technology evolves, refining these tools for accuracy and user-friendliness remains crucial, catering to patients' varying literacy levels and information needs in spine surgery.
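The readability scores above follow the standard Flesch formulas, which depend only on word, sentence, and syllable counts. A sketch of both formulas; the sample counts are hypothetical:

```python
# Standard Flesch readability formulas. In practice the counts come from
# the response text; the sample values below are hypothetical.
def flesch_reading_ease(words, sentences, syllables):
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Longer sentences and more syllables per word push Reading Ease down
# and grade level up, as seen in GPT-4's denser responses:
print(round(flesch_reading_ease(300, 15, 540), 1))
print(round(flesch_kincaid_grade(300, 15, 540), 1))
```

This makes the study's paired findings unsurprising: any text that scores a higher Flesch-Kincaid grade level will tend to score a lower Flesch Reading Ease, since both formulas share the same two ratios with opposite signs.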
Collapse
Affiliation(s)
- Paul G Mastrokostas
- College of Medicine, State University of New York (SUNY) Downstate, Brooklyn, NY, USA
| | | | - Ahmed K Emara
- Department of Orthopaedic Surgery, Cleveland Clinic, Cleveland, OH, USA
| | - Ian J Wellington
- Department of Orthopaedic Surgery, University of Connecticut, Hartford, CT, USA
| | | | - John K Houten
- Department of Neurosurgery, Mount Sinai School of Medicine, New York, NY, USA
| | - Amrit S Khalsa
- Department of Orthopaedic Surgery, University of Pennsylvania, Philadelphia, PA, USA
| | - Ahmed Saleh
- Department of Orthopaedic Surgery, Maimonides Medical Center, Brooklyn, NY, USA
| | - Afshin E Razi
- Department of Orthopaedic Surgery, Maimonides Medical Center, Brooklyn, NY, USA
| | - Mitchell K Ng
- Department of Orthopaedic Surgery, Maimonides Medical Center, Brooklyn, NY, USA
| |
Collapse
|
12
|
Langford BJ, Branch-Elliman W, Nori P, Marra AR, Bearman G. Confronting the Disruption of the Infectious Diseases Workforce by Artificial Intelligence: What This Means for Us and What We Can Do About It. Open Forum Infect Dis 2024; 11:ofae053. [PMID: 38434616 PMCID: PMC10906702 DOI: 10.1093/ofid/ofae053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Accepted: 01/26/2024] [Indexed: 03/05/2024] Open
Abstract
With the rapid advancement of artificial intelligence (AI), the field of infectious diseases (ID) faces both innovation and disruption. AI and its subfields including machine learning, deep learning, and large language models can support ID clinicians' decision making and streamline their workflow. AI models may help ensure earlier detection of disease, more personalized empiric treatment recommendations, and allocation of human resources to support higher-yield antimicrobial stewardship and infection prevention strategies. AI is unlikely to replace the role of ID experts, but could instead augment it. However, its limitations will need to be carefully addressed and mitigated to ensure safe and effective implementation. ID experts can be engaged in AI implementation by participating in training and education, identifying use cases for AI to help improve patient care, designing, validating and evaluating algorithms, and continuing to advocate for their vital role in patient care.
Collapse
Affiliation(s)
- Bradley J Langford
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Hotel Dieu Shaver Health and Rehabilitation Centre, Department of Pharmacy, St Catharines, Ontario, Canada
| | - Westyn Branch-Elliman
- Department of Medicine, Section of Infectious Diseases, Veterans Affairs Boston Healthcare System, Boston, Massachusetts, USA
- National Artificial Intelligence Institute, Department of Veterans Affairs, Washington, District of Columbia, USA
- Harvard Medical School, Boston, Massachusetts, USA
| | - Priya Nori
- Division of Infectious Diseases, Department of Medicine, Montefiore Health System, Albert Einstein College of Medicine, Bronx, New York, USA
| | - Alexandre R Marra
- Instituto Israelita de Ensino e Pesquisa Albert Einstein, Hospital Israelita Albert Einstein, São Paulo, Brazil
- Department of Internal Medicine, University of Iowa Carver College of Medicine, Iowa City, Iowa, USA
| | - Gonzalo Bearman
- Division of Infectious Diseases, Virginia Commonwealth University Health, Virginia Commonwealth University, Richmond, Virginia, USA
| |
Collapse
|
13
|
Chakraborty C, Pal S, Bhattacharya M, Islam MA. ChatGPT or LLMs can provide treatment suggestions for critical patients with antibiotic-resistant infections: a next-generation revolution for medical science? Int J Surg 2024; 110:1829-1831. [PMID: 38085845 PMCID: PMC10942188 DOI: 10.1097/js9.0000000000000987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2023] [Accepted: 11/27/2023] [Indexed: 03/16/2024]
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, India
| | - Soumen Pal
- School of Mechanical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha, India
| | - Md. Aminul Islam
- COVID-19 Diagnostic Lab, Department of Microbiology, Noakhali Science and Technology University, Noakhali
- Advanced Molecular Lab, Department of Microbiology, President Abdul Hamid Medical College, Karimganj, Kishoreganj, Bangladesh
| |
Collapse
|
14
|
Andrew A. Potential applications and implications of large language models in primary care. Fam Med Community Health 2024; 12:e002602. [PMID: 38290759 PMCID: PMC10828839 DOI: 10.1136/fmch-2023-002602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Accepted: 01/16/2024] [Indexed: 02/01/2024] Open
Abstract
The recent release of highly advanced generative artificial intelligence (AI) chatbots, including ChatGPT and Bard, which are powered by large language models (LLMs), has attracted growing mainstream interest in their diverse applications in clinical practice, including in health and healthcare. The potential applications of LLM-based programmes in the medical field range from assisting medical practitioners in improving their clinical decision-making and streamlining administrative paperwork to empowering patients to take charge of their own health. However, despite the broad range of benefits, the use of such AI tools also comes with several limitations and ethical concerns that warrant further consideration, encompassing issues related to privacy, data bias, and the accuracy and reliability of information generated by AI. The focus of prior research has primarily centred on the broad applications of LLMs in medicine. To the author's knowledge, this is the first article that consolidates current and pertinent literature on LLMs to examine their potential in primary care. The objectives of this paper are not only to summarise the potential benefits, risks, and challenges of using LLMs in primary care, but also to offer insights into considerations that primary care clinicians should take into account when deciding to adopt and integrate such technologies into their clinical practice.
Collapse
Affiliation(s)
- Albert Andrew
- Medical Student, The University of Auckland School of Medicine, Auckland, New Zealand
| |
Collapse
|
15
|
Roemer G, Li A, Mahmood U, Dauer L, Bellamy M. Artificial intelligence model GPT4 narrowly fails simulated radiological protection exam. JOURNAL OF RADIOLOGICAL PROTECTION : OFFICIAL JOURNAL OF THE SOCIETY FOR RADIOLOGICAL PROTECTION 2024; 44:013502. [PMID: 38232401 DOI: 10.1088/1361-6498/ad1fdf] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/10/2023] [Accepted: 01/17/2024] [Indexed: 01/19/2024]
Abstract
This study assesses the efficacy of Generative Pre-Trained Transformers (GPT) published by OpenAI in the specialised domains of radiological protection and health physics. Utilising a set of 1064 surrogate questions designed to mimic a health physics certification exam, we evaluated the models' ability to accurately respond to questions across five knowledge domains. Our results indicated that neither model met the 67% passing threshold, with GPT-3.5 achieving a 45.3% weighted average and GPT-4 attaining 61.7%. Despite GPT-4's significant parameter increase and multimodal capabilities, it demonstrated superior performance in all categories yet still fell short of a passing score. The study's methodology involved a simple, standardised prompting strategy without employing prompt engineering or in-context learning, which are known to potentially enhance performance. The analysis revealed that GPT-3.5 formatted answers more correctly, despite GPT-4's higher overall accuracy. The findings suggest that while GPT-3.5 and GPT-4 show promise in handling domain-specific content, their application in the field of radiological protection should be approached with caution, emphasising the need for human oversight and verification.
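The weighted averages above aggregate per-domain accuracy by question count. A sketch of that arithmetic; the domain names, question counts, and per-domain accuracies below are hypothetical, chosen only to total the study's 1064 questions and to land near the reported 61.7% aggregate:

```python
# Weighted exam score: per-domain accuracy weighted by question count,
# compared against the 67% passing threshold from the abstract.
# Domain names, counts, and accuracies are hypothetical.
def weighted_score(domains):
    total = sum(n for _, n, _ in domains)
    return sum(n * acc for _, n, acc in domains) / total

domains = [
    ("dosimetry",       300, 0.58),
    ("regulations",     250, 0.66),
    ("instrumentation", 200, 0.62),
    ("biology",         174, 0.63),
    ("shielding",       140, 0.60),
]
score = weighted_score(domains)
print(f"{score:.1%}, pass: {score >= 0.67}")
```

Weighting by question count means a large weak domain can sink the aggregate even when several smaller domains approach the threshold.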
Collapse
Affiliation(s)
- G Roemer
- MSKCC, 1275 York Avenue, New York, NY 10065, United States of America
| | - A Li
- MSKCC, 1275 York Avenue, New York, NY 10065, United States of America
| | - U Mahmood
- MSKCC, 1275 York Avenue, New York, NY 10065, United States of America
| | - L Dauer
- MSKCC, 1275 York Avenue, New York, NY 10065, United States of America
| | - M Bellamy
- MSKCC, 1275 York Avenue, New York, NY 10065, United States of America
| |
Collapse
|
16
|
Kienzle A, Niemann M, Meller S, Gwinner C. ChatGPT May Offer an Adequate Substitute for Informed Consent to Patients Prior to Total Knee Arthroplasty-Yet Caution Is Needed. J Pers Med 2024; 14:69. [PMID: 38248771 PMCID: PMC10821427 DOI: 10.3390/jpm14010069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2023] [Revised: 12/30/2023] [Accepted: 01/03/2024] [Indexed: 01/23/2024] Open
Abstract
Prior to undergoing total knee arthroplasty (TKA), patients often confront surgeons with numerous questions regarding the procedure and the recovery process. Given limited staff resources and a mounting individual workload, ways to increase efficiency, e.g., through artificial intelligence (AI), are of growing interest. We comprehensively evaluated ChatGPT's orthopedic responses using the DISCERN instrument. Three independent orthopedic surgeons rated the responses across various criteria. We found consistently high scores, predominantly exceeding three out of five in almost all categories, indicative of the quality and accuracy of the information provided. Notably, the AI demonstrated proficiency in conveying precise and reliable information on orthopedic topics. However, a notable observation pertains to the generation of non-existent references for certain claims. This study underscores the significance of critically evaluating references provided by ChatGPT and emphasizes the necessity of cross-referencing information against established sources. Overall, the findings contribute valuable insights into the performance of ChatGPT in delivering accurate orthopedic information to patients in clinical use, while shedding light on areas warranting further refinement. Future iterations of natural language processing systems may be able to replace, in part or in their entirety, these preoperative interactions, thereby optimizing the efficiency, accessibility, and standardization of patient communication.
Collapse
Affiliation(s)
- Arne Kienzle
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany; (M.N.); (S.M.); (C.G.)
- Julius Wolff Institute and Center for Musculoskeletal Surgery, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 13353 Berlin, Germany
- Berlin Institute of Health at Charité—Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Clinician Scientist Program, 10117 Berlin, Germany
| | - Marcel Niemann
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany; (M.N.); (S.M.); (C.G.)
| | - Sebastian Meller
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany; (M.N.); (S.M.); (C.G.)
| | - Clemens Gwinner
- Center for Musculoskeletal Surgery, Clinic for Orthopedics, Charité—Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117 Berlin, Germany; (M.N.); (S.M.); (C.G.)
| |
Collapse
|
17
|
Fabijan A, Polis B, Fabijan R, Zakrzewski K, Nowosławska E, Zawadzka-Fabijan A. Artificial Intelligence in Scoliosis Classification: An Investigation of Language-Based Models. J Pers Med 2023; 13:1695. [PMID: 38138922 PMCID: PMC10744696 DOI: 10.3390/jpm13121695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 12/03/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023] Open
Abstract
Open-source artificial intelligence models are finding free application in various industries, including computer science and medicine. Their clinical potential, especially in assisting diagnosis and therapy, is the subject of increasingly intensive research. Due to the growing interest in AI for diagnostics, we conducted a study evaluating the abilities of AI models, including ChatGPT, Microsoft Bing, and Scholar AI, in classifying single-curve scoliosis based on radiological descriptions. Fifty-six posturographic images depicting single-curve scoliosis were selected and assessed by two independent neurosurgery specialists, who classified them as mild, moderate, or severe based on Cobb angles. Subsequently, descriptions were developed that accurately characterized the degree of spinal deformation, based on the measured values of Cobb angles. These descriptions were then provided to AI language models to assess their proficiency in diagnosing spinal pathologies. The artificial intelligence models conducted classification using the provided data. Our study also focused on identifying specific sources of information and criteria applied in their decision-making algorithms, aiming for a deeper understanding of the determinants influencing AI decision processes in scoliosis classification. The classification quality of the predictions was evaluated using performance evaluation metrics such as sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and balanced accuracy. Our study strongly supported our hypothesis, showing that among four AI models, ChatGPT 4 and Scholar AI Premium excelled in classifying single-curve scoliosis with perfect sensitivity and specificity. These models demonstrated unmatched rater concordance and excellent performance metrics. In comparing real and AI-generated scoliosis classifications, they showed impeccable precision in all posturographic images, indicating total accuracy (1.0, MAE = 0.0) and remarkable inter-rater agreement, with a perfect Fleiss' Kappa score. This was consistent across scoliosis cases with a Cobb's angle range of 11-92 degrees. Despite high accuracy in classification, each model used an incorrect angular range for the mild stage of scoliosis. Our findings highlight the immense potential of AI in analyzing medical data sets. However, the diversity in competencies of AI models indicates the need for their further development to more effectively meet specific needs in clinical practice.
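The performance metrics named above all derive from a binary confusion matrix. A minimal sketch of their definitions; the counts are hypothetical, with only the total of 56 images matching the study:

```python
# Binary confusion-matrix metrics of the kind reported in the study,
# e.g. for classifying "severe" vs "not severe" scoliosis.
# The counts below are hypothetical.
def metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    ppv = tp / (tp + fp)                         # positive predictive value
    npv = tn / (tn + fn)                         # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, ppv, npv, accuracy, balanced_accuracy

# A perfect classifier, as reported for ChatGPT 4, has no false
# positives or false negatives, so every metric equals 1.0:
print(metrics(tp=20, fp=0, tn=36, fn=0))
```

Balanced accuracy averages sensitivity and specificity, which keeps the score honest when one class (here, a given severity grade) is much rarer than the other.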
Collapse
Affiliation(s)
- Artur Fabijan
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland; (B.P.); (K.Z.); (E.N.)
| | - Bartosz Polis
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland; (B.P.); (K.Z.); (E.N.)
| | | | - Krzysztof Zakrzewski
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland; (B.P.); (K.Z.); (E.N.)
| | - Emilia Nowosławska
- Department of Neurosurgery, Polish-Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland; (B.P.); (K.Z.); (E.N.)
| | - Agnieszka Zawadzka-Fabijan
- Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland;
| |
Collapse
|
18
|
Arena F, Bernaschi P, Mencacci A. Editorial: Clinical impact of fast platforms and laboratory automation for the rapid diagnosis of infectious diseases and detection of antimicrobial resistance determinants. Front Cell Infect Microbiol 2023; 13:1321663. [PMID: 38239509 PMCID: PMC10794890 DOI: 10.3389/fcimb.2023.1321663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2023] [Accepted: 11/27/2023] [Indexed: 01/22/2024] Open
Affiliation(s)
- Fabio Arena
- Department of Clinical and Experimental Medicine, University of Foggia, Foggia, Italy
| | - Paola Bernaschi
- Microbiology and Diagnostic Immunology Unit, Bambino Gesù Children’s Hospital, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Rome, Italy
| | - Antonella Mencacci
- Microbiology and Clinical Microbiology, Department of Medicine and Surgery, University of Perugia, Perugia, Italy
- Microbiology, Perugia General Hospital, Perugia, Italy
| |
Collapse
|
19
|
Marra AR, Langford BJ, Nori P, Bearman G. Revolutionizing antimicrobial stewardship, infection prevention, and public health with artificial intelligence: the middle path. ANTIMICROBIAL STEWARDSHIP & HEALTHCARE EPIDEMIOLOGY : ASHE 2023; 3:e219. [PMID: 38156216 PMCID: PMC10753466 DOI: 10.1017/ash.2023.494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/15/2023] [Revised: 08/22/2023] [Accepted: 10/12/2023] [Indexed: 12/30/2023]
Affiliation(s)
- Alexandre R. Marra
- Hospital Israelita Albert Einstein, São Paulo, Brazil
- Department of Internal Medicine, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | - Bradley J. Langford
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Hotel Dieu Shaver Health and Rehabilitation Centre, St. Catharines, ON, Canada
| | - Priya Nori
- Division of Infectious Diseases, Department of Medicine, Montefiore Health System, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Gonzalo Bearman
- Division of Infectious Diseases, Virginia Commonwealth University Health, Virginia Commonwealth University, Richmond, VA, USA
| |
Collapse
|
20
|
Irfan B, Yaqoob A. ChatGPT's Epoch in Rheumatological Diagnostics: A Critical Assessment in the Context of Sjögren's Syndrome. Cureus 2023; 15:e47754. [PMID: 38022092 PMCID: PMC10676288 DOI: 10.7759/cureus.47754] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/26/2023] [Indexed: 12/01/2023] Open
Abstract
INTRODUCTION The rise of artificial intelligence in medical practice is reshaping clinical care. Large language models (LLMs) like ChatGPT have the potential to assist in rheumatology by personalizing scientific information retrieval, particularly in the context of Sjögren's Syndrome. This study aimed to evaluate the efficacy of ChatGPT in providing insights into Sjögren's Syndrome, differentiating it from other rheumatological conditions. MATERIALS AND METHODS A database of peer-reviewed articles and clinical guidelines focused on Sjögren's Syndrome was compiled. Clinically relevant questions were presented to ChatGPT, with responses assessed for accuracy, relevance, and comprehensiveness. Techniques such as blinding, random control queries, and temporal analysis ensured unbiased evaluation. ChatGPT's responses were also assessed using the 15-question DISCERN tool. RESULTS ChatGPT effectively highlighted key immunopathological and histopathological characteristics of Sjögren's Syndrome, though some crucial data and citation inconsistencies were noted. For a given clinical vignette, ChatGPT correctly identified potential etiological considerations with Sjögren's Syndrome being prominent. DISCUSSION LLMs like ChatGPT offer rapid access to vast amounts of data, beneficial for both patients and providers. While it democratizes information, limitations like potential oversimplification and reference inaccuracies were observed. The balance between LLM insights and clinical judgment, as well as continuous model refinement, is crucial. CONCLUSION LLMs like ChatGPT offer significant potential in rheumatology, providing swift and broad medical insights. However, a cautious approach is vital, ensuring rigorous training and ethical application for optimal patient care and clinical practice.
Collapse
Affiliation(s)
- Bilal Irfan
- Microbiology and Immunology, University of Michigan, Ann Arbor, USA
| | | |
Collapse
|
21
|
Levkovich I, Elyoseph Z. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study. JMIR Ment Health 2023; 10:e51232. [PMID: 37728984 PMCID: PMC10551796 DOI: 10.2196/51232] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/25/2023] [Revised: 08/22/2023] [Accepted: 08/24/2023] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although having significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated. OBJECTIVE The study's aim was to evaluate ChatGPT's ability to assess suicide risk, taking into consideration 2 discernable factors-perceived burdensomeness and thwarted belongingness-over a 2-month period. In addition, we evaluated whether ChatGPT-4 more accurately evaluated suicide risk than did ChatGPT-3.5. METHODS ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version). RESULTS During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of -0.83). The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of -0.89 and -0.90, respectively). CONCLUSIONS The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level.
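The Z scores above express how far a model's estimate falls from the mean of the professional norm group, in standard deviations. A sketch with hypothetical norm values; only the comparison logic mirrors the study:

```python
# Z score of a rater's estimate against a professional norm group.
# The norm mean/SD below are hypothetical, for illustration only.
def z_score(estimate, norm_mean, norm_sd):
    return (estimate - norm_mean) / norm_sd

# Suppose professionals rate the likelihood of a suicide attempt for a
# vignette at 6.2 (SD 1.5):
print(round(z_score(6.2, 6.2, 1.5), 2))   # estimate matching the norm -> 0.0
print(round(z_score(5.0, 6.2, 1.5), 2))   # underestimate -> -0.8
```

On this scale, the study's average of 0.01 for ChatGPT-4 means its estimates sat essentially at the professional norm, while -0.83 for ChatGPT-3.5 means it sat nearly a full standard deviation below it.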
Collapse
Affiliation(s)
- Inbar Levkovich
- Oranim Academic College, Faculty of Graduate Studies, Kiryat Tivon, Israel
| | - Zohar Elyoseph
- Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel
- Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| |
Collapse
|
22
|
Ramamurthi A, Are C, Kothari AN. From ChatGPT to Treatment: the Future of AI and Large Language Models in Surgical Oncology. Indian J Surg Oncol 2023; 14:537-539. [PMID: 37900654 PMCID: PMC10611626 DOI: 10.1007/s13193-023-01836-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2023] [Accepted: 10/04/2023] [Indexed: 10/31/2023] Open
Abstract
This paper explores the transformative potential of Large Language Models (LLMs) within the context of surgical oncology and outlines the foundational mechanisms behind these models. LLMs, such as GPT-4, have rapidly evolved in terms of scale and capabilities, with profound implications for their applications in healthcare. These models, rooted in the Generative Pretrained Transformer architecture, exhibit advanced natural language understanding and generation skills. Within surgical oncology, LLMs, when integrated into a Generalist Medical AI (GMAI) framework, hold great promise in offering real-time support throughout the cancer journey. However, alongside these opportunities, this paper underscores the importance of ethical, privacy, and efficacy considerations, especially in light of issues like data drift and potential biases. Collaborative efforts among healthcare providers, AI developers, and regulatory bodies are pivotal in ensuring responsible and effective use of LLMs in surgical oncology, thereby contributing to enhanced patient care and safety. As LLMs continue to advance, they are poised to become indispensable tools in the delivery of high-quality, efficient care in this specialized medical field.
Collapse
Affiliation(s)
- Adhitya Ramamurthi
- Department of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI USA
| | - Chandrakanth Are
- Department of Surgery, University of Nebraska Medical Center, Omaha, NE USA
| | - Anai N. Kothari
- Department of Surgical Oncology, Medical College of Wisconsin, Milwaukee, WI USA
| |
Collapse
|
23
|
Mykhalko Y, Kish P, Rubtsova Y, Kutsyn O, Koval V. FROM TEXT TO DIAGNOSE: CHATGPT'S EFFICACY IN MEDICAL DECISION-MAKING. WIADOMOSCI LEKARSKIE (WARSAW, POLAND : 1960) 2023; 76:2345-2350. [PMID: 38112347 DOI: 10.36740/wlek202311101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
OBJECTIVE To evaluate the diagnostic capabilities of ChatGPT in the field of medical diagnosis. PATIENTS AND METHODS We utilized 50 clinical cases, employing the Large Language Model ChatGPT-3.5. The experiment had three phases, each with a new chat setup. In the initial phase, ChatGPT received detailed clinical case descriptions, guided by a "Persona Pattern" prompt. In the second phase, cases with diagnostic errors were addressed by providing potential diagnoses for ChatGPT to choose from. The final phase assessed artificial intelligence's ability to mimic a medical practitioner's diagnostic process, with prompts limiting initial information to symptoms and history. RESULTS In the initial phase, ChatGPT showed a 66.00% diagnostic accuracy, surpassing physicians by nearly 50%. Notably, in 11 cases requiring image interpretation, ChatGPT struggled initially but achieved a correct diagnosis for four without added interpretations. In the second phase, ChatGPT demonstrated a remarkable 70.59% diagnostic accuracy, while physicians averaged 41.47%. Furthermore, the overall accuracy of the Large Language Model across the first and second phases together was 90.00%. In the third phase, emulating real doctor decision-making, ChatGPT achieved a 46.00% success rate. CONCLUSIONS Our research underscores ChatGPT's strong potential in clinical medicine as a diagnostic tool, especially in structured scenarios. It emphasizes the need for supplementary data and the complexity of medical diagnosis. This contributes valuable insights to AI-driven clinical diagnostics and highlights the importance of prompt engineering techniques in ChatGPT's interaction with doctors.
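The phase-wise accuracies reported above are consistent with simple counting over the 50 cases; a minimal sketch, assuming 33 cases correct in phase one and 12 of the 17 remaining error cases resolved in phase two (these counts are inferred from the reported percentages, not stated in the abstract):

```python
# Reported figures: 66.00% accuracy in phase 1, 70.59% in phase 2
# (error cases only), and 90.00% across both phases combined.
total_cases = 50
phase1_correct = 33                          # 33 / 50 = 66.00%
phase2_pool = total_cases - phase1_correct   # 17 cases with diagnostic errors
phase2_correct = 12                          # 12 / 17 = 70.59%

phase1_acc = phase1_correct / total_cases
phase2_acc = phase2_correct / phase2_pool
combined_acc = (phase1_correct + phase2_correct) / total_cases

print(f"{phase1_acc:.2%} {phase2_acc:.2%} {combined_acc:.2%}")
# -> 66.00% 70.59% 90.00%
```

The combined 90.00% figure thus counts a case as correct if it was diagnosed in either of the first two phases.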
Collapse
Affiliation(s)
| | - Pavlo Kish
- UZHHOROD NATIONAL UNIVERSITY, UZHHOROD, UKRAINE
| |
Collapse
|