1
|
Bagde H, Dhopte A, Alam MK, Basri R. A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research. Heliyon 2023; 9:e23050. [PMID: 38144348 PMCID: PMC10746423 DOI: 10.1016/j.heliyon.2023.e23050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2023] [Revised: 10/24/2023] [Accepted: 11/24/2023] [Indexed: 12/26/2023] Open
Abstract
Since its release, ChatGPT has taken the world by storm with its utilization in various fields of life. This review's main goal was to offer a thorough and fact-based evaluation of ChatGPT's potential as a tool for medical and dental research, which could direct subsequent research and influence clinical practices. METHODS Different online databases were scoured for relevant articles that were in accordance with the study objectives. A team of reviewers was assembled to devise a proper methodological framework for inclusion of articles and meta-analysis. RESULTS 11 descriptive studies were considered for this review that evaluated the accuracy of ChatGPT in answering medical queries related to different domains such as systematic reviews, cancer, liver diseases, diagnostic imaging, education, and COVID-19 vaccination. The studies reported different accuracy ranges, from 18.3 % to 100 %, across various datasets and specialties. The meta-analysis showed an odds ratio (OR) of 2.25 and a relative risk (RR) of 1.47 with a 95 % confidence interval (CI), indicating that the accuracy of ChatGPT in providing correct responses was significantly higher compared to the total responses for queries. However, significant heterogeneity was present among the studies, suggesting considerable variability in the effect sizes across the included studies. CONCLUSION The observations indicate that ChatGPT has the ability to provide appropriate solutions to questions in the medical and dentistry areas, but researchers and doctors should cautiously assess its responses because they might not always be dependable. Overall, the importance of this study rests in shedding light on ChatGPT's accuracy in the medical and dentistry fields and emphasizing the need for additional investigation to enhance its performance. © 2017 Elsevier Inc. All rights reserved.
Collapse
Affiliation(s)
- Hiroj Bagde
- Department of Periodontology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
| | - Ashwini Dhopte
- Department of Oral Medicine and Radiology, Chhattisgarh Dental College and Research Institute, Rajnandgaon, Chhattisgarh, India
| | - Mohammad Khursheed Alam
- Preventive Dentistry Department, College of Dentistry, Jouf University, Sakaka, 72345, Saudi Arabia
- Department of Dental Research Cell, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Chennai, India
- Department of Public Health, Faculty of Allied Health Sciences, Daffodil International University, Dhaka, Bangladesh
| | - Rehana Basri
- Department of Internal Medicine, College of Medicine, Jouf University, Sakaka, 72345, Saudi Arabia
| |
Collapse
|
2
|
Nguyen KAN, Tandon P, Ghanavati S, Cheetirala SN, Timsina P, Freeman R, Reich D, Levin MA, Mazumdar M, Fayad ZA, Kia A. A Hybrid Decision Tree and Deep Learning Approach Combining Medical Imaging and Electronic Medical Records to Predict Intubation Among Hospitalized Patients With COVID-19: Algorithm Development and Validation. JMIR Form Res 2023; 7:e46905. [PMID: 37883177 PMCID: PMC10636624 DOI: 10.2196/46905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 05/18/2023] [Accepted: 06/27/2023] [Indexed: 10/27/2023] Open
Abstract
BACKGROUND Early prediction of the need for invasive mechanical ventilation (IMV) in patients hospitalized with COVID-19 symptoms can help in the allocation of resources appropriately and improve patient outcomes by appropriately monitoring and treating patients at the greatest risk of respiratory failure. To help with the complexity of deciding whether a patient needs IMV, machine learning algorithms may help bring more prognostic value in a timely and systematic manner. Chest radiographs (CXRs) and electronic medical records (EMRs), typically obtained early in patients admitted with COVID-19, are the keys to deciding whether they need IMV. OBJECTIVE We aimed to evaluate the use of a machine learning model to predict the need for intubation within 24 hours by using a combination of CXR and EMR data in an end-to-end automated pipeline. We included historical data from 2481 hospitalizations at The Mount Sinai Hospital in New York City. METHODS CXRs were first resized, rescaled, and normalized. Then lungs were segmented from the CXRs by using a U-Net algorithm. After splitting them into a training and a test set, the training set images were augmented. The augmented images were used to train an image classifier to predict the probability of intubation with a prediction window of 24 hours by retraining a pretrained DenseNet model by using transfer learning, 10-fold cross-validation, and grid search. Then, in the final fusion model, we trained a random forest algorithm via 10-fold cross-validation by combining the probability score from the image classifier with 41 longitudinal variables in the EMR. Variables in the EMR included clinical and laboratory data routinely collected in the inpatient setting. The final fusion model gave a prediction likelihood for the need of intubation within 24 hours as well. RESULTS At a prediction probability threshold of 0.5, the fusion model provided 78.9% (95% CI 59%-96%) sensitivity, 83% (95% CI 76%-89%) specificity, 0.509 (95% CI 0.34-0.67) F1-score, 0.874 (95% CI 0.80-0.94) area under the receiver operating characteristic curve (AUROC), and 0.497 (95% CI 0.32-0.65) area under the precision recall curve (AUPRC) on the holdout set. Compared to the image classifier alone, which had an AUROC of 0.577 (95% CI 0.44-0.73) and an AUPRC of 0.206 (95% CI 0.08-0.38), the fusion model showed significant improvement (P<.001). The most important predictor variables were respiratory rate, C-reactive protein, oxygen saturation, and lactate dehydrogenase. The imaging probability score ranked 15th in overall feature importance. CONCLUSIONS We show that, when linked with EMR data, an automated deep learning image classifier improved performance in identifying hospitalized patients with severe COVID-19 at risk for intubation. With additional prospective and external validation, such a model may assist risk assessment and optimize clinical decision-making in choosing the best care plan during the critical stages of COVID-19.
Collapse
Affiliation(s)
- Kim-Anh-Nhi Nguyen
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Pranai Tandon
- Department of Medicine Division of Pulmonary, Critical Care, and Sleep Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Sahar Ghanavati
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Satya Narayana Cheetirala
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Prem Timsina
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Robert Freeman
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Hospital Administration, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - David Reich
- Hospital Administration, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Matthew A Levin
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Madhu Mazumdar
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Zahi A Fayad
- BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Arash Kia
- Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| |
Collapse
|
3
|
Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer K, Succi MD. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study. J Med Internet Res 2023; 25:e48659. [PMID: 37606976 PMCID: PMC10481210 DOI: 10.2196/48659] [Citation(s) in RCA: 48] [Impact Index Per Article: 48.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/23/2023] Open
Abstract
BACKGROUND Large language model (LLM)-based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated. OBJECTIVE This study aimed to evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. METHODS We inputted all 36 published clinical vignettes from the Merck Sharpe & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured by the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. We further conducted linear regression to assess the contributing factors toward ChatGPT's performance on clinical tasks. RESULTS ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%; P<.001) and clinical management (β=-7.4%; P=.02) question types. CONCLUSIONS ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set.
Collapse
Affiliation(s)
- Arya Rao
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
| | - Michael Pang
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
| | - John Kim
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
| | - Meghana Kamineni
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
| | - Winston Lie
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
| | - Anoop K Prasad
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
| | - Adam Landman
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Brigham and Women's Hospital, Boston, MA, United States
| | - Keith Dreyer
- Harvard Medical School, Boston, MA, United States
- Data Science Office, Mass General Brigham, Boston, MA, United States
| | - Marc D Succi
- Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
- Mass General Brigham Innovation, Mass General Brigham, Boston, MA, United States
| |
Collapse
|
4
|
Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer KJ, Succi MD. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.02.21.23285886. [PMID: 36865204 PMCID: PMC9980239 DOI: 10.1101/2023.02.21.23285886] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/01/2023]
Abstract
IMPORTANCE Large language model (LLM) artificial intelligence (AI) chatbots direct the power of large training datasets towards successive, related tasks, as opposed to single-ask tasks, for which AI already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as virtual physicians, has not yet been evaluated. OBJECTIVE To evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes. DESIGN We inputted all 36 published clinical vignettes from the Merck Sharpe & Dohme (MSD) Clinical Manual into ChatGPT and compared accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. SETTING ChatGPT, a publicly available LLM. PARTICIPANTS Clinical vignettes featured hypothetical patients with a variety of age and gender identities, and a range of Emergency Severity Indices (ESIs) based on initial clinical presentation. EXPOSURES MSD Clinical Manual vignettes. MAIN OUTCOMES AND MEASURES We measured the proportion of correct responses to the questions posed within the clinical vignettes tested. RESULTS ChatGPT achieved 71.7% (95% CI, 69.3% to 74.1%) accuracy overall across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI, 67.8% to 86.1%), and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI, 54.2% to 66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%, p<0.001) and clinical management (β=-7.4%, p=0.02) type questions. CONCLUSIONS AND RELEVANCE ChatGPT achieves impressive accuracy in clinical decision making, with particular strengths emerging as it has more clinical information at its disposal.
Collapse
|