Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: O'Shea A, Li MD, Mercaldo ND, Balthazar P, Som A, Yeung T, Succi MD, Little BP, Kalpathy-Cramer J, Lee SI. Intubation and mortality prediction in hospitalized COVID-19 patients using a combination of convolutional neural network-based scoring of chest radiographs and clinical data. BJR Open 2022;4:20210062. [PMID: 36105420 PMCID: PMC9459864 DOI: 10.1259/bjro.20210062] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/12/2021] [Revised: 03/06/2022] [Accepted: 03/09/2022] [Indexed: 12/04/2022]

Cited by Other Article(s)

Bagde H, Dhopte A, Alam MK, Basri R. A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research. Heliyon 2023;9:e23050. [PMID: 38144348 PMCID: PMC10746423 DOI: 10.1016/j.heliyon.2023.e23050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Revised: 10/24/2023] [Accepted: 11/24/2023] [Indexed: 12/26/2023]

Abstract

Since its release, ChatGPT has taken the world by storm with its utilization in various fields of life. This review's main goal was to offer a thorough and fact-based evaluation of ChatGPT's potential as a tool for medical and dental research, which could direct subsequent research and influence clinical practices.

METHODS

Different online databases were scoured for relevant articles that were in accordance with the study objectives. A team of reviewers was assembled to devise a proper methodological framework for inclusion of articles and meta-analysis.

RESULTS

Eleven descriptive studies that evaluated the accuracy of ChatGPT in answering medical queries were considered for this review. They covered domains such as systematic reviews, cancer, liver diseases, diagnostic imaging, education, and COVID-19 vaccination. The studies reported accuracy ranging from 18.3% to 100% across various datasets and specialties. The meta-analysis showed an odds ratio (OR) of 2.25 and a relative risk (RR) of 1.47 with a 95% confidence interval (CI), indicating that the accuracy of ChatGPT in providing correct responses was significantly higher compared to the total responses for queries. However, significant heterogeneity was present among the studies, suggesting considerable variability in the effect sizes across the included studies.
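For readers less familiar with the two effect measures quoted above, the following is a minimal sketch of how an odds ratio and a relative risk are computed from a single 2×2 table. The counts and function names are hypothetical and are not taken from the review, which pooled estimates across studies rather than using one table.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: events / non-events in group 1
    c, d: events / non-events in group 2
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk: event proportion in group 1 over that in group 2."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, chosen only for illustration.
a, b, c, d = 30, 10, 20, 20
print(odds_ratio(a, b, c, d))     # 3.0
print(relative_risk(a, b, c, d))  # 1.5
```

When the event is common, as in this toy table, the odds ratio lies further from 1 than the relative risk, which is consistent with the review reporting an OR larger than its RR.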

CONCLUSION

The observations indicate that ChatGPT has the ability to provide appropriate solutions to questions in the medical and dentistry areas, but researchers and doctors should cautiously assess its responses because they might not always be dependable. Overall, the importance of this study rests in shedding light on ChatGPT's accuracy in the medical and dentistry fields and emphasizing the need for additional investigation to enhance its performance. © 2017 Elsevier Inc. All rights reserved.

Nguyen KAN, Tandon P, Ghanavati S, Cheetirala SN, Timsina P, Freeman R, Reich D, Levin MA, Mazumdar M, Fayad ZA, Kia A. A Hybrid Decision Tree and Deep Learning Approach Combining Medical Imaging and Electronic Medical Records to Predict Intubation Among Hospitalized Patients With COVID-19: Algorithm Development and Validation. JMIR Form Res 2023;7:e46905. [PMID: 37883177 PMCID: PMC10636624 DOI: 10.2196/46905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 05/18/2023] [Accepted: 06/27/2023] [Indexed: 10/27/2023]

Abstract

BACKGROUND

Early prediction of the need for invasive mechanical ventilation (IMV) in patients hospitalized with COVID-19 symptoms can help in the allocation of resources appropriately and improve patient outcomes by appropriately monitoring and treating patients at the greatest risk of respiratory failure. To help with the complexity of deciding whether a patient needs IMV, machine learning algorithms may help bring more prognostic value in a timely and systematic manner. Chest radiographs (CXRs) and electronic medical records (EMRs), typically obtained early in patients admitted with COVID-19, are the keys to deciding whether they need IMV.

OBJECTIVE

We aimed to evaluate the use of a machine learning model to predict the need for intubation within 24 hours by using a combination of CXR and EMR data in an end-to-end automated pipeline. We included historical data from 2481 hospitalizations at The Mount Sinai Hospital in New York City.

METHODS

CXRs were first resized, rescaled, and normalized. Then lungs were segmented from the CXRs by using a U-Net algorithm. After splitting them into a training and a test set, the training set images were augmented. The augmented images were used to train an image classifier to predict the probability of intubation with a prediction window of 24 hours by retraining a pretrained DenseNet model by using transfer learning, 10-fold cross-validation, and grid search. Then, in the final fusion model, we trained a random forest algorithm via 10-fold cross-validation by combining the probability score from the image classifier with 41 longitudinal variables in the EMR. Variables in the EMR included clinical and laboratory data routinely collected in the inpatient setting. The final fusion model gave a prediction likelihood for the need of intubation within 24 hours as well.
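Two steps of the pipeline described above can be sketched in plain Python: assembling the late-fusion feature row (image probability plus EMR variables) and producing 10-fold cross-validation splits. This is an illustrative sketch only. The function names and values are hypothetical, the folds are contiguous and unshuffled for simplicity, and the random forest training itself is omitted; the abstract does not describe the authors' actual code.

```python
def fuse_features(image_prob, emr_values):
    """Build one fusion-model input row: image score first, then EMR variables."""
    return [image_prob] + list(emr_values)


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical example: one image probability and three EMR values.
row = fuse_features(0.37, [22.0, 94.0, 110.5])
folds = list(k_fold_indices(25, k=10))
print(row)
print([len(test) for _, test in folds])
```

In practice each training fold would be used to fit the classifier and the matching test fold to score it, with the scores averaged over the ten folds.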

RESULTS

At a prediction probability threshold of 0.5, the fusion model provided 78.9% (95% CI 59%-96%) sensitivity, 83% (95% CI 76%-89%) specificity, 0.509 (95% CI 0.34-0.67) F1-score, 0.874 (95% CI 0.80-0.94) area under the receiver operating characteristic curve (AUROC), and 0.497 (95% CI 0.32-0.65) area under the precision recall curve (AUPRC) on the holdout set. Compared to the image classifier alone, which had an AUROC of 0.577 (95% CI 0.44-0.73) and an AUPRC of 0.206 (95% CI 0.08-0.38), the fusion model showed significant improvement (P<.001). The most important predictor variables were respiratory rate, C-reactive protein, oxygen saturation, and lactate dehydrogenase. The imaging probability score ranked 15th in overall feature importance.
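The threshold-based metrics reported above follow from a confusion matrix. The sketch below, using hypothetical labels and probabilities rather than study data, shows how sensitivity, specificity, and F1-score are derived once predictions are cut at a probability threshold of 0.5. The function names are illustrative, and the code assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Compute sensitivity, specificity, precision, and F1 from the counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical toy data: 1 = intubated within the window, 0 = not.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3, 0.55]
counts = confusion_counts(y_true, y_prob)
metrics = summarize(*counts)
print(counts)
print(metrics)
```

Because F1 depends on precision, it can stay modest even when sensitivity and specificity are both high if the positive class is rare, which is one plausible reading of the gap between the F1-score and the AUROC reported here.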

CONCLUSIONS

We show that, when linked with EMR data, an automated deep learning image classifier improved performance in identifying hospitalized patients with severe COVID-19 at risk for intubation. With additional prospective and external validation, such a model may assist risk assessment and optimize clinical decision-making in choosing the best care plan during the critical stages of COVID-19.

Affiliation(s)

Kim-Anh-Nhi Nguyen: Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Pranai Tandon: Department of Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Sahar Ghanavati: Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Satya Narayana Cheetirala: Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Prem Timsina: Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Robert Freeman: Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Hospital Administration, Icahn School of Medicine at Mount Sinai, New York, NY, United States
David Reich: Hospital Administration, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Matthew A Levin: Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Madhu Mazumdar: Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Zahi A Fayad: BioMedical Engineering and Imaging Institute, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
Arash Kia: Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States; Department of Anesthesiology, Perioperative and Pain Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States

Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer K, Succi MD. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study. J Med Internet Res 2023;25:e48659. [PMID: 37606976 PMCID: PMC10481210 DOI: 10.2196/48659] [Citation(s) in RCA: 48] [Impact Index Per Article: 48.0] [Received: 05/02/2023] [Revised: 07/26/2023] [Accepted: 07/27/2023] [Indexed: 08/23/2023]

Abstract

BACKGROUND

Large language model (LLM)-based artificial intelligence chatbots direct the power of large training data sets toward successive, related tasks as opposed to single-ask tasks, for which artificial intelligence already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as artificial physicians, has not yet been evaluated.

OBJECTIVE

This study aimed to evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes.

METHODS

We inputted all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared its accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity. Accuracy was measured by the proportion of correct responses to the questions posed within the clinical vignettes tested, as calculated by human scorers. We further conducted linear regression to assess the contributing factors toward ChatGPT's performance on clinical tasks.

RESULTS

ChatGPT achieved an overall accuracy of 71.7% (95% CI 69.3%-74.1%) across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI 67.8%-86.1%) and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI 54.2%-66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%; P<.001) and clinical management (β=-7.4%; P=.02) question types.
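Accuracy figures such as those above are proportions with an attached 95% CI. The abstract does not state how the intervals were computed, so the sketch below assumes the normal-approximation (Wald) interval as one common choice, with hypothetical counts and an illustrative function name.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Proportion and normal-approximation CI (z=1.96 gives roughly 95%)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1] because the Wald interval can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 50 correct responses out of 100 scored questions.
p, lower, upper = wald_ci(50, 100)
print(f"{p:.3f} ({lower:.3f} to {upper:.3f})")
```

The interval narrows as the number of scored questions grows, which fits the pattern above of a tighter CI for overall accuracy than for the individual question types.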

CONCLUSIONS

ChatGPT achieves impressive accuracy in clinical decision-making, with increasing strength as it gains more clinical information at its disposal. In particular, ChatGPT demonstrates the greatest accuracy in tasks of final diagnosis as compared to initial diagnosis. Limitations include possible model hallucinations and the unclear composition of ChatGPT's training data set.

Affiliation(s)

Arya Rao: Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
Michael Pang: Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
John Kim: Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
Meghana Kamineni: Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
Winston Lie: Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
Anoop K Prasad: Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Department of Radiology, Massachusetts General Hospital, Boston, MA, United States
Adam Landman: Harvard Medical School, Boston, MA, United States; Department of Radiology, Brigham and Women's Hospital, Boston, MA, United States
Keith Dreyer: Harvard Medical School, Boston, MA, United States; Data Science Office, Mass General Brigham, Boston, MA, United States
Marc D Succi: Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center (MESH IO), Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Department of Radiology, Massachusetts General Hospital, Boston, MA, United States; Mass General Brigham Innovation, Mass General Brigham, Boston, MA, United States

Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, Landman A, Dreyer KJ, Succi MD. Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow. medRxiv 2023:2023.02.21.23285886. [PMID: 36865204 PMCID: PMC9980239 DOI: 10.1101/2023.02.21.23285886] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Indexed: 03/01/2023]

Abstract

IMPORTANCE

Large language model (LLM) artificial intelligence (AI) chatbots direct the power of large training datasets towards successive, related tasks, as opposed to single-ask tasks, for which AI already achieves impressive performance. The capacity of LLMs to assist in the full scope of iterative clinical reasoning via successive prompting, in effect acting as virtual physicians, has not yet been evaluated.

OBJECTIVE

To evaluate ChatGPT's capacity for ongoing clinical decision support via its performance on standardized clinical vignettes.

DESIGN

We inputted all 36 published clinical vignettes from the Merck Sharp & Dohme (MSD) Clinical Manual into ChatGPT and compared accuracy on differential diagnoses, diagnostic testing, final diagnosis, and management based on patient age, gender, and case acuity.

SETTING

ChatGPT, a publicly available LLM.

PARTICIPANTS

Clinical vignettes featured hypothetical patients with a variety of age and gender identities, and a range of Emergency Severity Indices (ESIs) based on initial clinical presentation.

EXPOSURES

MSD Clinical Manual vignettes.

MAIN OUTCOMES AND MEASURES

We measured the proportion of correct responses to the questions posed within the clinical vignettes tested.

RESULTS

ChatGPT achieved 71.7% (95% CI, 69.3% to 74.1%) accuracy overall across all 36 clinical vignettes. The LLM demonstrated the highest performance in making a final diagnosis with an accuracy of 76.9% (95% CI, 67.8% to 86.1%), and the lowest performance in generating an initial differential diagnosis with an accuracy of 60.3% (95% CI, 54.2% to 66.6%). Compared to answering questions about general medical knowledge, ChatGPT demonstrated inferior performance on differential diagnosis (β=-15.8%, p<0.001) and clinical management (β=-7.4%, p=0.02) type questions.

CONCLUSIONS AND RELEVANCE

ChatGPT achieves impressive accuracy in clinical decision making, with particular strengths emerging as it has more clinical information at its disposal.
