1
Su Y, Wang Y, He J, Wang H, A X, Jiang H, Lu W, Zhou W, Li L. Development and validation of machine-learning models of diet management for hyperphenylalaninemia: a multicenter retrospective study. BMC Med 2024; 22:377. [PMID: 39256839] [PMCID: PMC11388910] [DOI: 10.1186/s12916-024-03602-w] [Received: 04/17/2024] [Accepted: 09/02/2024] Open Access
Abstract
BACKGROUND Assessing dietary phenylalanine (Phe) tolerance is crucial for managing hyperphenylalaninemia (HPA) in children. Traditionally, however, adjusting the diet demands significant time from clinicians and parents. This study aims to develop a machine-learning model that predicts a range of dietary Phe intake tolerance for children with HPA over the 10 years following diagnosis. METHODS In this multicenter retrospective observational study, we collected the phenylalanine hydroxylase (PAH) genotypes, metabolic profiles at screening and diagnosis, and blood Phe concentrations corresponding to dietary Phe intake from over 10 years of follow-up data for 204 children with HPA. To incorporate genetic information, an allelic phenotype value (APV) was imputed for 2965 missense variants in the PAH gene using a predicted APV (pAPV) model trained on known pheno-genotype relationships from the BioPKU database, utilizing 31 features. Subsequently, a multiclass classification model was constructed and trained on a dataset comprising metabolic, genetic, and follow-up data from 3177 events. The final model was fine-tuned using tenfold cross-validation and validated against three independent datasets. RESULTS The pAPV model achieved good predictive performance, with root mean squared error (RMSE) values of 1.53 and 2.38 on the training and test datasets, respectively. Variants causing amino acid changes in residues 200-300 of PAH tended to exhibit lower pAPV values. The final model achieved a sensitivity of 0.77 to 0.91 and a specificity of 0.80 to 1 across all validation datasets. Additional assessment metrics, including positive predictive value (0.68-1), negative predictive value (0.8-0.98), F1 score (0.71-0.92), and balanced accuracy (0.8-0.92), demonstrated the robust performance of our model.
CONCLUSIONS Our model integrates metabolic and genetic information to accurately predict age-specific Phe tolerance, aiding in the precision management of patients with HPA. This study provides a potential framework that could be applied to other inborn errors of metabolism.
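As a point of reference, every validation metric quoted in this abstract (sensitivity, specificity, PPV, NPV, F1, balanced accuracy) can be derived from a single confusion matrix. A minimal sketch; the counts below are hypothetical, not the study's data:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the reported validation metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    ppv = tp / (tp + fp)                         # positive predictive value
    npv = tn / (tn + fn)                         # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1,
            "balanced_accuracy": balanced_accuracy}

# Hypothetical counts for one validation dataset
metrics = confusion_metrics(tp=45, fp=5, fn=9, tn=41)
```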
Affiliation(s)
- Yajie Su: Centre for Molecular Medicine, Children's Hospital of Fudan University, and Institutes of Biomedical Sciences, Fudan University, Shanghai, China; Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
- Yaqiong Wang: Centre for Molecular Medicine, Children's Hospital of Fudan University, and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Jinfeng He: Department of Neonatology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, China
- Huijun Wang: Shanghai Key Laboratory of Birth Defects, Pediatrics Research Institute, Shanghai, China
- Xian A: Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
- Haili Jiang: Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
- Wei Lu: Department of Pediatric Endocrinology and Inherited Metabolic Diseases, Children's Hospital of Fudan University, Shanghai, China
- Wenhao Zhou: Centre for Molecular Medicine, Children's Hospital of Fudan University, and Institutes of Biomedical Sciences, Fudan University, Shanghai, China; Shanghai Key Laboratory of Birth Defects, Pediatrics Research Institute, Shanghai, China; Department of Neonatology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Long Li: Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
2
Wang D, Liang J, Ye J, Li J, Li J, Zhang Q, Hu Q, Pan C, Wang D, Liu Z, Shi W, Shi D, Li F, Qu B, Zheng Y. Enhancement of Large Language Models' Performance in Diabetes Education: Retrieval-Augmented Generation Approach. J Med Internet Res 2024. [PMID: 39046096] [DOI: 10.2196/58041] Open Access
Abstract
BACKGROUND Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the RISE framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries. OBJECTIVE This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve LLMs' ability to respond accurately and safely to diabetes-related inquiries. METHODS RISE, an innovative retrieval-augmentation framework, comprises four steps: Rewriting Query, Information Retrieval, Summarization, and Execution. Using a set of 43 common diabetes-related questions, we evaluated three base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions. Assessments were conducted by clinicians for accuracy and comprehensiveness, and by patients for understandability. RESULTS The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all three base LLMs. On average, the percentage of accurate responses increased by 12% (from 107/129 to 122/129) with RISE. Specifically, the rates of accurate responses increased by 7% (from 39/43 to 42/43) for GPT-4, 19% (from 31/43 to 39/43) for Claude 2, and 9% (from 37/43 to 41/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44; understandability improved by 0.19 on average. Data collection was conducted from September 30, 2023, to February 5, 2024. CONCLUSIONS RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving pressure on medical resources and raising public awareness of medical knowledge.
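The four RISE steps can be pictured as a simple pipeline. The sketch below is a toy illustration under stated assumptions: the function bodies, the keyword retriever, and the miniature knowledge base are stand-ins, not the study's implementation (the real Rewriting, Summarization, and Execution steps would call an LLM):

```python
# Toy sketch of the four-step pipeline: Rewriting Query -> Information
# Retrieval -> Summarization -> Execution. All logic here is illustrative.
def rewrite_query(question: str) -> str:
    # Normalize the user's question (a real system would use an LLM rewrite)
    return question.lower().rstrip("?")

def retrieve(query: str, knowledge_base: list[str]) -> list[str]:
    # Naive keyword retrieval standing in for a proper retriever
    return [doc for doc in knowledge_base
            if any(word in doc.lower() for word in query.split())]

def summarize(documents: list[str]) -> str:
    # Concatenate retrieved evidence (a real system would summarize with an LLM)
    return " ".join(documents)

def execute(query: str, context: str) -> str:
    # Build the final augmented prompt; a real system would send this to an LLM
    return f"Answer using only this context:\n{context}\nQuestion: {query}"

kb = ["Hypoglycemia symptoms include shakiness, sweating, and confusion.",
      "Metformin is a common first-line therapy for type 2 diabetes."]
query = rewrite_query("What are hypoglycemia symptoms?")
prompt = execute(query, summarize(retrieve(query, kb)))
```

Only the retrieved, on-topic passage reaches the final prompt, which is the mechanism RISE uses to ground the model's answer.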
Affiliation(s)
- Dingqiao Wang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jiangbo Liang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jinguo Ye: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingni Li: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingpeng Li: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qikai Zhang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qiuling Hu: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Caineng Pan: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Dongliang Wang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhong Liu: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Wen Shi: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Danli Shi: Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Hong Kong, China
- Fei Li: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Bo Qu: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yingfeng Zheng: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
3
Huo W, He M, Zeng Z, Bao X, Lu Y, Tian W, Feng J, Feng R. Impact Analysis of COVID-19 Pandemic on Hospital Reviews on Dianping Website in Shanghai, China: Empirical Study. J Med Internet Res 2024; 26:e52992. [PMID: 38954461] [PMCID: PMC11252617] [DOI: 10.2196/52992] [Received: 09/22/2023] [Revised: 01/24/2024] [Accepted: 05/21/2024] Open Access
Abstract
BACKGROUND In the internet era, individuals have become increasingly accustomed to gathering information and expressing their opinions on public web-based platforms. The health care sector is no exception, as these comments influence people's health care decisions to a certain extent. How the care experiences of Chinese patients and their evaluations of hospitals changed during the onset of the COVID-19 pandemic remains to be studied. We therefore collected patient visit data from the internet to reflect the state of medical relationships under these specific circumstances. OBJECTIVE This study aims to explore the differences in patient comments across the stages before, during, and after the COVID-19 pandemic, as well as among different types of hospitals (children's hospitals, maternity hospitals, and tumor hospitals). Additionally, leveraging ChatGPT (OpenAI), the study categorizes the elements of negative hospital evaluations. The acquired data are analyzed, and potential solutions to improve patient satisfaction are proposed. This study is intended to help hospital managers provide a better experience for patients seeking care amid an emergent public health crisis. METHODS Selecting the top 50 comprehensive hospitals nationwide and the top specialized hospitals (children's hospitals, tumor hospitals, and maternity hospitals), we collected patient reviews of these hospitals from the Dianping website. Using ChatGPT, we classified the content of negative reviews. Additionally, we conducted statistical analysis using SPSS (IBM Corp) to examine the scoring and composition of negative evaluations. RESULTS A total of 30,317 valid comments were collected from January 1, 2018, to August 15, 2023, including 7696 negative comments. Manual inspection indicated that ChatGPT classified comments with an accuracy of 92.05%; the F1-score was 0.914. Analysis of these data revealed a significant correlation between the pandemic and the comments and ratings received by hospitals. Overall, average comment scores increased significantly during the outbreak (P<.001). Furthermore, the composition of negative comments differed notably among hospital types (P<.001). Children's hospitals received sensitive feedback regarding waiting times and treatment effectiveness; patients at maternity hospitals were more concerned with the attitude of health care providers; and patients at tumor hospitals expressed a desire for timely examinations and treatments, especially during the pandemic period. CONCLUSIONS The COVID-19 pandemic had some association with patient comment scores, and the scores and content of comments varied among different types of specialized hospitals. Using ChatGPT to analyze patient comment content is an innovative approach to statistically assessing the factors contributing to patient dissatisfaction. The findings could provide valuable insights for hospital administrators seeking to foster more harmonious physician-patient relationships and enhance hospital performance during public health emergencies.
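For illustration, the ChatGPT-based categorization of negative reviews can be approximated by a rule-based theme classifier. The theme names and keywords below are assumptions made for this sketch, not the study's actual coding scheme:

```python
# Simplified stand-in for LLM-based categorization of negative hospital
# reviews into the themes discussed above. Keywords are illustrative only.
THEMES = {
    "waiting time": ["wait", "queue", "hours"],
    "staff attitude": ["rude", "attitude", "impatient"],
    "treatment effectiveness": ["no improvement", "ineffective", "worse"],
}

def categorize(review: str) -> list[str]:
    # Return every theme whose keywords appear in the review, else "other"
    text = review.lower()
    hits = [theme for theme, kws in THEMES.items()
            if any(kw in text for kw in kws)]
    return hits or ["other"]

print(categorize("We waited three hours and the nurse was rude."))
```

An LLM replaces the keyword lists with learned semantics, but the input/output contract (review text in, theme labels out) is the same.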
Affiliation(s)
- Weixue Huo: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Mengwei He: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Zhaoxiang Zeng: Department of Vascular Surgery, Changhai Hospital, Navy Medical University, Shanghai, China
- Xianhao Bao: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Ye Lu: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Wen Tian: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Jiaxuan Feng: Vascular Surgery Department, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Rui Feng: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
4
Sosa-Holwerda A, Park OH, Albracht-Schulte K, Niraula S, Thompson L, Oldewage-Theron W. The Role of Artificial Intelligence in Nutrition Research: A Scoping Review. Nutrients 2024; 16:2066. [PMID: 38999814] [PMCID: PMC11243505] [DOI: 10.3390/nu16132066] [Received: 05/06/2024] [Revised: 06/20/2024] [Accepted: 06/24/2024] Open Access
Abstract
Artificial intelligence (AI) refers to computer systems performing tasks that usually require human intelligence. AI is constantly evolving and is revolutionizing the healthcare field, including nutrition. This review's purpose is four-fold: (i) to investigate AI's role in nutrition research; (ii) to identify areas of nutrition using AI; (iii) to understand AI's potential future impact; and (iv) to investigate possible concerns about AI's use in nutrition research. Eight databases were searched: PubMed, Web of Science, EBSCO, Agricola, Scopus, IEEE Xplore, Google Scholar, and Cochrane. A total of 1737 articles were retrieved, of which 22 were included in the review. Article screening comprised duplicate elimination, title-abstract selection, full-text review, and quality assessment. The key findings indicated that AI's role in nutrition is at a developmental stage, focusing mainly on dietary assessment and less on malnutrition prediction, lifestyle interventions, and the comprehension of diet-related diseases. Clinical research is needed to determine the efficacy of AI interventions. The ethics of AI use, a main concern, remains unresolved and must be addressed to prevent collateral harm to certain populations. The heterogeneity of the included studies limited this review's focus on specific nutritional areas. Future research should prioritize specialized reviews in nutrition and dieting for a deeper understanding of AI's potential in human nutrition.
Affiliation(s)
- Andrea Sosa-Holwerda: Department of Nutritional Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Oak-Hee Park: College of Health & Human Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Surya Niraula: Department of Nutritional Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Leslie Thompson: Department of Animal and Food Sciences, Texas Tech University, Lubbock, TX 79409, USA
5
Marchi F, Bellini E, Iandelli A, Sampieri C, Peretti G. Exploring the landscape of AI-assisted decision-making in head and neck cancer treatment: a comparative analysis of NCCN guidelines and ChatGPT responses. Eur Arch Otorhinolaryngol 2024; 281:2123-2136. [PMID: 38421392] [DOI: 10.1007/s00405-024-08525-z] [Received: 12/14/2023] [Accepted: 02/02/2024]
Abstract
PURPOSE Recent breakthroughs in natural language processing and machine learning, exemplified by ChatGPT, have spurred a paradigm shift in healthcare. Released by OpenAI in November 2022, ChatGPT rapidly gained global attention. Trained on massive text datasets, this large language model holds immense potential to revolutionize healthcare. However, the existing literature often overlooks the need for rigorous validation and real-world applicability. METHODS This head-to-head comparative study assesses ChatGPT's capabilities in providing therapeutic recommendations for head and neck cancers. Simulating every NCCN Guidelines scenario, ChatGPT is queried on primary treatment, adjuvant treatment, and follow-up, with its responses compared against the NCCN Guidelines. Performance metrics, including sensitivity, specificity, and F1 score, are employed for assessment. RESULTS The study includes 68 hypothetical cases and 204 clinical scenarios. ChatGPT exhibits promising capabilities in addressing NCCN-related queries, achieving high sensitivity and overall accuracy across primary treatment, adjuvant treatment, and follow-up. The study's metrics showcase robustness in providing relevant suggestions. However, a few inaccuracies are noted, especially in primary treatment scenarios. CONCLUSION Our study highlights the proficiency of ChatGPT in providing treatment suggestions. The model's alignment with the NCCN Guidelines sets the stage for a nuanced exploration of AI's evolving role in oncological decision support. However, challenges related to the interpretability of AI in clinical decision-making, and the importance of clinicians understanding the underlying principles of AI models, remain unexplored. As AI continues to advance, collaborative efforts between models and medical experts are deemed essential for unlocking new frontiers in personalized cancer care.
Affiliation(s)
- Filippo Marchi: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, 16132 Genoa, Italy
- Elisa Bellini: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, 16132 Genoa, Italy
- Andrea Iandelli: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy
- Claudio Sampieri: Department of Experimental Medicine (DIMES), University of Genoa, Genoa, Italy; Department of Otolaryngology, Hospital Clínic, Barcelona, Spain; Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Giorgio Peretti: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, 16132 Genoa, Italy
6
Shiraishi M, Lee H, Kanayama K, Moriwaki Y, Okazaki M. Appropriateness of Artificial Intelligence Chatbots in Diabetic Foot Ulcer Management. Int J Low Extrem Wounds 2024. [PMID: 38419470] [DOI: 10.1177/15347346241236811]
Abstract
Type 2 diabetes is a significant global health concern. It often causes diabetic foot ulcers (DFUs), which affect millions of people and increase amputation and mortality rates. Despite existing guidelines, the complexity of DFU treatment makes clinical decisions challenging. Large language models such as chat generative pretrained transformer (ChatGPT), which are adept at natural language processing, have emerged as valuable resources in the medical field. However, concerns about the accuracy and reliability of the information they provide remain. We aimed to assess the accuracy of various artificial intelligence (AI) chatbots, including ChatGPT, in providing information on DFUs based on established guidelines. Seven AI chatbots were asked clinical questions (CQs) based on the DFU guidelines. Their responses were analyzed for accuracy in terms of answers to CQs, grade of recommendation, level of evidence, and agreement with the reference, including verification of the authenticity of the references provided by the chatbots. The AI chatbots showed a mean accuracy of 91.2% in answers to CQs, with discrepancies noted in grade of recommendation and level of evidence. Claude-2 outperformed other chatbots in the number of verified references (99.6%), whereas ChatGPT had the lowest rate of reference authenticity (66.3%). This study highlights the potential of AI chatbots as tools for disseminating medical information and demonstrates their high degree of accuracy in answering CQs related to DFUs. However, the variability in the accuracy of these chatbots and problems like AI hallucinations necessitate cautious use and further optimization for medical applications. This study underscores the evolving role of AI in healthcare and the importance of refining these technologies for effective use in clinical decision-making and patient education.
Affiliation(s)
- Makoto Shiraishi: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Haesu Lee: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Koji Kanayama: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Yuta Moriwaki: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Mutsumi Okazaki: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
7
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024; 13:e54704. [PMID: 38276872] [PMCID: PMC10905357] [DOI: 10.2196/54704] [Received: 11/19/2023] [Revised: 12/18/2023] [Accepted: 01/26/2024] Open Access
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. The methodologies employed in the included records were examined carefully to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used by 2 independent raters to evaluate the included records, and Cohen κ was used to assess interrater reliability. RESULTS The final data set that formed the basis for theme identification and analysis comprised 34 records. The finalized checklist included 9 pertinent themes, collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for all 9 tested items). Per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies, guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, given the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary basis for establishing a universally accepted approach to standardizing the design and reporting of generative AI-based studies in health care, a swiftly evolving research topic.
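Cohen κ, used above to quantify interrater reliability, compares observed agreement against the agreement expected by chance. A minimal sketch; the two rating sequences are made-up, not the study's data:

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters over the same items (nominal ratings)."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n          # raw agreement
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)         # chance agreement
    return (observed - expected) / (1 - expected)

# Hypothetical binary ratings (e.g., item reported adequately: 1 = yes, 0 = no)
rater1 = [1, 1, 0, 1, 0, 1, 1, 0]
rater2 = [1, 1, 0, 1, 1, 1, 0, 0]
print(round(cohen_kappa(rater1, rater2), 3))  # prints 0.467
```

Here raw agreement is 0.75, but chance agreement is 0.531, so κ drops to about 0.47: the correction is exactly why κ, rather than simple percent agreement, is reported.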
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan; Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates