1
Su Y, Wang Y, He J, Wang H, A X, Jiang H, Lu W, Zhou W, Li L. Development and validation of machine-learning models of diet management for hyperphenylalaninemia: a multicenter retrospective study. BMC Med 2024; 22:377. [PMID: 39256839] [PMCID: PMC11388910] [DOI: 10.1186/s12916-024-03602-w] [Received: 04/17/2024] [Accepted: 09/02/2024] Open Access
Abstract
BACKGROUND Assessing dietary phenylalanine (Phe) tolerance is crucial for managing hyperphenylalaninemia (HPA) in children. Traditionally, however, adjusting the diet demands significant time from clinicians and parents. This study aims to develop a machine-learning model that predicts a range of dietary Phe intake tolerance for children with HPA over the 10 years following diagnosis. METHODS In this multicenter retrospective observational study, we collected the phenylalanine hydroxylase (PAH) genotypes, metabolic profiles at screening and diagnosis, and blood Phe concentrations corresponding to dietary Phe intake from over 10 years of follow-up data for 204 children with HPA. To incorporate genetic information, an allelic phenotype value (APV) was imputed for 2965 missense variants in the PAH gene using a predicted APV (pAPV) model trained on known pheno-genotype relationships from the BioPKU database, utilizing 31 features. Subsequently, a multiclass classification model was constructed and trained on a dataset comprising metabolic, genetic, and follow-up data from 3177 events. The final model was fine-tuned using tenfold cross-validation and validated against three independent datasets. RESULTS The pAPV model achieved good predictive performance, with root mean squared error (RMSE) values of 1.53 and 2.38 on the training and test datasets, respectively. Variants causing amino acid changes in residues 200-300 of PAH tended to exhibit lower pAPV values. The final model achieved a sensitivity of 0.77 to 0.91 and a specificity of 0.80 to 1 across all validation datasets. Additional assessment metrics, including positive predictive value (0.68-1), negative predictive value (0.8-0.98), F1 score (0.71-0.92), and balanced accuracy (0.8-0.92), demonstrated the robust performance of our model.
CONCLUSIONS Our model integrates metabolic and genetic information to accurately predict age-specific Phe tolerance, aiding in the precision management of patients with HPA. This study provides a potential framework that could be applied to other inborn errors of metabolism.
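As a point of reference, every validation metric quoted in this abstract (sensitivity, specificity, PPV, NPV, F1, balanced accuracy) can be derived from a single confusion matrix. A minimal sketch; the counts below are hypothetical, not the study's data:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the reported validation metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    ppv = tp / (tp + fp)                         # positive predictive value
    npv = tn / (tn + fn)                         # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1,
            "balanced_accuracy": balanced_accuracy}

# Hypothetical counts for one validation dataset
metrics = confusion_metrics(tp=45, fp=5, fn=9, tn=41)
```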
Affiliation(s)
- Yajie Su: Centre for Molecular Medicine, Children's Hospital of Fudan University, and Institutes of Biomedical Sciences, Fudan University, Shanghai, China; Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
- Yaqiong Wang: Centre for Molecular Medicine, Children's Hospital of Fudan University, and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Jinfeng He: Department of Neonatology, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi, China
- Huijun Wang: Shanghai Key Laboratory of Birth Defects, Pediatrics Research Institute, Shanghai, China
- Xian A: Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
- Haili Jiang: Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
- Wei Lu: Department of Pediatric Endocrinology and Inherited Metabolic Diseases, Children's Hospital of Fudan University, Shanghai, China
- Wenhao Zhou: Centre for Molecular Medicine, Children's Hospital of Fudan University, and Institutes of Biomedical Sciences, Fudan University, Shanghai, China; Shanghai Key Laboratory of Birth Defects, Pediatrics Research Institute, Shanghai, China; Department of Neonatology, Guangzhou Women and Children's Medical Center, Guangzhou Medical University, Guangzhou, China
- Long Li: Department of Neonatology, Children's Hospital of Xinjiang Uygur Autonomous Region, Xinjiang Hospital of Beijing Children's Hospital, Urumqi, China
2
Wang D, Liang J, Ye J, Li J, Li J, Zhang Q, Hu Q, Pan C, Wang D, Liu Z, Shi W, Shi D, Li F, Qu B, Zheng Y. Enhancement of Large Language Models' Performance in Diabetes Education: Retrieval-Augmented Generation Approach. J Med Internet Res 2024. [PMID: 39046096] [DOI: 10.2196/58041] Open Access
Abstract
BACKGROUND Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the RISE framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries. OBJECTIVE This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve LLMs' ability to respond accurately and safely to diabetes-related inquiries. METHODS RISE, an innovative retrieval-augmentation framework, comprises four steps: Rewriting Query, Information Retrieval, Summarization, and Execution. Using a set of 43 common diabetes-related questions, we evaluated three base LLMs (GPT-4, Anthropic Claude 2, Google Bard) and their RISE-enhanced versions. Assessments were conducted by clinicians for accuracy and comprehensiveness, and by patients for understandability. RESULTS The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all three base LLMs. On average, the percentage of accurate responses increased by 12% (from 107/129 to 122/129) with RISE. Specifically, the rates of accurate responses increased by 7% (from 39/43 to 42/43) for GPT-4, 19% (from 31/43 to 39/43) for Claude 2, and 9% (from 37/43 to 41/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44; understandability improved by 0.19 on average. Data collection was conducted from September 30, 2023, to February 5, 2024. CONCLUSIONS RISE significantly improves LLMs' performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE's future role in patient education and chronic illness self-management, which contributes to relieving pressure on medical resources and raising public awareness of medical knowledge.
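The four RISE steps can be pictured as a simple pipeline. The sketch below is a toy illustration under stated assumptions: the function bodies, the keyword retriever, and the miniature knowledge base are stand-ins, not the study's implementation (the real Rewriting, Summarization, and Execution steps would call an LLM):

```python
# Toy sketch of the four-step pipeline: Rewriting Query -> Information
# Retrieval -> Summarization -> Execution. All logic here is illustrative.
def rewrite_query(question: str) -> str:
    # Normalize the user's question (a real system would use an LLM rewrite)
    return question.lower().rstrip("?")

def retrieve(query: str, knowledge_base: list[str]) -> list[str]:
    # Naive keyword retrieval standing in for a proper retriever
    return [doc for doc in knowledge_base
            if any(word in doc.lower() for word in query.split())]

def summarize(documents: list[str]) -> str:
    # Concatenate retrieved evidence (a real system would summarize with an LLM)
    return " ".join(documents)

def execute(query: str, context: str) -> str:
    # Build the final augmented prompt; a real system would send this to an LLM
    return f"Answer using only this context:\n{context}\nQuestion: {query}"

kb = ["Hypoglycemia symptoms include shakiness, sweating, and confusion.",
      "Metformin is a common first-line therapy for type 2 diabetes."]
query = rewrite_query("What are hypoglycemia symptoms?")
prompt = execute(query, summarize(retrieve(query, kb)))
```

Only the retrieved, on-topic passage reaches the final prompt, which is the mechanism RISE uses to ground the model's answer.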
Affiliation(s)
- Dingqiao Wang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jiangbo Liang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jinguo Ye: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingni Li: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Jingpeng Li: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qikai Zhang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Qiuling Hu: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Caineng Pan: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Dongliang Wang: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhong Liu: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Wen Shi: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Danli Shi: Research Centre for SHARP Vision, The Hong Kong Polytechnic University, Hong Kong, China
- Fei Li: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Bo Qu: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yingfeng Zheng: State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
3
Huo W, He M, Zeng Z, Bao X, Lu Y, Tian W, Feng J, Feng R. Impact Analysis of COVID-19 Pandemic on Hospital Reviews on Dianping Website in Shanghai, China: Empirical Study. J Med Internet Res 2024; 26:e52992. [PMID: 38954461] [PMCID: PMC11252617] [DOI: 10.2196/52992] [Received: 09/22/2023] [Revised: 01/24/2024] [Accepted: 05/21/2024] Open Access
Abstract
BACKGROUND In the internet era, individuals have become increasingly accustomed to gathering information and expressing their opinions on public web-based platforms. The health care sector is no exception, as these comments influence people's health care decisions to a certain extent. How the care experiences of Chinese patients and their evaluations of hospitals changed during the onset of the COVID-19 pandemic remains to be studied. We therefore collected patient visit data from the internet to reflect the state of medical relationships under these specific circumstances. OBJECTIVE This study aims to explore the differences in patient comments across the stages before, during, and after the COVID-19 pandemic, as well as among different types of hospitals (children's hospitals, maternity hospitals, and tumor hospitals). Additionally, leveraging ChatGPT (OpenAI), the study categorizes the elements of negative hospital evaluations. The acquired data are analyzed, and potential solutions to improve patient satisfaction are proposed. This study is intended to help hospital managers provide a better experience for patients seeking care amid an emergent public health crisis. METHODS Selecting the top 50 comprehensive hospitals nationwide and the top specialized hospitals (children's hospitals, tumor hospitals, and maternity hospitals), we collected patient reviews of these hospitals from the Dianping website. Using ChatGPT, we classified the content of negative reviews. Additionally, we conducted statistical analysis using SPSS (IBM Corp) to examine the scoring and composition of negative evaluations. RESULTS A total of 30,317 valid comments were collected from January 1, 2018, to August 15, 2023, including 7696 negative comments. Manual inspection indicated that ChatGPT classified comments with an accuracy of 92.05%; the F1-score was 0.914. Analysis of these data revealed a significant correlation between the pandemic and the comments and ratings received by hospitals. Overall, average comment scores increased significantly during the outbreak (P<.001). Furthermore, the composition of negative comments differed notably among hospital types (P<.001). Children's hospitals received sensitive feedback regarding waiting times and treatment effectiveness; patients at maternity hospitals were more concerned with the attitude of health care providers; and patients at tumor hospitals expressed a desire for timely examinations and treatments, especially during the pandemic period. CONCLUSIONS The COVID-19 pandemic had some association with patient comment scores, and the scores and content of comments varied among different types of specialized hospitals. Using ChatGPT to analyze patient comment content is an innovative approach to statistically assessing the factors contributing to patient dissatisfaction. The findings could provide valuable insights for hospital administrators seeking to foster more harmonious physician-patient relationships and enhance hospital performance during public health emergencies.
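For illustration, the ChatGPT-based categorization of negative reviews can be approximated by a rule-based theme classifier. The theme names and keywords below are assumptions made for this sketch, not the study's actual coding scheme:

```python
# Simplified stand-in for LLM-based categorization of negative hospital
# reviews into the themes discussed above. Keywords are illustrative only.
THEMES = {
    "waiting time": ["wait", "queue", "hours"],
    "staff attitude": ["rude", "attitude", "impatient"],
    "treatment effectiveness": ["no improvement", "ineffective", "worse"],
}

def categorize(review: str) -> list[str]:
    # Return every theme whose keywords appear in the review, else "other"
    text = review.lower()
    hits = [theme for theme, kws in THEMES.items()
            if any(kw in text for kw in kws)]
    return hits or ["other"]

print(categorize("We waited three hours and the nurse was rude."))
```

An LLM replaces the keyword lists with learned semantics, but the input/output contract (review text in, theme labels out) is the same.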
Affiliation(s)
- Weixue Huo: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Mengwei He: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Zhaoxiang Zeng: Department of Vascular Surgery, Changhai Hospital, Navy Medical University, Shanghai, China
- Xianhao Bao: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Ye Lu: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Wen Tian: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
- Jiaxuan Feng: Vascular Surgery Department, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Rui Feng: Department of Vascular Surgery, Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, China
4
Sosa-Holwerda A, Park OH, Albracht-Schulte K, Niraula S, Thompson L, Oldewage-Theron W. The Role of Artificial Intelligence in Nutrition Research: A Scoping Review. Nutrients 2024; 16:2066. [PMID: 38999814] [PMCID: PMC11243505] [DOI: 10.3390/nu16132066] [Received: 05/06/2024] [Revised: 06/20/2024] [Accepted: 06/24/2024] Open Access
Abstract
Artificial intelligence (AI) refers to computer systems performing tasks that usually require human intelligence. AI is constantly evolving and is revolutionizing the healthcare field, including nutrition. This review's purpose is four-fold: (i) to investigate AI's role in nutrition research; (ii) to identify areas of nutrition using AI; (iii) to understand AI's potential future impact; and (iv) to investigate possible concerns about AI's use in nutrition research. Eight databases were searched: PubMed, Web of Science, EBSCO, Agricola, Scopus, IEEE Xplore, Google Scholar, and Cochrane. A total of 1737 articles were retrieved, of which 22 were included in the review. Article screening comprised duplicate elimination, title-abstract selection, full-text review, and quality assessment. The key findings indicated that AI's role in nutrition is at a developmental stage, focusing mainly on dietary assessment and less on malnutrition prediction, lifestyle interventions, and the comprehension of diet-related diseases. Clinical research is needed to determine the efficacy of AI interventions. The ethics of AI use, a main concern, remains unresolved and must be addressed to prevent collateral harm to certain populations. The heterogeneity of the included studies limited this review's focus on specific nutritional areas. Future research should prioritize specialized reviews in nutrition and dieting for a deeper understanding of AI's potential in human nutrition.
Affiliation(s)
- Andrea Sosa-Holwerda: Department of Nutritional Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Oak-Hee Park: College of Health & Human Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Surya Niraula: Department of Nutritional Sciences, Texas Tech University, Lubbock, TX 79409, USA
- Leslie Thompson: Department of Animal and Food Sciences, Texas Tech University, Lubbock, TX 79409, USA
5
Marchi F, Bellini E, Iandelli A, Sampieri C, Peretti G. Exploring the landscape of AI-assisted decision-making in head and neck cancer treatment: a comparative analysis of NCCN guidelines and ChatGPT responses. Eur Arch Otorhinolaryngol 2024; 281:2123-2136. [PMID: 38421392] [DOI: 10.1007/s00405-024-08525-z] [Received: 12/14/2023] [Accepted: 02/02/2024]
Abstract
PURPOSE Recent breakthroughs in natural language processing and machine learning, exemplified by ChatGPT, have spurred a paradigm shift in healthcare. Released by OpenAI in November 2022, ChatGPT rapidly gained global attention. Trained on massive text datasets, this large language model holds immense potential to revolutionize healthcare. However, the existing literature often overlooks the need for rigorous validation and real-world applicability. METHODS This head-to-head comparative study assesses ChatGPT's capabilities in providing therapeutic recommendations for head and neck cancers. Simulating every NCCN Guidelines scenario, ChatGPT is queried on primary treatment, adjuvant treatment, and follow-up, with its responses compared against the NCCN Guidelines. Performance metrics, including sensitivity, specificity, and F1 score, are employed for assessment. RESULTS The study includes 68 hypothetical cases and 204 clinical scenarios. ChatGPT exhibits promising capabilities in addressing NCCN-related queries, achieving high sensitivity and overall accuracy across primary treatment, adjuvant treatment, and follow-up. The study's metrics showcase robustness in providing relevant suggestions. However, a few inaccuracies are noted, especially in primary treatment scenarios. CONCLUSION Our study highlights the proficiency of ChatGPT in providing treatment suggestions. The model's alignment with the NCCN Guidelines sets the stage for a nuanced exploration of AI's evolving role in oncological decision support. However, challenges related to the interpretability of AI in clinical decision-making, and the importance of clinicians understanding the underlying principles of AI models, remain unexplored. As AI continues to advance, collaborative efforts between models and medical experts are deemed essential for unlocking new frontiers in personalized cancer care.
Affiliation(s)
- Filippo Marchi: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, 16132 Genoa, Italy
- Elisa Bellini: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, 16132 Genoa, Italy
- Andrea Iandelli: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy
- Claudio Sampieri: Department of Experimental Medicine (DIMES), University of Genoa, Genoa, Italy; Department of Otolaryngology, Hospital Clínic, Barcelona, Spain; Functional Unit of Head and Neck Tumors, Hospital Clínic, Barcelona, Spain
- Giorgio Peretti: Unit of Otorhinolaryngology-Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Largo Rosanna Benzi 10, 16132 Genoa, Italy; Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, 16132 Genoa, Italy
6
Shiraishi M, Lee H, Kanayama K, Moriwaki Y, Okazaki M. Appropriateness of Artificial Intelligence Chatbots in Diabetic Foot Ulcer Management. Int J Low Extrem Wounds 2024. [PMID: 38419470] [DOI: 10.1177/15347346241236811]
Abstract
Type 2 diabetes is a significant global health concern. It often causes diabetic foot ulcers (DFUs), which affect millions of people and increase amputation and mortality rates. Despite existing guidelines, the complexity of DFU treatment makes clinical decisions challenging. Large language models such as chat generative pretrained transformer (ChatGPT), which are adept at natural language processing, have emerged as valuable resources in the medical field. However, concerns about the accuracy and reliability of the information they provide remain. We aimed to assess the accuracy of various artificial intelligence (AI) chatbots, including ChatGPT, in providing information on DFUs based on established guidelines. Seven AI chatbots were asked clinical questions (CQs) based on the DFU guidelines. Their responses were analyzed for accuracy in terms of answers to CQs, grade of recommendation, level of evidence, and agreement with the reference, including verification of the authenticity of the references provided by the chatbots. The AI chatbots showed a mean accuracy of 91.2% in answers to CQs, with discrepancies noted in grade of recommendation and level of evidence. Claude-2 outperformed other chatbots in the number of verified references (99.6%), whereas ChatGPT had the lowest rate of reference authenticity (66.3%). This study highlights the potential of AI chatbots as tools for disseminating medical information and demonstrates their high degree of accuracy in answering CQs related to DFUs. However, the variability in the accuracy of these chatbots and problems like AI hallucinations necessitate cautious use and further optimization for medical applications. This study underscores the evolving role of AI in healthcare and the importance of refining these technologies for effective use in clinical decision-making and patient education.
Affiliation(s)
- Makoto Shiraishi: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Haesu Lee: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Koji Kanayama: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Yuta Moriwaki: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
- Mutsumi Okazaki: Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan
7
Sallam M, Barakat M, Sallam M. A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence-Based Models in Health Care Education and Practice: Development Study Involving a Literature Review. Interact J Med Res 2024; 13:e54704. [PMID: 38276872] [PMCID: PMC10905357] [DOI: 10.2196/54704] [Received: 11/19/2023] [Revised: 12/18/2023] [Accepted: 01/26/2024] Open Access
Abstract
BACKGROUND Adherence to evidence-based practice is indispensable in health care. Recently, the utility of generative artificial intelligence (AI) models in health care has been evaluated extensively. However, the lack of consensus guidelines on the design and reporting of findings of these studies poses a challenge for the interpretation and synthesis of evidence. OBJECTIVE This study aimed to develop a preliminary checklist to standardize the reporting of generative AI-based studies in health care education and practice. METHODS A literature review was conducted in Scopus, PubMed, and Google Scholar. Published records with "ChatGPT," "Bing," or "Bard" in the title were retrieved. The methodologies employed in the included records were examined carefully to identify common pertinent themes and possible gaps in reporting. A panel discussion was held to establish a unified and thorough checklist for the reporting of AI studies in health care. The finalized checklist was used by 2 independent raters to evaluate the included records, and Cohen κ was used to assess interrater reliability. RESULTS The final data set that formed the basis for theme identification and analysis comprised 34 records. The finalized checklist included 9 pertinent themes, collectively referred to as METRICS (Model, Evaluation, Timing, Range/Randomization, Individual factors, Count, and Specificity of prompts and language). Their details are as follows: (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and interrater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0 (SD 0.58). Interrater reliability was acceptable, with Cohen κ ranging from 0.558 to 0.962 (P<.001 for all 9 tested items). Per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity" item, while the lowest scores were recorded for the "Randomization" item (classified as suboptimal) and the "Individual factors" item (classified as satisfactory). CONCLUSIONS The METRICS checklist can facilitate the design of studies, guiding researchers toward best practices in reporting results. The findings highlight the need for standardized reporting algorithms for generative AI-based studies in health care, given the variability observed in methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary basis for establishing a universally accepted approach to standardizing the design and reporting of generative AI-based studies in health care, a swiftly evolving research topic.
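Cohen κ, used above to quantify interrater reliability, compares observed agreement against the agreement expected by chance. A minimal sketch; the two rating sequences are made-up, not the study's data:

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters over the same items (nominal ratings)."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n          # raw agreement
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)         # chance agreement
    return (observed - expected) / (1 - expected)

# Hypothetical binary ratings (e.g., item reported adequately: 1 = yes, 0 = no)
rater1 = [1, 1, 0, 1, 0, 1, 1, 0]
rater2 = [1, 1, 0, 1, 1, 1, 0, 0]
print(round(cohen_kappa(rater1, rater2), 3))  # prints 0.467
```

Here raw agreement is 0.75, but chance agreement is 0.531, so κ drops to about 0.47: the correction is exactly why κ, rather than simple percent agreement, is reported.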
Affiliation(s)
- Malik Sallam: Department of Pathology, Microbiology and Forensic Medicine, School of Medicine, The University of Jordan, Amman, Jordan; Department of Clinical Laboratories and Forensic Medicine, Jordan University Hospital, Amman, Jordan; Department of Translational Medicine, Faculty of Medicine, Lund University, Malmo, Sweden
- Muna Barakat: Department of Clinical Pharmacy and Therapeutics, Faculty of Pharmacy, Applied Science Private University, Amman, Jordan
- Mohammed Sallam: Department of Pharmacy, Mediclinic Parkview Hospital, Mediclinic Middle East, Dubai, United Arab Emirates