1
Alkhalaf M, Yu P, Yin M, Deng C. Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records. J Biomed Inform 2024; 156:104662. [PMID: 38880236 DOI: 10.1016/j.jbi.2024.104662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 05/25/2024] [Accepted: 05/28/2024] [Indexed: 06/18/2024]
Abstract
BACKGROUND Malnutrition is a prevalent issue in residential aged care facilities (RACFs), leading to adverse health outcomes. The ability to efficiently extract key clinical information from large volumes of data in electronic health records (EHR) can improve understanding of the extent of the problem and support the development of effective interventions. This research aimed to test the efficacy of zero-shot prompt engineering applied to generative artificial intelligence (AI) models, on their own and in combination with retrieval augmented generation (RAG), for automating the tasks of summarizing both structured and unstructured EHR data and extracting important malnutrition information. METHODOLOGY We used the Llama 2 13B model with zero-shot prompting. The dataset comprises unstructured and structured EHRs related to malnutrition management in 40 Australian RACFs. We first applied zero-shot learning to the model alone, then combined it with RAG, to accomplish two tasks: generating structured summaries of a client's nutritional status and extracting key information about malnutrition risk factors. We used 25 notes in the first task and 1,399 in the second. We manually evaluated the model's output for each task against a gold standard dataset. RESULT The evaluation indicated that zero-shot learning applied to a generative AI model is highly effective in summarizing and extracting information about the nutritional status of RACF clients. The generated summaries provided a concise and accurate representation of the original data, with an overall accuracy of 93.25%. Adding RAG improved summarization by 6 percentage points, to an accuracy of 99.25%. The model also extracted risk factors with an accuracy of 90%; however, adding RAG did not further improve accuracy in this task.
Overall, the model performed robustly when information was explicitly stated in the notes; however, it could hallucinate, particularly when details were not explicitly provided. CONCLUSION This study demonstrates both the high performance and the limitations of applying zero-shot learning to generative AI models for the automatic generation of structured summaries of EHR data and the extraction of key clinical information. The inclusion of the RAG approach improved model performance and mitigated the hallucination problem.
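The abstract does not describe how retrieval was implemented. As a rough illustration only, the sketch below shows the general RAG pattern the study builds on: rank EHR notes by similarity to a question, then place the best matches into a zero-shot prompt (instructions and context, no worked examples). It assumes a toy bag-of-words cosine retriever as a stand-in for the embedding-based retrieval RAG systems usually use; the function names, prompt wording, and example notes are hypothetical, and the call to Llama 2 13B itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, notes: list[str], k: int = 2) -> list[str]:
    """Return the k notes most similar to the query (hypothetical toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(notes, key=lambda n: cosine(q, Counter(tokenize(n))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, notes: list[str], k: int = 2) -> str:
    """Assemble a zero-shot prompt: instruction, retrieved notes, then the question."""
    context = "\n".join(f"- {n}" for n in retrieve(question, notes, k))
    return (
        "Answer using only the notes below. "
        "If the answer is not stated, reply 'not documented'.\n"
        f"Notes:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Invented example notes, not data from the study.
    example_notes = [
        "Weight 52 kg, down 3 kg in one month.",
        "Family visited on Sunday.",
        "MUST score 2: high malnutrition risk; dietitian referral made.",
    ]
    print(build_prompt("What is the resident's malnutrition risk?", example_notes))
```

Instructing the model to answer "not documented" is one common way to discourage hallucination when details are absent, the failure mode the abstract reports; whether the authors used such an instruction is not stated.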
Affiliation(s)
- Mohammad Alkhalaf
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia; School of Computer Science, Qassim University, Qassim 51452, Saudi Arabia
- Ping Yu
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
- Mengyang Yin
- Opal Healthcare, Level 11/420 George St, Sydney NSW 2000, Australia
- Chao Deng
- School of Medical, Indigenous and Health Sciences, University of Wollongong, Wollongong, NSW 2522, Australia

2
Scott IA, De Guzman KR, Falconer N, Canaris S, Bonilla O, McPhail SM, Marxen S, Van Garderen A, Abdel-Hafez A, Barras M. Evaluating automated machine learning platforms for use in healthcare. JAMIA Open 2024; 7:ooae031. [PMID: 38863963 PMCID: PMC11165368 DOI: 10.1093/jamiaopen/ooae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 03/06/2024] [Accepted: 04/22/2024] [Indexed: 06/13/2024] [Open access]
Abstract
Objective To describe the development and application of a checklist of criteria for selecting an automated machine learning (Auto ML) platform for use in creating clinical ML models. Materials and Methods Evaluation criteria for selecting an Auto ML platform suited to the ML needs of a local health district were developed in 3 steps: (1) identification of key requirements, (2) a market scan, and (3) an assessment process with desired outcomes. Results The final checklist, comprising 21 functional and 6 non-functional criteria, was applied to vendor submissions when selecting a platform for creating an ML heparin dosing model as a use case. Discussion A team of clinicians, data scientists, and key stakeholders developed a checklist that can be adapted to the ML needs of healthcare organizations, with the use case providing a relevant example. Conclusion An evaluative checklist was developed for selecting Auto ML platforms; it requires validation in larger multi-site studies.
Affiliation(s)
- Ian A Scott
- Centre for Health Services Research, University of Queensland, Brisbane, 4102, Australia
- Department of Internal Medicine and Clinical Epidemiology, Princess Alexandra Hospital, Brisbane, 4102, Australia
- Keshia R De Guzman
- Department of Pharmacy, Princess Alexandra Hospital, Brisbane, 4102, Australia
- School of Pharmacy, The University of Queensland, Brisbane, 4102, Australia
- Nazanin Falconer
- Department of Pharmacy, Princess Alexandra Hospital, Brisbane, 4102, Australia
- School of Pharmacy, The University of Queensland, Brisbane, 4102, Australia
- Stephen Canaris
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Oscar Bonilla
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Steven M McPhail
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Brisbane, 4059, Australia
- Sven Marxen
- Pharmacy Service, Logan and Beaudesert Hospitals, Logan, 4131, Australia
- Aaron Van Garderen
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Pharmacy Service, Logan and Beaudesert Hospitals, Logan, 4131, Australia
- Ahmad Abdel-Hafez
- Digital Health and Informatics, Metro South Health, Brisbane, 4102, Australia
- Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, Brisbane, 4059, Australia
- Michael Barras
- Department of Pharmacy, Princess Alexandra Hospital, Brisbane, 4102, Australia
- School of Pharmacy, The University of Queensland, Brisbane, 4102, Australia

3
Liu Y, Cao S. The analysis of aerobics intelligent fitness system for neurorobotics based on big data and machine learning. Heliyon 2024; 10:e33191. [PMID: 39022026 PMCID: PMC11253048 DOI: 10.1016/j.heliyon.2024.e33191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 05/08/2024] [Accepted: 06/16/2024] [Indexed: 07/20/2024] [Open access]
Abstract
In modern society, the pace of life is fast and the pressure is enormous, making issues such as obesity and sub-health increasingly prominent. To some extent, traditional fitness methods cannot meet people's needs. Therefore, this work aims to use technology to change people's lifestyles and compensate for the shortcomings of traditional fitness methods. First, this work gives an overview of neurorobotics, which provides neural perception and control functions for the aerobics intelligent fitness system. Second, the connection between big data and machine learning (ML), big data technology products, and the ML process are discussed. The Spark big data platform is used to build node data for computation, and the decision tree algorithm is used for data preprocessing; both are important for future intelligent fitness analysis. This work proposes an aerobics intelligent fitness system based on neurorobotics technology and big data analysis and develops a recommendation system for the best fitness exercise. The system combines neural perception and control functions with big data and ML technology to address the obesity and sub-health problems people face in fast-paced, high-pressure lifestyles. By harnessing the computational capabilities of the Spark platform and applying the decision tree algorithm for data preprocessing, the system can provide users with personalized fitness plans and optimization recommendations. To evaluate the system's data processing ability and training effectiveness, this work conducts a model performance study on 35% aerobic fitness data on intelligent fitness Android v1.0.8. The aerobics intelligent fitness system models based on neurorobotics, big data, and ML are also evaluated. The results indicate that normalizing the data using the Min-Max method leads to a decrease in the F1 value and a reduction in data set errors.
Consequently, the dataset studied by the system model helps improve the efficiency of the aerobics intelligent fitness system. When the system model's assessment of comprehensive human quality was evaluated, the 13 users tested had an actual average comprehensive human quality score of 91.44 before the aerobics intelligent fitness system test, and the average predicted score was 90.88. The two results are similar. Thus, the intelligent fitness system can give users feedback based on their actual training effect, guiding their aerobics fitness training. This work designs and implements an aerobics intelligent fitness system that closely reflects the body's actual training effect, further enhancing the specialization and individualization of sports and fitness.
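The abstract reports the effect of Min-Max normalization without stating its form. The standard Min-Max transform rescales each feature linearly so that its minimum maps to 0 and its maximum to 1, via x' = (x - min) / (max - min). A minimal plain-Python sketch follows; it is not the authors' Spark implementation, the optional target range is a common generalization, and the handling of a constant feature is an assumption.

```python
def min_max(values: list[float], new_min: float = 0.0, new_max: float = 1.0) -> list[float]:
    """Linearly rescale values so the minimum maps to new_min and the maximum to new_max."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to new_min (an assumption).
        return [new_min for _ in values]
    # (x - min) / (max - min) lands in [0, 1]; then stretch it to the target range.
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Example: hypothetical heart-rate readings (bpm) from one workout session.
print(min_max([110, 130, 150, 170]))
```

Because the transform depends only on each feature's observed minimum and maximum, in practice those two values are computed on training data and reused unchanged on new data.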
Affiliation(s)
- Yuanxin Liu
- Sports Department, Henan Medical College, Zhengzhou, 451191, China
- Shufang Cao
- Ministry of Basic Medicine Education, Dazhou Vocational College of Chinese Medicine, Dazhou, 635000, China

4
Wang Y, Xu X, Fang Y, Yang S, Wang Q, Liu W, Zhang J, Liang D, Zhai W, Qian K. Self-Assembled Hyperbranched Gold Nanoarrays Decode Serum United Urine Metabolic Fingerprints for Kidney Tumor Diagnosis. ACS NANO 2024; 18:2409-2420. [PMID: 38190455 DOI: 10.1021/acsnano.3c10717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2024]
Abstract
Serum united urine metabolic analysis comprehensively reveals disease status, particularly for kidney diseases. Thus, the precise and convenient acquisition of metabolic molecular information from united biofluids is vitally important for clinical disease diagnosis and biomarker discovery. Laser desorption/ionization mass spectrometry (LDI-MS) presents various advantages in metabolic analysis; however, challenges remain in ionization efficiency and MS signal reproducibility. Herein, we constructed a self-assembled hyperbranched black gold nanoarray (HyBrAuNA)-assisted LDI-MS platform to profile serum united urine metabolic fingerprints (S-UMFs) for diagnosis of early stage renal cell carcinoma (RCC). The closely packed HyBrAuNA afforded strong electromagnetic field enhancement and high photothermal conversion efficiency, enabling effective ionization of low-abundance metabolites for S-UMF collection. With a uniform nanoarray, the platform showed excellent reproducibility, ensuring the accuracy of S-UMFs obtained in seconds. When combined with automated machine learning analysis of S-UMFs, the platform discriminated early stage RCC patients from healthy controls with an area under the curve (AUC) > 0.99. Furthermore, we identified a panel of 9 metabolites (4 from serum and 5 from urine) and related pathways associated with early stage kidney tumor. In view of its high throughput, fast analytical speed, and low sample consumption, our platform has potential for metabolic profiling of united biofluids for disease diagnosis and exploration of pathogenic mechanisms.
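The abstract reports AUC > 0.99 for separating early-stage RCC patients from healthy controls. For readers unfamiliar with the metric, the sketch below computes ROC AUC directly from its probabilistic meaning: the chance that a randomly chosen patient receives a higher classifier score than a randomly chosen control, with ties counted as half. The scores and labels are hypothetical; this is not the authors' AutoML pipeline, and the quadratic pairwise loop is chosen for clarity over speed.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = early-stage RCC, 0 = healthy control.
print(roc_auc([0.97, 0.91, 0.88, 0.35, 0.20, 0.90], [1, 1, 1, 0, 0, 0]))
```

An AUC above 0.99 therefore means that almost every patient-control pair is ranked correctly by the classifier's score, independent of any single decision threshold.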
Affiliation(s)
- Yuning Wang
- State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
- Xiaoyu Xu
- State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
- Yuzheng Fang
- Department of Urology, Renji Hospital, School of Medicine in Shanghai Jiao Tong University, 160 Pujian Road, Shanghai 200127, People's Republic of China
- Shouzhi Yang
- State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
- Qirui Wang
- Health Management Center, Renji Hospital of Medical School of Shanghai Jiao Tong University, Shanghai 200127, People's Republic of China
- Wanshan Liu
- State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
- Juxiang Zhang
- State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
- Dingyitai Liang
- State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
- Wei Zhai
- Department of Urology, Renji Hospital, School of Medicine in Shanghai Jiao Tong University, 160 Pujian Road, Shanghai 200127, People's Republic of China
- Kun Qian
- State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering and Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China

5
Lin E, Lin CH, Lane HY. Inference of social cognition in schizophrenia patients with neurocognitive domains and neurocognitive tests using automated machine learning. Asian J Psychiatr 2024; 91:103866. [PMID: 38128351 DOI: 10.1016/j.ajp.2023.103866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 12/07/2023] [Accepted: 12/09/2023] [Indexed: 12/23/2023]
Abstract
AIM It has been suggested that a single neurocognitive domain or neurocognitive test can be used to determine overall cognitive function in schizophrenia using machine learning algorithms. It is unknown whether social cognition in schizophrenia patients can be estimated with machine learning based on neurocognitive domains or neurocognitive tests. METHODS To predict social cognition in schizophrenia, we applied an automated machine learning (AutoML) framework to predictive factors including six neurocognitive domain scores and nine neurocognitive test scores from 380 schizophrenia patients in the Taiwanese population. Four clinical parameters (i.e., age, gender, subgroup, and education) were also used as predictive factors. We used an AutoML framework called the Tree-based Pipeline Optimization Tool (TPOT) to generate predictive pipelines automatically. RESULTS All neurocognitive domains and tests except the reasoning and problem solving domain/test showed significant associations with social cognition. In addition, a TPOT-generated pipeline best predicted social cognition in schizophrenia using seven predictive factors: five neurocognitive domains (i.e., speed of processing, sustained attention, working memory, verbal learning and memory, and visual learning and memory) and two clinical parameters (i.e., age and gender). This predictive pipeline consists of machine learning components such as function transformers, an approximate feature map, independent component analysis, and linear regression. CONCLUSION The study indicates that an AutoML framework such as TPOT may provide a promising way to produce effective machine learning pipelines for predicting social cognition in schizophrenia using neurocognitive domains and/or neurocognitive tests.
Affiliation(s)
- Eugene Lin
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA; Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan
- Chieh-Hsin Lin
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan; Department of Psychiatry, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University College of Medicine, Kaohsiung, Taiwan; School of Medicine, Chang Gung University, Taoyuan, Taiwan
- Hsien-Yuan Lane
- Graduate Institute of Biomedical Sciences, China Medical University, Taichung, Taiwan; Department of Psychiatry, China Medical University Hospital, Taichung, Taiwan; Brain Disease Research Center, China Medical University Hospital, Taichung, Taiwan; Department of Psychology, College of Medical and Health Sciences, Asia University, Taichung, Taiwan

6
Omar I, Khan M, Starr A, Abou Rok Ba K. Automated Prediction of Crack Propagation Using H2O AutoML. SENSORS (BASEL, SWITZERLAND) 2023; 23:8419. [PMID: 37896512 PMCID: PMC10611134 DOI: 10.3390/s23208419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 10/06/2023] [Accepted: 10/09/2023] [Indexed: 10/29/2023]
Abstract
Crack propagation is a critical phenomenon in materials science and engineering, significantly impacting structural integrity, reliability, and safety across various applications. The accurate prediction of crack propagation behavior is paramount for ensuring the performance and durability of engineering components, as extensively explored in prior research. Nevertheless, there is a pressing demand for automated models capable of efficiently and precisely forecasting crack propagation. In this study, we address this need by developing a machine learning-based automated model using the powerful H2O library. This model aims to accurately predict crack propagation behavior in various materials by analyzing intricate crack patterns and delivering reliable predictions. To achieve this, we employed a comprehensive dataset derived from measured instances of crack propagation in Acrylonitrile Butadiene Styrene (ABS) specimens. Rigorous evaluation metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared (R2) values, were applied to assess the model's predictive accuracy. Cross-validation techniques were utilized to ensure its robustness and generalizability across diverse datasets. Our results underscore the automated model's remarkable accuracy and reliability in predicting crack propagation. This study not only highlights the immense potential of the H2O library as a valuable tool for structural health monitoring but also advocates for the broader adoption of Automated Machine Learning (AutoML) solutions in engineering applications. In addition to presenting these findings, we define H2O as a powerful machine learning library and AutoML as Automated Machine Learning to ensure clarity and understanding for readers unfamiliar with these terms. 
This research not only demonstrates the significance of AutoML in future-proofing our approach to structural integrity and safety but also emphasizes the need for comprehensive reporting and understanding in scientific discourse.
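The abstract names MAE, RMSE, and R² and mentions cross-validation without giving formulas. A minimal sketch of the standard definitions, plus a simple unshuffled k-fold split, follows; it is plain Python rather than H2O's own metric code, and the example values are invented.

```python
import math
from typing import Iterator


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Mean absolute error, root mean square error, and coefficient of determination (R2)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


def k_fold_indices(n: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k contiguous folds; each index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        yield train, test
        start += size


# Example with invented crack-length values (mm): measured vs predicted.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

RMSE squares errors before averaging, so it penalizes occasional large misses more than MAE does; R² compares the model against always predicting the mean. Practical cross-validation usually shuffles (or, for time-ordered measurements, uses a time-aware split); the folds here stay contiguous for readability.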
Affiliation(s)
- Muhammad Khan
- School of Aerospace, Transport and Manufacturing, Cranfield University, Bedford MK43 0AL, UK

7
Yu HQ, O’Neill S, Kermanizadeh A. AIMS: An Automatic Semantic Machine Learning Microservice Framework to Support Biomedical and Bioengineering Research. Bioengineering (Basel) 2023; 10:1134. [PMID: 37892864 PMCID: PMC10603862 DOI: 10.3390/bioengineering10101134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 09/21/2023] [Accepted: 09/25/2023] [Indexed: 10/29/2023] [Open access]
Abstract
The fusion of machine learning and biomedical research offers novel ways to understand, diagnose, and treat various health conditions. However, the complexities of biomedical data, coupled with the intricate process of developing and deploying machine learning solutions, often pose significant challenges to researchers in these fields. The main achievement of this research is the introduction of the Automatic Semantic Machine Learning Microservice (AIMS) framework. AIMS addresses these challenges by automating various stages of the machine learning pipeline, with a particular emphasis on an ontology of machine learning services tailored to the biomedical domain. This ontology covers everything from task representation, service modeling, and knowledge acquisition to knowledge reasoning and the establishment of a self-supervised learning policy. The framework is designed to prioritize model interpretability, integrate domain knowledge easily, and handle biomedical data efficiently. Additionally, AIMS has a distinctive feature: it leverages self-supervised knowledge learning through reinforcement learning techniques, paired with an ontology-based policy recording schema. This enables it to autonomously generate, fine-tune, and continually adapt machine learning models, especially when faced with new tasks and data. Our work makes two main contributions: it demonstrates that machine learning processes in the biomedical domain can be automated while integrating a rich domain knowledge base, and it provides a way for machines to learn by themselves so that they handle new tasks effectively. To showcase AIMS in action, we present three case studies of biomedical tasks. These examples show how the framework can simplify research routines, raise the quality of scientific exploration, and set the stage for notable advances.

8
Clinical Screening Prediction in the Portuguese National Health Service: Data Analysis, Machine Learning Models, Explainability and Meta-Evaluation. FUTURE INTERNET 2023. [DOI: 10.3390/fi15010026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023] [Open access]
Abstract
This paper presents an analysis of the calls made to the Portuguese National Health Contact Center (SNS24) over a three-year period. The final goal was to develop a system to help nurse attendants select the appropriate clinical pathway (from 59 options) for each call. It examines several aspects of the call distribution, such as the age and gender of the user, the date and time of the call, and the final referral, and presents comparative results for alternative classification models (SVM and CNN) and different data samples (three months, one year, and two years of data). For the task of selecting the appropriate pathway, the models learned from the available data achieved F1 values ranging from 0.642 (3-month CNN model) to 0.783 (2-year CNN model), with SVM performing more stably (between 0.743 and 0.768 for the corresponding data samples). These results are discussed in terms of error analysis and possibilities for explaining the system's decisions. A final meta-evaluation, based on a clinical expert review, compares the different choices, namely the nurse attendants (reference ground truth), the expert, and the automatic decisions (2 models), revealing the highest agreement between the two ML models, followed by their agreement with the clinical expert, and lower agreement with the reference.
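The abstract reports F1 values for a 59-way classification task without stating how per-class scores are averaged, and it does not name the agreement measure used in the meta-evaluation. The sketch below assumes macro-averaged F1 (each pathway weighted equally) and simple percent agreement; both are illustrative choices rather than the authors' confirmed method, and the pathway labels are invented.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted pathway p was wrong
            fn[t] += 1  # the true pathway t was missed
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Share of calls on which two raters (e.g. a model and a nurse) chose the same pathway."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and equally long")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical pathway labels for four calls.
nurse = ["fever", "fever", "cough", "rash"]
model = ["fever", "cough", "cough", "rash"]
print(macro_f1(nurse, model), percent_agreement(nurse, model))
```

Macro-averaging lets rare pathways count as much as common ones, whereas a support-weighted average tracks the frequent pathways, so the choice can shift the reported F1. Chance-corrected measures such as Cohen's kappa are a common alternative to raw agreement.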

9
Loganathan T, Priya Doss C G. The influence of machine learning technologies in gut microbiome research and cancer studies - A review. Life Sci 2022; 311:121118. [DOI: 10.1016/j.lfs.2022.121118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 10/19/2022] [Indexed: 11/18/2022]

10
Christidis N, Lindberg V, Jounger SL, Christidis M. Early steps towards professional clinical note-taking in a Swedish study programme in dentistry. BMC MEDICAL EDUCATION 2022; 22:676. [PMID: 36104688 PMCID: PMC9472420 DOI: 10.1186/s12909-022-03727-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Accepted: 09/02/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND Higher education tends to focus on academic writing only, instead of emphasizing that professional texts are also used as a basis for communication in contexts with a variety of participants. Research on clinical notes is scarce and has focused on technology and informatics. Therefore, the aim was to explore dental students' clinical notes, and specifically which aspects characterize clinical notes that are insufficient for professional purposes. METHODS The object of analysis was the students' written completion of a teacher-constructed protocol covering the oral mucosa, the dental apparatus including pathology at tooth level, oral hygiene, and a validated international clinical examination protocol of the temporomandibular region. The study was framed within the New Literacy Studies approach, and the clinical notes were analyzed using thematic analysis. RESULTS Three themes were identified within the clinical notes: (a) familiar content; (b) familiar content in a new context; and (c) new content. The notes took one of two forms: categorizational clinical notes or descriptive clinical notes. Most students were able to write acceptable clinical notes when the content was familiar, but as soon as familiar content appeared in a new context, they had difficulty writing acceptable notes. For descriptive notes, students had difficulty writing acceptable notes both for familiar content and for familiar content in a new context. CONCLUSIONS Taken together, the results indicate that students have difficulty writing acceptable notes when they are novices to the content or context, making their notes insufficient, too short, or even wrong for professional purposes.
With this in mind, this study suggests a need to raise the demands for sufficient professional quality in clinical notes and to focus on clinical notes from the early stages of the various medical education programmes.
Affiliation(s)
- Nikolaos Christidis
- Department of Dental Medicine, Division of Oral Diagnostics and Rehabilitation, Karolinska Institutet, SE-141 04, Huddinge, Sweden
- Viveca Lindberg
- Department of Teaching and Learning, Stockholm University, Stockholm, Sweden
- Department of Special Education, Stockholm University, Stockholm, Sweden
- Sofia Louca Jounger
- Department of Dental Medicine, Division of Oral Diagnostics and Rehabilitation, Karolinska Institutet, SE-141 04, Huddinge, Sweden
- Maria Christidis
- Department of Health Sciences, The Swedish Red Cross University, SE-151 47, Huddinge, Sweden
- Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, SE-141 83, Huddinge, Sweden

11
A Romero RA, Y Deypalan MN, Mehrotra S, Jungao JT, Sheils NE, Manduchi E, Moore JH. Benchmarking AutoML frameworks for disease prediction using medical claims. BioData Min 2022; 15:15. [PMID: 35883154 PMCID: PMC9327416 DOI: 10.1186/s13040-022-00300-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/07/2021] [Accepted: 06/27/2022] [Indexed: 11/10/2022] [Open access]
Abstract
Objectives To ascertain and compare the performance of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets. Materials and Methods We generated a large dataset using historical de-identified administrative claims, including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performance on several metrics. Results The AutoML tools showed improvement over the baseline random forest model but did not differ significantly from each other. All models recorded a low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications. Discussion Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance. Conclusion Among the three tools explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performance of the models in this study suggests that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application. Supplementary Information The online version contains supplementary material available at (10.1186/s13040-022-00300-2).
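The abstract stresses choosing a prediction threshold that balances true and false positive rates but does not give the criterion used. One common choice, swapped in here purely for illustration, is Youden's J statistic (J = TPR - FPR), maximized over candidate thresholds. The scores below are invented; in an imbalanced claims setting, a cost-weighted criterion tied to the clinical application may be more appropriate, in line with the authors' conclusion that the threshold should be guided by the practical application.

```python
def rates_at(threshold: float, scores: list[float], labels: list[int]) -> tuple[float, float]:
    """True and false positive rates when scores >= threshold are called positive (labels are 0/1)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / pos, fp / neg


def best_threshold_youden(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = TPR - FPR over the observed scores."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tpr, fpr = rates_at(t, scores, labels)
        j = tpr - fpr
        if j > best_j:  # strict '>' keeps the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j


# Invented risk scores for six members; 1 = developed the disease.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.80]
labels = [0, 0, 0, 1, 0, 1]
print(best_threshold_youden(scores, labels))
```

Youden's J weights a missed case and a false alarm equally; when those costs differ, as they often do in screening, replacing J with an expected-cost function of TPR and FPR is a natural adjustment.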
Affiliation(s)
- Elisabetta Manduchi
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, 90069, CA, USA
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center Suite G540, West Hollywood, 90069, CA, USA

12
Just Add Data: automated predictive modeling for knowledge discovery and feature selection. NPJ Precis Oncol 2022; 6:38. [PMID: 35710826 PMCID: PMC9203777 DOI: 10.1038/s41698-022-00274-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/04/2020] [Accepted: 04/13/2022] [Indexed: 01/20/2023] [Open access]
Abstract
Fully automated machine learning (AutoML) for predictive modeling is becoming a reality, giving rise to a whole new field. We present the basic ideas and principles of Just Add Data Bio (JADBio), an AutoML platform applicable to the low-sample, high-dimensional omics data that arise in translational medicine and bioinformatics applications. In addition to predictive and diagnostic models ready for clinical use, JADBio focuses on knowledge discovery by performing feature selection and identifying the corresponding biosignatures, i.e., minimal-size subsets of biomarkers that are jointly predictive of the outcome or phenotype of interest. It also returns a palette of useful information for interpretation, clinical use of the models, and decision making. JADBio is qualitatively and quantitatively compared against Hyper-Parameter Optimization Machine Learning libraries. Results show that in typical omics dataset analysis, JADBio manages to identify signatures comprising just a handful of features while maintaining competitive predictive performance and accurate out-of-sample performance estimation.
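The abstract describes identifying minimal-size biosignatures but not the selection algorithm JADBio uses. As a generic illustration only (not JADBio's method or API), greedy forward selection adds one feature at a time while it still improves a user-supplied score, stopping at a size cap or when the gain vanishes. The scoring function below is a hypothetical stand-in for, e.g., the cross-validated performance of a model fitted on the selected biomarkers; the biomarker names and weights are invented.

```python
from typing import Callable, Sequence


def forward_select(
    features: Sequence[str],
    score: Callable[[list[str]], float],
    max_features: int,
    min_gain: float = 1e-9,
) -> tuple[list[str], float]:
    """Greedily add the feature that most improves score() until no gain exceeds min_gain."""
    selected: list[str] = []
    best = score(selected)
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension; equal scores are broken by feature name.
        new_score, new_feature = max((score(selected + [f]), f) for f in candidates)
        if new_score - best <= min_gain:
            break  # no candidate adds enough predictive value
        selected.append(new_feature)
        best = new_score
    return selected, best


# Hypothetical scoring: each biomarker adds a fixed amount; "noise" adds nothing.
WEIGHTS = {"geneA": 0.30, "geneB": 0.15, "noise": 0.0}


def toy_score(signature: list[str]) -> float:
    return 0.5 + sum(WEIGHTS[f] for f in signature)


print(forward_select(list(WEIGHTS), toy_score, max_features=3))
```

Stopping when the gain vanishes is what keeps the signature small: the uninformative feature is never added even though the size cap would allow it.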
13
Artificial Intelligence for Health. Computers 2021. [DOI: 10.3390/computers10080100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Health is one of the major research topics that has been attracting cross-disciplinary research groups [...]
14
Murovec B, Deutsch L, Stres B. General Unified Microbiome Profiling Pipeline (GUMPP) for Large Scale, Streamlined and Reproducible Analysis of Bacterial 16S rRNA Data to Predicted Microbial Metagenomes, Enzymatic Reactions and Metabolic Pathways. Metabolites 2021; 11:336. [PMID: 34074026 PMCID: PMC8225202 DOI: 10.3390/metabo11060336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Revised: 05/14/2021] [Accepted: 05/23/2021] [Indexed: 11/23/2022]
Abstract
General Unified Microbiome Profiling Pipeline (GUMPP) was developed for large-scale, streamlined and reproducible analysis of bacterial 16S rRNA data and prediction of microbial metagenomes, enzymatic reactions and metabolic pathways from amplicon data. The GUMPP workflow introduces reproducible data analyses at each of three levels of resolution: genus, operational taxonomic units (OTUs) and amplicon sequence variants (ASVs). Support for reproducible analyses enables the production of datasets that ultimately identify the biochemical pathways characteristic of disease pathology. Coupled with biostatistics and machine learning approaches, these datasets can play a significant role in extracting meaningful information from a wide range of 16S rRNA datasets. Adopting GUMPP in gut-microbiota research allows a focus on generating novel biomarkers, which can lead to mechanistic hypotheses applicable to the development of novel therapies in personalized medicine.
Affiliation(s)
- Boštjan Murovec
- Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia;
- Leon Deutsch
- Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, SI-1000 Ljubljana, Slovenia;
- Blaž Stres
- Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, SI-1000 Ljubljana, Slovenia;
- Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, SI-1000 Ljubljana, Slovenia
- Department of Automation, Jožef Stefan Institute, Biocybernetics and Robotics, Jamova 39, SI-1000 Ljubljana, Slovenia
- Department of Microbiology, University of Innsbruck, Technikerstrasse 25d, A-6020 Innsbruck, Austria
15
Deutsch L, Stres B. The Importance of Objective Stool Classification in Fecal ¹H-NMR Metabolomics: Exponential Increase in Stool Crosslinking Is Mirrored in Systemic Inflammation and Associated to Fecal Acetate and Methionine. Metabolites 2021; 11:172. [PMID: 33809780 PMCID: PMC8002301 DOI: 10.3390/metabo11030172] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/31/2021] [Revised: 03/10/2021] [Accepted: 03/10/2021] [Indexed: 12/25/2022]
Abstract
Past studies strongly connected stool consistency, as measured by the Bristol Stool Scale (BSS), with microbial gene richness, intestinal inflammation, colonic transit time and metabolome characteristics that are of clinical relevance in numerous gastrointestinal conditions. While retention time, defecation rate and BSS, but not water activity, have been shown to account for BSS-associated inflammatory effects, the potential role of gel strength, in the context of intestinal forces, abrasion, mucus imprinting and fecal pore clogging, as a factor shaping intestinal inflammation remains unexplored. Our study introduced a minimal pressure (MP) approach, based on probe indentation, as a measure of stool material crosslinking in fecal samples. Results reported here were obtained from 170 samples collected in two independent projects, including males and females and covering a wide span of moisture contents and BSS classes. MP values increased exponentially with increasing consistency (i.e., lower BSS) and enabled stratification of samples exhibiting mixed BSS classes. A trade-off between the lowest MP and the highest dry matter content delineated the range of intermediate, healthy gel crosslink density. Cross-sectional transects identified fecal surface layers less than 5 mm thick with exceptionally high MP, followed by internal structures with MP an order of magnitude lower, characteristic of healthy stool consistency. The MP and BSS values reported in this study were coupled to a reanalysis of previously reported PlanHab data and fecal ¹H-NMR metabolomes. The exponential association between stool consistency and MP determined in this study was mirrored in the elevated intestinal and systemic inflammation and other detrimental physiological deconditioning effects previously observed in the PlanHab participants.
The MP approach described in this study can be used to better understand fecal hardness and its relationship to human health, as it provides a simple, fine-scale and objective stool classification approach for characterizing the exact sampling locations in future microbiome and metabolome studies.
Affiliation(s)
- Leon Deutsch
- Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, SI-1000 Ljubljana, Slovenia;
- Blaž Stres
- Biotechnical Faculty, University of Ljubljana, Jamnikarjeva 101, SI-1000 Ljubljana, Slovenia;
- Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, SI-1000 Ljubljana, Slovenia
- Department of Automation, Jožef Stefan Institute, Biocybernetics and Robotics, Jamova 39, SI-1000 Ljubljana, Slovenia
- Department of Microbiology, University of Innsbruck, Technikerstrasse 25d, A-6020 Innsbruck, Austria
16
Aiding Clinical Triage with Text Classification. Progress in Artificial Intelligence 2021. [DOI: 10.1007/978-3-030-86230-5_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]