1
Janssen SMW, Bouzembrak Y, Tekinerdogan B. Artificial Intelligence in Malnutrition: A systematic literature review. Adv Nutr 2024:100264. [PMID: 38971229 DOI: 10.1016/j.advnut.2024.100264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 06/03/2024] [Accepted: 06/26/2024] [Indexed: 07/08/2024] Open
Abstract
Malnutrition is a frequent yet underdiagnosed problem worldwide in both children and adults. Malnutrition screening and diagnostic tools for early detection are necessary to prevent long-term complications to patients' health and well-being. Most of these tools are based on predefined questionnaires and consensus guidelines. Artificial intelligence (AI) allows automated tools to detect malnutrition at an earlier stage and so prevent long-term consequences. In this study, a systematic literature review was carried out to provide detailed information on which patient groups, screening tools, machine learning algorithms, data types, and variables are being used, as well as the current limitations and implementation stage of these AI-based tools. The results showed that more than 90% of all AI models go unused in day-to-day clinical practice. Furthermore, supervised learning was the most common type of learning. In addition, disease-related malnutrition was the most common category of malnutrition found in the analysis of all primary studies. The current research provides a resource for researchers to identify directions for their research on the use of AI in malnutrition.
Affiliation(s)
- Sander M W Janssen
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
- Yamine Bouzembrak
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
- Bedir Tekinerdogan
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
2
Huang AA, Huang SY. Comparison of model feature importance statistics to identify covariates that contribute most to model accuracy in prediction of insomnia. PLoS One 2024; 19:e0306359. [PMID: 38954735 PMCID: PMC11218970 DOI: 10.1371/journal.pone.0306359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 06/14/2024] [Indexed: 07/04/2024] Open
Abstract
IMPORTANCE Sleep is critical to a person's physical and mental health, and there is a need to create high-performing machine learning models and to understand critically how models rank covariates. OBJECTIVE The study aimed to compare how different model metrics rank the importance of various covariates. DESIGN, SETTING, AND PARTICIPANTS A cross-sectional cohort study was conducted retrospectively using the publicly available National Health and Nutrition Examination Survey (NHANES). METHODS This study employed univariate logistic models to select strong, independent covariates associated with the sleep disorder outcome; these covariates were then used in machine-learning models, of which the best-performing was chosen. The machine-learning model was used to rank model covariates based on gain, cover, and frequency to identify risk factors for sleep disorder, and feature importance was evaluated using both univariable and multivariable t-statistics. A correlation matrix was created to determine the similarity of the importance of variables ranked by different model metrics. RESULTS The XGBoost model had the highest mean AUROC of 0.865 (SD = 0.010) with Accuracy of 0.762 (SD = 0.019), F1 of 0.875 (SD = 0.766), Sensitivity of 0.768 (SD = 0.023), Specificity of 0.782 (SD = 0.025), Positive Predictive Value of 0.806 (SD = 0.025), and Negative Predictive Value of 0.737 (SD = 0.034). The machine learning model metrics gain and cover were strongly positively correlated with one another (r > 0.70). Model metrics from the multivariable model and univariable model were weakly negatively correlated with machine learning model metrics (r between -0.3 and 0). CONCLUSION The rankings of important variables associated with sleep disorder in this cohort from the machine learning models were not related to those from regression models.
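The comparison of importance rankings described in this abstract can be illustrated with a small sketch. It assumes plain Pearson correlation as the similarity measure; the feature scores and the names `pearson`, `correlation_matrix`, and `importance` are hypothetical placeholders, not values or code from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical importance scores for the same five covariates under
# four different metrics (illustrative numbers only).
importance = {
    "gain":      [0.40, 0.25, 0.20, 0.10, 0.05],
    "cover":     [0.35, 0.30, 0.15, 0.12, 0.08],
    "frequency": [0.20, 0.20, 0.25, 0.20, 0.15],
    "t_stat":    [1.2, 3.5, 0.8, 2.9, 4.1],  # e.g. from a regression model
}

def correlation_matrix(metrics):
    """Pairwise Pearson correlations between importance metrics."""
    names = list(metrics)
    return {a: {b: pearson(metrics[a], metrics[b]) for b in names} for a in names}

matrix = correlation_matrix(importance)
for name, row in matrix.items():
    print(f"{name:>9}", " ".join(f"{value:+.2f}" for value in row.values()))
```

Each cell of the matrix answers one question: do two metrics rank the same covariates as important? Values near +1 indicate agreement, values near 0 indicate unrelated rankings.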
Affiliation(s)
- Alexander A. Huang
  - Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Samuel Y. Huang
  - Virginia Commonwealth University School of Medicine, Richmond, VA, United States of America
3
Huang AA, Huang SY. Application of a transparent artificial intelligence algorithm for US adults in the obese category of weight. PLoS One 2024; 19:e0304509. [PMID: 38820332 PMCID: PMC11142543 DOI: 10.1371/journal.pone.0304509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 05/13/2024] [Indexed: 06/02/2024] Open
Abstract
OBJECTIVE AND AIMS Identifying factors associated with the obese category of weight in the general US population will continue to advance our understanding of the condition and allow clinicians, providers, communities, families, and individuals to make more informed decisions. This study aims to improve the prediction of the obese category of weight and investigate its relationships with associated factors, ultimately contributing to healthier lifestyle choices and timely management of obesity. METHODS Questionnaires that included demographic, dietary, exercise, and health information from the US National Health and Nutrition Examination Survey (NHANES 2017-2020) were used, with a BMI of 30 or higher defined as obesity. A machine learning model, XGBoost, predicted the obese category of weight, and SHapley Additive exPlanations (SHAP) visualized the covariates and their feature importance. Model statistics including area under the receiver operator curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value, as well as feature properties such as gain, cover, and frequency, were measured. SHAP explanations were created for transparent and interpretable analysis. RESULTS There were 6,146 adults (age > 18) included in the study, with an average age of 58.39 (SD = 12.94) and 3122 (51%) females. The machine learning model had an area under the receiver operator curve of 0.8295. The top covariates by gain included waist circumference (gain = 0.185), GGT (gain = 0.101), platelet count (gain = 0.059), AST (gain = 0.057), weight (gain = 0.049), HDL cholesterol (gain = 0.032), and ferritin (gain = 0.034). CONCLUSION Machine learning models prove to be highly effective in accurately predicting the obese category of weight. By considering factors such as demographic information, laboratory results, physical examination findings, and lifestyle factors, these models successfully identify crucial risk factors associated with the obese category of weight.
Affiliation(s)
- Alexander A. Huang
  - Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
- Samuel Y. Huang
  - Virginia Commonwealth University School of Medicine, Richmond, Virginia, United States of America
4
Thakur GK, Thakur A, Kulkarni S, Khan N, Khan S. Deep Learning Approaches for Medical Image Analysis and Diagnosis. Cureus 2024; 16:e59507. [PMID: 38826977 PMCID: PMC11144045 DOI: 10.7759/cureus.59507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 05/01/2024] [Indexed: 06/04/2024] Open
Abstract
In addition to enhancing diagnostic accuracy, deep learning techniques offer the potential to streamline workflows, reduce interpretation time, and ultimately improve patient outcomes. The scalability and adaptability of deep learning algorithms enable their deployment across diverse clinical settings, ranging from radiology departments to point-of-care facilities. Furthermore, ongoing research efforts focus on addressing the challenges of data heterogeneity, model interpretability, and regulatory compliance, paving the way for seamless integration of deep learning solutions into routine clinical practice. As the field continues to evolve, collaborations between clinicians, data scientists, and industry stakeholders will be paramount in harnessing the full potential of deep learning for advancing medical image analysis and diagnosis. Furthermore, the integration of deep learning algorithms with other technologies, including natural language processing and computer vision, may foster multimodal medical data analysis and clinical decision support systems to improve patient care. The future of deep learning in medical image analysis and diagnosis is promising. With each success and advancement, this technology is getting closer to being leveraged for medical purposes. Beyond medical image analysis, patient care pathways like multimodal imaging, imaging genomics, and intelligent operating rooms or intensive care units can benefit from deep learning models.
Affiliation(s)
- Gopal Kumar Thakur
  - Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Abhishek Thakur
  - Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Shridhar Kulkarni
  - Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Naseebia Khan
  - Department of Data Sciences, Harrisburg University of Science and Technology, Harrisburg, USA
- Shahnawaz Khan
  - Department of Computer Application, Bundelkhand University, Jhansi, IND
5
Terranova N, Renard D, Shahin MH, Menon S, Cao Y, Hop CECA, Hayes S, Madrasi K, Stodtmann S, Tensfeldt T, Vaddady P, Ellinwood N, Lu J. Artificial Intelligence for Quantitative Modeling in Drug Discovery and Development: An Innovation and Quality Consortium Perspective on Use Cases and Best Practices. Clin Pharmacol Ther 2024; 115:658-672. [PMID: 37716910 DOI: 10.1002/cpt.3053] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/28/2023] [Accepted: 09/11/2023] [Indexed: 09/18/2023]
Abstract
Recent breakthroughs in artificial intelligence (AI) and machine learning (ML) have ushered in a new era of possibilities across various scientific domains. One area where these advancements hold significant promise is model-informed drug discovery and development (MID3). To foster wider adoption of these advanced algorithms, the Innovation and Quality (IQ) Consortium initiated the AI/ML working group in 2021 with the aim of promoting their acceptance among the broader scientific community as well as by regulatory agencies. Drawing on insights from workshops organized by the working group and attended by key stakeholders across the biopharma industry, academia, and regulatory agencies, this white paper provides a perspective from the IQ Consortium. The range of applications covered in this white paper encompasses the following thematic topics: (i) AI/ML-enabled Analytics for Pharmacometrics and Quantitative Systems Pharmacology (QSP) Workflows; (ii) Explainable Artificial Intelligence and its Applications in Disease Progression Modeling; (iii) Natural Language Processing (NLP) in Quantitative Pharmacology Modeling; and (iv) AI/ML Utilization in Drug Discovery. Additionally, the paper offers a set of best practices to ensure an effective and responsible use of AI, including considering the context of use, explainability and generalizability of models, and having human-in-the-loop. We believe that embracing the transformative power of AI in quantitative modeling while adopting a set of good practices can unlock new opportunities for innovation, increase efficiency, and ultimately bring benefits to patients.
Affiliation(s)
- Nadia Terranova
  - Quantitative Pharmacology, Merck KGaA, Lausanne, Switzerland
- Didier Renard
  - Full Development Pharmacometrics, Novartis Pharma AG, Basel, Switzerland
- Sujatha Menon
  - Clinical Pharmacology, Pfizer Inc., Groton, Connecticut, USA
- Youfang Cao
  - Clinical Pharmacology and Translational Medicine, Eisai Inc., Nutley, New Jersey, USA
- Sean Hayes
  - Quantitative Pharmacology & Pharmacometrics, Merck & Co. Inc., Rahway, New Jersey, USA
- Kumpal Madrasi
  - Modeling & Simulation, Sanofi, Bridgewater, New Jersey, USA
- Sven Stodtmann
  - Pharmacometrics, AbbVie Deutschland GmbH & Co. KG, Ludwigshafen, Germany
- Pavan Vaddady
  - Quantitative Clinical Pharmacology, Daiichi Sankyo, Inc., Basking Ridge, New Jersey, USA
- James Lu
  - Clinical Pharmacology, Genentech Inc., South San Francisco, California, USA
6
Maynard S, Farrington J, Alimam S, Evans H, Li K, Wong WK, Stanworth SJ. Machine learning in transfusion medicine: A scoping review. Transfusion 2024; 64:162-184. [PMID: 37950535 DOI: 10.1111/trf.17582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/25/2023] [Accepted: 09/27/2023] [Indexed: 11/12/2023]
Affiliation(s)
- Suzanne Maynard
  - Medical Sciences Division, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
  - NIHR Blood and Transplant Research Unit in Data Driven Transfusion Practice, Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
  - NHSBT and Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Joseph Farrington
  - Institute of Health Informatics, University College London, London, UK
- Samah Alimam
  - Haematology Department, University College London Hospitals NHS Foundation Trust, London, UK
- Hayley Evans
  - NIHR Blood and Transplant Research Unit in Data Driven Transfusion Practice, Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Kezhi Li
  - Institute of Health Informatics, University College London, London, UK
- Wai Keong Wong
  - Director of Digital, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Simon J Stanworth
  - Medical Sciences Division, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
  - NIHR Blood and Transplant Research Unit in Data Driven Transfusion Practice, Nuffield Division of Clinical Laboratory Sciences, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
  - NHSBT and Oxford University Hospitals NHS Foundation Trust, Oxford, UK
7
Hussain A, Marlowe S, Ali M, Uy E, Bhopalwala H, Gullapalli D, Vangara A, Haroon M, Akbar A, Piercy J. A Systematic Review of Artificial Intelligence Applications in the Management of Lung Disorders. Cureus 2024; 16:e51581. [PMID: 38313926 PMCID: PMC10836179 DOI: 10.7759/cureus.51581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/02/2024] [Indexed: 02/06/2024] Open
Abstract
This systematic review examines the transformative impact of artificial intelligence (AI) in managing lung disorders through a comprehensive analysis of articles spanning 2014 to 2023. Evaluating AI's multifaceted roles in radiological imaging, disease burden prediction, detection, diagnosis, and molecular mechanisms, this review presents a critical synthesis of key insights from select articles. The findings underscore AI's significant strides in bolstering diagnostic accuracy, interpreting radiological imaging, predicting disease burdens, and deepening the understanding of tuberculosis (TB), chronic obstructive pulmonary disease (COPD), silicosis, pneumoconiosis, and lung fibrosis. The synthesis positions AI as a revolutionary tool within the healthcare system, offering vital implications for healthcare workers, policymakers, and researchers in comprehending and leveraging AI's pivotal role in lung disease management.
Affiliation(s)
- Akbar Hussain
  - Internal Medicine, Appalachian Regional Healthcare, Harlan, USA
- Stanley Marlowe
  - Internal Medicine, Appalachian Regional Healthcare, Harlan, USA
- Muhammad Ali
  - Pulmonary and Critical Care, Appalachian Regional Healthcare, Hazard, USA
- Edilfavia Uy
  - Diabetes and Endocrinology, Appalachian Regional Healthcare, Whitesburg, USA
- Huzefa Bhopalwala
  - Internal Medicine, Appalachian Regional Healthcare, Whitesburg, USA
  - Cardiovascular, Mayo Clinic, Rochester, USA
- Avinash Vangara
  - Internal Medicine, Appalachian Regional Healthcare, Harlan, USA
- Moeez Haroon
  - Internal Medicine, Appalachian Regional Healthcare, Harlan, USA
- Aelia Akbar
  - Public Health, Appalachian Regional Healthcare, Harlan, USA
- Jonathan Piercy
  - Internal Medicine, Appalachian Regional Healthcare, Whitesburg, USA
8
Huang AA, Huang SY. Stochastic modeling of obesity status in United States adults using Markov Chains: A nationally representative analysis of population health data from 2017-2020. Obes Sci Pract 2023; 9:653-660. [PMID: 38090680 PMCID: PMC10712400 DOI: 10.1002/osp4.697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 05/12/2024] Open
Abstract
Importance The prevalence of obesity among United States adults has increased from 34.9% in 2013-2014 to 42.8% in 2017-2018. Developing methods to model the increase of obesity over time is necessary to quantify its cost accurately and to develop solutions to combat this national public health emergency. Methods A cross-sectional cohort study using the publicly available National Health and Nutrition Examination Survey (NHANES 2017-2020) was conducted in individuals who completed the weight questionnaire and had accurate data for both weight at the time of the survey and weight 10 years earlier. To model the dynamics of obesity, a Markov transition state matrix was created, which allowed for the analysis of weight transitions over time. Bootstrap simulation was incorporated to account for uncertainty and generate multiple simulated datasets, providing a more robust estimation of the prevalence and trends in obesity within the cohort. Results Of the 6146 individuals who met the inclusion criteria, 3024 (49%) were male and 3122 (51%) were female. There were 2252 (37%) White individuals, 1257 (20%) Hispanic individuals, 1636 (37%) Black individuals, and 739 (12%) Asian individuals. The average BMI was 30.16 (SD = 7.15), the average weight was 83.67 kg (SD = 22.04), and the average weight change was a 3.27 kg (SD = 14.97) increase in body weight. A total of 2411 (39%) individuals lost weight, and 3735 (61%) individuals gained weight. Eighty-seven (1%) individuals were underweight (BMI < 18.5), 2058 (33%) were normal weight (18.5 ≤ BMI < 25), 1376 (22%) were overweight (25 ≤ BMI < 30), and 2625 (43%) were in the obese category (BMI ≥ 30). Conclusion United States adults are at risk of transitioning from normal weight to the overweight or obese category. Markov modeling combined with bootstrap simulations can accurately model long-term weight status.
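The Markov transition matrix and bootstrap step summarized in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the BMI pairs are invented, the function names are hypothetical, and a simple percentile interval stands in for whatever uncertainty summary the study used.

```python
import random

STATES = ["underweight", "normal", "overweight", "obese"]

def bmi_category(bmi):
    """Map a BMI value to a weight category using the usual cut-offs."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"

def transition_matrix(pairs):
    """Row-normalised counts of (past state -> current state) transitions."""
    counts = {s: {t: 0 for t in STATES} for s in STATES}
    for past, current in pairs:
        counts[past][current] += 1
    matrix = {}
    for s in STATES:
        total = sum(counts[s].values())
        # A state never observed as a starting point gets an all-zero row.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in STATES}
    return matrix

def bootstrap_matrices(pairs, n_resamples=200, seed=0):
    """Resample individuals with replacement and rebuild the matrix each time."""
    rng = random.Random(seed)
    return [
        transition_matrix(rng.choices(pairs, k=len(pairs)))
        for _ in range(n_resamples)
    ]

# Hypothetical (BMI 10 years ago, BMI now) pairs, for illustration only.
bmi_pairs = [
    (22.0, 24.1), (23.5, 26.0), (27.0, 31.2), (31.0, 33.4),
    (24.0, 23.0), (28.5, 27.9), (19.0, 18.2), (26.0, 30.5),
]
pairs = [(bmi_category(a), bmi_category(b)) for a, b in bmi_pairs]

point = transition_matrix(pairs)
boot = bootstrap_matrices(pairs)

# Percentile interval for the overweight -> obese transition probability.
estimates = sorted(m["overweight"]["obese"] for m in boot)
low = estimates[int(0.025 * len(estimates))]
high = estimates[int(0.975 * len(estimates)) - 1]
print(f"P(overweight -> obese) = {point['overweight']['obese']:.2f} "
      f"(bootstrap 95% interval {low:.2f} to {high:.2f})")
```

Each row of the matrix gives the probability of ending in each category given the starting category, so every observed row sums to 1. The bootstrap spread shows how much those probabilities depend on which individuals happened to be sampled.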
Affiliation(s)
- Alexander A. Huang
  - Cornell University, Ithaca, New York, USA
  - Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Samuel Y. Huang
  - Cornell University, Ithaca, New York, USA
  - Virginia Commonwealth University School of Medicine, Richmond, Virginia, USA
9
Huang AA, Huang SY. Use of feature importance statistics to accurately predict asthma attacks using machine learning: A cross-sectional cohort study of the US population. PLoS One 2023; 18:e0288903. [PMID: 37992024 PMCID: PMC10664888 DOI: 10.1371/journal.pone.0288903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 07/05/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND Asthma attacks are a major cause of morbidity and mortality in vulnerable populations, and identification of associations with asthma attacks is necessary to improve public awareness and the timely delivery of medical interventions. OBJECTIVE The study aimed to identify feature importance of factors associated with asthma in a representative population of US adults. METHODS A cross-sectional analysis was conducted using a modern, nationally representative cohort, the National Health and Nutrition Examination Surveys (NHANES 2017-2020). All adults older than 18 years of age (total of 7,922 individuals) with information on asthma attacks were included in the study. Univariable regression was used to identify significant nutritional covariates to be included in a machine learning model, and feature importance was reported. The acquisition and analysis of the data were authorized by the National Center for Health Statistics Ethics Review Board. RESULTS 7,922 patients met the inclusion criteria in this study. The machine learning model included 55 of a total of 680 features that were found to be significant on univariate analysis (P < 0.0001 used). The XGBoost model had an Area Under the Receiver Operator Characteristic Curve (AUROC) = 0.737, Sensitivity = 0.960, NPV = 0.967. The top five highest ranked features by gain, a measure of the percentage contribution of the covariate to the overall model prediction, were Octanoic Acid intake as a Saturated Fatty Acid (SFA) (gm) (Gain = 8.8%), Eosinophil percent (Gain = 7.9%), BMXHIP-Hip Circumference (cm) (Gain = 7.2%), BMXHT-standing height (cm) (Gain = 6.2%), and HS C-Reactive Protein (mg/L) (Gain = 6.1%). CONCLUSION Machine learning models can offer feature importance and additional statistics to help identify associations with asthma attacks.
Affiliation(s)
- Alexander A. Huang
  - Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Samuel Y. Huang
  - Virginia Commonwealth University School of Medicine, Richmond, VA, United States of America
10
Lin W, Shi S, Huang H, Wen J, Chen G. Predicting risk of obesity in overweight adults using interpretable machine learning algorithms. Front Endocrinol (Lausanne) 2023; 14:1292167. [PMID: 38047114 PMCID: PMC10693451 DOI: 10.3389/fendo.2023.1292167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 11/02/2023] [Indexed: 12/05/2023] Open
Abstract
Objective To screen for predictive obesity factors in overweight populations using an optimal and interpretable machine learning algorithm. Methods This cross-sectional study was conducted between June 2011 and January 2012. The participants were randomly selected using a simple random sampling technique. Seven commonly used machine learning methods were employed to construct obesity risk prediction models. A total of 5,236 Chinese participants from Ningde City, Fujian Province, Southeast China, participated in this study. The best model was selected through appropriate verification and validation and suitably explained. Subsequently, a minimal set of significant predictors was identified. The Shapley additive explanation force plot was used to illustrate the model at the individual level. Results Machine learning models for predicting obesity have demonstrated strong performance, with CatBoost emerging as the most effective in both model validity and net clinical benefit. Specifically, the CatBoost algorithm yielded the highest scores, registering 0.91 in the training set and an impressive 0.83 in the test set. This was further corroborated by the area under the curve (AUC) metrics, where CatBoost achieved 0.95 for the training set and 0.87 for the test set. In a rigorous five-fold cross-validation, the AUC for the CatBoost model ranged between 0.84 and 0.91, with an average AUC of ROC at 0.87 ± 0.022. Key predictors identified within these models included waist circumference, hip circumference, female gender, and systolic blood pressure. Conclusion CatBoost may be the best machine learning method for prediction. Combining Shapley's additive explanation and machine learning methods can be effective in identifying disease risk factors for prevention and control.
Affiliation(s)
- Wei Lin
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Songchang Shi
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Hospital Jinshan Branch, Fujian Provincial Hospital, Fuzhou, China
- Huibin Huang
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Junping Wen
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Gang Chen
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
11
Huang AA, Huang SY. Technical Report: Machine-Learning Pipeline for Medical Research and Quality-Improvement Initiatives. Cureus 2023; 15:e46549. [PMID: 37933338 PMCID: PMC10625496 DOI: 10.7759/cureus.46549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 10/05/2023] [Indexed: 11/08/2023] Open
Abstract
Machine-learning techniques have been increasing in popularity within medicine during the past decade. However, these computational techniques are not presented in statistical lectures throughout medical school and are perceived to have a high barrier to entry. The objective is to develop a concise pipeline with publicly available data to decrease the time needed to learn machine learning for medical research and quality-improvement initiatives. This report used a publicly available machine-learning data package in R (MLDataR) and computational packages (XGBoost) to highlight techniques for machine-learning model development and visualization with SHapley Additive exPlanations (SHAP). A simple six-step process along with example code was constructed to build and visualize machine-learning models. A concrete set of three steps was developed to help with interpretation. Further teaching of these methods could benefit researchers by providing alternative methods for data analysis in medical studies. These could help researchers without computational experience to get a feel for machine learning and better understand the literature and technique.
Affiliation(s)
- Alexander A Huang
  - Surgery, Northwestern University Feinberg School of Medicine, Chicago, USA
- Samuel Y Huang
  - Internal Medicine, Icahn School of Medicine at Mount Sinai South Nassau, Oceanside, USA
12
Huang AA, Huang SY. Exploring Depression and Nutritional Covariates Amongst US Adults using Shapely Additive Explanations. Health Sci Rep 2023; 6:e1635. [PMID: 37867784 PMCID: PMC10588337 DOI: 10.1002/hsr2.1635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 05/02/2023] [Accepted: 10/10/2023] [Indexed: 10/24/2023] Open
Abstract
Background Depression affects personal and public well-being, and identification of natural therapeutics such as nutrition is necessary to help alleviate this public health concern. Objective The study aimed to identify feature importance in a machine learning model using solely nutrition covariates. Methods A retrospective analysis was conducted using a modern, nationally representative cohort, the National Health and Nutrition Examination Surveys (NHANES 2017-2020). Depressive symptoms were evaluated using the validated 9-item Patient Health Questionnaire (PHQ-9), and all adult patients (total of 7929 individuals) who completed the PHQ-9 and total nutritional intake questionnaire were included in the study. Univariable regression was used to identify significant nutritional covariates to be included in a machine learning model, and feature importance was reported. The acquisition and analysis of the data were authorized by the National Center for Health Statistics Ethics Review Board. Results 7929 patients met the inclusion criteria in this study. The machine learning model included 24 of a total of 60 features that were found to be significant on univariate analysis (p < 0.01 used). The XGBoost model had an Area Under the Receiver Operator Characteristic Curve (AUROC) = 0.603, Sensitivity = 0.943, Specificity = 0.163. The top four highest ranked features by gain, a measure of the percentage contribution of the covariate to the overall model prediction, were Potassium Intake (Gain = 6.8%), Vitamin E Intake (Gain = 5.7%), Number of Foods and Beverages Reported (Gain = 5.7%), and Vitamin K Intake (Gain = 5.6%). Conclusion Machine learning models with feature importance can be utilized to identify nutritional covariates for further study in patients with clinical symptoms of depression.
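The abstract reports high sensitivity alongside low specificity. The sketch below shows how such metrics are derived from a confusion matrix; the counts and the function name `classification_metrics` are hypothetical and chosen only to show a classifier that flags most cases as positive, not to reproduce the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts: 100 true cases and 100 non-cases.
metrics = classification_metrics(tp=90, fp=80, tn=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the model catches 90% of true cases but clears only 20% of non-cases. This is why sensitivity and specificity are best read together rather than in isolation.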
Affiliation(s)
- Samuel Y. Huang
  - Virginia Commonwealth University School of Medicine, Richmond, Virginia, USA
13
Huang AA, Huang SY. Dendrogram of transparent feature importance machine learning statistics to classify associations for heart failure: A reanalysis of a retrospective cohort study of the Medical Information Mart for Intensive Care III (MIMIC-III) database. PLoS One 2023; 18:e0288819. [PMID: 37471315 PMCID: PMC10358877 DOI: 10.1371/journal.pone.0288819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 07/04/2023] [Indexed: 07/22/2023] Open
Abstract
BACKGROUND There is a continual push to develop accurate predictors of in-hospital mortality for heart failure (HF) patients admitted to the intensive care unit (ICU). OBJECTIVE The study aimed to use transparent machine learning and to create a hierarchical clustering of key predictors based on the model importance statistics gain, cover, and frequency. METHODS Patients in the MIMIC-III database who were admitted to the ICU with HF and had complete information for in-hospital mortality were randomly divided into a training set (n = 941, 80%) and a test set (n = 235, 20%). A grid search was used to find hyperparameters. XGBoost was used to predict mortality, followed by feature importance analysis with SHapley Additive exPlanations (SHAP) and hierarchical clustering of model metrics with a dendrogram and heat map. RESULTS Of the 1,176 heart failure ICU patients who met the inclusion criteria, 558 (47.5%) were male. The mean age was 74.05 (SD = 12.85). The XGBoost model had an area under the receiver operating characteristic curve of 0.662. The features with the highest overall SHAP values were urine output, leukocytes, bicarbonate, and platelets. Average urine output was 1899.28 (SD = 1272.36) mL/day, with 1345.97 (SD = 1136.58) mL/day in the hospital mortality group and 1986.91 (SD = 1271.16) mL/day in the group without hospital mortality. The average leukocyte count in the cohort was 10.72 (SD = 5.23) cells per microliter, with 13.47 (SD = 7.42) cells per microliter in the hospital mortality group and 10.28 (SD = 4.66) cells per microliter in the group without hospital mortality. The average bicarbonate value was 26.91 (SD = 5.17) mEq/L, with 24.00 (SD = 5.42) mEq/L in the hospital mortality group and 27.37 (SD = 4.98) mEq/L in the group without hospital mortality. The average platelet value was 241.52 platelets per microliter, with 216.21 platelets per microliter in the hospital mortality group and 245.47 platelets per microliter in the group without hospital mortality. Cluster 1 of the dendrogram grouped temperature, platelets, urine output, oxygen saturation (SpO2), leukocyte count, lymphocyte count, bicarbonate, anion gap, respiratory rate, PCO2, BMI, and age as most similar in having the highest aggregate gain, cover, and frequency metrics. CONCLUSION Machine learning models that incorporate dendrograms and heat maps can offer additional summaries of model statistics for differentiating factors associated with inpatient ICU mortality in heart failure patients.
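The clustering step described in this abstract can be illustrated with a minimal sketch. It assumes each feature is represented by a (gain, cover, frequency) triple and that features are merged by average-linkage agglomerative clustering on Euclidean distance; the abstract does not state the linkage or distance used, so both are assumptions. All numeric values and the names `feature_x` and `feature_y` are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import dist

# Hypothetical (gain, cover, frequency) triples; NOT values from the study.
importance = {
    "urine_output": (0.30, 0.25, 0.20),
    "leukocytes": (0.28, 0.22, 0.18),
    "bicarbonate": (0.25, 0.24, 0.19),
    "feature_x": (0.02, 0.03, 0.04),  # hypothetical low-importance feature
    "feature_y": (0.01, 0.02, 0.03),  # hypothetical low-importance feature
}


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster feature pairs."""
    total = sum(dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [(name,) for name in points]
    merges = []  # merge order and heights: the information a dendrogram draws
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        merges.append((a, b, average_linkage(a, b, points)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return clusters, merges


clusters, merges = agglomerate(importance, n_clusters=2)
for left, right, height in merges:
    print(f"merge {left} + {right} at height {height:.3f}")
print(sorted(sorted(c) for c in clusters))
```

With these illustrative values, the features with high gain, cover, and frequency end up in one cluster and the low-importance placeholders in the other, which is the kind of grouping the abstract reports for Cluster 1.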
Affiliation(s)
- Alexander A. Huang
- Department of MD Education, Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America
- Samuel Y. Huang
- Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, VA, United States of America
14
|
Huang AA, Huang SY. Diabetes is associated with increased risk of death in COVID-19 hospitalizations in Mexico 2020: A retrospective cohort study. Health Sci Rep 2023; 6:e1416. [PMID: 37415678 PMCID: PMC10320697 DOI: 10.1002/hsr2.1416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 06/14/2023] [Accepted: 06/27/2023] [Indexed: 07/08/2023]
Abstract
Background and Aim The COVID-19 disease course can be thought of as a function of prior risk factors consisting of comorbidities and outcomes. Survival analysis data for diabetic patients with COVID-19 from an up-to-date and representative sample can increase efficiency in resource allocation. The study aimed to quantify mortality in Mexico for individuals with diabetes in the setting of COVID-19 hospitalization. Methods This retrospective cohort study used publicly available data from the Mexican Federal Government, covering the period from April 14, 2020, to December 20, 2020 (last accessed). Survival analysis techniques were applied, including Kaplan-Meier curves to estimate survival probabilities, log-rank tests to compare survival between groups, Cox proportional hazards models to assess the association between diabetes and mortality risk, and restricted mean survival time (RMST) analyses to measure the average survival time. Results A total of 402,388 adults older than 18 years with COVID-19 were included in the analysis. Mean age was 16.16 (SD = 15.55), and 214,161 (53%) were male. Twenty-day Kaplan-Meier estimates of mortality were 32% for COVID-19 patients with diabetes and 10.2% for those without diabetes (log-rank p < 0.01). Univariable analysis showed increased mortality in diabetic patients (hazard ratio [HR]: 3.61, 95% confidence interval [CI]: 3.54-3.67, p < 0.01), showing a 254% increase in death. After controlling for confounding variables, multivariable analysis continued to show increased mortality in diabetic patients (HR: 1.37, 95% CI: 1.29-1.44, p < 0.01), indicating a 37% increase in death. Multivariable RMST at Day 20 showed that, in Mexico, hospitalized COVID-19 patients with diabetes had a mean survival time 2.01 days shorter (p < 0.01) and a 10% increase in mortality (p < 0.01). Conclusions In the present analysis, COVID-19 patients with diabetes in Mexico had shorter survival times. Further interventions aimed at improving comorbidities in the population, particularly in individuals with diabetes, may contribute to better outcomes in COVID-19 patients.
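Two of the survival techniques named in this abstract, the Kaplan-Meier estimate and the restricted mean survival time, can be sketched as follows. This is a minimal, unadjusted illustration on a hypothetical toy cohort, not the authors' analysis: the function names `kaplan_meier` and `rmst`, the follow-up times, and the event flags are all assumptions, and the multivariable adjustment reported in the study is not reproduced.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct death time.

    durations: follow-up time per patient (days).
    events: 1 if death was observed at that time, 0 if censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    curve, survival = [], 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for d in durations if d >= t)  # still under follow-up
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


def rmst(curve, horizon):
    """Restricted mean survival time: area under the step curve up to horizon."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


# Hypothetical toy cohort of eight patients; NOT data from the study.
durations = [2, 5, 5, 8, 12, 20, 20, 20]
events = [1, 1, 0, 1, 1, 0, 0, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
print(f"RMST at day 20: {rmst(curve, 20):.3f} days")
```

Comparing the RMST of two groups at the same horizon gives a difference in mean survival days, which is the form in which the abstract reports its Day 20 result.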
Affiliation(s)
- Alexander A. Huang
- Department of MD Education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Samuel Y. Huang
- Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia, USA