Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Huang AA, Huang SY. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS One 2023;18:e0281922. [PMID: 36821544 PMCID: PMC9949629 DOI: 10.1371/journal.pone.0281922] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 11/23/2022] [Accepted: 02/05/2023] [Indexed: 02/24/2023] Open

Cited by Other Article(s)

Janssen SMW, Bouzembrak Y, Tekinerdogan B. Artificial Intelligence in Malnutrition: A systematic literature review. Adv Nutr 2024:100264. [PMID: 38971229 DOI: 10.1016/j.advnut.2024.100264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 06/03/2024] [Accepted: 06/26/2024] [Indexed: 07/08/2024] Open

Huang AA, Huang SY. Comparison of model feature importance statistics to identify covariates that contribute most to model accuracy in prediction of insomnia. PLoS One 2024;19:e0306359. [PMID: 38954735 PMCID: PMC11218970 DOI: 10.1371/journal.pone.0306359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 06/14/2024] [Indexed: 07/04/2024] Open

Abstract

IMPORTANCE

Sleep is critical to a person's physical and mental health and there is a need to create high performing machine learning models and critically understand how models rank covariates.

OBJECTIVE

The study aimed to compare how different model metrics rank the importance of various covariates.

DESIGN, SETTING, AND PARTICIPANTS

A cross-sectional cohort study was conducted retrospectively using the National Health and Nutrition Examination Survey (NHANES), which is publicly available.

METHODS

This study used univariate logistic models to identify strong, independent covariates associated with the sleep disorder outcome. These covariates were then used in machine-learning models, from which the best-performing model was chosen. The machine-learning model was used to rank covariates by gain, cover, and frequency to identify risk factors for sleep disorder, and feature importance was also evaluated using both univariable and multivariable t-statistics. A correlation matrix was created to determine how similarly the different model metrics ranked the importance of variables.
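As an illustration of the correlation-matrix step described above, the following is a minimal sketch rather than the authors' code. The function names and the importance scores are hypothetical, and Pearson correlation is an assumption, since the abstract does not say which correlation coefficient was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(metrics):
    """Pairwise correlations between importance metrics, as a nested dict."""
    names = list(metrics)
    return {a: {b: pearson(metrics[a], metrics[b]) for b in names} for a in names}


# Hypothetical importance scores for four covariates (illustrative values only).
importance = {
    "gain": [0.40, 0.30, 0.20, 0.10],
    "cover": [0.35, 0.30, 0.25, 0.10],
    "abs_t_statistic": [1.0, 3.0, 2.0, 4.0],
}
matrix = correlation_matrix(importance)
```

Each row of `matrix` shows how closely one metric's scores track another's across the same covariates; values near 1 indicate that two metrics rank covariates similarly.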

RESULTS

The XGBoost model had the highest mean AUROC of 0.865 (SD = 0.010), with accuracy of 0.762 (SD = 0.019), F1 of 0.875 (SD = 0.766), sensitivity of 0.768 (SD = 0.023), specificity of 0.782 (SD = 0.025), positive predictive value of 0.806 (SD = 0.025), and negative predictive value of 0.737 (SD = 0.034). The machine-learning model metrics gain and cover were strongly positively correlated with one another (r > 0.70). Metrics from the multivariable and univariable models were weakly negatively correlated with the machine-learning model metrics (r between -0.3 and 0).

CONCLUSION

The rankings of important variables associated with sleep disorder in this cohort from the machine learning models were not related to those from the regression models.

Huang AA, Huang SY. Application of a transparent artificial intelligence algorithm for US adults in the obese category of weight. PLoS One 2024;19:e0304509. [PMID: 38820332 PMCID: PMC11142543 DOI: 10.1371/journal.pone.0304509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 05/13/2024] [Indexed: 06/02/2024] Open

Abstract

OBJECTIVE AND AIMS

Identifying factors associated with the obese category of weight in the general US population will continue to advance our understanding of the condition and allow clinicians, providers, communities, families, and individuals to make more informed decisions. This study aims to improve the prediction of the obese category of weight and investigate its relationships with these factors, ultimately contributing to healthier lifestyle choices and timely management of obesity.

METHODS

Questionnaires that included demographic, dietary, exercise, and health information from the US National Health and Nutrition Examination Survey (NHANES 2017-2020) were used, with a BMI of 30 or higher defined as obesity. A machine learning model, XGBoost, predicted the obese category of weight, and Shapley Additive Explanations (SHAP) visualized the covariates and their feature importance. Model statistics including area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value, as well as feature properties such as gain, cover, and frequency, were measured. SHAP explanations were created for transparent and interpretable analysis.
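For readers unfamiliar with the model statistics listed above, the following minimal sketch shows how sensitivity, specificity, positive predictive value, and negative predictive value follow from a confusion matrix. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification statistics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=80, fp=20, tn=70, fn=30)
```

AUROC is not included here because it depends on the full ranking of predicted probabilities rather than on a single confusion matrix.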

RESULTS

There were 6,146 adults (age > 18) included in the study, with an average age of 58.39 (SD = 12.94) and 3,122 (51%) females. The machine learning model had an AUROC of 0.8295. The top covariates by gain included waist circumference (gain = 0.185), GGT (gain = 0.101), platelet count (gain = 0.059), AST (gain = 0.057), weight (gain = 0.049), HDL cholesterol (gain = 0.032), and ferritin (gain = 0.034).

CONCLUSION

In conclusion, the utilization of machine learning models proves to be highly effective in accurately predicting the obese category of weight. By considering various factors such as demographic information, laboratory results, physical examination findings, and lifestyle factors, these models successfully identify crucial risk factors associated with the obese category of weight.

Thakur GK, Thakur A, Kulkarni S, Khan N, Khan S. Deep Learning Approaches for Medical Image Analysis and Diagnosis. Cureus 2024;16:e59507. [PMID: 38826977 PMCID: PMC11144045 DOI: 10.7759/cureus.59507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Accepted: 05/01/2024] [Indexed: 06/04/2024] Open

Terranova N, Renard D, Shahin MH, Menon S, Cao Y, Hop CECA, Hayes S, Madrasi K, Stodtmann S, Tensfeldt T, Vaddady P, Ellinwood N, Lu J. Artificial Intelligence for Quantitative Modeling in Drug Discovery and Development: An Innovation and Quality Consortium Perspective on Use Cases and Best Practices. Clin Pharmacol Ther 2024;115:658-672. [PMID: 37716910 DOI: 10.1002/cpt.3053] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/28/2023] [Accepted: 09/11/2023] [Indexed: 09/18/2023]

Maynard S, Farrington J, Alimam S, Evans H, Li K, Wong WK, Stanworth SJ. Machine learning in transfusion medicine: A scoping review. Transfusion 2024;64:162-184. [PMID: 37950535 DOI: 10.1111/trf.17582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/25/2023] [Accepted: 09/27/2023] [Indexed: 11/12/2023]

Hussain A, Marlowe S, Ali M, Uy E, Bhopalwala H, Gullapalli D, Vangara A, Haroon M, Akbar A, Piercy J. A Systematic Review of Artificial Intelligence Applications in the Management of Lung Disorders. Cureus 2024;16:e51581. [PMID: 38313926 PMCID: PMC10836179 DOI: 10.7759/cureus.51581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/02/2024] [Indexed: 02/06/2024] Open

Huang AA, Huang SY. Stochastic modeling of obesity status in United States adults using Markov Chains: A nationally representative analysis of population health data from 2017-2020. Obes Sci Pract 2023;9:653-660. [PMID: 38090680 PMCID: PMC10712400 DOI: 10.1002/osp4.697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 05/12/2024] Open

Abstract

Importance

The prevalence of obesity among United States adults increased from 34.9% in 2013-2014 to 42.8% in 2017-2018. Methods to model the increase of obesity over time are needed to accurately quantify its cost and to develop solutions to combat this national public health emergency.

Methods

A cross-sectional cohort study using the publicly available National Health and Nutrition Examination Survey (NHANES 2017-2020) was conducted in individuals who completed the weight questionnaire and had accurate data for both weight at the time of survey and weight 10 years ago. To model the dynamics of obesity, a Markov transition state matrix was created, which allowed for the analysis of weight transitions over time. Bootstrap simulation was incorporated to account for uncertainty and generate multiple simulated datasets, providing a more robust estimation of the prevalence and trends in obesity within the cohort.
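The Markov transition matrix and bootstrap steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names, the sample pairs, and the resample count are hypothetical, and the obese category is taken here as BMI of 30 or higher.

```python
import random
from collections import Counter

STATES = ["underweight", "normal", "overweight", "obese"]


def bmi_category(bmi):
    """Map a BMI value to a weight category (obese assumed to be BMI >= 30)."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def transition_matrix(pairs):
    """Row-normalized counts of (category 10 years ago, category now)."""
    counts = Counter(pairs)
    matrix = {}
    for src in STATES:
        row_total = sum(counts[(src, dst)] for dst in STATES)
        matrix[src] = {
            dst: (counts[(src, dst)] / row_total if row_total else 0.0)
            for dst in STATES
        }
    return matrix


def bootstrap_matrices(pairs, n_resamples, seed=0):
    """Re-estimate the matrix on datasets resampled with replacement."""
    rng = random.Random(seed)
    return [
        transition_matrix(rng.choices(pairs, k=len(pairs)))
        for _ in range(n_resamples)
    ]


# Hypothetical (past, current) category pairs for illustration only.
pairs = (
    [("normal", "normal")] * 6
    + [("normal", "overweight")] * 3
    + [("normal", "obese")] * 1
    + [("overweight", "overweight")] * 6
    + [("overweight", "obese")] * 4
)
point_estimate = transition_matrix(pairs)
resampled = bootstrap_matrices(pairs, n_resamples=200)
```

The spread of a given entry across `resampled` gives a rough sense of the uncertainty in that transition probability.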

Results

Of the 6146 individuals who met the inclusion criteria, 3024 (49%) were male and 3122 (51%) were female. There were 2252 (37%) White individuals, 1257 (20%) Hispanic individuals, 1636 (27%) Black individuals, and 739 (12%) Asian individuals. The average BMI was 30.16 (SD = 7.15), the average weight was 83.67 kg (SD = 22.04), and the average weight change was a 3.27 kg (SD = 14.97) increase in body weight. A total of 2411 (39%) individuals lost weight, and 3735 (61%) individuals gained weight. A total of 87 (1%) individuals were underweight (BMI <18.5), 2058 (33%) were normal weight (18.5 ≤ BMI <25), 1376 (22%) were overweight (25 ≤ BMI <30), and 2625 (43%) were in the obese category (BMI ≥30).

Conclusion

United States adults are at risk of transitioning from normal weight to the overweight or obese category. Markov modeling combined with bootstrap simulations can accurately model long-term weight status.

Huang AA, Huang SY. Use of feature importance statistics to accurately predict asthma attacks using machine learning: A cross-sectional cohort study of the US population. PLoS One 2023;18:e0288903. [PMID: 37992024 PMCID: PMC10664888 DOI: 10.1371/journal.pone.0288903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 07/05/2023] [Indexed: 11/24/2023] Open

Abstract

BACKGROUND

Asthma attacks are a major cause of morbidity and mortality in vulnerable populations, and identification of associations with asthma attacks is necessary to improve public awareness and the timely delivery of medical interventions.

OBJECTIVE

The study aimed to identify feature importance of factors associated with asthma in a representative population of US adults.

METHODS

A cross-sectional analysis was conducted using a modern, nationally representative cohort, the National Health and Nutrition Examination Surveys (NHANES 2017-2020). All adult patients greater than 18 years of age (total of 7,922 individuals) with information on asthma attacks were included in the study. Univariable regression was used to identify significant nutritional covariates to be included in a machine learning model and feature importance was reported. The acquisition and analysis of the data were authorized by the National Center for Health Statistics Ethics Review Board.

RESULTS

A total of 7,922 patients met the inclusion criteria in this study. Of 680 total features, 55 were found to be significant on univariate analysis (P < 0.0001) and were included in the machine learning model. The XGBoost model had an Area Under the Receiver Operator Characteristic Curve (AUROC) = 0.737, Sensitivity = 0.960, and NPV = 0.967. The top five highest ranked features by gain, a measure of the percentage contribution of the covariate to the overall model prediction, were Octanoic Acid intake as a Saturated Fatty Acid (SFA) (gm) (Gain = 8.8%), Eosinophil percent (Gain = 7.9%), BMXHIP-Hip Circumference (cm) (Gain = 7.2%), BMXHT-standing height (cm) (Gain = 6.2%), and HS C-Reactive Protein (mg/L) (Gain = 6.1%).
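To make the gain statistic above concrete, the sketch below aggregates per-split records into normalized gain, cover, and frequency, following the commonly documented definitions of these tree-model importance measures. It is not the authors' code and does not call XGBoost; the function name, the split records, and their values are hypothetical.

```python
from collections import defaultdict


def importance_table(splits):
    """Aggregate (feature, gain, cover) split records into normalized shares.

    gain:      feature's share of the total loss reduction over all splits
    cover:     feature's share of the observations passing through its splits
    frequency: feature's share of the total number of splits
    """
    totals = defaultdict(lambda: {"gain": 0.0, "cover": 0.0, "frequency": 0})
    for feature, gain, cover in splits:
        totals[feature]["gain"] += gain
        totals[feature]["cover"] += cover
        totals[feature]["frequency"] += 1
    keys = ("gain", "cover", "frequency")
    sums = {k: sum(t[k] for t in totals.values()) for k in keys}
    return {f: {k: t[k] / sums[k] for k in keys} for f, t in totals.items()}


# Hypothetical split records for illustration only.
splits = [
    ("eosinophil_pct", 12.0, 100.0),
    ("hip_circumference", 6.0, 80.0),
    ("eosinophil_pct", 2.0, 20.0),
]
table = importance_table(splits)
```

Because each measure is normalized to sum to 1 across features, a reported gain of 8.8% means that feature accounts for 8.8% of the total gain accumulated over all splits in the model.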

CONCLUSION

Machine learning models can offer feature importance and additional statistics to help identify associations with asthma attacks.

Lin W, Shi S, Huang H, Wen J, Chen G. Predicting risk of obesity in overweight adults using interpretable machine learning algorithms. Front Endocrinol (Lausanne) 2023;14:1292167. [PMID: 38047114 PMCID: PMC10693451 DOI: 10.3389/fendo.2023.1292167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 11/02/2023] [Indexed: 12/05/2023] Open

Huang AA, Huang SY. Technical Report: Machine-Learning Pipeline for Medical Research and Quality-Improvement Initiatives. Cureus 2023;15:e46549. [PMID: 37933338 PMCID: PMC10625496 DOI: 10.7759/cureus.46549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 10/05/2023] [Indexed: 11/08/2023] Open

Huang AA, Huang SY. Exploring Depression and Nutritional Covariates Amongst US Adults using Shapely Additive Explanations. Health Sci Rep 2023;6:e1635. [PMID: 37867784 PMCID: PMC10588337 DOI: 10.1002/hsr2.1635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 05/02/2023] [Accepted: 10/10/2023] [Indexed: 10/24/2023] Open

Abstract

Background

Depression affects personal and public well-being and identification of natural therapeutics such as nutrition is necessary to help alleviate this public health concern.

Objective

The study aimed to identify feature importance in a machine learning model using solely nutrition covariates.

Methods

A retrospective analysis was conducted using a modern, nationally representative cohort, the National Health and Nutrition Examination Surveys (NHANES 2017-2020). Depressive symptoms were evaluated using the validated 9-item Patient Health Questionnaire (PHQ-9), and all adult patients (total of 7929 individuals) who completed the PHQ-9 and total nutritional intake questionnaire were included in the study. Univariable regression was used to identify significant nutritional covariates to be included in a machine learning model and feature importance was reported. The acquisition and analysis of the data were authorized by the National Center for Health Statistics Ethics Review Board.

Results

A total of 7929 patients met the inclusion criteria in this study. Of 60 total features, 24 were found to be significant on univariate analysis (p < 0.01) and were included in the machine learning model. The XGBoost model had an Area Under the Receiver Operator Characteristic Curve (AUROC) = 0.603, Sensitivity = 0.943, and Specificity = 0.163. The top four highest ranked features by gain, a measure of the percentage contribution of the covariate to the overall model prediction, were Potassium Intake (Gain = 6.8%), Vitamin E Intake (Gain = 5.7%), Number of Foods and Beverages Reported (Gain = 5.7%), and Vitamin K Intake (Gain = 5.6%).

Conclusion

Machine learning models with feature importance can be utilized to identify nutritional covariates for further study in patients with clinical symptoms of depression.

Huang AA, Huang SY. Dendrogram of transparent feature importance machine learning statistics to classify associations for heart failure: A reanalysis of a retrospective cohort study of the Medical Information Mart for Intensive Care III (MIMIC-III) database. PLoS One 2023;18:e0288819. [PMID: 37471315 PMCID: PMC10358877 DOI: 10.1371/journal.pone.0288819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Accepted: 07/04/2023] [Indexed: 07/22/2023] Open

Abstract

BACKGROUND

There is a continual push for developing accurate predictors for Intensive Care Unit (ICU) admitted heart failure (HF) patients and in-hospital mortality.

OBJECTIVE

The study aimed to use transparent machine learning and create hierarchical clustering of key predictors based on the model importance statistics gain, cover, and frequency.

METHODS

Patients with HF in the ICU from the MIMIC-III database who had complete information for in-hospital mortality were randomly divided into training (n = 941, 80%) and test (n = 235, 20%) sets. A grid search was used to find hyperparameters. Machine learning with XGBoost was used to predict mortality, followed by feature importance with Shapley Additive Explanations (SHAP) and hierarchical clustering of model metrics with a dendrogram and heat map.
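The random 80/20 division described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name, the seed, and the use of rounding to set the training-set size are assumptions.

```python
import random


def train_test_split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle row indices and cut them into training and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


# 1,176 is the cohort size reported in the abstract.
train_idx, test_idx = train_test_split_indices(1176)
```

With 1,176 patients, rounding 80% gives 941 training rows and leaves 235 test rows, which matches the set sizes reported in the abstract.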

RESULTS

Of the 1,176 heart failure ICU patients that met inclusion criteria for the study, 558 (47.5%) were males. The mean age was 74.05 (SD = 12.85). XGBoost model had an area under the receiver operator curve of 0.662. The highest overall SHAP explanations were urine output, leukocytes, bicarbonate, and platelets. Average urine output was 1899.28 (SD = 1272.36) mL/day with the hospital mortality group having 1345.97 (SD = 1136.58) mL/day and the group without hospital mortality having 1986.91 (SD = 1271.16) mL/day. The average leukocyte count in the cohort was 10.72 (SD = 5.23) cells per microliter. For the hospital mortality group the leukocyte count was 13.47 (SD = 7.42) cells per microliter and for the group without hospital mortality the leukocyte count was 10.28 (SD = 4.66) cells per microliter. The average bicarbonate value was 26.91 (SD = 5.17) mEq/L. Amongst the group with hospital mortality the average bicarbonate value was 24.00 (SD = 5.42) mEq/L. Amongst the group without hospital mortality the average bicarbonate value was 27.37 (SD = 4.98) mEq/L. The average platelet value was 241.52 platelets per microliter. For the group with hospital mortality the average platelet value was 216.21 platelets per microliter. For the group without hospital mortality the average platelet value was 245.47 platelets per microliter. Cluster 1 of the dendrogram grouped the temperature, platelets, urine output, Saturation of partial pressure of Oxygen (SPO2), Leukocyte count, lymphocyte count, bicarbonate, anion gap, respiratory rate, PCO2, BMI, and age as most similar in having the highest aggregate gain, cover, and frequency metrics.

CONCLUSION

Machine learning models that incorporate dendrograms and heat maps can offer additional summaries of model statistics for differentiating factors associated with inpatient ICU mortality in heart failure patients.

Huang AA, Huang SY. Diabetes is associated with increased risk of death in COVID-19 hospitalizations in Mexico 2020: A retrospective cohort study. Health Sci Rep 2023;6:e1416. [PMID: 37415678 PMCID: PMC10320697 DOI: 10.1002/hsr2.1416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 06/14/2023] [Accepted: 06/27/2023] [Indexed: 07/08/2023] Open

Abstract

Background and Aim

The COVID-19 disease course can be thought of as a function of prior risk factors consisting of comorbidities and outcomes. Survival analysis data for diabetic patients with COVID-19 from an up to date and representative sample can increase efficiency in resource allocation. The study aimed to quantify mortality in Mexico for individuals with diabetes in the setting of COVID-19 hospitalization.

Methods

This retrospective cohort study utilized publicly available data from the Mexican Federal Government, covering the period from April 14, 2020, to December 20, 2020 (last accessed). Survival analysis techniques were applied, including Kaplan-Meier curves to estimate survival probabilities, log-rank tests to compare survival between groups, Cox proportional hazard models to assess the association between diabetes and mortality risk, and restricted mean survival time (RMST) analyses to measure the average survival time.
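As a rough illustration of the Kaplan-Meier step mentioned above, the sketch below computes the product-limit survival estimate from follow-up times and event indicators. It is a simplified stand-in rather than the authors' analysis: the function name and the sample data are hypothetical, and confidence intervals, log-rank tests, and Cox models are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival probability)] at each event time.

    events[i] is 1 if a death was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; censored subjects at time t
        # are still counted as at risk when the deaths at t are evaluated.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data in days (illustrative only).
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

One minus the survival probability at a chosen day gives the estimated mortality at that day, which is the form in which the twenty-day estimates are reported.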

Results

A total of 402,388 adults older than 18 with COVID-19 were included in the analysis. Mean age = 16.16 (SD = 15.55), 214,161 males (53%). Twenty-day Kaplan-Meier estimates of mortality were 32% for COVID-19 patients with diabetes and 10.2% for those without diabetes, with log-rank p < 0.01. Univariable analysis showed increased mortality in diabetic patients (hazard ratio [HR]: 3.61, 95% confidence interval [CI]: 3.54-3.67, p < 0.01), showing a 254% increase in death. After controlling for confounding variables, multivariate analysis continued to show increased mortality in diabetics (HR: 1.37, 95% CI: 1.29-1.44, p < 0.01), indicating a 37% increase in death. Multivariable RMST at Day 20 showed that, in Mexico, hospitalized COVID-19 patients with diabetes had a mean survival time 2.01 days shorter (p < 0.01) and a 10% increased mortality (p < 0.01).

Conclusions

In the present analysis, COVID-19 patients with diabetes in Mexico had shorter survival times. Further interventions aimed at improving comorbidities in the population, particularly in individuals with diabetes, may contribute to better outcomes in COVID-19 patients.
