Huang AA, Huang SY. Use of feature importance statistics to accurately predict asthma attacks using machine learning: A cross-sectional cohort study of the US population.
PLoS One 2023;
18:e0288903. [PMID:
37992024 PMCID:
PMC10664888 DOI:
10.1371/journal.pone.0288903]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2023] [Accepted: 07/05/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND
Asthma attacks are a major cause of morbidity and mortality in vulnerable populations, and identification of associations with asthma attacks is necessary to improve public awareness and the timely delivery of medical interventions.
OBJECTIVE
The study aimed to identify feature importance of factors associated with asthma in a representative population of US adults.
METHODS
A cross-sectional analysis was conducted using a modern, nationally representative cohort, the National Health and Nutrition Examination Surveys (NHANES 2017-2020). All adult patients greater than 18 years of age (total of 7,922 individuals) with information on asthma attacks were included in the study. Univariable regression was used to identify significant nutritional covariates to be included in a machine learning model and feature importance was reported. The acquisition and analysis of the data were authorized by the National Center for Health Statistics Ethics Review Board.
RESULTS
7,922 patients met the inclusion criteria in this study. The machine learning model had 55 out of a total of 680 features that were found to be significant on univariate analysis (P<0.0001 used). In the XGBoost model the model had an Area Under the Receiver Operator Characteristic Curve (AUROC) = 0.737, Sensitivity = 0.960, NPV = 0.967. The top five highest ranked features by gain, a measure of the percentage contribution of the covariate to the overall model prediction, were Octanoic Acid intake as a Saturated Fatty Acid (SFA) (gm) (Gain = 8.8%), Eosinophil percent (Gain = 7.9%), BMXHIP-Hip Circumference (cm) (Gain = 7.2%), BMXHT-standing height (cm) (Gain = 6.2%) and HS C-Reactive Protein (mg/L) (Gain 6.1%).
CONCLUSION
Machine Learning models can additionally offer feature importance and additional statistics to help identify associations with asthma attacks.
Collapse