1
Lu X, Qiu H. Explainable prediction of daily hospitalizations for cerebrovascular disease using stacked ensemble learning. BMC Med Inform Decis Mak 2023; 23:59. [PMID: 37024922 PMCID: PMC10080841 DOI: 10.1186/s12911-023-02159-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Accepted: 03/23/2023] [Indexed: 04/08/2023]
Abstract
BACKGROUND With the prevalence of cerebrovascular disease (CD) and the increasing strain on healthcare resources, forecasting the healthcare demands of cerebrovascular patients has significant implications for optimizing medical resources. METHODS In this study, a stacking ensemble model composed of four base learners (ridge regression, random forest, gradient boosting decision tree, and artificial neural network) and a meta learner (elastic net) was proposed for predicting the daily number of hospital admissions (HAs) for CD using historical HA data, air quality data, and meteorological data from Chengdu, China, from 2015 to 2018. To address the label imbalance problem, a re-weighting method based on label distribution smoothing was integrated into the meta learner. We trained the model on data from 2015 to 2017 and evaluated its predictive ability on data from 2018 using four metrics: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2). In addition, the SHapley Additive exPlanations (SHAP) framework was applied to explain the predictions of the stacking model. RESULTS The proposed model outperformed all the base learners and long short-term memory (LSTM) on two datasets. In particular, compared with the best results obtained by individual models, the MAE, RMSE, and MAPE of the stacking model decreased by 13.9%, 12.7%, and 5.8%, respectively, and the R2 improved by 6.8% on the CD dataset. The model explanation demonstrated that environmental features played a role in further improving model performance and identified that high temperature and high concentrations of gaseous air pollutants might be strongly associated with an increased risk of CD. CONCLUSIONS Our stacking model considering environmental exposure is efficient in predicting daily HAs for CD and has practical value in early warning and healthcare resource allocation.
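The four evaluation metrics named in this abstract can be illustrated with a minimal sketch. The function name `regression_metrics` and any sample numbers are hypothetical and are not taken from the study; the code only applies the standard textbook definitions of MAE, RMSE, MAPE, and R2.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent), and R2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # MAPE divides by the observed value, so zeros are not allowed.
        raise ValueError("MAPE is undefined when an observed value is zero")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}
```

Lower MAE, RMSE, and MAPE indicate better forecasts, while R2 closer to 1 indicates that more of the variance in the observed counts is explained.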
Affiliation(s)
- Xiaoya Lu
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, People's Republic of China
- Hang Qiu
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, People's Republic of China.
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China.
2
Bai L, Lu K, Dong Y, Wang X, Gong Y, Xia Y, Wang X, Chen L, Yan S, Tang Z, Li C. Predicting monthly hospital outpatient visits based on meteorological environmental factors using the ARIMA model. Sci Rep 2023; 13:2691. [PMID: 36792764 PMCID: PMC9930044 DOI: 10.1038/s41598-023-29897-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 02/13/2023] [Indexed: 02/17/2023]
Abstract
Accurate forecasting of hospital outpatient visits is beneficial to the rational planning and allocation of medical resources to meet medical needs. Several studies have suggested that outpatient visits are related to meteorological environmental factors. We aimed to use the autoregressive integrated moving average (ARIMA) model to analyze the relationship between meteorological environmental factors and outpatient visits and to forecast outpatient visits for future periods. Monthly outpatient visits and meteorological environmental factors were collected from January 2015 to July 2021. An ARIMAX model was constructed by incorporating meteorological environmental factors as covariates into the ARIMA model, and candidate models were evaluated by the stationary [Formula: see text], coefficient of determination [Formula: see text], mean absolute percentage error (MAPE), and normalized Bayesian information criterion (BIC). The ARIMA [Formula: see text] model with the covariates of [Formula: see text], [Formula: see text], and [Formula: see text] was the optimal model. Monthly outpatient visits in 2019 can be predicted using average data from past years. The relative error between the predicted and actual values for 2019 was 2.77%. Our study suggests that [Formula: see text], [Formula: see text], and [Formula: see text] concentration have a significant impact on outpatient visits. The model has excellent predictive performance and can provide a reference for the scientific management of hospitals in allocating staff and resources.
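The "integrated" part of ARIMA refers to differencing a series until it is approximately stationary before fitting the autoregressive and moving-average terms. The sketch below illustrates only that step, under the assumption of a plain list of monthly counts; the helper names `difference` and `undifference` are hypothetical, and the code does not reproduce the ARIMAX model or the model orders used in the study.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag.

    lag=1 removes a linear trend; lag=12 removes a yearly pattern
    in monthly data.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be at least 1 and shorter than the series")
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def undifference(diffs, history, lag=1):
    """Rebuild the original levels from differences.

    `history` must hold the first `lag` values of the original series,
    which differencing discards.
    """
    if len(history) != lag:
        raise ValueError("history must contain exactly `lag` values")
    levels = list(history)
    for d in diffs:
        # Each level is the value `lag` steps earlier plus the stored change.
        levels.append(levels[-lag] + d)
    return levels
```

Forecasts made on the differenced scale would be mapped back to visit counts with `undifference`, using the last observed values as `history`.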
Affiliation(s)
- Lu Bai
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123 China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123 China
- Ke Lu
  - Department of Orthopedics, Affiliated Kunshan Hospital of Jiangsu University, No. 91 West of Qianjin Road, Suzhou, 215300 Jiangsu China
- Yongfei Dong
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123 China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123 China
- Xichao Wang
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123 China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123 China
- Yaqin Gong
  - Information Department, Affiliated Kunshan Hospital of Jiangsu University, Suzhou, 215300 Jiangsu China
- Yunyu Xia
  - Meteorological Bureau of Kunshan City, Suzhou, 215337 Jiangsu China
- Xiaochun Wang
  - Meteorological Bureau of Kunshan City, Suzhou, 215337 Jiangsu China
- Lin Chen
  - Ecology and Environment Bureau of Kunshan City, Suzhou, 215330 Jiangsu China
- Shanjun Yan
  - Ecology and Environment Bureau of Kunshan City, Suzhou, 215330 Jiangsu China
- Zaixiang Tang
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, 215123, China.
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou, 215123, China.
- Chong Li
  - Department of Orthopedics, Affiliated Kunshan Hospital of Jiangsu University, No. 91 West of Qianjin Road, Suzhou, 215300, Jiangsu, China.
3
Wang C, Qi Y, Chen Z. Explainable Gated Recurrent Unit to explore the effect of co-exposure to multiple air pollutants and meteorological conditions on mental health outcomes. Environment International 2023; 171:107689. [PMID: 36508748 DOI: 10.1016/j.envint.2022.107689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 11/03/2022] [Accepted: 12/08/2022] [Indexed: 06/17/2023]
Abstract
Mental health conditions have the potential to be worsened by air pollution or other climate-sensitive factors. Few studies have empirically examined those associations under co-exposure, or the interaction effects involved. There is an urgent need to use deep learning to handle complex co-exposures that might interact in multiple ways, and reinforcing the model with SHapley Additive exPlanations (SHAP) made our predictions interpretable and hence actionable. Here, to evaluate the mixed effect of short-term co-exposure, we conducted a time-series analysis using approximately 1.47 million hospital outpatient visits for mental disorders (i.e., depressive disorder, DD; schizophrenia, SP; anxiety disorder, AD; bipolar disorder, BD; attention deficit and hyperactivity disorder, ADHD; autism spectrum disorder, ASD), with matched meteorological observations from 2015 through 2019 in Nanjing, China. The global insights of the gated recurrent unit model revealed that most input features, with similar effect sizes, increased the illness risk of SP and ASD; most markedly, 73% relative humidity, 44.6 µg/m3 of NO2, and 14.1 µg/m3 of SO2 at the 5-year average level were associated with increases of 2.27, 1.14, and 1.29 visits for DD, SP, and AD, respectively. Both synergistic and antagonistic effects among informative paired features were distinguished from local feature dependence. Interestingly, the trends in excess visits for bipolar disorder were inconsistent when atmospheric pressure, PM2.5, and O3 interacted with one another. Our results provide additional qualitative and quantitative support for the conclusion that short-term co-exposure to ambient air pollutants and meteorological conditions poses threats to human mental health.
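For readers unfamiliar with the gated recurrent unit, the following minimal sketch shows one GRU update for a single scalar input and a single scalar hidden state. The scalar weights, the parameter names in the dictionary `p`, and the convention `h = (1 - z) * h_prev + z * h_cand` are assumptions made for illustration (some libraries swap the roles of `z` and `1 - z`); the network in the study is a trained multi-feature model and is not reproduced here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU update with scalar input x and scalar hidden state h_prev.

    p is a dict of hypothetical scalar parameters:
    wz, uz, bz (update gate), wr, ur, br (reset gate),
    wh, uh, bh (candidate state).
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # The reset gate scales how much of the previous state feeds the candidate.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate blends the previous state with the candidate state.
    return (1.0 - z) * h_prev + z * h_cand


def gru_run(sequence, p, h0=0.0):
    """Apply gru_step over a sequence and return the final hidden state."""
    h = h0
    for x in sequence:
        h = gru_step(x, h, p)
    return h
```

In a forecasting model the final hidden state would typically be passed to an output layer that maps it to the predicted number of visits.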
Affiliation(s)
- Ce Wang
  - School of Energy and Environment, Southeast University, Nanjing 210096, PR China; State Key Laboratory of Environmental Medicine Engineering, Ministry of Education, Southeast University, Nanjing 210096, PR China.
- Yi Qi
  - School of Architecture and Urban Planning, Nanjing University, No. 22 Hankoulu Road, Nanjing 210093, PR China
- Zhenhua Chen
  - Department of Information, Affiliated Nanjing Brain Hospital, Nanjing Medical University, No. 264 Guangzhou Road, Nanjing 210029, PR China.
4
Lee W, Lim YH, Ha E, Kim Y, Lee WK. Forecasting of non-accidental, cardiovascular, and respiratory mortality with environmental exposures adopting machine learning approaches. Environmental Science and Pollution Research International 2022; 29:88318-88329. [PMID: 35834079 PMCID: PMC9281380 DOI: 10.1007/s11356-022-21768-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Accepted: 06/27/2022] [Indexed: 04/16/2023]
Abstract
Environmental exposure changes constantly over time and involves various interactions that can affect health outcomes. Machine learning (ML) or deep learning (DL) algorithms have been used to solve complex problems, such as multiple exposures and their interactions. This study developed predictive models for cause-specific mortality using ML and DL algorithms with daily or hourly measured meteorological and air pollution data. The ML algorithms improved performance compared to the conventional methods, even though the optimal algorithm depended on the adverse health outcome. The best algorithms were extreme gradient boosting, ridge, and elastic net, respectively, for non-accidental, cardiovascular, and respiratory mortality with daily measurements; they were superior to the generalized additive model, reducing the mean absolute error by 4.7%, 4.9%, and 16.8%, respectively. With hourly measurements, the ML models tended to outperform the conventional models, even though hourly data, instead of daily data, did not enhance the performance in some models. The proposed model allows a better understanding and development of robust predictive models for health outcomes using multiple environmental exposures.
Affiliation(s)
- Woojoo Lee
  - Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
- Youn-Hee Lim
  - Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Eunhee Ha
  - Department of Occupational and Environmental Medicine, Ewha Medical Research Center, College of Medicine, Ewha Woman's University, Seoul, Republic of Korea
- Yoenjin Kim
  - Department of Public Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
- Won Kyung Lee
  - Department of Prevention and Management, Inha University Hospital, School of Medicine, Inha University, Incheon, Republic of Korea.
5
A Method for Improving the Prediction of Outpatient Visits for Hospital Management: Bayesian Autoregressive Analysis. Computational and Mathematical Methods in Medicine 2022; 2022:4718157. [PMID: 36277006 PMCID: PMC9581652 DOI: 10.1155/2022/4718157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 07/03/2022] [Accepted: 08/31/2022] [Indexed: 11/17/2022]
Abstract
The number of outpatient visits is generally influenced by various factors that are difficult to quantify and obtain, resulting in irregular fluctuations. Traditional statistical methodology seldom considers these uncertainties. Accordingly, this paper presents a forecasting framework based on Bayesian autoregressive (AR) analysis to cope with the strict requirements. The AR model was used to identify the linear and autocorrelation relationships of historical series, and Bayesian inference was used to correct and optimize the AR model parameters. The posterior distribution of the parameters was obtained stably and reliably by Gibbs sampling once the Markov chain had converged. Meanwhile, the lag orders of the AR model were adjusted based on the series characteristics. To increase the variability and generality of the dataset, the developed Bayesian AR model was evaluated at seven hospitals in China. The results demonstrated that the Bayesian AR model reduced the MAPE value to varying degrees in all seven sets of experimental data. The reductions ranged from 0.0342% to 0.1431%, indicating effective optimization of the AR model parameters by Bayesian inference and reflecting the useful correction of the lag order adjustment strategy. The proposed Bayesian AR framework showed high accuracy and stable prediction performance, thereby outperforming the traditional AR model.
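The idea of estimating AR parameters by Gibbs sampling can be sketched for the simplest case, an AR(1) model without intercept, y[t] = phi * y[t-1] + noise. Everything specific here is an assumption made for illustration and is not taken from the paper: the AR(1) order, the Normal(0, tau2) prior on phi, the inverse-gamma(a, b) prior on the noise variance, and the function names `simulate_ar1` and `gibbs_ar1`.

```python
import math
import random


def simulate_ar1(n, phi, sigma, seed=0):
    """Generate n points from y[t] = phi * y[t-1] + Normal(0, sigma^2)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def gibbs_ar1(y, n_iter=2000, burn_in=500, tau2=10.0, a=2.0, b=1.0, seed=1):
    """Gibbs sampler for AR(1) with conjugate priors (illustrative assumptions).

    Prior: phi ~ Normal(0, tau2), sigma2 ~ InverseGamma(shape=a, scale=b).
    Returns the post-burn-in draws of phi and sigma2.
    """
    rng = random.Random(seed)
    x, t = y[:-1], y[1:]          # lagged values and their targets
    n = len(t)
    sxx = sum(u * u for u in x)
    sxy = sum(u * v for u, v in zip(x, t))

    phi, sigma2 = 0.0, 1.0
    phi_draws, sigma2_draws = [], []
    for it in range(n_iter):
        # phi | sigma2, y is Normal with precision sxx/sigma2 + 1/tau2.
        var = 1.0 / (sxx / sigma2 + 1.0 / tau2)
        mean = var * sxy / sigma2
        phi = rng.gauss(mean, math.sqrt(var))

        # sigma2 | phi, y is InverseGamma(a + n/2, b + SSR/2).
        ssr = sum((v - phi * u) ** 2 for u, v in zip(x, t))
        shape = a + n / 2.0
        rate = b + ssr / 2.0
        # gammavariate takes (shape, scale); a Gamma draw with this rate,
        # inverted, is an inverse-gamma draw.
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)

        if it >= burn_in:
            phi_draws.append(phi)
            sigma2_draws.append(sigma2)
    return phi_draws, sigma2_draws
```

Averaging the retained draws gives posterior mean estimates, and their spread gives a direct measure of parameter uncertainty, which is the aspect a point-estimate AR fit does not provide.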
6
Wang C, Feng L, Qi Y. Explainable deep learning predictions for illness risk of mental disorders in Nanjing, China. Environmental Research 2021; 202:111740. [PMID: 34329635 DOI: 10.1016/j.envres.2021.111740] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/30/2021] [Revised: 07/16/2021] [Accepted: 07/19/2021] [Indexed: 06/13/2023]
Abstract
Epidemiological studies have revealed the associations of air pollutants and meteorological factors with a range of mental health conditions. However, little is known about local explanations and global understanding of the importance and effect of input features in the complex system of environmental stressors and mental disorders (MDs), especially for exposure to air pollution mixtures. In this study, we combined deep learning neural networks (DLNNs) with SHapley Additive exPlanation (SHAP) to predict the illness risk of MDs on the population level, and then provided explanations for risk factors. The modeling system, which was trained on day-by-day hospital outpatient visits of two major hospitals in Nanjing, China from 2013/07/01 through 2019/02/28, visualized the time-varying prediction, contributing factors, and interaction effects of informative features. Our results suggested that NO2, SO2, and CO made outstanding contributions in magnitude of feature attributions under circumstances of mixed air pollutants. In particular, NO2 at high concentration levels was associated with an increase in illness risk of MDs, and the maximum and mean absolute SHAP values were approximately 10 and 2 as local and global measures of feature importance, respectively. A marginally antagonistic effect was present for two pairs of gaseous pollutants, i.e., NO2 vs. SO2 and CO vs. NO2. In contrast, CO and SO2 displayed feature effects in the opposite direction to the rise in observed concentrations, but an apparent synergistic effect was captured. The primary risk factors driving a sharp increase in acute attack or exacerbation of MDs were also identified by depicting prediction paths of time-series samples. We believe that coupling accurate predictions from DLNNs with interpretable explanations of why a prediction is made has broad applicability throughout the field of environmental health.
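The local and global SHAP measures mentioned in this abstract rest on Shapley values. The sketch below computes exact Shapley values by enumerating feature subsets for a small model, then averages their absolute values across samples as a global importance score. It is an illustration under stated assumptions: absent features are replaced by a fixed `baseline` vector (the SHAP library instead averages over background data and uses approximations for larger models), and the names `shapley_values` and `global_importance` are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x for a callable model.

    Features outside a coalition take their value from `baseline`.
    The cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def global_importance(rows):
    """Mean absolute Shapley value per feature across samples."""
    n = len(rows[0])
    return [sum(abs(r[i]) for r in rows) / len(rows) for i in range(n)]
```

The values for one sample sum to the difference between the model output at `x` and at `baseline`, which is what makes them usable as a per-prediction (local) explanation.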
Affiliation(s)
- Ce Wang
  - School of Energy and Environment, Southeast University, Nanjing, 210096, PR China; State Key Laboratory of Environmental Medicine Engineering, Ministry of Education, Southeast University, Nanjing, 210096, PR China.
- Lan Feng
  - National-Provincial Joint Engineering Research Center of Electromechanical Product Packaging, College of Civil Engineering, Nanjing Forestry University, Nanjing, 210037, PR China.
- Yi Qi
  - School of Architecture and Urban Planning, Nanjing University, No. 22 Hankoulu Road, Nanjing, 210093, PR China.
7
Zhao D, Chen M, Shi K, Ma M, Huang Y, Shen J. A long short-term memory-fully connected (LSTM-FC) neural network for predicting the incidence of bronchopneumonia in children. Environmental Science and Pollution Research International 2021; 28:56892-56905. [PMID: 34076817 DOI: 10.1007/s11356-021-14632-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/26/2021] [Accepted: 05/25/2021] [Indexed: 06/12/2023]
Abstract
Bronchopneumonia is the most common infectious disease in children, and it seriously endangers children's health. In this paper, a deep neural network combining long short-term memory (LSTM) layers and fully connected layers was proposed to predict the prevalence of bronchopneumonia in children in Chengdu based on environmental factors and previous prevalence rates. The mean square error (MSE), mean absolute error (MAE), and Pearson correlation coefficient (R) were used to assess the performance of the deep learning model. The values of MSE, MAE, and R on the test dataset were 0.0051, 0.053, and 0.846, respectively. The results show that the proposed model can accurately predict the prevalence of bronchopneumonia in children. We also compared the proposed model with three other models, namely, a fully connected (FC) layer neural network, a random forest model, and a support vector machine. The results show that the proposed model achieves better performance than the three other models by capturing time-series information and mitigating the lag effect.
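Predicting from "environmental factors and previous prevalence rates" implies turning the time series into fixed-length input windows before it reaches a recurrent network. A minimal sketch of that framing follows; the function name `make_windows`, the `lookback` parameter, and the choice to append the past target to each time step's features are assumptions for illustration, not details confirmed by the paper.

```python
def make_windows(features, target, lookback):
    """Build (window, next_value) pairs for sequence models.

    features: list of per-time-step feature lists (e.g. temperature, PM2.5)
    target:   list of observed values (e.g. prevalence), same length
    lookback: number of past time steps in each input window

    Each window has shape lookback x (n_features + 1) because the past
    target value is appended to that step's features.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if lookback < 1 or lookback >= len(target):
        raise ValueError("lookback must be at least 1 and shorter than the series")

    windows, labels = [], []
    for t in range(lookback, len(target)):
        window = [list(features[s]) + [target[s]] for s in range(t - lookback, t)]
        windows.append(window)
        labels.append(target[t])  # the value to predict at time t
    return windows, labels
```

Only values strictly before time t enter the window, so the label at time t never leaks into its own input.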
Affiliation(s)
- Dongzhe Zhao
  - Chongqing Jinfo Mountain Karst Ecosystem National Observation and Research Station, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
  - Chongqing Engineering Research Center for Remote Sensing Big Data Application, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
- Min Chen
  - Key Laboratory of Virtual Geographic Environment (Ministry of Education), Nanjing Normal University, Nanjing, Jiangsu Province, 210046, China
- Kaifang Shi
  - Chongqing Jinfo Mountain Karst Ecosystem National Observation and Research Station, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
  - Chongqing Engineering Research Center for Remote Sensing Big Data Application, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
- Mingguo Ma
  - Chongqing Jinfo Mountain Karst Ecosystem National Observation and Research Station, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
  - Chongqing Engineering Research Center for Remote Sensing Big Data Application, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
- Yang Huang
  - Chongqing Jinfo Mountain Karst Ecosystem National Observation and Research Station, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
  - Chongqing Engineering Research Center for Remote Sensing Big Data Application, School of Geographical Sciences, Southwest University, Chongqing, 400715, China
- Jingwei Shen
  - Chongqing Jinfo Mountain Karst Ecosystem National Observation and Research Station, School of Geographical Sciences, Southwest University, Chongqing, 400715, China.
  - Chongqing Engineering Research Center for Remote Sensing Big Data Application, School of Geographical Sciences, Southwest University, Chongqing, 400715, China.
8
Impact of Input Filtering and Architecture Selection Strategies on GRU Runoff Forecasting: A Case Study in the Wei River Basin, Shaanxi, China. Water 2020. [DOI: 10.3390/w12123532] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
A gated recurrent unit (GRU) network, a kind of artificial neural network (ANN), has been increasingly applied to runoff forecasting. However, knowledge about the impact of different input data filtering strategies and the implications of different architectures on the performance of GRU runoff forecasting models is still insufficient. This study selected daily rainfall and runoff data from 2007 to 2014 in the Wei River basin in Shaanxi, China, and assessed six scenarios to explore the patterns of that impact. The scenarios consider four manually selected rainfall or runoff data combinations and principal component analysis (PCA) denoised input, along with single-directional and bi-directional GRU network architectures. Performance was evaluated in terms of robustness to 48 hyperparameter combinations and optimized accuracy in one-day-ahead (T + 1) and two-day-ahead (T + 2) forecasting, for both the overall forecasting process and the flood peak forecasts. The results suggest that the rainfall data can enhance the robustness of the model, especially in T + 2 forecasting. Additionally, it introduces slight noise and affects the optimized prediction accuracy in T + 1 forecasting, but significantly improves the accuracy in T + 2 forecasting. Although relevant (R = 0.409~0.763, Grey correlation grade >0.99), the runoff data at the adjacent tributary has an adverse effect on robustness, but can enhance the accuracy of the flood peak forecasts with a short lead time. The models with PCA denoised input have equivalent or even better robustness and accuracy compared with the models using well manually filtered data; the bi-directional architecture, though it slightly reduces the time-step robustness, can enhance the prediction accuracy. All the scenarios provide acceptable forecasting results (NSE of 0.927~0.951 for T + 1 forecasting and 0.745~0.836 for T + 2 forecasting) when the hyperparameters have already been optimized. Based on the results, recommendations are provided for the construction of GRU runoff forecasting models.
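The NSE values quoted above are presumably the Nash-Sutcliffe efficiency, a standard skill score in hydrology; the abstract does not expand the abbreviation, so that reading is an assumption. A minimal sketch of its usual definition follows, with the hypothetical function name `nash_sutcliffe_efficiency`.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 means a perfect match; 0.0 means the forecast is no better than
    always predicting the observed mean; negative values are worse than that.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - residual / variance
```

Under this definition, the drop from about 0.93 to 0.95 at T + 1 to about 0.75 to 0.84 at T + 2 means the squared forecast error grows noticeably relative to the natural variance of the runoff as the lead time increases.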
9
CIMI: Classify and Itemize Medical Image System for PFT Big Data Based on Deep Learning. Applied Sciences-Basel 2020. [DOI: 10.3390/app10238575] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
The value of pulmonary function test (PFT) data is increasing due to the advent of coronavirus disease 2019 (COVID-19) and the rise in respiratory disease. However, these PFT data cannot be directly used in clinical studies, because PFT results are stored in raw image files. In this study, the classification and itemization medical image (CIMI) system generates valuable data from raw PFT images by automatically classifying various PFT results, extracting texts, and storing them in the PFT database and Excel files. Deep-learning-based optical character recognition (OCR) technology was mainly used in CIMI to classify and itemize PFT images in St. Mary’s Hospital. CIMI classified seven types and itemized 913,059 texts from 14,720 PFT image sheets, which cannot be done by humans. The number, type, and location of texts that can be extracted differ by PFT type, but CIMI solves this issue by classifying the PFT image sheets by type, allowing researchers to analyze the data. To demonstrate the superiority of CIMI, the validation results of CIMI were compared to the results of four other algorithms. A total of 70 randomly selected sheets (ten sheets from each type) and 33,550 texts were used for the validation. The accuracy of CIMI was 95%, which was higher than that of the other four algorithms.