1
Li Y, Du C, Ge S, Zhang R, Shao Y, Chen K, Li Z, Ma F. Hematoma expansion prediction based on SMOTE and XGBoost algorithm. BMC Med Inform Decis Mak 2024; 24:172. [PMID: 38898499 PMCID: PMC11186182 DOI: 10.1186/s12911-024-02561-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 05/30/2024] [Indexed: 06/21/2024] Open
Abstract
Hematoma expansion (HE) is a high-risk symptom that occurs frequently in patients who have suffered spontaneous intracerebral hemorrhage (ICH) after a major accident or illness. Predicting HE correctly and in advance is critical in helping doctors determine the next step of medical treatment. Most existing studies focus only on HE occurring within 6 h after the onset of ICH, while in reality a considerable number of patients develop HE after the first 6 h but within 24 h. In this study, following the recommendation of medical doctors, we focus on predicting the occurrence of HE within 24 h, as well as the occurrence of HE every 6 h within 24 h. Based on demographics and information extracted from computed tomography (CT) images, we used the XGBoost method to predict the occurrence of HE within 24 h. To address the highly imbalanced data set, a frequent problem in medical data analysis, we used the SMOTE algorithm for data augmentation. To evaluate our method, we used a data set consisting of 582 patient records and compared the results of the proposed method with those of several other machine learning methods. Our experiments show that XGBoost achieved the best prediction performance on the balanced data set produced by the SMOTE algorithm, with an accuracy of 0.82 and an F1-score of 0.82. Moreover, the proposed method predicts the occurrence of HE within 6, 12, 18 and 24 h with accuracies of 0.89, 0.82, 0.87 and 0.94, respectively, indicating that HE occurrence within 24 h can be predicted accurately by the proposed method.
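The oversampling idea behind SMOTE can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function name `smote`, the toy data and the parameter values are assumptions made for illustration only. Each synthetic sample is placed at a random position on the segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical toy data: three minority-class samples with two features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
new_points = smote(minority, n_new=4, k=2)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by that class. The balanced set would then be passed to a classifier such as XGBoost.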
Affiliation(s)
- Yan Li
- Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Chaonan Du
- Department of Neurosurgery, Affiliated Jinling Hospital, Medical School of Nanjing University, Nanjing, China
- Sikai Ge
- Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Ruonan Zhang
- Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Yiming Shao
- Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Keyu Chen
- Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Zhepeng Li
- Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Fei Ma
- Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China.
2
Xiong X, Wang A, He J, Wang C, Liu R, Sun Z, Zhang J, Zhang J. Application of LightGBM hybrid model based on TPE algorithm optimization in sleep apnea detection. Front Neurosci 2024; 18:1324933. [PMID: 38440395 PMCID: PMC10909841 DOI: 10.3389/fnins.2024.1324933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 01/30/2024] [Indexed: 03/06/2024] Open
Abstract
Introduction Sleep apnea syndrome (SAS) is a serious sleep disorder, and early detection of sleep apnea not only reduces treatment costs but also saves lives. Conventional polysomnography (PSG) is widely regarded as the gold standard diagnostic tool for sleep apnea. However, this method is expensive, time-consuming and inherently disruptive to sleep. Recent studies have pointed out that ECG analysis is a simple and effective diagnostic method for sleep apnea, which can aid physicians in diagnosis and reduce patients' suffering. Methods To this end, this paper proposes a LightGBM hybrid model based on ECG signals for efficient detection of sleep apnea. Firstly, an improved Isolation Forest algorithm is introduced to remove abnormal data and solve the data sample imbalance problem. Secondly, the parameters of the LightGBM algorithm are optimised by an improved TPE (Tree-structured Parzen Estimator) algorithm to determine the best parameter configuration of the model. Finally, the fusion model TPE_OptGBM is used to detect sleep apnea. In the experimental phase, we validated the model on the sleep apnea ECG database provided by Philipps-University of Marburg, Germany. Results The experimental results show that the model proposed in this paper achieves an accuracy of 95.08%, a precision of 94.80%, a recall of 97.51%, and an F1 value of 96.14%. Discussion All of these evaluation indicators are better than those of the current mainstream models, so the model is expected to assist the doctor's diagnostic process and provide a better medical experience for patients.
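The reported metrics can be checked for internal consistency, since F1 is the harmonic mean of precision and recall. The short sketch below uses only the precision and recall quoted in the abstract; the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.9480, 0.9751)
print(f"F1 = {f1:.2%}")  # rounds to 96.14%, matching the reported F1 value
```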
Affiliation(s)
- Xin Xiong
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Aikun Wang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Jianfeng He
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Chunwu Wang
- College of Physics and Electronic Engineering, Hanshan Normal University, Chaozhou, China
- Ruixiang Liu
- Department of Clinical Psychology, Second People’s Hospital of Yunnan, Kunming, China
- Zhiran Sun
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Jiancong Zhang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
- Jing Zhang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China
3
Park J, Feng Y, Jeong SP. Developing an advanced prediction model for new employee turnover intention utilizing machine learning techniques. Sci Rep 2024; 14:1221. [PMID: 38216616 PMCID: PMC10786846 DOI: 10.1038/s41598-023-50593-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 12/21/2023] [Indexed: 01/14/2024] Open
Abstract
In recent years, the turnover phenomenon of new college graduates has been intensifying. The turnover of new employees creates many difficulties for businesses, as it is difficult to recover the costs spent on their hiring and training. Therefore, it is necessary to promptly identify and effectively manage new employees who are inclined to change jobs. So far, previous studies of turnover intention have contributed to understanding the turnover of new employees by identifying the factors that influence turnover intention. However, these studies are limited in that they have not shown how well those factors can predict which employees are actually willing to change jobs. Therefore, this study proposes a method of developing a machine learning-based turnover intention prediction model to overcome the limitations of previous studies. In this study, data from the Korea Employment Information Service's Job Movement Path Survey for college graduates were used, and OLS regression analysis was performed to confirm the influence of predictors. Model learning and classification were then performed using logistic regression (LR), k-nearest neighbor (KNN), and extreme gradient boosting (XGB) classifiers. A novel finding of this research is the diminished or reversed influence of certain traditional factors, such as workload importance and the relevance of one's major field, on turnover intention. Instead, job security emerged as the most significant predictor. The model's accuracy rates, highest with XGB at 78.5%, demonstrate the efficacy of applying machine learning in turnover intention prediction, marking a significant advancement over traditional econometric models. This study breaks new ground by integrating advanced predictive analytics into turnover intention research, offering a more nuanced understanding of the factors influencing the turnover intentions of new college graduates.
The insights gained could guide organizations in effectively managing and retaining new talent, highlighting the need for a focus on job security and organizational satisfaction, and the shifting relevance of traditional factors like job preference.
Affiliation(s)
- Jungryeol Park
- Technology Policy Research Division, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea
- Yituo Feng
- Management Information Systems, Chungbuk National University, Cheongju, South Korea.
- Seon-Phil Jeong
- Department of Computer Science, BNU-HKBU United International College, Zhuhai, Guangdong, China
4
Thirunavukkarasu MK, Veerappapillai S, Karuppasamy R. Sequential virtual screening collaborated with machine-learning strategies for the discovery of precise medicine against non-small cell lung cancer. J Biomol Struct Dyn 2024; 42:615-628. [PMID: 36995235 DOI: 10.1080/07391102.2023.2194994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Accepted: 03/17/2023] [Indexed: 03/31/2023]
Abstract
Dysregulation of MAPK pathway receptors is crucial in causing uncontrolled cell proliferation in many cancer types, including non-small cell lung cancer. Because of the complications in targeting the upstream components, MEK is an appealing target for diminishing the activity of this pathway. Hence, we aimed to discover potent MEK inhibitors by integrating virtual screening and machine learning-based strategies. Preliminary screening was conducted on 11,808 compounds using the cavity-based pharmacophore model AADDRRR. Further, seven ML models were assessed for predicting MEK-active compounds using six molecular representations. The LGB model with morgan2 fingerprints surpassed the other models, yielding 0.92 accuracy and 0.83 MCC on the test set and 0.85 accuracy and 0.70 MCC on the external set. Further, the binding ability of the screened hits was examined using glide XP docking and prime-MM/GBSA calculations. Note that we utilized three ML-based scoring functions to predict various biological properties of the compounds. The two hit compounds, DB06920 and DB08010, showed excellent binding with acceptable toxicity properties against MEK. Further, 200 ns of MD simulation combined with MM-GBSA/PBSA calculations confirmed that DB06920 may have stable binding conformations with MEK, so it will be taken forward to experimental studies in the near future. Communicated by Ramaswamy H. Sarma.
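Since the abstract reports both accuracy and the Matthews correlation coefficient (MCC), a small sketch of how the two are derived from a binary confusion matrix may help. The counts below are hypothetical and are not taken from the paper; they are chosen only to show that MCC, which uses all four cells, can sit noticeably below accuracy.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts for 100 test compounds (illustration only).
tp, tn, fp, fn = 45, 47, 3, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.2f}")
```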
Affiliation(s)
- Muthu Kumar Thirunavukkarasu
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Shanthi Veerappapillai
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Ramanathan Karuppasamy
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
5
Talkhi N, Nooghabi MJ, Esmaily H, Maleki S, Hajipoor M, Ferns GA, Ghayour-Mobarhan M. Prediction of serum anti-HSP27 antibody titers changes using a light gradient boosting machine (LightGBM) technique. Sci Rep 2023; 13:12775. [PMID: 37550399 PMCID: PMC10406940 DOI: 10.1038/s41598-023-39724-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 07/29/2023] [Indexed: 08/09/2023] Open
Abstract
Previous studies have proposed that heat shock protein 27 (HSP27) and anti-HSP27 antibody titers may play a crucial role in several diseases, including cardiovascular disease. However, the available studies have used simple analytical methods. This study aimed to determine the factors associated with serum anti-HSP27 antibody titers using ensemble machine learning methods and to demonstrate the magnitude and direction of the predictors using PFI and SHAP methods. The study employed Python 3 to apply various machine learning models, including LightGBM, CatBoost, XGBoost, AdaBoost, SVR, MLP, and MLR. The best models were selected using model evaluation metrics under a K-Fold cross-validation strategy. The LightGBM model (with RMSE: 0.1900 ± 0.0124; MAE: 0.1471 ± 0.0044; MAPE: 0.8027 ± 0.064 as the mean ± sd) and the SHAP method revealed that several factors, including pro-oxidant-antioxidant balance (PAB), physical activity level (PAL), platelet distribution width, mid-upper arm circumference, systolic blood pressure, age, red cell distribution width, waist-to-hip ratio, neutrophils to lymphocytes ratio, platelet count, serum glucose, serum cholesterol, and red blood cells, were associated with anti-HSP27. The study found that PAB and PAL were strongly associated with serum anti-HSP27 antibody titers, indicating a direct and indirect relationship, respectively. These findings can help improve our understanding of the factors that determine anti-HSP27 antibody titers and their potential role in disease development.
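The "mean ± sd" figures above come from scoring a model on each fold of a K-fold cross-validation and then summarising the per-fold scores. A minimal standard-library sketch of that bookkeeping follows. The toy target values and the mean-predicting "model" are assumptions for illustration and stand in for the study's data and its LightGBM regressor.

```python
import math
from statistics import mean, stdev

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Assumes no true value is zero.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size

# Hypothetical toy targets; the "model" simply predicts the training mean.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fold_rmse = []
for train, test in k_fold_indices(len(y), k=3):
    prediction = mean(y[i] for i in train)
    fold_rmse.append(rmse([y[i] for i in test], [prediction] * len(test)))

print(f"RMSE: {mean(fold_rmse):.4f} ± {stdev(fold_rmse):.4f}")
```

In practice the folds are usually shuffled first; contiguous folds are used here only to keep the sketch deterministic.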
Affiliation(s)
- Nasrin Talkhi
- Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Mehdi Jabbari Nooghabi
- Department of Statistics, Ferdowsi University of Mashhad, Mashhad, Iran
- Department of Mathematical Sciences, University of Copenhagen, 2100, Copenhagen, Denmark
- Habibollah Esmaily
- Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
- Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Saba Maleki
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Mojtaba Hajipoor
- Department of Nutrition Sciences, Varastegan Institute for Medical Sciences, Mashhad, Iran
- Gordon A Ferns
- Division of Medical Education, Brighton & Sussex Medical School, Falmer, Brighton, BN1 9PH, Sussex, UK
- Majid Ghayour-Mobarhan
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran.
- Metabolic Syndrome Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
6
Zou Z, Chen J, Wu W, Luo J, Long T, Wu Q, Wang Q, Zhen J, Zhao Y, Wang Y, Chen Y, Zhou M, Xu L. Detection of peanut seed vigor based on hyperspectral imaging and chemometrics. Front Plant Sci 2023; 14:1127108. [PMID: 36923124 PMCID: PMC10010490 DOI: 10.3389/fpls.2023.1127108] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/20/2022] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
Rapid nondestructive testing of peanut seed vigor is of great significance in current research. Before seeds are sown, effective screening of high-quality seeds for planting is crucial to improve the quality of crop yield. Seed vitality is one of the important indicators of seed quality: it represents the potential ability of seeds to germinate quickly and completely and to grow into normal seedlings or plants. Meanwhile, the advantage of nondestructive testing technology is that the seeds themselves are not damaged. In this study, hyperspectral technology and superoxide dismutase activity were used to detect peanut seed vigor. To investigate peanut seed vigor and predict superoxide dismutase activity, spectral characteristics of peanut seeds in the wavelength range of 400-1000 nm were analyzed. The spectral data were processed with a variety of popular algorithms. Spectral data were preprocessed with Savitzky-Golay (SG), multivariate scatter correction (MSC), and median filtering (MF), which can effectively reduce the effects of baseline drift and tilt. CatBoost and Gradient Boosted Decision Tree were used for feature band extraction; the five characteristic bands with the highest weights for peanut seed vigor classification were 425.48 nm, 930.8 nm, 965.32 nm, 984.0 nm, and 994.7 nm. XGBoost, LightGBM, Support Vector Machine and Random Forest were used to model seed vitality classification. XGBoost and partial least squares regression were used to establish a regression model for the superoxide dismutase activity value. The results indicated that MF-CatBoost-LightGBM was the best model for peanut seed vigor classification, with an accuracy of 90.83%. MSC-CatBoost-PLSR was the optimal regression model for the superoxide dismutase activity value, with an R2 of 0.9787 and an RMSE of 0.0566. The results suggested that hyperspectral technology could correlate the external manifestation of effective peanut seed vigor.
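Of the three preprocessing steps named above, median filtering is the simplest to illustrate. The sketch below is a generic one-dimensional median filter, not the authors' pipeline: the window size, the edge handling (the window is truncated at both ends) and the toy reflectance values are assumptions chosen for illustration.

```python
from statistics import median

def median_filter(signal, window=3):
    """Replace each value by the median of its neighbourhood.
    The window is truncated at the two ends of the signal."""
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]

# Hypothetical reflectance values with a single spike at index 2.
spectrum = [0.10, 0.11, 0.90, 0.12, 0.13]
filtered = median_filter(spectrum, window=3)
print(filtered)
```

The isolated spike is replaced by a neighbouring value, while the slowly varying part of the curve is left almost unchanged.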
Affiliation(s)
- Zhiyong Zou
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Jie Chen
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Weijia Wu
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Jinghao Luo
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Tao Long
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Qingsong Wu
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Qianlong Wang
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Jiangbo Zhen
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Yongpeng Zhao
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Yuchao Wang
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
- Yongming Chen
- School of Electrical Engineering and Automation, Hubei Normal University, Huangshi, Hubei, China
- Man Zhou
- Food Academy, Sichuan Agricultural University, Yaan, China
- Lijia Xu
- College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Yaan, China
7
Nafouanti MB, Li J, Nyakilla EE, Mwakipunda GC, Mulashani A. A novel hybrid random forest linear model approach for forecasting groundwater fluoride contamination. Environ Sci Pollut Res Int 2023; 30:50661-50674. [PMID: 36800089 DOI: 10.1007/s11356-023-25886-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 02/07/2023] [Indexed: 02/18/2023]
Abstract
Groundwater quality in the Datong basin is threatened by high fluoride contamination. Laboratory analysis is the standard method for estimating groundwater quality parameters, but it is expensive and time-consuming. Therefore, this paper proposes a hybrid random forest linear model (HRFLM) as a novel approach for estimating groundwater fluoride contamination. Light gradient boosting (LightGBM), random forest (RF), and extreme gradient boosting (Xgboost) were also employed for comparison with HRFLM in predicting fluoride contamination in groundwater. A total of 202 groundwater samples were collected to evaluate the performance of the models in forecasting groundwater fluoride contamination. The performance of the models was assessed using the area under the receiver operating characteristic (ROC) curve (AUC) and the confusion matrix (CM). The CM results reveal that with nine predictor variables, the hybrid HRFLM achieved an accuracy of 95%, outperforming the Xgboost, LightGBM, and RF models, which attained 88%, 88%, and 85%, respectively. Likewise, the hybrid HRFLM shows high performance with an AUC of 0.98, compared to Xgboost, LightGBM, and RF, which achieved AUCs of 0.95, 0.90, and 0.88, respectively. The study demonstrates that the HRFLM can be applied as an advanced approach for groundwater fluoride contamination prediction in the Datong basin and could be adopted in other areas facing a similar challenge.
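The two evaluation measures used here can be computed directly from labels and predicted scores. The sketch below is a generic illustration with hypothetical data, not the study's code: accuracy is read from the confusion-matrix counts, and AUC is computed through its rank interpretation, i.e. the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct predictions in a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)

def auc(labels, scores):
    """Probability that a random positive is scored above a random
    negative; ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = contaminated) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))      # 3 of the 4 positive/negative pairs are ordered correctly
print(accuracy(19, 19, 1, 1))   # hypothetical counts
```

The pairwise loop is quadratic in the number of samples, which is fine for a data set of a few hundred samples such as the one described above.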
Affiliation(s)
- Mouigni Baraka Nafouanti
- State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan, 430074, China.
- Junxia Li
- State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan, 430074, China; China Laboratory of Basin Hydrology and Wetland Eco-restoration, China University of Geosciences, Wuhan, 430074, China
- Edwin E Nyakilla
- Department of Petroleum Engineering, Faculty of Earth Resources, China University of Geosciences, Wuhan, 430074, China
- Grant Charles Mwakipunda
- Department of Petroleum Engineering, Faculty of Earth Resources, China University of Geosciences, Wuhan, 430074, China
- Alvin Mulashani
- Department of Geosciences and Mining Technology, College of Engineering and Technology, Mbeya University of Science and Technology, Box 131, Mbeya, Tanzania
8
Trajectory tracking of changes digital divide prediction factors in the elderly through machine learning. PLoS One 2023; 18:e0281291. [PMID: 36763570 PMCID: PMC9916605 DOI: 10.1371/journal.pone.0281291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Accepted: 01/19/2023] [Indexed: 02/11/2023] Open
Abstract
RESEARCH MOTIVATION Recently, the digital divide problem among elderly individuals has been intensifying. A larger problem is that the level of use of digital technology varies from person to person. Therefore, a digital divide may even exist among elderly individuals. Considering the recent accelerating digital transformation in our society, it is highly likely that elderly individuals are experiencing many difficulties in their daily life. Therefore, it is necessary to quickly address and manage these difficulties. RESEARCH OBJECTIVE This study aims to predict the digital divide in the elderly population and provide essential insights into managing it. To this end, predictive analysis is performed using public data and machine learning techniques. METHODS AND MATERIALS This study used data from the '2020 Report on Digital Information Divide Survey' published by the Korea National Information Society Agency. In establishing the prediction model, various independent variables were used. Ten variables with high importance for predicting the digital divide were identified and used as critical, independent variables to increase the convenience of analyzing the model. The data were divided into 70% for training and 30% for testing. The model was trained on the training set, and the model's predictive accuracy was analyzed on the test set. The prediction accuracy was analyzed using logistic regression (LR), support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), and eXtreme gradient boosting (XGBoost). A convolutional neural network (CNN) was used to further improve the accuracy. In addition, the importance of variables was analyzed using data from 2019 before the COVID-19 outbreak, and the results were compared with the results from 2020. 
RESULTS The study results showed that the variables with high importance in the 2020 data predicting the digital divide of elderly individuals were the demographic perspective, internet usage perspective, self-efficacy perspective, and social connectedness perspective. These variables, as well as the social support perspective, were highly important in 2019. The highest prediction accuracy was achieved using the CNN-based model (accuracy: 80.4%), followed by the XGBoost model (accuracy: 79%) and LR model (accuracy: 78.3%). The lowest accuracy (accuracy: 72.6%) was obtained using the DT model. DISCUSSION The results of this analysis suggest that support that can strengthen the practical connection of elderly individuals through digital devices is becoming more critical than ever in a situation where digital transformation is accelerating in various fields. In addition, it is necessary to comprehensively use classification algorithms from various academic fields when constructing a classification model to obtain higher prediction accuracy. CONCLUSION The academic significance of this study is that the CNN, which is often employed in image and video processing, was extended and applied to a social science field using structured data to improve the accuracy of the prediction model. The practical significance of this study is that the prediction models and the analytical methodologies proposed in this article can be applied to classify elderly people affected by the digital divide, and the trained models can be used to predict the people of younger generations who may be affected by the digital divide. Another practical significance of this study is that, as a method for managing individuals who are affected by a digital divide, the self-efficacy perspective about acquiring and using ICTs and the socially connected perspective are suggested in addition to the demographic perspective and the internet usage perspective.
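The 70%/30% hold-out procedure described in the methods can be sketched in a few lines. The function name `train_test_split`, the seed and the placeholder rows are illustrative assumptions; the abstract does not say how the authors shuffled or split the survey data.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for survey records.
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))  # 70 30
```

The model is fitted on `train` only, and accuracy is reported on `test`, so the reported figures reflect performance on records the model has not seen.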
9
Blood Glucose Prediction Method Based on Particle Swarm Optimization and Model Fusion. Diagnostics (Basel) 2022; 12:diagnostics12123062. [PMID: 36553069 PMCID: PMC9776993 DOI: 10.3390/diagnostics12123062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 11/15/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022] Open
Abstract
Blood glucose stability in diabetic patients determines their degree of health, and changes in blood glucose levels are related to patient outcomes. Therefore, accurate monitoring of blood glucose plays a crucial role in controlling diabetes. To address the high volatility of blood glucose concentration in diabetic patients and the limitations of a single regression prediction model, this paper proposes a method for predicting blood glucose values based on particle swarm optimization and model fusion. First, the Kalman filtering algorithm is used to smooth the sensor current signal and reduce the effect of noise on the data. Then, the hyperparameters of the Extreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM) models are optimized using the particle swarm optimization algorithm. Finally, the XGBoost and LightGBM models are used as the base learners and a Bayesian regression model as the meta-learner, and the stacking model fusion method is used to predict blood glucose values. To demonstrate the effectiveness of the method, we compared the prediction results of the stacking fusion model with those of six other models. The experimental results show that the proposed stacking fusion model can accurately predict blood glucose values: the mean absolute percentage error of blood glucose prediction is 13.01%, and the prediction error of the stacking fusion model is much lower than that of the other six models. The proposed blood glucose prediction method therefore outperforms the compared models.
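The first step of the pipeline, Kalman smoothing of the sensor signal, can be sketched for the scalar case. This is a minimal illustration under a random-walk state model, not the authors' filter: the function name, the variances `process_var` and `meas_var`, the initial uncertainty and the toy readings are all assumed values.

```python
def kalman_smooth(measurements, process_var=1e-3, meas_var=0.1):
    """Scalar Kalman filter with a random-walk state model."""
    estimate, error = measurements[0], 1.0  # initial state and its variance
    smoothed = []
    for z in measurements:
        error += process_var                # predict: uncertainty grows
        gain = error / (error + meas_var)   # trust in the new measurement
        estimate += gain * (z - estimate)   # update towards the measurement
        error *= 1 - gain                   # uncertainty shrinks after update
        smoothed.append(estimate)
    return smoothed

# Hypothetical noisy sensor-current readings.
raw = [1.0, 1.4, 0.6, 1.2, 0.8]
smoothed = kalman_smooth(raw)
print([round(v, 3) for v in smoothed])
```

A larger `meas_var` relative to `process_var` makes the filter trust the sensor less and smooth more strongly. The smoothed signal would then feed the tuned XGBoost and LightGBM base learners.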
10
Wadghiri MZ, Idri A, El Idrissi T, Hakkoum H. Ensemble blood glucose prediction in diabetes mellitus: A review. Comput Biol Med 2022; 147:105674. [PMID: 35716436 DOI: 10.1016/j.compbiomed.2022.105674] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/31/2022] [Revised: 04/28/2022] [Accepted: 05/25/2022] [Indexed: 11/03/2022]
Abstract
Considering the complexity of blood glucose dynamics, the adoption of a single model to predict blood glucose level does not always capture the inter- and intra-patient context changes. Ensembles are a set of machine learning techniques that combine multiple single learners to find a better variance/bias trade-off and hence improve prediction accuracy. The present paper aims to review the state of the art in predicting blood glucose using ensemble methods with regard to 8 criteria: publication year and sources, datasets used to train/evaluate the models, types of ensembles used, single learners involved in constructing ensembles, combination schemes used to aggregate the base learners, metrics and validation methods adopted to assess the performance of ensembles, reported overall performance of the predictors, and accuracy comparison of ensemble techniques with single models. A systematic literature review was conducted to analyze and synthesize primary studies published between 2000 and 2020 in six digital libraries. A total of 32 primary papers were selected and reviewed with regard to eight review questions. The results show that ensembles have gained wider interest in recent years and have in general improved performance compared with single models. However, multiple gaps have been identified concerning the ensemble construction process and the performance metrics used. Several recommendations have been made in this regard for designing accurate ensembles for blood glucose level prediction.
Affiliation(s)
- M Z Wadghiri
- Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat, Morocco
- A Idri
- Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat, Morocco; MSDA, Mohammed VI Polytechnic University, Benguerir, Morocco.
- Touria El Idrissi
- Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat, Morocco
- Hajar Hakkoum
- Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat, Morocco
11
Safaei N, Safaei B, Seyedekrami S, Talafidaryani M, Masoud A, Wang S, Li Q, Moqri M. E-CatBoost: An efficient machine learning framework for predicting ICU mortality using the eICU Collaborative Research Database. PLoS One 2022; 17:e0262895. [PMID: 35511882 PMCID: PMC9070907 DOI: 10.1371/journal.pone.0262895] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/24/2021] [Accepted: 01/09/2022] [Indexed: 11/19/2022] Open
Abstract
Improving the Intensive Care Unit (ICU) management network and building cost-effective and well-managed healthcare systems are high priorities for healthcare units. Creating accurate and explainable mortality prediction models helps identify the most critical risk factors in patients' survival/death status and detect the most in-need patients early. This study proposes a highly accurate and efficient machine learning model for predicting ICU mortality status upon discharge using the information available during the first 24 hours of admission. The most important features in mortality prediction are identified, and the effects of changing each feature on the prediction are studied. We used supervised machine learning models and illness severity scoring systems to benchmark the mortality prediction. We also implemented a combination of SHAP, LIME, partial dependence, and individual conditional expectation plots to explain the predictions made by the best-performing model (CatBoost). We proposed E-CatBoost, an optimized and efficient patient mortality prediction model, which can accurately predict the patients' discharge status using only ten input features. We used eICU-CRD v2.0 to train and validate the models; the dataset contains information on over 200,000 ICU admissions. The patients were divided into twelve disease groups, and models were fitted and tuned for each group. The models' predictive performance was evaluated using the area under the receiver operating characteristic curve (AUROC). The AUROC scores were 0.86 [std:0.02] to 0.92 [std:0.02] for CatBoost and 0.83 [std:0.02] to 0.91 [std:0.03] for E-CatBoost models across the defined disease groups; if measured over the entire patient population, their AUROC scores were 7 to 18 and 2 to 12 percent higher than the baseline models, respectively.
Based on SHAP explanations, we found age, heart rate, respiratory rate, blood urea nitrogen, and creatinine level to be the most critical cross-disease features in mortality predictions.
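The AUROC metric used above can be illustrated with a minimal sketch. It relies on the rank-based identity that AUROC equals the probability that a randomly chosen positive case (here, a death) receives a higher score than a randomly chosen negative case. The function name `auroc` and the toy labels and scores are assumptions made for illustration; they are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: 0/1 outcomes (1 = positive class); scores: predicted risk per case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above the negative one
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the paper.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 3 of 4 positive/negative pairs are ordered correctly
```

The quadratic pairwise loop is chosen for readability; for a dataset of the size described in the abstract, a sort-based or library implementation would be the practical choice.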
Affiliation(s)
- Nima Safaei: Department of Business Analytics and Information Systems, Tippie College of Business, University of Iowa, Iowa City, IA, United States of America
- Babak Safaei: Civil and Environmental Engineering Department, Michigan State University, East Lansing, MI, United States of America
- Seyedhouman Seyedekrami: Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States of America
- Arezoo Masoud: Department of Business Analytics and Information Systems, Tippie College of Business, University of Iowa, Iowa City, IA, United States of America
- Shaodong Wang: Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States of America
- Qing Li: Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, IA, United States of America
- Mahdi Moqri: Department of Information Systems and Business Analytics, Ivy College of Business, Iowa State University, Ames, IA, United States of America
|
12
|
Machine Learning Algorithms: Prediction and Feature Selection for Clinical Refracture after Surgically Treated Fragility Fracture. J Clin Med 2022; 11:2021. [PMID: 35407629 PMCID: PMC8999234 DOI: 10.3390/jcm11072021] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/26/2022] [Revised: 03/08/2022] [Accepted: 04/02/2022] [Indexed: 01/27/2023]
Abstract
BACKGROUND The number of patients with fragility fracture has been increasing. Although this increase has raised the rate of subsequent fracture (refracture), the causes of refracture are multifactorial, and its predictors remain unclear. To address this issue, we collected a registry-based longitudinal dataset containing more than 7000 patients with surgically treated fragility fractures to detect potential predictors of clinical refracture. METHODS Because machine learning algorithms are often used to analyze large-scale datasets, we developed automatic prediction models and identified the relevant features for patients with clinical refracture. The input data, which contained perioperative clinical information, were in tabular format. Clinical refracture was documented as the primary outcome if a fracture was diagnosed during postoperative outpatient care. A decision-tree-based model, LightGBM, had moderate accuracy for the prediction in the test and the independent dataset, whereas the other models had poor accuracy or worse. RESULTS From a clinical perspective, rheumatoid arthritis (RA) and chronic kidney disease (CKD) were noted as the relevant features for patients with clinical refracture, both of which were associated with secondary osteoporosis. CONCLUSION The decision-tree-based algorithm showed precise prediction of clinical refracture, with RA and CKD detected as the potential predictors. Understanding these predictors may improve the management of patients with fragility fractures.
|
13
|
Temperature Forecasting Correction Based on Operational GRAPES-3km Model Using Machine Learning Methods. Atmosphere 2022. [DOI: 10.3390/atmos13020362] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/10/2022]
Abstract
Postprocess correction is essential to improving model forecasts, and machine learning methods play an increasingly important role in it. In this study, three machine learning (ML) methods, Linear Regression, LSTM-FCN and LightGBM, were used to correct the temperature forecasts of the operational high-resolution model GRAPES-3km. The input parameters include 2 m temperature, relative humidity, local pressure and wind speed forecasting and observation data in Shaanxi province of China from 1 January 2019 to 31 December 2020. The dataset from September 2018 was used for model evaluation using the metrics of root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination (R²). All three machine learning methods perform very well in correcting the temperature forecast of the GRAPES-3km model. The RMSE decreased by 33%, 32% and 40%, respectively; the MAE decreased by 33%, 34% and 41%, respectively; and the R² increased by 21.4%, 21.5% and 25.2%, respectively. Among the three methods, LightGBM performed the best, with the forecast accuracy rate reaching above 84%.
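The three evaluation metrics named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions; the helper `relative_reduction` assumes that the reported percentage decreases are relative to the uncorrected forecast error, which the abstract does not state explicitly. All sample values are made up for illustration.

```python
import math


def rmse(obs, pred):
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def relative_reduction(before, after):
    """Fractional decrease of an error metric (assumed reading of 'decreased by x%')."""
    return (before - after) / before


# Hypothetical temperatures (degrees C), not from the study.
observed = [10.0, 12.0, 14.0, 16.0]
raw_forecast = [12.0, 14.0, 16.0, 18.0]   # uncorrected model output, biased by +2
corrected = [11.0, 12.0, 15.0, 16.0]      # output after a hypothetical correction step

print(mae(observed, raw_forecast), mae(observed, corrected))
print(relative_reduction(mae(observed, raw_forecast), mae(observed, corrected)))
```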
|
14
|
Degradation Trend Prediction of Pumped Storage Unit Based on MIC-LGBM and VMD-GRU Combined Model. Energies 2022. [DOI: 10.3390/en15020605] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/04/2023]
Abstract
The harsh operating environment aggravates the degradation of pumped storage units (PSUs). Degradation trend prediction (DTP) provides important support for the condition-based maintenance of PSUs. However, the complexity of the performance degradation index (PDI) sequence poses a severe challenge to the reliability of DTP. Additionally, the accuracy of the healthy model is often ignored, resulting in an unconvincing PDI. To solve these problems, a combined DTP model that integrates the maximal information coefficient (MIC), light gradient boosting machine (LGBM), variational mode decomposition (VMD) and gated recurrent unit (GRU) is proposed. Firstly, MIC-LGBM is utilized to generate a high-precision healthy model: MIC is applied to select the most relevant working parameters, and the LGBM is then used to construct the healthy model. Afterwards, a PDI is generated based on the LGBM healthy model and monitoring data. Finally, the VMD-GRU prediction model is designed to achieve precise DTP under the complex PDI sequence. The proposed model is verified by applying it to a PSU located in Zhejiang province, China. The results reveal that the proposed model achieves the highest-precision healthy model and the best prediction performance compared with other comparative models. The absolute average (|AVG|) and standard deviation (STD) of fitting errors are reduced to 0.0275 and 0.9245, and the RMSE, MAE, and R² are 0.00395, 0.0032, and 0.9226, respectively, on average for two operating conditions.
|
15
|
Prediction of PM2.5 Concentration Based on the LSTM-TSLightGBM Variable Weight Combination Model. Atmosphere 2021. [DOI: 10.3390/atmos12091211] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
Abstract
PM2.5 is one of the main pollutants that cause air pollution, and high concentrations of PM2.5 seriously threaten human health. Therefore, an accurate prediction of PM2.5 concentration has great practical significance for air quality detection, air pollution restoration, and human health. This paper uses historical air quality concentration data and meteorological data from the Beijing Olympic Sports Center as the research object. It establishes a long short-term memory (LSTM) model with a time window size of 12, a T-shape light gradient boosting machine (TSLightGBM) model that uses all information in the time window as the prediction input for the next period, and an LSTM-TSLightGBM combined model based on an optimal weighted combination method. The hourly PM2.5 concentration is then predicted. The prediction results on the test set show that the mean absolute error (MAE), root mean squared error (RMSE), and symmetric mean absolute percentage error (SMAPE) of the LSTM-TSLightGBM model are 11.873, 22.516, and 19.540%, respectively. Compared with LSTM, TSLightGBM, the recurrent neural network (RNN), and other models, the LSTM-TSLightGBM model has a lower MAE, RMSE, and SMAPE, higher prediction accuracy for PM2.5, and better goodness-of-fit.
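As a rough illustration of two ideas in this abstract, the sketch below computes SMAPE and combines two forecasts with a single least-squares weight. Both parts are assumptions: SMAPE has several variants and the one below (mean of 2|y − ŷ| / (|y| + |ŷ|), in percent) is only one common form, and the paper describes a variable-weight combination, whereas this sketch fits one fixed weight as a simplification. The function names and data are hypothetical.

```python
def smape(obs, pred):
    """Symmetric mean absolute percentage error, in percent (one common variant)."""
    total = 0.0
    for o, p in zip(obs, pred):
        denom = abs(o) + abs(p)
        if denom > 0:  # a pair of exact zeros contributes no error
            total += 2.0 * abs(o - p) / denom
    return 100.0 * total / len(obs)


def best_fixed_weight(obs, pred_a, pred_b):
    """Weight w in [0, 1] minimizing the squared error of w*a + (1 - w)*b.

    Simplification: a single fixed weight, not the paper's variable weights.
    """
    num = sum((o - b) * (a - b) for o, a, b in zip(obs, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        return 0.5  # identical forecasts: any weight gives the same result
    return min(1.0, max(0.0, num / den))


def combine(pred_a, pred_b, w):
    """Weighted combination of two forecast series."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical hourly concentrations, not from the paper.
observed = [50.0, 60.0, 80.0]
model_a = [54.0, 64.0, 84.0]   # stands in for one model's output (overshoots by 4)
model_b = [48.0, 58.0, 78.0]   # stands in for the other model's output (undershoots by 2)

w = best_fixed_weight(observed, model_a, model_b)
blended = combine(model_a, model_b, w)
print(w, smape(observed, model_a), smape(observed, blended))
```

In practice the weight would be fitted on a validation split rather than on the data used for scoring, so that the combined error is not optimistically biased.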
|
16
|
Li Z, Hu D. Forecast of the COVID-19 Epidemic Based on RF-BOA-LightGBM. Healthcare (Basel) 2021; 9:1172. [PMID: 34574946 PMCID: PMC8465863 DOI: 10.3390/healthcare9091172] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2021] [Revised: 08/26/2021] [Accepted: 08/30/2021] [Indexed: 01/23/2023]
Abstract
In this paper, we use the Internet big data tool Baidu Index to predict the development trend of the novel coronavirus pneumonia epidemic and to obtain further data. By selecting appropriate keywords, we collect data on COVID-19 cases in China between 1 January 2020 and 1 April 2020. After preprocessing the data set, the optimal sub-data set is obtained using the random forest feature selection method. The results of optimizing seven hyperparameters of the LightGBM model by grid search, random search and Bayesian optimization are compared. The experimental results show that applying the data set obtained from the Baidu Index to the Bayesian-optimized LightGBM model better predicts the growth in the number of patients with novel coronavirus pneumonia, and also helps people make accurate judgments about the development trend of the epidemic.
Affiliation(s)
- Dehua Hu: School of Life Sciences, Central South University, Changsha 410083, China
|
17
|
Lee Y, Ryu J, Kang MW, Seo KH, Kim J, Suh J, Kim YC, Kim DK, Oh KH, Joo KW, Kim YS, Jeong CW, Lee SC, Kwak C, Kim S, Han SS. Machine learning-based prediction of acute kidney injury after nephrectomy in patients with renal cell carcinoma. Sci Rep 2021; 11:15704. [PMID: 34344909 PMCID: PMC8333365 DOI: 10.1038/s41598-021-95019-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/01/2021] [Accepted: 07/20/2021] [Indexed: 12/17/2022]
Abstract
The precise prediction of acute kidney injury (AKI) after nephrectomy for renal cell carcinoma (RCC) is an important issue because of its relationship with subsequent kidney dysfunction and high mortality. Herein we addressed whether machine learning (ML) algorithms could predict postoperative AKI risk better than conventional logistic regression (LR) models. A total of 4104 RCC patients who had undergone unilateral nephrectomy from January 2003 to December 2017 were reviewed. ML models such as support vector machine, random forest, extreme gradient boosting, and light gradient boosting machine (LightGBM) were developed, and their performance based on the area under the receiver operating characteristic curve, accuracy, and F1 score was compared with that of the LR-based scoring model. Postoperative AKI developed in 1167 patients (28.4%). All the ML models had higher performance index values than the LR-based scoring model. Among them, the LightGBM model had the highest value of 0.810 (0.783-0.837). The decision curve analysis demonstrated a greater net benefit of the ML models than the LR-based scoring model over all the ranges of threshold probabilities. The application of ML algorithms improves the predictability of AKI after nephrectomy for RCC, and these models perform better than conventional LR-based models.
Affiliation(s)
- Yeonhee Lee: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea; Department of Internal Medicine, Uijeongbu Eulji Medical Center, Eulji University, Uijeongbu-si, Gyeonggi-do, South Korea
- Jiwon Ryu: Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do, South Korea
- Min Woo Kang: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea
- Kyung Ha Seo: Medical Research Collaborating Center, Seoul National University Hospital, Seoul, South Korea
- Jayoun Kim: Medical Research Collaborating Center, Seoul National University Hospital, Seoul, South Korea
- Jungyo Suh: Department of Urology, Seoul National University College of Medicine, Seoul, South Korea
- Yong Chul Kim: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea
- Dong Ki Kim: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea
- Kook-Hwan Oh: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea
- Kwon Wook Joo: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea
- Yon Su Kim: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea
- Chang Wook Jeong: Department of Urology, Seoul National University College of Medicine, Seoul, South Korea
- Sang Chul Lee: Department of Urology, Seoul National University College of Medicine, Seoul, South Korea
- Cheol Kwak: Department of Urology, Seoul National University College of Medicine, Seoul, South Korea
- Sejoong Kim: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea; Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do, South Korea; Center for Artificial Intelligence in Healthcare, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do, South Korea
- Seung Seok Han: Department of Internal Medicine, Seoul National University College of Medicine, 103 Daehakro, Jongno-gu, Seoul, 03080, South Korea
|
18
|
Improved Methods for Mid-Term Blood Glucose Level Prediction Using Dietary and Insulin Logs. Medicina (Kaunas) 2021; 57:676. [PMID: 34209125 PMCID: PMC8307794 DOI: 10.3390/medicina57070676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Accepted: 06/25/2021] [Indexed: 11/16/2022]
Abstract
Background and Objectives: The daily lifestyle management of diabetes requires accurate predictions of the blood glucose level between meals. The objective of this study was to improve the accuracy achieved by previous work, especially on the mid-term, i.e., 120 to 180 min prediction horizons, for insulin-dependent patients. Materials and Methods: An absorption model-based method is proposed to train an artificial neural network with the bolus and basal insulin dosing and timing, the baseline blood glucose level, the maximal glucose infusion rate, and the total carbohydrate content as parameters. The approach was implemented in various algorithmic setups, and it was validated on data from a small-scale clinical trial with continuous glucose monitoring. Results: Root mean square error results for the mid-term horizons are 1.72 mmol/L (120 min) and 1.95 mmol/L (180 min). The accuracy of the proposed model measured on the clinical data is better than the accuracy reported by any other currently available and comparable models. Conclusions: A relatively short (ca. two weeks) training sample of a continuous glucose monitor and dietary/insulin log is sufficient to provide accurate predictions. For the outpatient application in practice, a hybrid model is proposed that combines the present mid-term method with the authors’ previous work for short-term predictions.
|
19
|
Wang L, Niu D, Zhao X, Wang X, Hao M, Che H. A Comparative Analysis of Novel Deep Learning and Ensemble Learning Models to Predict the Allergenicity of Food Proteins. Foods 2021; 10:809. [PMID: 33918556 PMCID: PMC8069377 DOI: 10.3390/foods10040809] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/17/2021] [Revised: 04/02/2021] [Accepted: 04/06/2021] [Indexed: 11/16/2022]
Abstract
Traditional food allergen identification mainly relies on in vivo and in vitro experiments, which are often time-consuming and costly. Artificial intelligence (AI)-driven rapid food allergen identification has addressed some of these drawbacks and is becoming an efficient auxiliary tool. To overcome the lower accuracy of traditional machine learning models in predicting the allergenicity of food proteins, this work introduces a deep learning model, a transformer with a self-attention mechanism, and ensemble learning models, represented by Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost). To highlight the advantages of the proposed method, the study also selected various commonly used machine learning models as baseline classifiers. The results of 5-fold cross-validation showed that the area under the receiver operating characteristic curve (AUC) of the deep model was the highest (0.9578), better than those of the ensemble learning and baseline algorithms. However, the deep model needs to be pre-trained, and its training time is the longest. A comparison of the characteristics of the transformer model and the boosting models shows that each model has its own advantages, which provides novel clues and inspiration for the rapid prediction of food allergens in the future.
Affiliation(s)
- Liyang Wang: Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
- Dantong Niu: College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Xinjie Zhao: College of Humanities and Development Studies, China Agricultural University, Beijing 100083, China
- Xiaoya Wang: Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
- Mengzhen Hao: Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
- Huilian Che: Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
|
20
|
Ho IMK, Cheong KY, Weldon A. Predicting student satisfaction of emergency remote learning in higher education during COVID-19 using machine learning techniques. PLoS One 2021; 16:e0249423. [PMID: 33798204 PMCID: PMC8018673 DOI: 10.1371/journal.pone.0249423] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/21/2020] [Accepted: 03/17/2021] [Indexed: 11/20/2022]
Abstract
Despite the wide adoption of emergency remote learning (ERL) in higher education during the COVID-19 pandemic, there is insufficient understanding of the factors that predict student satisfaction with this novel learning environment in a crisis. The present study investigated important predictors of the satisfaction of undergraduate students (N = 425) from multiple departments using ERL at a self-funded university in Hong Kong, where Moodle and Microsoft Teams were the key learning tools. When predictive accuracy was compared between multiple regression and machine learning models before and after random forest recursive feature elimination, all models showed improved accuracy, and the most accurate model was the elastic net regression with 65.2% explained variance. The overall satisfaction score on ERL was only neutral (4.11 on a 7-point Likert scale). Even though the majority of students are competent in technology and have no obvious issue in accessing learning devices or Wi-Fi, face-to-face learning is preferred to ERL, and this preference was found to be the most important predictor. In addition, the level of effort made by instructors, agreement on the appropriateness of the adjusted assessment methods, and the perception of online learning being well delivered were shown to be highly important in determining the satisfaction scores. The results suggest the need to review the quality and quantity of the modified assessments accommodated for ERL and to provide structured class delivery with a suitable amount of interactive learning according to the learning culture and program nature.
Affiliation(s)
- Indy Man Kit Ho: Technological and Higher Education Institute of Hong Kong (THEi), Chai Wan, Hong Kong
- Kai Yuen Cheong: Technological and Higher Education Institute of Hong Kong (THEi), Chai Wan, Hong Kong
- Anthony Weldon: Technological and Higher Education Institute of Hong Kong (THEi), Chai Wan, Hong Kong
|
21
|
Prediction of River Stage Using Multistep-Ahead Machine Learning Techniques for a Tidal River of Taiwan. Water 2021. [DOI: 10.3390/w13070920] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
Time-series prediction of a river stage during typhoons or storms is essential for flood control or flood disaster prevention. Data-driven models using machine learning (ML) techniques have become an attractive and effective approach to modeling and analyzing river stage dynamics. However, relatively new ML techniques, such as the light gradient boosting machine regression (LGBMR), have rarely been applied to predict the river stage in a tidal river. In this study, data-driven ML models were developed under a multistep-ahead prediction framework and evaluated for river stage modeling. Four ML techniques, namely support vector regression (SVR), random forest regression (RFR), multilayer perceptron regression (MLPR), and LGBMR, were employed to establish data-driven ML models with Bayesian optimization. The models were applied to simulate river stage hydrographs of the tidal reach of the Lan-Yang River Basin in Northeastern Taiwan. Historical measurements of rainfall, river stages, and tidal levels were collected from 2004 to 2017 and used for training and validation of the four models. Four scenarios were used to investigate the effect of the combinations of input variables on river stage predictions. The results indicated that (1) the tidal level at a previous stage significantly affected the prediction results; (2) the LGBMR model achieves more favorable prediction performance than the SVR, RFR, and MLPR models; and (3) the LGBMR model could efficiently and accurately predict the 1–6-h river stage in the tidal river. This study provides an extensive and insightful comparison of four data-driven ML models for river stage forecasting that can be helpful for model selection and flood mitigation.
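The multistep-ahead framing mentioned in this abstract can be sketched as a data-preparation step that turns one series into (past window, future targets) pairs. The sketch assumes a direct strategy, in which a regressor is trained to output all horizon steps from the same lagged inputs; the abstract does not say whether the authors used a direct or a recursive scheme. The function name, the single-variable input, and the sample values are all hypothetical.

```python
def make_multistep_samples(series, n_lags, horizon):
    """Build (inputs, targets) pairs for direct multistep-ahead forecasting.

    Each input holds n_lags consecutive past values; each target holds the
    following `horizon` values (for example, stages 1 to 6 hours ahead).
    """
    samples = []
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        past = series[start:start + n_lags]
        future = series[start + n_lags:start + n_lags + horizon]
        samples.append((past, future))
    return samples


# Hypothetical hourly river stages in metres, not from the study.
stages = [1.0, 1.2, 1.5, 1.9, 2.4, 2.2]
for past, future in make_multistep_samples(stages, n_lags=3, horizon=2):
    print(past, "->", future)
```

A real setup along the lines of the abstract would add rainfall and tidal-level lags as further input columns; the windowing logic stays the same.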
|