1. Mirzaeian R, Nopour R, Asghari Varzaneh Z, Shafiee M, Shanbehzadeh M, Kazemi-Arpanahi H. Which are best for successful aging prediction? Bagging, boosting, or simple machine learning algorithms? Biomed Eng Online 2023; 22:85. [PMID: 37644599 PMCID: PMC10463617 DOI: 10.1186/s12938-023-01140-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/12/2023] [Accepted: 07/21/2023] [Indexed: 08/31/2023]
Abstract
BACKGROUND Society worldwide is undergoing an epidemiological shift driven by a significant improvement in life expectancy and growth in the elderly population. This shift requires the public and scientific community to focus on successful aging (SA) as an indicator of the quality of elderly people's health. SA is a subjective, complex, and multidimensional concept; thus, defining or measuring it is difficult. This study seeks to identify the factors that most affect SA and to feed them as input variables into predictive models built with machine learning (ML) algorithms. METHODS Data from 1465 adults aged ≥ 60 years who were referred to health centers in Abadan city (Iran) between 2021 and 2022 were collected by interview. First, binary logistic regression (BLR) was used to identify the main factors influencing SA. Second, eight ML algorithms, namely adaptive boosting (AdaBoost), bootstrap aggregating (Bagging), eXtreme Gradient Boosting (XGBoost), random forest (RF), J-48, multilayered perceptron (MLP), Naïve Bayes (NB), and support vector machine (SVM), were trained to predict SA. Finally, their performance was evaluated using metrics derived from the confusion matrix to determine the best model. RESULTS The experimental results showed that 44 factors had a meaningful relationship with SA as the output class. Overall, the RF algorithm yielded the best performance for predicting SA, with sensitivity = 0.95 ± 0.01, specificity = 0.94 ± 0.01, accuracy = 0.94 ± 0.005, and F-score = 0.94 ± 0.003. CONCLUSIONS Compared with the other selected ML methods, RF, a bagging algorithm, was significantly more effective at predicting SA. Our prediction models can provide gerontologists, geriatric nurses, healthcare administrators, and policymakers with a reliable and responsive tool to improve elderly outcomes.
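The evaluation step above derives sensitivity, specificity, accuracy, and F-score from a confusion matrix. The sketch below shows one way to compute them. It assumes standard binary definitions, with SA as the positive class, and it takes "F-score" to mean F1, the harmonic mean of precision and sensitivity. The function names and the example labels are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2x2 confusion-matrix counts; the positive class is SA (successful aging)."""
    tp: int  # SA cases predicted as SA
    fp: int  # non-SA cases predicted as SA
    tn: int  # non-SA cases predicted as non-SA
    fn: int  # SA cases predicted as non-SA


def count_outcomes(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return ConfusionCounts(tp, fp, tn, fn)


def confusion_metrics(c):
    """Sensitivity, specificity, accuracy and F-score (assumed here to be F1)."""
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    precision = c.tp / (c.tp + c.fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f_score": f_score}


# Made-up labels for illustration (1 = SA, 0 = not SA).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = confusion_metrics(count_outcomes(y_true, y_pred))
```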
Affiliation(s)
- Razieh Mirzaeian: Department of Health Information Management, Shahrekord University of Medical Sciences, Shahrekord, Iran
- Raoof Nopour: Student Research Committee, School of Health Management and Information Sciences Branch, Iran University of Medical Sciences, Tehran, Iran
- Zahra Asghari Varzaneh: Department of Computer Science, Faculty of Mathematics and Computer, Shahid Bahonar University of Kerman, Kerman, Iran
- Mohsen Shafiee: Department of Nursing, Abadan University of Medical Sciences, Abadan, Iran
- Mostafa Shanbehzadeh: Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Hadi Kazemi-Arpanahi: Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran
2. Prediction of novel ionic liquids’ surface tension via Bagging KNN predictive model: Modeling and Simulation. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.120748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
3. Liu W, Zhao R, Su X, Mohamed A, Diana T. Development and validation of machine learning models for prediction of nanomedicine solubility in supercritical solvent for advanced pharmaceutical manufacturing. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.119208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
4. Singh A, Amutha J, Nagar J, Sharma S, Lee CC. AutoML-ID: automated machine learning model for intrusion detection using wireless sensor network. Sci Rep 2022; 12:9074. [PMID: 35641584 PMCID: PMC9156733 DOI: 10.1038/s41598-022-13061-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/30/2022] [Accepted: 05/18/2022] [Indexed: 11/18/2022]
Abstract
The momentous increase in the popularity of explainable machine learning models, coupled with the dramatic increase in the use of synthetic data, enables us to develop a cost-efficient machine learning model for fast intrusion detection and prevention in frontier areas using Wireless Sensor Networks (WSNs). The performance of any explainable machine learning model is driven by its hyperparameters. Several approaches have been developed and successfully implemented for optimising or tuning these hyperparameters for skillful predictions. However, the major drawback of these techniques, including manual selection of the optimal hyperparameters, is that they are highly problem-dependent and demand application-specific expertise. In this paper, we introduce an Automated Machine Learning (AutoML) model that uses Bayesian optimisation for two tasks. First, it automatically selects the machine learning model, choosing among support vector regression, Gaussian process regression, binary decision tree, bagging ensemble learning, boosting ensemble learning, kernel regression, and linear regression models. Second, it automates hyperparameter optimisation for accurate prediction of the number of k-barriers for fast intrusion detection and prevention. To do so, we used Monte Carlo simulation to extract four synthetic predictors: the area of the region, the sensing range of the sensor, the transmission range of the sensor, and the number of sensors. We used 80% of the dataset to train the models and the remaining 20% to test the performance of the trained models. We found that Gaussian process regression performs exceptionally well and outperforms all the other explainable machine learning models considered, with correlation coefficient (R = 1), root mean square error (RMSE = 0.007), and bias = −0.006. We also tested the AutoML performance on a publicly available intrusion dataset and observed similar performance. This study will help researchers accurately predict the required number of k-barriers for fast intrusion detection and prevention.
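The evaluation protocol described here has two parts. The data are split 80/20 into training and test sets, and the test predictions are scored with R, RMSE, and bias. The plain-Python sketch below shows both parts. It is not the authors' AutoML code. The abstract does not define bias, so the sign convention used here (prediction minus observation) is an assumption.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them (80/20 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def pearson_r(obs, pred):
    """Pearson correlation coefficient R between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def bias(obs, pred):
    """Mean error, taken here as prediction minus observation (an assumption)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


train, test = train_test_split(range(10))
obs, pred = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
```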
Affiliation(s)
- Abhilash Singh: Indian Institute of Science Education and Research Bhopal, Fluvial Geomorphology and Remote Sensing Laboratory, Bhopal, 462066, India
- J Amutha: Gautam Buddha University, School of ICT, Greater Noida, 201312, India
- Jaiprakash Nagar: Indian Institute of Technology Kharagpur, Subir Chowdhury School of Quality and Reliability, Kharagpur, 721302, India
- Sandeep Sharma: Department of Electronics Engineering, Madhav Institute of Technology and Science, Gwalior, 474005, India
- Cheng-Chi Lee: Department of Library and Information Science, Research and Development, Center for Physical Education, Health, and Information Technology, Fu Jen Catholic University, New Taipei, 242, Taiwan; Department of Computer Science and Information Engineering, Asia University, Taichung, 41354, Taiwan
5. Prediction of the compressive strength of concrete using various predictive modeling techniques. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06820-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
6. The Effectiveness of Ensemble-Neural Network Techniques to Predict Peak Uplift Resistance of Buried Pipes in Reinforced Sand. Applied Sciences-Basel 2021. [DOI: 10.3390/app11030908] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/11/2022]
Abstract
Buried pipes are extensively used to transport oil from offshore platforms. Under unfavorable loading combinations, a pipe’s uplift resistance may be exceeded, which can result in excessive deformations and significant disruptions. This paper presents findings from a series of small-scale tests on pipes buried in geogrid-reinforced sands. The measured peak uplift resistance was used to calibrate advanced numerical models employing neural networks. Two neural network models were trained using multilayer perceptron (MLP) and radial basis function (RBF) primary structures and were then further developed using bagging and boosting ensemble techniques. Correlation coefficients exceeding 0.954 between the measured and predicted peak uplift resistance were achieved. The results show that pipeline design can be significantly improved using the proposed novel, reliable, and robust soft computing models.
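Bagging, one of the two ensemble techniques named above, trains several copies of a base model on bootstrap resamples and averages their outputs. The sketch below shows bootstrap aggregating for a regression target. The base learner is a hypothetical one-variable least-squares line standing in for the MLP and RBF networks used in the study. Boosting is not shown.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Hypothetical base learner: least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # degenerate resample (all x equal): flat line
    a = my - b * mx
    return lambda x: a + b * x


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Bootstrap aggregating: train one base model per bootstrap resample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate step: average the base models' predictions (regression)."""
    return mean(m(x) for m in models)


xs = [float(x) for x in range(10)]
ys = [2 * x + 1 for x in xs]  # noise-free toy data
models = bagging_fit(xs, ys)
```

The noise-free toy data only checks the mechanics. On noisy data, averaging models fitted to different resamples is what reduces the variance of the final prediction.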
7. Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements. ISPRS International Journal of Geo-Information 2021. [DOI: 10.3390/ijgi10010042] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022]
Abstract
Although machine learning has been used extensively in various fields, it has only recently been applied to soil erosion pin modeling. To improve on previous methods of quantifying soil erosion from erosion pin measurements, this study explored the application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered: (a) bagging, (b) boosting, and (c) stacking. The bagging methods in this study are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF). The boosting methods are Cubist and gradient boosting machine (GBM). Finally, stacking is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was split 70/30 into training and test data using stratified random sampling, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three attempts. GBM performed best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, as groups, the bagging and boosting methods performed equally well, and the stacking method ranked third for the erosion pin dataset considered in this study.
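Two of the goodness-of-fit scores reported above have standard closed forms: the Nash-Sutcliffe efficiency (NSE) and Willmott's index of agreement d. The sketch below computes both. It assumes the original squared-error form of d, which the abstract does not specify. Both scores equal 1 for a perfect fit. NSE falls to 0 when the model does no better than predicting the observed mean.

```python
from statistics import mean


def nash_sutcliffe(obs, pred):
    """NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2)."""
    mo = mean(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mo) ** 2 for o in obs)
    return 1 - num / den


def index_of_agreement(obs, pred):
    """Willmott's d = 1 - sum((p - o)^2) / sum((|p - mean(o)| + |o - mean(o)|)^2)."""
    mo = mean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


obs = [1.0, 2.0, 3.0, 4.0]  # made-up observations
```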
8. Wei J, Ye T, Zhang Z. A Machine Learning Approach to Evaluate the Performance of Rural Bank. Complexity 2021; 2021:1-10. [DOI: 10.1155/2021/6649605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Most current studies of commercial bank performance focus only on the relationship between a single characteristic and performance; they lack a comprehensive analysis of characteristics. Moreover, they mainly focus on causal inference and lack systematic quantitative conclusions from the perspective of prediction. This paper is the first to comprehensively investigate the predictability of commercial bank performance from multidimensional features using a boosting regression tree. Dimensionality in finance-related fields is relatively high. Besides observable price data and financial fundamentals, there are many unobservable, undisclosed data and events, and many sources of income cannot be explained by existing models. To address these characteristics of commercial bank data, this paper proposes a gradient boosting regression tree algorithm with an adaptively reduced step size for bank performance evaluation. In this method, a random subsample is drawn before training each regression tree. An adaptively reduced step size replaces the step-size (shrinkage) setting of the original algorithm, which overcomes the low accuracy and poor generalization ability of existing regression decision tree models. Compared with the BIRCH algorithm for classifying the existing data, the proposed algorithm obtains better classification results. The paper applies the method empirically to data from rural banks in 30 provinces in China, classifying their performance characteristics in order to evaluate their performance better.
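The method above combines two ideas. The first is stochastic gradient boosting, in which each regression tree is fitted to a random subsample. The second is a step size that is reduced adaptively. The abstract does not give the authors' reduction rule, so the sketch below assumes a hypothetical one: if a round fails to lower the training loss, the tree is rejected and the step size is halved. To keep the example short, it uses squared loss and depth-1 trees (stumps) on a single feature. It is an illustration of the general technique, not the paper's algorithm.

```python
import random
from statistics import mean


def fit_stump(xs, residuals):
    """Depth-1 regression tree on one feature: the split minimising squared error."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # every x is identical: fall back to the mean residual
        m = mean(residuals)
        return lambda x: m
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def mse(ys, preds):
    return mean((y - p) ** 2 for y, p in zip(ys, preds))


def boost(xs, ys, n_rounds=100, step=0.5, subsample=0.7, seed=0):
    """Stochastic gradient boosting (squared loss) with an adaptively reduced step."""
    rng = random.Random(seed)
    base = mean(ys)
    preds = [base] * len(ys)
    ensemble = []  # accepted (step, stump) pairs
    loss = mse(ys, preds)
    for _ in range(n_rounds):
        # Random subsample drawn before each tree is trained.
        idx = rng.sample(range(len(xs)), max(2, int(subsample * len(xs))))
        # For squared loss the negative gradient is the residual y - F(x).
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        candidate = [p + step * stump(x) for p, x in zip(preds, xs)]
        new_loss = mse(ys, candidate)
        if new_loss < loss:
            preds, loss = candidate, new_loss
            ensemble.append((step, stump))
        else:
            step /= 2  # hypothetical rule: reject the tree and halve the step size
    return lambda x: base + sum(s * tree(x) for s, tree in ensemble)


xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10  # toy step-shaped target
model = boost(xs, ys)
```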
Affiliation(s)
- Jun Wei: School of Economics and Management, Beijing Jiaotong University, 100044 Beijing, China
- Tao Ye: School of Finance, Capital University of Economics and Business, 100070 Beijing, China
- Zhe Zhang: School of Management Science and Engineering, Shandong University of Finance and Economics, 250014 Jinan, China
9. Cost-sensitive probability for weighted voting in an ensemble model for multi-class classification problems. Appl Intell 2021. [DOI: 10.1007/s10489-020-02106-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022]
Abstract
Ensemble learning is an approach that combines various types of classification models and can improve on the prediction efficiency of its component models. However, the efficiency of the combination typically depends on the diversity and accuracy of the component models' predictions, and multi-class data remain a problem. In the proposed approach, cost-sensitive learning was used to evaluate the prediction accuracy for each class. These accuracies were then used to construct a cost-sensitivity matrix of the true positive (TP) rate. This TP rate can be used as a weight and combined with a probability value to drive ensemble learning for a specified class. We proposed a heterogeneous ensemble model that combines various individual classification models: support vector machine, Bayes, K-nearest neighbour, naïve Bayes, decision tree, and multi-layer perceptron. The model was tested in experiments with 3-, 4-, 5-, and 6-classifier combinations. The efficiency of the proposed models was compared with that of the individual classifier models and homogeneous models (AdaBoost, bagging, stacking, voting, random forest, and random subspaces) on various multi-class data sets. The experimental results demonstrate that the cost-sensitive probability weighted voting ensemble model derived from 3 models provided the most accurate multi-class predictions. The objective of this study was to increase the efficiency of predicting classification results in multi-class classification tasks and to improve the classification results.
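The weighting idea above can be sketched as follows. Each base classifier is assumed to output a class-probability vector. The weight for class c is assumed to be that classifier's TP rate (recall) for c, measured on held-out data. The class scores are the weighted probabilities summed across classifiers, and the highest-scoring class wins. The exact combination rule in the paper may differ, and the classifiers, rates, and probabilities below are made up.

```python
def per_class_tp_rate(y_true, y_pred, classes):
    """One classifier's TP rate (recall) per class, measured on held-out data."""
    rates = {}
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        rates[c] = hits / support if support else 0.0
    return rates


def weighted_vote(prob_rows, tp_rates, classes):
    """score(c) = sum over classifiers k of TP_rate_k(c) * P_k(c | x); return argmax."""
    scores = {
        c: sum(rates[c] * probs[c] for probs, rates in zip(prob_rows, tp_rates))
        for c in classes
    }
    return max(classes, key=scores.__getitem__)


classes = ["a", "b", "c"]
tp_rates = [
    {"a": 0.9, "b": 0.1, "c": 0.5},  # classifier 1: reliable on class "a" only
    {"a": 0.6, "b": 0.5, "c": 0.5},  # classifier 2
]
probs = [
    {"a": 0.3, "b": 0.6, "c": 0.1},  # classifier 1's probabilities for one sample
    {"a": 0.5, "b": 0.4, "c": 0.1},  # classifier 2's probabilities for the same sample
]
winner = weighted_vote(probs, tp_rates, classes)  # "a"; an unweighted sum would pick "b"
```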
10. Ribeiro MHDM, dos Santos Coelho L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105837] [Citation(s) in RCA: 145] [Impact Index Per Article: 36.3] [Indexed: 11/30/2022]
11. Barzegar R, Asghari Moghaddam A, Adamowski J, Nazemi AH. Delimitation of groundwater zones under contamination risk using a bagged ensemble of optimized DRASTIC frameworks. Environmental Science and Pollution Research International 2019; 26:8325-8339. [PMID: 30706265 DOI: 10.1007/s11356-019-04252-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/24/2018] [Accepted: 01/14/2019] [Indexed: 06/09/2023]
Abstract
Developing a reliable groundwater vulnerability and contamination risk map is very important for groundwater management and protection. This study compares various modified DRASTIC vulnerability frameworks. Their ratings were calibrated using the Wilcoxon rank-sum test (WRST) and frequency ratio (FR), and their weights were optimized using the correlation coefficient (CC), the analytic hierarchy process (AHP), and genetic algorithms (GA). The study also introduces, for the first time, an aggregated approach based on a bagging ensemble to develop a combined modified DRASTIC model. This research was conducted in the Khoy plain, NW Iran. To develop a typical DRASTIC map, seven DRASTIC data layers were generated, weighted, and then overlaid in ArcGIS. Nitrate (NO3) concentrations at 54 sites in the study area were used to validate the models by calculating the correlation coefficient (r) between the vulnerability/risk indices and NO3 concentrations. The calculated r value for the typical DRASTIC was 0.12. A sensitivity analysis reveals which parameters matter most for aquifer vulnerability. The impact of the vadose zone has the highest influence (mean variation index 22.2%) and hydraulic conductivity the lowest (7.5%). The r values increased for all the optimized frameworks. The results show that WRST and GA are the most effective methods for calibrating and optimizing DRASTIC rates and weights, with the WRST-GA-DRASTIC model obtaining an r value of 0.64. A bagging ensemble model was employed to combine the advantages of each standalone model and yields an r value of 0.67. The ensemble model thus has the potential to increase the r value beyond both the standalone optimized frameworks and the typical DRASTIC approach. In terms of spatial distribution class area (%), the bagging ensemble-DRASTIC model shows that the moderate and low contamination risk classes, covering 16.4% and 23.1% of the total area, occupy the smallest and largest parts of the plain, respectively.
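The typical DRASTIC map mentioned above is a weighted overlay: each cell's vulnerability index is the sum of seven layer ratings, each multiplied by its weight. The per-cell sketch below uses the standard weights of the original generic DRASTIC method. The study instead recalibrates ratings (WRST, FR) and re-optimises weights (CC, AHP, GA), and that step is not shown. The bagging ensemble is also not shown. The ratings in the example are made up.

```python
# Standard weights of the original (generic) DRASTIC method. The study instead
# recalibrates ratings (WRST, FR) and re-optimises weights (CC, AHP, GA).
DRASTIC_WEIGHTS = {
    "D": 5,  # depth to water
    "R": 4,  # net recharge
    "A": 3,  # aquifer media
    "S": 2,  # soil media
    "T": 1,  # topography
    "I": 5,  # impact of the vadose zone
    "C": 3,  # hydraulic conductivity
}


def drastic_index(ratings, weights=DRASTIC_WEIGHTS):
    """Vulnerability index of one map cell: sum of rating x weight over the seven layers."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for layers: {sorted(missing)}")
    return sum(ratings[layer] * w for layer, w in weights.items())


# Made-up ratings (each on the usual 1-10 scale) for a single cell.
cell = {"D": 9, "R": 6, "A": 6, "S": 6, "T": 10, "I": 6, "C": 2}
di = drastic_index(cell)
```

With ratings on a 1 to 10 scale and these weights, the index ranges from 23 to 230. Per the abstract, model quality is then judged by correlating such indices with measured NO3 concentrations.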
Affiliation(s)
- Rahim Barzegar: Department of Earth Sciences, Faculty of Natural Sciences, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran; Department of Bioresource Engineering, McGill University, 21111 Lakeshore, Ste Anne de Bellevue, Quebec, H9X3V9, Canada
- Asghar Asghari Moghaddam: Department of Earth Sciences, Faculty of Natural Sciences, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran
- Jan Adamowski: Department of Bioresource Engineering, McGill University, 21111 Lakeshore, Ste Anne de Bellevue, Quebec, H9X3V9, Canada
- Amir Hossein Nazemi: Department of Water Engineering, Faculty of Agriculture, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran
12. Mohan S, Saranya P. A novel bagging ensemble approach for predicting summertime ground-level ozone concentration. Journal of the Air & Waste Management Association (1995) 2019; 69:220-233. [PMID: 30303768 DOI: 10.1080/10962247.2018.1534701] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/18/2018] [Revised: 10/05/2018] [Accepted: 10/07/2018] [Indexed: 06/08/2023]
Abstract
Ozone pollution is a major air quality issue, for example for the protection of human health and vegetation. Ground-level ozone formation is a complex photochemical phenomenon involving numerous intricate factors, most of which are interrelated. Machine learning techniques can be adopted to predict ground-level ozone. The main objective of the present study is to develop a state-of-the-art ensemble bagging approach to model summertime ground-level ozone in an industrial area containing a hazardous waste management facility. The study examines the feasibility of using an ensemble model with seven meteorological parameters as input variables to predict surface-level O3 concentration. Multilayer perceptron, RTree, REPTree, and random forest were employed as the base learners. The error measures used to check the performance of each model include IoAd, R2, and PEP. The model results were validated against an independent test data set. Bagged random forest predicted ground-level ozone better, with a higher Nash-Sutcliffe coefficient of 0.93. This study addresses a current research gap in big data analysis for air pollutant prediction. Implications: The main focus of this paper is to model summertime ground-level O3 concentration in an industrial area containing a hazardous waste management facility. The base classifiers were compared with the ensemble classifiers. Most conventional models can predict average concentrations well. In this case, however, peak concentrations are important because they have serious effects on human health and the environment. The models developed should also be homoscedastic.
Affiliation(s)
- Sankaralingam Mohan: Environmental and Water Resources Engineering Division, Department of Civil Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
- Packiam Saranya: Environmental and Water Resources Engineering Division, Department of Civil Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, India
13. Improving the Fuzzy Min–Max neural network performance with an ensemble of clustering trees. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.10.025] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]