1
Ribeiro MHDM, da Silva RG, Larcher JHK, Mendes A, Mariani VC, Coelho LDS. Decoding Electroencephalography Signal Response by Stacking Ensemble Learning and Adaptive Differential Evolution. Sensors (Basel) 2023; 23:7049. [PMID: 37631586 PMCID: PMC10459492 DOI: 10.3390/s23167049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 07/29/2023] [Accepted: 08/02/2023] [Indexed: 08/27/2023]
Abstract
Electroencephalography (EEG) is a widely used exam for monitoring brain activity in response to external stimuli, and its signals form a nonlinear dynamical system. EEG analysis poses several difficulties. For example, noise can originate from different sources, such as muscle or other physiological activity. Artifacts, that is, undesirable signals captured during EEG recordings, also occur, and nonlinearities can arise from brain activity and the relationships between different brain regions. All these characteristics make data modeling difficult. A combined approach may therefore be the best way to obtain an efficient model for identifying neural data and producing reliable predictions. This paper proposes a new hybrid framework that combines stacked generalization (STACK) ensemble learning with a differential-evolution-based algorithm, Adaptive Differential Evolution with an Optional External Archive (JADE), to perform nonlinear system identification. In the proposed framework, five base learners are trained: eXtreme Gradient Boosting, a Gaussian Process, the Least Absolute Shrinkage and Selection Operator, a Multilayer Perceptron Neural Network, and Support Vector Regression with a radial basis function kernel. The predictions of these base learners make up STACK's layer-0 and serve as inputs to a Cubist meta-learner, whose hyperparameters were obtained by JADE. The model was evaluated on decoding the EEG signal response to wrist joint perturbations. The variance accounted for (VAF), the root-mean-squared error (RMSE), and the Friedman statistical test were used to validate the proposed model and to compare it with other methods from the literature, including the base learners.
The JADE-STACK model outperforms the other models in accuracy, explaining, on average across all participants, about 94.50% and 67.50% of the data variability (standard deviations of 1.53 and 7.44) for one-step-ahead and three-step-ahead forecasts, respectively, which makes it a suitable approach for nonlinear system identification. The improvement over state-of-the-art methods ranges from 0.6% to 161% and 43.34% for one step ahead and three steps ahead, respectively. The developed model can therefore be viewed as an alternative and complementary approach to well-established nonlinear system identification techniques, since it achieves satisfactory results in explaining data variability.
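The two accuracy measures used for validation can be computed as below. This is a minimal sketch that assumes the common definition VAF = (1 − var(y − ŷ) / var(y)) × 100% with population variances. The abstract does not state the exact variant the authors used, and the function names and data are illustrative.

```python
from math import sqrt
from statistics import pvariance


def vaf(y_true, y_pred):
    """Variance accounted for, in percent: (1 - var(y - y_hat) / var(y)) * 100."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1.0 - pvariance(residuals) / pvariance(y_true)) * 100.0


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


y = [1.0, 2.0, 3.0, 4.0]        # hypothetical observed response
y_hat = [1.1, 1.9, 3.2, 3.8]    # hypothetical model output
print(f"VAF = {vaf(y, y_hat):.2f}%, RMSE = {rmse(y, y_hat):.4f}")
# VAF = 98.00%, RMSE = 0.1581
```

Because VAF depends only on the variance of the residuals, a prediction that is off by a constant offset still scores 100%; reporting RMSE alongside it exposes such a bias.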
Affiliation(s)
- Matheus Henrique Dal Molin Ribeiro
  - Industrial and Systems Engineering Graduate Program (PPGEPS), Pontifical Catholic University of Paraná (PUCPR), R. Imaculada Conceição 1155, Curitiba 80215-901, PR, Brazil
  - Department of Mathematics, Federal University of Technology—Paraná (UTFPR), Via do Conhecimento, KM 01—Fraron, Pato Branco 85503-390, PR, Brazil
- Ramon Gomes da Silva
  - Industrial and Systems Engineering Graduate Program (PPGEPS), Pontifical Catholic University of Paraná (PUCPR), R. Imaculada Conceição 1155, Curitiba 80215-901, PR, Brazil
- José Henrique Kleinubing Larcher
  - Mechanical Engineering Graduate Program (PPGEM), Pontifical Catholic University of Paraná (PUCPR), R. Imaculada Conceição 1155, Curitiba 80215-901, PR, Brazil
- Andre Mendes
  - Department of Economics, Massachusetts Institute of Technology, 292 Main St, Cambridge, MA 02142, USA
- Viviana Cocco Mariani
  - Mechanical Engineering Graduate Program (PPGEM), Pontifical Catholic University of Paraná (PUCPR), R. Imaculada Conceição 1155, Curitiba 80215-901, PR, Brazil
  - Department of Electrical Engineering, Federal University of Paraná (UFPR), R. Evaristo F. Ferreira da Costa 384, Curitiba 81530-000, PR, Brazil
- Leandro dos Santos Coelho
  - Industrial and Systems Engineering Graduate Program (PPGEPS), Pontifical Catholic University of Paraná (PUCPR), R. Imaculada Conceição 1155, Curitiba 80215-901, PR, Brazil
  - Department of Electrical Engineering, Federal University of Paraná (UFPR), R. Evaristo F. Ferreira da Costa 384, Curitiba 81530-000, PR, Brazil
2
Assad DBN, Cara J, Ortega-Mier M. Comparing Short-Term Univariate and Multivariate Time-Series Forecasting Models in Infectious Disease Outbreak. Bull Math Biol 2023; 85:9. [PMID: 36565344 PMCID: PMC9789525 DOI: 10.1007/s11538-022-01112-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Accepted: 11/29/2022] [Indexed: 12/25/2022]
Abstract
Predicting the impacts of infectious disease outbreaks on population, healthcare resources, and the economy has received special academic attention during the coronavirus (COVID-19) pandemic. Focusing on human disease outbreak prediction techniques in the current literature, Marques et al. (Predictive models for decision support in the COVID-19 crisis. Springer, Switzerland, 2021) state that there are four main methods for addressing the forecasting problem: compartmental models, classic statistical models, state-space models, and machine learning models. We adopt their framework to compare our research with previous works. Besides being divided by method, forecasting problems can also be divided by the number of variables considered when making predictions; on this basis, forecasting models can be classified as univariate, causal, or multivariate. Multivariate approaches have been applied in less than 10% of the studies found. This research is the first attempt to evaluate both univariate and multivariate methods for short-term prediction on real time-series data from three different countries; we found no research in the literature with that scope and aim. After comparing univariate and multivariate methods, we concluded that, despite the strong potential of multivariate methods, univariate models gave the best results for almost all regions in our research.
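The univariate/multivariate distinction can be illustrated with two one-step-ahead linear autoregressions fitted by ordinary least squares. This is only a sketch of the distinction, not one of the methods compared in the paper: the AR(1) and ARX(1) models, the second series `x`, and all data are hypothetical.

```python
def solve(a, b):
    """Solve the square system a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def ols(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit_univariate(y):
    """AR(1): y[t] ~ a + b*y[t-1], using only the target's own history."""
    return ols([[1.0, y[t - 1]] for t in range(1, len(y))], y[1:])


def fit_multivariate(y, x):
    """ARX(1): y[t] ~ a + b*y[t-1] + c*x[t-1], adding a second series as a predictor."""
    return ols([[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))], y[1:])


def forecast(coef, lagged):
    """One-step-ahead forecast from an intercept plus lagged predictor values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], lagged))


# Hypothetical data: x is a second series; y is generated from it.
x = [0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 1.0]
y = [1.0]
for t in range(1, len(x)):
    y.append(1.0 + 0.5 * y[-1] + 2.0 * x[t - 1])

uni = fit_univariate(y)
multi = fit_multivariate(y, x)
print("univariate next value:  ", forecast(uni, [y[-1]]))
print("multivariate next value:", forecast(multi, [y[-1], x[-1]]))
```

Here the data were generated from the multivariate equation, so the second model recovers its coefficients almost exactly. On real outbreak data an extra series is not guaranteed to help, and the abstract reports that univariate models gave the best results in almost all regions.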
Affiliation(s)
- Daniel Bouzon Nagem Assad
  - Universidad Politécnica de Madrid, Department of Organization Engineering, Business Administration and Statistics, Escuela Técnica Superior de Ingenieros Industriales, José Gutiérrez Abascal, 2, 28006 Madrid, Spain
  - Universidade do Estado do Rio de Janeiro, Rua São Francisco Xavier, 524, Maracanã, 20550-900 Rio de Janeiro, Brazil
- Javier Cara
  - Universidad Politécnica de Madrid, Department of Organization Engineering, Business Administration and Statistics, Escuela Técnica Superior de Ingenieros Industriales, José Gutiérrez Abascal, 2, 28006 Madrid, Spain
- Miguel Ortega-Mier
  - Universidad Politécnica de Madrid, Department of Organization Engineering, Business Administration and Statistics, Escuela Técnica Superior de Ingenieros Industriales, José Gutiérrez Abascal, 2, 28006 Madrid, Spain
3
Joseph LP, Joseph EA, Prasad R. Explainable diabetes classification using hybrid Bayesian-optimized TabNet architecture. Comput Biol Med 2022; 151:106178. [PMID: 36306578 DOI: 10.1016/j.compbiomed.2022.106178] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/13/2022] [Revised: 09/23/2022] [Accepted: 10/01/2022] [Indexed: 12/27/2022]
Abstract
Diabetes is a deadly chronic disease that occurs when the pancreas cannot produce enough insulin or when the body cannot use insulin effectively. If undetected, it may lead to a host of health complications. Hence, accurate and explainable early-stage detection of diabetes is essential for administering treatment properly and helping patients lead a healthy and productive life. To this end, we developed an interpretable TabNet model tuned via Bayesian optimization (BO). To achieve model-specific interpretability, we used the attention mechanism of the TabNet architecture, which provided local and global explanations of how the attributes influence the outcomes. The model was further explained locally and globally using the more robust, model-agnostic eXplainable Artificial Intelligence (XAI) tools LIME and SHAP. The proposed model outperformed all benchmarked models, achieving accuracies of 92.2% and 99.4% on the Pima Indians diabetes dataset (PIDD) and the early-stage diabetes risk prediction dataset (ESDRPD), respectively. The XAI results showed that the most influential attributes for diabetes classification were insulin for PIDD and polyuria for ESDRPD, with feature importance values of 0.301 and 0.206, respectively. The high accuracy and additional interpretability of our model are expected to increase end users' trust and confidence in early-stage diabetes detection.
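TabNet's feature masks come from its attentive transformer, which normalizes scores with sparsemax rather than softmax, so weakly relevant attributes receive exactly zero weight; that sparsity is what makes the masks readable as explanations. The sketch below implements sparsemax only. It omits TabNet's prior scales and the aggregation of masks across decision steps, and the scores and attribute names are hypothetical, not results from the paper.

```python
def sparsemax(z):
    """Project scores z onto the probability simplex; low scores become exactly 0."""
    z_sorted = sorted(z, reverse=True)
    cumulative, support_size, support_sum = 0.0, 0, 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumulative += z_k
        # The k-th largest score stays in the support while 1 + k*z_(k) > sum of the top k.
        if 1.0 + k * z_k > cumulative:
            support_size, support_sum = k, cumulative
    tau = (support_sum - 1.0) / support_size  # threshold subtracted from every score
    return [max(z_i - tau, 0.0) for z_i in z]


# Hypothetical attention scores for four PIDD-style attributes.
features = ["Glucose", "Insulin", "BMI", "Age"]
mask = sparsemax([1.2, 1.5, 0.3, 0.1])
print({f: round(w, 3) for f, w in zip(features, mask)})
# {'Glucose': 0.35, 'Insulin': 0.65, 'BMI': 0.0, 'Age': 0.0}
```

A softmax over the same scores would give every attribute a nonzero share; sparsemax keeps only two attributes here while the weights still sum to 1.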
Affiliation(s)
- Lionel P Joseph
  - School of Mathematics, Physics, and Computing, University of Southern Queensland, Springfield, QLD, 4300, Australia
- Erica A Joseph
  - Umanand Prasad School of Medicine and Health Sciences, The University of Fiji, Saweni, Lautoka, Fiji
- Ramendra Prasad
  - Department of Science, School of Science and Technology, The University of Fiji, Saweni, Lautoka, Fiji
4
Sopelsa Neto NF, Stefenon SF, Meyer LH, Ovejero RG, Leithardt VRQ. Fault Prediction Based on Leakage Current in Contaminated Insulators Using Enhanced Time Series Forecasting Models. Sensors (Basel) 2022; 22:6121. [PMID: 36015882 PMCID: PMC9415177 DOI: 10.3390/s22166121] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Revised: 08/11/2022] [Accepted: 08/13/2022] [Indexed: 05/17/2023]
Abstract
To improve the monitoring of the electrical power grid, it is necessary to evaluate how contamination influences leakage current and its progression to a disruptive discharge. In this paper, insulators were tested in a saline chamber to simulate increasing salt contamination on their surface. Forecasting the leakage current time series makes it possible to evaluate the development of the fault before a flashover occurs. For a complete evaluation, the long short-term memory (LSTM), group method of data handling (GMDH), adaptive neuro-fuzzy inference system (ANFIS), bootstrap aggregation (bagging), sequential learning (boosting), random subspace, and stacked generalization (stacking) ensemble learning models are analyzed. Starting from the best structure found for each model, the hyperparameters are evaluated, and the wavelet transform is then used to obtain an enhanced model. The contribution of this paper is the improvement of well-established models using the wavelet transform, yielding hybrid models that can be used in several applications. The results showed that the wavelet transform improves all the models used, especially the wavelet ANFIS model, which achieved the best result with a mean RMSE of 1.58 × 10⁻³. The standard deviation of the results was 2.18 × 10⁻¹⁹, showing that the model is stable and robust for the application under study. Future work can address other components of the power distribution grid that are susceptible to contamination because they are installed outdoors.
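The abstract does not specify the wavelet family, the decomposition level, or whether the coefficients or the reconstructed series feed the forecasting models. As one common scheme, and purely as an assumption, the sketch below applies a one-level Haar discrete wavelet transform with soft thresholding of the detail coefficients to smooth a noisy series before forecasting. The threshold `t` and the sample values are hypothetical.

```python
from math import sqrt

SQRT2 = sqrt(2.0)


def haar_dwt(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from its coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(c, t):
    """Shrink coefficient c toward zero by t; values within [-t, t] become 0."""
    if c > t:
        return c - t
    if c < -t:
        return c + t
    return 0.0


def wavelet_denoise(x, t):
    """Keep the approximation, shrink the high-frequency detail, then invert."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, [soft_threshold(d, t) for d in detail])


leakage_ma = [1.0, 1.2, 1.1, 1.6, 1.3, 1.4, 2.0, 1.8]  # hypothetical samples in mA
smoothed = wavelet_denoise(leakage_ma, t=0.1)
print([round(v, 3) for v in smoothed])
```

With `t=0` the transform round-trips the signal unchanged; a very large `t` zeroes every detail coefficient, which reduces each sample pair to its mean.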
Affiliation(s)
- Nemesio Fava Sopelsa Neto (corresponding author)
  - Department of Electrical Engineering, Regional University of Blumenau, Rua São Paulo 3250, Blumenau 89030-000, Brazil
- Stefano Frizzo Stefenon
  - Fondazione Bruno Kessler, Via Sommarive 18, 38123 Trento, Italy
  - Department of Mathematics, Informatics and Physical Sciences, University of Udine, Via delle Scienze 206, 33100 Udine, Italy
- Luiz Henrique Meyer
  - Department of Electrical Engineering, Regional University of Blumenau, Rua São Paulo 3250, Blumenau 89030-000, Brazil
- Raúl García Ovejero
  - Expert Systems and Applications Laboratory, E.T.S.I.I. of Béjar, Universidad de Salamanca, 37700 Salamanca, Spain
- Valderi Reis Quietinho Leithardt
  - COPELABS, Lusófona University of Humanities and Technologies, Campo Grande 376, 1749-024 Lisboa, Portugal
  - VALORIZA, Research Center for Endogenous Resources Valorization, Instituto Politécnico de Portalegre, 7300-555 Portalegre, Portugal
5
A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis. ScientificWorldJournal 2022; 2022:1056490. [PMID: 35983572 PMCID: PMC9381276 DOI: 10.1155/2022/1056490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 07/20/2022] [Indexed: 11/17/2022]
Abstract
Cancer is a deadly disease caused by rapid and uncontrolled cell growth. In this article, a machine learning (ML) algorithm is proposed to diagnose different types of cancer from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker combines the results of three filter-based feature evaluation methods, namely, chi-squared, F-statistic, and mutual information (MI), and the features are ordered according to this combination. In the second stage, a modified wrapper-based sequential forward selection is used to discover the optimal feature subset, using ML models such as support vector machine (SVM), decision tree (DT), random forest (RF), and K-nearest neighbor (KNN) classifiers. To examine the proposed algorithm, many tests were carried out on four cancer microarray datasets using 10-fold cross-validation and hyperparameter tuning. The performance of the algorithm is evaluated by its diagnostic accuracy. For the leukemia dataset, both the SVM and KNN models achieve the highest accuracy, 100%, using only 5 features. For the ovarian cancer dataset, the SVM model achieves the highest accuracy, 100%, using only 6 features. For the small round blue cell tumor (SRBCT) dataset, the SVM model also achieves the highest accuracy, 100%, using only 8 features. For the lung cancer dataset, the SVM model again achieves the highest accuracy, 99.57%, using 19 features. Compared with other algorithms, the proposed algorithm gives superior results in terms of both the number of selected features and diagnostic accuracy.
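The two-stage selection can be sketched as follows. The abstract does not say how the three filter results are combined or what the "modified" forward selection changes, so two assumptions are made here: the filter scores are converted to ranks and averaged (mean rank), and the wrapper walks the ranked list, keeping a feature only if it raises the score. This is a simplified, rank-guided form of sequential forward selection rather than the classic variant that tries every remaining feature at each step. The filter scores and `toy_accuracy` are hypothetical stand-ins for real filters and 10-fold cross-validated SVM/KNN accuracy.

```python
def ranks(scores):
    """Rank features under one filter (0 = best); a higher score means more relevant."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {feature: r for r, feature in enumerate(order)}


def aggregate_ranking(*filter_scores):
    """Stage 1 (assumed rule): order features by their mean rank over all filters."""
    rank_maps = [ranks(s) for s in filter_scores]
    features = list(filter_scores[0])
    mean_rank = {f: sum(rm[f] for rm in rank_maps) / len(rank_maps) for f in features}
    return sorted(features, key=mean_rank.get)


def forward_select(ranked, evaluate):
    """Stage 2: walk the ranked list, keeping a feature only if it raises the score."""
    selected, best = [], float("-inf")
    for feature in ranked:
        score = evaluate(selected + [feature])
        if score > best:
            selected, best = selected + [feature], score
    return selected, best


# Hypothetical filter scores for four genes (chi-squared, F-statistic, MI).
chi2 = {"g1": 9.0, "g2": 4.0, "g3": 1.0, "g4": 6.0}
f_stat = {"g1": 7.0, "g2": 5.0, "g3": 0.5, "g4": 8.0}
mi = {"g1": 0.9, "g2": 0.2, "g3": 0.1, "g4": 0.6}


def toy_accuracy(subset):
    """Hypothetical stand-in for cross-validated classifier accuracy on `subset`."""
    gain = {"g1": 0.30, "g4": 0.15, "g2": 0.0, "g3": 0.0}
    return 0.5 + sum(gain[f] for f in subset) - 0.01 * len(subset)


ranked = aggregate_ranking(chi2, f_stat, mi)
print(ranked)  # ['g1', 'g4', 'g2', 'g3']
print(forward_select(ranked, toy_accuracy))
```

The small per-feature penalty in the toy scorer mimics the preference for compact subsets: once `g1` and `g4` are selected, adding an uninformative gene lowers the score, so the search stops growing the subset.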
6
Hajirahimi Z, Khashei M. Hybridization of hybrid structures for time series forecasting: a review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10199-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/05/2023]
7
Joseph SK, M A A, Thomas S, Nair SC. Nanomedicine as a future therapeutic approach for treating meningitis. J Drug Deliv Sci Technol 2022. [DOI: 10.1016/j.jddst.2021.102968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
8
Wang Z, Chen H, Zhu J, Ding Z. Daily PM2.5 and PM10 forecasting using linear and nonlinear modeling framework based on robust local mean decomposition and moving window ensemble strategy. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2021.108110] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/27/2022]
9
Accuracy versus reliability-based modelling approaches for medical decision making. Comput Biol Med 2021; 141:105138. [PMID: 34929467 DOI: 10.1016/j.compbiomed.2021.105138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 12/11/2021] [Accepted: 12/11/2021] [Indexed: 11/21/2022]
Abstract
Forecasting in the medical domain is critical to the quality of decisions made by physicians, patients, and health planners. Modeling is one of the most important components of decision support systems, which are frequently used to simulate and analyze the systems under study in order to make more appropriate decisions in medical science. In the medical modeling literature, various approaches with varying structures and characteristics have been proposed to cover a wide range of application categories and domains. Regardless of their differences, all modeling approaches aim to maximize the accuracy or the reliability of their results in order to achieve the most generalizable model and, as a result, more profitable decisions. Despite the theoretical significance and practical impact of reliability on generalizability, particularly in high-risk decisions and applications, a significant number of models in medical forecasting, classification, and time series prediction have been developed with maximizing accuracy in mind. Given the volatility of medical variables, however, stable and reliable forecasts are also necessary for making sound decisions. This paper compares and evaluates the quality of medical decisions produced by accuracy-based and reliability-based intelligent and statistical modeling approaches in order to determine the relative importance of accuracy and reliability for decision quality in decision support systems. For this purpose, 33 case studies from the UCI Machine Learning Repository were considered in three categories of supervised modeling: causal forecasting, time series prediction, and classification.
These cases were chosen from various domains, such as disease diagnosis (obesity, Parkinson's disease, diabetes, hepatitis, stenosis of arteries, orthopedic disease, autism), cancer (lung, breast, cervical), experiments, therapy (immunotherapy, cryotherapy), fertility prediction, and predicting the number of patients in the emergency room and ICU. According to the empirical findings, the reliability-based strategy outperformed the accuracy-based strategy by 2.26% in causal forecasting cases, 13.49% in classification cases, and 3.08% in time series prediction cases. Furthermore, compared with similar accuracy-based models, the reliability-based models can yield a 6.28% improvement. They can therefore be considered an appropriate alternative to traditional accuracy-based models for modeling in medical decision support systems.
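The abstract does not give formal definitions of the two strategies. As an assumption for illustration only, the sketch below treats an accuracy-based choice as the model with the lowest mean error over repeated evaluations and a reliability-based choice as the model whose errors vary least (lowest population standard deviation). Model names and error values are hypothetical.

```python
from statistics import mean, pstdev


def select_by_accuracy(errors):
    """Accuracy-based choice: the model with the lowest mean error."""
    return min(errors, key=lambda model: mean(errors[model]))


def select_by_reliability(errors):
    """Reliability-based choice (assumed here): the model whose errors vary least."""
    return min(errors, key=lambda model: pstdev(errors[model]))


fold_errors = {  # hypothetical error rates on five evaluation folds
    "model_A": [0.05, 0.30, 0.04, 0.28, 0.03],
    "model_B": [0.15, 0.16, 0.14, 0.15, 0.15],
}
print(select_by_accuracy(fold_errors))     # model_A
print(select_by_reliability(fold_errors))  # model_B
```

Model A has the better average error but swings widely from fold to fold, while Model B is slightly worse on average yet far more stable; the two criteria disagree exactly in the situation the abstract describes as important for high-risk medical decisions.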
10
Yang Y, Fan C, Xiong H. A novel general-purpose hybrid model for time series forecasting. Appl Intell 2021; 52:2212-2223. [PMID: 34764604 PMCID: PMC8178659 DOI: 10.1007/s10489-021-02442-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 04/17/2021] [Indexed: 11/04/2022]
Abstract
Accurately predicting data flows is an important and challenging problem in industrial automation. However, because of the diversity of data types, it is difficult for traditional time series prediction models to predict well across different types of data. To improve the versatility and accuracy of prediction, this paper proposes a novel hybrid time-series prediction model based on recursive empirical mode decomposition (REMD) and long short-term memory (LSTM). In REMD-LSTM, a new REMD is first proposed to overcome the end-effect (marginal effect) and mode-mixing (mode confusion) problems of traditional decomposition methods. REMD is then used to decompose the data stream into multiple intrinsic mode functions (IMFs). Next, an LSTM predicts each IMF subsequence separately. Finally, the prediction for the input data is obtained by summing the predictions of all IMF subsequences. The experimental results show that the proposed model improves prediction accuracy by more than 20% compared with the standalone LSTM algorithm. In addition, the model achieves the highest prediction accuracy on all the different types of data sets tested. This shows that the proposed model has a greater advantage in prediction accuracy and versatility than state-of-the-art models. The data used in the experiment can be downloaded from https://github.com/Yang-Yun726/REMD-LSTM.
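The decompose, predict, and sum structure described above can be sketched as follows. Both stages are replaced with simple stand-ins, which is an assumption of this sketch: a multi-scale moving-average split replaces REMD (its components are not true intrinsic mode functions), and a least-squares line replaces the per-component LSTM. Function names and data are hypothetical.

```python
def moving_average(x, w):
    """Centered moving average of odd width w; windows shrink at the edges."""
    half = w // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def decompose(x, widths=(3, 7)):
    """Additive split into fast-to-slow components that sum back to x.

    Stand-in for REMD: the components are NOT intrinsic mode functions.
    """
    components, rest = [], list(x)
    for w in widths:
        smooth = moving_average(rest, w)
        components.append([r - s for r, s in zip(rest, smooth)])  # detail at this scale
        rest = smooth
    components.append(rest)  # remaining slow trend
    return components


def linear_forecast(series, lookback=4):
    """Stand-in for the per-component LSTM: extrapolate a least-squares line one step."""
    ys = series[-lookback:]
    n = len(ys)  # needs n >= 2
    x_mean, y_mean = (n - 1) / 2, sum(ys) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys)) / sxx
    return y_mean + slope * (n - x_mean)


def decompose_and_forecast(x):
    """Forecast each component separately, then sum the component forecasts."""
    return sum(linear_forecast(c) for c in decompose(x))


series = [float(v) for v in [3, 5, 4, 6, 8, 7, 9, 11, 10, 12]]  # hypothetical data
print(decompose_and_forecast(series))
```

Because this stand-in predictor is linear, summing its component forecasts gives the same result as forecasting the raw series directly. Any gain from decomposition therefore has to come from a nonlinear learner, such as the LSTM in REMD-LSTM, modeling each simpler component better than it could model the whole series.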
Affiliation(s)
- Yun Yang
  - University of Shanghai for Science and Technology, Shanghai, China
- ChongJun Fan
  - University of Shanghai for Science and Technology, Shanghai, China
- HongLin Xiong
  - University of Shanghai for Science and Technology, Shanghai, China