1
Al Fahoum A, Zyout A. Wavelet Transform, Reconstructed Phase Space, and Deep Learning Neural Networks for EEG-Based Schizophrenia Detection. Int J Neural Syst 2024; 34:2450046. [PMID: 39010724 DOI: 10.1142/s0129065724500461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
This study proposes an innovative expert system that uses exclusively EEG signals to diagnose schizophrenia in its early stages. For diagnosing psychiatric/neurological disorders, electroencephalogram (EEG) testing is considered a financially viable, safe, and reliable alternative. Using the reconstructed phase space (RPS) and the continuous wavelet transform, the researchers maximized the differences between the EEG nonstationary signals of normal and schizophrenia individuals, which cannot be observed in the time, frequency, or time-frequency domains. This reveals significant information, highlighting more distinguishable features. Then, a deep learning network was trained to enhance the accuracy of the resulting image classification. The algorithm's efficacy was confirmed through three distinct methods: employing 70% of the dataset for training, 15% for validation, and the remaining 15% for testing. This was followed by a 5-fold cross-validation technique and a leave-one-out classification approach. Each method was iterated 100 times to ascertain the algorithm's robustness. The performance metrics derived from these tests - accuracy, precision, sensitivity, F1 score, Matthews correlation coefficient, and Kappa - indicated remarkable outcomes. The algorithm demonstrated steady performance across all evaluation strategies, underscoring its relevance and reliability. The outcomes validate the system's accuracy, precision, sensitivity, and robustness by showcasing its capability to autonomously differentiate individuals diagnosed with schizophrenia from those in a state of normal health.
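The metrics named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the example counts are assumptions made for illustration, and schizophrenia is treated as the positive class.

```python
import math


def _safe_div(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion counts."""
    n = tp + fp + fn + tn
    accuracy = _safe_div(tp + tn, n)
    precision = _safe_div(tp, tp + fp)
    sensitivity = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_den)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = _safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _safe_div(accuracy - p_chance, 1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts, not results from the paper.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

In a repeated evaluation such as the 100 iterations described above, these values would typically be averaged over runs.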
Affiliation(s)
- Amjed Al Fahoum
- Biomedical Systems and Informatics Engineering Department, Yarmouk University, Irbid 21163, Jordan
- Ala'a Zyout
- Biomedical Systems and Informatics Engineering Department, Yarmouk University, Irbid 21163, Jordan
2
Kabir E, Guikema SD, Quiring SM. Power outage prediction using data streams: An adaptive ensemble learning approach with a feature- and performance-based weighting mechanism. RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2024; 44:686-704. [PMID: 37666505 DOI: 10.1111/risa.14211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]
Abstract
A wide variety of weather conditions, from windstorms to prolonged heat events, can substantially impact power systems, posing many risks and inconveniences due to power outages. Accurately estimating the probability distribution of the number of customers without power using data about the power utility system and environmental and weather conditions can help utilities restore power more quickly and efficiently. However, the critical shortcoming of current models lies in the difficulty of handling (i) data streams and (ii) model uncertainty due to combining data from various weather events. Accordingly, this article proposes an adaptive ensemble learning algorithm for data streams, which deploys a feature- and performance-based weighting mechanism to adaptively combine outputs from multiple competitive base learners. As a proof of concept, we use a large, real data set of daily customer interruptions to develop the first adaptive all-weather outage prediction model using data streams. We benchmark several approaches to demonstrate the advantage of our approach in offering more accurate probabilistic predictions. The results show that the proposed algorithm reduces the error of the base learners' probabilistic predictions by between 4% and 22%, with an average of 8%, which also results in substantially more accurate point predictions. The improvement made by our algorithm grows as we replace the base learners with simpler models.
Affiliation(s)
- Elnaz Kabir
- Department of Engineering Technology & Industrial Distribution, Texas A&M University, College Station, Texas, USA
- Seth D Guikema
- Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, Michigan, USA
- Steven M Quiring
- Department of Geography, The Ohio State University, Columbus, Ohio, USA
3
Shyaa MA, Zainol Z, Abdullah R, Anbar M, Alzubaidi L, Santamaría J. Enhanced Intrusion Detection with Data Stream Classification and Concept Drift Guided by the Incremental Learning Genetic Programming Combiner. SENSORS (BASEL, SWITZERLAND) 2023; 23:3736. [PMID: 37050795 PMCID: PMC10098915 DOI: 10.3390/s23073736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2023] [Revised: 03/27/2023] [Accepted: 03/31/2023] [Indexed: 06/19/2023]
Abstract
Concept drift (CD) in data streaming scenarios such as network intrusion detection systems (IDS) refers to the change in the statistical distribution of the data over time. There are five principal variants of CD: incremental, gradual, recurrent, sudden, and blip. Genetic programming combiner (GPC) classification is an effective core candidate for data stream classification for IDS. However, its basic structure relies on traditional static machine learning models that receive one-time training, limiting its ability to handle CD. To address this issue, we propose an extended variant of the GPC using three main components. First, we replace existing classifiers with alternatives: online sequential extreme learning machine (OSELM), feature adaptive OSELM (FA-OSELM), and knowledge preservation OSELM (KP-OSELM). Second, we add two new components to the GPC, specifically, a data balancing component and a classifier update component. Third, the coordination between the sub-models produces three novel variants of the GPC: GPC-KOS for KP-OSELM; GPC-FOS for FA-OSELM; and GPC-OS for OSELM. This article presents the first data stream-based classification framework that provides novel strategies for handling CD variants. The experimental results demonstrate that both GPC-KOS and GPC-FOS outperform the traditional GPC and other state-of-the-art methods, and the transfer learning and memory features contribute to the effective handling of most types of CD. Moreover, the application of our incremental variants on real-world datasets (KDD Cup '99, CICIDS-2017, CSE-CIC-IDS-2018, and ISCX '12) demonstrates improved performance (GPC-FOS in connection with CSE-CIC-IDS-2018 and CICIDS-2017; GPC-KOS in connection with ISCX2012 and KDD Cup '99), with maximum accuracy rates of 100% and 98% by GPC-KOS and GPC-FOS, respectively. Additionally, our GPC variants do not show superior performance in handling blip drift.
Affiliation(s)
- Methaq A. Shyaa
- School of Computer Sciences, Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Zurinahni Zainol
- School of Computer Sciences, Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Rosni Abdullah
- School of Computer Sciences, Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Mohammed Anbar
- National Advanced IPv6 Centre (NAv6), Universiti Sains Malaysia, USM, Gelugor 11800, Pulau Penang, Malaysia
- Laith Alzubaidi
- School of Mechanical, Medical, and Process Engineering, Queensland University of Technology, Brisbane, QLD 4000, Australia
- Centre for Data Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
- José Santamaría
- Department of Computer Science, University of Jaén, 23071 Jaén, Spain
4
Wang K, Lu J, Liu A, Zhang G, Xiong L. Evolving Gradient Boost: A Pruning Scheme Based on Loss Improvement Ratio for Learning Under Concept Drift. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:2110-2123. [PMID: 34613927 DOI: 10.1109/tcyb.2021.3109796] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023]
Abstract
In nonstationary environments, data distributions can change over time. This phenomenon is known as concept drift, and the related models need to adapt if they are to remain accurate. With gradient boosting (GB) ensemble models, selecting which weak learners to keep or prune to maintain model accuracy under concept drift is a nontrivial research problem. Unlike existing models such as AdaBoost, which can directly compare weak learners' performance by their accuracy (a metric in [0, 1]), in GB, weak learners' performance is measured on different scales. To address the performance measurement scaling issue, we propose a novel criterion to evaluate weak learners in GB models, called the loss improvement ratio (LIR). Based on LIR, we develop two pruning strategies: 1) naive pruning (NP), which simply deletes all learners with increasing loss and 2) statistical pruning (SP), which removes learners if their loss increase meets a significance threshold. We also devise a scheme to dynamically switch between NP and SP to achieve the best performance. We implement the scheme as a concept drift learning algorithm, called evolving gradient boost (LIR-eGB). On average, LIR-eGB delivered the best performance against state-of-the-art methods on both stationary and nonstationary data.
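The naive pruning idea can be sketched as follows. The abstract does not give the formula for LIR, so the definition used here (relative loss reduction when a weak learner is added to the ensemble) is an assumption, as are the function names and the example loss values.

```python
def loss_improvement_ratios(losses):
    """Relative loss improvement contributed by each weak learner.

    losses[0] is the ensemble loss before any weak learner is added and
    losses[i] is the loss after adding weak learner i (1-based), all
    measured on the same recent data. This definition is an assumption.
    """
    ratios = []
    for prev, cur in zip(losses, losses[1:]):
        ratios.append((prev - cur) / prev if prev else 0.0)
    return ratios


def naive_prune(losses):
    """Return 0-based indices of weak learners that did not increase the loss."""
    ratios = loss_improvement_ratios(losses)
    return [i for i, ratio in enumerate(ratios) if ratio >= 0]


# Hypothetical losses on a newly arrived chunk: the second learner hurts.
kept = naive_prune([1.0, 0.8, 0.9, 0.45])
```

Because the ratio is relative, learners whose raw losses sit on different scales can be compared directly. A statistical variant would replace the `ratio >= 0` check with a significance test on the loss increase.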
5
Active Weighted Aging Ensemble for Drifted Data Stream Classification. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.02.046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
6
Coelho RA, Bambirra Torres LC, de Castro CL. Concept Drift Detection with Quadtree-based Spatial Mapping of Streaming Data. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2022.12.085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
7
Ko H, Rim K, Hong JY. Bio-metric authentication with electrocardiogram (ECG) by considering variable signals. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:1716-1729. [PMID: 36899505 DOI: 10.3934/mbe.2023078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Biometric authentication based on conventional bio-signals such as an electrocardiogram (ECG) is vulnerable because the continuity of the signals is not verified: the system does not consider changes in the signals caused by changes in a person's situation. Prediction technology based on tracking and analyzing new signals can overcome this shortcoming. However, since the biological signal data sets are massive, their utilization is crucial for higher accuracy. In this study, we defined a 10 × 10 matrix for 100 points based on the R-peak point and an array for the dimension of the signals. Furthermore, we defined the future predicted signals by analyzing the continuous points in each array of the matrices at the same point. As a result, the accuracy of user authentication was 91%.
Affiliation(s)
- Hoon Ko
- Research & Development Center, MetaiONE Inc., Business Incubation Center (#504), Chosun University, Gwangju 61452, Korea
- Instituto Superior de Engenharia do Porto (ISEP/IPP), Porto 4249-015, Portugal
- Kwangcheol Rim
- The Department of Mathematics, Chosun University, 309 pilmundae-ro, Gwangju 61452, Korea
- Jong Youl Hong
- College of Culture & Sports, Korea University, Korea University Sejong Campus, Sejong City 30019, Korea
8
Malialis K, Panayiotou CG, Polycarpou MM. Nonstationary data stream classification with online active learning and siamese neural networks. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
9
Takada T, Kitajima T. Trend-following with better adaptation to large downside risks. PLoS One 2022; 17:e0276322. [PMID: 36256670 PMCID: PMC9578607 DOI: 10.1371/journal.pone.0276322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 10/04/2022] [Indexed: 11/18/2022]
Abstract
Avoiding losses from long-term trend reversals is challenging, and trend-following is one of the few trading approaches to explore it. While trend-following is popular among investors and has gained increased attention in academia, the recent diminished profitability in equity markets casts doubt on its effectiveness. To clarify its cause and suggest remedies, we thoroughly examine the effect of market conditions and averaging window on recent profitability using four major stock indices in an out-of-sample experiment comparing trend-following rules (moving average and momentum) and a machine-classification-based non-trend-following rule. In addition to the significant advantage of trend-following rules in avoiding downside risks, we find a discrepancy in the optimum averaging window size between trend direction phases, which is exacerbated by a higher positive trend direction ratio. A higher positive trend direction ratio leads to poor performance relative to buy-and-hold returns. This discrepancy creates the dilemma of choosing which trend direction phase to emphasize. Incorporating machine-learning into trend-following is effective for alleviating this dilemma. We find that the profit-maximizing averaging window realizes the level that best balances the dilemma and suggest a simple guideline for selecting the optimum averaging window. We attribute the sluggishness of trend-following in recent equity markets to the insufficient chances of trend reversals rather than their loss of profitability. Our results contribute to improving the performance of trend following by mitigating the dilemma.
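A moving-average trend-following rule of the kind compared in this abstract can be sketched as below. This is a generic illustration, not the authors' experimental setup: the function name, the long-or-flat signal convention, and the price series are assumptions, and `window` plays the role of the averaging window.

```python
def moving_average_signal(prices, window):
    """Return a per-period signal: 1 = hold the index, 0 = stay out.

    The signal is None until a full averaging window is available.
    A period is classified as an uptrend when the latest price is
    above the simple moving average of the last `window` prices.
    """
    signals = []
    for t in range(len(prices)):
        if t + 1 < window:
            signals.append(None)
            continue
        sma = sum(prices[t + 1 - window : t + 1]) / window
        signals.append(1 if prices[t] > sma else 0)
    return signals


# Hypothetical price series, for illustration only.
signals = moving_average_signal([10, 11, 12, 11, 9, 8, 10, 13], window=3)
# signals == [None, None, 1, 0, 0, 0, 1, 1]
```

A short window reacts quickly to reversals but switches more often, while a long window is slower in both directions, which is one way to see why the best window size can differ between upward and downward phases.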
Affiliation(s)
- Teruko Takada
- Graduate School of Business, Osaka Metropolitan University, Osaka, Japan
- Takahiro Kitajima
- Graduate School of Business, Osaka Metropolitan University, Osaka, Japan
- Faculty of Commerce, Kumamoto Gakuen University, Kumamoto, Japan
10
Machine learning to improve the interpretation of intercalating dye-based quantitative PCR results. Sci Rep 2022; 12:16445. [PMID: 36180590 PMCID: PMC9525288 DOI: 10.1038/s41598-022-21010-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 09/21/2022] [Indexed: 11/16/2022]
Abstract
This study aimed to evaluate the contribution of a Machine Learning (ML) approach to the interpretation of intercalating dye-based quantitative PCR (IDqPCR) signals applied to the diagnosis of mucormycosis. The ML-based classification approach was applied to 734 IDqPCR results categorized as positive (n = 74) or negative (n = 660) for mucormycosis after combining “visual reading” of the amplification and denaturation curves with clinical, radiological and microbiological criteria. Fourteen features were calculated to characterize the curves and fed into several pipelines including four ML algorithms. An initial subset (n = 345) was used to design the classifiers. The classifier predictions were combined with majority voting to estimate the performance of 48 meta-classifiers on an external dataset (n = 389). The visual reading returned 57 (7.7%), 568 (77.4%) and 109 (14.8%) positive, negative and doubtful results, respectively. The Kappa coefficients of all the meta-classifiers were greater than 0.83 for the classification of IDqPCR results on the external dataset. Among these meta-classifiers, 6 exhibited Kappa coefficients of 1. The proposed ML-based approach allows a rigorous interpretation of IDqPCR curves, making the diagnosis of mucormycosis available to non-specialists in molecular diagnosis. A free online application was developed to classify IDqPCR results from the raw data of the thermal cycler output (http://gepamy-sat.asso.st/).
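Combining classifier predictions by majority voting can be sketched as follows. How the 48 meta-classifiers were actually assembled is not described in the abstract, so the function name, the three voters, and the labels below are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all of the
    same length. With an odd number of voters and two classes there
    are no ties; otherwise the label seen first among the tied ones wins.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Three hypothetical classifiers voting on three curves.
votes = [
    ["pos", "neg", "neg"],
    ["pos", "pos", "neg"],
    ["neg", "pos", "neg"],
]
labels = majority_vote(votes)
# labels == ["pos", "pos", "neg"]
```

Agreement between the combined labels and the reference categorization could then be summarized with a Kappa coefficient, as the abstract reports.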
11
Komorniczak J, Zyblewski P, Ksieniewicz P. Statistical Drift Detection Ensemble for batch processing of data streams. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
12
Ksieniewicz P. Processing data stream with chunk-similarity model selection. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03826-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
13
Korycki Ł, Krawczyk B. Adversarial concept drift detection under poisoning attacks for robust data stream mining. Mach Learn 2022; 112:1-36. [PMID: 35668720 PMCID: PMC9162121 DOI: 10.1007/s10994-022-06177-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2020] [Revised: 11/01/2021] [Accepted: 04/12/2022] [Indexed: 11/30/2022]
Abstract
Continuous learning from streaming data is among the most challenging topics in contemporary machine learning. In this domain, learning algorithms must not only be able to handle massive volumes of rapidly arriving data, but also adapt themselves to potential emerging changes. The evolving nature of data streams is known as concept drift. While there is a plethora of methods designed for detecting its occurrence, all of them assume that the drift is connected with underlying changes in the source of data. However, one must consider the possibility of a malicious injection of false data that simulates a concept drift. This adversarial setting assumes a poisoning attack that may be conducted in order to damage the underlying classification system by forcing an adaptation to false data. Existing drift detectors are not capable of differentiating between real and adversarial concept drift. In this paper, we propose a framework for robust concept drift detection in the presence of adversarial and poisoning attacks. We introduce a taxonomy for two types of adversarial concept drift, as well as a robust trainable drift detector. It is based on the augmented restricted Boltzmann machine with improved gradient computation and energy function. We also introduce Relative Loss of Robustness, a novel measure for evaluating the performance of concept drift detectors under poisoning attacks. Extensive computational experiments, conducted on both fully and sparsely labeled data streams, demonstrate the high robustness and efficacy of the proposed drift detection framework in adversarial scenarios.
Affiliation(s)
- Łukasz Korycki
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA USA
- Bartosz Krawczyk
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA USA
14
Klikowski J, Woźniak M. Deterministic Sampling Classifier with weighted Bagging for drifted imbalanced data stream classification. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
15
ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach Learn 2022. [DOI: 10.1007/s10994-022-06168-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
16
Han M, Chen Z, Li M, Wu H, Zhang X. A survey of active and passive concept drift handling methods. Comput Intell 2022. [DOI: 10.1111/coin.12520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Meng Han
- School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Zhiqiang Chen
- School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Muhang Li
- School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Hongxin Wu
- School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Xilong Zhang
- School of Computer Science and Engineering, North Minzu University, Yinchuan, China
17
Alberghini G, Barbon Junior S, Cano A. Adaptive ensemble of self-adjusting nearest neighbor subspaces for multi-label drifting data streams. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.01.075] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
18
Wu Z, Gao P, Cui L, Chen J. An Incremental Learning Method Based on Dynamic Ensemble RVM for Intrusion Detection. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT 2022. [DOI: 10.1109/tnsm.2021.3102388] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
19
Silva RP, Zarpelão BB, Cano A, Junior SB. Time Series Segmentation Based on Stationarity Analysis to Improve New Samples Prediction. SENSORS 2021; 21:s21217333. [PMID: 34770639 PMCID: PMC8587387 DOI: 10.3390/s21217333] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/09/2021] [Revised: 11/01/2021] [Accepted: 11/02/2021] [Indexed: 11/16/2022]
Abstract
A wide range of applications based on sequential data, known as time series, have become increasingly popular in recent years, mainly those based on the Internet of Things (IoT). Several different machine learning algorithms exploit the patterns extracted from sequential data to support multiple tasks. However, this data can suffer from unreliable readings that can lead to low-accuracy models due to the low-quality training sets available. Detecting the change point between highly representative segments is an important aid in finding and treating biased subsequences. By constructing a framework based on the Augmented Dickey-Fuller (ADF) test for data stationarity, two proposals to automatically segment subsequences in a time series were developed. The former proposal, called Change Detector segmentation, relies on change detection methods from data stream mining. The latter, called ADF-based segmentation, is built on a new change detector derived from the ADF test only. Experiments on real-life IoT databases and benchmarks showed the improvement provided by our proposals for prediction tasks with traditional Autoregressive integrated moving average (ARIMA) and Deep Learning (Long short-term memory and Temporal Convolutional Networks) methods. Results obtained by the Long short-term memory predictive model reduced the relative prediction error from 1 to 0.67, compared to time series without segmentation.
Affiliation(s)
- Ricardo Petri Silva
- Department of Electrical Engineering, State University of Londrina, Londrina 86057-970, Brazil
- Bruno Bogaz Zarpelão
- Department of Computer Science, State University of Londrina, Londrina 86057-970, Brazil
- Alberto Cano
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Sylvio Barbon Junior
- Department of Computer Science, State University of Londrina, Londrina 86057-970, Brazil
20
Kumar S, Singh R, Khan MZ, Noorwali A. Design of adaptive ensemble classifier for online sentiment analysis and opinion mining. PeerJ Comput Sci 2021; 7:e660. [PMID: 34435102 PMCID: PMC8356659 DOI: 10.7717/peerj-cs.660] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Accepted: 07/13/2021] [Indexed: 06/13/2023]
Abstract
Data stream mining is a challenging task for researchers because of the change in data distribution during classification, known as concept drift. Drift detection algorithms focus on detecting the drift. A drift detection algorithm needs to be very sensitive to changes in data distribution in order to detect the maximum number of drifts in the data stream. But highly sensitive drift detectors lead to more false-positive drift detections. This paper proposes a Drift Detection-based Adaptive Ensemble classifier for sentiment analysis and opinion mining, which turns these false-positive drift detections to its benefit and minimizes the negative impact of false-positive drift detection signals. The proposed method creates and adds a new classifier to the ensemble whenever a drift happens. A weighting mechanism is implemented, which assigns a weight to each classifier in the ensemble. The weight of a classifier decides its contribution to the final classification results. The experiments are performed using different classification algorithms, and results are evaluated on accuracy, precision, recall, and F1-measure. The proposed method is also compared with the following state-of-the-art methods: OzaBaggingADWINClassifier, Accuracy Weighted Ensemble, Additive Expert Ensemble, Streaming Random Patches, and Adaptive Random Forest Classifier. The results show that the proposed method handles both true-positive and false-positive drifts efficiently.
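The general pattern described here, adding a classifier when a drift is signaled and letting weights decide each member's contribution, can be sketched as below. The abstract does not specify the weighting mechanism, so this sketch substitutes a simple multiplicative penalty in the style of the Weighted Majority algorithm; the class name, the `decay` parameter, and the toy members are all assumptions.

```python
from collections import defaultdict


class WeightedEnsemble:
    """Ensemble whose members vote with adjustable weights."""

    def __init__(self):
        self.members = []  # each entry is [predict_fn, weight]

    def add_member(self, predict_fn, weight=1.0):
        # Called, for example, whenever a drift detector raises a signal.
        self.members.append([predict_fn, weight])

    def predict(self, x):
        # Sum the weights behind each predicted label and return the heaviest.
        scores = defaultdict(float)
        for predict_fn, weight in self.members:
            scores[predict_fn(x)] += weight
        return max(scores, key=scores.get)

    def update_weights(self, x, y_true, decay=0.5):
        # Penalize members that misclassify the labeled example.
        for member in self.members:
            if member[0](x) != y_true:
                member[1] *= decay


ensemble = WeightedEnsemble()
ensemble.add_member(lambda x: "pos")
ensemble.add_member(lambda x: "neg")
# A hypothetical drift signal triggers the addition of a new member.
ensemble.add_member(lambda x: "neg" if x < 0 else "pos")
```

Under this scheme a member added after a false-positive drift signal does little harm: if it performs poorly its weight shrinks, and if it performs well it simply adds diversity to the vote.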
Affiliation(s)
- Sanjeev Kumar
- Department of Computer Science and Information Technology, M.J.P. Rohilkhand University, Bareilly, Uttar Pradesh, India
- Ravendra Singh
- Department of Computer Science and Information Technology, M.J.P. Rohilkhand University, Bareilly, Uttar Pradesh, India
- Mohammad Zubair Khan
- Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah, Madinah, Saudi Arabia
- Abdulfattah Noorwali
- Department of Electrical Engineering, Umm Al-Qura University, Makkah, Makkah, Saudi Arabia
21
Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:8813806. [PMID: 34381499 PMCID: PMC8352686 DOI: 10.1155/2021/8813806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2020] [Revised: 07/04/2021] [Accepted: 07/21/2021] [Indexed: 11/17/2022]
Abstract
Class imbalance and concept drift are two primary issues that exist concurrently in data stream classification. Although the two issues have each drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, the class imbalance issue is further complicated when data streams exhibit concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification method is introduced to overcome the two issues simultaneously. The CSDS considers cost information during data preprocessing and classification. During data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate the class imbalance at the data level. In the classification process, a cost-sensitive weighting scheme is devised to enhance the overall performance of the ensemble. In addition, a change detection mechanism is embedded in our algorithm, which ensures that the ensemble can capture and react to drift promptly. Experimental results show that our method obtains better classification results under different imbalanced concept drifting data stream scenarios.
22
Goel K, Batra S. Dynamically adaptive and diverse dual ensemble learning approach for handling concept drift in data streams. Comput Intell 2021. [DOI: 10.1111/coin.12475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Kanu Goel
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India
- Shalini Batra
- Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala, India
23
Wu O, Koh YS, Dobbie G, Lacombe T. Probabilistic exact adaptive random forest for recurrent concepts in data streams. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2021. [DOI: 10.1007/s41060-021-00273-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
24
Roseberry M, Krawczyk B, Djenouri Y, Cano A. Self-adjusting k nearest neighbors for continual learning from multi-label drifting data streams. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
25
Feitosa Neto A, Canuto AM. EOCD: An ensemble optimization approach for concept drift applications. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.01.051] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
26
Vieira DM, Fernandes C, Lucena C, Lifschitz S. Driftage: a multi-agent system framework for concept drift detection. Gigascience 2021; 10:6290670. [PMID: 34061207 PMCID: PMC8168350 DOI: 10.1093/gigascience/giab030] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/24/2020] [Revised: 03/07/2021] [Accepted: 03/30/2021] [Indexed: 11/14/2022]
Abstract
BACKGROUND: The amount of data and the behavior of society change at a swift pace in this interconnected world. Consequently, machine learning algorithms lose accuracy because they do not know these new patterns. This change in the data pattern is known as concept drift. There exist many approaches for dealing with these drifts. Usually, these methods are costly to implement because they require (i) knowledge of drift detection algorithms, (ii) software engineering strategies, and (iii) continuous maintenance concerning new drifts. RESULTS: This article proposes Driftage, a new framework that uses multi-agent systems to considerably simplify the implementation of concept drift detectors and to divide concept drift detection responsibilities between agents, enhancing the explainability of each part of drift detection. As a case study, we illustrate our strategy using an electromyography-based muscle activity monitor. We show a reduction in the number of false-positive drifts detected, improved detection interpretability, and concept drift detectors that can interact with other knowledge bases. CONCLUSION: We conclude that Driftage gives rise to a new paradigm for implementing concept drift algorithms with a multi-agent architecture, which contributes to splitting drift detection responsibility, to algorithm interpretability, and to more dynamic algorithm adaptation.
Affiliation(s)
- Diogo Munaro Vieira
- Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marques de São Vicente, 225, Gávea, Rio de Janeiro, RJ 22451-900, Brazil
- Chrystinne Fernandes
- Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marques de São Vicente, 225, Gávea, Rio de Janeiro, RJ 22451-900, Brazil
- Carlos Lucena
- Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marques de São Vicente, 225, Gávea, Rio de Janeiro, RJ 22451-900, Brazil
- Sérgio Lifschitz
- Informatics Department, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marques de São Vicente, 225, Gávea, Rio de Janeiro, RJ 22451-900, Brazil
|
27
|
Khezri S, Tanha J, Ahmadi A, Sharifi A. A novel semi-supervised ensemble algorithm using a performance-based selection metric to non-stationary data streams. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]
|
28
|
Sun Y, Dai H. Constructing accuracy and diversity ensemble using Pareto-based multi-objective learning for evolving data streams. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05386-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
29
|
Sarnovsky M, Kolarik M. Classification of the drifting data streams using heterogeneous diversified dynamic class-weighted ensemble. PeerJ Comput Sci 2021; 7:e459. [PMID: 33834113 PMCID: PMC8022634 DOI: 10.7717/peerj-cs.459] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/12/2020] [Accepted: 03/05/2021] [Indexed: 06/12/2023]
Abstract
Data streams can be defined as continuous flows of data coming from different sources and in different forms. Streams are often very dynamic, and their underlying structure usually changes over time, which may result in a phenomenon called concept drift. When solving predictive problems using streaming data, traditional machine learning models trained on historical data may become invalid when such changes occur. Adaptive models equipped with mechanisms to reflect the changes in the data have proved suitable for handling drifting streams. Adaptive ensemble models represent a popular group of these methods used in the classification of drifting data streams. In this paper, we present a heterogeneous adaptive ensemble model for data stream classification, which uses a dynamic class weighting scheme and a mechanism to maintain the diversity of the ensemble members. Our main objective was to design a model consisting of a heterogeneous group of base learners (Naive Bayes, k-NN, decision trees) with an adaptive mechanism that, besides the performance of the members, also takes into account the diversity of the ensemble. The model was experimentally evaluated on both real-world and synthetic datasets. We compared the presented model with other existing adaptive ensemble methods, from the perspective of both predictive performance and computational resource requirements.
|
30
|
Concept Drift Adaptation Techniques in Distributed Environment for Real-World Data Streams. Smart Cities 2021. [DOI: 10.3390/smartcities4010021] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/16/2022]
Abstract
Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem that has been introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to patch this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide critical analysis of results retrieved. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.
|
31
|
Park N, Kim S. FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams. Sensors (Basel, Switzerland) 2021; 21:1080. [PMID: 33557367 PMCID: PMC7915800 DOI: 10.3390/s21041080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2020] [Revised: 01/28/2021] [Accepted: 01/28/2021] [Indexed: 11/21/2022]
Abstract
Efficient and accurate estimation of the probability distribution of a data stream is an important problem in many sensor systems. It is especially challenging when the data stream is non-stationary, i.e., its probability distribution changes over time. Statistical models for non-stationary data streams demand agile adaptation to concept drift while tolerating temporal fluctuations. To this end, a statistical model needs to forget old data samples and to detect concept drift swiftly. In this paper, we propose FlexSketch, an online probability density estimation algorithm for data streams. Our algorithm uses an ensemble of histograms, each of which represents a different length of data history. FlexSketch updates each histogram for every new data sample and generates a probability distribution by combining the ensemble of histograms, while periodically monitoring the discrepancy between recent data and the existing models. When it detects concept drift, a new histogram is added to the ensemble and the oldest histogram is removed. This allows us to estimate the probability density function with high update speed and high accuracy using only limited memory. Experimental results demonstrate that our algorithm shows improved speed and accuracy compared to existing methods for both stationary and non-stationary data streams.
Affiliation(s)
- Songkuk Kim
- School of Integrated Technology, Yonsei University, Incheon 21983, Korea
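The mechanism summarized in the FlexSketch abstract above (an ensemble of histograms covering different history lengths, a periodic discrepancy check, and replacement of the oldest histogram on drift) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `HistogramEnsemble`, its parameters, the fixed bin range, the total variation distance used as the discrepancy measure, and the choice to seed a new histogram from the recent window are all assumptions made for illustration.

```python
from collections import deque


class HistogramEnsemble:
    """Toy 1-D density estimator inspired by the abstract (hypothetical design)."""

    def __init__(self, lo, hi, bins=10, max_members=3, window=50, threshold=0.5):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.window = window          # samples between drift checks (assumed)
        self.threshold = threshold    # discrepancy that counts as drift (assumed)
        # A bounded deque drops the oldest histogram when a new one is appended.
        self.members = deque([[0] * bins], maxlen=max_members)
        self.recent = deque(maxlen=window)  # bin indices of the latest samples
        self.seen = 0
        self.drifts = 0

    def _bin(self, x):
        # Map a sample to a bin index, clamping out-of-range values to the edges.
        i = int((x - self.lo) / (self.hi - self.lo) * self.bins)
        return min(max(i, 0), self.bins - 1)

    @staticmethod
    def _normalize(counts):
        total = sum(counts)
        return [c / total for c in counts] if total else [0.0] * len(counts)

    def pmf(self):
        # Combine members by averaging their normalized bin frequencies.
        live = [self._normalize(m) for m in self.members if sum(m) > 0]
        if not live:
            return [0.0] * self.bins
        return [sum(col) / len(live) for col in zip(*live)]

    def update(self, x):
        b = self._bin(x)
        for m in self.members:  # every histogram sees every new sample
            m[b] += 1
        self.recent.append(b)
        self.seen += 1
        if self.seen % self.window == 0:
            self._check_drift()

    def _check_drift(self):
        recent_counts = [0] * self.bins
        for b in self.recent:
            recent_counts[b] += 1
        p, q = self._normalize(recent_counts), self.pmf()
        # Total variation distance between recent data and the ensemble estimate.
        tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        if tv > self.threshold:
            self.members.append(recent_counts)  # newest in, oldest out when full
            self.drifts += 1


if __name__ == "__main__":
    ens = HistogramEnsemble(0.0, 1.0)
    for x in [0.15] * 200 + [0.85] * 50:  # abrupt shift after 200 samples
        ens.update(x)
    print(ens.drifts, [round(v, 2) for v in ens.pmf()])
```

Because each member is created at a different time, each one summarizes a different length of history; the bounded deque keeps memory fixed, which mirrors the limited-memory goal stated in the abstract without claiming to reproduce its accuracy or speed.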
|
32
|
Bertini JR. Graph embedded rules for explainable predictions in data streams. Neural Netw 2020; 129:174-192. [DOI: 10.1016/j.neunet.2020.05.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/10/2019] [Revised: 05/25/2020] [Accepted: 05/28/2020] [Indexed: 12/11/2022]
|
33
|
Toor AA, Usman M, Younas F, Fong ACM, Khan SA, Fong S. Mining Massive E-Health Data Streams for IoMT Enabled Healthcare Systems. Sensors 2020; 20:s20072131. [PMID: 32283841 PMCID: PMC7180875 DOI: 10.3390/s20072131] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/13/2020] [Revised: 04/04/2020] [Accepted: 04/07/2020] [Indexed: 12/02/2022]
Abstract
With the increasing popularity of the Internet-of-Medical-Things (IoMT) and smart devices, huge volumes of data streams are being generated. This study aims to address concept drift, which is a major challenge in the processing of voluminous data streams. Concept drift refers to a change in data distribution over time. It may occur in the medical domain, for example when medical sensors used for general healthcare or rehabilitation switch roles to support ICU emergency operations when required. Detecting concept drifts becomes trickier when the class distributions in the data are skewed, which is often true for e-health data from medical sensors. The Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate, and it does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experiments were run in Massive Online Analysis (MOA) on 48 synthetic datasets customized to possess the capabilities of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.
Affiliation(s)
- Affan Ahmed Toor
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad 44000, Pakistan
- Muhammad Usman
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad 44000, Pakistan
- Farah Younas
- Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad 44000, Pakistan
- Alvis Cheuk M. Fong
- Department of Computing, Western Michigan University, Gladstone, MI 49837, USA
- Correspondence: Tel.: +1-269-2763-110
- Sajid Ali Khan
- Department of Software Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan
- Simon Fong
- Department of Computer and Information Science, University of Macau, Macau 999078, China
|