1. DynK-hydra: improved dynamic architecture ensembling for efficient inference. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00897-1]
Abstract
Accessibility on edge devices and the trade-off between latency and accuracy are areas of interest when deploying deep learning models. This paper explores a mixture-of-experts system, DynK-Hydra, which allows an ensemble of multiple similar branches to be trained on data sets with a large number of classes while using only the necessary subset of branches during inference. We achieve this by training a cohort of specialized branches (deep networks of reduced size) and a gater/supervisor that decides dynamically which branches to use for any specific input. An original contribution is that the number of chosen branches is set dynamically, based on how confident the gater is (similar works use a static parameter for this). Another contribution is the way we ensure the branches' specialization: we divide the data set classes into multiple clusters, assign a cluster to each branch, and enforce its specialization on this cluster through a separate loss function. We evaluate DynK-Hydra on the CIFAR-100, Food-101, CUB-200, and ImageNet32 data sets and obtain accuracy improvements of up to 4.3% over a state-of-the-art ResNet, while reducing the number of inference flops by a factor of 2-5.5. Compared with the similar HydraRes, we obtain marginal accuracy improvements of up to 1.2% on architectures with pairwise-comparable inference times, while improving inference times by up to 2.8 times.
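The dynamic branch count is the distinctive mechanism here. Below is a minimal sketch of one plausible reading, in NumPy: accumulate the gater's softmax mass over branches until a confidence threshold is met. The helper name and threshold value are illustrative, not from the paper.

```python
import numpy as np

def select_branches(gater_probs, confidence_threshold=0.9):
    """Pick the smallest set of branches whose cumulative gater
    confidence reaches the threshold, so k varies per input."""
    order = np.argsort(gater_probs)[::-1]         # most confident branch first
    cumulative = np.cumsum(gater_probs[order])
    k = int(np.searchsorted(cumulative, confidence_threshold)) + 1
    return order[:k]

# A confident gater routes to one branch, an uncertain one to several.
print(select_branches(np.array([0.93, 0.04, 0.02, 0.01])))  # [0]
print(select_branches(np.array([0.40, 0.30, 0.22, 0.08])))  # [0 1 2]
```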
2. Xu H, Cao D, Li S. A self-regulated generative adversarial network for stock price movement prediction based on the historical price and tweets. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108712]
3. Lu J, Ding J, Dai X, Chai T. Ensemble Stochastic Configuration Networks for Estimating Prediction Intervals: A Simultaneous Robust Training Algorithm and Its Application. IEEE Trans Neural Netw Learn Syst 2020; 31:5426-5440. [PMID: 32071006] [DOI: 10.1109/tnnls.2020.2967816]
Abstract
Obtaining accurate point predictions of industrial processes' key variables is challenging due to the outliers and noise that are common in industrial data. Prediction intervals (PIs) have therefore been widely adopted to quantify the uncertainty associated with point predictions. To improve prediction accuracy and quantify this uncertainty, this article estimates PIs using ensemble stochastic configuration networks (SCNs) and the bootstrap method. The estimated PIs guarantee both modeling stability and computational efficiency. To encourage cooperation among the base SCNs and improve the robustness of the ensemble when the training data are contaminated with noise and outliers, a simultaneous robust training method for the ensemble SCNs is developed based on Bayesian ridge regression and M-estimation. Moreover, the hyperparameters of the assumed distributions over the noise and the output weights of the ensemble SCNs are estimated by the expectation-maximization (EM) algorithm, which yields optimal PIs and better prediction accuracy. Finally, the performance of the proposed approach is evaluated on three benchmark data sets and a real-world data set collected from a refinery. The experimental results demonstrate that the proposed approach performs better in terms of PI quality, prediction accuracy, and robustness.
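The bootstrap side of this construction is standard and easy to sketch. In the sketch below, `MLPRegressor` stands in for the paper's stochastic configuration networks (which have no common library implementation), and the interval combines between-member variance with a crude residual noise estimate; the paper's robust Bayesian/EM training is not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)   # noisy target

# Train B base learners on bootstrap resamples of the training data.
models = []
for b in range(20):
    Xb, yb = resample(X, y, random_state=b)
    models.append(MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                               random_state=b).fit(Xb, yb))

# Ensemble mean = point forecast; member spread plus a residual noise
# estimate gives an approximate 95% prediction interval.
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
preds = np.stack([m.predict(X_test) for m in models])     # (B, n_test)
point = preds.mean(axis=0)
model_var = preds.var(axis=0, ddof=1)                     # model uncertainty
train_mean = np.stack([m.predict(X) for m in models]).mean(axis=0)
noise_var = np.mean((train_mean - y) ** 2)                # crude noise estimate
half_width = 1.96 * np.sqrt(model_var + noise_var)
lower, upper = point - half_width, point + half_width
```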
4. Li W, Li M, Zhang J, Qiao J. Design of a self-organizing reciprocal modular neural network for nonlinear system modeling. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.056]
5. Li W, Li M, Qiao J, Guo X. A feature clustering-based adaptive modular neural network for nonlinear system modeling. ISA Trans 2020; 100:185-197. [PMID: 31767196] [DOI: 10.1016/j.isatra.2019.11.015]
Abstract
To improve the performance of nonlinear system modeling, this study proposes a feature clustering-based adaptive modular neural network (FC-AMNN) that simulates the information-processing mechanism of the human brain, in which different information is processed by different modules in parallel. First, features are clustered using an adaptive feature clustering algorithm, and the number of modules in FC-AMNN is determined automatically by the number of feature clusters. The features in each cluster are then allocated to the corresponding module. A self-constructive RBF neural network based on the error-correction algorithm is adopted as the subnetwork to learn the allocated features. All modules work in parallel and are finally integrated with a Bayesian method to obtain the output. To demonstrate the effectiveness of the proposed model, FC-AMNN is tested on several UCI benchmark problems as well as a practical problem in the wastewater treatment process. The experimental results show that FC-AMNN achieves better generalization performance and more accurate results for nonlinear system modeling than other modular neural networks.
Affiliation(s)
- Wenjing Li, Meng Li, Junfei Qiao, Xin Guo: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
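A loose sketch of the modular pipeline, under stated substitutions: k-means over feature correlation profiles replaces the paper's adaptive feature clustering, an MLP replaces the self-constructive RBF subnetwork, and a plain average replaces the Bayesian integration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(1)
X = rng.normal(size=(400, 12))
y = np.sin(X[:, 0]) + 0.5 * X[:, 4] ** 2 + rng.normal(scale=0.1, size=400)

# 1) Cluster the features (columns), each described by its correlation
#    profile with the other features; the paper derives the cluster
#    count adaptively, here it is fixed.
profiles = np.corrcoef(X.T)
n_modules = 3
labels = KMeans(n_clusters=n_modules, n_init=10,
                random_state=0).fit_predict(profiles)

# 2) One subnetwork per feature cluster, trained on its own feature subset.
modules = []
for c in range(n_modules):
    cols = np.where(labels == c)[0]
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=c)
    modules.append((cols, net.fit(X[:, cols], y)))

# 3) Integrate the module outputs (plain averaging here).
y_hat = np.mean([net.predict(X[:, cols]) for cols, net in modules], axis=0)
print("train MSE:", np.mean((y_hat - y) ** 2))
```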
7. Hosni M, Abnane I, Idri A, Carrillo de Gea JM, Fernández Alemán JL. Reviewing ensemble classification methods in breast cancer. Comput Methods Programs Biomed 2019; 177:89-112. [PMID: 31319964] [DOI: 10.1016/j.cmpb.2019.05.019]
Abstract
CONTEXT: Ensemble methods combine more than one single technique to solve the same task. This approach was designed to overcome the weaknesses of single techniques and consolidate their strengths. Ensemble methods are now widely used for prediction tasks (e.g., classification and regression) in several fields, including bioinformatics. Researchers have particularly begun to employ ensemble techniques in breast cancer research, as this is the most frequent type of cancer and accounts for the most deaths among women.
OBJECTIVE AND METHOD: The goal of this study is to analyse the state of the art in ensemble classification methods applied to breast cancer with regard to nine aspects: publication venues, medical tasks tackled, empirical and research types adopted, types of ensembles proposed, single techniques used to construct the ensembles, validation frameworks adopted to evaluate the proposed ensembles, tools used to build the ensembles, and optimization methods used for the single techniques. The work was undertaken as a systematic mapping study.
RESULTS: A total of 193 papers published from the year 2000 onwards were selected from four online databases: IEEE Xplore, ACM Digital Library, Scopus, and PubMed. Of the six medical tasks that exist, diagnosis was the most frequently researched, and the experiment-based empirical type and evaluation-based research type were the dominant approaches in the selected studies. The homogeneous ensemble type was the most widely used for the classification task. Regarding single techniques, decision trees, support vector machines, and artificial neural networks were those most frequently adopted to build ensemble classifiers. As for the evaluation framework, the Wisconsin Breast Cancer dataset was the most frequently used for experiments, and the most common validation method was k-fold cross-validation. Several tools are available for experiments on ensemble classification, such as Weka and the R software. Few researchers considered optimising the single techniques of which their proposed ensembles were composed; grid search was the method most frequently adopted to tune the parameter settings of a single classifier.
CONCLUSION: This paper reports an in-depth study of the application of ensemble methods to breast cancer. Our results show that there are several gaps and issues, and we therefore provide researchers in the field with recommendations. Moreover, the majority of the analysed papers report positive results concerning the accuracy of ensemble classifiers compared with single classifiers. To aggregate the evidence reported in the literature, a systematic literature review and meta-analysis with an in-depth analysis will be necessary to confirm the superiority of ensemble classifiers over classical techniques.
Affiliation(s)
- Mohamed Hosni, Ibtissam Abnane, Ali Idri: Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco
- Juan M. Carrillo de Gea: Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain
10. Chen H, Jiang B, Yao X. Semisupervised Negative Correlation Learning. IEEE Trans Neural Netw Learn Syst 2018; 29:5366-5379. [PMID: 29994737] [DOI: 10.1109/tnnls.2017.2784814]
Abstract
Negative correlation learning (NCL) is an ensemble learning algorithm that introduces a correlation penalty term into the cost function of each individual ensemble member, so that each member minimizes its mean square error together with its error correlation with the rest of the ensemble. This paper analyzes NCL and reveals that adopting a negative correlation term for unlabeled data is beneficial for improving model performance in the semisupervised learning (SSL) setting. We then propose a novel SSL algorithm, semisupervised NCL (SemiNCL), which considers negative correlation terms for both labeled and unlabeled data. To reduce the computational and memory complexity, an accelerated SemiNCL is derived from the distributed least squares algorithm. In addition, we derive a bound on two parameters in SemiNCL from an analysis of the Hessian matrix of the error function. The new algorithm is evaluated in extensive experiments with various ratios of labeled to unlabeled training data. Comparisons with other state-of-the-art supervised and semisupervised algorithms confirm that SemiNCL achieves the best overall performance.
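The NCL cost that SemiNCL builds on is compact enough to show directly. A sketch for regression outputs follows; the simplification in the docstring holds because the deviations from the ensemble mean sum to zero, and SemiNCL's additional unlabeled-data term is not shown.

```python
import numpy as np

def ncl_member_costs(preds, y, lam=0.5):
    """Per-member NCL cost: own MSE plus lam times the correlation penalty
    p_i = (f_i - f_bar) * sum_{j != i} (f_j - f_bar).
    Since the deviations from the ensemble mean f_bar sum to zero, the
    penalty simplifies to -(f_i - f_bar)^2: members are pushed apart."""
    f_bar = preds.mean(axis=0)
    costs = []
    for f_i in preds:
        mse = np.mean((f_i - y) ** 2)
        penalty = -np.mean((f_i - f_bar) ** 2)
        costs.append(mse + lam * penalty)
    return np.array(costs)

rng = np.random.RandomState(0)
y = rng.normal(size=100)
preds = y + rng.normal(scale=0.3, size=(5, 100))   # five toy members
print(ncl_member_costs(preds, y))
```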
12. Sheng W, Shan P, Chen S, Liu Y, Alsaadi FE. A niching evolutionary algorithm with adaptive negative correlation learning for neural network ensemble. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.03.055]
13. Lv F, Yang G, Yang W, Zhang X, Li K. The convergence and termination criterion of quantum-inspired evolutionary neural networks. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.048]
14. Muzhou H, Taohua L, Yunlei Y, Hao Z, Hongjuan L, Xiugui Y, Xinge L. A new hybrid constructive neural network method for impacting and its application on tungsten price prediction. Appl Intell 2017. [DOI: 10.1007/s10489-016-0882-z]
16. Fernández JC, Cruz-Ramírez M, Hervás-Martínez C. Sensitivity versus accuracy in ensemble models of Artificial Neural Networks from Multi-objective Evolutionary Algorithms. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2781-y]
17. Dhaliwal BS, Pattnaik SS. BFO–ANN ensemble hybrid algorithm to design compact fractal antenna for rectenna system. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2402-9]
18. Feitosa Neto AA, Canuto AMP, Dantas CA. Multiobjective Optimization Techniques for Selecting Important Metrics in the Design of Ensemble Systems. Comput Intell 2016. [DOI: 10.1111/coin.12090]
Affiliation(s)
- Antonino A. Feitosa Neto, Anne M. P. Canuto, Carine A. Dantas: Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte (UFRN), Natal, Brazil
19. Rahman MM, Islam MM, Murase K, Yao X. Layered Ensemble Architecture for Time Series Forecasting. IEEE Trans Cybern 2016; 46:270-283. [PMID: 25751882] [DOI: 10.1109/tcyb.2015.2401038]
Abstract
Time series forecasting (TSF) has been widely used in many application areas such as science, engineering, and finance. The phenomena generating time series are usually unknown, and the information available for forecasting is limited to the past values of the series. It is therefore necessary to use an appropriate number of past values, termed the lag, for forecasting. This paper proposes a layered ensemble architecture (LEA) for TSF problems. LEA consists of two layers, each of which uses an ensemble of multilayer perceptron (MLP) networks. The first ensemble layer tries to find an appropriate lag, and the second employs the obtained lag for forecasting. Unlike most previous work on TSF, the proposed architecture considers both the accuracy and the diversity of the individual networks when constructing an ensemble. LEA trains the different networks in the ensemble on different training sets with the aim of maintaining diversity among them, while it uses the appropriate lag and combines the best trained networks, reflecting LEA's emphasis on accuracy. The proposed architecture has been tested extensively on time series data from the NN3 and NN5 competitions, as well as on several standard benchmark time series. In terms of forecasting accuracy, our experimental results clearly show that LEA is better than other ensemble and non-ensemble methods.
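A toy rendering of the two-stage idea, with a single MLP per candidate lag standing in for the first ensemble layer and a small ensemble in the second stage; the lags, network sizes, and validation split are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(series, lag):
    """Turn a 1-D series into (X, y) pairs using `lag` past values."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    return X, series[lag:]

rng = np.random.RandomState(0)
series = np.sin(np.arange(500) * 0.2) + rng.normal(scale=0.1, size=500)
train, val = series[:400], series[400:]

# First stage: score candidate lags on held-out data.
scores = {}
for lag in (2, 4, 8, 16):
    Xtr, ytr = make_lagged(train, lag)
    Xva, yva = make_lagged(val, lag)
    m = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(Xtr, ytr)
    scores[lag] = np.mean((m.predict(Xva) - yva) ** 2)
best_lag = min(scores, key=scores.get)

# Second stage: an ensemble of MLPs forecasts with the selected lag.
Xtr, ytr = make_lagged(train, best_lag)
ensemble = [MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=s).fit(Xtr, ytr) for s in range(5)]
Xva, yva = make_lagged(val, best_lag)
forecast = np.mean([m.predict(Xva) for m in ensemble], axis=0)
print("best lag:", best_lag, "val MSE:", np.mean((forecast - yva) ** 2))
```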
20. López-Yáñez I, Sheremetov L, Yáñez-Márquez C. A novel associative model for time series data mining. Pattern Recognit Lett 2014. [DOI: 10.1016/j.patrec.2013.11.008]
21. Xi L, Muzhou H, Lee MH, Li J, Wei D, Hai H, Wu Y. A new constructive neural network method for noise processing and its application on stock market prediction. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2013.10.013]
22. Teso S, Passerini A. Joint probabilistic-logical refinement of multiple protein feature predictors. BMC Bioinformatics 2014; 15:16. [PMID: 24428894] [PMCID: PMC3929554] [DOI: 10.1186/1471-2105-15-16]
Abstract
Background: Computational methods for the prediction of protein features from sequence are a long-standing focus of bioinformatics. A key observation is that several protein features are closely inter-related, that is, conditioned on each other. Researchers have invested a great deal of effort in designing predictors that exploit this fact. Most existing methods leverage inter-feature constraints by including known (or predicted) correlated features as inputs to the predictor, thus conditioning the result.
Results: By including correlated features as inputs, existing methods rely on only one side of the relation: the output feature is conditioned on the known input features. Here we show how to jointly improve the outputs of multiple correlated predictors by means of a probabilistic-logical consistency layer. The logical layer enforces a set of weighted first-order rules encoding biological constraints between the features and improves the raw predictions so that they least violate the constraints. In particular, we show how to integrate three stand-alone predictors of correlated features: subcellular localization (Loctree [J Mol Biol 348:85–100, 2005]), disulfide bonding state (Disulfind [Nucleic Acids Res 34:W177–W181, 2006]), and metal bonding state (MetalDetector [Bioinformatics 24:2094–2095, 2008]), in a way that takes into account their respective strengths and weaknesses and does not require any change to the predictors themselves. We also compare our methodology against two alternative refinement pipelines based on state-of-the-art sequential prediction methods.
Conclusions: The proposed framework improves the performance of the underlying predictors by removing rule violations. We show that different predictors offer complementary advantages, and our method integrates them using non-trivial constraints, generating more consistent predictions. In addition, our framework is fully general and could in principle be applied to a vast array of heterogeneous predictions without requiring any change to the underlying software. The alternative strategies, by contrast, are more specific and tend to favor one task at the expense of the others, as shown by our experimental evaluation. The ultimate goal of our framework is to seamlessly integrate full prediction suites, such as Distill [BMC Bioinformatics 7:402, 2006] and PredictProtein [Nucleic Acids Res 32:W321–W326, 2004].
Affiliation(s)
- Stefano Teso: Department of Information Engineering and Computer Science, Università degli Studi di Trento, Trento, Italy
23. Ludwig O, Nunes U, Ribeiro B, Premebida C. Improving the generalization capacity of cascade classifiers. IEEE Trans Cybern 2013; 43:2135-2146. [PMID: 23757522] [DOI: 10.1109/tcyb.2013.2240678]
Abstract
The cascade classifier is a usual approach in vision-based object detection, since it successively rejects negative occurrences, e.g., background images, in a cascade structure, keeping the processing time suitable for on-the-fly applications. On the other hand, like other classifier ensembles, cascade classifiers are likely to have a high Vapnik-Chervonenkis (VC) dimension, which may lead to overfitting the training data. This work therefore aims at improving the generalization capacity of the cascade classifier by controlling its complexity, which depends on the model of its classifier stages, the number of stages, and the feature-space dimension of each stage. The latter is controlled by integrating the parameter setting of the feature extractor (in our case an image descriptor) into the maximum-margin framework of support vector machine training, as shown in this paper. Moreover, to set the number of cascade stages, bounds on the false positive rate (FP) and the true positive rate (TP) of cascade classifiers are derived from a VC-style analysis. These bounds are used to compose an enveloping receiver operating curve (EROC), i.e., a new curve in TP–FP space in which each point is an ordered pair of an upper bound on the FP and a lower bound on the TP. The optimal number of cascade stages is forecast by comparing the EROCs of cascades with different numbers of stages.
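The cost-saving structure is easy to see in code. Here is a stripped-down cascade with linear SVM stages and hand-picked rejection thresholds; the paper's descriptor-parameter optimization and VC-style bounds are not reproduced, and the thresholds are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Train each stage only on the samples that earlier stages did not reject;
# conservative thresholds let true positives survive to later stages.
stage_thresholds = [-0.5, -0.25, 0.0]     # illustrative cut-offs
stages = []
alive = np.ones(len(y), dtype=bool)
for thr in stage_thresholds:
    clf = LinearSVC(dual=False).fit(X[alive], y[alive])
    stages.append((clf, thr))
    alive[alive] = clf.decision_function(X[alive]) >= thr

def cascade_predict(x):
    """Early rejection is what keeps the average inference cost low."""
    for clf, thr in stages:
        if clf.decision_function(x.reshape(1, -1))[0] < thr:
            return 0                      # rejected: stop immediately
    return 1                              # accepted by every stage

acc = np.mean([cascade_predict(xi) == yi for xi, yi in zip(X, y)])
print("training accuracy:", acc)
```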
24. Lu TC, Yu GR, Juang JC. Quantum-based algorithm for optimizing artificial neural networks. IEEE Trans Neural Netw Learn Syst 2013; 24:1266-1278. [PMID: 24808566] [DOI: 10.1109/tnnls.2013.2249089]
Abstract
This paper presents a quantum-based algorithm for evolving artificial neural networks (ANNs). The aim is to design an ANN with few connections and high classification performance by simultaneously optimizing the network structure and the connection weights. Unlike most previous studies, the proposed algorithm uses a quantum-bit representation to codify the network. As a result, the connectivity bits do not indicate the actual links but the probability that each connection exists, thus alleviating mapping problems and reducing the risk of throwing away a potential candidate. In addition, each weight space in the proposed model is decomposed into subspaces in terms of quantum bits, so the algorithm performs a region-by-region exploration and evolves gradually to find promising subspaces for further exploitation. This helps provide a set of appropriate weights when evolving the network structure and alleviates the noisy fitness-evaluation problem. The proposed model is tested on four benchmark problems: breast cancer, iris, heart, and diabetes. The experimental results show that the proposed algorithm can produce compact ANN structures with good generalization ability compared with other algorithms.
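A heavily simplified illustration of the quantum-bit idea: each connection is an angle whose squared sine gives the existence probability, the network is "observed" (collapsed) for fitness evaluation, and a rotation-style update nudges the angles toward a good observation. The step size and update rule here are schematic, not the paper's.

```python
import numpy as np

rng = np.random.RandomState(0)

# One quantum-inspired individual: an angle per potential connection;
# sin(theta)^2 is the probability that the link exists when observed.
n_links = 10
theta = np.full(n_links, np.pi / 4)      # start at 50% existence probability

def observe(theta, rng):
    prob = np.sin(theta) ** 2
    return rng.uniform(size=theta.shape) < prob   # boolean connectivity mask

# Evolution nudges the angles toward the best observed connectivity
# (a rotation-gate update in simplified form).
best_mask = observe(theta, rng)
delta = 0.05 * np.pi
theta += np.where(best_mask, delta, -delta)
print(observe(theta, rng))
```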
25. Sheng C, Zhao J, Wang W, Leung H. Prediction intervals for a noisy nonlinear time series based on a bootstrapping reservoir computing network ensemble. IEEE Trans Neural Netw Learn Syst 2013; 24:1036-1048. [PMID: 24808519] [DOI: 10.1109/tnnls.2013.2250299]
Abstract
Prediction intervals, which provide estimated values together with their corresponding reliability, are applied to nonlinear time series forecasting. However, constructing reliable prediction intervals for noisy time series remains a challenge. In this paper, a bootstrapping reservoir computing network ensemble (BRCNE) is proposed, and a simultaneous training method based on Bayesian linear regression is developed. In addition, the structural parameters of the BRCNE, that is, the number of reservoir computing networks and the reservoir dimension, are determined off-line by 0.632 bootstrap cross-validation. To verify the effectiveness of the proposed method, two kinds of time series data are employed: the multi-superimposed oscillator problem with additive noise and a practical gas flow from the steel industry. The experimental results indicate that the proposed approach performs satisfactorily on prediction intervals for practical applications.
28. Akhand MAH, Murase K. Ensembles of neural networks based on the alteration of input feature values. Int J Neural Syst 2012; 22:77-87. [DOI: 10.1142/s0129065712003079]
Abstract
An ensemble performs well when its component classifiers are diverse yet accurate, so that the failure of one is compensated for by the others. A number of methods for constructing ensembles have been investigated, some of which train classifiers on generated patterns. This study investigates a new technique of training pattern generation: the method alters the input feature values of some patterns using the values of other patterns, generating different patterns for different classifiers. The effectiveness of neural network ensembles based on the proposed technique was evaluated on a suite of 25 benchmark classification problems and found to be better than or competitive with related conventional methods. Experimental investigation of different input-value alteration techniques finds that alteration using pattern values from the same class generalizes best, although other alteration techniques may offer more diversity.
Affiliation(s)
- M. A. H. Akhand: Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna 9203, Bangladesh
- K. Murase: Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan
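The generation step is simple to sketch. The helper below is hypothetical but follows the described recipe: for each altered pattern, some feature values are overwritten with those of another pattern from the same class, and each ensemble member gets its own altered copy of the data.

```python
import numpy as np

def altered_training_set(X, y, rng, alter_fraction=0.3):
    """One classifier-specific training set: for a random subset of
    patterns, overwrite some feature values with those of another
    pattern from the same class (the variant found to generalize best)."""
    X_new = X.copy()
    for i in rng.choice(len(X), size=int(alter_fraction * len(X)),
                        replace=False):
        donor = rng.choice(np.where(y == y[i])[0])
        cols = rng.choice(X.shape[1], size=max(1, X.shape[1] // 3),
                          replace=False)
        X_new[i, cols] = X[donor, cols]
    return X_new

rng = np.random.RandomState(0)
X, y = rng.normal(size=(100, 8)), rng.randint(0, 2, size=100)
# Each ensemble member trains on a differently altered copy of the data.
member_sets = [altered_training_set(X, y, np.random.RandomState(s))
               for s in range(5)]
```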
30. García-Pedrajas N, de Haro-García A. Scaling up data mining algorithms: review and taxonomy. Prog Artif Intell 2012. [DOI: 10.1007/s13748-011-0004-4]
31. Gil-Pita R, Yao X. Evolving edited k-nearest neighbour classifiers. Int J Neural Syst 2008; 18:459-467.
Abstract
The k-nearest neighbor method is a classifier based on evaluating the distances to each pattern in the training set. The edited version of this method applies the classifier to a subset of the complete training set from which some training patterns have been excluded, in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns to include in the edited subset. In this paper we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes a novel mean-square-error-based fitness function, a novel clustered crossover technique, and a fast smart mutation scheme. To evaluate the performance of the proposed method, results on the breast cancer, diabetes, and letter recognition databases from the UCI machine learning benchmark repository are included. Both error rate and computational cost are considered in the analysis. The results show the improvement achieved by the proposed editing method.
Affiliation(s)
- Roberto Gil-Pita: Signal Theory and Communications Department, University of Alcalá, Alcalá de Henares, Madrid 28805, Spain
- Xin Yao: The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom; Nature Inspired Computation and Applications Laboratory (NICAL), Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
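A compact sketch of the overall loop, under simplifying substitutions: truncation selection, plain uniform crossover, and validation accuracy as fitness stand in for the paper's clustered crossover, fast smart mutation, and MSE-based fitness function.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
rng = np.random.RandomState(0)

def fitness(mask):
    """Validation accuracy of 3-NN restricted to the edited subset."""
    if mask.sum() < 5:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr[mask], y_tr[mask])
    return knn.score(X_va, y_va)

pop = rng.rand(20, len(X_tr)) < 0.5          # inclusion bitmasks
for gen in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]  # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.randint(10)], parents[rng.randint(10)]
        child = np.where(rng.rand(len(a)) < 0.5, a, b)   # uniform crossover
        child ^= rng.rand(len(child)) < 0.01             # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])
print("best edited-subset accuracy:", max(fitness(ind) for ind in pop))
```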
32. Akhand MAH, Islam MM, Murase K. A comparative study of data sampling techniques for constructing neural network ensembles. Int J Neural Syst 2009; 19:67-89. [DOI: 10.1142/s0129065709001859]
Abstract
Ensembles of several classifiers (such as neural networks or decision trees) are widely used to improve generalization performance over a single classifier. Proper diversity among the component classifiers is considered an important requirement for ensemble construction, so that the failure of one may be compensated for by others. Among various approaches, data sampling, i.e., using different data sets for different classifiers, has been found more effective than other approaches. A number of ensemble methods have been proposed under the umbrella of data sampling, some of which are constrained to neural networks or decision trees while others apply to both types of classifiers. We studied prominent data sampling techniques for neural network ensembles and experimentally evaluated their effectiveness on a common test ground. The relation between generalization and diversity is presented in terms of overlap and uncover. Eight ensemble methods were tested on 30 benchmark classification problems. We found that bagging and boosting, the pioneering ensemble methods, are still better than most of the other proposed methods. However, negative correlation learning, which implicitly encourages different networks toward different training spaces, is shown to be better than, or at least comparable to, bagging and boosting, which explicitly create different training spaces.
Affiliation(s)
- M. A. H. Akhand, Md. Monirul Islam, Kazuyuki Murase: Graduate School of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan
- Kazuyuki Murase: Research and Education Program for Life Science, University of Fukui, 3-9-1 Bunkyo, Fukui 910-8507, Japan
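Bagging, the baseline this study finds still hard to beat, fits in a few lines: each member sees a bootstrap resample of the data, and predictions are combined by majority vote. A minimal sketch with MLP members:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: each network sees a bootstrap resample (drawn with replacement),
# so the members differ through their training data alone.
members = []
for b in range(7):
    Xb, yb = resample(X, y, random_state=b)
    members.append(MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                 random_state=b).fit(Xb, yb))

# Majority vote over the member predictions.
votes = np.stack([m.predict(X) for m in members])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble training accuracy:", np.mean(ensemble_pred == y))
```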
34. Pendashteh AR, Fakhru'l-Razi A, Chaibakhsh N, Abdullah LC, Madaeni SS, Abidin ZZ. Modeling of membrane bioreactor treating hypersaline oily wastewater by artificial neural network. J Hazard Mater 2011; 192:568-575. [PMID: 21676540] [DOI: 10.1016/j.jhazmat.2011.05.052]
Abstract
A membrane sequencing batch reactor (MSBR) treating hypersaline oily wastewater was modeled by an artificial neural network (ANN). The MSBR operated at different total dissolved solids (TDS) levels (35,000; 50,000; 100,000; 150,000; 200,000; and 250,000 mg/L), various organic loading rates (OLRs) (0.281, 0.563, 1.124, 2.248, and 3.372 kg COD/(m³ day)), and cyclic times (12, 24, and 48 h). A feed-forward neural network trained by a batch back-propagation algorithm was employed to model the MSBR. A set of 193 operational data points from wastewater treatment with the MSBR was used to train the network. The training, validation, and testing procedures for the effluent COD, total organic carbon (TOC), and oil and grease (O&G) concentrations were successful, and a good correlation was observed between measured and predicted values. The results showed that at an OLR of 2.44 kg COD/(m³ day), a TDS of 78,000 mg/L, and a reaction time (RT) of 40 h, the average removal rate of COD was 98%. Under these conditions, the average effluent COD concentration was less than 100 mg/L and met the discharge limits.
Affiliation(s)
- Ali Reza Pendashteh: Department of Chemical and Environmental Engineering, Faculty of Engineering, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor D.E., Malaysia
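In outline, the modeling task is a small multi-output regression. The sketch below uses synthetic stand-in data with the abstract's input ranges (TDS, OLR, cycle time) and outputs (COD, TOC, O&G); the generating equations are invented for illustration only and have no relation to the paper's measurements.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n = 193   # same record count as the study, but synthetic values
X = np.column_stack([rng.uniform(35_000, 250_000, n),   # TDS (mg/L)
                     rng.uniform(0.28, 3.37, n),        # OLR (kg COD/(m^3 day))
                     rng.choice([12, 24, 48], n)])      # cycle time (h)
# Invented response surfaces standing in for effluent COD, TOC, O&G.
Y = np.column_stack([100 + 0.0005 * X[:, 0] + 30 * X[:, 1] - X[:, 2],
                     40 + 0.0002 * X[:, 0] + 10 * X[:, 1],
                     5 + 0.5 * X[:, 1]]) + rng.normal(scale=5, size=(n, 3))

scaler = StandardScaler().fit(X)
model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                     random_state=0).fit(scaler.transform(X), Y)
print("R^2 on training data:", model.score(scaler.transform(X), Y))
```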
35. Akhand MAH, Shill PC, Murase K. Hybrid Ensemble Construction with Selected Neural Networks. J Adv Comput Intell Intell Inform 2011. [DOI: 10.20965/jaciii.2011.p0652]
Abstract
A neural network ensemble (NNE) is a convenient way to improve performance on classification tasks. Among the remarkable number of methods based on different techniques for constructing NNEs, negative correlation learning (NCL), bagging, and boosting are the most popular; none of them, however, shows better performance on all problems. To improve performance by combining the complementary strengths of the individual methods, we propose two different ways of constructing hybrid ensembles that combine NCL with bagging and boosting. One way produces a pool of a predefined number of networks using standard NCL and bagging (or boosting), and then uses a genetic algorithm to select an optimal subset of networks for the NNE from the pool. Experiments confirmed that our proposals consistently perform better, with more concise ensembles, than conventional methods when tested on a suite of 25 benchmark problems.
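The pool-then-select pattern can be illustrated with a greedy stand-in for the paper's genetic algorithm: build a pool (here by bagging alone, for brevity), then grow the ensemble while held-out accuracy improves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=600, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=1)

# Candidate pool from bagging (the paper's pool mixes NCL with
# bagging- or boosting-trained networks).
pool = []
for b in range(10):
    Xb, yb = resample(X_tr, y_tr, random_state=b)
    pool.append(MLPClassifier(hidden_layer_sizes=(12,), max_iter=1000,
                              random_state=b).fit(Xb, yb))
val_preds = np.stack([m.predict(X_va) for m in pool])

def ensemble_acc(idx):
    vote = val_preds[idx].mean(axis=0) >= 0.5    # majority vote
    return np.mean(vote.astype(int) == y_va)

# Greedy forward selection (a stand-in for the paper's GA): keep adding
# the member that helps most, stop when validation accuracy stalls.
selected, best = [], 0.0
while len(selected) < len(pool):
    acc, j = max((ensemble_acc(selected + [j]), j)
                 for j in range(len(pool)) if j not in selected)
    if acc <= best:
        break
    selected.append(j)
    best = acc
print("selected members:", selected, "validation accuracy:", best)
```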
36. Khosravi A, Nahavandi S, Creighton D, Atiya AF. Comprehensive review of neural network-based prediction intervals and new advances. IEEE Trans Neural Netw 2011; 22:1341-1356. [PMID: 21803683] [DOI: 10.1109/tnn.2011.2162110]
Abstract
This paper evaluates the four leading techniques proposed in the literature for constructing prediction intervals (PIs) for neural network point forecasts. The delta, Bayesian, bootstrap, and mean-variance estimation (MVE) methods are reviewed, and their performance in generating high-quality PIs is compared. PI-based measures are proposed and applied for the objective and quantitative assessment of each method's performance. A selection of 12 synthetic and real-world case studies is used to examine each method's performance for PI construction. The comparison is performed on the basis of the quality of the generated PIs, the repeatability of the results, the computational requirements, and the variability of the PIs with regard to data uncertainty. The results indicate that (1) the delta and Bayesian methods are best in terms of quality and repeatability, and (2) the MVE and bootstrap methods are best in terms of low computational load and low PI width variability. This paper also introduces the concept of combining PIs and proposes a new method for generating combined PIs from the traditional ones. A genetic algorithm is applied to adjust the combiner parameters by minimizing a PI-based cost function subject to two sets of restrictions. It is shown that the quality of the PIs produced by the combiners is dramatically better than that of the PIs obtained from each individual method.
Affiliation(s)
- Abbas Khosravi: Centre for Intelligent Systems Research, Deakin University, Geelong, Victoria 3117, Australia
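Two of the PI-quality measures used in this literature, coverage and normalized width, are simple to state. The names PICP and PINAW are as commonly used in these papers; the demo data below are synthetic.

```python
import numpy as np

def picp(y, lower, upper):
    """PI coverage probability: fraction of targets inside the interval."""
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    """PI normalized average width: mean PI width over the target range."""
    return np.mean(upper - lower) / (y.max() - y.min())

# A good PI method scores high PICP with low PINAW; combined cost
# functions in this literature trade the two off.
rng = np.random.RandomState(0)
y = rng.normal(size=200)
point = y + rng.normal(scale=0.5, size=200)      # noisy point forecasts
lower, upper = point - 1.0, point + 1.0          # fixed-width intervals
print("PICP:", picp(y, lower, upper), "PINAW:", pinaw(y, lower, upper))
```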
37. A new data mining scheme using artificial neural networks. Sensors 2011; 11:4622-4647. [PMID: 22163866] [PMCID: PMC3231400] [DOI: 10.3390/s110504622]
Abstract
Classification is one of the data mining problems receiving enormous attention in the database community. Although artificial neural networks (ANNs) have been successfully applied in a wide range of machine learning applications, they are often regarded as black boxes whose predictions cannot be explained. To enhance the explainability of ANNs, this paper proposes a novel algorithm to extract symbolic rules from trained ANNs. ANN methods have not been effectively utilized for data mining tasks because how their classifications are made is not explicitly stated as symbolic rules suitable for verification or interpretation by human experts. With the proposed approach, concise, easily explainable symbolic rules with high accuracy can be extracted from trained ANNs. The extracted rules are comparable with those of other methods in terms of the number of rules, the average number of conditions per rule, and accuracy. The effectiveness of the proposed approach is clearly demonstrated by experimental results on a set of benchmark data mining classification problems.
38. Castro PAD, Von Zuben FJ. Learning Ensembles of Neural Networks by Means of a Bayesian Artificial Immune System. IEEE Trans Neural Netw 2011; 22:304-316. [DOI: 10.1109/tnn.2010.2096823]
39. Araújo RDA. Hybrid intelligent methodology to design translation invariant morphological operators for Brazilian stock market prediction. Neural Netw 2010; 23:1238-1251. [DOI: 10.1016/j.neunet.2010.06.007]
40. Wang B, Chiang HD. ELITE: ensemble of optimal input-pruned neural networks using TRUST-TECH. IEEE Trans Neural Netw 2011; 22:96-109. [PMID: 21075722] [DOI: 10.1109/tnn.2010.2087354]
Abstract
The ensemble of optimal input-pruned neural networks using TRUST-TECH (ELITE) method is developed for constructing a high-quality ensemble through an optimal linear combination of accurate and diverse neural networks. The optimization problems in the proposed methodology are solved by a global optimization method called TRansformation Under STability-reTaining Equilibrium CHaracterization (TRUST-TECH), whose main feature is its capability to identify multiple local optimal solutions in a deterministic, systematic, tier-by-tier manner. ELITE creates a diverse population via a feature selection procedure applied to the different locally optimal neural networks obtained by the tier-1 TRUST-TECH search. In addition, the capability of each input-pruned network is fully exploited through TRUST-TECH-based optimal training. Finally, finding the optimal linear combination weights for an ensemble is modeled as a nonlinear programming problem and solved using TRUST-TECH and the interior point method, so that the issue of non-convexity can be handled effectively. Extensive numerical experiments on pattern classification have been carried out on synthetic and benchmark datasets. The results show that ELITE consistently outperforms existing methods on the benchmark datasets and can be very promising for constructing high-quality neural network ensembles.
Affiliation(s)
- Bin Wang: School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
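The final combination step is a small constrained least-squares problem. As a stand-in for the paper's TRUST-TECH/interior-point treatment of the non-convex case, non-negative least squares with normalized weights gives the flavor; the data here are synthetic.

```python
import numpy as np
from scipy.optimize import nnls

# member_preds: validation predictions from accurate but diverse members
# (columns); y: targets. Members with larger noise should get less weight.
rng = np.random.RandomState(0)
y = rng.normal(size=150)
member_preds = y[:, None] + rng.normal(scale=[[0.2, 0.5, 0.3, 0.8]],
                                       size=(150, 4))

w, _ = nnls(member_preds, y)      # non-negative combination weights
w /= w.sum()                      # normalize to a convex mixture
print("combination weights:", np.round(w, 3))
print("ensemble MSE:", np.mean((member_preds @ w - y) ** 2))
```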
41. Monirul Kabir M, Monirul Islam M, Murase K. A new wrapper feature selection approach using neural network. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2010.04.003]
46. Chen H, Yao X. Regularized negative correlation learning for neural network ensembles. IEEE Trans Neural Netw 2009; 20:1962-1979.
Abstract
Negative correlation learning (NCL) is a neural network ensemble learning algorithm that introduces a correlation penalty term into the cost function of each individual network, so that each network minimizes its mean square error (MSE) together with the correlation of the ensemble. This paper analyzes NCL and reveals that training with NCL (when λ = 1) corresponds to training the entire ensemble as a single learning machine that minimizes only the MSE, without regularization. This analysis explains why NCL is prone to overfitting noise in the training set. The paper also demonstrates that tuning the correlation parameter λ in NCL by cross-validation cannot overcome the overfitting problem. It analyzes this problem and proposes the regularized negative correlation learning (RNCL) algorithm, which incorporates an additional regularization term for the whole ensemble. RNCL decomposes the ensemble's training objectives, including MSE and regularization, into a set of sub-objectives, each implemented by an individual neural network. We also provide a Bayesian interpretation of RNCL and an automatic algorithm to optimize the regularization parameters based on Bayesian inference. The RNCL formulation is applicable to any nonlinear estimator minimizing the MSE. Experiments on synthetic as well as real-world data sets demonstrate that RNCL achieves better performance than NCL, especially when the noise level in the data set is nontrivial.
Affiliation(s)
- Huanhuan Chen: The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Birmingham, UK
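The whole-ensemble objective that RNCL decomposes is easy to write down: the MSE of the ensemble output plus one weight-decay term per member. A schematic sketch follows (summing the per-member sub-objectives recovers this quantity; the paper's gradient derivation and Bayesian hyperparameter updates are not reproduced).

```python
import numpy as np

def rncl_objective(member_preds, y, weights_sq_norms, alphas):
    """Whole-ensemble view of RNCL: mean squared error of the *ensemble*
    output plus a separate regularization term alpha_i * ||w_i||^2 for
    each member network's weights."""
    f_bar = member_preds.mean(axis=0)                  # ensemble output
    data_term = np.sum((f_bar - y) ** 2)
    reg_term = np.sum(np.asarray(alphas) * np.asarray(weights_sq_norms))
    return data_term + reg_term

# Toy usage with three members and made-up weight norms.
rng = np.random.RandomState(0)
y = rng.normal(size=50)
preds = y + rng.normal(scale=0.3, size=(3, 50))
print(rncl_objective(preds, y, weights_sq_norms=[1.2, 0.8, 2.0],
                     alphas=[0.1, 0.1, 0.1]))
```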
47. Rokach L. Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2009.07.017]
49. Kraipeerapun P, Fung CC. Binary classification using ensemble neural networks and interval neutrosophic sets. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2008.07.017]
50. Amin MF, Islam MM, Murase K. Ensemble of single-layered complex-valued neural networks for classification tasks. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2008.12.028]