1. Altham C, Zhang H, Pereira E. Machine learning for the detection and diagnosis of cognitive impairment in Parkinson's Disease: A systematic review. PLoS One 2024; 19:e0303644. [PMID: 38753740 PMCID: PMC11098383 DOI: 10.1371/journal.pone.0303644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 04/29/2024] [Indexed: 05/18/2024] Open
Abstract
BACKGROUND Parkinson's Disease is the second most common neurological disease in the over-60s. Cognitive impairment is a major clinical symptom, with risk of severe dysfunction up to 20 years post-diagnosis. Processes for detection and diagnosis of cognitive impairments are not sufficient to predict decline at an early stage for significant impact. Ageing populations, neurologist shortages and subjective interpretations reduce the effectiveness of decisions and diagnoses. Researchers are now utilising machine learning for detection and diagnosis of cognitive impairment based on symptom presentation and clinical investigation. This work aims to provide an overview of published studies applying machine learning to detecting and diagnosing cognitive impairment, evaluate the feasibility of implemented methods, their impacts, and provide suitable recommendations for methods, modalities and outcomes.
METHODS To provide an overview of the machine learning techniques, data sources and modalities used for detection and diagnosis of cognitive impairment in Parkinson's Disease, we conducted a review of studies published on the PubMed, IEEE Xplore, Scopus and ScienceDirect databases. Seventy studies were included in this review. From each study, strategy, modalities, sources, methods and outcomes were extracted.
RESULTS The literature demonstrates that machine learning techniques have potential to provide considerable insight into investigation of cognitive impairment in Parkinson's Disease. Our review demonstrates the versatility of machine learning in analysing a wide range of different modalities for the detection and diagnosis of cognitive impairment in Parkinson's Disease, including imaging, EEG, speech and more, yielding notable diagnostic accuracy.
CONCLUSIONS Machine learning based interventions have the potential to glean meaningful insight from data, and may offer non-invasive means of enhancing cognitive impairment assessment, providing clear and formidable potential for implementation of machine learning into clinical practice.
Affiliation(s)
- Callum Altham: Department of Computer Science, Edge Hill University, Ormskirk, Lancashire, United Kingdom
- Huaizhong Zhang: Department of Computer Science, Edge Hill University, Ormskirk, Lancashire, United Kingdom
- Ella Pereira: Department of Computer Science, Edge Hill University, Ormskirk, Lancashire, United Kingdom
2. Fu W, Xue B, Gao X, Zhang M. Genetic Programming for Document Classification: A Transductive Transfer Learning System. IEEE Transactions on Cybernetics 2024; 54:1119-1132. [PMID: 38127617 DOI: 10.1109/tcyb.2023.3338266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Document classification is a challenging task due to the data being high-dimensional and sparse. Many transfer learning methods have been investigated for improving the classification performance by effectively transferring knowledge from a source domain to a target domain, which is similar to but different from the source domain. However, most of the existing methods cannot handle the case in which the training data of the target domain does not have labels. In this study, we propose a transductive transfer learning system, utilizing solutions evolved by genetic programming (GP) on a source domain to automatically pseudolabel the training data in the target domain in order to train classifiers. Different from many other transfer learning techniques, the proposed system pseudolabels target-domain training data to retrain classifiers using all target-domain features. The proposed method is examined on nine transfer learning tasks, and the results show that the proposed transductive GP system has better prediction accuracy on the test data in the target domain than existing transfer learning approaches including subspace alignment-domain adaptation methods, feature-level-domain adaptation methods, and one of the latest pseudolabeling strategy-based methods.
3. Fahmy H, El-Gendy EM, Mohamed M, Saafan MM. ECH3OA: An Enhanced Chimp-Harris Hawks Optimization Algorithm for copyright protection in Color Images using watermarking techniques. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
4. Lovinger J, Valova I. AUTO: supervised learning with full model search and global optimisation. J Exp Theor Artif In 2023. [DOI: 10.1080/0952813x.2023.2165717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Affiliation(s)
- Justin Lovinger: Computer and Information Science Department, University of Massachusetts Dartmouth, North Dartmouth, MA, USA
- Iren Valova: Computer and Information Science Department, University of Massachusetts Dartmouth, North Dartmouth, MA, USA
5. Investigating the influence of survival selection and fitness estimation method in genotype-based surrogate-assisted genetic programming. Artificial Life and Robotics 2022. [DOI: 10.1007/s10015-022-00821-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
6. Murphy A, Ali MS, Mota Dias D, Amaral J, Naredo E, Ryan C. Fuzzy Pattern Tree Evolution Using Grammatical Evolution. SN Computer Science 2022; 3:426. [PMID: 35950192 PMCID: PMC9356967 DOI: 10.1007/s42979-022-01258-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2021] [Accepted: 06/20/2022] [Indexed: 10/31/2022]
Abstract
A novel approach to induce Fuzzy Pattern Trees using Grammatical Evolution is presented in this paper. This new method, called Fuzzy Grammatical Evolution, is applied to a set of benchmark classification problems. Experimental results show that Fuzzy Grammatical Evolution attains similar and oftentimes better results when compared with state-of-the-art Fuzzy Pattern Tree composing methods, namely Fuzzy Pattern Trees evolved using Cartesian Genetic Programming, on a set of benchmark problems. We show that, although Cartesian Genetic Programming produces smaller trees, Fuzzy Grammatical Evolution produces better performing trees. Fuzzy Grammatical Evolution also benefits from a reduction in the number of necessary user-selectable parameters, while Cartesian Genetic Programming requires the selection of three crucial graph parameters before each experiment. To address the issue of bloat, an additional version of Fuzzy Grammatical Evolution using parsimony pressure was tested. The experimental results show that Fuzzy Grammatical Evolution with this extension routinely finds smaller trees than those using Cartesian Genetic Programming without any compromise in performance. To improve the performance of Fuzzy Grammatical Evolution, various ensemble methods were investigated. Boosting was seen to find the best individuals on half the benchmarks investigated.
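The abstract above mentions parsimony pressure as a remedy for bloat but does not give its form. A minimal sketch of one common variant, linear parsimony pressure, is shown here in Python. The `Individual` class, the `alpha` coefficient and the example values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    error: float    # classification error on training data (lower is better)
    tree_size: int  # number of nodes in the evolved tree


def penalised_fitness(ind: Individual, alpha: float = 0.001) -> float:
    """Linear parsimony pressure: raw error plus a penalty proportional to size."""
    return ind.error + alpha * ind.tree_size


def select_best(population, alpha: float = 0.001) -> Individual:
    """Return the individual with the lowest penalised fitness."""
    return min(population, key=lambda ind: penalised_fitness(ind, alpha))


# Two hypothetical individuals with equal error but different sizes.
small = Individual(error=0.10, tree_size=15)
large = Individual(error=0.10, tree_size=120)
best = select_best([large, small])
```

With equal error, the size penalty breaks the tie in favour of the smaller tree. The choice of `alpha` controls how much accuracy is traded for compactness.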
7. Gültepe Y. Analysis of Alburnus tarichi population by machine learning classification methods for sustainable fisheries. SLAS Technol 2022; 27:261-266. [DOI: 10.1016/j.slast.2022.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Revised: 03/06/2022] [Accepted: 03/24/2022] [Indexed: 10/18/2022]
8. Dioşan L, Andreica A, Voiculescu I. On the use of multi-objective evolutionary classifiers for breast cancer detection. PLoS One 2022; 17:e0269950. [PMID: 35853014 PMCID: PMC9295958 DOI: 10.1371/journal.pone.0269950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2021] [Accepted: 05/31/2022] [Indexed: 11/20/2022] Open
Abstract
PURPOSE Breast cancer is one of the most common tumours in women; nevertheless, it is also one of the cancers that is most usually treated. As a result, early detection is critical, which can be accomplished by routine mammograms. This paper aims to describe, analyze, compare and evaluate three image descriptors involved in classifying breast cancer images from four databases.
APPROACH Multi-Objective Evolutionary Algorithms (MOEAs) prove themselves as being efficient methods for selection and classification problems. This paper aims to study combinations of well-known classification objectives in order to compare the results of their application in solving very specific learning problems. The experimental results undergo empirical analysis which is supported by a statistical approach. The results are illustrated on a collection of medical image databases, but with a focus on the MOEAs' performance in terms of several well-known measures. The databases were chosen specifically to feature reliable human annotations, so as to measure the correlation between the gold standard classifications and the various MOEA classifications.
RESULTS We have seen how different statistical tests rank one algorithm over the others in our set as being better. These findings are unsurprising, revealing that there is no single gold standard for comparing diverse techniques or evolutionary algorithms. Furthermore, building meta-classifiers and evaluating them using a single, favorable metric is both extremely unwise and unsatisfactory, as the impact is to skew the results.
CONCLUSIONS The best method to address these flaws is to select the right set of objectives and criteria. Using accuracy-related objectives, for example, is directly linked to maximizing the number of true positives. If, on the other hand, accuracy is chosen as the generic metric, the primary classification goal is shifted to increasing the positively categorized data points.
Affiliation(s)
- Laura Dioşan: Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
- Anca Andreica: Department of Computer Science, Babes-Bolyai University, Cluj-Napoca, Romania
- Irina Voiculescu: Department of Computer Science, University of Oxford, Oxford, United Kingdom
9. Ghane M, Ang MC, Nilashi M, Sorooshian S. Enhanced decision tree induction using evolutionary techniques for Parkinson's disease classification. Biocybern Biomed Eng 2022. [DOI: 10.1016/j.bbe.2022.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
10. A review on big data based parallel and distributed approaches of pattern mining. Journal of King Saud University - Computer and Information Sciences 2022. [DOI: 10.1016/j.jksuci.2019.09.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 11/23/2022]
11. Pei W, Xue B, Shang L, Zhang M. High-Dimensional Unbalanced Binary Classification by Genetic Programming with Multi-Criterion Fitness Evaluation and Selection. Evolutionary Computation 2022; 30:99-129. [PMID: 34902018 DOI: 10.1162/evco_a_00304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2020] [Accepted: 09/10/2021] [Indexed: 06/14/2023]
Abstract
High-dimensional unbalanced classification is challenging because of the joint effects of high dimensionality and class imbalance. Genetic programming (GP) has potential benefits for high-dimensional classification due to its built-in capability to select informative features. However, once data are not evenly distributed, GP tends to develop biased classifiers which achieve a high accuracy on the majority class but a low accuracy on the minority class. Unfortunately, the minority class is often at least as important as the majority class. It is important to investigate how GP can be effectively utilized for high-dimensional unbalanced classification. In this article, to address the performance bias issue of GP, a new two-criterion fitness function is developed, which considers two criteria, that is, the approximation of area under the curve (AUC) and the classification clarity (i.e., how well a program can separate two classes). The obtained values on the two criteria are combined in pairs, instead of summing them together. Furthermore, this article designs a three-criterion tournament selection to effectively identify and select good programs to be used by genetic operators for generating offspring during the evolutionary learning process. The experimental results show that the proposed method achieves better classification performance than other compared methods.
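The abstract refers to an approximation of AUC as one fitness criterion without stating which estimator is used. As a hedged illustration, the sketch below computes AUC with the Wilcoxon-Mann-Whitney statistic, a standard estimator that is insensitive to class proportions. Treating this as the estimator is an assumption, and the function name and program outputs are hypothetical.

```python
def wmw_auc(minority_outputs, majority_outputs):
    """Estimate AUC as the fraction of (minority, majority) output pairs
    in which the minority instance receives the higher program output.
    Ties count as half a correctly ranked pair."""
    if not minority_outputs or not majority_outputs:
        raise ValueError("both classes need at least one instance")
    wins = 0.0
    for p in minority_outputs:
        for n in majority_outputs:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(minority_outputs) * len(majority_outputs))


# Hypothetical raw outputs of one evolved program on an unbalanced dataset.
minority = [0.9, 0.4]
majority = [0.1, 0.2, 0.3, 0.5, 0.0, -0.2, 0.05, 0.35]
auc = wmw_auc(minority, majority)  # 15 of 16 pairs are ranked correctly
```

Because every minority instance is compared with every majority instance, a program that ignores the minority class cannot score well here, unlike with plain accuracy.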
Affiliation(s)
- Wenbin Pei: School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
- Bing Xue: School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
- Lin Shang: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
- Mengjie Zhang: School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
12. A New Design Method for Optimal Parameters Setting of PSSs and SVC Damping Controllers to Alleviate Power System Stability Problem. Energies 2021. [DOI: 10.3390/en14217312] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
This paper presents an improved Teaching-Learning-Based Optimization (TLBO) for optimal tuning of power system stabilizers (PSSs) and static VAR compensator (SVC)-based controllers. The original TLBO is characterized by easy implementation and is mainly free of control parameters. Unfortunately, TLBO may suffer from population diversity losses in some cases, leading to local optima and premature convergence. In this study, three approaches are considered for improving the original TLBO: (i) randomness improvement, (ii) three new mutation strategies, and (iii) a hyperchaotic perturbation strategy. In the first approach, all random numbers in the original TLBO are substituted by the hyperchaotic map sequence to boost exploration capability. In the second approach, three mutations are carried out to explore a new promising search space. The obtained solution is further improved in the third strategy by implementing a new perturbation equation. The proposed HTLBO was evaluated with 26 test functions. The obtained results show that HTLBO outperforms the TLBO algorithm and some state-of-the-art algorithms in robustness and accuracy in almost all experiments. Moreover, the efficacy of the proposed HTLBO is justified by applying it to the power system stability problem. The results, consisting of the Integral of Time-weighted Absolute Error (ITAE) and eigenvalue analysis of electromechanical modes, demonstrate the superiority and the potential of the proposed HTLBO-based PSSs and SVC controllers over a wide range of operating conditions. Besides, the advantage of the proposed coordinated controller design was confirmed by comparing it to PSSs and SVC tuned individually.
13. Al-Sahaf H, Al-Sahaf A, Xue B, Zhang M. Automatically Evolving Texture Image Descriptors Using the Multitree Representation in Genetic Programming Using Few Instances. Evolutionary Computation 2021; 29:331-366. [PMID: 33236924 DOI: 10.1162/evco_a_00284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2018] [Accepted: 11/17/2020] [Indexed: 06/11/2023]
Abstract
The performance of image classification is highly dependent on the quality of the extracted features that are used to build a model. Designing such features usually requires prior knowledge of the domain and is often undertaken by a domain expert who, if available, is very costly to employ. Automating the process of designing such features can largely reduce the cost and efforts associated with this task. Image descriptors, such as local binary patterns, have emerged in computer vision, and aim at detecting keypoints, for example, corners, line-segments, and shapes, in an image and extracting features from those keypoints. In this article, genetic programming (GP) is used to automatically evolve an image descriptor using only two instances per class by utilising a multitree program representation. The automatically evolved descriptor operates directly on the raw pixel values of an image and generates the corresponding feature vector. Seven well-known datasets were adapted to the few-shot setting and used to assess the performance of the proposed method and compared against six handcrafted and one evolutionary computation-based image descriptor as well as three convolutional neural network (CNN) based methods. The experimental results show that the new method has significantly outperformed the competitor image descriptors and CNN-based methods. Furthermore, different patterns have been identified from analysing the evolved programs.
Affiliation(s)
- Harith Al-Sahaf: School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
- Ausama Al-Sahaf: School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
- Bing Xue: School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
- Mengjie Zhang: School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
14. Liu WL, Yang J, Zhong J, Wang S. Genetic programming with separability detection for symbolic regression. Complex Intell Syst 2021. [DOI: 10.1007/s40747-020-00240-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
Genetic Programming (GP) is a popular and powerful evolutionary optimization algorithm that has a wide range of applications such as symbolic regression, classification and program synthesis. However, existing GPs often ignore the intrinsic structure of the ground truth equation of the symbolic regression problem. To improve the search efficacy of GP on symbolic regression problems by fully exploiting the intrinsic structure information, this paper proposes a genetic programming with separability detection technique (SD-GP). In the proposed SD-GP, a separability detection method is proposed to detect additive separable characteristics of input features from the observed data. Then based on the separability detection results, a chromosome representation is proposed, which utilizes multiple sub chromosomes to represent the final solution. Some sub chromosomes are used to construct separable sub functions by using separate input features, while the other sub chromosomes are used to construct sub functions by using all input features. The final solution is the weighted sum of all sub functions, and the optimal weights of sub functions are obtained by using the least squares method. In this way, the structure information can be learnt and the global search ability of GP can be maintained. Experimental results on synthetic problems with differing characteristics have demonstrated that the proposed SD-GP can perform better than several state-of-the-art GPs in terms of the success rate of finding the optimal solution and the convergence speed.
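The abstract states that the final solution is a weighted sum of sub functions whose weights come from the least squares method. A minimal sketch of that weighting step follows, solving the normal equations in plain Python. The sub functions, data and helper names are hypothetical stand-ins for evolved sub chromosomes, and nothing here reproduces the paper's separability detection or evolutionary search.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares_weights(sub_functions, inputs, targets):
    """Weights w minimising sum_i (sum_j w_j * f_j(x_i) - y_i)^2."""
    phi = [[f(x) for f in sub_functions] for x in inputs]
    k = len(sub_functions)
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(phi, targets)) for i in range(k)]
    return solve_linear(gram, rhs)


# Hypothetical separable sub functions, each using a single input feature.
subs = [lambda x: x[0] ** 2, lambda x: math.sin(x[1])]
inputs = [(0.5, 0.1), (1.0, 0.7), (1.5, 1.3), (2.0, 2.0), (-1.0, 2.6)]
targets = [3.0 * x[0] ** 2 - 2.0 * math.sin(x[1]) for x in inputs]
weights = least_squares_weights(subs, inputs, targets)
```

Since the targets were generated as an exact weighted sum of the two sub functions, the recovered weights are close to 3 and -2. With noisy data the same call returns the best fit in the squared-error sense.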
15. Fu W, Xue B, Gao X, Zhang M. Transductive transfer learning based Genetic Programming for balanced and unbalanced document classification using different types of features. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107172] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
16. Improving Land Cover Classification Using Genetic Programming for Feature Construction. Remote Sensing 2021. [DOI: 10.3390/rs13091623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Genetic programming (GP) is a powerful machine learning (ML) algorithm that can produce readable white-box models. Although successfully used for solving an array of problems in different scientific areas, GP is still not well known in the field of remote sensing. The M3GP algorithm, a variant of the standard GP algorithm, performs feature construction by evolving hyperfeatures from the original ones. In this work, we use the M3GP algorithm on several sets of satellite images over different countries to create hyperfeatures from satellite bands to improve the classification of land cover types. We add the evolved hyperfeatures to the reference datasets and observe a significant improvement of the performance of three state-of-the-art ML algorithms (decision trees, random forests, and XGBoost) on multiclass classifications and no significant effect on the binary classifications. We show that adding the M3GP hyperfeatures to the reference datasets brings better results than adding the well-known spectral indices NDVI, NDWI, and NBR. We also compare the performance of the M3GP hyperfeatures in the binary classification problems with those created by other feature construction methods such as FFX and EFS.
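For readers unfamiliar with the spectral indices named above, the sketch below gives their usual normalised-difference forms in Python. It assumes the McFeeters definition of NDWI (green and near-infrared bands); other NDWI definitions exist and the abstract does not say which one was used. The band values and the zero-denominator convention are hypothetical.

```python
def normalised_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b); 0.0 when both bands are zero (a convention
    chosen for this sketch)."""
    total = a + b
    if total == 0:
        return 0.0
    return (a - b) / total


def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index."""
    return normalised_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Normalised Difference Water Index (McFeeters form, assumed here)."""
    return normalised_difference(green, nir)


def nbr(nir: float, swir2: float) -> float:
    """Normalised Burn Ratio."""
    return normalised_difference(nir, swir2)


# Hypothetical surface reflectances for one vegetated pixel.
pixel = {"green": 0.08, "red": 0.10, "nir": 0.50, "swir2": 0.20}
features = [
    ndvi(pixel["nir"], pixel["red"]),
    ndwi(pixel["green"], pixel["nir"]),
    nbr(pixel["nir"], pixel["swir2"]),
]
```

Each index is a fixed, hand-designed combination of two bands. The hyperfeatures described in the abstract play the same role but are evolved rather than fixed.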
17. Mustaqeem M, Saqib M. Principal component based support vector machine (PC-SVM): a hybrid technique for software defect detection. Cluster Computing 2021; 24:2581-2595. [PMID: 33880074 PMCID: PMC8050160 DOI: 10.1007/s10586-021-03282-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/14/2020] [Revised: 03/11/2021] [Accepted: 03/29/2021] [Indexed: 06/12/2023]
Abstract
Software defects are a major problem, and predicting them is a difficult task. Researchers and scientists have developed many software defect prediction techniques to address this issue. However, there is still a need for an algorithm/method that predicts defects with higher accuracy and lower time and space complexity. All previous research conducted on the data without feature reduction led to the curse of dimensionality. We propose a hybrid machine learning approach that combines principal component analysis (PCA) and support vector machines (SVM) to overcome this problem. We employed PROMISE data (CM1: 344 observations, KC1: 2109 observations) from the NASA directory to conduct our research. We split the data into training (CM1: 240 observations, KC1: 1476 observations) and testing (CM1: 104 observations, KC1: 633 observations) datasets. Using PCA, we find the principal components for feature optimization, which reduces the time complexity. We then applied SVM for classification because of its inherent advantages over traditional and conventional methods. We also employed the GridSearchCV method for hyperparameter tuning. The proposed hybrid model achieved better accuracy (CM1: 95.2%, KC1: 86.6%) than other methods. The proposed model also scores higher on other evaluation criteria. As a limitation, SVM offers no probabilistic explanation for its classifications, which may make it very rigid. In the future, other methods may be introduced that overcome this limitation and keep a soft, probability-based margin for classification on the optimal hyperplane.
Affiliation(s)
- Mohd. Mustaqeem: CSE Department, Institute of Technology & Management (A.K.T.U), Aligarh, U.P., India
- Mohd. Saqib: Mathematics and Computing Department, Indian Institute of Technology (ISM), Dhanbad, Jharkhand, India
18. Bi Y, Xue B, Zhang M. Genetic Programming With a New Representation to Automatically Learn Features and Evolve Ensembles for Image Classification. IEEE Transactions on Cybernetics 2021; 51:1769-1783. [PMID: 32011275 DOI: 10.1109/tcyb.2020.2964566] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/10/2023]
Abstract
Image classification is a popular task in machine learning and computer vision, but it is very challenging due to high variation across images. Using ensemble methods for solving image classification can achieve higher classification performance than using a single classification algorithm. However, to obtain a good ensemble, the component (base) classifiers in an ensemble should be accurate and diverse. To solve image classification effectively, feature extraction is necessary to transform raw pixels into high-level informative features. However, this process often requires domain knowledge. This article proposes an evolutionary approach based on genetic programming to automatically and simultaneously learn informative features and evolve effective ensembles for image classification. The new approach takes raw images as inputs and returns predictions of class labels based on the evolved classifiers. To achieve this, a new individual representation, a new function set, and a new terminal set are developed to allow the new approach to effectively find the best solution. More importantly, the solutions of the new approach can extract informative features from raw images and can automatically address the diversity issue of the ensembles. In addition, the new approach can automatically select and optimize the parameters for the classification algorithms in the ensemble. The performance of the new approach is examined on 13 different image classification datasets of varying difficulty and compared with a large number of effective methods. The results show that the new approach achieves better classification accuracy on most datasets than the competitive methods. Further analysis demonstrates that the new approach can evolve solutions with high accuracy and diversity.
19. Si T, Miranda P, Galdino JV, Nascimento A. Grammar-based automatic programming for medical data classification: an experimental study. Artif Intell Rev 2021. [DOI: 10.1007/s10462-020-09949-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]
20.
21. Ma J, Gao X. Designing genetic programming classifiers with feature selection and feature construction. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
22. A Genetic Programming Strategy to Induce Logical Rules for Clinical Data Analysis. Processes (Basel) 2020. [DOI: 10.3390/pr8121565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
This paper proposes a machine learning approach that uses genetic programming to build classifiers through logical rule induction. In this context, we define and test a set of mutation operators across different clinical datasets to improve the performance of the proposal for each dataset. The use of genetic programming for rule induction has generated interesting results in machine learning problems. Hence, genetic programming represents a flexible and powerful evolutionary technique for automatic generation of classifiers. Since logical rules disclose knowledge from the analyzed data, we use such knowledge to interpret the results and filter the most important features from clinical data as a process of knowledge discovery. The ultimate goal of this proposal is to provide the experts in the data domain with prior knowledge (as a guide) about the structure of the data and the rules found for each class, especially to track dichotomies and inequality. The results reached by our proposal on the involved datasets have been very promising when used in classification tasks and compared with other methods.
23. Comparison of schedule generation schemes for designing dispatching rules with genetic programming in the unrelated machines environment. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106637] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]
24. Liang J, Wen J, Wang Z, Wang J. Evolving semantic object segmentation methods automatically by genetic programming from images and image processing operators. Soft Comput 2020. [DOI: 10.1007/s00500-020-04713-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
25. Sahebi G, Movahedi P, Ebrahimi M, Pahikkala T, Plosila J, Tenhunen H. GeFeS: A generalized wrapper feature selection approach for optimizing classification performance. Comput Biol Med 2020; 125:103974. [PMID: 32890978 DOI: 10.1016/j.compbiomed.2020.103974] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/08/2020] [Revised: 08/12/2020] [Accepted: 08/12/2020] [Indexed: 10/23/2022]
Abstract
In this paper, we propose a generalized wrapper-based feature selection, called GeFeS, which is based on a parallel new intelligent genetic algorithm (GA). The proposed GeFeS works properly under different numerical dataset dimensions and sizes, carefully tries to avoid overfitting and significantly enhances classification accuracy. To make the GA more accurate, robust and intelligent, we have proposed a new operator for features weighting, improved the mutation and crossover operators, and integrated nested cross-validation into the GA process to properly validate the learning model. The k-nearest neighbor (kNN) classifier is utilized to evaluate the goodness of selected features. We have evaluated the efficiency of GeFeS on various datasets selected from the UCI machine learning repository. The performance is compared with state-of-the-art classification and feature selection methods. The results demonstrate that GeFeS can significantly generalize the proposed multi-population intelligent genetic algorithm under different sizes of two-class and multi-class datasets. We have achieved the average classification accuracy of 95.83%, 97.62%, 99.02%, 98.51%, and 94.28% while reducing the number of features from 56 to 28, 34 to 18, 279 to 135, 30 to 16, and 19 to 9 under lung cancer, dermatology, arrhythmia, WDBC, and hepatitis, respectively.
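To make the idea of wrapper-based feature selection concrete, here is a minimal Python sketch in which a candidate feature mask is scored by the accuracy of a nearest-neighbour classifier restricted to the selected features. It deliberately simplifies the approach described above: it uses 1-NN with leave-one-out evaluation instead of nested cross-validation, and an exhaustive search over masks instead of a genetic algorithm. All names and the toy data are hypothetical.

```python
from itertools import product


def loo_1nn_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only sees the features whose mask entry is True."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # an empty feature set cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if j == i:
                continue  # leave the query instance out
            d = sum((row[k] - other[k]) ** 2 for k in selected)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


# Toy data: feature 0 separates the classes, feature 1 is misleading noise.
rows = [(0.0, 5.0), (0.1, -4.0), (0.2, 3.0),
        (1.0, 4.9), (1.1, -4.1), (1.2, 3.1)]
labels = [0, 0, 0, 1, 1, 1]

# Exhaustive search stands in for the genetic algorithm on this tiny problem.
masks = list(product([False, True], repeat=2))
best_mask = max(masks, key=lambda m: loo_1nn_accuracy(rows, labels, m))
```

On this toy data the search keeps only feature 0, because the noisy feature dominates the distance whenever it is included. A genetic algorithm becomes necessary once the number of features makes enumerating all masks infeasible.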
Affiliation(s)
- Golnaz Sahebi
  - Department of Future Technologies, University of Turku, Turku, FI-20014, Turun yliopisto, Finland
- Parisa Movahedi
  - Department of Future Technologies, University of Turku, Turku, FI-20014, Turun yliopisto, Finland
- Masoumeh Ebrahimi
  - Department of Future Technologies, University of Turku, Turku, FI-20014, Turun yliopisto, Finland; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden
- Tapio Pahikkala
  - Department of Future Technologies, University of Turku, Turku, FI-20014, Turun yliopisto, Finland
- Juha Plosila
  - Department of Future Technologies, University of Turku, Turku, FI-20014, Turun yliopisto, Finland
- Hannu Tenhunen
  - Department of Future Technologies, University of Turku, Turku, FI-20014, Turun yliopisto, Finland; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, SE-100 44, Stockholm, Sweden
|
26
|
Abstract
We introduce a soft computing approach for automatically selecting and combining indices from remote sensing multispectral images that can be used for classification tasks. The proposed approach is based on a Genetic-Programming (GP) framework, a technique successfully used in a wide variety of optimization problems. Through GP, it is possible to learn indices that maximize the separability of samples from two different classes. Once the indices specialized for all the pairs of classes are obtained, they are used in pixelwise classification tasks. We used the GP-based solution to evaluate complex classification problems, such as those related to the discrimination of vegetation types within and between tropical biomes. Using time series defined in terms of the learned spectral indices, we show that the GP framework leads to results superior to those of other indices used to discriminate and classify tropical biomes.
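To make "indices that maximize the separability of samples from two different classes" concrete, the sketch below scores a few hand-written candidate indices on two classes and keeps the best one. Everything here is an assumption for illustration: the reflectance values are invented, the separability measure (difference of class means divided by the summed standard deviations) is one common choice rather than the paper's fitness function, and a fixed list of candidates stands in for the expressions a GP search would evolve.

```python
from statistics import mean, pstdev

# Candidate spectral indices over (near-infrared, red) reflectance.
def ndvi(nir, red):
    return (nir - red) / (nir + red)

def simple_ratio(nir, red):
    return nir / red

def difference(nir, red):
    return nir - red

def separability(index_fn, class_a, class_b):
    """Distance between the class means of the index, scaled by the summed spreads."""
    a = [index_fn(nir, red) for nir, red in class_a]
    b = [index_fn(nir, red) for nir, red in class_b]
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else float("inf")

# Hypothetical (NIR, red) reflectance pairs for two vegetation classes.
forest = [(0.50, 0.05), (0.55, 0.06), (0.48, 0.04), (0.52, 0.05)]
savanna = [(0.35, 0.12), (0.30, 0.10), (0.38, 0.14), (0.33, 0.11)]

candidates = {"ndvi": ndvi, "simple_ratio": simple_ratio, "difference": difference}
scores = {name: separability(fn, forest, savanna) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In a GP setting the same scoring function would act as the fitness of each evolved expression tree, with one index learned per pair of classes as the abstract describes.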
|
27
|
Liang J, Liu Y, Xue Y. Preference-driven Pareto front exploitation for bloat control in genetic programming. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106254] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
|
28
|
Pei W, Xue B, Shang L, Zhang M. Genetic programming for high-dimensional imbalanced classification with a new fitness function and program reuse mechanism. Soft comput 2020. [DOI: 10.1007/s00500-020-05056-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
|
29
|
Deng S, Xie X, Yuan C, Yang L, Wu X. Numerical sensitive data recognition based on hybrid gene expression programming for active distribution networks. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106213] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
|
30
|
Ma J, Gao X. A filter-based feature construction and feature selection approach for classification using Genetic Programming. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105806] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/11/2022]
|
31
|
A fast parallel genetic programming framework with adaptively weighted primitives for symbolic regression. Soft comput 2020. [DOI: 10.1007/s00500-019-04379-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
|
32
|
Sousa RT, Silva S, Pesquita C. Evolving knowledge graph similarity for supervised learning in complex biomedical domains. BMC Bioinformatics 2020; 21:6. [PMID: 31900127 PMCID: PMC6942314 DOI: 10.1186/s12859-019-3296-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/06/2019] [Accepted: 11/27/2019] [Indexed: 01/22/2023] Open
Abstract
Background In recent years, biomedical ontologies have become important for describing existing biological knowledge in the form of knowledge graphs. Data mining approaches that work with knowledge graphs have been proposed, but they are based on vector representations that do not capture the full underlying semantics. An alternative is to use machine learning approaches that explore semantic similarity. However, since ontologies can model multiple perspectives, semantic similarity computations for a given learning task need to be fine-tuned to account for this. Obtaining the best combination of semantic similarity aspects for each learning task is not trivial and typically depends on expert knowledge. Results We have developed a novel approach, evoKGsim, that applies Genetic Programming over a set of semantic similarity features, each based on a semantic aspect of the data, to obtain the best combination for a given supervised learning task. The approach was evaluated on several benchmark datasets for protein-protein interaction prediction using the Gene Ontology as the knowledge graph to support semantic similarity, and it outperformed competing strategies, including manually selected combinations of semantic aspects emulating expert knowledge. evoKGsim was also able to learn species-agnostic models with different combinations of species for training and testing, effectively addressing the limitations of predicting protein-protein interactions for species with fewer known interactions. Conclusions evoKGsim can overcome one of the limitations in knowledge graph-based semantic similarity applications: the need to expertly select which aspects should be taken into account for a given application. Applying this methodology to protein-protein interaction prediction proved successful, paving the way to broader applications.
Affiliation(s)
- Rita T Sousa
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Sara Silva
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Catia Pesquita
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
|
33
|
Iqbal M, Al-Sahaf H, Xue B, Zhang M. Genetic programming with transfer learning for texture image classification. Soft comput 2019. [DOI: 10.1007/s00500-019-03843-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
34
|
|
35
|
Efficient Evolutionary Learning Algorithm for Real-Time Embedded Vision Applications. ELECTRONICS 2019. [DOI: 10.3390/electronics8111367] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
This paper reports the development of an efficient evolutionary learning algorithm designed specifically for real-time embedded visual inspection applications. The proposed evolutionary learning algorithm constructs image features as a series of image transforms for image classification and is suitable for resource-limited systems. This algorithm requires only a small number of images and little time for training. It does not depend on handcrafted features or manual tuning of parameters and is generalized to be versatile for visual inspection applications. This allows the system to be configured on the fly for different applications and by an operator without extensive experience. An embedded vision system, equipped with an ARM processor running Linux, is capable of processing roughly one hundred 640 × 480 frames per second, which is more than adequate for real-time visual inspection applications. As example applications, three image datasets were created to test the performance of this algorithm. The first dataset was used to demonstrate the suitability of the algorithm for visual inspection automation applications. This experiment combined two applications to make it a more challenging test. One application was for separating fertilized and unfertilized eggs. The other one was for detecting two common defects on the eggshell. Two other datasets were created for road condition classification and pavement quality evaluation. The proposed algorithm achieved 100% accuracy for fertilized egg detection and 98.6% for eggshell quality inspection, for a combined accuracy of 99.1%. It achieved an accuracy of 92% for road condition classification and 100% for pavement quality evaluation.
|
36
|
Lensen A, Xue B, Zhang M. Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis. EVOLUTIONARY COMPUTATION 2019; 28:531-561. [PMID: 31599651 DOI: 10.1162/evco_a_00264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
Affiliation(s)
- Andrew Lensen
  - Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand
- Bing Xue
  - Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand
- Mengjie Zhang
  - Evolutionary Computation Research Group, Victoria University of Wellington, Wellington 6140, New Zealand
|
37
|
|
38
|
Panda N, Majhi SK. Improved Salp Swarm Algorithm with Space Transformation Search for Training Neural Network. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2019. [DOI: 10.1007/s13369-019-04132-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 02/02/2023]
|
39
|
Kim SJ, Ha JW, Kim H, Zhang BT. Bayesian evolutionary hypernetworks for interpretable learning from high-dimensional data. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.05.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
|
40
|
Azari S, Xue B, Zhang M, Peng L. Preprocessing Tandem Mass Spectra Using Genetic Programming for Peptide Identification. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2019; 30:1294-1307. [PMID: 31025295 DOI: 10.1007/s13361-019-02196-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2018] [Revised: 01/15/2019] [Accepted: 03/11/2019] [Indexed: 06/09/2023]
Abstract
One of the major challenges in proteomics is peptide identification from mass spectra containing a high noise ratio and a small number of signal (b-/y-ion) peaks. However, the accuracy and reliability of peptide identification in such highly imbalanced MS/MS data can be improved by applying a preprocessing step prior to peptide identification, aimed at discriminating b-/y-ions from noise peaks in the spectra. In this study, we report a genetic programming (GP)-based preprocessing method for de-noising highly imbalanced and noisy CID MS/MS spectra. GP has become a popular machine learning method via automatic programming. GP preprocesses the highly noisy MS/MS spectra by classifying peaks as noise peaks or signal peaks in a binary classification manner. Meanwhile, a set of spectral fragment features based on the MS/MS fragmentation rules is extracted from the dataset so that their discriminating abilities can be investigated by GP. An MS/MS spectral dataset containing thousands of spectra is used to train the GP model. As the GP tree-based representation has the capability for implicit feature selection during the evolutionary process, the evolved GP model with the selected features is compared with the best threshold-based method. The results show that the GP method improved the reliability of peptide identification and increased the identification rate of a de novo sequencing tool, PEAKS, to 99.4% from 80.1% achieved by the best threshold-based method. Moreover, the result of peptide identification by a database search tool, SEQUEST, using the data preprocessed by the GP method was statistically significant compared to the other methods.
Affiliation(s)
- Samaneh Azari
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, Kelburn, 6012, New Zealand
  - School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington, 6140, New Zealand
- Bing Xue
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, Kelburn, 6012, New Zealand
- Mengjie Zhang
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, Kelburn, 6012, New Zealand
- Lifeng Peng
  - Centre for Biodiscovery and School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
|
41
|
A hybrid multiple feature construction approach for classification using Genetic Programming. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.04.039] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022]
|
42
|
Shaukat F, Raja G, Frangi AF. Computer-aided detection of lung nodules: a review. J Med Imaging (Bellingham) 2019. [DOI: 10.1117/1.jmi.6.2.020901] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Furqan Shaukat
  - University of Engineering and Technology, Department of Electrical Engineering, Taxila
- Gulistan Raja
  - University of Engineering and Technology, Department of Electrical Engineering, Taxila
- Alejandro F. Frangi
  - University of Leeds Woodhouse Lane, School of Computing and School of Medicine, Leeds
|
43
|
Al-Sahaf H, Bi Y, Chen Q, Lensen A, Mei Y, Sun Y, Tran B, Xue B, Zhang M. A survey on evolutionary machine learning. J R Soc N Z 2019. [DOI: 10.1080/03036758.2019.1609052] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 10/26/2022]
Affiliation(s)
- Harith Al-Sahaf
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Ying Bi
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Qi Chen
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Andrew Lensen
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Yi Mei
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Yanan Sun
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Binh Tran
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Bing Xue
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Mengjie Zhang
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
|
44
|
|
45
|
Djellab CAK, Chaker W, Hajjami Ben Ghezala H. Travel Demand Forecasting: An Evolutionary Learning Approach. PROGRESS IN ARTIFICIAL INTELLIGENCE 2019. [DOI: 10.1007/978-3-030-30241-2_51] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
46
|
|
47
|
Tran CT, Zhang M, Andreae P, Xue B, Bui LT. An effective and efficient approach to classification with incomplete data. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.05.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 01/08/2023]
|
48
|
|
49
|
Brankovic A, Falsone A, Prandini M, Piroddi L. A Feature Selection and Classification Algorithm Based on Randomized Extraction of Model Populations. IEEE TRANSACTIONS ON CYBERNETICS 2018; 48:1151-1162. [PMID: 28371789 DOI: 10.1109/tcyb.2017.2682418] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
We here introduce a novel classification approach adopted from the nonlinear model identification framework, which jointly addresses the feature selection (FS) and classifier design tasks. The classifier is constructed as a polynomial expansion of the original features and a selection process is applied to find the relevant model terms. The selection method progressively refines a probability distribution defined on the model structure space, by extracting sample models from the current distribution and using the aggregate information obtained from the evaluation of the population of models to reinforce the probability of extracting the most important terms. To reduce the initial search space, distance correlation filtering is optionally applied as a preprocessing technique. The proposed method is compared to other well-known FS and classification methods on standard benchmark problems. Besides the favorable properties of the method regarding classification accuracy, the obtained models have a simple structure, easily amenable to interpretation and analysis.
|
50
|
Epileptic MEG Spike Detection Using Statistical Features and Genetic Programming with KNN. JOURNAL OF HEALTHCARE ENGINEERING 2017; 2017:3035606. [PMID: 29118962 PMCID: PMC5651155 DOI: 10.1155/2017/3035606] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/18/2017] [Revised: 08/06/2017] [Accepted: 09/13/2017] [Indexed: 11/18/2022]
Abstract
Epilepsy is a neurological disorder that affects millions of people worldwide. Monitoring brain activity and identifying the seizure source, which starts with spike detection, are important steps in epilepsy treatment. Magnetoencephalography (MEG) is an emerging epileptic diagnostic tool with high-density sensors; the vast amount of MEG data this produces makes manual analysis a challenging task. This paper explores the use of eight statistical features and genetic programming (GP) with the K-nearest neighbor (KNN) classifier for interictal spike detection. The proposed method comprises three stages: preprocessing, genetic programming-based feature generation, and classification. The effectiveness of the proposed approach has been evaluated using real MEG data obtained from 28 epileptic patients. It has achieved a 91.75% average sensitivity and 92.99% average specificity.
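A simplified sketch of the pipeline's outer stages (statistical features of a signal window, KNN classification, then sensitivity and specificity) is given below. It makes several assumptions: the windows are synthetic rather than MEG recordings, only four illustrative statistics are computed because the abstract does not list the paper's eight, the GP-based feature generation stage is omitted entirely, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def make_window(spike, phase, length=20):
    """Synthetic window (assumption): low-amplitude sine, plus one large peak if spike."""
    w = [0.1 * math.sin(0.5 * t + phase) for t in range(length)]
    if spike:
        w[length // 2] += 2.0
    return w

def window_features(w):
    """Four illustrative statistics: mean, standard deviation, range, peak magnitude."""
    return [mean(w), pstdev(w), max(w) - min(w), max(abs(v) for v in w)]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

train = [(window_features(make_window(s, p)), s) for p in (0.0, 0.7, 1.4, 2.1) for s in (0, 1)]
test = [(window_features(make_window(s, p)), s) for p in (0.3, 1.0) for s in (0, 1)]
train_X, train_y = zip(*train)
test_X, test_y = zip(*test)

pred = [knn_predict(train_X, train_y, x) for x in test_X]
sens, spec = sensitivity_specificity(test_y, pred)
print("sensitivity:", sens, "specificity:", spec)
```

The synthetic spikes are far larger than the background, so this toy problem is much easier than real interictal spike detection; the sketch only shows how the reported sensitivity and specificity figures are defined and computed.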
|