1
Mao C, Zhu Q, Chen R, Su W. Automatic medical specialty classification based on patients' description of their symptoms. BMC Med Inform Decis Mak 2023; 23:15. [PMID: 36670382 PMCID: PMC9862953 DOI: 10.1186/s12911-023-02105-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 01/09/2023] [Indexed: 01/22/2023] Open
Abstract
In China, patients usually have to choose a medical specialty before they register with the corresponding specialists at a hospital. This choice requires considerable medical knowledge from the patients. As a result, many patients do not register for the correct specialty the first time unless they receive help from the hospital. In this study, we try to automatically direct patients to the appropriate specialty based on the symptoms they describe. As far as we know, this is the first study to address the problem. We propose a hybrid neural network model integrated with an attention mechanism. To evaluate this hybrid model, we used a data set of more than 40,000 items covering eight departments, such as Otorhinolaryngology, Pediatrics, and other common departments. The experimental results show that the hybrid model achieves more than 93.5% accuracy and has a high generalization capacity, outperforming traditional classification models.
Affiliation(s)
- Chao Mao: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, 519087 China (grid.469245.8; 0000 0004 1756 4881)
- Quanjing Zhu: Specialty of Laboratory Medicine, West China Hospital, Sichuan University, Guoxue Lane, Wuhou District, Chengdu, 610041 China (grid.13291.38; 0000 0001 0807 1581)
- Rong Chen: Specialty of Rehabilitation Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510080 China (grid.412615.5; 0000 0004 1803 6239)
- Weifeng Su: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai, 519087 China (grid.469245.8; 0000 0004 1756 4881)
2
Genomic prediction through machine learning and neural networks for traits with epistasis. Comput Struct Biotechnol J 2022; 20:5490-5499. [PMID: 36249559 PMCID: PMC9547190 DOI: 10.1016/j.csbj.2022.09.029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2022] [Revised: 09/20/2022] [Accepted: 09/20/2022] [Indexed: 11/22/2022] Open
Abstract
Performance of machine learning and neural networks in genomic analysis. Heritability and QTL number affect the performance of machine learning methods. Machine learning models in genomic analyses. Neural networks can present better performance for complex quantitative traits.
Genome-wide selection (GWS) is one of the contributions of molecular genetics to breeding. Machine learning (ML) and artificial neural network (ANN) methods are non-parameterized and can develop more accurate and parsimonious models for GWS analysis. Multivariate Adaptive Regression Splines (MARS) is considered one of the most flexible ML methods, automatically modeling nonlinearities and interactions of the predictor variables. This study aimed to evaluate and compare methods based on ANN, ML (including MARS), and G-BLUP through GWS. We simulated an F2 population of 1000 individuals genotyped for 4010 SNP markers, together with twelve traits from a model with epistatic effects, with QTL numbers ranging from eight to 480 and heritability (h2) of 0.3, 0.5, or 0.8. Variation in heritability and in the number of QTLs affects the performance of the methods. For quantitative traits (40, 80, 120, 240, and 480 QTLs), the highest R2 was observed for the Radial Basis Function network (RBF) and G-BLUP, followed by Random Forest (RF), Bagging (BA), and Boosting (BO). RF and BA also showed better results for traits with h2 of 0.3, with R2 values of 16.51% and 16.30%, respectively, while MARS methods showed better results for oligogenic traits, with R2 values ranging from 39.12% to 43.20% for h2 of 0.5 and from 59.92% to 78.56% for h2 of 0.8. Non-additive MARS methods also showed high R2 for traits with high heritability and 240 QTLs or more. ANN and ML methods are powerful tools for predicting genetic values in traits with epistatic effects, across different degrees of heritability and QTL numbers.
3
Malakar S, Roy SD, Das S, Sen S, Velásquez JD, Sarkar R. Computer Based Diagnosis of Some Chronic Diseases: A Medical Journey of the Last Two Decades. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2022; 29:5525-5567. [PMID: 35729963 PMCID: PMC9199478 DOI: 10.1007/s11831-022-09776-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2022] [Accepted: 05/22/2022] [Indexed: 06/15/2023]
Abstract
Disease prediction from diagnostic reports and pathological images using artificial intelligence (AI) and machine learning (ML) is one of the fastest emerging applications in recent days. Researchers are striving to achieve near-perfect results using advanced hardware technologies in amalgamation with AI and ML based approaches. As a result, a large number of AI and ML based methods are found in the literature. A systematic survey describing the state-of-the-art disease prediction methods, specifically chronic disease prediction algorithms, will provide a clear idea about the recent models developed in this field. This will also help the researchers to identify the research gaps present there. To this end, this paper looks over the approaches in the literature designed for predicting chronic diseases like Breast Cancer, Lung Cancer, Leukemia, Heart Disease, Diabetes, Chronic Kidney Disease and Liver Disease. The advantages and disadvantages of various techniques are thoroughly explained. This paper also presents a detailed performance comparison of different methods. Finally, it concludes the survey by highlighting some future research directions in this field that can be addressed through the forthcoming research attempts.
Affiliation(s)
- Samir Malakar: Department of Computer Science, Asutosh College, Kolkata, India
- Soumya Deep Roy: Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata, India
- Soham Das: Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata, India
- Swaraj Sen: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
- Juan D. Velásquez: Department of Industrial Engineering, University of Chile, Santiago, Chile; Instituto Sistemas Complejos de Ingeniería (ISCI), Santiago, Chile
- Ram Sarkar: Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
4
Chen X, Jin W, Wu Q, Zhang W, Liang H. A hybrid cost-sensitive machine learning approach for the classification of intelligent disease diagnosis. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-213486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Automatic risk classification of diseases is one of the most significant health problems in the medical and healthcare domain. However, related studies are relatively scarce. In this paper, we design an intelligent diagnosis model based on optimal machine learning algorithms with rich clinical data. First, the disease risk classification problem based on machine learning is defined. Then, the K-means clustering algorithm is used to validate the class labels of the given data, thereby removing misclassified instances from the original dataset. Furthermore, the naive Bayes algorithm is applied to build the final classifier using 10-fold cross-validation. In addition, a novel class-specific attribute weighting approach is adopted to alleviate the conditional independence assumption of naive Bayes, which means we assign each disease attribute a specific weight for each class. Last but not least, a hybrid cost-sensitive disease risk classification model is formulated, and a practical example from the University of California Irvine (UCI) machine learning database is used to illustrate the potential of the proposed method. Experimental results demonstrate that the approach is competitive with state-of-the-art classifiers.
Affiliation(s)
- Xi Chen: School of Economics & Management, Xidian University, Xi’an, China
- Wenquan Jin: School of Economics & Management, Xidian University, Xi’an, China
- Qirui Wu: School of Foreign Languages, Xidian University, China
- Wenbo Zhang: School of Economics & Management, Xidian University, Xi’an, China
5
Kaur J, Khehra BS. Fuzzy Logic and Hybrid based Approaches for the Risk of Heart Disease Detection: State-of-the-Art Review. JOURNAL OF THE INSTITUTION OF ENGINEERS (INDIA): SERIES B 2022. [PMCID: PMC8328141 DOI: 10.1007/s40031-021-00644-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
Abstract
Artificial Intelligence, Machine Learning, Fuzzy Logic, Neural Networks, Genetic Algorithms and their hybrid systems play a vital role in the medical sciences in diagnosing various diseases efficiently. Heart-related problems are very common in today’s world. The risk of heart failure develops from narrowing and blockage of the coronary arteries as excess cholesterol is deposited in the arteries and blood vessels, which results in fatigue, chest pain, dyspnoea, sleeping difficulties and depression. This research aims to explore the diverse work done on FL and Hybrid-based techniques to identify the risk of heart disease among patients. The present study surveys publications, along with the strength, operating system, accuracy rate and other specifications used in the identification of heart disease based on FL and Hybrid-based approaches since 2010. This survey aims to motivate research scholars to generate more innovative ideas and continue their research work in the field. Moreover, a future model for direct service of patients from old age homes to the Intensive Care Unit through ambulance services is also presented in this paper.
Affiliation(s)
- Jagmohan Kaur: IK Gujral Punjab Technical University, Jalandhar, Punjab, India
6
|
|
7
Prediction of Heart Attacks Using Biological Signals Based on Recurrent GMDH Neural Network. Neural Process Lett 2022. [DOI: 10.1007/s11063-021-10667-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
8
Wadhawan S, Maini R. A Systematic Review on Prediction Techniques for Cardiac Disease. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGIES AND SYSTEMS APPROACH 2022. [DOI: 10.4018/ijitsa.290001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022]
Abstract
Early prediction of cardiac diseases, one of the major issues in the healthcare industry, can lower the mortality rate. Compared with traditional methods, intelligent systems have the potential to predict these diseases accurately at an early stage, even with complex data. Researchers have presented various intelligent decision support systems (DSS) for predicting this disease. The objectives of this study are to examine the trends in these intelligent systems, to identify effective techniques for predicting cardiac disease, and to identify future directions. This paper therefore presents a systematic review of state-of-the-art techniques based on ML, NN and FL. For the analysis, we followed the PRISMA statement and considered studies published from 2010 to 2020 in different databases. The analysis concluded that ML-based techniques are broadly used for feature selection and classification and have potential for the prediction of cardiac diseases. Future directions are to evaluate rarely used prediction techniques and to find ways of improving them for model generalization with better prediction accuracy.
Affiliation(s)
- Savita Wadhawan: Department of CSE, Punjabi University, Patiala, India & MMICTBM, MM(DU), Mullana, Ambala, India
- Raman Maini: Department of CSE, Punjabi University, Patiala, India
9
Freitas SA, Nienow D, da Costa CA, Ramos GDO. Functional Coronary Artery Assessment: a Systematic Literature Review. Wien Klin Wochenschr 2021; 134:302-318. [PMID: 34870740 DOI: 10.1007/s00508-021-01970-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 10/11/2021] [Indexed: 11/28/2022]
Abstract
Cardiovascular diseases are the number one cause of death in the world and include the most common heart disorder, coronary artery disease (CAD). CAD is mainly caused by fat accumulating on the arteries' internal walls, creating an atherosclerotic plaque that affects the functional behavior of the blood flow. Anatomical plaque characteristics are essential but not sufficient for a complete functional assessment of CAD. In fact, plaque analysis and visual inspection alone have proven insufficient to determine lesion severity and hemodynamic repercussions. Furthermore, the fractional flow reserve (FFR) exam, which is considered the gold standard for determining the functional impairment caused by a stenosis, is invasive and has several limitations. This situation shows the need for new techniques applied to image exams to improve CAD functional assessment. In this article, we perform a systematic literature review on emerging methods for determining CAD significance, thus delivering a unique base for comparing these methods, qualitatively and quantitatively. Our goal is to guide further studies with evidence from the most promising methods, highlighting the benefits from both areas. We summarize benchmarks, metrics for evaluation, and challenges already faced, thus shedding light on the requirements for a valid, meaningful, and accepted technique for functional assessment evaluation. We create a base of comparison based on quantitative and qualitative indicators and highlight the most relevant geometrical metrics that correlate with lesion significance. Finally, we point out future benchmarks based on recent literature.
Affiliation(s)
- Samuel A Freitas: Software Innovation Laboratory, Graduate Program in Applied Computing, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil
- Débora Nienow: Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil
- Cristiano A da Costa: Software Innovation Laboratory, Graduate Program in Applied Computing, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil
- Gabriel de O Ramos: Software Innovation Laboratory, Graduate Program in Applied Computing, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil
10
Nikparvar B, Rahman MM, Hatami F, Thill JC. Spatio-temporal prediction of the COVID-19 pandemic in US counties: modeling with a deep LSTM neural network. Sci Rep 2021; 11:21715. [PMID: 34741093 PMCID: PMC8571358 DOI: 10.1038/s41598-021-01119-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 07/07/2021] [Accepted: 10/20/2021] [Indexed: 12/13/2022] Open
Abstract
Prediction of complex epidemiological systems such as COVID-19 is challenging on many grounds. Commonly used compartmental models struggle to handle an epidemiological process that evolves rapidly and is spatially heterogeneous. On the other hand, machine learning methods are limited at the beginning of a pandemic by the small amount of data available for training. We propose a deep learning approach to predict future COVID-19 infection cases and deaths 1 to 4 weeks ahead at the fine granularity of US counties. The multivariate Long Short-term Memory (LSTM) recurrent neural network is trained on multiple time series samples at the same time, including a mobility series. Results show that adding mobility as a variable and using multiple samples to train the network improve predictive performance in terms of both the bias and the variance of the forecasts. We also show that the predicted results have accuracy and spatial patterns similar to those of a standard ensemble model used as a benchmark. The model is attractive in many respects, including the fine geographic granularity of its predictions and its strong predictive performance several weeks ahead. Furthermore, data requirements and computational intensity are reduced by substituting a single model for the multiple models folded into an ensemble model.
Affiliation(s)
- Behnam Nikparvar: The William States Lee College of Engineering, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Md Mokhlesur Rahman: The William States Lee College of Engineering, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA; Department of Urban and Regional Planning, Khulna University of Engineering & Technology (KUET), Khulna, 9203, Bangladesh
- Faizeh Hatami: Department of Geography and Earth Sciences, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
- Jean-Claude Thill: Department of Geography and Earth Sciences, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA; School of Data Science, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
11
Rahman MM, Paul KC, Hossain MA, Ali GGMN, Rahman MS, Thill JC. Machine Learning on the COVID-19 Pandemic, Human Mobility and Air Quality: A Review. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2021; 9:72420-72450. [PMID: 34786314 PMCID: PMC8545207 DOI: 10.1109/access.2021.3079121] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/25/2021] [Accepted: 05/07/2021] [Indexed: 05/19/2023]
Abstract
The ongoing COVID-19 global pandemic is touching every facet of human lives (e.g., public health, education, economy, transportation, and the environment). This novel pandemic and non-pharmaceutical interventions of lockdown and confinement implemented citywide, regionally or nationally are affecting virus transmission, people's travel patterns, and air quality. Many studies have been conducted to predict the diffusion of the COVID-19 disease, assess the impacts of the pandemic on human mobility and on air quality, and assess the impacts of lockdown measures on viral spread with a range of Machine Learning (ML) techniques. This literature review aims to analyze the results from past research to understand the interactions among the COVID-19 pandemic, lockdown measures, human mobility, and air quality. The critical review of prior studies indicates that urban form, people's socioeconomic and physical conditions, social cohesion, and social distancing measures significantly affect human mobility and COVID-19 viral transmission. During the COVID-19 pandemic, many people are inclined to use private transportation for necessary travel to mitigate coronavirus-related health problems. This review study also noticed that COVID-19 related lockdown measures significantly improve air quality by reducing the concentration of air pollutants, which in turn improves the COVID-19 situation by reducing respiratory-related sickness and deaths. It is argued that ML is a powerful, effective, and robust analytic paradigm to handle complex and wicked problems such as a global pandemic. This study also explores the spatio-temporal aspects of lockdown and confinement measures on coronavirus diffusion, human mobility, and air quality. Additionally, we discuss policy implications, which will be helpful for policy makers to take prompt actions to moderate the severity of the pandemic and improve urban environments by adopting data-driven analytic methods.
Affiliation(s)
- Md. Mokhlesur Rahman: The William States Lee College of Engineering, University of North Carolina at Charlotte, Charlotte, NC 28223, USA; Department of Urban and Regional Planning, Khulna University of Engineering and Technology (KUET), Khulna 9203, Bangladesh
- Kamal Chandra Paul: Department of Electrical and Computer Engineering, The William States Lee College of Engineering, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Md. Amjad Hossain: Department of Computer Science, Mathematics and Engineering, Shepherd University, Shepherdstown, WV 25443, USA
- G. G. Md. Nawaz Ali: Department of Applied Computer Science, University of Charleston, Charleston, WV 25304, USA
- Md. Shahinoor Rahman: Department of Earth and Environmental Sciences, New Jersey City University, Jersey City, NJ 07305, USA
- Jean-Claude Thill: Department of Geography and Earth Sciences, School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
12
Role of machine learning in management of degenerative spondylolisthesis: a systematic review. CURRENT ORTHOPAEDIC PRACTICE 2021. [DOI: 10.1097/bco.0000000000000992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
13
Yavari A, Rajabzadeh A, Abdali-Mohammadi F. Profile-based assessment of diseases affective factors using fuzzy association rule mining approach: A case study in heart diseases. J Biomed Inform 2021; 116:103695. [PMID: 33549658 DOI: 10.1016/j.jbi.2021.103695] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/31/2020] [Revised: 12/15/2020] [Accepted: 02/01/2021] [Indexed: 10/22/2022]
Abstract
The existing data mining solutions for identifying risk factors associated with diseases have several shortcomings. They usually use crisp partitions for numerical features and do not use patient-specific profiles. These shortcomings limit their usefulness for solving real problems. Discretizing a numerical feature through crisp partitions can also generate substantial partitioning errors, particularly for features whose values are close to crisp boundaries. Because the normal range of each numerical feature varies according to the age, gender, and medical conditions of the patients, ignoring these differences can undermine the accuracy of the extracted itemsets and rules. This paper presents a profile-based fuzzy association rule mining (PB-FARM) approach for the assessment of risk factors highly correlated with diseases. The proposed approach has three phases. Phase I involves creating profiles for patients based on their age, gender, and medical conditions, to determine a normal range for each numerical feature. Then fuzzy partitioning is done for all features (both numerical and categorical), and consequently, a structure called FirstScan is created. In Phase II, the FirstScan structure is utilized to mine for large fuzzy k-itemsets. Ultimately, in Phase III, the given k-itemsets are employed to generate fuzzy rules for associations between risk factors and diseases. To evaluate the performance of the proposed method, the Z-Alizadeh Sani coronary artery disease (CAD) dataset, containing 303 records and 54 features, was used. The results show a positive correlation between typical chest pain and old age with the incidence of CAD. The comparisons made in this study showed that, firstly, the proposed algorithm has a higher partitioning accuracy than other methods, and secondly, it has a reasonably short execution time.
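The contrast between crisp and fuzzy partitioning described in this abstract can be illustrated with a minimal Python sketch. This is not the PB-FARM implementation: the function names, the linear membership shapes, the cut points (90 and 140) and the overlap width (10) are all hypothetical values chosen for illustration. In a profile-based setting, the cut points would presumably be looked up per patient profile rather than fixed.

```python
def crisp_partition(x, cut_low, cut_high):
    """Assign x to exactly one label using hard boundaries."""
    if x < cut_low:
        return "low"
    if x < cut_high:
        return "normal"
    return "high"


def falling_edge(x, full, zero):
    """Membership that is 1 up to `full` and falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising_edge(x, zero, full):
    """Membership that is 0 up to `zero` and rises linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def fuzzy_partition(x, cut_low, cut_high, width):
    """Return membership degrees in low/normal/high.

    `width` is a hypothetical overlap on each side of a cut point; it must
    satisfy cut_low + width <= cut_high - width so the degrees sum to 1.
    """
    low = falling_edge(x, cut_low - width, cut_low + width)
    high = rising_edge(x, cut_high - width, cut_high + width)
    return {"low": low, "normal": 1.0 - low - high, "high": high}


# Two nearly identical readings on either side of a crisp boundary.
for reading in (139, 141):
    print(reading,
          crisp_partition(reading, 90, 140),
          fuzzy_partition(reading, 90, 140, 10))
```

With crisp boundaries the two readings receive different labels, whereas the fuzzy version gives both of them partial membership in "normal" and "high", which is the kind of boundary error the abstract says crisp partitions introduce.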
Affiliation(s)
- Ali Yavari: Department of Electrical and Computer Engineering, Razi University, Kermanshah, Iran
- Amir Rajabzadeh: Department of Electrical and Computer Engineering, Razi University, Kermanshah, Iran
14
Patro SP, Nayak GS, Padhy N. Heart disease prediction by using novel optimization algorithm: A supervised learning prospective. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100696] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/27/2022] Open
15
Benhar H, Idri A, Fernández-Alemán JL. Data preprocessing for heart disease classification: A systematic literature review. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 195:105635. [PMID: 32652383 DOI: 10.1016/j.cmpb.2020.105635] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/14/2019] [Accepted: 06/24/2020] [Indexed: 06/11/2023]
Abstract
CONTEXT Early detection of heart disease is an important challenge since 17.3 million people lose their lives to heart diseases every year. Besides, any error in the diagnosis of cardiac disease can be dangerous and puts an individual's life at risk. Accurate diagnosis is therefore critical in cardiology. Data Mining (DM) classification techniques have been used to diagnose heart diseases but are still limited by data quality challenges such as inconsistencies, noise, missing data, outliers, high dimensionality and imbalanced data. Data preprocessing (DP) techniques have therefore been used to prepare data with the goal of improving the performance of DM-based heart disease prediction systems. OBJECTIVE The purpose of this study is to review and summarize the current evidence on the use of preprocessing techniques in heart disease classification as regards: (1) the DP tasks and techniques most frequently used, (2) the impact of DP tasks and techniques on the performance of classification in cardiology, (3) the overall performance of classifiers when using DP techniques, and (4) comparisons of different classifier-preprocessing combinations in terms of accuracy rate. METHOD A systematic literature review is carried out by identifying and analyzing empirical studies on the application of data preprocessing in heart disease classification published between January 2000 and June 2019. A total of 49 studies were selected and analyzed according to the aforementioned criteria. RESULTS The review results show that data reduction is the most used preprocessing task in cardiology, followed by data cleaning. In general, preprocessing either maintained or improved the performance of heart disease classifiers. Some combinations, such as (ANN + PCA), (ANN + CHI) and (SVM + PCA), are promising in terms of accuracy. However, the deployment of these models in real-world diagnosis decision support systems is subject to several risks and limitations due to their lack of interpretability.
Affiliation(s)
- H Benhar: Software Project Management Research Team, ENSIAS, University Mohammed V in Rabat, Morocco
- A Idri: Software Project Management Research Team, ENSIAS, University Mohammed V in Rabat, Morocco; CSEHS-MSDA, Mohammed VI Polytechnic University, Benguerir, Morocco
- J L Fernández-Alemán: Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain
16
A Novel Approach for Coronary Artery Disease Diagnosis using Hybrid Particle Swarm Optimization based Emotional Neural Network. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.09.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
17
Using Machine Learning Classifiers to Recognize the Mixture Control Chart Patterns for a Multiple-Input Multiple-Output Process. MATHEMATICS 2020. [DOI: 10.3390/math8010102] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
A statistical process control (SPC) chart is one of the most important techniques for monitoring a process. Typically, a certain root cause or a disturbance in a process would result in the presence of a systematic control chart pattern (CCP). Consequently, the effective recognition of CCPs has received considerable attention in recent years for their potential use in improving process quality. However, most studies have focused on the recognition of CCPs for SPC applications alone. Specifically, even though numerous studies have addressed the increased use of the SPC and engineering process control (EPC) mechanisms, very little research has discussed the recognition of CCPs for multiple-input multiple-output (MIMO) systems. It is much more difficult to recognize the CCPs of an MIMO system since two or more disturbances are simultaneously involved in the process. The purpose of this study is thus to propose several machine learning (ML) classifiers to overcome the difficulties in recognizing CCPs in MIMO systems. Because of their efficient and fast algorithms and effective classification performance, the considered ML classifiers include an artificial neural network (ANN), support vector machine (SVM), extreme learning machine (ELM), and multivariate adaptive regression splines (MARS). Furthermore, one problem may arise due to the existence of embedded mixture CCPs (MCCPs) in MIMO systems. In contrast to using typical process outputs alone in a classifier, this study employs both process outputs and EPC compensation to ensure the effectiveness of CCP recognition. Experimental results reveal that the proposed classifiers are able to effectively recognize MCCPs for MIMO systems.
18
Magesh G, Swarnalatha P. Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction. EVOLUTIONARY INTELLIGENCE 2020. [DOI: 10.1007/s12065-019-00336-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
19
Liu J, Bai M, Jiang N, Yu D. A novel measure of attribute significance with complexity weight. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105543] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
20
Special Issue on Using Machine Learning Algorithms in the Prediction of Kyphosis Disease: A Comparative Study. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9163322] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) is the technology that allows a computer system to learn from its environment through iterative processes and to improve itself from experience. Recently, machine learning has gained massive attention across numerous fields, and it makes it possible to model data extremely well without strong assumptions about the modeled system. The rise of machine learning has proven to better describe data by providing both engineering solutions and an important benchmark. Therefore, in this research work, we applied three different machine learning algorithms, namely Random Forest (RF), Support Vector Machines (SVM), and Artificial Neural Network (ANN), to predict kyphosis disease based on biomedical data. At the initial stage of the experiments, we performed 5- and 10-Fold Cross-Validation using Logistic Regression as a baseline model to compare with our ML models without performing grid search. We then evaluated the models and compared their performances based on 5- and 10-Fold Cross-Validation after running grid search algorithms on the ML models. For the Support Vector Machines, we experimented with three kernels (Linear, Radial Basis Function (RBF), and Polynomial). After running grid search, we observed overall model accuracies of 79%–85% and 77%–86% based on the 5- and 10-Fold Cross-Validation, respectively. Under both the 5- and 10-Fold Cross-Validation, the RF, SVM-RBF, and ANN models achieved accuracies above 80%. The RF, SVM-RBF, and ANN models outperformed the baseline model based on the 10-Fold Cross-Validation with grid search. Overall, in terms of accuracy, the ANN model outperformed all the other ML models, achieving 85.19% and 86.42% based on the 5- and 10-Fold Cross-Validation. We propose that RF, SVM-RBF, and ANN models be used to detect and predict kyphosis disease after a patient has undergone surgery. We suggest that machine learning should be adopted as an essential tool across the broadest spectrum of biomedical questions.
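As a rough illustration of the k-fold cross-validation protocol this abstract relies on, the following sketch splits sample indices into k folds and averages held-out accuracy. It is not the authors' code: the function names, the majority-class stand-in model, and the toy labels are all hypothetical, and a grid search would simply repeat this loop once per candidate hyperparameter setting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k):
    """Average held-out accuracy over k folds for a user-supplied model."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k


# Hypothetical stand-in model: always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Toy data (not from the paper): 10 samples, 7 negatives followed by 3 positives.
X_toy = list(range(10))
y_toy = [0] * 7 + [1] * 3
print(cross_val_accuracy(fit_majority, predict_majority, X_toy, y_toy, k=5))
```

The folds here are contiguous and unshuffled to keep the sketch deterministic; in practice the data would usually be shuffled or stratified before splitting.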
|
21
|
Mohan S, Thirumalai C, Srivastava G. Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques. IEEE ACCESS 2019; 7:81542-81554. [DOI: 10.1109/access.2019.2923707] [Citation(s) in RCA: 180] [Impact Index Per Article: 30.0] [Indexed: 08/29/2023]
|
22
|
A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery. J Med Syst 2018; 43:17. [PMID: 30542772 DOI: 10.1007/s10916-018-1134-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/26/2018] [Accepted: 12/03/2018] [Indexed: 01/25/2023]
Abstract
The increasing amount of data produced by various biomedical and healthcare systems has led to a need for methodologies related to knowledge data discovery. Data mining (DM) offers a set of powerful techniques that allow the identification and extraction of relevant information from medical datasets, thus enabling doctors and patients to greatly benefit from DM, particularly in the case of diseases with high mortality and morbidity rates, such as heart disease (HD). Nonetheless, the use of raw medical data implies several challenges, such as missing data, noise, redundancy and high dimensionality, which make the extraction of useful and relevant information difficult. Intensive research has, therefore, recently begun in order to prepare raw healthcare data before knowledge extraction. In any knowledge data discovery (KDD) process, data preparation is the step prior to DM that deals with data imperfections in order to improve data quality so as to satisfy the requirements and improve the performance of DM techniques. The objective of this paper is to perform a systematic mapping study (SMS) on data preparation for KDD in cardiology so as to provide an overview of the quantity and type of research carried out in this respect. The SMS consisted of a set of 58 selected papers published between January 2000 and December 2017. The selected studies were analyzed according to six criteria: year and channel of publication, preparation task, medical task, DM objective, research type and empirical type. The results show that a large amount of data preparation research was carried out in order to improve the performance of DM-based decision support systems in cardiology. Researchers were mainly interested in the data reduction preparation task and particularly in feature selection. Moreover, the majority of the selected studies focused on classification for the diagnosis of HD. Two main research types were identified in the selected studies: solution proposal and evaluation research, and the most frequently used empirical type was that of historical-based evaluation.
|
23
|
Narayan S, Sathiyamoorthy E. A novel recommender system based on FFT with machine learning for predicting and identifying heart diseases. Neural Comput Appl 2018. [DOI: 10.1007/s00521-018-3662-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022]
|
24
|
Alizadehsani R, Hosseini MJ, Khosravi A, Khozeimeh F, Roshanzamir M, Sarrafzadegan N, Nahavandi S. Non-invasive detection of coronary artery disease in high-risk patients based on the stenosis prediction of separate coronary arteries. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 162:119-127. [PMID: 29903478 DOI: 10.1016/j.cmpb.2018.05.009] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 11/18/2017] [Revised: 04/24/2018] [Accepted: 05/03/2018] [Indexed: 05/10/2023]
Abstract
BACKGROUND AND OBJECTIVE Cardiovascular diseases are extremely widespread and account for 17 million deaths in the world per annum. Coronary artery disease (CAD) is one such disease, with an annual mortality of about 7 million. Thus, early diagnosis of CAD is of vital importance. Angiography is currently the modality of choice for the detection of CAD. However, its complications and costs have prompted researchers to seek alternative methods via machine learning algorithms. METHODS The present study proposes a novel machine learning algorithm. The proposed algorithm uses three classifiers to detect stenosis of the three coronary arteries, i.e., left anterior descending (LAD), left circumflex (LCX) and right coronary artery (RCA), in order to achieve higher accuracy for CAD diagnosis. RESULTS This method was applied to the extension of the Z-Alizadeh Sani dataset, which contains demographic, examination, ECG, laboratory and echo data of 500 patients. The method achieves accuracy, sensitivity and specificity rates of 96.40%, 100% and 88.1%, respectively, for the detection of CAD. To our knowledge, such high rates of accuracy and sensitivity have not been attained elsewhere before. CONCLUSION This new algorithm reliably distinguishes those with normal coronary arteries from those with CAD, which may obviate the need for angiography in the normal group.
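For readers less familiar with the three rates quoted in this abstract, the sketch below shows how accuracy, sensitivity and specificity follow from confusion-matrix counts. The abstract does not say how the three per-artery classifiers are merged into one CAD decision, so the "any artery stenotic" rule in `combine_arteries` is purely an assumption, and the counts in the usage line are hypothetical rather than the paper's data.

```python
def combine_arteries(lad, lcx, rca):
    """Assumed rule (not stated in the abstract): flag CAD if any artery is predicted stenotic."""
    return any((lad, lcx, rca))


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all patients classified correctly
        "sensitivity": tp / (tp + fn),    # share of diseased patients detected
        "specificity": tn / (tn + fp),    # share of healthy patients cleared
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(diagnostic_metrics(tp=90, fp=20, tn=80, fn=10))
```

A sensitivity of 100%, as reported above, corresponds to `fn == 0`: no patient with CAD is missed, which is the property that would matter if the method were used to spare the normal group from angiography.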
Affiliation(s)
- Roohallah Alizadehsani: Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Victoria 3217, Australia
- Mohammad Javad Hosseini: Department of Computer Science and Engineering, University of Washington, Seattle, United States
- Abbas Khosravi: Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Victoria 3217, Australia
- Fahime Khozeimeh: Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohamad Roshanzamir: Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
- Nizal Sarrafzadegan: Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran & Faculty of Medicine, SPPH, University of British Columbia, Vancouver, BC, Canada
- Saeid Nahavandi: Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Victoria 3217, Australia
|
25
|
Idri A, Benhar H, Fernández-Alemán JL, Kadi I. A systematic map of medical data preprocessing in knowledge discovery. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 162:69-85. [PMID: 29903496 DOI: 10.1016/j.cmpb.2018.05.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/08/2017] [Revised: 04/25/2018] [Accepted: 05/03/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Data mining (DM) has, over the last decade, received increased attention in the medical domain and has been widely used to analyze medical datasets in order to extract useful knowledge and previously unknown patterns. However, historical medical data can often comprise inconsistent, noisy, imbalanced, missing and high dimensional data. These challenges lead to a serious bias in predictive modeling and reduce the performance of DM techniques. Data preprocessing is, therefore, an essential step in knowledge discovery as regards improving the quality of data and making it suitable for DM techniques. The objective of this paper is to review the use of preprocessing techniques in clinical datasets. METHODS We performed a systematic map of studies regarding the application of data preprocessing to healthcare published between January 2000 and December 2017. A search string was determined on the basis of the mapping questions and the PICO categories. The search string was then applied in digital databases covering the fields of computer science and medical informatics in order to identify relevant studies. The studies were initially selected by reading their titles, abstracts and keywords. Those that were selected at that stage were then reviewed using a set of inclusion and exclusion criteria in order to eliminate any that were not relevant. This process resulted in 126 primary studies. RESULTS Selected studies were analyzed and classified according to their publication years and channels, research type, empirical type and contribution type. The findings of this mapping study revealed that researchers have paid a considerable amount of attention to preprocessing in medical DM in the last decade. A significant number of the selected studies used data reduction and cleaning preprocessing tasks. Moreover, the disciplines in which preprocessing has received most attention are cardiology, endocrinology and oncology. CONCLUSIONS Researchers should develop and implement standards for an effective integration of multiple medical data types. Moreover, we identified the need to perform literature reviews.
Affiliation(s)
- A Idri: Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco
- H Benhar: Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco
- J L Fernández-Alemán: Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain
- I Kadi: Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco
|
26
|
Saqlain SM, Sher M, Shah FA, Khan I, Ashraf MU, Awais M, Ghani A. Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1185-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Indexed: 11/30/2022]
|
27
|
Nilashi M, Ibrahim OB, Ahmadi H, Shahmoradi L. An analytical method for diseases prediction using machine learning techniques. Comput Chem Eng 2017. [DOI: 10.1016/j.compchemeng.2017.06.011] [Citation(s) in RCA: 93] [Impact Index Per Article: 11.6] [Indexed: 12/20/2022]
|
28
|
Optimal feature selection using a modified differential evolution algorithm and its effectiveness for prediction of heart disease. Comput Biol Med 2017; 90:125-136. [PMID: 28987988 DOI: 10.1016/j.compbiomed.2017.09.011] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 04/15/2017] [Revised: 09/15/2017] [Accepted: 09/16/2017] [Indexed: 10/18/2022]
Abstract
Enormous data growth in multiple domains has posed a great challenge for data processing and analysis techniques. In particular, the traditional record maintenance strategy has been replaced in the healthcare system. It is vital to develop a model that is able to handle the huge amount of e-healthcare data efficiently. In this paper, the challenging tasks of selecting critical features from the enormous set of available features and diagnosing heart disease are carried out. Feature selection is one of the most widely used pre-processing steps in classification problems. A modified differential evolution (DE) algorithm is used to perform feature selection for cardiovascular disease and optimization of selected features. Of the 10 available strategies for the traditional DE algorithm, the seventh strategy, which is represented by DE/rand/2/exp, is considered for comparative study. The performance analysis of the developed modified DE strategy is given in this paper. With the selected critical features, prediction of heart disease is carried out using fuzzy AHP and a feed-forward neural network. Various performance measures of integrating the modified differential evolution algorithm with fuzzy AHP and a feed-forward neural network in the prediction of heart disease are evaluated in this paper. The accuracy of the proposed hybrid model is 83%, which is higher than that of some other existing models. In addition, the prediction time of the proposed hybrid model is also evaluated and has shown promising results.
|
29
|
Kadi I, Idri A, Fernandez-Aleman JL. Systematic mapping study of data mining–based empirical studies in cardiology. Health Informatics J 2017; 25:741-770. [DOI: 10.1177/1460458217717636] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
Data mining provides the methodology and technology to transform huge amounts of data into useful information for decision making. It is a powerful process to extract knowledge and discover new patterns embedded in large data sets. Data mining has been increasingly used in medicine, particularly in cardiology. In fact, data mining applications can greatly benefit all parties involved in cardiology, such as patients, cardiologists and nurses. This article aims to perform a systematic mapping study so as to analyze and synthesize empirical studies on the application of data mining techniques in cardiology. A total of 142 articles published between 2000 and 2015 were therefore selected, studied and analyzed according to the following four criteria: year and channel of publication, research type, medical task and empirical type. The results of this mapping study are discussed and a list of recommendations for researchers and cardiologists is provided.
Affiliation(s)
- Ali Idri: Mohammed V University in Rabat, Morocco
|
30
|
Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks. Procedia Computer Science 2017. [DOI: 10.1016/j.procs.2017.11.283] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Indexed: 12/25/2022]
|
31
|
Kadi I, Idri A, Fernandez-Aleman J. Knowledge discovery in cardiology: A systematic literature review. Int J Med Inform 2017; 97:12-32. [DOI: 10.1016/j.ijmedinf.2016.09.005] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 03/11/2016] [Revised: 09/01/2016] [Accepted: 09/11/2016] [Indexed: 11/24/2022]
|
32
|
Shao YE, Chiu CC. Applying emerging soft computing approaches to control chart pattern recognition for an SPC–EPC process. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.04.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 10/21/2022]
|
33
|
|
34
|
Vallejos de Schatz CH, Schneider FK, Abatti PJ, Nievola JC. Dynamic Fuzzy-Neural based tool for monitoring and predicting patients conditions using selected vital signs. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2015. [DOI: 10.3233/ifs-151537] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Affiliation(s)
- Cecilia H. Vallejos de Schatz: Graduate Schools of Electrical Engineering and Applied Computer Science, Federal Technological University of Parana (UTFPR), Avenida Sete de Setembro, Curitiba, Paraná, Brazil
- Fabio K. Schneider: Graduate Schools of Electrical Engineering and Applied Computer Science, Federal Technological University of Parana (UTFPR), Avenida Sete de Setembro, Curitiba, Paraná, Brazil
- Paulo J. Abatti: Graduate Schools of Electrical Engineering and Applied Computer Science, Federal Technological University of Parana (UTFPR), Avenida Sete de Setembro, Curitiba, Paraná, Brazil
- Julio C. Nievola: Post-Graduate Program in Informatics, Pontifical Catholic University of Parana (PUCPR), Rua Imaculada Conceição, Curitiba, Paraná, Brazil
|
35
|
Knowledge mining from clinical datasets using rough sets and backpropagation neural network. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:460189. [PMID: 25821508 PMCID: PMC4364360 DOI: 10.1155/2015/460189] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 10/20/2014] [Accepted: 01/12/2015] [Indexed: 11/18/2022]
Abstract
The availability of clinical datasets and knowledge mining methodologies encourages the researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from the minimal set of attributes that has been extracted from the clinical dataset. In this work rough set indiscernibility relation method with backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is handling of missing values to obtain a smooth data set and selection of appropriate attributes from the clinical dataset by indiscernibility relation method. The second stage is classification using backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
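The rough set indiscernibility relation mentioned in this abstract groups records that cannot be told apart on a chosen set of attributes. The sketch below is one minimal way to compute those equivalence classes and to check whether a smaller attribute subset keeps the same partition. It is not the RS-BPNN implementation: the function names and the four-row symptom table are hypothetical, and the sketch ignores the decision attribute that a full reduct computation would also consider.

```python
from collections import defaultdict


def indiscernibility_classes(rows, attributes):
    """Group row indices whose values agree on every attribute in `attributes`."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attributes)
        classes[key].append(idx)
    return sorted(classes.values())


def preserves_partition(rows, subset, all_attributes):
    """True if `subset` separates the rows exactly as the full attribute set does."""
    return indiscernibility_classes(rows, subset) == indiscernibility_classes(rows, all_attributes)


# Hypothetical toy table; attribute names and values are illustrative only.
rows = [
    {"fever": "yes", "fatigue": "yes", "nausea": "no"},
    {"fever": "yes", "fatigue": "yes", "nausea": "no"},
    {"fever": "no", "fatigue": "yes", "nausea": "yes"},
    {"fever": "no", "fatigue": "no", "nausea": "yes"},
]
all_attrs = ["fever", "fatigue", "nausea"]

print(indiscernibility_classes(rows, all_attrs))               # [[0, 1], [2], [3]]
print(preserves_partition(rows, ["fever", "fatigue"], all_attrs))  # True
print(preserves_partition(rows, ["fever"], all_attrs))              # False
```

In this toy table `nausea` is redundant given `fever` and `fatigue`, which is the kind of attribute a reduct would drop before the second-stage classifier is trained.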
|
36
|
Salari N, Shohaimi S, Najafi F, Nallappan M, Karishnarajah I. A novel hybrid classification model of genetic algorithms, modified k-Nearest Neighbor and developed backpropagation neural network. PLoS One 2014; 9:e112987. [PMID: 25419659 PMCID: PMC4242540 DOI: 10.1371/journal.pone.0112987] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 07/22/2013] [Accepted: 10/21/2014] [Indexed: 11/18/2022] Open
Abstract
Among numerous artificial intelligence approaches, k-Nearest Neighbor algorithms, genetic algorithms, and artificial neural networks are considered the most common and effective methods for classification problems in numerous studies. In the present study, the results of implementing a novel hybrid feature selection-classification model using the above-mentioned methods are presented. The purpose is to benefit from the synergies obtained by combining these technologies for the development of classification models. Such a combination creates an opportunity to exploit the strengths of each algorithm and to make up for their deficiencies. To develop the proposed model, with the aim of obtaining the best array of features, first, feature ranking techniques such as the Fisher's discriminant ratio and class separability criteria were used to prioritize features. Second, the obtained results, which included arrays of the top-ranked features, were used as the initial population of a genetic algorithm to produce optimum arrays of features. Third, using a modified k-Nearest Neighbor method as well as an improved method of backpropagation neural networks, the classification process was advanced based on the optimum arrays of features selected by the genetic algorithm. The performance of the proposed model was compared with thirteen well-known classification models based on seven datasets. Furthermore, statistical analysis was performed using the Friedman test followed by post-hoc tests. The experimental findings indicated that the proposed hybrid model resulted in significantly better classification performance compared with all 13 classification methods. Finally, the performance results of the proposed model were benchmarked against the best ones reported as the state-of-the-art classifiers in terms of classification accuracy for the same data sets. The findings of this comprehensive comparative study revealed that the performance of the proposed model in terms of classification accuracy is desirable, promising, and competitive with the existing state-of-the-art classification models.
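The first step described in this abstract, ranking features by Fisher's discriminant ratio, can be sketched for the two-class case as the squared difference of class means divided by the sum of class variances. The snippet below is an illustration under that common definition, not the authors' procedure: the function names and the toy matrix are hypothetical, population variance is an arbitrary choice here, and a feature that is constant within both classes would cause a division by zero.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Two-class Fisher's discriminant ratio: (mean_a - mean_b)^2 / (var_a + var_b)."""
    return (mean(values_a) - mean(values_b)) ** 2 / (pvariance(values_a) + pvariance(values_b))


def rank_features(X, y):
    """Return feature indices ordered from most to least discriminative."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == labels[0]]
        b = [row[j] for row, lab in zip(X, y) if lab == labels[1]]
        scores.append((fisher_ratio(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Hypothetical toy data: feature 0 separates the classes well, feature 1 barely does.
X_toy = [[1, 1], [2, 5], [3, 9], [7, 2], [8, 6], [9, 10]]
y_toy = [0, 0, 0, 1, 1, 1]
print(rank_features(X_toy, y_toy))  # [0, 1]
```

In the pipeline the abstract describes, a ranking like this only supplies the starting population; the genetic algorithm then searches over feature subsets.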
Affiliation(s)
- Nader Salari: Department of Biology, Faculty of Science, University Putra Malaysia, Serdang, Selangor, Malaysia; Department of Biostatistics and Epidemiology, School of Public Health, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Shamarina Shohaimi: Department of Biology, Faculty of Science, University Putra Malaysia, Serdang, Selangor, Malaysia
- Farid Najafi: Department of Biostatistics and Epidemiology, School of Public Health, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Meenakshii Nallappan: Department of Biology, Faculty of Science, University Putra Malaysia, Serdang, Selangor, Malaysia
|