1
Vahmiyan M, Kheirabadi M, Akbari E. Feature selection methods in microarray gene expression data: a systematic mapping study. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2022]
2
Javid I, Ghazali R, Zulqarnain M, Hassan N. Data pre-processing for cardiovascular disease classification: A systematic literature review. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-220061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
An important task in the medical field is the early detection of disease. Heart disease is among the most challenging diseases, causing about 17.3 million deaths each year. Even a minor error in heart disease diagnosis puts an individual's life at risk, so precise diagnosis is critical. Different approaches, including data mining, have been used to predict heart disease. However, there are serious concerns about data quality, for example inconsistencies, missing values, noise, high dimensionality, and imbalanced data. To improve the accuracy of data-mining-based prediction systems, data preparation techniques have been applied to raise the quality of the data. The main objective of this paper is to highlight and summarize research on (i) the data preparation techniques most commonly used, (ii) the impact of pre-processing procedures on the accuracy of a heart disease prediction system, (iii) classifier performance with data pre-processing techniques, and (iv) a comparison of the accuracy of the different pre-processing models. A systematic literature review on the use of data pre-processing in heart disease diagnosis was carried out on material published from January 2001 to July 2021. Almost 30 studies were selected and examined against the above criteria. The review concludes that data reduction and data cleaning are the pre-processing techniques most often used in heart disease prediction systems. Overall, this study concludes that data pre-processing has improved the accuracy of models used for heart disease prediction. Some hybrid models, including (ANN+CHI), (ANN+PCA), (DNN+CHI) and (SVM+PCA), have shown improved accuracy. However, owing to a lack of clarity, a number of limitations and challenges remain in implementing these models in the real world.
Affiliation(s)
- Irfan Javid
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn, Malaysia
  - Department of Computer Science & IT, University of Poonch Rawalakot, AJK, Pakistan
- Rozaida Ghazali
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn, Malaysia
- Muhammad Zulqarnain
  - Riphah College of Computing, Riphah International University Faisalabad Campus, Pakistan
- Norlida Hassan
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn, Malaysia
3
Hosni M, Carrillo de Gea JM, Idri A, El Bajta M, Fernández Alemán JL, García-Mateos G, Abnane I. A systematic mapping study for ensemble classification methods in cardiovascular disease. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09914-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
4
Missing data techniques in classification for cardiovascular dysautonomias diagnosis. Med Biol Eng Comput 2020; 58:2863-2878. [PMID: 32970269 DOI: 10.1007/s11517-020-02266-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/22/2018] [Accepted: 09/08/2020] [Indexed: 10/23/2022]
Abstract
Missing data (MD) is a common and inevitable problem facing data mining (DM)-based decision systems in e-health, since many historical medical datasets contain a huge number of missing values. Therefore, a pre-processing stage is usually required to deal with missing values before building any DM-based decision system. The purpose of this paper is to evaluate the impact of MD techniques on classification systems in cardiovascular dysautonomias diagnosis. We analyzed and compared the accuracy rates of four classification techniques: random forest (RF), support vector machines (SVM), C4.5 decision tree, and Naive Bayes (NB), using two MD techniques: deletion and imputation with k-nearest neighbors (KNN). A total of 216 experiments were therefore carried out using three missingness mechanisms (MCAR: missing completely at random, MAR: missing at random, and NMAR: not missing at random), two MD techniques (deletion and KNN imputation), and nine MD percentages from 10 to 90%, over a dataset collected from the autonomic nervous system (ANS) unit of the University Hospital Avicenne in Morocco. The results obtained suggest that using KNN imputation rather than deletion enhances the accuracy rates of the four classifiers. Moreover, the MD percentages have a negative impact on the performance of classification techniques regardless of the MD mechanisms and MD techniques used. In fact, the accuracy rates of the four classifiers decrease as the MD percentage increases.
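The figure of 216 experiments follows from crossing the four factors named in the abstract (4 classifiers × 3 mechanisms × 2 MD techniques × 9 MD percentages). The sketch below only enumerates that design; the variable names are illustrative, and the 10-point step between MD percentages is an assumption, since the abstract states only "nine MD percentages from 10 to 90%".

```python
from itertools import product

# Factors named in the abstract; the variable names are illustrative only.
classifiers = ["RF", "SVM", "C4.5", "NB"]
mechanisms = ["MCAR", "MAR", "NMAR"]
md_techniques = ["deletion", "KNN imputation"]
# Assumption: nine evenly spaced percentages, 10, 20, ..., 90.
md_percentages = list(range(10, 100, 10))

# Full factorial design: one experiment per combination of factor levels.
experiment_grid = list(
    product(classifiers, mechanisms, md_techniques, md_percentages)
)

print(len(experiment_grid))  # 4 * 3 * 2 * 9 = 216
```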
5
A mapping study of ensemble classification methods in lung cancer decision support systems. Med Biol Eng Comput 2020; 58:2177-2193. [PMID: 32621068 DOI: 10.1007/s11517-020-02223-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/05/2020] [Accepted: 06/25/2020] [Indexed: 10/23/2022]
Abstract
Achieving a high level of classification accuracy in medical datasets is a major need for researchers seeking to provide effective decision systems that assist doctors in their work. In many domains of artificial intelligence, ensemble classification methods are able to improve the performance of single classifiers. This paper reports the state of the art of ensemble classification methods in lung cancer detection. We have performed a systematic mapping study to identify the most interesting papers concerning this topic. A total of 65 papers published between 2000 and 2018 were selected after an automatic search in four digital libraries and a careful selection process. As a result, it was observed that diagnosis was the task most commonly studied; homogeneous ensembles and decision trees were the most frequently adopted for constructing ensembles; and the majority voting rule was the predominant combination rule. Few studies considered the parameter tuning of the techniques used. These findings open several perspectives for researchers to enhance lung cancer research by addressing the identified gaps, such as investigating different classification methods, proposing other heterogeneous ensemble methods, and using new combination rules.
Graphical abstract: main features of the mapping study performed on ensemble classification methods applied to lung cancer decision support systems.
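For readers unfamiliar with the majority voting rule mentioned above, the following is a minimal sketch, not taken from any of the surveyed papers. The function name `majority_vote`, the example labels, and the tie-breaking choice are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    Ties are broken in favour of the label that appears first in
    `predictions`; this is an arbitrary choice made for this sketch.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Three hypothetical base classifiers vote on a single case.
print(majority_vote(["malignant", "benign", "malignant"]))  # malignant
```

Real ensembles often weight votes or average predicted probabilities instead; plain majority voting is simply the rule the abstract reports as predominant.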
6
Chlioui I, Idri A, Abnane I. Data preprocessing in knowledge discovery in breast cancer: systematic mapping study. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2020. [DOI: 10.1080/21681163.2020.1730974] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]
Affiliation(s)
- Imane Chlioui
  - Software Project Management Research Team, ENSIAS, Mohammed V University, Rabat, Morocco
- Ali Idri
  - Software Project Management Research Team, ENSIAS, Mohammed V University, Rabat, Morocco
  - Complex Systems Engineering and Human Systems, University Mohammed VI Polytechnic, Ben Guerir, Morocco
- Ibtissam Abnane
  - Software Project Management Research Team, ENSIAS, Mohammed V University, Rabat, Morocco
7
Hosni M, Abnane I, Idri A, Carrillo de Gea JM, Fernández Alemán JL. Reviewing ensemble classification methods in breast cancer. Computer Methods and Programs in Biomedicine 2019; 177:89-112. [PMID: 31319964 DOI: 10.1016/j.cmpb.2019.05.019] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 02/07/2019] [Revised: 05/16/2019] [Accepted: 05/18/2019] [Indexed: 05/09/2023]
Abstract
CONTEXT Ensemble methods consist of combining more than one single technique to solve the same task. This approach was designed to overcome the weaknesses of single techniques and consolidate their strengths. Ensemble methods are now widely used to carry out prediction tasks (e.g. classification and regression) in several fields, including that of bioinformatics. Researchers have particularly begun to employ ensemble techniques to improve research into breast cancer, as this is the most frequent type of cancer and accounts for most of the deaths among women.
OBJECTIVE AND METHOD The goal of this study is to analyse the state of the art in ensemble classification methods when applied to breast cancer as regards 9 aspects: publication venues, medical tasks tackled, empirical and research types adopted, types of ensembles proposed, single techniques used to construct the ensembles, validation framework adopted to evaluate the proposed ensembles, tools used to build the ensembles, and optimization methods used for the single techniques. This paper was undertaken as a systematic mapping study.
RESULTS A total of 193 papers published from the year 2000 onwards were selected from four online databases: IEEE Xplore, ACM Digital Library, Scopus and PubMed. This study found that, of the six medical tasks that exist, diagnosis was the most frequently researched, and that the experiment-based empirical type and evaluation-based research type were the dominant approaches adopted in the selected studies. The homogeneous ensemble type was the most widely used to perform the classification task. With regard to single techniques, this mapping study found that decision trees, support vector machines and artificial neural networks were those most frequently adopted to build ensemble classifiers. In the case of the evaluation framework, the Wisconsin Breast Cancer dataset was the most frequently used by researchers to perform their experiments, while the most common validation method was k-fold cross-validation. Several tools are available to perform experiments related to ensemble classification methods, such as Weka and R. Few researchers took into account the optimisation of the single techniques of which their proposed ensemble was composed; among those who did, grid search was the method most frequently adopted to tune the parameter settings of a single classifier.
CONCLUSION This paper reports an in-depth study of the application of ensemble methods as regards breast cancer. Our results show that there are several gaps and issues and we, therefore, provide researchers in the field of breast cancer research with recommendations. Moreover, after analysing the papers found in this systematic mapping study, we discovered that the majority report positive results concerning the accuracy of ensemble classifiers when compared to single classifiers. In order to aggregate the evidence reported in the literature, it will, therefore, be necessary to perform a systematic literature review and meta-analysis in which an in-depth analysis could be conducted so as to confirm the superiority of ensemble classifiers over the classical techniques.
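As a rough illustration of the k-fold cross-validation scheme the abstract identifies as the most common validation method, the sketch below splits sample indices into k contiguous folds. The function name `k_fold_indices` is hypothetical and not taken from the paper or from any library; practical setups usually also shuffle or stratify the samples first, which this sketch omits.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Each sample lands in exactly one test fold. Fold sizes differ by at
    most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Ten samples, three folds: test folds of size 4, 3 and 3.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), len(test))
```

A model would be fitted on each `train` split and scored on the matching `test` split, with the k scores then averaged.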
Affiliation(s)
- Mohamed Hosni
  - Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- Ibtissam Abnane
  - Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- Ali Idri
  - Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- Juan M Carrillo de Gea
  - Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain.
8
A Systematic Mapping Study of Data Preparation in Heart Disease Knowledge Discovery. J Med Syst 2018; 43:17. [PMID: 30542772 DOI: 10.1007/s10916-018-1134-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/26/2018] [Accepted: 12/03/2018] [Indexed: 01/25/2023]
Abstract
The increasing amount of data produced by various biomedical and healthcare systems has led to a need for methodologies related to knowledge data discovery. Data mining (DM) offers a set of powerful techniques that allow the identification and extraction of relevant information from medical datasets, thus enabling doctors and patients to greatly benefit from DM, particularly in the case of diseases with high mortality and morbidity rates, such as heart disease (HD). Nonetheless, the use of raw medical data poses several challenges, such as missing data, noise, redundancy and high dimensionality, which make the extraction of useful and relevant information difficult. Intensive research has, therefore, recently begun in order to prepare raw healthcare data before knowledge extraction. In any knowledge data discovery (KDD) process, data preparation is the step prior to DM that deals with data imperfections in order to improve data quality, so as to satisfy the requirements and improve the performance of DM techniques. The objective of this paper is to perform a systematic mapping study (SMS) on data preparation for KDD in cardiology so as to provide an overview of the quantity and type of research carried out in this respect. The SMS consisted of a set of 58 selected papers published between January 2000 and December 2017. The selected studies were analyzed according to six criteria: year and channel of publication, preparation task, medical task, DM objective, research type and empirical type. The results show that a large amount of data preparation research was carried out in order to improve the performance of DM-based decision support systems in cardiology. Researchers were mainly interested in the data reduction preparation task and particularly in feature selection. Moreover, the majority of the selected studies focused on classification for the diagnosis of HD. Two main research types were identified in the selected studies: solution proposal and evaluation research, and the most frequently used empirical type was that of historical-based evaluation.
9
Fernández-Sotos P, Navarro E, Torio I, Dompablo M, Fernández-Caballero A, Rodriguez-Jimenez R. Pharmacological interventions in social cognition deficits: A systematic mapping review. Psychiatry Res 2018; 270:57-67. [PMID: 30245378 DOI: 10.1016/j.psychres.2018.09.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/17/2018] [Revised: 08/08/2018] [Accepted: 09/07/2018] [Indexed: 12/17/2022]
Abstract
Social cognition is an important research field in psychiatry due to its relevance in the functioning and quality of life of patients. The objective of this work is to conduct a systematic mapping review of pharmacological strategies for improving social cognition deficits. Publications from 2006 to 2016 were reviewed in Scopus, PsycINFO, PubMed, and Embase. From the initial 1059 publications obtained, a final number of 110 were selected. The results show an increasing interest in pharmacological approaches in different medical fields (especially psychiatry, pharmacology, and endocrinology, with schizophrenia and autism as the most studied disorders), as can be observed in the progressive increase in the number of publications, the high degree of scientific evidence, and the high impact factor of publications. However, it is also observed that most studies were conducted with oxytocin, psychostimulants, and antipsychotics (mainly risperidone and olanzapine), with few studies using other drugs. In the different social cognition domains, the majority of publications were focused on emotional processing or theory of mind, with few studies in other domains. Thus, this systematic mapping review shows that, even though there are increasing research activities, there are some important gaps to cover in future investigation.
Affiliation(s)
- Patricia Fernández-Sotos
  - Department of Psychiatry, Instituto de Investigación Sanitaria Hospital 12 de Octubre (imas12), Madrid, Spain; CIBERSAM (Biomedical Research Networking Centre in Mental Health), Spain
- Elena Navarro
  - CIBERSAM (Biomedical Research Networking Centre in Mental Health), Spain; Instituto de Investigación en Informática de Albacete, Albacete, Spain; Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha, Albacete, Spain
- Iosune Torio
  - Department of Psychiatry, Instituto de Investigación Sanitaria Hospital 12 de Octubre (imas12), Madrid, Spain; CIBERSAM (Biomedical Research Networking Centre in Mental Health), Spain; Universidad Rey Juan Carlos, Madrid, Spain
- Mónica Dompablo
  - Department of Psychiatry, Instituto de Investigación Sanitaria Hospital 12 de Octubre (imas12), Madrid, Spain; CIBERSAM (Biomedical Research Networking Centre in Mental Health), Spain
- Antonio Fernández-Caballero
  - CIBERSAM (Biomedical Research Networking Centre in Mental Health), Spain; Instituto de Investigación en Informática de Albacete, Albacete, Spain; Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha, Albacete, Spain
- Roberto Rodriguez-Jimenez
  - Department of Psychiatry, Instituto de Investigación Sanitaria Hospital 12 de Octubre (imas12), Madrid, Spain; CIBERSAM (Biomedical Research Networking Centre in Mental Health), Spain; CogPsy-Group, Universidad Complutense de Madrid (UCM), Madrid, Spain.
10
Idri A, Benhar H, Fernández-Alemán JL, Kadi I. A systematic map of medical data preprocessing in knowledge discovery. Computer Methods and Programs in Biomedicine 2018; 162:69-85. [PMID: 29903496 DOI: 10.1016/j.cmpb.2018.05.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/08/2017] [Revised: 04/25/2018] [Accepted: 05/03/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Data mining (DM) has, over the last decade, received increased attention in the medical domain and has been widely used to analyze medical datasets in order to extract useful knowledge and previously unknown patterns. However, historical medical data can often comprise inconsistent, noisy, imbalanced, missing and high dimensional data. These challenges lead to a serious bias in predictive modeling and reduce the performance of DM techniques. Data preprocessing is, therefore, an essential step in knowledge discovery as regards improving the quality of data and making it appropriate and suitable for DM techniques. The objective of this paper is to review the use of preprocessing techniques in clinical datasets.
METHODS We performed a systematic map of studies regarding the application of data preprocessing to healthcare that were published between January 2000 and December 2017. A search string was determined on the basis of the mapping questions and the PICO categories. The search string was then applied in digital databases covering the fields of computer science and medical informatics in order to identify relevant studies. The studies were initially selected by reading their titles, abstracts and keywords. Those that were selected at that stage were then reviewed using a set of inclusion and exclusion criteria in order to eliminate any that were not relevant. This process resulted in 126 primary studies.
RESULTS Selected studies were analyzed and classified according to their publication years and channels, research type, empirical type and contribution type. The findings of this mapping study revealed that researchers have paid a considerable amount of attention to preprocessing in medical DM in the last decade. A significant number of the selected studies used data reduction and cleaning preprocessing tasks. Moreover, the disciplines in which preprocessing has received most attention are cardiology, endocrinology and oncology.
CONCLUSIONS Researchers should develop and implement standards for an effective integration of multiple medical data types. Moreover, we identified the need to perform literature reviews.
Affiliation(s)
- A Idri
  - Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- H Benhar
  - Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.
- J L Fernández-Alemán
  - Department of Informatics and Systems, Faculty of Computer Science, University of Murcia, Spain.
- I Kadi
  - Software Project Management Research Team, ENSIAS, University Mohammed V of Rabat, Morocco.