1
Wu Q, Ye F, Gu Q, Shao F, Long X, Zhan Z, Zhang J, He J, Zhang Y, Xiao Q. A customised down-sampling machine learning approach for sepsis prediction. Int J Med Inform 2024; 184:105365. [PMID: 38350181 DOI: 10.1016/j.ijmedinf.2024.105365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 12/17/2023] [Accepted: 01/29/2024] [Indexed: 02/15/2024]
Abstract
OBJECTIVE Sepsis is a life-threatening condition in the ICU that requires timely treatment. Although existing sepsis prediction models are accurate, insufficient focus on reducing alarms could worsen alarm fatigue and desensitisation in ICUs, potentially compromising patient safety. In this retrospective study, we aim to develop an accurate, robust, and readily deployable method for ICUs, based only on vital signs and laboratory tests. METHODS Our method consists of a customised down-sampling process, a specific dynamic sliding window, and XGBoost for sepsis prediction. The down-sampling process was applied to the retrospective data for training the XGBoost model. During the testing stage, the dynamic sliding window and the trained XGBoost were used to predict sepsis on the retrospective datasets, PhysioNet and FHC. RESULTS With the filtered data from PhysioNet, our method achieved 80.74% accuracy (77.90% sensitivity and 84.42% specificity) and 83.95% accuracy (84.82% sensitivity and 82.00% specificity) on the test sets of PhysioNet-A and PhysioNet-B, respectively. The AUC score was 0.89 for both datasets. On the FHC test set, our method achieved 92.38% accuracy (88.37% sensitivity and 95.16% specificity) and an AUC score of 0.98. CONCLUSION Our results indicate that the down-sampling process and the dynamic sliding window with XGBoost delivered robust and accurate sepsis prediction under various hospital settings. The localisation and robustness of our method can assist in sepsis diagnosis in different ICU settings.
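The abstract above does not spell out the customised down-sampling rule, so the following is only a minimal sketch of the general idea, with plain random under-sampling of the majority class swapped in for it. The function name `downsample_majority`, the toy rows, and the 95/5 class split are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def downsample_majority(rows, labels, seed=0):
    """Randomly drop rows so every class ends up as large as the smallest one.

    Generic random under-sampling, NOT the paper's customised procedure.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the rarest class sets the target size for every class.
    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        kept = rng.sample(members, target)
        balanced.extend((row, label) for row in kept)
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [label for _, label in balanced]


# Hypothetical toy data: 95 non-sepsis records (0) and 5 sepsis records (1).
rows = [[i] for i in range(100)]
labels = [0] * 95 + [1] * 5
X_bal, y_bal = downsample_majority(rows, labels)
print(Counter(y_bal))
```

The balanced rows would then be passed to whatever classifier is being trained; the evaluation data should be left at its natural class ratio so that reported sensitivity and specificity stay meaningful.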
Affiliation(s)
- Qinhao Wu
- Apriko Research, Eindhoven, the Netherlands; Department of Mathematics and Computer Science, Eindhoven University of Technology, De Zaale, Eindhoven, 5612 AZ, Noord Brabant, the Netherlands
- Fei Ye
- Apriko Research, Eindhoven, the Netherlands
- Qianqian Gu
- Digital, Data and Informatics, Natural History Museum, London, SW7 5BD, United Kingdom
- Feng Shao
- Apriko Research, Eindhoven, the Netherlands
- Xi Long
- Department of Electrical Engineering, Eindhoven University of Technology, De Zaale, Eindhoven, 5612 AZ, Noord Brabant, the Netherlands
- Zhuozhao Zhan
- Department of Mathematics and Computer Science, Eindhoven University of Technology, De Zaale, Eindhoven, 5612 AZ, Noord Brabant, the Netherlands
- Junjie Zhang
- E.N.T. Department, the First Hospital of Changsha, University of South China, Changsha, 410005, China
- Jun He
- Department of Critical Care Medicine, the First Hospital of Changsha, University of South China, Changsha, 410005, China
- Yangzhou Zhang
- Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Changsha, 410008, China.
- Quan Xiao
- E.N.T. Department, the First Hospital of Changsha, University of South China, Changsha, 410005, China.
2
Li K, Wang Z, Zhou Y, Li S. Lung adenocarcinoma identification based on hybrid feature selections and attentional convolutional neural networks. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:2991-3015. [PMID: 38454716 DOI: 10.3934/mbe.2024133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2024]
Abstract
Lung adenocarcinoma, a chronic non-small cell lung cancer, needs to be detected early. Tumor gene expression data analysis is effective for early detection, yet its challenges lie in a small sample size, high dimensionality, and multi-noise characteristics. In this study, we propose a lung adenocarcinoma convolutional neural network (LATCNN), a deep learning model tailored for accurate lung adenocarcinoma prediction and identification of key genes. During the feature selection stage, we introduce a hybrid algorithm. Initially, the fast correlation-based filter (FCBF) algorithm swiftly filters out irrelevant features, followed by applying the k-means-synthetic minority over-sampling technique (k-means-SMOTE) method to address category imbalance. Subsequently, we enhance the particle swarm optimization (PSO) algorithm by incorporating fast-decay dynamic inertia weights and utilizing the classification and regression tree (CART) as the fitness function for the second stage of feature selection, aiming to further eliminate redundant features. In the classifier construction stage, we present an attention convolutional neural network (atCNN) that incorporates an attention mechanism. This improved model conducts feature selection post lung adenocarcinoma gene expression data analysis for classification and prediction. The results show that LATCNN effectively reduces the feature dimensions and accurately identifies 12 key genes with accuracy, recall, F1 score, and MCC of 99.70%, 99.33%, 99.98%, and 98.67%, respectively. These performance metrics surpass those of other comparative models, highlighting the significance of this research for advancing lung adenocarcinoma treatment.
Affiliation(s)
- Kunpeng Li
- School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Zepeng Wang
- School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Yu Zhou
- School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
- Sihai Li
- School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou 730000, China
3
Mohseni N, Ghaniee Zarich M, Afshar S, Hosseini M. Identification of Novel Biomarkers for Response to Preoperative Chemoradiation in Locally Advanced Rectal Cancer with Genetic Algorithm-Based Gene Selection. J Gastrointest Cancer 2023; 54:937-950. [PMID: 36534304 DOI: 10.1007/s12029-022-00873-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/05/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND The conventional treatment for patients with locally advanced colorectal tumors is preoperative chemo-radiotherapy (PCRT) preceding surgery. This treatment strategy has some long-term side effects, and some patients do not respond to it. Therefore, an evaluation of biomarkers that may help predict patients' response to PCRT is essential. METHODS We used a genetic algorithm to search the space of possible feature combinations and choose subsets of genes that would yield suitable performance in differentiating PCRT responders from non-responders, using a logistic regression model as our classifier. RESULTS We developed two gene signatures; first, to achieve the maximum prediction accuracy, the algorithm yielded 39 genes, and then, aiming to reduce the number of features as much as possible (while maintaining acceptable performance), a 5-gene signature was chosen. The performance of the two gene signatures was (accuracy = 0.97 and 0.81, sensitivity = 0.96 and 0.83, and specificity = 86 and 0.77) using a logistic regression classifier. By analyzing the bias and variance decomposition of the model error, we further investigated the involved genes by discovering and validating another 28-gene signature, which possibly points towards two different sub-systems involved in the response of the patients to treatment. CONCLUSIONS Using a genetic algorithm as our gene selection method, we have identified two groups of genes that can differentiate PCRT responders from non-responders in patients of the studied dataset with considerable performance. IMPACT After passing standard requirements, our gene signatures may be applicable as a robust and effective PCRT response prediction tool for colorectal cancer patients in clinical settings and may also help future studies aiming to further investigate the involved pathways gain a clearer picture for the course of their research.
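To make the search described in the METHODS part more concrete, here is a minimal sketch of a genetic algorithm over binary feature masks. It is not the authors' implementation: the function `evolve_feature_mask`, its parameters, the set `INFORMATIVE`, and the `toy_fitness` function are all hypothetical. In the study the fitness would come from the logistic regression classifier's performance on the selected genes; a toy score stands in for it here so the example stays self-contained.

```python
import random


def evolve_feature_mask(n_features, fitness, pop_size=30, generations=40,
                        mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximises `fitness`."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = parent_a[:cut] + parent_b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical stand-in for classifier performance: reward three
# "informative" feature indices and penalise large subsets.
INFORMATIVE = {2, 5, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask = evolve_feature_mask(12, toy_fitness)
selected = [i for i, bit in enumerate(best_mask) if bit]
print(selected, toy_fitness(best_mask))
```

The size penalty in the toy fitness mirrors the trade-off the abstract describes between a larger signature tuned for accuracy and a smaller one that keeps acceptable performance.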
Affiliation(s)
- Nima Mohseni
- Department of Biology, Faculty of Science, Lund University, Skåne, Sweden
- Saeid Afshar
- Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran.
4
Zhu J, Liu J, Chen Y, Xue X, Sun S. Binary Restructuring Particle Swarm Optimization and Its Application. Biomimetics (Basel) 2023; 8:266. [PMID: 37366861 DOI: 10.3390/biomimetics8020266] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/16/2023] [Revised: 06/14/2023] [Accepted: 06/15/2023] [Indexed: 06/28/2023] Open
Abstract
The Restructuring Particle Swarm Optimization (RPSO) algorithm was developed as an intelligent approach based on the linear system theory of particle swarm optimization (PSO). It streamlines the flow of the PSO algorithm, specifically targeting continuous optimization problems. To adapt RPSO to discrete optimization problems, this paper proposes the binary Restructuring Particle Swarm Optimization (BRPSO) algorithm. Unlike other binary metaheuristic algorithms, BRPSO does not use a transfer function. The particle updating process in BRPSO relies solely on comparing values derived from the position updating formula with a random number. Additionally, a novel perturbation term is incorporated into the position updating formula of BRPSO. Notably, BRPSO requires fewer parameters and exhibits high exploration capability during the early stages. To evaluate the efficacy of BRPSO, comprehensive experiments are conducted by comparing it against four peer algorithms in the context of feature selection problems. The experimental results highlight the competitive nature of BRPSO in terms of both classification accuracy and the number of selected features.
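For contrast with the transfer-function-free update described above, the sketch below shows the conventional binary PSO scheme, in which a sigmoid transfer function turns each velocity into the probability of setting a bit to 1. This is not BRPSO, whose position updating formula and perturbation term are not given in the abstract. The function `binary_pso`, its parameter values, and the `TARGET` mask used as a toy objective are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(n_bits, fitness, n_particles=20, iterations=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Conventional sigmoid-transfer binary PSO that maximises `fitness`."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_bits)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_bits for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (personal_best[i][d] - positions[i][d])
                     + c2 * r2 * (global_best[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                velocities[i][d] = v
                # Transfer function: velocity -> probability that the bit is 1.
                positions[i][d] = 1 if rng.random() < sigmoid(v) else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


# Hypothetical toy objective: number of bits that match a target mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]


def match_count(mask):
    return sum(int(b == t) for b, t in zip(mask, TARGET))


best, score = binary_pso(len(TARGET), match_count)
print(best, score)
```

In a wrapper-style feature selection setting, `match_count` would be replaced by a classifier-based score computed on the features whose bits are set to 1.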
Affiliation(s)
- Jian Zhu
- School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
- Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China
- Jianhua Liu
- School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
- Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China
- Yuxiang Chen
- School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
- Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China
- Xingsi Xue
- School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
- Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China
- Shuihua Sun
- School of Computer Science and Mathematics, Fujian University of Technology, Fuzhou 350118, China
- Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fuzhou 350118, China
5
Xu B, Heidari AA, Cai Z, Chen H. Dimensional decision covariance colony predation algorithm: global optimization and high-dimensional feature selection. Artif Intell Rev 2023. [DOI: 10.1007/s10462-023-10412-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
6
Hayet-Otero M, García-García F, Lee DJ, Martínez-Minaya J, España Yandiola PP, Urrutia Landa I, Nieves Ermecheo M, Quintana JM, Menéndez R, Torres A, Zalacain Jorge R, Arostegui I. Extracting relevant predictive variables for COVID-19 severity prognosis: An exhaustive comparison of feature selection techniques. PLoS One 2023; 18:e0284150. [PMID: 37053151 PMCID: PMC10101453 DOI: 10.1371/journal.pone.0284150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 03/26/2023] [Indexed: 04/14/2023] Open
Abstract
With the COVID-19 pandemic having caused unprecedented numbers of infections and deaths, large research efforts have been undertaken to increase our understanding of the disease and the factors which determine diverse clinical evolutions. Here we focused on a fully data-driven exploration regarding which factors (clinical or otherwise) were most informative for SARS-CoV-2 pneumonia severity prediction via machine learning (ML). In particular, feature selection (FS) techniques, designed to reduce the dimensionality of data, allowed us to characterize which of our variables were the most useful for ML prognosis. We conducted a multi-centre clinical study, enrolling n = 1548 patients hospitalized due to SARS-CoV-2 pneumonia, of whom 792, 238, and 598 patients experienced low, medium and high-severity evolutions, respectively. Up to 106 patient-specific clinical variables were collected at admission, although 14 of them had to be discarded for containing ⩾60% missing values. Alongside 7 socioeconomic attributes and 32 exposures to air pollution (chronic and acute), these became d = 148 features after variable encoding. We addressed this ordinal classification problem both as an ML classification and as a regression task. Two imputation techniques for missing data were explored, along with a total of 166 unique FS algorithm configurations: 46 filters, 100 wrappers and 20 embeddeds. Of these, 21 setups achieved satisfactory bootstrap stability (⩾0.70) with reasonable computation times: 16 filters, 2 wrappers, and 3 embeddeds. The subsets of features selected by each technique showed modest Jaccard similarities across them. However, they consistently pointed out the importance of certain explanatory variables, namely: the patient's C-reactive protein (CRP), pneumonia severity index (PSI), respiratory rate (RR) and oxygen levels (saturation SpO2, quotients SpO2/RR and arterial SatO2/FiO2), the neutrophil-to-lymphocyte ratio (NLR) (to a certain extent, also neutrophil and lymphocyte counts separately), lactate dehydrogenase (LDH), and procalcitonin (PCT) levels in blood. A remarkable agreement has been found a posteriori between our strategy and independent clinical research works investigating risk factors for COVID-19 severity. Hence, these findings stress the suitability of this type of fully data-driven approach for knowledge extraction, as a complement to clinical perspectives.
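The Jaccard similarity mentioned above is the size of the intersection of two feature subsets divided by the size of their union. The short sketch below computes it pairwise; the three subsets are hypothetical illustrations built from variable names in the abstract, not selections reported by the study, and the helper names are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs (needs at least two subsets)."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets chosen by three different FS techniques.
filter_pick = {"CRP", "PSI", "RR"}
wrapper_pick = {"CRP", "PSI", "LDH"}
embedded_pick = {"CRP", "NLR", "PCT"}

subsets = [filter_pick, wrapper_pick, embedded_pick]
print(jaccard(filter_pick, wrapper_pick))
print(mean_pairwise_jaccard(subsets))
print(set.intersection(*subsets))
```

The example also shows how modest pairwise similarity can coexist with consistent agreement on a few variables: the subsets overlap only partially, yet one variable appears in all of them.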
Affiliation(s)
- Miren Hayet-Otero
- Basque Center for Applied Mathematics (BCAM), Bilbao, Basque Country, Spain
- Department of Electronic Technology, University of the Basque Country (UPV/EHU), Leioa, Basque Country, Spain
- Basque Research and Technology Alliance (BRTA), TECNALIA, Derio, Basque Country, Spain
- Dae-Jin Lee
- Basque Center for Applied Mathematics (BCAM), Bilbao, Basque Country, Spain
- School of Science and Technology, IE University, Madrid, Madrid, Spain
- Joaquín Martínez-Minaya
- Department of Applied Statistics and Operational Research, and Quality, Universitat Politècnica de València (UPV), Valencia, Valencian Community, Spain
- Mónica Nieves Ermecheo
- BioCruces Bizkaia Health Research Institute, Barakaldo, Basque Country, Spain
- Research Unit, Galdakao-Usansolo University Hospital, Galdakao, Basque Country, Spain
- José María Quintana
- Research Unit, Galdakao-Usansolo University Hospital, Galdakao, Basque Country, Spain
- Rosario Menéndez
- Pneumology Department, La Fe University and Polytechnic Hospital, Valencia, Valencian Community, Spain
- Antoni Torres
- Pneumology Department, Hospital Clínic of Barcelona, Barcelona, Catalonia, Spain
- Inmaculada Arostegui
- Basque Center for Applied Mathematics (BCAM), Bilbao, Basque Country, Spain
- Department of Mathematics, University of the Basque Country (UPV/EHU), Leioa, Basque Country, Spain
7
Zhang X, Gavaldà R, Baixeries J. Interpretable prediction of mortality in liver transplant recipients based on machine learning. Comput Biol Med 2022; 151:106188. [PMID: 36306583 DOI: 10.1016/j.compbiomed.2022.106188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 09/24/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND Accurate prediction of the mortality of post-liver transplantation is an important but challenging task. It relates to optimizing organ allocation and estimating the risk of possible dysfunction. Existing risk scoring models, such as the Balance of Risk (BAR) score and the Survival Outcomes Following Liver Transplantation (SOFT) score, do not predict the mortality of post-liver transplantation with sufficient accuracy. In this study, we evaluate the performance of machine learning models and establish an explainable machine learning model for predicting mortality in liver transplant recipients. METHOD The optimal feature set for the prediction of the mortality was selected by a wrapper method based on binary particle swarm optimization (BPSO). With the selected optimal feature set, seven machine learning models were applied to predict mortality over different time windows. The best-performing model was used to predict mortality through a comprehensive comparison and evaluation. An interpretable approach based on machine learning and SHapley Additive exPlanations (SHAP) is used to explicitly explain the model's decision and make new discoveries. RESULTS With regard to predictive power, our results demonstrated that the feature set selected by BPSO outperformed both the feature set in the existing risk score model (BAR score, SOFT score) and the feature set processed by principal component analysis (PCA). The best-performing model, extreme gradient boosting (XGBoost), was found to improve the Area Under a Curve (AUC) values for mortality prediction by 6.7%, 11.6%, and 17.4% at 3 months, 3 years, and 10 years, respectively, compared to the SOFT score. The main predictors of mortality and their impact were discussed for different age groups and different follow-up periods. CONCLUSIONS Our analysis demonstrates that XGBoost can be an ideal method to assess the mortality risk in liver transplantation. 
In combination with the SHAP approach, the proposed framework provides a more intuitive and comprehensive interpretation of the predictive model, thereby allowing the clinician to better understand the decision-making process of the model and the impact of factors associated with mortality risk in liver transplantation.
Affiliation(s)
- Xiao Zhang
- Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain.
- Jaume Baixeries
- Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
8
Tran LV, Tran HM, Le TM, Huynh TTM, Tran HT, Dao SVT. Application of Machine Learning in Epileptic Seizure Detection. Diagnostics (Basel) 2022; 12:diagnostics12112879. [PMID: 36428941 PMCID: PMC9689720 DOI: 10.3390/diagnostics12112879] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2022] [Revised: 11/09/2022] [Accepted: 11/13/2022] [Indexed: 11/22/2022] Open
Abstract
Epileptic seizure is a neurological condition caused by short and unexpectedly occurring electrical disruptions in the brain. It is estimated that roughly 60 million individuals worldwide have had an epileptic seizure. Experiencing an epileptic seizure can have serious consequences for the patient. Automatic seizure detection on electroencephalogram (EEG) recordings is essential due to the irregular and unpredictable nature of seizures. By thoroughly analyzing EEG records, neurophysiologists can discover important information and patterns, and proper and timely treatments can be provided for the patients. This research presents a novel machine learning-based approach for detecting epileptic seizures in EEG signals. A public EEG dataset from the University of Bonn was used to validate the approach. Meaningful statistical features were extracted from the original data using discrete wavelet transform analysis, then the relevant features were selected using feature selection based on the binary particle swarm optimizer. This facilitated the reduction of 75% data dimensionality and 47% computational time, which eventually sped up the classification process. After having been selected, relevant features were used to train different machine learning models, then hyperparameter optimization was utilized to further enhance the models' performance. The results achieved up to 98.4% accuracy and showed that the proposed method was very effective and practical in detecting seizure presence in EEG signals. In clinical applications, this method could help relieve the suffering of epilepsy patients and alleviate the workload of neurologists.
Affiliation(s)
- Ly V. Tran
- School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Hieu M. Tran
- School of Electrical Engineering, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Tuan M. Le
- School of Electrical Engineering, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Tri T. M. Huynh
- School of Electrical Engineering, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Hung T. Tran
- School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Son V. T. Dao
- School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- School of Science, Engineering & Technology, RMIT University Vietnam, Ho Chi Minh City 700000, Vietnam
- Correspondence: Tel.: +84-98-159-1145
9
A hybrid multi-stage learning technique based on brain storming optimization algorithm for breast cancer recurrence prediction. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2022. [DOI: 10.1016/j.jksuci.2021.05.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
10
A biologically-inspired hybrid deep learning approach for brain tumor classification from magnetic resonance imaging using improved gabor wavelet transform and Elmann-BiLSTM network. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
11
Rashno A, Shafipour M, Fadaei S. Particle ranking: An Efficient Method for Multi-Objective Particle Swarm Optimization Feature Selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
12
Sepsis prediction in intensive care unit based on genetic feature optimization and stacked deep ensemble learning. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06631-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
13
Moslemi A, Kontogianni K, Brock J, Wood S, Herth F, Kirby M. Differentiating COPD and Asthma using Quantitative CT Imaging and Machine Learning. Eur Respir J 2022; 60:13993003.03078-2021. [PMID: 35210316 DOI: 10.1183/13993003.03078-2021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/07/2021] [Accepted: 02/04/2022] [Indexed: 11/05/2022]
Abstract
There are similarities and differences between chronic obstructive pulmonary disease (COPD) and asthma patients in terms of computed tomography (CT) disease-related features. Our objective was to determine the optimal subset of CT imaging features for differentiating COPD and asthma using machine learning. COPD and asthma patients were recruited from Heidelberg University Hospital. CT was acquired and 93 features were extracted (VIDA Diagnostics): percentage of low-attenuating-areas below -950 HU (LAA950), LAA950 hole count, estimated airway-wall-thickness for a 10 mm internal perimeter airway (Pi10), total-airway-count (TAC), as well as inner/outer perimeter/areas and wall thickness for each of five segmental airways, and the average of those five airways. Hybrid feature selection was used to select the optimum number of features, and a support vector machine was used to classify COPD and asthma. Ninety-five participants were included (n=48 COPD; n=47 asthma); there were no differences between COPD and asthma for age (p=0.25) or FEV1 (p=0.31). In a model including all CT features, the accuracy and F1-score were 80% and 81%, respectively. The top features were: LAA950, LAA950 hole count, average outer and inner airway perimeter, outer and inner airway area RB1, and TAC. In the model with only airway features, the accuracy and F1-score were 66% and 68%, respectively. The top features were: inner area RB1, wall thickness RB1, outer area LB1, TAC LB10, average outer/inner perimeter, Pi10, and TAC. In conclusion, COPD and asthma can be differentiated using machine learning with moderate-high accuracy by a subset of only 7 CT features.
Affiliation(s)
- Amir Moslemi
- Department of Physics, Ryerson University, Toronto, ON, Canada. Co-first authors
- Konstantina Kontogianni
- Department of Pneumology and Critical Care Medicine, Thoraxklinik and Translational Lung Research Center (TLRCH), University of Heidelberg, Germany. Co-first authors
- Judith Brock
- Department of Pneumology and Critical Care Medicine, Thoraxklinik and Translational Lung Research Center (TLRCH), University of Heidelberg, Germany
- Felix Herth
- Department of Pneumology and Critical Care Medicine, Thoraxklinik and Translational Lung Research Center (TLRCH), University of Heidelberg, Germany. Co-senior authors
- Miranda Kirby
- Department of Physics, Ryerson University, Toronto, ON, Canada. Co-senior authors
14
Li Z. A Feature Selection Method Using Dynamic Dependency and Redundancy Analysis. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2022. [DOI: 10.1007/s13369-022-06590-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
15
Dispersed foraging slime mould algorithm: Continuous and binary variants for global optimization and wrapper-based feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107761] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 02/07/2023]
16
A Review of the Modification Strategies of the Nature Inspired Algorithms for Feature Selection Problem. MATHEMATICS 2022. [DOI: 10.3390/math10030464] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 02/06/2023]
Abstract
This survey is an effort to provide a research repository and a useful reference for researchers to guide them when planning to develop new Nature-inspired Algorithms tailored to solve Feature Selection problems (NIAs-FS). We identified and performed a thorough literature review in three main streams of research: the feature selection problem; optimization algorithms, particularly meta-heuristic algorithms; and modifications applied to NIAs to tackle the FS problem. We provide a detailed overview of 156 different articles about NIA modifications for tackling FS. We support our discussions with analytical views, visualized statistics, applied examples, and open-source software systems, and discuss open issues related to FS and NIAs. Finally, the survey summarizes the main foundations of NIAs-FS with approximately 34 different operators investigated. The most popular operator is chaotic maps. Hybridization is the most widely used modification technique. There are three types of hybridization: Integrating NIA with another NIA, integrating NIA with a classifier, and integrating NIA with a classifier. The most widely used hybridization is the one that integrates a classifier with the NIA. Microarray and medical applications are the dominant applications in which most NIAs-FS are modified and used. Despite the popularity of NIAs-FS, there are still many areas that need further investigation.
17
Improved seagull optimization algorithm using Lévy flight and mutation operator for feature selection. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06751-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
18
Machine learning identification of specific changes in myeloid cell phenotype during bloodstream infections. Sci Rep 2021; 11:20288. [PMID: 34645893 PMCID: PMC8514545 DOI: 10.1038/s41598-021-99628-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/05/2021] [Accepted: 09/29/2021] [Indexed: 11/18/2022] Open
Abstract
The early identification of bacteremia is critical for ensuring appropriate treatment of nosocomial infections in intensive care unit (ICU) patients. The aim of this study was to use flow cytometric data of myeloid cells as a biomarker of bloodstream infection (BSI). An eight-color antibody panel was used to identify seven monocyte and two dendritic cell subsets. In the learning cohort, immunophenotyping was applied to (1) control subjects, (2) postoperative heart surgery patients, as a model of noninfectious inflammatory responses, and (3) blood culture-positive patients. Of the complex changes in the myeloid cell phenotype, a decrease in myeloid and plasmacytoid dendritic cell numbers, increase in CD14+CD16+ inflammatory monocyte numbers, and upregulation of neutrophils CD64 and CD123 expression were prominent in BSI patients. An extreme gradient boosting (XGBoost) algorithm called the “infection detection and ranging score” (iDAR), ranging from 0 to 100, was developed to identify infection-specific changes in 101 phenotypic variables related to neutrophils, monocytes and dendritic cells. The tenfold cross-validation achieved an area under the receiver operating characteristic (AUROC) of 0.988 (95% CI 0.985–1) for the detection of bacteremic patients. In an out-of-sample, in-house validation, iDAR achieved an AUROC of 0.85 (95% CI 0.71–0.98) in differentiating localized from bloodstream infection and 0.95 (95% CI 0.89–1) in discriminating infected from noninfected ICU patients. In conclusion, a machine learning approach was used to translate the changes in myeloid cell phenotype in response to infection into a score that could identify bacteremia with high specificity in ICU patients.
|
19
|
Ibrahim RA, Abd Elaziz M, Ewees AA, El-Abd M, Lu S. New feature selection paradigm based on hyper-heuristic technique. APPLIED MATHEMATICAL MODELLING 2021; 98:14-37. [DOI: 10.1016/j.apm.2021.04.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
|
20
|
Qiu C, Liu N. A novel three layer particle swarm optimization for feature selection. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-202647] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Feature selection (FS) is a vital data preprocessing task that aims to select a small subset of features while maintaining a high level of classification accuracy. FS is a challenging optimization problem due to the large search space and the existence of locally optimal solutions. Particle swarm optimization (PSO) is a promising technique for selecting an optimal feature subset due to its rapid convergence and global search ability. However, PSO suffers from stagnation or premature convergence in complex FS problems. In this paper, a novel three-layer PSO (TLPSO) is proposed for solving the FS problem. In the TLPSO, the particles in the swarm are divided into three layers according to their evolution status, and particles in different layers are treated differently to fully exploit their potential. Instead of learning from historical best positions, the TLPSO uses a random learning exemplar selection strategy to enrich the searching behavior of the swarm and enhance the population diversity. Further, a local search operator based on the Gaussian distribution is applied to the elite particles to improve the exploitation ability. Therefore, TLPSO is able to keep a balance between population diversity and convergence speed. Extensive comparisons with seven state-of-the-art meta-heuristic-based FS methods are conducted on 18 datasets. The experimental results demonstrate the competitive and reliable performance of TLPSO in terms of improving the classification accuracy and reducing the number of features.
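For orientation, the sketch below shows a plain binary PSO for feature selection, the baseline that variants such as TLPSO build on. It is not the TLPSO algorithm from the abstract: the three-layer split, random exemplar selection, and Gaussian local search are omitted. The function name `binary_pso`, the parameter defaults, and the toy fitness function are all assumptions made for illustration.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain binary PSO: maximise fitness(mask) over 0/1 feature masks."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                vel[i][d] = v
                # Sigmoid transfer: velocity becomes the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy objective: reward three "relevant" features and
# penalise the size of the selected subset.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=10)
```

In a real wrapper setting the fitness function would be a cross-validated classifier score on the selected columns, which is far more expensive than this toy objective.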
Affiliation(s)
- Chenye Qiu
- School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
| | - Ning Liu
- School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu, China
| |
|
21
|
Wang F, Wang X. A novel feature selection algorithm based on damping oscillation theory. PLoS One 2021; 16:e0255307. [PMID: 34358234 PMCID: PMC8345869 DOI: 10.1371/journal.pone.0255307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2021] [Accepted: 07/13/2021] [Indexed: 11/18/2022] Open
Abstract
Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm, in which the position update formula has been improved in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to find an optimal feature subset. Therefore, MKMDIGWO achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets have demonstrated that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.
Affiliation(s)
- Fujun Wang
- School of Electronic and Information Engineering, Liaoning Technical University, Huludao, People’s Republic of China
- Key Laboratory of Preparation and Application of Environmentally Friendly Materials, Chinese Ministry of Education, Jilin Normal University, Changchun, People’s Republic of China
| | - Xing Wang
- School of Electronic and Information Engineering, Liaoning Technical University, Huludao, People’s Republic of China
| |
|
22
|
Particle Swarm Optimization and Multiple Stacked Generalizations to Detect Nitrogen and Organic-Matter in Organic-Fertilizer Using Vis-NIR. SENSORS 2021; 21:s21144882. [PMID: 34300620 PMCID: PMC8309747 DOI: 10.3390/s21144882] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/20/2021] [Revised: 07/13/2021] [Accepted: 07/16/2021] [Indexed: 11/29/2022]
Abstract
Organic fertilizer is a key component of agricultural sustainability and significantly contributes to the improvement of soil fertility. Nutrients such as organic matter and nitrogen in organic fertilizers positively affect plant growth but cause environmental problems when used in large amounts, hence the importance of fast detection of nitrogen (N) and organic matter (OM). This paper examines the feasibility of a framework that combines particle swarm optimization (PSO) with two multiple stacked generalizations to determine the amount of nitrogen and organic matter in organic fertilizer using visible near-infrared spectroscopy (Vis-NIR). The first multiple stacked generalization for classification, coupled with PSO (FSGC-PSO), was used for feature selection, while the second stacked generalization for regression (SSGR) improved the detection of nitrogen and organic matter. The root mean square error (RMSE) and the coefficient of determination (R2) for the calibration and prediction sets were used to gauge the different models. The obtained FSGC-PSO subset combined with SSGR achieved significantly better prediction results than conventional methods such as Ridge, support vector machine (SVM), and partial least squares (PLS) for both nitrogen (R2p = 0.9989, root mean square error of prediction (RMSEP) = 0.031 and limit of detection (LOD) = 2.97) and organic matter (R2p = 0.9972, RMSEP = 0.051 and LOD = 2.97). Therefore, the proposed approach is a promising way to monitor and evaluate the amount of N and OM in organic fertilizer.
|
23
|
Classification of immature white blood cells in acute lymphoblastic leukemia L1 using neural networks particle swarm optimization. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06245-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
24
|
Qu C, Zhang L, Li J, Deng F, Tang Y, Zeng X, Peng X. Improving feature selection performance for classification of gene expression data using Harris Hawks optimizer with variable neighborhood learning. Brief Bioinform 2021; 22:6238587. [PMID: 33876181 DOI: 10.1093/bib/bbab097] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2020] [Revised: 02/28/2021] [Accepted: 03/03/2021] [Indexed: 11/14/2022] Open
Abstract
Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. It is a challenging task to select highly discriminative feature genes, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population, so as to prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. We compare the VNLHHO with seven other algorithms and demonstrate its superiority in terms of classification accuracy, fitness value and AUC value in feature selection for gene expression data.
Affiliation(s)
- Chiwen Qu
- College of Mathematics and Statistics, Hunan Normal University, China
| | - Lupeng Zhang
- Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
| | - Jinlong Li
- Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
| | - Fang Deng
- Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
| | - Yifan Tang
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China
| | - Xiaomin Zeng
- Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
| | - Xiaoning Peng
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China
| |
|
25
|
Sverdlov O, Ryeznik Y, Wong WK. Opportunity for efficiency in clinical development: An overview of adaptive clinical trial designs and innovative machine learning tools, with examples from the cardiovascular field. Contemp Clin Trials 2021; 105:106397. [PMID: 33845209 DOI: 10.1016/j.cct.2021.106397] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Revised: 03/28/2021] [Accepted: 04/05/2021] [Indexed: 11/30/2022]
Abstract
Modern data analysis tools and statistical modeling techniques are increasingly used in clinical research to improve diagnosis, estimate disease progression and predict treatment outcomes. What seems less emphasized is the importance of the study design, which can have a serious impact on the study cost, time and statistical efficiency. This paper provides an overview of different types of adaptive designs in clinical trials and their applications to cardiovascular trials. We highlight recent proliferation of work on adaptive designs over the past two decades, including some recent regulatory guidelines on complex trial designs and master protocols. We also describe the increasing role of machine learning and use of metaheuristics to construct increasingly complex adaptive designs or to identify interesting features for improved predictions and classifications.
Affiliation(s)
- Oleksandr Sverdlov
- Early Development Biostatistics, Novartis Pharmaceuticals Corporation, USA.
| | - Yevgen Ryeznik
- Department of Pharmaceutical Biosciences, Uppsala University, Sweden
| | - Weng Kee Wong
- Department of Biostatistics, University of California Los Angeles, USA
| |
|
26
|
Maheshwari S, Agarwal A, Shukla A, Tiwari R. A comprehensive evaluation for the prediction of mortality in intensive care units with LSTM networks: patients with cardiovascular disease. Biomed Tech (Berl) 2021; 65:435-446. [PMID: 31846424 DOI: 10.1515/bmt-2018-0206] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2018] [Accepted: 10/25/2019] [Indexed: 11/15/2022]
Abstract
Intensive care units (ICUs) generate a wealth of useful data in the form of electronic health records. We aimed to build a mortality prediction model on the Medical Information Mart for Intensive Care (MIMIC-III) database and to assess whether deep learning techniques such as long short-term memory (LSTM) can effectively utilize the temporal relations among clinical variables. The models were built on clinical variable dynamics of the first 48 h of ICU admission of 12,550 records from the MIMIC-III database. A total of 36 variables, including 33 time series variables and three static variables, were used for the prediction. We present the application of LSTM and LSTM attention (LSTM-AT) models for mortality prediction on a dataset with this large number of clinical variables. For training and validation purposes, we used International Classification of Diseases, 9th edition (ICD-9) codes to extract the patients with cardiovascular disease, and infections and parasitic disease, respectively. The LSTM model outperforms non-recurrent baseline models such as naïve Bayes, logistic regression (LR), support vector machine and multilayer perceptron (MLP), generating state-of-the-art results (area under the curve [AUC], 0.852). Next, by providing attention at each time stamp, we developed a model, LSTM-AT, which exhibits even better performance (AUC, 0.876).
Affiliation(s)
- Saumil Maheshwari
- Soft Computing and Expert System Laboratory, ABV-Indian Institute of Information Technology and Management, Gwalior 474010, Madhya Pradesh, India
| | - Aman Agarwal
- Soft Computing and Expert System Laboratory, ABV-Indian Institute of Information Technology and Management, Gwalior 474010, Madhya Pradesh, India
| | - Anupam Shukla
- Soft Computing and Expert System Laboratory, ABV-Indian Institute of Information Technology and Management, Gwalior 474010, Madhya Pradesh, India
| | - Ritu Tiwari
- Soft Computing and Expert System Laboratory, ABV-Indian Institute of Information Technology and Management, Gwalior 474010, Madhya Pradesh, India
| |
|
27
|
Zhang G, Xue Z, Yan C, Wang J, Luo H. A Novel Biomarker Identification Approach for Gastric Cancer Using Gene Expression and DNA Methylation Dataset. Front Genet 2021; 12:644378. [PMID: 33868380 PMCID: PMC8044773 DOI: 10.3389/fgene.2021.644378] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Accepted: 02/16/2021] [Indexed: 01/09/2023] Open
Abstract
As a complex disease, gastric cancer has a high mortality rate, and there are few effective treatments for patients in the advanced stage. With the development of biological technology, a large amount of multi-omics data on gastric cancer has been generated, which enables computational methods to discover potential biomarkers of gastric cancer. Such biomarkers are important for detecting gastric cancer at earlier stages and thus for providing timely treatment. However, most biological data are high-dimensional with a low sample size, and are hard to process directly without feature selection. Besides, using only one type of omic data, such as gene expression data, provides limited evidence for investigating gastric cancer-associated biomarkers. In this research, gene expression data and DNA methylation data are integrated to analyze gastric cancer, and a feature selection approach is proposed to identify the possible biomarkers of gastric cancer. After the original data are pre-processed, mutual information (MI) is applied to select the top genes. Then, fold change (FC) and the t-test are adopted to identify differentially expressed genes (DEG). In particular, the false discovery rate (FDR) is introduced to adjust the p-values and further screen genes. For the chosen genes, a deep neural network (DNN) model is utilized as the classifier to measure the quality of classification. The experimental results show that the approach can achieve superior performance in terms of accuracy and other metrics. Biological analysis of the chosen genes further validates the effectiveness of the approach.
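The FDR step mentioned in the abstract can be illustrated with a short sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, chosen because it is the most common one, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a t-test.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
# Genes whose adjusted p-value stays below the chosen FDR threshold are kept.
selected = [i for i, q in enumerate(adj_p) if q < 0.05]
```

With these four values, only the first gene survives a 0.05 threshold after adjustment, although three of the raw p-values are below 0.05.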
Affiliation(s)
- Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Zijing Xue
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| | - Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, China
| |
|
28
|
Gupta R, Alam MA, Agarwal P. Whale optimization algorithm fused with SVM to detect stress in EEG signals. INTELLIGENT DECISION TECHNOLOGIES 2021. [DOI: 10.3233/idt-200047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Identifying stress and its level has always been a challenging area for researchers, and much work on it is under way around the world. In this paper, the authors present a methodology for detecting stress in EEG signals. Electroencephalography (EEG) is commonly used to acquire brain signal activity. Although other techniques exist for the same purpose, such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), we have used EEG because it is economical. We have used an open-source dataset for EEG data. Various images are used as the target stressor for collecting EEG signals. After feature selection and extraction, a support vector machine (SVM) with a whale optimization algorithm (WOA) in its kernel function is used for classification. WOA is a bio-inspired meta-heuristic algorithm based on the hunting behavior of humpback whales. Using this method, we obtained 91% accuracy for detecting stress. The paper also compares previous work on stress detection with the work proposed here.
|
29
|
A Simultaneous Moth Flame Optimizer Feature Selection Approach Based on Levy Flight and Selection Operators for Medical Diagnosis. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2021. [DOI: 10.1007/s13369-021-05478-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
|
30
|
Abbas S, Jalil Z, Javed AR, Batool I, Khan MZ, Noorwali A, Gadekallu TR, Akbar A. BCD-WERT: a novel approach for breast cancer detection using whale optimization based efficient features and extremely randomized tree algorithm. PeerJ Comput Sci 2021; 7:e390. [PMID: 33817036 PMCID: PMC7959601 DOI: 10.7717/peerj-cs.390] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Accepted: 01/20/2021] [Indexed: 06/12/2023]
Abstract
Breast cancer is one of the leading causes of death in the current age. It often results in subpar living conditions for patients, as they have to go through expensive and painful treatments to fight this cancer. One in eight women all over the world is affected by this disease. Almost half a million women annually do not survive this fight and die from this disease. Machine learning algorithms have proven to outperform all existing solutions for the prediction of breast cancer using models built on previously available data. In this paper, a novel approach named BCD-WERT is proposed that utilizes the Extremely Randomized Tree and Whale Optimization Algorithm (WOA) for efficient feature selection and classification. WOA reduces the dimensionality of the dataset and extracts the relevant features for accurate classification. Experimental results on a state-of-the-art comprehensive dataset demonstrated improved performance in comparison with eight other machine learning algorithms: Support Vector Machine (SVM), Random Forest, Kernel Support Vector Machine, Decision Tree, Logistic Regression, Stochastic Gradient Descent, Gaussian Naive Bayes and k-Nearest Neighbor. BCD-WERT outperformed all of them with the highest accuracy of 99.30%, followed by SVM with 98.60% accuracy. Experimental results also reveal the effectiveness of feature selection techniques in improving prediction accuracy.
Affiliation(s)
- Shafaq Abbas
- Department of Computer Science, Air University, Islamabad, Pakistan
| | - Zunera Jalil
- Department of Cyber Security, Air University, Islamabad, Pakistan
| | | | - Iqra Batool
- Department of Computer Science, Air University, Islamabad, Pakistan
| | - Mohammad Zubair Khan
- Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah, Saudi Arabia
| | | | - Thippa Reddy Gadekallu
- School of Information Technology and Engineering, Vellore Institute of Technology University, Tamil Nadu, India
| | - Aqsa Akbar
- Department of Computer Science, Air University, Islamabad, Pakistan
| |
|
31
|
Xie H, Zhang L, Lim CP, Yu Y, Liu H. Feature Selection Using Enhanced Particle Swarm Optimisation for Classification Models. SENSORS 2021; 21:s21051816. [PMID: 33807806 PMCID: PMC7961412 DOI: 10.3390/s21051816] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 11/16/2022]
Abstract
In this research, we propose two Particle Swarm Optimisation (PSO) variants to undertake feature selection tasks. The aim is to overcome two major shortcomings of the original PSO model, i.e., premature convergence and weak exploitation around the near optimal solutions. The first proposed PSO variant incorporates four key operations, including a modified PSO operation with rectified personal and global best signals, spiral search based local exploitation, Gaussian distribution-based swarm leader enhancement, and mirroring and mutation operations for worst solution improvement. The second proposed PSO model enhances the first one through four new strategies, i.e., an adaptive exemplar breeding mechanism incorporating multiple optimal signals, nonlinear function oriented search coefficients, exponential and scattering schemes for swarm leader, and worst solution enhancement, respectively. In comparison with a set of 15 classical and advanced search methods, the proposed models illustrate statistical superiority for discriminative feature selection for a total of 13 data sets.
Affiliation(s)
- Hailun Xie
- Computational Intelligence Research Group, Department of Computer and Information Sciences, Faculty of Engineering and Environment, University of Northumbria, Newcastle upon Tyne NE1 8ST, UK;
| | - Li Zhang
- Computational Intelligence Research Group, Department of Computer and Information Sciences, Faculty of Engineering and Environment, University of Northumbria, Newcastle upon Tyne NE1 8ST, UK;
- Correspondence:
| | - Chee Peng Lim
- Institute for Intelligent Systems Research and Innovation, Deakin University, Waurn Ponds, VIC 3216, Australia;
| | - Yonghong Yu
- College of Tongda, Nanjing University of Posts and Telecommunications, Nanjing 210049, China;
| | - Han Liu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China;
| |
|
32
|
Wu W, Zhou Z. A Comprehensive Way to Access Hospital Death Prediction Model for Acute Mesenteric Ischemia: A Combination of Traditional Statistics and Machine Learning. Int J Gen Med 2021; 14:591-602. [PMID: 33658832 PMCID: PMC7920592 DOI: 10.2147/ijgm.s300492] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2021] [Accepted: 02/04/2021] [Indexed: 02/05/2023] Open
Abstract
Purpose This study aimed to use traditional statistics and machine learning to develop and validate models for predicting hospital death in patients with acute mesenteric ischemia (AMI) and to compare these models' performance. Patients and Methods Data were retrieved from the Medical Information Mart for Intensive Care (MIMIC III) electronic clinical database. A total of 338 eligible AMI patients were divided into a training cohort (n = 238) and a validation cohort (n = 100), and all patients were divided into survival and nonsurvival groups according to their hospital outcomes. The performance of the traditional statistics prediction model and the optimal machine learning prediction model was evaluated and compared with respect to discrimination, calibration, and clinical utility in the validation cohort. Results Univariate and multivariate logistic regression analyses identified the following independent risk factors associated with hospital death for AMI in the training cohort: diastolic blood pressure, blood lactate, blood creatinine, age, blood pH, and red blood cell distribution width. Both the nomogram (AUC = 77.0%, 67.9-86.1%) and the optimal machine learning model (AUC = 82.9%, 74.9-91.0%) achieved good discrimination and calibration in the validation cohort. Decision curve analysis showed that the optimal machine learning model had a greater net benefit than the nomogram in this study. Conclusion The nomogram achieved a concise and relatively accurate prediction of hospital death in patients with AMI; the machine learning model also has good discrimination and seems to have better clinical utility. Traditional statistics may help infer the relationship between risk factors and hospital death, while machine learning may contribute to a more accurate prediction. Traditional statistics and machine learning are complementary in developing the prediction model for hospital death of AMI. Therefore, a combined nomogram-machine learning (Nomo-ML) predictive model may improve care and help clinicians make AMI management-related decisions.
Affiliation(s)
- Wenhan Wu
- Institute of Digestive Surgery of Sichuan University, Chengdu, 610041, Sichuan.,Department of Gastrointestinal Surgery, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, 610041, Sichuan
| | - Zongguang Zhou
- Institute of Digestive Surgery of Sichuan University, Chengdu, 610041, Sichuan.,Department of Gastrointestinal Surgery, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, 610041, Sichuan
| |
|
33
|
Venkatesh B, Anuradha J. A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data. INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS 2021. [DOI: 10.3233/kes-190134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
In microarray data, it is difficult to achieve high classification accuracy because of high dimensionality and irrelevant, noisy data; such data sets also contain many gene expression values but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying feature selection. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique is used to aggregate the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets. Accuracy, Recall, Precision, and F1-Score are used as performance metrics. The experimental results show that the proposed method outperforms the other feature selection methods.
|
34
|
Irshad S, Yin X, Zhang Y. A new approach for retinal vessel differentiation using binary particle swarm optimization. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2021. [DOI: 10.1080/21681163.2020.1870001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Affiliation(s)
- Samra Irshad
- School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Australia
| | - Xiaoxia Yin
- Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
| | - Yanchun Zhang
- Institute for Sustainable Industries and Liveable Cities, Victoria University, Melbourne, Australia
| |
|
35
|
Feature subset selection via an improved discretization-based particle swarm optimization. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106794] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
|
36
|
Yan C, Wu B, Ma J, Zhang G, Luo J, Wang J, Luo H. A Novel Hybrid Filter/Wrapper Feature Selection Approach Based on Improved Fruit Fly Optimization Algorithm and Chi-square Test for High Dimensional Microarray Data. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200324125535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
Microarray data are widely used for disease analysis and diagnosis. However, it is hard to process them directly and achieve high classification accuracy because of their intrinsic high dimensionality and small sample sizes. As an important data preprocessing technique, feature selection is usually used to reduce the dimensionality of such datasets.
Methods:
Given the limitations of employing filter or wrapper approaches individually for feature selection, this study proposes a novel hybrid filter-wrapper approach, CS-IFOA, for high-dimensional datasets. First, the Chi-square Test is used to filter out some irrelevant or redundant features. Next, an improved binary Fruit Fly Optimization algorithm further searches for the optimal feature subset without degrading the classification accuracy. Here, a KNN classifier with 10-fold cross-validation is used to evaluate the classification accuracy.
Results:
Extensive experimental results on six benchmark biomedical datasets show that the proposed CS-IFOA achieves superior performance compared with other state-of-the-art methods. CS-IFOA selects fewer features while achieving higher classification accuracy. Furthermore, the standard deviation of the experimental results is relatively small, which indicates that the proposed algorithm is relatively robust.
Conclusion:
The results confirm the efficiency of the approach in identifying important genes for high-dimensional biomedical datasets. It can be used as a pre-processing tool to help optimize the feature selection process and improve the efficiency of disease diagnosis.
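The Chi-square filtering step named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `chi_square_score` and `filter_features` are hypothetical, the features are assumed to be already discretized into categories, and the wrapper stage (the improved Fruit Fly Optimization search with KNN evaluation) is not shown.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))   # observed counts per (feature value, class)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    score = 0.0
    for f_val, f_n in feature_counts.items():
        for y_val, y_n in label_counts.items():
            expected = f_n * y_n / n        # count expected under independence
            observed = joint.get((f_val, y_val), 0)
            score += (observed - expected) ** 2 / expected
    return score


def filter_features(columns, labels, keep):
    """Return the indices of the `keep` feature columns with the highest chi-square score."""
    scores = [chi_square_score(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]


# Toy data (illustrative only): feature 0 tracks the class, feature 1 is unrelated.
columns = [[0, 0, 1, 1], [0, 1, 0, 1]]
labels = [0, 0, 1, 1]
selected = filter_features(columns, labels, keep=1)
```

A filter of this kind scores each feature independently of any classifier, which is why it is cheap enough to run before a more expensive wrapper search.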
Affiliation(s)
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Bin Wu
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
|
37
|
Binary JAYA Algorithm with Adaptive Mutation for Feature Selection. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2020. [DOI: 10.1007/s13369-020-04871-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/01/2022]
|
38
|
Feature engineering combined with 1-D convolutional neural network for improved mortality prediction. BIO-ALGORITHMS AND MED-SYSTEMS 2020. [DOI: 10.1515/bams-2020-0056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Objectives
Appropriate care for patients admitted to intensive care units (ICUs) is becoming increasingly prominent, which motivates the use of machine learning models. Real-time prediction of the mortality of patients admitted to the ICU has the potential to provide physicians with interpretable results. The growing crisis in healthcare, including soaring costs, unsafe care, misdirected care, fragmented care, chronic diseases and the evolution of epidemic diseases, demands automated and real-time data processing to assure an improved quality of life. ICUs generate a wealth of useful data in the form of Electronic Health Records (EHRs). These data allow for the development of a well-grounded prediction tool.
Method
We aimed to build a mortality prediction model on the 2012 PhysioNet Challenge mortality prediction database of 4,000 patients admitted to the ICU. The challenges in the dataset, such as high dimensionality, imbalanced distribution and missing values, were tackled with analytical methods and tools via feature engineering and new variable construction. The objective of the research is to exploit the relations among the clinical variables and construct new variables that establish the effectiveness of a 1-Dimensional Convolutional Neural Network (1-D CNN) with constructed features.
Results
The performance of the 1-D CNN is compared, in terms of Area Under Curve (AUC), with traditional machine learning algorithms such as the XGBoost classifier, Light Gradient Boosting Machine (LGBM) classifier, Support Vector Machine (SVM), Decision Tree (DT), K-Neighbours Classifier (K-NN) and Random Forest Classifier (RF), and with recurrent models such as Long Short-Term Memory (LSTM) and LSTM-attention. The investigation reveals a best AUC of 0.848 using the 1-D CNN model.
Conclusion
The relationships among the various features were recognized, and new features were constructed from existing ones. Multiple models were tested and compared on different metrics.
|
39
|
Performance Evaluation of a Proposed Machine Learning Model for Chronic Disease Datasets Using an Integrated Attribute Evaluator and an Improved Decision Tree Classifier. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10228137] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022]
Abstract
There is a consistent rise in chronic diseases worldwide. These diseases decrease immunity and the quality of daily life. The treatment of these disorders is a challenging task for medical professionals. Dimensionality reduction techniques make it possible to handle big data samples, providing decision support in relation to chronic diseases. These datasets contain a series of symptoms that are used in disease prediction. The presence of redundant and irrelevant symptoms in the datasets should be identified and removed using feature selection techniques to improve classification accuracy. Therefore, the main contribution of this paper is a comparative analysis of the impact of wrapper and filter selection methods on classification performance. The filter methods that have been considered include the Correlation Feature Selection (CFS) method, the Information Gain (IG) method and the Chi-Square (CS) method. The wrapper methods that have been considered include the Best First Search (BFS) method, the Linear Forward Selection (LFS) method and the Greedy Step Wise Search (GSS) method. A Decision Tree algorithm has been used as a classifier for this analysis and is implemented through the WEKA tool. An attribute significance analysis has been performed on the diabetes, breast cancer and heart disease datasets used in the study. It was observed that the CFS method outperformed other filter methods concerning the accuracy rate and execution time. The accuracy rate using the CFS method on the datasets for heart disease, diabetes, breast cancer was 93.8%, 89.5% and 96.8% respectively. Moreover, latency delays of 1.08 s, 1.02 s and 1.01 s were noted using the same method for the respective datasets. Among wrapper methods, BFS’ performance was impressive in comparison to other methods. Maximum accuracy of 94.7%, 95.8% and 96.8% were achieved on the datasets for heart disease, diabetes and breast cancer respectively. 
Latency delays of 1.42 s, 1.44 s and 132 s were recorded using the same method for the respective datasets. On the basis of the obtained results, a new hybrid Attribute Evaluator method has been proposed which effectively integrates enhanced K-Means clustering with the CFS filter method and the BFS wrapper method. Furthermore, the hybrid method was evaluated with an improved decision tree classifier, which combines clustering with classification. It was validated on 14 different chronic disease datasets and its performance was recorded. Consistently strong classification performance was observed. The mean values for accuracy, specificity, sensitivity and f-score metrics were 96.7%, 96.5%, 95.6% and 96.2% respectively.
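The four metrics reported above follow the standard confusion-matrix definitions, sketched below for a binary classifier. The function name `classification_metrics` and the counts are illustrative assumptions and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),           # true positive rate (recall)
        "specificity": tn / (tn + fp),           # true negative rate
        "f_score": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Illustrative counts only; they do not come from any dataset in the study.
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

Reporting sensitivity and specificity alongside accuracy matters for disease datasets, where class imbalance can make accuracy alone misleading.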
|
40
|
Abu Khurmaa R, Aljarah I, Sharieh A. An intelligent feature selection approach based on moth flame optimization for medical diagnosis. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05483-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/21/2022]
|
41
|
Li H, He F, Chen Y. Learning dynamic simultaneous clustering and classification via automatic differential evolution and firework algorithm. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106593] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 10/23/2022]
|
42
|
Ratna Raju B, Swamy G, Padma Raju K. Diagnosis of colorectal cancer based on imperialist competitive algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-189021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Colorectal cancer has caused a growing number of deaths in recent years. Early diagnosis of colorectal cancer makes it safer to treat the patient. Colonoscopy is commonly applied to identify and treat this type of cancer. Feature-selection-based methods are proposed, which help to choose a subset of variables and attain better prediction. An Imperialist Competitive Algorithm (ICA) is proposed to select features for the identification of colon cancer and its treatment. In addition, a K-Nearest Neighbor (KNN) classifier is used, which finds the minimal Euclidean distance between the feature vector of the query and all the prototype training data. Experimental results show that the proposed method outperforms other methods on its performance metrics and achieves better accuracy.
Affiliation(s)
- B Ratna Raju
  - Professor of ECE, Miracle Educational Society group of Institutions, Bhogapuram, Vizianagaram Dt. AP
- G.N Swamy
  - Professor and HOD, EIE, VR Siddhartha Engineering College (A), Vijayawada, AP
- K. Padma Raju
  - Professor of ECE, University College of Engineering, JNTUK, Kakinada and Member of APPSC
|
43
|
Wang T, Paschalidis A, Liu Q, Liu Y, Yuan Y, Paschalidis IC. Predictive Models of Mortality for Hospitalized Patients With COVID-19: Retrospective Cohort Study. JMIR Med Inform 2020; 8:e21788. [PMID: 33055061 PMCID: PMC7572117 DOI: 10.2196/21788] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/25/2020] [Revised: 07/28/2020] [Accepted: 09/15/2020] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND The novel coronavirus SARS-CoV-2 and its associated disease, COVID-19, have caused worldwide disruption, leading countries to take drastic measures to address the progression of the disease. As SARS-CoV-2 continues to spread, hospitals are struggling to allocate resources to patients who are most at risk. In this context, it has become important to develop models that can accurately predict the severity of infection of hospitalized patients to help guide triage, planning, and resource allocation. OBJECTIVE The aim of this study was to develop accurate models to predict the mortality of hospitalized patients with COVID-19 using basic demographics and easily obtainable laboratory data. METHODS We performed a retrospective study of 375 hospitalized patients with COVID-19 in Wuhan, China. The patients were randomly split into derivation and validation cohorts. Regularized logistic regression and support vector machine classifiers were trained on the derivation cohort, and accuracy metrics (F1 scores) were computed on the validation cohort. Two types of models were developed: the first type used laboratory findings from the entire length of the patient's hospital stay, and the second type used laboratory findings that were obtained no later than 12 hours after admission. The models were further validated on a multicenter external cohort of 542 patients. RESULTS Of the 375 patients with COVID-19, 174 (46.4%) died of the infection. The study cohort was composed of 224/375 men (59.7%) and 151/375 women (40.3%), with a mean age of 58.83 years (SD 16.46). The models developed using data from throughout the patients' length of stay demonstrated accuracies as high as 97%, whereas the models with admission laboratory variables possessed accuracies of up to 93%. The latter models predicted patient outcomes an average of 11.5 days in advance. 
Key variables such as lactate dehydrogenase, high-sensitivity C-reactive protein, and percentage of lymphocytes in the blood were indicated by the models. In line with previous studies, age was also found to be an important variable in predicting mortality. In particular, the mean age of patients who survived COVID-19 infection (50.23 years, SD 15.02) was significantly lower than the mean age of patients who died of the infection (68.75 years, SD 11.83; P<.001). CONCLUSIONS Machine learning models can be successfully employed to accurately predict outcomes of patients with COVID-19. Our models achieved high accuracies and could predict outcomes more than one week in advance; this promising result suggests that these models can be highly useful for resource allocation in hospitals.
Affiliation(s)
- Taiyao Wang
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States; Department of Biomedical Engineering, Boston University, Boston, MA, United States; Center for Information and Systems Engineering, Boston University, Boston, MA, United States
- Quanying Liu
  - Department of Biomedical Engineering, University of Science and Technology, Shenzhen, China
- Yingxia Liu
  - Third People's Hospital of Shenzhen, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Ye Yuan
  - School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
- Ioannis Ch Paschalidis
  - Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States; Department of Biomedical Engineering, Boston University, Boston, MA, United States; Center for Information and Systems Engineering, Boston University, Boston, MA, United States
|
44
|
Abstract
The growth of wireless networks has been remarkable in the last few years. One of the main reasons for this growth is the massive use of portable and stand-alone devices with wireless network connectivity. These devices have become essential on a daily basis in consumer electronics. As the dependency on wireless networks has increased, attacks against them have increased over time as well. To detect these attacks, a network intrusion detection system (NIDS) with high accuracy and low detection time is needed. In this work, we propose a machine learning (ML) based wireless network intrusion detection system (WNIDS) for Wi-Fi networks to efficiently detect attacks against them. The proposed WNIDS consists of two stages that work together in sequence. An ML model is developed for each stage to classify the network records as normal or as one of the specific attack classes. We train and validate the ML model for WNIDS using the publicly available Aegean Wi-Fi Intrusion Dataset (AWID). Several feature selection techniques have been considered to identify the best feature set for the WNIDS. Our two-stage WNIDS achieves an accuracy of 99.42% for multi-class classification with a reduced set of features. A module for eXplainable Artificial Intelligence (XAI) is implemented as well to understand the influence of features on each type of network traffic record.
|
45
|
Ryan L, Lam C, Mataraso S, Allen A, Green-Saxena A, Pellegrini E, Hoffman J, Barton C, McCoy A, Das R. Mortality prediction model for the triage of COVID-19, pneumonia, and mechanically ventilated ICU patients: A retrospective study. Ann Med Surg (Lond) 2020; 59:207-216. [PMID: 33042536 PMCID: PMC7532803 DOI: 10.1016/j.amsu.2020.09.044] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 07/30/2020] [Revised: 09/18/2020] [Accepted: 09/20/2020] [Indexed: 01/18/2023] Open
Abstract
Rationale Prediction of patients at risk for mortality can help triage patients and assist in resource allocation. Objectives Develop and evaluate a machine learning-based algorithm which accurately predicts mortality in COVID-19, pneumonia, and mechanically ventilated patients. Methods Retrospective study of 53,001 total ICU patients, including 9166 patients with pneumonia and 25,895 mechanically ventilated patients, performed on the MIMIC dataset. An additional retrospective analysis was performed on a community hospital dataset containing 114 patients positive for SARS-COV-2 by PCR test. The outcome of interest was in-hospital patient mortality. Results When trained and tested on the MIMIC dataset, the XGBoost predictor obtained area under the receiver operating characteristic (AUROC) values of 0.82, 0.81, 0.77, and 0.75 for mortality prediction on mechanically ventilated patients at 12-, 24-, 48-, and 72- hour windows, respectively, and AUROCs of 0.87, 0.78, 0.77, and 0.734 for mortality prediction on pneumonia patients at 12-, 24-, 48-, and 72- hour windows, respectively. The predictor outperformed the qSOFA, MEWS and CURB-65 risk scores at all prediction windows. When tested on the community hospital dataset, the predictor obtained AUROCs of 0.91, 0.90, 0.86, and 0.87 for mortality prediction on COVID-19 patients at 12-, 24-, 48-, and 72- hour windows, respectively, outperforming the qSOFA, MEWS and CURB-65 risk scores at all prediction windows. Conclusions This machine learning-based algorithm is a useful predictive tool for anticipating patient mortality at clinically useful timepoints, and is capable of accurate mortality prediction for mechanically ventilated patients as well as those diagnosed with pneumonia and COVID-19. Mortality predictions have not previously been evaluated for COVID-19 patients. Machine learning may be a useful predictive tool for anticipating patient mortality. 
Prediction can be estimated at clinically useful windows up to 72 h in advance.
Affiliation(s)
- Andrea McCoy
  - Cape Regional Medical Center, Cape May Court House, NJ, USA
|
46
|
Optimal generation scheduling of pumped storage hydro-thermal system with wind energy sources. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106345] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/17/2022]
|
47
|
Guha R, Ghosh M, Mutsuddi S, Sarkar R, Mirjalili S. Embedded chaotic whale survival algorithm for filter–wrapper feature selection. Soft comput 2020. [DOI: 10.1007/s00500-020-05183-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 01/13/2023]
|
48
|
|
49
|
Zhang G, Hou J, Wang J, Yan C, Luo J. Feature Selection for Microarray Data Classification Using Hybrid Information Gain and a Modified Binary Krill Herd Algorithm. Interdiscip Sci 2020; 12:288-301. [PMID: 32441000 DOI: 10.1007/s12539-020-00372-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/30/2019] [Revised: 03/31/2020] [Accepted: 04/24/2020] [Indexed: 10/24/2022]
Abstract
Due to the presence of irrelevant or redundant data in microarray datasets, capturing potential patterns accurately and directly via existing models is difficult. Feature selection (FS) has become a necessary strategy to identify and screen out the most relevant attributes. However, the high dimensionality of microarray datasets poses a serious challenge to most existing FS algorithms. For this purpose, we propose a novel feature selection strategy in this paper, called IG-MBKH. This strategy integrates a pre-screening method of feature ranking based on information gain (IG) with an improved binary krill herd (MBKH) algorithm. When searching for feature subsets using MBKH, a hyperbolic tangent function, an adaptive transfer factor, and a chaos memory weight factor are introduced to facilitate a better search of the possible feature subsets. The results indicate that the IG-MBKH algorithm achieves improvements in convergence, number of features and classification accuracy when compared to BKH, MBKH, and several recent algorithms. Furthermore, we evaluate the impact of different classifiers on the performance of the proposed strategy.
Affiliation(s)
- Ge Zhang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China; Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, 475004, China
- Jincui Hou
  - School of Computer and Information Engineering, Henan University, Kaifeng, China
- Jianlin Wang
  - School of Computer and Information Engineering, Henan University, Kaifeng, China; Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, 475004, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng, China; Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, 475004, China
- Junwei Luo
  - College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
|
50
|
Kwon YS, Baek MS. Development and Validation of a Quick Sepsis-Related Organ Failure Assessment-Based Machine-Learning Model for Mortality Prediction in Patients with Suspected Infection in the Emergency Department. J Clin Med 2020; 9:jcm9030875. [PMID: 32210033 PMCID: PMC7141518 DOI: 10.3390/jcm9030875] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/29/2020] [Revised: 03/04/2020] [Accepted: 03/17/2020] [Indexed: 12/23/2022] Open
Abstract
The quick sepsis-related organ failure assessment (qSOFA) score has been introduced to predict the likelihood of organ dysfunction in patients with suspected infection. We hypothesized that machine-learning models using qSOFA variables for predicting three-day mortality would provide better accuracy than the qSOFA score in the emergency department (ED). Between January 2016 and December 2018, the medical records of patients aged over 18 years with suspected infection were retrospectively obtained from four EDs in Korea. Data from three hospitals (n = 19,353) were used as training-validation datasets and data from one (n = 4234) as the test dataset. Machine-learning algorithms including extreme gradient boosting, light gradient boosting machine, and random forest were used. We assessed the prediction ability of machine-learning models using the area under the receiver operating characteristic (AUROC) curve, and DeLong's test was used to compare AUROCs between the qSOFA scores and qSOFA-based machine-learning models. A total of 447,926 patients visited EDs during the study period. We analyzed 23,587 patients with suspected infection who were admitted to the EDs. The median age of the patients was 63 years (interquartile range: 43-78 years) and in-hospital mortality was 4.0% (n = 941). For predicting three-day mortality among patients with suspected infection in the ED, the AUROC of the qSOFA-based machine-learning model (0.86 [95% CI 0.85-0.87]) was higher than that of the qSOFA scores (0.78 [95% CI 0.77-0.79], p < 0.001). The qSOFA-based machine-learning model was therefore found to be superior to the conventional qSOFA scores for this task.
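For context, the conventional qSOFA score that serves as the baseline here can be sketched as follows. The thresholds are those of the Sepsis-3 definition (respiratory rate of at least 22 breaths/min, systolic blood pressure of at most 100 mmHg, Glasgow Coma Scale below 15) and are not stated in the abstract; the function name `qsofa_score` is hypothetical.

```python
def qsofa_score(respiratory_rate, systolic_bp, gcs):
    """Conventional qSOFA score (0-3): one point per criterion met.

    Thresholds follow the Sepsis-3 definition; they are assumed here,
    since the abstract does not restate them.
    """
    points = 0
    if respiratory_rate >= 22:   # breaths per minute
        points += 1
    if systolic_bp <= 100:       # mmHg
        points += 1
    if gcs < 15:                 # altered mentation (Glasgow Coma Scale)
        points += 1
    return points


# Hypothetical patients, for illustration only.
high_risk = qsofa_score(respiratory_rate=24, systolic_bp=95, gcs=15)
low_risk = qsofa_score(respiratory_rate=16, systolic_bp=120, gcs=15)
```

The score collapses three continuous measurements into at most three points, whereas a machine-learning model trained on the same three variables can use their actual values. That difference is one plausible reason for the gap in AUROC reported above.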
Affiliation(s)
- Young Suk Kwon
  - Department of Anaesthesiology and Pain Medicine, College of Medicine, Hallym University, Chuncheon Sacred Heart Hospital, Chuncheon 24253, Korea
- Moon Seong Baek
  - Division of Pulmonary, Allergy and Critical Care Medicine, Hallym University Dongtan Sacred Heart Hospital, Hwaseong-si 18450, Korea
  - Lung Research Institute of Hallym University College of Medicine, Chuncheon-si 24253, Korea
  - Correspondence: Tel.: +82-31-8086-2292; Fax: +82-31-8086-2482
|