1
|
Liu H, Lu L, Xiong H, Fan C, Fan L, Lin Z, Zhang H. A Novel Approach to Dual Feature Selection of Atrial Fibrillation Based on HC-MFS. Diagnostics (Basel) 2024; 14:1145. [PMID: 38893671 PMCID: PMC11171513 DOI: 10.3390/diagnostics14111145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 05/23/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024] Open
Abstract
This investigation sought to discern the risk factors for atrial fibrillation within Shanghai's Chongming District, analyzing data from 678 patients treated at a tertiary hospital in Chongming District, Shanghai, from 2020 to 2023, collecting information on season, C-reactive protein, hypertension, platelets, and other relevant indicators. The researchers introduced a novel dual feature-selection methodology, combining hierarchical clustering with Fisher scores (HC-MFS), to benchmark against four established methods. Through the training of five classification models on a designated dataset, the most effective model was chosen for method performance evaluation, with validation confirmed by test set scores. Impressively, the HC-MFS approach achieved the highest accuracy and the lowest root mean square error in the classification model, at 0.9118 and 0.2970, respectively. This provides a higher performance compared to existing methods, thanks to the combination and interaction of the two methods, which improves the quality of the feature subset. The research identified seasonal changes that were strongly associated with atrial fibrillation (pr = 0.31, FS = 0.11, and DCFS = 0.33, ranked first in terms of correlation); LDL cholesterol, total cholesterol, C-reactive protein, and platelet count, which are associated with inflammatory response and coronary heart disease, also indirectly contribute to atrial fibrillation and are risk factors for AF. Conclusively, this study advocates that machine-learning models can significantly aid clinicians in diagnosing individuals predisposed to atrial fibrillation, which shows a strong correlation with both pathological and climatic elements, especially seasonal variations, in the Chongming District.
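As a rough illustration of the Fisher-score half of the method described above, the following sketch ranks features by the ratio of between-class to within-class scatter. It is not the authors' HC-MFS implementation: the hierarchical-clustering step is omitted, and the function names and toy values are hypothetical.

```python
from statistics import fmean


def fisher_score(values, labels):
    """Fisher score of one feature: between-class scatter / within-class scatter."""
    overall_mean = fmean(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    between = 0.0
    within = 0.0
    for group in groups.values():
        n = len(group)
        mu = fmean(group)
        variance = sum((v - mu) ** 2 for v in group) / n  # population variance
        between += n * (mu - overall_mean) ** 2
        within += n * variance

    if within == 0:
        # Constant within every class: uninformative if the class means also coincide.
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest Fisher score."""
    scores = {name: fisher_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: one feature separates the classes well, one does not.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_columns = {
    "informative": [1, 2, 3, 7, 8, 9],
    "noisy": [1, 8, 2, 9, 3, 7],
}
toy_ranking = rank_features(toy_columns, toy_labels)
```

A higher score means the class means are far apart relative to the spread inside each class, which is why such a score can serve as a simple filter before a classifier is trained.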
Affiliation(s)
- Hong Liu
  - Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
  - Chongming Hospital, Shanghai University of Medicine & Health Sciences, Shanghai 202150, China
- Lifeng Lu
  - Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
- Honglin Xiong
  - Collaborative Innovation Center for Biomedicine, Shanghai University of Medicine & Health Sciences, Shanghai 201318, China
  - Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200030, China
- Chongjun Fan
  - Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
- Lumin Fan
  - Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
- Ziqian Lin
  - Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
- Hongliu Zhang
  - Business School, University of Shanghai for Science and Technology, Shanghai 200093, China
|
2
|
Santana DC, de Oliveira IC, de Oliveira JLG, Baio FHR, Teodoro LPR, da Silva Junior CA, Seron ACC, Ítavo LCV, Coradi PC, Teodoro PE. High-throughput phenotyping using VIS/NIR spectroscopy in the classification of soybean genotypes for grain yield and industrial traits. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2024; 310:123963. [PMID: 38309004 DOI: 10.1016/j.saa.2024.123963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 02/05/2024]
Abstract
Employing visible and near infrared sensors in high-throughput phenotyping provides insight into the relationship between the spectral characteristics of the leaf and the content of grain properties, helping soybean breeders to direct their program towards improving grain traits according to researchers' interests. Our research hypothesis is that the leaf reflectance of soybean genotypes can be directly related to industrial grain traits such as protein and fiber contents. Thus, the objectives of the study were: (i) to classify soybean genotypes according to the grain yield and industrial traits; (ii) to identify the algorithm(s) with the highest accuracy for classifying genotypes using leaf reflectance as model input; (iii) to identify the best input data for the algorithms to improve their performance. A field experiment was carried out in randomized block design with three replications and 32 soybean genotypes. At 60 days after emergence, spectral analysis was carried out on three leaf samples from each plot. A hyperspectral sensor was used to capture reflectance between the wavelengths from 450 to 824 nm. Representative spectral bands were selected and grouped into means. After harvest, grain yield was assessed and laboratory analyses of industrial traits were carried out. Spectral, industrial traits and yield data were subjected to statistical analysis. Data were analyzed by the following machine learning algorithms: J48 (J48) and REPTree (DT) decision trees, Random Forest (RF), Artificial Neural Networks (ANN), Support Vector Machine (SVM), and conventional Logistic Regression (LR) analysis. The clusters formed were used as the output of the models, while two groups of input data were used for the input of the models: the spectral variables (WL) noise-free obtained by the sensor (450-828 nm) and the spectral means of the selected bands (SB) (450.0-720.6 nm). 
Soybean genotypes were grouped according to their grain yield and industrial traits, in which the SVM and J48 algorithms performed better at classifying them. Using the spectral bands selected in the study improved the classification accuracy of the algorithms.
Affiliation(s)
- Ana Carina Candido Seron
  - Department of Agronomy, State University of São Paulo (UNESP), Ilha Solteira 15385-000, SP, Brazil.
- Paulo Carteri Coradi
  - Campus Cachoeira do Sul, Federal University of Santa Maria, Street Ernesto Barros, 1345, 96506-322 Cachoeira do Sul, RS, Brazil.
- Paulo Eduardo Teodoro
  - Federal University of Mato Grosso do Sul (UFMS), Chapadão do Sul 79560-000, MS, Brazil.
|
3
|
Chafai N, Bonizzi L, Botti S, Badaoui B. Emerging applications of machine learning in genomic medicine and healthcare. Crit Rev Clin Lab Sci 2024; 61:140-163. [PMID: 37815417 DOI: 10.1080/10408363.2023.2259466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 09/12/2023] [Indexed: 10/11/2023]
Abstract
The integration of artificial intelligence technologies has propelled the progress of clinical and genomic medicine in recent years. The significant increase in computing power has facilitated the ability of artificial intelligence models to analyze and extract features from extensive medical data and images, thereby contributing to the advancement of intelligent diagnostic tools. Artificial intelligence (AI) models have been utilized in the field of personalized medicine to integrate clinical data and genomic information of patients. This integration allows for the identification of customized treatment recommendations, ultimately leading to enhanced patient outcomes. Notwithstanding the notable advancements, the application of artificial intelligence (AI) in the field of medicine is impeded by various obstacles such as the limited availability of clinical and genomic data, the diversity of datasets, ethical implications, and the inconclusive interpretation of AI models' results. In this review, a comprehensive evaluation of multiple machine learning algorithms utilized in the fields of clinical and genomic medicine is conducted. Furthermore, we present an overview of the implementation of artificial intelligence (AI) in the fields of clinical medicine, drug discovery, and genomic medicine. Finally, a number of constraints pertaining to the implementation of artificial intelligence within the healthcare industry are examined.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
- Luigi Bonizzi
  - Department of Biomedical, Surgical and Dental Science, University of Milan, Milan, Italy
- Sara Botti
  - PTP Science Park, Via Einstein - Loc. Cascina Codazza, Lodi, Italy
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laâyoune, Morocco
|
4
|
Al-Rajab M, Lu J, Xu Q, Kentour M, Sawsa A, Shuweikeh E, Joy M, Arasaradnam R. A hybrid machine learning feature selection model-HMLFSM to enhance gene classification applied to multiple colon cancers dataset. PLoS One 2023; 18:e0286791. [PMID: 37917732 PMCID: PMC10621932 DOI: 10.1371/journal.pone.0286791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 05/20/2023] [Indexed: 11/04/2023] Open
Abstract
Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification using genetic data or patient demographics and medical history. One approach is to use ML to analyse genetic data, or patient demographics and medical history, to predict the likelihood of colon cancer. However, due to the challenges imposed by variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model called HMLFSM-Hybrid Machine Learning Feature Selection Model to improve colon cancer gene classification. We developed a multifilter hybrid model including a two-phase feature selection approach, combining Information Gain (IG) and Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) coupling with Particle Swarm Optimization (PSO). We critically tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (95%, ~97%, and ~94% accuracies for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. For colon cancer gene analysis, and along with our experiments and literature review, we found that selective input feature extraction prior to feature selection is essential for improving predictive performance.
Affiliation(s)
- Murad Al-Rajab
  - College of Engineering, Abu Dhabi University, Abu Dhabi, United Arab Emirates
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Joan Lu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Qiang Xu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Mohamed Kentour
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Ahlam Sawsa
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
  - Bradford Teaching Hospitals NHS Foundation Trust, Bradford, United Kingdom
- Emad Shuweikeh
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Mike Joy
  - University of Warwick, Coventry, United Kingdom
|
5
|
Bayrak T, Çetin Z, Saygılı Eİ, Ogul H. Identifying the tumor location-associated candidate genes in development of new drugs for colorectal cancer using machine-learning-based approach. Med Biol Eng Comput 2022; 60:2877-2897. [DOI: 10.1007/s11517-022-02641-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 07/28/2022] [Indexed: 02/07/2023]
|
6
|
Torkey H, Belal NA. An Enhanced Multiple Sclerosis Disease Diagnosis via an Ensemble Approach. Diagnostics (Basel) 2022; 12:1771. [PMID: 35885672 PMCID: PMC9316893 DOI: 10.3390/diagnostics12071771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 06/25/2022] [Accepted: 07/18/2022] [Indexed: 11/30/2022] Open
Abstract
Multiple Sclerosis (MS) is a disease attacking the central nervous system. According to MS Atlas's most recent statistics, there are more than 2.8 million people worldwide diagnosed with MS. Recently, studies started to explore machine learning techniques to predict MS using various data. The objective of this paper is to develop an ensemble approach for diagnosis of MS using gene expression profiles, while handling the class imbalance problem associated with the data. A hierarchical ensemble approach employing voting and boosting techniques is proposed. This approach adopts a heterogeneous voting approach using two base learners, random forest and support vector machine. Experiments show that our approach outperforms state-of-the-art methods, with the highest recorded accuracy being 92.81% and 93.5% with BoostFS and DEGs for feature selection, respectively. Conclusively, the proposed approach is able to efficiently diagnose MS using the gene expression profiles that are more relevant to the disease. The approach is not merely an ensemble classifier outperforming previous work; it also identifies differentially expressed genes between normal samples and patients with multiple sclerosis using a genome-wide expression microarray. The results obtained show that the proposed approach is an efficient diagnostic tool for MS.
Affiliation(s)
- Hanaa Torkey
  - Computer Science and Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
- Nahla A. Belal
  - College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport, Smart Village 12577, Egypt
|
7
|
Stroke Risk Prediction with Machine Learning Techniques. SENSORS 2022; 22:s22134670. [PMID: 35808172 PMCID: PMC9268898 DOI: 10.3390/s22134670] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 05/27/2022] [Revised: 06/16/2022] [Accepted: 06/20/2022] [Indexed: 01/25/2023]
Abstract
A stroke is caused when blood flow to a part of the brain is stopped abruptly. Without the blood supply, the brain cells gradually die, and disability occurs depending on the area of the brain affected. Early recognition of symptoms can provide valuable information for predicting stroke and promoting a healthy life. In this research work, with the aid of machine learning (ML), several models are developed and evaluated to design a robust framework for the long-term risk prediction of stroke occurrence. The main contribution of this study is a stacking method that achieves high performance, validated by various metrics such as AUC, precision, recall, F-measure, and accuracy. The experimental results showed that the stacking classification outperforms the other methods, with an AUC of 98.9%, F-measure, precision, and recall of 97.4%, and an accuracy of 98%.
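For readers unfamiliar with the threshold-based metrics named in the abstract, the sketch below shows how precision, recall, F-measure, and accuracy follow from binary confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the study; AUC is omitted because it requires ranked scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-measure, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # F-measure is the harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only.
example_metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```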
|
8
|
Soybean Cultivars Identification Using Remotely Sensed Image and Machine Learning Models. SUSTAINABILITY 2022. [DOI: 10.3390/su14127125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Using remote sensing combined with machine learning (ML) techniques is a promising approach to classify soybean cultivars. Therefore, the objectives of this study are (i) to verify which input dataset configuration (using only spectral bands, only vegetation indices, or both) is more accurate in the identification of soybean cultivars, and (ii) to verify which ML technique is more accurate in the identification of soybean cultivars. Information was extracted from five central irrigation pivots in the same region and with the same sowing date in the 2015/2016 crop year, each cultivated with a different cultivar: CV1 (P98y12 RR), CV2 (Desafio RR), CV3 (M6410 IPRO), CV4 (M7110 IPRO), and CV5 (NA5909 RR). A cloud-free orbital image of the site was acquired from the Google Earth Engine platform. In addition to the spectral bands alone, a total of 13 vegetation indices were calculated. The models tested were: artificial neural networks (ANN), radial basis function network (RBF), decision tree algorithms J48 (DT) and reduced error pruning tree (REP), random forest (RF), and support vector machine (SVM). The five soybean cultivars were classified by the six ML models in stratified randomized cross-validation with k = 10 folds and 10 repetitions (100 runs for each model). After obtaining the r and MAE statistics, analysis of variance was performed considering a 6 × 3 factorial scheme (models versus inputs) with 10 repetitions (folds). The means were grouped by the Scott–Knott test at 5% probability. The spectral bands were the most accurate among the tested inputs in the identification of soybean cultivars. ANN was the most accurate model in identifying soybean cultivars.
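The evaluation protocol described above (stratified 10-fold cross-validation repeated 10 times, giving 100 runs per model) can be sketched as follows. This is a minimal standard-library illustration, not the software used in the study; the function name, the seed, and the toy label vector are hypothetical.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: k folds per repetition, classes balanced across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold receives a similar share of it.
            for pos, idx in enumerate(shuffled):
                folds[pos % k].append(idx)
        for i in range(k):
            test_idx = sorted(folds[i])
            train_idx = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
            yield train_idx, test_idx


# Hypothetical toy labels: five classes (cultivars) with 20 samples each.
toy_labels = [cultivar for cultivar in range(5) for _ in range(20)]
toy_splits = list(repeated_stratified_kfold(toy_labels, k=10, repeats=10))
```

Each repetition reshuffles the samples within each class, so the 10 repetitions produce different fold assignments while every test fold keeps the same class proportions as the full dataset.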
|
9
|
da Silva André G, Coradi PC, Teodoro LPR, Teodoro PE. Predicting the quality of soybean seeds stored in different environments and packaging using machine learning. Sci Rep 2022; 12:8793. [PMID: 35614333 PMCID: PMC9132987 DOI: 10.1038/s41598-022-12863-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 05/06/2022] [Indexed: 11/25/2022] Open
Abstract
Monitoring and evaluating the physical and physiological quality of seeds throughout storage requires technical and financial resources and is subject to sampling and laboratory errors. Therefore, machine learning (ML) techniques could help optimize the processes and obtain accurate results for decision-making in the seed storage process. This study aimed to analyze the performance of ML algorithms using variables monitored during seed conditioning (temperature and packaging) and storage time to predict the physical and physiological quality of stored soybean seeds. Data analysis was performed using Artificial Neural Networks, the decision tree algorithms REPTree and M5P, Random Forest, and Linear Regression. In predicting seed quality, the combination of the input variables temperature and storage time for the REPTree and Random Forest algorithms outperformed linear regression, providing higher accuracy indices. Among the most important results, it was observed for apparent specific mass that T + P + ST, T + ST, P + ST, and ST had the highest r means and the lowest MAE means; however, Pearson's r coefficient for these inputs was 0.63 and the MAE ranged from 9.59 to 10.47. For germination, the inputs T + P + ST and T + ST gave the best results (r = 0.65 and r = 0.67, respectively) in the ANN, REPTree, M5P, and RF models. Using computational intelligence algorithms is an excellent alternative for predicting the quality of soybean seeds from easy-to-measure variables.
Affiliation(s)
- Geovane da Silva André
  - Department of Agronomy, Campus de Chapadão do Sul, Federal University of Mato Grosso do Sul, Chapadão do Sul, MS, 79560-000, Brazil
- Paulo Carteri Coradi
  - Department of Agronomy, Campus de Chapadão do Sul, Federal University of Mato Grosso do Sul, Chapadão do Sul, MS, 79560-000, Brazil.
  - Department Agricultural Engineering, Rural Sciences Center, Federal University of Santa Maria, Avenue Roraima, 1000, Camobi, Santa Maria, Rio Grande do Sul, 97105-900, Brazil.
  - Department of Agricultural Engineering, Laboratory of Postharvest, Campus Cachoeira do Sul, Federal University of Santa Maria, Highway Taufik Germano, 3013, Passo D'Areia, Cachoeira do Sul, Rio Grande do Sul, 96506-322, Brazil.
- Larissa Pereira Ribeiro Teodoro
  - Department of Agronomy, Campus de Chapadão do Sul, Federal University of Mato Grosso do Sul, Chapadão do Sul, MS, 79560-000, Brazil
- Paulo Eduardo Teodoro
  - Department of Agronomy, Campus de Chapadão do Sul, Federal University of Mato Grosso do Sul, Chapadão do Sul, MS, 79560-000, Brazil
|
10
|
Nadakinamani RG, Reyana A, Kautish S, Vibith AS, Gupta Y, Abdelwahab SF, Mohamed AW. Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2973324. [PMID: 35069715 PMCID: PMC8767405 DOI: 10.1155/2022/2973324] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/15/2021] [Revised: 12/03/2021] [Accepted: 12/15/2021] [Indexed: 02/08/2023]
Abstract
Cardiovascular disease is difficult to detect due to several risk factors, including high blood pressure, cholesterol, and an abnormal pulse rate. Accurate decision-making and optimal treatment are required to address cardiac risk. As machine learning technology advances, the healthcare industry's clinical practice is likely to change. As a result, researchers and clinicians must recognize the importance of machine learning techniques. The main objective of this research is to recommend a highly accurate machine learning-based cardiovascular disease prediction system (CDPS). To this end, machine learning algorithms such as REP Tree, M5P Tree, Random Tree, Linear Regression, Naive Bayes, J48, and JRIP are used to classify popular cardiovascular datasets. The proposed CDPS's performance was evaluated using a variety of metrics to identify the most suitable machine learning model. In predicting cardiovascular disease patients, the Random Tree model performed best, with the highest accuracy of 100%, the lowest MAE of 0.0011, the lowest RMSE of 0.0231, and the fastest prediction time of 0.01 seconds.
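The two error metrics reported in the abstract, MAE and RMSE, can be computed as in the sketch below. The function names and the toy predictions are hypothetical and unrelated to the study's data.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the average squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical class labels and predicted probabilities, for illustration only.
toy_true = [1, 0, 1, 1]
toy_pred = [0.5, 0.0, 1.0, 0.5]
toy_mae = mean_absolute_error(toy_true, toy_pred)
toy_rmse = root_mean_squared_error(toy_true, toy_pred)
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same predictions.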
Affiliation(s)
- A. Reyana
  - Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, Coimbatore, Tamil Nadu, India
- Sandeep Kautish
  - Department of Computer Science and Engineering, LBEF Campus, Kathmandu, Nepal, India
- A. S. Vibith
  - Department of Computer Science and Engineering, RMK College of Engineering and Technology, Tiruvallur, Tamil Nadu, India
- Yogita Gupta
  - Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, India
- Sayed F. Abdelwahab
  - Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, PO Box 11099, Taif 21944, Saudi Arabia
- Ali Wagdy Mohamed
  - Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt
  - Department of Mathematics and Actuarial Science, School of Science and Engineering, The American University in Cairo, New Cairo, Egypt
|
11
|
Machine learning approaches for classification of colorectal cancer with and without feature selection method on microarray data. GENE REPORTS 2021. [DOI: 10.1016/j.genrep.2021.101419] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/16/2023]
|
12
|
Rastogi N, Frey HC. Characterizing Fuel Use and Emission Hotspots for a Diesel-Operated Passenger Rail Service. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:10633-10644. [PMID: 34270225 DOI: 10.1021/acs.est.1c00273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Spatially varying diesel locomotive fuel use and emission rates (FUERs) are needed to accurately quantify local emission hotspots and their health impacts. However, existing locomotive FUER data are typically not spatially resolved or representative of real-world locomotive operation. Therefore, existing data are of limited use in quantifying the spatial variability in real-world FUERs. The objectives of this work are to quantify spatial variability in locomotive FUERs and identify factors differentiating hotspots from non-hotspots. FUERs were measured based on real-world measurements conducted for the Piedmont passenger rail service using a portable emission measurement system. FUERs were quantified based on 0.25 mile track segments on the Piedmont route. Hotspots were defined as segments in the top quintile of segment-average FUERs. On average, hotspots contributed 40-50% to trip fuel use and emissions. Hotspots were typically associated with low-to-medium speed, and high acceleration and grade. In contrast, non-hotspots were associated with high speed, and low acceleration and grade. Hotspots were typically located near populated areas and, thus, may exacerbate air pollutant exposure. The method demonstrated here can be applied to other passenger train services to assess key trends in hotspot locations and factors that explain the occurrence of hotspots.
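The hotspot definition in the abstract (segments in the top quintile of segment-average rates) and the share of trip totals that hotspots contribute can be sketched as below. This is an illustrative reading of the definition, not the authors' code; the function names and the ten toy segment values are hypothetical.

```python
def find_hotspots(segment_rates, top_fraction=0.2):
    """Return indices of the segments whose average rate is in the top `top_fraction`."""
    n_hot = max(1, round(len(segment_rates) * top_fraction))
    ranked = sorted(range(len(segment_rates)),
                    key=lambda i: segment_rates[i], reverse=True)
    return sorted(ranked[:n_hot])


def hotspot_share(segment_totals, hotspot_idx):
    """Fraction of the trip total (fuel use or emissions) contributed by hotspot segments."""
    return sum(segment_totals[i] for i in hotspot_idx) / sum(segment_totals)


# Hypothetical per-segment values for a ten-segment route (arbitrary units).
toy_rates = [1, 1, 1, 1, 5, 1, 1, 1, 6, 1]
toy_hotspots = find_hotspots(toy_rates)             # top quintile of 10 segments = 2 segments
toy_share = hotspot_share(toy_rates, toy_hotspots)  # treats rates as totals, assuming equal time per segment
```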
Affiliation(s)
- Nikhil Rastogi
  - Department of Civil, Construction, and Environmental Engineering, North Carolina State University, Campus Box 7908, Raleigh, North Carolina 27695-7908, United States
- H Christopher Frey
  - Department of Civil, Construction, and Environmental Engineering, North Carolina State University, Campus Box 7908, Raleigh, North Carolina 27695-7908, United States
|
13
|
Al-Rajab M, Lu J, Xu Q. A framework model using multifilter feature selection to enhance colon cancer classification. PLoS One 2021; 16:e0249094. [PMID: 33861766 PMCID: PMC8691854 DOI: 10.1371/journal.pone.0249094] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/16/2020] [Accepted: 03/11/2021] [Indexed: 11/18/2022] Open
Abstract
Gene expression profiles can be utilized in the diagnosis of critical diseases such as cancer. The selection of biomarker genes from these profiles is crucial for cancer detection. This paper presents a framework proposing a two-stage multifilter hybrid model of feature selection for colon cancer classification. Colon cancer is extremely common nowadays compared with other types of cancer. There is a need to find a fast and accurate method to detect the tissues and to enhance the diagnostic process and drug discovery. This paper reports on a study whose objective has been to improve the diagnosis of cancer of the colon through a two-stage, multifilter model of feature selection. The model described deals with feature selection using a combination of Information Gain and a Genetic Algorithm. The next stage is to filter and rank the genes identified through this method using the minimum Redundancy Maximum Relevance (mRMR) technique. The final phase is to further analyze the data using correlated machine learning algorithms. This two-stage approach, which involves the selection of genes before classification techniques are used, improves success rates for the identification of cancer cells. It is found that the Decision Tree, K-Nearest Neighbor, and Naïve Bayes classifiers showed promising, accurate results using the developed hybrid framework model. It is concluded that the proposed method achieved higher accuracy than the existing methods reported in the literature. This study can be used as a guide to enhance treatment and drug discovery for colon cancer.
Affiliation(s)
- Murad Al-Rajab
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Joan Lu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
- Qiang Xu
  - School of Computing and Engineering, University of Huddersfield, Huddersfield, United Kingdom
|
14
|
Jaya Ant lion optimization-driven Deep recurrent neural network for cancer classification using gene expression data. Med Biol Eng Comput 2021; 59:1005-1021. [PMID: 33851321 DOI: 10.1007/s11517-021-02350-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/26/2020] [Accepted: 03/17/2021] [Indexed: 10/21/2022]
Abstract
Cancer is one of the deadly diseases prevailing worldwide and the patients with cancer are rescued only when the cancer is detected at the very early stage. Early detection of cancer is essential as, in the final stage, the chance of survival is limited. The symptoms of cancers are rigorous and therefore, all the symptoms should be studied properly before the diagnosis. Thus, an automatic prediction system is necessary for classifying cancer as malignant or benign. Hence, this paper introduces the novel strategy based on the JayaAnt lion optimization-based Deep recurrent neural network (JayaALO-based DeepRNN) for cancer classification. The steps followed in the developed model are data normalization, data transformation, feature dimension detection, and classification. The first step is data normalization. The goal of data normalization is to eliminate data redundancy and to mitigate the storage of objects in a relational database that maintains the same information in several places. After that, the data transformation is carried out based on log transformation that generates the patterns using more interpretable and helps fulfill the supposition, and to reduce skew. Also, the non-negative matrix factorization is employed for reducing the feature dimension. Finally, the proposed JayaALO-based DeepRNN method effectively classifies cancer based on the reduced dimension features to produce a satisfactory result. Thus, the resulted output of the proposed JayaALO-based DeepRNN is employed for cancer classification. The proposed JayaALO-based DeepRNN showed improved results with maximal accuracy of 95.97%, maximal sensitivity of 95.95%, and maximal specificity of 96.96%. The goal of this research is to devise the cancer classification strategy using the proposed JayaALO-based DeepRNN. It is required to detect the cancer at an early stage to prevent the destruction caused to the other organs. 
The developed model performs cancer classification in four phases: data normalization, data transformation, feature dimension detection, and classification. Initially, the input images are gathered and passed to data normalization. The normalized data is fed to data transformation, which is performed using a log transformation. The transformed data is then fed to feature dimension reduction, which is performed using non-negative matrix factorization. The reduced features are used by the DeepRNN for cancer classification. The DeepRNN is trained using the proposed JayaALO, which is designed by combining ALO and the Jaya algorithm. The original article gives a block diagram of the proposed JayaALO-based DeepRNN cancer classification approach.
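The preprocessing phases described above (normalization, log transformation, non-negative matrix factorization) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: min-max scaling stands in for the normalization step, the factorization uses the standard Lee-Seung multiplicative updates, and all function names, the toy data, and the chosen rank are hypothetical. The JayaALO-trained DeepRNN classifier is not reproduced here.

```python
import math
import random

def min_max_normalize(rows):
    """Scale each column to [0, 1] (assumed stand-in for normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def log_transform(rows):
    """Apply log(1 + x) to reduce skew; inputs must be non-negative."""
    return [[math.log1p(v) for v in row] for row in rows]

def transpose(a):
    return [list(r) for r in zip(*a)]

def matmul(a, b):
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (n x m) as W (n x rank) times H (rank x m), all non-negative,
    using Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Toy expression matrix: 4 samples x 3 features (made-up values).
data = [[5.0, 200.0, 1.0], [3.0, 150.0, 4.0], [8.0, 90.0, 2.0], [1.0, 300.0, 7.0]]
x = log_transform(min_max_normalize(data))
w, h = nmf(x, rank=2)  # rows of w are the reduced features per sample
```

In this sketch the rows of `w` would play the role of the reduced-dimension features passed on to a classifier.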
|
15
|
|
16
|
Cai Y, Zhang H, Sun S, Wang X, He Q. Axiomatic fuzzy set theory-based fuzzy oblique decision tree with dynamic mining fuzzy rules. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04649-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
|
17
|
Liao Q, Ding Y, Jiang ZL, Wang X, Zhang C, Zhang Q. Multi-task deep convolutional neural network for cancer diagnosis. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.06.084] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 11/26/2022]
|
18
|
Multiclass Benchmarking Framework for Automated Acute Leukaemia Detection and Classification Based on BWM and Group-VIKOR. J Med Syst 2019; 43:212. [PMID: 31154550 DOI: 10.1007/s10916-019-1338-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 03/04/2019] [Accepted: 05/13/2019] [Indexed: 10/26/2022]
Abstract
This paper aims to assist the administration departments of medical organisations in selecting a suitable multiclass classification model for acute leukaemia. We propose a framework that helps these departments evaluate, benchmark and rank the available multiclass classification models in order to select the best one. Medical organisations have continuously faced evaluation and benchmarking challenges in such endeavours, especially when no single model is superior. Moreover, the improper selection of a multiclass classification model for acute leukaemia may be costly for medical organisations. For example, when a patient dies, such an organisation may face legal or financial action over incidents in which the model fails to fulfil its desired outcome. Evaluating and benchmarking multiclass classification models is challenging because of multiple and conflicting evaluation criteria. This study structured a decision matrix (DM) based on the crossover of 2 groups of multi-evaluation criteria and 22 multiclass classification models. The matrix was then evaluated with datasets comprising 72 samples of acute leukaemia, which include 5327 genes. Subsequently, multi-criteria decision-making (MCDM) techniques were used to benchmark and rank the multiclass classification models. The MCDM techniques used were the integrated BWM and VIKOR: BWM was applied to calculate the weights of the evaluation criteria, whereas VIKOR was used to benchmark and rank the classification models. VIKOR was also employed in two decision-making contexts: individual and group decision making, and internal and external group aggregation. Results showed the following: (1) The integration of BWM and VIKOR is effective at solving the benchmarking/selection problems of multiclass classification models. (2) The ranks of classification models obtained from internal and external VIKOR group decision making were almost the same; the best multiclass classification model based on the two was 'Bayes.NaiveBayesUpdateable' and the worst was 'Trees.LMT'. (3) Among the scores of groups in the objective validation, significant differences were identified, which indicated that the ranking results of internal and external VIKOR group decision making were valid.
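The VIKOR ranking step mentioned in this abstract can be illustrated with a small sketch. It is a generic textbook-style VIKOR computation, not the paper's decision matrix or code: the three candidate models, the two criteria, and the weights are hypothetical (in the paper the weights come from BWM), and `v = 0.5` is an assumed compromise parameter.

```python
def vikor(matrix, weights, benefit, v=0.5):
    """Return (S, R, Q) for alternatives (rows) scored on criteria (columns).

    benefit[j] is True when larger values of criterion j are better.
    Lower Q means a better compromise ranking.
    """
    cols = list(zip(*matrix))
    best = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    s, r = [], []
    for row in matrix:
        # Weighted, normalized distance of each criterion from its best value.
        terms = [w * (fb - f) / (fb - fw) if fb != fw else 0.0
                 for f, w, fb, fw in zip(row, weights, best, worst)]
        s.append(sum(terms))   # group utility
        r.append(max(terms))   # individual regret

    def scale(x, lo, hi):
        return (x - lo) / (hi - lo) if hi > lo else 0.0

    q = [v * scale(si, min(s), max(s)) + (1 - v) * scale(ri, min(r), max(r))
         for si, ri in zip(s, r)]
    return s, r, q

# Hypothetical models scored on accuracy (benefit) and training time (cost).
matrix = [[0.90, 12.0], [0.85, 5.0], [0.80, 20.0]]
weights = [0.7, 0.3]            # assumed; BWM would supply these in the paper
s, r, q = vikor(matrix, weights, benefit=[True, False])
ranking = sorted(range(len(q)), key=lambda i: q[i])  # best alternative first
```

With these made-up numbers the first model has the lowest Q and ranks first, and the third ranks last.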
|
19
|
Alsalem MA, Zaidan AA, Zaidan BB, Hashim M, Albahri OS, Albahri AS, Hadi A, Mohammed KI. Systematic Review of an Automated Multiclass Detection and Classification System for Acute Leukaemia in Terms of Evaluation and Benchmarking, Open Challenges, Issues and Methodological Aspects. J Med Syst 2018; 42:204. [PMID: 30232632 DOI: 10.1007/s10916-018-1064-9] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 07/18/2018] [Accepted: 09/06/2018] [Indexed: 10/28/2022]
Abstract
This study aims to systematically review prior research on the evaluation and benchmarking of automated acute leukaemia classification tasks. The review draws on three reliable search engines: ScienceDirect, Web of Science and IEEE Xplore. A research taxonomy developed for the review takes a wide perspective on automated detection and classification of acute leukaemia and reflects the usage trends of evaluation criteria in this field. The taxonomy consists of three main research directions and involves two phases. The first phase includes all three research directions. The second demonstrates all the criteria used for evaluating acute leukaemia classification. The final set of studies includes 83 investigations, most of which focused on enhancing the accuracy and performance of detection and classification through proposed methods or systems. Few efforts were made to address evaluation issues. In the final set of articles, three groups represented the main research directions in this domain: 56 articles highlighted proposed methods, 22 articles involved proposals for system development and 5 papers centred on evaluation and comparison. The other side of the taxonomy included 16 main and sub-criteria for evaluation and benchmarking. This review highlights three serious issues in the evaluation and benchmarking of multiclass classification of acute leukaemia, namely, conflicting criteria, evaluation criteria and criteria importance. It also identifies the weaknesses of benchmarking tools. To solve these issues, multicriteria decision-making (MCDM) analysis techniques were proposed as effective solutions in the methodological aspect. This methodological aspect involves a proposed decision support system based on MCDM for evaluation and benchmarking to select suitable multiclass classification models for acute leukaemia. The support system is examined and has three sequential phases. Phase One presents the identification procedure and the process for establishing a decision matrix based on a crossover of evaluation criteria and acute leukaemia multiclass classification models. Phase Two describes the development of the decision matrix for selecting acute leukaemia classification models based on the integrated best-worst method (BWM) and VIKOR. Phase Three entails the validation of the proposed system.
Affiliation(s)
- M A Alsalem
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- A A Zaidan
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- B B Zaidan
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- M Hashim
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- O S Albahri
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- A S Albahri
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- Ali Hadi
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- K I Mohammed
- Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
|
20
|
Alsalem MA, Zaidan AA, Zaidan BB, Hashim M, Madhloom HT, Azeez ND, Alsyisuf S. A review of the automated detection and classification of acute leukaemia: Coherent taxonomy, datasets, validation and performance measurements, motivation, open challenges and recommendations. Computer Methods and Programs in Biomedicine 2018; 158:93-112. [PMID: 29544792 DOI: 10.1016/j.cmpb.2018.02.005] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 12/04/2017] [Revised: 01/19/2018] [Accepted: 02/02/2018] [Indexed: 06/08/2023]
Abstract
CONTEXT: Acute leukaemia diagnosis is a field requiring automated solutions, tools and methods and the ability to facilitate early detection and even prediction. Many studies have focused on the automatic detection and classification of acute leukaemia and its subtypes to enable highly accurate diagnosis. OBJECTIVE: This study aimed to review and analyse literature related to the detection and classification of acute leukaemia. To improve understanding of the field's various contextual aspects and characteristics in published studies, the factors considered were motivation, open challenges that confronted researchers and recommendations presented to researchers to enhance this vital research area. METHODS: We systematically searched all articles about the classification and detection of acute leukaemia, as well as their evaluation and benchmarking, in three main databases: ScienceDirect, Web of Science and IEEE Xplore, from 2007 to 2017. These indices were considered sufficiently extensive to encompass our field of literature. RESULTS: Based on our inclusion and exclusion criteria, 89 articles were selected. Most studies (58/89) focused on the methods or algorithms of acute leukaemia classification, a number of papers (22/89) covered the developed systems for the detection or diagnosis of acute leukaemia and few papers (5/89) presented evaluation and comparative studies. The smallest portion (4/89) of articles comprised reviews and surveys. DISCUSSION: Acute leukaemia diagnosis, which is a field requiring automated solutions, tools and methods, entails the ability to facilitate early detection or even prediction. Many studies have been performed on the automatic detection and classification of acute leukaemia and its subtypes to promote accurate diagnosis. CONCLUSIONS: Research areas on medical-image classification vary, but they are all equally vital. We expect this systematic review to help emphasise current research opportunities and thus extend and create additional research fields.
Affiliation(s)
- M A Alsalem
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- A A Zaidan
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- B B Zaidan
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- M Hashim
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- H T Madhloom
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- N D Azeez
- Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- S Alsyisuf
- Faculty of Information Science and Engineering, Management and Science University, Shah Alam, Malaysia
|
21
|
Pai PF, ChangLiao LH, Lin KP. Analyzing basketball games by a support vector machines with decision tree model. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2321-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/28/2022]
|
22
|
Hijazi H, Chan C. A classification framework applied to cancer gene expression profiles. Journal of Healthcare Engineering 2013; 4:255-83. [PMID: 23778014 DOI: 10.1260/2040-2295.4.2.255] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/03/2022]
Abstract
Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or normal versus cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or normal versus cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) to 5 cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally improve classification performance over other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy compared with using gene expression alone.
Affiliation(s)
- Hussein Hijazi
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.
|