1
Halder RK, Uddin MN, Uddin MA, Aryal S, Saha S, Hossen R, Ahmed S, Rony MAT, Akter MF. ML-CKDP: Machine learning-based chronic kidney disease prediction with smart web application. J Pathol Inform 2024; 15:100371. [PMID: 38510072 PMCID: PMC10950726 DOI: 10.1016/j.jpi.2024.100371] [Received: 12/23/2023] [Revised: 02/07/2024] [Accepted: 02/17/2024] [Indexed: 03/22/2024] [Open access]
Abstract
Chronic kidney diseases (CKDs) are a significant public health issue with potential for severe complications such as hypertension, anemia, and renal failure. Timely diagnosis is crucial for effective management, and machine learning offers promising advances in predictive diagnostics. In this paper, we developed a machine learning-based chronic kidney disease prediction (ML-CKDP) model with dual objectives: to enhance dataset preprocessing for CKD classification and to develop a web-based application for CKD prediction. The proposed model uses a comprehensive data preprocessing protocol: converting categorical variables to numerical values, imputing missing data, and normalizing via Min-Max scaling. Feature selection is performed with a variety of techniques, including Correlation, Chi-Square, Variance Threshold, Recursive Feature Elimination, Sequential Forward Selection, Lasso Regression, and Ridge Regression, to refine the datasets. The model employs seven classifiers to predict CKD: Random Forest (RF), AdaBoost (AdaB), Gradient Boosting (GB), XGBoost (XgB), Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). Model effectiveness is assessed using accuracy, confusion-matrix statistics, and the Area Under the Curve (AUC) for the classification of positive cases. RF and AdaB achieve 100% accuracy across validation schemes including 70:30 and 80:20 train-test splits and K-fold cross-validation with K = 10 and K = 15. RF and AdaB also consistently reach a perfect AUC of 100% across multiple datasets and splitting ratios. Naive Bayes (NB) stands out for its efficiency, recording the lowest training and testing times across all datasets and split ratios. Additionally, we present a real-time web-based application that operationalizes the model, making it accessible to healthcare practitioners and stakeholders.
Web app link: https://rajib-research-kedney-diseases-prediction.onrender.com/.
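A minimal sketch of the three preprocessing steps named in the abstract (categorical-to-numeric encoding, missing-value imputation, Min-Max scaling), in plain Python. The abstract does not say which imputation strategy was used; mean imputation for numeric columns and most-frequent-value imputation for categorical columns are assumptions here, and the column names `age` and `appet` and their values are hypothetical toy data.

```python
from statistics import mean, mode


def impute_mean(column):
    """Replace missing numeric values (None) with the mean of the observed values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def impute_mode(column):
    """Replace missing categorical values (None) with the most frequent observed value."""
    fill = mode(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def encode_categorical(column):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in column]


def min_max_scale(column):
    """Rescale values to [0, 1]; a constant column is mapped to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical toy columns; None marks a missing value.
age = [48, None, 62, 50]
appet = ["good", "poor", None, "good"]

age_scaled = min_max_scale(impute_mean(age))
appet_codes = encode_categorical(impute_mode(appet))
print(age_scaled)
print(appet_codes)
```

Imputation runs before scaling so that the filled values are scaled together with the observed ones; the order of the two steps is a choice made in this sketch.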
Affiliation(s)
- Rajib Kumar Halder
- Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Mohammed Nasir Uddin
- Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Ashraf Uddin
- School of Information Technology, Deakin University, Geelong 3125, Australia
- Sunil Aryal
- School of Information Technology, Deakin University, Geelong 3125, Australia
- Sajeeb Saha
- Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Rakib Hossen
- Dept. of Cyber Security, Bangabandhu Sheikh Mujibur Rahman Digital University, Kaliakoir, Gazipur 1750, Bangladesh
- Sabbir Ahmed
- Dept. of Educational Technology, Bangabandhu Sheikh Mujibur Rahman Digital University, Kaliakoir, Gazipur 1750, Bangladesh
- Mosammat Farida Akter
- Dept. of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
2
Abbasi Holasou H, Panahi B, Shahi A, Nami Y. Integration of machine learning models with microsatellite markers: New avenue in world grapevine germplasm characterization. Biochem Biophys Rep 2024; 38:101678. [PMID: 38495412 PMCID: PMC10940787 DOI: 10.1016/j.bbrep.2024.101678] [Received: 12/23/2023] [Revised: 02/09/2024] [Accepted: 02/27/2024] [Indexed: 03/19/2024] [Open access]
Abstract
Efficient analytical techniques are needed to interpret biological data, generate novel hypotheses, and find critical predictive patterns. Machine learning algorithms offer an opportunity to develop low-cost, practical solutions in biology. In this study, we proposed a new integrated analytical approach that combines supervised machine learning algorithms with microsatellite data from Vitis populations worldwide. A total of 1378 wild (V. vinifera subsp. sylvestris) and cultivated (V. vinifera subsp. sativa) grapevine accessions were analyzed using 20 microsatellite markers. Data cleaning, feature selection, and supervised machine learning classification models, viz. Naive Bayes, Support Vector Machine (SVM), and tree induction methods, were applied to identify the most indicative and diagnostic alleles for distinguishing wild from cultivated accessions and for determining the geographic origin of each population. Our combined approach identified the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our pipeline for classifying and predicting Vitis accessions. Moreover, our study proposed the best combination of markers for distinguishing populations, which can be exploited in future germplasm conservation and breeding programs.
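To illustrate how a Naive Bayes classifier can assign an accession to a group from its marker alleles, here is a minimal categorical Naive Bayes with Laplace smoothing. It is a sketch under simplifying assumptions, not the authors' pipeline: each marker is treated as a single categorical value (real microsatellite genotypes are diploid, with two alleles per locus), and the allele sizes and labels below are invented toy data.

```python
import math
from collections import Counter, defaultdict


def train_categorical_nb(samples, labels):
    """Count class frequencies and, per class, how often each allele occurs at each marker."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # class -> marker -> allele -> count
    observed_values = [set() for _ in range(len(samples[0]))]
    for x, y in zip(samples, labels):
        for j, allele in enumerate(x):
            value_counts[y][j][allele] += 1
            observed_values[j].add(allele)
    return class_counts, value_counts, observed_values


def predict_categorical_nb(model, x, alpha=1.0):
    """Return the class with the highest log posterior, using add-alpha (Laplace) smoothing."""
    class_counts, value_counts, observed_values = model
    total = sum(class_counts.values())
    best_class, best_log_post = None, -math.inf
    for cls, n_cls in class_counts.items():
        log_post = math.log(n_cls / total)  # log prior
        for j, allele in enumerate(x):
            n_values = len(observed_values[j])
            count = value_counts[cls][j][allele]  # 0 for an allele never seen in this class
            log_post += math.log((count + alpha) / (n_cls + alpha * n_values))
        if log_post > best_log_post:
            best_class, best_log_post = cls, log_post
    return best_class


# Invented toy data: allele sizes (bp) at two hypothetical loci.
samples = [("236", "250"), ("236", "250"), ("236", "254"),
           ("240", "254"), ("240", "254"), ("240", "250")]
labels = ["wild", "wild", "wild", "cultivated", "cultivated", "cultivated"]
model = train_categorical_nb(samples, labels)
print(predict_categorical_nb(model, ("236", "250")))
```

Working in log space avoids numerical underflow when many markers are multiplied together, and smoothing keeps an allele unseen in one class from forcing that class's probability to zero.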
Affiliation(s)
- Hossein Abbasi Holasou
- Department of Plant Breeding and Biotechnology, Faculty of Agriculture, University of Tabriz, Tabriz, Iran
- Bahman Panahi
- Department of Genomics, Branch for Northwest and West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Ali Shahi
- Faculty of Agriculture (Meshgin Shahr Campus), Mohaghegh Ardabili University, Ardabil, Iran
- Yousef Nami
- Department of Food Biotechnology, Branch for Northwest and West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
3
Zhang VY, O’Connor SL, Welsh WJ, James MH. Machine learning models to predict ligand binding affinity for the orexin 1 receptor. Artif Intell Chem 2024; 2:100040. [PMID: 38476266 PMCID: PMC10927255 DOI: 10.1016/j.aichem.2023.100040] [Indexed: 03/14/2024]
Abstract
The orexin 1 receptor (OX1R) is a G-protein coupled receptor that regulates a variety of physiological processes through interactions with the neuropeptides orexin A and B. Selective OX1R antagonists exhibit therapeutic effects in preclinical models of several behavioral disorders, including drug seeking and overeating. However, no selective OX1R antagonists are currently approved for clinical use, fueling demand for novel compounds that act at this target. In this study, we curated a dataset of over 1300 OX1R ligands using a stringent cascade of filters and criteria. We then developed highly predictive quantitative structure-activity relationship (QSAR) models using the random forest machine learning algorithm with optimized hyperparameters and twelve 2D molecular descriptors selected by recursive feature elimination with 5-fold cross-validation. The predictive capacity of the QSAR model was further assessed with an external test set and an enrichment study, confirming its high predictivity. The practical applicability of our final QSAR model was demonstrated through virtual screening of the DrugBank database, which revealed two FDA-approved drugs (isavuconazole and cabozantinib) as potential OX1R ligands; this was confirmed by radiolabeled OX1R binding assays. To the best of our knowledge, this study is the first report of highly predictive QSAR models built on a large, comprehensive dataset of diverse OX1R ligands, which should prove useful for the discovery and design of new compounds targeting this receptor.
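The control flow of recursive feature elimination with subset scoring can be sketched as below. This is a hedged illustration, not the authors' code: `importance_fn` stands in for refitting a model (for example a random forest) and reading its feature importances, and `score_fn` stands in for the mean 5-fold cross-validation score; both, and the descriptor names `d1` to `d5` with their "relevance" values, are hypothetical.

```python
def recursive_feature_elimination(features, importance_fn, score_fn, min_features=1):
    """Greedy RFE: repeatedly drop the least important feature, remember the best subset.

    importance_fn(subset) -> {feature: importance} for a model refit on `subset`.
    score_fn(subset) -> float, higher is better (e.g. mean k-fold CV score).
    Returns (best_subset, best_score).
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > min_features:
        importances = importance_fn(current)
        weakest = min(current, key=lambda f: importances[f])  # least important feature
        current = [f for f in current if f != weakest]
        score = score_fn(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins with invented relevance values for five hypothetical descriptors.
relevance = {"d1": 3.0, "d2": 2.0, "d3": 1.0, "d4": 0.1, "d5": 0.05}


def toy_importance(subset):
    return {f: relevance[f] for f in subset}


def toy_cv_score(subset):
    # Total relevance minus a per-feature penalty, mimicking the cost of extra features.
    return sum(relevance[f] for f in subset) - 0.5 * len(subset)


subset, score = recursive_feature_elimination(list(relevance), toy_importance, toy_cv_score)
print(subset, score)
```

In practice, scikit-learn's `RFECV` implements this pattern with cross-validated scoring of each subset size; the sketch only shows the loop.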
Affiliation(s)
- Vanessa Y. Zhang
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- West Windsor-Plainsboro High School South, West Windsor, NJ, USA
- Shayna L. O’Connor
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- William J. Welsh
- Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- Morgan H. James
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
- Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, USA
4
Fengou LC, Lytou AE, Tsekos G, Tsakanikas P, Nychas GJE. Features in visible and Fourier transform infrared spectra confronting aspects of meat quality and fraud. Food Chem 2024; 440:138184. [PMID: 38100963 DOI: 10.1016/j.foodchem.2023.138184] [Received: 05/12/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Rapid assessment of the microbiological quality (i.e., Total Aerobic Counts, TAC) and authentication (i.e., fresh vs frozen/thawed) of meat was investigated using spectroscopy-based methods. Data were collected during storage experiments under different conditions. In total, 526 Fourier transform infrared (FTIR) spectra and 534 multispectral images (MSI) were acquired. Partial Least Squares (PLS) was applied to select/transform the variables: for FTIR data, 30 % of the initial features were used, while MSI-based models employed all features. Support Vector Machine (SVM) regression/classification models were then developed, and their performance was evaluated on an external validation set. In both cases, MSI-based models (Root Mean Square Error, RMSE: 0.48–1.08; Accuracy: 91–97 %) performed slightly better than FTIR-based models (RMSE: 0.83–1.31; Accuracy: 88–94 %). For quality assessment, the most informative FTIR features lay mainly in the 900–1700 cm⁻¹ range, while for fraud detection the informative features were more dispersed.
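The two figures of merit reported above, RMSE for the TAC regression task and accuracy for the fresh vs frozen/thawed classification task, can be computed as in this short sketch. The values are invented toy data, and TAC is assumed to be expressed in log CFU/g, a common convention that the abstract does not state.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (TAC, assumed log CFU/g)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented external-validation examples.
tac_true = [4.0, 5.5, 7.0, 8.0]
tac_pred = [4.5, 5.0, 7.5, 8.5]
state_true = ["fresh", "fresh", "thawed", "thawed"]
state_pred = ["fresh", "thawed", "thawed", "thawed"]
print(rmse(tac_true, tac_pred), accuracy(state_true, state_pred))
```

Because RMSE is in the units of the target, an RMSE of 0.48 to 1.08 is only interpretable alongside the TAC scale used in the study.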
Affiliation(s)
- Lemonia-Christina Fengou
- Laboratory of Microbiology and Biotechnology of Foods, Department of Food Science and Human Nutrition, School of Food and Nutritional Sciences, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece.
- Anastasia E Lytou
- Laboratory of Microbiology and Biotechnology of Foods, Department of Food Science and Human Nutrition, School of Food and Nutritional Sciences, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece.
- George Tsekos
- Laboratory of Microbiology and Biotechnology of Foods, Department of Food Science and Human Nutrition, School of Food and Nutritional Sciences, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece.
- Panagiotis Tsakanikas
- Laboratory of Microbiology and Biotechnology of Foods, Department of Food Science and Human Nutrition, School of Food and Nutritional Sciences, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece.
- George-John E Nychas
- Laboratory of Microbiology and Biotechnology of Foods, Department of Food Science and Human Nutrition, School of Food and Nutritional Sciences, Agricultural University of Athens, Iera Odos 75, 11855 Athens, Greece.
5
Ambre D, Sheyyab M, Lynch P, Mayhew EK, Brezinsky K. A Raman spectroscopy based chemometric approach to predict the derived cetane number of hydrocarbon jet fuels and their mixtures. Talanta 2024; 271:125635. [PMID: 38219321 DOI: 10.1016/j.talanta.2024.125635] [Received: 07/28/2023] [Revised: 12/29/2023] [Accepted: 01/04/2024] [Indexed: 01/16/2024]
Abstract
Fuel ignition quality, measured as the Derived Cetane Number (DCN), is an important consideration when integrating fuels, including sustainable aviation fuels, into compression ignition engines. DCN has been correlated with simulated and/or real spectroscopic measurements as well as with other physical and chemical properties, but these correlations have rarely been developed into a pathway to application. One such application is miniaturized onboard fuel sensors that could use predicted DCN to support real-time feedforward engine control. To aid the development of such DCN sensors, Raman spectra coupled with chemometrics and a selection of influential spectral features were investigated. In this study, Raman spectra were obtained for a database that included jet fuels, jet fuel mixtures, pure hydrocarbon components, and their weighted mixtures. The resulting Raman spectral database from the experimental measurements included spectra of components spanning a wide range of DCNs and covering all the chemical functional groups expected in a standard jet fuel. Chemometric models were developed to associate Raman spectra with DCN in subsets of the spectral range to aid sensor miniaturization. The models were tested on jet fuels such as the National Jet Fuel Combustion Program fuels designated A-1, A-2, and A-3, along with mixtures of jet fuels spanning a wide range of DCN, simulating fuels that could represent real-world scenarios. An Artificial Neural Network (ANN) model trained on the fingerprint region (500–1800 cm⁻¹) of the Raman spectra captured the non-linear association between the Raman spectra and DCN, with a test R² score of 0.926, a test MSE of 3.61, and a test MPE of 3.41. Around 97 % of the unseen test samples were predicted within 10 % of the DCN measured with an Ignition Quality Tester.
One hundred features of the fingerprint region that influence DCN predictions in the optimal ANN model were extracted using a Global Surrogate (GS) model. A reduced ANN model trained on only these one hundred features performed slightly better, with a test R² score of 0.935, a test MSE of 3.19, and a test MPE of 3.20, and it predicted the entire set of unseen test samples within 10 % of the measured DCN. To assess the applicability of real-time, online DCN sensing, the Raman spectrometer was integrated with a flow cell that allows DCN measurements in flowing fuel samples, and both the optimal fingerprint-region ANN model and the 100-feature GS-ANN model were run on a Raspberry Pi computer. Several unseen F-24/alcohol-to-jet fuel mixtures of unknown volumetric composition were tested for DCN using the flow cell, and all of these samples were predicted within 10 % of the measured DCN.
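Two of the reported test metrics, the R² score and the share of samples predicted within 10 % of the measured DCN, are simple to compute, as in this sketch with invented DCN values. The abstract does not define MPE precisely (signed vs absolute percentage error), so it is deliberately left out here rather than guessed.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fraction_within(y_true, y_pred, tolerance=0.10):
    """Fraction of predictions whose relative error is at most `tolerance`."""
    hits = sum(abs(p - t) <= tolerance * abs(t) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Invented DCN values (measured vs predicted).
dcn_measured = [40.0, 45.0, 50.0, 55.0]
dcn_predicted = [42.0, 44.0, 56.0, 54.0]
print(r2_score(dcn_measured, dcn_predicted), fraction_within(dcn_measured, dcn_predicted))
```

Here three of the four predictions fall within 10 %; the third (56 vs 50) misses by 12 %.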
Affiliation(s)
- Dhananjay Ambre
- Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Manaf Sheyyab
- Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Patrick Lynch
- Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, Chicago, IL, 60607, USA
- Eric K Mayhew
- US Army Combat Capabilities Development Command, Army Research Laboratory, Aberdeen Proving Ground, MD 21005, USA
- Kenneth Brezinsky
- Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, Chicago, IL, 60607, USA.
6
Gholizadeh M, Saeedi R, Bagheri A, Paeezi M. Machine learning-based prediction of effluent total suspended solids in a wastewater treatment plant using different feature selection approaches: A comparative study. Environ Res 2024; 246:118146. [PMID: 38215928 DOI: 10.1016/j.envres.2024.118146] [Received: 09/19/2023] [Revised: 12/31/2023] [Accepted: 01/05/2024] [Indexed: 01/14/2024]
Abstract
Accurately predicting the characteristics of effluent discharged from wastewater treatment plants (WWTPs) is crucial for reducing sampling requirements, labor, costs, and environmental pollution. Machine learning (ML) techniques can be effective in achieving this goal, and various feature selection (FS) methods are employed to optimize ML-based models. This study investigates the impact of six FS methods (categorized as Wrapper, Filter, and Embedded methods) on the accuracy of three supervised ML algorithms in predicting total suspended solids (TSS) concentration in the effluent of a municipal wastewater treatment plant. Based on the features proposed by each FS method, five distinct scenarios were defined. Within each scenario, three ML algorithms were applied: artificial neural network-multilayer perceptron (ANN-MLP), K-nearest neighbors (KNN), and adaptive boosting (AdaBoost). The features used for predicting effluent TSS concentration included BOD5, COD, TSS, TN, and NH3 in the influent, and BOD5, COD, residual Cl2, NO3, TN, and NH4 in the effluent. To construct the models, the dataset was randomly divided into training and testing subsets, and K-fold cross-validation was employed to control overfitting and underfitting. The evaluation metrics were root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The most efficient scenario was Scenario IV, based on the Sequential Backward Selection FS method, which selected CODe, BOD5e, BOD5i, and TNi (subscripts e and i denote effluent and influent, respectively). Furthermore, the ANN-MLP algorithm performed best, achieving the highest R² value, with acceptable performance on both the training and testing subsets (R² = 0.78 and R² = 0.8, respectively).
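Sequential Backward Selection, the wrapper method behind the best scenario above, can be sketched as follows: starting from all candidate inputs, each step tries removing every remaining feature, retrains and scores the model on each reduced subset, and keeps the subset that scores best. In this hedged sketch `score_fn` is a hypothetical stand-in for something like a cross-validated R² of a model trained on the subset, and the feature names `f1` to `f6` and their "usefulness" values are invented.

```python
def sequential_backward_selection(features, score_fn, n_keep):
    """Wrapper-style SBS: repeatedly drop the feature whose removal leaves the best-scoring subset.

    score_fn(subset) -> float, higher is better (e.g. cross-validated R² of a model
    trained on `subset`).
    """
    current = list(features)
    while len(current) > n_keep:
        # One candidate subset per feature that could be removed next.
        candidates = [[g for g in current if g != f] for f in current]
        current = max(candidates, key=score_fn)
    return current


# Invented per-feature usefulness for six hypothetical candidate inputs.
usefulness = {"f1": 0.40, "f2": 0.01, "f3": 0.30, "f4": 0.02, "f5": 0.20, "f6": 0.15}


def toy_score(subset):
    return sum(usefulness[f] for f in subset)


print(sequential_backward_selection(list(usefulness), toy_score, n_keep=4))
```

Each step calls `score_fn` once per remaining feature, so with a real model the cost grows roughly quadratically with the number of candidate inputs; that is manageable for the eleven influent/effluent variables listed above.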
Affiliation(s)
- Mahdi Gholizadeh
- Environmental and Occupational Hazards Control Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Saeedi
- Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Workplace Health Promotion Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Bagheri
- Environmental and Occupational Hazards Control Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
- Mohammad Paeezi
- Department of Health, Safety and Environment, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Workplace Health Promotion Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
7
Tang B, Liu S, Feng X, Li C, Huo H, Wang A, Deng X, Yang C. Intelligent assessment of atrial fibrillation gradation based on sinus rhythm electrocardiogram and baseline information. Comput Methods Programs Biomed 2024; 247:108093. [PMID: 38401509 DOI: 10.1016/j.cmpb.2024.108093] [Received: 09/20/2023] [Revised: 02/16/2024] [Accepted: 02/17/2024] [Indexed: 02/26/2024]
Abstract
BACKGROUND Atrial fibrillation (AF) is a progressive arrhythmia that significantly affects patients' quality of life. The 4S-AF scheme is clinically recommended for AF management; however, its evaluation process is complex and time-consuming, which makes it difficult to adopt in primary medical institutions. This retrospective study aimed to simplify the evaluation process and present an objective assessment model for AF gradation. METHODS A total of 189 12-lead electrocardiogram (ECG) recordings from 64 patients were included in this study. The data were annotated into two groups (mild and severe) according to the 4S-AF scheme. A vectorcardiogram (VCG) was synthesized from the preprocessed ECG recorded during sinus rhythm (SR). Various features were then calculated from both signals, and age, sex, and medical history were included as baseline characteristics. Several machine learning models, including support vector machines, random forests (RF), and logistic regression, were tested in combination with feature selection techniques. RESULTS The proposed method performed well in classifying AF gradation. With an optimized set of VCG and baseline features, the RF model achieved an accuracy, sensitivity, and specificity of 83.02 %, 80.56 %, and 88.24 %, respectively, under the inter-patient paradigm. CONCLUSION Our results demonstrate the value of physiological signals in AF gradation evaluation, and VCG signals were effective in distinguishing mild from severe AF. Given its low computational complexity and high assessment performance, the proposed model is expected to serve as a useful prognostic tool for clinical AF management.
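The three reported figures (accuracy, sensitivity, specificity) follow from the confusion-matrix counts, and an inter-patient evaluation keeps all recordings of a given patient on the same side of the train/test split. The sketch below shows both ideas; treating "severe" as the positive class is an assumption (the abstract does not say which class is positive), and the helper `split_by_patient`, the patient IDs, and the labels are hypothetical toy data.

```python
def binary_metrics(y_true, y_pred, positive="severe"):
    """Accuracy, sensitivity (recall of `positive`) and specificity from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def split_by_patient(patient_ids, test_patients):
    """Inter-patient split: every recording of a test patient goes to the test side."""
    train = [i for i, pid in enumerate(patient_ids) if pid not in test_patients]
    test = [i for i, pid in enumerate(patient_ids) if pid in test_patients]
    return train, test


# Invented toy labels for five recordings.
y_true = ["severe", "severe", "severe", "mild", "mild"]
y_pred = ["severe", "severe", "mild", "mild", "severe"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
print(split_by_patient(["p1", "p1", "p2", "p3", "p3"], {"p3"}))
```

Splitting by patient matters here because the 189 recordings come from only 64 patients; a random per-recording split could place recordings of one patient in both sets and inflate the scores.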
Affiliation(s)
- Biqi Tang
- Department of Biomedical Engineering, School of Information Science and Technology, Fudan University, Shanghai, 200433, PR China
- Sen Liu
- Department of Biomedical Engineering, School of Information Science and Technology, Fudan University, Shanghai, 200433, PR China
- Xujian Feng
- Department of Biomedical Engineering, School of Information Science and Technology, Fudan University, Shanghai, 200433, PR China
- Chunpu Li
- Department of Cardiology, Xinghua City People's Hospital, Jiangsu, 225700, PR China
- Hongye Huo
- Department of Cardiology, Xinghua City People's Hospital, Jiangsu, 225700, PR China
- Aiguo Wang
- Department of Cardiology, Xinghua City People's Hospital, Jiangsu, 225700, PR China
- Xintao Deng
- Department of Cardiology, Xinghua City People's Hospital, Jiangsu, 225700, PR China.
- Cuiwei Yang
- Department of Biomedical Engineering, School of Information Science and Technology, Fudan University, Shanghai, 200433, PR China; Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention of Shanghai, 200093, PR China.
8
Nishan A, M. Taslim Uddin Raju S, Hossain MI, Dipto SA, M. Tanvir Uddin S, Sijan A, Chowdhury MAS, Ahmad A, Mahamudul Hasan Khan M. A continuous cuffless blood pressure measurement from optimal PPG characteristic features using machine learning algorithms. Heliyon 2024; 10:e27779. [PMID: 38533045 PMCID: PMC10963242 DOI: 10.1016/j.heliyon.2024.e27779] [Received: 01/30/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024] [Open access]
Abstract
Background and objective Hypertension is a potentially dangerous health condition that can be detected by measuring blood pressure (BP). BP monitoring and measurement are essential for preventing and treating cardiovascular diseases. However, cuff-based devices are uncomfortable and do not allow continuous BP measurement. Methods In this study, a new non-invasive, cuff-less method for estimating Systolic Blood Pressure (SBP), Mean Arterial Pressure (MAP), and Diastolic Blood Pressure (DBP) is proposed, using characteristic features of photoplethysmogram (PPG) signals and nonlinear regression algorithms. PPG signals were collected from 219 participants and then subjected to preprocessing and feature extraction. From the PPG and its derivative signals, a total of 46 time-, frequency-, and time-frequency-domain features were extracted, and the age and gender of each subject were also included as features. Correlation-based feature selection (CFS) and ReliefF feature selection techniques were then used to select relevant features and reduce the risk of over-fitting. Finally, support vector regression (SVR), K-nearest neighbour regression (KNR), decision tree regression (DTR), and random forest regression (RFR) were used to develop the BP estimation model. Regression models were trained and evaluated on all features as well as on the selected features, and the best regression models for SBP, MAP, and DBP estimation were selected separately. Results The SVR model combined with the ReliefF-based feature selection algorithm outperforms the other algorithms in estimating SBP, MAP, and DBP, with mean absolute errors of 2.49, 1.62, and 1.43 mmHg, respectively. The proposed method meets the Association for the Advancement of Medical Instrumentation (AAMI) standard for BP estimation. Based on the British Hypertension Society (BHS) standard, the results also fall within Grade A for SBP, MAP, and DBP.
Conclusion The findings show that the method can estimate blood pressure non-invasively, without a cuff or calibration, using only characteristic features of the PPG signal.
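The two validation standards cited in the abstract can be checked from a list of estimation errors. This sketch uses the commonly cited criteria (BHS grades from the cumulative percentage of absolute errors within 5, 10 and 15 mmHg; AAMI requiring a mean error of at most 5 mmHg in magnitude and an error SD of at most 8 mmHg); confirm these thresholds against the protocols themselves before relying on them. The error values are invented toy data, and the AAMI minimum-subject requirement is not checked.

```python
from statistics import mean, stdev

# Minimum cumulative % of readings with absolute error within 5, 10 and 15 mmHg,
# checked from the best grade down; anything below grade C is grade D.
BHS_CRITERIA = [
    ("A", (60, 85, 95)),
    ("B", (50, 75, 90)),
    ("C", (40, 65, 85)),
]


def bhs_grade(errors_mmhg):
    """Return the BHS grade ("A" to "D") for a list of estimation errors in mmHg."""
    n = len(errors_mmhg)
    cumulative = [100 * sum(abs(e) <= limit for e in errors_mmhg) / n for limit in (5, 10, 15)]
    for grade, minimums in BHS_CRITERIA:
        if all(pct >= m for pct, m in zip(cumulative, minimums)):
            return grade
    return "D"


def meets_aami(errors_mmhg):
    """AAMI accuracy criterion: |mean error| <= 5 mmHg and SD of error <= 8 mmHg."""
    return abs(mean(errors_mmhg)) <= 5 and stdev(errors_mmhg) <= 8


# Invented SBP errors (estimate minus reference), in mmHg.
sbp_errors = [1, -2, 3, -1, 0, 2, -3, 6, -4, 1]
print(bhs_grade(sbp_errors), meets_aami(sbp_errors))
```

A low mean absolute error, such as the 2.49 mmHg reported for SBP, usually implies a large share of errors within 5 mmHg, which is why it tends to coincide with grade A; the grade itself, however, depends on the full error distribution.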
Affiliation(s)
- Araf Nishan
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
- S. M. Taslim Uddin Raju
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
- Md Imran Hossain
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
- Safin Ahmed Dipto
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
- S. M. Tanvir Uddin
- Department of Electrical and Electronic Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh
- Asif Sijan
- Department of Software Engineering, American International University, Dhaka, Bangladesh
- Md Abu Shahid Chowdhury
- Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
- Ashfaq Ahmad
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
- Md Mahamudul Hasan Khan
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
9
Rangam H, Sivasankaran SK, Balasubramanian V. Visual hazardous models: A hybrid approach to investigate road hazardous events. Accid Anal Prev 2024; 200:107556. [PMID: 38531281 DOI: 10.1016/j.aap.2024.107556] [Received: 11/18/2023] [Revised: 02/10/2024] [Accepted: 03/21/2024] [Indexed: 03/28/2024]
Abstract
Road users (drivers, passengers, pedestrians, and animals) are exposed to hazardous events during their commute. With pedestrians accounting for 23 % of global road fatalities, their safety remains a principal concern for policymakers worldwide. Owing to limited budgets, there is a growing emphasis on data-driven stochastic models to inform policy decisions. However, statistical models have limitations because crash data contain redundant features, inherent heterogeneity, and unobserved characteristics. The random parameter model framework addresses unobserved heterogeneity, but redundant features and inherent heterogeneity among the data's characteristics still lead to biased estimates. This is further complicated when the data have spatiotemporal attributes. To address this, we developed two visual hazardous (VH) models: (i) one that addresses the unobserved heterogeneity in the data, and (ii) one that addresses dimensionality, inherent heterogeneity among the characteristics, and unobserved heterogeneity in the collected data after spatiotemporal pattern identification. The feature selection model reduces dimensionality, whereas latent class clustering partitions the data into classes with maximum heterogeneity between them. This integration reduces bias in the estimates. As a use case, a decade (2009–2018) of pedestrian crosswalk crashes in the Indian state of Tamil Nadu, extracted from the Road Accident Database Management System (RADMS), was used to evaluate model performance. The data comprise crash location, road, vehicle, driver, pedestrian, and environment details. Results show that visual hazardous model 2 generates crash scenarios with five homogeneous subclasses and quantifies the contributing factors through their marginal effects. For example, pedestrians using crosswalks have an 82% higher chance of sustaining fatal/grievous injuries on expressways (posted speed limit: 100 km/h) in annual hazardous zone locations.
The following factors were statistically significant positive contributors to fatal/grievous injury pedestrian crashes at crosswalks: working-age pedestrians (25–64 years), older pedestrians (>64 years), the pedestrian being on a pedestrian crossing and not in the centre of the road, the pedestrian walking along the edge of the road, multiple lanes, two lanes, paved shoulder, straight and flat road, motorcycle, bus, truck, medium-duty vehicle, illegal driver (≤17 years), going ahead/overtaking, high speed, expressways, and rural region. This technique provides a framework for engineers, researchers, and policymakers to formulate effective countermeasures that enhance road safety.
Affiliation(s)
- Harikrishna Rangam
- RBG Labs, Department of Engineering Design, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Sathish Kumar Sivasankaran
- RBG Labs, Department of Engineering Design, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Venkatesh Balasubramanian
- RBG Labs, Department of Engineering Design, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India.
10
Sharkas M, Attallah O. Color-CADx: a deep learning approach for colorectal cancer classification through triple convolutional neural networks and discrete cosine transform. Sci Rep 2024; 14:6914. [PMID: 38519513 PMCID: PMC10959971 DOI: 10.1038/s41598-024-56820-w] [Received: 12/11/2022] [Accepted: 03/11/2024] [Indexed: 03/25/2024] [Open access]
Abstract
Colorectal cancer (CRC) has a high mortality rate and continues to affect lives worldwide. Histopathological examination is the standard method for CRC diagnosis; however, it is complicated, time-consuming, and subjective. Computer-aided diagnostic (CAD) systems using digital pathology can help pathologists diagnose CRC faster and more accurately than manual histopathological examination. Deep learning algorithms, especially convolutional neural networks (CNNs), are advocated for CRC diagnosis. Nevertheless, most previous CAD systems extracted features from a single CNN, and these features are high-dimensional. Moreover, they relied only on spatial information for classification. In this paper, a CAD system called "Color-CADx" is proposed for CRC recognition. Three CNNs, namely ResNet50, DenseNet201, and AlexNet, are used for end-to-end classification at different training-testing ratios. Features are also extracted from these CNNs and reduced using the discrete cosine transform (DCT), which additionally provides a spectral representation and is used to select a further reduced set of deep features. The DCT coefficients obtained in this step are then concatenated, and the analysis of variance (ANOVA) feature selection approach is applied to choose significant features. Finally, machine learning classifiers are employed for CRC classification. Two publicly available datasets were investigated: the NCT-CRC-HE-100 K dataset and the Kather_texture_2016_image_tiles dataset. The highest accuracy reached 99.3% on the NCT-CRC-HE-100 K dataset and 96.8% on the Kather_texture_2016_image_tiles dataset. DCT and ANOVA successfully lowered feature dimensionality, thus reducing complexity. Color-CADx has demonstrated its efficacy in terms of accuracy, with performance surpassing that of recent state-of-the-art methods.
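The DCT-based reduction and ANOVA-based selection steps summarized above can be illustrated with a short, dependency-free Python sketch. It is not the Color-CADx implementation: it assumes an orthonormal DCT-II applied to each deep-feature vector, keeps the first `k` (lowest-frequency) coefficients, and ranks the resulting columns by a one-way ANOVA F-statistic. The helper names (`dct_ii`, `reduce_features`, `anova_f`, `select_top_features`) and the choice of `k` and `m` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def dct_ii(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D vector (O(N^2); adequate for illustration)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def reduce_features(x: Sequence[float], k: int) -> List[float]:
    """Assumed reduction rule: keep only the first k (lowest-frequency) DCT coefficients."""
    return dct_ii(x)[:k]


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    if g < 2:
        raise ValueError("ANOVA needs at least two classes")
    grand_mean = sum(values) / n
    means = {y: sum(vs) / len(vs) for y, vs in groups.items()}
    ss_between = sum(len(vs) * (means[y] - grand_mean) ** 2 for y, vs in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, vs in groups.items() for v in vs)
    if ss_within == 0.0:
        # Zero within-class spread: avoid dividing by zero.
        return math.inf if ss_between > 0.0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def select_top_features(X: Sequence[Sequence[float]], labels: Sequence[int], m: int) -> List[int]:
    """Indices of the m columns of X with the largest ANOVA F-statistic."""
    n_cols = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:m]
```

Because the DCT is orthonormal, it preserves the vector's energy, and for smooth inputs most of that energy sits in the low-frequency coefficients, which is why truncating to the first `k` is a common reduction heuristic. In practice one would apply `reduce_features` row by row and then pass the concatenated coefficients to `select_top_features`.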
Affiliation(s)
- Maha Sharkas
- Electronics and Communications Engineering Department, College of Engineering and Technology, Arab Academy for Science, Technology, and Maritime Transport, Alexandria, Egypt
- Omneya Attallah
- Electronics and Communications Engineering Department, College of Engineering and Technology, Arab Academy for Science, Technology, and Maritime Transport, Alexandria, Egypt.
- Wearables, Biosensing, and Biosignal Processing Laboratory, Arab Academy for Science, Technology and Maritime Transport, Alexandria, 21937, Egypt.

11
Zhang L, Chen X. Enhanced chimp hierarchy optimization algorithm with adaptive lens imaging for feature selection in data classification. Sci Rep 2024; 14:6910. [PMID: 38519568 PMCID: PMC10959962 DOI: 10.1038/s41598-024-57518-9] [Received: 12/30/2023] [Accepted: 03/19/2024] [Indexed: 03/25/2024]
Abstract
Feature selection is a critical component of machine learning and data mining that removes redundant and irrelevant features from a dataset. The Chimp Optimization Algorithm (CHoA) is widely applicable to various optimization problems because of its small number of parameters and fast convergence rate. However, CHoA has weak exploration capability and tends to fall into local optima when solving feature selection problems, leading to ineffective removal of irrelevant and redundant features. To solve this problem, this paper proposes an Enhanced Chimp Hierarchy Optimization Algorithm with adaptive lens imaging (ALI-CHoASH) to search for the optimal feature subset in classification problems. Specifically, to enhance the exploration and exploitation capabilities of CHoA, we designed a chimp social hierarchy. We employed a novel social class factor to label the class of each chimp, enabling effective modelling and optimization of the relationships among individual chimps. Then, to model the social and collaborative behaviours of chimps in different social classes, we introduce different prey-attacking and autonomous search strategies that help individual chimps approach the optimal solution faster. In addition, considering the poor diversity of the chimp population in late iterations, we propose an adaptive lens imaging back-learning (opposition-based learning) strategy to keep the algorithm from falling into a local optimum. Finally, we validate the improved exploration and exploitation capabilities of ALI-CHoASH on several high-dimensional datasets. We also compare ALI-CHoASH with eight state-of-the-art methods in classification accuracy, feature subset size, and computation time to demonstrate its superiority.
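The lens-imaging idea can be illustrated with the generic lens-imaging learning formula often used in metaheuristics, x* = (lb + ub)/2 + (lb + ub)/(2k) - x/k, which reduces to classic opposition-based learning (x* = lb + ub - x) when k = 1. The Python sketch below is an assumption-laden illustration, not ALI-CHoASH itself: the linear schedule in the hypothetical `adaptive_k` helper, the greedy keep-the-better replacement in `lens_refresh`, and the 0.5 threshold in `to_feature_mask` are placeholders, because the abstract does not specify how the scaling factor adapts or how positions map to feature subsets.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def lens_opposite(x: Sequence[float], lb: float, ub: float, k: float) -> Vector:
    """Lens-imaging opposite of x in [lb, ub]: (lb+ub)/2 + (lb+ub)/(2k) - x/k, per dimension."""
    if k <= 0:
        raise ValueError("scaling factor k must be positive")
    mid = (lb + ub) / 2.0
    # Clip each coordinate back into the search bounds.
    return [min(max(mid + mid / k - xi / k, lb), ub) for xi in x]


def adaptive_k(t: int, t_max: int, k_min: float = 1.0, k_max: float = 10.0) -> float:
    """Hypothetical linear schedule; larger k pulls opposite points toward the range centre."""
    return k_min + (k_max - k_min) * t / t_max


def lens_refresh(population: Sequence[Sequence[float]],
                 fitness: Callable[[Sequence[float]], float],
                 lb: float, ub: float, k: float) -> List[Vector]:
    """Greedy step (minimization): keep each individual's lens opposite only if it is fitter."""
    refreshed = []
    for x in population:
        xo = lens_opposite(x, lb, ub, k)
        refreshed.append(xo if fitness(xo) < fitness(x) else list(x))
    return refreshed


def to_feature_mask(x: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Hypothetical mapping of a continuous position in [0, 1] to a selected-feature mask."""
    return [xi > threshold for xi in x]
```

Writing the formula as mid + (mid - x)/k shows the design choice: the opposite point is the reflection of x about the centre of the range, shrunk by 1/k, so a larger k yields a more conservative jump.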
Affiliation(s)
- Li Zhang
- College of Computer Engineering, Jiangsu University of Technology, Changzhou, 213001, People's Republic of China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education Jilin University, Changchun, 130012, People's Republic of China.
- XiaoBo Chen
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education Jilin University, Changchun, 130012, People's Republic of China
- People's Bank of China Changzhou City Center Branch, Changzhou, 213001, Jiangsu, People's Republic of China

12
Kuo CY, Yang WW, Su ECY. Improving dengue fever predictions in Taiwan based on feature selection and random forests. BMC Infect Dis 2024; 24:334. [PMID: 38509486 PMCID: PMC10953060 DOI: 10.1186/s12879-024-09220-4] [Received: 04/14/2021] [Accepted: 03/12/2024] [Indexed: 03/22/2024]
Abstract
BACKGROUND Dengue fever is a well-studied vector-borne disease in tropical and subtropical areas of the world. Several methods for predicting the occurrence of dengue fever in Taiwan have been proposed. However, to the best of our knowledge, no study has investigated the relationship between air quality indices (AQIs) and dengue fever in Taiwan. RESULTS This study aimed to develop a dengue fever prediction model in which meteorological factors, a vector index, and AQIs were incorporated into different machine learning algorithms. After preprocessing, a total of 805 meteorological records from 2013 to 2015 were collected from open government data. In addition to well-known dengue-related factors, we investigated the effects of novel variables, including particulate matter with an aerodynamic diameter < 10 µm (PM10), PM2.5, and an ultraviolet index, for predicting dengue fever occurrence. The collected dataset was randomly divided into an 80% training set and a 20% test set. The experimental results showed that the random forest model achieved an area under the receiver operating characteristic curve of 0.9547 on the test set, the best among the machine learning algorithms compared. In addition, temperature was the most important factor in our variable importance analysis; it showed a positive effect on dengue fever below 30 °C but had less effect above 30 °C. The AQIs were not as important as temperature, but one was selected in the variable-filtering process and showed a certain influence on the final results. CONCLUSIONS Our study is the first to demonstrate that AQI negatively affects dengue fever occurrence in Taiwan. The proposed prediction model can be used as a public health early warning system to prevent dengue fever outbreaks.
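The evaluation protocol described above (a random 80/20 split scored by ROC AUC on the held-out set) can be sketched with the Python standard library alone. This is an illustrative assumption, not the authors' pipeline: the random forest is omitted, `scores` stands in for any model's predicted probabilities, and the helper names and seed are hypothetical. AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, counting ties as one half.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(n: int, test_frac: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and return (train_idx, test_idx)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (O(P*N))."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With 805 records, `split_80_20(805)` yields 644 training and 161 test indices. The pairwise AUC is quadratic in the test-set size, which is fine at this scale; a rank-based formula would be preferable for large test sets.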
Affiliation(s)
- Chao-Yang Kuo
- Smart Healthcare Interdisciplinary College, National Taipei University of Nursing and Health Sciences, No.365, Mingde Road, Beitou District, Taipei City, 112303, Taiwan
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, No.301, Yuantong Road, Zhonghe District, New Taipei City, 23564, Taiwan
- Wei-Wen Yang
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, No.301, Yuantong Road, Zhonghe District, New Taipei City, 23564, Taiwan
- Emily Chia-Yu Su
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, No.301, Yuantong Road, Zhonghe District, New Taipei City, 23564, Taiwan.
- Clinical Big Data Research Center, Taipei Medical University Hospital, No.252 Wuxing Street, Xinyi District, Taipei City, 110, Taiwan.

13
Zhao H, Qiu S, Bai M, Wang L, Wang Z. Toxicity prediction and classification of Gunqile-7 with small sample based on transfer learning method. Comput Biol Med 2024; 173:108348. [PMID: 38531249 DOI: 10.1016/j.compbiomed.2024.108348] [Received: 09/05/2023] [Revised: 03/10/2024] [Accepted: 03/17/2024] [Indexed: 03/28/2024]
Abstract
Drug-induced diseases are the most important component of iatrogenic disease. It is the duty of doctors to prescribe reasonable and safe doses of medication. Gunqile-7 is a Mongolian medicine with analgesic and anti-inflammatory effects. As a foreign substance in the body, it may produce varying degrees of adverse reactions or toxic side effects even when used reasonably. Because pharmacological animal trials of Gunqile-7 are costly and the resulting data sample is small, this paper uses transfer learning and data augmentation to study the toxicity of Gunqile-7. More specifically, to compensate for the limited number of training samples, data augmentation is employed to expand the dataset. Then, transfer learning and a one-dimensional convolutional neural network are used to train the network. In addition, we use support vector machine-recursive feature elimination (SVM-RFE) for feature selection, removing features that adversely affect model predictions. Furthermore, because the pre-trained model plays an important role in transfer learning, we select a quantitative toxicity prediction model, whose task is consistent with the purpose of this paper, as the pre-trained model. Lastly, the experimental results demonstrate the effectiveness of the proposed method: on a small sample set, it improves accuracy by up to 9 percentage points over the method without transfer learning.
Affiliation(s)
- Hongkai Zhao
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
- Sen Qiu
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
- Meirong Bai
- Key Laboratory of Ministry of Education of Mongolian Medicine RD Engineering, Inner Mongolia Minzu University, Tongliao 028000, China.
- Luyao Wang
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.
- Zhelong Wang
- Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian 116024, China; School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China.

14
Houssein EH, Hammad A, Emam MM, Ali AA. An enhanced Coati Optimization Algorithm for global optimization and feature selection in EEG emotion recognition. Comput Biol Med 2024; 173:108329. [PMID: 38513391 DOI: 10.1016/j.compbiomed.2024.108329] [Received: 02/02/2024] [Revised: 03/07/2024] [Accepted: 03/17/2024] [Indexed: 03/23/2024]
Abstract
Emotion recognition based on electroencephalography (EEG) signals has garnered significant attention across diverse domains, including healthcare, education, information sharing, and gaming. Despite its potential, the absence of a standardized feature set makes it difficult to classify emotions efficiently. To address the issue of high dimensionality, this paper introduces an advanced variant of the Coati Optimization Algorithm (COA), called eCOA, for global optimization and for selecting the best subset of EEG features for emotion recognition. Like other metaheuristic methods, COA suffers from local optima and imbalanced exploitation ability. The proposed eCOA combines the COA and RUNge Kutta Optimizer (RUN) algorithms: the Scale Factor (SF) and Enhanced Solution Quality (ESQ) mechanism from RUN are applied to address these shortcomings of COA. The proposed eCOA algorithm has been extensively evaluated using the CEC'22 test suite and two EEG emotion recognition datasets, DEAP and DREAMER. Furthermore, eCOA is applied to binary and multi-class classification of emotions in the dimensions of valence, arousal, and dominance using a multi-layer perceptron neural network (MLPNN). The experimental results revealed that eCOA has more powerful search capabilities than the original COA and seven well-known counterpart methods in terms of statistical, convergence, and diversity measures. Furthermore, eCOA can efficiently support feature selection to find the EEG features that maximize performance on four quadratic emotion classification problems compared with its counterparts. The suggested method obtains classification accuracies of 85.17% and 95.21% in the binary classification of low and high arousal emotions on two public datasets, DEAP and DREAMER, respectively, which are 5.58% and 8.98% higher than those of existing approaches working on the same datasets for different subjects.
Affiliation(s)
- Essam H Houssein
- Faculty of Computers and Information, Minia University, Minia, Egypt.
- Asmaa Hammad
- Faculty of Computers and Information, Minia University, Minia, Egypt.
- Marwa M Emam
- Faculty of Computers and Information, Minia University, Minia, Egypt.
- Abdelmgeid A Ali
- Faculty of Computers and Information, Minia University, Minia, Egypt.

15
Nose S, Shiroma H, Yamada T, Uno Y. QNetDiff: a quantitative measurement of network rewiring. BMC Bioinformatics 2024; 25:118. [PMID: 38500025 PMCID: PMC10946107 DOI: 10.1186/s12859-024-05702-z] [Received: 07/28/2023] [Accepted: 02/13/2024] [Indexed: 03/20/2024]
Abstract
Bacteria in the human body, particularly in the large intestine, are known to be associated with various diseases. A typical method for identifying disease-associated bacteria (markers) is to statistically compare the relative abundance of bacteria between healthy subjects and diseased patients. However, since bacteria do not necessarily cause diseases in isolation, it is also important to consider the interactions and relationships among bacteria when examining their association with diseases. Although common approaches exist for representing and analyzing bacterial interactions as networks, few methods find disease-associated bacteria through network-driven analysis. In this paper, we focus on rewiring of the bacterial network and propose a new method for quantifying the rewiring. We then apply the proposed method to a group of colorectal cancer patients and show that it can detect bacteria that conventional methods, such as abundance comparison, cannot. Furthermore, the proposed method is implemented as a general-purpose tool and made publicly available.
Affiliation(s)
- Shota Nose
- Graduate School of Engineering, Osaka Prefecture University, Sakai, Japan
- Hirotsugu Shiroma
- Department of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Takuji Yamada
- Department of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Metagen, Inc., Yamagata, Japan
- Metagen Therapeutics, Inc., Yamagata, Japan
- digzyme, Inc., Tokyo, Japan
- Yushi Uno
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan.

16
Sargsyan K, Lim C. Using protein language models for protein interaction hot spot prediction with limited data. BMC Bioinformatics 2024; 25:115. [PMID: 38493120 PMCID: PMC10943781 DOI: 10.1186/s12859-024-05737-2] [Received: 01/03/2024] [Accepted: 03/11/2024] [Indexed: 03/18/2024]
Abstract
BACKGROUND Protein language models, inspired by the success of large language models in deciphering human language, have emerged as powerful tools for unraveling the intricate code of life inscribed within protein sequences. They have gained significant attention for their promising applications across various areas, including the sequence-based prediction of secondary and tertiary protein structure, the discovery of new functional protein sequences/folds, and the assessment of mutational impact on protein fitness. However, their utility in learning to predict protein residue properties from scant datasets, such as protein-protein interaction (PPI)-hotspots, whose mutations significantly impair PPIs, has remained unclear. Here, we explore the feasibility of using protein language-learned representations as features for machine learning to predict PPI-hotspots, using a dataset containing 414 experimentally confirmed PPI-hotspots and 504 PPI-nonhotspots. RESULTS Our findings showcase the capacity of unsupervised learning with protein language models to capture critical functional attributes of protein residues derived from the evolutionary information encoded within amino acid sequences. We show that methods relying on protein language models can compete with methods employing sequence- and structure-based features to predict PPI-hotspots from the free protein structure. We observed an optimal number of features for model precision, suggesting a trade-off between information content and overfitting. CONCLUSIONS This study underscores the potential of transformer-based protein language models to extract critical knowledge from sparse datasets, exemplified here by the challenging task of predicting PPI-hotspots. These models offer a cost-effective and time-efficient alternative to traditional experimental methods for predicting certain residue properties. However, explaining why specific features are important for determining certain residue properties remains a challenge.
Affiliation(s)
- Karen Sargsyan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan.
- Carmay Lim
- Institute of Biomedical Sciences, Academia Sinica, Taipei, 115, Taiwan.

17
Zhai S, Chen K, Yang L, Li Z, Yu T, Chen L, Zhu H. Applying machine learning to anaerobic fermentation of waste sludge using two targeted modeling strategies. Sci Total Environ 2024; 916:170232. [PMID: 38278257 DOI: 10.1016/j.scitotenv.2024.170232] [Received: 10/15/2023] [Revised: 01/13/2024] [Accepted: 01/15/2024] [Indexed: 01/28/2024]
Abstract
Anaerobic fermentation is an effective method for harvesting volatile fatty acids (VFAs) from waste activated sludge (WAS). Accurately predicting and optimizing VFA production is crucial for anaerobic fermentation engineering. In this study, we developed machine learning models using two innovative strategies to precisely predict the daily VFA yield in a laboratory anaerobic fermenter. Strategy-1 focuses on model interpretability, to understand how the variables of interest influence VFA production, while Strategy-2 takes into account the cost of variable acquisition, making it more suitable for practical prediction and optimization. The results showed that Support Vector Regression was the most effective model in this study, with testing R² values of 0.949 and 0.939 for the two strategies, respectively. We conducted feature importance analysis to identify the critical factors that influence VFA production, and provided detailed explanations using partial dependence plots and SHapley Additive exPlanations (SHAP) analyses. To optimize VFA production, we integrated the developed model with optimization algorithms, obtaining a maximum yield of 2997.282 mg/L, 45.2% higher than the average VFA level in the operated fermenter. Our study offers valuable insights for predicting and optimizing VFA production in sludge anaerobic fermentation and facilitates engineering practice in harvesting VFAs from WAS.
Affiliation(s)
- Shixin Zhai
- Beijing Key Lab for Source Control Technology of Water Pollution, Beijing Forestry University, Beijing 100083, China
- Kai Chen
- Beijing Key Lab for Source Control Technology of Water Pollution, Beijing Forestry University, Beijing 100083, China
- Lisha Yang
- Beijing Key Lab for Source Control Technology of Water Pollution, Beijing Forestry University, Beijing 100083, China
- Zhuo Li
- Beijing Key Lab for Source Control Technology of Water Pollution, Beijing Forestry University, Beijing 100083, China
- Tong Yu
- Beijing Key Lab for Source Control Technology of Water Pollution, Beijing Forestry University, Beijing 100083, China
- Long Chen
- Beijing Key Lab for Source Control Technology of Water Pollution, Beijing Forestry University, Beijing 100083, China
- Hongtao Zhu
- Beijing Key Lab for Source Control Technology of Water Pollution, Beijing Forestry University, Beijing 100083, China.

18
Khanna M, Singh LK, Shrivastava K, Singh R. An enhanced and efficient approach for feature selection for chronic human disease prediction: A breast cancer study. Heliyon 2024; 10:e26799. [PMID: 38463826 PMCID: PMC10920178 DOI: 10.1016/j.heliyon.2024.e26799] [Received: 07/09/2023] [Revised: 01/15/2024] [Accepted: 02/20/2024] [Indexed: 03/12/2024]
Abstract
Computer-aided diagnosis (CAD) systems play a vital role in modern research by effectively minimizing both time and costs. These systems support healthcare professionals such as radiologists in their decision-making by efficiently detecting abnormalities and offering accurate, dependable information. They depend heavily on the efficient selection of features to accurately categorize high-dimensional biological data, and these features can subsequently assist in diagnosing related medical conditions. Identifying patterns in biomedical data can be quite challenging because of the presence of numerous irrelevant or redundant features. Therefore, a feature selection (FS) process is needed to eliminate these features. The primary goal of FS approaches is to improve classification accuracy by eliminating features that are irrelevant or less informative. The FS phase plays a critical role in attaining optimal results in machine learning (ML)-driven CAD systems, since training on efficient features can significantly enhance the effectiveness of ML models. This empirical study presents a methodology for classifying biomedical data using FS. The proposed approach incorporates three soft computing-based optimization algorithms: Teaching Learning-Based Optimization (TLBO), Elephant Herding Optimization (EHO), and a proposed hybrid of the two. These algorithms have been employed previously; however, their effectiveness in addressing FS problems for predicting human diseases has not been investigated. The evaluation focuses on classifying benign and malignant tumours using the publicly available Wisconsin Diagnostic Breast Cancer (WDBC) benchmark dataset. Five-fold cross-validation is employed to mitigate the risk of over-fitting.
The proposed approach is evaluated using several metrics, including sensitivity, specificity, precision, accuracy, area under the receiver-operating characteristic curve (AUC), and F1-score. The best accuracy achieved by the suggested approach is 97.96%. The proposed clinical decision support system demonstrates highly favourable classification performance, making it a valuable tool that medical practitioners can use as a second opinion and reducing the burden on expert medical practitioners.
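The five-fold cross-validation and the confusion-matrix metrics listed above can be sketched as follows. This standard-library Python illustration is not the authors' code: it uses plain shuffled (non-stratified) folds, treats label 1 (for example, malignant) as the positive class, and the helper names are hypothetical; the TLBO/EHO feature-selection step and the classifier are left out. AUC additionally needs continuous scores and is not computed here.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k shuffled folds that partition range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample so fold sizes differ by at most 1.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, precision, accuracy and F1 with label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "precision": prec,
            "accuracy": (tp + tn) / len(pairs), "f1": f1}
```

For the 569-sample WDBC dataset, `k_fold_indices(569)` produces four test folds of 114 samples and one of 113; per-fold metrics would typically be averaged. A stratified split, which keeps the benign/malignant ratio equal across folds, is a common alternative that the abstract does not specify.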
Affiliation(s)
- Munish Khanna
- School of Computing Science and Engineering, Galgotias University, Greater Noida, Gautam Buddh Nagar, India
- Law Kumar Singh
- Department of Computer Engineering and Applications, GLA University, Mathura, India
- Kapil Shrivastava
- Department of Computer Engineering and Applications, GLA University, Mathura, India
- Rekha Singh
- Department of Physics, Uttar Pradesh Rajarshi Tandon Open University, Prayagraj, Uttar Pradesh, India

19
Ullah MS, Khan MA, Almujally NA, Alhaisoni M, Akram T, Shabaz M. BrainNet: a fusion assisted novel optimal framework of residual blocks and stacked autoencoders for multimodal brain tumor classification. Sci Rep 2024; 14:5895. [PMID: 38467755 PMCID: PMC10928185 DOI: 10.1038/s41598-024-56657-3] [Received: 09/08/2023] [Accepted: 03/08/2024] [Indexed: 03/13/2024]
Abstract
Brain tumor classification is a significant issue in computer-aided diagnosis (CAD) for medical applications. Using machine learning algorithms, radiologists could reliably detect tumors without extensive surgery. However, a few important challenges arise, such as (i) selecting the most suitable deep learning architecture for classification and (ii) the need for a domain expert who can assess the output of deep learning models. These difficulties motivate us to propose an efficient and accurate system based on deep learning and evolutionary optimization for classifying four types of brain tumor modalities (T1, T1ce, T2, and FLAIR) on a large-scale MRI database. To this end, a CNN architecture is modified based on domain knowledge and connected with an evolutionary optimization algorithm to select hyperparameters. In parallel, a stacked encoder-decoder network is designed with ten convolutional layers. The features of both models are extracted and optimized using an improved version of the Grey Wolf algorithm with update criteria taken from the Jaya algorithm. The improved version speeds up the learning process and improves accuracy. Finally, the selected features are fused using a novel parallel pooling approach and classified using machine learning classifiers and neural networks. Experiments on two datasets, BraTS2020 and BraTS2021, achieved an improved average accuracy of 98% and a maximum single-classifier accuracy of 99%. Comparisons were also conducted with several classifiers, techniques, and neural networks, and the proposed method achieved improved performance.
Affiliation(s)
- Muhammad Attique Khan
- Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
- Department of Computer Science, HITEC University, Taxila, 47080, Pakistan
- Nouf Abdullah Almujally
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, PO Box 84428, 11671, Riyadh, Saudi Arabia
- Majed Alhaisoni
- Computer Sciences Department, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
- Tallha Akram
- Department of ECE, COMSATS University Islamabad, Wah Campus, Rawalpindi, Pakistan
- Mohammad Shabaz
- Model Institute of Engineering and Technology, Jammu, J&K, India.

20
Lima HS, Oliveira GFVD, Ferreira RDS, Castro AGD, Silva LCF, Ferreira LDS, Oliveira DADS, Silva LFD, Kasuya MCM, de Paula SO, Silva CCD. Machine learning-based soil quality assessment for enhancing environmental monitoring in iron ore mining-impacted ecosystems. J Environ Manage 2024; 356:120559. [PMID: 38471324 DOI: 10.1016/j.jenvman.2024.120559] [Received: 12/05/2023] [Revised: 02/22/2024] [Accepted: 03/05/2024] [Indexed: 03/14/2024]
Abstract
In November 2015, a catastrophic rupture of the Fundão dam in Mariana, Brazil, had extensive socio-economic and environmental repercussions that persist to this day. In response, several reforestation programs were initiated to remediate the affected regions. However, accurately assessing soil health in these areas is a complex endeavor. This study employs machine learning techniques to predict soil quality indicators that effectively differentiate between the stages of recovery in these areas. For this purpose, a comprehensive set of soil parameters, comprising 3 biological, 16 chemical, and 3 physical parameters, was evaluated over 2 years for samples exposed to mining tailings (81 samples) and unaffected samples (6 samples). The most robust model was a decision tree restricted to fewer levels to simplify the tree structure. In this model, Cation Exchange Capacity (CEC), Microbial Biomass Carbon (MBC), Base Saturation (BS), and Effective Cation Exchange Capacity (eCEC) emerged as the most important factors influencing model fitting. This model achieved an accuracy of 92% in training and 93% in testing for determining stages of recovery. The model developed in this study has the potential to revolutionize the monitoring efforts conducted by regulatory agencies in these regions. By reducing the number of parameters that must be evaluated, it promises to expedite recovery monitoring and improve cost-effectiveness while upholding the analytical rigor of assessments.
Affiliation(s)
- Helena Santiago Lima
- Laboratory of Applied Environmental Microbiology, Department of Microbiology, Federal University of Viçosa, Viçosa, MG, Brazil.
- Alex Gazolla de Castro
- Laboratory of Applied Environmental Microbiology, Department of Microbiology, Federal University of Viçosa, Viçosa, MG, Brazil.
- Lívia Carneiro Fidélis Silva
- Laboratory of Applied Environmental Microbiology, Department of Microbiology, Federal University of Viçosa, Viçosa, MG, Brazil.
- Letícia de Souza Ferreira
- Laboratory of Applied Environmental Microbiology, Department of Microbiology, Federal University of Viçosa, Viçosa, MG, Brazil.
- Cynthia Canêdo da Silva
- Laboratory of Applied Environmental Microbiology, Department of Microbiology, Federal University of Viçosa, Viçosa, MG, Brazil.
21
Kim SH, Kim DY, Chun SW, Kim J, Woo J. Impartial feature selection using multi-agent reinforcement learning for adverse glycemic event prediction. Comput Biol Med 2024; 173:108257. [PMID: 38520922 DOI: 10.1016/j.compbiomed.2024.108257] [Received: 11/08/2023] [Revised: 02/02/2024] [Accepted: 03/06/2024] [Indexed: 03/25/2024]
Abstract
We developed an attention model to predict future adverse glycemic events 30 min in advance based on the observation of past glycemic values over a 35 min period. The proposed model effectively encodes insulin administration and meal intake time using Time2Vec (T2V) for glucose prediction. The proposed impartial feature selection algorithm is designed to distribute rewards proportionally according to agent contributions. Agent contributions are calculated by a step-by-step negation of updated agents. Thus, the proposed feature selection algorithm optimizes features from electronic medical records to improve performance. For evaluation, we collected continuous glucose monitoring data from 102 patients with type 2 diabetes admitted to Cheonan Hospital, Soonchunhyang University. Using our proposed model, we achieved F1-scores of 89.0%, 60.6%, and 89.8% for normoglycemia, hypoglycemia, and hyperglycemia, respectively.
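The abstract names Time2Vec (T2V) as the encoding for insulin-administration and meal-intake times but gives no formula. The sketch below assumes the commonly used Time2Vec form: one linear component followed by sine components, each with a frequency ω and a phase φ, applied to a scalar time τ. It is not the authors' code. The parameter values are hypothetical; a real model would learn them.

```python
import math


def time2vec(tau, omegas, phis):
    """Time2Vec-style embedding of a scalar time tau.

    Element 0 is linear (omega_0 * tau + phi_0); elements 1..k-1 apply a
    sine to omega_i * tau + phi_i. In a trained model omegas and phis are
    learned; here they are supplied by hand for illustration.
    """
    if len(omegas) != len(phis) or not omegas:
        raise ValueError("omegas and phis must be non-empty and of equal length")
    out = [omegas[0] * tau + phis[0]]  # linear (non-periodic) component
    for w, p in zip(omegas[1:], phis[1:]):
        out.append(math.sin(w * tau + p))  # periodic components
    return out


# Hypothetical example: a meal logged at minute 90 of the day, with a daily
# (1440 min) and an hourly (60 min) periodic component.
print(time2vec(90.0, [0.01, 2 * math.pi / 1440, 2 * math.pi / 60], [0.0, 0.0, 0.0]))
```

The sine terms are periodic, so times one full period apart get identical periodic features. For example, the same clock time on consecutive days gives the same output when ω = 2π/1440 per minute. The linear term still preserves absolute ordering.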
Affiliation(s)
- Seo-Hee Kim
- Department of ICT Convergence, Soonchunhyang University, Asan, South Korea
- Dae-Yeon Kim
- Department of Laboratory Medicine, Soonchunhyang University Cheonan Hospital, Cheonan, South Korea.
- Sung-Wan Chun
- Department of Laboratory Medicine, Soonchunhyang University Cheonan Hospital, Cheonan, South Korea
- Jaeyun Kim
- Department of AI and Big Data, Soonchunhyang University, Asan, South Korea
- Jiyoung Woo
- Department of AI and Big Data, Soonchunhyang University, Asan, South Korea.
22
Zhang Q, Coury R, Tang W. Prediction of conversion from mild cognitive impairment to Alzheimer's disease and simultaneous feature selection and grouping using Medicaid claim data. Alzheimers Res Ther 2024; 16:54. [PMID: 38461266 PMCID: PMC10924319 DOI: 10.1186/s13195-024-01421-y] [Received: 11/28/2023] [Accepted: 02/27/2024] [Indexed: 03/11/2024]
Abstract
BACKGROUND Due to the heterogeneity among patients with Mild Cognitive Impairment (MCI), it is critical to predict their risk of converting to Alzheimer's disease (AD) early using routinely collected real-world data, such as electronic health records or administrative claims. METHODS The study used MarketScan Multi-State Medicaid data to construct a cohort of MCI patients. Logistic regression with tree-guided lasso regularization (TGL) was proposed to select important features and predict the risk of converting to AD. A subsampling-based technique was used to extract robust groups of predictive features. Predictive models, including logistic regression, generalized random forest, and an artificial neural network, were trained using the extracted features. RESULTS The proposed TGL workflow selected feature groups that were robust, highly interpretable, and consistent with the existing literature. The predictive models using TGL-selected features demonstrated higher prediction accuracy than models using all features or features selected by other methods. CONCLUSIONS The identified feature groups provide insights into the progression from MCI to AD and can potentially improve risk prediction in clinical practice and trial recruitment.
Affiliation(s)
- Qi Zhang
- Department of Mathematics and Statistics, University of New Hampshire, Durham, NH, 03824, USA.
- Ron Coury
- Department of Mathematics and Statistics, University of New Hampshire, Durham, NH, 03824, USA
- Wenlong Tang
- Takeda Pharmaceuticals, Cambridge, MA, 02142, USA
23
Dutta S, Zunjare RU, Sil A, Mishra DC, Arora A, Gain N, Chand G, Chhabra R, Muthusamy V, Hossain F. Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition. Amino Acids 2024; 56:20. [PMID: 38460024 DOI: 10.1007/s00726-023-03368-0] [Received: 02/17/2023] [Accepted: 12/05/2023] [Indexed: 03/11/2024]
Abstract
The mutant matrilineal (mtl) gene encoding patatin-like phospholipase activity is involved in in-vivo maternal haploid induction in maize. Doubling the chromosomes of haploids by colchicine treatment leads to complete fixation of inbreds in just one generation, compared with 6-7 generations of selfing. Knowledge of patatin-like proteins in other crops is therefore of great significance for in-vivo haploid induction. So far, no online tool is available that can classify unknown proteins as patatin-like proteins. Here, we aimed to optimize a machine learning-based algorithm to predict the patatin-like phospholipase activity of unknown proteins. Four different kernels [radial basis function (RBF), sigmoid, polynomial, and linear] were used to build support vector machine (SVM) classifiers using six different sequence-based compositional features (AAC, DPC, GDPC, CTDC, CTDT, and GAAC). A total of 1170 protein sequences were analyzed. These comprised patatin-like proteins (585 sequences) from various monocots, dicots, and microbes, and non-patatin-like proteins (585 sequences) from different subspecies of Zea mays. RBF and polynomial kernels were quite promising for predicting patatin-like proteins. Among the six sequence-based compositional features, di-peptide composition attained > 90% prediction accuracy using RBF and polynomial kernels. Using mutual information, the dipeptides that contributed most to the prediction were identified. The knowledge generated in this study can be utilized in other crops before any experiment is initiated. The developed SVM model opens new avenues for scientists working on in-vivo haploid induction in commercial crops. This is the first report of machine learning for identifying proteins with patatin-like activity.
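Di-peptide composition (DPC), the best-performing feature here, is conventionally defined as a 400-dimensional vector. Each entry is the relative frequency of one ordered pair of the 20 standard amino acids among the sequence's overlapping dipeptides. The sketch below implements that standard definition; it is not the authors' code. Skipping dipeptides that contain non-standard residues such as X is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(seq):
    """Return the 400-dimensional dipeptide composition (DPC) of a protein.

    Each entry is count(dipeptide) / number of valid overlapping dipeptides.
    Pairs containing non-standard residues (e.g. X) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


vec = dipeptide_composition("MKKLL")  # dipeptides: MK, KK, KL, LL
print(len(vec), vec[DIPEPTIDES.index("KK")])
```

Vectors like these could then be passed to an SVM with an RBF or polynomial kernel, for example scikit-learn's `SVC(kernel="rbf")` or `SVC(kernel="poly")`.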
Affiliation(s)
- Suman Dutta
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Anirban Sil
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Alka Arora
- ICAR-Indian Agricultural Statistical Research Institute, New Delhi, India
- Nisrita Gain
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Gulab Chand
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Rashmi Chhabra
- ICAR-Indian Agricultural Research Institute, New Delhi, India
- Firoz Hossain
- ICAR-Indian Agricultural Research Institute, New Delhi, India.
24
Peng M, Lin B, Zhang J, Zhou Y, Lin B. scFSNN: a feature selection method based on neural network for single-cell RNA-seq data. BMC Genomics 2024; 25:264. [PMID: 38459442 PMCID: PMC10924397 DOI: 10.1186/s12864-024-10160-1] [Received: 07/26/2023] [Accepted: 02/25/2024] [Indexed: 03/10/2024]
Abstract
While single-cell RNA sequencing (scRNA-seq) allows researchers to analyze gene expression in individual cells, its unique characteristics pose challenges for most existing feature selection methods. These include over-dispersion, zero-inflation, high gene-gene correlation, and large data volumes with many features. In this paper, we present a neural network-based feature selection method (scFSNN) for classification problems on scRNA-seq data. scFSNN is an embedded method that automatically selects features (genes) during model training, controls the false discovery rate of the selected features, and adaptively determines the number of features to be eliminated. Extensive simulation and real data studies demonstrate its excellent feature selection ability and predictive performance.
Affiliation(s)
- Minjiao Peng
- School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China
- School of Mathematics and Statistics and KLAS, Northeast Normal University, Renmin Street, Changchun, 130000, Jilin, China
- Baoqin Lin
- Experimental Center, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong, 510405, China
- Jun Zhang
- School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China
- Yan Zhou
- School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China
- Bingqing Lin
- School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China.
25
Tutsoy O, Koç GG. Deep self-supervised machine learning algorithms with a novel feature elimination and selection approaches for blood test-based multi-dimensional health risks classification. BMC Bioinformatics 2024; 25:103. [PMID: 38459463 PMCID: PMC10921629 DOI: 10.1186/s12859-024-05729-2] [Received: 10/02/2023] [Accepted: 03/04/2024] [Indexed: 03/10/2024]
Abstract
BACKGROUND Blood tests are extensively performed for screening, diagnosis, and surveillance. Raw blood test data could be evaluated automatically with advanced deep self-supervised machine learning approaches, but this has not yet been thoroughly investigated or implemented. RESULTS This paper proposes deep machine learning algorithms with multi-dimensional adaptive feature elimination, self-feature weighting, and novel feature selection approaches. Four machine learning algorithms, ranging from entirely model-free to gradient-driven, are modified to classify health risks from the data processed by the deep layers. CONCLUSIONS The results show that the proposed algorithms can automatically remove unnecessary features, assign self-importance weights, select the most informative features, and classify health risks from worst-case low to worst-case high values.
Affiliation(s)
- Onder Tutsoy
- Adana Alparslan Turkes Science and Technology University, Adana, Turkey.
- Gizem Gul Koç
- Adana Alparslan Turkes Science and Technology University, Adana, Turkey
26
Akbar S, Raza A, Zou Q. Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model. BMC Bioinformatics 2024; 25:102. [PMID: 38454333 PMCID: PMC10921744 DOI: 10.1186/s12859-024-05726-5] [Received: 11/01/2023] [Accepted: 03/01/2024] [Indexed: 03/09/2024]
Abstract
BACKGROUND Viral infections have been the main health issue in the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. However, there has been significant production of antiviral vaccines and medications. Recently, the development of AVPs as antiviral agents has suggested an effective way to treat virus-affected cells. The use of intelligent machine learning techniques to develop peptide-based therapeutic agents is also attracting increasing interest because of its significant outcomes. Existing wet-laboratory-based drug development methods are expensive and time-consuming, and they cannot effectively screen for and predict the targeted motifs of antiviral peptides. METHODS In this paper, we propose a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel Tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent physicochemical properties based on structural features. The PSSM-TS features, semantic information, and CTDT descriptors are then fused into a single vector to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set, and the selected features are used to train a stacked-ensemble classifier. RESULTS The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 on the training samples. On the independent samples, our model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97.
CONCLUSION Our Deepstacked-AVPs model outperformed existing models, with ~4% and ~2% higher accuracy on the training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists, and it may play a beneficial role in pharmaceutical design and academic research.
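Information gain for a discrete feature X and class label Y is conventionally IG(Y; X) = H(Y) - H(Y | X), where H is Shannon entropy. The sketch below ranks features by that score. The abstract does not say how the continuous PSSM-TS, word2vec, and CTDT descriptors were discretized, so the sketch assumes already-discrete columns. The feature names in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and labels Y."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by information gain, highest first."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 1 = AVP, 0 = non-AVP.
labels = [1, 1, 0, 0]
columns = {"motif_present": [1, 1, 0, 0], "noise": [0, 1, 0, 1]}
print(rank_features(columns, labels))  # ['motif_present', 'noise']
```

Keeping the top-ranked features, by a count or a score threshold, would give the reduced set passed to the downstream classifier.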
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
- Ali Raza
- Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, 25124, KP, Pakistan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China.
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China.
27
Shi R, Chang L, Shi L, Zhang Z, Zhang L, Li X. Development and validation of a prognostic model for cervical cancer by combination of machine learning and high-throughput sequencing. Eur J Surg Oncol 2024; 50:108241. [PMID: 38452717 DOI: 10.1016/j.ejso.2024.108241] [Received: 09/02/2023] [Revised: 01/02/2024] [Accepted: 02/29/2024] [Indexed: 03/09/2024]
Abstract
BACKGROUND Cervical cancer has the highest morbidity and mortality rates among female reproductive tract tumors. However, curative outcomes for patients with persistent, recurrent, or metastatic cervical cancer remain unsatisfactory, and comprehensive prognostic indicators for cervical cancer are lacking. This study aims to develop a model that evaluates the prognosis of cervical cancer by combining high-throughput sequencing with various machine learning algorithms. METHODS We combined two single-cell RNA sequencing (scRNA-seq) projects and TCGA data for cervical cancer to obtain shared differentially expressed genes (DEGs). LASSO regression and several learners were applied for signature feature selection. Six machine learning algorithms were used to construct a prognostic model for cervical cancer: Linear Discriminant Analysis, Naive Bayes, K Nearest Neighbors, Decision Tree, Random Forest, and eXtreme Gradient Boosting. External validation was conducted using the CGCI-HTMCP-CC dataset, and model accuracy was assessed through ROC curve analysis. RESULTS A prognostic model based on DEGs from bulk- and scRNA-seq data was successfully constructed. Ten genes (CXCL8, DLC1, GRN, MPLKIP, PRDX1, RUNX1, SNX3, TFRC, UBE2V2, and UQCRC1) were screened by feature selection and used for model construction. Random Forest exhibited the best performance in predicting the risk of cervical cancer. Patients in the high-risk group had worse overall survival than those in the low-risk group. CONCLUSION Our model based on DEGs from bulk-seq and scRNA-seq data effectively evaluates the prognosis of cervical cancer and provides valuable insights for comprehensive clinical management.
Affiliation(s)
- Rui Shi
- Department of Obstetrics and Gynecology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Linlin Chang
- Department of Obstetrics and Gynecology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Liya Shi
- Department of Reproductive Medicine Center, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Zhouxiang Zhang
- Department of Obstetrics and Gynecology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China
- Limin Zhang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital, Fujian Medical University, Fuzhou, 350005, China; Department of Obstetrics and Gynecology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350212, China.
- Xiaona Li
- Department of Obstetrics and Gynecology, Shanghai East Hospital, Tongji University School of Medicine, Shanghai, 200120, China.
28
Roy S, Singh J, Ray SS. Weighted Combination of Łukasiewicz implication and Fuzzy Jaccard similarity in Hybrid Ensemble Framework (WCLFJHEF) for Gene Selection. Comput Biol Med 2024; 170:107981. [PMID: 38262204 DOI: 10.1016/j.compbiomed.2024.107981] [Received: 07/25/2023] [Revised: 01/02/2024] [Accepted: 01/12/2024] [Indexed: 01/25/2024]
Abstract
A framework for gene selection in cancer gene expression analysis is developed by introducing fuzzy Jaccard similarity (FJS) and combining it, through weights, with Łukasiewicz implication in a hybrid ensemble framework. The method is called weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity in hybrid ensemble framework (WCLFJHEF). Fuzziness is incorporated into Jaccard similarity using the existing Gödel fuzzy logic. The weights are obtained by maximizing the average F-score of the selected genes in classifying cancer patients. The patients are first divided into clusters, according to the number of patient groups. The clustering uses average linkage agglomerative clustering and a new score called WCLFJ (weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity). Genes are then selected from each cluster separately using filter-based Relief-F and wrapper-based SVMRFE (Support Vector Machine with Recursive Feature Elimination). A gene (feature) pool is created from the union of the features selected for all clusters. A set of informative genes is selected from the pool using the sequential backward floating search (SBFS) algorithm. Patients are then classified separately with Naïve Bayes (NB) and Support Vector Machine (SVM) using the selected genes, and the corresponding F-scores are calculated. The weights in WCLFJ are updated iteratively to maximize the average F-score obtained from the classifiers. The effectiveness of WCLFJHEF is demonstrated on six gene expression datasets. The average accuracy, F-score, recall, precision, and MCC over all datasets are 95%, 94%, 94%, 94%, and 90%, respectively. The explainability of the selected genes is shown using SHapley Additive exPlanations (SHAP) values, and this information is further used to rank them.
The relevance of the selected gene set is biologically validated using KEGG pathways, Gene Ontology (GO), and the existing literature. The genes selected by WCLFJHEF are candidates for genomic alterations in the various cancer types. The source code of WCLFJHEF is available at http://www.isical.ac.in/~shubhra/WCLFJHEF.html.
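The two building blocks named above have standard forms:

- Fuzzy Jaccard similarity under Gödel logic, where intersection is min and union is max: Σ min(aᵢ, bᵢ) / Σ max(aᵢ, bᵢ).
- Łukasiewicz implication: I(x, y) = min(1, 1 - x + y).

The sketch below implements both for membership vectors in [0, 1]. The way they are combined (a convex weight w applied to the mean elementwise implication) is a hypothetical illustration, not the WCLFJ score as defined by the authors. For the actual definition, consult the linked source code.

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity with Gödel operators (min / max).

    a, b: equal-length sequences of membership values in [0, 1].
    """
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 if den == 0 else num / den


def lukasiewicz_implication(x, y):
    """Łukasiewicz implication I(x, y) = min(1, 1 - x + y)."""
    return min(1.0, 1.0 - x + y)


def combined_score(a, b, w):
    """Hypothetical weighted combination (not the paper's exact WCLFJ):
    w * mean elementwise Łukasiewicz implication + (1 - w) * fuzzy Jaccard."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    impl = sum(lukasiewicz_implication(x, y) for x, y in zip(a, b)) / len(a)
    return w * impl + (1.0 - w) * fuzzy_jaccard(a, b)


print(fuzzy_jaccard([0.2, 0.8], [0.4, 0.6]))  # (0.2 + 0.6) / (0.4 + 0.8)
```

In the framework described above, the weight would be tuned iteratively against the classifiers' average F-score rather than fixed by hand.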
Affiliation(s)
- Sukriti Roy
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India.
- Joginder Singh
- Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
- Shubhra Sankar Ray
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India; Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
29
Liang P, Li H, Long C, Liu M, Zhou J, Zuo Y. Chromatin region binning of gene expression for improving embryo cell subtype identification. Comput Biol Med 2024; 170:108049. [PMID: 38290319 DOI: 10.1016/j.compbiomed.2024.108049] [Received: 10/10/2023] [Revised: 01/01/2024] [Accepted: 01/26/2024] [Indexed: 02/01/2024]
Abstract
Mammalian embryonic development is a complex process, characterized by intricate spatiotemporal dynamics and distinct chromatin preferences. However, rapid diversification in early embryogenesis produces substantial cellular diversity. Together with the sparsity of scRNA-seq data, this makes it challenging to determine cell fate decisions accurately. In this study, we introduce scChrBin, a chromatin region binning method designed to identify chromatin regions that elucidate the dynamics of embryonic development and lineage differentiation. The method transforms scRNA-seq data into a chromatin-based matrix using genomic annotations. Our results show that scChrBin achieves high accuracy, 98.0% and 89.2% on two single-cell embryonic datasets, demonstrating its effectiveness in analyzing complex developmental processes. We also systematically and comprehensively analyzed these key chromatin binning regions and their associated genes, focusing on their roles in lineage and stage development. The chromatin region binning approach enables a comprehensive analysis of transcriptome data at the chromatin level, revealing the dynamic expression of chromatin regions across temporal and spatial development. The tool is available as an application at https://github.com/liameihao/scChrBin.
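The abstract says only that scChrBin turns scRNA-seq data into a chromatin-based matrix using genomic annotations. One plausible reading is to sum the expression of genes whose start coordinates fall into the same fixed-size genomic window. That reading is an assumption, not scChrBin's documented algorithm. The 1 Mb bin size, the input dictionaries, and the gene names below are hypothetical.

```python
from collections import defaultdict


def bin_expression_by_chromatin(expression, gene_positions, bin_size=1_000_000):
    """Aggregate one cell's gene expression into chromatin-region bins.

    expression: {gene: value} for a single cell.
    gene_positions: {gene: (chromosome, start_bp)} from a genome annotation.
    Each gene is assigned to the bin (chromosome, start_bp // bin_size) and
    values in the same bin are summed; unannotated genes are ignored.
    """
    bins = defaultdict(float)
    for gene, value in expression.items():
        if gene not in gene_positions:
            continue
        chrom, start = gene_positions[gene]
        bins[(chrom, start // bin_size)] += value
    return dict(bins)


# Hypothetical cell: genes A and B share bin 0 on chr1, C starts bin 1,
# and D has no annotation.
expr = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 5.0}
pos = {"A": ("chr1", 100), "B": ("chr1", 999_999), "C": ("chr1", 1_000_000)}
print(bin_expression_by_chromatin(expr, pos))
```

Applying this to every cell and aligning the bin keys would give a cells-by-regions matrix that a classifier could use in place of the cells-by-genes matrix.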
Affiliation(s)
- Pengfei Liang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Hanshuang Li
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Chunshen Long
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Mingzhu Liu
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Jian Zhou
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Institutes of Biomedical Sciences, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
30
Zhang C, Xue Y, Neri F, Cai X, Slowik A. Multi-Objective Self-Adaptive Particle Swarm Optimization for Large-Scale Feature Selection in Classification. Int J Neural Syst 2024; 34:2450014. [PMID: 38352979 DOI: 10.1142/s012906572450014x] [Indexed: 02/20/2024]
Abstract
Feature selection (FS) is recognized for its role in enhancing the performance of learning algorithms, especially for high-dimensional datasets. In recent times, FS has been framed as a multi-objective optimization problem, leading to the application of various multi-objective evolutionary algorithms (MOEAs) to address it. However, the solution space expands exponentially with the dataset's dimensionality. Simultaneously, the extensive search space often results in numerous local optimal solutions due to a large proportion of unrelated and redundant features [H. Adeli and H. S. Park, Fully automated design of super-high-rise building structures by a hybrid ai model on a massively parallel machine, AI Mag. 17 (1996) 87-93]. Consequently, existing MOEAs struggle with local optima stagnation, particularly in large-scale multi-objective FS problems (LSMOFSPs). Different LSMOFSPs generally exhibit unique characteristics, yet most existing MOEAs rely on a single candidate solution generation strategy (CSGS), which may be less efficient for diverse LSMOFSPs [H. S. Park and H. Adeli, Distributed neural dynamics algorithms for optimization of large steel structures, J. Struct. Eng. ASCE 123 (1997) 880-888; M. Aldwaik and H. Adeli, Advances in optimization of highrise building structures, Struct. Multidiscip. Optim. 50 (2014) 899-919; E. G. González, J. R. Villar, Q. Tan, J. Sedano and C. Chira, An efficient multi-robot path planning solution using a* and coevolutionary algorithms, Integr. Comput. Aided Eng. 30 (2022) 41-52]. Moreover, selecting an appropriate MOEA and determining its corresponding parameter values for a specified LSMOFSP is time-consuming. To address these challenges, a multi-objective self-adaptive particle swarm optimization (MOSaPSO) algorithm is proposed, combined with a rapid nondominated sorting approach. MOSaPSO employs a self-adaptive mechanism, along with five modified efficient CSGSs, to generate new solutions. 
Experiments were conducted on ten datasets, and the results demonstrate that the number of features is effectively reduced by MOSaPSO while lowering the classification error rate. Furthermore, superior performance is observed in comparison to its counterparts on both the training and test sets, with advantages becoming increasingly evident as the dimensionality increases.
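The abstract does not describe its "rapid nondominated sorting approach". For reference, the sketch below shows the standard fast nondominated sort popularized by NSGA-II; this is a swapped-in textbook version, not necessarily the variant used in MOSaPSO. It assumes the two objectives are the number of selected features and the classification error rate, both minimized. The candidate values are hypothetical.

```python
def dominates(p, q):
    """True if p Pareto-dominates q, assuming every objective is minimized."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def nondominated_sort(points):
    """Split objective vectors into Pareto fronts (fast nondominated sorting).

    points: list of tuples such as (num_selected_features, error_rate).
    Returns a list of fronts, each a list of indices into points; front 0 is
    the nondominated set.
    """
    n = len(points)
    dominated_set = [[] for _ in range(n)]  # indices that solution i dominates
    domination_count = [0] * n              # how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_set[i].append(j)
            elif dominates(points[j], points[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated_set[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
        k += 1
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical candidate subsets: (number of features, classification error).
candidates = [(5, 0.10), (3, 0.20), (8, 0.05), (6, 0.15), (9, 0.30)]
print(nondominated_sort(candidates))  # [[0, 1, 2], [3], [4]]
```

Each pass compares every pair of solutions, so the cost is O(M·N²) for M objectives and N solutions. That cost grows quickly as swarms get larger.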
Affiliation(s)
- Chenyi Zhang
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, P. R. China
- Yu Xue
- School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing 210044, P. R. China
- Ferrante Neri
- NICE Research Group, School of Computer Science and Electronic Engineering, University of Surrey, Guildford, GU2 7XS, UK
- Xu Cai
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, P. R. China
- Adam Slowik
- Department of Electronics and Computer Science, Koszalin University of Technology, Koszalin 75-453, Poland
31
Xing J, Li C, Wu P, Cai X, Ouyang J. Optimized fuzzy K-nearest neighbor approach for accurate lung cancer prediction based on radial endobronchial ultrasonography. Comput Biol Med 2024; 171:108038. [PMID: 38442552 DOI: 10.1016/j.compbiomed.2024.108038] [Received: 10/12/2023] [Revised: 01/02/2024] [Accepted: 01/26/2024] [Indexed: 03/07/2024]
Abstract
Radial endobronchial ultrasonography (R-EBUS) has emerged as a new ultrasonographic technique for diagnosing pulmonary diseases beyond the central airway. However, it faces challenges in accurately pinpointing the location of abnormal lesions. This study therefore proposes an improved machine learning model to distinguish malignant lung disease (MLD) from benign lung disease (BLD) using R-EBUS features. First, an enhanced manta ray foraging optimization based on elite perturbation search and a cyclic mutation strategy (ECMRFO) is introduced. Experimental validation on 29 test functions from CEC 2017 demonstrates that ECMRFO has superior optimization capability and robustness compared with competing algorithms. ECMRFO is then combined with fuzzy k-nearest neighbor to classify BLD and MLD. Experimental results indicate that the proposed model achieves a prediction accuracy of up to 99.38%. Additionally, parameters such as R-EBUS1 Circle-dense sign, R-EBUS2 Hemi-dense sign, R-EBUS5 Onionskin sign, and CCT5 mediastinum lymph node are identified as having significant clinical diagnostic value.
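The sketch below shows the classic fuzzy k-nearest neighbor rule with crisp training labels. Each of the k nearest neighbors contributes to its class membership with weight 1 / d^(2/(m-1)), where d is the distance and m > 1 is the fuzzifier. The memberships are then normalized to sum to 1. The abstract implies that ECMRFO tunes this classifier, presumably parameters such as k and m; that part is an assumption and is not shown. The BLD/MLD labels come from the abstract, but the feature points are invented.

```python
import math


def fuzzy_knn(train_x, train_y, x, k=3, m=2.0, eps=1e-12):
    """Fuzzy k-nearest neighbour class memberships (crisp training labels).

    Each of the k nearest neighbours adds weight 1 / d^(2 / (m - 1)) to its
    own class, where d is the Euclidean distance to x and m > 1 is the
    fuzzifier; eps avoids division by zero when x coincides with a sample.
    Returns {label: membership} with memberships summing to 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    neighbours = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    memberships = {label: 0.0 for label in set(train_y)}
    total = 0.0
    for point, label in neighbours:
        w = 1.0 / (math.dist(point, x) ** (2.0 / (m - 1.0)) + eps)
        memberships[label] += w
        total += w
    return {label: v / total for label, v in memberships.items()}


# Invented 2-D feature vectors for illustration only.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["BLD", "BLD", "BLD", "MLD", "MLD", "MLD"]
mem = fuzzy_knn(train_x, train_y, (0.2, 0.2), k=3)
print(max(mem, key=mem.get))  # BLD
```

Unlike crisp k-NN, the output is a graded membership per class, so a borderline case shows up as split memberships rather than a hard vote.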
Affiliation(s)
- Jie Xing
- Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou, 325035, China.
- Chengye Li
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
- Peiliang Wu
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
- Xueding Cai
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
- Jinsheng Ouyang
- Department of Pulmonary and Critical Care Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
| |
Collapse
|
32
|
Lin L, Long Y, Liu J, Deng D, Yuan Y, Liu L, Tan B, Qi H. FRP-XGBoost: Identification of ferroptosis-related proteins based on multi-view features. Int J Biol Macromol 2024; 262:130180. [PMID: 38360239 DOI: 10.1016/j.ijbiomac.2024.130180] [Received: 12/05/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 02/17/2024]
Abstract
Ferroptosis represents a novel form of programmed cell death. Pan-cancer bioinformatics analysis indicates that identifying and modulating ferroptosis offers innovative approaches for preventing and treating diverse tumor pathologies. However, precisely detecting ferroptosis-related proteins with conventional wet-laboratory techniques remains a formidable challenge, largely because of the constraints of existing methodologies, which are both labor-intensive and costly. Consequently, more sophisticated and efficient computational tools are needed to detect these proteins. In this paper, we present FRP-XGBoost, a machine learning method based on XGBoost and multi-view features for predicting ferroptosis-related proteins. We explored four types of protein feature extraction methods and evaluated their effectiveness in predicting ferroptosis-related proteins using six of the most commonly used traditional classifiers. To enhance the representational power of the hybrid features, we employed a two-step feature selection technique to identify the optimal feature subset. We then constructed a prediction model using the XGBoost algorithm. FRP-XGBoost achieved an accuracy of 96.74 % in 10-fold cross-validation and 91.52 % on an independent test. The implementation source code of FRP-XGBoost is available at https://github.com/linli5417/FRP-XGBoost.
Affiliation(s)
- Li Lin
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
- Yao Long
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Jinkai Liu
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Dongliang Deng
- Department of Oncology, Chongqing Traditional Chinese Medicine Hospital, Chongqing 400021, China
- Yu Yuan
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
- Lubin Liu
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
- Bin Tan
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
- Hongbo Qi
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China; Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China.
33
Ye Z, Peng J, Zhang X, Song L. Identification of OSAHS patients based on ReliefF-mRMR feature selection. Phys Eng Sci Med 2024; 47:99-108. [PMID: 37878092 DOI: 10.1007/s13246-023-01345-1] [Received: 03/29/2023] [Accepted: 10/09/2023] [Indexed: 10/26/2023]
Abstract
Obstructive Sleep Apnea Hypopnea Syndrome (OSAHS) is a serious chronic sleep disorder. Snoring is a common and easily observable symptom of OSAHS. The purpose of this work is to identify OSAHS patients by analyzing the acoustic characteristics of their snoring sounds throughout the entire night. Ten types of acoustic features, including Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC) and spectral entropy, were extracted from the snoring sounds. A fused feature selection algorithm based on ReliefF and Max-Relevance and Min-Redundancy (mRMR) was proposed to select the optimal feature set. Four types of machine learning models were then applied to validate the effectiveness of OSAHS patient identification. The results show that the proposed feature selection algorithm effectively selects features with high contribution, including MFCC and LPC. Using the top 20 selected features and a support vector machine model, the accuracies in identifying OSAHS patients at apnea-hypopnea index (AHI) thresholds of 5, 15, and 30 were 100%, 100%, and 98.94%, respectively. This indicates that the proposed model can effectively identify OSAHS patients.
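The abstract does not describe the ReliefF half of the fused selector or the exact fusion rule, so the sketch below illustrates only the mRMR part, in its common difference form: greedily pick the feature that maximizes mutual information with the label minus its mean mutual information with the features already chosen. It assumes discretized features; the feature names and toy values are hypothetical and are not snoring-sound data.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (nats) between two equally long discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((count_a[u] / n) * (count_b[v] / n)))
               for (u, v), c in count_ab.items())


def mrmr(features, target, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discretized features for 8 recordings and a 4-level target
y = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a":     [0, 0, 1, 1, 2, 2, 2, 2],  # most relevant feature
    "a_dup": [0, 0, 1, 1, 2, 2, 2, 2],  # exact copy of "a": relevant but redundant
    "b":     [0, 0, 1, 1, 0, 0, 1, 1],  # less relevant, but adds new information
}
print(mrmr(features, y, 2))  # ['a', 'b']; ranking by relevance alone would give ['a', 'a_dup']
```

The redundancy term is what lets the less relevant but complementary feature `b` beat the duplicate, which is the property a fused ReliefF-mRMR selector would rely on.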
Affiliation(s)
- Ziqiang Ye
- School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China
- Jianxin Peng
- School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China.
- Xiaowen Zhang
- State Key Laboratory of Respiratory Disease, Department of Otolaryngology-Head and Neck Surgery, Laboratory of ENT-HNS Disease, First Affiliated Hospital, Guangzhou Medical University, Guangzhou, 510120, China
- Lijuan Song
- State Key Laboratory of Respiratory Disease, Department of Otolaryngology-Head and Neck Surgery, Laboratory of ENT-HNS Disease, First Affiliated Hospital, Guangzhou Medical University, Guangzhou, 510120, China
34
Gong M, Liang D, Xu D, Jin Y, Wang G, Shan P. Analyzing predictors of in-hospital mortality in patients with acute ST-segment elevation myocardial infarction using an evolved machine learning approach. Comput Biol Med 2024; 170:107950. [PMID: 38237236 DOI: 10.1016/j.compbiomed.2024.107950] [Received: 10/23/2023] [Revised: 12/08/2023] [Accepted: 01/01/2024] [Indexed: 02/28/2024]
Abstract
Acute ST-segment elevation myocardial infarction (STEMI) is a severe cardiac condition characterized by the sudden complete blockage of a portion of a coronary artery, which interrupts the blood supply to the myocardium. This study examines the medical records of 3205 STEMI patients admitted to the coronary care unit of the First Affiliated Hospital of Wenzhou Medical University from January 2014 to December 2021. A novel predictive framework for STEMI is proposed that combines evolutionary computation with machine learning. A variant algorithm, AGCOSCA, is introduced by integrating a crossover operation and an observation bee strategy into the original Sine Cosine Algorithm (SCA). The effectiveness of AGCOSCA is first validated on the IEEE CEC 2017 benchmark functions, demonstrating its ability to mitigate SCA's weak local exploitation after random perturbation. Building on this foundation, AGCOSCA is paired with a Support Vector Machine (SVM) to form the predictive framework AGCOSCA-SVM. Specifically, AGCOSCA selects predictors from a large feature set, and SVM then forecasts the occurrence of STEMI. In our analysis, SVM handled nonlinear data relationships well, a strength that is particularly valuable in the smaller datasets of STEMI patients. To assess the effectiveness of AGCOSCA-SVM, diagnostic experiments were conducted on the STEMI sample data. Results indicate that AGCOSCA-SVM outperforms traditional machine learning methods, achieving Accuracy, Sensitivity, and Specificity values of 97.83 %, 93.75 %, and 96.67 %, respectively. The selected features, such as acute kidney injury (AKI) stage, fibrinogen, mean platelet volume (MPV), free triiodothyronine (FT3), diuretics, and Killip class during hospitalization, are identified as crucial for predicting STEMI.
In conclusion, AGCOSCA-SVM emerges as a promising model framework for supporting the diagnostic process of STEMI, showcasing potential applications in clinical settings.
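AGCOSCA builds on the original Sine Cosine Algorithm (SCA). For reference, here is a minimal sketch of basic SCA only: each coordinate moves by r1·sin(r2) or r1·cos(r2) times its distance to a randomly scaled copy of the best solution, with r1 shrinking linearly over the run. The crossover operation and observation-bee strategy that distinguish AGCOSCA, and its coupling to SVM feature selection, are not shown; the sphere test function and all parameter defaults are assumptions.

```python
import math
import random


def sca_minimize(f, dim, lower, upper, pop_size=30, iters=500, a=2.0, seed=0):
    """Basic Sine Cosine Algorithm (SCA) for box-constrained minimization."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=f)[:]          # destination point P (best so far)
    best_val = f(best)
    for t in range(iters):
        r1 = a - t * a / iters          # shrinks linearly: exploration, then exploitation
        for x in pop:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                dist = abs(r3 * best[j] - x[j])
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[j] += r1 * wave * dist
                x[j] = min(max(x[j], lower), upper)  # keep within the search box
            val = f(x)
            if val < best_val:
                best, best_val = x[:], val
    return best, best_val


def sphere(x):
    """Test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = sca_minimize(sphere, dim=5, lower=-10.0, upper=10.0)
```

In a wrapper setting such as AGCOSCA-SVM, `f` would instead score a candidate feature subset by SVM performance; that part is left out here.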
Affiliation(s)
- Mengge Gong
- Department of Cardiovascular Medicine, The Heart Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Dongjie Liang
- Department of Cardiovascular Medicine, The Heart Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Diyun Xu
- Department of Cardiovascular Medicine, The Heart Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Youkai Jin
- Department of Cardiovascular Medicine, The Heart Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China.
- Guoqing Wang
- Zhejiang Suosi Technology Co. Ltd, Wenzhou, 325000, Zhejiang, China.
- Peiren Shan
- Department of Cardiovascular Medicine, The Heart Center, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, Zhejiang, China; Key Laboratory of Intelligent Treatment and Life Support for Critical Diseases of Zhejiang Province, Wenzhou, 325000, Zhejiang, China; Zhejiang Engineering Research Center for Hospital Emergency and Process Digitization, Wenzhou, 325000, Zhejiang, China.
35
Guo S, Mao C, Peng J, Xie S, Yang J, Xie W, Li W, Yang H, Guo H, Zhu Z, Zheng Y. Improved lung cancer classification by employing diverse molecular features of microRNAs. Heliyon 2024; 10:e26081. [PMID: 38384512 PMCID: PMC10878959 DOI: 10.1016/j.heliyon.2024.e26081] [Received: 01/12/2024] [Accepted: 02/07/2024] [Indexed: 02/23/2024]
Abstract
miRNAs are edited or modified in multiple ways during their biogenesis. miRNA editing has been reported to be deregulated in tumors, suggesting its potential value in cancer classification. Here we extracted three types of miRNA features from 395 lung adenocarcinoma (LUAD) and control samples: the abundances of original miRNAs, the abundances of edited miRNAs, and the editing levels of miRNA editing sites. Our results show that the eight selected classification algorithms generally performed better on combined features than on miRNA abundances or miRNA editing features alone. One feature selection algorithm, the DFL algorithm, selected only three features from 316 training samples: the frequencies of hsa-miR-135b-5p, hsa-miR-210-3p and hsa-mir-182_48u (an edited miRNA). Seven classification algorithms achieved 100% accuracy on these three features for 79 independent testing samples. These results indicate that the additional information from miRNA editing helps improve the classification of LUAD samples.
Affiliation(s)
- Shiyong Guo
- State Key Laboratory of Primate Biomedical Research; Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
- College of Horticulture and Landscape, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- College of Big Data, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Chunyi Mao
- State Key Laboratory of Primate Biomedical Research; Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
- Jun Peng
- Department of Thoracic Surgery, The First People's Hospital of Yunnan Province, i.e., The Affiliated Hospital of Kunming University of Science and Technology, Kunming, Yunnan 650032, China
- Shaohui Xie
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Jun Yang
- School of Criminal Investigation, Yunnan Police College, Kunming, Yunnan 650223, China
- Wenping Xie
- College of Horticulture and Landscape, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- College of Big Data, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Wanran Li
- College of Horticulture and Landscape, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- College of Big Data, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Huaide Yang
- College of Horticulture and Landscape, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- College of Big Data, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- Hao Guo
- Department of Cardiology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan 650032, China
- Zexuan Zhu
- National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, Guangdong 518060, China
- Yun Zheng
- College of Horticulture and Landscape, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
- College of Big Data, Yunnan Agricultural University, Kunming, Yunnan, 650201, China
36
Xie S, Lei L, Sun J, Xu J. [Research on emotion recognition method based on IWOA-ELM algorithm for electroencephalogram]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2024; 41:1-8. [PMID: 38403598 PMCID: PMC10894732 DOI: 10.7507/1001-5515.202303010] [Indexed: 02/27/2024]
Abstract
Emotion is a crucial physiological attribute in humans, and emotion recognition technology can significantly assist individuals in self-awareness. To address the large differences in electroencephalogram (EEG) signals among subjects, we introduce a novel mechanism into the traditional whale optimization algorithm (WOA) to speed up its optimization and convergence. The improved whale optimization algorithm (IWOA) was then applied to search for the optimal training solution of the extreme learning machine (ELM) model, covering the best feature set, training parameters, and EEG channels. By testing 24 common EEG emotion features, we found that the optimal EEG emotion features were partly subject-specific while also showing some commonality across subjects. The proposed method achieved an average recognition accuracy of 92.19% in EEG emotion recognition, substantially reducing the manual tuning workload and offering higher accuracy with shorter training times than the control method. It outperformed existing methods and offers a new perspective for decoding EEG signals in emotion research.
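In the IWOA-ELM setup the optimizer searches over features, training parameters, and EEG channels, while the ELM itself is the fast inner learner. Below is a minimal sketch of a standard ELM, assuming sigmoid hidden units, random input weights and biases in [-1, 1], a small ridge term `reg`, and binary targets in {-1, +1}: only the output weights are fitted, by solving a regularized least-squares system. The IWOA search loop is not shown, and the two toy clusters are hypothetical stand-ins for EEG feature vectors.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class ELM:
    """Extreme learning machine: random fixed hidden layer, least-squares output layer."""

    def __init__(self, n_hidden=20, reg=1e-2, seed=0):
        self.n_hidden, self.reg = n_hidden, reg
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activations of the random (untrained) hidden layer
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.W, self.b)]

    def fit(self, X, y):
        d, L = len(X[0]), self.n_hidden
        # Input weights and biases are drawn once and never updated
        self.W = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Ridge-regularized normal equations: (H^T H + reg * I) beta = H^T y
        A = [[sum(h[i] * h[j] for h in H) + (self.reg if i == j else 0.0)
              for j in range(L)] for i in range(L)]
        rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(L)]
        self.beta = solve_linear(A, rhs)
        return self

    def predict(self, X):
        # Sign of the linear read-out gives the class label in {-1, +1}
        return [1 if sum(bj * hj for bj, hj in zip(self.beta, self._hidden(x))) >= 0 else -1
                for x in X]


# Hypothetical toy data: two well-separated clusters standing in for EEG feature vectors
X_train = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
           (2.0, 2.0), (2.2, 1.9), (1.8, 2.1), (2.1, 2.3)]
y_train = [-1, -1, -1, -1, 1, 1, 1, 1]
model = ELM().fit(X_train, y_train)
```

Because training reduces to one linear solve, each candidate evaluated by an outer optimizer such as IWOA is cheap, which fits the abstract's claim of shorter training times.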
Affiliation(s)
- Songyun Xie
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, P. R. China
- Lingjun Lei
- Medical Research Institute, Northwestern Polytechnical University, Xi'an 710129, P. R. China
- Jiang Sun
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, P. R. China
- Jian Xu
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, P. R. China
37
Atimbire SA, Appati JK, Owusu E. Empirical exploration of whale optimisation algorithm for heart disease prediction. Sci Rep 2024; 14:4530. [PMID: 38402276 PMCID: PMC10894250 DOI: 10.1038/s41598-024-54990-1] [Received: 08/24/2023] [Accepted: 02/19/2024] [Indexed: 02/26/2024]
Abstract
Heart diseases have the highest mortality worldwide, necessitating precise predictive models for early risk assessment. Much existing research has focused on improving model accuracy on single datasets, often neglecting comprehensive evaluation metrics and the use of different datasets within the same domain (heart disease). This research introduces a heart disease risk prediction approach that uses the whale optimization algorithm (WOA) for feature selection together with a comprehensive evaluation framework. The study uses five distinct datasets: a combined dataset comprising the Cleveland, Long Beach VA, Switzerland, and Hungarian heart disease datasets, plus the Z-AlizadehSani, Framingham, South African, and Cleveland heart datasets. The WOA-guided feature selection identifies optimal features, which are then fed into ten classification models. Comprehensive model evaluation reveals significant improvements across critical performance metrics, including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve. These models consistently outperform state-of-the-art methods on the same datasets, validating the effectiveness of the methodology. The comprehensive evaluation framework provides a robust assessment of the model's adaptability, underscoring the WOA's effectiveness in identifying optimal features across multiple datasets in the same domain.
Affiliation(s)
- Ebenezer Owusu
- Department of Computer Science, University of Ghana, Accra, Ghana
38
Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep 2024; 14:4463. [PMID: 38396246 PMCID: PMC10891130 DOI: 10.1038/s41598-024-55160-z] [Received: 12/28/2023] [Accepted: 02/21/2024] [Indexed: 02/25/2024]
Abstract
The voltage-gated sodium (Nav) channel is a crucial molecular component responsible for initiating and propagating action potentials. While the α subunit, which forms the channel pore, plays a central role in this function, the complete physiological function of Nav channels relies on crucial interactions between the α subunit and auxiliary proteins, known as protein-protein interactions (PPI). Nav blocking peptides (NaBPs) have been recognized as promising alternative therapeutic agents for pain and itch. Although traditional experimental methods can precisely determine the effect and activity of NaBPs, they remain time-consuming and costly. Hence, machine learning (ML)-based methods capable of accurately predicting NaBPs in silico are highly desirable. In this study, we develop an innovative meta-learning-based NaBP prediction method (MetaNaBP). MetaNaBP generates new feature representations by combining a wide range of sequence-based feature descriptors, covering multiple perspectives, with powerful ML algorithms. These feature representations were then optimized to identify informative features using a two-step feature selection method. Finally, the selected informative features were used to develop the final meta-predictor. To the best of our knowledge, MetaNaBP is the first meta-predictor for NaBP prediction. Experimental results demonstrated that MetaNaBP achieved an accuracy of 0.948 and a Matthews correlation coefficient of 0.898 on the independent test dataset, which were 5.79% and 11.76% higher than those of the existing method. In addition, the discriminative power of our feature representations surpassed that of conventional feature descriptors on both the training and independent test datasets. We anticipate that MetaNaBP will be used for the large-scale prediction and analysis of NaBPs to narrow down the pool of potential NaBPs.
Affiliation(s)
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
- Nutta Homdee
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
- Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
39
Chang JR, Yao ZF, Hsieh S, Nordling TEM. Age Prediction Using Resting-State Functional MRI. Neuroinformatics 2024:10.1007/s12021-024-09653-x. [PMID: 38341830 DOI: 10.1007/s12021-024-09653-x] [Accepted: 12/21/2023] [Indexed: 02/13/2024]
Abstract
The increasing lifespan and large individual differences in cognitive capability highlight the importance of understanding how the brain ages. Unlike visible signs of bodily aging, such as greying hair and loss of muscle mass, the internal changes within our brains remain less apparent until they impair function. Brain age, distinct from chronological age, reflects the brain's health status and may deviate from chronological age. Notably, brain age has been associated with mortality and depression. The brain is plastic and can compensate even for severe structural damage by rewiring, so functional characterization offers insights that structural characterization cannot provide. In contrast to the many studies relying on structural magnetic resonance imaging (MRI), we use resting-state functional MRI (rsfMRI). We also address the inclusion of subjects with abnormal brain aging through outlier removal. In this study, we employ the Least Absolute Shrinkage and Selection Operator (LASSO) to identify the 39 most predictive correlations derived from the rsfMRI data. The data come from a cohort of 176 healthy right-handed volunteers aged 18-78 years (95/81 male/female, mean age 48, SD 17) collected at the Mind Research Imaging Center at National Cheng Kung University. We establish a normal reference model by excluding 68 outliers, which achieves a leave-one-out mean absolute error of 2.48 years. By asking which additional features are needed to predict the chronological age of the outliers with a smaller error, we identify correlations predictive of abnormal aging. These are associated with the Default Mode Network (DMN). Our normal reference model has the lowest prediction error among published models evaluated on adult subjects of almost all ages and is thus a candidate for screening for abnormal brain aging that has not yet manifested as cognitive decline.
This study advances our ability to predict brain aging and provides insights into potential biomarkers for assessing brain age, suggesting that the role of DMN in brain aging should be studied further.
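The study uses LASSO to retain 39 predictive correlations. The sketch below is a generic LASSO solved by cyclic coordinate descent with soft-thresholding, assuming the objective (1/(2n))·‖y - b - Xw‖² + α‖w‖₁ on centered data; it is not the authors' pipeline, and the toy data (one informative column plus one irrelevant column orthogonal to it) are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """LASSO by cyclic coordinate descent.
    Minimizes (1/(2n)) * ||y - b - X w||^2 + alpha * ||w||_1; returns (w, b)."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]  # centering handles b
    yc = [v - y_mean for v in y]
    w = [0.0] * d
    resid = yc[:]  # residual yc - Xc w, with w starting at zero
    for _ in range(n_iter):
        for j in range(d):
            z = sum(r[j] ** 2 for r in Xc) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / z
            for i in range(n):
                resid[i] -= Xc[i][j] * (new_wj - w[j])
            w[j] = new_wj
    intercept = y_mean - sum(m * wj for m, wj in zip(x_mean, w))
    return w, intercept


# Hypothetical toy data: y depends only on the first column
X = [(1, 1), (2, 0), (3, 0), (4, 1), (5, 0), (6, 1), (7, 1), (8, 0)]
y = [3 * row[0] + 1 for row in X]
w, b = lasso_cd(X, y, alpha=0.525)
```

With `alpha=0.525` the informative coefficient is shrunk from 3 to 2.9 and the irrelevant one is set exactly to 0, which is the sparsity property that lets LASSO act as a feature selector.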
Affiliation(s)
- Jose Ramon Chang
- Department of Mechanical Engineering, National Cheng Kung University, No. 1 University Rd., Tainan, 701, Taiwan
- Zai-Fu Yao
- College of Education, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Research Center for Education and Mind Sciences, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Department of Kinesiology, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Basic Psychology Group, Department of Educational Psychology and Counseling, National Tsing Hua University, Hsinchu, 30013, Taiwan
- Shulan Hsieh
- Department of Psychology, National Cheng Kung University, No. 1 University Rd., Tainan, 701, Taiwan
- Institute of Allied Health Sciences, National Cheng Kung University, No. 1 University Rd., Tainan, 701, Taiwan
- Department of Public Health, College of Medicine, National Cheng Kung University, No. 1 University Rd., Tainan, 701, Taiwan
- Torbjörn E M Nordling
- Department of Mechanical Engineering, National Cheng Kung University, No. 1 University Rd., Tainan, 701, Taiwan.
40
Zhang S, Fan Z. Characterization of three-dimensional surface-breaking slots based on regression analysis of ultrasonic Rayleigh wave simulations. Ultrasonics 2024; 138:107261. [PMID: 38350313 DOI: 10.1016/j.ultras.2024.107261] [Received: 08/09/2023] [Revised: 12/10/2023] [Accepted: 02/04/2024] [Indexed: 02/15/2024]
Abstract
Rayleigh waves travel along the surface of a solid structure, with most of their energy concentrated within a depth of one wavelength. The reflection coefficient from a surface-breaking crack is therefore highly sensitive to the ratio between the crack depth and the wavelength, and the depth of surface-breaking cracks can be characterized by measuring features of the reflected waves. However, a single feature value can correspond to multiple depth-wavelength ratios, i.e., the mapping is non-univalent, which makes crack sizing from that feature difficult. In this work, we use finite element method (FEM) software to perform 3-D numerical analysis of the interaction between Rayleigh waves and surface-breaking slots with various 3-D geometries. Multiple features are selected based on nearest neighbour regression analysis of a numerical dataset, ensuring that a univalent mapping from the selected features to the slot depth can be established. This relationship is then applied experimentally to predict the depth of real slots with different geometries, showing reasonable accuracy.
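The abstract uses nearest neighbour regression to choose a feature combination whose mapping to slot depth is univalent. A minimal sketch, assuming plain k-NN regression with Euclidean distance and two synthetic features, one symmetric in depth (hence non-univalent) and one monotonic, shows why adding a second feature can resolve the ambiguity. The depth/wavelength ratios and feature formulas are hypothetical, not FEM results.

```python
import math


def knn_regress(train_X, train_y, query, k=2):
    """k-NN regression: average the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def feat_symmetric(d):
    return (d - 0.5) ** 2   # same value for d and 1 - d, so not univalent


def feat_monotonic(d):
    return 0.3 * d          # increases with depth, breaking the tie


# Hypothetical simulated dataset of depth/wavelength ratios
depths = [i / 10 for i in range(1, 10)]
one_feature = [(feat_symmetric(d),) for d in depths]
two_features = [(feat_symmetric(d), feat_monotonic(d)) for d in depths]

true_depth = 0.75
single = knn_regress(one_feature, depths, (feat_symmetric(true_depth),))
both = knn_regress(two_features, depths,
                   (feat_symmetric(true_depth), feat_monotonic(true_depth)))
```

With the symmetric feature alone, the two nearest neighbours are the mirror-image depths 0.3 and 0.7, so the estimate collapses to 0.5; adding the monotonic feature selects 0.7 and 0.8 and recovers 0.75.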
Affiliation(s)
- Shengyuan Zhang
- School of Mechanical and Aerospace Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
- Zheng Fan
- School of Mechanical and Aerospace Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore.
41
Yang G, Li W, Xie W, Wang L, Yu K. An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data. Comput Methods Programs Biomed 2024; 244:107987. [PMID: 38157825 DOI: 10.1016/j.cmpb.2023.107987] [Received: 09/13/2023] [Revised: 11/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND AND OBJECTIVE The small number of samples and high-dimensional features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal feature set in limited time when dealing with high-dimensional feature selection problems. This work proposes a new solution to these problems. METHODS In this paper, we propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering with an improved binary particle swarm optimization and incorporates an embedded feature elimination strategy. First, an adaptive redundant-feature judgment method based on correlation clustering is proposed for feature screening, reducing the search space for the subsequent stage. Second, we propose an improved flipping-probability-based binary particle swarm optimization (IFBPSO) that is better suited to binary particle swarm optimization. Finally, we design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm, which gradually removes poorer features during the iterations to reduce the number of features and improve accuracy. RESULTS We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the algorithm's performance. CONCLUSIONS The hybrid feature selection method proposed in this paper helps address the problem of high-dimensional microarray data with few samples.
It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE have strong correlations with disease phenotypes and can identify important biomarkers from data related to biomedical problems.
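For orientation, the following is a minimal sketch of standard binary PSO with a sigmoid transfer function, the kind of baseline that IFBPSO modifies; the improved flipping probability, the correlation-clustering pre-screening, and the embedded feature elimination of C-IFBPFE are not reproduced. The toy fitness (agreement with a hidden "informative feature" mask) and all parameter values are assumptions; a real wrapper would score each mask with a classifier.

```python
import math
import random


def bpso_maximize(fitness, n_bits, n_particles=30, iters=200,
                  w=0.7, c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Standard binary PSO with a sigmoid transfer function (maximization)."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    V = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(n_bits):
                v = (w * V[i][j]
                     + c1 * rng.random() * (pbest[i][j] - X[i][j])
                     + c2 * rng.random() * (gbest[j] - X[i][j]))
                V[i][j] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid of the velocity is the probability that bit j is 1
                X[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-V[i][j])) else 0
            fit = fitness(X[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = X[i][:], fit
    return gbest, gbest_fit


# Hypothetical toy objective: number of bits agreeing with a hidden "informative feature" mask
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]


def toy_fitness(mask):
    return sum(1 for a, b in zip(mask, TARGET) if a == b)


best_mask, best_fit = bpso_maximize(toy_fitness, n_bits=len(TARGET))
```

Each bit of a particle is one include/exclude decision for a feature, so the sigmoid-of-velocity rule is effectively a per-feature flipping probability, the quantity that IFBPSO is described as improving.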
Collapse
Affiliation(s)
- Guicheng Yang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Wei Li
- Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, 110000, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, 110819, Liaoning, China.
- Weidong Xie
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Linjie Wang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Kun Yu
- College of Medicine and Bioinformation Engineering, Northeastern University, Shenyang, 110819, Liaoning, China.
42
Jenul A, Stokmo HL, Schrunner S, Hjortland GO, Revheim ME, Tomic O. Novel ensemble feature selection techniques applied to high-grade gastroenteropancreatic neuroendocrine neoplasms for the prediction of survival. Comput Methods Programs Biomed 2024; 244:107934. [PMID: 38016391 DOI: 10.1016/j.cmpb.2023.107934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 09/05/2023] [Accepted: 11/17/2023] [Indexed: 11/30/2023]
Abstract
BACKGROUND AND OBJECTIVE Determining the most informative features for predicting the overall survival of patients diagnosed with high-grade gastroenteropancreatic neuroendocrine neoplasms is crucial to improve individual treatment plans for patients, as well as the biological understanding of the disease. The main objective of this study is to evaluate the use of modern ensemble feature selection techniques for this purpose with respect to (a) quantitative performance measures such as predictive performance, (b) clinical interpretability, and (c) the effect of integrating prior expert knowledge. METHODS The Repeated Elastic Net Technique for Feature Selection (RENT) and the User-Guided Bayesian Framework for Feature Selection (UBayFS) are recently developed ensemble feature selectors investigated in this work. Both allow the user to identify informative features in datasets with low sample sizes and focus on model interpretability. While RENT is purely data-driven, UBayFS can integrate expert knowledge a priori in the feature selection process. In this work, we compare both feature selectors on a dataset comprising 63 patients and 110 features from multiple sources, including baseline patient characteristics, baseline blood values, tumor histology, imaging, and treatment information. RESULTS Our experiments involve data-driven and expert-driven setups, as well as combinations of both. In a five-fold cross-validated experiment without expert knowledge, our results demonstrate that both feature selectors allow accurate predictions: A reduction from 110 to approximately 20 features (around 82%) delivers near-optimal predictive performances with minor variations according to the choice of the feature selector, the predictive model, and the fold. Thereafter, we use findings from clinical literature as a source of expert knowledge. 
In addition, expert knowledge has a stabilizing effect on the feature set (an increase in stability of approximately 40%), while the impact on predictive performance is limited. CONCLUSIONS The features WHO Performance Status, Albumin, Platelets, Ki-67, Tumor Morphology, Total MTV, Total TLG, and SUVmax are the most stable and predictive features in our study. Overall, this study demonstrated the practical value of feature selection in medical applications not only to improve quantitative performance but also to deliver potentially new insights to experts.
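RENT fits many elastic-net models on different subsets of the training data. It then decides which features to keep from the distribution of their coefficients across those models. As we understand the original RENT method, two of its selection criteria are:

- how often a feature's coefficient is nonzero across the K models;
- how consistent the sign of that coefficient is.

A third, t-test-based criterion is omitted here. The Python sketch below shows only this aggregation step on an already-computed K×p coefficient matrix. The function name, the toy coefficients and the 0.9 cut-offs are hypothetical choices, not values from the study.

```python
import numpy as np


def rent_style_selection(coefs, tau1=0.9, tau2=0.9):
    """Aggregate coefficients from K repeated elastic-net fits.

    coefs: (K, p) array, one row of fitted coefficients per model.
    Returns (selected indices, selection frequency, sign consistency).
    """
    coefs = np.asarray(coefs, dtype=float)
    # Criterion 1: fraction of models in which the coefficient is nonzero.
    freq = np.mean(coefs != 0, axis=0)
    # Criterion 2: |mean sign| over the K models (1.0 = sign never flips;
    # zeros contribute 0, so this can never exceed the frequency above).
    sign_consistency = np.abs(np.mean(np.sign(coefs), axis=0))
    selected = np.where((freq >= tau1) & (sign_consistency >= tau2))[0]
    return selected.tolist(), freq, sign_consistency


coefs = [
    [0.50, 0.0, 0.30],
    [0.40, 0.1, -0.20],
    [0.60, 0.0, 0.25],
    [0.55, 0.2, -0.30],
]
selected, freq, sign_consistency = rent_style_selection(coefs)
print(selected, freq, sign_consistency)
```

In this toy matrix, feature 1 is dropped because it is selected in only half of the models. Feature 2 is dropped because its sign flips, even though it is always selected.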
Affiliation(s)
- Anna Jenul
- Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway.
- Henning Langen Stokmo
- Department of Nuclear Medicine, Division of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway.
- Stefan Schrunner
- Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway.
- Mona-Elisabeth Revheim
- Department of Nuclear Medicine, Division of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway; The Intervention Centre, Division of Technology and Innovation, Oslo University Hospital, Oslo, Norway.
- Oliver Tomic
- Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway.
43
Rabie AH, Saleh AI. Diseases diagnosis based on artificial intelligence and ensemble classification. Artif Intell Med 2024; 148:102753. [PMID: 38325931 DOI: 10.1016/j.artmed.2023.102753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 12/11/2023] [Accepted: 12/22/2023] [Indexed: 02/09/2024]
Abstract
BACKGROUND In recent years, Computer Aided Diagnosis (CAD) has become an important research area that has attracted many researchers. In medical diagnostic systems, several attempts have been made to build and enhance CAD applications to avoid errors that can lead to dangerously misleading medical treatments. The most promising opportunity for improving the performance of CAD systems lies in integrating Artificial Intelligence (AI) into medicine. This allows effective automation of the traditional manual workflow, which is slow, inaccurate, and prone to human error. AIMS This paper aims to provide a complete Computer Aided Disease Diagnosis (CAD2) strategy based on Machine Learning (ML) techniques that can help clinicians make better medical decisions. METHODS The proposed CAD2 consists of three main sequential phases, namely: (i) Outlier Rejection Phase (ORP), (ii) Feature Selection Phase (FSP), and (iii) Classification Phase (CP). ORP rejects outliers using a new Outlier Rejection Technique (ORT) that contains two sequential stages called Fast Outlier Rejection (FOR) and Accurate Outlier Rejection (AOR). The most informative features are selected in FSP using a Hybrid Selection Technique (HST). HST includes two main stages: a Quick Selection Stage (QS2), which uses the Fisher score as a filter method, and a Precise Selection Stage (PS2), which uses a Hybrid Bio-inspired Optimization (HBO) technique as a wrapper method. Finally, the actual diagnosis takes place in CP, which relies on an Ensemble Classification Technique (ECT). RESULTS The proposed CAD2 has been tested experimentally against recent disease diagnostic strategies using two different datasets: the first contains several diseases, while the second includes data for Covid-19 patients only. Experimental results show the high efficiency of the proposed CAD2 in terms of accuracy, error, precision, and recall compared with its competitors. 
Additionally, the CAD2 strategy achieves the best Wilcoxon signed-rank test and Friedman test results against other strategies on both datasets. CONCLUSION The CAD2 strategy, based on ORP, FSP, and CP, gives a more accurate diagnosis than other strategies, achieving the highest accuracy and the lowest error and implementation time.
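The QS2 stage of CAD2 ranks features with the Fisher score before the wrapper search. A common form of the score for feature j is F_j = sum_c n_c (mu_cj - mu_j)^2 / sum_c n_c var_cj. Here n_c is the size of class c, mu_cj and var_cj are the within-class mean and variance of feature j, and mu_j is its overall mean. The NumPy sketch below computes this score and ranks the features. How many top-ranked features CAD2 passes on to PS2 is not stated in the abstract, so the example stops at the ranking.

```python
import numpy as np


def fisher_score(X, y):
    """Fisher score per feature:
    F_j = sum_c n_c * (mean_cj - mean_j)**2 / sum_c n_c * var_cj
    Higher scores mean larger between-class separation relative to
    within-class spread.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)  # population variance within class c
    # Small floor avoids division by zero for features constant within every class.
    return between / np.maximum(within, 1e-12)


X = np.array([[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.1, 6.0]])
y = np.array([0, 0, 1, 1])
scores = fisher_score(X, y)
ranking = np.argsort(-scores)  # feature indices, best first
print(scores, ranking)  # feature 0 separates the classes far better than feature 1
```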
Affiliation(s)
- Asmaa H Rabie
- Computer Engineering and Systems Dept., Faculty of Engineering, Mansoura University, Mansoura, Egypt.
- Ahmed I Saleh
- Computer Engineering and Systems Dept., Faculty of Engineering, Mansoura University, Mansoura, Egypt
44
Mao Y, Yu X. A hybrid forecasting approach for China's national carbon emission allowance prices with balanced accuracy and interpretability. J Environ Manage 2024; 351:119873. [PMID: 38159311 DOI: 10.1016/j.jenvman.2023.119873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 12/04/2023] [Accepted: 12/13/2023] [Indexed: 01/03/2024]
Abstract
A significant milestone in China's carbon market was reached with the official launch and operation of the National Carbon Emission Trading Market. Accurate prediction of the carbon price in this market is crucial for the government to formulate scientific carbon market policies and for companies to participate effectively. Nevertheless, price fluctuations in the carbon market remain difficult to predict accurately because of the volatility and instability caused by several complex factors. This paper proposes a new carbon price forecasting framework that considers the potential factors influencing national carbon prices and integrates data decomposition and reconstruction techniques, feature selection techniques, intelligently optimised machine learning forecasting techniques, and model interpretability analysis. This comprehensive framework aims to improve the accuracy and understandability of carbon price projections in order to respond better to the complexity and uncertainty of carbon markets. The results indicate that (1) the hybrid forecasting framework is highly accurate in forecasting national carbon market prices and far superior to the comparison models; and (2) the factors driving national carbon prices vary with the time scale. High-frequency series are sensitive to short-term economic and energy market indicators, whereas medium- and low-frequency series are more susceptible to financial markets and long-term economic conditions. This study provides insights into the factors affecting China's national carbon market price and serves as a reference for companies and governments developing carbon price forecasting tools.
Affiliation(s)
- Yaqi Mao
- The Research Institute for Risk Governance and Emergency Decision-Making, School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China
- Xiaobing Yu
- The Research Institute for Risk Governance and Emergency Decision-Making, School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China; Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters (CIC-FEMD), Nanjing University of Information Science & Technology, Nanjing 210044, China.
45
Feda AK, Adegboye M, Adegboye OR, Agyekum EB, Fendzi Mbasso W, Kamel S. S-shaped grey wolf optimizer-based FOX algorithm for feature selection. Heliyon 2024; 10:e24192. [PMID: 38293420 PMCID: PMC10825485 DOI: 10.1016/j.heliyon.2024.e24192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/09/2023] [Accepted: 01/04/2024] [Indexed: 02/01/2024]
Abstract
The FOX algorithm is a recently developed metaheuristic inspired by the behavior of foxes in their natural habitat. Although the FOX algorithm performs well, its basic version may become trapped in local optima in complex problem scenarios and fail to identify the optimal solution because of its weak exploitation capability. This research addresses a high-dimensional feature selection problem. In feature selection, the most informative features are retained while irrelevant ones are discarded. An enhanced version of the FOX algorithm is proposed to mitigate its drawbacks in feature selection. The improved approach, referred to as S-shaped Grey Wolf Optimizer-based FOX (FOX-GWO), augments the local search capability of the FOX algorithm by integrating GWO. Additionally, an S-shaped transfer function is introduced so that the population can explore both binary options throughout the search process. In experiments on 18 datasets with varying dimensions, FOX-GWO outperforms the other methods on 83.33 % of the datasets for average accuracy, 61.11 % for reduced feature dimensionality, and 72.22 % for average fitness value, indicating that it explores high-dimensional spaces efficiently. These findings highlight its practical value and its potential to advance feature selection in complex data analysis and to enhance model prediction accuracy.
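FOX-GWO, like most binary metaheuristics for feature selection, searches in a continuous space. It uses an S-shaped transfer function to turn each position into a 0/1 feature mask. The abstract does not specify which S-shaped function is used, so the Python sketch below assumes the common logistic sigmoid T(x) = 1 / (1 + e^(-x)). Bit j is set to 1 when a uniform random number falls below T(x_j).

The sketch also includes a widely used wrapper fitness, alpha * error + (1 - alpha) * (selected / total), with alpha = 0.99 as an assumption. The study reports an "average fitness value", but the abstract does not define the fitness. Both function names are hypothetical.

```python
import numpy as np


def s_shaped_binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    Assumes the logistic sigmoid as the S-shaped transfer function:
    bit j is 1 with probability T(x_j) = 1 / (1 + exp(-x_j)).
    """
    x = np.asarray(position, dtype=float)
    prob = 1.0 / (1.0 + np.exp(-x))
    mask = (rng.random(x.shape) < prob).astype(int)
    return mask, prob


def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Hypothetical fitness to minimise: trades classification error
    against the fraction of features kept."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total


rng = np.random.default_rng(0)
mask, prob = s_shaped_binarize([-50.0, 0.0, 50.0], rng)
print(mask, prob)
print(wrapper_fitness(error_rate=0.1, n_selected=5, n_total=10))
```

A strongly negative position almost never selects its feature and a strongly positive one almost always does. A position near zero gives a fair coin flip, which is what lets the population keep exploring both binary options.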
Affiliation(s)
- Afi Kekeli Feda
- Management Information System Department, European University of Lefke, Mersin, 10, Turkey
- Ephraim Bonah Agyekum
- Department of Nuclear and Renewable Energy, Ural Federal University named after the first President of Russia Boris Yeltsin, 620002, 19 Mira Street, Ekaterinburg, Russia
- Wulfran Fendzi Mbasso
- Laboratory of Technology and Applied Sciences, University Institute of Technology, University of Douala, PO Box: 8698, Douala, Cameroon
- Salah Kamel
- Department of Electrical Engineering, Faculty of Engineering, Aswan University, Aswan, 81542, Egypt
46
Thirugnanasambandam K, Murugan J, Ramalingam R, Rashid M, Raghav RS, Kim TH, Sampedro GA, Abisado M. Optimizing multimodal feature selection using binary reinforced cuckoo search algorithm for improved classification performance. PeerJ Comput Sci 2024; 10:e1816. [PMID: 38435570 PMCID: PMC10909206 DOI: 10.7717/peerj-cs.1816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 12/19/2023] [Indexed: 03/05/2024]
Abstract
Background Feature selection is a vital process in data mining and machine learning that determines which of the available features are most appropriate for categorization or knowledge representation. The challenge, however, is finding a subset of a given feature set that represents or extracts knowledge from the raw data. The number of selected features should be kept appropriately small while remaining substantial enough to preserve accuracy. Feature selection is also crucial for reducing computational time cost. This study puts forward a feature selection model to address the multimodal feature selection problem. Methods In this work, we propose the Binary Reinforced Cuckoo Search Algorithm (BRCSA), a novel optimization algorithm inspired by the behavior of cuckoo birds, and apply a BRCSA-based classification approach to multimodal feature selection. The proposed method aims to select the most relevant features from multiple modalities to improve the model's classification performance. The BRCSA algorithm optimizes the feature selection process, and a binary encoding scheme represents the selected features. Results Experiments are conducted on several benchmark datasets, and the results are compared with other state-of-the-art feature selection methods to evaluate the effectiveness of the proposed method. The experimental results demonstrate that the proposed BRCSA-based approach outperforms other methods in terms of classification accuracy, indicating its potential applicability in real-world applications. Specifically, in terms of average classification accuracy, the proposed algorithm outperforms existing methods such as DGUFS by 32%, MBOICO by 24%, MBOLF by 29%, WOASAT by 22%, BGSA by 28%, HGSA by 39%, FS-BGSK by 37%, FS-pBGSK by 42%, and BSSA by 40%.
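BRCSA belongs to the cuckoo search family, which classically generates new candidate solutions with Lévy flights. Mantegna's algorithm is a standard way to draw Lévy-distributed steps. The abstract describes neither BRCSA's reinforcement mechanism nor its step generator. The Python sketch below therefore shows only this generic Lévy step, under the usual assumption beta = 1.5.

The names `mantegna_sigma` and `levy_step`, as well as the 0.01 step scale, are hypothetical. Turning the step into a binary feature mask would still require a transfer function such as a sigmoid.

```python
import math

import numpy as np


def mantegna_sigma(beta=1.5):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(dim, beta=1.5, rng=None):
    """Draw a dim-dimensional heavy-tailed step u / |v|**(1/beta),
    with u ~ N(0, sigma_u**2) and v ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(0.0, mantegna_sigma(beta), dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)


rng = np.random.default_rng(42)
current = rng.random(5)  # hypothetical continuous position of one nest
candidate = current + 0.01 * levy_step(5, rng=rng)  # 0.01: assumed step scale
print(mantegna_sigma(1.5), candidate)
```

The heavy tail means most steps are small, giving local refinement, while occasional large jumps help the search escape local optima.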
Affiliation(s)
- Kalaipriyan Thirugnanasambandam
- Centre for Smart Grid Technologies, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
- Jayalakshmi Murugan
- Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, India
- Rajakumar Ramalingam
- Centre for Automation, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
- Mamoon Rashid
- Department of Computer Engineering, Faculty of Science and Technology, Vishwakarma University, Pune, India
- R. S. Raghav
- School of Computing, SASTRA Deemed University, Villupuram, India
- Tai-hoon Kim
- School of Electrical and Computer Engineering, Chonnam National University, Daehak-7, Republic of Korea
- Gabriel Avelino Sampedro
- Faculty of Information and Communication Studies, University of the Philippines Open University, Los Baños, Philippines
- Center for Computational Imaging and Visual Innovations, De La Salle University, Malate, Philippines
- Mideth Abisado
- College of Computing and Information Technologies, National University, Manila, Philippines
47
Li D, Abhadiomhen SE, Zhou D, Shen XJ, Shi L, Cui Y. Asthma prediction via affinity graph enhanced classifier: a machine learning approach based on routine blood biomarkers. J Transl Med 2024; 22:100. [PMID: 38268004 PMCID: PMC10809685 DOI: 10.1186/s12967-024-04866-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 01/06/2024] [Indexed: 01/26/2024]
Abstract
BACKGROUND Asthma is a chronic respiratory disease affecting millions of people worldwide, but early detection can be challenging because traditional techniques are time-consuming. Machine learning has shown great potential for the prompt prediction of asthma. However, because of the inherent complexity of asthma-related patterns, current models often fail to capture the correlation between data samples, which limits their accuracy. Our objective was to address this problem with a novel Affinity Graph Enhanced Classifier (AGEC) that improves predictive accuracy. METHODS The clinical dataset used in this study consisted of 152 samples, from which 24 routine blood markers were extracted as classification features because they are easy to obtain and relevant to asthma. Specifically, our model begins by constructing a projection matrix that reduces the dimensionality of the feature space while preserving the most discriminative features. Simultaneously, an affinity graph is learned in the resulting subspace to better capture the internal relationships between samples. Leveraging domain knowledge from the affinity graph, a new classifier (AGEC) is introduced for asthma prediction. AGEC's performance was compared with that of five state-of-the-art predictive models. RESULTS Experimental findings reveal the superior predictive capability of AGEC in asthma prediction. AGEC achieved an accuracy of 72.50%, surpassing FWAdaBoost (61.02%), MLFE (60.98%), SVR (64.01%), SVM (69.80%), and ERM (68.40%). These results provide evidence that capturing the correlation between samples can enhance the accuracy of asthma prediction. Moreover, the obtained [Formula: see text] values also suggest that the differences between our model and the other models are statistically significant and that the effect of our model is unlikely to be due to chance. 
CONCLUSION As observed from the experimental results, advanced statistical machine learning approaches such as AGEC can enable accurate diagnosis of asthma. This finding holds promising implications for improving asthma management.
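AGEC learns its affinity graph jointly with the projection, and the abstract does not give that objective. The Python sketch below is a simpler, hypothetical stand-in that shows what an affinity graph over samples looks like. It builds a fixed k-nearest-neighbour graph with heat-kernel weights W_ij = exp(-||z_i - z_j||^2 / (2 sigma^2)) on already-projected samples. The function name and the values of k and sigma are assumptions.

```python
import numpy as np


def knn_affinity(Z, k=3, sigma=1.0):
    """Symmetric kNN affinity matrix with heat-kernel weights.

    Z: (n_samples, n_dims) samples, e.g. after a projection.
    W[i, j] > 0 only if j is among i's k nearest neighbours or vice versa.
    """
    Z = np.asarray(Z, dtype=float)
    sq_dist = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)  # no self-loops

    # Exclude each sample from its own neighbour list.
    d = sq_dist.copy()
    np.fill_diagonal(d, np.inf)
    n = Z.shape[0]
    keep = np.zeros((n, n), dtype=bool)
    for i in range(n):
        keep[i, np.argsort(d[i])[:k]] = True
    # Symmetrise: keep an edge if either endpoint chose the other.
    return np.where(keep | keep.T, W, 0.0)


Z = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
W = knn_affinity(Z, k=2)
print(np.round(W, 3))  # edges appear only within each of the two clusters
```

A graph like this encodes which samples resemble each other. According to the abstract, AGEC exploits such sample-to-sample relationships, whereas standard per-sample classifiers ignore them.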
Affiliation(s)
- Dejing Li
- Department of Respiratory, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, 214023, China
- Stanley Ebhohimhen Abhadiomhen
- School of Computer Science and Communication Engineering, JiangSu University, Zhenjiang, JiangSu, 212013, China
- Department of Computer Science, University of Nigeria, Nsukka, Nigeria
- Dongmei Zhou
- Clinical Research Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, 214023, China
- Xiang-Jun Shen
- School of Computer Science and Communication Engineering, JiangSu University, Zhenjiang, JiangSu, 212013, China
- Lei Shi
- Department of Clinical Laboratory, Shuguang Hospital Affiliated to Shanghai University of Chinese Traditional Medicine, Shanghai, 201203, China.
- Yubao Cui
- Clinical Research Center, The Affiliated Wuxi People's Hospital of Nanjing Medical University, Wuxi, 214023, China.
48
Huang WC, Lin WT, Hung MS, Lee JC, Tung CW. Decrypting orphan GPCR drug discovery via multitask learning. J Cheminform 2024; 16:10. [PMID: 38263092 PMCID: PMC10804799 DOI: 10.1186/s13321-024-00806-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
Drug discovery for the G protein-coupled receptor (GPCR) superfamily using computational models is often limited by the availability of protein three-dimensional (3D) structures and of chemicals with experimentally measured bioactivities. Orphan GPCRs without known ligands further complicate the process. To enable drug discovery for human orphan GPCRs, multitask models were proposed for predicting the half maximal effective concentrations (EC50) of chemical-GPCR pairs. Protein multiple sequence alignment features were used to encode protein information, and physicochemical properties and fingerprints were used to encode chemical information. The protein features enabled knowledge transfer from data-rich GPCRs to orphan receptors, with transferability assessed on the basis of protein feature similarity. The final model was trained using both agonist and antagonist data from 200 GPCRs and achieved an excellent mean squared error (MSE) of 0.24 on the validation dataset. An independent test on the orphan dataset, consisting of 16 receptors associated with fewer than 8 bioactivities, showed a reasonably good MSE of 1.51, which can be further improved to 0.53 by considering transferability based on protein features. The informative features were identified and mapped to the corresponding 3D structures to gain insights into the mechanism of GPCR-ligand interactions across the GPCR family. The proposed method provides a novel perspective on learning ligand bioactivity within the diverse human GPCR superfamily and can potentially accelerate the discovery of therapeutic agents for orphan GPCRs.
Affiliation(s)
- Wei-Cheng Huang
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Wei-Ting Lin
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Ming-Shiu Hung
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Jinq-Chyi Lee
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Chun-Wei Tung
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan.
49
Ma XH, Chen ZG, Liu JM. Wavelength selection method for near-infrared spectroscopy based on Max-Relevance Min-Redundancy. Spectrochim Acta A Mol Biomol Spectrosc 2024; 310:123933. [PMID: 38309007 DOI: 10.1016/j.saa.2024.123933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 01/13/2024] [Accepted: 01/19/2024] [Indexed: 02/05/2024]
Abstract
Near-infrared spectroscopy (NIRS) is a rapid, nondestructive analytical technique used in various fields. However, NIR data, which consist of hundreds of dimensions, may contain considerable redundancy in the spectral information. This redundancy can impair modeling effectiveness, which makes feature selection on the spectral data critical. Among feature selection techniques for dimensionality reduction, the Max-Relevance Min-Redundancy (mRMR) method stands out. The approach uses mutual information (MI) between random variables as the basis for feature selection and is independent of the modeling method. However, the benefits of the mRMR algorithm for near-infrared spectral feature selection, and its adaptability to various modeling methods, still need to be clarified. This research focuses on an NIR spectral dataset of maize germination rate, and the mRMR method is used to select spectral features. On this basis, we build Support Vector Regression, Gaussian Process Regression, Random Forest, and Neural Network models. The experimental findings demonstrate that, among the feature selection methods employed in this paper, the mRMR algorithm performs best on the maize germination rate dataset.
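mRMR selects features greedily. At each step it adds the candidate with the largest score: its mutual information with the target minus its average mutual information with the features already chosen (the "difference" form of the criterion). The Python sketch below implements that form for discrete variables. Continuous NIR absorbances would first have to be binned. The binning scheme, the function names and the toy data are assumptions, not details from the study.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy.

    features: dict mapping feature name -> list of discrete values.
    Returns up to k feature names in the order they were selected.
    """
    selected, remaining = [], list(features)

    def score(name):
        relevance = mutual_information(features[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


target = [0, 0, 0, 0, 1, 1, 1, 1]
a = [0, 0, 0, 1, 1, 1, 1, 1]
features = {"a": a, "b": list(a), "c": [0, 0, 1, 0, 1, 1, 1, 1]}
print(mrmr(features, target, k=2))  # the exact duplicate "b" is not chosen
```

All three toy features are equally relevant to the target. Once "a" (or "c") is chosen, however, the duplicate "b" is heavily penalised for redundancy, which is the behaviour that makes mRMR attractive for highly collinear NIR wavelengths.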
Affiliation(s)
- Xiao-Hui Ma
- College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
- Zheng-Guang Chen
- College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China.
- Jin-Ming Liu
- College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
50
Azeem M, Khan D, Iftikhar S, Bawazeer S, Alzahrani M. Analyzing and comparing the effectiveness of malware detection: A study of machine learning approaches. Heliyon 2024; 10:e23574. [PMID: 38187275 PMCID: PMC10770453 DOI: 10.1016/j.heliyon.2023.e23574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 12/05/2023] [Accepted: 12/06/2023] [Indexed: 01/09/2024]
Abstract
The Internet has become a vital source of knowledge and communication in recent times. Continuous technological advancements have changed the way businesses operate, and everyone today lives in a digital world. Because of the Internet of Things (IoT) and its applications, people's impressions of the information revolution have improved. At the same time, malware detection and classification are becoming a growing problem in the cybersecurity world, and strong Internet security could protect billions of Internet users from harmful behavior. Several types of deep learning models are used in malware detection and classification techniques; however, they still have limitations. This study explores malware detection and classification using modern machine learning (ML) approaches, including K-Nearest Neighbors (KNN), Extra Tree (ET), Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and neural network Multilayer Perceptron (nnMLP). The proposed study uses the publicly available UNSW-NB15 dataset. We first applied a feature encoding method to convert the dataset into purely numeric values, and then applied an entropy-based Term Frequency-Inverse Document Frequency (TFIDF) feature selection method to select the best features. The dataset is then balanced and provided to the ML models for classification. The study concludes that, of all the tested ML models, Random Forest yields the best accuracy, 97.68 %.
Affiliation(s)
- Muhammad Azeem
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt Pakistan
- Danish Khan
- Department of Computer Science, COMSATS University Islamabad, Wah Campus, Wah Cantt Pakistan
- Saman Iftikhar
- Faculty of Computer Studies, Arab Open University, Saudi Arabia