1
Khan Z, Ali A, Aldahmani S. Feature selection via robust weighted score for high dimensional binary class-imbalanced gene expression data. Heliyon 2024; 10:e38547. [PMID: 39398002 PMCID: PMC11471177 DOI: 10.1016/j.heliyon.2024.e38547] [Received: 02/11/2024] [Revised: 09/24/2024] [Accepted: 09/25/2024] [Indexed: 10/15/2024] Open
Abstract
In this paper, a robust weighted score for unbalanced data (ROWSU) is proposed for selecting the most discriminative features in binary classification of high-dimensional, class-imbalanced gene expression data. The method addresses one of the most challenging problems in gene expression datasets: highly skewed class distributions that adversely affect the performance of classification algorithms. First, the training dataset is balanced by synthetically generating data points from minority class observations. Second, a minimum subset of genes is selected using a greedy search approach. Third, a novel weighted robust score, where the weights are computed by support vectors, is introduced to obtain a refined set of genes. The highest scoring genes based on this approach are combined with the minimum subset of genes selected by the greedy search approach to form the final set of genes. The novel method ensures the selection of the most discriminative genes, even in the presence of skewed class distribution, thereby improving the performance of the classifiers. The performance of the proposed ROWSU method is evaluated on 7 gene expression datasets. Classification accuracy, sensitivity and F1-score are used as performance metrics to compare the proposed ROWSU algorithm with several other state-of-the-art methods. Boxplots and stability plots are also constructed for a better understanding of the results. The results show that the proposed method outperforms the existing feature selection procedures based on classification performance from k nearest neighbors (kNN) and random forest (RF) classifiers.
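The first step of the pipeline, balancing the training set by synthesizing minority-class points, can be illustrated with a minimal sketch. It assumes SMOTE-style linear interpolation between a minority sample and its nearest minority neighbour; the abstract does not say which generator ROWSU actually uses, and the function name `oversample_minority` and the toy data are hypothetical.

```python
import math
import random


def oversample_minority(minority, n_new, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and its nearest minority neighbour.
    Assumption: SMOTE-like interpolation, not necessarily the ROWSU generator."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority point by Euclidean distance.
        neighbour = min(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical toy expression values: 6 majority vs 3 minority samples.
majority = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [0.1, 0.0], [0.3, 0.2], [0.2, 0.3]]
minority = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.3]]

new_points = oversample_minority(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class. In practice only the training split would be balanced this way, as the abstract states.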
Affiliation(s)
- Zardad Khan
  - Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates
- Amjad Ali
  - Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates
- Saeed Aldahmani
  - Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates
2
Kilicarslan S, Hiz-Cicekliyurt MM. Identification of potential biomarkers of papillary thyroid carcinoma. Endocrine 2024:10.1007/s12020-024-04068-9. [PMID: 39400774 DOI: 10.1007/s12020-024-04068-9] [Received: 07/04/2024] [Accepted: 10/03/2024] [Indexed: 10/15/2024]
Abstract
Papillary thyroid cancer (PTC) is the predominant form of malignant tumor affecting the thyroid gland. AIM This study aimed to identify candidate biomarkers for papillary thyroid carcinoma using an integrative analysis of bioinformatics and machine learning (ML). MATERIAL AND METHOD The PTC datasets GSE6004, GSE3467, and GSE33630 (species: Homo sapiens) were downloaded from NCBI and analyzed using the limma package to obtain differentially expressed genes (DEGs). Once DEGs were identified, GO and KEGG enrichment analyses were performed as the first step in the bioinformatics process. Subsequently, a protein-protein interaction (PPI) network was constructed with STRING from the genes common to the bioinformatics and machine learning analyses, to elucidate the important genes involved in PTC pathogenesis. In machine learning, finding genes entails feature selection to identify the key genes that distinguish biological states; hybrid feature selection was used for this purpose. In the second step, the original data sets were preprocessed to detect and correct missing and noisy data; after that, all data were merged. After Linear and Discriminative Hybrid Feature Selection (LDHFS) was performed on the processed dataset, machine learning algorithms such as Random Forest (RF), Naive Bayes (NB), and Support Vector Machines (SVM) were applied. RESULTS Bioinformatics and machine learning analyses indicate that the genes RXRG, CDH2, ETV5, QPCT, LRP4, FN1, and LPAR5 are integral to the progression of thyroid cancer. This study attained the highest accuracy utilizing the RF algorithm, achieving an accuracy rate of 94.62%, a Kappa value of 91.36%, and an AUC value of 96.13%. These results offer additional evidence and confirmation for the genetic alterations of these genes. These findings may accelerate the development of prospective therapeutic and diagnostic methods in future research.
CONCLUSIONS Bioinformatics and machine learning techniques identified the common genes "RXRG, CDH2, ETV5, QPCT, LRP4, FN1, and LPAR5" as PTC biomarkers, providing novel reference markers for the diagnosis and treatment of PTC patients. The model is anticipated to possess significant predictive value and assist in the early diagnosis and screening of clinical PTC. These insights enhance the field of PTC management and offer guidance for future research.
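The accuracy and Kappa values reported above are both derived from a confusion matrix. The sketch below shows how the two relate, using hypothetical counts that are not taken from the study; the function name `accuracy_and_kappa` is an illustrative choice.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix
    whose rows are true classes and columns are predicted classes."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts for a two-class problem, not data from the paper.
confusion = [[45, 5],
             [5, 45]]
acc, kappa = accuracy_and_kappa(confusion)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why it is lower than raw accuracy for the same predictions.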
Affiliation(s)
- Sabire Kilicarslan
  - Çanakkale Onsekiz Mart University, Graduate School of Sciences, Department of Medical System Biology, Çanakkale, Turkey
3
Hassan W, Hussain GA, Wahid A, Safdar M, Khalid HM, Jamil MKM. Optimum feature selection for classification of PD signals produced by multiple insulation defects in electric motors. Sci Rep 2024; 14:23446. [PMID: 39379414 PMCID: PMC11461500 DOI: 10.1038/s41598-024-73196-z] [Received: 06/24/2024] [Accepted: 09/16/2024] [Indexed: 10/10/2024] Open
Abstract
Partial discharges (PD) are initiated in electrical equipment during various points of the equipment's lifecycle. The intensity of PD defects rises continuously with time, which can lead to insulation degradation and reduced operational life of the electrical equipment. Optimum feature selection for PD signals captured from different insulation defects can enhance the classification accuracy of PD defects and facilitate better visualization of PD parameters for electric motor (EM) insulation monitoring and diagnostics. This paper presents a hybrid approach, based on Maximize Relevancy and Minimize Redundancy (mRMR) and random forest (RF), for the optimum feature selection and classification of PD signals in EMs containing multiple defects. For this purpose, four PD defects are developed in the EM insulation under laboratory conditions, and 800 PD signals are acquired using a conventional IEC 60270 experimental platform. The severity of these defects is determined and investigated based on PD characteristic parameters. Several features of both PD sweep signals and conventional PD pulses are extracted. The mRMR feature selection technique is then implemented to select the significant features of the detected PD signals. To establish the plausibility of this technique, several other feature selection algorithms, including ReliefF, Gini Index (GI), and Information Gain (IG), are applied to the same datasets. The performance of all these feature selection algorithms is validated using three commonly used classification techniques: RF, support vector machines (SVM), and k-nearest neighbors (k-NN). In summary, the results show that the combination of mRMR and RF is the most effective approach for the classification of insulation defects in EMs, achieving an accuracy of 99.875%.
This accuracy is significantly better than that of the other feature selection and classification techniques and indicates the approach's potential for application to other power system components.
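The mRMR idea named in the abstract, preferring features that are relevant to the target but not redundant with features already chosen, can be sketched as a greedy loop. This is a minimal illustration under a stated assumption: the absolute Pearson correlation stands in for the mutual-information terms that mRMR is usually defined with, and the feature names and values are hypothetical rather than PD measurements.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def mrmr(features, target, k):
    """Greedy mRMR: repeatedly add the feature with the largest
    relevance minus mean redundancy against the selected set.
    Assumption: |Pearson r| replaces mutual information."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical toy data: f2 nearly duplicates f1, f3 carries a separate signal.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [1.1, 2.1, 2.9, 4.2, 5.0, 6.1],
    "f3": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
target = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
selected = mrmr(features, target, k=2)
print(selected)
```

In this toy case `f2` is individually more relevant than `f3`, yet `f3` is picked second because `f2` adds almost nothing beyond `f1`. The selected subset would then be passed to a classifier such as RF.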
Affiliation(s)
- Waqar Hassan
  - School of Electrical & Electronics Engineering, Universiti Sains Malaysia, Pulau Pinang, Malaysia.
- G Amjad Hussain
  - College of Engineering & IT, University of Dubai, Dubai, United Arab Emirates.
- Abdul Wahid
  - Dept. of Mathematics & Statistics, Institute of Southern Punjab, Multan, Pakistan
- Madia Safdar
  - School of Electrical Engineering, Lappeenranta University of Technology, Lappeenranta, Finland
- Haris M Khalid
  - College of Engineering & IT, University of Dubai, Dubai, United Arab Emirates
  - Department of Electrical and Electronic Engineering Science, University of Johannesburg, Auckland Park 2006, South Africa
4
Rizk-Allah RM, Abouelmagd LM, Darwish A, Snasel V, Hassanien AE. Explainable AI and optimized solar power generation forecasting model based on environmental conditions. PLoS One 2024; 19:e0308002. [PMID: 39356693 PMCID: PMC11446449 DOI: 10.1371/journal.pone.0308002] [Received: 03/12/2024] [Accepted: 07/16/2024] [Indexed: 10/04/2024] Open
Abstract
This paper proposes a model called X-LSTM-EO, which integrates explainable artificial intelligence (XAI), long short-term memory (LSTM), and equilibrium optimizer (EO) to reliably forecast solar power generation. The LSTM component forecasts power generation rates based on environmental conditions, while the EO component optimizes the LSTM model's hyper-parameters through training. The XAI-based Local Interpretable and Model-independent Explanation (LIME) is adapted to identify the critical factors that influence the accuracy of the power generation forecast model in smart solar systems. The effectiveness of the proposed X-LSTM-EO model is evaluated using five metrics: R-squared (R2), root mean square error (RMSE), coefficient of variation (COV), mean absolute error (MAE), and efficiency coefficient (EC). The proposed model achieves values of 0.99, 0.46, 0.35, 0.229, and 0.95 for R2, RMSE, COV, MAE, and EC, respectively. The proposed model improves on the conventional LSTM, with improvement rates of 148%, 21%, 27%, 20%, and 134% for R2, RMSE, COV, MAE, and EC, respectively. The performance of LSTM is compared with other machine learning algorithms such as decision tree (DT), linear regression (LR), and gradient boosting; the LSTM model performed better than DT and LR. Additionally, the PSO optimizer was employed instead of the EO optimizer to validate the outcomes, which further demonstrated the efficacy of the EO optimizer. The experimental results and simulations demonstrate that the proposed model can accurately estimate PV power generation in response to abrupt changes in power generation patterns. Moreover, the proposed model might assist in optimizing the operations of photovoltaic power units. The proposed model is implemented using TensorFlow and Keras within the Google Colab environment.
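Three of the five evaluation metrics listed above have standard definitions that are easy to state in code. The sketch below computes R2, RMSE, and MAE on hypothetical values; COV and EC are left out because the abstract does not give the definitions used, and the function name `regression_metrics` is an illustrative choice.

```python
import math


def regression_metrics(actual, predicted):
    """Return R2, RMSE, and MAE for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
    }


# Hypothetical power readings and forecasts, not data from the paper.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

R2 is unitless, while RMSE and MAE are in the units of the forecast quantity, so the latter two are only comparable across models evaluated on the same data.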
Affiliation(s)
- Rizk M. Rizk-Allah
  - Department of Basic Engineering Science, Faculty of Engineering, Menoufia University, Shebin El-Kom, Egypt
  - Faculty of Electrical Engineering and Computer Science, VSB–Technical University of Ostrava, Ostrava, Czech Republic
  - Scientific Research School of Egypt (SRSEG), Cairo, Egypt
- Lobna M. Abouelmagd
  - Scientific Research School of Egypt (SRSEG), Cairo, Egypt
  - Misr Higher Institute for Commerce and Computers, Mansoura, Egypt
- Ashraf Darwish
  - Scientific Research School of Egypt (SRSEG), Cairo, Egypt
  - Faculty of Science, Helwan University, Helwan, Egypt
- Vaclav Snasel
  - Faculty of Electrical Engineering and Computer Science, VSB–Technical University of Ostrava, Ostrava, Czech Republic
- Aboul Ella Hassanien
  - Faculty of Computers and AI, Cairo University, Giza, Egypt
  - College of Business Administration (CBA), Kuwait University, Kuwait, Kuwait
5
Freda PJ, Ye S, Zhang R, Moore JH, Urbanowicz RJ. Assessing the limitations of relief-based algorithms in detecting higher-order interactions. BioData Min 2024; 17:37. [PMID: 39354639 PMCID: PMC11443793 DOI: 10.1186/s13040-024-00390-0] [Received: 08/06/2024] [Accepted: 09/04/2024] [Indexed: 10/03/2024] Open
Abstract
BACKGROUND Epistasis, the interaction between genetic loci where the effect of one locus is influenced by one or more other loci, plays a crucial role in the genetic architecture of complex traits. However, as the number of loci considered increases, the investigation of epistasis becomes exponentially more complex, making the selection of key features vital for effective downstream analyses. Relief-Based Algorithms (RBAs) are often employed for this purpose due to their reputation as "interaction-sensitive" algorithms and uniquely non-exhaustive approach. However, the limitations of RBAs in detecting interactions, particularly those involving multiple loci, have not been thoroughly defined. This study seeks to address this gap by evaluating the efficiency of RBAs in detecting higher-order epistatic interactions. Motivated by previous findings that suggest some RBAs may rank predictive features involved in higher-order epistasis negatively, we explore the potential of absolute value ranking of RBA feature weights as an alternative approach for capturing complex interactions. In this study, we assess the performance of ReliefF, MultiSURF, and MultiSURFstar on simulated genetic datasets that model various patterns of genotype-phenotype associations, including 2-way to 5-way genetic interactions, and compare their performance to two control methods: a random shuffle and mutual information. RESULTS Our findings indicate that while RBAs effectively identify lower-order (2 to 3-way) interactions, their capability to detect higher-order interactions is significantly limited, primarily by large feature count but also by signal noise. Specifically, we observe that RBAs are successful in detecting fully penetrant 4-way XOR interactions using an absolute value ranking approach, but this is restricted to datasets with only 20 total features. 
CONCLUSIONS These results highlight the inherent limitations of current RBAs and underscore the need for the development of Relief-based approaches with enhanced detection capabilities for the investigation of epistasis, particularly in datasets with large feature counts and complex higher-order interactions.
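The absolute value ranking discussed in this abstract is a small change to how feature weights are sorted. The sketch below contrasts the two orderings on hypothetical Relief-style weights in which the truly predictive features received strongly negative scores; the weights and feature names are invented for illustration.

```python
def rank_features(weights, use_absolute=False):
    """Return feature names sorted from most to least important.
    With use_absolute=True, strongly negative weights rank as highly
    as strongly positive ones."""
    if use_absolute:
        return sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
    return sorted(weights, key=lambda name: weights[name], reverse=True)


# Hypothetical weights: x3 and x4 take part in a higher-order interaction
# and were scored negatively; the rest are noise features near zero.
weights = {"x1": 0.02, "x2": -0.01, "x3": -0.30, "x4": -0.25, "x5": 0.05}

raw_order = rank_features(weights)
abs_order = rank_features(weights, use_absolute=True)
print(raw_order)
print(abs_order)
```

With conventional ranking `x3` and `x4` fall to the bottom of the list and would be filtered out, whereas ranking by magnitude places them first.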
Affiliation(s)
- Philip J Freda
  - Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA
- Suyu Ye
  - Whiting School of Engineering, Johns Hopkins University, 3400 N. Charles St., Baltimore, 21218, MD, USA
- Robert Zhang
  - University of Pennsylvania, Philadelphia, 19104, PA, USA
- Jason H Moore
  - Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA
- Ryan J Urbanowicz
  - Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA.
6
Purnell D, Etemadi A, Kamp J. Developing an Early Warning System for Financial Networks: An Explainable Machine Learning Approach. Entropy (Basel, Switzerland) 2024; 26:796. [PMID: 39330129 PMCID: PMC11432077 DOI: 10.3390/e26090796] [Received: 08/20/2024] [Revised: 09/11/2024] [Accepted: 09/14/2024] [Indexed: 09/28/2024]
Abstract
Identifying the influential variables that provide early warning of financial network instability is challenging, in part due to the complexity of the system, uncertainty of a failure, and nonlinear, time-varying relationships between network participants. In this study, we introduce a novel methodology to select variables that, from a data-driven and statistical modeling perspective, represent these relationships and may indicate that the financial network is trending toward instability. The methodology leverages Shapley values and modified Borda counts, in combination with statistical and machine learning methods, to create an explainable linear model to predict relationship value weights between network participants. We validate this new approach with data collected from the March 2023 Silicon Valley Bank failure. The models produced using this novel method successfully identified the instability trend using only 14 input variables out of a possible 3160. The use of parsimonious linear models developed by this method has the potential to identify key financial stability indicators while also increasing the transparency of this complex system.
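Borda counts give a simple way to merge several variable rankings into one consensus ordering. The sketch below implements the standard Borda count only; the abstract does not describe the authors' modification, and the input rankings (imagined here as orderings by mean absolute Shapley value from three different models) and variable names are hypothetical.

```python
def borda_count(rankings):
    """Aggregate several rankings (best first) with a standard Borda count:
    a variable in position i of a ranking of length n earns n - 1 - i points.
    Assumption: plain Borda, not the modified variant used in the paper."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking):
            scores[name] = scores.get(name, 0) + (n - 1 - position)
    # Sort by total points, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical rankings of four candidate variables from three models.
rankings = [
    ["v1", "v2", "v3", "v4"],
    ["v2", "v1", "v4", "v3"],
    ["v1", "v3", "v2", "v4"],
]
consensus = borda_count(rankings)
print(consensus)
```

Keeping only the top of the consensus list is one way such an aggregation could reduce a large candidate pool to a small input set for a linear model.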
Affiliation(s)
- Daren Purnell
  - School of Engineering and Applied Science, George Washington University, Washington, DC 20052, USA; (A.E.); (J.K.)
7
Zhai F, Mu S, Song Y, Zhang M, Zhang C, Lv Z. Machine Learning Prediction of Residual and Recurrent High-Grade CIN Post-LEEP. Cancer Manag Res 2024; 16:1175-1187. [PMID: 39258245 PMCID: PMC11385362 DOI: 10.2147/cmar.s484057] [Received: 06/24/2024] [Accepted: 08/23/2024] [Indexed: 09/12/2024] Open
Abstract
Purpose This study aims to develop a machine learning (ML) model to predict the risk of residual or recurrent high-grade cervical intraepithelial neoplasia (CIN) after loop electrosurgical excision procedure (LEEP), addressing a critical gap in personalized follow-up care. Methods A retrospective analysis of 532 patients who underwent LEEP for high-grade CIN at Cangzhou Central Hospital (2016-2020) was conducted. In the final analysis, 99 women (18.6%) were found to have residual or recurrent high-grade CIN (CIN2 or worse) within five years of follow-up. Four feature selection methods identified significant predictors of residual or recurrent CIN. Eight ML algorithms were evaluated using performance metrics such as AUROC, accuracy, sensitivity, specificity, PPV, NPV, F1 score, calibration curve, and decision curve analysis. Fivefold cross-validation optimized and validated the model, and SHAP analysis assessed feature importance. Results The XGBoost algorithm demonstrated the highest predictive performance with the best AUROC. The optimized model included six key predictors: age, ThinPrep cytologic test (TCT) results, HPV classification, CIN severity, glandular involvement, and margin status. SHAP analysis identified CIN severity and margin status as the most influential predictors. An online prediction tool was developed for real-time risk assessment. Conclusion This ML-based predictive model for post-LEEP high-grade CIN provides a significant advancement in gynecologic oncology, enhancing personalized patient care and facilitating early intervention and informed clinical decision-making.
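Several of the performance metrics named in the Methods (accuracy, sensitivity, specificity, PPV, NPV, F1 score) follow directly from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts that are not taken from the study, and the function name `binary_metrics` is an illustrative choice.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "F1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 3) for name, value in m.items()})
```

PPV and NPV depend on how common the outcome is in the cohort, so they do not transfer directly to populations with a different recurrence rate.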
Affiliation(s)
- Furui Zhai
  - Gynecological Clinic, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People's Republic of China
- Shanshan Mu
  - Gynecological Clinic, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People's Republic of China
- Yinghui Song
  - Gynecological Clinic, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People's Republic of China
- Min Zhang
  - Gynecological Clinic, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People's Republic of China
- Cui Zhang
  - Gynecological Clinic, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People's Republic of China
- Ze Lv
  - Gynecological Clinic, Cangzhou Central Hospital, Cangzhou City, Hebei Province, People's Republic of China
8
Han J, Ahn KJ, Cha KC, Kim SJ, Jung WJ, Roh YI, Yoon YR, Hwang SO. Prediction of blood pressure using chest compression waveform during cardiopulmonary resuscitation. Resuscitation 2024; 202:110331. [PMID: 39053839 DOI: 10.1016/j.resuscitation.2024.110331] [Received: 06/24/2024] [Accepted: 07/20/2024] [Indexed: 07/27/2024]
Abstract
OBJECTIVES This study aimed to predict blood pressure during CPR using chest compression waveform information obtained from a CPR feedback device. METHODS Quantitative data including chest compression waveforms from a CPR feedback device and the blood pressure measured by arterial cannulation in patients with cardiac arrest during CPR were used. Forty-one features to predict blood pressure were selected from chest compression waveform and demographic characteristics with the neighborhood component analysis algorithm. Optimized Gaussian process regression was used as the machine learning algorithm. RESULTS A total of 14,619 datasets from 19 patients with cardiac arrest (mean age: 66 ± 13 years, 14 men) were used in the analysis. The model could predict blood pressure with high precision and low bias for almost the whole range of systolic (SBP), diastolic (DBP), and mean arterial blood pressure (MAP). The correlation coefficients (r) between the predicted and actual values were 0.954 (95% confidence interval: 0.951-0.957, p < 0.001) for SBP, 0.926 (95% confidence interval: 0.921-0.931, p < 0.001) for DBP, and 0.958 (95% confidence interval: 0.955-0.961, p < 0.001) for MAP, all indicating very good agreement. CONCLUSIONS Blood pressure generated by chest compressions can be predicted with high accuracy by a machine learning method using chest compression waveform information obtained from a CPR feedback device and the patient's demographic characteristics. Real-time provision of the predicted blood pressure can be used to monitor the quality and efficacy of CPR.
Affiliation(s)
- Jiho Han
  - Department of Biomedical Engineering, Yonsei University, South Korea.
- Kyo Jin Ahn
  - Department of Emergency Medicine, Yonsei University Wonju College of Medicine.
- Kyoung-Chul Cha
  - Department of Emergency Medicine, Yonsei University Wonju College of Medicine.
- Sun Ju Kim
  - Department of Emergency Medicine, Yonsei University Wonju College of Medicine.
- Woo Jin Jung
  - Department of Emergency Medicine, Yonsei University Wonju College of Medicine.
- Young-Il Roh
  - Department of Emergency Medicine, Yonsei University Wonju College of Medicine.
- Young Ro Yoon
  - Department of Biomedical Engineering, Yonsei University, South Korea.
- Sung Oh Hwang
  - Department of Emergency Medicine, Yonsei University Wonju College of Medicine.
9
Kelly CM, McLaughlin RL. Comparison of machine learning methods for genomic prediction of selected Arabidopsis thaliana traits. PLoS One 2024; 19:e0308962. [PMID: 39196916 PMCID: PMC11355539 DOI: 10.1371/journal.pone.0308962] [Received: 11/29/2023] [Accepted: 08/04/2024] [Indexed: 08/30/2024] Open
Abstract
We present a comparison of machine learning methods for the prediction of four quantitative traits in Arabidopsis thaliana. High prediction accuracies were achieved on individuals grown under standardized laboratory conditions from the 1001 Arabidopsis Genomes Project. An existing body of evidence suggests that linear models may be impeded by their inability to make use of non-additive effects to explain phenotypic variation at the population level. The results presented here use a nested cross-validation approach to confirm that some machine learning methods have the ability to statistically outperform linear prediction models, with the optimal model dependent on availability of training data and genetic architecture of the trait in question. Linear models were competitive in their performance as per previous work, though the neural network class of predictors was observed to be the most accurate and robust for traits with high heritability. The extent to which non-linear models exploit interaction effects will require further investigation of the causal pathways that lie behind their predictions. Future work utilizing more traits and larger sample sizes, combined with an improved understanding of their respective genetic architectures, may lead to improvements in prediction accuracy.
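Nested cross-validation, the evaluation scheme named in this abstract, separates hyper-parameter tuning (inner folds) from performance estimation (outer folds). The sketch below only generates the index splits, without shuffling or stratification; the fold counts and function names are hypothetical and not taken from the paper.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    return [
        ([idx for j, fold in enumerate(folds) if j != i for idx in fold], folds[i])
        for i in range(k)
    ]


def nested_cv_splits(n_samples, outer_k, inner_k):
    """Yield (outer_train, outer_test, inner_splits). Inner splits use only
    outer-training indices, so held-out test data never informs tuning."""
    for outer_train, outer_test in k_fold_indices(list(range(n_samples)), outer_k):
        yield outer_train, outer_test, k_fold_indices(outer_train, inner_k)


# Hypothetical sizes: 12 samples, 3 outer folds, 2 inner folds.
splits = list(nested_cv_splits(n_samples=12, outer_k=3, inner_k=2))
for outer_train, outer_test, inner in splits:
    print("test:", outer_test, "inner validation sets:", [val for _, val in inner])
```

A model would be tuned on the inner splits, refit on the full outer-training set, and scored once on the outer test fold; the outer scores are then averaged.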
10
Geeitha S, Ravishankar K, Cho J, Easwaramoorthy SV. Integrating cat boost algorithm with triangulating feature importance to predict survival outcome in recurrent cervical cancer. Sci Rep 2024; 14:19828. [PMID: 39191808 DOI: 10.1038/s41598-024-67562-0] [Received: 03/21/2024] [Accepted: 07/12/2024] [Indexed: 08/29/2024] Open
Abstract
Cervical cancer is one of the most dangerous malignancies in women. Prolonged survival times are made possible by breakthroughs in early recognition and efficient treatment of the disease. Existing methods lag in finding the important attributes for predicting survival outcome. The main objective of this study is to identify individuals with cervical cancer who are at greater risk of death from recurrence by predicting survival. The novel element of the proposed technique is triangulating feature importance to find the important risk factors according to which treatment may be varied to improve the survival outcome. Five algorithms Support vector machine, Naive Bayes, supervised logistic regression, decision tree algorithm, Gradient boosting, and random forest are used to build the model. Conventional attribute selection methods such as information gain (IG), FCBF, and ReliefF are employed. The recommended classifier is evaluated for Precision, Recall, F1, Matthews Correlation Coefficient (MCC), Classification Accuracy (CA), and Area under curve (AUC) using various methods. The gradient boosting algorithm CatBoost attains the highest accuracy, 0.99, in predicting the survival outcome of patients with recurrent cervical cancer. The intended outcome of the research is to identify the important risk factors through which the survival outcome of patients can be improved.
Affiliation(s)
- S Geeitha
  - Department of Information Technology, M. Kumarasamy College of Engineering, Thalavapalayam, Karur, Tamil Nadu, India
- K Ravishankar
  - Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
- Jaehyuk Cho
  - Department of Software Engineering and Division of Electronics and Information Engineering, Jeonbuk National University, Jeonju-Si, Republic of Korea.
11
Sánchez RV, Macancela JC, Ortega LR, Cabrera D, García Márquez FP, Cerrada M. Evaluation of Hand-Crafted Feature Extraction for Fault Diagnosis in Rotating Machinery: A Survey. Sensors (Basel, Switzerland) 2024; 24:5400. [PMID: 39205095 PMCID: PMC11360600 DOI: 10.3390/s24165400] [Received: 07/03/2024] [Revised: 08/08/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024]
Abstract
This article presents a comprehensive collection of formulas and calculations for hand-crafted feature extraction of condition monitoring signals. The documented features include 123 for the time domain and 46 for the frequency domain. Furthermore, a machine learning-based methodology is presented to evaluate the performance of features in fault classification tasks using seven data sets of different rotating machines. The evaluation methodology involves using seven ranking methods to select the best ten hand-crafted features per method for each database, to be subsequently evaluated by three types of classifiers. This process is applied exhaustively by evaluation groups, combining our databases with an external benchmark. A summary table of the performance results of the classifiers is also presented, including the percentage of classification and the number of features required to achieve that value. Through graphic resources, it has been possible to show the prevalence of certain features over others, how they are associated with the database, and the order of importance assigned by the ranking methods. In the same way, finding which features have the highest appearance percentages for each database in all experiments has been possible. The results suggest that hand-crafted feature extraction is an effective technique with low computational cost and high interpretability for fault identification and diagnosis.
Affiliation(s)
- René-Vinicio Sánchez
  - GIDTEC, Universidad Politécnica Salesiana, Cuenca 010105, Ecuador; (J.C.M.); (M.C.)
- Jean Carlo Macancela
  - GIDTEC, Universidad Politécnica Salesiana, Cuenca 010105, Ecuador; (J.C.M.); (M.C.)
- Luis-Renato Ortega
  - GIDTEC, Universidad Politécnica Salesiana, Cuenca 010105, Ecuador; (J.C.M.); (M.C.)
- Diego Cabrera
  - School of Mechanical Engineering, Dongguan University of Technology, Dongguan 523000, China;
- Mariela Cerrada
  - GIDTEC, Universidad Politécnica Salesiana, Cuenca 010105, Ecuador; (J.C.M.); (M.C.)
12
Borah K, Das HS, Seth S, Mallick K, Rahaman Z, Mallik S. A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis. Funct Integr Genomics 2024; 24:139. [PMID: 39158621 DOI: 10.1007/s10142-024-01415-x] [Received: 06/01/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/20/2024]
Abstract
Recent advancements in biomedical technologies and the proliferation of high-dimensional Next Generation Sequencing (NGS) datasets have led to significant growth in the bulk and density of data. High-dimensional NGS data, characterized by a large number of genomics, transcriptomics, proteomics, and metagenomics features relative to the number of biological samples, presents significant challenges for reducing feature dimensionality. The high dimensionality of NGS data also poses significant challenges for data analysis, including increased computational burden, potential overfitting, and difficulty in interpreting results. Feature selection and feature extraction are two pivotal techniques employed to address these challenges by reducing the dimensionality of the data, thereby enhancing model performance, interpretability, and computational efficiency. Feature selection and feature extraction can be categorized into statistical and machine learning methods. The present study conducts a comprehensive and comparative review of various statistical, machine learning, and deep learning-based feature selection and extraction techniques specifically tailored for the interpretation of human NGS and microarray data. A thorough literature search was performed to gather information on these techniques, focusing on array-based and NGS data analysis. Various techniques, including deep learning architectures, machine learning algorithms, and statistical methods, that have been explored for microarray, bulk RNA-Seq, and single-cell RNA-Seq (scRNA-Seq) datasets are surveyed here. The study provides an overview of these techniques, highlighting their applications, advantages, and limitations in the context of high-dimensional NGS data.
This review helps readers apply feature selection and feature extraction techniques to enhance the performance of predictive models, uncover underlying biological patterns, and gain deeper insights into massive and complex NGS and microarray data.
Affiliation(s)
- Kasmika Borah
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Himanish Shekhar Das
  - Department of Computer Science and Information Technology, Cotton University, Panbazar, Guwahati, 781001, Assam, India
- Soumita Seth
  - Department of Computer Science and Engineering, Future Institute of Engineering and Management, Narendrapur, Kolkata, 700150, West Bengal, India
- Koushik Mallick
  - Department of Computer Science and Engineering, RCC Institute of Information Technology, Canal S Rd, Beleghata, Kolkata, 700015, West Bengal, India
- Saurav Mallik
  - Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, 02115, USA
  - Department of Pharmacology & Toxicology, University of Arizona, Tucson, AZ, 85721, USA
13
Leghissa M, Carrera Á, Iglesias CÁ. FRELSA: A dataset for frailty in elderly people originated from ELSA and evaluated through machine learning models. Int J Med Inform 2024; 192:105603. [PMID: 39232373 DOI: 10.1016/j.ijmedinf.2024.105603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/01/2024] [Accepted: 08/13/2024] [Indexed: 09/06/2024]
Abstract
BACKGROUND Frailty is an age-related syndrome characterized by loss of strength and exhaustion and associated with multi-morbidity. Early detection and prediction of the appearance of frailty could help older people age better and prevent them from needing invasive and expensive treatments. Machine learning techniques show promising results in creating a medical support tool for such a task. METHODS This study aims to create a dataset for machine learning-based frailty studies, using Fried's Frailty Phenotype definition. Starting from a longitudinal study on aging in the UK population, we defined a frailty label for each subject. We evaluated the definition by training seven different models for detecting frailty with data that were contemporary to the ones used for the definition. We then integrated more data from two years before to obtain prediction models with a 24-month horizon. Feature selection was performed using the MultiSURF algorithm, which ranks all features in order of relevance to the detection or prediction task. RESULTS We present a new frailty dataset of 5303 subjects and more than 6500 available features. It is publicly available, provided one has access to the original English Longitudinal Study of Ageing dataset. The dataset is balanced after grouping frailty with pre-frailty, and it is suitable for multiclass or binary classification and prediction problems. The seven tested architectures performed similarly, forming a solid baseline that can be improved with future work. Linear regression achieved the best F-score and AUROC in detection and prediction tasks. CONCLUSIONS Creating new frailty-annotated datasets of this size is necessary to develop and improve the frailty prediction techniques. We have shown that our dataset can be used to study and test machine learning models to detect and predict frailty. Future work should improve models' architecture and performance, consider explainability, and possibly enrich the dataset with older waves.
Affiliation(s)
- Matteo Leghissa
  - Universidad Politécnica de Madrid, Av. Complutense, 30, 28040, Madrid, Spain
- Álvaro Carrera
  - Universidad Politécnica de Madrid, Av. Complutense, 30, 28040, Madrid, Spain
- Carlos Á Iglesias
  - Universidad Politécnica de Madrid, Av. Complutense, 30, 28040, Madrid, Spain
14
Çiftçi R, Dönmez E, Kurtoğlu A, Eken Ö, Samee NA, Alkanhel RI. Human gender estimation from CT images of skull using deep feature selection and feature fusion. Sci Rep 2024; 14:16879. [PMID: 39043755 PMCID: PMC11266511 DOI: 10.1038/s41598-024-65521-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 06/20/2024] [Indexed: 07/25/2024] Open
Abstract
This research endeavors to prognosticate gender by harnessing the potential of skull computed tomography (CT) images, given the seminal role of gender identification in the realm of identification. The study encompasses a corpus of CT images of cranial structures derived from 218 male and 203 female subjects, constituting a total cohort of 421 individuals within the age bracket of 25 to 65 years. Employing deep learning, a prominent subset of machine learning algorithms, the study deploys convolutional neural network (CNN) models to excavate profound attributes inherent in the skull CT images. In pursuit of the research objective, the focal methodology involves the exclusive application of deep learning algorithms to image datasets, culminating in an accuracy rate of 96.4%. The gender estimation process exhibits a precision of 96.1% for male individuals and 96.8% for female individuals. The precision performance varies across different selections of feature numbers, namely 100, 300, and 500, alongside 1000 features without feature selection. The respective precision rates for these selections are recorded as 95.0%, 95.5%, 96.2%, and 96.4%. It is notable that gender estimation via visual radiography mitigates the discrepancy in measurements between experts, concurrently yielding an expedited estimation rate. Predicated on the empirical findings of this investigation, it is inferred that the efficacy of the CNN model, the configurational intricacies of the classifier, and the judicious selection of features collectively constitute pivotal determinants in shaping the performance attributes of the proposed methodology.
Affiliation(s)
- Rukiye Çiftçi
  - Medical Faculty, Department of Anatomy, Gaziantep Islamic Science and Technology University, Gaziantep, Turkey
- Emrah Dönmez
  - Faculty of Engineering and Natural Sciences, Department of Software Engineering, Bandırma Onyedi Eylül University, Balıkesir, Türkiye
- Ahmet Kurtoğlu
  - Sport Science Faculty, Department of Coaching Education, Bandırma Onyedi Eylül University, Balıkesir, Turkey
- Özgür Eken
  - Department of Physical Education and Sport Teaching, Inonu University, Malatya, Turkey
- Nagwan Abdel Samee
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Reem Ibrahim Alkanhel
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
15
Raju SMTU, Dipto SA, Hossain MI, Chowdhury MAS, Haque F, Nashrah AT, Nishan A, Khan MMH, Hashem MMA. DNN-BP: a novel framework for cuffless blood pressure measurement from optimal PPG features using deep learning model. Med Biol Eng Comput 2024:10.1007/s11517-024-03157-1. [PMID: 38963467 DOI: 10.1007/s11517-024-03157-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 06/10/2024] [Indexed: 07/05/2024]
Abstract
Continuous blood pressure (BP) provides essential information for monitoring one's health condition. However, BP is currently monitored using uncomfortable cuff-based devices, which do not support continuous BP monitoring. This paper aims to introduce a blood pressure monitoring algorithm based only on photoplethysmography (PPG) signals using a deep neural network (DNN). The PPG signals are obtained from 125 unique subjects with 218 records and filtered using signal processing algorithms to reduce the effects of noise, such as baseline wandering and motion artifacts. The proposed algorithm is based on pulse wave analysis of PPG signals: it extracts various domain features from the PPG signals and maps them to BP values. Four feature selection methods are applied, yielding four feature subsets. An ensemble feature selection technique is then proposed to obtain the optimal feature set based on majority voting scores from the four feature subsets. DNN models, along with the ensemble feature selection technique, outperformed previously reported approaches that rely only on the PPG signal in estimating the systolic blood pressure (SBP) and diastolic blood pressure (DBP). The coefficient of determination (R²) and mean absolute error (MAE) of the proposed algorithm are 0.962 and 2.480 mmHg, respectively, for SBP and 0.955 and 1.499 mmHg, respectively, for DBP. The proposed approach meets the Advancement of Medical Instrumentation standard for SBP and DBP estimations. Additionally, according to the British Hypertension Society standard, the results attained Grade A for both SBP and DBP estimations. The study concludes that BP can be estimated more accurately using the optimal feature set and DNN models. The proposed algorithm has the potential to enable mobile healthcare devices to monitor BP continuously.
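The ensemble step in the abstract above, voting across four feature subsets, could be sketched roughly as follows. This is only an illustration under stated assumptions: the feature names, the four subsets, and the `min_votes` threshold are hypothetical, and the abstract does not say how the authors score or threshold the votes.

```python
from collections import Counter


def majority_vote_features(subsets, min_votes):
    """Keep features chosen by at least `min_votes` of the given subsets."""
    # Count each feature once per subset, even if a subset repeats it.
    votes = Counter(f for subset in subsets for f in set(subset))
    # Sort by vote count (descending), then by name for a deterministic order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feature for feature, count in ranked if count >= min_votes]


# Hypothetical subsets, standing in for the output of four different selectors.
subsets = [
    ["pulse_width", "systolic_peak", "heart_rate"],
    ["pulse_width", "heart_rate", "spectral_entropy"],
    ["pulse_width", "systolic_peak", "area_ratio"],
    ["heart_rate", "pulse_width", "systolic_peak"],
]

# Assumed threshold: a feature must appear in at least 3 of the 4 subsets.
selected = majority_vote_features(subsets, min_votes=3)
print(selected)
```

Raising `min_votes` to 4 would keep only features that every selector agrees on, which trades a smaller feature set for a stricter consensus.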
Affiliation(s)
- S M Taslim Uddin Raju
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
- Safin Ahmed Dipto
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
- Md Imran Hossain
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
- Md Abu Shahid Chowdhury
  - Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
- Fabliha Haque
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
- Ayesha Tun Nashrah
  - Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
- Araf Nishan
  - Department of Business Administration, International American University, Los Angeles, CA, 90010, USA
- Md Mahamudul Hasan Khan
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
- M M A Hashem
  - Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna, 9203, Bangladesh
16
Liu ZD, Li Y, Zhang YT, Zeng J, Chen ZX, Liu JK, Miao F. HGCTNet: Handcrafted Feature-Guided CNN and Transformer Network for Wearable Cuffless Blood Pressure Measurement. IEEE J Biomed Health Inform 2024; 28:3882-3894. [PMID: 38687656 DOI: 10.1109/jbhi.2024.3395445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2024]
Abstract
Biosignals collected by wearable devices, such as electrocardiogram and photoplethysmogram, exhibit redundancy and global temporal dependencies, posing a challenge in extracting discriminative features for blood pressure (BP) estimation. To address this challenge, we propose HGCTNet, a handcrafted feature-guided CNN and transformer network for cuffless BP measurement based on wearable devices. By leveraging convolutional operations and self-attention mechanisms, we design a CNN-Transformer hybrid architecture to learn features from biosignals that capture both local information and global temporal dependencies. Then, we introduce a handcrafted feature-guided attention module that utilizes handcrafted features extracted from biosignals as query vectors to eliminate redundant information within the learned features. Finally, we design a feature fusion module that integrates the learned features, handcrafted features, and demographics to enhance model performance. We validate our approach using two large wearable BP datasets: the CAS-BP dataset and the Aurora-BP dataset. Experimental results demonstrate that HGCTNet achieves an estimation error of 0.9 ± 6.5 mmHg for diastolic BP (DBP) and 0.7 ± 8.3 mmHg for systolic BP (SBP) on the CAS-BP dataset. On the Aurora-BP dataset, the corresponding errors are -0.4 ± 7.0 mmHg for DBP and -0.4 ± 8.6 mmHg for SBP. Compared to the current state-of-the-art approaches, HGCTNet reduces the mean absolute error of SBP estimation by 10.68% on the CAS-BP dataset and 9.84% on the Aurora-BP dataset. These results highlight the potential of HGCTNet in improving the performance of wearable cuffless BP measurements.
17
Cheng J, Su W, Wang Y, Zhan Y, Wang Y, Yan S, Yuan Y, Chen L, Wei Z, Zhang S, Gao X, Tang Z. Magnetic resonance imaging based on radiomics for differentiating T1-category nasopharyngeal carcinoma from nasopharyngeal lymphoid hyperplasia: a multicenter study. Jpn J Radiol 2024; 42:709-719. [PMID: 38409300 DOI: 10.1007/s11604-024-01544-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Accepted: 01/29/2024] [Indexed: 02/28/2024]
Abstract
PURPOSE To investigate the role of magnetic resonance imaging (MRI) based on radiomics using T2-weighted imaging fat suppression (T2WI-FS) and contrast enhanced T1-weighted imaging (CE-T1WI) sequences in differentiating T1-category nasopharyngeal carcinoma (NPC) from nasopharyngeal lymphoid hyperplasia (NPH). MATERIALS AND METHODS This study enrolled 614 patients (training dataset: n = 390, internal validation dataset: n = 98, and external validation dataset: n = 126) of T1-category NPC and NPH. Three feature selection methods were used, including analysis of variance, recursive feature elimination, and relief. The logistic regression classifier was performed to construct the radiomics signatures of T2WI-FS, CE-T1WI, and T2WI-FS + CE-T1WI to differentiate T1-category NPC from NPH. The performance of the optimal radiomics signature (T2WI-FS + CE-T1WI) was compared with those of three radiologists in the internal and external validation datasets. RESULTS Twelve, 15, and 15 radiomics features were selected from T2WI-FS, CE-T1WI, and T2WI-FS + CE-T1WI to develop the three radiomics signatures, respectively. The area under the curve (AUC) values for radiomics signatures of T2WI-FS + CE-T1WI and CE-T1WI were significantly higher than that of T2WI-FS (AUCs = 0.940, 0.935, and 0.905, respectively) for distinguishing T1-category NPC and NPH in the training dataset (Ps all < 0.05). In the internal and external validation datasets, the radiomics signatures based on T2WI-FS + CE-T1WI and CE-T1WI outperformed T2WI-FS with no significant difference (AUCs = 0.938, 0.925, and 0.874 for internal validation dataset and 0.932, 0.918, and 0.882 for external validation dataset; Ps > 0.05). The radiomics signature of T2WI-FS + CE-T1WI significantly performed better than three radiologists in the internal and external validation datasets. 
CONCLUSION The MRI-based radiomics signature is meaningful in differentiating T1-category NPC from NPH and potentially helps clinicians select suitable therapy strategies.
Affiliation(s)
- Jingfeng Cheng
  - Department of Radiology, Eye & ENT Hospital of Fudan University, Shanghai Medical School, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- Wenzhe Su
  - Department of Radiology, Fudan University Shanghai Cancer Center, Shanghai, 200032, China
- Yuzhe Wang
  - Department of Radiology, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
- Yang Zhan
  - Department of Radiology, Zhongshan Hospital, Fudan University, Shanghai, 200032, China
- Yin Wang
  - Department of Radiology, Eye & ENT Hospital of Fudan University, Shanghai Medical School, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
- Shuyu Yan
  - Fudan University, Shanghai, 200032, China
- Yuan Yuan
  - Fudan University, Shanghai, 200032, China
- Zixun Wei
  - Fudan University, Shanghai, 200032, China
- Shengjian Zhang
  - Department of Radiology, Fudan University Shanghai Cancer Center, Shanghai, 200032, China
- Xin Gao
  - Shanghai Universal Medical Imaging Diagnostic Center, Shanghai, 200233, China
- Zuohua Tang
  - Department of Radiology, Eye & ENT Hospital of Fudan University, Shanghai Medical School, Fudan University, 83 Fenyang Road, Shanghai, 200031, China
18
Li S, Wang Y, Sun Y, Li D, Zhang Q, Ning Y, Lu Y, Wang W, Zhang H, Yang G. Both intra- and peri-tumoral radiomics signatures can be used to predict lymphatic vascular space invasion and lymphatic metastasis positive status from endometrial cancer MR imaging. Abdom Radiol (NY) 2024:10.1007/s00261-024-04432-3. [PMID: 38916618 DOI: 10.1007/s00261-024-04432-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 05/18/2024] [Accepted: 05/25/2024] [Indexed: 06/26/2024]
Abstract
OBJECTIVES To identify lymphatic vascular space invasion (LVSI) and lymphatic node metastasis (LNM) status of endometrial cancer (EC) patients, using radiomics based on MRI images. METHODS Five hundred and ninety-eight EC patients between January 2015 and September 2020 from two institutions were retrospectively included. Tumoral regions on DWI, T1CE, and T2W images were manually outlined. Radiomics features were extracted from tumor region and peri-tumor region of different thicknesses. We established sub-models to select features from each smaller category. Using this method, we separately constructed radiomic signatures for intra-tumoral and peri-tumoral images using different sequences. We constructed intra-tumoral and peri-tumoral models by combining their features, and a multi-sequence model by combining logits. Models were trained with 397 patients and validated with 170 internal and 31 external patients. RESULTS For LVSI positive/LNM positive status identification, the multi-parameter MRI radiomics model achieved the area under curve (AUC) values of 0.771 (95%CI: [0.692-0.849])/0.801 (95%CI: [0.704, 0.898]) and 0.864 (95%CI: [0.728-1.000])/0.976 (95%CI: [0.919, 1.000]) in internal and external test cohorts, respectively. CONCLUSIONS Intra-tumoral and peri-tumoral radiomics signatures based on mpMRI can both be used to identify LVSI or LNM status in EC patients non-invasively. Further studies on LVSI and LNM should pay attention to both of them.
Affiliation(s)
- Shengyong Li
  - Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, People's Republic of China
- Yida Wang
  - Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, People's Republic of China
- Yiyang Sun
  - Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, People's Republic of China
- Dexuan Li
  - Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, People's Republic of China
- Qi Zhang
  - Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, People's Republic of China
- Yan Ning
  - Department of Pathology, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, People's Republic of China
- Yuanyuan Lu
  - Department of Radiology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, People's Republic of China
- Wenjing Wang
  - Department of Radiology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, People's Republic of China
- He Zhang
  - Department of Radiology, Obstetrics and Gynecology Hospital, Fudan University, No.419 Fangxie Road, Shanghai, People's Republic of China
- Guang Yang
  - Shanghai Key Laboratory of Magnetic Resonance, East China Normal University, Shanghai, People's Republic of China
19
Bhattacharjee A, Kar S, Ojha PK. Unveiling G-protein coupled receptor kinase-5 inhibitors for chronic degenerative diseases: Multilayered prioritization employing explainable machine learning-driven multi-class QSAR, ligand-based pharmacophore and free energy-inspired molecular simulation. Int J Biol Macromol 2024; 269:131784. [PMID: 38697440 DOI: 10.1016/j.ijbiomac.2024.131784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/02/2024] [Accepted: 04/21/2024] [Indexed: 05/05/2024]
Abstract
GRK5 holds a pivotal role in cellular signaling pathways, with its overexpression in cardiomyocytes, neuronal cells, and tumor cells strongly associated with various chronic degenerative diseases, which highlights the urgent need for potential inhibitors. In this study, multiclass classification-based QSAR models were developed using diverse machine learning algorithms. These models were built from curated compounds with experimentally derived GRK5 inhibitory activity. Additionally, a pharmacophore model was constructed using active compounds from the dataset. Among the models, the SVM-based approach proved most effective and was initially used to screen DrugBank compounds within the applicability domain. Compounds showing significant GRK5 inhibitory potential underwent evaluation for key pharmacophoric features. Prospective compounds were subjected to molecular docking to assess binding affinity towards GRK5's key active site amino acid residues. Stability at the binding site was analyzed through 200 ns molecular dynamics simulations. MM-GBSA analysis quantified individual free energy components contributing to the total binding energy with respect to binding site residues. Metadynamics analysis, including PCA, FEL, and PDF, provided crucial insights into conformational changes of both apo and holo forms of GRK5 at defined energy states. The study identifies DB02844 (S-Adenosyl-1,8-Diamino-3-Thiooctane) and DB13155 (Esculin) as promising GRK5 inhibitors, warranting further in vitro and in vivo validation studies.
Affiliation(s)
- Arnab Bhattacharjee
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Supratik Kar
  - Chemometrics and Molecular Modeling Laboratory, Department of Chemistry and Physics, Kean University, 1000 Morris Avenue, Union, NJ, 07083, USA
- Probir Kumar Ojha
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
20
Kushwaha NL, Kudnar NS, Vishwakarma DK, Subeesh A, Jatav MS, Gaddikeri V, Ahmed AA, Abdelaty I. Stacked hybridization to enhance the performance of artificial neural networks (ANN) for prediction of water quality index in the Bagh river basin, India. Heliyon 2024; 10:e31085. [PMID: 38784559 PMCID: PMC11112320 DOI: 10.1016/j.heliyon.2024.e31085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/03/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
Water quality assessment is paramount for environmental monitoring and resource management, particularly in regions experiencing rapid urbanization and industrialization. This study introduces Artificial Neural Networks (ANN) and its hybrid machine learning models, namely ANN-RF (Random Forest), ANN-SVM (Support Vector Machine), ANN-RSS (Random Subspace), ANN-M5P (M5 Pruned), and ANN-AR (Additive Regression) for water quality assessment in the rapidly urbanizing and industrializing Bagh River Basin, India. The Relief algorithm was employed to select the most influential water quality input parameters, including Nitrate (NO₃⁻), Magnesium (Mg²⁺), Sulphate (SO₄²⁻), Calcium (Ca²⁺), and Potassium (K⁺). The comparative analysis of developed ANN and its hybrid models was carried out using statistical indicators (i.e., Nash-Sutcliffe Efficiency (NSE), Pearson Correlation Coefficient (PCC), Coefficient of Determination (R²), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Root Square Error (RRSE), Relative Absolute Error (RAE), and Mean Bias Error (MBE)) and graphical representations (i.e., Taylor diagram). Results indicate that the integration of support vector machine (SVM) with ANN significantly improves performance, yielding impressive statistical indicators: NSE (0.879), R² (0.904), MAE (22.349), and MBE (12.548). The methodology outlined in this study can serve as a template for enhancing the predictive capabilities of ANN models in various other environmental and ecological applications, contributing to sustainable development and safeguarding natural resources.
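For readers unfamiliar with the indicators named in the abstract above, the following minimal sketch computes four of them (MAE, RMSE, MBE, and NSE) from their usual textbook definitions. The observed and predicted values are made up for illustration, and the sign convention for MBE (predicted minus observed) is an assumption, since the abstract does not state which convention the authors use.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, MBE and Nash-Sutcliffe Efficiency for paired values."""
    n = len(observed)
    # Assumed convention: error = predicted - observed.
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    sse = sum(e * e for e in errors)                  # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)  # spread of observations
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "MBE": sum(errors) / n,
        # NSE = 1 means a perfect fit; NSE = 0 means no better than the mean.
        "NSE": 1.0 - sse / sst,
    }


# Hypothetical water quality index values, not taken from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

A positive MBE under this convention indicates that the model overestimates on average, which MAE and RMSE alone cannot show because they discard the sign of the error.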
Affiliation(s)
- Nand Lal Kushwaha
  - Department of Soil and Water Engineering, Punjab Agricultural University Ludhiana, Punjab, 141004, India
  - Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Nanabhau S. Kudnar
  - Department of Geography, C. J. Patel College Tirora, Gondia, Maharashtra, 441911, India
- Dinesh Kumar Vishwakarma
  - Department of Irrigation and Drainage Engineering, G.B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, 263145, India
- A. Subeesh
  - ICAR- Central Institute of Agricultural Engineering, Bhopal, Madhya Pradesh, 462038, India
- Malkhan Singh Jatav
  - National Institute of Hydrology, North Western Regional Centre, Jodhpur, Rajasthan, 342003, India
- Venkatesh Gaddikeri
  - Division of Agricultural Engineering, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India
- Ashraf A. Ahmed
  - Department of Civil and Environmental Engineering, Brunel University London, Kingston Lane, Uxbridge UB38PH, UK
- Ismail Abdelaty
  - Water and Water Structures Engineering Department, Faculty of Engineering, Zagazig University, Zagazig, 44519, Egypt
21
Lupea I, Lupea M, Coroian A. Helical Gearbox Defect Detection with Machine Learning Using Regular Mesh Components and Sidebands. SENSORS (BASEL, SWITZERLAND) 2024; 24:3337. [PMID: 38894129 PMCID: PMC11174595 DOI: 10.3390/s24113337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/17/2024] [Accepted: 05/20/2024] [Indexed: 06/21/2024]
Abstract
The current paper presents helical gearbox defect detection models built from raw vibration signals measured using a triaxial accelerometer. Gear faults, such as localized pitting, localized wear on helical pinion tooth flanks, and low lubricant level, are under observation for three rotating velocities of the actuator and three load levels at the speed reducer output. The emphasis is on the strong connection between the gear faults and the fundamental gear meshing frequency (GMF), its harmonics, and the sidebands found in the vibration spectrum as an effect of the amplitude modulation (AM) and phase modulation (PM). Several sets of features representing powers on selected frequency bands and/or associated peak amplitudes from the vibration spectrum, and also, for comparison, time-domain and frequency-domain statistical feature sets, are proposed as predictors in the defect detection task. The best performing detection model, with a testing accuracy of 99.73%, is based on SVM (Support Vector Machine) with a cubic kernel, and the features used are the band powers associated with six GMF harmonics and two sideband pairs for all three accelerometer axes, regardless of the rotation velocities and the load levels.
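The band-power features mentioned in the abstract above can be illustrated with a small, self-contained sketch. Everything concrete here is an assumption made for the example: the sampling rate, the 40 Hz meshing frequency, the 5 Hz sideband offset, the amplitudes, and the band half-width are hypothetical, and a naive discrete Fourier transform is used for clarity rather than whatever spectral estimator the authors applied.

```python
import cmath
import math


def power_spectrum(signal, fs):
    """One-sided power spectrum via a naive DFT (fine for short signals)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n ** 2)
    return freqs, power


def band_power(freqs, power, center, half_width):
    """Sum the spectral power in [center - half_width, center + half_width]."""
    return sum(p for f, p in zip(freqs, power) if abs(f - center) <= half_width)


# Hypothetical vibration signal: a 40 Hz meshing component plus two weaker
# sidebands at 35 Hz and 45 Hz, sampled at 256 Hz for one second.
fs, n, gmf, offset = 256, 256, 40.0, 5.0
signal = [
    math.sin(2 * math.pi * gmf * i / fs)
    + 0.3 * math.sin(2 * math.pi * (gmf - offset) * i / fs)
    + 0.3 * math.sin(2 * math.pi * (gmf + offset) * i / fs)
    for i in range(n)
]

freqs, power = power_spectrum(signal, fs)
gmf_power = band_power(freqs, power, gmf, half_width=1.0)
lower_sideband_power = band_power(freqs, power, gmf - offset, half_width=1.0)
second_harmonic_power = band_power(freqs, power, 2 * gmf, half_width=1.0)
print(gmf_power, lower_sideband_power, second_harmonic_power)
```

In a real pipeline one such value per harmonic, per sideband pair, and per accelerometer axis would form the feature vector passed to the classifier.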
Affiliation(s)
- Iulian Lupea
  - Faculty of Industrial Engineering, Robotics and Production Management, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
- Mihaiela Lupea
  - Faculty of Mathematics and Computer Science, Babes-Bolyai University, 400084 Cluj-Napoca, Romania
- Adrian Coroian
  - Faculty of Industrial Engineering, Robotics and Production Management, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
22
Burton RJ, Raffray L, Moet LM, Cuff SM, White DA, Baker SE, Moser B, O’Donnell VB, Ghazal P, Morgan MP, Artemiou A, Eberl M. Conventional and unconventional T-cell responses contribute to the prediction of clinical outcome and causative bacterial pathogen in sepsis patients. Clin Exp Immunol 2024; 216:293-306. [PMID: 38430552 PMCID: PMC11097916 DOI: 10.1093/cei/uxae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 02/12/2024] [Accepted: 02/28/2024] [Indexed: 03/04/2024] Open
Abstract
Sepsis is characterized by a dysfunctional host response to infection culminating in life-threatening organ failure that requires complex patient management and rapid intervention. Timely diagnosis of the underlying cause of sepsis is crucial, and identifying those at risk of complications and death is imperative for triaging treatment and resource allocation. Here, we explored the potential of explainable machine learning models to predict mortality and causative pathogen in sepsis patients. By using a modelling pipeline employing multiple feature selection algorithms, we demonstrate the feasibility of identifying integrative patterns from clinical parameters, plasma biomarkers, and extensive phenotyping of blood immune cells. While no single variable had sufficient predictive power, models that combined five and more features showed a macro area under the curve (AUC) of 0.85 to predict 90-day mortality after sepsis diagnosis, and a macro AUC of 0.86 to discriminate between Gram-positive and Gram-negative bacterial infections. Parameters associated with the cellular immune response contributed the most to models predictive of 90-day mortality, most notably, the proportion of T cells among PBMCs, together with expression of CXCR3 by CD4+ T cells and CD25 by mucosal-associated invariant T (MAIT) cells. Frequencies of Vδ2+ γδ T cells had the most profound impact on the prediction of Gram-negative infections, alongside other T-cell-related variables and total neutrophil count. Overall, our findings highlight the added value of measuring the proportion and activation patterns of conventional and unconventional T cells in the blood of sepsis patients in combination with other immunological, biochemical, and clinical parameters.
Affiliation(s)
- Ross J Burton
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
  - Adult Critical Care, University Hospital of Wales, Cardiff and Vale University Health Board, Cardiff, UK
- Loïc Raffray
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
  - Department of Internal Medicine, Félix Guyon University Hospital of La Réunion, Saint Denis, Réunion Island, France
- Linda M Moet
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Simone M Cuff
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Daniel A White
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Sarah E Baker
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Bernhard Moser
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
  - Systems Immunity Research Institute, Cardiff University, Cardiff, UK
- Valerie B O’Donnell
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
  - Systems Immunity Research Institute, Cardiff University, Cardiff, UK
- Peter Ghazal
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
  - Systems Immunity Research Institute, Cardiff University, Cardiff, UK
- Matt P Morgan
  - Adult Critical Care, University Hospital of Wales, Cardiff and Vale University Health Board, Cardiff, UK
- Andreas Artemiou
  - School of Mathematics, Cardiff University, Cardiff, UK
  - Department of Information Technologies, University of Limassol, 3025 Limassol, Cyprus
- Matthias Eberl
  - Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
  - Systems Immunity Research Institute, Cardiff University, Cardiff, UK
|
23
|
Kuizinienė D, Savickas P, Kunickaitė R, Juozaitienė R, Damaševičius R, Maskeliūnas R, Krilavičius T. A comparative study of feature selection and feature extraction methods for financial distress identification. PeerJ Comput Sci 2024; 10:e1956. [PMID: 38855232 PMCID: PMC11157601 DOI: 10.7717/peerj-cs.1956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 03/04/2024] [Indexed: 06/11/2024]
Abstract
Financial distress identification remains an essential topic in the scientific literature due to its importance for society and the economy. The advancements in information technology and the escalating volume of stored data have led to the emergence of financial distress that transcends the realm of financial statements and its indicators (ratios). The feature space could be expanded by incorporating new perspectives on feature data categories such as macroeconomics, sectors, social, board, management, judicial incident, etc. However, the increased dimensionality results in sparse data and overfitted models. This study proposes a new approach for efficient financial distress classification assessment by combining dimensionality reduction and machine learning techniques. The proposed framework aims to identify a subset of features leading to the minimization of the loss function describing the financial distress in an enterprise. During the study, 15 dimensionality reduction techniques with different numbers of features and 17 machine-learning models were compared. Overall, 1,432 experiments were performed using Lithuanian enterprise data covering the period from 2015 to 2022. Results revealed that the artificial neural network (ANN) model with 30 ranked features identified using the Random Forest mean decreasing Gini (RF_MDG) feature selection technique provided the highest AUC score. Moreover, this study has introduced a novel approach for feature extraction, which could improve financial distress classification models.
Affiliation(s)
- Dovilė Kuizinienė
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
| | - Paulius Savickas
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
| | - Rimantė Kunickaitė
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
| | - Rūta Juozaitienė
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
| | | | | | - Tomas Krilavičius
- Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania
| |
|
24
|
Martínez‐Mauricio KL, García‐Jacas CR, Cordoves‐Delgado G. Examining evolutionary scale modeling-derived different-dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow. Protein Sci 2024; 33:e4928. [PMID: 38501511 PMCID: PMC10949403 DOI: 10.1002/pro.4928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 01/28/2024] [Accepted: 01/30/2024] [Indexed: 03/20/2024]
Abstract
Molecular features play an important role in different bio-chem-informatics tasks, such as Quantitative Structure-Activity Relationships (QSAR) modeling. Several pre-trained models have been recently created to be used in downstream tasks, either by fine-tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed ESM-2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different-dimensional embeddings derived from the ESM-2 models to classify antimicrobial peptides (AMPs). To this end, we built a KNIME workflow to use the same modeling methodology across experiments in order to guarantee fair analyses. As a result, the 640- and 1280-dimensional embeddings derived from the 30- and 33-layer ESM-2 models, respectively, are the most valuable since statistically better performances were achieved by the QSAR models built from them. We also fused features of the different ESM-2 models, and it was concluded that the fusion contributes to getting better QSAR models than using features of a single ESM-2 model. Frequency studies revealed that only a portion of the ESM-2 embeddings is valuable for modeling tasks since between 43% and 66% of the features were never used. Comparisons regarding state-of-the-art deep learning (DL) models confirm that when performing methodologically principled studies in the prediction of AMPs, non-DL based QSAR models yield comparable-to-superior performances to DL-based QSAR models. The developed KNIME workflow is freely available at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can be valuable to avoid unfair comparisons regarding new computational methods, as well as to propose new non-DL based QSAR models.
Affiliation(s)
- Karla L. Martínez‐Mauricio
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico
| | - César R. García‐Jacas
- Cátedras CONAHCYT – Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico
| | - Greneter Cordoves‐Delgado
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico
| |
|
25
|
Victor OA, Chen Y, Ding X. Non-Invasive Heart Failure Evaluation Using Machine Learning Algorithms. Sensors (Basel, Switzerland) 2024; 24:2248. [PMID: 38610459 PMCID: PMC11014006 DOI: 10.3390/s24072248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/26/2024] [Accepted: 03/28/2024] [Indexed: 04/14/2024]
Abstract
Heart failure is a prevalent cardiovascular condition with significant health implications, necessitating effective diagnostic strategies for timely intervention. This study explores the potential of continuous monitoring of non-invasive signals, specifically integrating photoplethysmogram (PPG) and electrocardiogram (ECG), for enhancing early detection and diagnosis of heart failure. Leveraging a dataset from the MIMIC-III database, encompassing 682 heart failure patients and 954 controls, our approach focuses on continuous, non-invasive monitoring. Key features, including the QRS interval, RR interval, augmentation index, heart rate, systolic pressure, diastolic pressure, and peak-to-peak amplitude, were carefully selected for their clinical relevance and ability to capture cardiovascular dynamics. This feature selection not only highlighted important physiological indicators but also helped reduce computational complexity and the risk of overfitting in machine learning models. The use of these features in training machine learning algorithms led to a model with impressive accuracy (98%), sensitivity (97.60%), specificity (96.90%), and precision (97.20%). Our integrated approach, combining PPG and ECG signals, demonstrates superior performance compared to single-signal strategies, emphasizing its potential in early and precise heart failure diagnosis. The study also highlights the importance of continuous monitoring with wearable technology, suggesting a significant stride forward in non-invasive cardiovascular health assessment. The proposed approach holds promise for implementation in hardware systems to enable continuous monitoring, aiding in early detection and prevention of critical health conditions.
Affiliation(s)
| | | | - Xiaorong Ding
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; (O.A.V.); (Y.C.)
| |
|
26
|
Nishan A, M. Taslim Uddin Raju S, Hossain MI, Dipto SA, M. Tanvir Uddin S, Sijan A, Chowdhury MAS, Ahmad A, Mahamudul Hasan Khan M. A continuous cuffless blood pressure measurement from optimal PPG characteristic features using machine learning algorithms. Heliyon 2024; 10:e27779. [PMID: 38533045 PMCID: PMC10963242 DOI: 10.1016/j.heliyon.2024.e27779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024] Open
Abstract
Background and objective Hypertension is a potentially dangerous health condition that can be detected by measuring blood pressure (BP). Blood pressure monitoring and measurement are essential for preventing and treating cardiovascular diseases. Cuff-based devices, however, are uncomfortable and prevent continuous BP measurement. Methods In this study, a new non-invasive and cuff-less method for estimating Systolic Blood Pressure (SBP), Mean Arterial Pressure (MAP), and Diastolic Blood Pressure (DBP) has been proposed using characteristic features of photoplethysmogram (PPG) signals and nonlinear regression algorithms. PPG signals were collected from 219 participants and then subjected to preprocessing and feature extraction steps. Analyzing PPG and its derivative signals, a total of 46 time, frequency, and time-frequency domain features were extracted. In addition, the age and gender of each subject were also included as features. Further, correlation-based feature selection (CFS) and ReliefF feature selection techniques were used to select the relevant features and reduce the possibility of over-fitting the models. Finally, support vector regression (SVR), K-nearest neighbour regression (KNR), decision tree regression (DTR), and random forest regression (RFR) were established to develop the BP estimation model. Regression models were trained and evaluated on all features as well as selected features. The best regression models for SBP, MAP, and DBP estimations were selected separately. Results The SVR model, along with the ReliefF-based feature selection algorithm, outperforms other algorithms in estimating the SBP, MAP, and DBP with mean absolute errors of 2.49, 1.62 and 1.43 mmHg, respectively. The proposed method meets the Association for the Advancement of Medical Instrumentation (AAMI) standard for BP estimations. Based on the British Hypertension Society standard, the results also fall within Grade A for SBP, MAP, and DBP.
Conclusion The findings show that the method can be used to estimate blood pressure non-invasively, without using a cuff or calibration, and only by utilizing the PPG signal characteristic features.
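The error criteria named in this abstract can be illustrated with a short sketch. It assumes the commonly quoted thresholds (AAMI: mean error within ±5 mmHg and standard deviation within 8 mmHg; BHS grades A/B/C: at least 60/85/95%, 50/75/90%, and 40/65/85% of absolute errors within 5/10/15 mmHg). The function name `bp_error_summary` and the sample readings are hypothetical and are not taken from the paper.

```python
import math

def bp_error_summary(true_mmhg, pred_mmhg):
    """Summarise BP estimation errors (illustrative, not the authors' code)."""
    errors = [p - t for t, p in zip(true_mmhg, pred_mmhg)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mean_err = sum(errors) / n
    # Sample standard deviation of the signed errors (an assumption; some
    # reports use the population form instead).
    sd = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / (n - 1)) if n > 1 else 0.0
    # Cumulative percentage of absolute errors within 5, 10 and 15 mmHg.
    within = {lim: 100.0 * sum(abs(e) <= lim for e in errors) / n for lim in (5, 10, 15)}
    grade = "D"
    for name, (a, b, c) in (("A", (60, 85, 95)), ("B", (50, 75, 90)), ("C", (40, 65, 85))):
        if within[5] >= a and within[10] >= b and within[15] >= c:
            grade = name
            break
    return {
        "mae": mae,
        "mean_error": mean_err,
        "sd": sd,
        "aami_pass": abs(mean_err) <= 5 and sd <= 8,
        "bhs_grade": grade,
    }

# Hypothetical reference and estimated systolic readings in mmHg.
summary = bp_error_summary([120, 130, 110, 140], [122, 127, 111, 146])
```

A real evaluation would apply the same summary separately to SBP, MAP, and DBP on held-out subjects.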
Affiliation(s)
- Araf Nishan
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - S. M. Taslim Uddin Raju
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Md Imran Hossain
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Safin Ahmed Dipto
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - S. M. Tanvir Uddin
- Department of Electrical and Electronic Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh
| | - Asif Sijan
- Department of Software Engineering, American International University, Dhaka, Bangladesh
| | - Md Abu Shahid Chowdhury
- Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Ashfaq Ahmad
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Md Mahamudul Hasan Khan
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| |
|
27
|
Roy S, Singh J, Ray SS. Weighted Combination of Łukasiewicz implication and Fuzzy Jaccard similarity in Hybrid Ensemble Framework (WCLFJHEF) for Gene Selection. Comput Biol Med 2024; 170:107981. [PMID: 38262204 DOI: 10.1016/j.compbiomed.2024.107981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 01/02/2024] [Accepted: 01/12/2024] [Indexed: 01/25/2024]
Abstract
A framework is developed for gene expression analysis by introducing fuzzy Jaccard similarity (FJS) and combining Łukasiewicz implication with it through weights in a hybrid ensemble framework for gene selection in cancer. The method is called weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity in hybrid ensemble framework (WCLFJHEF). While the fuzziness in Jaccard similarity is incorporated by using the existing Gödel fuzzy logic, the weights are obtained by maximizing the average F-score of selected genes in classifying the cancer patients. The patients are first divided into different clusters, based on the number of patient groups, using average linkage agglomerative clustering and a new score, called WCLFJ (weighted combination of Łukasiewicz implication and fuzzy Jaccard similarity). The genes are then selected from each cluster separately using filter-based Relief-F and wrapper-based SVMRFE (Support Vector Machine with Recursive Feature Elimination). A gene (feature) pool is created by considering the union of selected features for all the clusters. A set of informative genes is selected from the pool using the sequential backward floating search (SBFS) algorithm. Patients are then classified using Naïve Bayes (NB) and Support Vector Machine (SVM) separately, using the selected genes, and the related F-scores are calculated. The weights in WCLFJ are then updated iteratively to maximize the average F-score obtained from the results of the classifier. The effectiveness of WCLFJHEF is demonstrated on six gene expression datasets. The average values of accuracy, F-score, recall, precision and MCC over all the datasets are 95%, 94%, 94%, 94%, and 90%, respectively. The explainability of the selected genes is shown using SHapley Additive exPlanations (SHAP) values and this information is further used to rank them.
The relevance of the selected gene set is biologically validated using the KEGG Pathway, Gene Ontology (GO), and existing literature. It is seen that the genes that are selected by WCLFJHEF are candidates for genomic alterations in the various cancer types. The source code of WCLFJHEF is available at http://www.isical.ac.in/~shubhra/WCLFJHEF.html.
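As a rough illustration of the two ingredients named in this abstract, the sketch below computes a fuzzy Jaccard similarity with Gödel operators (minimum as intersection, maximum as union) and the standard Łukasiewicz implication min(1, 1 - x + y), then mixes them with a single weight. The abstract does not give the exact WCLFJ formula, so the averaging of the implication over genes, the single weight `w`, and the function names are assumptions made for illustration only. Inputs are assumed to be expression profiles already scaled to [0, 1].

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity using Goedel min/max as intersection/union."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0

def lukasiewicz_implication(x, y):
    """Standard Lukasiewicz implication on membership degrees in [0, 1]."""
    return min(1.0, 1.0 - x + y)

def weighted_score(a, b, w):
    """Hypothetical weighted mix of the two measures; w lies in [0, 1]."""
    # Averaging the implication over features is an assumption, not a detail
    # stated in the abstract.
    impl = sum(lukasiewicz_implication(x, y) for x, y in zip(a, b)) / len(a)
    return w * impl + (1.0 - w) * fuzzy_jaccard(a, b)

# Two hypothetical patients described by two scaled gene expression values.
score = weighted_score([0.2, 0.8], [0.4, 0.6], w=0.5)
```

In the described framework the weights are tuned iteratively to maximise the average F-score; here `w` is simply fixed.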
Affiliation(s)
- Sukriti Roy
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India.
| | - Joginder Singh
- Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
| | - Shubhra Sankar Ray
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India; Center for Soft Computing Research, Indian Statistical Institute, Kolkata 700108, India.
| |
|
28
|
Guo C. KNCFS: Feature selection for high-dimensional datasets based on improved random multi-subspace learning. PLoS One 2024; 19:e0296108. [PMID: 38394325 PMCID: PMC10890778 DOI: 10.1371/journal.pone.0296108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 12/05/2023] [Indexed: 02/25/2024] Open
Abstract
Feature selection has long been a focal point of research in various fields. Recent studies have focused on the application of random multi-subspaces methods to extract more information from raw samples. However, this approach inadequately addresses the adverse effects that may arise due to feature collinearity in high-dimensional datasets. To further address the limited ability of traditional algorithms to extract useful information from raw samples while considering the challenge of feature collinearity during the random subspaces learning process, we employ a clustering approach based on correlation measures to group features. Subsequently, we construct subspaces with lower inter-feature correlations. When integrating feature weights obtained from all feature spaces, we introduce a weighting factor to better handle the contributions from different feature spaces. We comprehensively evaluate our proposed algorithm on ten real datasets and four synthetic datasets, comparing it with six other feature selection algorithms. Experimental results demonstrate that our algorithm, denoted as KNCFS, effectively identifies relevant features, exhibiting robust feature selection performance, particularly suited for addressing feature selection challenges in practice.
Affiliation(s)
- Cong Guo
- College of Computer and Information Engineering, Henan University, Kaifeng, China
| |
|
29
|
Walsh C, Stallard-Olivera E, Fierer N. Nine (not so simple) steps: a practical guide to using machine learning in microbial ecology. mBio 2024; 15:e0205023. [PMID: 38126787 PMCID: PMC10865974 DOI: 10.1128/mbio.02050-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023] Open
Abstract
Due to the complex nature of microbiome data, the field of microbial ecology has many current and potential uses for machine learning (ML) modeling. With the increased use of predictive ML models across many disciplines, including microbial ecology, there is extensive published information on the specific ML algorithms available and how those algorithms have been applied. Thus, our goal is not to summarize the breadth of ML models available or compare their performances. Rather, our goal is to provide more concrete and actionable information to guide microbial ecologists in how to select, run, and interpret ML algorithms to predict the taxa or genes associated with particular sample categories or environmental gradients of interest. Such microbial data often have unique characteristics that require careful consideration of how to apply ML models and how to interpret the associated results. This review is intended for practicing microbial ecologists who may be unfamiliar with some of the intricacies of ML models. We provide examples and discuss common opportunities and pitfalls specific to applying ML models to the types of data sets most frequently collected by microbial ecologists.
Affiliation(s)
- Corinne Walsh
- Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
- Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
| | - Elías Stallard-Olivera
- Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
- Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
| | - Noah Fierer
- Cooperative Institute of Research in Environmental Sciences, CU Boulder, Boulder, Colorado, USA
- Ecology and Evolutionary Biology Department, CU Boulder, Boulder, Colorado, USA
| |
|
30
|
Fisher TB, Saini G, Rekha TS, Krishnamurthy J, Bhattarai S, Callagy G, Webber M, Janssen EAM, Kong J, Aneja R. Digital image analysis and machine learning-assisted prediction of neoadjuvant chemotherapy response in triple-negative breast cancer. Breast Cancer Res 2024; 26:12. [PMID: 38238771 PMCID: PMC10797728 DOI: 10.1186/s13058-023-01752-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 12/11/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Pathological complete response (pCR) is associated with favorable prognosis in patients with triple-negative breast cancer (TNBC). However, only 30-40% of TNBC patients treated with neoadjuvant chemotherapy (NAC) show pCR, while the remaining 60-70% show residual disease (RD). The role of the tumor microenvironment in NAC response in patients with TNBC remains unclear. In this study, we developed a machine learning-based two-step pipeline to distinguish between various histological components in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of TNBC tissue biopsies and to identify histological features that can predict NAC response. METHODS H&E-stained WSIs of treatment-naïve biopsies from 85 patients (51 with pCR and 34 with RD) of the model development cohort and 79 patients (41 with pCR and 38 with RD) of the validation cohort were separated through a stratified eightfold cross-validation strategy for the first step and leave-one-out cross-validation strategy for the second step. A tile-level histology label prediction pipeline and four machine-learning classifiers were used to analyze 468,043 tiles of WSIs. The best-trained classifier used 55 texture features from each tile to produce a probability profile during testing. The predicted histology classes were used to generate a histology classification map of the spatial distributions of different tissue regions. A patient-level NAC response prediction pipeline was trained with features derived from paired histology classification maps. The top graph-based features capturing the relevant spatial information across the different histological classes were provided to the radial basis function kernel support vector machine (rbfSVM) classifier for NAC treatment response prediction. RESULTS The tile-level prediction pipeline achieved 86.72% accuracy for histology class classification, while the patient-level pipeline achieved 83.53% NAC response (pCR vs. RD) prediction accuracy of the model development cohort. The model was validated with an independent cohort with tile histology validation accuracy of 83.59% and NAC prediction accuracy of 81.01%. The histological class pairs with the strongest NAC response predictive ability were tumor and tumor tumor-infiltrating lymphocytes for pCR and microvessel density and polyploid giant cancer cells for RD. CONCLUSION Our machine learning pipeline can robustly identify clinically relevant histological classes that predict NAC response in TNBC patients and may help guide patient selection for NAC treatment.
Affiliation(s)
- Timothy B Fisher
- Department of Biology, Georgia State University, Atlanta, GA, 30302, USA
| | - Geetanjali Saini
- School of Health Professions, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
| | - T S Rekha
- JSSAHER (JSS Academy of Higher Education and Research) Medical College, Mysuru, Karnataka, India
| | - Jayashree Krishnamurthy
- JSSAHER (JSS Academy of Higher Education and Research) Medical College, Mysuru, Karnataka, India
| | - Shristi Bhattarai
- School of Health Professions, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
| | - Grace Callagy
- Discipline of Pathology, University of Galway, Galway, Ireland
| | - Mark Webber
- Discipline of Pathology, University of Galway, Galway, Ireland
| | - Emiel A M Janssen
- Department of Pathology, Stavanger University Hospital, Stavanger, Norway
- Department of Chemistry, Bioscience and Environmental Engineering, University of Stavanger, Stavanger, Norway
| | - Jun Kong
- Department of Mathematics and Statistics, Georgia State University, Atlanta, GA, 30303, USA.
| | - Ritu Aneja
- Department of Biology, Georgia State University, Atlanta, GA, 30302, USA.
- School of Health Professions, University of Alabama at Birmingham, Birmingham, AL, 35294, USA.
| |
|
31
|
Yolchuyeva S, Ebrahimpour L, Tonneau M, Lamaze F, Orain M, Coulombe F, Malo J, Belkaid W, Routy B, Joubert P, Manem VS. Multi-institutional prognostic modeling of survival outcomes in NSCLC patients treated with first-line immunotherapy using radiomics. J Transl Med 2024; 22:42. [PMID: 38200511 PMCID: PMC10777540 DOI: 10.1186/s12967-024-04854-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Accepted: 01/03/2024] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND Immune checkpoint inhibitors (ICIs) have emerged as one of the most promising first-line therapeutics in the management of non-small cell lung cancer (NSCLC). However, only a subset of these patients responds to ICIs, highlighting the clinical need to develop better predictive and prognostic biomarkers. This study leverages pre-treatment imaging profiles to develop survival risk models for NSCLC patients treated with first-line immunotherapy. METHODS Advanced NSCLC patients (n = 149) treated with first-line ICIs were retrospectively identified from two institutions. Radiomics features extracted from pretreatment imaging scans were used to build the predictive models for progression-free survival (PFS) and overall survival (OS). A compendium of five feature selection methods and seven machine learning approaches was utilized to build the survival risk models. The concordance index (C-index) was used to evaluate model performance. RESULTS Several combinations of machine learning algorithms and feature selection methods achieved similar performance. K-nearest neighbours (KNN) with ReliefF (RL) feature selection was the best-performing model to predict PFS (C-index = 0.61 and 0.604 in discovery and validation cohorts), while XGBoost with Mutual Information (MI) feature selection was the best-performing model for OS (C-index = 0.7 and 0.655 in discovery and validation cohorts). CONCLUSION The results of this study highlight the importance of implementing an appropriate feature selection method coupled with a machine learning strategy to develop robust survival models. With further validation of these models on external cohorts when available, this approach has the potential to improve clinical decisions by systematically analyzing routine medical images.
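For readers unfamiliar with the concordance index used above, the following is a minimal sketch of Harrell's C-index for right-censored data. A pair of patients is treated as comparable when the one with the shorter follow-up time had an observed event; ties in predicted risk count as half. The function name and the toy data are hypothetical, and the study's own implementation may differ (for example in how tied times are handled).

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: the third patient is censored at time 6.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.4, 0.5, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of roughly 0.6 to 0.7 in context.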
Affiliation(s)
- Sevinj Yolchuyeva
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Trois-Rivières, Canada
- Centre de Recherche du CHU de Québec, Université Laval, Québec, QC, Canada
| | - Leyla Ebrahimpour
- Quebec Heart & Lung Institute Research Center, Québec, Canada
- Centre de Recherche du CHU de Québec, Université Laval, Québec, QC, Canada
- Department of Physics, Laval University, Québec, Canada
| | - Marion Tonneau
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montreal, Canada
- Université de médecine de Lille, Lille, France
| | - Fabien Lamaze
- Quebec Heart & Lung Institute Research Center, Québec, Canada
| | - Michele Orain
- Quebec Heart & Lung Institute Research Center, Québec, Canada
| | | | - Julie Malo
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montreal, Canada
| | - Wiam Belkaid
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montreal, Canada
| | - Bertrand Routy
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montreal, Canada
| | - Philippe Joubert
- Quebec Heart & Lung Institute Research Center, Québec, Canada
- Department of Molecular Biology, Medical Biochemistry and Pathology, Laval University, Québec, Canada
| | - Venkata Sk Manem
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Trois-Rivières, Canada.
- Quebec Heart & Lung Institute Research Center, Québec, Canada.
- Centre de Recherche du CHU de Québec, Université Laval, Québec, QC, Canada.
| |
|
32
|
Saha S, Nandi D. SVM-RLF-DNN: A DNN with reliefF and SVM for automatic identification of COVID from chest X-ray and CT images. Digit Health 2024; 10:20552076241257045. [PMID: 38812845 PMCID: PMC11135098 DOI: 10.1177/20552076241257045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 05/08/2024] [Indexed: 05/31/2024] Open
Abstract
Aim To develop an advanced determination technology for detecting COVID-19 patterns from chest X-ray and CT-scan films with distinct applications of deep learning and machine learning methods. Methods and Materials The newly enhanced proposed hybrid classification network (SVM-RLF-DNN) comprises three phases: feature extraction, selection and classification. The in-depth features are extracted from a series of 3×3 convolution and 2×2 max pooling operations followed by a flattened and fully connected layer of the deep neural network (DNN). ReLU activation function and Adam optimizer are used in the model. The ReliefF is an improved feature selection algorithm of Relief that uses Manhattan distance instead of Euclidean distance. Based on the significance of the feature, the ReliefF assigns weight to each extracted feature received from a fully connected layer. The weight to each feature is the average of k closest hits and misses in each class for a neighbouring instance pair in multiclass problems. The ReliefF eliminates lower-weight features by setting the node value to zero. The higher weights of the features are kept to obtain the feature selection. At the last layer of the neural network, the multiclass Support Vector Machine (SVM) is used to classify the patterns of COVID-19, viral pneumonia and healthy cases. The three classes with three binary SVM classifiers use linear kernel function for each binary SVM following a one-versus-all approach. The hinge loss function and L2-norm regularization are selected for more stable results. The proposed method is assessed on publicly available chest X-ray and CT-scan image databases from Kaggle and GitHub. The performance of the proposed classification model has comparable training, validation, and test accuracy, as well as sensitivity, specificity, and confusion matrix for quantitative evaluation on five-fold cross-validation.
Results Our proposed network has achieved test accuracy of 98.48% and 95.34% on 2-class X-rays and CT. More importantly, the proposed model's test accuracy, sensitivity, and specificity are 87.9%, 86.32%, and 90.25% for 3-class classification (COVID-19, Pneumonia, Normal) on chest X-rays. The proposed model provides the test accuracy, sensitivity, and specificity of 95.34%, 94.12%, and 96.15% for 2-class classification (COVID-19, Non-COVID) on chest CT. Conclusion Our proposed classification network experimental results indicate competitiveness with existing neural networks. The proposed neural network assists clinicians in determining and surveilling the disease.
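The weighting idea behind the Relief family described in this abstract can be shown with a deliberately simplified sketch. It implements basic binary Relief with a single nearest hit and nearest miss per instance and range-scaled Manhattan distance, not the full multiclass ReliefF with k neighbours that the paper uses. The function name and the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief feature weights (simplified; not the paper's ReliefF)."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance over range-scaled features.
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss and agree on
            # the nearest hit.
            w[j] += (diff(X[i], X[m], j) - diff(X[i], X[h], j)) / n
    return w

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
```

Features with low or negative weight would then be dropped, which corresponds to the step where the abstract describes zeroing the lower-weight nodes.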
Affiliation(s)
- Sanjib Saha
- Department of Computer Science and Engineering, National Institute of Technology, Durgapur, India
- Department of Computer Science and Engineering, Dr. B. C. Roy Engineering College, Durgapur, India
| | - Debashis Nandi
- Department of Computer Science and Engineering, National Institute of Technology, Durgapur, India
| |
|
33
|
Hu WH, Lin SY, Hu YJ, Huang HY, Lu PL. Application of machine learning for mortality prediction in patients with candidemia: Feasibility verification and comparison with clinical severity scores. Mycoses 2024; 67:e13667. [PMID: 37914666 DOI: 10.1111/myc.13667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 10/12/2023] [Accepted: 10/18/2023] [Indexed: 11/03/2023]
Abstract
BACKGROUND Clinical severity scores, such as acute physiology, age, chronic health evaluation II (APACHE II), sequential organ failure assessment (SOFA), Pitt Bacteremia Score (PBS), and European Confederation of Medical Mycology Quality (EQUAL) score, may not reliably predict candidemia prognosis because their prespecified scoring limits their adaptability and applicability. OBJECTIVES Unlike those fixed and prespecified scorings, we aim to develop and validate a machine learning (ML) approach that learns predictive models adaptively from available patient data to increase adaptability and applicability. METHODS Different ML algorithms follow different design philosophies and consequently carry different learning biases. We designed an ensemble meta-learner based on stacked generalisation that integrates multiple learners so that they work in synergy to improve predictive performance. RESULTS In the multicenter retrospective study, we analysed 512 patients with candidemia from January 2014 to July 2019 and compared a stacked generalisation model (SGM) with APACHE II, SOFA, PBS and EQUAL score to predict the 14-day mortality. The cross-validation results showed that the SGM significantly outperformed APACHE II, SOFA, PBS, and EQUAL score across several metrics, including F1-score (0.68, p < .005), Matthews correlation coefficient (0.54, p < .05 vs. SOFA, p < .005 vs. the others) and the area under the curve (AUC; 0.87, p < .005). In addition, in an independent external test, the model effectively predicted patients' mortality in the external validation cohort, with an AUC of 0.77. CONCLUSIONS ML models show potential for improving mortality prediction amongst patients with candidemia compared to clinical severity scores.
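For readers less familiar with two of the metrics reported above, the following sketch shows how the F1-score and the Matthews correlation coefficient are computed from confusion-matrix counts. The counts are hypothetical and are not taken from the study.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for a binary mortality classifier.
tp, tn, fp, fn = 40, 50, 5, 5
f1 = f1_score(tp, fp, fn)                 # 80 / 90
mcc = matthews_corrcoef(tp, tn, fp, fn)   # 1975 / 2475
```

Unlike F1, the MCC also uses the true negatives, which is one reason it is often reported alongside F1 when the classes are imbalanced.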
Affiliation(s)
- Wei-Huan Hu
- College of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Shang-Yi Lin
- Division of Infectious Diseases, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Laboratory Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
| | - Yuh-Jyh Hu
- College of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Institute of Biomedical Engineering, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
| | - Ho-Yin Huang
- Department of Pharmacy, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- School of Pharmacy, College of Pharmacy, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Po-Liang Lu
- Division of Infectious Diseases, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- School of Post-Baccalaureate Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
| |
|
34
|
Longo LHDC, Roberto GF, Tosta TAA, de Faria PR, Loyola AM, Cardoso SV, Silva AB, do Nascimento MZ, Neves LA. Classification of Multiple H&E Images via an Ensemble Computational Scheme. ENTROPY (BASEL, SWITZERLAND) 2023; 26:34. [PMID: 38248160 PMCID: PMC10814107 DOI: 10.3390/e26010034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 12/23/2023] [Accepted: 12/25/2023] [Indexed: 01/23/2024]
Abstract
In this work, a computational scheme is proposed to identify the main combinations of handcrafted descriptors and deep-learned features capable of classifying histological images stained with hematoxylin and eosin. The handcrafted descriptors were representative of multiscale and multidimensional fractal techniques (fractal dimension, lacunarity and percolation) applied to quantify the histological images with the corresponding representations via explainable artificial intelligence (xAI) approaches. The deep-learned features were obtained from different convolutional neural networks (DenseNet-121, EfficientNet-b2, Inception-V3, ResNet-50 and VGG-19). The descriptors were investigated through different associations. The most relevant combinations, defined through a ranking algorithm, were analyzed via a heterogeneous ensemble of classifiers with the support vector machine, naive Bayes, random forest and K-nearest neighbors algorithms. The proposed scheme was applied to histological samples representative of breast cancer, colorectal cancer, oral dysplasia and liver tissue. The best results were accuracy rates of 94.83% to 100%, with the identification of pattern ensembles for classifying multiple histological images. The computational scheme indicated solutions exploring a reduced number of features (a maximum of 25 descriptors) and with better performance values than those observed in the literature. The information presented in this study is useful to complement and improve the development of computer-aided diagnosis focused on histological images.
Affiliation(s)
- Leonardo H. da Costa Longo
- Department of Computer Science and Statistics (DCCE), São Paulo State University (UNESP), Rua Cristóvão Colombo, 2265, São José do Rio Preto 15054-000, SP, Brazil
| | - Guilherme F. Roberto
- Department of Informatics Engineering, Faculty of Engineering, University of Porto, Dr. Roberto Frias, sn, 4200-465 Porto, Portugal
| | - Thaína A. A. Tosta
- Science and Technology Institute, Federal University of São Paulo (UNIFESP), Avenida Cesare Mansueto Giulio Lattes, 1201, São José dos Campos 12247-014, SP, Brazil
| | - Paulo R. de Faria
- Department of Histology and Morphology, Institute of Biomedical Science, Federal University of Uberlândia (UFU), Av. Amazonas, S/N, Uberlândia 38405-320, MG, Brazil
| | - Adriano M. Loyola
- Area of Oral Pathology, School of Dentistry, Federal University of Uberlândia (UFU), R. Ceará—Umuarama, Uberlândia 38402-018, MG, Brazil; (A.M.L.)
| | - Sérgio V. Cardoso
- Area of Oral Pathology, School of Dentistry, Federal University of Uberlândia (UFU), R. Ceará—Umuarama, Uberlândia 38402-018, MG, Brazil; (A.M.L.)
| | - Adriano B. Silva
- Faculty of Computer Science (FACOM), Federal University of Uberlândia (UFU), Avenida João Naves de Ávila 2121, Bl.B, Uberlândia 38400-902, MG, Brazil
| | - Marcelo Z. do Nascimento
- Faculty of Computer Science (FACOM), Federal University of Uberlândia (UFU), Avenida João Naves de Ávila 2121, Bl.B, Uberlândia 38400-902, MG, Brazil
| | - Leandro A. Neves
- Department of Computer Science and Statistics (DCCE), São Paulo State University (UNESP), Rua Cristóvão Colombo, 2265, São José do Rio Preto 15054-000, SP, Brazil
| |
|
35
|
Cui Y, Lu W, Xue J, Ge L, Yin X, Jian S, Li H, Zhu B, Dai Z, Shen Q. Machine learning-guided REIMS pattern recognition of non-dairy cream, milk fat cream and whipping cream for fraudulence identification. Food Chem 2023; 429:136986. [PMID: 37516053 DOI: 10.1016/j.foodchem.2023.136986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 07/02/2023] [Accepted: 07/22/2023] [Indexed: 07/31/2023]
Abstract
The illegal adulteration of milk fat cream with non-dairy cream during the manufacturing process of baked goods has significantly hindered the robust growth of the dairy industry. In this study, a method based on rapid evaporative ionization mass spectrometry (REIMS) lipidomics pattern recognition integrated with machine learning algorithms was established. A total of 26 important ions were selected using multivariate statistical analysis as salient contributing features to distinguish between milk fat cream and non-dairy cream. Furthermore, machine learning models based on discriminant analysis, decision trees, support vector machines, and neural network classifiers were used to classify non-dairy cream, milk fat cream, and milk fat cream adulterated with minute quantities of non-dairy cream. These approaches were enhanced through hyperparameter optimization and feature engineering, yielding accuracy rates of 98.4-99.6%. This artificial intelligence method of machine learning-guided REIMS pattern recognition can accurately identify adulteration of whipped cream and might help combat food fraud.
Affiliation(s)
- Yiwei Cui
- Collaborative Innovation Center of Seafood Deep Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China; Zhejiang Province Joint Key Laboratory of Aquatic Products Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China
| | - Weibo Lu
- Collaborative Innovation Center of Seafood Deep Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China
| | - Jing Xue
- Collaborative Innovation Center of Seafood Deep Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China; Zhejiang Province Joint Key Laboratory of Aquatic Products Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China
| | - Lijun Ge
- Collaborative Innovation Center of Seafood Deep Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China
| | - Xuelian Yin
- Collaborative Innovation Center of Seafood Deep Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China
| | - Shikai Jian
- Collaborative Innovation Center of Seafood Deep Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China
| | - Haihong Li
- Hangzhou Linping District Maternal & Child Health Care Hospital, Hangzhou 311113, China
| | - Beiwei Zhu
- National Engineering Research Center of Seafood, Collaborative Innovation Center of Provincial and Ministerial Co-Construction for Seafood Deep Processing, School of Food Science and Technology, Dalian Polytechnic University, Dalian 116034, China
| | - Zhiyuan Dai
- Collaborative Innovation Center of Seafood Deep Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China; Zhejiang Province Joint Key Laboratory of Aquatic Products Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China.
| | - Qing Shen
- Department of Clinical Laboratory, The Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou 324000, China; Zhejiang Province Joint Key Laboratory of Aquatic Products Processing, Institute of Seafood, Zhejiang Gongshang University, Hangzhou 310012, China.
| |
|
36
|
Nath A. Physicochemical and sequence determinants of antiviral peptides. Biol Futur 2023; 74:489-506. [PMID: 37889451 DOI: 10.1007/s42977-023-00188-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 10/06/2023] [Indexed: 10/28/2023]
Abstract
Antiviral peptides (AVPs) open new possibilities as effective antiviral therapeutics in the current scenario of evolving drug-resistant viruses. The sequence and structure-activity relationships of AVPs are still largely unknown. AVPs and antimicrobial peptides (AMPs) share several common features, but as they target different life forms (living organisms and viruses), exploring the differential sequence features may facilitate the design of specific AVPs. The current work developed accurate prediction models for discriminating (a) AVPs from AMPs, (b) Coronaviridae AVPs from other virus family specific AVPs and (c) highly active AVPs (HAA) from lowly active AVPs (LAA). Further, explainable machine learning methods (using model agnostic global interpretable methods) are utilized for exploring and interpreting the physicochemical spaces of AVPs, Coronaviridae AVPs and highly active AVPs. To further understand the association of physicochemical space distribution with pIC50 values, regression models were developed and analyzed using accumulated local effects and interaction strength analysis. An independent sample t-test is used to filter out the significant compositional differences between the smaller length HAA and longer length HAA groups. AVPs prefer lower charge/length ratio and basic residues in comparison with AMPs. Coronaviridae family-specific AVPs have lower propensities for basic amino acids and charge, and a preference for aspartic acid. Further, basic residues are more prevalent in lowly active AVPs than in highly active AVPs. Sequence order effects captured in terms of average amino acid pair distances proved to be more constructive in deciphering the sequences of AVPs.
Affiliation(s)
- Abhigyan Nath
- Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India.
| |
|
37
|
Yolchuyeva S, Giacomazzi E, Tonneau M, Lamaze F, Orain M, Coulombe F, Malo J, Belkaid W, Routy B, Joubert P, Manem VS. Imaging-Based Biomarkers Predict Programmed Death-Ligand 1 and Survival Outcomes in Advanced NSCLC Treated With Nivolumab and Pembrolizumab: A Multi-Institutional Study. JTO Clin Res Rep 2023; 4:100602. [PMID: 38124790 PMCID: PMC10730368 DOI: 10.1016/j.jtocrr.2023.100602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 10/18/2023] [Accepted: 11/08/2023] [Indexed: 12/23/2023] Open
Abstract
Background Although the immune checkpoint inhibitors, nivolumab and pembrolizumab, were found to be promising in patients with advanced NSCLC, some patients either do not respond or have recurrence after an initial response. It is still unclear who will benefit from these therapies, and, hence, there is an unmet clinical need to build robust biomarkers. Methods Patients with advanced NSCLC (N = 323) who were treated with pembrolizumab or nivolumab were retrospectively identified from two institutions. Radiomics features extracted from baseline pretreatment computed tomography scans along with the clinical variables were used to build the predictive models for overall survival (OS), progression-free survival (PFS), and programmed death-ligand 1 (PD-L1). To develop the imaging and integrative clinical-imaging predictive models, we used the XGBoost learning algorithm with the ReliefF feature selection method and validated them in an independent cohort. The concordance index for OS and PFS and the area under the curve for PD-L1 were used to evaluate model performance. Results We developed radiomics and the ensemble radiomics-clinical predictive models for OS, PFS, and PD-L1 expression. In the validation cohort, the concordance indices of the radiomics model were 0.60 and 0.61 for predicting OS and PFS, respectively, and the area under the curve was 0.61 for predicting PD-L1. The combined radiomics-clinical model resulted in higher performance with 0.65, 0.63, and 0.68 to predict OS, PFS, and PD-L1 in the validation cohort, respectively. Conclusions We found that pretreatment computed tomography imaging along with clinical data can serve as predictive biomarkers for PD-L1 and survival end points. These imaging-driven approaches may prove useful to expand the therapeutic options for nonresponders and improve the selection of patients who would benefit from immune checkpoint inhibitors.
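The concordance index used above for OS and PFS can be sketched as follows. This is a plain pairwise version of Harrell's C for right-censored data, written for illustration only; it is not the study's evaluation code, and the survival times, event flags, and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i has the shorter time and its
    event was observed (not censored). The pair is concordant when the model
    gives i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event flags (1 = event, 0 = censored)
# and model risk scores for four patients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of 0.60 to 0.65 in context.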
Affiliation(s)
- Sevinj Yolchuyeva
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Quebec, Canada
| | - Elena Giacomazzi
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Quebec, Canada
| | - Marion Tonneau
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
- Université de Médecine de Lille, Lille, France
| | - Fabien Lamaze
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
| | - Michele Orain
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
| | | | - Julie Malo
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
| | - Wiam Belkaid
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
| | - Bertrand Routy
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
- Centre Hospitalier Universitaire de Montréal, Hemato-Oncology Service, Quebec, Canada
| | - Philippe Joubert
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
- Department of Molecular Biology, Medical Biochemistry and Pathology, Laval University, Quebec, Canada
| | - Venkata S.K. Manem
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Quebec, Canada
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
- Centre de Recherche du CHU de Québec – Université Laval, Quebec, Canada
| |
|
38
|
Li CL, Fisher CJ, Komolibus K, Grygoryev K, Lu H, Burke R, Visentin A, Andersson-Engels S. Frameworks of wavelength selection in diffuse reflectance spectroscopy for tissue differentiation in orthopedic surgery. JOURNAL OF BIOMEDICAL OPTICS 2023; 28:121207. [PMID: 37674977 PMCID: PMC10479945 DOI: 10.1117/1.jbo.28.12.121207] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/08/2023] [Revised: 08/04/2023] [Accepted: 08/08/2023] [Indexed: 09/08/2023]
Abstract
Significance Wavelength selection from a large diffuse reflectance spectroscopy (DRS) dataset enables removal of spectral multicollinearity and thus leads to improved understanding of the feature domain. Feature selection (FS) frameworks are essential to discover the optimal wavelengths for tissue differentiation in DRS-based measurements, which can facilitate the development of compact multispectral optical systems with suitable illumination wavelengths for clinical translation. Aim The aim was to develop an FS methodology to determine wavelengths with optimal discriminative power for orthopedic applications, while providing the frameworks for adaptation to other clinical scenarios. Approach An ensemble framework for FS was developed, validated, and compared with frameworks incorporating conventional algorithms, including principal component analysis (PCA), linear discriminant analysis (LDA), and backward interval partial least squares (biPLS). Results Via the one-versus-rest binary classification approach, a feature subset of 10 wavelengths was selected from each framework, yielding balanced accuracy scores (PCA: 94.8 ± 3.47%, LDA: 98.2 ± 2.02%, biPLS: 95.8 ± 3.04%, and ensemble: 95.8 ± 3.16%) comparable to those obtained using all features (100%) for cortical bone versus the rest class labels. One hundred percent balanced accuracy scores were generated for bone cement versus the rest. Different feature subsets achieving similar outcomes could be identified due to spectral multicollinearity. Conclusions Wavelength selection frameworks provide a means to explore domain knowledge and discover important contributors to classification in spectroscopy. The ensemble framework generated a model with improved interpretability and preserved physical interpretation, which serves as the basis to determine illumination wavelengths in optical instrumentation design.
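The one-versus-rest balanced accuracy reported above can be made concrete with a short sketch. It assumes the usual definition (the mean of sensitivity and specificity for the chosen class against all others); the tissue labels and predictions are hypothetical and not taken from the study.

```python
def balanced_accuracy_ovr(y_true, y_pred, positive):
    """Balanced accuracy for `positive` versus the rest.

    Computed as (sensitivity + specificity) / 2 after collapsing every other
    label into a single negative class. Assumes both classes are present.
    """
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical tissue labels and classifier predictions.
y_true = ["bone", "bone", "cement", "muscle", "muscle", "cement"]
y_pred = ["bone", "muscle", "cement", "muscle", "bone", "cement"]
bone_score = balanced_accuracy_ovr(y_true, y_pred, "bone")
cement_score = balanced_accuracy_ovr(y_true, y_pred, "cement")
```

Because each class contributes equally regardless of its size, this score is less flattering than plain accuracy when the "rest" class dominates.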
Affiliation(s)
- Celina L. Li
- University College Cork, Biophotonics@Tyndall, IPIC, Tyndall National Institute, Cork, Ireland
| | - Carl J. Fisher
- University College Cork, Biophotonics@Tyndall, IPIC, Tyndall National Institute, Cork, Ireland
| | - Katarzyna Komolibus
- University College Cork, Biophotonics@Tyndall, IPIC, Tyndall National Institute, Cork, Ireland
| | - Konstantin Grygoryev
- University College Cork, Biophotonics@Tyndall, IPIC, Tyndall National Institute, Cork, Ireland
| | - Huihui Lu
- University College Cork, Biophotonics@Tyndall, IPIC, Tyndall National Institute, Cork, Ireland
| | - Ray Burke
- University College Cork, Biophotonics@Tyndall, IPIC, Tyndall National Institute, Cork, Ireland
| | - Andrea Visentin
- University College Cork, School of Computer Science and Information Technology, Insight Centre for Data Analytics, Cork, Ireland
| | - Stefan Andersson-Engels
- University College Cork, Biophotonics@Tyndall, IPIC, Tyndall National Institute, Cork, Ireland
- University College Cork, Department of Physics, Cork, Ireland
| |
|
39
|
Li X, Wang GA, Wei Z, Wang H, Zhu X. Protein-DNA interface hotspots prediction based on fusion features of embeddings of protein language model and handcrafted features. Comput Biol Chem 2023; 107:107970. [PMID: 37866116 DOI: 10.1016/j.compbiolchem.2023.107970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/06/2023] [Accepted: 10/07/2023] [Indexed: 10/24/2023]
Abstract
The identification of hotspot residues at the protein-DNA binding interfaces plays a crucial role in various aspects such as drug discovery and disease treatment. Although experimental methods such as alanine scanning mutagenesis have been developed to determine the hotspot residues on protein-DNA interfaces, they are both inefficient and costly. Therefore, it is highly necessary to develop efficient and accurate computational methods for predicting hotspot residues. Several computational methods have been developed; however, they are mainly based on handcrafted features, which may not be able to represent all the information of proteins. In this regard, we propose a model called PDH-EH, which utilizes fused features of embeddings extracted from a protein language model (PLM) and handcrafted features. After extracting a total of 1141 feature dimensions, we used mRMR to select the optimal feature subset. Based on the optimal feature subset, several different learning algorithms such as Random Forest, Support Vector Machine, and XGBoost were used to build the models. The cross-validation results on the training dataset show that the model built by using Random Forest achieves the highest AUROC. Further evaluation on the independent test set shows that our model outperforms the existing state-of-the-art models. Moreover, the effectiveness and interpretability of embeddings extracted from PLM were demonstrated in our analysis. The codes and datasets used in this study are available at: https://github.com/lixiangli01/PDH-EH.
Affiliation(s)
- Xiang Li
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Gang-Ao Wang
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Zhuoyu Wei
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Hong Wang
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China.
| |
|
40
|
Caroppo E, Calabrese C, Mazza M, Rinaldi A, Coluzzi D, Napoli P, Sapienza M, Porfiri M, De Lellis P. Migrants' mental health recovery in Italian reception facilities. COMMUNICATIONS MEDICINE 2023; 3:162. [PMID: 37993495 PMCID: PMC10665420 DOI: 10.1038/s43856-023-00385-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 10/12/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND Forced migration leaves deep marks on the psychological well-being of migrants, with post-traumatic stress disorder (PTSD) and other psychological conditions being prevalent among them. While research has clarified the extent to which pre-migration trauma is a predictor of mental health outcomes, the role of post-migration stressors in the settlement environment is yet to be fully characterized. METHODS We monitored the mental health of a cohort of 100 asylum-seekers during their 14-day COVID-19-related quarantine in reception facilities in Rome, Italy, through the administration of six questionnaires (a demographic survey, the WHO-5 well-being index, the Primary Care PTSD Screen for Diagnostic and Statistical Manual of Mental Disorders 5 (DSM-5), the Harvard Trauma Questionnaire, the Trauma and Loss Spectrum-Self Report, and the LiMEs-Italian version). Through the combination of statistical analysis and supervised learning, we studied the impact of the first contact with the reception system on asylum-seekers' mental health and sought possible risk and shielding factors for PTSD. RESULTS We find that sheltering in refugee centers has a positive impact on migrants' mental health; asylum-seekers with PTSD reported more traumatic events and personality characteristics related to loss and trauma; life events are predictors of PTSD in asylum-seekers. CONCLUSIONS We identify past traumatic experiences as predictors of PTSD, and establish the positive role the immediate post-migration environment can play in migrants' psychological well-being. We recommend that host countries implement reception models that provide effective protection and integration of asylum-seekers, similar to those in the Italian system.
Affiliation(s)
- Emanuele Caroppo
- Department of Mental Health, Local Health Authority Roma 2, Rome, Italy.
| | - Carmela Calabrese
- Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy
- Institut de Neurosciences des Systémes (INS), Aix Marseille Université, 13, Marseille, France
| | - Marianna Mazza
- Institute of Psychiatry and Psychology, Department of Geriatrics, Neuroscience and Orthopedics, Fondazione Policlinico Universitario A. Gemelli IRCCS, Università Cattolica del Sacro Cuore, Rome, Italy
- Department of Psychiatry, Università Cattolica del Sacro Cuore, Rome, Italy
| | | | - Daniele Coluzzi
- Migrant Health Unit, Local Health Authority Roma 2, Rome, Italy
| | | | - Martina Sapienza
- Department of Life Sciences and Public Health, Università Cattolica del Sacro Cuore, Rome, Italy
| | - Maurizio Porfiri
- Center for Urban Science and Progress, Department of Mechanical and Aerospace Engineering, and Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY, USA.
| | - Pietro De Lellis
- Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy.
| |
|
41
|
Huang H, Liu C, Wagle MM, Yang P. Evaluation of deep learning-based feature selection for single-cell RNA sequencing data analysis. Genome Biol 2023; 24:259. [PMID: 37950331 PMCID: PMC10638755 DOI: 10.1186/s13059-023-03100-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 10/24/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND Feature selection is an essential task in single-cell RNA-seq (scRNA-seq) data analysis and can be critical for gene dimension reduction and downstream analyses, such as gene marker identification and cell type classification. Most popular methods for feature selection from scRNA-seq data are based on the concept of differential distribution wherein a statistical model is used to detect changes in gene expression among cell types. Recent development of deep learning-based feature selection methods provides an alternative approach compared to traditional differential distribution-based methods in that the importance of a gene is determined by neural networks. RESULTS In this work, we explore the utility of various deep learning-based feature selection methods for scRNA-seq data analysis. We sample from Tabula Muris and Tabula Sapiens atlases to create scRNA-seq datasets with a range of data properties and evaluate the performance of traditional and deep learning-based feature selection methods for cell type classification, feature selection reproducibility and diversity, and computational time. CONCLUSIONS Our study provides a reference for future development and application of deep learning-based feature selection methods for single-cell omics data analyses.
Affiliation(s)
- Hao Huang
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, Camperdown, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
| | - Chunlei Liu
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
| | - Manoj M Wagle
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, Camperdown, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
| | - Pengyi Yang
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia.
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, Camperdown, NSW, 2006, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, 2006, Australia.
| |
|
42
|
Uddin MG, Diganta MTM, Sajib AM, Rahman A, Nash S, Dabrowski T, Ahmadian R, Hartnett M, Olbert AI. Assessing the impact of COVID-19 lockdown on surface water quality in Ireland using advanced Irish water quality index (IEWQI) model. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2023; 336:122456. [PMID: 37673321 DOI: 10.1016/j.envpol.2023.122456] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 05/02/2023] [Revised: 07/23/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023]
Abstract
The COVID-19 pandemic has significantly impacted various aspects of life, including environmental conditions. Surface water quality (WQ) is one area affected by lockdowns imposed to control the virus's spread. Numerous recent studies have revealed the considerable impact of COVID-19 lockdowns on surface WQ. In response, this research aimed to assess the impact of COVID-19 lockdowns on surface water quality in Ireland using an advanced WQ model. To achieve this goal, six years of water quality monitoring data from 2017 to 2022 were collected for nine water quality indicators in Cork Harbour, Ireland, before, during, and after the lockdowns. These indicators include pH, water temperature (TEMP), salinity (SAL), biological oxygen demand (BOD5), dissolved oxygen (DOX), transparency (TRAN), and three nutrient enrichment indicators: dissolved inorganic nitrogen (DIN), molybdate reactive phosphorus (MRP), and total oxidized nitrogen (TON). The results showed that the lockdown had a significant impact on various WQ indicators, particularly pH, TEMP, TON, and BOD5. Over the study period, most indicators were within the permissible limits; MRP was the exception, other than during the COVID-19 period. During the pandemic, TON and DIN decreased, while water transparency significantly improved. In contrast, after COVID-19, WQ at 7% of monitoring sites significantly deteriorated. Overall, WQ in Cork Harbour was categorized as "good," "fair," and "marginal" classes over the study period. In terms of temporal variation, WQ improved at 17% of monitoring sites during the lockdown period in Cork Harbour. However, no significant trend in WQ was observed. Furthermore, the study analyzed the advanced model's performance in assessing the impact of COVID-19 on WQ. The results indicate that the advanced WQ model could be an effective tool for monitoring and evaluating lockdowns' impact on surface water quality. The model can provide valuable information for decision-making and planning to protect aquatic ecosystems.
Affiliation(s)
- Md Galal Uddin
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland
| | - Mir Talas Mahammad Diganta
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland
| | - Abdul Majed Sajib
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland
| | - Azizur Rahman
- School of Computing, Mathematics and Engineering, Charles Sturt University, Wagga Wagga, Australia; The Gulbali Institute of Agriculture, Water and Environment, Charles Sturt University, Wagga Wagga, Australia
| | - Stephen Nash
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland
| | | | - Reza Ahmadian
- School of Engineering, Cardiff University, The Parade, Cardiff, CF24 3AQ, UK
| | - Michael Hartnett
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland
| | - Agnieszka I Olbert
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland
| |
|
43
|
Kishore A, Venkataramana L, Prasad DVV, Mohan A, Jha B. Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture. Med Biol Eng Comput 2023; 61:2895-2919. [PMID: 37530887 DOI: 10.1007/s11517-023-02892-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Accepted: 07/19/2023] [Indexed: 08/03/2023]
Abstract
Prediction of the stage of cancer plays an important role in planning the course of treatment and has been largely reliant on imaging tools which do not capture molecular events that cause cancer progression. Gene-expression data-based analyses are able to identify these events, allowing RNA-sequence and microarray cancer data to be used for cancer analyses. Breast cancer is the most common cancer worldwide, and is classified into four stages: stages 1, 2, 3, and 4. While machine learning models have previously been explored to perform stage classification with limited success, multi-class stage classification has not had significant progress. There is a need for improved multi-class classification models, such as by investigating deep learning models. Gene-expression-based cancer data is characterised by the small size of available datasets, class imbalance, and high dimensionality. Class balancing methods must be applied to the dataset. Since not all genes are necessary for stage prediction, retaining only the necessary genes can improve classification accuracy. The breast cancer samples are to be classified into 4 classes of stages 1 to 4. Invasive ductal carcinoma breast cancer samples are obtained from The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets and combined. Two class balancing techniques are explored: synthetic minority oversampling technique (SMOTE), and SMOTE followed by random undersampling. A hybrid feature selection pipeline is proposed, with three pipelines explored involving combinations of filter and embedded feature selection methods: Pipeline 1 - minimum-redundancy maximum-relevancy (mRMR) and correlation feature selection (CFS), Pipeline 2 - mRMR, mutual information (MI) and CFS, and Pipeline 3 - mRMR and support vector machine-recursive feature elimination (SVM-RFE). 
The classification is done using deep learning models, namely deep neural network, convolutional neural network, recurrent neural network, a modified deep neural network, and an AutoKeras generated model. Classification performance post class-balancing and various feature selection techniques show marked improvement over classification prior to feature selection. The best multiclass classification was found to be by a deep neural network post SMOTE and random undersampling, and feature selection using mRMR and recursive feature elimination, with a Cohen-Kappa score of 0.303 and a classification accuracy of 53.1%. For binary classification into early and late-stage cancer, the best performance is obtained by a modified deep neural network (DNN) post SMOTE and random undersampling, and feature selection using mRMR and recursive feature elimination, with an accuracy of 81.0% and a Cohen-Kappa score (CKS) of 0.280. This pipeline also showed improved multiclass classification performance on neuroblastoma cancer data, with a best area under the receiver operating characteristic (auROC) curve score of 0.872, as compared to 0.71 obtained in previous work, an improvement of 22.81%. The results and analysis reveal that feature selection techniques play a vital role in gene-expression data-based classification, and the proposed hybrid feature selection pipeline improves classification performance. Multi-class classification is possible using deep learning models, though further improvement particularly in late-stage classification is necessary and should be explored further.
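The class-balancing step named in this abstract (SMOTE followed by random undersampling) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `smote` and `random_undersample`, the toy data, the neighbour count `k`, and the target class sizes are all assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class, without replacement."""
    return random.Random(seed).sample(majority, n_keep)

# Toy data (hypothetical): 4 minority and 12 majority samples, two features each.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(5.0 + i, 5.0 + i) for i in range(12)]

# Oversample the minority class to 8, then undersample the majority class to 8.
minority_bal = minority + smote(minority, n_new=4)
majority_bal = random_undersample(majority, n_keep=8)
print(len(minority_bal), len(majority_bal))
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority already occupies. In practice a maintained library implementation would normally be preferred over a hand-rolled one.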
Affiliation(s)
- Akash Kishore
- Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai, India
| | - Lokeswari Venkataramana
- Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai, India
| | - D Venkata Vara Prasad
- Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai, India
| | - Akshaya Mohan
- Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai, India
| | - Bhavya Jha
- Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai, India
| |
|
44
|
Mohammadi A, Torres-Cuenca T, Mirza-Aghazadeh-Attari M, Faeghi F, Acharya UR, Abbasian Ardakani A. Deep Radiomics Features of Median Nerves for Automated Diagnosis of Carpal Tunnel Syndrome With Ultrasound Images: A Multi-Center Study. Journal of Ultrasound in Medicine : Official Journal of the American Institute of Ultrasound in Medicine 2023; 42:2257-2268. [PMID: 37159483 DOI: 10.1002/jum.16244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 03/18/2023] [Accepted: 04/16/2023] [Indexed: 05/11/2023]
Abstract
OBJECTIVES Ultrasound is widely used in diagnosing carpal tunnel syndrome (CTS). However, the limitations of ultrasound in CTS detection are the lack of objective measures in the detection of nerve abnormality and the operator-dependent nature of ultrasound imaging. Therefore, in this study, we developed and proposed externally validated artificial intelligence (AI) models based on deep-radiomics features. METHODS We have used 416 median nerves from 2 countries (Iran and Colombia) for the development (112 entrapped and 112 normal nerves from Iran) and validation (26 entrapped and 26 normal nerves from Iran, and 70 entrapped and 70 normal nerves from Colombia) of our models. Ultrasound images were fed to the SqueezeNet architecture to extract deep-radiomics features. Then a ReliefF method was used to select the clinically significant features. The selected deep-radiomics features were fed to 9 common machine-learning algorithms to choose the best-performing classifier. The 2 best-performing AI models were then externally validated. RESULTS Our developed model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.910 (88.46% sensitivity, 88.46% specificity) and 0.908 (84.62% sensitivity, 88.46% specificity) with support vector machine (SVM) and stochastic gradient descent (SGD), respectively, using the internal validation dataset. Furthermore, both models consistently performed well in the external validation dataset, and achieved an AUC of 0.890 (85.71% sensitivity, 82.86% specificity) and 0.890 (84.29% sensitivity, 82.86% specificity) with SVM and SGD models, respectively. CONCLUSION Our proposed AI models fed with deep-radiomics features performed consistently with internal and external datasets. This justifies that our proposed system can be employed for clinical use in hospitals and polyclinics.
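The reported sensitivities and specificities can be checked against the validation set sizes given in the abstract (26 + 26 internal, 70 + 70 external). The short sketch below does that arithmetic. The per-class hit counts are not stated in the abstract; they are inferred here as the integer counts that reproduce the reported percentages, so they should be read as an assumption.

```python
def rate(hits, total):
    """Return hits / total as a percentage rounded to two decimals."""
    return round(100.0 * hits / total, 2)

# Internal validation: 26 entrapped and 26 normal nerves (sizes from the abstract).
# Hit counts are inferred from the reported percentages, not stated in the source.
internal = {
    "SVM sensitivity": rate(23, 26),
    "SVM specificity": rate(23, 26),
    "SGD sensitivity": rate(22, 26),
    "SGD specificity": rate(23, 26),
}

# External validation: 70 entrapped and 70 normal nerves (sizes from the abstract).
external = {
    "SVM sensitivity": rate(60, 70),
    "SVM specificity": rate(58, 70),
    "SGD sensitivity": rate(59, 70),
    "SGD specificity": rate(58, 70),
}

for name, table in (("internal", internal), ("external", external)):
    for metric, value in table.items():
        print(f"{name:8s} {metric}: {value:.2f}%")
```

With validation sets this small, one additional correct or incorrect case moves the internal figures by about 3.85 percentage points and the external figures by about 1.43, which is worth keeping in mind when comparing the two classifiers.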
Affiliation(s)
- Afshin Mohammadi
- Department of Radiology, Faculty of Medicine, Urmia University of Medical Science, Urmia, Iran
| | - Thomas Torres-Cuenca
- Department of Physical Medicine and Rehabilitation, National University of Colombia, Bogotá, Colombia
| | - Mohammad Mirza-Aghazadeh-Attari
- Russell H. Morgan Department of Radiology and Radiological Sciences, School of Medicine, Johns Hopkins University, Baltimore, Maryland, USA
| | - Fariborz Faeghi
- Department of Radiology Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - U Rajendra Acharya
- School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Queensland, Australia
- Department of Biomedical Engineering, School of Science and Technology, SUSS University, Singapore
- Department of Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan
| | - Ali Abbasian Ardakani
- Department of Radiology Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| |
|
45
|
Lim S, Oh T, Ngayo G. Analyzing factors affecting risk aversion: Case of life insurance data in Korea. Heliyon 2023; 9:e20697. [PMID: 37829817 PMCID: PMC10565772 DOI: 10.1016/j.heliyon.2023.e20697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/04/2023] [Accepted: 10/04/2023] [Indexed: 10/14/2023] Open
Abstract
This research employs machine learning analysis on extensive data from a prominent Korean life insurance company to substantiate the insurance demand theory, which posits that insurance demand increases with risk aversion. We quantitatively delineate the traits of risk-averse individuals. Our study focuses on a cohort of 94,306 individuals who have filed insurance claims due to illness. To forecast prospective insurance consumers inclined toward additional purchases, we construct a predictive model using a machine learning algorithm. This model incorporates 19 demographic and socioeconomic factors as independent variables, with additional insurance acquisition as the dependent variable. Consequently, we uncover the distinctive characteristics of consumers predicted to acquire supplementary insurance products. Our findings reveal a significant association between the independent variables and the likelihood of purchasing additional insurance. Notably, 10 out of the 19 independent variables exert a substantial influence on additional insurance acquisitions. These characteristics encompass residence in rural areas, a higher likelihood of being female, advanced age, increased assets, a higher likelihood of being blue-collar workers, lower education levels, a greater likelihood of being married or divorced/separated, a history of cancer, and a predisposition for existing policyholders with prior subscriptions to actual loss insurance or substantial insurance contract amounts. Our study holds academic significance by addressing limitations observed in prior research, which predominantly relied on questionnaires to qualitatively assess risk aversion. Instead, we offer specific insights into individual characteristics associated with risk aversion. Moreover, we anticipate that Korean insurance companies can leverage these insights to attract new clientele while retaining existing members through predictive risk aversion analysis. 
These findings also offer valuable insights across a spectrum of disciplines, including business administration, psychology, education, sociology, and sales/marketing, related to individuals' risk preferences and behaviors.
Affiliation(s)
- Sehyun Lim
- Seoul Business School, aSSIST University, 6 Ewhayeodae 2-gil, Fintower, Sinchon-ro, Seodaemun-gu, Seoul, South Korea, 03767
| | - Taeyeon Oh
- Seoul AI School, aSSIST University, 6 Ewhayeodae 2-gil, Fintower, Sinchon-ro, Seodaemun-gu, Seoul, South Korea, 03767
| | - Guy Ngayo
- Franklin University Switzerland, Via Ponte Tresa 29, 6924 Sorengo, Switzerland
| |
|
46
|
Delgado-García G, Engbers JDT, Wiebe S, Mouches P, Amador K, Forkert ND, White J, Sajobi T, Klein KM, Josephson CB. Machine learning using multimodal clinical, electroencephalographic, and magnetic resonance imaging data can predict incident depression in adults with epilepsy: A pilot study. Epilepsia 2023; 64:2781-2791. [PMID: 37455354 DOI: 10.1111/epi.17710] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/23/2022] [Revised: 06/30/2023] [Accepted: 07/05/2023] [Indexed: 07/18/2023]
Abstract
OBJECTIVE This study was undertaken to develop a multimodal machine learning (ML) approach for predicting incident depression in adults with epilepsy. METHODS We randomly selected 200 patients from the Calgary Comprehensive Epilepsy Program registry and linked their registry-based clinical data to their first-available clinical electroencephalogram (EEG) and magnetic resonance imaging (MRI) study. We excluded patients with a clinical or Neurological Disorders Depression Inventory for Epilepsy (NDDI-E)-based diagnosis of major depression at baseline. The NDDI-E was used to detect incident depression over a median of 2.4 years of follow-up (interquartile range [IQR] = 1.5-3.3 years). A ReliefF algorithm was applied to clinical as well as quantitative EEG and MRI parameters for feature selection. Six ML algorithms were trained and tested using stratified threefold cross-validation. Multiple metrics were used to assess model performances. RESULTS Of 200 patients, 150 had EEG and MRI data of sufficient quality for ML, of whom 59 were excluded due to prevalent depression. Therefore, 91 patients (41 women) were included, with a median age of 29 (IQR = 22-44) years. A total of 42 features were selected by ReliefF, none of which was a quantitative MRI or EEG variable. All models had a sensitivity > 80%, and five of six had an F1 score ≥ .72. A multilayer perceptron model had the highest F1 score (median = .74, IQR = .71-.78) and sensitivity (84.3%). Median area under the receiver operating characteristic curve and normalized Matthews correlation coefficient were .70 (IQR = .64-.78) and .57 (IQR = .50-.65), respectively. SIGNIFICANCE Multimodal ML using baseline features can predict incident depression in this population. Our pilot models demonstrated high accuracy for depression prediction. However, overall performance and calibration can be improved. 
This model has promise for identifying those at risk for incident depression during follow-up, although efforts to refine it in larger populations along with external validation are required.
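The stratified threefold cross-validation mentioned in the methods can be sketched in a few lines of pure Python. This is an illustration of the general technique, not the study's code: the function name `stratified_folds`, the seed, and the toy label vector (9 positives, 21 negatives) are assumptions, and the class counts are deliberately not those of the study.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Hypothetical labels: 9 positives and 21 negatives (not the study's counts).
labels = [1] * 9 + [0] * 21
folds = stratified_folds(labels)

for fold_no, test_idx in enumerate(folds, start=1):
    # Train on every fold except the one held out for testing.
    train_idx = [i for j, f in enumerate(folds, start=1) if j != fold_no for i in f]
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold_no}: train={len(train_idx)} test={len(test_idx)} positives={positives}")
```

Stratifying matters most when the positive class is small, as each test fold is then guaranteed to contain positives, so sensitivity and F1 remain defined in every fold.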
Affiliation(s)
- Guillermo Delgado-García
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
| | | | - Samuel Wiebe
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
- Clinical Research Unit, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
| | - Pauline Mouches
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
| | - Kimberly Amador
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
| | - Nils D Forkert
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
| | - James White
- Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Libin Cardiovascular Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
| | - Tolulope Sajobi
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
| | - Karl Martin Klein
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
| | - Colin B Josephson
- Department of Clinical Neurosciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Alberta, Canada
- Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada
- O'Brien Institute for Public Health, University of Calgary, Calgary, Alberta, Canada
- Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada
| |
|
47
|
Eşsiz UE, Yüregir OH, Saraç E. Applying data mining techniques to predict vitamin D deficiency in diabetic patients. Health Informatics J 2023; 29:14604582231214864. [PMID: 37963409 DOI: 10.1177/14604582231214864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
Vitamin D is among the vitamins necessary for both adults' and children's health. It plays a significant role in calcium absorption, the immune system, cell proliferation and differentiation, bone protection, skeletal health, rickets, muscle health, heart health, disease pathogenesis and severity, glucose metabolism, glucose intolerance, varying insulin secretion, and diabetes. Because the 25-hydroxyvitamin D (25OHD) test, which is used to measure vitamin D, is expensive and may not be covered by healthcare benefits in many countries, this study aims to predict vitamin D deficiency in diabetic patients. The prediction method is based on data mining techniques combined with feature selection, using historical electronic health records. The results were compared with a filter-based feature selection algorithm, namely relief-F. Non-valuable features were eliminated effectively with the relief-F feature selection method without any performance loss in classification. The performances of the methods were evaluated using classification accuracy (ACC), sensitivity, specificity, F1-score, precision, kappa results, and receiver operating characteristic (ROC) curves. The analyses were conducted on a vitamin D dataset of diabetic patients, and the results show that the highest classification accuracy of 97.044% was obtained for the support vector machine (SVM) model with a radial kernel and 18 features.
Affiliation(s)
- Uğur Engin Eşsiz
- Department of Industrial Engineering, Çukurova University, Adana, Turkey
| | - Oya Hacire Yüregir
- Department of Industrial Engineering, Çukurova University, Adana, Turkey
| | - Esra Saraç
- Department of Computer Engineering, Adana Alparslan Türkeş Science and Technology University, Adana, Turkey
| |
|
48
|
Tavakoli H, Pirzad Jahromi G, Sedaghat A. Investigating the Ability of Radiomics Features for Diagnosis of the Active Plaque of Multiple Sclerosis Patients. J Biomed Phys Eng 2023; 13:421-432. [PMID: 37868943 PMCID: PMC10589693 DOI: 10.31661/jbpe.v0i0.2302-1597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 03/05/2023] [Indexed: 10/24/2023]
Abstract
Background Multiple sclerosis (MS) is the most common non-traumatic disabling disease. Objective The aim of this study is to investigate the ability of radiomics features for diagnosing active plaques in patients with MS from T2 Fluid Attenuated Inversion Recovery (FLAIR) images. Material and Methods In this experimental study, images of 82 patients with 122 MS lesions were investigated. Boruta and Relief algorithms were used for feature selection on the training data set (70%). Four different classifier algorithms, including Multi-Layer Perceptron (MLP), Gradient Boosting (GB), Decision Tree (DT), and Extreme Gradient Boosting (XGB), were used for modeling. Finally, performance metrics were obtained on the test data set (30%) with 1000 bootstrap resamples and 95% confidence intervals (95% CIs). Results A total of 107 radiomics features were extracted for each lesion, of which 7 and 8 features were selected by the Relief method and Boruta method, respectively. The DT classifier had the best performance with both feature selection algorithms. The best performance on the test data set was obtained by Boruta-DT, with an average accuracy of 0.86, sensitivity of 1.00, specificity of 0.84, and Area Under the Curve (AUC) of 0.92 (95% CI: 0.92-0.92). Conclusion Radiomics features have the potential for diagnosing MS active plaque by T2 FLAIR image features. Additionally, choosing the feature selection and classifier algorithms plays an important role in the diagnosis of active plaque in MS patients. The radiomics-based predictive models predict active lesions accurately and non-invasively.
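The abstract reports test-set metrics with 1000 bootstrap resamples and 95% confidence intervals but does not say how the intervals were formed. One common choice is the percentile bootstrap, sketched below for accuracy. The function name `bootstrap_ci`, the seed, and the toy labels and predictions are assumptions for illustration and are not the study's data.

```python
import random

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for classification accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample test cases with replacement and rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    scores.sort()
    lower = scores[round((alpha / 2) * n_boot)]
    upper = scores[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical test set of 40 cases with 32 correct predictions.
y_true = [1] * 10 + [0] * 30
y_pred = [1] * 8 + [0] * 2 + [0] * 24 + [1] * 6

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

On a test set of a few dozen cases an interval computed this way usually has visible width, since each resample can change the score by several percentage points.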
Affiliation(s)
- Hassan Tavakoli
- Neuroscience Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Radiation Injuries Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Department of Physiology and Biophysics, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Gila Pirzad Jahromi
- Neuroscience Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Abdolrasoul Sedaghat
- Department of Radiology, Karaj Central Medical Imaging Institute, Karaj, Alborz, Iran
| |
|
49
|
Nazari L, Aslan MF, Sabanci K, Ropelewska E. Integrated transcriptomic meta-analysis and comparative artificial intelligence models in maize under biotic stress. Sci Rep 2023; 13:15899. [PMID: 37741865 PMCID: PMC10517993 DOI: 10.1038/s41598-023-42984-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 09/17/2023] [Indexed: 09/25/2023] Open
Abstract
Biotic stress imposed by fungal, bacterial, and viral pathogens can cause heavy damage leading to yield reduction in maize. Therefore, the identification of resistance genes paves the way to the development of disease-resistant cultivars and is essential for reliable production in maize. Identifying different gene expression patterns can deepen our understanding of maize resistance to disease. This study includes machine learning and deep learning-based applications for classifying genes expressed under normal and biotic stress conditions in maize. The machine learning algorithms used are Naive Bayes (NB), K-Nearest Neighbor (KNN), Ensemble, Support Vector Machine (SVM), and Decision Tree (DT). A Bidirectional Long Short Term Memory (BiLSTM) based network with Recurrent Neural Network (RNN) architecture is proposed for gene classification with deep learning. To increase the performance of these algorithms, features are selected from the raw gene features using the Relief feature selection algorithm. The findings indicated the efficacy of BiLSTM over the other machine learning algorithms. Some top genes ((S)-beta-macrocarpene synthase, zealexin A1 synthase, polyphenol oxidase I, chloroplastic, pathogenesis-related protein 10, CHY1, chitinase chem 5, barwin, and uncharacterized LOC100273479) were shown to be differentially upregulated under biotic stress conditions.
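The Relief feature selection step can be illustrated with a minimal sketch of the basic two-class Relief idea: a feature gains weight when it differs between a sample and its nearest neighbour of the other class (the "miss") and loses weight when it differs from its nearest neighbour of the same class (the "hit"). This is a simplified teaching version that visits every sample once and uses a single neighbour; it is not the ReliefF variant and not the authors' code. The function name `relief` and the toy data are assumptions.

```python
def relief(X, y):
    """Basic two-class Relief: return one relevance weight per feature."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to normalise differences to [0, 1].
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        same = [k for k in range(n) if k != i and y[k] == y[i]]
        other = [k for k in range(n) if y[k] != y[i]]
        hit = min(same, key=lambda k: dist(X[i], X[k]))    # nearest same-class sample
        miss = min(other, key=lambda k: dist(X[i], X[k]))  # nearest other-class sample
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 0.3), (0.1, 0.9), (0.2, 0.1), (0.8, 0.8), (0.9, 0.2), (1.0, 0.6)]
y = [0, 0, 0, 1, 1, 1]

weights = relief(X, y)
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
print("weights:", [round(w, 3) for w in weights])
print("features ranked by relevance:", ranking)
```

On this toy data the informative feature receives a positive weight and the noise feature a negative one, so a threshold or top-k cut on the weights would keep feature 0 and drop feature 1.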
Affiliation(s)
- Leyla Nazari
- Crop and Horticultural Science Research Department, Fars Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Shiraz, Iran
| | - Muhammet Fatih Aslan
- Electrical and Electronics Engineering, Karamanoglu Mehmetbey University, Karaman, Turkey
| | - Kadir Sabanci
- Electrical and Electronics Engineering, Karamanoglu Mehmetbey University, Karaman, Turkey
| | - Ewa Ropelewska
- Fruit and Vegetable Storage and Processing Department, The National Institute of Horticultural Research, Skierniewice, Poland
| |
|
50
|
Tas NP, Kaya O, Macin G, Tasci B, Dogan S, Tuncer T. ASNET: A Novel AI Framework for Accurate Ankylosing Spondylitis Diagnosis from MRI. Biomedicines 2023; 11:2441. [PMID: 37760882 PMCID: PMC10525210 DOI: 10.3390/biomedicines11092441] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/11/2023] [Revised: 08/24/2023] [Accepted: 08/29/2023] [Indexed: 09/29/2023] Open
Abstract
BACKGROUND Ankylosing spondylitis (AS) is a chronic, painful, progressive disease usually seen in the spine. Traditional diagnostic methods have limitations in detecting the early stages of AS. The early diagnosis of AS can improve patients' quality of life. This study aims to diagnose AS with a pre-trained hybrid model using magnetic resonance imaging (MRI). MATERIALS AND METHODS In this research, we collected a new MRI dataset comprising three cases. Furthermore, we introduced a novel deep feature engineering model. Within this model, we utilized three renowned pretrained convolutional neural networks (CNNs): DenseNet201, ResNet50, and ShuffleNet. Through these pretrained CNNs, deep features were generated using the transfer learning approach. For each pretrained network, two feature vectors were generated from an MRI. Three feature selectors were employed during the feature selection phase, increasing the number of feature vectors from 6 to 18 (6 × 3). The k-nearest neighbors (kNN) classifier was utilized in the classification phase to determine classification results. During the information phase, the iterative majority voting (IMV) algorithm was applied to obtain voted results, and our model selected the output with the highest classification accuracy. In this manner, we have introduced a self-organized deep feature engineering model. RESULTS We have applied the presented model to the collected dataset. The proposed method yielded 99.80%, 99.60%, 100%, and 99.80% for accuracy, recall, precision, and F1-score, respectively, on the collected axial image dataset. The collected coronal image dataset yielded 99.45%, 99.20%, 99.70%, and 99.45% for accuracy, recall, precision, and F1-score, respectively. As for contrast-enhanced images, an accuracy of 95.62%, recall of 80.72%, precision of 94.24%, and F1-score of 86.96% were attained. 
CONCLUSIONS Based on the results, the proposed method for classifying AS disease has demonstrated successful outcomes using MRI. The model has been tested on three cases, and its consistently high classification performance across all cases underscores the model's general robustness. Furthermore, the ability to diagnose AS disease using only axial images, without the need for contrast-enhanced MRI, represents a significant advancement in both healthcare and economic terms.
Affiliation(s)
- Nevsun Pihtili Tas
- Department of Physical Medicine and Rehabilitation, Health Sciences University Elazig Fethi Sekin City Hospital, Elazig 23280, Turkey
| | - Oguz Kaya
- Department of Orthopedics and Traumatology, Elazig Fethi Sekin City Hospital, Elazig 23280, Turkey
| | - Gulay Macin
- Department of Radiology, Beyhekim Training and Research Hospital, Konya 42060, Turkey
| | - Burak Tasci
- Vocational School of Technical Sciences, Firat University, Elazig 23119, Turkey
| | - Sengul Dogan
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig 23119, Turkey
| | - Turker Tuncer
- Department of Digital Forensics Engineering, College of Technology, Firat University, Elazig 23119, Turkey
| |
|