51
Nopour R, Kazemi-Arpanahi H. Developing an intelligent prediction system for successful aging based on artificial neural networks. Int J Prev Med 2024; 15:10. [PMID: 38563039 PMCID: PMC10982733 DOI: 10.4103/ijpvm.ijpvm_47_23] [Received: 02/18/2023] [Accepted: 10/04/2023] [Indexed: 04/04/2024] [Open access]
Abstract
Background Because disabilities are increasingly common among the elderly, this period of life deserves particular attention. Few studies have focused on the physical and mental disabilities and disorders that affect quality of life in elderly people. Successful aging (SA) is related to various factors influencing the lives of the elderly. The objective of the current study is therefore to build an intelligent system for SA prediction using artificial neural network (ANN) algorithms, to better investigate all factors affecting the lives of the elderly and to promote them. Methods This study was performed on 1156 SA and non-SA cases. A statistical feature reduction method was applied to obtain the best factors for predicting SA. Two ANN models with 5, 10, 15, and 20 neurons in the hidden layers were used for model construction. Finally, the best ANN configuration for predicting SA was selected using sensitivity, specificity, accuracy, and the cross-entropy loss function. Results The study showed that 25 factors correlated with SA at the statistical level of P < 0.05. Across all ANN structures assessed, the feed-forward back-propagation (FF-BP) algorithm with the 25-15-1 configuration performed best, with a training accuracy of 0.92, a test accuracy of 0.86, and a validation accuracy of 0.87. Conclusions Developing a clinical decision support system (CDSS) for predicting SA has a crucial role in effectively informing geriatrics and health care policymakers' decision making.
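The four criteria the Methods name for choosing a configuration (sensitivity, specificity, accuracy, and cross-entropy loss) can be made concrete with a short sketch. The Python below is illustrative only and is not the authors' code: the function names and the confusion-matrix counts are assumptions chosen for the example, not values from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true SA cases correctly flagged
    specificity = tn / (tn + fp)   # share of non-SA cases correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy loss for binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical counts and probabilities, chosen only to illustrate the formulas.
sens, spec, acc = classification_metrics(tp=8, fp=1, tn=9, fn=2)
loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

Comparing candidate networks on all four numbers, rather than accuracy alone, guards against a configuration that looks accurate only because one class dominates.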
Affiliation(s)
- Raoof Nopour
  - Department of Health Information Management, Iran University of Medical Sciences, Tehran, Iran
- Hadi Kazemi-Arpanahi
  - Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran
52
Lyu H, Huang H, He J, Zhu S, Hong W, Lai J, Gao T, Shao J, Zhu J, Li Y, Hu S. Task-state skin potential abnormalities can distinguish major depressive disorder and bipolar depression from healthy controls. Transl Psychiatry 2024; 14:110. [PMID: 38395985 PMCID: PMC10891315 DOI: 10.1038/s41398-024-02828-9] [Received: 07/21/2023] [Revised: 02/07/2024] [Accepted: 02/13/2024] [Indexed: 02/25/2024] [Open access]
Abstract
Early detection of bipolar depression (BPD) and major depressive disorder (MDD) has been challenging due to the lack of reliable and easily measurable biological markers. This study aimed to investigate the accuracy of discriminating patients with mood disorders from healthy controls based on task state skin potential characteristics and their correlation with individual indicators of oxidative stress. A total of 77 patients with BPD, 53 patients with MDD, and 79 healthy controls were recruited. A custom-made device, previously shown to be sufficiently accurate, was used to collect skin potential data during six emotion-inducing tasks involving video, pictorial, or textual stimuli. Blood indicators reflecting individual levels of oxidative stress were collected. A discriminant model based on the support vector machine (SVM) algorithm was constructed for discriminant analysis. MDD and BPD patients were found to have abnormal skin potential characteristics on most tasks. The accuracy of the SVM model built with SP features to discriminate MDD patients from healthy controls was 78% (sensitivity 78%, specificity 82%). The SVM model gave an accuracy of 59% (sensitivity 59%, specificity 79%) in classifying BPD patients, MDD patients, and healthy controls into three groups. Significant correlations were also found between oxidative stress indicators in the blood of patients and certain SP features. Patients with depression and bipolar depression have abnormalities in task-state skin potential that partially reflect the pathological mechanism of the illness, and the abnormalities are potential biological markers of affective disorders.
Affiliation(s)
- Hailong Lyu
  - Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China
  - Brain Research Institute of Zhejiang University, Hangzhou, 310003, China
  - Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China
- Huimin Huang
  - The Third Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325200, China
  - Ruian People's Hospital, Wenzhou, 325200, China
- Jiadong He
  - College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
- Sheng Zhu
  - Department of Psychiatry, The Ruian Fifth People's Hospital, Wenzhou, 325200, China
- Wanchu Hong
  - Department of Psychiatry, The Ruian Fifth People's Hospital, Wenzhou, 325200, China
- Jianbo Lai
  - Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China
  - Brain Research Institute of Zhejiang University, Hangzhou, 310003, China
  - Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China
- Jiamin Shao
  - Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China
  - Brain Research Institute of Zhejiang University, Hangzhou, 310003, China
  - Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China
- Jianfeng Zhu
  - Department of Psychiatry, The Ruian Fifth People's Hospital, Wenzhou, 325200, China
- Yubo Li
  - College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
- Shaohua Hu
  - Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China
  - Brain Research Institute of Zhejiang University, Hangzhou, 310003, China
  - Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China
  - Ruian People's Hospital, Wenzhou, 325200, China
53
Zhou TH, Zhou XX, Ni J, Ma YQ, Xu FY, Fan B, Guan Y, Jiang XA, Lin XQ, Li J, Xia Y, Wang X, Wang Y, Huang WJ, Tu WT, Dong P, Li ZB, Liu SY, Fan L. CT whole lung radiomic nomogram: a potential biomarker for lung function evaluation and identification of COPD. Mil Med Res 2024; 11:14. [PMID: 38374260 PMCID: PMC10877876 DOI: 10.1186/s40779-024-00516-9] [Received: 07/04/2023] [Accepted: 01/22/2024] [Indexed: 02/21/2024] [Open access]
Abstract
BACKGROUND Computed tomography (CT) plays a great role in characterizing and quantifying changes in lung structure and function of chronic obstructive pulmonary disease (COPD). This study aimed to explore the performance of CT-based whole lung radiomic in discriminating COPD patients and non-COPD patients. METHODS This retrospective study was performed on 2785 patients who underwent pulmonary function examination in 5 hospitals and were divided into non-COPD group and COPD group. The radiomic features of the whole lung volume were extracted. Least absolute shrinkage and selection operator (LASSO) logistic regression was applied for feature selection and radiomic signature construction. A radiomic nomogram was established by combining the radiomic score and clinical factors. Receiver operating characteristic (ROC) curve analysis and decision curve analysis (DCA) were used to evaluate the predictive performance of the radiomic nomogram in the training, internal validation, and independent external validation cohorts. RESULTS Eighteen radiomic features were collected from the whole lung volume to construct a radiomic model. The area under the curve (AUC) of the radiomic model in the training, internal, and independent external validation cohorts were 0.888 [95% confidence interval (CI) 0.869-0.906], 0.874 (95%CI 0.844-0.904) and 0.846 (95%CI 0.822-0.870), respectively. All were higher than the clinical model (AUC were 0.732, 0.714, and 0.777, respectively, P < 0.001). DCA demonstrated that the nomogram constructed by combining radiomic score, age, sex, height, and smoking status was superior to the clinical factor model. CONCLUSIONS The intuitive nomogram constructed by CT-based whole-lung radiomic has shown good performance and high accuracy in identifying COPD in this multicenter study.
Affiliation(s)
- Tao-Hu Zhou
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
  - School of Medical Imaging, Shandong Second Medical University, Weifang, 261053, Shandong, China
- Xiu-Xiu Zhou
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Jiong Ni
  - Department of Radiology, School of Medicine, Tongji Hospital, Tongji University, Shanghai, 200065, China
- Yan-Qing Ma
  - Department of Radiology, Zhejiang Province People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, 310014, China
- Fang-Yi Xu
  - Department of Radiology, Sir Run Run Shaw Hospital, Zhejiang, 310018, China
- Bing Fan
  - Jiangxi Provincial People's Hospital, the First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, China
- Yu Guan
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Xin-Ang Jiang
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Xiao-Qing Lin
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
  - College of Health Sciences and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Jie Li
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
  - College of Health Sciences and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Yi Xia
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Xiang Wang
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Yun Wang
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Wen-Jun Huang
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
  - Department of Radiology, the Second People's Hospital of Deyang, Deyang, 618000, Sichuan, China
- Wen-Ting Tu
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Peng Dong
  - School of Medical Imaging, Shandong Second Medical University, Weifang, 261053, Shandong, China
- Zhao-Bin Li
  - Department of Radiation Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, 200233, China
- Shi-Yuan Liu
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Li Fan
  - Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
54
Reduwan NH, Aziz AA, Mohd Razi R, Abdullah ERMF, Mazloom Nezhad SM, Gohain M, Ibrahim N. Application of deep learning and feature selection technique on external root resorption identification on CBCT images. BMC Oral Health 2024; 24:252. [PMID: 38373931 PMCID: PMC10875886 DOI: 10.1186/s12903-024-03910-w] [Received: 09/22/2023] [Accepted: 01/17/2024] [Indexed: 02/21/2024] [Open access]
Abstract
BACKGROUND Artificial intelligence has been proven to improve the identification of various maxillofacial lesions. The aim of the current study is two-fold: to assess the performance of four deep learning models (DLM) in external root resorption (ERR) identification and to assess the effect of combining a feature selection technique (FST) with DLMs on their ability to identify ERR. METHODS External root resorption was simulated on 88 extracted premolar teeth using a tungsten bur at different depths (0.5 mm, 1 mm, and 2 mm). All teeth were scanned using a cone beam CT (Carestream Dental, Atlanta, GA). Afterward, training (70%), validation (10%), and test (20%) datasets were established. The performance of four DLMs (Random Forest (RF) + Visual Geometry Group 16 (VGG), RF + EfficientNetB4 (EFNET), Support Vector Machine (SVM) + VGG, and SVM + EFNET) and four hybrid models (DLM + FST: (i) FS + RF + VGG, (ii) FS + RF + EFNET, (iii) FS + SVM + VGG and (iv) FS + SVM + EFNET) was compared. Five performance parameters were assessed: classification accuracy, F1-score, precision, specificity, and error rate. FST algorithms (Boruta and Recursive Feature Selection) were combined with the DLMs to assess their performance. RESULTS RF + VGG exhibited the highest performance in identifying ERR, followed by the other tested models. Similarly, FST combined with RF + VGG outperformed other models with classification accuracy, F1-score, precision, and specificity of 81.9%, weighted accuracy of 83%, and area under the curve (AUC) of 96%. The Kruskal-Wallis test revealed a significant difference (p = 0.008) in the prediction accuracy among the eight DLMs. CONCLUSION In general, all DLMs have similar performance on ERR identification. However, the performance can be improved by combining FST with DLMs.
Affiliation(s)
- Nor Hidayah Reduwan
  - Department of Oral and Maxillofacial Clinical Sciences, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
  - Centre of Oral and Maxillofacial Diagnostic and Medicine Studies, Faculty of Dentistry, Universiti Teknologi MARA, Sungai Buloh, 47000, Malaysia
- Azwatee Abdul Aziz
  - Department of Restorative Dentistry, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
- Roziana Mohd Razi
  - Department of Pediatric Dentistry and Orthodontic, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
- Erma Rahayu Mohd Faizal Abdullah
  - Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
- Seyed Matin Mazloom Nezhad
  - Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
- Meghna Gohain
  - Department of Oral and Maxillofacial Clinical Sciences, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
- Norliza Ibrahim
  - Department of Oral and Maxillofacial Clinical Sciences, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
55
Zhang H, Yin J, Zhou C, Qiu J, Wang J, Lv Q, Luo T. Identification of ipsilateral supraclavicular lymph node metastasis in breast cancer based on LASSO regression with a high penalty factor. Front Oncol 2024; 14:1349315. [PMID: 38371618 PMCID: PMC10869533 DOI: 10.3389/fonc.2024.1349315] [Received: 12/04/2023] [Accepted: 01/19/2024] [Indexed: 02/20/2024] [Open access]
Abstract
Aiming at the problems of small sample size and large feature dimension in the identification of ipsilateral supraclavicular lymph node metastasis status in breast cancer using ultrasound radiomics, an optimized feature combination search algorithm is proposed to construct linear classification models with high interpretability. The genetic algorithm (GA) is used to search for feature combinations within the feature subspace using least absolute shrinkage and selection operator (LASSO) regression. The search is optimized by applying a high penalty to the L1 norm of LASSO to retain excellent features in the crossover operation of the GA. The experimental results show that the linear model constructed using this method outperforms those using the conventional LASSO regression and standard GA. Therefore, this method can be used to build linear models with higher classification performance and more robustness.
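Why a larger L1 penalty leaves fewer features can be seen from the soft-thresholding rule that LASSO reduces to when the feature columns are orthonormal. The Python sketch below assumes that simplified setting; the coefficient values and the names `soft_threshold` and `n_selected` are hypothetical, and it does not reproduce the genetic-algorithm search described in the abstract.

```python
def soft_threshold(z, lam):
    """LASSO solution for one coefficient under an orthonormal design.

    z is the ordinary least-squares coefficient and lam the L1 penalty;
    coefficients with |z| <= lam are set exactly to zero.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def n_selected(coefs):
    """Count the features that survive (non-zero coefficients)."""
    return sum(1 for c in coefs if c != 0.0)


# Hypothetical least-squares coefficients for four features.
ols_coefs = [3.0, -1.5, 0.4, 0.05]

low_penalty = [soft_threshold(c, 0.5) for c in ols_coefs]   # two features survive
high_penalty = [soft_threshold(c, 2.0) for c in ols_coefs]  # only the strongest survives
```

With the penalty raised from 0.5 to 2.0, only the strongest coefficient remains non-zero, which is the sparsity effect a high penalty factor relies on.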
Affiliation(s)
- Haohan Zhang
  - West China Hospital, Sichuan University, Chengdu, China
- Jin Yin
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
  - Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
- Chen Zhou
  - Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
  - Breast Center, West China Hospital, Sichuan University, Chengdu, China
  - Clinical Research Center for Breast Diseases, West China Hospital, Sichuan University, Chengdu, China
- Jiajun Qiu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
  - Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
- Junren Wang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
- Qing Lv
  - Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
  - Breast Center, West China Hospital, Sichuan University, Chengdu, China
  - Clinical Research Center for Breast Diseases, West China Hospital, Sichuan University, Chengdu, China
- Ting Luo
  - Breast Center, West China Hospital, Sichuan University, Chengdu, China
  - Department of Medical Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
56
Jenul A, Stokmo HL, Schrunner S, Hjortland GO, Revheim ME, Tomic O. Novel ensemble feature selection techniques applied to high-grade gastroenteropancreatic neuroendocrine neoplasms for the prediction of survival. Computer Methods and Programs in Biomedicine 2024; 244:107934. [PMID: 38016391 DOI: 10.1016/j.cmpb.2023.107934] [Received: 06/15/2023] [Revised: 09/05/2023] [Accepted: 11/17/2023] [Indexed: 11/30/2023]
Abstract
BACKGROUND AND OBJECTIVE Determining the most informative features for predicting the overall survival of patients diagnosed with high-grade gastroenteropancreatic neuroendocrine neoplasms is crucial to improve individual treatment plans for patients, as well as the biological understanding of the disease. The main objective of this study is to evaluate the use of modern ensemble feature selection techniques for this purpose with respect to (a) quantitative performance measures such as predictive performance, (b) clinical interpretability, and (c) the effect of integrating prior expert knowledge. METHODS The Repeated Elastic Net Technique for Feature Selection (RENT) and the User-Guided Bayesian Framework for Feature Selection (UBayFS) are recently developed ensemble feature selectors investigated in this work. Both allow the user to identify informative features in datasets with low sample sizes and focus on model interpretability. While RENT is purely data-driven, UBayFS can integrate expert knowledge a priori in the feature selection process. In this work, we compare both feature selectors on a dataset comprising 63 patients and 110 features from multiple sources, including baseline patient characteristics, baseline blood values, tumor histology, imaging, and treatment information. RESULTS Our experiments involve data-driven and expert-driven setups, as well as combinations of both. In a five-fold cross-validated experiment without expert knowledge, our results demonstrate that both feature selectors allow accurate predictions: A reduction from 110 to approximately 20 features (around 82%) delivers near-optimal predictive performances with minor variations according to the choice of the feature selector, the predictive model, and the fold. Thereafter, we use findings from clinical literature as a source of expert knowledge. In addition, expert knowledge has a stabilizing effect on the feature set (an increase in stability of approximately 40%), while the impact on predictive performance is limited. CONCLUSIONS The features WHO Performance Status, Albumin, Platelets, Ki-67, Tumor Morphology, Total MTV, Total TLG, and SUVmax are the most stable and predictive features in our study. Overall, this study demonstrated the practical value of feature selection in medical applications not only to improve quantitative performance but also to deliver potentially new insights to experts.
Affiliation(s)
- Anna Jenul
  - Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway
- Henning Langen Stokmo
  - Department of Nuclear Medicine, Division of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Stefan Schrunner
  - Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway
- Mona-Elisabeth Revheim
  - Department of Nuclear Medicine, Division of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway; The Intervention Centre, Division of Technology and Innovation, Oslo University Hospital, Oslo, Norway
- Oliver Tomic
  - Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway
57
Mehari T, Sundar A, Bosnjakovic A, Harris P, Williams SE, Loewe A, Doessel O, Nagel C, Strodthoff N, Aston PJ. ECG Feature Importance Rankings: Cardiologists vs. Algorithms. IEEE J Biomed Health Inform 2024; PP:2014-2024. [PMID: 38227406 DOI: 10.1109/jbhi.2024.3354301] [Indexed: 01/17/2024]
Abstract
Feature importance methods promise to provide a ranking of features according to importance for a given classification task. A wide range of methods exist but their rankings often disagree and they are inherently difficult to evaluate due to a lack of ground truth beyond synthetic datasets. In this work, we put feature importance methods to the test on real-world data in the domain of cardiology, where we try to distinguish three specific pathologies from healthy subjects based on ECG features, comparing the resulting rankings against the features used in cardiologists' decision rules as ground truth. We found that the SHAP and LIME methods and the Chi-squared test all worked well, together with the native Random forest and Logistic regression feature rankings. Some methods gave inconsistent results, including the Maximum Relevance Minimum Redundancy and Neighbourhood Component Analysis methods. The permutation-based methods generally performed quite poorly. A surprising result was found in the case of left bundle branch block, where T-wave morphology features were consistently identified as being important for diagnosis, but are not used by clinicians.
58
Aragones DG, Palomino-Segura M, Sicilia J, Crainiciuc G, Ballesteros I, Sánchez-Cabo F, Hidalgo A, Calvo GF. Variable selection for nonlinear dimensionality reduction of biological datasets through bootstrapping of correlation networks. Comput Biol Med 2024; 168:107827. [PMID: 38086138 DOI: 10.1016/j.compbiomed.2023.107827] [Received: 08/30/2023] [Revised: 11/15/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024]
Abstract
Identifying the most relevant variables or features in massive datasets for dimensionality reduction can lead to improved and more informative display, faster computation times, and more explainable models of complex systems. Despite significant advances and available algorithms, this task generally remains challenging, especially in unsupervised settings. In this work, we propose a method that constructs correlation networks using all intervening variables and then selects the most informative ones based on network bootstrapping. The method can be applied in both supervised and unsupervised scenarios. We demonstrate its functionality by applying Uniform Manifold Approximation and Projection for dimensionality reduction to several high-dimensional biological datasets, derived from 4D live imaging recordings of hundreds of morpho-kinetic variables, describing the dynamics of thousands of individual leukocytes at sites of prominent inflammation. We compare our method with other standard ones in the field, such as Principal Component Analysis and Elastic Net, showing that it outperforms them. The proposed method can be employed in a wide range of applications, encompassing data analysis and machine learning.
Affiliation(s)
- David G Aragones
  - Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain
- Miguel Palomino-Segura
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain; Immunophysiology Research Group, Instituto Universitario de Investigación Biosanitaria de Extremadura (INUBE), Badajoz, Spain; Department of Physiology, Faculty of Sciences, University of Extremadura, Badajoz, Spain
- Jon Sicilia
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Georgiana Crainiciuc
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Iván Ballesteros
  - Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Fátima Sánchez-Cabo
  - Bioinformatics Unit, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
- Andrés Hidalgo
  - Vascular Biology and Therapeutics Program and Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
- Gabriel F Calvo
  - Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain
59
Ding X, Li Y, Chen S. Maximum margin and global criterion based-recursive feature selection. Neural Netw 2024; 169:597-606. [PMID: 37956576 DOI: 10.1016/j.neunet.2023.10.037] [Received: 04/19/2023] [Revised: 06/19/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]
Abstract
In this research paper, we aim to investigate and address the limitations of recursive feature elimination (RFE) and its variants in high-dimensional feature selection tasks. We identify two main challenges associated with these methods. Firstly, the feature ranking criterion utilized in these approaches is inconsistent with the maximum-margin theory. Secondly, the computation of the criterion is performed locally, lacking the ability to measure the importance of features globally. To overcome these challenges, we propose a novel feature ranking criterion called Maximum Margin and Global (MMG) criterion. This criterion utilizes the classification margin to determine the importance of features and computes it globally, enabling a more accurate assessment of feature importance. Moreover, we introduce an optimal feature subset evaluation algorithm that leverages the MMG criterion to determine the best subset of features. To enhance the efficiency of the proposed algorithms, we provide two alpha seeding strategies that significantly reduce computational costs while maintaining high accuracy. These strategies offer a practical means to expedite the feature selection process. Through extensive experiments conducted on ten benchmark datasets, we demonstrate that our proposed algorithms outperform current state-of-the-art methods. Additionally, the alpha seeding strategies yield significant speedups, further enhancing the efficiency of the feature selection process.
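For readers unfamiliar with the baseline being criticized, the loop below sketches plain recursive feature elimination: fit a model, rank the remaining features by a per-feature criterion, drop the weakest, and repeat. It is a minimal Python illustration under stated assumptions and is not the MMG criterion or the alpha seeding strategies from the paper: the model is a least-squares linear fit trained by gradient descent, the criterion is the squared weight, and the toy data and all function names are hypothetical.

```python
def fit_linear_weights(X, y, features, epochs=200, lr=0.1):
    """Fit y ~ b + sum_j w[j] * x[j] on the given feature columns by gradient descent."""
    w = {j: 0.0 for j in features}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {j: 0.0 for j in features}
        grad_b = 0.0
        for row, target in zip(X, y):
            err = b + sum(w[j] * row[j] for j in features) - target
            for j in features:
                grad_w[j] += err * row[j]
            grad_b += err
        for j in features:
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit and drop the feature with the smallest squared weight until n_keep remain."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        w = fit_linear_weights(X, y, remaining)
        worst = min(remaining, key=lambda j: w[j] ** 2)  # local, weight-based criterion
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated


# Toy data with orthogonal columns: y = 2*x0 + 0.5*x1 + 0*x2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.5, 1.5, -1.5, -2.5]

remaining, eliminated = recursive_feature_elimination(X, y, n_keep=1)
```

The criterion here is recomputed from the weights of each refit and looks at one feature at a time, which is the kind of local ranking the abstract contrasts with a margin-based, globally computed criterion.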
Affiliation(s)
- Xiaojian Ding
  - College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China
- Yi Li
  - College of Economics and Management, Nanjing Agricultural University, Nanjing 210095, China
- Shilin Chen
  - Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, Nanjing 221005, China
60
Williams TL, Gonen M, Wray R, Do RKG, Simpson AL. Quantitation of Oncologic Image Features for Radiomic Analyses in PET. Methods Mol Biol 2024; 2729:409-421. [PMID: 38006509 DOI: 10.1007/978-1-0716-3499-8_23] [Indexed: 11/27/2023]
Abstract
Radiomics is an emerging and exciting field of study involving the extraction of many quantitative features from radiographic images. Positron emission tomography (PET) images are used in cancer diagnosis and staging. Utilizing radiomics on PET images can better quantify the spatial relationships between image voxels and generate more consistent and accurate results for diagnosis, prognosis, treatment, etc. This chapter gives the general steps a researcher would take to extract PET radiomic features from medical images and properly develop models to implement.
Affiliation(s)
- Travis L Williams
  - Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Mithat Gonen
  - Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Rick Wray
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Richard K G Do
  - Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Amber L Simpson
  - School of Computing and Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada
61
Arian R, Vard A, Kafieh R, Plonka G, Rabbani H. A new convolutional neural network based on combination of circlets and wavelets for macular OCT classification. Sci Rep 2023; 13:22582. [PMID: 38114582 PMCID: PMC10730902 DOI: 10.1038/s41598-023-50164-7] [Received: 06/17/2023] [Accepted: 12/15/2023] [Indexed: 12/21/2023] [Open access]
Abstract
Artificial intelligence (AI) algorithms, encompassing machine learning and deep learning, can assist ophthalmologists in early detection of various ocular abnormalities through the analysis of retinal optical coherence tomography (OCT) images. Despite considerable progress in these algorithms, several limitations persist in medical imaging fields, where a lack of data is a common issue. Accordingly, specific image processing techniques, such as time-frequency transforms, can be employed in conjunction with AI algorithms to enhance diagnostic accuracy. This research investigates the influence of non-data-adaptive time-frequency transforms, specifically X-lets, on the classification of OCT B-scans. For this purpose, each B-scan was transformed using every considered X-let individually, and all the sub-bands were utilized as the input for a designed 2D Convolutional Neural Network (CNN) to extract optimal features, which were subsequently fed to the classifiers. Evaluating per-class accuracy shows that the use of the 2D Discrete Wavelet Transform (2D-DWT) yields superior outcomes for normal cases, whereas the circlet transform outperforms other X-lets for abnormal cases characterized by circles in their retinal structure (due to the accumulation of fluid). As a result, we propose a novel transform named CircWave by concatenating all sub-bands from the 2D-DWT and the circlet transform. The objective is to enhance the per-class accuracy of both normal and abnormal cases simultaneously. Our findings show that classification results based on the CircWave transform outperform those derived from original images or any individual transform. Furthermore, Grad-CAM class activation visualization for B-scans reconstructed from CircWave sub-bands highlights a greater emphasis on circular formations in abnormal cases and straight lines in normal cases, in contrast to the focus on irrelevant regions in original B-scans. 
To assess the generalizability of our method, we applied it to another dataset obtained from a different imaging system. We achieved promising accuracies of 94.5% and 90% for the first and second datasets, respectively, which are comparable with results from previous studies. The proposed CNN based on CircWave sub-bands (i.e. CircWaveNet) not only produces superior outcomes but also offers more interpretable results with a heightened focus on features crucial for ophthalmologists.
Affiliation(s)
- Roya Arian
  - Department of Bioelectrics and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran
  - Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran
- Alireza Vard
  - Department of Bioelectrics and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran
- Rahele Kafieh
  - Department of Engineering, Durham University, South Road, Durham, UK
- Gerlind Plonka
  - Institute for Numerical and Applied Mathematics, University of Göttingen, Lotzestr. 16-18, 37083, Göttingen, Germany
- Hossein Rabbani
  - Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran.
|
62
|
Augustine J, Jereesh AS. Identification of gene-level methylation for disease prediction. Interdiscip Sci 2023; 15:678-695. [PMID: 37603212 DOI: 10.1007/s12539-023-00584-w] [Received: 02/17/2023] [Revised: 07/30/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023]
Abstract
DNA methylation is an epigenetic alteration that plays a fundamental part in governing gene regulatory processes. The DNA methylation mechanism affixes methyl groups to distinct cytosine residues, influencing chromatin architectures. Multiple studies have demonstrated that DNA methylation's regulatory effect on genes is linked to the beginning and progression of several disorders. Researchers have recently uncovered thousands of phenotype-related methylation sites through the epigenome-wide association study (EWAS). However, combining the methylation levels of several sites within a gene and determining the gene-level DNA methylation remains challenging. In this study, we proposed the supervised UMAP Assisted Gene-level Methylation method (sUAGM) for disease prediction based on supervised UMAP (Uniform Manifold Approximation and Projection), a manifold learning-based method for reducing dimensionality. The methylation values at the gene level generated using the proposed method are evaluated by employing various feature selection and classification algorithms on three distinct DNA methylation datasets derived from blood samples. The performance has been assessed employing classification accuracy, F1-score, Matthews Correlation Coefficient (MCC), Kappa, Classification Success Index (CSI) and Jaccard Index. The Support Vector Machine with the linear kernel (SVML) classifier with Recursive Feature Elimination (RFE) performs best across all three datasets. From comparative analysis, our method outperformed existing gene-level and site-level approaches by achieving 100% accuracy and F1-score with fewer genes. The functional analysis of the top 28 genes selected from the Parkinson's disease dataset revealed a significant association with the disease.
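As an aside on one of the metrics named in this abstract, the Matthews correlation coefficient for a binary classifier can be computed directly from confusion-matrix counts. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not taken from the paper, and returning 0.0 for an undefined denominator is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration, not results from the study.
print(matthews_corrcoef(tp=45, tn=40, fp=5, fn=10))
```

A value of 1 indicates perfect agreement, 0 is no better than chance, and -1 is total disagreement, which is why MCC is often reported alongside accuracy on imbalanced datasets.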
Affiliation(s)
- Jisha Augustine
  - Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Cochin, Kerala, 682022, India.
- A S Jereesh
  - Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Cochin, Kerala, 682022, India
|
63
|
Chatterjee A, Pahari N, Prinz A, Riegler M. AI and semantic ontology for personalized activity eCoaching in healthy lifestyle recommendations: a meta-heuristic approach. BMC Med Inform Decis Mak 2023; 23:278. [PMID: 38041041 PMCID: PMC10693173 DOI: 10.1186/s12911-023-02364-4] [Received: 10/10/2022] [Accepted: 11/03/2023] [Indexed: 12/03/2023]
Abstract
BACKGROUND Automated coaches (eCoach) can help people lead a healthy lifestyle (e.g., reduction of sedentary bouts) with continuous health status monitoring and personalized recommendation generation with artificial intelligence (AI). Semantic ontology can play a crucial role in knowledge representation, data integration, and information retrieval. METHODS This study proposes a semantic ontology model to annotate the AI predictions, forecasting outcomes, and personal preferences to conceptualize a personalized recommendation generation model with a hybrid approach. This study considers a mixed activity projection method that takes individual activity insights from the univariate time-series prediction and ensemble multi-class classification approaches. We have introduced a way to improve the prediction result with a residual error minimization (REM) technique and make it meaningful in recommendation presentation with a Naïve-based interval prediction approach. We have integrated the activity prediction results in an ontology for semantic interpretation. SPARQL Protocol and RDF Query Language (SPARQL) queries have generated personalized recommendations in an understandable format. Moreover, we have evaluated the performance of the time-series prediction and classification models against standard metrics on both imbalanced and balanced public PMData and private MOX2-5 activity datasets. We have used Adaptive Synthetic (ADASYN) to generate synthetic data from the minority classes to avoid bias. The activity datasets were collected from healthy adults (n = 16 for public datasets; n = 15 for private datasets). The standard ensemble algorithms have been used to investigate the possibility of classifying daily physical activity levels into the following activity classes: sedentary (0), low active (1), active (2), highly active (3), and rigorous active (4).
The daily step count, low physical activity (LPA), medium physical activity (MPA), and vigorous physical activity (VPA) serve as input for the classification models. Subsequently, we re-verify the classifiers on the private MOX2-5 dataset. The performance of the ontology has been assessed with reasoning and SPARQL query execution time. Additionally, we have verified our ontology for effective recommendation generation. RESULTS We have tested several standard AI algorithms and selected the best-performing model with optimized configuration for our use case by empirical testing. We have found that the autoregression model with the REM method outperforms the autoregression model without the REM method for both datasets. The Gradient Boost (GB) classifier outperforms other classifiers with mean accuracy scores of 98.00% and 99.00% for the imbalanced PMData and MOX2-5 datasets, respectively, and 98.30% and 99.80% for the balanced PMData and MOX2-5 datasets, respectively. The HermiT reasoner performs better than other ontology reasoners under defined settings. Our proposed algorithm shows a direction to combine the AI prediction and forecasting results in an ontology to generate personalized activity recommendations in eCoaching. CONCLUSION The proposed method combining step-prediction, activity-level classification techniques, and personal preference information with semantic rules is an asset for generating personalized recommendations.
Affiliation(s)
- Ayan Chatterjee
  - Department of Information and Communication Technology, Centre for E-Health, University of Agder, Grimstad, Norway.
  - Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering (SimulaMet), Oslo, Norway.
- Nibedita Pahari
  - Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
- Andreas Prinz
  - Department of Information and Communication Technology, Centre for E-Health, University of Agder, Grimstad, Norway
- Michael Riegler
  - Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering (SimulaMet), Oslo, Norway
|
64
|
Huang MW, Tsai CF, Tsui SC, Lin WC. Combining data discretization and missing value imputation for incomplete medical datasets. PLoS One 2023; 18:e0295032. [PMID: 38033140 PMCID: PMC10688879 DOI: 10.1371/journal.pone.0295032] [Received: 05/20/2023] [Accepted: 11/14/2023] [Indexed: 12/02/2023]
Abstract
Data discretization aims to transform a set of continuous features into discrete features, thus simplifying the representation of information and making it easier to understand, use, and explain. In practice, users can take advantage of the discretization process to improve knowledge discovery and data analysis on medical domain problem datasets containing continuous features. However, certain feature values were frequently missing. Many data-mining algorithms cannot handle incomplete datasets. In this study, we considered the use of both discretization and missing-value imputation to process incomplete medical datasets, examining how the order of discretization and missing-value imputation combined influenced performance. The experimental results were obtained using seven different medical domain problem datasets: two discretizers, including the minimum description length principle (MDLP) and ChiMerge; three imputation methods, including the mean/mode, classification and regression tree (CART), and k-nearest neighbor (KNN) methods; and two classifiers, including support vector machines (SVM) and the C4.5 decision tree. The results show that a better performance can be obtained by first performing discretization followed by imputation, rather than vice versa. Furthermore, the highest classification accuracy rate was achieved by combining ChiMerge and KNN with SVM.
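To make the ordering question in this abstract concrete, the sketch below contrasts the two pipelines on a single feature column. It is a simplified illustration under stated assumptions: equal-width binning stands in for the MDLP and ChiMerge discretizers, mean/mode filling stands in for the imputation step, and all function names and data are hypothetical rather than taken from the paper.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional


def equal_width_bin(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a continuous value to a bin index in [0, n_bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)


def discretize_then_impute(column: List[Optional[float]], n_bins: int = 3) -> List[int]:
    """Bin the observed values first, then fill gaps with the most frequent bin."""
    observed = [v for v in column if v is not None]
    lo, hi = min(observed), max(observed)
    bins = [None if v is None else equal_width_bin(v, lo, hi, n_bins) for v in column]
    mode_bin = Counter(b for b in bins if b is not None).most_common(1)[0][0]
    return [mode_bin if b is None else b for b in bins]


def impute_then_discretize(column: List[Optional[float]], n_bins: int = 3) -> List[int]:
    """Fill gaps with the mean of the observed values first, then bin everything."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in column]
    lo, hi = min(filled), max(filled)
    return [equal_width_bin(v, lo, hi, n_bins) for v in filled]


# Hypothetical column with one missing value (None).
column = [0.0, 0.0, 0.0, None, 9.0, 9.0]
print(discretize_then_impute(column))  # [0, 0, 0, 0, 2, 2]
print(impute_then_discretize(column))  # [0, 0, 0, 1, 2, 2]
```

In this toy column the two orders disagree on the imputed cell: filling after binning assigns the dominant bin, while filling with the mean before binning places the value in a middle bin that no observed sample occupies.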
Affiliation(s)
- Min-Wei Huang
  - Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung, Taiwan
  - Department of Physical Therapy and Graduate Institute of Rehabilitation Science, China Medical University, Taichung, Taiwan
- Chih-Fong Tsai
  - Department of Information Management, National Central University, Taoyuan, Taiwan
- Shu-Ching Tsui
  - Department of Information Management, National Central University, Taoyuan, Taiwan
- Wei-Chao Lin
  - Department of Digital Financial Technology, Chang Gung University, Taoyuan, Taiwan
  - Department of Information Management, Chang Gung University, Taoyuan, Taiwan
  - Division of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
|
65
|
van Dartel D, Wang Y, Hegeman JH, Vollenbroek-Hutten MMR. Prediction of Physical Activity Patterns in Older Patients Rehabilitating After Hip Fracture Surgery: Exploratory Study. JMIR Rehabil Assist Technol 2023; 10:e45307. [PMID: 38032703 PMCID: PMC10727481 DOI: 10.2196/45307] [Received: 12/23/2022] [Revised: 06/25/2023] [Accepted: 07/27/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Building up physical activity is a highly important aspect in an older patient's rehabilitation process after hip fracture surgery. The patterns of physical activity during rehabilitation are associated with the duration of rehabilitation stay. Predicting physical activity patterns early in the rehabilitation phase can provide patients and health care professionals an early indication of the duration of rehabilitation stay as well as insight into the degree of patients' recovery for timely adaptive interventions. OBJECTIVE This study aims to explore the early prediction of physical activity patterns in older patients rehabilitating after hip fracture surgery at a skilled nursing home. METHODS The physical activity of patients aged ≥70 years with surgically treated hip fracture was continuously monitored using an accelerometer during rehabilitation at a skilled nursing home. Physical activity patterns were described in our previous study, and the 2 most common patterns were used in this study for pattern prediction: the upward linear pattern (n=15) and the S-shape pattern (n=23). Features from the intensity of physical activity were calculated for time windows with different window sizes of the first 5, 6, 7, and 8 days to assess the early rehabilitation moment in which the patterns could be predicted most accurately. Those features were statistical features, amplitude features, and morphological features. Furthermore, the Barthel Index, Fracture Mobility Score, Functional Ambulation Categories, and the Montreal Cognitive Assessment score were used as clinical features. With the correlation-based feature selection method, relevant features were selected that were highly correlated with the physical activity patterns and uncorrelated with other features. Multiple classifiers were used: decision trees, discriminant analysis, logistic regression, support vector machines, nearest neighbors, and ensemble classifiers. 
The performance of the prediction models was assessed by calculating precision, recall, and F1-score (accuracy measure) for each individual physical activity pattern. Furthermore, the overall performance of the prediction model was calculated by calculating the F1-score for all physical activity patterns together. RESULTS The amplitude feature describing the overall intensity of physical activity on the first day of rehabilitation and the morphological features describing the shape of the patterns were selected as relevant features for all time windows. Relevant features extracted from the first 7 days with a cosine k-nearest neighbor model reached the highest overall prediction performance (micro F1-score=1) and a 100% correct classification of the 2 most common physical activity patterns. CONCLUSIONS Continuous monitoring of the physical activity of older patients in the first week of hip fracture rehabilitation results in an early physical activity pattern prediction. In the future, continuous physical activity monitoring can offer the possibility to predict the duration of rehabilitation stay, assess the recovery progress during hip fracture rehabilitation, and benefit health care organizations, health care professionals, and patients themselves.
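The best-performing model in this abstract is a cosine k-nearest neighbor classifier. A minimal sketch of that idea follows, assuming majority voting among the k most cosine-similar training vectors; the two-dimensional feature vectors and the label strings are hypothetical placeholders and do not reproduce the study's features or data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_knn_predict(
    train: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Majority vote among the k training samples most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine_similarity(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors and pattern labels, for illustration only.
train = [
    ([1.0, 0.1], "upward_linear"),
    ([1.0, 0.2], "upward_linear"),
    ([0.9, 0.0], "upward_linear"),
    ([0.1, 1.0], "s_shape"),
    ([0.2, 1.0], "s_shape"),
    ([0.0, 0.8], "s_shape"),
]
print(cosine_knn_predict(train, [5.0, 1.0]))  # upward_linear
print(cosine_knn_predict(train, [0.5, 4.0]))  # s_shape
```

Because cosine similarity ignores vector length, the queries above are classified by the direction of their feature vector rather than by overall magnitude.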
Affiliation(s)
- Dieuwke van Dartel
  - Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
  - Department of Trauma Surgery, Ziekenhuisgroep Twente, Almelo, Netherlands
- Ying Wang
  - Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
  - Ziekenhuisgroep Twente Academy, Ziekenhuisgroep Twente, Almelo, Netherlands
- Johannes H Hegeman
  - Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
  - Department of Trauma Surgery, Ziekenhuisgroep Twente, Almelo, Netherlands
- Miriam M R Vollenbroek-Hutten
  - Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
  - Board of Directors, Medisch Spectrum Twente, Enschede, Netherlands
|
66
|
Alghushairy O, Ali F, Alghamdi W, Khalid M, Alsini R, Asiry O. Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting. J Biomol Struct Dyn 2023; 42:12330-12341. [PMID: 37850427 DOI: 10.1080/07391102.2023.2269280] [Received: 06/09/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The identification of druggable proteins (DPs) is significant for the development of new drugs, personalized medicine, understanding of disease mechanisms, drug repurposing, and economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are discovered by composition, transition, and distribution (CTD), composition of K-spaced amino acid pair (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor, called multi-block pseudo amino acid composition (MB-PseAAC). The feature vectors of CTD, CKSAAP, PsePSSM, and MB-PseAAC are integrated, and sequential forward selection is applied as the feature selection algorithm. The best features are provided to random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB). The predictive performance of these learning methods is measured via 10-fold cross-validation. The LXGB-based model achieves higher results than other existing predictors. Our novel protocol will play an active role in designing novel drugs and would be fruitful for exploring potential targets. This study will help to better capture a more universal view of a potential target. Communicated by Ramaswamy H. Sarma.
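Sequential forward selection, the feature selection step named in this abstract, can be sketched as a greedy loop that repeatedly adds the single feature giving the largest score improvement. The version below is a generic illustration, not the paper's implementation: the scoring function, the feature names `f0` to `f4`, and the gain values are all hypothetical, and in practice the score would typically be a cross-validated classifier metric.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedily add the feature that most improves the score; stop when none does."""
    selected: List[str] = []
    best_score = score(frozenset())
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top_feature)
        best_score = top_score
    return selected


# Hypothetical per-feature gains; each added feature costs a fixed penalty of 0.1.
GAINS = {"f0": 0.5, "f1": 0.05, "f2": 0.3, "f3": 0.08, "f4": 0.2}


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for a cross-validated score: total gain minus a size penalty."""
    return sum(GAINS[f] for f in subset) - 0.1 * len(subset)


print(sequential_forward_selection(list(GAINS), toy_score, max_features=5))
# ['f0', 'f2', 'f4']
```

The loop stops early when no remaining feature raises the score, so the returned subset can be smaller than `max_features`.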
Affiliation(s)
- Omar Alghushairy
  - Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Farman Ali
  - Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Majdi Khalid
  - Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Raed Alsini
  - Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Othman Asiry
  - Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
|
67
|
de Lima Gonçalves V, Ribeiro CT, Cavalheiro GL, Zaruz MJF, da Silva DH, Milagre ST, de Oliveira Andrade A, Pereira AA. A hybrid linear discriminant analysis and genetic algorithm to create a linear model of aging when performing motor tasks through inertial sensors positioned on the hand and forearm. Biomed Eng Online 2023; 22:98. [PMID: 37845723 PMCID: PMC10580547 DOI: 10.1186/s12938-023-01161-4] [Received: 05/22/2023] [Accepted: 10/01/2023] [Indexed: 10/18/2023]
Abstract
BACKGROUND During the aging process, cognitive functions and performance of the muscular and neural system show signs of decline, thus making the elderly more susceptible to disease and death. These alterations, which occur with advanced age, affect functional performance in both the lower and upper limbs, and consequently human motor functions. Objective measurements are important tools to help understand and characterize the dysfunctions and limitations that occur due to neuromuscular changes related to advancing age. Therefore, the objective of this study is to attest to the difference between groups of young and old individuals through manual movements and whether the combination of features can produce a linear correlation concerning the different age groups. METHODS This study included 99 participants, who were divided into 8 groups by age. The data collection was performed using inertial sensors (positioned on the back of the hand and on the back of the forearm). Firstly, the participants were divided into groups of young and elderly to verify if the groups could be distinguished through the features alone. Following this, the features were combined using linear discriminant analysis (LDA), which gave rise to a single feature called the LDA-value that aided in verifying the correlation between the different age ranges and the LDA-value. RESULTS The results demonstrated that 125 features are able to distinguish between the groups of young and elderly individuals. The use of the LDA-value allows a linear model to be obtained of the changes that occur with aging in the performance of tasks in line with advancing age; the correlation obtained, using Pearson's coefficient, was 0.86. CONCLUSION When we compare only the young and elderly groups, the results indicate that there is a difference in the way tasks are performed between young and elderly individuals. 
When the 8 groups were analyzed, the linear correlation obtained was strong, with the LDA-value being effective in obtaining a linear correlation of the eight groups, demonstrating that although the features alone do not demonstrate gradual changes as a function of age, their combination established these changes.
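The linear relationship reported in this abstract is quantified with Pearson's coefficient. The sketch below shows that computation for a combined score against an ordinal age-group index; the `lda_value` numbers are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical group indices (1 = youngest, 8 = oldest) and made-up combined scores.
age_group = [1, 2, 3, 4, 5, 6, 7, 8]
lda_value = [0.2, 0.1, 0.5, 0.4, 0.9, 0.8, 1.3, 1.1]
print(round(pearson_r(age_group, lda_value), 2))
```

A coefficient close to 1 would indicate that the combined score rises roughly linearly with the age-group index, which is the kind of relationship the abstract describes.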
Affiliation(s)
- Veronica de Lima Gonçalves
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Caio Tonus Ribeiro
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Guilherme Lopes Cavalheiro
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Maria José Ferreira Zaruz
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Daniel Hilário da Silva
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Selma Terezinha Milagre
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Adriano de Oliveira Andrade
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Adriano Alves Pereira
  - Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil.
|
68
|
Luo KH, Wu CH, Yang CC, Chen TH, Tu HP, Yang CH, Chuang HY. Exploring the association of metal mixture in blood to the kidney function and tumor necrosis factor alpha using machine learning methods. Ecotoxicol Environ Saf 2023; 265:115528. [PMID: 37783110 DOI: 10.1016/j.ecoenv.2023.115528] [Received: 05/27/2023] [Revised: 09/09/2023] [Accepted: 09/25/2023] [Indexed: 10/04/2023]
Abstract
This research aimed to examine the relationships between metal mixtures in blood and both kidney function and tumor necrosis factor alpha (TNF-α) using machine learning. Metal levels were measured by inductively coupled plasma mass spectrometry in blood from 421 participants. We applied K Nearest Neighbor (KNN), Naive Bayes classifier (NB), Support Vector Machines (SVM), random forest (RF), Gradient Boosting Decision Tree (GBDT), Categorical boosting (CatBoost), eXtreme Gradient Boosting (XGBoost), and Whale Optimization-based XGBoost (WXGBoost) to identify the effects of plasma metals on TNF-α and estimated glomerular filtration rate (eGFR, by the CKD-EPI equation). We included not only the toxic metals lead (Pb), arsenic (As), and cadmium (Cd) but also the trace essential metals selenium (Se), copper (Cu), zinc (Zn), and cobalt (Co) to predict the interaction of TNF-α, TNF-α/white blood count, and eGFR. The high average TNF-α level group was observed among subjects with higher Pb, As, Cd, Cu, and Zn levels in blood. No associations with blood Se and Co levels were shown between the low and high TNF-α level groups. The lower eGFR group had high Pb, As, Cd, Co, Cu, and Zn levels. Among the metals, the most important predictor of TNF-α level was blood Pb, followed by Cd, As, Cu, Se, Zn, and Co. Machine learning revealed that As played the major role among predictors of eGFR after feature selection. The levels of kidney function and TNF-α were modified by co-exposure to metals. We achieved a highest accuracy of over 85% in the multi-metal exposure model. Higher Pb and Zn levels had the strongest interaction with declined eGFR. In addition, As and Cd had a synergistic effect in the prediction model of TNF-α. We explored the potential of machine learning approaches for predicting health outcomes with multi-metal exposure. The XGBoost model combined with SHAP could give an explicit explanation of individualized, precise risk prediction and insight into the interaction of key features in multi-metal exposure.
Affiliation(s)
- Kuei-Hau Luo
  - Graduate Institute of Medicine, College of Medicine, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan
- Chih-Hsien Wu
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
- Chen-Cheng Yang
  - Graduate Institute of Medicine, College of Medicine, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan; Department of Occupational Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung 812, Taiwan
- Tzu-Hua Chen
  - Department of Family Medicine, Kaohsiung Municipal Ta-Tung Hospital, Kaohsiung 801, Taiwan
- Hung-Pin Tu
  - Department of Public Health and Environmental Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Cheng-Hong Yang
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan; Department of Information Management, Tainan University of Technology, Tainan 71002, Taiwan; Drug Development and Value Creation Research Center, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; Ph. D. Program in Biomedical Engineering, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; School of Dentistry, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Hung-Yi Chuang
  - Graduate Institute of Medicine, College of Medicine, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan; Department of Public Health and Environmental Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan; Department of Occupational and Environmental Medicine, Kaohsiung Medicine University Hospital, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan; Ph.D. Program in Environmental and Occupational Medicine, and Research Center for Precision Environmental Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan.
|
69
|
Pacheco J, Saiz O, Casado S, Ubillos S. A multistart tabu search-based method for feature selection in medical applications. Sci Rep 2023; 13:17140. [PMID: 37816874 PMCID: PMC10564765 DOI: 10.1038/s41598-023-44437-4] [Received: 03/14/2023] [Accepted: 10/08/2023] [Indexed: 10/12/2023]
Abstract
In the design of classification models, irrelevant or noisy features are often generated. In some cases, there may even be negative interactions among features. These weaknesses can degrade the performance of the models. Feature selection is a task that searches for a small subset of relevant features from the original set that generates the most efficient models possible. In addition to improving the efficiency of the models, feature selection confers other advantages, such as greater ease in the generation of the necessary data as well as clearer and more interpretable models. In the case of medical applications, feature selection may help to distinguish which characteristics, habits, and factors have the greatest impact on the onset of diseases. However, feature selection is a complex task due to the large number of possible solutions. In the last few years, methods based on different metaheuristic strategies, mainly evolutionary algorithms, have been proposed. The motivation of this work is to develop a method that outperforms previous methods, with the benefits that this implies especially in the medical field. More precisely, the present study proposes a simple method based on tabu search and multistart techniques. The proposed method was analyzed and compared to other methods by testing their performance on several medical databases. Specifically, eight databases belonging to the well-known repository of the University of California, Irvine, and one of our own design were used. In these computational tests, the proposed method outperformed other recent methods as gauged by various metrics and classifiers. The analyses were accompanied by statistical tests, the results of which showed that the superiority of our method is significant and therefore strengthened these conclusions. 
In short, the contribution of this work is the development of a method that, on the one hand, is based on different strategies than those used in recent methods, and on the other hand, improves the performance of these methods.
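The abstract above names tabu search and multistart techniques but gives no algorithmic detail, so the following is only a minimal textbook-style sketch of how the two ideas can combine for feature selection. It is not the authors' implementation. The single bit-flip neighbourhood, the tabu tenure, the aspiration rule, the random restarts, and the toy objective `toy_score` are all assumptions made for illustration; in practice the score would typically be a cross-validated classifier metric.

```python
import random


def tabu_search(score, n_features, start, tenure=3, max_iters=50):
    """One tabu search run over feature subsets using single bit-flip moves."""
    current = set(start)
    best, best_score = set(current), score(current)
    tabu = {}  # feature index -> last iteration at which flipping it is forbidden
    for it in range(max_iters):
        candidates = []
        for f in range(n_features):
            neighbour = current ^ {f}  # add f if absent, remove it if present
            s = score(neighbour)
            # Aspiration rule: a tabu move is still allowed if it beats the best score.
            if tabu.get(f, -1) < it or s > best_score:
                candidates.append((s, f, neighbour))
        if not candidates:
            break
        # Take the best admissible move, even if it worsens the current solution.
        s, f, current = max(candidates, key=lambda c: (c[0], -c[1]))
        tabu[f] = it + tenure
        if s > best_score:
            best, best_score = set(current), s
    return best, best_score


def multistart_tabu(score, n_features, n_starts=5, seed=0):
    """Run tabu search from several random subsets and keep the best result."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = {f for f in range(n_features) if rng.random() < 0.5}
        results.append(tabu_search(score, n_features, start))
    return max(results, key=lambda r: r[1])


# Hypothetical toy objective: features 0, 3 and 5 are informative,
# every other selected feature costs a small penalty.
RELEVANT = {0, 3, 5}


def toy_score(subset):
    return len(subset & RELEVANT) - 0.1 * len(subset - RELEVANT)


best_subset, best_value = multistart_tabu(toy_score, n_features=8)
print(sorted(best_subset), best_value)
```

The tabu list keeps the search from immediately undoing a move, which lets it leave local optima, while the restarts diversify the starting points.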
|
70
|
Lee H, Lee Y, Jo M, Nam S, Jo J, Lee C. Enhancing Diagnosis of Rotating Elements in Roll-to-Roll Manufacturing Systems through Feature Selection Approach Considering Overlapping Data Density and Distance Analysis. Sensors (Basel, Switzerland) 2023; 23:7857. [PMID: 37765913 PMCID: PMC10534779 DOI: 10.3390/s23187857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/01/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Roll-to-roll manufacturing systems have been widely adopted for their cost-effectiveness, eco-friendliness, and mass-production capabilities, utilizing thin and flexible substrates. However, in these systems, defects in the rotating components such as the rollers and bearings can result in severe defects in the functional layers. Therefore, the development of an intelligent diagnostic model is crucial for effectively identifying these rotating component defects. In this study, a quantitative feature-selection method, feature partial density, was proposed to develop high-efficiency diagnostic models. The feature combinations extracted from the measured signals were evaluated based on the partial density, which is the density of the remaining data excluding the highest class in overlapping regions, and the Mahalanobis distance by class to assess the classification performance of the models. The validity of the proposed algorithm was verified through the construction of ranked model groups and comparison with existing feature-selection methods. The high-ranking group selected by the algorithm outperformed the other groups in terms of training time, accuracy, and positive predictive value. Moreover, the top feature combination demonstrated superior performance across all indicators compared to existing methods.
Affiliation(s)
- Haemi Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Yoonjae Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Minho Jo
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Sanghoon Nam
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jeongdai Jo
- Department of Printed Electronics, Korea Institute of Machinery and Materials, 156, Gajeongbuk-ro, Yuseong-gu, Daejeon 34103, Republic of Korea
- Changwoo Lee
- Department of Mechanical and Aerospace Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
|
71
|
Guo B, Liu H, Niu L. Integration of natural and deep artificial cognitive models in medical images: BERT-based NER and relation extraction for electronic medical records. Front Neurosci 2023; 17:1266771. [PMID: 37732304 PMCID: PMC10507183 DOI: 10.3389/fnins.2023.1266771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Introduction Medical images and signals are important data sources in the medical field, and they contain key information such as patients' physiology, pathology, and genetics. However, the complexity and diversity of medical images and signals result in difficulties in medical knowledge acquisition and decision support. Methods In order to solve this problem, this paper proposes an end-to-end framework based on BERT for NER and RE tasks in electronic medical records. Our framework first integrates NER and RE tasks into a unified model, adopting an end-to-end processing manner, which removes the limitation and error propagation of multiple independent steps in traditional methods. Second, by pre-training and fine-tuning the BERT model on large-scale electronic medical record data, we enable the model to obtain rich semantic representation capabilities that adapt to the needs of medical fields and tasks. Finally, through multi-task learning, we enable the model to make full use of the correlation and complementarity between NER and RE tasks, and improve the generalization ability and effect of the model on different data sets. Results and discussion We conduct experimental evaluation on four electronic medical record datasets, and the model significantly outperforms other methods on different datasets in the NER task. In the RE task, the EMLB model also achieved advantages on different data sets; in the multi-task learning mode in particular, its performance improved significantly, and the ETE and MTL modules performed well in terms of comprehensive precision and recall. Our research provides an innovative solution for medical image and signal data.
Affiliation(s)
- Bo Guo
- School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Department of Computing, Faculty of Communication, Visual Art and Computing, Universiti Selangor, Bestari Jaya, Selangor, Malaysia
- Huaming Liu
- School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Lei Niu
- School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
|
72
|
Khetavath S, Sendhilkumar NC, Mukunthan P, Jana S, Gopalakrishnan S, Malliga L, Chand SR, Farhaoui Y. An Intelligent Heuristic Manta-Ray Foraging Optimization and Adaptive Extreme Learning Machine for Hand Gesture Image Recognition. Big Data Mining and Analytics 2023; 6:321-335. [DOI: 10.26599/bdma.2022.9020036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/15/2025]
Affiliation(s)
- Seetharam Khetavath
- Chaitanya (Deemed to be University), Department of Electronics and Communication Engineering, Warangal, India, 506001
- Navalpur Chinnappan Sendhilkumar
- Sri Indu College of Engineering & Technology, Sheriguda, Department of Electronics and Communication Engineering, Hyderabad, India, 501510
- Pandurangan Mukunthan
- Sri Indu College of Engineering & Technology, Sheriguda, Department of Electronics and Communication Engineering, Hyderabad, India, 501510
- Selvaganesan Jana
- Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Department of Electronics and Communication Engineering, Chennai, India, 600062
- Subburayalu Gopalakrishnan
- Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Department of Electronics and Communication Engineering, Chennai, India, 600062
- Lakshmanan Malliga
- Malla Reddy Engineering College for Women (Autonomous), Department of Electronics and Communication Engineering, Telangana, India, 500100
- Sankuru Ravi Chand
- Nalla Narasimha Reddy Education Society's Group of Institutions-Integrated Campus, Department of Electronics and Communication Engineering, Hyderabad, India, 500088
- Yousef Farhaoui
- STI Laboratory, the IDMS Team, Faculty of Sciences and Techniques, Moulay Ismail University of Meknès, Errachidia, Morocco, 52000
|
73
|
Mahmoud AY, Neagu D, Scrimieri D, Abdullatif ARA. Early diagnosis and personalised treatment focusing on synthetic data modelling: Novel visual learning approach in healthcare. Comput Biol Med 2023; 164:107295. [PMID: 37557053 DOI: 10.1016/j.compbiomed.2023.107295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 07/26/2023] [Accepted: 07/28/2023] [Indexed: 08/11/2023]
Abstract
The early diagnosis and personalised treatment of diseases are facilitated by machine learning. The quality of data has an impact on diagnosis because medical data are usually sparse, imbalanced, and contain irrelevant attributes, resulting in suboptimal diagnosis. To address the impacts of data challenges, improve resource allocation, and achieve better health outcomes, a novel visual learning approach is proposed. This study contributes to the visual learning approach by determining whether less or more synthetic data are required to improve the quality of a dataset, such as the number of observations and features, according to the intended personalised treatment and early diagnosis. In addition, numerous visualisation experiments are conducted, including using statistical characteristics, cumulative sums, histograms, correlation matrix, root mean square error, and principal component analysis in order to visualise both original and synthetic data to address the data challenges. Real medical datasets for cancer, heart disease, diabetes, cryotherapy and immunotherapy are selected as case studies. As a benchmark and point of classification comparison in terms of metrics such as accuracy, sensitivity, and specificity, several models are implemented, such as k-Nearest Neighbours and Random Forest. To simulate algorithm implementation and data, a Generative Adversarial Network is used to create and manipulate synthetic data, whilst Random Forest is implemented to classify the data. An amendable and adaptable system is constructed by combining Generative Adversarial Network and Random Forest models. The system model presents working steps, overview and flowchart. Experiments reveal that the majority of data-enhancement scenarios allow for the application of visual learning in the first stage of data analysis as a novel approach.
To achieve meaningful adaptable synergy between appropriate quality data and optimal classification performance while maintaining statistical characteristics, visual learning provides researchers and practitioners with practical human-in-the-loop machine learning visualisation tools. Prior to implementing algorithms, the visual learning approach can be used to actualise early and personalised diagnosis. For the immunotherapy data, the Random Forest performed best with precision, recall, f-measure, accuracy, sensitivity, and specificity of 81%, 82%, 81%, 88%, 95%, and 60%, as opposed to 91%, 96%, 93%, 93%, 96%, and 73% for synthetic data, respectively. Future studies might examine the optimal strategies to balance the quantity and quality of medical data.
Affiliation(s)
- Ahsanullah Yunas Mahmoud
- Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom
- Daniel Neagu
- Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom
- Daniele Scrimieri
- Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom
|
74
|
Munir N, McMorrow R, Mulrennan K, Whitaker D, McLoone S, Kellomäki M, Talvitie E, Lyyra I, McAfee M. Interpretable Machine Learning Methods for Monitoring Polymer Degradation in Extrusion of Polylactic Acid. Polymers (Basel) 2023; 15:3566. [PMID: 37688192 PMCID: PMC10489772 DOI: 10.3390/polym15173566] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/07/2023] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
This work investigates real-time monitoring of extrusion-induced degradation in different grades of PLA across a range of process conditions and machine set-ups. Data on machine settings together with in-process sensor data, including temperature, pressure, and near-infrared (NIR) spectra, are used as inputs to predict the molecular weight and mechanical properties of the product. Many soft sensor approaches based on complex spectral data are essentially 'black-box' in nature, which can limit industrial acceptability. Hence, the focus here is on identifying an optimal approach to developing interpretable models while achieving high predictive accuracy and robustness across different process settings. The performance of a Recursive Feature Elimination (RFE) approach was compared to more common dimension reduction and regression approaches including Partial Least Squares (PLS), iterative PLS (i-PLS), Principal Component Regression (PCR), ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and Random Forest (RF). It is shown that for medical-grade PLA processed under moisture-controlled conditions, accurate prediction of molecular weight is possible over a wide range of process conditions and different machine settings (different nozzle types for downstream fibre spinning) with an RFE-RF algorithm. Similarly, for the prediction of yield stress, RFE-RF achieved excellent predictive performance, outperforming the other approaches in terms of simplicity, interpretability, and accuracy. The features selected by the RFE model provide important insights to the process. It was found that change in molecular weight was not an important factor affecting the mechanical properties of the PLA, which is primarily related to the pressure and temperature at the latter stages of the extrusion process. 
The temperature at the extruder exit was also the most important predictor of degradation of the polymer molecular weight, highlighting the importance of accurate melt temperature control in the process. RFE not only outperforms more established methods as a soft sensor method, but also has significant advantages in terms of computational efficiency, simplicity, and interpretability. RFE-based soft sensors are promising for better quality control in processing thermally sensitive polymers such as PLA, in particular demonstrating for the first time the ability to monitor molecular weight degradation during processing across various machine settings.
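Recursive Feature Elimination, as named in the abstract above, repeatedly fits a model, drops the least important feature, and refits on what remains. The sketch below illustrates only that loop. It swaps in a small least-squares linear model for the Random Forest used in the study, ranks features by absolute coefficient (which assumes features on comparable scales), and uses synthetic data. The helper names `solve`, `fit_linear`, and `rfe` are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_linear(X, y, cols, ridge=1e-6):
    """Least-squares coefficients for the chosen columns (tiny ridge term for stability)."""
    A = [[sum(row[a] * row[b] for row in X) + (ridge if a == b else 0.0)
          for b in cols] for a in cols]
    rhs = [sum(row[a] * t for row, t in zip(X, y)) for a in cols]
    return dict(zip(cols, solve(A, rhs)))


def rfe(X, y, n_keep):
    """Refit on the surviving columns and drop the weakest one until n_keep remain."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        coefs = fit_linear(X, y, cols)
        weakest = min(cols, key=lambda c: abs(coefs[c]))
        cols.remove(weakest)
    return cols


# Synthetic data (assumption): only columns 0 and 2 drive the target.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(30)]
y = [2 * row[0] - 3 * row[2] for row in X]

selected = rfe(X, y, n_keep=2)
print(selected)
```

Because the model is refitted after every removal, the importance of a feature is always judged in the context of the features still present, which is what distinguishes RFE from a one-shot ranking.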
Affiliation(s)
- Nimra Munir
- Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Ross McMorrow
- Department of Mechatronic Engineering, Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Konrad Mulrennan
- Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Darren Whitaker
- Perceptive Engineering-An Applied Materials Company, Keckwick Lane, Daresbury WA4 4AB, UK
- Seán McLoone
- Centre for Intelligent Autonomous Manufacturing Systems, Queen’s University Belfast, Belfast BT7 1NN, UK
- Minna Kellomäki
- Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Elina Talvitie
- Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Inari Lyyra
- Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Marion McAfee
- Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
|
75
|
Rothe F, Berger J, Welker P, Fiebelkorn R, Kupper S, Kiesel D, Gedat E, Ohrndorf S. Fluorescence optical imaging feature selection with machine learning for differential diagnosis of selected rheumatic diseases. Front Med (Lausanne) 2023; 10:1228833. [PMID: 37671403 PMCID: PMC10475553 DOI: 10.3389/fmed.2023.1228833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 07/28/2023] [Indexed: 09/07/2023] Open
Abstract
Background and objective Accurate and fast diagnosis of rheumatic diseases affecting the hands is essential for further treatment decisions. Fluorescence optical imaging (FOI) visualizes inflammation-induced impaired microcirculation by increasing signal intensity, resulting in different image features. This analysis aimed to find specific image features in FOI that might be important for accurately diagnosing different rheumatic diseases. Patients and methods FOI images of the hands of patients with different types of rheumatic diseases, such as rheumatoid arthritis (RA), osteoarthritis (OA), and connective tissue diseases (CTD), were assessed in a reading of 20 different image features in three phases of the contrast agent dynamics, yielding 60 different features for each patient. The readings were analyzed for mutual differential diagnosis of the three diseases (One-vs-One) and each disease in all data (One-vs-Rest). In the first step, statistical tools and machine-learning-based methods were applied to reveal the importance rankings of the features, that is, to find features that contribute most to the model-based classification. In the second step machine learning with a stepwise increasing number of features was applied, sequentially adding at each step the most crucial remaining feature to extract a minimized subset that yields the highest diagnostic accuracy. Results In total, n = 605 FOI of both hands were analyzed (n = 235 with RA, n = 229 with OA, and n = 141 with CTD). All classification problems showed maximum accuracy with a reduced set of image features. For RA-vs.-OA, five features were needed for high accuracy. For RA-vs.-CTD ten, OA-vs.-CTD sixteen, RA-vs.-Rest five, OA-vs.-Rest eleven, and CTD-vs-Rest fifteen, features were needed, respectively. For all problems, the final importance ranking of the features with respect to the contrast agent dynamics was determined. 
Conclusions With the presented investigations, the set of features in FOI examinations relevant to the differential diagnosis of the selected rheumatic diseases could be remarkably reduced, providing helpful information for the physician.
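The second step described in the abstract above, adding the most important remaining feature at each step and keeping the smallest subset with the highest accuracy, can be sketched as follows. This is a generic illustration under stated assumptions: the feature names, the importance ranking, and the `toy_accuracy` function are invented, and a real pipeline would evaluate each subset with a cross-validated classifier.

```python
def forward_selection(ranked_features, evaluate):
    """Add features in ranked order and return the smallest prefix with the best score."""
    subset = []
    best_subset, best_score = [], float("-inf")
    history = []  # (number of features, score) after each addition
    for feature in ranked_features:
        subset = subset + [feature]
        score = evaluate(subset)
        history.append((len(subset), score))
        if score > best_score:  # strict comparison: ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score, history


# Hypothetical accuracy in percent: three features help, the rest add noise.
GAIN = {"a": 20, "b": 10, "c": 5}


def toy_accuracy(subset):
    return 50 + sum(GAIN.get(f, -1) for f in subset)


ranking = ["a", "b", "c", "d", "e"]  # assumed output of a prior importance-ranking step
chosen, accuracy, history = forward_selection(ranking, toy_accuracy)
print(chosen, accuracy)
print(history)
```

Recording the full history makes it possible to plot accuracy against subset size and to see where additional features stop helping.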
Affiliation(s)
- Felix Rothe
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Pia Welker
- Institute of Functional Anatomy, Charité—Universitätsmedizin Berlin, Berlin, Germany
- Richard Fiebelkorn
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Stefan Kupper
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Denise Kiesel
- Institute of Functional Anatomy, Charité—Universitätsmedizin Berlin, Berlin, Germany
- Egbert Gedat
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Xiralite GmbH, Berlin, Germany
- Sarah Ohrndorf
- Department of Rheumatology and Clinical Immunology, Charité—Universitätsmedizin Berlin, Berlin, Germany
|
76
|
Al-Tashi Q, Saad MB, Sheshadri A, Wu CC, Chang JY, Al-Lazikani B, Gibbons C, Vokes NI, Zhang J, Lee JJ, Heymach JV, Jaffray D, Mirjalili S, Wu J. SwarmDeepSurv: swarm intelligence advances deep survival network for prognostic radiomics signatures in four solid cancers. Patterns (New York, N.Y.) 2023; 4:100777. [PMID: 37602223 PMCID: PMC10435962 DOI: 10.1016/j.patter.2023.100777] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/16/2022] [Revised: 04/18/2023] [Accepted: 05/26/2023] [Indexed: 08/22/2023]
Abstract
Survival models exist to study relationships between biomarkers and treatment effects. Deep learning-powered survival models supersede the classical Cox proportional hazards (CoxPH) model, but substantial performance drops were observed on high-dimensional features because of irrelevant/redundant information. To fill this gap, we proposed SwarmDeepSurv by integrating swarm intelligence algorithms with the deep survival model. Furthermore, four objective functions were designed to optimize prognostic prediction while regularizing selected feature numbers. When testing on multicenter sets (n = 1,058) of four different cancer types, SwarmDeepSurv was less prone to overfitting and achieved optimal patient risk stratification compared with popular survival modeling algorithms. Strikingly, SwarmDeepSurv selected different features compared with classical feature selection algorithms, including the least absolute shrinkage and selection operator (LASSO), with nearly no feature overlapping across these models. Taken together, SwarmDeepSurv offers an alternative approach to model relationships between radiomics features and survival endpoints, which can further extend to study other input data types including genomics.
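The abstract above mentions objective functions that optimize prognostic prediction while regularizing the number of selected features, without stating their form. The sketch below is therefore an assumption, not the paper's objective: it computes Harrell's concordance index, a standard measure of survival model discrimination, and combines it with a simple penalty on the fraction of features selected. The function `penalised_fitness`, its `penalty` weight, and the toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had the event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def penalised_fitness(c_index, n_selected, n_total, penalty=0.1):
    """Assumed objective: reward discrimination, penalise large feature subsets."""
    return c_index - penalty * n_selected / n_total


# Toy cohort (hypothetical): follow-up times, event indicators, predicted risks.
times = [5, 3, 9, 7]
events = [1, 1, 0, 1]
risks = [0.6, 0.9, 0.1, 0.4]

c = concordance_index(times, events, risks)
print(c, penalised_fitness(c, n_selected=2, n_total=10))
```

A swarm-based optimiser would evaluate such a fitness for each candidate feature subset; the penalty weight controls the trade-off between discrimination and signature size.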
Affiliation(s)
- Qasem Al-Tashi
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Maliazurina B. Saad
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Ajay Sheshadri
- Department of Pulmonary Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Carol C. Wu
- Department of Thoracic Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Joe Y. Chang
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Bissan Al-Lazikani
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Christopher Gibbons
- Section of Patient-Centered Analytics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Natalie I. Vokes
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jianjun Zhang
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- J. Jack Lee
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- John V. Heymach
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- David Jaffray
- Office of the Chief Technology and Digital Officer, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Seyedali Mirjalili
- Centre for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, QLD 4006, Australia
- Yonsei Frontier Lab, Yonsei University, Seoul 03722, Korea
- University Research and Innovation Center, Obuda University, 1034 Budapest, Hungary
- Jia Wu
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
|
77
|
Li M, Xu G, Chen Q, Xue T, Peng H, Wang Y, Shi H, Duan S, Feng F. Computed Tomography-based Radiomics Nomogram for the Preoperative Prediction of Tumor Deposits and Clinical Outcomes in Colon Cancer: a Multicenter Study. Acad Radiol 2023; 30:1572-1583. [PMID: 36566155 DOI: 10.1016/j.acra.2022.11.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/26/2022] [Revised: 10/16/2022] [Accepted: 11/07/2022] [Indexed: 12/24/2022]
Abstract
RATIONALE AND OBJECTIVES To develop and validate a computed tomography (CT)-based radiomics nomogram for the preoperative prediction of tumor deposits (TDs) and clinical outcomes in patients with colon cancer. MATERIALS AND METHODS This retrospective study included 383 consecutive patients with colon cancer from two centers. Radiomics features were extracted from portal venous phase CT images. Least absolute shrinkage and selection operator regression was applied for feature selection and radiomics signature construction. The multivariate logistic regression model was used to establish a radiomics nomogram. The performance of the nomogram was assessed by using receiver operating characteristic curves, calibration curves and decision curve analysis. Kaplan‒Meier survival analysis was used to assess the difference of the overall survival (OS) in the TDs-positive and TDs-negative groups. RESULTS The radiomics signature was composed of 11 TDs status related features. The AUCs of the radiomics model in the training cohort, internal validation and external validation cohorts were 0.82, 0.78 and 0.78, respectively. The radiomics nomogram that incorporated the radiomics signature and clinical independent predictors (CT-N, CEA and CA199) showed good calibration and discrimination with AUCs of 0.88, 0.80 and 0.81 in the training cohort, internal validation and external validation cohorts, respectively. The radiomics nomogram-predicted high-risk groups had a worse OS than the low-risk groups (p < 0.001). The radiomics nomogram-predicted TDs was an independent preoperative predictor of OS. CONCLUSION The radiomics nomogram based on CT radiomics features and clinical independent predictors could effectively predict the preoperative TDs status and OS of colon cancer. IMPORTANT FINDINGS CT-based radiomics nomogram may be applied in the individual preoperative prediction of TDs status in colon cancer. 
Additionally, there was a significant difference in OS between the high-risk and low-risk groups defined by the radiomics nomogram, in which patients with high-risk TDs had a significantly worse OS, compared with those with low-risk TDs.
Affiliation(s)
- Manman Li
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Guodong Xu
- Department of Radiology, Affiliated Hospital of Nantong University, Nantong, Jiangsu, PR China
- Qiaoling Chen
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Ting Xue
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Hui Peng
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Yuwei Wang
- Department of Record room, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China
- Hui Shi
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Feng Feng
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
|
78
|
Wang H, Doumard E, Soule-Dupuy C, Kemoun P, Aligon J, Monsarrat P. Explanations as a New Metric for Feature Selection: A Systematic Approach. IEEE J Biomed Health Inform 2023; 27:4131-4142. [PMID: 37220033 DOI: 10.1109/jbhi.2023.3279340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
With the extensive use of Machine Learning (ML) in the biomedical field, there is an increasing need for Explainable Artificial Intelligence (XAI) to improve transparency and reveal complex hidden relationships between variables for medical practitioners, while meeting regulatory requirements. Feature Selection (FS) is widely used as a part of a biomedical ML pipeline to significantly reduce the number of variables while preserving as much information as possible. However, the choice of FS method affects the entire pipeline, including the final prediction explanations, yet very few works investigate the relationship between FS and model explanations. Through a systematic workflow performed on 145 datasets and an illustration on medical data, the present work demonstrated the promising complementarity of two metrics based on explanations (using ranking and influence changes) in addition to accuracy and retention rate to select the most appropriate FS/ML models. Measuring how much explanations differ with/without FS is particularly promising for recommending FS methods. While reliefF generally performs the best on average, the optimal choice may vary for each dataset. Positioning FS methods in a tridimensional space, integrating explanations-based metrics, accuracy and retention rate, would allow the user to choose the priorities to be given on each of the dimensions. In biomedical applications, where each medical condition may have its own preferences, this framework will make it possible to offer the healthcare professional the appropriate FS technique, to select the variables that have an important explainable impact, even if this comes at the expense of a limited drop in accuracy.
|
79
|
Li M, Xu G, Zhou H, Chen Q, Fan Q, Shi J, Duan S, Cui Y, Feng F. Computed tomography-based radiomics nomogram for the pre-operative prediction of BRAF mutation and clinical outcomes in patients with colorectal cancer: a double-center study. Br J Radiol 2023; 96:20230019. [PMID: 37195006 PMCID: PMC10392655 DOI: 10.1259/bjr.20230019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/05/2023] [Revised: 04/10/2023] [Accepted: 04/23/2023] [Indexed: 05/18/2023] Open
Abstract
OBJECTIVE To develop and validate a radiomics nomogram based on CT for the pre-operative prediction of BRAF mutation and clinical outcomes in patients with colorectal cancer (CRC). METHODS A total of 451 CRC patients (training cohort = 190; internal validation cohort = 125; external validation cohort = 136) from 2 centers were retrospectively included. Least absolute shrinkage and selection operator regression was used to select radiomics features and the radiomics score (Radscore) was calculated. A nomogram was constructed by combining Radscore and significant clinical predictors. Receiver operating characteristic curve analysis, calibration curve and decision curve analysis were used to evaluate the predictive performance of the nomogram. Kaplan‒Meier survival curves based on the radiomics nomogram were used to assess overall survival (OS) of the entire cohort. RESULTS The Radscore consisted of nine radiomics features which were the most relevant to BRAF mutation. The radiomics nomogram integrating Radscore and clinical independent predictors (age, tumor location and cN stage) showed good calibration and discrimination with AUCs of 0.86 (95% CI: 0.80-0.91), 0.82 (95% CI: 0.74-0.90) and 0.82 (95% CI: 0.75-0.90) in the training cohort, internal validation and external validation cohorts, respectively. Furthermore, the performance of the nomogram was significantly better than that of the clinical model (p < 0.05). The radiomics nomogram-predicted BRAF mutation high-risk group had a worse OS than the low-risk group (p < 0.0001). CONCLUSION The radiomics nomogram showed good performance in predicting BRAF mutation and OS of CRC patients, which could provide valuable information for individualized treatment. ADVANCES IN KNOWLEDGE The radiomics nomogram could effectively predict BRAF mutation and OS in patients with CRC. The high-risk BRAF mutation group identified by the radiomics nomogram was independently associated with poor OS.
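A Radscore of the kind described in the abstract above is conventionally a linear combination of the LASSO-selected features weighted by their non-zero coefficients, and a nomogram is a graphical rendering of a logistic regression that combines that score with clinical predictors. The sketch below shows only this arithmetic. Every feature name, coefficient, and patient value is invented for illustration; none comes from the study.

```python
import math


def radscore(features, coefficients, intercept=0.0):
    """Linear combination of selected radiomics features (coefficients from LASSO)."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


def nomogram_probability(predictors, weights, bias):
    """Logistic model behind a nomogram: probability from the weighted predictors."""
    z = bias + sum(weights[name] * predictors[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical LASSO output: features with zero coefficients are simply absent.
lasso_coefficients = {"glcm_contrast": 0.5, "shape_sphericity": -0.25}
patient_features = {"glcm_contrast": 1.0, "shape_sphericity": 2.0, "unused_feature": 7.0}

score = radscore(patient_features, lasso_coefficients, intercept=0.1)

# Hypothetical nomogram weights combining the Radscore with clinical predictors.
weights = {"radscore": 2.0, "age_over_60": 0.8, "cn_stage_positive": 1.1}
predictors = {"radscore": score, "age_over_60": 1, "cn_stage_positive": 0}

probability = nomogram_probability(predictors, weights, bias=-1.0)
print(round(score, 3), round(probability, 3))
```

With these made-up numbers the linear predictor is exactly zero, so the predicted probability is 0.5, which makes the example easy to check by hand.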
Affiliation(s)
- Guodong Xu
- Department of Radiology, Yancheng No. 1 People’s Hospital, Yancheng, Jiangsu Province, China
- Hui Zhou
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Qiaoling Chen
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Qi Fan
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Jian Shi
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Yanfen Cui
- Department of Radiology, Shanxi Cancer Hospital, Shanxi, Shanxi Province, China
- Feng Feng
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
80
Lee DY, Choi B, Kim C, Fridgeirsson E, Reps J, Kim M, Kim J, Jang JW, Rhee SY, Seo WW, Lee S, Son SJ, Park RW. Privacy-Preserving Federated Model Predicting Bipolar Transition in Patients With Depression: Prediction Model Development Study. J Med Internet Res 2023; 25:e46165. [PMID: 37471130 PMCID: PMC10401196 DOI: 10.2196/46165] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/01/2023] [Revised: 03/10/2023] [Accepted: 06/29/2023] [Indexed: 07/21/2023] Open
Abstract
BACKGROUND Mood disorder has emerged as a serious concern for public health; in particular, bipolar disorder has a less favorable prognosis than depression. Although prompt recognition of depression conversion to bipolar disorder is needed, early prediction is challenging due to overlapping symptoms. Recently, there have been attempts to develop a prediction model by using federated learning. Federated learning in medical fields is a method for training multi-institutional machine learning models without patient-level data sharing. OBJECTIVE This study aims to develop and validate a federated, differentially private multi-institutional bipolar transition prediction model. METHODS This retrospective study enrolled patients diagnosed with the first depressive episode at 5 tertiary hospitals in South Korea. We developed models for predicting bipolar transition by using data from 17,631 patients in 4 institutions. Further, we used data from 4541 patients for external validation from 1 institution. We created standardized pipelines to extract large-scale clinical features from the 4 institutions without any code modification. Moreover, we performed feature selection in a federated environment for computational efficiency and applied differential privacy to gradient updates. Finally, we compared the federated and the 4 local models developed with each hospital's data on internal and external validation data sets. RESULTS In the internal data set, 279 out of 17,631 patients showed bipolar disorder transition. In the external data set, 39 out of 4541 patients showed bipolar disorder transition. The average performance of the federated model in the internal test (area under the curve [AUC] 0.726) and external validation (AUC 0.719) data sets was higher than that of the other locally developed models (AUC 0.642-0.707 and AUC 0.642-0.699, respectively). 
In the federated model, classifications were driven by several predictors such as the Charlson index (low scores were associated with bipolar transition, which may be due to younger age), severe depression, anxiolytics, young age, and visiting months (the bipolar transition was associated with seasonality, especially during the spring and summer months). CONCLUSIONS We developed and validated a differentially private federated model by using distributed multi-institutional psychiatric data with standardized pipelines in a real-world environment. The federated model performed better than models using local data only.
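The combination of federated training and differential privacy on gradient updates can be sketched as follows. This is a generic clip-and-add-noise scheme with sample-weighted averaging, offered as an assumption: the abstract does not state which aggregation rule, clipping bound or noise mechanism the authors used, and the function names and numbers below are hypothetical. No privacy accounting is included.

```python
import math
import random


def clip_by_l2_norm(grad: list[float], max_norm: float) -> list[float]:
    """Scale the gradient down so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def privatize(grad: list[float], max_norm: float, noise_std: float,
              rng: random.Random) -> list[float]:
    """Clip a site's gradient, then add Gaussian noise before it leaves the site."""
    clipped = clip_by_l2_norm(grad, max_norm)
    return [g + rng.gauss(0.0, noise_std) for g in clipped]


def federated_average(updates: list[list[float]], sizes: list[int]) -> list[float]:
    """Average site updates on the server, weighted by each site's sample count."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


rng = random.Random(0)
site_grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-site gradients
site_sizes = [100, 300]                # hypothetical per-site sample counts

# Only clipped, noised updates are shared; patient-level data stays local.
shared = [privatize(g, max_norm=1.0, noise_std=0.1, rng=rng) for g in site_grads]
global_update = federated_average(shared, site_sizes)
print(global_update)
```

A larger `noise_std` gives stronger privacy at the cost of a noisier global update, which is the trade-off any such scheme has to tune.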
Affiliation(s)
- Dong Yun Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Byungjin Choi
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Chungsoo Kim
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon-si, Republic of Korea
- Egill Fridgeirsson
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, Netherlands
- Jenna Reps
- Observational Health Data Analytics, Janssen Research and Development, Titusville, NJ, United States
- Myoungsuk Kim
- Data Solution Team, Evidnet Co, Ltd, Sungnam, Republic of Korea
- Jihyeong Kim
- Data Solution Team, Evidnet Co, Ltd, Sungnam, Republic of Korea
- Jae-Won Jang
- Department of Neurology, Kangwon National University Hospital, Kangwon National University School of Medicine, Chuncheon, Republic of Korea
- Sang Youl Rhee
- Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Seoul, Republic of Korea
- Department of Endocrinology and Metabolism, Kyung Hee University College of Medicine, Seoul, Republic of Korea
- Won-Woo Seo
- Department of Internal Medicine, Kangdong Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
- Seunghoon Lee
- Department of Psychiatry, Myongji Hospital, Goyang, Republic of Korea
- Sang Joon Son
- Department of Psychiatry, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Rae Woong Park
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon-si, Republic of Korea
81
Thomas E, Ali FB, Tolambiya A, Chambellant F, Gaveau J. Too much information is no information: how machine learning and feature selection could help in understanding the motor control of pointing. Front Big Data 2023; 6:921355. [PMID: 37546547 PMCID: PMC10399757 DOI: 10.3389/fdata.2023.921355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Accepted: 06/16/2023] [Indexed: 08/08/2023] Open
Abstract
The aim of this study was to develop the use of Machine Learning techniques as a means of multivariate analysis in studies of motor control. These studies generate a huge amount of data, the analysis of which continues to be largely univariate. We propose the use of machine learning classification and feature selection as a means of uncovering feature combinations that are altered between conditions. High dimensional electromyogram (EMG) vectors were generated as several arm and trunk muscles were recorded while subjects pointed at various angles above and below the gravity neutral horizontal plane. We used Linear Discriminant Analysis (LDA) to carry out binary classifications between the EMG vectors for pointing at a particular angle vs. pointing at the gravity neutral direction. Classification success provided a composite index of muscular adjustments for various task constraints, in this case pointing angles. In order to find the combination of features that were significantly altered between task conditions, we conducted a post-classification feature selection, i.e., we investigated which combination of features had allowed for the classification. Feature selection was done by comparing the representations of each category created by LDA for the classification, in other words, by computing the difference between the representations of each class. We propose that this approach will help with comparing high dimensional EMG patterns in two ways: (i) quantifying the effects of the entire pattern rather than using single arbitrarily defined variables and (ii) identifying the parts of the patterns that convey the most information regarding the investigated effects.
Affiliation(s)
- Elizabeth Thomas
- INSERM U1093, UFR STAPS, Université de Bourgogne Franche Comté, Dijon, France
- Ferid Ben Ali
- School of Engineering and Computer Science, University of Hertfordshire, Hatfield, United Kingdom
- Arvind Tolambiya
- Applied Intelligence Hub, Accenture Solutions Private Ltd., Hyderabad, Telangana, India
- Florian Chambellant
- INSERM U1093, UFR STAPS, Université de Bourgogne Franche Comté, Dijon, France
- Jérémie Gaveau
- INSERM U1093, UFR STAPS, Université de Bourgogne Franche Comté, Dijon, France
82
Kuzudisli C, Bakir-Gungor B, Bulut N, Qaqish B, Yousef M. Review of feature selection approaches based on grouping of features. PeerJ 2023; 11:e15666. [PMID: 37483989 PMCID: PMC10358338 DOI: 10.7717/peerj.15666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 06/08/2023] [Indexed: 07/25/2023] Open
Abstract
With the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality, including redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually, and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi-supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.
Affiliation(s)
- Cihan Kuzudisli
- Department of Computer Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
- Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Nurten Bulut
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Bahjat Qaqish
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, Chapel Hill, United States of America
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
83
Chattopadhyay S, Singh PK, Ijaz MF, Kim S, Sarkar R. SnapEnsemFS: a snapshot ensembling-based deep feature selection model for colorectal cancer histological analysis. Sci Rep 2023; 13:9937. [PMID: 37336964 PMCID: PMC10279666 DOI: 10.1038/s41598-023-36921-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Accepted: 06/12/2023] [Indexed: 06/21/2023] Open
Abstract
Colorectal cancer is the third most common type of cancer diagnosed annually, and the second leading cause of death due to cancer. Early diagnosis of this ailment is vital for preventing the tumours from spreading and for planning treatment to possibly eradicate the disease. However, population-wide screening is stunted by the requirement of medical professionals to analyse histological slides manually. Thus, an automated computer-aided detection (CAD) framework based on deep learning is proposed in this research that uses histological slide images for predictions. Ensemble learning is a popular strategy for fusing the salient properties of several models to make the final predictions. However, such frameworks are computationally costly since they require the training of multiple base learners. Instead, in this study, we adopt a snapshot ensemble method, wherein, instead of the traditional method of fusing decision scores from the snapshots of a Convolutional Neural Network (CNN) model, we extract deep features from the penultimate layer of the CNN model. Since the deep features are extracted from the same CNN model but for different learning environments, there may be redundancy in the feature set. To alleviate this, the features are fed into Particle Swarm Optimization, a popular meta-heuristic, for dimensionality reduction of the feature space and better classification. Upon evaluation on a publicly available colorectal cancer histology dataset using a five-fold cross-validation scheme, the proposed method obtains a highest accuracy of 97.60% and F1-Score of 97.61%, outperforming existing state-of-the-art methods on the same dataset. Further, qualitative investigation of class activation maps provides visual explainability to medical practitioners and justifies the use of the CAD framework in screening of colorectal histology. Our source code is publicly accessible at: https://github.com/soumitri2001/SnapEnsemFS.
Affiliation(s)
- Soumitri Chattopadhyay
- Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata, 700106, West Bengal, India
- Pawan Kumar Singh
- Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata, 700106, West Bengal, India
- Muhammad Fazal Ijaz
- Department of Mechanical Engineering, Faculty of Engineering and Information Technology, The University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
- SeongKi Kim
- National Centre of Excellence in Software, Sangmyung University, Seoul, 03016, Korea
- Ram Sarkar
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, 700032, India
84
Tasci E, Jagasia S, Zhuge Y, Sproull M, Cooley Zgela T, Mackey M, Camphausen K, Krauze AV. RadWise: A Rank-Based Hybrid Feature Weighting and Selection Method for Proteomic Categorization of Chemoirradiation in Patients with Glioblastoma. Cancers (Basel) 2023; 15:2672. [PMID: 37345009 PMCID: PMC10216128 DOI: 10.3390/cancers15102672] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/27/2023] [Revised: 05/03/2023] [Accepted: 05/06/2023] [Indexed: 06/23/2023] Open
Abstract
Glioblastomas (GBM) are rapidly growing, aggressive, nearly uniformly fatal, and the most common primary type of brain cancer. They exhibit significant heterogeneity and resistance to treatment, limiting the ability to analyze dynamic biological behavior that drives response and resistance, which are central to advancing outcomes in glioblastoma. Analysis of the proteome aimed at signal change over time provides a potential opportunity for non-invasive classification and examination of the response to treatment by identifying protein biomarkers associated with interventions. However, data acquired using large proteomic panels must be more intuitively interpretable, requiring computational analysis to identify trends. Machine learning is increasingly employed; however, it requires feature selection, which has a critical and considerable effect on machine learning problems when applied to large-scale data to reduce the number of parameters, improve generalization, and find essential predictors. In this study, using 7k proteomic data generated from the analysis of serum obtained from 82 patients with GBM pre- and post-completion of concurrent chemoirradiation (CRT), we aimed to select the most discriminative proteomic features that define the proteomic alteration resulting from CRT. Thus, we present a novel rank-based feature weighting method (RadWise) to identify relevant proteomic parameters using two popular feature selection methods, least absolute shrinkage and selection operator (LASSO) and minimum redundancy maximum relevance (mRMR). The computational results show that the proposed method yields outstanding results with very few selected proteomic features, with higher accuracy than methods that do not employ a feature selection process. While the computational method identified several proteomic signals identical to those of the clinically intuitive (heuristic) approach, several heuristically identified proteomic signals were not selected, and other novel proteomic biomarkers that carry biological prognostic relevance in GBM emerged only with the novel method. The computational results show that the proposed method yields promising results, reducing 7k proteomic data to 8 selected proteomic features with a performance value of 96.364%, comparing favorably with techniques that do not employ feature selection.
Affiliation(s)
- Andra Valentina Krauze
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA; (E.T.); (S.J.); (Y.Z.); (M.S.); (T.C.Z.); (M.M.); (K.C.)
85
Oğur NB, Kotan M, Balta D, Yavuz BÇ, Oğur YS, Yuvacı HU, Yazıcı E. Detection of depression and anxiety in the perinatal period using Marine Predators Algorithm and kNN. Comput Biol Med 2023; 161:107003. [PMID: 37224599 DOI: 10.1016/j.compbiomed.2023.107003] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Revised: 04/19/2023] [Accepted: 05/02/2023] [Indexed: 05/26/2023]
Abstract
Undiagnosed prenatal anxiety and depression have the potential to worsen and have an adverse effect on both the mother and the infant. Although the diagnosis is made by specialist doctors, it is unclear which parameters are more effective. Especially in medicine, it is crucial to diagnose disease with high accuracy. For this reason, in this study, a questionnaire study was first conducted on pregnant women, and real original data were collected. Then, the Marine Predators Algorithm (MPA), one of the current metaheuristic algorithms inspired by nature, was combined with K-Nearest Neighbors (kNN) to determine high-priority features in the collected data. As a result, five of the 147 features selected by the proposed method were determined as high priority and approved by the doctors. In addition, the proposed method is compared with the Chi-square method, which is one of the filter-based feature selection methods. Thanks to the proposed feature selection method based on MPA and kNN, it has been observed that the classification gives more successful results in a shorter time with 98.11% success, and the model supports the diagnosis stage of the doctors.
Affiliation(s)
- Nur Banu Oğur
- Sakarya University, Faculty of Computer and Information Sciences, Department of Computer Engineering, Sakarya, Turkey
- Muhammed Kotan
- Sakarya University, Faculty of Computer and Information Sciences, Department of Information Systems Engineering, Sakarya, Turkey
- Deniz Balta
- Sakarya University, Faculty of Computer and Information Sciences, Department of Software Engineering, Sakarya, Turkey
- Burcu Çarklı Yavuz
- Sakarya University, Faculty of Computer and Information Sciences, Department of Information Systems Engineering, Sakarya, Turkey
- Yavuz Selim Oğur
- Sakarya University, Faculty of Medicine, Department of Psychiatry, Sakarya, Turkey
- Hilal Uslu Yuvacı
- Sakarya University, Faculty of Medicine, Department of Obstetrics and Gynecology, Sakarya, Turkey
- Esra Yazıcı
- Sakarya University, Faculty of Medicine, Department of Psychiatry, Sakarya, Turkey
86
Seyedtabib M, Kamyari N. Predicting polypharmacy in half a million adults in the Iranian population: comparison of machine learning algorithms. BMC Med Inform Decis Mak 2023; 23:84. [PMID: 37147615 PMCID: PMC10161984 DOI: 10.1186/s12911-023-02177-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 04/21/2023] [Indexed: 05/07/2023] Open
Abstract
BACKGROUND Polypharmacy (PP) is increasingly common in Iran, and contributes to the substantial burden of drug-related morbidity, increasing the potential for drug interactions and potentially inappropriate medications. Machine learning (ML) algorithms can be employed as an alternative solution for the prediction of PP. Therefore, our study aimed to compare several ML algorithms to predict PP using health insurance claims data and choose the best-performing algorithm as a predictive tool for decision-making. METHODS This population-based cross-sectional study was performed between April 2021 and March 2022. After feature selection, information about 550 thousand patients was obtained from the National Center for Health Insurance Research (NCHIR). Afterwards, several ML algorithms were trained to predict PP. Finally, to assess the models' performance, the metrics derived from the confusion matrix were calculated. RESULTS The study sample comprised 554,133 adults with a median (IQR) age of 51 years (40-62), nested in 27 cities within the Khuzestan province of Iran. Most of the patients were female (62.5%), married (63.5%), and employed (83.2%) during the last year. The prevalence of PP in the whole population was about 36.0%. After performing the feature selection, out of 23 features, the number of prescriptions, insurance coverage for prescription drugs, and hypertension were found to be the top three predictors. Experimental results showed that Random Forest (RF) performed better than other ML algorithms, with recall, specificity, accuracy, precision and F1-score of 63.92%, 89.92%, 79.99%, 63.92% and 63.92%, respectively. CONCLUSION It was found that ML provides a reasonable level of accuracy in predicting polypharmacy. Therefore, the prediction models based on ML, especially the RF algorithm, performed better than other methods for predicting PP in Iranian people in terms of the performance criteria.
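The metrics named in the abstract are all simple functions of the four confusion-matrix counts, as the sketch below shows. The counts are hypothetical and chosen only for illustration; they are not the study's data. They also show why identical recall and precision, as reported for the RF model, imply an identical F1-score: F1 is the harmonic mean of the two.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Derive common classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)        # sensitivity: share of actual positives found
    specificity = tn / (tn + fp)   # share of actual negatives correctly rejected
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts (not from the study): fp == fn, so precision == recall.
metrics = confusion_metrics(tp=60, fp=40, tn=360, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```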
Affiliation(s)
- Maryam Seyedtabib
- Department of Biostatistics and Epidemiology, School of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Naser Kamyari
- Department of Biostatistics and Epidemiology, School of Health, Abadan University of Medical Sciences, Abadan, Iran
87
Wang J, Xu P, Ji X, Li M, Lu W. Feature Selection in Machine Learning for Perovskite Materials Design and Discovery. Materials (Basel) 2023; 16:3134. [PMID: 37109971 PMCID: PMC10146176 DOI: 10.3390/ma16083134] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/19/2023] [Revised: 04/11/2023] [Accepted: 04/13/2023] [Indexed: 06/19/2023]
Abstract
Perovskite materials have been one of the most important research objects in materials science due to their excellent photoelectric properties as well as correspondingly complex structures. Machine learning (ML) methods have been playing an important role in the design and discovery of perovskite materials, while feature selection as a dimensionality reduction method has occupied a crucial position in the ML workflow. In this review, we introduced the recent advances in the applications of feature selection in perovskite materials. First, the development tendency of publications about ML in perovskite materials was analyzed, and the ML workflow for materials was summarized. Then the commonly used feature selection methods were briefly introduced, and the applications of feature selection in inorganic perovskites, hybrid organic-inorganic perovskites (HOIPs), and double perovskites (DPs) were reviewed. Finally, we put forward some directions for the future development of feature selection in machine learning for perovskite material design.
Affiliation(s)
- Junya Wang
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai 200444, China
- Pengcheng Xu
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Xiaobo Ji
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
- Minjie Li
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
- Wencong Lu
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
- Zhejiang Laboratory, Hangzhou 311100, China
- Key Laboratory of Silicate Cultural Relics Conservation (Shanghai University), Ministry of Education, Shanghai 200444, China
88
Tao R, Yu X, Lu J, Wang Y, Lu W, Zhang Z, Li H, Zhou J. A deep learning nomogram of continuous glucose monitoring data for the risk prediction of diabetic retinopathy in type 2 diabetes. Phys Eng Sci Med 2023; 46:813-825. [PMID: 37041318 DOI: 10.1007/s13246-023-01254-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 03/27/2023] [Indexed: 04/13/2023]
Abstract
Continuous glucose monitoring (CGM) data analysis will provide a new perspective to analyze factors related to diabetic retinopathy (DR). However, the problem of visualizing CGM data and automatically predicting the incidence of DR from CGM is still controversial. Here, we explored the feasibility of using CGM profiles to predict DR in type 2 diabetes (T2D) using a deep learning approach. This study fused deep learning with a regularized nomogram to construct a novel deep learning nomogram from CGM profiles to identify patients at high risk of DR. Specifically, a deep learning network was employed to mine the nonlinear relationship between CGM profiles and DR. Moreover, a novel nomogram combining CGM deep factors with basic information was established to score the patients' DR risk. The dataset consists of 788 patients belonging to two cohorts: 494 in the training cohort and 294 in the testing cohort. The area under the curve (AUC) values of our deep learning nomogram were 0.82 and 0.80 in the training cohort and testing cohort, respectively. By incorporating basic clinical factors, the deep learning nomogram achieved an AUC of 0.86 in the training cohort and 0.85 in the testing cohort. The calibration plot and decision curve showed that the deep learning nomogram had the potential for clinical application. This analysis method of CGM profiles can be extended to other diabetic complications by further investigation.
Affiliation(s)
- Rui Tao
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Xia Yu
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Jingyi Lu
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China
- Yaxin Wang
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China
- Wei Lu
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China
- Zhanhu Zhang
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Hongru Li
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Jian Zhou
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China
89
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Affiliation(s)
- Olga Fajarda
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
- João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain
- Sara Duarte-Pereira
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal
- Raquel M Silva
- Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal
|
90
|
Daneshvar NHN, Masoudi-Sobhanzadeh Y, Omidi Y. A voting-based machine learning approach for classifying biological and clinical datasets. BMC Bioinformatics 2023; 24:140. [PMID: 37041456 PMCID: PMC10088226 DOI: 10.1186/s12859-023-05274-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 04/05/2023] [Indexed: 04/13/2023] Open
Abstract
BACKGROUND Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicability of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations, such as overfitting on a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study we introduced a machine learning framework consisting of two main steps. First, our previously suggested optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with prior methods. RESULTS The results demonstrated that the Trader algorithm could select a near-optimal subset of features at a significance level of p < 0.01 relative to the compared algorithms. Additionally, on the large datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean values of accuracy, precision, recall, specificity, and F-measure under fivefold cross-validation. CONCLUSION Based on the obtained results, it can be concluded that a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers in designing practical diagnosis health care systems and offering effective treatment plans.
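As an illustration of the voting step described in this abstract, the following is a minimal sketch of hard majority voting over the label predictions of several classifiers. It is not the authors' framework and does not include the Trader feature selection; the function name `majority_vote` and the tie-breaking rule (the earliest classifier in the list wins) are assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` is a list with one entry per classifier; each entry is a
    list of predicted labels, one per sample. Ties are broken in favour of
    the earliest classifier in the list (an assumption for this sketch).
    """
    if not predictions:
        raise ValueError("at least one classifier is required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    voted = []
    for sample_labels in zip(*predictions):
        counts = Counter(sample_labels)
        top = max(counts.values())
        # First label (in classifier order) that reached the top count.
        voted.append(next(lab for lab in sample_labels if counts[lab] == top))
    return voted


# Hypothetical predictions from three classifiers on four samples.
example = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(majority_vote(example))
```

Hard voting only needs predicted labels, so it can combine heterogeneous classifiers; weighting votes by validation accuracy would be a natural variation.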
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
- Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Florida, 33328, USA.
|
91
|
Karras A, Karras C, Schizas N, Avlonitis M, Sioutas S. AutoML with Bayesian Optimizations for Big Data Management. INFORMATION 2023. [DOI: 10.3390/info14040223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/08/2023] Open
Abstract
The field of automated machine learning (AutoML) has gained significant attention in recent years due to its ability to automate the process of building and optimizing machine learning models. However, the increasing amount of big data being generated has presented new challenges for AutoML systems in terms of big data management. In this paper, we introduce Fabolas and learning curve extrapolation as two methods for accelerating hyperparameter optimization. We present four methods for accelerating training: Bag of Little Bootstraps, k-means clustering for Support Vector Machines, subsample size selection for gradient descent, and subsampling for logistic regression. We also discuss the use of Markov Chain Monte Carlo (MCMC) methods and other stochastic optimization techniques to improve the efficiency of AutoML systems in managing big data. These methods enhance various facets of the training process, making it feasible to combine them in diverse ways to gain further speedups. We review several promising combinations and provide a comprehensive overview of the current state of AutoML and its potential for managing big data in various industries. Finally, we note the importance of parallel computing and distributed systems for improving the scalability of AutoML systems working with big data.
Affiliation(s)
- Aristeidis Karras
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
- Christos Karras
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
- Nikolaos Schizas
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
- Markos Avlonitis
- Department of Informatics, Ionian University, 49100 Kerkira, Greece
- Spyros Sioutas
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
|
92
|
Lötsch J, Ultsch A. Recursive computed ABC (cABC) analysis as a precise method for reducing machine learning based feature sets to their minimum informative size. Sci Rep 2023; 13:5470. [PMID: 37016033 PMCID: PMC10073099 DOI: 10.1038/s41598-023-32396-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/16/2023] [Accepted: 03/27/2023] [Indexed: 04/06/2023] Open
Abstract
Selecting the k best features is a common task in machine learning. Typically, a few features have high importance, but many have low importance (right-skewed distribution). This report proposes a numerically precise method to address this skewed feature importance distribution in order to reduce a feature set to the informative minimum of items. Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important items by partitioning a set of non-negative numerical items into subsets "A", "B", and "C" such that subset "A" contains the "few important" items based on specific properties of ABC curves defined by their relationship to Lorenz curves. In its recursive form, the cABC analysis can be applied again to subset "A". A generic image dataset and three biomedical datasets (lipidomics and two genomics datasets) with a large number of variables were used to perform the experiments. The experimental results show that the recursive cABC analysis limits the dimensions of the data projection to a minimum where the relevant information is still preserved and directs the feature selection in machine learning to the most important class-relevant information, including filtering feature sets for nonsense variables. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification in data not used for feature selection. cABC analysis, in its recursive variant, provides a computationally precise means of reducing information to a minimum. The minimum is the result of a computation of the number of k most relevant items, rather than a decision to select the k best items from a list. In addition, there are precise criteria for stopping the reduction process. The reduction to the most important features can improve the human understanding of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis" available at https://pypi.org/project/cABCanalysis/ .
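The idea of computing, rather than choosing, the number of retained items can be illustrated with a simplified sketch. It sorts non-negative importance values in descending order, builds the empirical cumulative curve (fraction of items against fraction of total importance), and takes as subset "A" the items up to the curve point nearest the ideal point (0, 1). This is an assumption-laden simplification, not the published algorithm or the `cABCanalysis` package API: the function names are hypothetical, no curve interpolation is used, and the fixed `rounds` parameter stands in for the stopping criteria mentioned in the abstract.

```python
import math


def set_a_indices(values):
    """Indices of the 'A' items for one pass of the simplified analysis."""
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if not values or total == 0:
        return []
    n = len(values)
    # Largest values first; ties keep their original relative order.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)

    best_k, best_dist, cum = 1, math.inf, 0.0
    for k, idx in enumerate(order, start=1):
        cum += values[idx]
        effort = k / n            # fraction of items used so far
        gain = cum / total        # fraction of total importance covered
        dist = math.hypot(effort, 1.0 - gain)  # distance to ideal (0, 1)
        if dist < best_dist:
            best_k, best_dist = k, dist
    return order[:best_k]


def recursive_set_a(values, rounds=1):
    """Re-apply the selection to subset 'A' for a fixed number of rounds."""
    selected = list(range(len(values)))
    for _ in range(rounds):
        if len(selected) <= 1:
            break
        local = set_a_indices([values[i] for i in selected])
        selected = [selected[j] for j in local]
    return selected


# Hypothetical feature importances: two dominant features, six minor ones.
importances = [5, 50, 3, 30, 5, 4, 2, 1]
print(recursive_set_a(importances, rounds=1))
print(recursive_set_a(importances, rounds=2))
```

For real analyses the published package should be preferred, since its limits are defined on the ABC curve itself and it implements the authors' stopping criteria.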
Affiliation(s)
- Jörn Lötsch
- Institute of Clinical Pharmacology, Goethe-University, Theodor-Stern-Kai 7, 60590, Frankfurt am Main, Germany.
- Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Theodor-Stern-Kai 7, 60596, Frankfurt am Main, Germany.
- Alfred Ultsch
- DataBionics Research Group, University of Marburg, Hans-Meerwein-Straße 22, 35032, Marburg, Germany
|
93
|
Zang Z, Xu Y, Lu L, Geng Y, Yang S, Li SZ. UDRN: Unified Dimensional Reduction Neural Network for feature selection and feature projection. Neural Netw 2023; 161:626-637. [PMID: 36827960 DOI: 10.1016/j.neunet.2023.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 11/22/2022] [Accepted: 02/11/2023] [Indexed: 02/17/2023]
Abstract
Dimensional reduction (DR) maps high-dimensional data into a lower-dimensional latent space by minimizing defined optimization objectives. The two independent branches of DR are feature selection (FS) and feature projection (FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). On the other hand, FP combines all the input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity. Moreover, FS and FP are traditionally incompatible categories and have not been unified into a single framework. Therefore, we consider that the ideal DR approach combines both FS and FP into a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. This paper proposes a unified framework named Unified Dimensional Reduction Network (UDRN) to integrate FS and FP in an end-to-end way. A novel network framework is designed to implement the FS and FP tasks separately using a stacked feature selection network and feature projection network. In addition, a stronger manifold assumption and a novel loss function are proposed. The loss function can leverage the priors of data augmentation to enhance the generalization ability of the proposed UDRN. Finally, comprehensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods (FS, FP, and FS&FP pipeline), especially in downstream tasks such as classification and visualization.
Affiliation(s)
- Zelin Zang
- Zhejiang University, Hangzhou, 310000, China; Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China.
- Yongjie Xu
- Zhejiang University, Hangzhou, 310000, China; Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China
- Linyan Lu
- China Telecom Corporation Limited, Hangzhou Branch, Hangzhou, 310000, China
- Yulan Geng
- Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China
- Senqiao Yang
- Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China
- Stan Z Li
- Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China.
|
94
|
Guhan Seshadri N, Agrawal S, Kumar Singh B, Geethanjali B, Mahesh V, Pachori RB. EEG based classification of children with learning disabilities using shallow and deep neural network. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]
|
95
|
Alhenawi E, Al-Sayyed R, Hudaib A, Mirjalili S. Improved intelligent water drop-based hybrid feature selection method for microarray data processing. Comput Biol Chem 2023; 103:107809. [PMID: 36696844 DOI: 10.1016/j.compbiolchem.2022.107809] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/10/2021] [Revised: 12/13/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
Classifying microarray datasets, which usually contain many noisy genes that degrade classifier performance and decrease classification accuracy, is a competitive research topic. Feature selection (FS) is one of the most practical ways of finding the optimal subset of genes that increases classification accuracy for diagnostic and prognostic prediction of cancer from microarray datasets. This means that we always need to develop more efficient FS methods that select only an optimal or close-to-optimal subset of features to improve classification performance. In this paper, we propose a hybrid FS method for microarray data processing that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper. The IIWD algorithm adds one of three local search (LS) algorithms, Tabu search (TS), Novel LS algorithm (NLSA), or Hill Climbing (HC), in each iteration of IWD, and uses a correlation coefficient filter as the heuristic undesirability (HUD) for next-node selection in the original IWD algorithm. The effect of adding the three LS algorithms was evaluated by comparing the proposed ensemble filter-IIWD-based wrapper without any LS algorithm (PHFS-IWD) against its performance with TS, NLSA, or HC added (PHFS-IWDTS, PHFS-IWDNLSA, and PHFS-IWDHC, respectively). A Naïve Bayes (NB) classifier with five microarray datasets was used to evaluate and compare the proposed hybrid FS methods. Results show that using LS algorithms in each iteration of the IWD algorithm improves the F-score by an average of 5% compared with PHFS-IWD. Also, PHFS-IWDNLSA improves the F-score by an average of 4.15% over PHFS-IWDTS and 5.67% over PHFS-IWDHC, while PHFS-IWDTS outperforms PHFS-IWDHC by an average of 1.6%. On the other hand, the proposed hybrid FS methods improve accuracy by an average of 8.92% in three out of five datasets and decrease the number of genes by 58.5% in all five datasets compared with six of the most recent state-of-the-art FS methods.
Affiliation(s)
- Esra'a Alhenawi
- Software Engineering Department, Al-Ahliyya Amman University, Amman, Jordan; King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Rizik Al-Sayyed
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Amjad Hudaib
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Seyedali Mirjalili
- Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, 4006 QLD, Australia; University Research and Innovation Center, Obuda University, Budapest, Hungary.
|
96
|
Chadaga K, Prabhu S, Bhat V, Sampathila N, Umakanth S, Chadaga R. A Decision Support System for Diagnosis of COVID-19 from Non-COVID-19 Influenza-like Illness Using Explainable Artificial Intelligence. Bioengineering (Basel) 2023; 10:439. [PMID: 37106626 PMCID: PMC10135993 DOI: 10.3390/bioengineering10040439] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/08/2023] [Revised: 03/27/2023] [Accepted: 03/29/2023] [Indexed: 04/03/2023] Open
Abstract
The coronavirus pandemic emerged in early 2020 and turned out to be deadly, killing a vast number of people all around the world. Fortunately, vaccines have been developed, and they seem effective in controlling the severe prognosis induced by the virus. The reverse transcription-polymerase chain reaction (RT-PCR) test is the current gold standard for diagnosing different infectious diseases, including COVID-19; however, it is not always accurate. Therefore, it is crucial to find an alternative diagnosis method that can support the results of the standard RT-PCR test. Hence, a decision support system is proposed in this study that uses machine learning and deep learning techniques to predict the COVID-19 diagnosis of a patient using clinical, demographic and blood markers. The patient data used in this research were collected from two Manipal hospitals in India, and a custom-made, stacked, multi-level ensemble classifier was used to predict the COVID-19 diagnosis. Deep learning techniques such as deep neural networks (DNN) and one-dimensional convolutional networks (1D-CNN) were also utilized. Further, explainable artificial intelligence (XAI) techniques such as Shapley additive explanations (SHAP), ELI5, local interpretable model-agnostic explanations (LIME), and QLattice were used to make the models more precise and understandable. Among all of the algorithms, the multi-level stacked model obtained an accuracy of 96%. The precision, recall, F1-score and AUC obtained were 94%, 95%, 94% and 98%, respectively. The models can be used as a decision support system for the initial screening of coronavirus patients and can also help ease the existing burden on medical infrastructure.
Affiliation(s)
- Krishnaraj Chadaga
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Srikanth Prabhu
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Vivekananda Bhat
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Niranjana Sampathila
- Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Shashikiran Umakanth
- Department of Medicine, Dr. TMA Hospital, Manipal Academy of Higher Education, Manipal 576104, India
- Rajagopala Chadaga
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
|
97
|
Verdicchio M, Brancato V, Cavaliere C, Isgrò F, Salvatore M, Aiello M. A pathomic approach for tumor-infiltrating lymphocytes classification on breast cancer digital pathology images. Heliyon 2023; 9:e14371. [PMID: 36950640 PMCID: PMC10025040 DOI: 10.1016/j.heliyon.2023.e14371] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/20/2023] [Revised: 03/03/2023] [Accepted: 03/03/2023] [Indexed: 03/11/2023] Open
Abstract
Background and objectives The detection of tumor-infiltrating lymphocytes (TILs) could aid in the development of objective measures of the infiltration grade and can support decision-making in breast cancer (BC). However, manual quantification of TILs in BC histopathological whole slide images (WSI) is currently based on a visual assessment and is therefore not standardized, not reproducible, and time-consuming for pathologists. In this work, a novel pathomic approach is proposed that applies high-throughput image feature extraction techniques to analyze the microscopic patterns in WSI. Pathomic features provide additional information concerning the underlying biological processes compared to the visual interpretation of WSI, and yield more easily interpretable and explainable results than the Deep Learning based methods most frequently investigated in the literature. Methods A dataset containing 1037 regions of interest with tissue compartments and TILs annotated on 195 TNBC and HER2+ BC hematoxylin and eosin (H&E)-stained WSI was used. After segmenting nuclei within tumor-associated stroma using a watershed-based approach, 71 pathomic features were extracted from each nucleus and reduced using a Spearman's correlation filter followed by a nonparametric Wilcoxon rank-sum test and the least absolute shrinkage and selection operator. The relevant features were used to classify each candidate nucleus as either TIL or non-TIL using 5 multivariable machine learning classification models trained using 5-fold cross-validation (1) without resampling, (2) with the synthetic minority over-sampling technique and (3) with downsampling. The prediction performance of the models was assessed using ROC curves. Results 21 features were selected, most of them related to the well-known TIL properties of having a regular shape, clearer margins, high peak intensity, more homogeneous enhancement and a different textural pattern than other cells. The best performance was obtained by Random Forest, with a ROC AUC of 0.86 regardless of resampling technique. Conclusions The presented approach holds promise for the classification of TILs in BC H&E-stained WSI and could provide support to pathologists for a reliable, rapid and interpretable clinical assessment of TILs in BC.
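The first reduction step named in the Methods, a Spearman correlation filter, can be sketched as follows. This is a minimal pure-Python illustration and not the authors' pipeline: the 0.9 threshold, the greedy keep-first rule, and the feature names are assumptions, and the subsequent Wilcoxon rank-sum and LASSO steps are not shown.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def correlation_filter(features, threshold=0.9):
    """Greedily keep features whose |rho| with every kept feature is <= threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-nucleus features: 'perimeter' is monotone in 'area'.
features = {
    "area": [1, 2, 3, 4, 5],
    "perimeter": [2, 4, 7, 8, 20],
    "intensity": [3, 1, 4, 1, 5],
}
print(correlation_filter(features))
```

Because the rule is greedy, the outcome depends on feature order; in practice one might instead drop the member of each correlated pair that is less associated with the class label.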
Affiliation(s)
- Carlo Cavaliere
- IRCCS SYNLAB SDN, Via E. Gianturco 113, Naples, 80143, Italy
- Francesco Isgrò
- Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Claudio 21, Naples, 80125, Italy
- Marco Salvatore
- IRCCS SYNLAB SDN, Via E. Gianturco 113, Naples, 80143, Italy
- Marco Aiello
- IRCCS SYNLAB SDN, Via E. Gianturco 113, Naples, 80143, Italy
|
98
|
Tang X, Mo Z, Chang C, Qian X. Group-shrinkage feature selection with a spatial network for mining DNA methylation data. Comput Biol Med 2023; 154:106573. [PMID: 36706568 DOI: 10.1016/j.compbiomed.2023.106573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/09/2022] [Revised: 01/05/2023] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
Identifying disease-related biomarkers from high-dimensional DNA methylation data helps in reducing early screening costs and inferring pathogenesis mechanisms. Good discovery results have been achieved through spatial correlation methods of methylation sites, group-based regularization, and network constraints. However, these methods still have key limitations: they cannot exclude isolated differential sites, and they only consider adjacent site ordering. Therefore, we propose a group-shrinkage feature selection algorithm to encourage the selection of clustered sites and discourage the selection of isolated differential sites. Specifically, a network-guided group-shrinkage strategy is developed to penalize weakly correlated isolated methylation sites through a network structure constraint. The spatial network is constructed based on spatial correlation information of DNA methylation sites, where this information accounts for the uneven site distribution. The experimental simulations and applications demonstrated that the proposed method outperforms the advanced regularization methods, especially in rejecting isolated methylation sites; hence this study provides an efficient and clinically valuable method for biomarker candidate discovery in DNA methylation data. Additionally, the proposed method exhibits enhanced reliability because it introduces biological prior knowledge into a regularization-based feature selection framework, and it could promote more research on integrating biological prior knowledge with classical feature selection methods, thus facilitating their clinical application. Our source codes will be released at https://github.com/SJTUBME-QianLab/Group-shrinkage-Spatial-Network once this manuscript is accepted for publication.
Affiliation(s)
- Xinlu Tang
- Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Zhanfeng Mo
- School of Computer Science and Engineering, Nanyang Technological University, Singapore.
- Cheng Chang
- Department of Nuclear Medicine, Shanghai, Chest Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200030, China.
- Xiaohua Qian
- Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
|
99
|
Sadeghian Z, Akbari E, Nematzadeh H, Motameni H. A review of feature selection methods based on meta-heuristic algorithms. J EXP THEOR ARTIF IN 2023. [DOI: 10.1080/0952813x.2023.2183267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
Affiliation(s)
- Zohre Sadeghian
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Ebrahim Akbari
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Hossein Nematzadeh
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Homayun Motameni
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
|
100
|
Devi RM, Premkumar M, Kiruthiga G, Sowmya R. IGJO: An Improved Golden Jackel Optimization Algorithm Using Local Escaping Operator for Feature Selection Problems. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11146-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
|