51
Lee H, Lee Y, Jo M, Nam S, Jo J, Lee C. Enhancing Diagnosis of Rotating Elements in Roll-to-Roll Manufacturing Systems through Feature Selection Approach Considering Overlapping Data Density and Distance Analysis. Sensors (Basel) 2023; 23:7857. [PMID: 37765913 PMCID: PMC10534779 DOI: 10.3390/s23187857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/01/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Roll-to-roll manufacturing systems, which process thin and flexible substrates, have been widely adopted for their cost-effectiveness, eco-friendliness, and mass-production capabilities. However, defects in rotating components such as rollers and bearings can cause severe defects in the functional layers. An intelligent diagnostic model that effectively identifies these rotating-component defects is therefore crucial. In this study, a quantitative feature-selection method, feature partial density, was proposed for developing high-efficiency diagnostic models. To assess the classification performance of the resulting models, the feature combinations extracted from the measured signals were evaluated using two quantities: the partial density, defined as the density of the data remaining in overlapping regions after the highest class is excluded, and the class-wise Mahalanobis distance. The validity of the proposed algorithm was verified by constructing ranked model groups and comparing the algorithm with existing feature-selection methods. The high-ranking group selected by the algorithm outperformed the other groups in terms of training time, accuracy, and positive predictive value. Moreover, the top-ranked feature combination outperformed existing methods on all indicators.
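One of the two quantities named above, the class-wise Mahalanobis distance, can be illustrated with a minimal sketch. It assumes two classes, each given as a list of feature vectors for one feature combination, and a pooled covariance estimate; the function names are hypothetical, and the paper's partial-density term and exact distance formulation are not reproduced here.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def pooled_covariance(class_a, class_b):
    """Pooled (unbiased) covariance matrix of two classes."""
    d = len(class_a[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (class_a, class_b):
        mu = mean_vector(rows)
        for r in rows:
            diff = [x - m for x, m in zip(r, mu)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = len(class_a) + len(class_b) - 2
    return [[c / dof for c in row] for row in cov]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_between_classes(class_a, class_b):
    """Distance between the two class means, scaled by the pooled covariance."""
    delta = [a - b for a, b in zip(mean_vector(class_a), mean_vector(class_b))]
    weights = solve(pooled_covariance(class_a, class_b), delta)
    return sum(d * w for d, w in zip(delta, weights)) ** 0.5
```

Under this sketch, a larger distance between class means relative to the pooled spread suggests that a feature combination separates the classes better. The pooled covariance must be invertible, so strongly collinear features would need regularization.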
Affiliation(s)
- Haemi Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Yoonjae Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Minho Jo
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Sanghoon Nam
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Jeongdai Jo
- Department of Printed Electronics, Korea Institute of Machinery and Materials, 156, Gajeongbuk-ro, Yuseong-gu, Daejeon 34103, Republic of Korea
- Changwoo Lee
- Department of Mechanical and Aerospace Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea

52
Guo B, Liu H, Niu L. Integration of natural and deep artificial cognitive models in medical images: BERT-based NER and relation extraction for electronic medical records. Front Neurosci 2023; 17:1266771. [PMID: 37732304 PMCID: PMC10507183 DOI: 10.3389/fnins.2023.1266771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023]
Abstract
Introduction Medical images and signals are important data sources in the medical field and contain key information about patients' physiology, pathology, and genetics. However, their complexity and diversity make medical knowledge acquisition and decision support difficult. Methods To address this problem, this paper proposes an end-to-end BERT-based framework for named entity recognition (NER) and relation extraction (RE) in electronic medical records. First, the framework integrates the NER and RE tasks into a unified model processed end to end, which avoids the limitations and error propagation of the multiple independent steps used in traditional methods. Second, by pre-training and fine-tuning BERT on large-scale electronic medical record data, the model acquires rich semantic representations adapted to the needs of the medical domain and its tasks. Finally, multi-task learning lets the model exploit the correlation and complementarity between the NER and RE tasks, improving its generalization and performance across datasets. Results and discussion Experimental evaluation on four electronic medical record datasets shows that the model significantly outperforms other methods on the NER task across datasets. On the RE task, the EMLB model also shows advantages across datasets; its performance improves significantly in multi-task learning mode in particular, and the ETE and MTL modules perform well in terms of overall precision and recall. Our research provides an innovative solution for medical image and signal data.
Affiliation(s)
- Bo Guo
- School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Department of Computing, Faculty of Communication, Visual Art and Computing, Universiti Selangor, Bestari Jaya, Selangor, Malaysia
- Huaming Liu
- School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Lei Niu
- School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China

53
Mahmoud AY, Neagu D, Scrimieri D, Abdullatif ARA. Early diagnosis and personalised treatment focusing on synthetic data modelling: Novel visual learning approach in healthcare. Comput Biol Med 2023; 164:107295. [PMID: 37557053 DOI: 10.1016/j.compbiomed.2023.107295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 07/26/2023] [Accepted: 07/28/2023] [Indexed: 08/11/2023]
Abstract
The early diagnosis and personalised treatment of diseases are facilitated by machine learning. Data quality affects diagnosis: medical data are usually sparse, imbalanced, and contain irrelevant attributes, which leads to suboptimal diagnosis. To address these data challenges, improve resource allocation, and achieve better health outcomes, a novel visual learning approach is proposed. This study contributes to the visual learning approach by determining whether fewer or more synthetic data, in terms of the number of observations and features, are required to improve dataset quality for the intended personalised treatment and early diagnosis. In addition, numerous visualisation experiments are conducted, using statistical characteristics, cumulative sums, histograms, correlation matrices, root mean square error, and principal component analysis to visualise both original and synthetic data and address the data challenges. Real medical datasets for cancer, heart disease, diabetes, cryotherapy, and immunotherapy are selected as case studies. Several models, such as k-Nearest Neighbours and Random Forest, are implemented as benchmarks and points of classification comparison in terms of metrics such as accuracy, sensitivity, and specificity. To simulate algorithm implementation and data, a Generative Adversarial Network is used to create and manipulate synthetic data, whilst Random Forest is used to classify the data. An amendable and adaptable system is constructed by combining the Generative Adversarial Network and Random Forest models; the system model is presented through its working steps, an overview, and a flowchart. Experiments reveal that most data-enhancement scenarios allow visual learning to be applied in the first stage of data analysis as a novel approach.
To achieve meaningful, adaptable synergy between data of appropriate quality and optimal classification performance while maintaining statistical characteristics, visual learning provides researchers and practitioners with practical human-in-the-loop machine learning visualisation tools. Before algorithms are implemented, the visual learning approach can be used to support early and personalised diagnosis. For the immunotherapy data, Random Forest performed best, with precision, recall, f-measure, accuracy, sensitivity, and specificity of 81%, 82%, 81%, 88%, 95%, and 60%, respectively, on the original data, compared with 91%, 96%, 93%, 93%, 96%, and 73% on the synthetic data. Future studies might examine optimal strategies for balancing the quantity and quality of medical data.
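The abstract reports precision, recall, f-measure, accuracy, sensitivity, and specificity. The sketch below shows how these are conventionally computed from a binary confusion matrix; the class and function names and the counts in the usage line are hypothetical. For a single positive class, recall and sensitivity coincide; the abstract does not state how its figures were averaged, so its reported values need not follow this exact computation.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative
    fn: int  # positives predicted negative


def classification_metrics(c: ConfusionCounts) -> dict:
    """Standard single-positive-class metrics; assumes no zero denominators."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # identical to sensitivity in the binary case
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "sensitivity": recall,
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, for illustration only.
print(classification_metrics(ConfusionCounts(tp=8, fp=2, tn=6, fn=4)))
```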
Affiliation(s)
- Ahsanullah Yunas Mahmoud
- Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom
- Daniel Neagu
- Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom
- Daniele Scrimieri
- Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom

54
Munir N, McMorrow R, Mulrennan K, Whitaker D, McLoone S, Kellomäki M, Talvitie E, Lyyra I, McAfee M. Interpretable Machine Learning Methods for Monitoring Polymer Degradation in Extrusion of Polylactic Acid. Polymers (Basel) 2023; 15:3566. [PMID: 37688192 PMCID: PMC10489772 DOI: 10.3390/polym15173566] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/07/2023] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023]
Abstract
This work investigates real-time monitoring of extrusion-induced degradation in different grades of PLA across a range of process conditions and machine set-ups. Machine settings, together with in-process sensor data including temperature, pressure, and near-infrared (NIR) spectra, are used as inputs to predict the molecular weight and mechanical properties of the product. Many soft sensor approaches based on complex spectral data are essentially 'black-box' in nature, which can limit industrial acceptability. Hence, the focus here is on identifying an optimal approach to developing interpretable models while achieving high predictive accuracy and robustness across different process settings. The performance of a Recursive Feature Elimination (RFE) approach was compared with more common dimension reduction and regression approaches, including Partial Least Squares (PLS), iterative PLS (i-PLS), Principal Component Regression (PCR), ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and Random Forest (RF). It is shown that, for medical-grade PLA processed under moisture-controlled conditions, an RFE-RF algorithm can accurately predict molecular weight over a wide range of process conditions and different machine settings (different nozzle types for downstream fibre spinning). Similarly, for predicting yield stress, RFE-RF achieved excellent predictive performance, outperforming the other approaches in simplicity, interpretability, and accuracy. The features selected by the RFE model provide important insights into the process. Change in molecular weight was found not to be an important factor affecting the mechanical properties of the PLA, which are primarily related to the pressure and temperature in the later stages of the extrusion process.
The temperature at the extruder exit was also the most important predictor of molecular weight degradation, highlighting the importance of accurate melt temperature control in the process. As a soft sensor method, RFE not only outperforms more established methods but also has significant advantages in computational efficiency, simplicity, and interpretability. RFE-based soft sensors are promising for better quality control in processing thermally sensitive polymers such as PLA; in particular, this work demonstrates for the first time the ability to monitor molecular weight degradation during processing across various machine settings.
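RFE repeatedly fits a model, ranks the features, and discards the least important one. The loop can be sketched as below under a stated substitution: a linear least-squares model on standardized inputs stands in for the Random Forest used in RFE-RF, with absolute coefficient size as the importance score, so that the example needs no dependencies. The function names and toy data are hypothetical. In practice, a library implementation such as scikit-learn's `RFE` wrapped around a `RandomForestRegressor` (which exposes `feature_importances_`) would be the more usual choice.

```python
from statistics import fmean, pstdev


def standardize(columns):
    """Scale each feature column to zero mean and unit population variance."""
    return [[(v - fmean(c)) / pstdev(c) for v in c] for c in columns]


def least_squares(columns, y):
    """Coefficients of centred y on centred feature columns, via the normal equations."""
    k = len(columns)
    y_mean = fmean(y)
    yc = [v - y_mean for v in y]
    # Augmented matrix [X^T X | X^T y].
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(columns[i], yc))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (m[r][k] - sum(m[r][c] * beta[c] for c in range(r + 1, k))) / m[r][r]
    return beta


def recursive_feature_elimination(columns, y, n_keep):
    """Refit on the surviving features and drop the weakest one until n_keep remain."""
    std = standardize(columns)
    remaining = list(range(len(columns)))
    eliminated = []  # feature indices in the order they were removed
    while len(remaining) > n_keep:
        coefs = least_squares([std[i] for i in remaining], y)
        weakest = min(range(len(remaining)), key=lambda j: abs(coefs[j]))
        eliminated.append(remaining.pop(weakest))
    return remaining, eliminated


# Toy data: y depends strongly on x0, weakly on x1, and not at all on x2.
x0 = [1, 2, 3, 4, 5, 6]
x1 = [2, 1, 4, 3, 6, 5]
x2 = [1, 3, 2, 5, 4, 6]
y = [3 * a + 0.5 * b for a, b in zip(x0, x1)]
print(recursive_feature_elimination([x0, x1, x2], y, n_keep=1))  # ([0], [2, 1])
```

Refitting after every removal is what distinguishes RFE from a one-shot filter: the importance of each surviving feature is re-estimated once its competitors are gone.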
Affiliation(s)
- Nimra Munir
- Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Ross McMorrow
- Department of Mechatronic Engineering, Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Konrad Mulrennan
- Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Darren Whitaker
- Perceptive Engineering, An Applied Materials Company, Keckwick Lane, Daresbury WA4 4AB, UK
- Seán McLoone
- Centre for Intelligent Autonomous Manufacturing Systems, Queen’s University Belfast, Belfast BT7 1NN, UK
- Minna Kellomäki
- Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Elina Talvitie
- Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Inari Lyyra
- Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Marion McAfee
- Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland

55
Rothe F, Berger J, Welker P, Fiebelkorn R, Kupper S, Kiesel D, Gedat E, Ohrndorf S. Fluorescence optical imaging feature selection with machine learning for differential diagnosis of selected rheumatic diseases. Front Med (Lausanne) 2023; 10:1228833. [PMID: 37671403 PMCID: PMC10475553 DOI: 10.3389/fmed.2023.1228833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 07/28/2023] [Indexed: 09/07/2023]
Abstract
Background and objective Accurate and fast diagnosis of rheumatic diseases affecting the hands is essential for further treatment decisions. Fluorescence optical imaging (FOI) visualizes inflammation-induced impaired microcirculation through increased signal intensity, resulting in different image features. This analysis aimed to find specific FOI image features that might be important for accurately diagnosing different rheumatic diseases. Patients and methods FOI images of the hands of patients with different types of rheumatic disease, such as rheumatoid arthritis (RA), osteoarthritis (OA), and connective tissue diseases (CTD), were read for 20 different image features in each of three phases of the contrast agent dynamics, yielding 60 features per patient. The readings were analyzed for mutual differential diagnosis of the three diseases (One-vs-One) and for each disease against all other data (One-vs-Rest). In the first step, statistical tools and machine-learning-based methods were applied to rank feature importance, that is, to find the features that contribute most to model-based classification. In the second step, machine learning was applied with a stepwise increasing number of features, adding the most important remaining feature at each step, to extract the minimal subset that yields the highest diagnostic accuracy. Results In total, n = 605 FOI of both hands were analyzed (n = 235 with RA, n = 229 with OA, and n = 141 with CTD). All classification problems reached maximum accuracy with a reduced set of image features. For RA-vs.-OA, five features were needed for high accuracy; for RA-vs.-CTD, OA-vs.-CTD, RA-vs.-Rest, OA-vs.-Rest, and CTD-vs.-Rest, ten, sixteen, five, eleven, and fifteen features were needed, respectively. For all problems, the final importance ranking of the features with respect to the contrast agent dynamics was determined.
Conclusions With the presented investigations, the set of FOI features relevant to the differential diagnosis of the selected rheumatic diseases could be substantially reduced, providing helpful information for the physician.
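The second step described in this abstract, adding the most important remaining feature at each step and keeping the smallest subset with the highest accuracy, can be sketched as follows. This minimal illustration assumes the importance ranking from the first step is already available and that a caller-supplied `evaluate` function (hypothetical, for example the cross-validated accuracy of a classifier) scores each subset; the feature names and scores in the usage are invented.

```python
from typing import Callable, List, Sequence, Tuple


def stepwise_forward_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Add features in importance order and keep the shortest best-scoring prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    history: List[Tuple[int, float]] = []
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)  # e.g. cross-validated accuracy (assumed)
        history.append((k, score))
        if score > best_score:  # strict '>' so ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score, history


# Hypothetical ranking and scores: accuracy peaks at two features.
scores_by_size = {1: 0.70, 2: 0.85, 3: 0.85, 4: 0.80}
subset, score, _ = stepwise_forward_selection(
    ["f_a", "f_b", "f_c", "f_d"], lambda s: scores_by_size[len(s)]
)
print(subset, score)  # ['f_a', 'f_b'] 0.85
```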
Affiliation(s)
- Felix Rothe
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Pia Welker
- Institute of Functional Anatomy, Charité—Universitätsmedizin Berlin, Berlin, Germany
- Richard Fiebelkorn
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Stefan Kupper
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Denise Kiesel
- Institute of Functional Anatomy, Charité—Universitätsmedizin Berlin, Berlin, Germany
- Egbert Gedat
- Telematics Research Group, Wildau Technical University of Applied Sciences, Wildau, Germany
- Xiralite GmbH, Berlin, Germany
- Sarah Ohrndorf
- Department of Rheumatology and Clinical Immunology, Charité—Universitätsmedizin Berlin, Berlin, Germany

56
Al-Tashi Q, Saad MB, Sheshadri A, Wu CC, Chang JY, Al-Lazikani B, Gibbons C, Vokes NI, Zhang J, Lee JJ, Heymach JV, Jaffray D, Mirjalili S, Wu J. SwarmDeepSurv: swarm intelligence advances deep survival network for prognostic radiomics signatures in four solid cancers. Patterns (N Y) 2023; 4:100777. [PMID: 37602223 PMCID: PMC10435962 DOI: 10.1016/j.patter.2023.100777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 04/18/2023] [Accepted: 05/26/2023] [Indexed: 08/22/2023]
Abstract
Survival models are used to study relationships between biomarkers and treatment effects. Deep learning-powered survival models supersede the classical Cox proportional hazards (CoxPH) model, but their performance drops substantially on high-dimensional features because of irrelevant or redundant information. To fill this gap, we proposed SwarmDeepSurv, which integrates swarm intelligence algorithms with a deep survival model. Furthermore, four objective functions were designed to optimize prognostic prediction while regularizing the number of selected features. When tested on multicenter sets (n = 1,058) of four different cancer types, SwarmDeepSurv was less prone to overfitting and achieved optimal patient risk stratification compared with popular survival modeling algorithms. Strikingly, SwarmDeepSurv selected different features from classical feature selection algorithms, including the least absolute shrinkage and selection operator (LASSO), with nearly no feature overlap across these models. Taken together, SwarmDeepSurv offers an alternative approach to modeling relationships between radiomics features and survival endpoints, which can be further extended to other input data types, including genomics.
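The abstract does not give SwarmDeepSurv's four objective functions. As an assumed illustration only, a fitness of the general kind described, prognostic performance penalized by the number of selected features, might combine Harrell's concordance index with a feature-count penalty. The penalty form, the weight `alpha`, and the function names are hypothetical, and neither the swarm optimizer nor the deep survival network is shown.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ordered correctly by predicted risk.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. Higher risk should mean an earlier event; risk ties count 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def penalized_fitness(c_index: float, n_selected: int, n_total: int, alpha: float = 0.1) -> float:
    """Hypothetical objective: reward concordance, penalize large feature subsets."""
    return c_index - alpha * n_selected / n_total


# Toy cohort: the third subject is censored.
print(concordance_index([5, 10, 15], [1, 1, 0], [3.0, 2.0, 1.0]))  # 1.0
```

A swarm optimizer would then search over binary feature masks, training the survival model on each candidate subset and maximizing a fitness of this kind on validation data.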
Affiliation(s)
- Qasem Al-Tashi
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Maliazurina B. Saad
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Ajay Sheshadri
- Department of Pulmonary Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Carol C. Wu
- Department of Thoracic Imaging, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Joe Y. Chang
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Bissan Al-Lazikani
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Christopher Gibbons
- Section of Patient-Centered Analytics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Natalie I. Vokes
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jianjun Zhang
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- J. Jack Lee
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- John V. Heymach
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- David Jaffray
- Office of the Chief Technology and Digital Officer, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Seyedali Mirjalili
- Centre for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, QLD 4006, Australia
- Yonsei Frontier Lab, Yonsei University, Seoul 03722, Korea
- University Research and Innovation Center, Obuda University, 1034 Budapest, Hungary
- Jia Wu
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA

57
Wang H, Doumard E, Soule-Dupuy C, Kemoun P, Aligon J, Monsarrat P. Explanations as a New Metric for Feature Selection: A Systematic Approach. IEEE J Biomed Health Inform 2023; 27:4131-4142. [PMID: 37220033 DOI: 10.1109/jbhi.2023.3279340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
With the extensive use of Machine Learning (ML) in the biomedical field, there is an increasing need for Explainable Artificial Intelligence (XAI) to improve transparency and reveal complex hidden relationships between variables for medical practitioners, while meeting regulatory requirements. Feature Selection (FS) is widely used as part of biomedical ML pipelines to substantially reduce the number of variables while preserving as much information as possible. However, the choice of FS method affects the entire pipeline, including the final prediction explanations, yet very few works investigate the relationship between FS and model explanations. Through a systematic workflow performed on 145 datasets and an illustration on medical data, the present work demonstrated the promising complementarity of two explanation-based metrics (using ranking and influence changes), in addition to accuracy and retention rate, for selecting the most appropriate FS/ML models. Measures of how much explanations differ with and without FS are particularly promising for recommending FS methods. Although reliefF performs best on average, the optimal choice may vary from dataset to dataset. Positioning FS methods in a three-dimensional space that integrates explanation-based metrics, accuracy, and retention rate would allow users to choose the priority given to each dimension. In biomedical applications, where each medical condition may have its own preferences, this framework will make it possible to offer healthcare professionals the appropriate FS technique to select the variables with an important explainable impact, even at the expense of a limited drop in accuracy.
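The abstract names explanation-based metrics built on ranking and influence changes, alongside accuracy and retention rate, but does not define them. The sketch below uses an assumed formulation: retention rate as the fraction of variables kept, and ranking change as the Spearman correlation between feature-importance ranks computed before and after FS on the retained variables (1.0 means the explanation ranking is unchanged). The importance scores could come from any attribution method; the function names, feature names, and values are hypothetical.

```python
from typing import Dict, List, Sequence


def retention_rate(n_selected: int, n_total: int) -> float:
    """Fraction of the original variables kept by feature selection."""
    return n_selected / n_total


def _ranks(values: Sequence[float]) -> List[int]:
    """1-based ranks with the largest value first; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def ranking_agreement(
    importance_full: Dict[str, float],
    importance_fs: Dict[str, float],
    selected: Sequence[str],
) -> float:
    """Spearman correlation (no-ties formula) of importance ranks over retained features."""
    n = len(selected)
    before = _ranks([importance_full[f] for f in selected])
    after = _ranks([importance_fs[f] for f in selected])
    d_squared = sum((b - a) ** 2 for b, a in zip(before, after))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical importances from a model trained with and without FS.
full = {"age": 0.40, "bmi": 0.30, "bp": 0.20, "hr": 0.10}
after_fs = {"age": 0.50, "bmi": 0.20, "bp": 0.30}
kept = ["age", "bmi", "bp"]
print(retention_rate(len(kept), len(full)), ranking_agreement(full, after_fs, kept))  # 0.75 0.5
```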

58
Li M, Xu G, Chen Q, Xue T, Peng H, Wang Y, Shi H, Duan S, Feng F. Computed Tomography-based Radiomics Nomogram for the Preoperative Prediction of Tumor Deposits and Clinical Outcomes in Colon Cancer: a Multicenter Study. Acad Radiol 2023; 30:1572-1583. [PMID: 36566155 DOI: 10.1016/j.acra.2022.11.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/26/2022] [Revised: 10/16/2022] [Accepted: 11/07/2022] [Indexed: 12/24/2022]
Abstract
RATIONALE AND OBJECTIVES To develop and validate a computed tomography (CT)-based radiomics nomogram for the preoperative prediction of tumor deposits (TDs) and clinical outcomes in patients with colon cancer. MATERIALS AND METHODS This retrospective study included 383 consecutive patients with colon cancer from two centers. Radiomics features were extracted from portal venous phase CT images. Least absolute shrinkage and selection operator (LASSO) regression was applied for feature selection and radiomics signature construction. A multivariate logistic regression model was used to establish a radiomics nomogram. The performance of the nomogram was assessed using receiver operating characteristic curves, calibration curves, and decision curve analysis. Kaplan-Meier survival analysis was used to assess the difference in overall survival (OS) between the TDs-positive and TDs-negative groups. RESULTS The radiomics signature comprised 11 TD status-related features. The AUCs of the radiomics model in the training, internal validation, and external validation cohorts were 0.82, 0.78, and 0.78, respectively. The radiomics nomogram incorporating the radiomics signature and clinical independent predictors (CT-N, CEA, and CA199) showed good calibration and discrimination, with AUCs of 0.88, 0.80, and 0.81 in the training, internal validation, and external validation cohorts, respectively. The high-risk groups predicted by the radiomics nomogram had worse OS than the low-risk groups (p < 0.001). TD status predicted by the radiomics nomogram was an independent preoperative predictor of OS. CONCLUSION The radiomics nomogram based on CT radiomics features and clinical independent predictors could effectively predict the preoperative TD status and OS of patients with colon cancer. IMPORTANT FINDINGS A CT-based radiomics nomogram may be applied in the individual preoperative prediction of TD status in colon cancer.
Additionally, OS differed significantly between the high-risk and low-risk groups defined by the radiomics nomogram: patients with high-risk TDs had significantly worse OS than those with low-risk TDs.
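LASSO builds the radiomics signature by shrinking some coefficients exactly to zero. A minimal sketch of cyclic coordinate descent with soft-thresholding follows, assuming a squared-error loss on standardized features. The study's outcome is binary, and the abstract does not say whether a logistic LASSO or a cross-validated penalty `lam` was used, so neither is implied; the function names and toy data are hypothetical. Features with nonzero coefficients form the selected signature.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(
    columns: Sequence[Sequence[float]], y: Sequence[float], lam: float, n_iter: int = 200
) -> List[float]:
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 over standardized columns of X."""
    n = len(y)
    xs = [[(v - fmean(c)) / pstdev(c) for v in c] for c in columns]
    y_mean = fmean(y)
    residual = [v - y_mean for v in y]  # all coefficients start at zero
    beta = [0.0] * len(xs)
    for _ in range(n_iter):
        for j, x in enumerate(xs):
            # Correlation of feature j with the residual that excludes its own contribution.
            rho = sum(xi * (ri + xi * beta[j]) for xi, ri in zip(x, residual)) / n
            new_beta = soft_threshold(rho, lam)  # unit mean square, so no extra scaling
            if new_beta != beta[j]:
                residual = [ri - (new_beta - beta[j]) * xi for xi, ri in zip(x, residual)]
                beta[j] = new_beta
    return beta


# Orthogonal toy features; y = 3*x0 + 0.5*x1, and x2 is irrelevant.
x0 = [1, -1, 1, -1]
x1 = [1, 1, -1, -1]
x2 = [1, -1, -1, 1]
y = [3.5, -2.5, 2.5, -3.5]
beta = lasso_coordinate_descent([x0, x1, x2], y, lam=1.0)
print(beta)  # [2.0, 0.0, 0.0]
print([j for j, b in enumerate(beta) if b != 0.0])  # [0]
```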
Affiliation(s)
- Manman Li
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Guodong Xu
- Department of Radiology, Affiliated Hospital of Nantong University, Nantong, Jiangsu, PR China
- Qiaoling Chen
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Ting Xue
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Hui Peng
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Yuwei Wang
- Department of Record room, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China
- Hui Shi
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361
- Feng Feng
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu, PR China, 226361

59
Li M, Xu G, Zhou H, Chen Q, Fan Q, Shi J, Duan S, Cui Y, Feng F. Computed tomography-based radiomics nomogram for the pre-operative prediction of BRAF mutation and clinical outcomes in patients with colorectal cancer: a double-center study. Br J Radiol 2023; 96:20230019. [PMID: 37195006 PMCID: PMC10392655 DOI: 10.1259/bjr.20230019] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/05/2023] [Revised: 04/10/2023] [Accepted: 04/23/2023] [Indexed: 05/18/2023]
Abstract
OBJECTIVE To develop and validate a CT-based radiomics nomogram for the pre-operative prediction of BRAF mutation and clinical outcomes in patients with colorectal cancer (CRC). METHODS A total of 451 CRC patients (training cohort = 190; internal validation cohort = 125; external validation cohort = 136) from 2 centers were retrospectively included. Least absolute shrinkage and selection operator regression was used to select radiomics features, and the radiomics score (Radscore) was calculated. A nomogram was constructed by combining the Radscore and significant clinical predictors. Receiver operating characteristic curve analysis, calibration curves, and decision curve analysis were used to evaluate the predictive performance of the nomogram. Kaplan-Meier survival curves based on the radiomics nomogram were used to assess the overall survival (OS) of the entire cohort. RESULTS The Radscore consisted of the nine radiomics features most relevant to BRAF mutation. The radiomics nomogram integrating the Radscore and clinical independent predictors (age, tumor location, and cN stage) showed good calibration and discrimination, with AUCs of 0.86 (95% CI: 0.80-0.91), 0.82 (95% CI: 0.74-0.90), and 0.82 (95% CI: 0.75-0.90) in the training, internal validation, and external validation cohorts, respectively. Furthermore, the nomogram performed significantly better than the clinical model (p < 0.05). The BRAF mutation high-risk group predicted by the radiomics nomogram had worse OS than the low-risk group (p < 0.0001). CONCLUSION The radiomics nomogram showed good performance in predicting BRAF mutation and OS in CRC patients, which could provide valuable information for individualized treatment. ADVANCES IN KNOWLEDGE The radiomics nomogram could effectively predict BRAF mutation and OS in patients with CRC. The high-risk BRAF mutation group identified by the radiomics nomogram was independently associated with poor OS.
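A Radscore of this kind is commonly computed as a linear combination of the LASSO-selected features weighted by their coefficients, and a nomogram is a graphical form of a logistic model that combines that score with clinical predictors. The sketch below assumes exactly that structure; all function names, feature names, predictor encodings, coefficients, and intercepts are hypothetical, since the abstract does not report the fitted values.

```python
import math
from typing import Mapping


def radscore(features: Mapping[str, float], coefficients: Mapping[str, float], intercept: float) -> float:
    """Intercept plus the coefficient-weighted sum of the selected radiomics features."""
    return intercept + sum(w * features[name] for name, w in coefficients.items())


def nomogram_probability(predictors: Mapping[str, float], weights: Mapping[str, float], intercept: float) -> float:
    """Logistic model combining the Radscore with clinical predictors."""
    z = intercept + sum(w * predictors[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient, encodings, and coefficients, for illustration only.
score = radscore({"feat_1": 2.0, "feat_2": -1.0}, {"feat_1": 0.5, "feat_2": 0.25}, intercept=-0.1)
p = nomogram_probability(
    {"radscore": score, "age_over_60": 1.0, "cn_positive": 0.0},
    {"radscore": 1.2, "age_over_60": 0.4, "cn_positive": 0.7},
    intercept=-1.0,
)
print(round(score, 2), round(p, 3))  # 0.65 0.545
```

Patients can then be split into high- and low-risk groups by thresholding the predicted probability, which is the kind of grouping the Kaplan-Meier comparison above relies on.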
Affiliation(s)
- Guodong Xu
- Department of Radiology, Yancheng No. 1 People’s Hospital, Yancheng, Jiangsu Province, China
- Hui Zhou
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Qiaoling Chen
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Qi Fan
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Jian Shi
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China
- Yanfen Cui
- Department of Radiology, Shanxi Cancer Hospital, Shanxi, Shanxi Province, China
- Feng Feng
- Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, Jiangsu Province, China

60
Lee DY, Choi B, Kim C, Fridgeirsson E, Reps J, Kim M, Kim J, Jang JW, Rhee SY, Seo WW, Lee S, Son SJ, Park RW. Privacy-Preserving Federated Model Predicting Bipolar Transition in Patients With Depression: Prediction Model Development Study. J Med Internet Res 2023; 25:e46165. [PMID: 37471130 PMCID: PMC10401196 DOI: 10.2196/46165] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/01/2023] [Revised: 03/10/2023] [Accepted: 06/29/2023] [Indexed: 07/21/2023]
Abstract
BACKGROUND Mood disorders have emerged as a serious public health concern; in particular, bipolar disorder has a less favorable prognosis than depression. Although prompt recognition of the conversion from depression to bipolar disorder is needed, early prediction is challenging because of overlapping symptoms. Recently, there have been attempts to develop prediction models using federated learning, a method for training multi-institutional machine learning models in medical fields without sharing patient-level data. OBJECTIVE This study aims to develop and validate a federated, differentially private, multi-institutional bipolar transition prediction model. METHODS This retrospective study enrolled patients diagnosed with a first depressive episode at 5 tertiary hospitals in South Korea. We developed models for predicting bipolar transition using data from 17,631 patients at 4 institutions, and used data from 4541 patients at 1 institution for external validation. We created standardized pipelines to extract large-scale clinical features from the 4 institutions without any code modification. Moreover, we performed feature selection in a federated environment for computational efficiency and applied differential privacy to gradient updates. Finally, we compared the federated model with the 4 local models, each developed with one hospital's data, on the internal and external validation data sets. RESULTS In the internal data set, 279 of 17,631 patients showed bipolar disorder transition. In the external data set, 39 of 4541 patients showed bipolar disorder transition. The average performance of the federated model on the internal test (area under the curve [AUC] 0.726) and external validation (AUC 0.719) data sets was higher than that of the locally developed models (AUC 0.642-0.707 and AUC 0.642-0.699, respectively). In the federated model, classifications were driven by several predictors, such as the Charlson index (low scores were associated with bipolar transition, which may be due to younger age), severe depression, anxiolytics, young age, and visiting months (bipolar transition was associated with seasonality, especially in the spring and summer months). CONCLUSIONS We developed and validated a differentially private federated model using distributed multi-institutional psychiatric data with standardized pipelines in a real-world environment. The federated model performed better than models using local data only.
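The abstract states only that differential privacy was applied to gradient updates. One common pattern, assumed here, is to clip each institution's gradient to a maximum L2 norm and add Gaussian noise scaled to that norm before the server averages the updates. The function names and parameters below are hypothetical, and the sketch omits privacy accounting (computing the privacy budget), which a real deployment would require.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise with std = noise_multiplier * max_norm."""
    sigma = noise_multiplier * max_norm
    return [u + rng.gauss(0.0, sigma) for u in clip(update, max_norm)]


def federated_round(weights, site_gradients, lr, max_norm, noise_multiplier, seed=0):
    """One round: each site privatizes its own gradient; only noisy gradients reach the server."""
    rng = random.Random(seed)
    noisy = [privatize(g, max_norm, noise_multiplier, rng) for g in site_gradients]
    avg = [sum(col) / len(noisy) for col in zip(*noisy)]
    # Gradient-descent step on the shared model using the averaged noisy gradient
    return [w - lr * a for w, a in zip(weights, avg)]


# Four hypothetical institutions, each contributing a gradient for a 2-parameter model
grads = [[3.0, 4.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
new_weights = federated_round([0.0, 0.0], grads, lr=0.1, max_norm=1.0, noise_multiplier=1.1)
print(new_weights)
```

Setting `noise_multiplier=0.0` reduces the round to plain averaging of clipped gradients, which is a convenient way to check the aggregation logic before adding noise.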
Affiliation(s)
- Dong Yun Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Byungjin Choi
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Chungsoo Kim
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon-si, Republic of Korea
- Egill Fridgeirsson
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, Netherlands
- Jenna Reps
- Observational Health Data Analytics, Janssen Research and Development, Titusville, NJ, United States
- Myoungsuk Kim
- Data Solution Team, Evidnet Co, Ltd, Sungnam, Republic of Korea
- Jihyeong Kim
- Data Solution Team, Evidnet Co, Ltd, Sungnam, Republic of Korea
- Jae-Won Jang
- Department of Neurology, Kangwon National University Hospital, Kangwon National University School of Medicine, Chuncheon, Republic of Korea
- Sang Youl Rhee
- Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Seoul, Republic of Korea
- Department of Endocrinology and Metabolism, Kyung Hee University College of Medicine, Seoul, Republic of Korea
- Won-Woo Seo
- Department of Internal Medicine, Kangdong Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Republic of Korea
- Seunghoon Lee
- Department of Psychiatry, Myongji Hospital, Goyang, Republic of Korea
- Sang Joon Son
- Department of Psychiatry, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Rae Woong Park
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon-si, Republic of Korea
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon-si, Republic of Korea

61
Thomas E, Ali FB, Tolambiya A, Chambellant F, Gaveau J. Too much information is no information: how machine learning and feature selection could help in understanding the motor control of pointing. Front Big Data 2023; 6:921355. [PMID: 37546547 PMCID: PMC10399757 DOI: 10.3389/fdata.2023.921355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Accepted: 06/16/2023] [Indexed: 08/08/2023]
Abstract
The aim of this study was to develop the use of machine learning techniques as a means of multivariate analysis in studies of motor control. These studies generate a huge amount of data, the analysis of which remains largely univariate. We propose the use of machine learning classification and feature selection as a means of uncovering feature combinations that are altered between conditions. High-dimensional electromyogram (EMG) vectors were generated by recording several arm and trunk muscles while subjects pointed at various angles above and below the gravity-neutral horizontal plane. We used Linear Discriminant Analysis (LDA) to carry out binary classifications between the EMG vectors for pointing at a particular angle vs. pointing in the gravity-neutral direction. Classification success provided a composite index of muscular adjustments to various task constraints (in this case, pointing angles). To find the combination of features that was significantly altered between task conditions, we conducted a post-classification feature selection, i.e., we investigated which combination of features had allowed the classification. Feature selection was done by comparing the representations of each category created by LDA for the classification, in other words, by computing the difference between the representations of each class. We propose that this approach will help with comparing high-dimensional EMG patterns in two ways: (i) quantifying the effects of the entire pattern rather than using single, arbitrarily defined variables and (ii) identifying the parts of the patterns that convey the most information about the investigated effects.
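The post-classification step described above ranks features by how differently the two classes are represented. A simplified sketch, assuming each class is summarized by its mean EMG vector and the difference is scaled by the pooled within-class standard deviation (a standardized mean difference, not the authors' exact LDA-based procedure), could look like this; the muscle names and amplitudes are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def rank_by_class_difference(X_a, X_b, names):
    """Rank features by |mean_a - mean_b| / pooled within-class SD (largest first).

    X_a, X_b: lists of samples (one list of feature values per trial) for the two conditions.
    """
    n_a, n_b = len(X_a), len(X_b)
    scores = []
    for j, name in enumerate(names):
        a = [row[j] for row in X_a]
        b = [row[j] for row in X_b]
        pooled_sd = sqrt((n_a * pvariance(a) + n_b * pvariance(b)) / (n_a + n_b))
        diff = abs(mean(a) - mean(b))
        scores.append((name, diff / (pooled_sd + 1e-12)))  # epsilon guards against zero spread
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical EMG amplitudes: pointing at an angle vs. the gravity-neutral direction
names = ["anterior_deltoid", "biceps", "trapezius"]
X_angle = [[1.0, 0.5, 0.2], [1.2, 0.4, 0.3], [1.1, 0.6, 0.25]]
X_neutral = [[0.5, 0.5, 0.2], [0.6, 0.45, 0.3], [0.55, 0.55, 0.25]]
ranked = rank_by_class_difference(X_angle, X_neutral, names)
for name, value in ranked:
    print(f"{name}: {value:.2f}")
```

Scaling by the within-class spread keeps muscles with large but noisy activations from dominating the ranking purely because of their amplitude range.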
Affiliation(s)
- Elizabeth Thomas
- INSERM U1093, UFR STAPS, Université de Bourgogne Franche Comté, Dijon, France
- Ferid Ben Ali
- School of Engineering and Computer Science, University of Hertfordshire, Hatfield, United Kingdom
- Arvind Tolambiya
- Applied Intelligence Hub, Accenture Solutions Private Ltd., Hyderabad, Telangana, India
- Florian Chambellant
- INSERM U1093, UFR STAPS, Université de Bourgogne Franche Comté, Dijon, France
- Jérémie Gaveau
- INSERM U1093, UFR STAPS, Université de Bourgogne Franche Comté, Dijon, France

62
Kuzudisli C, Bakir-Gungor B, Bulut N, Qaqish B, Yousef M. Review of feature selection approaches based on grouping of features. PeerJ 2023; 11:e15666. [PMID: 37483989 PMCID: PMC10358338 DOI: 10.7717/peerj.15666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 06/08/2023] [Indexed: 07/25/2023]
Abstract
With the rapid development of technology, large amounts of high-dimensional data have been generated. This high dimensionality, including redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually, and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that is based on first grouping features and then scoring groups of features rather than individual features. Although reviews of clustering and FS algorithms exist, to the best of our knowledge this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features that are dissimilar to the other groups, and then select representative features from each group. Approaches under supervised, unsupervised, semi-supervised and integrative frameworks are explored. A comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. For biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide the effective design of new FS approaches using feature grouping.
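The grouping idea summarized above can be illustrated with a minimal sketch: features are greedily grouped by absolute Pearson correlation with each group's first ("seed") feature, and from each group the feature most correlated with the target is kept. The greedy seed-based grouping, the 0.9 threshold and correlation-based relevance are illustrative assumptions; the reviewed methods use a wide variety of clustering and scoring schemes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def group_features(columns, threshold=0.9):
    """Greedy grouping: a feature joins the first group whose seed it correlates with."""
    groups = []  # each group is a list of feature names; group[0] is its seed
    for name, values in columns.items():
        for group in groups:
            if abs(pearson(values, columns[group[0]])) >= threshold:
                group.append(name)
                break
        else:  # no sufficiently correlated group found: start a new one
            groups.append([name])
    return groups


def select_representatives(columns, target, threshold=0.9):
    """Keep, from each group, the feature most correlated with the target."""
    return [
        max(group, key=lambda name: abs(pearson(columns[name], target)))
        for group in group_features(columns, threshold)
    ]


# Hypothetical feature columns: f2 is nearly a rescaled copy of f1
columns = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 11],
    "f3": [5, 3, 4, 1, 2],
}
target = [1, 2, 3, 4, 5]
print(group_features(columns))                  # [['f1', 'f2'], ['f3']]
print(select_representatives(columns, target))  # ['f1', 'f3']
```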
Affiliation(s)
- Cihan Kuzudisli
- Department of Computer Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
- Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Nurten Bulut
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Bahjat Qaqish
- Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, Chapel Hill, United States of America
- Malik Yousef
- Department of Information Systems, Zefat Academic College, Zefat, Israel
- Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel

63
Chattopadhyay S, Singh PK, Ijaz MF, Kim S, Sarkar R. SnapEnsemFS: a snapshot ensembling-based deep feature selection model for colorectal cancer histological analysis. Sci Rep 2023; 13:9937. [PMID: 37336964 PMCID: PMC10279666 DOI: 10.1038/s41598-023-36921-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 06/12/2023] [Indexed: 06/21/2023]
Abstract
Colorectal cancer is the third most commonly diagnosed cancer each year and the second leading cause of cancer death. Early diagnosis of this disease is vital for preventing the tumours from spreading and for planning treatment that may eradicate the disease. However, population-wide screening is hampered by the need for medical professionals to analyse histological slides manually. Thus, this research proposes an automated computer-aided detection (CAD) framework based on deep learning that uses histological slide images for predictions. Ensemble learning is a popular strategy for fusing the salient properties of several models to make the final predictions. However, such frameworks are computationally costly since they require training multiple base learners. Instead, in this study, we adopt a snapshot ensemble method in which, rather than fusing decision scores from the snapshots of a Convolutional Neural Network (CNN) model as is traditionally done, we extract deep features from the penultimate layer of the CNN model. Since the deep features are extracted from the same CNN model under different learning environments, the feature set may contain redundancy. To alleviate this, the features are fed into Particle Swarm Optimization, a popular meta-heuristic, to reduce the dimensionality of the feature space and improve classification. When evaluated on a publicly available colorectal cancer histology dataset using five-fold cross-validation, the proposed method achieves its highest accuracy of 97.60% and an F1-score of 97.61%, outperforming existing state-of-the-art methods on the same dataset. Further, a qualitative investigation of class activation maps provides visual explainability to medical practitioners and justifies the use of the CAD framework in screening colorectal histology. Our source code is publicly accessible at https://github.com/soumitri2001/SnapEnsemFS.
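As a rough illustration of metaheuristic feature selection, the sketch below implements a basic binary PSO with a sigmoid transfer function that searches over 0/1 feature masks. It is not the SnapEnsemFS code: the inertia and acceleration constants are typical textbook values assumed here, and the toy fitness function (with hypothetical "useful" feature indices) stands in for the classifier accuracy on the selected deep features that such a pipeline would maximize.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, v_max=6.0, seed=0):
    """Maximize fitness(mask) over 0/1 feature masks with binary PSO (sigmoid transfer)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid of the velocity gives the probability that the bit is 1
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


USEFUL = {0, 2}  # hypothetical indices of informative features


def toy_fitness(mask):
    """Reward informative features and lightly penalize subset size."""
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=6)
print(best_mask, best_fit)
```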
Affiliation(s)
- Soumitri Chattopadhyay
- Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata, 700106, West Bengal, India
- Pawan Kumar Singh
- Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata, 700106, West Bengal, India
- Muhammad Fazal Ijaz
- Department of Mechanical Engineering, Faculty of Engineering and Information Technology, The University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
- SeongKi Kim
- National Centre of Excellence in Software, Sangmyung University, Seoul, 03016, Korea
- Ram Sarkar
- Department of Computer Science & Engineering, Jadavpur University, Kolkata, 700032, India

64
Tasci E, Jagasia S, Zhuge Y, Sproull M, Cooley Zgela T, Mackey M, Camphausen K, Krauze AV. RadWise: A Rank-Based Hybrid Feature Weighting and Selection Method for Proteomic Categorization of Chemoirradiation in Patients with Glioblastoma. Cancers (Basel) 2023; 15:2672. [PMID: 37345009 PMCID: PMC10216128 DOI: 10.3390/cancers15102672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 05/03/2023] [Accepted: 05/06/2023] [Indexed: 06/23/2023]
Abstract
Glioblastomas (GBM) are rapidly growing, aggressive, nearly uniformly fatal, and the most common type of primary brain cancer. They exhibit significant heterogeneity and resistance to treatment, limiting the ability to analyze the dynamic biological behavior that drives response and resistance, which is central to improving outcomes in glioblastoma. Analysis of the proteome aimed at signal change over time provides a potential opportunity for non-invasive classification and examination of the response to treatment by identifying protein biomarkers associated with interventions. However, data acquired using large proteomic panels must be made more intuitively interpretable, which requires computational analysis to identify trends. Machine learning is increasingly employed; however, it requires feature selection, which has a critical and considerable effect on machine learning problems applied to large-scale data, as it reduces the number of parameters, improves generalization, and identifies essential predictors. In this study, using 7k proteomic data generated from the analysis of serum obtained from 82 patients with GBM before and after completion of concurrent chemoirradiation (CRT), we aimed to select the most discriminative proteomic features that define the proteomic alteration resulting from administering CRT. Thus, we present a novel rank-based feature weighting method (RadWise) to identify relevant proteomic parameters using two popular feature selection methods, least absolute shrinkage and selection operator (LASSO) and minimum redundancy maximum relevance (mRMR). The computational results show that the proposed method yields outstanding results with very few selected proteomic features, achieving higher accuracy than methods that do not employ a feature selection process. While the computational method identified several proteomic signals identical to those identified by clinical intuition (the heuristic approach), several heuristically identified proteomic signals were not selected, and other novel proteomic biomarkers with biological prognostic relevance in GBM, which the heuristic approach did not select, emerged only with the novel method. Overall, the proposed method yields promising results, reducing 7k proteomic data to 8 selected proteomic features with a performance value of 96.364%, comparing favorably with techniques that do not employ feature selection.
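The abstract does not give RadWise's weighting formula. As a hedged illustration of rank-based combination only, the sketch below averages each feature's position in a LASSO ranking (for example, by absolute coefficient) and an mRMR ranking, giving a feature absent from one list that list's worst rank plus one. The function names and protein identifiers are hypothetical.

```python
def rank_positions(ordered):
    """Map each feature to its 1-based position in a best-first ordering."""
    return {feature: i + 1 for i, feature in enumerate(ordered)}


def combined_ranking(lasso_order, mrmr_order):
    """Order features by the mean of their LASSO and mRMR ranks (lower is better).

    A feature missing from one list receives that list's worst rank + 1.
    Ties are broken alphabetically so the output is deterministic.
    """
    r_lasso, r_mrmr = rank_positions(lasso_order), rank_positions(mrmr_order)

    def mean_rank(feature):
        return (r_lasso.get(feature, len(r_lasso) + 1)
                + r_mrmr.get(feature, len(r_mrmr) + 1)) / 2

    return sorted(set(r_lasso) | set(r_mrmr), key=lambda f: (mean_rank(f), f))


# Hypothetical rankings (best first) from the two selectors
lasso_order = ["P1", "P2", "P3"]  # e.g. by absolute LASSO coefficient
mrmr_order = ["P2", "P4", "P1"]   # e.g. mRMR selection order
print(combined_ranking(lasso_order, mrmr_order))  # ['P2', 'P1', 'P4', 'P3']
```

Keeping the top k entries of the combined list would then give a compact feature subset that both selectors support.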
Affiliation(s)
- Andra Valentina Krauze
- Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 10, Bethesda, MD 20892, USA; (E.T.); (S.J.); (Y.Z.); (M.S.); (T.C.Z.); (M.M.); (K.C.)

65
Oğur NB, Kotan M, Balta D, Yavuz BÇ, Oğur YS, Yuvacı HU, Yazıcı E. Detection of depression and anxiety in the perinatal period using Marine Predators Algorithm and kNN. Comput Biol Med 2023; 161:107003. [PMID: 37224599 DOI: 10.1016/j.compbiomed.2023.107003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 04/19/2023] [Accepted: 05/02/2023] [Indexed: 05/26/2023]
Abstract
Undiagnosed prenatal anxiety and depression can worsen and adversely affect both the mother and the infant. Although the diagnosis is made by specialist doctors, it is unclear which parameters are most informative. In medicine especially, it is crucial to diagnose disease with high accuracy. For this reason, in this study, a questionnaire study was first conducted on pregnant women and original real-world data were collected. Then, the Marine Predators Algorithm (MPA), a recent nature-inspired metaheuristic algorithm, was combined with K-Nearest Neighbors (kNN) to determine high-priority features in the collected data. As a result, the proposed method identified five of the 147 features as high priority, and these were approved by the doctors. In addition, the proposed method was compared with the Chi-square method, a filter-based feature selection method. With the proposed MPA- and kNN-based feature selection, classification produced more successful results in a shorter time, with a 98.11% success rate, and the model supports doctors at the diagnosis stage.
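In a wrapper approach such as MPA combined with kNN, the metaheuristic proposes candidate feature subsets and a kNN classifier scores each one. The sketch below shows only that scoring step, assuming leave-one-out kNN accuracy and a commonly used objective that trades accuracy against subset size through a hypothetical weight `alpha`; the MPA search loop itself is omitted, and the data are hypothetical.

```python
from collections import Counter


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out k-NN accuracy using only the features where mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance to every other sample, restricted to selected features
        dists = sorted(
            (sum((X[i][j] - X[t][j]) ** 2 for j in cols), y[t])
            for t in range(len(X)) if t != i
        )
        vote = Counter(label for _, label in dists[:k]).most_common(1)[0][0]
        correct += vote == y[i]
    return correct / len(X)


def wrapper_fitness(X, y, mask, alpha=0.99, k=3):
    """Weighted objective: mostly accuracy, with a small bonus for keeping fewer features."""
    kept_fraction = sum(mask) / len(mask)
    return alpha * knn_loo_accuracy(X, y, mask, k) + (1 - alpha) * (1 - kept_fraction)


# Hypothetical data: feature 0 separates the classes, feature 1 is noise
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [1.2, 2.0]]
y = [0, 0, 0, 1, 1, 1]
for mask in ([1, 0], [0, 1], [1, 1]):
    print(mask, round(wrapper_fitness(X, y, mask), 3))
```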
Affiliation(s)
- Nur Banu Oğur
- Sakarya University, Faculty of Computer and Information Sciences, Department of Computer Engineering, Sakarya, Turkey
- Muhammed Kotan
- Sakarya University, Faculty of Computer and Information Sciences, Department of Information Systems Engineering, Sakarya, Turkey
- Deniz Balta
- Sakarya University, Faculty of Computer and Information Sciences, Department of Software Engineering, Sakarya, Turkey
- Burcu Çarklı Yavuz
- Sakarya University, Faculty of Computer and Information Sciences, Department of Information Systems Engineering, Sakarya, Turkey
- Yavuz Selim Oğur
- Sakarya University, Faculty of Medicine, Department of Psychiatry, Sakarya, Turkey
- Hilal Uslu Yuvacı
- Sakarya University, Faculty of Medicine, Department of Obstetrics and Gynecology, Sakarya, Turkey
- Esra Yazıcı
- Sakarya University, Faculty of Medicine, Department of Psychiatry, Sakarya, Turkey

66
Seyedtabib M, Kamyari N. Predicting polypharmacy in half a million adults in the Iranian population: comparison of machine learning algorithms. BMC Med Inform Decis Mak 2023; 23:84. [PMID: 37147615 PMCID: PMC10161984 DOI: 10.1186/s12911-023-02177-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 04/21/2023] [Indexed: 05/07/2023]
Abstract
BACKGROUND Polypharmacy (PP) is increasingly common in Iran and contributes to the substantial burden of drug-related morbidity, increasing the potential for drug interactions and potentially inappropriate medications. Machine learning (ML) algorithms can be employed as an alternative solution for predicting PP. Therefore, our study aimed to compare several ML algorithms for predicting PP using health insurance claims data and to choose the best-performing algorithm as a predictive tool for decision-making. METHODS This population-based cross-sectional study was performed between April 2021 and March 2022. After feature selection, information about 550 thousand patients was obtained from the National Center for Health Insurance Research (NCHIR). Afterwards, several ML algorithms were trained to predict PP. Finally, to assess the models' performance, metrics derived from the confusion matrix were calculated. RESULTS The study sample comprised 554,133 adults with a median (IQR) age of 51 years (40-62), nested in 27 cities within the Khuzestan province of Iran. Most of the patients were female (62.5%), married (63.5%), and employed (83.2%) during the last year. The prevalence of PP in the whole population was about 36.0%. After feature selection, of the 23 features, the number of prescriptions, insurance coverage for prescription drugs, and hypertension were found to be the top three predictors. Experimental results showed that Random Forest (RF) performed better than the other ML algorithms, with recall, specificity, accuracy, precision and F1-score of 63.92%, 89.92%, 79.99%, 63.92% and 63.92%, respectively. CONCLUSION ML provides a reasonable level of accuracy in predicting polypharmacy. In terms of the performance criteria, prediction models based on ML, especially the RF algorithm, performed better than other methods for predicting PP in the Iranian population.
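The five metrics reported above all derive from the four confusion-matrix counts. The sketch below computes them from hypothetical counts (not the study's data). It also illustrates a useful identity: precision equals recall exactly when FP = FN (for TP > 0), and F1, their harmonic mean, then takes the same value.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)        # sensitivity, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (not from the study) with FP == FN
metrics = confusion_metrics(tp=64, fp=36, fn=36, tn=324)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```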
Affiliation(s)
- Maryam Seyedtabib
- Department of Biostatistics and Epidemiology, School of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Naser Kamyari
- Department of Biostatistics and Epidemiology, School of Health, Abadan University of Medical Sciences, Abadan, Iran

67
Wang J, Xu P, Ji X, Li M, Lu W. Feature Selection in Machine Learning for Perovskite Materials Design and Discovery. Materials (Basel) 2023; 16:3134. [PMID: 37109971 PMCID: PMC10146176 DOI: 10.3390/ma16083134] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/19/2023] [Revised: 04/11/2023] [Accepted: 04/13/2023] [Indexed: 06/19/2023]
Abstract
Perovskite materials have been among the most important research objects in materials science owing to their excellent photoelectric properties and correspondingly complex structures. Machine learning (ML) methods have played an important role in the design and discovery of perovskite materials, and feature selection, as a dimensionality reduction method, occupies a crucial position in the ML workflow. In this review, we introduce recent advances in the application of feature selection to perovskite materials. First, publication trends for ML in perovskite materials are analyzed, and the ML workflow for materials is summarized. Then, commonly used feature selection methods are briefly introduced, and applications of feature selection to inorganic perovskites, hybrid organic-inorganic perovskites (HOIPs), and double perovskites (DPs) are reviewed. Finally, we put forward some directions for the future development of feature selection in machine learning for perovskite material design.
Affiliation(s)
- Junya Wang
- Department of Mathematics, College of Sciences, Shanghai University, Shanghai 200444, China
- Pengcheng Xu
- Materials Genome Institute, Shanghai University, Shanghai 200444, China
- Xiaobo Ji
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
- Minjie Li
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
- Wencong Lu
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai 200444, China
- Zhejiang Laboratory, Hangzhou 311100, China
- Key Laboratory of Silicate Cultural Relics Conservation (Shanghai University), Ministry of Education, Shanghai 200444, China

68
Tao R, Yu X, Lu J, Wang Y, Lu W, Zhang Z, Li H, Zhou J. A deep learning nomogram of continuous glucose monitoring data for the risk prediction of diabetic retinopathy in type 2 diabetes. Phys Eng Sci Med 2023; 46:813-825. [PMID: 37041318 DOI: 10.1007/s13246-023-01254-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 03/27/2023] [Indexed: 04/13/2023]
Abstract
Continuous glucose monitoring (CGM) data analysis can provide a new perspective on factors related to diabetic retinopathy (DR). However, how to visualize CGM data and automatically predict the incidence of DR from CGM remains unresolved. Here, we explored the feasibility of using CGM profiles to predict DR in type 2 diabetes (T2D) with a deep learning approach. This study fused deep learning with a regularized nomogram to construct a novel deep learning nomogram from CGM profiles to identify patients at high risk of DR. Specifically, a deep learning network was employed to mine the nonlinear relationship between CGM profiles and DR. Moreover, a novel nomogram combining CGM deep factors with basic information was established to score the patients' DR risk. The dataset consists of 788 patients in two cohorts: 494 in the training cohort and 294 in the testing cohort. The area under the curve (AUC) values of our deep learning nomogram were 0.82 and 0.80 in the training and testing cohorts, respectively. By incorporating basic clinical factors, the deep learning nomogram achieved an AUC of 0.86 in the training cohort and 0.85 in the testing cohort. The calibration plot and decision curve showed that the deep learning nomogram has potential for clinical application. With further investigation, this method of analyzing CGM profiles could be extended to other diabetic complications.
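An AUC such as those reported above can be read as the probability that a randomly chosen patient with the outcome receives a higher risk score than a randomly chosen patient without it. A minimal pairwise (Mann-Whitney) computation, with ties counted as one half and hypothetical risk scores, is sketched below; it is quadratic in the number of patients, so a rank-based formula would be preferable for large cohorts.

```python
def auc(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) AUC: fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    total = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical nomogram risk scores for patients with and without DR
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs correctly ordered
```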
Affiliation(s)
- Rui Tao
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Xia Yu
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Jingyi Lu
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China
- Yaxin Wang
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China
- Wei Lu
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China
- Zhanhu Zhang
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Hongru Li
- College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning, China
- Jian Zhou
- Department of Endocrinology and Metabolism, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai Clinical Center for Diabetes, 600 Yishan Road, Shanghai, 200233, China

69
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in assessing the therapeutic response to drugs. However, most of the available datasets comprise a small number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. The methodology was validated in two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in the R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Affiliation(s)
- Olga Fajarda
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
- João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain
- Sara Duarte-Pereira
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal
- Raquel M Silva
- Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal

70
Daneshvar NHN, Masoudi-Sobhanzadeh Y, Omidi Y. A voting-based machine learning approach for classifying biological and clinical datasets. BMC Bioinformatics 2023; 24:140. [PMID: 37041456 PMCID: PMC10088226 DOI: 10.1186/s12859-023-05274-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 04/05/2023] [Indexed: 04/13/2023]
Abstract
BACKGROUND Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicality of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations, such as overfitting on a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study we introduced a machine learning framework consisting of two main steps. First, our previously proposed optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with prior methods. RESULTS The results demonstrated that the Trader algorithm could select a near-optimal subset of features at a significance level of p < 0.01 relative to the compared algorithms. Additionally, on the large datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean values of accuracy, precision, recall, specificity, and F-measure under fivefold cross-validation. CONCLUSION Based on the obtained results, a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers design practical diagnostic health care systems and offer effective treatment plans.
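The abstract does not detail the voting rule. A minimal sketch of hard (plurality) voting, assuming each base classifier outputs one label per sample and that ties go to the label predicted by the earliest classifier, is shown below; the predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting over base classifiers.

    predictions: one list of labels per classifier, all of the same length.
    Ties are resolved in favour of the label seen first, i.e. from the earliest classifier.
    """
    return [
        Counter(labels).most_common(1)[0][0]
        for labels in zip(*predictions)
    ]


# Hypothetical predictions from three base classifiers on four samples
preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
print(majority_vote(preds))  # [1, 0, 1, 1]
```

Using an odd number of base classifiers avoids ties altogether for binary labels.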
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
- Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Florida, 33328, USA.
71
Karras A, Karras C, Schizas N, Avlonitis M, Sioutas S. AutoML with Bayesian Optimizations for Big Data Management. INFORMATION 2023. [DOI: 10.3390/info14040223] [Indexed: 04/08/2023]
Abstract
The field of automated machine learning (AutoML) has gained significant attention in recent years due to its ability to automate the process of building and optimizing machine learning models. However, the increasing amount of big data being generated has presented new challenges for AutoML systems in terms of big data management. In this paper, we introduce Fabolas and learning curve extrapolation as two methods for accelerating hyperparameter optimization. Four methods for accelerating training are also presented: Bag of Little Bootstraps, k-means clustering for Support Vector Machines, subsample size selection for gradient descent, and subsampling for logistic regression. We also discuss the use of Markov Chain Monte Carlo (MCMC) methods and other stochastic optimization techniques to improve the efficiency of AutoML systems in managing big data. These methods enhance different facets of the training process, so they can be combined in diverse ways to gain further speedups. We review several promising combinations and provide a comprehensive overview of the current state of AutoML and its potential for managing big data in various industries. Furthermore, we highlight the importance of parallel computing and distributed systems for improving the scalability of AutoML systems when working with big data.
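Bag of Little Bootstraps (BLB), one of the training accelerations listed above, replaces full-size bootstrap resamples with resamples drawn from small subsets of size b = n**gamma, represented as multinomial counts over the subset points. The sketch below is a minimal illustration, assuming the quantity of interest is the standard error of a mean; the function names and default parameters (5 subsets, gamma = 0.6, 50 resamples) are hypothetical choices, not values from the paper:

```python
import random
import statistics
from typing import Callable, Sequence


def bag_of_little_bootstraps(
    data: Sequence[float],
    estimator: Callable[[list[float], list[int]], float],
    n_subsets: int = 5,
    gamma: float = 0.6,
    n_resamples: int = 50,
    seed: int = 0,
) -> float:
    """Bag of Little Bootstraps estimate of an estimator's standard error.

    estimator(points, counts) receives the b subset points plus multiplicities
    that sum to n, so it only has to process b distinct values per resample.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    b = min(n, max(2, int(n ** gamma)))  # little-subset size b = n ** gamma
    per_subset_se = []
    for _ in range(n_subsets):
        subset = rng.sample(list(data), b)  # b points, drawn without replacement
        estimates = []
        for _ in range(n_resamples):
            # Multinomial(n, uniform over b points): how often each point is drawn.
            counts = [0] * b
            for idx in rng.choices(range(b), k=n):
                counts[idx] += 1
            estimates.append(estimator(subset, counts))
        per_subset_se.append(statistics.stdev(estimates))  # quality within one subset
    return statistics.mean(per_subset_se)  # average the quality over subsets


def weighted_mean(points: list[float], counts: list[int]) -> float:
    """Mean of a resample given as distinct points with multiplicities."""
    return sum(p * c for p, c in zip(points, counts)) / sum(counts)


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
print(bag_of_little_bootstraps(sample, weighted_mean))
```

For 2,000 standard normal draws the estimate should land near the analytic standard error of the mean, 1/sqrt(2000), roughly 0.022, while each resample only touches about 95 distinct points.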
Affiliation(s)
- Aristeidis Karras
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
- Christos Karras
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
- Nikolaos Schizas
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
- Markos Avlonitis
- Department of Informatics, Ionian University, 49100 Kerkira, Greece
- Spyros Sioutas
- Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece
72
Lötsch J, Ultsch A. Recursive computed ABC (cABC) analysis as a precise method for reducing machine learning based feature sets to their minimum informative size. Sci Rep 2023; 13:5470. [PMID: 37016033 PMCID: PMC10073099 DOI: 10.1038/s41598-023-32396-9] [Received: 01/16/2023] [Accepted: 03/27/2023] [Indexed: 04/06/2023]
Abstract
Selecting the k best features is a common task in machine learning. Typically, a few features have high importance, but many have low importance (right-skewed distribution). This report proposes a numerically precise method to address this skewed feature importance distribution in order to reduce a feature set to the informative minimum of items. Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important items by partitioning a set of non-negative numerical items into subsets "A", "B", and "C" such that subset "A" contains the "few important" items based on specific properties of ABC curves defined by their relationship to Lorenz curves. In its recursive form, the cABC analysis can be applied again to subset "A". A generic image dataset and three biomedical datasets (lipidomics and two genomics datasets) with a large number of variables were used to perform the experiments. The experimental results show that the recursive cABC analysis limits the dimensions of the data projection to a minimum where the relevant information is still preserved and directs the feature selection in machine learning to the most important class-relevant information, including filtering feature sets for nonsense variables. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification in data not used for feature selection. cABC analysis, in its recursive variant, provides a computationally precise means of reducing information to a minimum. The minimum is the result of a computation of the number of k most relevant items, rather than a decision to select the k best items from a list. In addition, there are precise criteria for stopping the reduction process. The reduction to the most important features can improve the human understanding of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis" available at https://pypi.org/project/cABCanalysis/ .
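To make the ABC-curve idea concrete, the following Python sketch sorts non-negative importances in descending order, traces effort (fraction of items taken) against yield (cumulative fraction of the total), and cuts set "A" at the curve point nearest the ideal point (0, 1). This cut rule is a simplifying assumption, not the exact cABC decision rule and not the API of the cABCanalysis package; the recursive step is imitated by applying the same cut to set A again, and the importance values are hypothetical:

```python
import math
from typing import Sequence


def abc_set_a(values: Sequence[float]) -> list[int]:
    """Indices of set "A": items up to the ABC-curve point nearest (0, 1).

    Simplified rule (assumption): effort = k / n and yield = cumulative share
    of the total after taking the k largest items.
    """
    if any(v < 0 for v in values):
        raise ValueError("ABC analysis expects non-negative values")
    total = sum(values)
    if total == 0:
        return list(range(len(values)))
    # Descending order; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    n = len(order)
    best_k, best_dist, cum = 1, math.inf, 0.0
    for k, idx in enumerate(order, start=1):
        cum += values[idx]
        effort, share = k / n, cum / total
        dist = math.hypot(effort, 1.0 - share)  # distance to the ideal point (0, 1)
        if dist < best_dist:
            best_k, best_dist = k, dist
    return order[:best_k]


# Hypothetical right-skewed feature importances.
importances = [0.40, 0.25, 0.10, 0.05, 0.05, 0.04, 0.03, 0.03, 0.03, 0.02]
set_a = abc_set_a(importances)
print(set_a)  # [0, 1, 2]

# Recursive step: apply the same cut to the values inside set A.
set_a2 = [set_a[i] for i in abc_set_a([importances[i] for i in set_a])]
print(set_a2)  # [0]
```

The number of retained items falls out of the curve shape rather than being fixed in advance, which is the property the abstract contrasts with choosing "the k best" from a list.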
Affiliation(s)
- Jörn Lötsch
- Institute of Clinical Pharmacology, Goethe-University, Theodor-Stern-Kai 7, 60590, Frankfurt am Main, Germany.
- Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Theodor-Stern-Kai 7, 60596, Frankfurt am Main, Germany.
- Alfred Ultsch
- DataBionics Research Group, University of Marburg, Hans-Meerwein-Straße 22, 35032, Marburg, Germany
73
Zang Z, Xu Y, Lu L, Geng Y, Yang S, Li SZ. UDRN: Unified Dimensional Reduction Neural Network for feature selection and feature projection. Neural Netw 2023; 161:626-637. [PMID: 36827960 DOI: 10.1016/j.neunet.2023.02.018] [Received: 04/12/2022] [Revised: 11/22/2022] [Accepted: 02/11/2023] [Indexed: 02/17/2023]
Abstract
Dimensional reduction (DR) maps high-dimensional data into a lower-dimensional latent space by minimizing defined optimization objectives. The two independent branches of DR are feature selection (FS) and feature projection (FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). FP, on the other hand, combines all the input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity. Moreover, FS and FP have traditionally been treated as incompatible categories and have not been unified into a single framework. We therefore consider that the ideal DR approach combines FS and FP in a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. This paper proposes a unified framework named Unified Dimensional Reduction Network (UDRN) to integrate FS and FP in an end-to-end way. A novel network framework is designed to implement the FS and FP tasks separately using a stacked feature selection network and feature projection network. In addition, a stronger manifold assumption and a novel loss function are proposed; the loss function can leverage the priors of data augmentation to enhance the generalization ability of the proposed UDRN. Finally, comprehensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods (FS, FP, and FS&FP pipelines), especially in downstream tasks such as classification and visualization.
Affiliation(s)
- Zelin Zang
- Zhejiang University, Hangzhou, 310000, China; Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China.
- Yongjie Xu
- Zhejiang University, Hangzhou, 310000, China; Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China
- Linyan Lu
- China Telecom Corporation Limited, Hangzhou Branch, Hangzhou, 310000, China
- Yulan Geng
- Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China
- Senqiao Yang
- Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China
- Stan Z Li
- Westlake University, AI Lab, School of Engineering, Hangzhou, 310000, China; Westlake Institute for Advanced Study, Institute of Advanced Technology, Hangzhou, 310000, China.
74
Guhan Seshadri N, Agrawal S, Kumar Singh B, Geethanjali B, Mahesh V, Pachori RB. EEG based classification of children with learning disabilities using shallow and deep neural network. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104553] [Indexed: 12/31/2022]
75
Alhenawi E, Al-Sayyed R, Hudaib A, Mirjalili S. Improved intelligent water drop-based hybrid feature selection method for microarray data processing. Comput Biol Chem 2023; 103:107809. [PMID: 36696844 DOI: 10.1016/j.compbiolchem.2022.107809] [Received: 10/10/2021] [Revised: 12/13/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
Classifying microarray datasets, which usually contain many noisy genes that degrade the performance of classifiers and decrease the classification accuracy rate, is a competitive research topic. Feature selection (FS) is one of the most practical ways of finding an optimal subset of genes that increases classification accuracy for the diagnostic and prognostic prediction of cancer from microarray datasets. There is therefore a continuing need for more efficient FS methods that select only optimal or close-to-optimal subsets of features to improve classification performance. In this paper, we propose a hybrid FS method for microarray data processing that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper. IIWD adds one of three local search (LS) algorithms, Tabu search (TS), a Novel LS algorithm (NLSA), or Hill Climbing (HC), in each iteration of IWD, and uses a correlation coefficient filter as the heuristic undesirability (HUD) for next-node selection in the original IWD algorithm. The effect of adding the three LS algorithms to the proposed IIWD algorithm was evaluated by comparing the proposed ensemble filter-IIWD-based wrapper without any LS algorithm (PHFS-IWD) against its variants with TS, NLSA, or HC added (PHFS-IWDTS, PHFS-IWDNLSA, and PHFS-IWDHC, respectively). A Naïve Bayes (NB) classifier and five microarray datasets were used to evaluate and compare the proposed hybrid FS methods. The results show that using LS algorithms in each iteration of the IWD algorithm improves the F-score by an average of 5% compared with PHFS-IWD. PHFS-IWDNLSA improves the F-score by an average of 4.15% over PHFS-IWDTS and 5.67% over PHFS-IWDHC, while PHFS-IWDTS outperforms PHFS-IWDHC by an average of 1.6%.
On the other hand, compared with six of the most recent state-of-the-art FS methods, the proposed hybrid FS methods improve accuracy by an average of 8.92% on three of the five datasets and reduce the number of genes by 58.5% across all five datasets.
Affiliation(s)
- Esra'a Alhenawi
- Software Engineering Department, Al-Ahliyya Amman University, Amman, Jordan; King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Rizik Al-Sayyed
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Amjad Hudaib
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Seyedali Mirjalili
- Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, 4006 QLD, Australia; University Research and Innovation Center, Obuda University, Budapest, Hungary.
76
Chadaga K, Prabhu S, Bhat V, Sampathila N, Umakanth S, Chadaga R. A Decision Support System for Diagnosis of COVID-19 from Non-COVID-19 Influenza-like Illness Using Explainable Artificial Intelligence. Bioengineering (Basel) 2023; 10:439. [PMID: 37106626 PMCID: PMC10135993 DOI: 10.3390/bioengineering10040439] [Received: 02/08/2023] [Revised: 03/27/2023] [Accepted: 03/29/2023] [Indexed: 04/03/2023]
Abstract
The coronavirus pandemic emerged in early 2020 and turned out to be deadly, killing a vast number of people around the world. Fortunately, vaccines have been developed, and they appear effective in controlling the severe prognosis induced by the virus. The reverse transcription-polymerase chain reaction (RT-PCR) test is the current gold standard for diagnosing different infectious diseases, including COVID-19; however, it is not always accurate. It is therefore crucial to find an alternative diagnostic method that can support the results of the standard RT-PCR test. Hence, this study proposes a decision support system that uses machine learning and deep learning techniques to predict the COVID-19 diagnosis of a patient from clinical, demographic, and blood markers. The patient data used in this research were collected from two Manipal hospitals in India, and a custom-made, stacked, multi-level ensemble classifier was used to predict the COVID-19 diagnosis. Deep learning techniques such as deep neural networks (DNN) and one-dimensional convolutional networks (1D-CNN) were also utilized. Further, explainable artificial intelligence (XAI) techniques such as Shapley additive explanations (SHAP), ELI5, local interpretable model-agnostic explanations (LIME), and QLattice were used to make the models more precise and understandable. Among all of the algorithms, the multi-level stacked model obtained the highest accuracy, 96%. The precision, recall, F1-score, and AUC obtained were 94%, 95%, 94%, and 98%, respectively. The models can be used as a decision support system for the initial screening of coronavirus patients and can also help ease the existing burden on medical infrastructure.
Affiliation(s)
- Krishnaraj Chadaga
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Srikanth Prabhu
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Vivekananda Bhat
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Niranjana Sampathila
- Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
- Shashikiran Umakanth
- Department of Medicine, Dr. TMA Hospital, Manipal Academy of Higher Education, Manipal 576104, India
- Rajagopala Chadaga
- Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India
77
Verdicchio M, Brancato V, Cavaliere C, Isgrò F, Salvatore M, Aiello M. A pathomic approach for tumor-infiltrating lymphocytes classification on breast cancer digital pathology images. Heliyon 2023; 9:e14371. [PMID: 36950640 PMCID: PMC10025040 DOI: 10.1016/j.heliyon.2023.e14371] [Received: 02/20/2023] [Revised: 03/03/2023] [Accepted: 03/03/2023] [Indexed: 03/11/2023]
Abstract
Background and objectives The detection of tumor-infiltrating lymphocytes (TILs) could aid in the development of objective measures of the infiltration grade and can support decision-making in breast cancer (BC). However, manual quantification of TILs in BC histopathological whole slide images (WSI) is currently based on visual assessment, which makes it non-standardized, non-reproducible, and time-consuming for pathologists. In this work, a novel pathomic approach is proposed that applies high-throughput image feature extraction techniques to analyze the microscopic patterns in WSI. Pathomic features provide additional information about the underlying biological processes compared with visual interpretation of the WSI, and they yield more easily interpretable and explainable results than the Deep Learning-based methods most frequently investigated in the literature. Methods A dataset containing 1037 regions of interest with tissue compartments and TILs annotated on 195 TNBC and HER2+ BC hematoxylin and eosin (H&E)-stained WSI was used. After segmenting nuclei within tumor-associated stroma using a watershed-based approach, 71 pathomic features were extracted from each nucleus and reduced using a Spearman's correlation filter followed by a nonparametric Wilcoxon rank-sum test and the least absolute shrinkage and selection operator. The relevant features were used to classify each candidate nucleus as either TIL or non-TIL using 5 multivariable machine learning classification models trained with 5-fold cross-validation (1) without resampling, (2) with the synthetic minority over-sampling technique, and (3) with downsampling. The prediction performance of the models was assessed using ROC curves. Results 21 features were selected, most of them related to the well-known TIL properties of having a regular shape, clearer margins, high peak intensity, more homogeneous enhancement, and a different textural pattern than other cells.
The best performance was obtained by Random Forest, with a ROC AUC of 0.86 regardless of the resampling technique. Conclusions The presented approach holds promise for the classification of TILs in BC H&E-stained WSI and could support pathologists in a reliable, rapid, and interpretable clinical assessment of TILs in BC.
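The first two reduction steps in the Methods (a Spearman's correlation filter, then a Wilcoxon rank-sum test) can be sketched in plain Python as below. The thresholds (|rho| > 0.9 marks a redundant feature, p < 0.05 marks a class difference), the greedy keep-first order, the normal approximation without tie correction, and the toy columns are all assumptions for illustration; the abstract does not state them, and the LASSO step is omitted:

```python
import math
from typing import Sequence


def rankdata(x: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def ranksum_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    w = sum(rankdata(list(a) + list(b))[:n1])  # rank sum of group a
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(w - mean) / sd / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def select_features(columns: dict[str, list[float]], labels: list[int],
                    rho_max: float = 0.9, alpha: float = 0.05) -> list[str]:
    kept: list[str] = []
    for name, values in columns.items():
        # Step 1: skip a feature strongly rank-correlated with one already kept.
        if all(abs(spearman(values, columns[k])) <= rho_max for k in kept):
            kept.append(name)
    selected = []
    for name in kept:
        g0 = [v for v, y in zip(columns[name], labels) if y == 0]
        g1 = [v for v, y in zip(columns[name], labels) if y == 1]
        # Step 2: keep features whose values differ between the two classes.
        if ranksum_pvalue(g0, g1) < alpha:
            selected.append(name)
    return selected


labels = [0] * 10 + [1] * 10
columns = {
    "area": [float(v) for v in range(1, 21)],    # separates the classes
    "area_x2": [2.0 * v for v in range(1, 21)],  # redundant copy of area
    "noise": [5.0, 1.0, 4.0, 2.0, 3.0] * 4,      # same distribution in both classes
}
print(select_features(columns, labels))  # ['area']
```

Here "area_x2" is dropped by the correlation filter and "noise" by the rank-sum test, which mirrors the intent of filtering redundancy first and class relevance second.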
Affiliation(s)
- Carlo Cavaliere
- IRCCS SYNLAB SDN, Via E. Gianturco 113, Naples, 80143, Italy
- Francesco Isgrò
- Department of Electrical Engineering and Information Technologies, University of Naples Federico II, Claudio 21, Naples, 80125, Italy
- Marco Salvatore
- IRCCS SYNLAB SDN, Via E. Gianturco 113, Naples, 80143, Italy
- Marco Aiello
- IRCCS SYNLAB SDN, Via E. Gianturco 113, Naples, 80143, Italy
78
Tang X, Mo Z, Chang C, Qian X. Group-shrinkage feature selection with a spatial network for mining DNA methylation data. Comput Biol Med 2023; 154:106573. [PMID: 36706568 DOI: 10.1016/j.compbiomed.2023.106573] [Received: 11/09/2022] [Revised: 01/05/2023] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
Identifying disease-related biomarkers from high-dimensional DNA methylation data helps reduce early screening costs and infer pathogenic mechanisms. Good discovery results have been achieved with spatial correlation methods for methylation sites, group-based regularization, and network constraints. However, these methods still have key limitations: they cannot exclude isolated differential sites and only consider adjacent site ordering. We therefore propose a group-shrinkage feature selection algorithm that encourages the selection of clustered sites and discourages the selection of isolated differential sites. Specifically, a network-guided group-shrinkage strategy is developed to penalize weakly correlated isolated methylation sites through a network structure constraint. The spatial network is constructed from the spatial correlation information of DNA methylation sites, which accounts for the uneven site distribution. Simulations and applications demonstrated that the proposed method outperforms advanced regularization methods, especially in rejecting isolated methylation sites; this study thus provides an efficient and clinically valuable method for biomarker candidate discovery in DNA methylation data. Additionally, the proposed method exhibits enhanced reliability because it introduces biological prior knowledge into a regularization-based feature selection framework, and it could promote further research on integrating biological prior knowledge with classical feature selection methods, thus facilitating their clinical application. Our source code will be released at https://github.com/SJTUBME-QianLab/Group-shrinkage-Spatial-Network once this manuscript is accepted for publication.
Affiliation(s)
- Xinlu Tang
- Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- Zhanfeng Mo
- School of Computer Science and Engineering, Nanyang Technological University, Singapore.
- Cheng Chang
- Department of Nuclear Medicine, Shanghai Chest Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200030, China.
- Xiaohua Qian
- Medical Image and Health Informatics Lab, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
79
Sadeghian Z, Akbari E, Nematzadeh H, Motameni H. A review of feature selection methods based on meta-heuristic algorithms. J EXP THEOR ARTIF IN 2023. [DOI: 10.1080/0952813x.2023.2183267] [Indexed: 03/07/2023]
Affiliation(s)
- Zohre Sadeghian
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Ebrahim Akbari
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Hossein Nematzadeh
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Homayun Motameni
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
80
Devi RM, Premkumar M, Kiruthiga G, Sowmya R. IGJO: An Improved Golden Jackel Optimization Algorithm Using Local Escaping Operator for Feature Selection Problems. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11146-y] [Indexed: 02/16/2023]
81
Kang IA, Njimbouom SN, Kim JD. Optimal Feature Selection-Based Dental Caries Prediction Model Using Machine Learning for Decision Support System. Bioengineering (Basel) 2023; 10:bioengineering10020245. [PMID: 36829739 PMCID: PMC9952690 DOI: 10.3390/bioengineering10020245] [Received: 01/10/2023] [Revised: 02/07/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023]
Abstract
The high frequency of dental caries is a major public health concern worldwide. The condition is common, particularly in developing countries. Because there are no evident early-stage signs, dental caries frequently goes untreated. Meanwhile, early detection and timely clinical intervention are required to slow disease development. Machine learning (ML) models can benefit clinicians in the early detection of dental cavities through efficient and cost-effective computer-aided diagnoses. This study proposed a more effective method for diagnosing dental caries by integrating the GINI and mRMR algorithms with the GBDT classifier. Because just a few clinical test features are required for the diagnosis, this strategy could save time and money when screening for dental caries. The proposed method was compared to recently proposed dental procedures. Among these classifiers, the suggested GBDT trained with a reduced feature set achieved the best classification performance, with accuracy, F1-score, precision, and recall values of 95%, 93%, 99%, and 88%, respectively. Furthermore, the experimental results suggest that feature selection improved the performance of the various classifiers. The suggested method yielded a good predictive model for dental caries diagnosis, which might be used in more imbalanced medical datasets to identify disease more effectively.
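The abstract pairs a GINI-based ranking with mRMR (minimum redundancy, maximum relevance). The sketch below covers only the mRMR part: a minimal greedy mRMR in its difference form for discrete features, assuming mutual information as both the relevance and the redundancy measure. The paper's exact mRMR variant is not given, and the toy features u, u_noisy, and v are hypothetical:

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b))).
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features: dict[str, Sequence[Hashable]], target: Sequence[Hashable], k: int) -> list[str]:
    """Greedy mRMR, difference form: relevance minus mean redundancy with picks."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected: list[str] = []
    remaining = list(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


y = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "u": [0, 0, 0, 0, 1, 1, 1, 1],        # high bit of y
    "u_noisy": [0, 0, 0, 1, 1, 1, 1, 1],  # mostly a copy of u
    "v": [0, 1, 1, 1, 0, 1, 1, 1],        # weaker, but independent of u
}
print(mrmr(features, y, k=2))  # ['u', 'v']
```

Ranking by relevance alone would pick u and u_noisy; mRMR picks v second because it carries information about y that u does not, which is the kind of compact feature set the abstract credits with lowering screening cost.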
Affiliation(s)
- In-Ae Kang
- Department of Computer and Electronics Convergence Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
- Soualihou Ngnamsie Njimbouom
- Department of Computer and Electronics Convergence Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
- Jeong-Dong Kim
- Department of Computer and Electronics Convergence Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
- Department of Computer Science and Engineering, Sun Moon University, Asan-si 31460, Republic of Korea
- Genome-Based BioIT Convergence Institute, Sun Moon University, Asan-si 31460, Republic of Korea
82
Liu K, Chen Q, Huang GH. An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF. Genes (Basel) 2023; 14:421. [PMID: 36833348 PMCID: PMC9957060 DOI: 10.3390/genes14020421] [Received: 12/13/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 02/10/2023]
Abstract
Gene families, which are parts of a genome's information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses of the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method starts by obtaining gene families from the TreeFam database and determining the number of gene families within the feature matrix. Then NMF-ReliefF, a new feature selection algorithm that overcomes the inefficiencies of traditional methods, is used to select features from the gene feature matrix. Finally, a support vector machine is used to classify the acquired features. The results show that the framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also used four microarray gene datasets to evaluate the performance of the NMF-ReliefF algorithm. The outcomes show that the proposed method may strike a delicate balance between robustness and discrimination. Additionally, the classification performance of the proposed method is superior to that of state-of-the-art feature selection approaches.
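ReliefF, one half of NMF-ReliefF, scores a feature by how much it differs between a sample and its nearest neighbours from other classes, compared with its nearest neighbours from the same class. The sketch below is a plain-Python version of standard ReliefF (min-max scaled features, Manhattan distance, k nearest hits and misses, misses weighted by class priors); the NMF stage and the paper's settings are not reproduced, and the toy data are hypothetical:

```python
from typing import Sequence


def relieff(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 3) -> list[float]:
    """ReliefF weights using every sample as a probe (m = n).

    Features are min-max scaled so per-feature differences lie in [0, 1];
    neighbours are found with Manhattan distance on the scaled features.
    """
    n, d = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(b - a) or 1.0 for a, b in zip(lo, hi)]  # constant feature: avoid /0
    Z = [[(row[f] - lo[f]) / span[f] for f in range(d)] for row in X]
    prior = {c: sum(1 for label in y if label == c) / n for c in set(y)}
    w = [0.0] * d

    def dist(a: int, b: int) -> float:
        return sum(abs(p - q) for p, q in zip(Z[a], Z[b]))

    for i in range(n):
        for c in prior:
            # k nearest neighbours of sample i inside class c (i itself excluded);
            # divide by the actual count if class c has fewer than k candidates.
            others = [j for j in range(n) if y[j] == c and j != i]
            near = sorted(others, key=lambda j: dist(i, j))[:k]
            if not near:
                continue
            if c == y[i]:
                scale = -1.0 / (n * len(near))  # nearest hits lower the weight
            else:
                # nearest misses raise it, weighted by P(c) / (1 - P(class of i))
                scale = prior[c] / (1.0 - prior[y[i]]) / (n * len(near))
            for j in near:
                for f in range(d):
                    w[f] += scale * abs(Z[i][f] - Z[j][f])
    return w


X = [[0.10, 0.7], [0.20, 0.1], [0.15, 0.5], [0.90, 0.6], [0.80, 0.2], [0.85, 0.4]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=2)
print(weights)  # feature 0 (class signal) should outweigh feature 1 (noise)
```

Because ReliefF looks at local neighbourhoods rather than one feature at a time, it can reward features that matter only in combination, which is one reason it is often paired with a dimensionality-reduction step such as NMF.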
Affiliation(s)
- Kai Liu
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Qi Chen
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- Guo-Hua Huang
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
83
Zhang S, Mu W, Dong D, Wei J, Fang M, Shao L, Zhou Y, He B, Zhang S, Liu Z, Liu J, Tian J. The Applications of Artificial Intelligence in Digestive System Neoplasms: A Review. HEALTH DATA SCIENCE 2023; 3:0005. [PMID: 38487199 PMCID: PMC10877701 DOI: 10.34133/hds.0005] [Received: 09/05/2022] [Accepted: 12/05/2022] [Indexed: 03/17/2024]
Abstract
Importance Digestive system neoplasms (DSNs) are the leading cause of cancer-related mortality, with a 5-year survival rate of less than 20%. Subjective evaluation of medical images, including endoscopic images, whole slide images, computed tomography images, and magnetic resonance images, plays a vital role in the clinical practice of DSNs, but it has limited performance and increases the workload of radiologists and pathologists. The application of artificial intelligence (AI) in medical image analysis holds promise to augment the visual interpretation of medical images: it could not only automate the complicated evaluation process but also convert medical images into quantitative imaging features associated with tumor heterogeneity. Highlights We briefly introduce the methodology of AI for medical image analysis and then review its clinical applications, including clinical auxiliary diagnosis, assessment of treatment response, and prognosis prediction, in 4 typical DSNs: esophageal cancer, gastric cancer, colorectal cancer, and hepatocellular carcinoma. Conclusion AI technology has great potential to support clinical diagnosis and treatment decision-making for DSNs. Several technical issues must be overcome before it can be applied in the clinical practice of DSNs.
Affiliation(s)
- Shuaitong Zhang
- School of Engineering Medicine, Beihang University, Beijing, China
- Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China
- Wei Mu
- School of Engineering Medicine, Beihang University, Beijing, China
- Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China
- Di Dong
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Jingwei Wei
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Mengjie Fang
- School of Engineering Medicine, Beihang University, Beijing, China
- Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China
- Lizhi Shao
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Yu Zhou
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Bingxi He
- School of Engineering Medicine, Beihang University, Beijing, China
- Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China
- Song Zhang
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Zhenyu Liu
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Jianhua Liu
- Department of Oncology, Guangdong Provincial People's Hospital/Second Clinical Medical College of Southern Medical University/Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
- Jie Tian
- School of Engineering Medicine, Beihang University, Beijing, China
- Key Laboratory of Big Data-Based Precision Medicine, Beihang University, Ministry of Industry and Information Technology, Beijing, China
- CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, China
84
Three-stage multi-objective feature selection for distributed systems. Soft comput 2023. [DOI: 10.1007/s00500-023-07865-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]

85
A multi-objective evolutionary algorithm with decomposition and the information feedback for high-dimensional medical data. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]

86
Gögenur I. Introducing machine learning-based prediction models in the perioperative setting. Br J Surg 2023; 110:533-535. [PMID: 36680383 DOI: 10.1093/bjs/znac462] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/05/2022] [Accepted: 11/15/2022] [Indexed: 01/22/2023]
Affiliation(s)
- Ismail Gögenur
- Centre for Surgical Science, Department of Surgery, Zealand University Hospital, Koege, Denmark; Institute for Clinical Medicine, Copenhagen University, Copenhagen, Denmark

87
An in-depth and contrasting survey of meta-heuristic approaches with classical feature selection techniques specific to cervical cancer. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-022-01825-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]

88
Kleiman MJ, Ariko T, Galvin JE. Hierarchical Two-Stage Cost-Sensitive Clinical Decision Support System for Screening Prodromal Alzheimer's Disease and Related Dementias. J Alzheimers Dis 2023; 91:895-909. [PMID: 36502329 PMCID: PMC10515190 DOI: 10.3233/jad-220891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
BACKGROUND The detection of subtle cognitive impairment in a clinical setting is difficult. Because time is a key factor in small clinics and research sites, the brief cognitive assessments that are relied upon often misclassify patients with very mild impairment as normal. OBJECTIVE In this study, we seek to identify a parsimonious screening tool in one stage, followed by additional assessments in an optional second stage if additional specificity is desired, tested using a machine learning algorithm capable of being integrated into a clinical decision support system. METHODS The best primary stage incorporated measures of short-term memory, executive and visuospatial functioning, and self-reported memory and daily living questions, with a total time of 5 minutes. The best secondary stage incorporated a measure of neurobiology as well as additional cognitive assessment and brief informant report questionnaires, totaling 30 minutes including delayed recall. Combined performance was evaluated using 25 sets of models, trained on 1,181 ADNI participants and tested on 127 patients from a memory clinic. RESULTS The 5-minute primary stage was highly sensitive (96.5%) but lacked specificity (34.1%), with an AUC of 87.5% and diagnostic odds ratio of 14.3. The optional secondary stage increased specificity to 58.6%, resulting in an overall AUC of 89.7% using the best model combination of logistic regression and gradient-boosted machine. CONCLUSION The primary stage is brief and effective at screening, with the optional two-stage technique further increasing specificity. The hierarchical two-stage technique exhibited similar accuracy but with reduced costs compared to the more common single-stage paradigm.
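As a quick arithmetic check on the reported figures, the diagnostic odds ratio (DOR) can be computed from sensitivity and specificity using the standard formula DOR = [sens / (1 - sens)] / [(1 - spec) / spec]. The formula is textbook, not taken from the paper, and the function name below is illustrative; the inputs are the primary-stage values quoted in the abstract.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Return DOR = (TP/FN) / (FP/TN), expressed via sensitivity and specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    positive_odds = sensitivity / (1 - sensitivity)        # odds of a positive test among diseased subjects
    false_positive_odds = (1 - specificity) / specificity  # odds of a positive test among healthy subjects
    return positive_odds / false_positive_odds

# Primary-stage values reported in the abstract: sensitivity 96.5%, specificity 34.1%.
dor = diagnostic_odds_ratio(0.965, 0.341)
print(round(dor, 1))  # 14.3
```

The result, about 14.3, matches the diagnostic odds ratio reported for the 5-minute primary stage, so the three quoted figures are internally consistent.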
Affiliation(s)
- Michael J. Kleiman
- Department of Neurology, Comprehensive Center for Brain Health, University of Miami Miller School of Medicine, Boca Raton, FL, USA
- Taylor Ariko
- Department of Neurology, Evelyn F. McKnight Brain Institute, University of Miami Miller School of Medicine, Miami, FL, USA
- James E. Galvin
- Department of Neurology, Comprehensive Center for Brain Health, University of Miami Miller School of Medicine, Boca Raton, FL, USA

89
Xie W, Wang L, Yu K, Shi T, Li W. Improved multi-layer binary firefly algorithm for optimizing feature selection and classification of microarray data. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

90
Oyelade ON, Agushaka JO, Ezugwu AE. Evolutionary binary feature selection using adaptive ebola optimization search algorithm for high-dimensional datasets. PLoS One 2023; 18:e0282812. [PMID: 36930670 PMCID: PMC10022820 DOI: 10.1371/journal.pone.0282812] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/17/2022] [Accepted: 02/23/2023] [Indexed: 03/18/2023]
Abstract
Feature selection is a field of study that requires approximate algorithms to identify discriminative and optimally combined features. The suitability of the selected features is often evaluated using classifiers. These features are embedded in data increasingly generated from different sources such as social media, surveillance systems, network applications, and medical records. The high dimensionality of these datasets often degrades the quality of the selected feature combination. Binary optimization methods have been proposed in the literature to address this challenge. However, the underlying deficiencies of a single binary optimizer carry over to the quality of the selected features. Although hybrid methods have been proposed, most still suffer from design limitations inherited from the single methods they combine. To address this, we propose a novel hybrid binary optimization method capable of effectively selecting features from increasingly high-dimensional datasets. The approach uses a sub-population selective mechanism that dynamically assigns individuals to a 2-level optimization process. The level-1 method first mutates items in the population and then reassigns them to a level-2 optimizer. The selective mechanism determines which sub-population is assigned to the level-2 optimizer based on the exploration and exploitation phases of the level-1 optimizer. In addition, we designed nested transfer (NT) functions and investigated their influence on the level-1 optimizer. The binary Ebola optimization search algorithm (BEOSA) is applied for the level-1 mutation, while the simulated annealing (SA) and firefly (FFA) algorithms are investigated as level-2 optimizers. The outcomes are HBEOSA-SA and HBEOSA-FFA, which are then investigated with the NT functions, and their corresponding variants HBEOSA-SA-NT and HBEOSA-FFA-NT with no NT applied.
The hybrid methods were experimentally tested on high-dimensional datasets to address the challenge of feature selection. The methods were also compared on low-dimensional datasets to assess performance variability. Classification accuracies for large, medium, and small-scale datasets are 0.995 using HBEOSA-FFA, 0.967 using HBEOSA-FFA-NT, and 0.953 using HBEOSA-FFA, respectively. Fitness and cost values for large, medium, and small-scale datasets are 0.066 and 0.934 using HBEOSA-FFA, 0.068 and 0.932 using HBEOSA-FFA, and 0.222 and 0.970 using HBEOSA-SA-NT, respectively. The findings indicate that HBEOSA-SA, HBEOSA-FFA, HBEOSA-SA-NT, and HBEOSA-FFA-NT outperformed BEOSA.
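Binary metaheuristics such as BEOSA need a transfer function that maps a continuous search position to a 0/1 feature mask. The abstract does not define its nested transfer (NT) functions, so the sketch below assumes the common S-shaped (sigmoid) transfer function as a stand-in; it is not the authors' NT design, and the function names are hypothetical.

```python
import math
import random

def s_shaped_transfer(x: float) -> float:
    """Sigmoid transfer function T(x) = 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binarize_position(position: list[float], rng: random.Random) -> list[int]:
    """Convert a continuous position vector into a binary feature mask.

    Dimension i is set to 1 (feature selected) with probability T(x_i).
    """
    return [1 if rng.random() < s_shaped_transfer(x) else 0 for x in position]

rng = random.Random(42)
position = [-6.0, -0.5, 0.0, 0.8, 6.0]   # hypothetical continuous position of one individual
mask = binarize_position(position, rng)
selected = [i for i, bit in enumerate(mask) if bit == 1]
print(mask, selected)
```

Strongly negative coordinates almost never select their feature and strongly positive ones almost always do; values near zero behave like a coin flip, which is what lets the optimizer keep exploring.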
Affiliation(s)
- Olaide N. Oyelade
- Department of Computer Science, Faculty of Physical Sciences, Ahmadu Bello University, Zaria, Nigeria
- Jeffrey O. Agushaka
- Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa
- Absalom E. Ezugwu
- Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa

91
Zhao H, Cao J, Xie J, Liao WH, Lei Y, Cao H, Qu Q, Bowen C. Wearable sensors and features for diagnosis of neurodegenerative diseases: A systematic review. Digit Health 2023; 9:20552076231173569. [PMID: 37214662 PMCID: PMC10192816 DOI: 10.1177/20552076231173569] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/10/2023] [Accepted: 04/17/2023] [Indexed: 05/24/2023]
Abstract
Objective Neurodegenerative diseases affect millions of families around the world, and wearable sensors with corresponding data analysis can greatly support clinical diagnosis and health assessment. This systematic review aims to provide a comprehensive overview of existing research that uses wearable sensors and features for the diagnosis of neurodegenerative diseases. Methods A systematic review was conducted of studies published between 2015 and 2022 in major scientific databases such as Web of Science, Google Scholar, PubMed, and Scopus. The studies obtained were analyzed and organized according to the stages of the diagnostic process: wearable sensors, feature extraction, and feature selection. Results The search yielded 171 eligible studies, which were included in this overview. Wearable sensors such as force sensors, inertial sensors, electromyography, electroencephalography, acoustic sensors, optical fiber sensors, and global positioning systems were employed to monitor and diagnose neurodegenerative diseases. Various features, including physical, statistical, nonlinear, and network-based features, can be extracted from these wearable sensors, and the changes in these features associated with neurodegenerative diseases are illustrated. Moreover, different kinds of feature selection methods, such as filter, wrapper, and embedded methods, help to find distinctive indicators of the diseases and contribute to better diagnostic performance. Conclusions This systematic review enables a comprehensive understanding of wearable sensors and features for the diagnosis of neurodegenerative diseases.
Affiliation(s)
- Huan Zhao
- School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an, P.R. China
- Junyi Cao
- School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an, P.R. China
- Junxiao Xie
- School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an, P.R. China
- Wei-Hsin Liao
- Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
- Yaguo Lei
- School of Mechanical Engineering, Xi'an Jiaotong University, Xi'an, P.R. China
- Hongmei Cao
- Department of Neurology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, P.R. China
- Qiumin Qu
- Department of Neurology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, P.R. China
- Chris Bowen
- Department of Mechanical Engineering, University of Bath, Bath, UK

92
Neonatal Disease Prediction Using Machine Learning Techniques. JOURNAL OF HEALTHCARE ENGINEERING 2023; 2023:3567194. [PMID: 36875748 PMCID: PMC9981287 DOI: 10.1155/2023/3567194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 01/04/2023] [Accepted: 02/09/2023] [Indexed: 02/25/2023]
Abstract
Neonatal diseases are among the main causes of morbidity and a significant contributor to under-five mortality worldwide. Understanding of the pathophysiology of these diseases has increased, and different strategies have been implemented to minimize their burden. However, improvements in outcomes are not adequate. This limited success is due to several factors, including the similarity of symptoms, which can lead to misdiagnosis, and the inability to detect the diseases early enough for timely intervention. In resource-limited countries like Ethiopia, the challenge is more severe. One shortcoming is low access to diagnosis and treatment due to the shortage of neonatal health professionals. Because of the shortage of medical facilities, many neonatal health professionals are forced to determine the type of disease based only on interviews. From an interview alone, they may not have a complete picture of all the variables that contribute to neonatal disease. This can make the diagnosis inconclusive and may lead to misdiagnosis. Machine learning has great potential for early prediction if relevant historical data are available. We applied a stacking classification model to the following four main neonatal diseases: sepsis, birth asphyxia, necrotizing enterocolitis (NEC), and respiratory distress syndrome. These diseases account for 75% of neonatal deaths. The dataset was obtained from Asella Comprehensive Hospital and was collected between 2018 and 2021. The developed stacking model was compared with three related machine learning models: XGBoost (XGB), Random Forest (RF), and Support Vector Machine (SVM). The proposed stacking model outperformed the other models, with an accuracy of 97.04%. We believe that this will contribute to the early detection and accurate diagnosis of neonatal diseases, especially in resource-limited health facilities.

93
Mukherjee R, Kundu A, Mukherjee I, Gupta D, Tiwari P, Khanna A, Shorfuzzaman M. IoT-cloud based healthcare model for COVID-19 detection: an enhanced k-Nearest Neighbour classifier based approach. COMPUTING 2023; 105. [PMCID: PMC8085103 DOI: 10.1007/s00607-021-00951-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 05/08/2023]
Abstract
COVID-19 has had a severe impact worldwide. The pandemic caused many casualties within a very short span. In this situation, an IoT-cloud-based healthcare model is urgently needed to support better decision-making during the COVID-19 pandemic. In this paper, an attempt has been made to perform predictive analytics regarding the disease using a machine learning classifier. This research proposed an enhanced k-nearest neighbor (KNN) algorithm, eKNN, which does not choose the value of k randomly; instead, it determines k using a mathematical function of the dataset's sample size. eKNN was evaluated on 7 benchmark COVID-19 datasets of different sizes, gathered from standard data clouds of different countries (Brazil, Mexico, etc.). The enhanced KNN classifier appeared to perform significantly better than ordinary KNN. For the second research question, the enhanced KNN algorithm was augmented with feature selection using Ant Colony Optimization (ACO). Results indicated that the enhanced KNN classifier with the feature selection mechanism performed considerably better than enhanced KNN without feature selection. In summary, this paper proposes an improved KNN that attempts to find an optimal value of k and studies IoT-cloud-based COVID-19 detection.
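The abstract says eKNN derives k from the sample size but does not give the function. The sketch below is a plain KNN classifier that assumes the common square-root rule of thumb (k = round(sqrt(n)), bumped to an odd number to reduce voting ties) as a hypothetical placeholder; it is not the paper's formula, and the toy data are invented for illustration.

```python
import math
from collections import Counter

def k_from_sample_size(n_samples: int) -> int:
    """Hypothetical rule (not from the paper): k = round(sqrt(n)), bumped to the next odd number."""
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1

def knn_predict(train_X, train_y, query, k=None):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    if k is None:
        k = k_from_sample_size(len(train_X))  # k derived from the training-set size
    k = min(k, len(train_X))
    # Sort training points by Euclidean distance to the query, nearest first.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Toy two-class data (invented for illustration).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.0), (0.0, 0.3),
           (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1)]
train_y = ["negative"] * 5 + ["positive"] * 4

print(k_from_sample_size(len(train_X)))           # 3
print(knn_predict(train_X, train_y, (0.2, 0.2)))  # negative
print(knn_predict(train_X, train_y, (5.0, 5.0)))  # positive
```

Tying k to n means small datasets use few neighbours while larger ones smooth over more points; swapping in the paper's actual function would only require changing `k_from_sample_size`.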
Affiliation(s)
- Rajendrani Mukherjee
- Department of Computer Science and Engineering, University of Engineering and Management, Kolkata, India
- Aurghyadip Kundu
- Department of Computer Science and Engineering, Brainware University, Kolkata, India
- Indrajit Mukherjee
- Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, India
- Deepak Gupta
- Maharaja Agrasen Institute of Technology, Delhi, India
- Prayag Tiwari
- Department of Computer Science, Aalto University, Espoo, Finland
- Ashish Khanna
- Maharaja Agrasen Institute of Technology, Delhi, India
- Mohammad Shorfuzzaman
- Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944 Saudi Arabia

94
Moly A, Aksenov A, Martel F, Aksenova T. Online adaptive group-wise sparse Penalized Recursive Exponentially Weighted N-way Partial Least Square for epidural intracranial BCI. Front Hum Neurosci 2023; 17:1075666. [PMID: 36950147 PMCID: PMC10025377 DOI: 10.3389/fnhum.2023.1075666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 02/03/2023] [Indexed: 03/08/2023]
Abstract
Introduction Motor brain-computer interfaces (BCIs) create new communication pathways between the brain and external effectors for patients with severe motor impairments. Control of complex effectors such as robotic arms or exoskeletons is generally based on real-time decoding of high-resolution neural signals. However, high-dimensional and noisy brain signals pose challenges, such as limited generalization ability of the decoding model and increased computational demands. Methods Sparse decoders may offer a way to address these challenges. Sparsity-promoting penalization is a common approach to obtaining a sparse solution. BCI features are naturally structured and grouped according to spatial (electrodes), frequency, and temporal dimensions. Applying group-wise sparsity, where the coefficients of a group are set to zero simultaneously, has the potential to decrease computational time and memory usage, as well as simplify data transfer. Additionally, online closed-loop decoder adaptation (CLDA) is known to be an efficient procedure for BCI decoder training, as it takes neuronal feedback into account. In this study, we propose a new algorithm for online closed-loop training of group-wise sparse multilinear decoders using Lp-Penalized Recursive Exponentially Weighted N-way Partial Least Square (PREW-NPLS). Three types of sparsity-promoting penalization were explored using Lp with p = 0, 0.5, and 1. Results The algorithms were tested offline in a pseudo-online manner for features grouped by spatial dimension. A comparison study was conducted using an epidural ECoG dataset recorded from a tetraplegic individual during long-term BCI experiments for controlling a virtual avatar (left/right-hand 3D translation). The novel algorithms achieved decoding performance comparable to or better than conventional REW-NPLS while producing sparse models. The proposed algorithms are compatible with real-time CLDA.
Discussion The proposed algorithm demonstrated good performance while drastically reducing the computational load and memory consumption. However, the current study is limited to offline computation on data recorded from a single patient, with penalization restricted to the spatial domain only.
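Group-wise sparsity means that a whole group of coefficients, for example all decoder weights tied to one electrode, is set to zero at once. PREW-NPLS itself is not reproduced here. As an illustrative assumption, the sketch shows the standard group soft-thresholding (proximal) step for a group-norm penalty, one well-known way to get this all-or-nothing behaviour, loosely analogous to the p = 1 case; the electrode names, weights, and threshold are hypothetical.

```python
import math

def group_soft_threshold(groups: dict[str, list[float]], lam: float) -> dict[str, list[float]]:
    """Apply block soft-thresholding to each coefficient group.

    For a group vector w_g, the proximal operator of lam * ||w_g||_2 is
        max(0, 1 - lam / ||w_g||_2) * w_g,
    so any group whose Euclidean norm is <= lam is zeroed as a whole.
    """
    result = {}
    for name, coeffs in groups.items():
        norm = math.sqrt(sum(c * c for c in coeffs))
        scale = 0.0 if norm <= lam else 1.0 - lam / norm  # shrink surviving groups toward zero
        result[name] = [scale * c for c in coeffs]
    return result

# Hypothetical decoder weights grouped by electrode (spatial dimension).
weights = {
    "electrode_1": [0.9, -1.2, 0.4],
    "electrode_2": [0.05, -0.02, 0.03],
    "electrode_3": [0.3, 0.4, 0.0],
}
sparse = group_soft_threshold(weights, lam=0.2)
active = [name for name, coeffs in sparse.items() if any(c != 0.0 for c in coeffs)]
print(active)  # ['electrode_1', 'electrode_3']
```

Because electrode_2 is dropped entirely, its signals no longer need to be processed or transmitted, which is the source of the computational and memory savings the abstract describes.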
Affiliation(s)
- Alexandre Moly
- Université Grenoble Alpes, CEA, LETI, Clinatec, Grenoble, France
- Félix Martel
- Université Grenoble Alpes, CEA, LETI, Clinatec, Grenoble, France
- Tetiana Aksenova
- Université Grenoble Alpes, CEA, LETI, Clinatec, Grenoble, France
- *Correspondence: Tetiana Aksenova

95
Georgiadou E, Bougias H, Leandrou S, Stogiannos N. Radiomics for Alzheimer's Disease: Fundamental Principles and Clinical Applications. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2023; 1424:297-311. [PMID: 37486507 DOI: 10.1007/978-3-031-31982-2_34] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2023]
Abstract
Alzheimer's disease is a neurodegenerative disease with a huge impact on people's quality of life, life expectancy, and morbidity. The ongoing prevalence of the disease, in conjunction with an increased financial burden on healthcare services, necessitates the development of new technologies for this field. Hence, advanced computational methods have been developed to facilitate early and accurate diagnosis of the disease and improve health outcomes. Artificial intelligence is now deeply involved in the fight against this disease, with many clinical applications in medical imaging. Deep learning approaches have been tested in this domain, while radiomics, an emerging quantitative method, is already being evaluated for use in various medical imaging modalities. This chapter aims to provide insight into the fundamental principles behind radiomics, discuss the most common techniques alongside their strengths and weaknesses, and suggest ways forward for future research on standardization and reproducibility.
Affiliation(s)
- Eleni Georgiadou
- Department of Radiology, Metaxa Anticancer Hospital, Piraeus, Greece
- Haralabos Bougias
- Department of Clinical Radiology, University Hospital of Ioannina, Ioannina, Greece
- Stephanos Leandrou
- Department of Health Sciences, School of Sciences, European University Cyprus, Engomi, Cyprus
- Nikolaos Stogiannos
- Discipline of Medical Imaging and Radiation Therapy, University College Cork, Cork, Ireland.
- Division of Midwifery & Radiography, City, University of London, London, UK.
- Medical Imaging Department, Corfu General Hospital, Corfu, Greece.

96
Puyana-Romero V, Díaz-Márquez AM, Ciaburro G, Hernández-Molina R. The Acoustic Environment and University Students' Satisfaction with the Online Education Method during the COVID-19 Lockdown. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 20:709. [PMID: 36613032 PMCID: PMC9819076 DOI: 10.3390/ijerph20010709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 12/20/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
The acoustic environment has been pointed out as a possible distractor during student activities in online academic settings; however, it has not been specifically studied, nor has it been studied in relation to parameters frequently used in academic-quality evaluations. The objective of this study is to characterize the acoustic environment and relate it to students' satisfaction with the online learning modality. To that end, three artificial neural networks were trained, using students' satisfaction and noise interference with autonomous and synchronous activities as target variables and acoustic variables as predictors. The data were obtained during the COVID-19 lockdown through an online survey addressed to students of the Universidad de Las Américas (Quito, Ecuador). Results show that noise interference with reading comprehension or with taking exams, and the frequency of noises that made students lose track of the lesson, were relevant factors for students' satisfaction. Perceived loudness also had a remarkable influence on engaging in autonomous and synchronous activities. The performance of the models for students' satisfaction and for noise interference with autonomous and synchronous activities was satisfactory, given that they were built only with acoustic variables, with correlation coefficients of 0.567, 0.853, and 0.865, respectively.
Affiliation(s)
- Virginia Puyana-Romero
- Department of Sound and Acoustic Engineering, Universidad de Las Américas, Quito EC170125, Ecuador
- Laboratory of Acoustic Engineering, Universidad de Cádiz, 11510 Puerto Real, Spain
- Giuseppe Ciaburro
- Department of Architecture and Industrial Design, Università degli Studi della Campania Luigi Vanvitelli, Borgo San Lorenzo, 81031 Aversa, Italy

97
Noshad A, Fallahi S. A new hybrid framework based on deep neural networks and JAYA optimization algorithm for feature selection using SVM applied to classification of acute lymphoblastic Leukaemia. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2022.2157748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2022]
Affiliation(s)
- Ali Noshad
- Department of Engineering, Polytechnic University of Milan, Milan, Italy
- Saeed Fallahi
- Department of Mathematics, Salman Farsi University of Kazerun, Kazerun, Iran

98
Silva Rocha ED, de Morais Melo FL, de Mello MEF, Figueiroa B, Sampaio V, Endo PT. On usage of artificial intelligence for predicting mortality during and post-pregnancy: a systematic review of literature. BMC Med Inform Decis Mak 2022; 22:334. [PMID: 36536413 PMCID: PMC9764498 DOI: 10.1186/s12911-022-02082-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/13/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND Care during pregnancy, childbirth, and the puerperium is fundamental to avoiding pathologies for the mother and her baby. However, health issues can occur during this period, leading to misfortunes such as the death of the fetus or neonate. Predictive models of fetal and infant deaths are important technological tools that can help reduce mortality indexes. The main goal of this work is to present a systematic review of literature focused on computational models to predict mortality, covering stillbirth, perinatal, neonatal, and infant deaths, highlighting their methodology and the description of the proposed computational models. METHODS We conducted a systematic review of literature, limiting the search to publications from the last 10 years and using the five main scientific databases as sources. RESULTS Of 671 works, 18 were selected as primary studies for further analysis. We found that most works focus on predicting neonatal deaths using machine learning models (more specifically, Random Forest). The five most common features used to train models are birth weight, gestational age, sex of the child, Apgar score, and mother's age. Predictive models for preventing mortality during and after pregnancy can not only improve the mother's quality of life but also serve as a powerful, low-cost tool to decrease mortality ratios. CONCLUSION Based on the results of this systematic review of literature (SRL), we can state that scientific efforts have been made in this area, but there are many open research opportunities for the community to develop.
Affiliation(s)
- Elisson da Silva Rocha
- Programa de Pós-Graduação em Engenharia da Computação, Universidade de Pernambuco, Recife, Brazil
- Flavio Leandro de Morais Melo
- Programa de Pós-Graduação em Engenharia da Computação, Universidade de Pernambuco, Recife, Brazil
- Barbara Figueiroa
- Programa Mãe Coruja Pernambucana, Secretaria de Saúde do Estado de Pernambuco, Recife, Brazil
- Patricia Takako Endo
- Programa de Pós-Graduação em Engenharia da Computação, Universidade de Pernambuco, Recife, Brazil

99
Chiu SI, Fan LY, Lin CH, Chen TF, Lim WS, Jang JSR, Chiu MJ. Machine Learning-Based Classification of Subjective Cognitive Decline, Mild Cognitive Impairment, and Alzheimer's Dementia Using Neuroimage and Plasma Biomarkers. ACS Chem Neurosci 2022; 13:3263-3270. [PMID: 36378559 DOI: 10.1021/acschemneuro.2c00255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Alzheimer's disease (AD) progresses relentlessly from the preclinical to the dementia stage. The process begins decades before the diagnosis of dementia. Therefore, it is crucial to detect early manifestations to prevent cognitive decline. Recent advances in artificial intelligence help tackle the complex high-dimensional data encountered in clinical decision-making. In total, we recruited 206 subjects, including 69 cognitively unimpaired, 40 with subjective cognitive decline (SCD), 34 with mild cognitive impairment (MCI), and 63 with AD dementia (ADD). We included 3 demographic, 1 clinical, 18 brain-image, and 3 plasma biomarker (Aβ1-42, Aβ1-40, and tau protein) features. We employed linear discriminant analysis for feature extraction to make the data more distinguishable after dimension reduction. Sequential forward selection was used to identify the 12 best features for the machine learning classifiers. We used both random forest and support vector machine classifiers. The area under the receiver operating characteristic curve (AUROC) was close to 0.9 between the diseased group (combining ADD and MCI) and the controls. AUROC was higher than 0.85 between SCD and controls, 0.90 between MCI and SCD, and above 0.85 between ADD and MCI. With the help of machine learning algorithms, we can differentiate between adjacent phases of the AD spectrum using blood biomarkers and brain MR images.
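Sequential forward selection is a greedy wrapper: start from an empty set and, at each step, add the single feature that most improves a score, stopping when the target subset size is reached (12 in this study). The sketch below is a generic version that assumes a caller-supplied scoring function standing in for classifier performance; the toy score, its redundancy penalty, and the feature names are hypothetical, and the study's LDA and classifier pipeline is not reproduced.

```python
from typing import Callable, Sequence

def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[list[str]], float],
    n_select: int,
) -> list[str]:
    """Greedily add the feature that maximizes score(subset) until n_select features are chosen."""
    selected: list[str] = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        # Evaluate every candidate together with the features already chosen.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy score: fixed per-feature usefulness, penalized when two redundant features are both chosen.
usefulness = {"plasma_tau": 0.30, "abeta_42": 0.25, "abeta_40": 0.24, "age": 0.05}

def toy_score(subset: list[str]) -> float:
    total = sum(usefulness[f] for f in subset)
    if "abeta_42" in subset and "abeta_40" in subset:
        total -= 0.2  # hypothetical redundancy penalty
    return total

print(sequential_forward_selection(list(usefulness), toy_score, n_select=3))
# ['plasma_tau', 'abeta_42', 'age']
```

The third pick shows why SFS scores subsets rather than single features: `abeta_40` is individually stronger than `age`, but it adds little once `abeta_42` is already selected.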
Affiliation(s)
- Shu-I Chiu
- Department of Computer Science, National Chengchi University, Taipei 116302, Taiwan
- Ling-Yun Fan
- Queensland Brain Institute, University of Queensland, St Lucia, QLD 4067, Australia; Departments of Neurology, National Taiwan University Hospital Bei-Hu Branch, Taipei 108206, Taiwan
- Chin-Hsien Lin
- Department of Neurology, College of Medicine, National Taiwan University Hospital, National Taiwan University, Taipei 100225, Taiwan
- Ta-Fu Chen
- Department of Neurology, College of Medicine, National Taiwan University Hospital, National Taiwan University, Taipei 100225, Taiwan
- Wee Shin Lim
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106319, Taiwan
- Jyh-Shing Roger Jang
- Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106319, Taiwan
- Ming-Jang Chiu
- Department of Neurology, College of Medicine, National Taiwan University Hospital, National Taiwan University, Taipei 100225, Taiwan; Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei 106319, Taiwan; Graduate Institute of Brain and Mind Sciences, National Taiwan University, Taipei 100233, Taiwan; Graduate Institute of Psychology, National Taiwan University, Taipei 106319, Taiwan

100
Elaziz MA, Ewees AA, Al-qaness MAA, Alshathri S, Ibrahim RA. Feature Selection for High Dimensional Datasets Based on Quantum-Based Dwarf Mongoose Optimization. MATHEMATICS 2022; 10:4565. [DOI: 10.3390/math10234565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Feature selection (FS) methods play essential roles in different machine learning applications. Several FS methods have been developed; among them, FS methods that depend on metaheuristic (MH) algorithms have shown impressive performance in various domains. Thus, in this paper, based on recent advances in MH algorithms, we introduce a new FS technique that improves the performance of the Dwarf Mongoose Optimization (DMO) algorithm using quantum-based optimization (QBO). The main idea is to use QBO as a local search for the traditional DMO to avoid its search limitations. The developed method, named DMOAQ, thus benefits from the advantages of both DMO and QBO. It is tested on well-known benchmark and high-dimensional datasets, with comprehensive comparisons to several optimization methods, including the original DMO. The evaluation outcomes verify that DMOAQ significantly enhances the search capability of the traditional DMO and outperforms the other compared methods in the evaluation experiments.