1
Zhou T, Guan Y, Lin X, Zhou X, Mao L, Ma Y, Fan B, Li J, Tu W, Liu S, Fan L. A clinical-radiomics nomogram based on automated segmentation of chest CT to discriminate PRISm and COPD patients. Eur J Radiol Open 2024; 13:100580. [PMID: 38989052 PMCID: PMC11233899 DOI: 10.1016/j.ejro.2024.100580] [Received: 04/22/2024] [Revised: 05/31/2024] [Accepted: 06/11/2024] [Indexed: 07/12/2024] Open
Abstract
Purpose: Noninvasive approaches with high accuracy are needed to discriminate the preserved ratio impaired spirometry (PRISm) group from the chronic obstructive pulmonary disease (COPD) group. Radiomics has emerged as an image analysis technique. This study aims to develop and validate a new radiomics-based noninvasive approach to discriminate these two groups. Methods: A total of 1066 subjects from 4 centers were included in this retrospective study and assigned to training, internal validation, or external validation sets. Chest computed tomography (CT) images were segmented by a fully automated deep learning segmentation algorithm (Unet231) for radiomics feature extraction. We established the radiomics signature (Rad-score) using the least absolute shrinkage and selection operator algorithm, then conducted ten-fold cross-validation using the training set. Last, we constructed a radiomics nomogram by incorporating independent risk factors using a multivariate logistic regression model. Model performance was evaluated by receiver operating characteristic (ROC) curve, calibration curve, and decision curve analyses (DCA). Results: The Rad-score, comprising 15 radiomic features from the whole-lung region (a region suited to diffuse lung diseases), was effective for discriminating between PRISm and COPD. Diagnostic accuracy improved when the Rad-score was integrated with a clinical model: the areas under the ROC curve (AUC) were 0.82 (95% CI 0.79-0.86), 0.77 (95% CI 0.72-0.83), and 0.841 (95% CI 0.78-0.91) for the training, internal validation, and external validation sets, respectively. Calibration and decision curve analyses showed that the radiomics nomogram had good fit and superior clinical utility. Conclusions: The present work constructed a new radiomics-based nomogram and verified its reliability for discriminating between PRISm and COPD.
Affiliation(s)
- TaoHu Zhou
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
  - School of Medical Imaging, Shandong Second Medical University, Weifang, Shandong 261053, China
- Yu Guan
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
- XiaoQing Lin
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
  - College of Health Sciences and Engineering, University of Shanghai for Science and Technology, No.516 Jungong Road, Shanghai 200093, China
- XiuXiu Zhou
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
- Liang Mao
  - Department of Medical Imaging, Affiliated Hospital of Ji Ning Medical University, Ji Ning 272000, China
- YanQing Ma
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, ZJ, China
- Bing Fan
  - Department of Radiology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Jie Li
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
  - College of Health Sciences and Engineering, University of Shanghai for Science and Technology, No.516 Jungong Road, Shanghai 200093, China
- WenTing Tu
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
- ShiYuan Liu
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
- Li Fan
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai 200003, China
2
Kanchanapiboon P, Tunksook P, Tunksook P, Ritthipravat P, Boonpratham S, Satravaha Y, Chaweewannakorn C, Peanchitlertkajorn S. Classification of cervical vertebral maturation stages with machine learning models: leveraging datasets with high inter- and intra-observer agreement. Prog Orthod 2024; 25:35. [PMID: 39279025 PMCID: PMC11402886 DOI: 10.1186/s40510-024-00535-1] [Received: 04/15/2024] [Accepted: 07/22/2024] [Indexed: 09/18/2024] Open
Abstract
OBJECTIVES: This study aimed to assess the accuracy of machine learning (ML) models with a feature selection technique in classifying cervical vertebral maturation stages (CVMS). Consensus-based datasets were used for model training, and the resulting models were evaluated for their generalization capabilities on unseen datasets. METHODS: Three clinicians independently rated CVMS on 1380 lateral cephalograms, resulting in five datasets: two consensus-based datasets (Complete Agreement and Majority Voting) and three datasets based on a single rater's evaluations. Additionally, landmark annotations of the second to fourth cervical vertebrae and patients' information underwent a feature selection process. These datasets were used to train various ML models and identify the top-performing model for each dataset. These models were subsequently tested on their generalization capabilities. RESULTS: Features considered significant in the consensus-based datasets were consistent with a CVMS guideline. The Support Vector Machine model on the Complete Agreement dataset achieved the highest accuracy (77.4%), followed by the Multi-Layer Perceptron model on the Majority Voting dataset (69.6%). Models from individual ratings showed lower accuracies (60.4-67.9%). The consensus-based training models also exhibited a lower coefficient of variation (CV), indicating superior generalization capability compared to models from single raters. CONCLUSION: ML models trained on consensus-based datasets for CVMS classification exhibited the highest accuracy, with significant features consistent with the original CVMS guidelines. These models also showed robust generalization capabilities, underscoring the importance of dataset quality.
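The coefficient of variation used above as a generalization measure can be illustrated with a minimal sketch. It assumes CV is computed as the population standard deviation divided by the mean of a model's accuracies across unseen test sets; the abstract does not say which standard deviation was used, and the accuracy values below are hypothetical, not taken from the study.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the coefficient of variation: population std dev / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m


# Hypothetical accuracies of two models on several unseen test sets.
consensus_model = [0.70, 0.72, 0.71, 0.69]
single_rater_model = [0.60, 0.68, 0.55, 0.66]

cv_consensus = coefficient_of_variation(consensus_model)
cv_single = coefficient_of_variation(single_rater_model)

# A lower CV means accuracy varies less, relative to its mean, across test sets.
print(f"consensus CV = {cv_consensus:.3f}, single-rater CV = {cv_single:.3f}")
```

Because CV is scale-free, it allows the spread of two models to be compared even when their mean accuracies differ.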
Affiliation(s)
- Potjanee Kanchanapiboon
  - Division of Nuclear Medicine, Department of Radiology, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wang Lang Rd, Siriraj, Bangkok Noi, Bangkok, 10700, Thailand
- Pitipat Tunksook
  - Department of Orthodontics, Faculty of Dentistry, Mahidol University, 6 Yothi Rd, Thung Phaya Thai, Ratchathewi, Bangkok, 10400, Thailand
- Panrasee Ritthipravat
  - Department of Biomedical Engineering, Faculty of Engineering, Mahidol University, 999 Phutthamonthon 4 Rd, Salaya, Nakhon Pathom, 73170, Thailand
- Supatchai Boonpratham
  - Department of Orthodontics, Faculty of Dentistry, Mahidol University, 6 Yothi Rd, Thung Phaya Thai, Ratchathewi, Bangkok, 10400, Thailand
- Yodhathai Satravaha
  - Department of Orthodontics, Faculty of Dentistry, Mahidol University, 6 Yothi Rd, Thung Phaya Thai, Ratchathewi, Bangkok, 10400, Thailand
- Chaiyapol Chaweewannakorn
  - Department of Orthodontics, Faculty of Dentistry, Mahidol University, 6 Yothi Rd, Thung Phaya Thai, Ratchathewi, Bangkok, 10400, Thailand
- Supakit Peanchitlertkajorn
  - Department of Orthodontics, Faculty of Dentistry, Mahidol University, 6 Yothi Rd, Thung Phaya Thai, Ratchathewi, Bangkok, 10400, Thailand
3
Meng Q, Chen B, Xu Y, Zhang Q, Ding R, Ma Z, Jin Z, Gao S, Qu F. A machine learning model for early candidemia prediction in the intensive care unit: Clinical application. PLoS One 2024; 19:e0309748. [PMID: 39250466 PMCID: PMC11383240 DOI: 10.1371/journal.pone.0309748] [Received: 05/30/2024] [Accepted: 08/17/2024] [Indexed: 09/11/2024] Open
Abstract
Candidemia often poses a diagnostic challenge due to the lack of specific clinical features, and delayed antifungal therapy can significantly increase mortality rates, particularly in the intensive care unit (ICU). This study aims to develop a machine learning predictive model for early candidemia diagnosis in ICU patients, leveraging their clinical information and findings. We conducted this study with a cohort of 334 patients admitted to the ICU at Jining No. 1 People's Hospital in China from Jan. 2015 to Dec. 2022. To ensure the model's reliability, we validated it with an external group of 77 patients from other sources. The candidemia to bacteremia ratio was 1:1. We collected relevant clinical procedures and eighteen key examination or test features as input to the recursive feature elimination (RFE) algorithm. These features included total bilirubin, age, platelet count, hemoglobin, CVC, lymphocyte, duration of stay in the ICU, and so on. To construct the candidemia diagnosis model, we employed the random forest (RF) algorithm alongside other machine learning methods and conducted internal and external validation, with training and testing sets allocated in a 7:3 ratio. The RF model demonstrated the highest area under the receiver operating characteristic curve (AUC), with values of 0.87 and 0.83 for internal and external validation, respectively. To evaluate the importance of features in predicting candidemia, Shapley additive explanation (SHAP) values were calculated, and the results revealed that total bilirubin and age were the most important factors in the prediction model. This advancement in candidemia prediction holds significant promise for early intervention and improved patient outcomes in the ICU setting, where timely diagnosis is of paramount importance.
Affiliation(s)
- Qiang Meng
  - Jining No. 1 People's Hospital Affiliated to Shandong First Medical University, Jining, Shandong, China
- Bowang Chen
  - Jining No. 1 People's Hospital Affiliated to Shandong First Medical University, Jining, Shandong, China
- Yingyuan Xu
  - Pulmonary and Critical Care Medicine, Tengzhou Central People's Hospital, Tengzhou City, Shandong Province, People's Republic of China
- Qiang Zhang
  - Pulmonary and Critical Care Medicine, Tengzhou Central People's Hospital, Tengzhou City, Shandong Province, People's Republic of China
- Ranran Ding
  - Jining No. 1 People's Hospital Affiliated to Shandong First Medical University, Jining, Shandong, China
- Zhen Ma
  - Jining No. 1 People's Hospital Affiliated to Shandong First Medical University, Jining, Shandong, China
- Zhi Jin
  - Jining No. 1 People's Hospital Affiliated to Shandong First Medical University, Jining, Shandong, China
- Shuhong Gao
  - Jining No. 1 People's Hospital Affiliated to Shandong First Medical University, Jining, Shandong, China
- Feng Qu
  - Jining No. 1 People's Hospital Affiliated to Shandong First Medical University, Jining, Shandong, China
4
Attallah O. Skin cancer classification leveraging multi-directional compact convolutional neural network ensembles and gabor wavelets. Sci Rep 2024; 14:20637. [PMID: 39232043 PMCID: PMC11375051 DOI: 10.1038/s41598-024-69954-8] [Received: 06/20/2024] [Accepted: 08/12/2024] [Indexed: 09/06/2024] Open
Abstract
Skin cancer (SC) is an important medical condition that necessitates prompt identification to ensure timely treatment. Although visual evaluation by dermatologists is considered the most reliable method, it is subjective and laborious. Deep learning-based computer-aided diagnostic (CAD) platforms have become valuable tools for supporting dermatologists. Nevertheless, current CAD tools frequently depend on Convolutional Neural Networks (CNNs) with large numbers of deep layers and hyperparameters, rely on a single CNN model, use a large feature space, and exclusively utilise spatial image information, which restricts their effectiveness. This study presents SCaLiNG, a CAD tool specifically developed to address these constraints. SCaLiNG leverages a collection of three compact CNNs and Gabor Wavelets (GW) to acquire a comprehensive feature vector consisting of spatial-textural-frequency attributes. SCaLiNG gathers a wide range of image details by breaking the images down into multiple directional sub-bands using GW, and then training several CNNs on those sub-bands and the original image. SCaLiNG also combines attributes taken from the various CNNs trained with the original images and the sub-bands derived from GW. This fusion process improves diagnostic accuracy due to the thorough representation of attributes. Furthermore, SCaLiNG applies a feature selection approach which further enhances the model's performance by choosing the most distinguishing features. Experimental findings indicate that SCaLiNG achieves a classification accuracy of 0.9170 in categorising SC subcategories, surpassing conventional single-CNN models. The performance of SCaLiNG underlines its ability to aid dermatologists in swiftly and precisely recognising and classifying SC, thereby enhancing patient outcomes.
Affiliation(s)
- Omneya Attallah
  - Department of Electronics and Communications Engineering, College of Engineering and Technology, Arab Academy for Science, Technology and Maritime Transport, Alexandria, 21937, Egypt
  - Wearables, Biosensing, and Biosignal Processing Laboratory, Arab Academy for Science, Technology, and Maritime Transport, Alexandria, 21937, Egypt
5
Zhou T, Guan Y, Lin X, Zhou X, Mao L, Ma Y, Fan B, Li J, Liu S, Fan L. CT-based whole lung radiomics nomogram for identification of PRISm from non-COPD subjects. Respir Res 2024; 25:329. [PMID: 39227894 PMCID: PMC11373438 DOI: 10.1186/s12931-024-02964-2] [Received: 03/11/2024] [Accepted: 08/28/2024] [Indexed: 09/05/2024] Open
Abstract
BACKGROUND: Preserved Ratio Impaired Spirometry (PRISm) is considered to be a precursor of chronic obstructive pulmonary disease (COPD). A radiomics nomogram can effectively identify PRISm subjects among non-COPD subjects, especially during large-scale CT lung cancer screening. METHODS: A total of 1481 participants (864, 370 and 247 in the training, internal validation, and external validation cohorts, respectively) were included. The whole lung on thin-section computed tomography (CT) was segmented with a fully automated segmentation algorithm. PyRadiomics was adopted for extracting radiomics features. Clinical features were also obtained. Spearman correlation analysis, minimum redundancy maximum relevance (mRMR) feature ranking, and the least absolute shrinkage and selection operator (LASSO) classifier were then used to select radiomics features and build the radiomics signature. A nomogram that incorporated clinical features and the radiomics signature was constructed through multivariable logistic regression. Last, calibration, discrimination and clinical usefulness were analyzed using the validation cohorts. RESULTS: The radiomics signature, which included 14 stable features, was related to PRISm in the training and validation cohorts (p < 0.001). The radiomics nomogram incorporating independent predicting factors (radiomics signature, age, BMI, and gender) discriminated PRISm from non-COPD subjects better than the clinical model or the radiomics signature alone in the training cohort (AUC 0.787 vs. 0.675 vs. 0.778), the internal validation cohort (AUC 0.773 vs. 0.682 vs. 0.767) and the external validation cohort (AUC 0.702 vs. 0.610 vs. 0.699). Decision curve analysis suggested that the radiomics nomogram outperformed the clinical model. CONCLUSIONS: The CT-based whole lung radiomics nomogram could identify PRISm to help clinical decision-making.
Affiliation(s)
- TaoHu Zhou
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, China
  - School of Medical Imaging, Shandong Second Medical University, Weifang, 261053, Shandong, China
- Yu Guan
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, China
- XiaoQing Lin
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, China
  - College of Health Sciences and Engineering, University of Shanghai for Science and Technology, No.516 Jungong Road, Shanghai, 200093, China
- XiuXiu Zhou
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, China
- Liang Mao
  - Department of Medical Imaging, Affiliated Hospital of Ji Ning Medical University, Ji Ning, 272000, China
- YanQing Ma
  - Department of Radiology, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, ZJ, China
- Bing Fan
  - Department of Radiology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang Medical College, Nanchang, China
- Jie Li
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, China
  - College of Health Sciences and Engineering, University of Shanghai for Science and Technology, No.516 Jungong Road, Shanghai, 200093, China
- ShiYuan Liu
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, China
- Li Fan
  - Department of Radiology, Second Affiliated Hospital of Naval Medical University, No. 415 Fengyang Road, Shanghai, 200003, China
6
Elkahwagy DMAS, Kiriacos CJ, Mansour M. Logistic regression and other statistical tools in diagnostic biomarker studies. Clin Transl Oncol 2024; 26:2172-2180. [PMID: 38530558 PMCID: PMC11333519 DOI: 10.1007/s12094-024-03413-8] [Received: 12/20/2023] [Accepted: 02/16/2024] [Indexed: 03/28/2024]
Abstract
A biomarker is a measured indicator of a variety of processes, and is often used as a clinical tool for the diagnosis of diseases. While the developmental process of biomarkers from lab to clinic is complex, initial exploratory stages often focus on characterizing the potential of biomarkers through utilizing various statistical methods that can be used to assess their discriminatory performance, establish an appropriate cut-off that transforms continuous data to apt binary responses of confirming or excluding a diagnosis, or establish a robust association when tested against confounders. This review aims to provide a gentle introduction to the most common tools found in diagnostic biomarker studies used to assess the performance of biomarkers with an emphasis on logistic regression.
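Two of the tasks named above, assessing discriminatory performance and choosing a cut-off, can be sketched in a few lines. This is only an illustration under stated assumptions: AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, and the cut-off is chosen by Youden's J statistic, a common convention that the abstract itself does not name. The function names and the toy biomarker values are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its score is >= cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # keep the lowest cutoff among ties
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical biomarker values; label 1 = disease present, 0 = absent.
biomarker = [0.1, 0.4, 0.35, 0.8]
disease = [0, 0, 1, 1]

print(auc(biomarker, disease))
print(youden_cutoff(biomarker, disease))
```

The pairwise AUC is quadratic in sample size, which is fine for a teaching example; a rank-based formula is the usual choice for larger datasets.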
Affiliation(s)
- Caroline Joseph Kiriacos
  - Pharmaceutical Biology Department, Faculty of Pharmacy and Biotechnology, German University in Cairo, Cairo, 11835, Egypt
- Manar Mansour
  - Pharmaceutical Biology Department, Faculty of Pharmacy and Biotechnology, German University in Cairo, Cairo, 11835, Egypt
7
Geng Y, Li Y, Deng C. An Improved Binary Walrus Optimizer with Golden Sine Disturbance and Population Regeneration Mechanism to Solve Feature Selection Problems. Biomimetics (Basel) 2024; 9:501. [PMID: 39194480 DOI: 10.3390/biomimetics9080501] [Received: 06/29/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024] Open
Abstract
Feature selection (FS) is a significant dimensionality reduction technique in machine learning and data mining that is adept at managing high-dimensional data efficiently and enhancing model performance. Metaheuristic algorithms have become one of the most promising solutions in FS owing to their powerful search capabilities as well as their performance. In this paper, a novel improved binary walrus optimizer (WO) algorithm utilizing the golden sine strategy, elite opposition-based learning (EOBL), and a population regeneration mechanism (BGEPWO) is proposed for FS. First, the population is initialized using an iterative chaotic map with infinite collapses (ICMIC) to improve diversity. Second, a safe signal is obtained by introducing an adaptive operator to enhance the stability of the WO and optimize the trade-off between exploration and exploitation of the algorithm. Third, BGEPWO uses a population regeneration mechanism to continuously eliminate hopeless individuals and generate new promising ones, which keeps the population moving toward the optimal solution and accelerates the convergence process. Fourth, EOBL is used to guide the escape behavior of the walrus to expand the search range. Finally, the golden sine strategy is utilized for perturbing the population in the late iterations to improve the algorithm's capacity to evade local optima. The BGEPWO algorithm was evaluated on 21 datasets of different sizes and compared with the BWO algorithm and 10 other representative optimization algorithms. The experimental results demonstrate that BGEPWO outperforms these competing algorithms in terms of fitness value, number of selected features, and F1-score on most datasets. The proposed algorithm achieves higher accuracy, better feature reduction ability, and stronger convergence by increasing population diversity, continuously balancing the exploration and exploitation processes, and effectively escaping local optima.
Affiliation(s)
- Yanyu Geng
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Ying Li
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Chunyan Deng
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
8
Liu F. Data Science Methods for Real-World Evidence Generation in Real-World Data. Annu Rev Biomed Data Sci 2024; 7:201-224. [PMID: 38748863 DOI: 10.1146/annurev-biodatasci-102423-113220] [Indexed: 08/25/2024]
Abstract
In the healthcare landscape, data science (DS) methods have emerged as indispensable tools to harness real-world data (RWD) from various data sources such as electronic health records, claim and registry data, and data gathered from digital health technologies. Real-world evidence (RWE) generated from RWD empowers researchers, clinicians, and policymakers with a more comprehensive understanding of real-world patient outcomes. Nevertheless, persistent challenges in RWD (e.g., messiness, voluminousness, heterogeneity, multimodality) and a growing awareness of the need for trustworthy and reliable RWE demand innovative, robust, and valid DS methods for analyzing RWD. In this article, I review some common current DS methods for extracting RWE and valuable insights from complex and diverse RWD. This article encompasses the entire RWE-generation pipeline, from study design with RWD to data preprocessing, exploratory analysis, methods for analyzing RWD, and trustworthiness and reliability guarantees, along with data ethics considerations and open-source tools. This review, tailored for an audience that may not be expert in DS, aspires to offer a systematic review of DS methods and to assist readers in selecting suitable DS methods and enhancing the process of RWE generation for addressing their specific challenges.
Affiliation(s)
- Fang Liu
  - Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana, USA
9
Lu Y, Duong T, Miao Z, Thieu T, Lamichhane J, Ahmed A, Delen D. A novel hyperparameter search approach for accuracy and simplicity in disease prediction risk scoring. J Am Med Inform Assoc 2024; 31:1763-1773. [PMID: 38899502 PMCID: PMC11258418 DOI: 10.1093/jamia/ocae140] [Received: 11/29/2023] [Revised: 05/07/2024] [Accepted: 05/28/2024] [Indexed: 06/21/2024] Open
Abstract
OBJECTIVE: Develop a novel technique to identify an optimal number of regression units corresponding to a single risk point when creating risk scoring systems from logistic regression-based disease predictive models. The optimal value of this hyperparameter balances simplicity and accuracy, yielding risk scores of small scale and high accuracy for patient risk stratification. MATERIALS AND METHODS: The proposed technique applies an adapted line search across all potential hyperparameter values. Additionally, the DeLong test is integrated to ensure the selected value produces an accuracy not significantly different from the best achievable risk score accuracy. We assessed the approach through two case studies predicting diabetic retinopathy (DR) within six months and hip fracture readmissions (HFR) within 30 days, involving cohorts of 90 400 diabetic patients and 18 065 hip fracture patients. RESULTS: Our scores achieve accuracies not significantly different from those obtained by existing approaches, reaching AUROCs of 0.803 and 0.645 for DR and HFR predictions, respectively. Regarding the scale, our scores ranged from 0 to 53 for DR and from 0 to 15 for HFR, while scores produced by existing methods frequently spanned hundreds or thousands. DISCUSSION: According to the assessment, our risk scores offer simple and accurate predictions for diseases. Furthermore, our new DR score provides a competitive alternative to state-of-the-art risk scores for DR, while our HFR case study presents the first risk score for this condition. CONCLUSION: Our technique offers a generalizable framework for crafting precise risk scores of compact scale, addressing the demand for user-friendly and effective risk stratification tools in healthcare.
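The role of the hyperparameter described above (regression units per risk point) can be illustrated with a small sketch. It assumes the common construction in which each logistic regression coefficient is divided by the number of regression units per point and rounded to an integer. It is not the authors' adapted line search and does not include the DeLong test; the predictor names and coefficient values are hypothetical.

```python
def to_points(coefficients, units_per_point):
    """Convert regression coefficients to integer risk points.

    points = round(beta / B), where B is the number of regression
    units that correspond to a single risk point.
    """
    if units_per_point <= 0:
        raise ValueError("units_per_point must be positive")
    return {name: round(beta / units_per_point)
            for name, beta in coefficients.items()}


def score_range(points):
    """Lowest and highest total score, assuming binary (0/1) predictors."""
    lowest = sum(p for p in points.values() if p < 0)
    highest = sum(p for p in points.values() if p > 0)
    return lowest, highest


# Hypothetical coefficients from a fitted logistic regression model.
coefs = {"age_over_65": 0.92, "insulin_use": 0.41, "hypertension": 0.28}

coarse = to_points(coefs, 0.3)   # larger B: fewer points, simpler score
fine = to_points(coefs, 0.05)    # smaller B: more points, closer to the model

print(coarse, score_range(coarse))
print(fine, score_range(fine))
```

A larger value of B compresses the score scale at the cost of rounding error in each predictor's weight, which is the simplicity-accuracy trade-off the abstract describes.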
Affiliation(s)
- Yajun Lu
  - Department of Management and Marketing, Jacksonville State University, Jacksonville, AL 36265, United States
- Thanh Duong
  - Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, United States
  - Department of Machine Learning, Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
- Zhuqi Miao
  - School of Business, The State University of New York at New Paltz, New Paltz, NY 12561, United States
- Thanh Thieu
  - Department of Machine Learning, Moffitt Cancer Center and Research Institute, Tampa, FL 33612, United States
  - Department of Oncological Sciences, University of South Florida Morsani College of Medicine, Tampa, FL 33612, United States
- Jivan Lamichhane
  - The State University of New York Upstate Medical University, Syracuse, NY 13210, United States
- Abdulaziz Ahmed
  - Department of Health Services Administration, School of Health Professions, The University of Alabama at Birmingham, Birmingham, AL 35233, United States
- Dursun Delen
  - Center for Health Systems Innovation, Department of Management Science and Information Systems, Oklahoma State University, Stillwater, OK 74078, United States
  - Department of Industrial Engineering, Faculty of Engineering and Natural Sciences, Istinye University, Sariyer/Istanbul 34396, Turkey
10
Liao J, Misaki K, Sakamoto J. Impact Exploration of Spatiotemporal Feature Derivation and Selection on Machine Learning-Based Predictive Models for Post-Embolization Cerebral Aneurysm Recanalization. Cardiovasc Eng Technol 2024; 15:394-404. [PMID: 38782877 DOI: 10.1007/s13239-024-00721-6] [Received: 03/30/2023] [Accepted: 02/04/2024] [Indexed: 05/25/2024]
Abstract
PURPOSE: To enhance the performance of machine learning (ML) models for predicting the post-embolization recanalization of cerebral aneurysms, we evaluated the impact of hemodynamic feature derivation and selection methods on six ML algorithms. METHODS: We utilized computational fluid dynamics (CFD) to simulate hemodynamics in 66 cerebral aneurysms from 65 patients, including 57 stable and nine recanalized aneurysms. We derived a total of 107 features for each aneurysm, encompassing four clinical features, 12 morphological features, and 91 hemodynamic features. To investigate the influence of feature derivation and selection methods on the ML models, we employed two derivation methods, simplified and fully derived, in combination with four selection methods: all features, statistically significant analysis, stepwise multivariate logistic regression analysis (stepwise-LR), and recursive feature elimination (RFE). Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) on both the training and testing datasets. RESULTS: The AUROC values on the testing dataset ranged widely, from 0.373 to 0.863. Fully derived features and the RFE selection method demonstrated superior performance in intra-model comparisons. The multi-layer perceptron (MLP) model, trained with RFE-selected fully derived features, achieved the best performance on the testing dataset, with an AUROC value of 0.863 (95% CI: 0.684-1.000). CONCLUSION: Our study demonstrated the importance of feature derivation and selection in determining the performance of ML models. This enabled the development of accurate decision-making models without the need for invasive procedures.
Affiliation(s)
- Jing Liao
  - Division of Transdisciplinary Sciences, Graduate School of Frontier Science Initiative, Kanazawa University, Ishikawa, Japan
- Kouichi Misaki
  - Department of Neurosurgery, Kanazawa University, Ishikawa, Japan
- Jiro Sakamoto
  - Division of Mechanical Science and Engineering, Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa, Ishikawa, Japan
11
|
Ingle M, Sharma M, Verma S, Sharma N, Bhurane A, Rajendra Acharya U. Automated explainable wavelet-based sleep scoring system for a population suspected with insomnia, apnea and periodic leg movement. Med Eng Phys 2024; 130:104208. [PMID: 39160031 DOI: 10.1016/j.medengphy.2024.104208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 05/31/2024] [Accepted: 07/01/2024] [Indexed: 08/21/2024]
Abstract
Sleep is an integral and vital component of human life, contributing significantly to overall health and well-being, but a considerable number of people worldwide experience sleep disorders. Sleep disorder diagnosis heavily depends on accurately classifying sleep stages. Traditionally, this classification has been performed manually by trained sleep technologists who visually inspect polysomnography (PSG) records. However, to mitigate the labor-intensive nature of this process, automated approaches have been developed. These automated methods aim to streamline and facilitate sleep stage classification. This study aims to classify sleep stages in a dataset comprising subjects with insomnia, periodic leg movement (PLM), and sleep apnea. The dataset consists of PSG recordings from the Multi-Ethnic Study of Atherosclerosis (MESA) cohort of the National Sleep Research Resource (NSRR), including 2056 subjects. Among these subjects, 130 have insomnia, 39 suffer from PLM, 156 have sleep apnea, and the remaining 1731 are classified as good sleepers. This study proposes an automated computerized technique to classify sleep stages, developing a machine-learning model with explainable artificial intelligence (XAI) capabilities using wavelet-based Hjorth parameters. An optimal biorthogonal wavelet filter bank (BOWFB) has been employed to extract subbands (SBs) from 30-second electroencephalogram (EEG) epochs. Three EEG channels, namely Fz_Cz, Cz_Oz, and C4_M1, are employed to yield an optimum outcome. The Hjorth parameters extracted from the SBs were then fed to different machine learning algorithms. To gain an understanding of the model, we used the SHAP (SHapley Additive exPlanations) method. For subjects suffering from the aforementioned diseases, the model utilized features derived from all channels and employed an ensemble bagged trees (EnBT) classifier.
The highest accuracies of 86.8%, 87.3%, 85.0%, 84.5%, and 83.8% were obtained for the insomnia, PLM, apnea, good-sleeper, and complete datasets, respectively. Using these techniques and datasets, the study aims to enhance sleep stage classification accuracy and improve understanding of sleep disorders such as insomnia, PLM, and sleep apnea.
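The abstract names Hjorth parameters as the features but does not give their formulas. The sketch below assumes the standard definitions (activity, mobility, complexity) computed from a signal and its first and second differences. It does not reproduce the wavelet filter bank (BOWFB) step, and the helper names and the synthetic sine wave are illustrative assumptions rather than anything taken from the paper.

```python
import math


def _variance(xs):
    """Population variance of a sequence of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def _diff(xs):
    """First-order difference, a discrete stand-in for the derivative."""
    return [b - a for a, b in zip(xs, xs[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) under the standard definitions."""
    d1 = _diff(signal)
    d2 = _diff(d1)
    activity = _variance(signal)                     # signal power
    mobility = math.sqrt(_variance(d1) / activity)   # proxy for mean frequency
    # Complexity is the mobility of the derivative divided by the mobility
    # of the signal itself; it is close to 1 for a pure sine wave.
    complexity = math.sqrt(_variance(d2) / _variance(d1)) / mobility
    return activity, mobility, complexity


# Synthetic 5 Hz sine sampled at 100 Hz for 10 s (illustrative, not MESA data).
fs, freq = 100, 5
wave = [math.sin(2 * math.pi * freq * n / fs) for n in range(10 * fs)]
activity, mobility, complexity = hjorth_parameters(wave)
print(activity, mobility, complexity)
```

In a pipeline like the one described, a function of this kind would be applied to each subband of each 30-second epoch, giving three features per subband per channel.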
Affiliation(s)
- Manisha Ingle
  - Department of Electronics and Communication Engineering, Visvesvaraya National Institute of Technology, Nagpur-440010, Maharashtra, India.
- Manish Sharma
  - Department of Electrical and Computer Science Engineering, and Centre of Advanced Defence Technology (CADT), Institute of Infrastructure, Technology, Research and Management (IITRAM), Ahmedabad-380026, Gujarat, India.
- Shresth Verma
  - Department of Electrical and Computer Science Engineering, and Centre of Advanced Defence Technology (CADT), Institute of Infrastructure, Technology, Research and Management (IITRAM), Ahmedabad-380026, Gujarat, India.
- Nishant Sharma
  - Department of Electrical and Computer Science Engineering, and Centre of Advanced Defence Technology (CADT), Institute of Infrastructure, Technology, Research and Management (IITRAM), Ahmedabad-380026, Gujarat, India.
- Ankit Bhurane
  - Department of Electronics and Communication Engineering, Visvesvaraya National Institute of Technology, Nagpur-440010, Maharashtra, India.
- U Rajendra Acharya
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Australia.

12
Li Y, Geng Y, Sheng H. An improved mountain gazelle optimizer based on chaotic map and spiral disturbance for medical feature selection. PLoS One 2024; 19:e0307288. [PMID: 39012921 PMCID: PMC11251600 DOI: 10.1371/journal.pone.0307288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 07/03/2024] [Indexed: 07/18/2024] Open
Abstract
Feature selection is an important solution for dealing with high-dimensional data in the fields of machine learning and data mining. In this paper, we present an improved mountain gazelle optimizer (IMGO) based on the newly proposed mountain gazelle optimizer (MGO) and design a binary version of IMGO (BIMGO) to solve the feature selection problem for medical data. First, the gazelle population is initialized using iterative chaotic map with infinite collapses (ICMIC) mapping, which increases the diversity of the population. Second, a nonlinear control factor is introduced to balance the exploration and exploitation components of the algorithm. Individuals in the population are perturbed using a spiral perturbation mechanism to enhance the local search capability of the algorithm. Finally, a neighborhood search strategy is used for the optimal individuals to enhance the exploitation and convergence capabilities of the algorithm. The superior ability of the IMGO algorithm to solve continuous problems is demonstrated on 23 benchmark datasets. Then, BIMGO is evaluated on 16 medical datasets of different dimensions and compared with 8 well-known metaheuristic algorithms. The experimental results indicate that BIMGO outperforms the competing algorithms in terms of the fitness value, number of selected features and sensitivity. In addition, the statistical results of the experiments demonstrate the significantly superior ability of BIMGO to select the most effective features in medical datasets.
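The chaotic initialization step mentioned in this abstract can be sketched as follows. This assumes the commonly cited form of the ICMIC map, x_{k+1} = sin(a / x_k), whose values stay in [-1, 1]. The control parameter `a`, the seed `x0`, the rescaling to the search bounds and all function names are illustrative assumptions; the abstract does not specify them.

```python
import math


def icmic_sequence(x0, a, length):
    """Iterate the ICMIC map x_{k+1} = sin(a / x_k) and collect the values."""
    values, x = [], x0
    for _ in range(length):
        if x == 0.0:
            x = 1e-12  # the map is undefined at zero; nudge away from it
        x = math.sin(a / x)
        values.append(x)
    return values


def init_population(n_agents, n_dims, lower, upper, x0=0.7, a=2.0):
    """Build an initial population from a chaotic sequence instead of
    uniform random numbers (x0 and a are hypothetical defaults)."""
    chaos = icmic_sequence(x0, a, n_agents * n_dims)
    population = []
    for i in range(n_agents):
        row = chaos[i * n_dims:(i + 1) * n_dims]
        # Rescale each chaotic value from [-1, 1] to [lower, upper].
        population.append(
            [lower + (c + 1.0) / 2.0 * (upper - lower) for c in row]
        )
    return population


pop = init_population(n_agents=5, n_dims=3, lower=-10.0, upper=10.0)
for agent in pop:
    print(agent)
```

For the binary feature selection variant, each continuous position would additionally have to be mapped to a 0/1 mask; that step is not shown here because the abstract does not describe how it is done.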
Affiliation(s)
- Ying Li
  - College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
- Yanyu Geng
  - College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China
- Huankun Sheng
  - College of Computer Science and Technology, Jilin University, Changchun, People’s Republic of China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People’s Republic of China

13
Du Y, Niu J, Xing Y, Li B, Calhoun VD. Neuroimage Analysis Methods and Artificial Intelligence Techniques for Reliable Biomarkers and Accurate Diagnosis of Schizophrenia: Achievements Made by Chinese Scholars Around the Past Decade. Schizophr Bull 2024:sbae110. [PMID: 38982882 DOI: 10.1093/schbul/sbae110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]
Abstract
BACKGROUND AND HYPOTHESIS Schizophrenia (SZ) is characterized by significant cognitive and behavioral disruptions. Neuroimaging techniques, particularly magnetic resonance imaging (MRI), have been widely utilized to investigate biomarkers of SZ, distinguish SZ from healthy conditions or other mental disorders, and explore biotypes within SZ or across SZ and other mental disorders, with the aim of promoting the accurate diagnosis of SZ. In China, research on SZ using MRI has grown considerably in recent years. STUDY DESIGN The article reviews advanced neuroimaging and artificial intelligence (AI) methods using single-modal or multimodal MRI to reveal the mechanism of SZ and promote accurate diagnosis of SZ, with a particular emphasis on the achievements made by Chinese scholars over the past decade. STUDY RESULTS Our article focuses on the methods for capturing subtle brain functional and structural properties from high-dimensional MRI data, the multimodal fusion and feature selection methods for obtaining important and sparse neuroimaging features, the supervised statistical analysis and classification for distinguishing disorders, and the unsupervised clustering and semi-supervised learning methods for identifying neuroimage-based biotypes. Crucially, our article highlights the characteristics of each method and underscores the interconnections among various approaches regarding biomarker extraction and neuroimage-based diagnosis, which is beneficial not only for comprehending SZ but also for exploring other mental disorders. CONCLUSIONS We offer a valuable review of advanced neuroimage analysis and AI methods primarily focused on SZ research by Chinese scholars, aiming to promote the diagnosis, treatment, and prevention of SZ, as well as other mental disorders, both within China and internationally.
Affiliation(s)
- Yuhui Du
  - School of Computer and Information Technology, Shanxi University, Taiyuan, 030006, China
- Ju Niu
  - School of Computer and Information Technology, Shanxi University, Taiyuan, 030006, China
- Ying Xing
  - School of Computer and Information Technology, Shanxi University, Taiyuan, 030006, China
- Bang Li
  - School of Computer and Information Technology, Shanxi University, Taiyuan, 030006, China
- Vince D Calhoun
  - The Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, 30303, GA, USA

14
Rajab MD, Taketa T, Wharton SB, Wang D. Ranking and filtering of neuropathology features in the machine learning evaluation of dementia studies. Brain Pathol 2024; 34:e13247. [PMID: 38374326 PMCID: PMC11189772 DOI: 10.1111/bpa.13247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 01/30/2024] [Indexed: 02/21/2024] Open
Abstract
Early diagnosis of dementia diseases, such as Alzheimer's disease, is difficult because of the time and resources needed to perform neuropsychological and pathological assessments. Given the increasing use of machine learning methods to evaluate neuropathology features in the brains of dementia patients, it is important to investigate how the selection of features may be impacted and which features are most important for the classification of dementia. We objectively assessed neuropathology features using machine learning techniques for filtering features in two independent ageing cohorts, the Cognitive Function and Aging Studies (CFAS) and Alzheimer's Disease Neuroimaging Initiative (ADNI). The reliefF and least loss methods were most consistent with their rankings between ADNI and CFAS; however, reliefF was most biassed by feature-feature correlations. Braak stage was consistently the highest ranked feature and its ranking was not correlated with other features, highlighting its unique importance. Using a smaller set of highly ranked features, rather than all features, can achieve a similar or better dementia classification performance in CFAS (60%-70% accuracy with Naïve Bayes). This study showed that specific neuropathology features can be prioritised by feature filtering methods, but they are impacted by feature-feature correlations and their results can vary between cohort studies. By understanding these biases, we can reduce discrepancies in feature ranking and identify a minimal set of features needed for accurate classification of dementia.
Affiliation(s)
- Mohammed D. Rajab
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
  - Department of Computer Science, University of Sheffield, Sheffield, UK
- Teruka Taketa
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
- Stephen B. Wharton
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
- Dennis Wang
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
  - Department of Computer Science, University of Sheffield, Sheffield, UK
  - Singapore Institute Clinical Sciences, Agency for Science Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute, Agency for Science Technology and Research (A*STAR), Singapore, Singapore
  - National Heart and Lung Institute, Imperial College London, London, UK

15
Jing X, Wielema M, Monroy-Gonzalez AG, Stams TRG, Mahesh SVK, Oudkerk M, Sijens PE, Dorrius MD, van Ooijen PMA. Automated Breast Density Assessment in MRI Using Deep Learning and Radiomics: Strategies for Reducing Inter-Observer Variability. J Magn Reson Imaging 2024; 60:80-91. [PMID: 37846440 DOI: 10.1002/jmri.29058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 09/18/2023] [Accepted: 09/19/2023] [Indexed: 10/18/2023] Open
Abstract
BACKGROUND Accurate breast density evaluation allows for more precise risk estimation but suffers from high inter-observer variability. PURPOSE To evaluate the feasibility of reducing inter-observer variability of breast density assessment through artificial intelligence (AI) assisted interpretation. STUDY TYPE Retrospective. POPULATION Six hundred and twenty-one patients without breast prosthesis or reconstructions were randomly divided into training (N = 377), validation (N = 98), and independent test (N = 146) datasets. FIELD STRENGTH/SEQUENCE 1.5 T and 3.0 T; T1-weighted spectral attenuated inversion recovery. ASSESSMENT Five radiologists independently assessed each scan in the independent test set to establish the inter-observer variability baseline and to reach a reference standard. Deep learning and three radiomics models were developed for three classification tasks: (i) four Breast Imaging-Reporting and Data System (BI-RADS) breast composition categories (A-D), (ii) dense (categories C, D) vs. non-dense (categories A, B), and (iii) extremely dense (category D) vs. moderately dense (categories A-C). The models were tested against the reference standard on the independent test set. AI-assisted interpretation was performed by majority voting between the models and each radiologist's assessment. STATISTICAL TESTS Inter-observer variability was assessed using linear-weighted kappa (κ) statistics. Kappa statistics, accuracy, and area under the receiver operating characteristic curve (AUC) were used to assess models against reference standard. RESULTS In the independent test set, five readers showed an overall substantial agreement on tasks (i) and (ii), but moderate agreement for task (iii). The best-performing model showed substantial agreement with reference standard for tasks (i) and (ii), but moderate agreement for task (iii). 
With the assistance of the AI models, almost perfect inter-observer agreement was obtained for tasks (i) (mean κ = 0.86), (ii) (mean κ = 0.94), and (iii) (mean κ = 0.94). DATA CONCLUSION Deep learning and radiomics models have the potential to help reduce inter-observer variability of breast density assessment. LEVEL OF EVIDENCE 3. TECHNICAL EFFICACY Stage 1.
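The linear-weighted kappa statistic used in this abstract can be illustrated with a short sketch. It assumes the standard Cohen's weighted kappa for two raters with linear disagreement weights |i - j| / (k - 1), and ordinal categories encoded as integers 0 to k - 1 (for example BI-RADS A-D as 0-3, which is an encoding chosen here for illustration). The function name and the toy ratings are assumptions, not data from the study.

```python
def linear_weighted_kappa(rater_a, rater_b, n_categories):
    """Cohen's kappa with linear weights for two raters.

    Ratings are integers in range(n_categories); a disagreement between
    categories i and j is penalized by |i - j| / (n_categories - 1).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n, k = len(rater_a), n_categories
    # Joint distribution of the two raters' category assignments.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            disagree_obs += weight * observed[i][j]
            # Disagreement expected by chance from the marginals alone.
            disagree_exp += weight * marg_a[i] * marg_b[j]
    if disagree_exp == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagree_obs / disagree_exp


# Toy ratings for four cases; the raters differ by one category on the last.
reader_1 = [0, 1, 2, 3]
reader_2 = [0, 1, 2, 2]
print(round(linear_weighted_kappa(reader_1, reader_2, 4), 3))  # 7/9 = 0.778
```

With five readers, as in the study, a mean κ would typically be obtained by averaging this statistic over all reader pairs; whether the authors averaged in exactly that way is not stated in the abstract.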
Affiliation(s)
- Xueping Jing
  - Department of Radiation Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Machine Learning Lab, Data Science Center in Health (DASH), University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Mirjam Wielema
  - Department of Radiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Andrea G Monroy-Gonzalez
  - Department of Radiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Thom R G Stams
  - Department of Radiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Shekar V K Mahesh
  - Department of Radiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Matthijs Oudkerk
  - Faculty of Medical Sciences, University of Groningen, Groningen, The Netherlands
  - Institute of Diagnostic Accuracy Research B.V., Groningen, The Netherlands
- Paul E Sijens
  - Department of Radiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Monique D Dorrius
  - Department of Radiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Peter M A van Ooijen
  - Department of Radiation Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
  - Machine Learning Lab, Data Science Center in Health (DASH), University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

16
Xu C, Wu J, Zhang F, Freer J, Zhang Z, Cheng Y. A deep image classification model based on prior feature knowledge embedding and application in medical diagnosis. Sci Rep 2024; 14:13244. [PMID: 38853158 PMCID: PMC11163012 DOI: 10.1038/s41598-024-63818-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 06/03/2024] [Indexed: 06/11/2024] Open
Abstract
To address the problem of classifying images with insignificant morphological structural features, strong target correlation, and low signal-to-noise ratio, a deep learning model based on ResNet and a Radial Basis Probabilistic Neural Network (RBPNN), combined with prior feature knowledge embedding, is proposed. Taking ResNet50 as the visual modeling network, it uses a feature pyramid and a self-attention mechanism to extract appearance and semantic features of images at multiple scales, and to associate and enhance local and global features. Taking into account the diversity of category features, channel cosine similarity attention and dynamic C-means clustering algorithms are used to select representative sample features in the sample subsets of different categories, which implicitly express prior category feature knowledge and serve as the kernel centers of radial basis probability neurons (RBPN) to realize the embedding of diverse prior feature knowledge. In the RBPNN pattern aggregation layer, the outputs of the RBPN are selectively summed according to the category of the kernel center, that is, the subcategory features are combined into category features, and finally the image classification is implemented based on Softmax. The functional modules of the proposed method are designed specifically for image characteristics; they can highlight the significance of local and structural features of the image, form a non-convex decision region, and reduce the requirements for the completeness of the sample set. The proposed method was applied to medical image classification in experiments on a public brain tumor MRI image classification dataset and an actual cardiac ultrasound image dataset, where the accuracy reached 85.82% and 83.92%, respectively. Compared with three mainstream image classification models, the performance indicators of this method are significantly improved.
Affiliation(s)
- Chen Xu
  - School of Computer Science, Fudan University, Shanghai, China.
- Jiangxing Wu
  - School of Computer Science, Fudan University, Shanghai, China
- Fan Zhang
  - School of Computer Science, Fudan University, Shanghai, China
- Jonathan Freer
  - School of Computer Science, University of Birmingham, Birmingham, UK
- Zhongqun Zhang
  - School of Computer Science, University of Birmingham, Birmingham, UK
- Yihua Cheng
  - School of Computer Science, University of Birmingham, Birmingham, UK

17
Alsadi B, Musleh S, Al-Absi HRH, Refaee M, Qureshi R, El Hajj N, Alam T. An ensemble-based machine learning model for predicting type 2 diabetes and its effect on bone health. BMC Med Inform Decis Mak 2024; 24:144. [PMID: 38811939 PMCID: PMC11134939 DOI: 10.1186/s12911-024-02540-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 05/17/2024] [Indexed: 05/31/2024] Open
Abstract
BACKGROUND Diabetes is a chronic condition that can result in many long-term physiological, metabolic, and neurological complications. Therefore, early detection of diabetes would help to determine a proper diagnosis and treatment plan. METHODS In this study, we conducted a machine learning (ML)-based case-control study on a cohort of 1000 participants from Qatar Biobank to predict diabetes using clinical and bone health indicators from Dual Energy X-ray Absorptiometry (DXA) machines. ML models were utilized to distinguish diabetes groups from non-diabetes controls. Recursive feature elimination (RFE) was leveraged to identify a subset of features to improve the performance of the model. SHAP-based analysis was used to assess the importance of features and to support the explainability of the proposed model. RESULTS The ensemble-based models XGBoost and random forest (RF) achieved over 84% accuracy for detecting diabetes. After applying RFE, we selected only 20 features, which improved the model accuracy to 87.2%. From a clinical standpoint, higher HDL-Cholesterol and Neutrophil levels were observed in the diabetic group, along with lower vitamin B12 and testosterone levels. Lower sodium levels were found in diabetics, potentially stemming from clinical factors including specific medications, hormonal imbalances, and unmanaged diabetes. We believe Dapagliflozin prescriptions in Qatar were associated with decreased Gamma Glutamyltransferase and Aspartate Aminotransferase enzyme levels, confirming prior research. We observed that bone area, bone mineral content, and bone mineral density were slightly lower in the Diabetes group across almost all body parts, but the difference against the control group was not statistically significant except in T12, troch and trunk area. No significant negative impact of diabetes progression on bone health was observed over a period of 5-15 years in the cohort.
CONCLUSION This study recommends the inclusion of an ML model that combines both DXA and clinical data for the early diagnosis of diabetes.
Affiliation(s)
- Belqes Alsadi
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Saleh Musleh
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Hamada R H Al-Absi
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Rizwan Qureshi
  - Department of Imaging Physics, MD Anderson Cancer Center, The University of Texas, Houston, USA
- Nady El Hajj
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- Tanvir Alam
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.

18
Liu W, Jia L, Xu L, Yang F, Guo Z, Li J, Zhang D, Liu Y, Xiang H, Cheng H, Hou J, Li S, Li H. Prediction of early neurologic deterioration in patients with perforating artery territory infarction using machine learning: a retrospective study. Front Neurol 2024; 15:1368902. [PMID: 38841697 PMCID: PMC11150528 DOI: 10.3389/fneur.2024.1368902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 04/24/2024] [Indexed: 06/07/2024] Open
Abstract
Background Early neurological deterioration (END) is a frequent complication in patients with perforating artery territory infarction (PAI), leading to poorer outcomes. Therefore, we aimed to apply machine learning (ML) algorithms to predict the occurrence of END in PAI and investigate related risk factors. Methods This retrospective study analyzed a cohort of PAI patients, excluding those with severe stenosis of the parent artery. We included demographic characteristics, clinical features, laboratory data, and imaging variables. Recursive feature elimination with cross-validation (RFECV) was performed to identify critical features. Seven ML algorithms, namely logistic regression, random forest, adaptive boosting, gradient boosting decision tree, histogram-based gradient boosting, extreme gradient boosting, and category boosting, were developed to predict END in PAI patients using these critical features. We compared the accuracy of these models in predicting outcomes. Additionally, SHapley Additive exPlanations (SHAP) values were introduced to interpret the optimal model and assess the significance of input features. Results The study enrolled 1,020 PAI patients with a mean age of 60.46 (range 49.11-71.81) years. Of these, 30.39% were women, and 129 (12.65%) experienced END. RFECV selected 13 critical features, including blood urea nitrogen (BUN), total cholesterol (TC), low-density-lipoprotein cholesterol (LDL-C), apolipoprotein B (apoB), atrial fibrillation, loading dual antiplatelet therapy (DAPT), single antiplatelet therapy (SAPT), argatroban, the basal ganglia, the thalamus, the posterior choroidal arteries, maximal axial infarct diameter (measured at < 15 mm), and stroke subtype. The gradient-boosting decision tree had the highest area under the curve (0.914) among the seven ML algorithms. The SHAP analysis identified apoB as the most significant variable for END. 
Conclusion Our results suggest that ML algorithms, especially the gradient-boosting decision tree, are effective in predicting the occurrence of END in PAI patients.
Affiliation(s)
- Wei Liu
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Longbin Jia
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Lina Xu
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Fengbing Yang
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Zixuan Guo
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Jinna Li
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Dandan Zhang
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Yan Liu
  - The First Clinical College of Changzhi Medical College, Changzhi, China
- Han Xiang
  - The First Clinical College of Changzhi Medical College, Changzhi, China
- Hongjiang Cheng
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Jing Hou
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Shifang Li
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China
- Huimin Li
  - Department of Neurology, Jincheng People's Hospital, Jincheng, China

19
Bahameish M, Stockman T, Requena Carrión J. Strategies for Reliable Stress Recognition: A Machine Learning Approach Using Heart Rate Variability Features. Sensors (Basel) 2024; 24:3210. [PMID: 38794064 PMCID: PMC11126126 DOI: 10.3390/s24103210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 05/11/2024] [Accepted: 05/14/2024] [Indexed: 05/26/2024]
Abstract
Stress recognition, particularly using machine learning (ML) with physiological data such as heart rate variability (HRV), holds promise for mental health interventions. However, limited datasets in affective computing and healthcare research can lead to inaccurate conclusions regarding the ML model performance. This study employed supervised learning algorithms to classify stress and relaxation states using HRV measures. To account for limitations associated with small datasets, robust strategies were implemented based on methodological recommendations for ML with a limited dataset, including data segmentation, feature selection, and model evaluation. Our findings highlight that the random forest model achieved the best performance in distinguishing stress from non-stress states. Notably, it showed higher performance in identifying stress from relaxation (F1-score: 86.3%) compared to neutral states (F1-score: 65.8%). Additionally, the model demonstrated generalizability when tested on independent secondary datasets, showcasing its ability to distinguish between stress and relaxation states. While our performance metrics might be lower than some previous studies, this likely reflects our focus on robust methodologies to enhance the generalizability and interpretability of ML models, which are crucial for real-world applications with limited datasets.
Affiliation(s)
- Mariam Bahameish
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha P.O. Box 34110, Qatar
- Tony Stockman
  - School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
- Jesús Requena Carrión
  - School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK

20
Burton RJ, Raffray L, Moet LM, Cuff SM, White DA, Baker SE, Moser B, O’Donnell VB, Ghazal P, Morgan MP, Artemiou A, Eberl M. Conventional and unconventional T-cell responses contribute to the prediction of clinical outcome and causative bacterial pathogen in sepsis patients. Clin Exp Immunol 2024; 216:293-306. [PMID: 38430552 PMCID: PMC11097916 DOI: 10.1093/cei/uxae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 02/12/2024] [Accepted: 02/28/2024] [Indexed: 03/04/2024] Open
Abstract
Sepsis is characterized by a dysfunctional host response to infection culminating in life-threatening organ failure that requires complex patient management and rapid intervention. Timely diagnosis of the underlying cause of sepsis is crucial, and identifying those at risk of complications and death is imperative for triaging treatment and resource allocation. Here, we explored the potential of explainable machine learning models to predict mortality and causative pathogen in sepsis patients. By using a modelling pipeline employing multiple feature selection algorithms, we demonstrate the feasibility of identifying integrative patterns from clinical parameters, plasma biomarkers, and extensive phenotyping of blood immune cells. While no single variable had sufficient predictive power, models that combined five or more features showed a macro area under the curve (AUC) of 0.85 to predict 90-day mortality after sepsis diagnosis, and a macro AUC of 0.86 to discriminate between Gram-positive and Gram-negative bacterial infections. Parameters associated with the cellular immune response contributed the most to models predictive of 90-day mortality, most notably the proportion of T cells among PBMCs, together with expression of CXCR3 by CD4+ T cells and CD25 by mucosal-associated invariant T (MAIT) cells. Frequencies of Vδ2+ γδ T cells had the most profound impact on the prediction of Gram-negative infections, alongside other T-cell-related variables and total neutrophil count. Overall, our findings highlight the added value of measuring the proportion and activation patterns of conventional and unconventional T cells in the blood of sepsis patients in combination with other immunological, biochemical, and clinical parameters.
Affiliation(s)
- Ross J Burton
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Adult Critical Care, University Hospital of Wales, Cardiff and Vale University Health Board, Cardiff, UK
| | - Loïc Raffray
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Department of Internal Medicine, Félix Guyon University Hospital of La Réunion, Saint Denis, Réunion Island, France
| | - Linda M Moet
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
| | - Simone M Cuff
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
| | - Daniel A White
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
| | - Sarah E Baker
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
| | - Bernhard Moser
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Systems Immunity Research Institute, Cardiff University, Cardiff, UK
| | - Valerie B O’Donnell
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Systems Immunity Research Institute, Cardiff University, Cardiff, UK
| | - Peter Ghazal
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Systems Immunity Research Institute, Cardiff University, Cardiff, UK
| | - Matt P Morgan
- Adult Critical Care, University Hospital of Wales, Cardiff and Vale University Health Board, Cardiff, UK
| | - Andreas Artemiou
- School of Mathematics, Cardiff University, Cardiff, UK
- Department of Information Technologies, University of Limassol, 3025 Limassol, Cyprus
| | - Matthias Eberl
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
- Systems Immunity Research Institute, Cardiff University, Cardiff, UK
| |
|
21
|
Li Q, Lv H, Chen Y, Shen J, Shi J, Zhou C. Hybrid feature selection in a machine learning predictive model for perioperative myocardial injury in noncoronary cardiac surgery with cardiopulmonary bypass. Perfusion 2024:2676591241253459. [PMID: 38733257 DOI: 10.1177/02676591241253459] [Indexed: 05/13/2024]
Abstract
BACKGROUND Perioperative myocardial injury (PMI) is associated with increased morbidity and mortality after noncoronary cardiac surgery. However, few studies have developed a predictive model for PMI. Therefore, we used hybrid feature selection (FS) methods to establish a predictive model for PMI in noncoronary cardiac surgery with cardiopulmonary bypass (CPB). METHODS This was a single-center retrospective study conducted at the Fuwai Hospital in China. Patients aged 18-70 years who underwent elective noncoronary surgery with CPB at our institution from December 2018 to April 2021 were enrolled. The primary outcome was PMI, defined as a postoperative cardiac troponin I (cTnI) level exceeding 220 times the upper reference limit (URL). Statistical analyses were conducted in Python (Python Software Foundation, version 3.9.7 and integrated development environment Jupyter Notebook 1.1.0) and SPSS software version 26.0 (IBM Corp., Armonk, New York, USA). RESULTS A total of 1130 patients were eligible for this study. The incidence of PMI was 20.3% (229/1130) overall, 20.6% (163/791) in the training dataset, and 19.5% (66/339) in the testing dataset. The logistic regression model achieved its best AUC of 0.6893 (95% CI: 0.6371-0.7382) with the traditional selection method, the random forest model achieved its best AUC of 0.6937 (95% CI: 0.6416-0.7423) with the union of the Wrapper and Embedded methods, the CatBoost model achieved its best AUC of 0.6828 (95% CI: 0.6304-0.7320) with the union of the Embedded method and forward logistic regression, and the Naïve Bayes model achieved its best AUC of 0.7254 (95% CI: 0.6746-0.7723) with forward logistic regression. Moreover, the decision tree, KNeighborsClassifier, and support vector machine models had the worst AUCs across all selection methods. Furthermore, the SHapley Additive exPlanations plot showed that prolonged CPB, aortic clamp time, and low preoperative platelet count were strongly related to the PMI risk. CONCLUSIONS In total, four categories of feature selection methods were utilized, comprising five individual selection techniques and 15 combined methods. Notably, the combination of logistic regression and embedded methods demonstrated outstanding performance in predicting PMI risk. We also concluded that machine learning models, including random forest, CatBoost, and Naïve Bayes, were suitable candidates for establishing a PMI predictive model. Nevertheless, additional investigation and validation are imperative for substantiating these findings.
Affiliation(s)
- Qian Li
- Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Hong Lv
- Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yuye Chen
- Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Jingjia Shen
- Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Jia Shi
- Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Chenghui Zhou
- Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Center for Anesthesiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
| |
|
22
|
Bulut E, Arslan Yildiz U, Cengiz M, Yilmaz M, Kavakli AS, Arici AG, Ozturk N, Uslu S. Evaluation of the Effect of Morphological Structure on Dilatational Tracheostomy Interference Location and Complications with Ultrasonography and Fiberoptic Bronchoscopy. J Clin Med 2024; 13:2788. [PMID: 38792330 PMCID: PMC11122435 DOI: 10.3390/jcm13102788] [Received: 03/14/2024] [Revised: 04/25/2024] [Accepted: 05/06/2024] [Indexed: 05/26/2024] Open
Abstract
Background: Percutaneous dilatational tracheostomy (PDT) is the most commonly performed minimally invasive intensive care unit procedure worldwide. Methods: This study evaluated the percentage of consistency between the entry site observed with fiberoptic bronchoscopy (FOB) and the PDT level predicted by pre-procedural ultrasonography (USG) in PDT procedures performed using the forceps dilatation method. The effect of morphological features on intervention sites was also investigated. Complications that occurred during and after the procedure, as well as the duration, site, and quantity of the procedures, were recorded. Results: Data obtained from a total of 91 patients were analyzed. In 57 patients (62.6%), the USG-estimated tracheal puncture level was consistent with the intercartilaginous space observed by FOB, while in 34 patients (37.4%), there was a discrepancy between these two methods. According to Bland-Altman analysis, the agreement between the tracheal spaces determined by USG and FOB was close. Regression formulas defining the intercartilaginous puncture level for PDT procedures from morphologic measurements of the patients were created. The most common complication related to PDT was cartilage fracture (17.6%), which was predicted with maximum relevance by punctured tracheal level, neck extension limitation, and procedure duration. Conclusions: In PDT procedures using the forceps dilatation method, the PDT intervention level predicted by pre-procedural USG was largely in accordance with the entry site observed by FOB. The intercartilaginous puncture level could be estimated from morphological measurements.
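The Bland-Altman agreement check mentioned in this abstract can be illustrated with a minimal sketch. It assumes the conventional definition (bias as the mean paired difference, 95% limits of agreement as bias ± 1.96 standard deviations); the function name `bland_altman` and the interspace levels are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower, upper): mean paired difference and 95% limits of agreement."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical intercartilaginous levels (1 = first space); NOT study data.
usg_level = [2, 2, 3, 2, 1, 3, 2, 2]
fob_level = [2, 3, 3, 2, 1, 2, 2, 2]

bias, lower, upper = bland_altman(usg_level, fob_level)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias near zero with narrow limits would correspond to the "close" agreement the abstract reports; how narrow is narrow enough is a clinical judgment that this sketch does not encode.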
Affiliation(s)
- Esin Bulut
- Department of Anesthesiology and Reanimation, Akdeniz University Faculty of Medicine, Antalya 07070, Turkey; (E.B.); (M.C.); (M.Y.); (A.G.A.)
| | - Ulku Arslan Yildiz
- Department of Anesthesiology and Reanimation, Akdeniz University Faculty of Medicine, Antalya 07070, Turkey; (E.B.); (M.C.); (M.Y.); (A.G.A.)
| | - Melike Cengiz
- Department of Anesthesiology and Reanimation, Akdeniz University Faculty of Medicine, Antalya 07070, Turkey; (E.B.); (M.C.); (M.Y.); (A.G.A.)
| | - Murat Yilmaz
- Department of Anesthesiology and Reanimation, Akdeniz University Faculty of Medicine, Antalya 07070, Turkey; (E.B.); (M.C.); (M.Y.); (A.G.A.)
| | - Ali Sait Kavakli
- Department of Anesthesiology and Reanimation, Istinye University Faculty of Medicine, Istanbul 34010, Turkey;
| | - Ayse Gulbin Arici
- Department of Anesthesiology and Reanimation, Akdeniz University Faculty of Medicine, Antalya 07070, Turkey; (E.B.); (M.C.); (M.Y.); (A.G.A.)
| | - Nihal Ozturk
- Department of Biophysics, Akdeniz University Faculty of Medicine, Antalya 07070, Turkey; (N.O.); (S.U.)
| | - Serkan Uslu
- Department of Biophysics, Akdeniz University Faculty of Medicine, Antalya 07070, Turkey; (N.O.); (S.U.)
| |
|
23
|
Li S, Xiang S, Ma Q, Cai W, Liu S, Fang F, Yu H. A decision support system for upper limb rehabilitation robot based on hybrid reasoning with RBR and CBR. Front Bioeng Biotechnol 2024; 12:1400912. [PMID: 38720881 PMCID: PMC11076720 DOI: 10.3389/fbioe.2024.1400912] [Received: 03/14/2024] [Accepted: 04/08/2024] [Indexed: 05/12/2024] Open
Abstract
The rehabilitation robot can assist hemiplegic patients to complete the training program effectively, but it only supports the patient's training process and requires rehabilitation therapists to manually adjust the training parameters according to the patient's condition. Therefore, there is an urgent need for research on intelligent training prescription for rehabilitation robots to promote their clinical application. This study proposed a decision support system for the training of upper limb rehabilitation robots based on hybrid reasoning with rule-based reasoning (RBR) and case-based reasoning (CBR). The expert knowledge base of this system is established based on 10 professional rehabilitation therapists from three different rehabilitation departments in Shanghai who have extensive experience in using a desktop-based upper limb rehabilitation robot. Rule-based reasoning is chosen to construct the cycle plan inference model, which develops a 21-day training plan for the patients. The case base consists of historical case data from 54 stroke patients who underwent rehabilitation training with a desktop-based upper limb rehabilitation robot. Case-based reasoning, combined with a Random Forest optimized algorithm, was constructed to adjust the training parameters for the patients in real time. The system recommended a rehabilitation training program with an average accuracy of 91.5%, an average AUC value of 0.924, an average recall rate of 88.7%, and an average F1 score of 90.1%. The application of this system in rehabilitation robots would be useful for therapists.
Affiliation(s)
- Sujiao Li
- Institute of Rehabilitation Engineering and Technology, University of Shanghai for Science and Technology, Shanghai, China
- Shanghai Engineering Research Center of Assistive Devices, Shanghai, China
| | - Shuhan Xiang
- Institute of Rehabilitation Engineering and Technology, University of Shanghai for Science and Technology, Shanghai, China
| | - Qiqi Ma
- Institute of Rehabilitation Engineering and Technology, University of Shanghai for Science and Technology, Shanghai, China
| | - Wenqian Cai
- Institute of Rehabilitation Engineering and Technology, University of Shanghai for Science and Technology, Shanghai, China
| | - Suiyi Liu
- Department of Medical Engineering, Shanghai Eastern Hepatobiliary Surgery Hospital, Naval Medical University, Shanghai, China
| | | | - Hongliu Yu
- Institute of Rehabilitation Engineering and Technology, University of Shanghai for Science and Technology, Shanghai, China
- Shanghai Engineering Research Center of Assistive Devices, Shanghai, China
| |
|
24
|
Seyedtabib M, Najafi-Vosough R, Kamyari N. The predictive power of data: machine learning analysis for Covid-19 mortality based on personal, clinical, preclinical, and laboratory variables in a case-control study. BMC Infect Dis 2024; 24:411. [PMID: 38637727 PMCID: PMC11025285 DOI: 10.1186/s12879-024-09298-w] [Received: 12/22/2023] [Accepted: 04/05/2024] [Indexed: 04/20/2024] Open
Abstract
BACKGROUND AND PURPOSE The COVID-19 pandemic has presented unprecedented public health challenges worldwide. Understanding the factors contributing to COVID-19 mortality is critical for effective management and intervention strategies. This study aims to unlock the predictive power of data collected from personal, clinical, preclinical, and laboratory variables through machine learning (ML) analyses. METHODS A retrospective study was conducted in 2022 in a large hospital in Abadan, Iran. Data were collected and categorized into demographic, clinical, comorbid, treatment, initial vital signs, symptoms, and laboratory test groups. The collected data were subjected to ML analysis to identify predictive factors associated with COVID-19 mortality. Five algorithms were used to analyze the data set and derive the latent predictive power of the variables from Shapley additive explanation (SHAP) values. RESULTS Results highlight key factors associated with COVID-19 mortality, including age, comorbidities (hypertension, diabetes), specific treatments (antibiotics, remdesivir, favipiravir, vitamin zinc), and clinical indicators (heart rate, respiratory rate, temperature). Notably, specific symptoms (productive cough, dyspnea, delirium) and laboratory values (D-dimer, ESR) also play a critical role in predicting outcomes. This study highlights the importance of feature selection and the impact of data quantity and quality on model performance. CONCLUSION This study highlights the potential of ML analysis to improve the accuracy of COVID-19 mortality prediction and emphasizes the need for a comprehensive approach that considers multiple feature categories. It highlights the critical role of data quality and quantity in improving model performance and contributes to our understanding of the multifaceted factors that influence COVID-19 outcomes.
Affiliation(s)
- Maryam Seyedtabib
- Department of Biostatistics and Epidemiology, School of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
| | - Roya Najafi-Vosough
- Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Naser Kamyari
- Department of Biostatistics and Epidemiology, School of Health, Abadan University of Medical Sciences, Abadan, Iran.
| |
|
25
|
Nishan A, M. Taslim Uddin Raju S, Hossain MI, Dipto SA, M. Tanvir Uddin S, Sijan A, Chowdhury MAS, Ahmad A, Mahamudul Hasan Khan M. A continuous cuffless blood pressure measurement from optimal PPG characteristic features using machine learning algorithms. Heliyon 2024; 10:e27779. [PMID: 38533045 PMCID: PMC10963242 DOI: 10.1016/j.heliyon.2024.e27779] [Received: 01/30/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024] Open
Abstract
Background and objective Hypertension is a potentially dangerous health condition that can be detected by measuring blood pressure (BP). Blood pressure monitoring and measurement are essential for preventing and treating cardiovascular diseases. Cuff-based devices, however, are uncomfortable and prevent continuous BP measurement. Methods In this study, a new non-invasive and cuff-less method for estimating Systolic Blood Pressure (SBP), Mean Arterial Pressure (MAP), and Diastolic Blood Pressure (DBP) has been proposed using characteristic features of photoplethysmogram (PPG) signals and nonlinear regression algorithms. PPG signals were collected from 219 participants and then subjected to preprocessing and feature extraction steps. From the PPG and its derivative signals, a total of 46 time, frequency, and time-frequency domain features were extracted. In addition, the age and gender of each subject were included as features. Further, correlation-based feature selection (CFS) and ReliefF feature selection techniques were used to select the relevant features and reduce the possibility of over-fitting the models. Finally, support vector regression (SVR), K-nearest neighbour regression (KNR), decision tree regression (DTR), and random forest regression (RFR) were used to develop the BP estimation model. Regression models were trained and evaluated on all features as well as on the selected features. The best regression models for SBP, MAP, and DBP estimation were selected separately. Results The SVR model, along with the ReliefF-based feature selection algorithm, outperforms the other algorithms in estimating SBP, MAP, and DBP, with mean absolute errors of 2.49, 1.62 and 1.43 mmHg, respectively. The proposed method meets the Association for the Advancement of Medical Instrumentation (AAMI) standard for BP estimation. Based on the British Hypertension Society standard, the results also fall within Grade A for SBP, MAP, and DBP. Conclusion The findings show that the method can be used to estimate blood pressure non-invasively, without using a cuff or calibration, and only by utilizing the PPG signal characteristic features.
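The two evaluation measures reported in this abstract, mean absolute error and the British Hypertension Society (BHS) grade, can be sketched as follows. The grading thresholds used here are the commonly cited cumulative percentages of absolute errors within 5, 10, and 15 mmHg (A: 60/85/95, B: 50/75/90, C: 40/65/85); the function names and the readings are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(reference, estimate):
    """Average absolute difference between reference and estimated pressures (mmHg)."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def bhs_grade(reference, estimate):
    """Grade by the cumulative % of absolute errors within 5, 10, and 15 mmHg."""
    errors = [abs(r - e) for r, e in zip(reference, estimate)]
    n = len(errors)
    pct = tuple(100 * sum(err <= limit for err in errors) / n for limit in (5, 10, 15))
    # Minimum cumulative percentages for each grade, checked from best to worst.
    minimums = {"A": (60, 85, 95), "B": (50, 75, 90), "C": (40, 65, 85)}
    for grade, mins in minimums.items():
        if all(p >= m for p, m in zip(pct, mins)):
            return grade, pct
    return "D", pct


# Hypothetical cuff (reference) and PPG-estimated SBP readings; NOT study data.
cuff_sbp = [120, 118, 130, 125]
ppg_sbp = [122, 117, 127, 126]

mae = mean_absolute_error(cuff_sbp, ppg_sbp)
grade, within = bhs_grade(cuff_sbp, ppg_sbp)
print(mae, grade, within)
```

Grading on cumulative percentages rather than on the mean alone penalizes a model that is accurate on average but occasionally far off.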
Affiliation(s)
- Araf Nishan
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - S. M. Taslim Uddin Raju
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Md Imran Hossain
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Safin Ahmed Dipto
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - S. M. Tanvir Uddin
- Department of Electrical and Electronic Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh
| | - Asif Sijan
- Department of Software Engineering, American International University, Dhaka, Bangladesh
| | - Md Abu Shahid Chowdhury
- Department of Biomedical Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Ashfaq Ahmad
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| | - Md Mahamudul Hasan Khan
- Department of Computer Science and Engineering, Khulna University of Engineering & Technology, Khulna - 9203, Bangladesh
| |
|
26
|
Tiwari AK, Saini R, Nath A, Singh P, Shah MA. Hybrid similarity relation based mutual information for feature selection in intuitionistic fuzzy rough framework and its applications. Sci Rep 2024; 14:5958. [PMID: 38472266 PMCID: PMC10933482 DOI: 10.1038/s41598-024-55902-z] [Received: 11/26/2023] [Accepted: 02/28/2024] [Indexed: 03/14/2024] Open
Abstract
Fuzzy rough entropy is established on the notion of fuzzy rough set theory and has been effectively and efficiently applied to feature selection to handle the uncertainty in real-valued datasets. Further, fuzzy rough mutual information has been presented by integrating information entropy with fuzzy rough sets to measure the importance of features. However, none of the methods to date can handle noise, uncertainty and vagueness simultaneously due to both judgement and identification, which degrades the overall performance of the learning algorithms as the number of mixed-valued conditional features increases. In the current study, these issues are tackled by presenting a novel intuitionistic fuzzy (IF) assisted mutual information concept along with an IF granular structure. Initially, a hybrid IF similarity relation is introduced. Based on this relation, an IF granular structure is introduced. Then, IF rough conditional and joint entropies are established. Further, mutual information based on these concepts is discussed. Next, mathematical theorems are proved to demonstrate the validity of the given notions. Thereafter, the significance of a feature subset is computed by using this mutual information, and a corresponding feature selection method is suggested to delete the irrelevant and redundant features. The current approach effectively handles noise and subsequent uncertainty in both nominal and mixed data (including both nominal and category variables). Moreover, comprehensive experiments are carried out on real-valued benchmark datasets to demonstrate the practical validity and effectiveness of the proposed technique. Finally, an application of the proposed method is exhibited to improve the prediction of phospholipidosis-positive molecules. RF(h2o) produces the most effective results to date based on our proposed methodology, with sensitivity, accuracy, specificity, MCC, and AUC of 86.7%, 90.1%, 93.0%, 0.808, and 0.922, respectively.
Affiliation(s)
- Anoop Kumar Tiwari
- Department of Computer Science and Information Technology, Central University of Haryana, Mahendergarh, 123031, India
| | - Rajat Saini
- Department of Mathematics, School of Basic Sciences, Central University of Haryana, Mahendergarh, 123031, India.
| | - Abhigyan Nath
- Department of Biochemistry, Pt. Jawahar Lal Nehru Memorial Medical College, Raipur, 492001, India
| | - Phool Singh
- Department of Mathematics (SoET), Central University of Haryana, Mahendergarh, 123031, India
| | - Mohd Asif Shah
- Department of Economics, Kebri Dehar University, 250, Kebri Dehar, Somali, Ethiopia.
- Centre of Research Impact and Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, 140401, Punjab, India.
- Division of Research and Development, Lovely Professional University, Phagwara, 144001, Punjab, India.
| |
|
27
|
Yang F, Xu Z, Wang H, Sun L, Zhai M, Zhang J. A hybrid feature selection algorithm combining information gain and grouping particle swarm optimization for cancer diagnosis. PLoS One 2024; 19:e0290332. [PMID: 38466662 PMCID: PMC10927139 DOI: 10.1371/journal.pone.0290332] [Received: 06/17/2023] [Accepted: 08/04/2023] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND Cancer diagnosis based on machine learning has become a popular application direction. Support vector machine (SVM), as a classical machine learning algorithm, has been widely used in cancer diagnosis because of its advantages on high-dimensional, small-sample data. However, due to the high-dimensional feature space and high feature redundancy of gene expression data, SVM classifies such data poorly. METHODS To address this, this paper proposes a hybrid feature selection algorithm combining information gain and grouping particle swarm optimization (IG-GPSO). The algorithm first calculates the information gain values of the features and ranks them in descending order by value. Then, the ranked features are grouped according to the information index, so that the features in the group are close, and the features outside the group are sparse. Finally, the grouped features are searched using grouping PSO and evaluated according to in-group and out-group. RESULTS Experimental results show that the average accuracy (ACC) of the SVM on the feature subset selected by the IG-GPSO is 98.50%, which is significantly better than that of the traditional feature selection algorithms. Compared with KNN, the classification effect of the feature subset selected by the IG-GPSO is still optimal. In addition, the results of multiple comparison tests show that the feature selection effect of the IG-GPSO is significantly better than that of traditional feature selection algorithms. CONCLUSION The feature subset selected by IG-GPSO not only has the best classification effect, but also has the smallest feature scale (FS). More importantly, the IG-GPSO significantly improves the ACC of SVM in cancer diagnosis.
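The first stage described in this abstract, scoring features by information gain and ranking them in descending order, can be sketched as below. This is only an illustration of that ranking step under the assumption that features have already been discretized; the grouping and PSO search stages are not reproduced, and the feature names and values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditioned on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(features, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical discretized expression levels and tumour labels; NOT study data.
labels = [1, 1, 0, 0]
features = {
    "gene_a": ["high", "high", "low", "low"],  # separates the classes perfectly
    "gene_b": ["high", "low", "high", "low"],  # carries no class information
}
ranking = rank_by_information_gain(features, labels)
print(ranking)
```

Continuous gene expression values would need binning (or a different estimator) before this discrete formula applies; that choice is outside what the abstract specifies.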
Affiliation(s)
- Fangyuan Yang
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
| | - Zhaozhao Xu
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China
| | - Hong Wang
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
| | - Lisha Sun
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
| | - Mengjiao Zhai
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
| | - Juan Zhang
- Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
| |
|
28
|
Deng F, Zhao L, Yu N, Lin Y, Zhang L. Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer. J Transl Med 2024; 104:100320. [PMID: 38158124 DOI: 10.1016/j.labinv.2023.100320] [Received: 03/26/2023] [Revised: 12/05/2023] [Accepted: 12/20/2023] [Indexed: 01/03/2024] Open
Abstract
Despite the use of machine learning tools, it is challenging to properly model cause-specific deaths in colorectal cancer (CRC) patients and choose appropriate treatments. Here, we propose an interesting feature selection framework, namely union with recursive feature elimination (U-RFE), to select the union feature sets that are crucial in CRC progression-specific mortality using The Cancer Genome Atlas (TCGA) dataset. Based on the union feature sets, we compared the performance of 5 classification algorithms, including logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost), and Stacking, to identify the best model for classifying 4-category deaths. In the first stage of U-RFE, LR, SVM, and RF were used as base estimators to obtain subsets containing the same number of features but not exactly the same specific features. Union analysis of the subsets was then performed to determine the final union feature set, effectively combining the advantages of different algorithms. We found that the U-RFE framework could improve various models' performance. Stacking outperformed LR, SVM, RF, and XGBoost in most scenarios. When the target feature number of the RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved F1_weighted, Recall_weighted, Precision_weighted, Accuracy, and Matthews correlation coefficient of 0.851, 0.864, 0.854, 0.864, and 0.717, respectively. The performance of the minority categories was also significantly improved. Therefore, this recursive feature elimination-based approach of feature selection improves performances of classifying CRC deaths using clinical and omics data or those using other data with high feature redundancy and imbalance.
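The two-step idea in this abstract (run recursive feature elimination once per base estimator, then take the union of the surviving subsets) can be sketched in plain Python. This is a toy illustration, not the authors' implementation: each `importance_fn` is a hypothetical stand-in that returns fixed scores, whereas a real pipeline would refit LR, SVM, or RF on the remaining features at every elimination step.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Repeatedly drop the least important remaining feature until n_keep are left."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # importance of each feature given the current set
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
    return set(remaining)


def union_rfe(features, importance_fns, n_keep):
    """Union of the subsets kept by RFE under each base estimator."""
    selected = set()
    for importance_fn in importance_fns:
        selected |= recursive_feature_elimination(features, importance_fn, n_keep)
    return selected


def table_scorer(table):
    """Hypothetical stand-in for a fitted model: looks importances up in a fixed table."""
    return lambda remaining: {f: table[f] for f in remaining}


# Made-up importance tables standing in for three base estimators.
features = ["f1", "f2", "f3", "f4", "f5"]
lr_like = table_scorer({"f1": 0.9, "f2": 0.7, "f3": 0.1, "f4": 0.2, "f5": 0.05})
svm_like = table_scorer({"f1": 0.8, "f2": 0.1, "f3": 0.6, "f4": 0.2, "f5": 0.05})
rf_like = table_scorer({"f1": 0.7, "f2": 0.6, "f3": 0.2, "f4": 0.1, "f5": 0.3})

union_set = union_rfe(features, [lr_like, svm_like, rf_like], n_keep=2)
print(sorted(union_set))
```

Because the estimators disagree on which features to keep, the union is usually larger than the per-estimator target, which matches the abstract's report of a 298-feature union from a target of 50.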
Affiliation(s)
- Fei Deng
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China.
| | - Lin Zhao
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
| | - Ning Yu
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
| | - Yuxiang Lin
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
| | - Lanjing Zhang
- Department of Biological Sciences, Rutgers University, Newark, New Jersey; Department of Pathology, Princeton Medical Center, Plainsboro, New Jersey; Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey.
| |
|
29
|
Nopour R, Kazemi-Arpanahi H. Developing an intelligent prediction system for successful aging based on artificial neural networks. Int J Prev Med 2024; 15:10. [PMID: 38563039 PMCID: PMC10982733 DOI: 10.4103/ijpvm.ijpvm_47_23] [Received: 02/18/2023] [Accepted: 10/04/2023] [Indexed: 04/04/2024] Open
Abstract
Background Due to the growing number of disabilities in the elderly, attention to this period of life is essential. Few studies have focused on the physical and mental disabilities and disorders affecting the quality of life in elderly people. Successful aging (SA) is related to various factors influencing the life of the elderly. The objective of the current study is therefore to build an intelligent system for SA prediction using artificial neural network (ANN) algorithms, in order to better investigate the factors affecting the life of the elderly and to promote SA. Methods This study was performed on 1156 SA and non-SA cases. We applied a statistical feature reduction method to obtain the best factors predicting SA. Two models of ANNs with 5, 10, 15, and 20 neurons in hidden layers were used for model construction. Finally, the best ANN configuration for predicting SA was obtained using sensitivity, specificity, accuracy, and the cross-entropy loss function. Results The study showed that 25 factors correlated with SA at the statistical level of P < 0.05. Assessing all ANN structures showed that the FF-BP algorithm with the 25-15-1 configuration, with accuracy-train of 0.92, accuracy-test of 0.86, and accuracy-validation of 0.87, achieved the best performance among the ANN algorithms. Conclusions Developing a CDSS for predicting SA has a crucial role in effectively informing the decision making of geriatricians and health care policymakers.
Affiliation(s)
- Raoof Nopour
- Department of Health Information Management, Iran University of Medical Sciences, Tehran, Iran
| | - Hadi Kazemi-Arpanahi
- Department of Health Information Technology, Abadan University of Medical Sciences, Abadan, Iran
| |
|
30
|
Lyu H, Huang H, He J, Zhu S, Hong W, Lai J, Gao T, Shao J, Zhu J, Li Y, Hu S. Task-state skin potential abnormalities can distinguish major depressive disorder and bipolar depression from healthy controls. Transl Psychiatry 2024; 14:110. [PMID: 38395985 PMCID: PMC10891315 DOI: 10.1038/s41398-024-02828-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 02/07/2024] [Accepted: 02/13/2024] [Indexed: 02/25/2024] Open
Abstract
Early detection of bipolar depression (BPD) and major depressive disorder (MDD) has been challenging due to the lack of reliable and easily measurable biological markers. This study aimed to investigate the accuracy of discriminating patients with mood disorders from healthy controls based on task state skin potential characteristics and their correlation with individual indicators of oxidative stress. A total of 77 patients with BPD, 53 patients with MDD, and 79 healthy controls were recruited. A custom-made device, previously shown to be sufficiently accurate, was used to collect skin potential data during six emotion-inducing tasks involving video, pictorial, or textual stimuli. Blood indicators reflecting individual levels of oxidative stress were collected. A discriminant model based on the support vector machine (SVM) algorithm was constructed for discriminant analysis. MDD and BPD patients were found to have abnormal skin potential characteristics on most tasks. The accuracy of the SVM model built with SP features to discriminate MDD patients from healthy controls was 78% (sensitivity 78%, specificity 82%). The SVM model gave an accuracy of 59% (sensitivity 59%, specificity 79%) in classifying BPD patients, MDD patients, and healthy controls into three groups. Significant correlations were also found between oxidative stress indicators in the blood of patients and certain SP features. Patients with depression and bipolar depression have abnormalities in task-state skin potential that partially reflect the pathological mechanism of the illness, and the abnormalities are potential biological markers of affective disorders.
Affiliation(s)
- Hailong Lyu
- Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China
- Brain Research Institute of Zhejiang University, Hangzhou, 310003, China
- Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China
| | - Huimin Huang
- The Third Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325200, China
- Ruian People's Hospital, Wenzhou, 325200, China
| | - Jiadong He
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China
| | - Sheng Zhu
- Department of Psychiatry, The Ruian Fifth People's Hospital, Wenzhou, 325200, China
| | - Wanchu Hong
- Department of Psychiatry, The Ruian Fifth People's Hospital, Wenzhou, 325200, China
| | - Jianbo Lai
- Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China
- Brain Research Institute of Zhejiang University, Hangzhou, 310003, China
- Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China
| | | | - Jiamin Shao
- Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China
- Brain Research Institute of Zhejiang University, Hangzhou, 310003, China
- Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China
| | - Jianfeng Zhu
- Department of Psychiatry, The Ruian Fifth People's Hospital, Wenzhou, 325200, China
| | - Yubo Li
- College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310027, China.
| | - Shaohua Hu
- Department of Psychiatry, The First Affiliated Hospital, Zhejiang University School of Medicine; Key Laboratory of Mental Disorder's Management of Zhejiang Province, Hangzhou, 310003, China.
- Brain Research Institute of Zhejiang University, Hangzhou, 310003, China.
- Zhejiang Engineering Center for Mathematical Mental Health, Hangzhou, 310003, China.
- Ruian People's Hospital, Wenzhou, 325200, China.
| |
|
31
|
Zhou TH, Zhou XX, Ni J, Ma YQ, Xu FY, Fan B, Guan Y, Jiang XA, Lin XQ, Li J, Xia Y, Wang X, Wang Y, Huang WJ, Tu WT, Dong P, Li ZB, Liu SY, Fan L. CT whole lung radiomic nomogram: a potential biomarker for lung function evaluation and identification of COPD. Mil Med Res 2024; 11:14. [PMID: 38374260 PMCID: PMC10877876 DOI: 10.1186/s40779-024-00516-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 01/22/2024] [Indexed: 02/21/2024] Open
Abstract
BACKGROUND Computed tomography (CT) plays a major role in characterizing and quantifying changes in lung structure and function in chronic obstructive pulmonary disease (COPD). This study aimed to explore the performance of CT-based whole-lung radiomics in discriminating COPD patients from non-COPD patients. METHODS This retrospective study was performed on 2785 patients who underwent pulmonary function examination in 5 hospitals and were divided into a non-COPD group and a COPD group. The radiomic features of the whole lung volume were extracted. Least absolute shrinkage and selection operator (LASSO) logistic regression was applied for feature selection and radiomic signature construction. A radiomic nomogram was established by combining the radiomic score and clinical factors. Receiver operating characteristic (ROC) curve analysis and decision curve analysis (DCA) were used to evaluate the predictive performance of the radiomic nomogram in the training, internal validation, and independent external validation cohorts. RESULTS Eighteen radiomic features were collected from the whole lung volume to construct a radiomic model. The areas under the curve (AUCs) of the radiomic model in the training, internal validation, and independent external validation cohorts were 0.888 [95% confidence interval (CI) 0.869-0.906], 0.874 (95%CI 0.844-0.904) and 0.846 (95%CI 0.822-0.870), respectively. All were higher than those of the clinical model (AUCs of 0.732, 0.714, and 0.777, respectively; P < 0.001). DCA demonstrated that the nomogram constructed by combining radiomic score, age, sex, height, and smoking status was superior to the clinical factor model. CONCLUSIONS The intuitive nomogram constructed from CT-based whole-lung radiomics showed good performance and high accuracy in identifying COPD in this multicenter study.
Affiliation(s)
- Tao-Hu Zhou
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- School of Medical Imaging, Shandong Second Medical University, Weifang, 261053, Shandong, China
| | - Xiu-Xiu Zhou
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Jiong Ni
- Department of Radiology, School of Medicine, Tongji Hospital, Tongji University, Shanghai, 200065, China
| | - Yan-Qing Ma
- Department of Radiology, Zhejiang Province People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, 310014, China
| | - Fang-Yi Xu
- Department of Radiology, Sir Run Run Shaw Hospital, Zhejiang, 310018, China
| | - Bing Fan
- Jiangxi Provincial People's Hospital, the First Affiliated Hospital of Nanchang Medical College, Nanchang, 330006, China
| | - Yu Guan
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Xin-Ang Jiang
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Xiao-Qing Lin
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- College of Health Sciences and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
| | - Jie Li
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- College of Health Sciences and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
| | - Yi Xia
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Xiang Wang
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Yun Wang
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Wen-Jun Huang
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
- Department of Radiology, the Second People's Hospital of Deyang, Deyang, 618000, Sichuan, China
| | - Wen-Ting Tu
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Peng Dong
- School of Medical Imaging, Shandong Second Medical University, Weifang, 261053, Shandong, China
| | - Zhao-Bin Li
- Department of Radiation Oncology, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, 200233, China
| | - Shi-Yuan Liu
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China
| | - Li Fan
- Department of Radiology, the Second Affiliated Hospital of Naval Medical University, Shanghai, 200003, China.
| |
|
32
|
Reduwan NH, Abdul Aziz AA, Mohd Razi R, Abdullah ERMF, Mazloom Nezhad SM, Gohain M, Ibrahim N. Application of deep learning and feature selection technique on external root resorption identification on CBCT images. BMC Oral Health 2024; 24:252. [PMID: 38373931 PMCID: PMC10875886 DOI: 10.1186/s12903-024-03910-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Accepted: 01/17/2024] [Indexed: 02/21/2024] Open
Abstract
BACKGROUND Artificial intelligence has been proven to improve the identification of various maxillofacial lesions. The aim of the current study is two-fold: to assess the performance of four deep learning models (DLM) in external root resorption (ERR) identification and to assess the effect of combining feature selection technique (FST) with DLM on their ability in ERR identification. METHODS External root resorption was simulated on 88 extracted premolar teeth using a tungsten bur at different depths (0.5 mm, 1 mm, and 2 mm). All teeth were scanned using a cone beam CT (Carestream Dental, Atlanta, GA). Afterward, training (70%), validation (10%), and test (20%) datasets were established. The performance of four DLMs (Random Forest (RF) + Visual Geometry Group 16 (VGG), RF + EfficientNetB4 (EFNET), Support Vector Machine (SVM) + VGG, and SVM + EFNET) and four hybrid models (DLM + FST: (i) FS + RF + VGG, (ii) FS + RF + EFNET, (iii) FS + SVM + VGG and (iv) FS + SVM + EFNET) was compared. Five performance parameters were assessed: classification accuracy, F1-score, precision, specificity, and error rate. FST algorithms (Boruta and Recursive Feature Selection) were combined with the DLMs to assess their performance. RESULTS RF + VGG exhibited the highest performance in identifying ERR, followed by the other tested models. Similarly, FST combined with RF + VGG outperformed other models with classification accuracy, F1-score, precision, and specificity of 81.9%, weighted accuracy of 83%, and area under the curve (AUC) of 96%. The Kruskal-Wallis test revealed a significant difference (p = 0.008) in the prediction accuracy among the eight DLMs. CONCLUSION In general, all DLMs have similar performance on ERR identification. However, the performance can be improved by combining FST with DLMs.
Affiliation(s)
- Nor Hidayah Reduwan
- Department of Oral and Maxillofacial Clinical Sciences, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
- Centre of Oral and Maxillofacial Diagnostic and Medicine Studies, Faculty of Dentistry, University Teknologi MARA, Sungai Buloh, 47000, Malaysia
| | - Azwatee Abdul Abdul Aziz
- Department of Restorative Dentistry, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
| | - Roziana Mohd Razi
- Department of Pediatric Dentistry and Orthodontic, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
| | - Erma Rahayu Mohd Faizal Abdullah
- Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, 50603, Malaysia.
| | - Seyed Matin Mazloom Nezhad
- Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
| | - Meghna Gohain
- Department of Oral and Maxillofacial Clinical Sciences, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia
| | - Norliza Ibrahim
- Department of Oral and Maxillofacial Clinical Sciences, Faculty of Dentistry, Universiti Malaya, Kuala Lumpur, 50603, Malaysia.
| |
|
33
|
Zhang H, Yin J, Zhou C, Qiu J, Wang J, Lv Q, Luo T. Identification of ipsilateral supraclavicular lymph node metastasis in breast cancer based on LASSO regression with a high penalty factor. Front Oncol 2024; 14:1349315. [PMID: 38371618 PMCID: PMC10869533 DOI: 10.3389/fonc.2024.1349315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 01/19/2024] [Indexed: 02/20/2024] Open
Abstract
Aiming at the problems of small sample size and large feature dimension in the identification of ipsilateral supraclavicular lymph node metastasis status in breast cancer using ultrasound radiomics, an optimized feature combination search algorithm is proposed to construct linear classification models with high interpretability. The genetic algorithm (GA) is used to search for feature combinations within the feature subspace using least absolute shrinkage and selection operator (LASSO) regression. The search is optimized by applying a high penalty to the L1 norm of LASSO to retain excellent features in the crossover operation of the GA. The experimental results show that the linear model constructed using this method outperforms those using the conventional LASSO regression and standard GA. Therefore, this method can be used to build linear models with higher classification performance and more robustness.
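As a rough illustration of why a larger L1 penalty keeps fewer features, the sketch below runs cyclic coordinate descent for LASSO on a tiny hypothetical dataset. It is not the GA-guided search described in the abstract above: the function names (`soft_threshold`, `lasso_coordinate_descent`), the toy design matrix, and the penalty values 0.1 and 0.5 are all assumptions chosen for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Assumes every column of X has at least one non-zero entry.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (norm / n)
    return w


# Hypothetical toy data: three mutually orthogonal features.
# Feature 0 is strong, feature 1 is weak, feature 2 is irrelevant.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, -1.0],
     [-1.0, 1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [2.0 * row[0] + 0.3 * row[1] for row in X]

w_low = lasso_coordinate_descent(X, y, lam=0.1)
w_high = lasso_coordinate_descent(X, y, lam=0.5)

selected_low = [j for j, wj in enumerate(w_low) if wj != 0.0]
selected_high = [j for j, wj in enumerate(w_high) if wj != 0.0]
print("lam=0.1 keeps features", selected_low)
print("lam=0.5 keeps features", selected_high)
```

On this toy data the smaller penalty retains features 0 and 1, while the larger one retains only feature 0, which is the sparsity effect the abstract relies on when it applies a high penalty.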
Affiliation(s)
- Haohan Zhang
- West China Hospital, Sichuan University, Chengdu, China
| | - Jin Yin
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Med-X Center for Informatics, Sichuan University, Chengdu, China
- Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
| | - Chen Zhou
- Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
- Breast Center, West China Hospital, Sichuan University, Chengdu, China
- Clinical Research Center for Breast Diseases, West China Hospital, Sichuan University, Chengdu, China
| | - Jiajun Qiu
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Med-X Center for Informatics, Sichuan University, Chengdu, China
- Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
| | - Junren Wang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Med-X Center for Informatics, Sichuan University, Chengdu, China
| | - Qing Lv
- Division of Breast Surgery, Department of General Surgery, West China Hospital, Sichuan University, Chengdu, China
- Breast Center, West China Hospital, Sichuan University, Chengdu, China
- Clinical Research Center for Breast Diseases, West China Hospital, Sichuan University, Chengdu, China
| | - Ting Luo
- Breast Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Medical Oncology, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
| |
|
34
|
Jenul A, Stokmo HL, Schrunner S, Hjortland GO, Revheim ME, Tomic O. Novel ensemble feature selection techniques applied to high-grade gastroenteropancreatic neuroendocrine neoplasms for the prediction of survival. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 244:107934. [PMID: 38016391 DOI: 10.1016/j.cmpb.2023.107934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 09/05/2023] [Accepted: 11/17/2023] [Indexed: 11/30/2023]
Abstract
BACKGROUND AND OBJECTIVE Determining the most informative features for predicting the overall survival of patients diagnosed with high-grade gastroenteropancreatic neuroendocrine neoplasms is crucial to improve individual treatment plans for patients, as well as the biological understanding of the disease. The main objective of this study is to evaluate the use of modern ensemble feature selection techniques for this purpose with respect to (a) quantitative performance measures such as predictive performance, (b) clinical interpretability, and (c) the effect of integrating prior expert knowledge. METHODS The Repeated Elastic Net Technique for Feature Selection (RENT) and the User-Guided Bayesian Framework for Feature Selection (UBayFS) are recently developed ensemble feature selectors investigated in this work. Both allow the user to identify informative features in datasets with low sample sizes and focus on model interpretability. While RENT is purely data-driven, UBayFS can integrate expert knowledge a priori in the feature selection process. In this work, we compare both feature selectors on a dataset comprising 63 patients and 110 features from multiple sources, including baseline patient characteristics, baseline blood values, tumor histology, imaging, and treatment information. RESULTS Our experiments involve data-driven and expert-driven setups, as well as combinations of both. In a five-fold cross-validated experiment without expert knowledge, our results demonstrate that both feature selectors allow accurate predictions: A reduction from 110 to approximately 20 features (around 82%) delivers near-optimal predictive performances with minor variations according to the choice of the feature selector, the predictive model, and the fold. Thereafter, we use findings from clinical literature as a source of expert knowledge. 
In addition, expert knowledge has a stabilizing effect on the feature set (an increase in stability of approximately 40%), while the impact on predictive performance is limited. CONCLUSIONS The features WHO Performance Status, Albumin, Platelets, Ki-67, Tumor Morphology, Total MTV, Total TLG, and SUVmax are the most stable and predictive features in our study. Overall, this study demonstrated the practical value of feature selection in medical applications not only to improve quantitative performance but also to deliver potentially new insights to experts.
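The "(around 82%)" figure quoted in the abstract above can be checked directly, taking "approximately 20" as exactly 20 retained features (an assumption made only for this check):

```latex
\[
\frac{110 - 20}{110} \;=\; \frac{90}{110} \;\approx\; 0.818 \quad (\text{about } 82\%)
\]
```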
Affiliation(s)
- Anna Jenul
- Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway.
| | - Henning Langen Stokmo
- Department of Nuclear Medicine, Division of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway.
| | - Stefan Schrunner
- Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway.
| | | | - Mona-Elisabeth Revheim
- Department of Nuclear Medicine, Division of Radiology and Nuclear Medicine, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway; The Intervention Centre, Division of Technology and Innovation, Oslo University Hospital, Oslo, Norway.
| | - Oliver Tomic
- Department of Data Science, Norwegian University of Life Sciences, Universitetstunet 3, 1433 Ås, Norway.
| |
|
35
|
Aragones DG, Palomino-Segura M, Sicilia J, Crainiciuc G, Ballesteros I, Sánchez-Cabo F, Hidalgo A, Calvo GF. Variable selection for nonlinear dimensionality reduction of biological datasets through bootstrapping of correlation networks. Comput Biol Med 2024; 168:107827. [PMID: 38086138 DOI: 10.1016/j.compbiomed.2023.107827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/15/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024]
Abstract
Identifying the most relevant variables or features in massive datasets for dimensionality reduction can lead to improved and more informative display, faster computation times, and more explainable models of complex systems. Despite significant advances and available algorithms, this task generally remains challenging, especially in unsupervised settings. In this work, we propose a method that constructs correlation networks using all intervening variables and then selects the most informative ones based on network bootstrapping. The method can be applied in both supervised and unsupervised scenarios. We demonstrate its functionality by applying Uniform Manifold Approximation and Projection for dimensionality reduction to several high-dimensional biological datasets, derived from 4D live imaging recordings of hundreds of morpho-kinetic variables, describing the dynamics of thousands of individual leukocytes at sites of prominent inflammation. We compare our method with other standard ones in the field, such as Principal Component Analysis and Elastic Net, showing that it outperforms them. The proposed method can be employed in a wide range of applications, encompassing data analysis and machine learning.
Affiliation(s)
- David G Aragones
- Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain
| | - Miguel Palomino-Segura
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain; Immunophysiology Research Group, Instituto Universitario de Investigación Biosanitaria de Extremadura (INUBE), Badajoz, Spain; Department of Physiology, Faculty of Sciences, University of Extremadura, Badajoz, Spain
| | - Jon Sicilia
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Georgiana Crainiciuc
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Iván Ballesteros
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Fátima Sánchez-Cabo
- Bioinformatics Unit, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Andrés Hidalgo
- Vascular Biology and Therapeutics Program and Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
| | - Gabriel F Calvo
- Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain.
| |
|
36
|
Ding X, Li Y, Chen S. Maximum margin and global criterion based-recursive feature selection. Neural Netw 2024; 169:597-606. [PMID: 37956576 DOI: 10.1016/j.neunet.2023.10.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 06/19/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]
Abstract
In this research paper, we aim to investigate and address the limitations of recursive feature elimination (RFE) and its variants in high-dimensional feature selection tasks. We identify two main challenges associated with these methods. Firstly, the feature ranking criterion utilized in these approaches is inconsistent with the maximum-margin theory. Secondly, the computation of the criterion is performed locally, lacking the ability to measure the importance of features globally. To overcome these challenges, we propose a novel feature ranking criterion called Maximum Margin and Global (MMG) criterion. This criterion utilizes the classification margin to determine the importance of features and computes it globally, enabling a more accurate assessment of feature importance. Moreover, we introduce an optimal feature subset evaluation algorithm that leverages the MMG criterion to determine the best subset of features. To enhance the efficiency of the proposed algorithms, we provide two alpha seeding strategies that significantly reduce computational costs while maintaining high accuracy. These strategies offer a practical means to expedite the feature selection process. Through extensive experiments conducted on ten benchmark datasets, we demonstrate that our proposed algorithms outperform current state-of-the-art methods. Additionally, the alpha seeding strategies yield significant speedups, further enhancing the efficiency of the feature selection process.
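The baseline that this abstract sets out to improve, recursive feature elimination, can be sketched in a few lines. What follows is the generic procedure with a squared-weight ranking criterion, not the MMG criterion or the alpha seeding strategies proposed in the paper. A least-squares fit by gradient descent stands in for the linear SVM commonly paired with RFE, and the helper names and toy data are hypothetical.

```python
def fit_linear(X, y, lr=0.1, epochs=500):
    """Least-squares linear fit by batch gradient descent.

    Stand-in for the linear SVM usually used with RFE (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(p):
                grad[j] += err * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit on the surviving features and drop the one with the smallest
    squared weight, repeating until n_keep features remain."""
    active = list(range(len(X[0])))  # original indices of surviving features
    eliminated = []                  # original indices in order of removal
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = fit_linear(X_sub, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated


# Hypothetical toy data: feature 0 strong, feature 1 weak, feature 2 irrelevant.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, -1.0],
     [-1.0, 1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [2.0 * row[0] + 0.3 * row[1] for row in X]

kept, eliminated = recursive_feature_elimination(X, y, n_keep=1)
print("kept:", kept, "eliminated (in order):", eliminated)
```

Each ranking here depends only on the weights of the current refit, which is one way to read the abstract's remark that the usual criterion is computed locally.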
Affiliation(s)
- Xiaojian Ding
- College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China.
| | - Yi Li
- College of Economics and Management, Nanjing Agricultural University, Nanjing 210095, China
| | - Shilin Chen
- Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, Nanjing 221005, China
| |
|
37
|
Williams TL, Gonen M, Wray R, Do RKG, Simpson AL. Quantitation of Oncologic Image Features for Radiomic Analyses in PET. Methods Mol Biol 2024; 2729:409-421. [PMID: 38006509 DOI: 10.1007/978-1-0716-3499-8_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2023]
Abstract
Radiomics is an emerging and exciting field of study involving the extraction of many quantitative features from radiographic images. Positron emission tomography (PET) images are used in cancer diagnosis and staging. Utilizing radiomics on PET images can better quantify the spatial relationships between image voxels and generate more consistent and accurate results for diagnosis, prognosis, treatment, etc. This chapter gives the general steps a researcher would take to extract PET radiomic features from medical images and properly develop models to implement.
Affiliation(s)
- Travis L Williams
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Mithat Gonen
- Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Rick Wray
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Richard K G Do
- Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Amber L Simpson
- School of Computing and Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada.
| |
|
38
|
Arian R, Vard A, Kafieh R, Plonka G, Rabbani H. A new convolutional neural network based on combination of circlets and wavelets for macular OCT classification. Sci Rep 2023; 13:22582. [PMID: 38114582 PMCID: PMC10730902 DOI: 10.1038/s41598-023-50164-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 12/15/2023] [Indexed: 12/21/2023] Open
Abstract
Artificial intelligence (AI) algorithms, encompassing machine learning and deep learning, can assist ophthalmologists in early detection of various ocular abnormalities through the analysis of retinal optical coherence tomography (OCT) images. Despite considerable progress in these algorithms, several limitations persist in medical imaging fields, where a lack of data is a common issue. Accordingly, specific image processing techniques, such as time-frequency transforms, can be employed in conjunction with AI algorithms to enhance diagnostic accuracy. This research investigates the influence of non-data-adaptive time-frequency transforms, specifically X-lets, on the classification of OCT B-scans. For this purpose, each B-scan was transformed using every considered X-let individually, and all the sub-bands were utilized as the input for a designed 2D Convolutional Neural Network (CNN) to extract optimal features, which were subsequently fed to the classifiers. Evaluating per-class accuracy shows that the use of the 2D Discrete Wavelet Transform (2D-DWT) yields superior outcomes for normal cases, whereas the circlet transform outperforms other X-lets for abnormal cases characterized by circles in their retinal structure (due to the accumulation of fluid). As a result, we propose a novel transform named CircWave by concatenating all sub-bands from the 2D-DWT and the circlet transform. The objective is to enhance the per-class accuracy of both normal and abnormal cases simultaneously. Our findings show that classification results based on the CircWave transform outperform those derived from original images or any individual transform. Furthermore, Grad-CAM class activation visualization for B-scans reconstructed from CircWave sub-bands highlights a greater emphasis on circular formations in abnormal cases and straight lines in normal cases, in contrast to the focus on irrelevant regions in original B-scans. 
To assess the generalizability of our method, we applied it to another dataset obtained from a different imaging system. We achieved promising accuracies of 94.5% and 90% for the first and second datasets, respectively, which are comparable with results from previous studies. The proposed CNN based on CircWave sub-bands (i.e. CircWaveNet) not only produces superior outcomes but also offers more interpretable results with a heightened focus on features crucial for ophthalmologists.
Affiliation(s)
- Roya Arian
- Department of Bioelectrics and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran
| | - Alireza Vard
- Department of Bioelectrics and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran
| | - Rahele Kafieh
- Department of Engineering, Durham University, South Road, Durham, UK
| | - Gerlind Plonka
- Institute for Numerical and Applied Mathematics, University of Göttingen, Lotzestr. 16-18, 37083, Göttingen, Germany
- Hossein Rabbani
- Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, 81746-73461, Iran.
39
Augustine J, Jereesh AS. Identification of gene-level methylation for disease prediction. Interdiscip Sci 2023; 15:678-695. [PMID: 37603212 DOI: 10.1007/s12539-023-00584-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 07/30/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023]
Abstract
DNA methylation is an epigenetic alteration that plays a fundamental part in governing gene regulatory processes. The DNA methylation mechanism affixes methyl groups to distinct cytosine residues, influencing chromatin architectures. Multiple studies have demonstrated that DNA methylation's regulatory effect on genes is linked to the onset and progression of several disorders. Researchers have recently uncovered thousands of phenotype-related methylation sites through epigenome-wide association studies (EWAS). However, combining the methylation levels of several sites within a gene to determine gene-level DNA methylation remains challenging. In this study, we propose the supervised UMAP Assisted Gene-level Methylation method (sUAGM) for disease prediction, based on supervised UMAP (Uniform Manifold Approximation and Projection), a manifold learning-based method for reducing dimensionality. The gene-level methylation values generated by the proposed method are evaluated by applying various feature selection and classification algorithms to three distinct DNA methylation datasets derived from blood samples. Performance is assessed using classification accuracy, F1-score, Matthews Correlation Coefficient (MCC), Kappa, Classification Success Index (CSI), and Jaccard Index. The Support Vector Machine with linear kernel (SVML) classifier with Recursive Feature Elimination (RFE) performs best across all three datasets. In comparative analysis, our method outperformed existing gene-level and site-level approaches, achieving 100% accuracy and F1-score with fewer genes. Functional analysis of the top 28 genes selected from the Parkinson's disease dataset revealed a significant association with the disease.
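Several of the evaluation metrics named in this abstract can be computed directly from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the counts in the usage example are invented for illustration and are not results from the study. CSI is omitted because the abstract does not define it.

```python
from math import sqrt

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, F1, Jaccard, Cohen's kappa and MCC from confusion counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "jaccard": jaccard,
            "kappa": kappa, "mcc": mcc}

# Invented counts, for illustration only.
for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
    print(f"{name}: {value:.3f}")
```

MCC and kappa both account for class imbalance, which is one reason they are commonly reported next to accuracy when case and control groups differ in size.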
Affiliation(s)
- Jisha Augustine
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Cochin, Kerala, 682022, India.
- A S Jereesh
- Bioinformatics Lab, Department of Computer Science, Cochin University of Science and Technology, Cochin, Kerala, 682022, India
40
Chatterjee A, Pahari N, Prinz A, Riegler M. AI and semantic ontology for personalized activity eCoaching in healthy lifestyle recommendations: a meta-heuristic approach. BMC Med Inform Decis Mak 2023; 23:278. [PMID: 38041041 PMCID: PMC10693173 DOI: 10.1186/s12911-023-02364-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 11/03/2023] [Indexed: 12/03/2023] Open
Abstract
BACKGROUND Automated coaches (eCoach) can help people lead a healthy lifestyle (e.g., reduction of sedentary bouts) with continuous health status monitoring and personalized recommendation generation with artificial intelligence (AI). Semantic ontology can play a crucial role in knowledge representation, data integration, and information retrieval. METHODS This study proposes a semantic ontology model to annotate the AI predictions, forecasting outcomes, and personal preferences to conceptualize a personalized recommendation generation model with a hybrid approach. This study considers a mixed activity projection method that takes individual activity insights from the univariate time-series prediction and ensemble multi-class classification approaches. We have introduced a way to improve the prediction result with a residual error minimization (REM) technique and make it meaningful in recommendation presentation with a Naïve-based interval prediction approach. We have integrated the activity prediction results in an ontology for semantic interpretation. Queries in the SPARQL Protocol and RDF Query Language (SPARQL) have generated personalized recommendations in an understandable format. Moreover, we have evaluated the performance of the time-series prediction and classification models against standard metrics on both imbalanced and balanced public PMData and private MOX2-5 activity datasets. We have used Adaptive Synthetic (ADASYN) sampling to generate synthetic data from the minority classes to avoid bias. The activity datasets were collected from healthy adults (n = 16 for public datasets; n = 15 for private datasets). The standard ensemble algorithms have been used to investigate the possibility of classifying daily physical activity levels into the following activity classes: sedentary (0), low active (1), active (2), highly active (3), and rigorous active (4). 
The daily step count, low physical activity (LPA), medium physical activity (MPA), and vigorous physical activity (VPA) serve as input for the classification models. Subsequently, we re-verify the classifiers on the private MOX2-5 dataset. The performance of the ontology has been assessed with reasoning and SPARQL query execution time. Additionally, we have verified our ontology for effective recommendation generation. RESULTS We have tested several standard AI algorithms and selected the best-performing model with optimized configuration for our use case by empirical testing. We have found that the autoregression model with the REM method outperforms the autoregression model without the REM method for both datasets. Gradient Boost (GB) classifier outperforms other classifiers with a mean accuracy score of 98.00%, and 99.00% for imbalanced PMData and MOX2-5 datasets, respectively, and 98.30%, and 99.80% for balanced PMData and MOX2-5 datasets, respectively. Hermit reasoner performs better than other ontology reasoners under defined settings. Our proposed algorithm shows a direction to combine the AI prediction forecasting results in an ontology to generate personalized activity recommendations in eCoaching. CONCLUSION The proposed method combining step-prediction, activity-level classification techniques, and personal preference information with semantic rules is an asset for generating personalized recommendations.
Affiliation(s)
- Ayan Chatterjee
- Department of Information and Communication Technology, Centre for E-Health, University of Agder, Grimstad, Norway.
- Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering (SimulaMet), Oslo, Norway.
- Nibedita Pahari
- Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
- Andreas Prinz
- Department of Information and Communication Technology, Centre for E-Health, University of Agder, Grimstad, Norway
- Michael Riegler
- Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering (SimulaMet), Oslo, Norway
41
Huang MW, Tsai CF, Tsui SC, Lin WC. Combining data discretization and missing value imputation for incomplete medical datasets. PLoS One 2023; 18:e0295032. [PMID: 38033140 PMCID: PMC10688879 DOI: 10.1371/journal.pone.0295032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Accepted: 11/14/2023] [Indexed: 12/02/2023] Open
Abstract
Data discretization aims to transform a set of continuous features into discrete features, thus simplifying the representation of information and making it easier to understand, use, and explain. In practice, users can take advantage of the discretization process to improve knowledge discovery and data analysis on medical domain problem datasets containing continuous features. However, certain feature values are frequently missing, and many data-mining algorithms cannot handle incomplete datasets. In this study, we considered the use of both discretization and missing-value imputation to process incomplete medical datasets, examining how the order in which discretization and missing-value imputation are combined influences performance. The experiments used seven different medical domain problem datasets; two discretizers, the minimum description length principle (MDLP) and ChiMerge; three imputation methods, mean/mode, classification and regression tree (CART), and k-nearest neighbor (KNN); and two classifiers, support vector machines (SVM) and the C4.5 decision tree. The results show that better performance can be obtained by performing discretization first and then imputation, rather than vice versa. Furthermore, the highest classification accuracy rate was achieved by combining ChiMerge and KNN with SVM.
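The two processing orders compared in this abstract can be made concrete with a small sketch. It substitutes equal-width binning for MDLP and ChiMerge, and mode/mean filling for the CART and KNN imputers, purely to keep the example short; the function names and the toy feature are assumptions, not the study's implementation.

```python
from collections import Counter

def equal_width_bins(values, n_bins):
    """Map observed values to bin indices 0..n_bins-1; None stays None."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [None if v is None else min(int((v - lo) / width), n_bins - 1)
            for v in values]

def impute_mode(labels):
    """Fill missing discrete labels with the most frequent observed label."""
    mode = Counter(l for l in labels if l is not None).most_common(1)[0][0]
    return [mode if l is None else l for l in labels]

def impute_mean(values):
    """Fill missing continuous values with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Toy feature with one missing value (illustrative data only).
feature = [1.0, 1.2, 1.1, None, 9.0, 10.0]

discretize_then_impute = impute_mode(equal_width_bins(feature, 3))
impute_then_discretize = equal_width_bins(impute_mean(feature), 3)
print(discretize_then_impute)
print(impute_then_discretize)
```

On this toy feature the two orders assign the missing entry to different bins, which shows why the order can affect a downstream classifier. It does not by itself say which order is better.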
Affiliation(s)
- Min-Wei Huang
- Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung, Taiwan
- Department of Physical Therapy and Graduate Institute of Rehabilitation Science, China Medical University, Taichung, Taiwan
- Chih-Fong Tsai
- Department of Information Management, National Central University, Taoyuan, Taiwan
- Shu-Ching Tsui
- Department of Information Management, National Central University, Taoyuan, Taiwan
- Wei-Chao Lin
- Department of Digital Financial Technology, Chang Gung University, Taoyuan, Taiwan
- Department of Information Management, Chang Gung University, Taoyuan, Taiwan
- Division of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
42
van Dartel D, Wang Y, Hegeman JH, Vollenbroek-Hutten MMR. Prediction of Physical Activity Patterns in Older Patients Rehabilitating After Hip Fracture Surgery: Exploratory Study. JMIR Rehabil Assist Technol 2023; 10:e45307. [PMID: 38032703 PMCID: PMC10727481 DOI: 10.2196/45307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 06/25/2023] [Accepted: 07/27/2023] [Indexed: 12/01/2023] Open
Abstract
BACKGROUND Building up physical activity is a highly important aspect in an older patient's rehabilitation process after hip fracture surgery. The patterns of physical activity during rehabilitation are associated with the duration of rehabilitation stay. Predicting physical activity patterns early in the rehabilitation phase can provide patients and health care professionals an early indication of the duration of rehabilitation stay as well as insight into the degree of patients' recovery for timely adaptive interventions. OBJECTIVE This study aims to explore the early prediction of physical activity patterns in older patients rehabilitating after hip fracture surgery at a skilled nursing home. METHODS The physical activity of patients aged ≥70 years with surgically treated hip fracture was continuously monitored using an accelerometer during rehabilitation at a skilled nursing home. Physical activity patterns were described in our previous study, and the 2 most common patterns were used in this study for pattern prediction: the upward linear pattern (n=15) and the S-shape pattern (n=23). Features from the intensity of physical activity were calculated for time windows with different window sizes of the first 5, 6, 7, and 8 days to assess the early rehabilitation moment in which the patterns could be predicted most accurately. Those features were statistical features, amplitude features, and morphological features. Furthermore, the Barthel Index, Fracture Mobility Score, Functional Ambulation Categories, and the Montreal Cognitive Assessment score were used as clinical features. With the correlation-based feature selection method, relevant features were selected that were highly correlated with the physical activity patterns and uncorrelated with other features. Multiple classifiers were used: decision trees, discriminant analysis, logistic regression, support vector machines, nearest neighbors, and ensemble classifiers. 
The performance of the prediction models was assessed by calculating precision, recall, and F1-score (accuracy measure) for each individual physical activity pattern. Furthermore, the overall performance of the prediction model was calculated by calculating the F1-score for all physical activity patterns together. RESULTS The amplitude feature describing the overall intensity of physical activity on the first day of rehabilitation and the morphological features describing the shape of the patterns were selected as relevant features for all time windows. Relevant features extracted from the first 7 days with a cosine k-nearest neighbor model reached the highest overall prediction performance (micro F1-score=1) and a 100% correct classification of the 2 most common physical activity patterns. CONCLUSIONS Continuous monitoring of the physical activity of older patients in the first week of hip fracture rehabilitation results in an early physical activity pattern prediction. In the future, continuous physical activity monitoring can offer the possibility to predict the duration of rehabilitation stay, assess the recovery progress during hip fracture rehabilitation, and benefit health care organizations, health care professionals, and patients themselves.
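A cosine k-nearest neighbor classifier and the micro F1-score mentioned in this abstract can be sketched as follows. The feature vectors and labels are toy values chosen for illustration, and the code assumes non-zero feature vectors; it is not the study's model or data.

```python
from collections import Counter
from math import sqrt

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest in cosine distance."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: cosine_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label classification.

    Every wrong prediction is one false positive (for the predicted class)
    and one false negative (for the true class), so micro F1 equals accuracy.
    """
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    fp = fn = len(y_true) - tp
    return 2 * tp / (2 * tp + fp + fn)

# Toy activity-feature vectors (illustrative only).
train_x = [[1.0, 2.0, 3.0], [2.0, 4.1, 6.0], [1.0, 1.0, 3.0], [2.0, 2.1, 6.2]]
train_y = ["linear", "linear", "s-shape", "s-shape"]
queries = [[3.0, 6.0, 9.0], [1.0, 1.0, 3.1]]
predictions = [knn_predict(train_x, train_y, q, k=3) for q in queries]
print(predictions, micro_f1(["linear", "s-shape"], predictions))
```

Because micro F1 equals accuracy in the single-label case, a micro F1-score of 1 corresponds to every pattern being classified correctly.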
Affiliation(s)
- Dieuwke van Dartel
- Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
- Department of Trauma Surgery, Ziekenhuisgroep Twente, Almelo, Netherlands
- Ying Wang
- Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
- Ziekenhuisgroep Twente Academy, Ziekenhuisgroep Twente, Almelo, Netherlands
- Johannes H Hegeman
- Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
- Department of Trauma Surgery, Ziekenhuisgroep Twente, Almelo, Netherlands
- Miriam M R Vollenbroek-Hutten
- Department of Biomedical Signals and Systems, University of Twente, Enschede, Netherlands
- Board of Directors, Medisch Spectrum Twente, Enschede, Netherlands
43
Alghushairy O, Ali F, Alghamdi W, Khalid M, Alsini R, Asiry O. Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting. J Biomol Struct Dyn 2023:1-12. [PMID: 37850427 DOI: 10.1080/07391102.2023.2269280] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/09/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The identification of druggable proteins (DPs) is significant for the development of new drugs, personalized medicine, understanding of disease mechanisms, drug repurposing, and economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are extracted by composition, transition, and distribution (CTD), composition of K-spaced amino acid pair (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor, called multi-block pseudo amino acid composition (MB-PseAAC). The feature vectors of CTD, CKSAAP, PsePSSM, and MB-PseAAC are integrated, and sequential forward selection is used as the feature selection algorithm. The selected features are provided to random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB). The predictive performance of these learning methods is measured via 10-fold cross-validation. The LXGB-based model achieves higher results than other existing predictors. Our novel protocol could play an active role in designing novel drugs and in exploring potential targets. This study will help to capture a more universal view of a potential target. Communicated by Ramaswamy H. Sarma.
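Of the descriptors listed in this abstract, CKSAAP is the simplest to show in code: it counts residue pairs separated by k positions and normalizes the counts. The sketch below follows the commonly used definition and normalizes by the number of counted pairs; the gap values and normalization used in the study are not given in the abstract, so both are assumptions here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def cksaap(sequence, k):
    """Composition of k-spaced amino acid pairs for a single gap size k.

    Returns a dict with 400 entries, one per ordered residue pair, holding
    the fraction of counted pairs (x, y) where y follows x with k residues
    in between. Pairs containing non-standard residues are skipped.
    """
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - k - 1):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {pair: (c / total if total else 0.0) for pair, c in counts.items()}

# Toy sequence (illustrative only); concatenate k = 0..2 into one vector.
sequence = "MKTAYIAKQR"
vector = [value for k in range(3) for value in cksaap(sequence, k).values()]
print(len(vector))  # 3 gap sizes x 400 pairs
```

Concatenating several gap sizes is what makes the descriptor high-dimensional, which is one motivation for a feature selection step such as sequential forward selection afterwards.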
Affiliation(s)
- Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Othman Asiry
- Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
44
de Lima Gonçalves V, Ribeiro CT, Cavalheiro GL, Zaruz MJF, da Silva DH, Milagre ST, de Oliveira Andrade A, Pereira AA. A hybrid linear discriminant analysis and genetic algorithm to create a linear model of aging when performing motor tasks through inertial sensors positioned on the hand and forearm. Biomed Eng Online 2023; 22:98. [PMID: 37845723 PMCID: PMC10580547 DOI: 10.1186/s12938-023-01161-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 10/01/2023] [Indexed: 10/18/2023] Open
Abstract
BACKGROUND During the aging process, cognitive functions and the performance of the muscular and neural systems show signs of decline, making the elderly more susceptible to disease and death. These alterations, which occur with advanced age, affect functional performance in both the lower and upper limbs, and consequently human motor functions. Objective measurements are important tools to help understand and characterize the dysfunctions and limitations that occur due to neuromuscular changes related to advancing age. Therefore, the objective of this study is to verify the difference between groups of young and old individuals through manual movements, and whether a combination of features can produce a linear correlation across the different age groups. METHODS This study included 99 participants, divided into 8 groups by age. Data were collected using inertial sensors (positioned on the back of the hand and on the back of the forearm). First, the participants were divided into young and elderly groups to verify whether the groups could be distinguished through the features alone. Following this, the features were combined using linear discriminant analysis (LDA), which gave rise to a single feature called the LDA-value, used to verify the correlation between the different age ranges and the LDA-value. RESULTS The results demonstrated that 125 features are able to distinguish between the groups of young and elderly individuals. The LDA-value yields a linear model of the changes in task performance that occur with advancing age; the correlation obtained, using Pearson's coefficient, was 0.86. CONCLUSION When only the young and elderly groups are compared, the results indicate that there is a difference in the way tasks are performed between young and elderly individuals. 
When the 8 groups were analyzed, the linear correlation obtained was strong, with the LDA-value being effective in obtaining a linear correlation across the eight groups. This demonstrates that although the features alone do not show gradual changes as a function of age, their combination does.
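The correlation step reported in this abstract uses Pearson's coefficient between age group and the combined LDA-value. A minimal sketch of that coefficient is given below; the group indices and LDA-values are hypothetical numbers invented for illustration, not data from the study, and the LDA projection itself is not reproduced.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical mean LDA-value per age group (groups 1..8), illustration only.
age_group = [1, 2, 3, 4, 5, 6, 7, 8]
lda_value = [-2.1, -1.4, -1.6, -0.3, 0.2, 0.9, 0.7, 1.8]
print(f"r = {pearson_r(age_group, lda_value):.2f}")
```

A coefficient near 1 indicates that the combined value rises roughly linearly with age group, which is the property the abstract reports for the LDA-value.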
Affiliation(s)
- Veronica de Lima Gonçalves
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Caio Tonus Ribeiro
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Guilherme Lopes Cavalheiro
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Maria José Ferreira Zaruz
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Daniel Hilário da Silva
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Selma Terezinha Milagre
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Adriano de Oliveira Andrade
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil
- Adriano Alves Pereira
- Postgraduate Program in Electrical and Biomedical Engineering, Faculty of Electrical Engineering, Centre for Innovation and Technology Assessment in Health, Federal University of Uberlândia, Uberlândia, Brazil.
45
Luo KH, Wu CH, Yang CC, Chen TH, Tu HP, Yang CH, Chuang HY. Exploring the association of metal mixture in blood to the kidney function and tumor necrosis factor alpha using machine learning methods. Ecotoxicology and Environmental Safety 2023; 265:115528. [PMID: 37783110 DOI: 10.1016/j.ecoenv.2023.115528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 09/09/2023] [Accepted: 09/25/2023] [Indexed: 10/04/2023]
Abstract
This research aimed to explore the relationships between the metal mixture in blood and kidney function and tumor necrosis factor alpha (TNF-α) using machine learning. Metal levels were measured by Inductively Coupled Plasma Mass Spectrometry in blood from 421 participants. We applied K Nearest Neighbor (KNN), Naive Bayes classifier (NB), Support Vector Machines (SVM), random forest (RF), Gradient Boosting Decision Tree (GBDT), Categorical boosting (CatBoost), eXtreme Gradient Boosting (XGBoost), and Whale Optimization-based XGBoost (WXGBoost) to identify the effect of plasma metals on TNF-α and estimated glomerular filtration rate (eGFR by CKD-EPI equation). We included not only the toxic metals lead (Pb), arsenic (As), and cadmium (Cd) but also the essential trace metals selenium (Se), copper (Cu), zinc (Zn), and cobalt (Co) to predict the interaction of TNF-α, TNF-α/white blood count, and eGFR. Higher average TNF-α levels were observed among subjects with higher Pb, As, Cd, Cu, and Zn levels in blood. Blood Se and Co levels did not differ between the low and high TNF-α level groups. The group with lower eGFR had high Pb, As, Cd, Co, Cu, and Zn levels. Among the metals, the most important predictor of TNF-α level was blood Pb, followed by Cd, As, Cu, Se, Zn, and Co. After feature selection, machine learning identified As as the major predictor of eGFR. The levels of kidney function and TNF-α were modified by co-exposure to metals. We achieved a highest accuracy of over 85% in the multi-metal exposure model. Higher Pb and Zn levels had the strongest interaction with declined eGFR. In addition, As and Cd acted synergistically in the prediction model of TNF-α. We explored the potential of machine learning approaches for predicting health outcomes with multi-metal exposure. The XGBoost model combined with SHAP could give an explicit explanation of individualized, precise risk prediction and insight into the interaction of key features in multi-metal exposure.
Affiliation(s)
- Kuei-Hau Luo
- Graduate Institute of Medicine, College of Medicine, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan
- Chih-Hsien Wu
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan
- Chen-Cheng Yang
- Graduate Institute of Medicine, College of Medicine, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan; Department of Occupational Medicine, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung 812, Taiwan
- Tzu-Hua Chen
- Department of Family Medicine, Kaohsiung Municipal Ta-Tung Hospital, Kaohsiung 801, Taiwan
- Hung-Pin Tu
- Department of Public Health and Environmental Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan; Department of Information Management, Tainan University of Technology, Tainan 71002, Taiwan; Drug Development and Value Creation Research Center, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; Ph. D. Program in Biomedical Engineering, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; School of Dentistry, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Hung-Yi Chuang
- Graduate Institute of Medicine, College of Medicine, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan; Department of Public Health and Environmental Medicine, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan; Department of Occupational and Environmental Medicine, Kaohsiung Medicine University Hospital, Kaohsiung Medicine University, Kaohsiung City 807, Taiwan; Ph.D. Program in Environmental and Occupational Medicine, and Research Center for Precision Environmental Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807, Taiwan.
46
Pacheco J, Saiz O, Casado S, Ubillos S. A multistart tabu search-based method for feature selection in medical applications. Sci Rep 2023; 13:17140. [PMID: 37816874 PMCID: PMC10564765 DOI: 10.1038/s41598-023-44437-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/08/2023] [Indexed: 10/12/2023] Open
Abstract
In the design of classification models, irrelevant or noisy features are often generated. In some cases, there may even be negative interactions among features. These weaknesses can degrade the performance of the models. Feature selection is a task that searches for a small subset of relevant features from the original set that generates the most efficient models possible. In addition to improving the efficiency of the models, feature selection confers other advantages, such as greater ease in the generation of the necessary data as well as clearer and more interpretable models. In the case of medical applications, feature selection may help to distinguish which characteristics, habits, and factors have the greatest impact on the onset of diseases. However, feature selection is a complex task due to the large number of possible solutions. In the last few years, methods based on different metaheuristic strategies, mainly evolutionary algorithms, have been proposed. The motivation of this work is to develop a method that outperforms previous methods, with the benefits that this implies especially in the medical field. More precisely, the present study proposes a simple method based on tabu search and multistart techniques. The proposed method was analyzed and compared to other methods by testing their performance on several medical databases. Specifically, eight databases from the well-known repository of the University of California, Irvine, and one of our own design were used. In these computational tests, the proposed method outperformed other recent methods as gauged by various metrics and classifiers. The analyses were accompanied by statistical tests, which showed that the superiority of our method is significant and therefore strengthen these conclusions. 
In short, the contribution of this work is the development of a method that, on the one hand, is based on different strategies than those used in recent methods, and on the other hand, improves the performance of these methods.
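The general idea of combining tabu search with multiple starts for feature selection can be sketched as below. This is a generic textbook-style sketch, not the authors' algorithm: the single-flip neighborhood, tabu tenure, aspiration rule, and the toy scoring function `toy_score` are all assumptions made for illustration. In practice the score would be a classifier's validation performance on the selected features.

```python
import random

def tabu_search(score, n_features, start, iterations=50, tenure=3):
    """Single-flip tabu search over feature subsets (higher score is better)."""
    current = set(start)
    best, best_score = set(current), score(current)
    tabu_until = {}  # feature index -> last iteration at which it stays tabu
    for it in range(iterations):
        candidates = []
        for f in range(n_features):
            neighbor = current ^ {f}  # add f if absent, remove it if present
            s = score(neighbor)
            # Aspiration: a tabu move is allowed only if it beats the best.
            if tabu_until.get(f, -1) >= it and s <= best_score:
                continue
            candidates.append((s, f, neighbor))
        if not candidates:
            continue
        # Always move to the best admissible neighbor, even if it is worse.
        s, f, neighbor = max(candidates, key=lambda c: (c[0], -c[1]))
        current = neighbor
        tabu_until[f] = it + tenure
        if s > best_score:
            best, best_score = set(neighbor), s
    return best, best_score

def multistart_tabu(score, n_features, starts=5, seed=0, **kwargs):
    """Run tabu search from several random subsets and keep the best result."""
    rng = random.Random(seed)
    results = []
    for _ in range(starts):
        start = {f for f in range(n_features) if rng.random() < 0.5}
        results.append(tabu_search(score, n_features, start, **kwargs))
    return max(results, key=lambda r: r[1])

# Toy objective (hypothetical): reward relevant features, penalize subset size.
RELEVANT = {0, 2, 5}

def toy_score(subset):
    return len(subset & RELEVANT) - 0.1 * len(subset)

best_subset, best_value = multistart_tabu(toy_score, n_features=8, starts=3)
print(sorted(best_subset), round(best_value, 2))
```

The tabu list stops the search from immediately undoing a recent flip, which lets it leave local optima, while the restarts reduce dependence on any single initial subset.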
47
Lee H, Lee Y, Jo M, Nam S, Jo J, Lee C. Enhancing Diagnosis of Rotating Elements in Roll-to-Roll Manufacturing Systems through Feature Selection Approach Considering Overlapping Data Density and Distance Analysis. Sensors (Basel, Switzerland) 2023; 23:7857. [PMID: 37765913 PMCID: PMC10534779 DOI: 10.3390/s23187857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/01/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Roll-to-roll manufacturing systems, which use thin and flexible substrates, have been widely adopted for their cost-effectiveness, eco-friendliness, and mass-production capabilities. However, in these systems, defects in rotating components such as rollers and bearings can result in severe defects in the functional layers. Therefore, the development of an intelligent diagnostic model is crucial for effectively identifying these rotating component defects. In this study, a quantitative feature-selection method, feature partial density, is proposed for developing high-efficiency diagnostic models. The feature combinations extracted from the measured signals were evaluated based on the partial density, which is the density of the remaining data excluding the highest class in overlapping regions, and the Mahalanobis distance by class, to assess the classification performance of the models. The validity of the proposed algorithm was verified through the construction of ranked model groups and comparison with existing feature-selection methods. The high-ranking group selected by the algorithm outperformed the other groups in terms of training time, accuracy, and positive predictive value. Moreover, the top feature combination demonstrated superior performance across all indicators compared to existing methods.
Affiliation(s)
- Haemi Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Yoonjae Lee
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
- Minho Jo
- Department of Mechanical Design and Production Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
| | - Sanghoon Nam
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Jeongdai Jo
- Department of Printed Electronics, Korea Institute of Machinery and Materials, 156, Gajeongbuk-ro, Yuseong-gu, Daejeon 34103, Republic of Korea
| | - Changwoo Lee
- Department of Mechanical and Aerospace Engineering, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05030, Republic of Korea
| |
Collapse
|
48
|
Guo B, Liu H, Niu L. Integration of natural and deep artificial cognitive models in medical images: BERT-based NER and relation extraction for electronic medical records. Front Neurosci 2023; 17:1266771. [PMID: 37732304 PMCID: PMC10507183 DOI: 10.3389/fnins.2023.1266771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023] Open
Abstract
Introduction: Medical images and signals are important data sources in the medical field, and they contain key information such as patients' physiology, pathology, and genetics. However, the complexity and diversity of medical images and signals make medical knowledge acquisition and decision support difficult. Methods: To solve this problem, this paper proposes an end-to-end framework based on BERT for NER and RE tasks in electronic medical records. Our framework first integrates the NER and RE tasks into a unified model processed end to end, which removes the limitations and error propagation of the multiple independent steps used in traditional methods. Second, by pre-training and fine-tuning the BERT model on large-scale electronic medical record data, we enable the model to obtain rich semantic representation capabilities that adapt to the needs of medical fields and tasks. Finally, through multi-task learning, we enable the model to make full use of the correlation and complementarity between the NER and RE tasks, and improve the generalization ability and effectiveness of the model on different datasets. Results and discussion: We conduct an experimental evaluation on four electronic medical record datasets, and the model significantly outperforms other methods on different datasets in the NER task. In the RE task, the EMLB model also achieved advantages on different datasets; in the multi-task learning mode in particular, its performance improved significantly, and the ETE and MTL modules performed well in terms of comprehensive precision and recall. Our research provides an innovative solution for medical image and signal data.
Affiliation(s)
- Bo Guo
  - School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
  - Department of Computing, Faculty of Communication, Visual Art and Computing, Universiti Selangor, Bestari Jaya, Selangor, Malaysia
- Huaming Liu
  - School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Lei Niu
  - School of Computer and Information Engineering, Fuyang Normal University, Fuyang, China

49
Mahmoud AY, Neagu D, Scrimieri D, Abdullatif ARA. Early diagnosis and personalised treatment focusing on synthetic data modelling: Novel visual learning approach in healthcare. Comput Biol Med 2023; 164:107295. [PMID: 37557053 DOI: 10.1016/j.compbiomed.2023.107295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 07/26/2023] [Accepted: 07/28/2023] [Indexed: 08/11/2023]
Abstract
The early diagnosis and personalised treatment of diseases are facilitated by machine learning. The quality of data has an impact on diagnosis because medical data are usually sparse, imbalanced, and contain irrelevant attributes, resulting in suboptimal diagnosis. To address the impacts of data challenges, improve resource allocation, and achieve better health outcomes, a novel visual learning approach is proposed. This study contributes to the visual learning approach by determining whether less or more synthetic data are required to improve the quality of a dataset, such as the number of observations and features, according to the intended personalised treatment and early diagnosis. In addition, numerous visualisation experiments are conducted, using statistical characteristics, cumulative sums, histograms, correlation matrices, root mean square error, and principal component analysis to visualise both original and synthetic data and so address the data challenges. Real medical datasets for cancer, heart disease, diabetes, cryotherapy and immunotherapy are selected as case studies. As a benchmark for comparing classification performance in terms of accuracy, sensitivity, and specificity, several models, such as k-Nearest Neighbours and Random Forest, are implemented. To simulate algorithm implementation and data, a Generative Adversarial Network is used to create and manipulate synthetic data, whilst Random Forest is implemented to classify the data. An amendable and adaptable system is constructed by combining the Generative Adversarial Network and Random Forest models. The system model presents working steps, an overview and a flowchart. Experiments reveal that the majority of data-enhancement scenarios allow for the application of visual learning in the first stage of data analysis as a novel approach.
To achieve meaningful adaptable synergy between appropriate quality data and optimal classification performance while maintaining statistical characteristics, visual learning provides researchers and practitioners with practical human-in-the-loop machine learning visualisation tools. Prior to implementing algorithms, the visual learning approach can be used to actualise early and personalised diagnosis. For the immunotherapy data, Random Forest performed best, with precision, recall, F-measure, accuracy, sensitivity, and specificity of 81%, 82%, 81%, 88%, 95%, and 60%, respectively, compared with 91%, 96%, 93%, 93%, 96%, and 73% for the synthetic data. Future studies might examine the optimal strategies to balance the quantity and quality of medical data.
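For readers comparing the metrics quoted above, the sketch below shows the standard binary-classification definitions computed from confusion-matrix counts. The counts are hypothetical and are not taken from the paper, and the abstract does not state how its figures were computed or averaged. Under these single-class definitions recall and sensitivity are the same quantity, so the separate values in the abstract presumably reflect a different convention.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0

def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive/negative counts."""
    precision = safe_div(tp, tp + fp)      # correct among predicted positives
    recall = safe_div(tp, tp + fn)         # also called sensitivity
    specificity = safe_div(tn, tn + fp)    # correct among actual negatives
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    # F-measure: harmonic mean of precision and recall.
    f_measure = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Hypothetical counts chosen only to exercise the formulas.
for name, value in binary_metrics(tp=9, fp=3, fn=1, tn=7).items():
    print(f"{name}: {value:.3f}")
```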
Affiliation(s)
- Ahsanullah Yunas Mahmoud
  - Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom
- Daniel Neagu
  - Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom
- Daniele Scrimieri
  - Faculty of Engineering and Informatics, University of Bradford, Bradford, England, United Kingdom

50
Munir N, McMorrow R, Mulrennan K, Whitaker D, McLoone S, Kellomäki M, Talvitie E, Lyyra I, McAfee M. Interpretable Machine Learning Methods for Monitoring Polymer Degradation in Extrusion of Polylactic Acid. Polymers (Basel) 2023; 15:3566. [PMID: 37688192 PMCID: PMC10489772 DOI: 10.3390/polym15173566] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/07/2023] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
This work investigates real-time monitoring of extrusion-induced degradation in different grades of PLA across a range of process conditions and machine set-ups. Data on machine settings together with in-process sensor data, including temperature, pressure, and near-infrared (NIR) spectra, are used as inputs to predict the molecular weight and mechanical properties of the product. Many soft sensor approaches based on complex spectral data are essentially 'black-box' in nature, which can limit industrial acceptability. Hence, the focus here is on identifying an optimal approach to developing interpretable models while achieving high predictive accuracy and robustness across different process settings. The performance of a Recursive Feature Elimination (RFE) approach was compared to more common dimension reduction and regression approaches including Partial Least Squares (PLS), iterative PLS (i-PLS), Principal Component Regression (PCR), ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and Random Forest (RF). It is shown that for medical-grade PLA processed under moisture-controlled conditions, accurate prediction of molecular weight is possible over a wide range of process conditions and different machine settings (different nozzle types for downstream fibre spinning) with an RFE-RF algorithm. Similarly, for the prediction of yield stress, RFE-RF achieved excellent predictive performance, outperforming the other approaches in terms of simplicity, interpretability, and accuracy. The features selected by the RFE model provide important insights into the process. It was found that change in molecular weight was not an important factor affecting the mechanical properties of the PLA, which were primarily related to the pressure and temperature at the latter stages of the extrusion process.
The temperature at the extruder exit was also the most important predictor of degradation of the polymer molecular weight, highlighting the importance of accurate melt temperature control in the process. RFE not only outperforms more established methods as a soft sensor method, but also has significant advantages in terms of computational efficiency, simplicity, and interpretability. RFE-based soft sensors are promising for better quality control in processing thermally sensitive polymers such as PLA, in particular demonstrating for the first time the ability to monitor molecular weight degradation during processing across various machine settings.
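The Recursive Feature Elimination loop described above can be sketched as follows: score the remaining features, drop the weakest, and repeat until the desired number is left. This is an illustration of the general loop only. The importance function is a simple stand-in (absolute Pearson correlation with the target) rather than the Random Forest importance used in the paper, and the feature names and values are hypothetical. With a univariate stand-in the loop reduces to a plain ranking, whereas a model-based importance would be re-estimated on the surviving features in every round.

```python
import math

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def importance(columns, target, active):
    """Stand-in importance: one score per currently active feature."""
    return {name: abs_correlation(columns[name], target) for name in active}

def recursive_feature_elimination(columns, target, n_keep,
                                  importance_fn=importance):
    """Repeatedly re-score the active features and drop the weakest one."""
    active = list(columns)
    eliminated = []
    while len(active) > n_keep:
        scores = importance_fn(columns, target, active)
        weakest = min(active, key=lambda name: scores[name])
        active.remove(weakest)
        eliminated.append(weakest)
    return active, eliminated

# Hypothetical sensor columns and target (not data from the paper).
columns = {
    "exit_temp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pressure": [2.0, 1.0, 4.0, 3.0, 5.0],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]
kept, dropped = recursive_feature_elimination(columns, target, n_keep=1)
print(kept, dropped)
```

Passing a different `importance_fn` is the intended extension point: any callable that returns a score per active feature can replace the correlation stand-in.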
Affiliation(s)
- Nimra Munir
  - Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
  - Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Ross McMorrow
  - Department of Mechatronic Engineering, Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Konrad Mulrennan
  - Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
  - Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Darren Whitaker
  - Perceptive Engineering-An Applied Materials Company, Keckwick Lane, Daresbury WA4 4AB, UK
- Seán McLoone
  - Centre for Intelligent Autonomous Manufacturing Systems, Queen’s University Belfast, Belfast BT7 1NN, UK
- Minna Kellomäki
  - Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Elina Talvitie
  - Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Inari Lyyra
  - Biomaterials and Tissue Engineering Group, Faculty of Medicine and Health Technology, BioMediTech, Tampere University, 33720 Tampere, Finland
- Marion McAfee
  - Centre for Mathematical Modelling and Intelligent Systems for Health and Environment (MISHE), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland
  - Centre for Precision Engineering, Materials and Manufacturing (PEM Centre), Atlantic Technological University, ATU Sligo, Ash Lane, F91 YW50 Sligo, Ireland