51
Cao C, Zhang T, Xin T. The effect of reading engagement on scientific literacy - an analysis based on the XGBoost method. Front Psychol 2024; 15:1329724. [PMID: 38420178 PMCID: PMC10899671 DOI: 10.3389/fpsyg.2024.1329724] [Received: 10/29/2023] [Accepted: 01/22/2024] [Indexed: 03/02/2024]
Abstract
Scientific literacy is a key factor in personal competitiveness. Reading is the most common activity in daily learning, and harnessing its day-to-day influence on individuals is the most convenient way to raise the scientific literacy of the whole population. Reading engagement is an important student characteristic related to reading literacy; it is highly malleable and is jointly reflected in behavioral, cognitive, and affective engagement. Using reading engagement as an entry point to explore its relationship with scientific literacy is therefore of both theoretical and practical significance. In this study, we used PISA 2018 data from China to explore this relationship in a sample of 15-year-old students in mainland China. Thirty-six variables related to reading engagement, together with background variables (gender, grade, and family socioeconomic and cultural status), were selected from the questionnaire as independent variables, and the score on the Scientific Literacy Assessment (SLA) was taken as the outcome variable. A supervised machine learning method, the XGBoost algorithm, was used to construct the model. The dataset was randomly divided into a training set and a test set to optimize the model, which verified that the resulting model has a good degree of fit and good generalization ability. In addition, global and local personalized interpretations were produced using SHAP values, a state-of-the-art model interpretation method. Among the three major components of reading engagement, cognitive engagement was the most influential factor and was relatively dominant in the model: students with a high level of cognitive reading engagement were more likely to obtain high scores on the scientific literacy assessment. This study also verifies the feasibility of a currently popular machine learning model, XGBoost, in a large-scale international educational assessment program, where it shows good model adaptability and supports both global and local interpretation.
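SHAP values rest on Shapley values from cooperative game theory. As a minimal sketch (not the authors' code), the exact Shapley value of each feature can be computed for a tiny model by enumerating every subset of the other features; features outside a subset are replaced by a baseline value, which is one common but assumed convention. The model `toy_model`, its coefficients, and the inputs are hypothetical. The SHAP library uses faster, model-specific algorithms (such as TreeSHAP for tree ensembles like XGBoost), because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature that is absent from a coalition takes its baseline value.
    Cost is exponential in the number of features, so this is only
    suitable for small illustrative examples.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Use x for features in the coalition and the baseline for the rest.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z: Sequence[float]) -> float:
    # Hypothetical score model with an interaction between features 0 and 1.
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[1] - 1.0 * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])
```

For this toy input the attributions are 2.5, 2.5, and -3.0 (up to floating-point rounding), and they sum to `toy_model(x) - toy_model(baseline) = 2`. This additivity is what makes a per-student (local) explanation possible, while averaging absolute attributions over many students gives a global importance ranking.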
Affiliation(s)
- Tao Xin
- Collaborative Innovation Center of Assessment for Basic Education Quality, Beijing Normal University, Beijing, China
52
Radhakrishnan BL, Ezra K, Jebadurai IJ, Selvakumar I, Karthikeyan P. An Autonomous Sleep-Stage Detection Technique in Disruptive Technology Environment. Sensors (Basel) 2024; 24:1197. [PMID: 38400354 PMCID: PMC10892786 DOI: 10.3390/s24041197] [Received: 12/21/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024]
Abstract
Autonomous sleep tracking at home has become increasingly necessary in today's fast-paced world. A crucial aspect of addressing sleep-related issues is the accurate classification of sleep stages. This paper introduces a novel approach, PSO-XGBoost, which combines particle swarm optimisation (PSO) with extreme gradient boosting (XGBoost) to enhance the XGBoost model's performance. By leveraging PSO to fine-tune hyperparameters, our model achieves improved overall accuracy and faster convergence. The proposed model uses features extracted from EEG signals, spanning the time, frequency, and time-frequency domains. We used the Pz-Oz signal dataset from the Sleep-EDF Expanded repository for experimentation. With stratified K-fold validation on ten selected subjects, our model achieves 95.4% accuracy, 95.4% F1-score, 95.4% precision, and 94.3% recall. The experimental results demonstrate the effectiveness of our technique, with an average accuracy of 95%, outperforming traditional machine learning classifiers. The findings revealed that the feature-shifting approach improves the classification outcome by 3 to 4 per cent. Moreover, our findings suggest that prefrontal EEG derivations are ideal options and could open up exciting possibilities for using wearable EEG devices in sleep monitoring. The ease of obtaining EEG signals with dry electrodes on the forehead enhances the feasibility of this application. Furthermore, the proposed method is computationally efficient and holds significant value for real-time sleep classification applications.
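The PSO-XGBoost idea, using particle swarm optimisation to search XGBoost hyperparameters, can be sketched with a plain global-best PSO. This is an illustration under stated assumptions, not the authors' implementation: `cv_error` is a hypothetical, smooth stand-in for the cross-validated error of an XGBoost model as a function of `(learning_rate, max_depth)`, and the swarm coefficients `w`, `c1`, and `c2` are commonly used values rather than ones reported in the paper.

```python
import random
from typing import Callable, Sequence


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # cognitive coefficient (assumed value)
    c2: float = 1.5,  # social coefficient (assumed value)
    seed: int = 0,
) -> tuple[list[float], float]:
    """Global-best particle swarm optimisation within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                lo, hi = bounds[d]
                # Move the particle and clip it back into the search box.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val


def cv_error(params: Sequence[float]) -> float:
    # Hypothetical stand-in for cross-validated XGBoost error,
    # minimised at learning_rate = 0.1 and max_depth = 6.
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 * 100 + (max_depth - 6) ** 2 / 10


best, err = pso_minimize(cv_error, bounds=[(0.01, 0.3), (2.0, 10.0)])
print(f"learning_rate={best[0]:.3f}, max_depth={round(best[1])}, error={err:.5f}")
```

In a real pipeline, `cv_error` would train XGBoost on each stratified fold and return, for example, 1 minus the mean validation accuracy, and `max_depth` would be rounded to an integer before being passed to the model. Each evaluation is then a full cross-validation run, so swarm size and iteration count dominate the tuning cost.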
Affiliation(s)
- Baskaran Lizzie Radhakrishnan
- Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore 641114, India
- Kirubakaran Ezra
- Department of Computer Science and Engineering, Grace College of Engineering, Coimbatore 628005, India
- Immanuel Johnraja Jebadurai
- Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore 641114, India
- Immanuel Selvakumar
- Department of Electrical and Electronics Engineering, Karunya Institute of Technology and Sciences, Coimbatore 641114, India
53
Navratil G, Giannopoulos I. Classifying Motorcyclist Behaviour with XGBoost Based on IMU Data. Sensors (Basel) 2024; 24:1042. [PMID: 38339759 PMCID: PMC10857319 DOI: 10.3390/s24031042] [Received: 12/18/2023] [Revised: 01/31/2024] [Accepted: 02/01/2024] [Indexed: 02/12/2024]
Abstract
Human behaviour detection is relevant in many fields. During navigational tasks, it is an indicator of environmental conditions. Therefore, monitoring people as they move along the street network provides insights into the environment. This is especially true for motorcyclists, who have to observe aspects such as road surface conditions and traffic very carefully. We thus performed an experiment to check whether IMU data are sufficient to classify motorcyclist behaviour as a data source for later spatial and temporal analysis. The classification was done using XGBoost and proved successful for four of the five originally defined types of behaviour, with a classification accuracy of approximately 80%. Only overtaking manoeuvres were not identified reliably.
Affiliation(s)
- Gerhard Navratil
- Department for Geodesy and Geoinformation, TU Wien, Wiedner Hauptstr. 8-10, 1040 Vienna, Austria
54
Zheng Z, Liang L, Luo X, Chen J, Lin M, Wang G, Xue C. Diagnosing and tracking depression based on eye movement in response to virtual reality. Front Psychiatry 2024; 15:1280935. [PMID: 38374979 PMCID: PMC10875075 DOI: 10.3389/fpsyt.2024.1280935] [Received: 08/21/2023] [Accepted: 01/16/2024] [Indexed: 02/21/2024]
Abstract
Introduction: Depression is a prevalent mental illness that is primarily diagnosed using psychological and behavioral assessments. However, these assessments lack objective and quantitative indices, which makes rapid and objective detection challenging. In this study, we propose a novel method for depression detection based on eye movement data captured in response to virtual reality (VR). Methods: Eye movement data were collected and used to establish high-performance classification and prediction models. Four machine learning algorithms were employed: eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), support vector machine (SVM), and random forest. The models were evaluated using five-fold cross-validation, and performance metrics including accuracy, precision, recall, area under the curve (AUC), and F1-score were assessed. The prediction error for the Patient Health Questionnaire-9 (PHQ-9) score was also determined. Results: The XGBoost model achieved a mean accuracy of 76%, precision of 94%, recall of 73%, AUC of 82%, and F1-score of 78%. The MLP model achieved a classification accuracy of 86%, precision of 96%, recall of 91%, AUC of 86%, and F1-score of 92%. The prediction error for the PHQ-9 score ranged from -0.6 to 0.6. To investigate the role of computerized cognitive behavioral therapy (CCBT) in treating depression, participants were divided into intervention and control groups. The intervention group received CCBT, while the control group received no treatment. After five CCBT sessions, significant changes were observed in the fixation and saccade eye movement indices, as well as in PHQ-9 scores. These two indices played significant roles in the predictive model, indicating their potential as biomarkers for detecting depression symptoms. Discussion: The results suggest that eye movement indices obtained with a VR eye tracker can serve as useful biomarkers for detecting depression symptoms. Specifically, the fixation and saccade indices showed promise in predicting depression. Furthermore, CCBT demonstrated effectiveness in treating depression, as evidenced by the observed changes in eye movement indices and PHQ-9 scores. In conclusion, this study presents a novel approach to depression detection using eye movement data captured in VR. The findings highlight the potential of eye movement indices as biomarkers and underscore the effectiveness of CCBT in treating depression.
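The metrics reported above all derive from the confusion matrix of each cross-validation fold. A minimal sketch, using hypothetical predictions rather than the study's data, shows how they are computed and why a fold-averaged F1-score need not equal the F1-score computed from the averaged precision and recall (averaging and the harmonic mean do not commute). Whether the study averaged F1 per fold is not stated in the abstract.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = depressed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mean_over_folds(folds: Sequence[dict[str, float]]) -> dict[str, float]:
    """Average each metric over cross-validation folds."""
    return {k: sum(f[k] for f in folds) / len(folds) for k in folds[0]}


# Hypothetical predictions for two folds (not data from the study).
fold_a = binary_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0])
fold_b = binary_metrics([1, 1, 1, 1, 0, 0], [1, 0, 0, 0, 0, 0])
avg = mean_over_folds([fold_a, fold_b])
print(avg)
```

Here the fold-averaged F1 is about 0.629, whereas the harmonic mean of the averaged precision (1.0) and recall (0.5) is about 0.667.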
Affiliation(s)
- Zhiguo Zheng
- School of Information and Communication Engineering, Hainan University, Haikou, China
- School of Information Engineering, Hainan Vocational University of Science and Technology, Haikou, China
- Lijuan Liang
- The First Affiliated Hospital of Hainan Medical University, Haikou, China
- Xiong Luo
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Jie Chen
- School of Information Engineering, Hainan Vocational University of Science and Technology, Haikou, China
- Meirong Lin
- School of Information Engineering, Hainan Vocational University of Science and Technology, Haikou, China
- Guanjun Wang
- School of Electronic Science and Technology, Hainan University, Haikou, China
- Chenyang Xue
- School of Electronic Science and Technology, Hainan University, Haikou, China
55
Joe H, Kim HG. Multi-label classification with XGBoost for metabolic pathway prediction. BMC Bioinformatics 2024; 25:52. [PMID: 38297220 PMCID: PMC10832249 DOI: 10.1186/s12859-024-05666-0] [Received: 07/14/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024]
Abstract
Background: Metabolic pathway prediction is one possible approach to the systems biology problem of reconstructing an organism's metabolic network from its genome sequence. Recently developed machine learning-based pathway prediction methods have been reported to perform similarly to the most widely used method, PathoLogic, which is rule-based. One issue is that previous studies evaluated PathoLogic without taxonomic pruning, which decreases its performance. Results: In this study, we update the evaluation results of previous studies to demonstrate that PathoLogic with taxonomic pruning outperforms previous machine learning-based approaches and that further improvements are needed for them to be competitive. Furthermore, we introduce mlXGPR, an XGBoost-based metabolic pathway prediction method built on the multi-label classification pathway prediction framework introduced with mlLGPR. We also improve this multi-label framework by exploiting correlations between labels using classifier chains. We propose a ranking method that determines the order of the chain so that lower-performing classifiers are placed later in the chain, where they can make more use of the correlations between labels. We evaluate mlXGPR with and without classifier chains on single-organism and multi-organism benchmarks. Our results indicate that mlXGPR outperforms previous pathway prediction methods, including PathoLogic with taxonomic pruning, in terms of Hamming loss, precision, and F1 score on single-organism benchmarks. Conclusions: The results of our study indicate that the performance of machine learning-based pathway prediction methods can be substantially improved, to the point of outperforming PathoLogic with taxonomic pruning.
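Hamming loss, one of the metrics cited above, and the idea of ordering a classifier chain so that weaker classifiers come later can both be sketched in a few lines. The abstract does not specify the ranking criterion, so using per-label F1 from independent (unchained) classifiers' validation predictions is an assumption, and the label matrices below are made-up toy data (rows are organisms, columns are pathways).

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (sample, label) entries that are predicted incorrectly."""
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    total = sum(len(row) for row in y_true)
    return wrong / total


def label_f1(y_true: Matrix, y_pred: Matrix, label: int) -> float:
    """F1 score of a single label column, written as 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 0 and p[label] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] == 1 and p[label] == 0)
    # Assumed convention: a label never present and never predicted scores 1.0.
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0


def chain_order(y_true: Matrix, y_pred: Matrix) -> list[int]:
    """Best-performing labels first, so weaker classifiers sit later in the
    chain and can condition on more of the earlier labels' predictions."""
    n_labels = len(y_true[0])
    scores = [label_f1(y_true, y_pred, j) for j in range(n_labels)]
    return sorted(range(n_labels), key=lambda j: scores[j], reverse=True)


# Toy data: rows are organisms, columns are pathways (1 = pathway present).
y_true = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 1, 0]]
y_pred = [[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 1, 0]]

print(hamming_loss(y_true, y_pred))
print(chain_order(y_true, y_pred))
```

For this toy data the Hamming loss is 0.25 (3 wrong entries out of 12) and the order is `[1, 2, 0]`. Such an order could be passed to a chain implementation; for example, scikit-learn's `ClassifierChain` accepts a list of label indices through its `order` parameter.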
Affiliation(s)
- Hyunwhan Joe
- Biomedical Knowledge Engineering Lab., Seoul National University, Seoul, Republic of Korea
- Hong-Gee Kim
- Biomedical Knowledge Engineering Lab., Seoul National University, Seoul, Republic of Korea
- School of Dentistry and Dental Research Institute, Seoul National University, Seoul, Republic of Korea
56
Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. Environ Pollut 2024; 342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]
Abstract
The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot evaluate these hazards effectively and rapidly, especially for chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models that predict chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model from showing a notable advantage on the molecular graphs. Considering both computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) was selected to build reliable QSAR-ML models that predict chronic toxicity from small- or medium-sized tabular data using molecular descriptors. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, a range that covers most environmental scenarios. Model interpretation showed that SlogP and exposure duration were the primary influential factors. SlogP, which represents the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes, and exposure duration played a crucial role in determining chronic toxicity. Finally, chronic toxicity data for bisphenol A validated the robustness and reliability of the model established in this research. Our study provides a robust and feasible methodology for the chronic ecological risk evaluation of various types of pollutants and could facilitate wider use of ML applications in environmental fields.
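The two regression metrics quoted for the XGBoost QSAR model can be computed directly from observed and predicted values. This sketch uses made-up values on an assumed log10 concentration scale; the abstract does not state the scale on which R² and RMSE were reported.

```python
from math import sqrt
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs predicted toxicity values (assumed log10 mg/L scale).
observed = [-3.0, -1.0, 0.0, 1.0, 2.0]
predicted = [-2.5, -1.5, 0.5, 0.5, 2.0]
print(f"RMSE = {rmse(observed, predicted):.3f}, R2 = {r2_score(observed, predicted):.3f}")
```

Here RMSE is about 0.447 and R² = 1 - 1.0/14.8 ≈ 0.932. RMSE is in the units of the target while R² is unitless, which is why the two are usually reported together.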
Affiliation(s)
- Lang Lei
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Liangmao Zhang
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Zhibang Han
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Qirui Chen
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Pengcheng Liao
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Dong Wu
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Jun Tai
- Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
- Bing Xie
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Yinglong Su
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
57
Tao Q, Wu L, An J, Liu Z, Zhang K, Zhou L, Zhang X. Proteomic analysis of human aqueous humor from fuchs uveitis syndrome. Exp Eye Res 2024; 239:109752. [PMID: 38123010 DOI: 10.1016/j.exer.2023.109752] [Received: 08/07/2023] [Revised: 11/25/2023] [Accepted: 12/11/2023] [Indexed: 12/23/2023]
Abstract
Fuchs uveitis syndrome (FUS) is a commonly misdiagnosed uveitis syndrome that often presents as an asymptomatic, mild inflammatory condition until complications arise. Its diagnosis remains clinical because specific laboratory tests are lacking. The aqueous humor (AH) is a complex fluid containing nutrients and metabolic wastes from the eye, and changes in AH proteins provide important information for diagnosing intraocular diseases. This study aimed to analyze the proteomic profile of AH in individuals diagnosed with FUS and to identify potential biomarkers of the disease. We used liquid chromatography-tandem mass spectrometry-based proteomic methods to evaluate the AH protein profiles of 37 samples, comprising 15 patients with FUS, six patients with Posner-Schlossman syndrome (PSS), and 16 patients with age-related cataract. A total of 538 proteins were identified from a comprehensive spectral library of 634 proteins. Subsequent differential expression analysis, enrichment analysis, and construction of key sub-networks revealed that the inflammatory response, complement activation, and hypoxia might be crucial in mediating the process of FUS. Hypoxia-inducible factor-1 may serve as a key regulator and therapeutic target. Additionally, innate and adaptive immune responses are considered dominant in patients with FUS. A diagnostic model was constructed using a machine learning algorithm to classify FUS, PSS, and normal controls. Two proteins, complement C1q subcomponent subunit B and secretogranin-1, had the highest scores in the Extreme Gradient Boosting (XGBoost) model, suggesting their potential utility as a biomarker panel. These two proteins were further validated as biomarkers in a cohort of 18 patients using high-resolution multiple reaction monitoring assays. This study therefore advances current knowledge of FUS pathogenesis and promotes the development of effective diagnostic strategies.
Affiliation(s)
- Qingqin Tao
- Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin, China
- Lingzi Wu
- Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin, China; Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Beijing, China
- Jinying An
- Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin, China
- Kai Zhang
- The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Key Laboratory of Immune Microenvironment and Disease (Ministry of Education), Tianjin Key Laboratory of Medical Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Lei Zhou
- School of Optometry, Department of Applied Biology and Chemical Technology, Research Centre for SHARP Vision (RCSV), The Hong Kong Polytechnic University, Hong Kong, China; Centre for Eye and Vision Research (CEVR), 17W Hong Kong Science Park, Hong Kong, China
- Xiaomin Zhang
- Tianjin Key Laboratory of Retinal Functions and Diseases, Tianjin Branch of National Clinical Research Center for Ocular Disease, Eye Institute and School of Optometry, Tianjin Medical University Eye Hospital, Tianjin, China
58
Alabi RO, Almangush A, Elmusrati M, Leivo I, Mäkitie AA. Interpretable machine learning model for prediction of overall survival in laryngeal cancer. Acta Otolaryngol 2024:1-7. [PMID: 38279817 DOI: 10.1080/00016489.2023.2301648] [Received: 10/16/2023] [Accepted: 12/21/2023] [Indexed: 01/29/2024]
Abstract
Background: The mortality rates of laryngeal squamous cell carcinoma (LSCC) have not decreased significantly in recent decades. Objectives: We primarily aimed to compare the predictive performance of DeepTables with that of state-of-the-art machine learning (ML) algorithms (Voting ensemble, Stack ensemble, and XGBoost) in stratifying patients with LSCC by their chance of overall survival (OS). In addition, we complemented the developed model with interpretability using both global and local model-agnostic techniques. Methods: A total of 2792 patients diagnosed with LSCC in the Surveillance, Epidemiology, and End Results (SEER) database were reviewed. Global model-agnostic interpretability was examined using the SHapley Additive exPlanations (SHAP) technique, and individual predictions were interpreted using Local Interpretable Model-agnostic Explanations (LIME). Results: The state-of-the-art ML ensemble algorithms outperformed DeepTables. Specifically, the examined ensemble algorithms showed comparable weighted areas under the receiver operating characteristic curve of 76.9, 76.8, and 76.1, with accuracies of 71.2%, 70.2%, and 71.8%, respectively. The global interpretability method (SHAP) showed that age at diagnosis, N-stage, T-stage, tumor grade, and marital status were among the most prominent parameters. Conclusions: An ML model for OS prediction may serve as an ancillary tool for treatment planning in patients with LSCC.
Affiliation(s)
- Rasheed Omobolaji Alabi
- Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
- Department of Industrial Digitalization, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
- Alhadi Almangush
- Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
- Department of Pathology, University of Helsinki, Helsinki, Finland
- Institute of Biomedicine, University of Turku, Pathology, Finland
- Mohammed Elmusrati
- Department of Industrial Digitalization, School of Technology and Innovations, University of Vaasa, Vaasa, Finland
- Ilmo Leivo
- Institute of Biomedicine, University of Turku, Pathology, Finland
- Antti A Mäkitie
- Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
- Department of Otorhinolaryngology - Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Division of Ear, Nose and Throat Diseases, Department of Clinical Sciences, Intervention and Technology, Karolinska Institute and Karolinska University Hospital, Stockholm, Sweden
59
Liu SH, Ting CE, Wang JJ, Chang CJ, Chen W, Sharma AK. Estimation of Gait Parameters for Adults with Surface Electromyogram Based on Machine Learning Models. Sensors (Basel) 2024; 24:734. [PMID: 38339451 PMCID: PMC10857519 DOI: 10.3390/s24030734] [Received: 10/16/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
Gait analysis has been studied over the last few decades as the best way to objectively assess the technical outcome of a procedure designed to improve gait. With gait analysis, the treating physician can understand the type of gait problem, gain insight into its etiology, and find the best treatment. Conventional gait parameters describe kinematics, including temporal and spatial parameters, but lack information on skeletal muscle activity. Gait analysis therefore measures not only the three-dimensional temporal and spatial kinematics but also the surface electromyograms (sEMGs) of the lower limbs. Shoe-worn GaitUp Physilog® wearable inertial sensors can now easily measure gait parameters while subjects walk on ordinary ground; however, they cannot measure muscle activity. The aim of this study is to estimate gait parameters from the sEMGs of the lower limbs. A self-made wireless device was used to record sEMGs from the vastus lateralis and gastrocnemius muscles of the left and right legs. Twenty young female subjects with a skeletal muscle index (SMI) below 5.7 kg/m² were recruited for this study and examined with the InBody 270 instrument. Four sEMG parameters were used to estimate 23 gait parameters, measured with the GaitUp Physilog® wearable inertial sensors, using three machine learning models: random forest (RF), decision tree (DT), and XGBoost. The results show that 14 gait parameters could be well estimated, with correlation coefficients above 0.800. This study represents a step toward a more comprehensive analysis of gait using only sEMGs.
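The study's criterion for a well-estimated gait parameter, a correlation coefficient above 0.800 between measured and estimated values, can be sketched as follows. The abstract does not name the coefficient, so Pearson's r is an assumption here, and the parameter names and values are hypothetical.

```python
from math import sqrt
from typing import Mapping, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def well_estimated(
    measured: Mapping[str, Sequence[float]],
    estimated: Mapping[str, Sequence[float]],
    threshold: float = 0.8,
) -> list[str]:
    """Names of gait parameters whose estimates correlate above the threshold."""
    return sorted(
        name for name in measured
        if pearson_r(measured[name], estimated[name]) > threshold
    )


# Hypothetical values for two gait parameters across five walking trials.
measured = {
    "cadence": [100.0, 105.0, 110.0, 115.0, 120.0],
    "stride_length": [1.20, 1.30, 1.25, 1.40, 1.35],
}
estimated = {
    "cadence": [101.0, 104.0, 111.0, 114.0, 121.0],
    "stride_length": [1.30, 1.20, 1.35, 1.25, 1.30],
}
print(well_estimated(measured, estimated))
```

With these toy values only `cadence` passes the threshold (r ≈ 0.99); the toy `stride_length` estimates are negatively correlated with the measurements.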
Affiliation(s)
- Shing-Hong Liu
- Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung City 41349, Taiwan
- Chi-En Ting
- Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung City 41349, Taiwan
- Jia-Jung Wang
- Department of Biomedical Engineering, I-Shou University, Kaohsiung 82445, Taiwan
- Chun-Ju Chang
- Department of Golden-Ager Industry Management, Chaoyang University of Technology, Taichung City 41349, Taiwan
- Wenxi Chen
- Division of Information Systems, School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu City 965-8580, Fukushima, Japan
- Alok Kumar Sharma
- Department of Computer Science and Information Engineering, Chaoyang University of Technology, Taichung City 41349, Taiwan
60
Wang H, Tao Q, Zhang X. Ensemble Learning Method for the Continuous Decoding of Hand Joint Angles. Sensors (Basel) 2024; 24:660. [PMID: 38276352 DOI: 10.3390/s24020660] [Received: 12/17/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 01/27/2024]
Abstract
Human-machine interface technology is fundamentally constrained by the dexterity of motion decoding. Simultaneous and proportional control can greatly improve the flexibility and dexterity of smart prostheses. In this research, a new model that uses ensemble learning to solve the angle decoding problem is proposed, and seven models for decoding angles from surface electromyography (sEMG) signals are designed. The kinematics of five metacarpophalangeal (MCP) joint angles are estimated from sEMG recorded during functional tasks. Estimation performance was evaluated using the Pearson correlation coefficient (CC) and the root mean square error (RMSE). The comprehensive model that combines CatBoost and LightGBM is the best model for this task, with an average CC of 0.897 and an average RMSE of 7.09. Across all test scenarios in the subjects' dataset, the mean CC and mean RMSE of this model are significantly better than those of the Gaussian process model. Moreover, this research proposes a complete pipeline that uses ensemble learning to build a high-performance angle decoding system for hand motion recognition. Through this process, researchers and engineers in this field can quickly find the most suitable ensemble learning model for angle decoding, with fewer parameters and less training data than traditional deep learning models require. In conclusion, the proposed ensemble learning approach has potential for the simultaneous and proportional control (SPC) of future hand prostheses.
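The abstract says the best model combines CatBoost and LightGBM but does not say how. One common way to combine two regressors is a weighted average of their predictions, with the weight chosen on validation data; the sketch below assumes that scheme, and the prediction lists are hypothetical stand-ins for the two models' outputs rather than calls to either library.

```python
from math import sqrt
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def blend(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> list[float]:
    """Weighted average w * a + (1 - w) * b of two models' predictions."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def best_blend_weight(
    y_val: Sequence[float],
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    steps: int = 10,
) -> float:
    """Grid-search the blend weight in [0, 1] that minimises validation RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(y_val, blend(pred_a, pred_b, w)))


# Hypothetical validation joint angles (degrees) and two models' predictions.
y_val = [10.0, 20.0, 30.0, 40.0]
pred_model_a = [12.0, 22.0, 32.0, 42.0]  # stand-in for a CatBoost-like model
pred_model_b = [8.0, 18.0, 28.0, 38.0]   # stand-in for a LightGBM-like model

w = best_blend_weight(y_val, pred_model_a, pred_model_b)
print(w, rmse(y_val, blend(pred_model_a, pred_model_b, w)))
```

In this contrived case the two models err in opposite directions, so the equal-weight blend (w = 0.5) cancels the error entirely; with real joint-angle predictions the gain is usually smaller, and the weight should be tuned on data not used for final testing.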
Affiliation(s)
- Hai Wang
- School of Mechanical Engineering, Xinjiang University, Urumqi 830017, China
- Qing Tao
- School of Mechanical Engineering, Xinjiang University, Urumqi 830017, China
- Xiaodong Zhang
- School of Mechanical Engineering, Xinjiang University, Urumqi 830017, China
- Shaanxi Key Laboratory of Intelligent Robot, Xi'an Jiaotong University, Xi'an 710049, China
61
Zhang Y, Ma Y, Wang J, Guan Q, Yu B. Construction and validation of a clinical prediction model for deep vein thrombosis in patients with digestive system tumors based on a machine learning. Am J Cancer Res 2024; 14:155-168. [PMID: 38323284 PMCID: PMC10839316] [Received: 10/31/2023] [Accepted: 12/13/2023] [Indexed: 02/08/2024]
Abstract
This study developed a deep vein thrombosis (DVT) risk prediction model based on multiple machine learning methods for patients with digestive system tumors undergoing surgical treatment. Data from 1048 patients with digestive system tumors admitted to Shanxi Provincial People's Hospital (College of Shanxi Medical University) from January 2020 to January 2023 were retrospectively analyzed, and 845 cases were retained after applying the inclusion and exclusion criteria. The patients were divided into a training group (586 patients) and a validation group (259 patients), and feature selection was then performed using six models: Lasso regression, XGBoost, random forest, decision tree, support vector machine, and logistic regression. Predictive models were subsequently constructed as nomograms, and their predictive validity was assessed using receiver operating characteristic curves, precision-recall curves, and decision curve analysis. In the model comparison, the XGBoost model showed the largest area under the curve (AUC) on the validation set (P < 0.05), demonstrating excellent predictive performance and generalization ability. The characteristic factors common to the six models were used to build a nomogram for assessing DVT risk. The model performed well in clinical validation and effectively differentiated high-risk from low-risk patients. BMI, procedure time, and D-dimer differed significantly between the thrombus group and the non-thrombus group (P < 0.05). However, the DeLong test showed that the AUC of the XGBoost model was greater than that of the nomogram model (P < 0.05). BMI, procedure time, and D-dimer are critical predictors of DVT risk in patients with digestive system tumors. Our model is an adequate assessment tool for DVT risk and can help improve the prevention and treatment of DVT.
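The DVT study ranked models by the area under the ROC curve and compared AUCs with the DeLong test. DeLong's test builds on the Mann-Whitney form of the AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below covers only that AUC computation (not the DeLong variance estimate); the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney form).

    Labels are 1 for positive (e.g. DVT) and 0 for negative cases.
    O(n_pos * n_neg) pairwise comparisons; fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` counts three winning pairs out of four and returns 0.75.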
Affiliation(s)
- Yunfeng Zhang
- Department of Vascular Surgery, Shanxi Provincial People’s Hospital (The Fifth Clinical Medical School of Shanxi Medical University), No. 29 Shuangtasi Street, Taiyuan 030012, Shanxi, China
- Yongqi Ma
- Shanxi University of Chinese Medicine, No. 121 Daxue Street, Yuci District, Jinzhong 030619, Shanxi, China
- Jie Wang
- Department of Vascular Surgery, Shanxi Provincial People’s Hospital (The Fifth Clinical Medical School of Shanxi Medical University), No. 29 Shuangtasi Street, Taiyuan 030012, Shanxi, China
- Qiang Guan
- Department of Vascular Surgery, Shanxi Provincial People’s Hospital (The Fifth Clinical Medical School of Shanxi Medical University), No. 29 Shuangtasi Street, Taiyuan 030012, Shanxi, China
- Bo Yu
- Department of Operating Room, Affiliated Hospital of Hebei University, No. 212 Yuhua East Road, Lianchi District, Baoding 071000, Hebei, China
62
Nasimian A, Younus S, Tatli Ö, Hammarlund EU, Pienta KJ, Rönnstrand L, Kazi JU. AlphaML: A clear, legible, explainable, transparent, and elucidative binary classification platform for tabular data. Patterns (N Y) 2024; 5:100897. [PMID: 38264719 PMCID: PMC10801203 DOI: 10.1016/j.patter.2023.100897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 09/07/2023] [Accepted: 11/21/2023] [Indexed: 01/25/2024]
Abstract
Given the potential of machine learning and the broad applications of binary classification, it is essential to develop platforms that are not only powerful but also transparent, interpretable, and user friendly. We introduce alphaML, a user-friendly platform that provides clear, legible, explainable, transparent, and elucidative (CLETE) binary classification models with comprehensive customization options. AlphaML offers feature selection, hyperparameter search, sampling, and normalization methods, along with 15 machine learning algorithms with global and local interpretation. We have integrated a custom metric for hyperparameter search that considers both training and validation scores, safeguarding against underfitting and overfitting. Additionally, we employ the NegLog2RMSL scoring method, which uses both training and test scores for a thorough model evaluation. The platform has been tested on datasets from multiple domains and offers a graphical interface, removing the need for programming expertise. Consequently, alphaML is versatile and shows promising applicability across a broad range of tabular data configurations.
Affiliation(s)
- Ahmad Nasimian
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
- Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
- Saleena Younus
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
- Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
- Özge Tatli
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
- Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
- Emma U. Hammarlund
- Lund Stem Cell Center, Lund University, Lund, Sweden
- Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
- Tissue Development and Evolution (TiDE), Department of Experimental Medical Sciences, Lund University, Lund, Sweden
- Kenneth J. Pienta
- The Cancer Ecology Center, Brady Urological Institute, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Lars Rönnstrand
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
- Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
- Department of Hematology, Oncology and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Julhash U. Kazi
- Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
- Lund Stem Cell Center, Lund University, Lund, Sweden
- Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
63
Ogunpola A, Saeed F, Basurra S, Albarrak AM, Qasem SN. Machine Learning-Based Predictive Models for Detection of Cardiovascular Diseases. Diagnostics (Basel) 2024; 14:144. [PMID: 38248021 PMCID: PMC10813849 DOI: 10.3390/diagnostics14020144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 12/21/2023] [Accepted: 12/25/2023] [Indexed: 01/23/2024]
Abstract
Cardiovascular diseases present a significant global health challenge that emphasizes the critical need for developing accurate and more effective detection methods. Several studies have contributed valuable insights in this field, but it is still necessary to advance the predictive models and address the gaps in the existing detection approaches. For instance, some of the previous studies have not considered the challenge of imbalanced datasets, which can lead to biased predictions, especially when the datasets include minority classes. This study's primary focus is the early detection of heart diseases, particularly myocardial infarction, using machine learning techniques. It tackles the challenge of imbalanced datasets by conducting a comprehensive literature review to identify effective strategies. Seven machine learning and deep learning classifiers, including K-Nearest Neighbors, Support Vector Machine, Logistic Regression, Convolutional Neural Network, Gradient Boost, XGBoost, and Random Forest, were deployed to enhance the accuracy of heart disease predictions. The research explores different classifiers and their performance, providing valuable insights for developing robust prediction models for myocardial infarction. The study's outcomes emphasize the effectiveness of meticulously fine-tuning an XGBoost model for cardiovascular diseases. This optimization yields remarkable results: 98.50% accuracy, 99.14% precision, 98.29% recall, and a 98.71% F1 score. Such optimization significantly enhances the model's diagnostic accuracy for heart disease.
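The reported accuracy, precision, recall, and F1 score all derive from the four confusion-matrix counts of a binary classifier. A short sketch, with hypothetical names and illustrative counts rather than the study's data:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class is never predicted/present.
    return num / den if den else 0.0


def classification_metrics(c: Confusion) -> Dict[str, float]:
    total = c.tp + c.fp + c.tn + c.fn
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }
```

With `Confusion(tp=6, fp=2, tn=10, fn=2)`, accuracy is 0.8 and precision, recall, and F1 are all 0.75. On imbalanced data, accuracy alone can look high while recall on the minority class is poor, which is why the per-class metrics are reported alongside it.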
Affiliation(s)
- Adedayo Ogunpola
- DAAI Research Group, College of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Faisal Saeed
- DAAI Research Group, College of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Shadi Basurra
- DAAI Research Group, College of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Abdullah M. Albarrak
- Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia
- Sultan Noman Qasem
- Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia
64
Shen Y, Zhao X, Wang K, Sun Y, Zhang X, Wang C, Yang Z, Feng Z, Zhang X. Exploring White Matter Abnormalities in Young Children with Autism Spectrum Disorder: Integrating Multi-shell Diffusion Data and Machine Learning Analysis. Acad Radiol 2024:S1076-6332(23)00700-6. [PMID: 38185571 DOI: 10.1016/j.acra.2023.12.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 12/09/2023] [Accepted: 12/14/2023] [Indexed: 01/09/2024]
Abstract
RATIONALE AND OBJECTIVES This study employed tract-based spatial statistics (TBSS) to investigate abnormalities in white matter microstructure among children with autism spectrum disorder (ASD). Additionally, an eXtreme Gradient Boosting (XGBoost) model was developed to classify children with ASD and typically developing children (TDC). METHODS AND MATERIALS Multi-shell diffusion-weighted images were acquired from 62 children with ASD and 44 TDC. Using the Pydesigner pipeline, diffusion tensor (DT), diffusion kurtosis (DK), and white matter tract integrity (WMTI) metrics were computed. Subsequently, TBSS analysis was applied to identify differences in these diffusion parameters between the ASD and TDC groups. The XGBoost model was then trained on the metrics showing significant differences, and SHapley Additive exPlanations (SHAP) values were computed to assess feature importance in the model's predictions. RESULTS TBSS analysis revealed a significant reduction in axial diffusivity (AD) in the left posterior corona radiata and the right superior corona radiata. Among the DK indicators, mean kurtosis, axial kurtosis, and kurtosis fractional anisotropy were notably increased in children with ASD, with no significant difference in radial kurtosis. WMTI metrics such as axonal water fraction, axial diffusivity of the extra-axonal space (EAS_AD), tortuosity of the extra-axonal space (EAS_TORT), and diffusivity of the intra-axonal space (IAS_Da) were significantly increased, primarily in the corpus callosum and fornix. Notably, there was no significant difference in radial diffusivity of the extra-axonal space (EAS_RD). The XGBoost model demonstrated excellent classification ability, and the SHAP analysis identified EAS_TORT as the most important feature in the model's predictions. CONCLUSION This study used TBSS analyses of multi-shell diffusion data to examine white matter abnormalities in pediatric autism.
Additionally, the developed XGBoost model showed outstanding performance in classifying ASD and TDC. The ranking of SHAP values based on the XGBoost model underscored the significance of features in influencing model predictions.
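SHAP attributes each prediction to its input features using Shapley values from cooperative game theory. Tree-specific algorithms compute these efficiently for XGBoost models; the brute-force sketch below instead enumerates every feature subset and fills "absent" features from a single baseline vector (a simplifying assumption), so its cost grows as 2^n and it is only meant to show what the values are. The function name is hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence, Set


def exact_shapley(
    model: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values by enumerating all feature subsets.

    Features outside a coalition are replaced by the baseline value.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = set(subset)
                # Weighted marginal contribution of feature i to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis
```

The attributions always sum to `model(x) - model(baseline)`, which is what makes a ranking of mean absolute SHAP values interpretable as a share of the prediction.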
Affiliation(s)
- Yanyong Shen
- Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China; Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China
- Xin Zhao
- Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China; Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China
- Kaiyu Wang
- MR Research China, GE Healthcare, Beijing, 100000, PR China
- Yongbing Sun
- Department of Radiology, Henan Provincial People's Hospital, Zhengzhou, 450000, China
- Xiaoxue Zhang
- Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China; Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China
- Changhao Wang
- Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China; Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China
- Zhexuan Yang
- Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China; Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China
- Zhanqi Feng
- Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China; Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China
- Xiaoan Zhang
- Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China; Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China
65
Li X, Li C, Guo F, Meng X, Liu Y, Ren F. Coefficient of variation method combined with XGboost ensemble model for wheat growth monitoring. Front Plant Sci 2024; 14:1267108. [PMID: 38235205 PMCID: PMC10791907 DOI: 10.3389/fpls.2023.1267108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 11/30/2023] [Indexed: 01/19/2024]
Abstract
Introduction Obtaining wheat growth information accurately and efficiently is key to estimating yields and guiding agricultural development. Methods This paper takes the precision agriculture demonstration area of Jiaozuo Academy of Agriculture and Forestry in Henan Province as the study area and obtains data on wheat biomass, nitrogen content, chlorophyll content, and leaf area index. Using the coefficient of variation method, a Comprehensive Growth Monitoring Indicator (CGMI) was constructed; fractional derivative processing was applied to the drone spectral data, and correlation analysis was performed between the fractional derivative spectra and each single indicator and the CGMI, respectively. Grey relational analysis was then carried out on the highly correlated differential spectral bands: the grey relational coefficients between bands were calculated, and the highly correlated bands were screened and used as input variables for the model. Next, ridge regression, random forest, and XGBoost models were used to build a wheat CGMI inversion model, with the coefficient of determination (R2) and root mean squared error (RMSE) used to evaluate accuracy and select the optimal wheat growth inversion model. Results and discussion Wheat biomass, nitrogen content, chlorophyll content, and leaf area index were combined into the comprehensive growth monitoring indicator, and the correlation between each growth monitoring indicator and the spectra was calculated. Compared with the single indicators, the correlation of the comprehensive indicator increased to varying degrees, by up to 82.22%.
The correlation coefficient between the comprehensive growth monitoring indicator and the differential spectra reached 0.92 at the flowering stage, higher to varying degrees than its correlation with the original spectra in the same period, indicating that differential processing of spectral data can effectively enhance spectral correlation. Random forest, ridge regression, and XGBoost all gave their best wheat growth inversion results at the flowering stage, and within that stage the XGBoost model had the highest inversion accuracy, with R2 of 0.904 and 0.870 and RMSEs of 0.050 and 0.079 on the training and test sets, respectively, so XGBoost can serve as an effective method for monitoring wheat growth. In summary, this study demonstrates that combining a comprehensive growth monitoring indicator with differential processing of spectra can effectively improve the accuracy of wheat growth monitoring, offering new methods for precision agriculture management.
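The wheat study built its Comprehensive Growth Monitoring Indicator (CGMI) with the coefficient of variation method. A common form of that method weights each indicator by its coefficient of variation (standard deviation divided by mean), normalizes the weights to sum to one, and sums the weighted, normalized indicators per sample. The sketch below assumes population standard deviation and min-max normalization, since the abstract does not state the paper's exact choices; `columns` holds one list per indicator (e.g. biomass, nitrogen, chlorophyll, leaf area index), and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def cv_weights(columns: Sequence[Sequence[float]]) -> List[float]:
    """Weight each indicator by its coefficient of variation (std / mean).

    Assumes strictly positive means, as for biomass or chlorophyll content.
    """
    cvs = [statistics.pstdev(col) / statistics.fmean(col) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def min_max(col: Sequence[float]) -> List[float]:
    """Rescale an indicator to [0, 1]; a constant indicator maps to zeros."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0 for _ in col]
    return [(v - lo) / (hi - lo) for v in col]


def composite_index(columns: Sequence[Sequence[float]]) -> List[float]:
    """Per-sample weighted sum of min-max normalized indicators."""
    weights = cv_weights(columns)
    normed = [min_max(col) for col in columns]
    n_samples = len(columns[0])
    return [
        sum(w * col[i] for w, col in zip(weights, normed)) for i in range(n_samples)
    ]
```

The design choice behind CV weighting is that an indicator with more relative spread across plots carries more information about growth differences, so it receives a larger weight without any expert-assigned coefficients.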
Affiliation(s)
- Xinyan Li
- School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, China
- Changchun Li
- School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, China
- Fuchen Guo
- School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, China
- Xiaopeng Meng
- School of Surveying and Land Information Engineering, Henan Polytechnic University, Jiaozuo, China
- Yanghua Liu
- PIESAT Information Technology Co., Ltd, Beijing, China
- Fang Ren
- PIESAT Information Technology Co., Ltd, Beijing, China
66
Ryyppö R, Häyrynen S, Joutsijoki H, Juhola M, Seppänen MRJ. Comparison of machine learning methods in the early identification of vasculitides, myositides and glomerulonephritides. Comput Methods Programs Biomed 2024; 243:107917. [PMID: 37948909 DOI: 10.1016/j.cmpb.2023.107917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 09/29/2023] [Accepted: 11/05/2023] [Indexed: 11/12/2023]
Abstract
BACKGROUND Rare disease diagnoses are often delayed by years, involving multiple doctor visits and potentially imprecise or incorrect diagnoses before the correct one is made. Machine learning could help by flagging potential patients whom doctors should examine more closely. METHODS To make the prediction setting as close as possible to the real situation, we tested different masking sizes. In the masking phase, data were removed from all data points from the first rare disease diagnosis onward, including the day the diagnosis was received, and additionally from a selected number of days before the initial diagnosis. The performance of the machine learning models was compared using positive predictive value (PPV), negative predictive value (NPV), prevalence PPV (pPPV), prevalence NPV (pNPV), accuracy (ACC), and area under the receiver operating characteristic curve (AUC). RESULTS XGBoost had PPVs over 90% in all masking settings, and InceptionVasGloMyotides had PPVs over 90% in most, but not all, settings. When disease prevalence was taken into account, XGBoost achieved its highest value of 8.8% in binary classification with 30 days of masking, and InceptionVasGloMyotides achieved its best value of 6%, also in binary classification, with 2160 and 4320 days of masking. ACC varied between 89% and 98% for XGBoost and between 79% and 94% for InceptionVasGloMyotides. AUC varied between 72.6% and 94.5% for InceptionVasGloMyotides and between 69.9% and 96.4% for XGBoost. CONCLUSIONS XGBoost and InceptionVasGloMyotides could successfully predict rare diseases at least 30 days before the initial rare disease diagnosis. In addition, we built a well-performing custom deep learning model.
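The gap between PPVs above 90% and prevalence-adjusted values in the single digits follows from Bayes' rule: at low prevalence, even a small false positive rate produces many false alarms relative to true cases. Assuming pPPV and pNPV are the predictive values recomputed from sensitivity and specificity at the population prevalence (the abstract does not give the exact definition), a minimal sketch with hypothetical function names:

```python
def prevalence_ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV recomputed for a given prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def prevalence_npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV recomputed for a given prevalence via Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)
```

For instance, a classifier with 0.9 sensitivity and 0.9 specificity has a PPV of 0.9 on a balanced test set, but at 1% prevalence its PPV drops to about 0.083, while its NPV rises toward 1.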
Affiliation(s)
- Rasmus Ryyppö
- Faculty of Information Technology and Communication Sciences, Tampere University, Kanslerinrinne 1, Tampere 33014, Finland; Tietoevry Ltd, Espoo, Finland.
- Henry Joutsijoki
- Faculty of Information Technology and Communication Sciences, Tampere University, Kanslerinrinne 1, Tampere 33014, Finland
- Martti Juhola
- Faculty of Information Technology and Communication Sciences, Tampere University, Kanslerinrinne 1, Tampere 33014, Finland
- Mikko R J Seppänen
- Rare Disease Center and Pediatric Research Center, New Children's Hospital, University of Helsinki and HUS Helsinki University Hospital, Helsinki, Finland
67
Cao J, Xu Y. Predicting cysteine reactivity changes upon phosphorylation using XGBoost. FEBS Open Bio 2024; 14:51-62. [PMID: 37964470 PMCID: PMC10761938 DOI: 10.1002/2211-5463.13737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/11/2023] [Accepted: 10/27/2023] [Indexed: 11/16/2023]
Abstract
Cysteine reactivity serves as a significant indicator of protein function and can be affected by phosphorylation events. Experimental approaches have been developed to investigate this effect, but the scale is still relatively limited. Machine-learning approaches promise to accelerate the investigation of these phenomena. In this study, protein sequence information, distances to the closest phosphorylation sites, and the membership score of the intrinsically disordered region were used to represent the cysteine. Following the feature selection using an elastic net model, two groups of binary classifiers based on XGBoost were built to predict the occurrence and the direction of the reactivity change as a response to phosphorylation events, respectively. In addition, function enrichment analysis was performed on proteins/genes predicted to have reactivity changes. XGBoost performed the best in the independent test with AUC of 0.8192 and 0.9203 for the prediction of the change's occurrence and direction, respectively. The use of two binary classifiers successively resulted in an accuracy of 0.7568 in predicting whether reactivity would be unchanged, increased, or decreased. The enrichment analysis revealed the association of proteins carrying reactivity-changed cysteine residues with various disease-related pathways, particularly cancer, autosomal dominant diseases, and viral infections. Changes in cysteine reactivity influenced by phosphorylation are site-specific and can be predicted by XGBoost algorithms. Our model provides an efficient alternative way to explore the cysteine reactivity upon phosphorylation at the proteome-wide level, facilitating the investigation of protein functions and their clinical insights. Our code is available on GitHub (https://github.com/DarinaOsamu/predictors-of-cysteine-reactivity-changes).
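The study chained two binary XGBoost classifiers to obtain a three-way prediction (unchanged, increased, or decreased reactivity). The routing logic can be sketched independently of the models themselves; here `p_change` and `p_increase` are hypothetical stand-ins for the two trained classifiers' probability outputs, and the 0.5 threshold is an assumption.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cascade_predict(
    samples: Sequence[T],
    p_change: Callable[[T], float],
    p_increase: Callable[[T], float],
    threshold: float = 0.5,
) -> List[str]:
    """Route each sample through two binary classifiers in sequence."""
    labels = []
    for x in samples:
        if p_change(x) < threshold:
            labels.append("unchanged")  # stage 1: no reactivity change predicted
        elif p_increase(x) >= threshold:
            labels.append("increased")  # stage 2: direction of the change
        else:
            labels.append("decreased")
    return labels


def accuracy(pred: Sequence[str], true: Sequence[str]) -> float:
    """Fraction of samples whose three-class label is correct."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)
```

Because the direction model is consulted only for samples the first stage flags as changed, any first-stage error carries through to the final label, so the three-class accuracy reflects both classifiers together.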
Affiliation(s)
- Jing Cao
- Department of Statistics, University of Science and Technology Beijing, China
- Yan Xu
- Department of Statistics, University of Science and Technology Beijing, China
68
Sharma K, Saini N, Hasija Y. Identifying the mitochondrial metabolism network by integration of machine learning and explainable artificial intelligence in skeletal muscle in type 2 diabetes. Mitochondrion 2024; 74:101821. [PMID: 38040172 DOI: 10.1016/j.mito.2023.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/04/2023] [Accepted: 11/26/2023] [Indexed: 12/03/2023]
Abstract
Imbalance in glucose metabolism and insulin resistance are two primary features of type 2 diabetes mellitus, whose etiology is linked to mitochondrial dysfunction in skeletal muscle tissue. Mitochondria are vital organelles involved in ATP synthesis and metabolism, and understanding the biological pathways that lead to mitochondrial dysfunction in type 2 diabetes can clarify the pathophysiology of the disease. In this study, mitochondrial gene expression data were retrieved from GSE22309, GSE25462, and GSE18732 using Mitocarta 3.0, focusing specifically on genes associated with mitochondrial function in type 2 diabetes. Feature selection was performed on the skeletal muscle expression dataset from 107 control patients and 70 type 2 diabetes patients using the XGBoost algorithm, which had the highest accuracy. The feature importance deduced from the model was then examined with SHAP (SHapley Additive exPlanations) to interpret and analyze the results in relation to the disease. Next, to understand the biological connections, protein-protein and mRNA-miRNA networks were studied using String and Mienturnet, respectively. The analysis identified BDH1, YARS2, AKAP10, RARS2, and MRPS31 as potential mitochondrial target genes among the other twenty genes. These genes are mainly involved in mitochondrial transport and organization, regulation of mitochondrial membrane potential, and intrinsic apoptotic signaling, among other processes. The mRNA-miRNA interaction network revealed that miR-375, miR-30a-5p, miR-16-5p, miR-129-5p, miR-1229-3p, and miR-1224-3p play a significant role in regulating mitochondrial function and are strongly associated with type 2 diabetes. These results might aid in developing novel therapeutic targets and biomarkers for type 2 diabetes.
Affiliation(s)
- Kritika Sharma
- CSIR-Institute of Genomics and Integrative Biology, Mall Road, New Delhi 110007, India; Department of Biotechnology, Delhi Technological University, Delhi 110042, India
- Neeru Saini
- CSIR-Institute of Genomics and Integrative Biology, Mall Road, New Delhi 110007, India; Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India
- Yasha Hasija
- Department of Biotechnology, Delhi Technological University, Delhi 110042, India
69
Nabeel SM, Bazai SU, Alasbali N, Liu Y, Ghafoor MI, Khan R, Ku CS, Yang J, Shahab S, Por LY. Optimizing lung cancer classification through hyperparameter tuning. Digit Health 2024; 10:20552076241249661. [PMID: 38698834 PMCID: PMC11064752 DOI: 10.1177/20552076241249661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 04/04/2024] [Indexed: 05/05/2024]
Abstract
Artificial intelligence is steadily permeating various sectors, including healthcare. This research specifically addresses lung cancer, the leading cause of cancer death worldwide. Two primary factors contribute to its onset: genetic predisposition and environmental factors such as smoking and exposure to pollutants. Recognizing the need for more effective diagnostic techniques, we set out to devise a machine learning strategy that improves precision in lung cancer detection, with the aim of a diagnostic method that is both less invasive and cost-effective. To this end, we proposed four methods and benchmarked them against prevalent techniques on a widely used dataset from Kaggle. One of our methods emerged as particularly promising, outperforming the others in accuracy, precision, and sensitivity. This method relied on hyperparameter tuning of the Gamma and C parameters, both set to 10, which influence kernel width and regularization strength, respectively. As a result, we achieved an accuracy of 99.16%, a precision of 98%, and a sensitivity of 100%. In conclusion, our enhanced prediction mechanism outperformed traditional and contemporary strategies in lung cancer detection.
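Tuning Gamma and C as kernel width and regularization strength suggests an RBF-kernel support vector machine, although the abstract does not name the model, so that is an inference. In the RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2): a larger gamma makes similarity fall off faster with distance (a narrower kernel), while C sets the penalty on margin violations, with larger C meaning weaker regularization. A minimal sketch of the kernel and an exhaustive (gamma, C) grid search, where `evaluate` is a hypothetical callback that would train and score a model:

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF similarity exp(-gamma * ||x - y||^2); larger gamma = narrower kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def grid_search(
    evaluate: Callable[[float, float], float],
    gammas: Iterable[float],
    cs: Iterable[float],
) -> Optional[Tuple[float, float, float]]:
    """Return (best_score, gamma, C) over all combinations; higher score is better."""
    best = None
    for gamma, c in itertools.product(gammas, cs):
        score = evaluate(gamma, c)  # e.g. mean cross-validated accuracy
        if best is None or score > best[0]:
            best = (score, gamma, c)
    return best
```

At gamma = 10, two points one unit apart have a kernel similarity of exp(-10), roughly 4.5e-5, so the decision boundary depends mostly on close neighbours; scoring each grid point on held-out folds rather than on the training data guards against picking such a narrow kernel just because it overfits.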
Affiliation(s)
- Syed Muhammad Nabeel
- Department of Computer Engineering, Balochistan University of Information Technology, Engineering, and Management Sciences (BUITEMS), Quetta, Balochistan, Pakistan
- Sibghat Ullah Bazai
- Department of Computer Engineering, Balochistan University of Information Technology, Engineering, and Management Sciences (BUITEMS), Quetta, Balochistan, Pakistan
- Nada Alasbali
- Department of Informatics and Computing Systems, College of Computer Science, King Khalid University, Abha, Saudi Arabia
- Yifan Liu
- Department of Electronic Science, Binhai College of Nankai University, Tianjin, China
- Rozi Khan
- Department of Computer Science, National University of Sciences and Technology (NUST) Balochistan Campus Quetta, Quetta, Balochistan, Pakistan
- Chin Soon Ku
- Department of Computer Science, Universiti Tunku Abdul Rahman, Kampar, Malaysia
- Jing Yang
- Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
- Sana Shahab
- Department of Business Administration, College of Business Administration, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
- Lip Yee Por
- Department of Computer System and Technology, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia
70
Calderón-Díaz M, Silvestre Aguirre R, Vásconez JP, Yáñez R, Roby M, Querales M, Salas R. Explainable Machine Learning Techniques to Predict Muscle Injuries in Professional Soccer Players through Biomechanical Analysis. Sensors (Basel) 2023; 24:119. [PMID: 38202981 PMCID: PMC10780883 DOI: 10.3390/s24010119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 11/25/2023] [Accepted: 12/18/2023] [Indexed: 01/12/2024]
Abstract
There is a significant risk of injury in sports and intense competition due to the demanding physical and psychological requirements. Hamstring strain injuries (HSIs) are the most prevalent type of injury among professional soccer players and are the leading cause of missed days in the sport. These injuries stem from a combination of factors, making it challenging to pinpoint the most crucial risk factors and their interactions, let alone find effective prevention strategies. Recently, there has been growing recognition of the potential of tools provided by artificial intelligence (AI). However, current studies primarily concentrate on enhancing the performance of complex machine learning models, often overlooking their explanatory capabilities. Consequently, medical teams have difficulty interpreting these models and are hesitant to trust them fully. In light of this, there is an increasing need for advanced injury detection and prediction models that can aid doctors in diagnosing or detecting injuries earlier and with greater accuracy. Accordingly, this study aims to identify the biomarkers of muscle injuries in professional soccer players through biomechanical analysis, employing several ML algorithms such as decision tree (DT) methods, discriminant methods, logistic regression, naive Bayes, support vector machine (SVM), K-nearest neighbor (KNN), ensemble methods, boosted and bagged trees, artificial neural networks (ANNs), and XGBoost. In particular, XGBoost is also used to obtain the most important features. The findings highlight that the variables that most effectively differentiate the groups and could serve as reliable predictors for injury prevention are the maximum muscle strength of the hamstrings and the stiffness of the same muscle. 
With regard to the 35 techniques employed, a precision of up to 78% was achieved with XGBoost, indicating that by considering scientific evidence, suggestions based on various data sources, and expert opinions, it is possible to attain good precision, thus enhancing the reliability of the results for doctors and trainers. Furthermore, the obtained results strongly align with the existing literature, although further specific studies about this sport are necessary to draw a definitive conclusion.
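Precision, the figure quoted for XGBoost above, is the share of players predicted to be injured who actually were injured. A minimal sketch of how it (and recall, for contrast) is computed from binary labels; the function name and label lists are hypothetical and not data from the study.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and accuracy for binary labels (1 = injured, 0 = not injured)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # injured, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not injured, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # injured, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not injured, not flagged
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical labels for eight players (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
# {'precision': 0.75, 'recall': 0.75, 'accuracy': 0.75}
```

A model can trade precision against recall by moving its decision threshold, so the two are usually reported together.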
Affiliation(s)
- Mailyn Calderón-Díaz
- Faculty of Engineering, Universidad Andres Bello, Santiago 7550196, Chile
- Ph.D. Program in Health Sciences and Engineering, Universidad de Valparaiso, Valparaiso 2362735, Chile
- Millennium Institute for Intelligent Healthcare Engineering (iHealth), Valparaiso 2362735, Chile
- Rony Silvestre Aguirre
- Laboratorio de Biomecánica, Centro de Innovación Clínica MEDS, Santiago 7691236, Chile
- Juan P. Vásconez
- Faculty of Engineering, Universidad Andres Bello, Santiago 7550196, Chile
- Roberto Yáñez
- Laboratorio de Biomecánica, Centro de Innovación Clínica MEDS, Santiago 7691236, Chile
- Matías Roby
- Laboratorio de Biomecánica, Centro de Innovación Clínica MEDS, Santiago 7691236, Chile
- Marvin Querales
- School of Medical Technology, Universidad de Valparaiso, Valparaiso 2362735, Chile
- Rodrigo Salas
- Ph.D. Program in Health Sciences and Engineering, Universidad de Valparaiso, Valparaiso 2362735, Chile
- Millennium Institute for Intelligent Healthcare Engineering (iHealth), Valparaiso 2362735, Chile
- School of Biomedical Engineering, Universidad de Valparaiso, Valparaiso 2362735, Chile

71
Lu B, Meng X, Dong S, Zhang Z, Liu C, Jiang J, Herrmann H, Li X. High-resolution mapping of regional VOCs using the enhanced space-time extreme gradient boosting machine (XGBoost) in Shanghai. Sci Total Environ 2023; 905:167054. [PMID: 37714357 DOI: 10.1016/j.scitotenv.2023.167054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 09/10/2023] [Accepted: 09/11/2023] [Indexed: 09/17/2023]
Abstract
Accurate estimation of volatile organic compounds (VOCs) at high spatiotemporal resolution is of great significance for establishing advanced early warning systems and regulating air pollution control. However, high-resolution spatiotemporal estimation of VOCs remains incomplete. Here, the space-time extreme gradient boosting model (STXGB) was enhanced by integrating spatiotemporal information to improve the spatial resolution and overall accuracy of VOC estimates. To this end, meteorological, topographical, and pollutant emission data were input into the STXGB model, and hourly regional VOC maps at 300 m resolution were produced for Shanghai for 2020. Our results show that the STXGB model achieves good hourly VOC estimation performance (R2 = 0.73). Further SHapley Additive exPlanations (SHAP) analysis indicates that local interpretations of the STXGB model show a strong contribution of emissions to the mapped VOC estimates, while also acknowledging the important contribution of the space and time terms. The proposed approach outperforms many traditional machine learning models with a lower computational burden in terms of speed and memory.
Affiliation(s)
- Bingqing Lu
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, PR China
- Xue Meng
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, PR China
- Shanshan Dong
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, PR China
- Zekun Zhang
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, PR China
- Chao Liu
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, PR China
- Jiakui Jiang
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, PR China
- Hartmut Herrmann
- Leibniz-Institut für Troposphärenforschung (IfT), Permoserstr. 15, 04318 Leipzig, Germany
- Xiang Li
- Department of Environmental Science & Engineering, Fudan University, Shanghai 200438, PR China; Institute of Eco-Chongming (IEC), Shanghai 200241, China

72
Zhu H, Hao H, Yu L. Identifying disease-related microbes based on multi-scale variational graph autoencoder embedding Wasserstein distance. BMC Biol 2023; 21:294. [PMID: 38115088 PMCID: PMC10731776 DOI: 10.1186/s12915-023-01796-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/07/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
BACKGROUND A large body of clinical and biomedical research has demonstrated that microbes are crucial to human health. Identifying associations between microbes and diseases can not only reveal potential disease mechanisms but also facilitate early diagnosis and promote precision medicine. Because of data perturbation and unsatisfactory latent representations, there is still significant room for improvement. RESULTS In this work, we proposed a novel framework, Multi-scale Variational Graph AutoEncoder embedding Wasserstein distance (MVGAEW), to predict disease-related microbes; it can resist data perturbation and effectively generate latent representations for both microbes and diseases from the perspective of distributions. First, we calculated multiple similarities and integrated them through similarity network fusion. Next, we obtained node latent representations with an improved variational graph autoencoder. Finally, an XGBoost classifier was employed to predict potential disease-related microbes. We also introduced multi-order node embedding reconstruction to enhance the representation capacity, and performed ablation studies to evaluate the contribution of each component of the model. Moreover, we conducted experiments on common drugs and case studies, including Alzheimer's disease, Crohn's disease, and colorectal neoplasms, to validate the effectiveness of our framework. CONCLUSIONS Our model outperformed other current state-of-the-art methods, showing a substantial improvement on the HMDAD database.
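The abstract does not specify how MVGAEW computes its Wasserstein term. Because a variational graph autoencoder represents each node as a Gaussian, one standard building block, shown here as a sketch and not as the authors' code, is the closed-form 2-Wasserstein distance between Gaussians with diagonal covariance: W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2. The embeddings below are made-up two-dimensional examples.

```python
import math
from typing import Sequence


def w2_diag_gaussians(mu1: Sequence[float], sigma1: Sequence[float],
                      mu2: Sequence[float], sigma2: Sequence[float]) -> float:
    """2-Wasserstein distance between N(mu1, diag(sigma1**2)) and N(mu2, diag(sigma2**2)).

    For diagonal covariances the general Gaussian formula reduces to
    ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2 under the square root.
    """
    if not (len(mu1) == len(sigma1) == len(mu2) == len(sigma2)):
        raise ValueError("all vectors must have the same dimension")
    if any(s < 0 for s in list(sigma1) + list(sigma2)):
        raise ValueError("standard deviations must be non-negative")
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    std_term = sum((s - t) ** 2 for s, t in zip(sigma1, sigma2))
    return math.sqrt(mean_term + std_term)


# Hypothetical latent Gaussians (mean, std) for one microbe node and one disease node.
microbe = ([0.0, 0.0], [1.0, 1.0])
disease = ([3.0, 4.0], [1.0, 1.0])
print(w2_diag_gaussians(*microbe, *disease))  # 5.0
```

Unlike KL divergence, this distance is symmetric and stays defined when a standard deviation collapses to zero.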
Affiliation(s)
- Huan Zhu
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Hongxia Hao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China

73
Mahlknecht J, Torres-Martínez JA, Kumar M, Mora A, Kaown D, Loge FJ. Nitrate prediction in groundwater of data scarce regions: The futuristic fresh-water management outlook. Sci Total Environ 2023; 905:166863. [PMID: 37690767 DOI: 10.1016/j.scitotenv.2023.166863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 08/28/2023] [Accepted: 09/03/2023] [Indexed: 09/12/2023]
Abstract
Nitrate contamination in groundwater poses a significant threat to water quality and public health, especially in regions with limited data availability. This study addresses this challenge by employing machine learning (ML) techniques to predict nitrate (NO₃⁻-N) concentrations in Mexico's groundwater. Four ML algorithms, Extreme Gradient Boosting (XGB), Boosted Regression Trees (BRT), Random Forest (RF), and Support Vector Machines (SVM), were used to model NO₃⁻-N concentrations across the country. Despite data limitations, the ML models achieved robust predictive performance. The XGB and BRT algorithms demonstrated superior accuracy (0.80 and 0.78, respectively). Notably, this was achieved using ∼10 times less information than previous large-scale assessments. The novelty lies in the first-ever implementation of the 'Support Points-based Split Approach' during data pre-processing. The models initially considered 68 covariates and identified 13-19 significant predictors of NO₃⁻-N concentration, spanning climate, geomorphology, soil, hydrogeology, and human factors. Rainfall, elevation, and slope emerged as key predictors. A validation incorporating nationwide waste disposal sites yielded an encouraging correlation. Spatial risk mapping revealed significant pollution hotspots across Mexico. Regions with elevated NO₃⁻-N concentrations (>10 mg/L) were identified, particularly in the north-central and northeast parts of the country, associated with agricultural and industrial activities. Approximately 21 million people, accounting for 10% of Mexico's population, are potentially exposed to elevated NO₃⁻-N levels in groundwater. Moreover, the NO₃⁻-N hotspots align with reported NO₃⁻-N health implications such as gastric and colorectal cancer. This study not only demonstrates the potential of ML in data-scarce regions but also offers actionable insights for policy and management strategies.
Our research underscores the urgency of implementing sustainable agricultural practices and comprehensive domestic waste management measures to mitigate NO₃⁻-N contamination. Moreover, it advocates for the establishment of effective policies based on real-time monitoring and collaboration among stakeholders.
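The exposure figure combines a predicted concentration surface with population counts. A minimal sketch, assuming a hypothetical grid in which each cell carries a predicted nitrate (NO₃⁻-N) value and a resident population; the class name, cells, and numbers are illustrative, and the abstract does not describe the study's actual mapping procedure.

```python
from dataclasses import dataclass

# Threshold used in the abstract for "elevated" nitrate concentrations.
NITRATE_LIMIT_MG_L = 10.0


@dataclass
class GridCell:
    """Hypothetical map cell: a predicted NO3-N value and its resident population."""
    cell_id: str
    predicted_no3_n: float  # mg/L
    population: int


def exposure_summary(cells, limit=NITRATE_LIMIT_MG_L):
    """Return hotspot ids, exposed population, and exposed share of total population."""
    hot = [c for c in cells if c.predicted_no3_n > limit]  # strictly above the limit
    exposed = sum(c.population for c in hot)
    total = sum(c.population for c in cells)
    return [c.cell_id for c in hot], exposed, (exposed / total if total else 0.0)


cells = [
    GridCell("A", 12.5, 1000),
    GridCell("B", 4.0, 3000),
    GridCell("C", 10.0, 500),  # exactly at the limit, so not counted
    GridCell("D", 18.0, 500),
]
print(exposure_summary(cells))  # (['A', 'D'], 1500, 0.3)
```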
Affiliation(s)
- Jürgen Mahlknecht
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Campus Monterrey, Eugenio Garza Sada 2501, Monterrey, NL 64849, Mexico
- Juan Antonio Torres-Martínez
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Campus Monterrey, Eugenio Garza Sada 2501, Monterrey, NL 64849, Mexico
- Manish Kumar
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Campus Monterrey, Eugenio Garza Sada 2501, Monterrey, NL 64849, Mexico; Sustainability Cluster, School of Advanced Engineering, UPES, Dehradun, Uttarakhand 248007, India
- Abrahan Mora
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Campus Puebla, Atlixcáyotl 5718, Puebla de Zaragoza, Puebla 72453, Mexico
- Dugin Kaown
- School of Earth and Environmental Sciences, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea
- Frank J Loge
- Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Campus Monterrey, Eugenio Garza Sada 2501, Monterrey, NL 64849, Mexico; Department of Civil and Environmental Engineering, University of California Davis, One Shields Avenue, Davis, CA 95616, USA

74
Teng X, Wang Z. Online COVID-19 diagnosis prediction using complete blood count: an innovative tool for public health. BMC Public Health 2023; 23:2536. [PMID: 38114942 PMCID: PMC10729447 DOI: 10.1186/s12889-023-17477-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 12/13/2023] [Indexed: 12/21/2023]
Abstract
BACKGROUND COVID-19, caused by SARS-CoV-2, presents distinct diagnostic challenges because of its wide range of clinical manifestations and the overlap of its symptoms with those of other common respiratory diseases. This study addresses these difficulties by using machine learning (ML) methods, particularly the XGBoost algorithm, to apply Complete Blood Count (CBC) parameters to predictive analysis. METHODS We performed a retrospective study of 2114 COVID-19 patients treated between December 2022 and January 2023 at our healthcare facility. Based on their clinical symptoms, these patients were classified into a fever group (1057 patients) and a pneumonia group (1057 patients). The CBC data were used to build predictive models, and model performance was evaluated with metrics such as the Area Under the Receiver Operating Characteristic Curve (AUC), accuracy, sensitivity, specificity, and precision. We selected the top 10 predictive variables based on their significance in disease prediction. The data were then split into a training set (70% of patients) and a validation set (30% of patients) for model validation. RESULTS We identified 31 indicators with significant between-group differences. The XGBoost model outperformed the other models, with an AUC of 0.920 and high precision, sensitivity, specificity, and accuracy. The top 10 features (age, monocyte %, mean platelet volume, lymphocyte %, systemic inflammation response index (SIRI), eosinophil count, platelet count, hemoglobin, platelet distribution width, and neutrophil count) were crucial for constructing a more precise predictive model. The model performed well on both the training (AUC = 0.977) and validation (AUC = 0.912) datasets, as confirmed by decision curve analysis and calibration curves. CONCLUSION ML models that incorporate CBC parameters offer an innovative and effective tool for data analysis in COVID-19.
They potentially enhance diagnostic accuracy and the efficacy of therapeutic interventions, ultimately contributing to a reduction in the mortality rate of this infectious disease.
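The abstract evaluates the model with a 70/30 train/validation split and the AUC. A minimal standard-library sketch of both: a seeded random split, and the AUC computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half). The function names and scores are hypothetical, and treating the pneumonia group as the positive class is an assumption for illustration.

```python
import random
from typing import List, Sequence, Tuple


def train_validation_split(items: Sequence, train_frac: float = 0.7,
                           seed: int = 0) -> Tuple[List, List]:
    """Shuffle with a fixed seed and cut into training and validation parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score_pos > score_neg), counting ties as 0.5. O(P*N), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


train, val = train_validation_split(range(2114))
print(len(train), len(val))  # 1480 634

# Hypothetical model scores (1 = pneumonia group, 0 = fever group).
print(auc_mann_whitney([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

A stratified split, which preserves the class ratio in both parts, is a common refinement of the plain random split shown here.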
Affiliation(s)
- Xiaojing Teng
- Department of Clinical Laboratory, Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, Zhejiang, 310000, China
- Zhiyi Wang
- Department of Clinical Laboratory, Hangzhou Women's Hospital (Hangzhou Maternity and Child Health Care Hospital), No. 369, Kunpeng Road, Shangcheng District Hangzhou, Hangzhou, Zhejiang, 310008, China

75
Zhang J, Chen R, Chen S, Yu D, Elkamchouchi DH, Alqahtani MS, Assilzadeh H, Huang Z, Huang Y. Application of lipid and polymeric-based nanoparticles for treatment of inner ear infections via XGBoost. Environ Res 2023; 239:117115. [PMID: 37717809 DOI: 10.1016/j.envres.2023.117115] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2023] [Revised: 08/26/2023] [Accepted: 09/09/2023] [Indexed: 09/19/2023]
Abstract
Hearing loss is a prevalent sensory disorder, and the restricted blood flow to the inner ear and the limited permeability of the blood-labyrinth barrier make it difficult to transport drugs to inner ear tissues. Current options for hearing loss consist of cochlear surgery, medication, and hearing devices. Conventional drug delivery methods for inner ear illnesses have limitations; however, various smart nanoparticles, including inorganic nanoparticles, have been proposed to regulate drug administration, enhance the targeting of particular cells, and decrease systemic adverse effects. Zinc oxide nanoparticles possess distinct characteristics that facilitate accurate drug delivery, improved targeting of specific cells, and minimized systemic adverse effects, and they were studied for targeted delivery and controlled release of therapeutic drugs within specific cells. An XGBoost model was applied to the Wideband Absorbance Immittance (WAI) test after cochlear surgery. There were 90 middle ear effusion samples (ages 1-10 years, mean 34.9 months) from children who had chronic middle ear effusion for four months, with effusion verified for seven weeks. In this research, 400 sets underwent WAI testing to assess inner ear performance after surgery. Among them, 60 patients had otitis media with effusion (OME), while 30 had normal ears (controls). OME ears showed significantly lower absorbance at 250, 500, and 1000 Hz than controls (p < 0.001). Absorbance thresholds >0.252 (1000 Hz) and >0.330 (2000 Hz) predicted a favorable prognosis (p < 0.05, odds ratio: 6). This suggests that cochlear surgery and WAI performed well in the diagnosis and treatment of inner ear infections. With an R2 of 0.899 and an RMSE of 1.223, XGBoost showed excellent specificity and sensitivity in categorizing ears by effusion (absent or present) and flow (partial or complete), with areas under the curve ranging from 0.944 to 1.
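The odds ratio of 6 reported for the absorbance thresholds comes from dichotomizing ears at a cut-off and comparing outcome odds above and below it. A minimal sketch of that 2x2 calculation; the function name is hypothetical, and the absorbance values and outcomes are invented so that the result happens to equal 6. They are not the study's data.

```python
from typing import Sequence


def odds_ratio_at_threshold(absorbance: Sequence[float], favorable: Sequence[bool],
                            threshold: float) -> float:
    """Odds of a favorable outcome above the threshold divided by the odds at or below it."""
    a = b = c = d = 0
    for x, good in zip(absorbance, favorable):
        if x > threshold:
            if good:
                a += 1  # above threshold, favorable
            else:
                b += 1  # above threshold, unfavorable
        else:
            if good:
                c += 1  # at/below threshold, favorable
            else:
                d += 1  # at/below threshold, unfavorable
    if b == 0 or c == 0:
        raise ValueError("a zero cell makes the odds ratio undefined without a continuity correction")
    return (a * d) / (b * c)


# Invented example at the 1000 Hz cut-off quoted above (0.252).
absorbance = [0.30] * 8 + [0.20] * 6
favorable = [True] * 6 + [False] * 2 + [True] * 2 + [False] * 4
print(odds_ratio_at_threshold(absorbance, favorable, 0.252))  # 6.0
```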
Affiliation(s)
- Jie Zhang
- Department of Otolaryngology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China
- Ru Chen
- Department of Otolaryngology, The Third Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang 325000, China
- Shuainan Chen
- Department of Otolaryngology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China
- Die Yu
- Department of Otolaryngology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China
- Dalia H Elkamchouchi
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Mohammed S Alqahtani
- Radiological Sciences Department, College of Applied Medical Sciences, King Khalid University, Abha 61421, Saudi Arabia; BioImaging Unit, Space Research Centre, Michael Atiyah Building, University of Leicester, Leicester, LE1 7RH, UK
- Hamid Assilzadeh
- Faculty of Architecture and Urbanism, UTE University, Calle Rumipamba S/N and Bourgeois, Quito, Ecuador; Institute of Research and Development, Duy Tan University, Da Nang, Viet Nam; School of Engineering & Technology, Duy Tan University, Da Nang, Viet Nam; Department of Biomaterials, Saveetha Dental College and Hospital, Saveetha Institute of Medical and Technical Sciences, Chennai 600077, India
- Zhongguan Huang
- Department of Otolaryngology, Pingyang Affiliated Hospital of Wenzhou Medical University, Pingyang, Zhejiang, 325400, China
- Yideng Huang
- Department of Otolaryngology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, 325000, China

76
Xu Y, Park Y, Park JD, Sun B. Predicting Nurse Turnover for Highly Imbalanced Data Using the Synthetic Minority Over-Sampling Technique and Machine Learning Algorithms. Healthcare (Basel) 2023; 11:3173. [PMID: 38132063 PMCID: PMC10742910 DOI: 10.3390/healthcare11243173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 12/11/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023]
Abstract
Predicting nurse turnover is a growing challenge within the healthcare sector, with profound effects on healthcare quality and the nursing profession. This study uses the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance in the 2018 National Sample Survey of Registered Nurses dataset and predicts nurse turnover with machine learning algorithms. Four algorithms, namely logistic regression, random forests, decision tree, and extreme gradient boosting, were applied to the SMOTE-enhanced dataset. The data were split into 80% training and 20% validation sets. Eighteen carefully selected variables from the database served as predictive features, and the machine learning model identified age, working hours, electronic health record/electronic medical record (EHR/EMR), individual income, and job type as important features for nurse turnover. Performance was compared using accuracy, precision, recall (sensitivity), F1-score, and AUC. In summary, SMOTE-enhanced random forests showed the most robust predictive power in both the classical approach (with all 18 predictive variables) and an optimized approach (using eight key predictive variables), followed by extreme gradient boosting, decision tree, and logistic regression. Notably, age emerged as the most influential factor in nurse turnover, with working hours, EHR/EMR usability, individual income, and region also playing significant roles. This research offers useful insights for healthcare researchers and stakeholders selecting machine learning algorithms for nurse turnover prediction.
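SMOTE creates synthetic minority-class records by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal standard-library sketch of that core step on numeric features; the function name, toy points, and k are illustrative, and full implementations (for example the `SMOTE` class in imbalanced-learn) handle details, such as categorical features, that this sketch ignores.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Return n_new synthetic minority samples via SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x by Euclidean distance (x itself excluded)
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Toy minority class (e.g. nurses who left), two numeric features each.
leavers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote(leavers, n_new=6, k=2, seed=1)
print(len(new_rows))  # 6
```

Oversampling is normally fitted on the training split only, so that synthetic points derived from validation records do not leak into evaluation; the abstract does not state the order the study used.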
Affiliation(s)
- Yuan Xu
- School of Maritime Economics and Management, Collaborative Innovation Center for Transport Studies, Dalian Maritime University, 1 Linghai Road, Dalian 116026, China
- Yongshin Park
- Department of Marketing, Operations, and Analytics, Bill Munday School of Business, St. Edward’s University, 3001 South Congress, Austin, TX 78704, USA
- Ju Dong Park
- Department of Maritime Police and Production System, Gyeongsang National University, Tongyeong-si 53064, Gyeongsangnam-do, Republic of Korea
- Bora Sun
- School of Nursing, The University of Texas Austin, 1710 Red River St., Austin, TX 78712, USA

77
Guo J, Cheng H, Wang Z, Qiao M, Li J, Lyu J. Factor analysis based on SHapley Additive exPlanations for sepsis-associated encephalopathy in ICU mortality prediction using XGBoost - a retrospective study based on two large database. Front Neurol 2023; 14:1290117. [PMID: 38162445 PMCID: PMC10755941 DOI: 10.3389/fneur.2023.1290117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 11/30/2023] [Indexed: 01/03/2024]
Abstract
Objective Sepsis-associated encephalopathy (SAE) is strongly linked to a high mortality risk and frequently occurs in both the acute and late phases of sepsis. The objective of this study was to construct and validate a model predicting mortality in ICU patients with SAE. Methods Following the inclusion criteria, 7,576 patients with SAE were selected from the MIMIC-IV database and randomly divided into training (n = 5,303, 70%) and internal validation (n = 2,273, 30%) sets. Using the same criteria, 1,573 patients from the eICU-CRD database were included as an external test set. Independent risk factors for ICU mortality were identified with Extreme Gradient Boosting (XGBoost), and prediction models were constructed and verified on the validation set. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were used to evaluate the model's discrimination. The SHapley Additive exPlanations (SHAP) approach was applied to compute Shapley values for individual patients, attribute the model output to its input factors, and examine how specific features affect the model's output. Results The survival rate of patients with SAE was 88.6% in the MIMIC-IV database and 89.1% among the 1,573 patients in the eICU-CRD database. The ROC curve of the XGBoost model indicated good discrimination. The AUCs for the training, test, and validation sets were 0.908, 0.898, and 0.778, respectively. A SHAP plot depicted the impact of each parameter on the XGBoost model, including features with positive effects (acute physiology score III, vasopressin, age, red blood cell distribution width, partial thromboplastin time, and norepinephrine) and with negative effects (Glasgow Coma Scale). Conclusion A prediction model developed using XGBoost can accurately predict the ICU mortality of patients with SAE.
The SHAP approach can enhance the interpretability of the machine-learning model and support clinical decision-making.
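SHAP explanations are built on Shapley values: each feature's average marginal contribution to a prediction over all coalitions of the other features. For a handful of features they can be computed exactly by enumeration. A minimal sketch, assuming that features outside a coalition are set to a single baseline vector (a simplification; the shap library typically averages over background data and uses the TreeSHAP algorithm for tree ensembles such as XGBoost). The toy risk function is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score with an interaction between features 0 and 2.
def risk(z: List[float]) -> float:
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = shapley_values(risk, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [2.5, 3.0, 0.5]
```

The values sum to f(x) minus f(baseline), which is the "additive" property in the SHAP name; the interaction term is split equally between features 0 and 2. Enumeration cost grows as 2^n, which is why practical tools rely on model-specific or sampling algorithms.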
Affiliation(s)
- Jiayu Guo
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, China
- School of Public Health, Shannxi University of Chinese Medicine, Xianyang, China
- Hongtao Cheng
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, China
- School of Nursing, Jinan University, Guangzhou, Guangdong, China
- Zicheng Wang
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Mengmeng Qiao
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, China
- School of Public Health, Shannxi University of Chinese Medicine, Xianyang, China
- Jing Li
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, China
- School of Public Health, Shannxi University of Chinese Medicine, Xianyang, China
- Jun Lyu
- Department of Clinical Research, The First Affiliated Hospital of Jinan University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Guangzhou, Guangdong, China

78
Villanueva P, Yang J, Radmer L, Liang X, Leung T, Ikuma K, Swanner ED, Howe A, Lee J. One-Week-Ahead Prediction of Cyanobacterial Harmful Algal Blooms in Iowa Lakes. Environ Sci Technol 2023; 57:20636-20646. [PMID: 38011382 DOI: 10.1021/acs.est.3c07764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Cyanobacterial harmful algal blooms (CyanoHABs) pose serious risks to inland water resources. Despite advances in our understanding of associated environmental factors and in modeling efforts, predicting CyanoHABs remains challenging. Leveraging an integrated water quality data collection effort in Iowa lakes, this study aimed to identify factors associated with hazardous microcystin levels and to develop one-week-ahead predictive classification models. Using water samples from 38 Iowa lakes collected between 2018 and 2021, we conducted feature selection considering both linear and nonlinear properties. We then developed three model types (Neural Network, XGBoost, and Logistic Regression) with different sampling strategies, using the nine selected variables (mcyA_M, TKN, % hay/pasture, pH, mcyA_M:16S, % developed, DOC, dewpoint temperature, and ortho-P). Evaluation metrics demonstrated the strong performance of the Neural Network with oversampling (ROC-AUC 0.940, accuracy 0.861, sensitivity 0.857, specificity 0.857, LR+ 5.993, and 1/LR- 5.993) and of XGBoost with downsampling (ROC-AUC 0.944, accuracy 0.831, sensitivity 0.928, specificity 0.833, LR+ 5.557, and 1/LR- 11.569). This study illustrated the intricacies of modeling with limited data and class imbalance, underscoring the importance of continuous monitoring and data collection to improve predictive accuracy. The methods employed may also serve as useful references for researchers tackling similar challenges in other environments.
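The likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below reproduces the reported LR+ and 1/LR- values from the reported sensitivities and specificities, and adds simple random majority-class downsampling as one possible form of the downsampling strategy named above; the function names and toy inputs are illustrative, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, 1/LR-), the two quantities reported in the abstract."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, 1 / lr_neg


def downsample_majority(rows: Sequence, labels: Sequence[int],
                        seed: int = 0) -> Tuple[List, List[int]]:
    """Randomly drop majority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


for name, sens, spec in [("Neural Network", 0.857, 0.857), ("XGBoost", 0.928, 0.833)]:
    lr_pos, inv_lr_neg = likelihood_ratios(sens, spec)
    print(name, round(lr_pos, 3), round(inv_lr_neg, 3))
# Neural Network 5.993 5.993
# XGBoost 5.557 11.569
```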
Affiliation(s)
- Paul Villanueva
- Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Jihoon Yang
- Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Lorien Radmer
- Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Xuewei Liang
- Department of Civil, Construction and Environmental Engineering, Iowa State University, Ames, Iowa 50011, United States
- Tania Leung
- Department of Geological and Atmospheric Sciences, Iowa State University, Ames, Iowa 50011, United States
- Kaoru Ikuma
- Department of Civil, Construction and Environmental Engineering, Iowa State University, Ames, Iowa 50011, United States
- Elizabeth D Swanner
- Department of Geological and Atmospheric Sciences, Iowa State University, Ames, Iowa 50011, United States
- Adina Howe
- Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States
- Jaejin Lee
- Department of Agricultural and Biosystems Engineering, Iowa State University, Ames, Iowa 50011, United States

79
Nambiar A, S H, S S. Model-agnostic explainable artificial intelligence tools for severity prediction and symptom analysis on Indian COVID-19 data. Front Artif Intell 2023; 6:1272506. [PMID: 38111787 PMCID: PMC10726049 DOI: 10.3389/frai.2023.1272506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/07/2023] [Indexed: 12/20/2023]
Abstract
Introduction The COVID-19 pandemic had a global impact and created an unprecedented emergency in healthcare and other frontline sectors. Various artificial-intelligence-based models were developed to manage medical resources effectively and identify high-risk patients. However, the practical applicability of many of these AI models in high-risk settings was limited by their "black-box" nature, i.e., the lack of model interpretability. To tackle this problem, Explainable Artificial Intelligence (XAI) was introduced, aiming to explore the "black box" behavior of machine learning models and offer definitive and interpretable evidence. XAI provides analysis that humans can interpret, thus boosting confidence in the successful deployment of AI systems in the wild. Methods This study explores the use of model-agnostic XAI methods, such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), for COVID-19 symptom analysis in Indian patients in a COVID severity prediction task. Several classifiers, including a decision tree classifier, an XGBoost classifier, and a neural network classifier, are used to develop the machine learning models. Results and discussion The proposed XAI tools are found to augment the high performance of AI systems with human-interpretable evidence and reasoning, as shown through the interpretation of various explainability plots. Our comparative analysis illustrates the significance of XAI tools and their impact in a healthcare context. The study suggests that SHAP and LIME analysis are promising methods for incorporating explainability into model development and can lead to better and more trustworthy ML models in the future.
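LIME explains one prediction by fitting a simple, locally weighted linear model to a black-box model's outputs on perturbed copies of the input. A minimal one-feature sketch of that idea, assuming a deterministic symmetric grid of perturbations and an exponential distance kernel; the real LIME method (and the `lime` package) perturbs all features by random sampling and fits a sparse multi-feature surrogate. The black-box function and parameter values are hypothetical stand-ins.

```python
import math
from typing import Callable, Tuple


def local_linear_surrogate(f: Callable[[float], float], x0: float, width: float = 1.0,
                           n: int = 41, kernel_width: float = 0.5) -> Tuple[float, float]:
    """Weighted least-squares line fitted to f near x0; returns (intercept, slope)."""
    # Perturbations on a symmetric grid over [x0 - width, x0 + width].
    xs = [x0 - width + 2 * width * k / (n - 1) for k in range(n)]
    ys = [f(x) for x in xs]
    # Closer perturbations get more weight (exponential kernel on distance to x0).
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return my - slope * mx, slope


def black_box(x: float) -> float:
    """Hypothetical nonlinear severity score."""
    return x ** 2


_, slope = local_linear_surrogate(black_box, x0=3.0)
print(round(slope, 6))  # 6.0: the local effect of the feature at x0 = 3
```

The surrogate's slope is the explanation: near x0 = 3 a unit increase in the feature raises the score by about 6, even though the global model is not linear.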
Affiliation(s)
- Athira Nambiar
- Department of Computational Intelligence, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India

80
Ao Z, Li H, Chen J, Yuan J, Xia Z, Zhang J, Chen H, Wang H, Liu G, Qi L. A new approach to optimizing aeration using XGB-Bi-LSTM via the online monitoring of oxygen transfer efficiency and oxygen uptake rate. Environ Res 2023; 238:117142. [PMID: 37739155 DOI: 10.1016/j.envres.2023.117142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 09/04/2023] [Accepted: 09/13/2023] [Indexed: 09/24/2023]
Abstract
In wastewater treatment plants (WWTPs), aeration is vital for meeting microbial oxygen needs. To achieve carbon neutrality, optimizing aeration to reduce energy use and emissions is imperative. Using machine learning (ML) to reveal complex rules in large wastewater treatment data sets has become a trend. In this vein, the present paper proposes an aeration optimization approach based on the extreme gradient boosting-bidirectional long short-term memory (XGB-Bi-LSTM) model via the online monitoring of oxygen transfer efficiency (OTE) and oxygen uptake rate (OUR), allowing WWTPs to conserve energy and reduce indirect carbon emissions. The approach uses the gain algorithm of XGB to calculate feature importance and identify important parameters, and then uses Bi-LSTM to predict the target with those important parameters as features. Operational data from a WWTP in Suzhou, China, were used to train and test the approach, and its performance was compared with that of ML models suitable for regression tasks (XGB, random forest, light gradient boosting machine, gradient boosting, and LSTM). Experimental results show that the approach requires only a small number of input parameters to achieve good performance and outperforms the other machine learning models. When OTE and dissolved oxygen (DO) are used as features to predict αF (the alpha factor multiplied by the pollution factor F, which is included because diffusers were used), the R-squared (R2) is 0.9977, the root mean square error (RMSE) is 0.0043, the mean absolute percentage error (MAPE) is 0.0069, and the median absolute error (MedAE) is 0.0032. When the predicted αF and the OUR are used as features to predict the air flow rate of an aeration unit, the R2 is 0.9901, the RMSE is 3.6150, the MAPE is 0.0209, and the MedAE is 1.5472. With the optimized aeration approach, energy consumption can be reduced by 23%.
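The four error measures quoted above can be computed as follows. A minimal standard-library sketch; the function name and sample values are made up, and it reports MAPE as a fraction (0.0069 rather than 0.69%), which appears to be how the abstract states it.

```python
import math
import statistics
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R2, RMSE, MAPE (as a fraction) and median absolute error."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        # MAPE is undefined when any true value is zero.
        "MAPE": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "MedAE": statistics.median([abs(e) for e in errors]),
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
# {'R2': 0.8, 'RMSE': 0.5, 'MAPE': 0.0625, 'MedAE': 0.0}
```

MedAE ignores the single large miss here while RMSE does not, which is why reporting both gives a fuller picture of error behavior.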
Affiliation(s)
- Ziding Ao
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Hao Li
- School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Jiabo Chen
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Junli Yuan
- China Forestry Digital Co., Ltd, Beijing, 100036, China
- Zhiheng Xia
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Jinsen Zhang
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Huiling Chen
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Hongchen Wang
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Guohua Liu
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
- Lu Qi
- Research Center for Low Carbon Technology of Water Environment, School of Environment and Natural Resource, Renmin University of China, Beijing, 100872, China
81
Nasiri S, Vaezihir A, Ahmadishali J. Designing soil contamination monitoring network in petroleum refineries by XGBoost weighting and geostatistical facility allocation methods. Environ Sci Pollut Res Int 2023; 30:118377-118395. [PMID: 37910363 DOI: 10.1007/s11356-023-30452-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 10/10/2023] [Indexed: 11/03/2023]
Abstract
Petroleum refineries are strategic industrial sectors that can release toxic materials into the environment and cause potential hazards, so designing and installing soil contamination monitoring networks at petroleum refineries is a necessity. In this research, we designed an optimal monitoring network with maximum coverage and a minimum number of monitoring boreholes. The main parameters considered are the groundwater contamination history, the location of effective structures, the location of flare stacks and the soil texture. In addition, soil contamination at the sampling points was quantified from previous soil contamination data using the Entropy Weighting Model. This was combined with the other parameters to estimate soil contamination across the site. The machine learning method XGBoost was used to estimate and assign a priority to every point on the site. Four parameters were considered to reach the optimal network in the optimization program:
(a) the optimal value of the optimization program's objective function;
(b) the number of Advance Zero-half cuts of the Cut Generation algorithm;
(c) the computation time;
(d) the optimal number of network boreholes for different effective contamination detection radii.
The network was designed as a generalized Maximal Covering Location Problem and optimized with Mixed-Integer Linear Programming. To evaluate its applicability, the method was implemented at a refinery in the south of Iran. The results were an XGBoost estimation accuracy of 92.84%, an optimal network of 113 boreholes, and an effective contamination detection radius of 160 m. A new Regret function was defined to investigate the efficiency of the model. 
Furthermore, sensitivity analysis of the parameters and feature importance analysis of XGBoost both showed that the main parameter of the model was the location of effective structures.
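The abstract does not give the exact form of the Entropy Weighting Model. The sketch below is the textbook entropy weight method, included only to show the idea: a criterion whose values vary more across sampling points carries more information and receives a larger weight. It assumes the following:
- the matrix is non-negative, with rows as sampling points (at least two);
- every column contains at least one positive value;
- `entropy_weights` is a hypothetical name.

```python
from math import log


def entropy_weights(matrix):
    """Textbook entropy weight method (illustrative, not the paper's code).

    matrix: rows are alternatives (e.g. sampling points, at least 2),
    columns are non-negative criteria, each with at least one positive value.
    Returns one weight per column; the weights sum to 1.
    """
    n = len(matrix)
    m = len(matrix[0])
    k = 1.0 / log(n)  # scales each entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [x / total for x in column]
        # terms with p == 0 contribute 0 (limit of p * ln p)
        entropy = -k * sum(p * log(p) for p in proportions if p > 0)
        divergences.append(1.0 - entropy)  # low entropy -> high information
    d_sum = sum(divergences)
    if d_sum <= 1e-12:  # every column uniform: no criterion is informative
        return [1.0 / m] * m
    return [d / d_sum for d in divergences]
```

A column that is identical at every sampling point has entropy 1 and gets a weight of about 0. A column concentrated at a single point gets the largest weight.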
Affiliation(s)
- Shahla Nasiri
- Department of Mathematics, Statistics and Computer Science, University of Tabriz, Tabriz, Iran
- Jafar Ahmadishali
- Department of Mathematics, Statistics and Computer Science, University of Tabriz, Tabriz, Iran
82
Zheng J, Zhang Z, Wang J, Zhao R, Liu S, Yang G, Liu Z, Deng Z. Metabolic syndrome prediction model using Bayesian optimization and XGBoost based on traditional Chinese medicine features. Heliyon 2023; 9:e22727. [PMID: 38125549 PMCID: PMC10730568 DOI: 10.1016/j.heliyon.2023.e22727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 12/23/2023]
Abstract
Metabolic syndrome (MetS) has a high prevalence and is prone to many complications. However, current MetS diagnostic methods require blood tests that are not conducive to self-testing, so a user-friendly and accurate method for predicting MetS is needed to facilitate early detection and treatment. This study investigates a MetS prediction model that combines a small number of simple Traditional Chinese Medicine (TCM) clinical indicators and biological indicators with machine learning algorithms. Electronic medical record data from 2040 patients who visited outpatient clinics at Guangdong Chinese medicine hospitals from 2020 to 2021 were used. Bayesian optimization (BO) was fused with eXtreme gradient boosting (XGBoost) into a BO-XGBoost model, which screened nineteen key features influencing MetS prediction in three categories: individual bio-information, TCM indicators, and TCM habits. The predictive diagnostic model for MetS was then developed. The proposed model achieved 93.35%, 90.67%, 80.40%, and 0.920 for the F1, sensitivity, FRS, and AUC metrics, respectively, outperforming the seven other machine learning models tested. Finally, this study developed an intelligent MetS prediction application based on the proposed model. Lay users can use it for self-diagnosis through a web-based questionnaire, supporting early detection and intervention for MetS.
Affiliation(s)
- Jianhua Zheng
- College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Guangzhou, 510630, China
- Zihao Zhang
- College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Jinhe Wang
- Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, 100091, China
- Ruolin Zhao
- College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Shuangyin Liu
- College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Guangzhou, 510630, China
- Gaolin Yang
- College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Zhengjie Liu
- Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, 510120, China
- The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, 510120, China
- Zhengyuan Deng
- College of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510225, China
- Network and Educational Technology Center, Jinan University, Guangzhou, 510630, China
83
Wang L, Duan SB, Yan P, Luo XQ, Zhang NY. Utilization of interpretable machine learning model to forecast the risk of major adverse kidney events in elderly patients in critical care. Ren Fail 2023; 45:2215329. [PMID: 37218683 DOI: 10.1080/0886022x.2023.2215329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Major adverse kidney events within 30 days (MAKE30) indicate poor outcomes for elderly patients in the intensive care unit (ICU). This study aimed to predict the occurrence of MAKE30 in elderly ICU patients using machine learning. The study cohort comprised 2366 elderly ICU patients admitted to the Second Xiangya Hospital of Central South University between January 2020 and December 2021. Variables including demographic information, laboratory values, physiological parameters, and medical interventions were used to construct an extreme gradient boosting (XGBoost)-based prediction model. Of the 2366 patients, 1656 were used for model derivation and 710 for testing. The incidence of MAKE30 was 13.8% in the derivation cohort and 13.2% in the test cohort. The average area under the receiver operating characteristic curve of the XGBoost model was 0.930 (95% CI: 0.912-0.946) in the training set and 0.851 (95% CI: 0.810-0.890) in the test set. The Shapley additive explanations method tentatively identified the top 8 predictors of MAKE30:
- Acute Physiology and Chronic Health Evaluation II score;
- serum creatinine;
- blood urea nitrogen;
- Simplified Acute Physiology Score II;
- Sequential Organ Failure Assessment score;
- aspartate aminotransferase;
- arterial blood bicarbonate;
- albumin.
The XGBoost model accurately predicted the occurrence of MAKE30 in elderly ICU patients, and these findings can help clinicians make informed clinical decisions.
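The abstract reports 95% confidence intervals for the AUC without saying how they were obtained. One common approach is a percentile bootstrap around the rank-based AUC, sketched here as an assumption rather than the authors' procedure. The function names and defaults (`n_boot=1000`, `seed=0`) are illustrative.

```python
import random


def auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (illustrative)."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            estimates.append(auc(ys, [scores[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The pairwise loop is quadratic in the number of cases. For cohorts the size of this one, a sort-based rank formula would be faster, but the result is the same.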
Affiliation(s)
- Lin Wang
- Department of Nephrology, Hunan Key Laboratory of Kidney Disease and Blood Purification, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China
- Shao-Bin Duan
- Department of Nephrology, Hunan Key Laboratory of Kidney Disease and Blood Purification, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China
- Ping Yan
- Department of Nephrology, Hunan Key Laboratory of Kidney Disease and Blood Purification, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China
- Xiao-Qin Luo
- Department of Nephrology, Hunan Key Laboratory of Kidney Disease and Blood Purification, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China
- Ning-Ya Zhang
- Information Center, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China
84
Hammoudi Halat D, Abdel-Salam ASG, Bensaid A, Soltani A, Alsarraj L, Dalli R, Malki A. Use of machine learning to assess factors affecting progression, retention, and graduation in first-year health professions students in Qatar: a longitudinal study. BMC Med Educ 2023; 23:909. [PMID: 38036997 PMCID: PMC10691082 DOI: 10.1186/s12909-023-04887-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 11/20/2023] [Indexed: 12/02/2023]
Abstract
BACKGROUND Across higher education, student retention, progression, and graduation are considered essential elements of students' academic success. However, few studies have analyzed these attributes in health professions education. The current study aims to explore rates of student retention, progression, and graduation across five colleges of the Health Cluster at Qatar University and to identify predictive factors. METHODS Descriptive statistics were applied to secondary longitudinal data for students enrolled at the Health Cluster between 2015 and 2021 to obtain retention, progression and graduation rates. After data preparation and feature engineering, a predictive model using XGBoost was built to determine how important student demographic and academic variables were for predicting retention, progression, or graduation. In this model, weak decision tree models were combined to capture the relationships between the initial predictors and student outcomes. A feature importance score was estimated for each predictor; higher scores indicated greater influence on student retention, progression, or graduation. RESULTS Overall, 88% of the studied cohorts were female Qatari students. Retention and progression rates varied across the study period, and most students graduated from health colleges within 4-7 years. First-year academic performance ranked first and high school GPA second in importance for predicting retention, progression, and graduation of health majors. The health college ranked third in importance for retention and graduation and fifth for progression. The remaining factors had lower predictive importance; these included nationality, gender, and enrollment in a common first-year experience for all colleges. 
CONCLUSIONS Student retention, progression, and graduation at the Qatar University Health Cluster are complex and multifactorial. First-year performance and secondary education before college are important predictors of progress in health majors after the first year of university study. Efforts to increase retention, progression, and graduation rates should include academic advising, student support, engagement and communication. Machine learning-based predictive algorithms remain a useful tool for identifying key variables affecting health professions students' performance.
Affiliation(s)
- Abdel-Salam G Abdel-Salam
- Department of Mathematics, Statistics, and Physics, College of Arts and Sciences, Qatar University, Doha, Qatar
- Student Data Management Department, Student Experience Department, Student Affairs, Qatar University, Doha, Qatar
- Ahmed Bensaid
- Student Data Management Department, Student Experience Department, Student Affairs, Qatar University, Doha, Qatar
- Lama Alsarraj
- Academic Quality Department, QU Health, Qatar University, Doha, Qatar
- Roua Dalli
- Academic Quality Department, QU Health, Qatar University, Doha, Qatar
- Ahmed Malki
- Academic Quality Department, QU Health, Qatar University, Doha, Qatar
85
Nedadur R, Bhatt N, Chung J, Chu MWA, Ouzounian M, Wang B. Machine learning and decision making in aortic arch repair. J Thorac Cardiovasc Surg 2023:S0022-5223(23)01108-X. [PMID: 38016622 DOI: 10.1016/j.jtcvs.2023.11.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 11/16/2023] [Accepted: 11/19/2023] [Indexed: 11/30/2023]
Abstract
BACKGROUND Decisions during aortic arch surgery about cannulation strategy and nadir temperature are important for reducing risk, and the best individualized strategy needs to be determined in a data-driven fashion. Using machine learning (ML), we modeled the risk of death or stroke in elective aortic arch surgery based on patient characteristics and intraoperative decisions. METHODS The study cohort comprised 1323 patients from 9 institutions who underwent an elective aortic arch procedure between 2002 and 2021. A total of 69 variables were used to develop a logistic regression model and an XGBoost ML model, both trained for binary classification of mortality and stroke. Shapley additive explanations (SHAP) values were studied to determine the importance of intraoperative decisions. RESULTS During the study period, 3.9% of patients died and 5.4% experienced stroke. XGBoost (area under the curve [AUC], 0.77 for death, 0.87 for stroke) demonstrated better discrimination than logistic regression (AUC, 0.65 for death, 0.75 for stroke). In the SHAP analysis, intraoperative decisions were 3 of the top 20 predictors of death and 6 of the top 20 predictors of stroke. Predictor weights are patient-specific and reflect the patient's preoperative characteristics and other intraoperative decisions. Patient-level simulation also shows that the contribution of each decision varies with the other choices that are made. CONCLUSIONS Compared with traditional prediction models, ML more accurately identifies patients at risk of death and stroke, as well as the strategy that better reduces the risk of adverse events. Operative decisions may be tailored to a patient's specific characteristics, maximizing personalized benefit.
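SHAP values are Shapley values from cooperative game theory. Each feature's contribution is its marginal effect on the prediction, averaged over all orders in which features can be added. The brute-force sketch below computes these values exactly for a toy model by enumerating coalitions. The baseline-substitution value function and the toy model are assumptions for illustration. This is not how the SHAP library computes values for XGBoost trees.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value (a simple
    interventional value function). Cost grows as 2^n, so this only suits a
    handful of features; tree-specific algorithms are far faster.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # probability that exactly this subset precedes feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

Take the toy model f(z) = 2*z0 + z1*z2 with x = [1, 2, 3] and an all-zero baseline. The linear term gives feature 0 a value of 2, and the interaction's contribution of 6 is split evenly, 3 each for features 1 and 2. The values sum to f(x) - f(baseline) = 8. This additivity is what lets SHAP attribute a patient-specific prediction to individual decisions.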
Affiliation(s)
- Rashmi Nedadur
- Peter Munk Cardiac Center, Toronto General Hospital, Toronto, Ontario, Canada
- Nitish Bhatt
- Peter Munk Cardiac Center, Toronto General Hospital, Toronto, Ontario, Canada
- Jennifer Chung
- Peter Munk Cardiac Center, Toronto General Hospital, Toronto, Ontario, Canada
- Michael W A Chu
- Department of Cardiac Surgery, London Health Sciences Center, London, Ontario, Canada
- Maral Ouzounian
- Peter Munk Cardiac Center, Toronto General Hospital, Toronto, Ontario, Canada
- Bo Wang
- Peter Munk Cardiac Center, Toronto General Hospital, Toronto, Ontario, Canada
86
Wu J, Zhang C, He F, Wang Y, Zeng L, Liu W, Zhao D, Mao J, Gao F. Factors Affecting Intention to Leave Among ICU Healthcare Professionals in China: Insights from a Cross-Sectional Survey and XGBoost Analysis. Risk Manag Healthc Policy 2023; 16:2543-2553. [PMID: 38024488 PMCID: PMC10676671 DOI: 10.2147/rmhp.s432847] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/23/2023] [Accepted: 11/02/2023] [Indexed: 12/01/2023]
Abstract
Background The intention to leave among intensive care unit (ICU) healthcare professionals in China has become a concerning issue. Understanding the factors that influence this intention, and acting on them, is therefore urgent for maintaining a stable healthcare workforce. Objective This study aims to investigate the current status of intention to leave among ICU healthcare professionals in China, explore the factors affecting this intention, and provide targeted recommendations to reduce it. Methods A cross-sectional survey was conducted among ICU healthcare professionals from 3-A hospitals in the 34 provinces of China. The survey covered 22 indicators, including demographic information (marital status, children, income), work-related factors (weekly working hours, night shift frequency, hospital environment), and psychological assessment (using the Symptom Checklist-90 (SCL-90)). Data from 3653 respondents were analyzed with the extreme gradient boosting (XGBoost) method to predict intention to leave. Results Of the surveyed ICU healthcare professionals, 62.09% (2268 individuals) expressed an intention to leave. The XGBoost model achieved a predictive accuracy of 75.38% and an Area Under the Curve (AUC) of 0.77. Conclusion Satisfaction with income was the strongest predictor of intention to leave among ICU healthcare professionals. Years of experience, night shift frequency, and pride in hospital work also played significant roles.
Affiliation(s)
- Jiangnan Wu
- Department of Artificial Intelligence, Tianjin University of Technology, Tianjin, People’s Republic of China
- Chao Zhang
- Sixth Department of Oncology, Hebei General Hospital, Shijiazhuang, People’s Republic of China
- Feng He
- The Second Hospital of Hebei Medical University, Shijiazhuang, People’s Republic of China
- Yuan Wang
- Department of Neurosurgery, Tangshan Gongren Hospital, Tangshan, People’s Republic of China
- Liangnan Zeng
- Department of Nursing, Chengdu Fifth People’s Hospital, The Fifth People’s Hospital Affiliated to Chengdu University of Traditional Chinese Medicine, Chengdu, People’s Republic of China
- Wei Liu
- Hebei Psychological Counselor Association, Shijiazhuang, People’s Republic of China
- Di Zhao
- Department of Neurosurgery, The Fourth Hospital of Hebei Medical University, Shijiazhuang, People’s Republic of China
- Jingkun Mao
- Department of Artificial Intelligence, Tianjin University of Technology, Tianjin, People’s Republic of China
- Fei Gao
- Hebei General Hospital, Shijiazhuang, People’s Republic of China
87
Sun Y, Zhao Z, Tong H, Sun B, Liu Y, Ren N, You S. Machine Learning Models for Inverse Design of the Electrochemical Oxidation Process for Water Purification. Environ Sci Technol 2023; 57:17990-18000. [PMID: 37189261 DOI: 10.1021/acs.est.2c08771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
In this study, a machine learning (ML) framework is developed for the target-oriented inverse design of the electrochemical oxidation (EO) process for water purification. Trained on a data set of pollutant characteristics and reaction conditions, the XGBoost model performed best at predicting the reaction rate (k), with an external R2 (R2ext) of 0.84 and an external RMSE (RMSEext) of 0.79. Using 315 data points collected from the literature, the current density, pollutant concentration, and gap energy (Egap) were identified as the most impactful parameters for the inverse design of the EO process. In particular, adding reaction conditions as model input features provided more information and increased the sample size of the data set, improving model accuracy. Feature importance analysis with Shapley additive explanations (SHAP) was performed to reveal data patterns and interpret features. The ML-based inverse design of the EO process was generalized to a random case to tailor optimum conditions, with phenol and 2,4-dichlorophenol (2,4-DCP) as model pollutants. Experimental verification showed that the predicted k values were close to the experimental k values, with relative errors below 5%. This study shifts EO research and development from the conventional trial-and-error mode to a data-driven one. The resulting target-oriented strategy saves time and labor and is environmentally friendly. It makes electrochemical water purification more efficient, economical, and sustainable in the context of global carbon peaking and carbon neutrality.
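In inverse design, a trained forward model (here, XGBoost predicting k) is searched for the conditions that produce a desired target. The sketch below is a minimal exhaustive grid search. It assumes any callable forward model and a discrete grid of candidate conditions. The function names, condition names and the linear stand-in surrogate in the usage note are hypothetical. The abstract does not describe the paper's own search strategy.

```python
from itertools import product


def inverse_design(predict_k, grid, target_k):
    """Return the grid point whose predicted rate constant is closest to target_k.

    predict_k: any trained forward model mapping a dict of conditions to k
               (hypothetical interface; e.g. a wrapper around an XGBoost regressor).
    grid: dict mapping each condition name to a list of candidate values.
    """
    names = list(grid)
    best, best_gap = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        conditions = dict(zip(names, values))
        gap = abs(predict_k(conditions) - target_k)
        if gap < best_gap:  # keep the first candidate on ties
            best, best_gap = conditions, gap
    return best


def relative_error(predicted, measured):
    """Relative error used to check a prediction against an experiment."""
    return abs(predicted - measured) / abs(measured)
```

As a usage example, take the stand-in surrogate `lambda c: 0.002 * c["current_density"] - 0.0001 * c["pollutant_conc"]`. Search current densities [10, 20, 30] and concentrations [50, 100] for a target k of 0.05. The search returns current_density 30 and pollutant_conc 100. A verification run would count as successful when `relative_error` is below 0.05.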
Affiliation(s)
- Ye Sun
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Zhiyuan Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Hailong Tong
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin 150069, P. R. China
- Baiming Sun
- State Key Laboratory of Veterinary Biotechnology, Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Harbin 150069, P. R. China
- Yanbiao Liu
- College of Environmental Science and Engineering, Textile Pollution Controlling Engineering Center of the Ministry of Ecology and Environment, Donghua University, Shanghai 201620, China
- Nanqi Ren
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
- Shijie You
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, P. R. China
88
Liu J, Cao B, Luo Y, Chen X, Han H, Li L, Zeng J. Risk factors of major bleeding detected by machine learning method in patients undergoing liver resection with controlled low central venous pressure technique. Postgrad Med J 2023; 99:1280-1286. [PMID: 37794600 DOI: 10.1093/postmj/qgad087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 08/18/2023] [Accepted: 09/01/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND The controlled low central venous pressure (CLCVP) technique has been extensively validated in clinical practice to decrease intraoperative bleeding during liver resection. However, no study to date has proposed a scoring method to identify which risk factors might still cause bleeding when the CLCVP technique is used. METHODS We aimed to use machine learning to develop a model for detecting the risk factors of major bleeding in patients who underwent liver resection using the CLCVP technique. We reviewed the medical records of 1077 patients who underwent liver surgery between January 2017 and June 2020. We evaluated an XGBoost model and a logistic regression model using stratified K-fold cross-validation (K = 5). For each model, we calculated and compared the area under the receiver operating characteristic curve, recall, precision, and accuracy. SHapley Additive exPlanations (SHAP) was used to identify the most influential factors and their contributions to the prediction. RESULTS The XGBoost classifier (accuracy 0.80, precision 0.89) outperformed the logistic regression model (accuracy 0.76, precision 0.79). The SHAP summary plot ranked the top six variables, from most to least important, as follows: intraoperative hematocrit, surgery duration, intraoperative lactate, preoperative hemoglobin, preoperative aspartate transaminase, and Pringle maneuver duration. CONCLUSIONS Anesthesiologists should be aware of the potential impact of increased Pringle maneuver duration and lactate levels on intraoperative major bleeding in patients undergoing liver resection with the CLCVP technique. What is already known on this topic: The low central venous pressure technique has already been extensively validated in clinical practice, but no prediction model for major bleeding exists. 
What this study adds: The XGBoost classifier outperformed the logistic regression model in predicting major bleeding during liver resection with the low central venous pressure technique. How this study might affect research, practice, or policy: Anesthesiologists should be aware of the potential impact of increased Pringle maneuver (PM) duration and lactate levels on intraoperative major bleeding in patients undergoing liver resection with the CLCVP technique.
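Stratified K-fold cross-validation keeps the proportion of major-bleeding cases roughly equal in every fold, which matters when the positive class is the minority. In practice one would typically use scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)`. The standard-library sketch below only illustrates the idea, and `stratified_kfold` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class ratio.

    Indices of each class are shuffled and then dealt round-robin into k folds,
    so every fold receives roughly 1/k of each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[(pos + offset) % k].append(i)
        offset += len(idx)  # continue dealing where the previous class stopped
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

For example, with 80 negatives and 20 positives and K = 5, each test fold receives 16 negatives and 4 positives. Each model is then fit on the training indices and scored on the held-out fold, and the five scores are averaged.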
Affiliation(s)
- Jing Liu
- Department of Anesthesiology, the Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
- Bingbing Cao
- Department of Anesthesiology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510000, China
- Yuelian Luo
- Department of Anesthesiology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510000, China
- Xianqing Chen
- Department of Hepatobiliary and Pancreatic Surgery, the Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
- Hong Han
- Department of Anesthesiology, the Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
- Li Li
- Department of Anesthesiology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510000, China
- Jianfeng Zeng
- Department of Anesthesiology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou 510000, China
89
Kalita K, Ganesh N, Jayalakshmi S, Chohan JS, Mallik S, Qin H. Multi-Objective artificial bee colony optimized hybrid deep belief network and XGBoost algorithm for heart disease prediction. Front Digit Health 2023; 5:1279644. [PMID: 38034907 PMCID: PMC10687430 DOI: 10.3389/fdgth.2023.1279644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 10/27/2023] [Indexed: 12/02/2023]
Abstract
The global rise in heart disease calls for precise prediction tools to assess individual risk levels. This paper introduces a novel Multi-Objective Artificial Bee Colony Optimized Hybrid Deep Belief Network and XGBoost (HDBN-XG) algorithm that improves the accuracy of coronary heart disease prediction. Key physiological data, including electrocardiogram (ECG) readings and blood volume measurements, are analyzed. The HDBN-XG algorithm proceeds in four steps:
1. it assesses data quality;
2. it normalizes the data using z-scores;
3. it extracts features via the Computational Rough Set method;
4. it constructs feature subsets using the Multi-Objective Artificial Bee Colony approach.
The HDBN-XG algorithm achieves an accuracy of 99%, precision of 95%, specificity of 98%, sensitivity of 97%, and F1-measure of 96%, outperforming existing classifiers. This paper contributes to predictive analytics by offering a data-driven approach to healthcare, providing insights to mitigate the global impact of coronary heart disease.
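Z-score normalization rescales each feature to zero mean and unit standard deviation. This makes features measured on different scales (for example, ECG amplitudes versus blood volumes) comparable. The sketch below is minimal and assumes a list of numeric rows and the population standard deviation. Constant columns are mapped to 0 to avoid division by zero, and `zscore_columns` is a hypothetical name.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and (population) standard deviation 1.

    Columns with zero spread are mapped to 0.0 instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]
```

In a real pipeline, the means and standard deviations should be estimated on the training set only and reused for the test set. Computing them on the full data leaks test information into training.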
Affiliation(s)
- Kanak Kalita
- Department of Mechanical Engineering, Vel Tech Rangarajan Dr. Sagunthala R & D Institute of Science and Technology, Chennai, India
- Narayanan Ganesh
- School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India
- Sambandam Jayalakshmi
- Department of Master of Computer Applications, MEASI Institute of Information Technology, Chennai, India
- Jasgurpreet Singh Chohan
- Department of Mechanical Engineering and University Centre for Research & Development, Chandigarh University, Mohali, India
- Saurav Mallik
- Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, United States
- Hong Qin
- Department of Computer Science and Engineering, University of Tennessee at Chattanooga, Chattanooga, TN, United States
90
Sun S, Wang L, Lin J, Sun Y, Ma C. An effective prediction model based on XGBoost for the 12-month recurrence of AF patients after RFA. BMC Cardiovasc Disord 2023; 23:561. [PMID: 37974062 PMCID: PMC10655386 DOI: 10.1186/s12872-023-03599-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023]
Abstract
BACKGROUND Atrial fibrillation (AF) is a common heart rhythm disorder that can lead to complications such as stroke and heart failure. Radiofrequency ablation (RFA) is a procedure used to treat AF, but it is not always successful in maintaining a normal heart rhythm. This study aimed to construct a clinical prediction model based on extreme gradient boosting (XGBoost) for AF recurrence 12 months after ablation. METHODS The 27-dimensional data of 359 patients with AF undergoing RFA in the First Affiliated Hospital of Soochow University from October 2018 to November 2021 were retrospectively analysed. We adopted the logistic regression, support vector machine (SVM), random forest (RF) and XGBoost methods to conduct the experiment. To evaluate the performance of the prediction, we used the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AP), and calibration curves of both the training and testing sets. Finally, Shapley additive explanations (SHAP) were utilized to explain the significance of the variables. RESULTS Of the 27-dimensional variables, ejection fraction (EF) of the left atrial appendage (LAA), N-terminal probrain natriuretic peptide (NT-proBNP), global peak longitudinal strain of the LAA (LAAGPLS), left atrial diameter (LAD), diabetes mellitus (DM) history, and female sex had a significant role in the predictive model. The experimental results demonstrated that XGBoost exhibited the best performance among these methods, and the accuracy, specificity, sensitivity, precision and F1 score (a measure of test accuracy) of XGBoost were 86.1%, 89.7%, 71.4%, 62.5% and 0.67, respectively. In addition, SHAP analysis also proved that the 6 parameters were decisive for the effect of the XGBoost-based prediction model. CONCLUSIONS We proposed an effective model based on XGBoost that can be used to predict the recurrence of AF patients after RFA. 
This prediction result can guide treatment decisions and help to optimize the management of AF.
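The metrics reported above all derive from a binary confusion matrix, and the F1 score is the harmonic mean of precision and sensitivity (recall). The sketch below computes them from counts. The counts in the usage line are hypothetical: they were chosen only to be consistent with the reported percentages, and the abstract does not give the study's actual confusion matrix. Plugging in the reported precision (62.5%) and sensitivity (71.4%) reproduces the stated F1 of about 0.67.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts consistent with the reported percentages
print(binary_metrics(tp=10, fp=6, tn=52, fn=4))
# Reported precision and sensitivity from the abstract
print(round(f1_from(0.625, 0.714), 2))  # 0.67
```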
Affiliation(s)
- ShiKun Sun
  - The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Li Wang
  - The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Jia Lin
  - The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- YouFen Sun
  - The Shengcheng Street Health Center, Shouguang, 262700, China
- ChangSheng Ma
  - The First Affiliated Hospital of Soochow University, Suzhou, 215006, China

91
Li W, Yu S, Yang R, Tian Y, Zhu T, Liu H, Jiao D, Zhang F, Liu X, Tao L, Gao Y, Li Q, Zhang J, Guo X. Machine Learning Model of ResNet50-Ensemble Voting for Malignant-Benign Small Pulmonary Nodule Classification on Computed Tomography Images. Cancers (Basel) 2023; 15:5417. [PMID: 38001677 PMCID: PMC10670717 DOI: 10.3390/cancers15225417] [Received: 08/29/2023] [Revised: 09/21/2023] [Accepted: 09/26/2023] [Indexed: 11/26/2023]
Abstract
BACKGROUND The early detection of benign and malignant lung tumors enables lesions to be diagnosed and appropriate health measures to be implemented earlier, dramatically improving lung cancer patients' quality of life. Machine learning methods have performed admirably in recognizing small benign and malignant lung nodules. However, further exploration is required to fully leverage the potential of machine learning in distinguishing between benign and malignant small lung nodules. OBJECTIVE The aim of this study was to develop and evaluate the ResNet50-Ensemble Voting model for determining the benign or malignant nature of small pulmonary nodules (<20 mm) based on CT images. METHODS In this study, 834 CT images from 396 patients with small pulmonary nodules were gathered and randomly assigned to the training and validation sets in an 8:2 ratio. ResNet50 and VGG16 were used to extract CT image features, followed by XGBoost, SVM, and Ensemble Voting for classification, for a total of ten different machine learning combinatorial classifiers. Accuracy, sensitivity, and specificity were used to assess the models. The extracted features were also displayed to investigate the contrasts between them. RESULTS The proposed ResNet50-Ensemble Voting algorithm performed best on the test set, with an accuracy of 0.943 (0.938, 0.948) and a sensitivity and specificity of 0.964 and 0.911, respectively. VGG16-Ensemble Voting had an accuracy of 0.887 (0.880, 0.894), with a sensitivity and specificity of 0.952 and 0.784, respectively. CONCLUSION The integrated ResNet50-Ensemble Voting model performed exceptionally well in identifying benign and malignant small pulmonary nodules (<20 mm) from various sites, which might help doctors accurately diagnose the nature of early-stage lung nodules in clinical practice.
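Ensemble voting combines the predictions of several base classifiers. The abstract does not say whether hard (majority) or soft (probability-averaging) voting was used, so the sketch below assumes soft voting. `soft_vote` is a hypothetical helper, and the per-model probabilities are placeholders standing in for, for example, XGBoost and SVM outputs on extracted image features.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Average per-model malignancy probabilities and threshold the result.

    prob_lists: one list of P(malignant) per model, aligned by sample.
    weights: optional per-model weights (equal weighting by default).
    Returns (averaged probabilities, labels with 1 = malignant, 0 = benign).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels

# Hypothetical outputs of two classifiers for three nodules
xgb_probs = [0.9, 0.4, 0.2]
svm_probs = [0.7, 0.7, 0.1]
avg, pred = soft_vote([xgb_probs, svm_probs])
print(pred)  # [1, 1, 0]
```

Soft voting lets a confident model outweigh an uncertain one. Hard voting would instead count each model's thresholded label equally.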
Affiliation(s)
- Weiming Li
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Siqi Yu
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Runhuang Yang
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Yixing Tian
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Tianyu Zhu
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Haotian Liu
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Danyang Jiao
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Feng Zhang
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Xiangtong Liu
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Lixin Tao
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
- Yan Gao
  - Department of Nuclear Medicine, Xuanwu Hospital Capital Medical University, Beijing 100053, China
- Qiang Li
  - Beijing Physical Examination Center, Beijing 100050, China
- Jingbo Zhang
  - Beijing Physical Examination Center, Beijing 100050, China
- Xiuhua Guo
  - Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China
  - Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China

92
Tore U, Abilgazym A, Asunsolo-del-Barco A, Terzic M, Yemenkhan Y, Zollanvari A, Sarria-Santamera A. Diagnosis of Endometriosis Based on Comorbidities: A Machine Learning Approach. Biomedicines 2023; 11:3015. [PMID: 38002015 PMCID: PMC10669733 DOI: 10.3390/biomedicines11113015] [Received: 09/21/2023] [Revised: 10/27/2023] [Accepted: 10/31/2023] [Indexed: 11/26/2023]
Abstract
Endometriosis is defined as the presence of estrogen-dependent endometrial-like tissue outside the uterine cavity. Despite extensive research, endometriosis is still an enigmatic disease and is challenging to diagnose and treat. A common clinical finding is the association of endometriosis with multiple diseases. We use a total of 627,566 clinically collected data from cases of endometriosis (0.82%) and controls (99.18%) to construct and evaluate predictive models. We develop a machine learning platform to construct diagnostic tools for endometriosis. The platform consists of logistic regression, decision tree, random forest, AdaBoost, and XGBoost for prediction, and uses Shapley Additive Explanation (SHAP) values to quantify the importance of features. In the model selection phase, the constructed XGBoost model performs better than other algorithms while achieving an area under the curve (AUC) of 0.725 on the test set during the evaluation phase, resulting in a specificity of 62.9% and a sensitivity of 68.6%. The model leads to a quite low positive predictive value of 1.5%, but a quite satisfactory negative predictive value of 99.58%. Moreover, the feature importance analysis points to age, infertility, uterine fibroids, anxiety, and allergic rhinitis as the top five most important features for predicting endometriosis. Although these results show the feasibility of using machine learning to improve the diagnosis of endometriosis, more research is required to improve the performance of predictive models for the diagnosis of endometriosis. This state of affairs is in part attributed to the complex nature of the condition and, at the same time, the administrative nature of our features. Should more informative features be used, we could possibly achieve a higher AUC for predicting endometriosis. As a result, we merely perceive the constructed predictive model as a tool to provide auxiliary information in clinical practice.
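The contrast between the very low positive predictive value and the very high negative predictive value follows from the low prevalence of endometriosis in the data (0.82%). The worked check below assumes PPV and NPV can be computed from the reported sensitivity, specificity and prevalence via Bayes' rule; the paper itself may have used raw counts. Under that assumption it reproduces both figures to within rounding.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from test characteristics and prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence               # true positives (fraction of population)
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    tn = specificity * (1 - prevalence)         # true negatives
    fn = (1 - sensitivity) * prevalence         # false negatives
    return tp / (tp + fp), tn / (tn + fn)

ppv, npv = predictive_values(sensitivity=0.686, specificity=0.629, prevalence=0.0082)
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}")
```

False positives drawn from the 99.18% of controls dominate the PPV denominator (about 0.368 of the population versus about 0.006 true positives). Gains in specificity therefore raise the PPV far more than equal gains in sensitivity.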
Affiliation(s)
- Ulan Tore
  - School of Engineering and Digital Sciences, Nazarbayev University, Astana 010000, Kazakhstan
- Aibek Abilgazym
  - School of Engineering and Digital Sciences, Nazarbayev University, Astana 010000, Kazakhstan
- Angel Asunsolo-del-Barco
  - Department of Surgery, Medical and Social Sciences, Faculty of Medicine, University of Alcalá, 288871 Alcalá de Henares, Spain
  - Department of Epidemiology and Biostatistics, Graduate School of Public Health and Health Policy, City University of New York (CUNY), New York, NY 10028, USA
  - Ramón y Cajal Institute of Healthcare Research (IRYCIS), 28034 Madrid, Spain
- Milan Terzic
  - Department of Surgery, School of Medicine, Nazarbayev University, Astana 010000, Kazakhstan
  - Clinical Academic Department of Women’s Health, CF “University Medical Center”, Astana 010000, Kazakhstan
  - Department of Obstetrics, Gynecology and Reproductive Sciences, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Yerden Yemenkhan
  - Department of Medicine, School of Medicine, Nazarbayev University, Astana 010000, Kazakhstan
- Amin Zollanvari
  - School of Engineering and Digital Sciences, Nazarbayev University, Astana 010000, Kazakhstan
- Antonio Sarria-Santamera
  - Department of Biomedical Sciences, School of Medicine, Nazarbayev University, Astana 010000, Kazakhstan

93
Saylam B, İncel ÖD. Quantifying Digital Biomarkers for Well-Being: Stress, Anxiety, Positive and Negative Affect via Wearable Devices and Their Time-Based Predictions. Sensors (Basel) 2023; 23:8987. [PMID: 37960685 PMCID: PMC10649682 DOI: 10.3390/s23218987] [Received: 10/05/2023] [Revised: 10/27/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023]
Abstract
Wearable devices have become ubiquitous, collecting rich temporal data that offers valuable insights into human activities, health monitoring, and behavior analysis. Leveraging these data, researchers have developed innovative approaches to classify and predict time-based patterns and events in human life. Time-based techniques allow the capture of intricate temporal dependencies, which is the nature of the data coming from wearable devices. This paper focuses on predicting well-being factors, such as stress, anxiety, and positive and negative affect, on the Tesserae dataset collected from office workers. We examine the performance of different methodologies, including deep-learning architectures, LSTM, ensemble techniques, Random Forest (RF), and XGBoost, and compare their performances for time-based and non-time-based versions. In time-based versions, we investigate the effect of previous records of well-being factors on the upcoming ones. The overall results show that time-based LSTM performs the best among conventional (non-time-based) RF, XGBoost, and LSTM. The performance even increases when we consider a more extended previous period, in this case, 3 past-days rather than 1 past-day to predict the next day. Furthermore, we explore the corresponding biomarkers for each well-being factor using feature ranking. The obtained rankings are compatible with the psychological literature. In this work, we validated them based on device measurements rather than subjective survey responses.
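In the time-based versions, previous days' records are used as inputs for predicting the next day. One common way to set this up is a sliding window that pairs the feature vectors of the previous `n_past` days with the next day's well-being label. This is an assumption about the setup, not the authors' exact pipeline. `make_windows` and the toy data are hypothetical.

```python
def make_windows(daily_features, daily_labels, n_past):
    """Build (input, target) pairs from a chronologically ordered daily series.

    Each input concatenates the feature vectors of `n_past` consecutive days;
    the target is the label of the day that immediately follows the window.
    """
    inputs, targets = [], []
    for t in range(n_past, len(daily_features)):
        window = []
        for day in daily_features[t - n_past:t]:
            window.extend(day)  # flatten the past days into one vector
        inputs.append(window)
        targets.append(daily_labels[t])
    return inputs, targets

# Hypothetical per-day features (e.g. sleep hours, step count) and stress scores
features = [[7.0, 5000], [6.5, 7000], [8.0, 4000], [5.5, 9000], [7.5, 6000]]
stress = [2, 3, 1, 4, 2]
X, y = make_windows(features, stress, n_past=3)
print(len(X), y)  # 2 [4, 2]
```

Flattening suits tabular models such as RF or XGBoost. An LSTM would instead take each window as a sequence of `n_past` per-day vectors. Moving from 1 to 3 past days changes only `n_past`.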
Affiliation(s)
- Berrenur Saylam
  - Computer Engineering Department, Boğaziçi University, 34342 İstanbul, Türkiye

94
Yuan Y, Han Y, Yap CW, Kochhar JS, Li H, Xiang X, Kang L. Prediction of drug permeation through microneedled skin by machine learning. Bioeng Transl Med 2023; 8:e10512. [PMID: 38023708 PMCID: PMC10658566 DOI: 10.1002/btm2.10512] [Received: 11/11/2022] [Revised: 02/22/2023] [Accepted: 03/08/2023] [Indexed: 04/07/2023]
Abstract
The stratum corneum is the outermost layer of the skin, preventing external substances from entering the human body. Microneedles (MNs) are sharp protrusions a few hundred microns in length that can penetrate the stratum corneum to facilitate drug permeation through the skin. In vitro drug permeation testing is commonly used to determine the amount of drug delivered through the skin, but such testing is costly and time-consuming. To address this issue, machine learning methods were employed to predict drug permeation through the skin, circumventing the need to conduct skin permeation experiments. By comparing the experimental data and simulated results, it was found that extreme gradient boosting (XGBoost) performed best among the four simulation methods. Drug loading, permeation time, and MN surface area were also found to be critical parameters in the models. In conclusion, machine learning is useful for predicting drug permeation profiles for MN-facilitated transdermal drug delivery.
Affiliation(s)
- Yunong Yuan
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, New South Wales 2006, Australia
- Yiting Han
  - Department of Clinical Pharmacy and Pharmacy Administration, School of Pharmacy, Fudan University, Shanghai 201203, China
  - Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, USA
- Chun Wei Yap
  - National Healthcare Group, 1 Fusionopolis Link, Singapore 138542, Singapore
- Hairui Li
  - MGI Tech, 21 Biopolis Road, Nucleos, Singapore 138567, Singapore
- Xiaoqiang Xiang
  - Department of Clinical Pharmacy and Pharmacy Administration, School of Pharmacy, Fudan University, Shanghai 201203, China
- Lifeng Kang
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, New South Wales 2006, Australia

95
Ma Y, Zhang J, Lu J, Chen S, Xing G, Feng R. Prediction and analysis of likelihood of freeway crash occurrence considering risky driving behavior. Accid Anal Prev 2023; 192:107244. [PMID: 37573710 DOI: 10.1016/j.aap.2023.107244] [Received: 04/04/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/15/2023]
Abstract
The prediction of the likelihood of vehicle crashes constitutes an indispensable component of freeway safety management. Due to data collection limitations, studies have used mainly traffic flow-related variables to develop freeway crash prediction models but rarely have considered the effect of risky driving behavior on the likelihood of crashes. This study employed navigation software to collect driving behavior data and integrated multi-source data that include vehicle speed, traffic volume, and congestion index values. The study also employed the 'synthesizing minority oversampling technique and edited nearest neighbor' (SMOTE + ENN) coupled method for data balance processing. Three freeway crash likelihood prediction models were built based on the binomial logit, eXtreme Gradient Boosting (XGBoost), and support vector machine algorithms, respectively. The Shapley additive explanation (SHAP) algorithm was utilized to explore the effect of each feature variable on the likelihood of crashes. The results show that the prediction accuracy of the XGBoost model is the best of the three compared models. Under the optimal control-to-case ratio (1:1), the prediction accuracy of the XGBoost model reached 0.96 in this study, and the recall rate, specificity, and area-under-the-curve values were 0.86, 0.96, and 0.907, respectively. Comparative test results demonstrate that ranking risky driving behavior into three levels of intensity can effectively enhance the predictive accuracy of the XGBoost model. Moreover, the XGBoost model with its ten-minute time step outperformed the XGBoost model with its five-minute time step in terms of prediction accuracy. The results of the SHAP-based analysis show that the likelihood of highway crashes is high when the traffic congestion level is high and the distribution of the vehicle speed in the upstream roadway section is significant. Also, both sharp acceleration and sharp deceleration lead to greater likelihood of crashes. 
This paper aims to provide an effective framework for predicting and interpreting the likelihood of freeway crashes, thereby providing guidance for crash prevention, driver training, and the development of traffic regulations.
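SMOTE balances the classes by synthesizing new minority-class (crash) samples along the line segments between a minority sample and one of its nearest minority neighbours. ENN then removes samples whose nearest neighbours mostly disagree with their label. The plain-Python sketch below shows only the SMOTE interpolation step; the ENN cleaning step is omitted. The data and the `smote` helper are hypothetical. In practice a maintained implementation such as imbalanced-learn's `SMOTEENN` would normally be used.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic point is x + u * (x_nn - x), where x is a randomly chosen
    minority sample, x_nn is one of its k nearest minority neighbours
    (Euclidean distance) and u is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # indices of the k nearest *other* minority samples
        nearest = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        x_nn = minority[rng.choice(nearest)]
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, x_nn)])
    return synthetic

# Hypothetical crash-class samples: [speed variance, congestion index]
crashes = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(crashes, n_synthetic=4, k=2)
print(len(new_points))  # 4
```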
Affiliation(s)
- Yongfeng Ma
  - Jiangsu Key Laboratory of Urban ITS, School of Transportation, Southeast University, Nanjing 211189, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China
- Junjie Zhang
  - Jiangsu Key Laboratory of Urban ITS, School of Transportation, Southeast University, Nanjing 211189, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China
- Jian Lu
  - Jiangsu Key Laboratory of Urban ITS, School of Transportation, Southeast University, Nanjing 211189, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China
- Shuyan Chen
  - Jiangsu Key Laboratory of Urban ITS, School of Transportation, Southeast University, Nanjing 211189, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China
- Guanyang Xing
  - Jiangsu Key Laboratory of Urban ITS, School of Transportation, Southeast University, Nanjing 211189, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China
- Ranqun Feng
  - Jiangsu Key Laboratory of Urban ITS, School of Transportation, Southeast University, Nanjing 211189, China; Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, Southeast University, Nanjing 211189, China

96
Al-Shboul KF. Unraveling the complex interplay between soil characteristics and radon surface exhalation rates through machine learning models and multivariate analysis. Environ Pollut 2023; 336:122440. [PMID: 37625775 DOI: 10.1016/j.envpol.2023.122440] [Received: 06/10/2023] [Revised: 07/28/2023] [Accepted: 08/22/2023] [Indexed: 08/27/2023]
Abstract
This research seeks to elucidate the intricate interplay between soil characteristics and radon surface exhalation rates. To achieve this aim, Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost) machine learning (ML) algorithms are employed, supported by Multivariate Analysis (MA). An analysis was performed on a collection of soil samples, examining radon surface exhalation rates and other pertinent properties such as moisture content, particle size distributions, and the concentrations of Ra-226, Th-232, and K-40. The analysis revealed several key factors influencing radon exhalation rates, namely Ra-226 concentration, moisture content, and larger soil particles. To visualize the intricate relationships between these variables, contour plots of experimental and ML-generated data were created. These visual representations demonstrated that elevated soil moisture levels decrease radon exhalation rates. In contrast, higher concentrations of Ra-226 and a greater proportion of large soil particles led to an increase in exhalation rates. This endeavor presents these complex relationships in an accessible manner, furthering our understanding of the factors in radon surface exhalation. MA techniques, including Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA), were initially employed to investigate the complex interactions of soil attributes on radon exhalation. HCA identified three distinct clusters but faced limitations in detecting strong negative impacts. PCA successfully captured these inverse effects, indicating that the first two principal components accounted for approximately 80% of the total variance, primarily attributed to Ra-226 concentration, moisture content, and the percentage of large soil particles. However, neither technique could quantify the effects of soil attributes on radon exhalation rates. 
LightGBM outperformed XGBoost, but both successfully quantified the impacts of the studied soil characteristics on radon exhalation. Sensitivity analysis confirmed the robustness and accuracy of both models. This study highlights that XGBoost and LightGBM algorithms can effectively quantify radon exhalation rates based on soil characteristics, providing valuable insights for environmental policies, land use planning, and radon mitigation strategies.
Affiliation(s)
- Khaled F Al-Shboul
  - Department of Nuclear Engineering, Jordan University of Science & Technology, P.O. Box 3030, Irbid, 22110, Jordan

97
Atehortúa A, Gkontra P, Camacho M, Diaz O, Bulgheroni M, Simonetti V, Chadeau-Hyam M, Felix JF, Sebert S, Lekadir K. Cardiometabolic risk estimation using exposome data and machine learning. Int J Med Inform 2023; 179:105209. [PMID: 37729839 DOI: 10.1016/j.ijmedinf.2023.105209] [Received: 04/13/2023] [Revised: 08/11/2023] [Accepted: 08/30/2023] [Indexed: 09/22/2023]
Abstract
BACKGROUND The human exposome encompasses all exposures that individuals encounter throughout their lifetime. It is now widely acknowledged that health outcomes are influenced not only by genetic factors but also by the interactions between these factors and various exposures. Consequently, the exposome has emerged as a significant contributor to the overall risk of developing major diseases, such as cardiovascular disease (CVD) and diabetes. Therefore, personalized early risk assessment based on exposome attributes might be a promising tool for identifying high-risk individuals and improving disease prevention. OBJECTIVE Develop and evaluate a novel and fair machine learning (ML) model for CVD and type 2 diabetes (T2D) risk prediction based on a set of readily available exposome factors. We evaluated our model using internal and external validation groups from a multi-center cohort. To be considered fair, the model was required to demonstrate consistent performance across different sub-groups of the cohort. METHODS From the UK Biobank, we identified 5,348 and 1,534 participants who within 13 years from the baseline visit were diagnosed with CVD and T2D, respectively. An equal number of participants who did not develop these pathologies were randomly selected as the control group. 109 readily available exposure variables from six different categories (physical measures, environmental, lifestyle, mental health events, sociodemographics, and early-life factors) from the participant's baseline visit were considered. We adopted the XGBoost ensemble model to predict individuals at risk of developing the diseases. The model's performance was compared to that of an integrative ML model which is based on a set of biological, clinical, physical, and sociodemographic variables, and, additionally for CVD, to the Framingham risk score. Moreover, we assessed the proposed model for potential bias related to sex, ethnicity, and age. 
Lastly, we interpreted the model's results using SHAP, a state-of-the-art explainability method. RESULTS The proposed ML model presents a comparable performance to the integrative ML model despite using solely exposome information, achieving a ROC-AUC of 0.78±0.01 and 0.77±0.01 for CVD and T2D, respectively. Additionally, for CVD risk prediction, the exposome-based model presents an improved performance over the traditional Framingham risk score. No bias in terms of key sensitive variables was identified. CONCLUSIONS We identified exposome factors that play an important role in identifying patients at risk of CVD and T2D, such as naps during the day, age completed full-time education, past tobacco smoking, frequency of tiredness/unenthusiasm, and current work status. Overall, this work demonstrates the potential of exposome-based machine learning as a fair CVD and T2D risk assessment tool.
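The study pairs each set of cases with an equal number of randomly selected participants who did not develop the disease. The sketch below shows that 1:1 control selection under two assumptions: simple random sampling without replacement, and no matching on covariates (the abstract does not mention matching). The participant IDs and `select_controls` are hypothetical.

```python
import random

def select_controls(case_ids, non_case_ids, ratio=1, seed=42):
    """Randomly draw `ratio` controls per case from participants without the disease."""
    n_controls = ratio * len(case_ids)
    if n_controls > len(non_case_ids):
        raise ValueError("not enough non-cases to sample from")
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    return rng.sample(non_case_ids, n_controls)  # sampling without replacement

cases = ["p01", "p07", "p12"]
pool = [f"p{i:02d}" for i in range(20) if f"p{i:02d}" not in cases]
controls = select_controls(cases, pool)
print(len(controls))  # 3
```

A model trained on a 1:1 sample sees 50% prevalence. Its predicted probabilities would therefore need recalibration before being read as absolute risks in the general population, although ranking metrics such as ROC-AUC are unaffected by this shift.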
Affiliation(s)
- Angélica Atehortúa
  - BCN-AIM laboratory, Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
- Polyxeni Gkontra
  - BCN-AIM laboratory, Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
- Marina Camacho
  - BCN-AIM laboratory, Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
- Oliver Diaz
  - BCN-AIM laboratory, Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
- Marc Chadeau-Hyam
  - Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, London, United Kingdom
- Janine F Felix
  - The Generation R Study Group, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands; Department of Pediatrics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
- Sylvain Sebert
  - Research Unit of Population Health, Faculty of Medicine, University of Oulu, Oulu, Finland
- Karim Lekadir
  - BCN-AIM laboratory, Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain

98
Yu Y, Li J, Li J, Zen X, Fu Q. Evidence from Machine Learning, Diagnostic Hub Genes in Sepsis and Diagnostic Models based on Xgboost Models, Novel Molecular Models for the Diagnosis of Sepsis. Curr Med Chem 2023:CMC-EPUB-135666. [PMID: 37921181 DOI: 10.2174/0109298673273009231017061448] [Received: 07/19/2023] [Revised: 09/15/2023] [Accepted: 09/26/2023] [Indexed: 11/04/2023]
Abstract
BACKGROUND Systemic multi-organ dysfunction resulting from dysregulated host immune responses triggered by microbial infection or other factors is a major cause of death in sepsis, and secretory pathways play an important role in it. METHODS GSE57065, GSE65682, GSE145227, and GSE54514 were obtained from the Gene Expression Omnibus (GEO) for this study. Single-sample gene set enrichment analysis (ssGSEA) scores of secretory pathways were calculated and compared between sepsis and normal samples. Gene modules associated with secretory pathways were selected by weighted gene coexpression network analysis (WGCNA) for Protein-Protein Interaction Network (PPI) assessment, and the crossover genes in both were evaluated by an eXtreme Gradient Boosting (XGBoost) model in feature selection to identify hub genes in sepsis. In addition, we explored the immune cells and signaling pathways regulated by the hub genes. RESULTS Remarkable dysregulation of secretory pathways was demonstrated in sepsis. The secretory pathway-associated gene modules were intimately involved in cytokine and immune responses in infection. Four crossover genes (CD163, FCER1G, C3AR1, ARG1) were present in both WGCNA and PPI, and training in the XGBoost model revealed the best diagnostic performance for these 4 genes, indicating that these genes were the hub genes for sepsis. The 4 hub genes showed a significant negative correlation with T cell activity and a significant positive correlation with inflammatory immune cells. In addition, we found that the 4 hub genes markedly positively regulated INFLAMMATORY RESPONSE and IL6 JAK STAT3 SIGNALING. CONCLUSION Based on WGCNA, PPI, and XGBoost models, we identified hub genes that play an important regulatory role in sepsis. We also developed novel molecular models for the diagnosis of sepsis.
Affiliation(s)
- YangZi Yu
- Department of Geriatrics, Tianjin Nankai Hospital, Tianjin, 300000, China
- Jing Li
- Department of Cardiology, Tianjin Nankai Hospital, Tianjin, 300000, China
- JiaRui Li
- Department of Geriatrics, Tianjin Nankai Hospital, Tianjin, 300000, China
- XianMing Zen
- Department of Geriatrics, Tianjin Nankai Hospital, Tianjin, 300000, China
- Qiang Fu
- Department of Critical Medicine, Tianjin Fourth Central Hospital, Tianjin, 300000, China
99
Ganie SM, Pramanik PKD, Bashir Malik M, Mallik S, Qin H. An ensemble learning approach for diabetes prediction using boosting techniques. Front Genet 2023; 14:1252159. [PMID: 37953921 PMCID: PMC10639159 DOI: 10.3389/fgene.2023.1252159] [Received: 07/03/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023]
Abstract
Introduction: Diabetes is one of the leading healthcare concerns, affecting millions of people worldwide. Taking appropriate action at the earliest stages of the disease depends on early prediction and identification of diabetes. To help healthcare providers improve the diagnosis and prognosis of diseases, machine learning has been explored in the healthcare industry in recent years. Methods: To predict diabetes, this study evaluated five boosting algorithms on the Pima diabetes dataset, obtained from the University of California, Irvine (UCI) machine learning repository, which contains several important clinical features. Exploratory data analysis was used to characterise the dataset, and upsampling, normalisation, feature selection, and hyperparameter tuning were applied for predictive analytics. Results: The results were analysed using various statistical and machine learning metrics and k-fold cross-validation. Gradient boosting achieved the highest accuracy among all the classifiers, 92.85%. Precision, recall, F1-score, and receiver operating characteristic (ROC) curves were used to further validate the model. Discussion: The proposed model outperformed existing studies in prediction accuracy, suggesting that it could be applied to other diseases with similar predictive indicators.
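The abstract names upsampling and gradient boosting as parts of the pipeline. The sketch below is a minimal, dependency-free illustration of both ideas under stated assumptions: random oversampling of the minority class (one simple form of upsampling; the abstract does not say which method was used) and binary gradient boosting on log-loss with decision stumps fitted to the pseudo-residuals y - p. It is not the authors' code, which would more likely rely on library implementations; all function names, the learning rate, the number of rounds, and the toy data are hypothetical, and the data are not the Pima dataset.

```python
import math
import random


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until classes 0 and 1 have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, residuals):
    """Least-squares stump: one feature, one threshold, a constant on each side."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    if best is None:  # no feature can split the rows: fall back to a constant
        m = sum(residuals) / len(residuals)
        return 0, math.inf, m, m
    return best[1:]


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the negative log-loss gradient y - p."""
    p0 = sum(y) / len(y)              # assumes both classes are present
    f0 = math.log(p0 / (1 - p0))      # initial prediction: log-odds of class 1
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        f, t, lv, rv = fit_stump(X, residuals)
        stumps.append((f, t, lv, rv))
        scores = [s + learning_rate * (lv if row[f] <= t else rv)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    f0, lr, stumps = model
    z = f0 + sum(lr * (lv if row[f] <= t else rv) for f, t, lv, rv in stumps)
    return sigmoid(z)


# Hypothetical, imbalanced toy data with two features.
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1], [7, 0]]
y = [0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = random_oversample(X, y)
model = fit_gradient_boosting(X_bal, y_bal)
preds = [int(predict_proba(model, row) >= 0.5) for row in X]
```

In a real evaluation, oversampling and normalisation would normally be fitted inside each training fold of the k-fold cross-validation rather than on the full dataset, so that duplicated rows cannot leak into the held-out fold.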
Affiliation(s)
- Majid Bashir Malik
- Department of Computer Science, Baba Ghulam Shah Badshah University, Rajauri, India
- Saurav Mallik
- Department of Environmental Health, School of Public Health, Harvard University, Boston, MA, United States
- Hong Qin
- College of Engineering and Computer Science, University of Tennessee at Chattanooga, Chattanooga, TN, United States
100
Zaki FR, Monroy GL, Shi J, Sudhir K, Boppart SA. Texture-based speciation of otitis media-related bacterial biofilms from optical coherence tomography images using supervised classification. Res Sq 2023:rs.3.rs-3466690. [PMID: 37961282 PMCID: PMC10635317 DOI: 10.21203/rs.3.rs-3466690/v1] [Indexed: 11/15/2023]
Abstract
Otitis media (OM) is primarily a bacterial middle-ear infection that is prevalent among children worldwide. In recurrent and/or chronic OM, antibiotic-resistant bacterial biofilms can develop in the middle ear. An OM-related biofilm typically contains one or more bacterial strains, the most common of which are Haemophilus influenzae, Streptococcus pneumoniae, Moraxella catarrhalis, Pseudomonas aeruginosa, and Staphylococcus aureus. Optical coherence tomography (OCT) has been used clinically to visualize bacterial biofilms in the middle ear. This study used OCT to compare microstructural image texture features of primary bacterial biofilms in vitro and in vivo. The proposed method applied supervised machine learning frameworks (SVM, random forest (RF), and XGBoost) to classify and speciate multiclass bacterial biofilms from texture features extracted from OCT B-scan images of in vitro cultures and from clinically obtained in vivo images of human subjects. Our findings show that optimized SVM-RBF and XGBoost classifiers can help distinguish bacterial biofilms by incorporating clinical knowledge into classification decisions. Furthermore, both classifiers achieved an area under the receiver operating characteristic curve (AUC) above 95% in detecting each biofilm class. These results demonstrate the potential of texture analysis of OCT images combined with a machine learning framework for differentiating OM-causing bacterial biofilms, which could provide additional clinically relevant data during real-time in vivo characterization of ear infections.
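The abstract does not list which texture features were extracted from the OCT B-scans. Gray-level co-occurrence matrix (GLCM) statistics are one common family of texture descriptors, so the following is a minimal pure-Python sketch of that idea, offered as an assumption rather than as the authors' feature set. It quantizes an 8-bit image, builds a normalized, non-symmetric GLCM for a single pixel offset, and computes contrast, homogeneity (1 / (1 + (i - j)^2) weighting), and the angular second moment. The image is represented as a hypothetical list of integer rows; in practice a library such as scikit-image would compute these, typically over several offsets and angles, before the features are passed to a classifier such as SVM, RF, or XGBoost.

```python
def quantize(image, levels, max_value=255):
    """Map integer intensities in [0, max_value] onto gray levels 0..levels-1."""
    return [[v * levels // (max_value + 1) for v in row] for row in image]


def glcm(image, levels, dy=0, dx=1):
    """Normalized, non-symmetric gray-level co-occurrence matrix for offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the (reference gray level, neighbour gray level) pair.
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[n / total for n in row] for row in counts]


def texture_features(P):
    """Contrast, homogeneity, and angular second moment (ASM) of a normalized GLCM."""
    idx = range(len(P))
    contrast = sum(P[i][j] * (i - j) ** 2 for i in idx for j in idx)
    homogeneity = sum(P[i][j] / (1 + (i - j) ** 2) for i in idx for j in idx)
    asm = sum(P[i][j] ** 2 for i in idx for j in idx)
    return {"contrast": contrast, "homogeneity": homogeneity, "asm": asm}


# Hypothetical tiny images: a 2x2 checkerboard and a flat 4x4 patch.
checker = texture_features(glcm([[0, 1], [1, 0]], levels=2))
flat = texture_features(glcm([[3] * 4 for _ in range(4)], levels=4))
```

A flat patch has zero contrast and maximal homogeneity and ASM, while an alternating pattern has high contrast; differences of this kind between biofilm images are what a texture-based classifier would learn from.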
Affiliation(s)
- Farzana R Zaki
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Guillermo L Monroy
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Jindou Shi
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Kavya Sudhir
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Stephen A Boppart
- Beckman Institute for Advanced Science and Technology, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Carle Illinois College of Medicine, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- NIH/NIBIB P41 Center for Label-free Imaging and Multiscale Biophotonics (CLIMB), University of Illinois Urbana-Champaign, Urbana, Illinois, USA