Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 476 (from Reference Citation Analysis)
Article PDFs: 133
Cited by > 0: 245
Searched Name: XGBoost

Cao C, Zhang T, Xin T. The effect of reading engagement on scientific literacy - an analysis based on the XGBoost method. Front Psychol 2024;15:1329724. [PMID: 38420178 PMCID: PMC10899671 DOI: 10.3389/fpsyg.2024.1329724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 01/22/2024] [Indexed: 03/02/2024] Open

Abstract

Scientific literacy is a key factor in personal competitiveness, and reading is the most common activity in daily learning; harnessing the day-to-day influence of reading on individuals is therefore the most convenient way to raise the scientific literacy of the whole population. Reading engagement is one of the important student characteristics related to reading literacy; it is highly malleable and is jointly reflected by behavioral, cognitive, and affective engagement. Exploring the relationship between reading engagement and scientific literacy, with reading engagement as the entry point, is therefore of theoretical and practical significance. In this study, we used PISA 2018 data from China to explore this relationship in a sample of 15-year-old students in mainland China. Thirty-six variables related to reading engagement and background variables (gender, grade, and the socioeconomic and cultural status of the family) were selected from the questionnaire as independent variables, the score on the Scientific Literacy Assessment (SLA) was taken as the outcome variable, and a supervised machine learning method, the XGBoost algorithm, was used to construct the model. The dataset was randomly divided into a training set and a test set to optimize the model and to verify that it fits well and generalizes. Global and local interpretation was carried out using SHAP values, a recent method for interpreting machine learning models. Among the three major components of reading engagement, cognitive engagement was the most influential and was relatively dominant in the model: students with a high level of cognitive reading engagement were more likely to obtain high scores in the scientific literacy assessment. In addition, this study verifies the feasibility of applying a currently popular machine learning model, XGBoost, to a large-scale international education assessment program, with good model adaptability and support for global and local interpretation.
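The SHAP values mentioned in this abstract build on Shapley values from cooperative game theory. As a minimal sketch, and not the authors' pipeline, the following computes exact Shapley values for a single instance by enumerating every feature ordering. The scoring function `toy_score`, the three feature names, and the all-zero baseline are hypothetical and were chosen only for illustration; the SHAP library itself relies on faster, tree-specific algorithms and a background dataset rather than this brute-force enumeration.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, obtained by averaging each
    feature's marginal contribution over all feature orderings.
    Only feasible for a handful of features (cost grows as n!)."""
    names = list(x)
    phi = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)      # start with every feature at its baseline
        prev = predict(current)
        for name in order:
            current[name] = x[name]   # switch this feature to its real value
            new = predict(current)
            phi[name] += new - prev   # marginal contribution in this ordering
            prev = new
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in phi.items()}


# Hypothetical toy model: NOT fitted to PISA data, purely illustrative.
def toy_score(f):
    return (400
            + 30 * f["cognitive"]
            + 10 * f["behavioral"]
            + 5 * f["cognitive"] * f["affective"])


x = {"cognitive": 1.0, "behavioral": 1.0, "affective": 1.0}
baseline = {"cognitive": 0.0, "behavioral": 0.0, "affective": 0.0}
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

In this toy setup the interaction term is split evenly between the two features involved, and the attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes SHAP-style explanations usable for both local and global interpretation.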

Radhakrishnan BL, Ezra K, Jebadurai IJ, Selvakumar I, Karthikeyan P. An Autonomous Sleep-Stage Detection Technique in Disruptive Technology Environment. Sensors (Basel) 2024;24:1197. [PMID: 38400354 PMCID: PMC10892786 DOI: 10.3390/s24041197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024]

Navratil G, Giannopoulos I. Classifying Motorcyclist Behaviour with XGBoost Based on IMU Data. Sensors (Basel) 2024;24:1042. [PMID: 38339759 PMCID: PMC10857319 DOI: 10.3390/s24031042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 01/31/2024] [Accepted: 02/01/2024] [Indexed: 02/12/2024]

Zheng Z, Liang L, Luo X, Chen J, Lin M, Wang G, Xue C. Diagnosing and tracking depression based on eye movement in response to virtual reality. Front Psychiatry 2024;15:1280935. [PMID: 38374979 PMCID: PMC10875075 DOI: 10.3389/fpsyt.2024.1280935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 01/16/2024] [Indexed: 02/21/2024] Open

Abstract

Introduction

Depression is a prevalent mental illness that is primarily diagnosed using psychological and behavioral assessments. However, these assessments lack objective and quantitative indices, making rapid and objective detection challenging. In this study, we propose a novel method for depression detection based on eye movement data captured in response to virtual reality (VR).

Methods

Eye movement data was collected and used to establish high-performance classification and prediction models. Four machine learning algorithms, namely eXtreme Gradient Boosting (XGBoost), multilayer perceptron (MLP), Support Vector Machine (SVM), and Random Forest, were employed. The models were evaluated using five-fold cross-validation, and performance metrics including accuracy, precision, recall, area under the curve (AUC), and F1-score were assessed. The predicted error for the Patient Health Questionnaire-9 (PHQ-9) score was also determined.

Results

The XGBoost model achieved a mean accuracy of 76%, precision of 94%, recall of 73%, and AUC of 82%, with an F1-score of 78%. The MLP model achieved a classification accuracy of 86%, precision of 96%, recall of 91%, and AUC of 86%, with an F1-score of 92%. The predicted error for the PHQ-9 score ranged from -0.6 to 0.6.

To investigate the role of computerized cognitive behavioral therapy (CCBT) in treating depression, participants were divided into intervention and control groups. The intervention group received CCBT, while the control group received no treatment. After five CCBT sessions, significant changes were observed in the eye movement indices of fixation and saccade, as well as in the PHQ-9 scores. These two indices played significant roles in the predictive model, indicating their potential as biomarkers for detecting depression symptoms.

Discussion

The results suggest that eye movement indices obtained using a VR eye tracker can serve as useful biomarkers for detecting depression symptoms. Specifically, the fixation and saccade indices showed promise in predicting depression. Furthermore, CCBT demonstrated effectiveness in treating depression, as evidenced by the observed changes in eye movement indices and PHQ-9 scores. In conclusion, this study presents a novel approach for depression detection using eye movement data captured in VR. The findings highlight the potential of eye movement indices as biomarkers and underscore the effectiveness of CCBT in treating depression.
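The evaluation protocol described in the Methods (five-fold cross-validation scored by accuracy, precision, recall, and F1) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the study's code: the label vectors are made up, and the fold splitter uses contiguous, unshuffled folds, whereas a real evaluation would normally shuffle or stratify the samples first.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.
    No shuffling or stratification is done here (a simplification)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
folds = list(k_fold_indices(len(y_true), k=5))
print(metrics)
```

In a cross-validated study the metrics are computed on each held-out fold and then averaged, which is why a reported mean F1-score need not equal the F1 computed from the mean precision and mean recall.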

Joe H, Kim HG. Multi-label classification with XGBoost for metabolic pathway prediction. BMC Bioinformatics 2024;25:52. [PMID: 38297220 PMCID: PMC10832249 DOI: 10.1186/s12859-024-05666-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 01/22/2024] [Indexed: 02/02/2024] Open

Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. Environ Pollut 2024;342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]

Abstract

The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot effectively and rapidly evaluate these hazards, especially for chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models, enabling the prediction of chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model from showing its advantages on molecular graphs. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) was used to build reliable QSAR-ML models that predict chronic toxicity from small- or medium-sized tabular data and molecular descriptors. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, which covers most environmental scenarios. Model interpretation showed SlogP and exposure duration to be the primary influential factors. SlogP, representing the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes. Additionally, the exposure duration played a crucial role in determining the chronic toxicity. Finally, the chronic toxicity data of bisphenol A validated the robustness and reliability of the model established in this research. Our study provides a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate and increase the use of ML applications in environmental fields.
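The two regression metrics quoted in this abstract, the coefficient of determination and the root mean squared error, can be computed as in the short sketch below. The observed and predicted values are invented for illustration and have no connection to the toxicity dataset.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred))
```

R² is unitless and measures the share of variance explained, while RMSE keeps the units of the target, so the two are usually reported together, as they are here.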

Affiliation(s)

Lang Lei: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Liangmao Zhang: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Zhibang Han: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Qirui Chen: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Pengcheng Liao: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Dong Wu: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
Jun Tai: Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
Bing Xie: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
Yinglong Su: Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China

Tao Q, Wu L, An J, Liu Z, Zhang K, Zhou L, Zhang X. Proteomic analysis of human aqueous humor from fuchs uveitis syndrome. Exp Eye Res 2024;239:109752. [PMID: 38123010 DOI: 10.1016/j.exer.2023.109752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/25/2023] [Accepted: 12/11/2023] [Indexed: 12/23/2023]

Abstract

Fuchs uveitis syndrome (FUS) is a commonly misdiagnosed uveitis syndrome that often presents as an asymptomatic mild inflammatory condition until complications arise. The diagnosis of this disease remains clinical because of the lack of specific laboratory tests. The aqueous humor (AH) is a complex fluid containing nutrients and metabolic wastes from the eye. Changes in the AH protein provide important information for diagnosing intraocular diseases. This study aimed to analyze the proteomic profile of AH in individuals diagnosed with FUS and to identify potential biomarkers of the disease. We used liquid chromatography-tandem mass spectrometry-based proteomic methods to evaluate the AH protein profiles of all 37 samples, comprising 15 patients with FUS, six patients with Posner-Schlossman syndrome (PSS), and 16 patients with age-related cataract. A total of 538 proteins were identified from a comprehensive spectral library of 634 proteins. Subsequent differential expression analysis, enrichment analysis, and construction of key sub-networks revealed that the inflammatory response, complement activation, and hypoxia might be crucial in mediating the process of FUS. Hypoxia-inducible factor-1 may serve as a key regulator and therapeutic target. Additionally, the innate and adaptive immune responses are considered dominant in patients with FUS. A diagnostic model was constructed using a machine-learning algorithm to classify FUS, PSS, and normal controls. Two proteins, complement C1q subcomponent subunit B and secretogranin-1, received the highest scores from Extreme Gradient Boosting, suggesting their potential utility as a biomarker panel. Furthermore, these two proteins were validated as biomarkers in a cohort of 18 patients using high-resolution multiple reaction monitoring assays. This study therefore contributes to advancing current knowledge of FUS pathogenesis and promotes the development of effective diagnostic strategies.

Alabi RO, Almangush A, Elmusrati M, Leivo I, Mäkitie AA. Interpretable machine learning model for prediction of overall survival in laryngeal cancer. Acta Otolaryngol 2024:1-7. [PMID: 38279817 DOI: 10.1080/00016489.2023.2301648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 12/21/2023] [Indexed: 01/29/2024]

Liu SH, Ting CE, Wang JJ, Chang CJ, Chen W, Sharma AK. Estimation of Gait Parameters for Adults with Surface Electromyogram Based on Machine Learning Models. Sensors (Basel) 2024;24:734. [PMID: 38339451 PMCID: PMC10857519 DOI: 10.3390/s24030734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 01/18/2024] [Accepted: 01/22/2024] [Indexed: 02/12/2024]

Wang H, Tao Q, Zhang X. Ensemble Learning Method for the Continuous Decoding of Hand Joint Angles. Sensors (Basel) 2024;24:660. [PMID: 38276352 DOI: 10.3390/s24020660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 01/27/2024]

Zhang Y, Ma Y, Wang J, Guan Q, Yu B. Construction and validation of a clinical prediction model for deep vein thrombosis in patients with digestive system tumors based on a machine learning. Am J Cancer Res 2024;14:155-168. [PMID: 38323284 PMCID: PMC10839316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/13/2023] [Indexed: 02/08/2024] Open

Abstract

This study developed a deep vein thrombosis (DVT) risk prediction model based on multiple machine learning methods for patients with digestive system tumors undergoing surgical treatment. Data from 1048 patients with digestive system tumors admitted to Shanxi Provincial People's Hospital (College of Shanxi Medical University) from January 2020 to January 2023 were retrospectively analyzed, and 845 cases were retained according to the inclusion and exclusion criteria. The patients were divided into a training group (586 patients) and a validation group (259 patients), and feature selection was then performed using six models: Lasso regression, XGBoost, Random Forest, Decision Tree, Support Vector Machine, and logistic regression. Predictive models were subsequently constructed as nomograms, and their predictive validity was assessed using receiver operating characteristic curves, precision-recall curves, and decision-curve analysis. In the model comparison, the XGBoost model showed the largest area under the curve (AUC) on the validation set (P < 0.05), demonstrating excellent predictive performance and generalization ability. We selected the characteristic factors common to the six models to further develop the nomogram for assessing DVT risk. The model performed well in clinical validation and effectively differentiated high-risk from low-risk patients. The differences in BMI, procedure time, and D-dimer between patients in the thrombus group and those in the non-thrombus group were statistically significant (P < 0.05). However, the AUC of the XGBoost model was greater than that of the nomogram model according to the DeLong test (P < 0.05). BMI, procedure time, and D-dimer are critical predictors of DVT risk in patients with digestive system tumors. Our model is an adequate assessment tool for DVT risk, which can help improve the prevention and treatment of DVT.
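The models in this abstract are compared by area under the ROC curve. As a minimal sketch with made-up labels and scores, the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The DeLong test used in the study to compare two AUCs is not implemented here.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = thrombus) and predicted risks, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))
```

The pairwise loop is quadratic in the number of cases, which is fine for a validation group of a few hundred patients; larger datasets would typically use a sort-based implementation instead.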

Nasimian A, Younus S, Tatli Ö, Hammarlund EU, Pienta KJ, Rönnstrand L, Kazi JU. AlphaML: A clear, legible, explainable, transparent, and elucidative binary classification platform for tabular data. Patterns (N Y) 2024;5:100897. [PMID: 38264719 PMCID: PMC10801203 DOI: 10.1016/j.patter.2023.100897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 09/07/2023] [Accepted: 11/21/2023] [Indexed: 01/25/2024]

Ogunpola A, Saeed F, Basurra S, Albarrak AM, Qasem SN. Machine Learning-Based Predictive Models for Detection of Cardiovascular Diseases. Diagnostics (Basel) 2024;14:144. [PMID: 38248021 PMCID: PMC10813849 DOI: 10.3390/diagnostics14020144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 12/21/2023] [Accepted: 12/25/2023] [Indexed: 01/23/2024] Open

Shen Y, Zhao X, Wang K, Sun Y, Zhang X, Wang C, Yang Z, Feng Z, Zhang X. Exploring White Matter Abnormalities in Young Children with Autism Spectrum Disorder: Integrating Multi-shell Diffusion Data and Machine Learning Analysis. Acad Radiol 2024:S1076-6332(23)00700-6. [PMID: 38185571 DOI: 10.1016/j.acra.2023.12.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 12/09/2023] [Accepted: 12/14/2023] [Indexed: 01/09/2024]

Abstract

RATIONALE AND OBJECTIVES

This study employed tract-based spatial statistics (TBSS) to investigate abnormalities in the white matter microstructure among children with autism spectrum disorder (ASD). Additionally, an eXtreme Gradient Boosting (XGBoost) model was developed to classify individuals with ASD and typically developing children (TDC).

METHODS AND MATERIALS

Multi-shell diffusion weighted images were acquired from 62 children with ASD and 44 TDC. Using the Pydesigner procedure, diffusion tensor (DT), diffusion kurtosis (DK), and white matter tract integrity (WMTI) metrics were computed. Subsequently, TBSS analysis was applied to discern differences in these diffusion parameters between ASD and TDC groups. The XGBoost model was then trained using metrics showing significant differences, and Shapley Additive explanations (SHAP) values were computed to assess the feature importance in the model's predictions.

RESULTS

TBSS analysis revealed a significant reduction in axonal diffusivity (AD) in the left posterior corona radiata and the right superior corona radiata. Among the DK indicators, mean kurtosis, axial kurtosis, and kurtosis fractional anisotropy were notably increased in children with ASD, with no significant difference in radial kurtosis. WMTI metrics such as axonal water fraction, axonal diffusivity of the extra-axonal space (EAS_AD), tortuosity of the extra-axonal space (EAS_TORT), and diffusivity of the intra-axonal space (IAS_Da) were significantly increased, primarily in the corpus callosum and fornix. Notably, there was no significant difference in radial diffusivity of the extra-axonal space (EAS_RD). The XGBoost model demonstrated excellent classification ability, and the SHAP analysis identified EAS_TORT as the feature with the highest importance in the model's predictions.

CONCLUSION

This study utilized TBSS analyses with multi-shell diffusion data to examine white matter abnormalities in pediatric autism. Additionally, the developed XGBoost model showed outstanding performance in classifying ASD and TDC. The ranking of SHAP values based on the XGBoost model underscored the significance of features in influencing model predictions.

Affiliation(s)

Yanyong Shen: Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.); Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.)
Xin Zhao: Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.); Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.)
Kaiyu Wang: MR Research China, GE Healthcare, Beijing, 100000, PR China (K.W.)
Yongbing Sun: Department of Radiology, Henan Provincial People's Hospital, Zhengzhou, 450000, China (Y.S.)
Xiaoxue Zhang: Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.); Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.)
Changhao Wang: Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.); Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.)
Zhexuan Yang: Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.); Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.)
Zhanqi Feng: Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.); Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.)
Xiaoan Zhang: Department of Radiology, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.); Henan International Joint Laboratory of Neuroimaging, Zhengzhou, 450052, China (Y.S., X.Z., X.Z., C.W., Z.Y., Z.F., X.Z.)

Li X, Li C, Guo F, Meng X, Liu Y, Ren F. Coefficient of variation method combined with XGboost ensemble model for wheat growth monitoring. Front Plant Sci 2024;14:1267108. [PMID: 38235205 PMCID: PMC10791907 DOI: 10.3389/fpls.2023.1267108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 11/30/2023] [Indexed: 01/19/2024]

Abstract

Introduction

Obtaining wheat growth information accurately and efficiently is the key to estimating yields and guiding agricultural development.

Methods

This paper takes the precision agriculture demonstration area of Jiaozuo Academy of Agriculture and Forestry in Henan Province as the research area to obtain data on wheat biomass, nitrogen content, chlorophyll content, and leaf area index. A Comprehensive Growth Monitoring Indicator (CGMI) was constructed using the coefficient of variation method, fractional derivative processing was applied to the drone spectral data, and correlation analysis was performed between the fractional derivative spectra and each single indicator and the CGMI. Then, grey correlation analysis was carried out on the differential spectral bands with high correlation: the grey correlation coefficients between differential spectral bands were calculated, and the bands with high correlation were screened and taken as input variables for the model. Next, ridge regression, random forest, and XGBoost models were used to establish a wheat CGMI inversion model, and the coefficient of determination (R²) and root mean squared error (RMSE) were used to evaluate accuracy and select the best wheat growth inversion model.

Results and discussion

The results show that, when the comprehensive growth monitoring indicator was constructed from wheat biomass, nitrogen content, chlorophyll content, and leaf area index, its correlation with the spectra was higher than that of the single indicators to varying degrees, with an increase of up to 82.22%. The correlation coefficient between the comprehensive indicator and the differential spectra reached 0.92 at the flowering stage; compared with the original spectra at the same stage, the correlation coefficients increased to varying degrees, indicating that differential processing of spectral data can effectively enhance the spectral correlation. The random forest, ridge regression, and XGBoost inversion models all performed best at the flowering stage, and XGBoost had the highest inversion accuracy at that stage, with R² of 0.904 and 0.870 and RMSE of 0.050 and 0.079 on the training and test sets, respectively, so the XGBoost model can serve as an effective method for monitoring wheat growth. In summary, this study demonstrates that combining a comprehensive growth monitoring indicator with differentially processed spectra can effectively improve the accuracy of wheat growth monitoring, providing new methods for precision agriculture management.
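The abstract does not give the formula behind the comprehensive indicator, so the following is only a sketch of the coefficient of variation weighting method in its common form: each indicator is weighted in proportion to its coefficient of variation (standard deviation divided by mean), and the composite is the weighted sum of the normalized indicators. The min-max normalization, the function names, and the sample values are assumptions made for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def cv_weights(indicators):
    """Weight each indicator in proportion to its coefficient of variation.
    indicators: dict mapping indicator name -> list of observations."""
    cvs = {name: pstdev(vals) / abs(mean(vals))
           for name, vals in indicators.items()}
    total = sum(cvs.values())
    return {name: cv / total for name, cv in cvs.items()}


def min_max(vals):
    """Scale values to [0, 1] (assumed normalization; requires max > min)."""
    lo, hi = min(vals), max(vals)
    return [(v - lo) / (hi - lo) for v in vals]


def composite_index(indicators):
    """Weighted sum of normalized indicators, one value per sample."""
    weights = cv_weights(indicators)
    scaled = {name: min_max(vals) for name, vals in indicators.items()}
    n_samples = len(next(iter(indicators.values())))
    return [sum(weights[name] * scaled[name][i] for name in indicators)
            for i in range(n_samples)]


# Hypothetical measurements for three plots, for illustration only.
indicators = {
    "biomass": [2.0, 4.0, 6.0],
    "chlorophyll": [40.0, 50.0, 60.0],
}
weights = cv_weights(indicators)
cgmi = composite_index(indicators)
print(weights, cgmi)
```

The design intent of this weighting is that indicators which vary more across samples, relative to their mean, carry more discriminating information and therefore receive more weight.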

Ryyppö R, Häyrynen S, Joutsijoki H, Juhola M, Seppänen MRJ. Comparison of machine learning methods in the early identification of vasculitides, myositides and glomerulonephritides. Comput Methods Programs Biomed 2024;243:107917. [PMID: 37948909 DOI: 10.1016/j.cmpb.2023.107917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 09/29/2023] [Accepted: 11/05/2023] [Indexed: 11/12/2023]

Abstract

BACKGROUND

Rare disease diagnoses are often delayed by years, involving multiple doctor visits and potentially imprecise or incorrect diagnoses before the correct one is received. Machine learning could solve this problem by flagging potential patients that doctors should examine more closely.

METHODS

To make the prediction setting as close as possible to the real situation, we tested different masking sizes. In the masking phase, data were removed: masking was applied to all data points following the first rare disease diagnosis, including the day on which the diagnosis was received, and additionally to a selected number of days before the initial diagnosis. The performance of the machine learning models was compared using positive predictive value (PPV), negative predictive value (NPV), prevalence PPV (pPPV), prevalence NPV (pNPV), accuracy (ACC), and area under the receiver operating characteristic curve (AUC).

RESULTS

XGBoost had PPVs over 90% in all masking settings, and InceptionVasGloMyotides had most of its PPVs over 90%, but not as consistently. When the prevalence of the diseases was considered, XGBoost achieved its highest value of 8.8% in binary classification with 30 days of masking, and InceptionVasGloMyotides achieved its best value of 6%, also in binary classification, but with 2160 and 4320 days of masking. ACC varied between 89% and 98% for XGBoost and between 79% and 94% for InceptionVasGloMyotides. AUC varied between 72.6% and 94.5% for InceptionVasGloMyotides and between 69.9% and 96.4% for XGBoost.

CONCLUSIONS

XGBoost and InceptionVasGloMyotides could successfully predict rare diseases for patients at least 30 days prior to the initial rare disease diagnosis. In addition, we managed to build a performant custom deep learning model.
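The gap between the PPVs above 90% and the much lower prevalence-adjusted values follows from Bayes' rule: when a disease is rare, even a specific classifier produces many false positives relative to true positives. The abstract does not define pPPV and pNPV exactly, so the sketch below assumes the standard prevalence adjustment from sensitivity and specificity; the input numbers are illustrative and are not taken from the study.

```python
def ppv_npv_at_prevalence(sensitivity, specificity, prevalence):
    """Predictive values at a given disease prevalence via Bayes' rule.
    Returns (ppv, npv)."""
    tp = sensitivity * prevalence                # true positive share
    fp = (1 - specificity) * (1 - prevalence)    # false positive share
    tn = specificity * (1 - prevalence)          # true negative share
    fn = (1 - sensitivity) * prevalence          # false negative share
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative inputs: the same classifier on a balanced sample
# and in a population where 1% of people have the disease.
balanced = ppv_npv_at_prevalence(0.9, 0.9, 0.5)
rare = ppv_npv_at_prevalence(0.9, 0.9, 0.01)
print(balanced, rare)
```

With these inputs the PPV drops from 0.9 on the balanced sample to roughly 0.08 at 1% prevalence, while the NPV rises, which is the same qualitative pattern the abstract reports.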

Cao J, Xu Y. Predicting cysteine reactivity changes upon phosphorylation using XGBoost. FEBS Open Bio 2024;14:51-62. [PMID: 37964470 PMCID: PMC10761938 DOI: 10.1002/2211-5463.13737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/11/2023] [Accepted: 10/27/2023] [Indexed: 11/16/2023] Open

Abstract

Cysteine reactivity serves as a significant indicator of protein function and can be affected by phosphorylation events. Experimental approaches have been developed to investigate this effect, but the scale is still relatively limited. Machine-learning approaches promise to accelerate the investigation of these phenomena. In this study, protein sequence information, distances to the closest phosphorylation sites, and the membership score of the intrinsically disordered region were used to represent the cysteine. Following the feature selection using an elastic net model, two groups of binary classifiers based on XGBoost were built to predict the occurrence and the direction of the reactivity change as a response to phosphorylation events, respectively. In addition, function enrichment analysis was performed on proteins/genes predicted to have reactivity changes. XGBoost performed the best in the independent test with AUC of 0.8192 and 0.9203 for the prediction of the change's occurrence and direction, respectively. The use of two binary classifiers successively resulted in an accuracy of 0.7568 in predicting whether reactivity would be unchanged, increased, or decreased. The enrichment analysis revealed the association of proteins carrying reactivity-changed cysteine residues with various disease-related pathways, particularly cancer, autosomal dominant diseases, and viral infections. Changes in cysteine reactivity influenced by phosphorylation are site-specific and can be predicted by XGBoost algorithms. Our model provides an efficient alternative way to explore the cysteine reactivity upon phosphorylation at the proteome-wide level, facilitating the investigation of protein functions and their clinical insights. Our code is available on GitHub (https://github.com/DarinaOsamu/predictors-of-cysteine-reactivity-changes).
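The successive use of two binary classifiers to obtain a three-way prediction can be sketched as follows. This is an illustration only: `cascade_predict`, the two toy scoring functions and the 0.5 threshold are hypothetical stand-ins, not the authors' trained XGBoost models or their code.

```python
from typing import Callable, Sequence

# A scorer maps a feature vector to a probability in [0, 1].
Scorer = Callable[[Sequence[float]], float]


def cascade_predict(features, p_change: Scorer, p_increase: Scorer, threshold=0.5):
    """Stage 1 decides whether reactivity changes; stage 2 decides the direction."""
    if p_change(features) < threshold:
        return "unchanged"
    return "increased" if p_increase(features) >= threshold else "decreased"


# Toy stand-ins for the two trained classifiers (hypothetical rules).
def toy_p_change(features):
    return 0.9 if features[0] > 0 else 0.1


def toy_p_increase(features):
    return 0.8 if features[1] > 0 else 0.2


samples = [(0, 1), (1, 1), (1, -1)]
labels = [cascade_predict(f, toy_p_change, toy_p_increase) for f in samples]
print(labels)
```

With this arrangement the second classifier is only consulted for sites that the first one flags as changed, so an error in stage 1 cannot be recovered in stage 2.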

Sharma K, Saini N, Hasija Y. Identifying the mitochondrial metabolism network by integration of machine learning and explainable artificial intelligence in skeletal muscle in type 2 diabetes. Mitochondrion 2024;74:101821. [PMID: 38040172 DOI: 10.1016/j.mito.2023.11.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/04/2023] [Accepted: 11/26/2023] [Indexed: 12/03/2023]

Nabeel SM, Bazai SU, Alasbali N, Liu Y, Ghafoor MI, Khan R, Ku CS, Yang J, Shahab S, Por LY. Optimizing lung cancer classification through hyperparameter tuning. Digit Health 2024;10:20552076241249661. [PMID: 38698834 PMCID: PMC11064752 DOI: 10.1177/20552076241249661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 04/04/2024] [Indexed: 05/05/2024] Open

Calderón-Díaz M, Silvestre Aguirre R, Vásconez JP, Yáñez R, Roby M, Querales M, Salas R. Explainable Machine Learning Techniques to Predict Muscle Injuries in Professional Soccer Players through Biomechanical Analysis. Sensors (Basel) 2023;24:119. [PMID: 38202981 PMCID: PMC10780883 DOI: 10.3390/s24010119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 11/25/2023] [Accepted: 12/18/2023] [Indexed: 01/12/2024]

Abstract

There is a significant risk of injury in sports and intense competition due to the demanding physical and psychological requirements. Hamstring strain injuries (HSIs) are the most prevalent type of injury among professional soccer players and are the leading cause of missed days in the sport. These injuries stem from a combination of factors, making it challenging to pinpoint the most crucial risk factors and their interactions, let alone find effective prevention strategies. Recently, there has been growing recognition of the potential of tools provided by artificial intelligence (AI). However, current studies primarily concentrate on enhancing the performance of complex machine learning models, often overlooking their explanatory capabilities. Consequently, medical teams have difficulty interpreting these models and are hesitant to trust them fully. In light of this, there is an increasing need for advanced injury detection and prediction models that can aid doctors in diagnosing or detecting injuries earlier and with greater accuracy. Accordingly, this study aims to identify the biomarkers of muscle injuries in professional soccer players through biomechanical analysis, employing several ML algorithms such as decision tree (DT) methods, discriminant methods, logistic regression, naive Bayes, support vector machine (SVM), K-nearest neighbor (KNN), ensemble methods, boosted and bagged trees, artificial neural networks (ANNs), and XGBoost. In particular, XGBoost is also used to obtain the most important features. The findings highlight that the variables that most effectively differentiate the groups and could serve as reliable predictors for injury prevention are the maximum muscle strength of the hamstrings and the stiffness of the same muscle. 
With regard to the 35 techniques employed, a precision of up to 78% was achieved with XGBoost, indicating that by considering scientific evidence, suggestions based on various data sources, and expert opinions, it is possible to attain good precision, thus enhancing the reliability of the results for doctors and trainers. Furthermore, the obtained results strongly align with the existing literature, although further specific studies about this sport are necessary to draw a definitive conclusion.

Lu B, Meng X, Dong S, Zhang Z, Liu C, Jiang J, Herrmann H, Li X. High-resolution mapping of regional VOCs using the enhanced space-time extreme gradient boosting machine (XGBoost) in Shanghai. Sci Total Environ 2023;905:167054. [PMID: 37714357 DOI: 10.1016/j.scitotenv.2023.167054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 09/10/2023] [Accepted: 09/11/2023] [Indexed: 09/17/2023]

Zhu H, Hao H, Yu L. Identifying disease-related microbes based on multi-scale variational graph autoencoder embedding Wasserstein distance. BMC Biol 2023;21:294. [PMID: 38115088 PMCID: PMC10731776 DOI: 10.1186/s12915-023-01796-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/07/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023] Open

Mahlknecht J, Torres-Martínez JA, Kumar M, Mora A, Kaown D, Loge FJ. Nitrate prediction in groundwater of data scarce regions: The futuristic fresh-water management outlook. Sci Total Environ 2023;905:166863. [PMID: 37690767 DOI: 10.1016/j.scitotenv.2023.166863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 08/28/2023] [Accepted: 09/03/2023] [Indexed: 09/12/2023]

Abstract

Nitrate contamination in groundwater poses a significant threat to water quality and public health, especially in regions with limited data availability. This study addresses this challenge by employing machine learning (ML) techniques to predict nitrate (NO₃⁻-N) concentrations in Mexico's groundwater. Four ML algorithms (Extreme Gradient Boosting (XGB), Boosted Regression Trees (BRT), Random Forest (RF), and Support Vector Machines (SVM)) were executed to model NO₃⁻-N concentrations across the country. Despite data limitations, the ML models achieved robust predictive performances. XGB and BRT algorithms demonstrated superior accuracy (0.80 and 0.78, respectively). Notably, this was achieved using ∼10 times less information than previous large-scale assessments. The novelty lies in the first-ever implementation of the 'Support Points-based Split Approach' during data pre-processing. The models initially considered 68 covariates and identified 13-19 significant predictors of NO₃⁻-N concentration spanning climate, geomorphology, soil, hydrogeology, and human factors. Rainfall, elevation, and slope emerged as key predictors. A validation incorporated nationwide waste disposal sites, yielding an encouraging correlation. Spatial risk mapping unveiled significant pollution hotspots across Mexico. Regions with elevated NO₃⁻-N concentrations (>10 mg/L) were identified, particularly in the north-central and northeast parts of the country, associated with agricultural and industrial activities. Approximately 21 million people, accounting for 10 % of Mexico's population, are potentially exposed to elevated NO₃⁻-N levels in groundwater. Moreover, the NO₃⁻-N hotspots align with reported NO₃⁻-N health implications such as gastric and colorectal cancer. This study not only demonstrates the potential of ML in data-scarce regions but also offers actionable insights for policy and management strategies.
Our research underscores the urgency of implementing sustainable agricultural practices and comprehensive domestic waste management measures to mitigate NO₃⁻-N contamination. Moreover, it advocates for the establishment of effective policies based on real-time monitoring and collaboration among stakeholders.

Teng X, Wang Z. Online COVID-19 diagnosis prediction using complete blood count: an innovative tool for public health. BMC Public Health 2023;23:2536. [PMID: 38114942 PMCID: PMC10729447 DOI: 10.1186/s12889-023-17477-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 12/13/2023] [Indexed: 12/21/2023] Open

Abstract

BACKGROUND

COVID-19, caused by SARS-CoV-2, presents distinct diagnostic challenges due to its wide range of clinical manifestations and the overlapping symptoms with other common respiratory diseases. This study focuses on addressing these difficulties by employing machine learning (ML) methodologies, particularly the XGBoost algorithm, to utilize Complete Blood Count (CBC) parameters for predictive analysis.

METHODS

We performed a retrospective study involving 2114 COVID-19 patients treated between December 2022 and January 2023 at our healthcare facility. These patients were classified into fever (1057 patients) and pneumonia groups (1057 patients), based on their clinical symptoms. The CBC data were utilized to create predictive models, with model performance evaluated through metrics like Area Under the Receiver Operating Characteristics Curve (AUC), accuracy, sensitivity, specificity, and precision. We selected the top 10 predictive variables based on their significance in disease prediction. The data were then split into a training set (70% of patients) and a validation set (30% of patients) for model validation.

RESULTS

We identified 31 indicators with significant disparities. The XGBoost model outperformed others, with an AUC of 0.920 and high precision, sensitivity, specificity, and accuracy. The top 10 features (Age, Monocyte%, Mean Platelet Volume, Lymphocyte%, SIRI, Eosinophil count, Platelet count, Hemoglobin, Platelet Distribution Width, and Neutrophil count) were crucial in constructing a more precise predictive model. The model demonstrated strong performance on both training (AUC = 0.977) and validation (AUC = 0.912) datasets, validated by decision curve analysis and calibration curve.

CONCLUSION

ML models that incorporate CBC parameters offer an innovative and effective tool for data analysis in COVID-19. They potentially enhance diagnostic accuracy and the efficacy of therapeutic interventions, ultimately contributing to a reduction in the mortality rate of this infectious disease.
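The AUC values reported in this abstract measure how often a model ranks a positive case above a negative one. A minimal sketch of that definition, using the pairwise (Mann-Whitney) formulation, is given below; the labels and scores are hypothetical toy values, and the quadratic loop is for clarity rather than efficiency.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy scores: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

A gap between training and validation AUC, such as the 0.977 versus 0.912 reported above, is the usual signal of some overfitting to the training set.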

Zhang J, Chen R, Chen S, Yu D, Elkamchouchi DH, Alqahtani MS, Assilzadeh H, Huang Z, Huang Y. Application of lipid and polymeric-based nanoparticles for treatment of inner ear infections via XGBoost. Environ Res 2023;239:117115. [PMID: 37717809 DOI: 10.1016/j.envres.2023.117115] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2023] [Revised: 08/26/2023] [Accepted: 09/09/2023] [Indexed: 09/19/2023]

Abstract

Hearing loss is a prevalent sensory disorder, and the restricted permeability of blood flow and the blood-labyrinth barrier in the inner ear pose significant challenges to transporting drugs to the inner ear tissues. The current options for hearing loss consist of cochlear surgery, medication, and hearing devices. There are some restrictions to the conventional drug delivery methods for treating inner ear illnesses; however, different smart nanoparticles, including inorganic-based nanoparticles, have been presented to regulate drug administration, enhance the targeting of particular cells, and decrease systemic adverse effects. Zinc oxide nanoparticles possess distinct characteristics that facilitate accurate drug delivery, improved targeting of specific cells, and minimized systemic adverse effects. Zinc oxide nanoparticles were studied for targeted delivery and controlled release of therapeutic drugs within specific cells. An XGBoost model was used on the Wideband Absorbance Immittance (WAI) measuring test after cochlear surgery. Ninety middle ear effusion samples (ages = 1-10 years, mean = 34.9 months) had chronic middle ear effusion for four months and verified effusion for seven weeks. In this research, 400 sets underwent wideband absorbance imaging (WAI) to assess inner ear performance after surgery. Among them, 60 patients had Otitis Media with Effusion (OME), while 30 had normal ears (controls). OME ears showed significantly lower absorbance at 250, 500, and 1000 Hz than controls (p < 0.001). Absorbance thresholds >0.252 (1000 Hz) and >0.330 (2000 Hz) predicted a favorable prognosis (p < 0.05, odds ratio: 6). This suggests that cochlear surgery and WAI performed well in the diagnosis and treatment of inner ear infections. With an R2 of 0.899 and an RMSE of 1.223, XGBoost shows excellent specificity and sensitivity for categorizing ears as having effusions absent or present or partial or complete flows present, with areas under the curve (1-0.944).

Xu Y, Park Y, Park JD, Sun B. Predicting Nurse Turnover for Highly Imbalanced Data Using the Synthetic Minority Over-Sampling Technique and Machine Learning Algorithms. Healthcare (Basel) 2023;11:3173. [PMID: 38132063 PMCID: PMC10742910 DOI: 10.3390/healthcare11243173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 12/11/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open

Guo J, Cheng H, Wang Z, Qiao M, Li J, Lyu J. Factor analysis based on SHapley Additive exPlanations for sepsis-associated encephalopathy in ICU mortality prediction using XGBoost - a retrospective study based on two large database. Front Neurol 2023;14:1290117. [PMID: 38162445 PMCID: PMC10755941 DOI: 10.3389/fneur.2023.1290117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 11/30/2023] [Indexed: 01/03/2024] Open

Abstract

Objective

Sepsis-associated encephalopathy (SAE) is strongly linked to a high mortality risk, and frequently occurs in conjunction with the acute and late phases of sepsis. The objective of this study was to construct and verify a predictive model for mortality in ICU-dwelling patients with SAE.

Methods

The study selected 7,576 patients with SAE from the MIMIC-IV database according to the inclusion criteria and randomly divided them into training (n = 5,303, 70%) and internal validation (n = 2,273, 30%) sets. According to the same criteria, 1,573 patients from the eICU-CRD database were included as an external test set. Independent risk factors for ICU mortality were identified using Extreme Gradient Boosting (XGBoost) software, and prediction models were constructed and verified using the validation set. The receiver operating characteristic (ROC) and the area under the ROC curve (AUC) were used to evaluate the discrimination ability of the model. The SHapley Additive exPlanations (SHAP) approach was applied to determine the Shapley values for specific patients, account for the effects of factors attributed to the model, and examine how specific traits affect the output of the model.

Results

The survival rate of patients with SAE in the MIMIC-IV database was 88.6% and that of 1,573 patients in the eICU-CRD database was 89.1%. The ROC of the XGBoost model indicated good discrimination. The AUCs for the training, test, and validation sets were 0.908, 0.898, and 0.778, respectively. The impact of each parameter on the XGBoost model was depicted using a SHAP plot, covering both positive (acute physiology score III, vasopressin, age, red blood cell distribution width, partial thromboplastin time, and norepinephrine) and negative (Glasgow Coma Scale) ones.

Conclusion

A prediction model developed using XGBoost can accurately predict the ICU mortality of patients with SAE. The SHAP approach can enhance the interpretability of the machine-learning model and support clinical decision-making.
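The idea behind the Shapley values used for interpretation here can be shown with a brute-force sketch. It is not the tree-specific algorithm implemented in the SHAP library and it is not the authors' code: it enumerates every feature subset, replaces "absent" features by a single baseline value (an assumption made for simplicity), and uses a hypothetical two-feature toy risk function.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest take the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi += weight * (value(members | {i}) - value(members))
        phis.append(phi)
    return phis


def toy_risk(features):
    """Hypothetical risk score: rises with the first feature, falls with the second."""
    a, b = features
    return 2.0 * a - 3.0 * b + 1.0 * a * b


phis = shapley_values(toy_risk, x=[1.0, 1.0], baseline=[0.0, 0.0])
print(phis)
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that makes per-patient SHAP plots readable. The cost grows exponentially with the number of features, so this enumeration is only practical for toy cases.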

Villanueva P, Yang J, Radmer L, Liang X, Leung T, Ikuma K, Swanner ED, Howe A, Lee J. One-Week-Ahead Prediction of Cyanobacterial Harmful Algal Blooms in Iowa Lakes. Environ Sci Technol 2023;57:20636-20646. [PMID: 38011382 DOI: 10.1021/acs.est.3c07764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]

Nambiar A, S H, S S. Model-agnostic explainable artificial intelligence tools for severity prediction and symptom analysis on Indian COVID-19 data. Front Artif Intell 2023;6:1272506. [PMID: 38111787 PMCID: PMC10726049 DOI: 10.3389/frai.2023.1272506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/07/2023] [Indexed: 12/20/2023] Open

Ao Z, Li H, Chen J, Yuan J, Xia Z, Zhang J, Chen H, Wang H, Liu G, Qi L. A new approach to optimizing aeration using XGB-Bi-LSTM via the online monitoring of oxygen transfer efficiency and oxygen uptake rate. Environ Res 2023;238:117142. [PMID: 37739155 DOI: 10.1016/j.envres.2023.117142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 09/04/2023] [Accepted: 09/13/2023] [Indexed: 09/24/2023]

Abstract

In wastewater treatment plants (WWTPs), aeration is vital for microbial oxygen needs. To achieve carbon neutrality, optimizing aeration for energy and emissions reduction is imperative. Using machine learning (ML) in wastewater treatment to reveal complex rules in large data sets has become a trend. In this vein, the present paper proposes an aeration optimization approach based on the extreme gradient boosting-bidirectional long short-term memory (XGB-Bi-LSTM) model via the online monitoring of oxygen transfer efficiency (OTE) and oxygen uptake rate (OUR), thus allowing WWTPs to conserve energy and reduce indirect carbon emissions. The approach uses the gain algorithm of XGB to calculate the importance of features and identify important parameters, and then uses Bi-LSTM to predict the target with the important parameters as features. Operational data from a WWTP in Suzhou, China, are employed to train and test the approach, the performance of which is compared with ML models suitable for regression prediction tasks (XGB, random forest, light gradient boosting machine, gradient boosting and LSTM). Experimental results show that the approach requires only a small number of input parameters to achieve good performance and outperforms other machine-learning models. When OTE and dissolved oxygen (DO) are used as features to predict the alpha factor (αF; since diffusers were used, multiply by the pollution factor F), the R-squared (R2) is 0.9977, the root mean square error (RMSE) is 0.0043, the mean absolute percentage error (MAPE) is 0.0069 and the median absolute error (MedAE) is 0.0032. When the predicted αF and the OUR are used as features to predict the air flow rate of an aeration unit, the R2 is 0.9901, the RMSE is 3.6150, the MAPE is 0.0209 and the MedAE is 1.5472. Using our optimized aeration approach, the energy consumption can be reduced by 23%.
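The four regression metrics quoted in this abstract can be computed as in the sketch below. It uses the standard textbook definitions, assumes MAPE is reported as a fraction rather than a percentage (consistent with the magnitudes above, though the abstract does not say so), and runs on hypothetical toy values rather than the plant data.

```python
from math import sqrt
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAPE (as a fraction) and MedAE for paired sequences."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / len(errors)),
        # MAPE is undefined when a true value is zero; the toy data avoids that.
        "MAPE": mean(a / abs(t) for a, t in zip(abs_errors, y_true)),
        "MedAE": median(abs_errors),
    }


# Hypothetical observed and predicted values.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```

RMSE and MedAE carry the units of the target, which is why the air flow rate model above shows a much larger RMSE than the αF model despite a similar R2.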

Nasiri S, Vaezihir A, Ahmadishali J. Designing soil contamination monitoring network in petroleum refineries by XGBoost weighting and geostatistical facility allocation methods. Environ Sci Pollut Res Int 2023;30:118377-118395. [PMID: 37910363 DOI: 10.1007/s11356-023-30452-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 10/10/2023] [Indexed: 11/03/2023]

Abstract

Petroleum refineries are deemed strategic industrial sectors that can release toxic materials to the environment and cause potential hazards. In this regard, the design and installation of soil contamination monitoring networks at petroleum refineries is a necessity. In this research, we designed an optimal monitoring network with maximum coverage and a minimum number of monitoring boreholes. The main parameters considered are the groundwater contamination history, the location of effective structures, the location of flare stacks and the soil texture. In addition, the soil contamination was calculated based on previous contamination of the soil at the sampling points by the Entropy Weighting Model. It was employed with the other parameters to estimate the soil contamination across the site. The machine learning method XGBoost was implemented to estimate and assign a priority for every point of the site. To achieve the optimal network in the optimization program, four parameters were considered: (a) the optimal value of the optimization program's objective function, (b) the number of Advance Zero-half cuts of the Cut Generation algorithm, (c) the consumed time, and (d) the optimal number of boreholes of the network corresponding to different effective contamination detection radii. The network was designed as a generalized Maximal Covering Location Problem, and Mixed-Integer Linear Programming was used to optimize it. To evaluate the applicability of the method, it was developed and implemented in a refinery in the south of Iran. An XGBoost estimation accuracy of 92.84%, an optimal number of 113 boreholes and an effective contamination detection radius of 160 m were obtained for the network. To investigate the efficiency of the model, a new Regret function was defined.
Furthermore, sensitivity analysis of the parameters and feature importance analysis of XGBoost both showed that the main parameter of the model was the location of effective structures.

Zheng J, Zhang Z, Wang J, Zhao R, Liu S, Yang G, Liu Z, Deng Z. Metabolic syndrome prediction model using Bayesian optimization and XGBoost based on traditional Chinese medicine features. Heliyon 2023;9:e22727. [PMID: 38125549 PMCID: PMC10730568 DOI: 10.1016/j.heliyon.2023.e22727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 12/23/2023] Open

Wang L, Duan SB, Yan P, Luo XQ, Zhang NY. Utilization of interpretable machine learning model to forecast the risk of major adverse kidney events in elderly patients in critical care. Ren Fail 2023;45:2215329. [PMID: 37218683 DOI: 10.1080/0886022x.2023.2215329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023] Open

Hammoudi Halat D, Abdel-Salam ASG, Bensaid A, Soltani A, Alsarraj L, Dalli R, Malki A. Use of machine learning to assess factors affecting progression, retention, and graduation in first-year health professions students in Qatar: a longitudinal study. BMC Med Educ 2023;23:909. [PMID: 38036997 PMCID: PMC10691082 DOI: 10.1186/s12909-023-04887-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Accepted: 11/20/2023] [Indexed: 12/02/2023]

Abstract

BACKGROUND

Across higher education, student retention, progression, and graduation are considered essential elements of students' academic success. However, there is scarce literature analyzing these attributes across health professions education. The current study aims to explore rates of student retention, progression, and graduation across five colleges of the Health Cluster at Qatar University, and identify predictive factors.

METHODS

Secondary longitudinal data for students enrolled at the Health Cluster between 2015 and 2021 were subject to descriptive statistics to obtain retention, progression and graduation rates. The importance of student demographic and academic variables in predicting retention, progression, or graduation was determined by a predictive model using XGBoost, after preparation and feature engineering. A predictive model was constructed, in which weak decision tree models were combined to capture the relationships between the initial predictors and student outcomes. A feature importance score for each predictor was estimated; features that had higher scores were indicative of higher influence on student retention, progression, or graduation.

RESULTS

A total of 88% of the studied cohorts were female Qatari students. The rates of retention and progression across the studied period showed variable distribution, and the majority of students graduated from health colleges within a timeframe of 4-7 years. The first academic year performance, followed by high school GPA, were factors that respectively ranked first and second in importance in predicting retention, progression, and graduation of health majors students. The health college ranked third in importance affecting retention and graduation and fifth regarding progression. The remaining factors including nationality, gender, and whether students were enrolled in a common first year experience for all colleges, had lower predictive importance.

CONCLUSIONS

Student retention, progression, and graduation at Qatar University Health Cluster is complex and multifactorial. First year performance and secondary education before college are important in predicting progress in health majors after the first year of university study. Efforts to increase retention, progression, and graduation rates should include academic advising, student support, engagement and communication. Machine learning-based predictive algorithms remain a useful tool that can be precisely leveraged to identify key variables affecting health professions students' performance.
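The methods above describe combining weak decision tree models into one predictor. A minimal sketch of that idea is plain gradient boosting with one-split trees (stumps) on a single feature under squared error. This is a simplification chosen for illustration: it is not XGBoost's regularized second-order objective, and the function names and toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a damped correction."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
    return {"base": base, "lr": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and every stump's damped contribution."""
    total = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["lr"] * (left_mean if x <= threshold else right_mean)
    return total


# Hypothetical toy data: a step from 1.0 to 5.0 between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_boosted_stumps(xs, ys)
print(predict(model, 2.0), predict(model, 5.0))
```

No single stump scaled by the learning rate fits the step, but the sum of fifty small corrections does, which is the sense in which weak learners are combined.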

Nedadur R, Bhatt N, Chung J, Chu MWA, Ouzounian M, Wang B. Machine learning and decision making in aortic arch repair. J Thorac Cardiovasc Surg 2023:S0022-5223(23)01108-X. [PMID: 38016622 DOI: 10.1016/j.jtcvs.2023.11.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 11/16/2023] [Accepted: 11/19/2023] [Indexed: 11/30/2023]

Wu J, Zhang C, He F, Wang Y, Zeng L, Liu W, Zhao D, Mao J, Gao F. Factors Affecting Intention to Leave Among ICU Healthcare Professionals in China: Insights from a Cross-Sectional Survey and XGBoost Analysis. Risk Manag Healthc Policy 2023;16:2543-2553. [PMID: 38024488 PMCID: PMC10676671 DOI: 10.2147/rmhp.s432847] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/23/2023] [Accepted: 11/02/2023] [Indexed: 12/01/2023] Open

Sun Y, Zhao Z, Tong H, Sun B, Liu Y, Ren N, You S. Machine Learning Models for Inverse Design of the Electrochemical Oxidation Process for Water Purification. Environ Sci Technol 2023;57:17990-18000. [PMID: 37189261 DOI: 10.1021/acs.est.2c08771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]

Liu J, Cao B, Luo Y, Chen X, Han H, Li L, Zeng J. Risk factors of major bleeding detected by machine learning method in patients undergoing liver resection with controlled low central venous pressure technique. Postgrad Med J 2023;99:1280-1286. [PMID: 37794600 DOI: 10.1093/postmj/qgad087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 08/18/2023] [Accepted: 09/01/2023] [Indexed: 10/06/2023]

Abstract

BACKGROUND

Controlled low central venous pressure (CLCVP) technique has been extensively validated in clinical practices to decrease intraoperative bleeding during liver resection process; however, no studies to date have attempted to propose a scoring method to better understand what risk factors might still be responsible for bleeding when CLCVP technique was implemented.

METHODS

We aimed to use machine learning to develop a model for detecting the risk factors of major bleeding in patients who underwent liver resection using the CLCVP technique. We reviewed the medical records of 1077 patients who underwent liver surgery between January 2017 and June 2020. We evaluated the XGBoost model and a logistic regression model using stratified K-fold cross-validation (K = 5), and the area under the receiver operating characteristic curve, recall, precision, and accuracy were calculated and compared. SHapley Additive exPlanations (SHAP) was employed to identify the most influential factors and their contributions to the prediction.

RESULTS

The XGBoost classifier with an accuracy of 0.80 and precision of 0.89 outperformed the logistic regression model with an accuracy of 0.76 and precision of 0.79. According to the SHapley Additive exPlanations summary plot, the top six variables ranked from most to least important included intraoperative hematocrit, surgery duration, intraoperative lactate, preoperative hemoglobin, preoperative aspartate transaminase, and Pringle maneuver duration.

CONCLUSIONS

Anesthesiologists should be aware of the potential impact of increased Pringle maneuver duration and lactate levels on intraoperative major bleeding in patients undergoing liver resection with the CLCVP technique.

What is already known on this topic: The low central venous pressure technique has already been extensively validated in clinical practice, with no prediction model for major bleeding.

What this study adds: The XGBoost classifier outperformed the logistic regression model for the prediction of major bleeding during liver resection with the low central venous pressure technique.

How this study might affect research, practice, or policy: Anesthesiologists should be aware of the potential impact of increased Pringle maneuver duration and lactate levels on intraoperative major bleeding in patients undergoing liver resection with the CLCVP technique.
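The stratified K-fold cross-validation (K = 5) named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_kfold`, the seed, and the toy labels are assumptions for illustration, and only the splitting step is shown (no model is trained).

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class mix mirrors the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin so every fold gets a fair share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Hypothetical labels (illustrative only): 20 positive cases among 100 patients.
labels = [1] * 20 + [0] * 80
splits = list(stratified_kfold(labels, k=5))
for train, test in splits:
    print(len(test), sum(labels[i] for i in test))
```

With these toy labels every test fold holds 20 patients, 4 of them positive, so each fold keeps the 20% event rate of the full sample.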


Kalita K, Ganesh N, Jayalakshmi S, Chohan JS, Mallik S, Qin H. Multi-Objective artificial bee colony optimized hybrid deep belief network and XGBoost algorithm for heart disease prediction. Front Digit Health 2023;5:1279644. [PMID: 38034907 PMCID: PMC10687430 DOI: 10.3389/fdgth.2023.1279644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 10/27/2023] [Indexed: 12/02/2023] Open

Sun S, Wang L, Lin J, Sun Y, Ma C. An effective prediction model based on XGBoost for the 12-month recurrence of AF patients after RFA. BMC Cardiovasc Disord 2023;23:561. [PMID: 37974062 PMCID: PMC10655386 DOI: 10.1186/s12872-023-03599-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023] Open

Abstract

BACKGROUND

Atrial fibrillation (AF) is a common heart rhythm disorder that can lead to complications such as stroke and heart failure. Radiofrequency ablation (RFA) is a procedure used to treat AF, but it is not always successful in maintaining a normal heart rhythm. This study aimed to construct a clinical prediction model based on extreme gradient boosting (XGBoost) for AF recurrence 12 months after ablation.

METHODS

The 27-dimensional data of 359 patients with AF undergoing RFA in the First Affiliated Hospital of Soochow University from October 2018 to November 2021 were retrospectively analysed. We adopted the logistic regression, support vector machine (SVM), random forest (RF) and XGBoost methods to conduct the experiment. To evaluate the performance of the prediction, we used the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AP), and calibration curves of both the training and testing sets. Finally, Shapley additive explanations (SHAP) were utilized to explain the significance of the variables.

RESULTS

Of the 27-dimensional variables, ejection fraction (EF) of the left atrial appendage (LAA), N-terminal probrain natriuretic peptide (NT-proBNP), global peak longitudinal strain of the LAA (LAAGPLS), left atrial diameter (LAD), diabetes mellitus (DM) history, and female sex had a significant role in the predictive model. The experimental results demonstrated that XGBoost exhibited the best performance among these methods, and the accuracy, specificity, sensitivity, precision and F1 score (a measure of test accuracy) of XGBoost were 86.1%, 89.7%, 71.4%, 62.5% and 0.67, respectively. In addition, SHAP analysis also proved that the 6 parameters were decisive for the effect of the XGBoost-based prediction model.

CONCLUSIONS

We proposed an effective model based on XGBoost that can be used to predict the recurrence of AF patients after RFA. This prediction result can guide treatment decisions and help to optimize the management of AF.
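The five metrics reported in the Results are all functions of one confusion matrix, which a short sketch can make concrete. The counts below (TP = 10, FN = 4, FP = 6, TN = 52) are an assumption: the abstract gives no counts, and this is simply one hypothetical matrix that happens to reproduce the reported percentages.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts chosen to match the reported figures (not from the paper).
m = classification_metrics(tp=10, fn=4, fp=6, tn=52)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

The F1 score is the harmonic mean of precision and sensitivity, which is why it (0.67) sits between the two but closer to the lower one.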


Li W, Yu S, Yang R, Tian Y, Zhu T, Liu H, Jiao D, Zhang F, Liu X, Tao L, Gao Y, Li Q, Zhang J, Guo X. Machine Learning Model of ResNet50-Ensemble Voting for Malignant-Benign Small Pulmonary Nodule Classification on Computed Tomography Images. Cancers (Basel) 2023;15:5417. [PMID: 38001677 PMCID: PMC10670717 DOI: 10.3390/cancers15225417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 09/21/2023] [Accepted: 09/26/2023] [Indexed: 11/26/2023] Open

Abstract

BACKGROUND

Early detection of benign and malignant lung tumors allows lesions to be diagnosed and appropriate health measures to be implemented sooner, dramatically improving lung cancer patients' quality of life. Machine learning methods have performed well in recognizing small benign and malignant lung nodules. However, further exploration and investigation are required to fully leverage the potential of machine learning in distinguishing between benign and malignant small lung nodules.

OBJECTIVE

The aim of this study was to develop and evaluate the ResNet50-Ensemble Voting model for detecting the benign and malignant nature of small pulmonary nodules (<20 mm) based on CT images.

METHODS

In this study, 834 CT imaging data from 396 patients with small pulmonary nodules were gathered and randomly assigned to the training and validation sets in an 8:2 ratio. ResNet50 and VGG16 algorithms were utilized to extract CT image features, followed by XGBoost, SVM, and Ensemble Voting techniques for classification, for a total of ten different classes of machine learning combinatorial classifiers. Indicators such as accuracy, sensitivity, and specificity were used to assess the models. The collected features are also shown to investigate the contrasts between them.

RESULTS

The algorithm we presented, ResNet50-Ensemble Voting, performed best in the test set, with an accuracy of 0.943 (0.938, 0.948) and sensitivity and specificity of 0.964 and 0.911, respectively. VGG16-Ensemble Voting had an accuracy of 0.887 (0.880, 0.894), with a sensitivity and specificity of 0.952 and 0.784, respectively.

CONCLUSION

The implemented machine learning model, ResNet50-Ensemble Voting, performed exceptionally well in identifying benign and malignant small pulmonary nodules (<20 mm) from various sites, which might help doctors accurately diagnose the nature of early-stage lung nodules in clinical practice.
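As a rough illustration of what an ensemble voting step can look like, the sketch below applies hard (majority) voting to the labels predicted by three classifiers. This is an assumption: the abstract does not say whether hard or soft voting was used or which base classifiers were combined, and the prediction lists are hypothetical.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote per sample; `predictions` is one label list per classifier."""
    voted = []
    for sample_votes in zip(*predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted

# Hypothetical per-classifier labels for five nodules (1 = malignant, 0 = benign).
clf_a = [1, 0, 1, 1, 0]
clf_b = [1, 1, 0, 1, 0]
clf_c = [0, 0, 1, 1, 1]
ensemble = hard_vote([clf_a, clf_b, clf_c])
print(ensemble)
```

Using an odd number of binary classifiers avoids ties, so each sample always has a strict majority.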


Affiliation(s)

Weiming Li, Siqi Yu, Runhuang Yang, Yixing Tian, Tianyu Zhu, Haotian Liu, Danyang Jiao, Feng Zhang, Xiangtong Liu, Lixin Tao, Xiuhua Guo: Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China; Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
Yan Gao: Department of Nuclear Medicine, Xuanwu Hospital Capital Medical University, Beijing 100053, China
Qiang Li, Jingbo Zhang: Beijing Physical Examination Center, Beijing 100050, China


Tore U, Abilgazym A, Asunsolo-del-Barco A, Terzic M, Yemenkhan Y, Zollanvari A, Sarria-Santamera A. Diagnosis of Endometriosis Based on Comorbidities: A Machine Learning Approach. Biomedicines 2023;11:3015. [PMID: 38002015 PMCID: PMC10669733 DOI: 10.3390/biomedicines11113015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 10/27/2023] [Accepted: 10/31/2023] [Indexed: 11/26/2023] Open

Abstract

Endometriosis is defined as the presence of estrogen-dependent endometrial-like tissue outside the uterine cavity. Despite extensive research, endometriosis is still an enigmatic disease and is challenging to diagnose and treat. A common clinical finding is the association of endometriosis with multiple diseases. We use a total of 627,566 clinically collected data from cases of endometriosis (0.82%) and controls (99.18%) to construct and evaluate predictive models. We develop a machine learning platform to construct diagnostic tools for endometriosis. The platform consists of logistic regression, decision tree, random forest, AdaBoost, and XGBoost for prediction, and uses Shapley Additive Explanation (SHAP) values to quantify the importance of features. In the model selection phase, the constructed XGBoost model performs better than other algorithms while achieving an area under the curve (AUC) of 0.725 on the test set during the evaluation phase, resulting in a specificity of 62.9% and a sensitivity of 68.6%. The model leads to a quite low positive predictive value of 1.5%, but a quite satisfactory negative predictive value of 99.58%. Moreover, the feature importance analysis points to age, infertility, uterine fibroids, anxiety, and allergic rhinitis as the top five most important features for predicting endometriosis. Although these results show the feasibility of using machine learning to improve the diagnosis of endometriosis, more research is required to improve the performance of predictive models for the diagnosis of endometriosis. This state of affairs is in part attributed to the complex nature of the condition and, at the same time, the administrative nature of our features. Should more informative features be used, we could possibly achieve a higher AUC for predicting endometriosis. As a result, we merely perceive the constructed predictive model as a tool to provide auxiliary information in clinical practice.
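The very low positive predictive value alongside a high negative predictive value follows directly from the low prevalence, as a short worked calculation suggests. The sketch assumes the test-set prevalence equals the overall 0.82% case share quoted in the abstract, so the output only approximates the reported values; the function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    tp = sensitivity * prevalence                 # true positives per unit population
    fn = (1 - sensitivity) * prevalence           # missed cases
    tn = specificity * (1 - prevalence)           # correctly cleared controls
    fp = (1 - specificity) * (1 - prevalence)     # false alarms
    return tp / (tp + fp), tn / (tn + fn)

# Sensitivity and specificity from the abstract; prevalence assumed to be 0.82%.
ppv, npv = predictive_values(sensitivity=0.686, specificity=0.629, prevalence=0.0082)
print(f"PPV = {ppv:.2%}, NPV = {npv:.2%}")
```

Because controls outnumber cases by more than 100 to 1, even a modest false-positive rate produces far more false alarms than true detections, which pulls the PPV down to roughly 1.5% while the NPV stays near 99.6%.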


Saylam B, İncel ÖD. Quantifying Digital Biomarkers for Well-Being: Stress, Anxiety, Positive and Negative Affect via Wearable Devices and Their Time-Based Predictions. Sensors (Basel) 2023;23:8987. [PMID: 37960685 PMCID: PMC10649682 DOI: 10.3390/s23218987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 10/27/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023]

Yuan Y, Han Y, Yap CW, Kochhar JS, Li H, Xiang X, Kang L. Prediction of drug permeation through microneedled skin by machine learning. Bioeng Transl Med 2023;8:e10512. [PMID: 38023708 PMCID: PMC10658566 DOI: 10.1002/btm2.10512] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/11/2022] [Revised: 02/22/2023] [Accepted: 03/08/2023] [Indexed: 04/07/2023] Open

Ma Y, Zhang J, Lu J, Chen S, Xing G, Feng R. Prediction and analysis of likelihood of freeway crash occurrence considering risky driving behavior. Accid Anal Prev 2023;192:107244. [PMID: 37573710 DOI: 10.1016/j.aap.2023.107244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/15/2023]

Abstract

The prediction of the likelihood of vehicle crashes constitutes an indispensable component of freeway safety management. Due to data collection limitations, studies have used mainly traffic flow-related variables to develop freeway crash prediction models but rarely have considered the effect of risky driving behavior on the likelihood of crashes. This study employed navigation software to collect driving behavior data and integrated multi-source data that include vehicle speed, traffic volume, and congestion index values. The study also employed the coupled 'synthetic minority oversampling technique and edited nearest neighbor' (SMOTE + ENN) method for data balance processing. Three freeway crash likelihood prediction models were built based on the binomial logit, eXtreme Gradient Boosting (XGBoost), and support vector machine algorithms, respectively. The Shapley additive explanation (SHAP) algorithm was utilized to explore the effect of each feature variable on the likelihood of crashes. The results show that the prediction accuracy of the XGBoost model is the best of the three compared models. Under the optimal control-to-case ratio (1:1), the prediction accuracy of the XGBoost model reached 0.96 in this study, and the recall rate, specificity, and area-under-the-curve values were 0.86, 0.96, and 0.907, respectively. Comparative test results demonstrate that ranking risky driving behavior into three levels of intensity can effectively enhance the predictive accuracy of the XGBoost model. Moreover, the XGBoost model with its ten-minute time step outperformed the XGBoost model with its five-minute time step in terms of prediction accuracy. The results of the SHAP-based analysis show that the likelihood of highway crashes is high when the traffic congestion level is high and the distribution of the vehicle speed in the upstream roadway section is significant. Also, both sharp acceleration and sharp deceleration lead to greater likelihood of crashes. This paper aims to provide an effective framework for predicting and interpreting the likelihood of freeway crashes, thereby providing guidance for crash prevention, driver training, and the development of traffic regulations.
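The oversampling half of the SMOTE + ENN balancing step can be sketched as follows. This is a simplified illustration, not the study's pipeline: the ENN cleaning step is omitted, and the function name `smote`, the two features (speed, congestion index), and the sample values are all hypothetical.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical crash records: (upstream speed in km/h, congestion index).
crashes = [(60.0, 0.80), (55.0, 0.90), (65.0, 0.70), (58.0, 0.85)]
new_points = smote(crashes, n_new=6)
print(len(crashes) + len(new_points), "minority samples after oversampling")
```

Each synthetic record lies on a line segment between two real minority records, so the oversampled class stays inside the region the observed crashes already occupy rather than duplicating rows exactly.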


Al-Shboul KF. Unraveling the complex interplay between soil characteristics and radon surface exhalation rates through machine learning models and multivariate analysis. Environ Pollut 2023;336:122440. [PMID: 37625775 DOI: 10.1016/j.envpol.2023.122440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 07/28/2023] [Accepted: 08/22/2023] [Indexed: 08/27/2023]

Abstract

This research seeks to elucidate the intricate interplay between soil characteristics and radon surface exhalation rates. To achieve this aim, Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost) machine learning (ML) algorithms are employed, supported by Multivariate Analysis (MA). An analysis was performed on a collection of soil samples, examining radon surface exhalation rates and other pertinent properties such as moisture content, particle size distributions, and the concentrations of Ra-226, Th-232, and K-40. The analysis revealed several key factors influencing radon exhalation rates, namely Ra-226 concentration, moisture content, and larger soil particles. To visualize the intricate relationships between these variables, contour plots of experimental and ML-generated data were created. These visual representations demonstrated that elevated soil moisture levels decrease radon exhalation rates. In contrast, higher concentrations of Ra-226 and a greater proportion of large soil particles led to an increase in exhalation rates. This endeavor presents these complex relationships in an accessible manner, furthering our understanding of the factors in radon surface exhalation. MA techniques, including Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA), were initially employed to investigate the complex interactions of soil attributes on radon exhalation. HCA identified three distinct clusters but faced limitations in detecting strong negative impacts. PCA successfully captured these inverse effects, indicating that the first two principal components accounted for approximately 80% of the total variance, primarily attributed to Ra-226 concentration, moisture content, and the percentage of large soil particles. However, neither technique could quantify the effects of soil attributes on radon exhalation rates. LightGBM outperformed XGBoost, but both successfully quantified the impacts of the studied soil characteristics on radon exhalation. Sensitivity analysis confirmed the robustness and accuracy of both models. This study highlights that XGBoost and LightGBM algorithms can effectively quantify radon exhalation rates based on soil characteristics, providing valuable insights for environmental policies, land use planning, and radon mitigation strategies.


Atehortúa A, Gkontra P, Camacho M, Diaz O, Bulgheroni M, Simonetti V, Chadeau-Hyam M, Felix JF, Sebert S, Lekadir K. Cardiometabolic risk estimation using exposome data and machine learning. Int J Med Inform 2023;179:105209. [PMID: 37729839 DOI: 10.1016/j.ijmedinf.2023.105209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 08/11/2023] [Accepted: 08/30/2023] [Indexed: 09/22/2023]

Abstract

BACKGROUND

The human exposome encompasses all exposures that individuals encounter throughout their lifetime. It is now widely acknowledged that health outcomes are influenced not only by genetic factors but also by the interactions between these factors and various exposures. Consequently, the exposome has emerged as a significant contributor to the overall risk of developing major diseases, such as cardiovascular disease (CVD) and diabetes. Therefore, personalized early risk assessment based on exposome attributes might be a promising tool for identifying high-risk individuals and improving disease prevention.

OBJECTIVE

Develop and evaluate a novel and fair machine learning (ML) model for CVD and type 2 diabetes (T2D) risk prediction based on a set of readily available exposome factors. We evaluated our model using internal and external validation groups from a multi-center cohort. To be considered fair, the model was required to demonstrate consistent performance across different sub-groups of the cohort.

METHODS

From the UK Biobank, we identified 5,348 and 1,534 participants who within 13 years from the baseline visit were diagnosed with CVD and T2D, respectively. An equal number of participants who did not develop these pathologies were randomly selected as the control group. 109 readily available exposure variables from six different categories (physical measures, environmental, lifestyle, mental health events, sociodemographics, and early-life factors) from the participant's baseline visit were considered. We adopted the XGBoost ensemble model to predict individuals at risk of developing the diseases. The model's performance was compared to that of an integrative ML model which is based on a set of biological, clinical, physical, and sociodemographic variables, and, additionally for CVD, to the Framingham risk score. Moreover, we assessed the proposed model for potential bias related to sex, ethnicity, and age. Lastly, we interpreted the model's results using SHAP, a state-of-the-art explainability method.

RESULTS

The proposed ML model presents a comparable performance to the integrative ML model despite using solely exposome information, achieving a ROC-AUC of 0.78±0.01 and 0.77±0.01 for CVD and T2D, respectively. Additionally, for CVD risk prediction, the exposome-based model presents an improved performance over the traditional Framingham risk score. No bias in terms of key sensitive variables was identified.

CONCLUSIONS

We identified exposome factors that play an important role in identifying patients at risk of CVD and T2D, such as naps during the day, age completed full-time education, past tobacco smoking, frequency of tiredness/unenthusiasm, and current work status. Overall, this work demonstrates the potential of exposome-based machine learning as a fair CVD and T2D risk assessment tool.
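The fairness criterion stated in the Objective (consistent performance across sub-groups) can be checked, in its simplest form, by computing ROC-AUC separately per sub-group and comparing the values. The sketch below is only one possible reading of that criterion, not the authors' procedure; the helper names and the toy labels, scores, and groups are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_group(labels, scores, groups):
    """ROC-AUC computed separately within each sub-group."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = roc_auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result

# Hypothetical outcomes, model scores, and a sensitive attribute (illustrative only).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.3, 0.4, 0.5]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

per_group = auc_by_group(labels, scores, groups)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, "max gap:", gap)
```

A small gap between sub-group AUCs is consistent with, though not proof of, the absence of bias; a real assessment would also need confidence intervals for each sub-group.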


Yu Y, Li J, Li J, Zen X, Fu Q. Evidence from Machine Learning, Diagnostic Hub Genes in Sepsis and Diagnostic Models based on Xgboost Models, Novel Molecular Models for the Diagnosis of Sepsis. Curr Med Chem 2023:CMC-EPUB-135666. [PMID: 37921181 DOI: 10.2174/0109298673273009231017061448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 09/15/2023] [Accepted: 09/26/2023] [Indexed: 11/04/2023]

Ganie SM, Pramanik PKD, Bashir Malik M, Mallik S, Qin H. An ensemble learning approach for diabetes prediction using boosting techniques. Front Genet 2023;14:1252159. [PMID: 37953921 PMCID: PMC10639159 DOI: 10.3389/fgene.2023.1252159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023] Open


Zaki FR, Monroy GL, Shi J, Sudhir K, Boppart SA. Texture-based speciation of otitis media-related bacterial biofilms from optical coherence tomography images using supervised classification. Res Sq 2023:rs.3.rs-3466690. [PMID: 37961282 PMCID: PMC10635317 DOI: 10.21203/rs.3.rs-3466690/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]