1
Halabi R, Selvarajan R, Lin Z, Herd C, Li X, Kabrit J, Tummalacherla M, Chaibub Neto E, Pratap A. Comparative Assessment of Multimodal Sensor Data Quality Collected Using Android and iOS Smartphones in Real-World Settings. SENSORS (BASEL, SWITZERLAND) 2024; 24:6246. [PMID: 39409286 PMCID: PMC11478693 DOI: 10.3390/s24196246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 09/03/2024] [Accepted: 09/18/2024] [Indexed: 10/20/2024]
Abstract
Healthcare researchers are increasingly using smartphone sensor data as a scalable and cost-effective approach to studying individualized health-related behaviors in real-world settings. However, to develop reliable and robust digital behavioral signatures that may help in the early prediction of individualized disease trajectories and future prognosis, there is a critical need to quantify the variability that may be present in the underlying sensor data due to differences in the smartphone hardware and software used by a large population. Using sensor data collected in real-world settings from 3000 participants' smartphones for up to 84 days, we compared the completeness, correctness, and consistency of the three most common smartphone sensors (the accelerometer, gyroscope, and GPS) within and across Android and iOS devices. Our findings show considerable variation in sensor data quality within and across Android and iOS devices. Sensor data from iOS devices showed significantly lower anomalous point density (APD) than Android data across all sensors (p < 1 × 10⁻⁴). iOS devices showed a considerably lower missing data ratio (MDR) for the accelerometer than for GPS (p < 1 × 10⁻⁴). Notably, quality features derived from raw sensor data alone could predict the device type (Android vs. iOS) with an accuracy of up to 0.98 (95% CI [0.977, 0.982]). Such significant differences in the quantity and quality of sensor data gathered from iOS and Android platforms could lead to considerable variation in health-related inferences derived from heterogeneous consumer-owned smartphones. Our research highlights the importance of assessing, measuring, and adjusting for such critical differences in smartphone sensor-based assessments.
Understanding the factors contributing to the variation in sensor data based on daily device usage will help develop reliable, standardized, inclusive, and practically applicable digital behavioral patterns that may be linked to health outcomes in real-world settings.
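The abstract does not give the formula behind the missing data ratio (MDR). A minimal Python sketch, assuming MDR is the share of expected samples that are absent from a time window at a nominal sampling rate; both this definition and the function name are hypothetical, not the authors' implementation:

```python
from typing import Sequence


def missing_data_ratio(timestamps: Sequence[float], start: float, end: float,
                       sampling_hz: float) -> float:
    """Share of expected samples that are absent from the window [start, end).

    Hypothetical definition: expected = (end - start) * sampling_hz, and only
    timestamps (in seconds) that fall inside the window are counted.
    """
    if end <= start or sampling_hz <= 0:
        raise ValueError("need end > start and a positive sampling rate")
    expected = (end - start) * sampling_hz
    observed = sum(1 for t in timestamps if start <= t < end)
    # Clamp at 0 so duplicated or oversampled readings never give a negative ratio.
    return max(0.0, 1.0 - observed / expected)


if __name__ == "__main__":
    # A 10 s window at 5 Hz expects 50 samples; 40 arrived, so MDR is about 0.2.
    stamps = [i * 0.2 for i in range(40)]
    print(round(missing_data_ratio(stamps, 0.0, 10.0, 5.0), 3))
```

Computing the ratio per participant, sensor, and day would give the kind of per-device distributions that can then be compared between Android and iOS.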
Affiliation(s)
- Ramzi Halabi
- Centre for Addiction and Mental Health, Toronto, ON M6J 1H4, Canada
- Rahavi Selvarajan
- Centre for Addiction and Mental Health, Toronto, ON M6J 1H4, Canada
- Zixiong Lin
- Centre for Addiction and Mental Health, Toronto, ON M6J 1H4, Canada
- Calvin Herd
- Centre for Addiction and Mental Health, Toronto, ON M6J 1H4, Canada
- Xueying Li
- Centre for Addiction and Mental Health, Toronto, ON M6J 1H4, Canada
- Jana Kabrit
- Centre for Addiction and Mental Health, Toronto, ON M6J 1H4, Canada
- Abhishek Pratap
- Centre for Addiction and Mental Health, Toronto, ON M6J 1H4, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON M5S 1A1, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON M5T 1R8, Canada
- Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London WC2R 2LS, UK
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA
2
Hu T, Li K, Ma C, Zhou N, Chen Q, Qi C. Improved classification of soil As contamination at continental scale: Resolving class imbalances using machine learning approach. CHEMOSPHERE 2024; 363:142697. [PMID: 38925515 DOI: 10.1016/j.chemosphere.2024.142697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 06/11/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
The identification of arsenic (As)-contaminated areas is an important prerequisite for soil management and reclamation. Although previous studies have attempted to identify soil As contamination via machine learning (ML) methods combined with soil spectroscopy, they have ignored the rarity of As-contaminated soil samples, leading to an imbalanced learning problem. A novel ML framework was thus designed herein to solve the imbalance issue in identifying soil As contamination from soil visible and near-infrared spectra. Spectral preprocessing, imbalanced dataset resampling, and model comparisons were combined in the ML framework, and the optimal combination was selected based on the recall. In addition, Bayesian optimization was used to tune the model hyperparameters. The optimized model achieved recall, area under the curve, and balanced accuracy values of 0.83, 0.88, and 0.79, respectively, on the testing set. The recall was further improved to 0.87 with the threshold adjustment, indicating the model's excellent performance and generalization capability in classifying As-contaminated soil samples. The optimal model was applied to a global soil spectral dataset to predict areas at a high risk of soil As contamination on a global scale. The ML framework established in this study represents a milestone in the classification of soil As contamination and can serve as a valuable reference for contamination management in soil science.
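The two imbalance-handling steps named above (resampling the training data, then moving the decision threshold to raise recall) can be sketched in plain Python. This is an illustration only: random oversampling stands in for whichever resampler the framework actually selected, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence, y: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def recall_at(scores: Sequence[float], y: Sequence[int], threshold: float) -> float:
    """Recall when every score >= threshold is labelled positive (contaminated)."""
    positives = sum(y)
    tp = sum(1 for s, t in zip(scores, y) if t == 1 and s >= threshold)
    return tp / positives if positives else 0.0


def threshold_for_recall(scores: Sequence[float], y: Sequence[int], target: float) -> float:
    """Highest observed score that, used as the threshold, still reaches the target recall."""
    # Recall can only rise as the threshold falls, so scan candidates from high to low.
    for thr in sorted(set(scores), reverse=True):
        if recall_at(scores, y, thr) >= target:
            return thr
    return min(scores)
```

Lowering the threshold trades precision for recall, which matches the abstract's choice of recall as the selection criterion when missing a contaminated sample is the costlier error.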
Affiliation(s)
- Tao Hu
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Kechao Li
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Chundi Ma
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Nana Zhou
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Qiusong Chen
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Chongchong Qi
- School of Resources and Safety Engineering, Central South University, Changsha, 410083, China; School of Metallurgy and Environment, Central South University, Changsha, 410083, China; Fankou Lead-Zinc Mine, NONFEMET, Shaoguan, 511100, China.
3
Barai P, Leroy G, Bisht P, Rothman JM, Lee S, Andrews J, Rice SA, Ahmed A. Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2024; 2024:75-84. [PMID: 38827063 PMCID: PMC11141838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enriched with quality control measures at the pre-, real-time-, and post-data gathering stages. Our study evaluated the effectiveness of enhancing data quality through its impact on LLMs (Bio-BERT) for predicting autism-related symptoms. The results show that real-time quality control improves data quality by 19% compared to pre-quality control. Fine-tuning Bio-BERT using crowdsourced data generally increased recall compared to the Bio-BERT baseline but lowered precision. Our findings highlighted the potential of crowdsourcing and quality control in resource-constrained environments and offered insights into optimizing healthcare LLMs for informed decision-making and improved patient care.
Affiliation(s)
- Gondy Leroy
- The University of Arizona, Tucson 85721, U.S.A
- Sumi Lee
- The University of Arizona, Tucson 85721, U.S.A
- Arif Ahmed
- The University of Arizona, Tucson 85721, U.S.A
4
Yang J, Peng H, Luo Y, Zhu T, Xie L. Explainable ensemble machine learning model for prediction of 28-day mortality risk in patients with sepsis-associated acute kidney injury. Front Med (Lausanne) 2023; 10:1165129. [PMID: 37275353 PMCID: PMC10232880 DOI: 10.3389/fmed.2023.1165129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 05/02/2023] [Indexed: 06/07/2023]
Abstract
Background Sepsis-associated acute kidney injury (S-AKI) is a major contributor to mortality in intensive care units (ICU). Early prediction of mortality risk is crucial to improve prognosis and optimize clinical decisions. This study aims to develop a 28-day mortality risk prediction model for S-AKI using an explainable ensemble machine learning (ML) algorithm. Methods This study used data from the Medical Information Mart for Intensive Care IV (MIMIC-IV 2.0) database to gather information on patients with S-AKI. Univariate regression, correlation analysis, and Boruta were combined for feature selection. To construct the four ML models, hyperparameters were tuned via random search and five-fold cross-validation. Model performance was evaluated with ROC, K-S, and LIFT curves. The discrimination of the ML models and traditional scoring systems was compared using the area under the receiver operating characteristic curve (AUC). Additionally, SHapley Additive exPlanations (SHAP) were used to interpret the ML model and identify essential variables. To investigate the relationship between the top nine continuous variables and the risk of 28-day mortality, Cox regression with restricted cubic splines was used while controlling for age and comorbidities. Results The study analyzed data from 9,158 patients with S-AKI, dividing them into a 28-day mortality group of 1,940 and a survival group of 7,578. XGBoost was the best-performing of the four ML models, with an AUC of 0.873. All models outperformed APS-III (AUC 0.713) and SAPS-II (AUC 0.681). The K-S and LIFT curves also indicated XGBoost as the most effective predictor of 28-day mortality risk. Model performance was further evaluated using ROCpr curves, calibration curves, accuracy, precision, and F1 scores. SHAP force plots were used to interpret and visualize the individual predictions of the 28-day mortality risk model.
Additionally, Cox regression with restricted cubic splines revealed non-linear relationships between the top nine variables and 28-day mortality. Conclusion Ensemble ML models proved more effective than the LR model and conventional scoring systems in predicting 28-day mortality risk in S-AKI patients. By visualizing the XGBoost model, which had the best predictive performance, clinicians can identify high-risk patients early and improve prognosis.
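K-S and LIFT curves summarize how well predicted risks separate deaths from survivors. A minimal Python sketch of the two summary numbers usually read off them: the K-S statistic (the largest gap between the cumulative true-positive and false-positive rates) and the lift in the top-ranked fraction of patients. The function names and tie handling are assumptions, not the authors' code.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest |TPR - FPR| over all score thresholds (labels: 1 = died, 0 = survived)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Move every patient tied at this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best


def lift_at(scores: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """Event rate in the top `fraction` of patients by score, divided by the overall event rate."""
    n = len(labels)
    base_rate = sum(labels) / n
    if base_rate == 0:
        raise ValueError("no events, so lift is undefined")
    k = max(1, round(n * fraction))
    top = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)[:k]
    return (sum(label for _, label in top) / k) / base_rate
```

A lift of 2 in the top decile, for example, would mean that decile contains twice the cohort's overall mortality rate.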
Affiliation(s)
- Jijun Yang
- Department of Critical Care Medicine, Loudi Central Hospital, Loudi, China
- Hongbing Peng
- Department of Pulmonary and Critical Care Medicine, Loudi Central Hospital, Loudi, China
- Youhong Luo
- Department of Critical Care Medicine, Loudi Central Hospital, Loudi, China
- Tao Zhu
- Department of Critical Care Medicine, Loudi Central Hospital, Loudi, China
- Li Xie
- Patient Service Center, Loudi Central Hospital, Loudi, China
5
Levy TJ, Coppa K, Cang J, Barnaby DP, Paradis MD, Cohen SL, Makhnevich A, van Klaveren D, Kent DM, Davidson KW, Hirsch JS, Zanos TP. Development and validation of self-monitoring auto-updating prognostic models of survival for hospitalized COVID-19 patients. Nat Commun 2022; 13:6812. [PMID: 36357420 PMCID: PMC9648888 DOI: 10.1038/s41467-022-34646-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/08/2022] [Accepted: 11/02/2022] [Indexed: 11/12/2022]
Abstract
Clinical prognostic models can assist patient care decisions. However, their performance can drift over time and location, necessitating model monitoring and updating. Despite rapid and significant changes during the pandemic, prognostic models for COVID-19 patients do not currently account for these drifts. We develop a framework for continuously monitoring and updating prognostic models and apply it to predict 28-day survival in COVID-19 patients. We use demographic, laboratory, and clinical data from the electronic health records of 34,912 hospitalized COVID-19 patients from March 2020 until May 2022 and compare three modeling methods. Drift in calibration performance is detected immediately, with only minor fluctuations in discrimination. The overall calibration on the prospective validation cohort is significantly improved when comparing the dynamically updated models against their static counterparts. Our findings suggest that, using this framework, models remain accurate and well calibrated across various waves, variants, races, and sexes and yield positive net benefits.
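The abstract does not spell out how calibration drift is detected. One simple, hypothetical way to monitor it is to compare observed events with the sum of predicted probabilities (the observed-to-expected, O/E, ratio) in successive time windows and flag windows outside a tolerance band. The window length, the 0.2 tolerance, and the function names below are illustrative assumptions, not values or code from the study.

```python
from typing import Dict, Iterable, List, Tuple


def oe_by_window(records: Iterable[Tuple[int, float, int]],
                 window_days: int = 30) -> Dict[int, float]:
    """Observed/expected event ratio per time window.

    records: (day_index, predicted_probability, outcome) with outcome 1 = event.
    """
    totals: Dict[int, Tuple[int, float]] = {}
    for day, prob, outcome in records:
        window = day // window_days
        observed, expected = totals.get(window, (0, 0.0))
        totals[window] = (observed + outcome, expected + prob)
    return {w: obs / exp for w, (obs, exp) in sorted(totals.items()) if exp > 0}


def drifting_windows(oe: Dict[int, float], tolerance: float = 0.2) -> List[int]:
    """Windows whose O/E ratio departs from 1 by more than the tolerance."""
    return [w for w, ratio in oe.items() if abs(ratio - 1.0) > tolerance]
```

An O/E ratio well above 1 means the model under-predicts events in that window; in an auto-updating framework such a flag could trigger refitting or recalibration, although the abstract does not say which update rule the authors used.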
Affiliation(s)
- Todd J Levy
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Kevin Coppa
- Clinical Digital Solutions, Northwell Health, New Hyde Park, NY, 11042, USA
- Jinxuan Cang
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Douglas P Barnaby
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Marc D Paradis
- Northwell Holdings, Northwell Health, Manhasset, NY, 11030, USA
- Stuart L Cohen
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Alex Makhnevich
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- David van Klaveren
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, Netherlands
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- David M Kent
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Karina W Davidson
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Jamie S Hirsch
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Clinical Digital Solutions, Northwell Health, New Hyde Park, NY, 11042, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Theodoros P Zanos
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
6
Feng H, Wang H, Xu L, Ren Y, Ni Q, Yang Z, Ma S, Deng Q, Chen X, Xia B, Kuang Y, Li X. Prediction of radiation-induced acute skin toxicity in breast cancer patients using data encapsulation screening and dose-gradient-based multi-region radiomics technique: A multicenter study. Front Oncol 2022; 12:1017435. [PMID: 36439515 PMCID: PMC9686850 DOI: 10.3389/fonc.2022.1017435] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/12/2022] [Accepted: 09/27/2022] [Indexed: 11/11/2022]
Abstract
Purpose Radiation-induced dermatitis is one of the most common side effects for breast cancer patients treated with radiation therapy (RT). Acute complications can have a considerable impact on tumor control and quality of life for breast cancer patients. In this study, we aimed to develop a novel quantitative, high-accuracy machine learning tool for predicting radiation-induced dermatitis of grade ≥ 2 (RD 2+) before RT, using data encapsulation screening and multi-region dose-gradient-based radiomics techniques based on the pre-treatment planning computed tomography (CT) images and the clinical and dosimetric information of breast cancer patients. Methods and Materials A total of 214 patients with breast cancer who underwent RT between 2018 and 2021 were retrospectively collected from 3 cancer centers in China. The CT images, as well as the clinical and dosimetric information of patients, were retrieved from the medical records. Three PTV dose-related ROIs (the irradiated volumes covered by 100%, 105%, and 108% of the prescribed dose) and three skin dose-related ROIs (the irradiated volumes within the skin covered by the 20-Gy, 30-Gy, and 40-Gy isodose lines) were contoured for radiomics feature extraction. A total of 4280 radiomics features were extracted from all 6 ROIs. In addition, 29 clinical and dosimetric characteristics were included in the data analysis. A data encapsulation screening algorithm was applied for data cleaning. Multivariable logistic regression and a gradient boosting decision tree (GBDT) with 5-fold cross-validation were employed for model training and validation, and the models were evaluated using receiver operating characteristic analysis.
Results The best predictor of symptomatic RD 2+ was the combination of 20 radiomics features and 8 clinical and dosimetric variables, which achieved an area under the curve (AUC) of 0.998 [95% CI: 0.996-1.0] in the training dataset and 0.911 [95% CI: 0.838-0.983] in the validation dataset with the 5-fold cross-validation GBDT model. The 12 most important characteristics for RD 2+ prediction in the GBDT model were also identified, and their corresponding importance measures were calculated. Conclusions A novel multi-region dose-gradient-based GBDT machine learning framework integrating a random-forest-based data encapsulation screening method can achieve high-accuracy prediction of acute RD 2+ in breast cancer patients.
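The six dose-related ROIs can be pictured as threshold masks on the planning dose grid. A minimal Python sketch, assuming a flattened per-voxel dose array in Gy and a matching boolean skin mask; the function names are hypothetical, and thresholding the whole grid (rather than, for example, only voxels inside the PTV) is a simplifying assumption rather than the authors' contouring procedure.

```python
from typing import Dict, List, Sequence


def dose_roi_masks(dose_gy: Sequence[float], skin: Sequence[bool],
                   prescription_gy: float) -> Dict[str, List[bool]]:
    """Boolean voxel masks for three prescription-relative and three skin isodose ROIs."""
    if len(dose_gy) != len(skin):
        raise ValueError("dose grid and skin mask must have the same number of voxels")
    masks: Dict[str, List[bool]] = {}
    for pct in (100, 105, 108):
        # Voxels receiving at least pct% of the prescribed dose.
        cutoff = prescription_gy * pct / 100
        masks[f"V{pct}%"] = [d >= cutoff for d in dose_gy]
    for gy in (20, 30, 40):
        # Skin voxels enclosed by the given absolute isodose line.
        masks[f"skin_{gy}Gy"] = [d >= gy and s for d, s in zip(dose_gy, skin)]
    return masks


def roi_volume_cc(mask: Sequence[bool], voxel_cc: float) -> float:
    """Volume of a mask, given the volume of one voxel in cubic centimetres."""
    return sum(mask) * voxel_cc
```

Radiomics features would then be extracted from the CT intensities inside each mask, which is why the higher-dose ROIs are nested inside the lower-dose ones.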
Affiliation(s)
- Huichun Feng
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Patient follow-up center, Hangzhou Cancer Hospital, Hangzhou, China
- Hui Wang
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Department of Radiotherapy, Affiliated Hangzhou Cancer Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Lixia Xu
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Department of Radiotherapy, Affiliated Hangzhou Cancer Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yao Ren
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Department of Radiotherapy, Affiliated Hangzhou Cancer Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qianxi Ni
- Department of Radiology, Hunan Cancer Hospital, Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, China
- Zhen Yang
- Department of Radiotherapy, Xiangya Hospital Central South University, Changsha, China
- Shenglin Ma
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Medical Oncology, Xiaoshan Hospital Affiliated to Hangzhou Normal University, Hangzhou, China
- Qinghua Deng
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Patient follow-up center, Hangzhou Cancer Hospital, Hangzhou, China
- Xueqin Chen
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Patient follow-up center, Hangzhou Cancer Hospital, Hangzhou, China
- Bing Xia
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Patient follow-up center, Hangzhou Cancer Hospital, Hangzhou, China
- Yu Kuang
- Medical Physics Program, University of Nevada, Las Vegas, NV, United States
- *Correspondence: Xiadong Li; Yu Kuang
- Xiadong Li
- Medical Imaging and Translational Medicine Laboratory, Hangzhou Cancer Center, Hangzhou, China
- Department of Radiotherapy, Affiliated Hangzhou Cancer Hospital, Zhejiang University School of Medicine, Hangzhou, China
- *Correspondence: Xiadong Li; Yu Kuang
7
Machine learning methods to predict 30-day hospital readmission outcome among US adults with pneumonia: analysis of the national readmission database. BMC Med Inform Decis Mak 2022; 22:288. [DOI: 10.1186/s12911-022-01995-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022]
Abstract
Background
Hospital readmissions for pneumonia are a growing concern in the US, with significant consequences for costs and quality of care. This study developed a rule-based model and other machine learning (ML) models to predict 30-day readmission risk in patients with pneumonia and compared their performance.
Methods
This population-based study included patients aged ≥ 18 years hospitalized with pneumonia from January 1, 2016, through November 30, 2016, using the Healthcare Cost and Utilization Project Nationwide Readmissions Database (HCUP-NRD). Rule-based algorithms and other ML algorithms, specifically decision trees, random forest, extreme gradient boosting (XGBoost), and the Least Absolute Shrinkage and Selection Operator (LASSO), were used to model all-cause readmissions within 30 days of discharge from the index pneumonia hospitalization. A total of 61 clinically relevant variables were included for ML model development. Models were trained on a random 50% partition of the data and evaluated on the remaining 50%. Model hyperparameters were tuned using ten-fold cross-validation on the resampled training dataset. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were calculated on the testing set to evaluate model performance.
Results
Of the 372,293 patients with an index hospitalization for pneumonia, 48,280 (12.97%) were readmitted within 30 days. Judged by AUROC in the testing data, the rule-based model (0.6591) significantly outperformed the decision tree (0.5783, p < 0.001), random forest (0.6509, p < 0.01), and LASSO (0.6087, p < 0.001) but was slightly inferior to XGBoost (0.6606, p = 0.015). The AUPRC of the rule-based model in the testing data (0.2146) was higher than that of the decision tree (0.1560), random forest (0.2052), and LASSO (0.2042) and similar to that of XGBoost (0.2147). The top risk-predictive rules captured by the rule-based algorithm involved comorbidities, illness severity, disposition locations, payer type, age, and length of stay. The other ML models also assigned high variable importance to these risk factors.
Conclusion
The performance of machine learning models for predicting readmission in pneumonia patients varied. XGBoost outperformed the rule-based model on AUROC. However, the important risk factors for predicting readmission were consistent across ML models.
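AUROC and AUPRC can be computed directly from predicted risks and observed readmissions. A minimal Python sketch using standard definitions: AUROC as the probability that a randomly chosen readmitted patient is scored above a randomly chosen non-readmitted one (ties count one half), and AUPRC approximated by average precision. The function names are illustrative, and the pairwise loop is quadratic, so it only shows the definitions; at the scale of the NRD a sorted, rank-based implementation would be needed.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at the rank of each positive (ties kept in input order)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```

A classifier with no skill has an expected AUPRC equal to the prevalence (12.97% here), so the AUPRC values near 0.21 should be read against that baseline rather than against 1.0.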
8
Construct and Validate a Predictive Model for Surgical Site Infection after Posterior Lumbar Interbody Fusion Based on Machine Learning Algorithm. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:2697841. [PMID: 36050996 PMCID: PMC9427297 DOI: 10.1155/2022/2697841] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/11/2022] [Revised: 07/28/2022] [Accepted: 08/06/2022] [Indexed: 11/17/2022]
Abstract
Purpose. Surgical site infection is one of the serious complications after lumbar fusion. Early prediction and timely intervention can reduce harm to patients. The aims of this study were to construct and validate a machine learning model for predicting surgical site infection after posterior lumbar interbody fusion, to identify the most important risk factors for surgical site infection, and to explore whether the synthetic minority oversampling technique (SMOTE) could improve model performance. Method. This study reviewed 584 patients who underwent posterior lumbar interbody fusion for degenerative lumbar disease at our center from January 2019 to August 2021. Clinical information and laboratory test data were collected from the electronic medical records. The original dataset was divided into training and validation sets in a 1:1 ratio. Seven machine learning algorithms were used to develop predictive models, and the training set of each model was resampled using SMOTE. Finally, model performance was assessed in the validation set. Results. Of the 584 patients, 33 (5.65%) developed surgical site infection. Stepwise logistic regression showed that preoperative albumin level (OR 0.659, 95% CI 0.563-0.756), diabetes (OR 9.129, 95% CI 3.816-23.126), intraoperative dural tear (OR 8.436, 95% CI 2.729-25.334), and rheumatic disease (OR 8.471, 95% CI 1.743-39.567) were significant predictors of surgical site infection. The AdaBoost Classification Trees model performed best among the seven machine learning models, and SMOTE improved the performance of all models. Conclusion. The prediction model we constructed based on machine learning and SMOTE can accurately predict surgical site infection, which supports clinical decision-making and the optimization of perioperative management.
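SMOTE creates synthetic minority cases by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal pure-Python sketch of that idea; the function name and the default k = 5 are assumptions, since the study's software and settings are not stated. As in the abstract, it would be applied to the training set only, so the validation set keeps its real class balance.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (Euclidean distance), excluding x itself.
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Because each new point lies on a segment between two real infection cases, the synthetic samples stay inside the region the minority class already occupies. In practice, the imbalanced-learn package provides a maintained `SMOTE` class.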
9
Makhalova J, Medina Villalon S, Wang H, Giusiano B, Woodman M, Bénar C, Guye M, Jirsa V, Bartolomei F. Virtual Epileptic Patient brain modeling: relationships with seizure onset and surgical outcome. Epilepsia 2022; 63:1942-1955. [PMID: 35604575 PMCID: PMC9543509 DOI: 10.1111/epi.17310] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/26/2022] [Revised: 05/20/2022] [Accepted: 05/20/2022] [Indexed: 11/29/2022]
Abstract
Objective The virtual epileptic patient (VEP) is a large‐scale brain modeling method based on virtual brain technology, using stereoelectroencephalography (SEEG), anatomical data (magnetic resonance imaging [MRI] and connectivity), and a computational neuronal model to provide computer simulations of a patient's seizures. VEP is potentially useful in the presurgical evaluation of drug‐resistant epilepsy for identifying the regions most likely to generate seizures. We aimed to assess the performance of the VEP approach in estimating the epileptogenic zone and in predicting surgical outcome. Methods VEP modeling was retrospectively applied in a cohort of 53 patients with pharmacoresistant epilepsy and available SEEG, T1‐weighted MRI, and diffusion‐weighted MRI. Precision and recall were used to compare the regions identified as epileptogenic by VEP (EZVEP) with the epileptogenic zone defined by clinical analysis incorporating the Epileptogenicity Index (EI) method (EZC). In 28 operated patients, we compared the VEP results and clinical analysis with surgical outcome. Results VEP showed a precision of 64% and a recall of 44% for EZVEP detection compared to EZC. VEP predictions were more concordant with clinical results in seizure‐free patients, with higher precision (77%) than in non‐seizure‐free patients. Although the completeness of resection was significantly correlated with surgical outcome for both EZC and EZVEP, significantly more regions defined as epileptogenic exclusively by VEP remained nonresected in non‐seizure‐free patients. Significance VEP is the first computational model that estimates the extent and organization of the epileptogenic zone network. It is characterized by good precision in detecting epileptogenic regions as defined by a combination of visual analysis and EI. The potential of VEP to improve surgical prognosis remains to be exploited.
Analysis of the factors limiting the performance of the current model is crucial for its further development.
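The precision and recall used to compare EZVEP with EZC are overlaps between two sets of brain regions. A minimal Python sketch; the region labels in the usage lines are made-up placeholders, and the helper that counts VEP-only regions left unresected is an illustrative reading of the Results, not the authors' code.

```python
from typing import Iterable, Tuple


def region_precision_recall(predicted: Iterable[str],
                            reference: Iterable[str]) -> Tuple[float, float]:
    """Precision = shared / predicted regions; recall = shared / reference regions."""
    pred, ref = set(predicted), set(reference)
    shared = len(pred & ref)
    precision = shared / len(pred) if pred else 0.0
    recall = shared / len(ref) if ref else 0.0
    return precision, recall


def vep_only_unresected(vep: Iterable[str], clinical: Iterable[str],
                        resected: Iterable[str]) -> int:
    """Number of regions flagged only by VEP that were not removed at surgery."""
    return len(set(vep) - set(clinical) - set(resected))


if __name__ == "__main__":
    # Hypothetical region labels, for illustration only.
    ez_vep = {"hippocampus", "amygdala", "insula"}
    ez_c = {"hippocampus", "amygdala", "temporal_pole", "entorhinal"}
    print(region_precision_recall(ez_vep, ez_c))
    print(vep_only_unresected(ez_vep, ez_c, resected={"hippocampus", "amygdala"}))
```

With these placeholder sets, two of the three VEP regions are also clinical regions (precision 2/3), and two of the four clinical regions are recovered (recall 0.5).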
Affiliation(s)
- Julia Makhalova
- APHM, Timone Hospital, Epileptology and Cerebral Rhythmology, Marseille, France; Aix Marseille Univ, CNRS, CRMBM, Marseille, France; APHM, Timone Hospital, CEMEREM, Marseille, France
- Samuel Medina Villalon
- APHM, Timone Hospital, Epileptology and Cerebral Rhythmology, Marseille, France; Aix Marseille Univ, INSERM, INS, Inst Neurosci Syst, Marseille, France
- Huifang Wang
- Aix Marseille Univ, INSERM, INS, Inst Neurosci Syst, Marseille, France
- Bernard Giusiano
- Aix Marseille Univ, INSERM, INS, Inst Neurosci Syst, Marseille, France; APHM, Public Health Department, Marseille, France
- Marmaduke Woodman
- Aix Marseille Univ, INSERM, INS, Inst Neurosci Syst, Marseille, France
- Christian Bénar
- Aix Marseille Univ, INSERM, INS, Inst Neurosci Syst, Marseille, France
- Maxime Guye
- APHM, Timone Hospital, Epileptology and Cerebral Rhythmology, Marseille, France; Aix Marseille Univ, CNRS, CRMBM, Marseille, France; APHM, Timone Hospital, CEMEREM, Marseille, France
- Viktor Jirsa
- Aix Marseille Univ, INSERM, INS, Inst Neurosci Syst, Marseille, France
- Fabrice Bartolomei
- APHM, Timone Hospital, Epileptology and Cerebral Rhythmology, Marseille, France; Aix Marseille Univ, INSERM, INS, Inst Neurosci Syst, Marseille, France
10
Tarimo CS, Bhuyan SS, Zhao Y, Ren W, Mohammed A, Li Q, Gardner M, Mahande MJ, Wang Y, Wu J. Prediction of low Apgar score at five minutes following labor induction intervention in vaginal deliveries: machine learning approach for imbalanced data at a tertiary hospital in North Tanzania. BMC Pregnancy Childbirth 2022; 22:275. [PMID: 35365129 PMCID: PMC8976377 DOI: 10.1186/s12884-022-04534-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 02/28/2022] [Indexed: 11/18/2022]
Abstract
Background Predicting a low Apgar score for vaginal deliveries following labor induction is critical for improving neonatal health outcomes. We set out to investigate important attributes and to train popular machine learning (ML) algorithms to correctly classify neonates with low Apgar scores from an imbalanced-learning perspective. Methods We analyzed 7716 induced vaginal deliveries from the electronic birth registry of the Kilimanjaro Christian Medical Centre (KCMC), of which 733 (9.5%) were neonates with a low (< 7) Apgar score. An extra-trees classifier was used to assess feature importance. We used the area under the curve (AUC), recall, precision, F-score, Matthews correlation coefficient (MCC), balanced accuracy (BA), bookmaker informedness (BM), and markedness (MK) to evaluate the performance of six selected machine learning classifiers. To address class imbalance, we examined three widely used resampling techniques: the Synthetic Minority Oversampling Technique (SMOTE), Random Oversampling Examples (ROS), and random undersampling (RUS). We applied decision curve analysis (DCA) to evaluate the net benefit of the selected classifiers. Results Birth weight, maternal age, and gestational age were important predictors of a low Apgar score following induced vaginal delivery. In all models investigated, SMOTE, ROS, and RUS improved recall more than the other metrics. A slight improvement was observed in the F1 score, BA, and BM. DCA revealed potential benefits of the boosting method for predicting low Apgar scores among the tested models. Conclusion More algorithms should be tested to develop theoretical guidance on which rebalancing techniques are most effective for this particular imbalance ratio. Future research should prioritize a debate on which performance indicators to rely on when dealing with imbalanced or skewed data.
Supplementary Information The online version contains supplementary material available at 10.1186/s12884-022-04534-0.
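The abstract evaluates its classifiers with several imbalance-aware metrics (recall, precision, F-score, BA, BM, MK, MCC). As a minimal sketch, assuming a binary confusion matrix with the low-Apgar class treated as positive, the standard definitions can be computed as below. The function name `imbalance_metrics` and the counts are hypothetical illustrations, not code or data from the study.

```python
import math


def imbalance_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Binary-classification metrics from a 2x2 confusion matrix.

    Assumption: the minority class (low Apgar score) is the positive class.
    Zero denominators are not handled, so every row and column total
    of the confusion matrix must be non-zero.
    """
    recall = tp / (tp + fn)          # sensitivity / true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)       # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    ba = (recall + specificity) / 2  # balanced accuracy
    bm = recall + specificity - 1    # bookmaker informedness
    mk = precision + npv - 1         # markedness
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"recall": recall, "precision": precision, "f1": f1,
            "ba": ba, "bm": bm, "mk": mk, "mcc": mcc}


# Hypothetical counts, not taken from the study: 10 low-Apgar and 90 normal neonates.
metrics = imbalance_metrics(tp=8, fn=2, fp=4, tn=86)
print({name: round(value, 3) for name, value in metrics.items()})
```

These metrics suit skewed data because they do not reward predicting only the majority class: such a classifier would reach 90% accuracy on these counts but a BM of 0. The magnitude of MCC is the geometric mean of BM and MK (MCC² = BM × MK), which ties the last three metrics together.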
Affiliation(s)
- Clifford Silver Tarimo
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China; Department of Science and Laboratory Technology, Dar es Salaam Institute of Technology, P.O. Box 2958, Dar es Salaam, Tanzania
- Soumitra S Bhuyan
- Rutgers University-New Brunswick, Edward J. Bloustein, School of Planning and Public Policy, New Brunswick, USA
- Yizhen Zhao
- Luoyang Orthopedic Traumatological Hospital of Henan Province, Luoyang, China
- Weicun Ren
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China; College of Sanquan, Xinxiang Medical University, Xinxiang, People's Republic of China
- Akram Mohammed
- Center for Biomedical Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Quanman Li
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China
- Marilyn Gardner
- Department of Public Health, Western Kentucky University, 1906 College Heights Blvd, Bowling Green, KY, 42101, USA
- Michael Johnson Mahande
- Institute of Public Health, Kilimanjaro Christian Medical University College, P.O. Box 2240, Moshi, Tanzania
- Yuhui Wang
- Centre for Financial and Corporate Integrity, Coventry University, Coventry, UK
- Jian Wu
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, 450001, Henan, China; Henan Province Engineering Research Center of Health Economics & Health Technology Assessment, Henan Province, China.