1
Kim JS, Kwon D, Kim K, Lee SH, Lee SB, Kim K, Kim D, Lee MW, Park N, Choi JH, Jang ES, Cho IR, Paik WH, Lee JK, Ryu JK, Kim YT. Machine learning-based prediction of pulmonary embolism to reduce unnecessary computed tomography scans in gastrointestinal cancer patients: a retrospective multicenter study. Sci Rep 2024; 14:25359. [PMID: 39455658 PMCID: PMC11511972 DOI: 10.1038/s41598-024-75977-y] [Received: 02/25/2024] [Accepted: 10/09/2024] [Indexed: 10/28/2024]
Abstract
This study aimed to develop a machine learning (ML) model for predicting pulmonary embolism (PE) in patients with gastrointestinal cancers, a group at increased risk for PE. We conducted a retrospective, multicenter study analyzing patients who underwent computed tomographic pulmonary angiography (CTPA) between 2010 and 2020. The study utilized demographic and clinical data, including the Wells score and D-dimer levels, to train a random forest ML model. The model's effectiveness was assessed using the area under the receiver operating characteristic curve (AUROC). In total, 446 patients from hospital A and 139 from hospital B were included. The training set consisted of 356 patients from hospital A, with internal validation on the remaining 90 and external validation on the 139 patients from hospital B. The model achieved an AUROC of 0.736 in hospital A and 0.669 in hospital B. The ML model significantly reduced the proportion of patients recommended for CTPA compared with the conventional diagnostic strategy (hospital A: 100.0% vs. 91.1%, P < 0.001; hospital B: 100.0% vs. 93.5%, P = 0.003). The results indicate that an ML-based prediction model can reduce unnecessary CTPA procedures in gastrointestinal cancer patients, highlighting its potential to enhance diagnostic efficiency and reduce patient burden.
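The AUROC reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition in plain Python; the function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    `labels` holds 0/1 values; `scores` holds the model outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 2 negatives, 2 positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

This quadratic-time version is only meant to make the definition concrete; a rank-based formulation would scale better on large cohorts.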
Affiliation(s)
- Joo Seong Kim
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Department of Internal Medicine, Dongguk University College of Medicine, Dongguk University Ilsan Hospital, Goyang-si, Korea
- Doyun Kwon
- Interdisciplinary Program of Medical Informatics, Seoul National University College of Medicine, Seoul, Korea
- Kyungdo Kim
- Department of Biomedical Engineering, Pratt School of Engineering, Duke University, Durham, NC, 27708, USA
- Transdisciplinary Department of Medicine & Advanced Technology, Seoul National University Hospital, Seoul, Korea
- Sang Hyub Lee
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Seung-Bo Lee
- Department of Medical Informatics, Keimyung University School of Medicine, 1095, Dalgubeol-daero, Dalseo-gu, Daegu, 42601, Republic of Korea
- Kwangsoo Kim
- Transdisciplinary Department of Medicine & Advanced Technology, Seoul National University Hospital, Seoul, Korea
- Department of Medicine, Seoul National University College of Medicine, Seoul, Korea
- Dongmin Kim
- Biomedical Research Institute, Seoul National University Hospital, Seoul, Korea
- Min Woo Lee
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Namyoung Park
- Department of Medicine, Kyung Hee University Gangdong Hospital, Seoul, Korea
- Jin Ho Choi
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Eun Sun Jang
- Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam-si, Korea
- In Rae Cho
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Woo Hyun Paik
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Jun Kyu Lee
- Department of Internal Medicine, Dongguk University College of Medicine, Dongguk University Ilsan Hospital, Goyang-si, Korea
- Ji Kon Ryu
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
- Yong-Tae Kim
- Department of Internal Medicine and Liver Research Institute, Seoul National University Hospital, Seoul National University College of Medicine, Seoul, Korea
2
Wu J, Ge Y, Chen K, Chen S, Yang J, Yuan H. Machine Learning Diagnostic Model for Early Stage NSTEMI: Using hs-cTnI 1/2h Changes and Multiple Cardiovascular Biomarkers. Diagnostics (Basel) 2024; 14:2322. [PMID: 39451645 PMCID: PMC11506866 DOI: 10.3390/diagnostics14202322] [Received: 09/01/2024] [Revised: 10/15/2024] [Accepted: 10/15/2024] [Indexed: 10/26/2024]
Abstract
BACKGROUND This study demonstrates differences in the distribution of multiple cardiovascular biomarkers between non-ST-segment elevation myocardial infarction (NSTEMI) and unstable angina (UA) patients. Diagnostic machine learning models built on measurements taken at admission and 1-2 h post-admission achieved competitive predictive results. OBJECTIVE This study aims to explore the diagnostic value of changes in high-sensitivity cardiac troponin I (hs-cTnI) levels in patients with suspected NSTEMI. METHODS A total of 267 patients presented with chest pain, requiring confirmation of acute coronary syndrome (ACS) subtypes (NSTEMI vs. UA). Hs-cTnI and other cardiac markers, such as creatine kinase-MB (CK-MB) and myoglobin (Myo), were analyzed. Machine learning techniques were employed to assess the application of hs-cTnI level changes in the clinical diagnosis of NSTEMI. RESULTS Levels of CK-MB, Myo, hs-cTnI measured at admission, hs-cTnI measured 1-2 h after admission, and NT-proBNP in NSTEMI patients were significantly higher than those in UA patients (p < 0.001). There was a positive correlation between hs-cTnI and CK-MB, as well as Myo (R = 0.72, R = 0.51, R = 0.60). The optimal diagnostic model, Hybiome_1/2h, demonstrated an F1-Score of 0.74, an AUROC of 0.96, and an AP of 0.89. CONCLUSIONS This study confirms the significant value of hs-cTnI as a sensitive marker of myocardial injury in the diagnosis of NSTEMI. Continuous monitoring of hs-cTnI levels enhances the accuracy of distinguishing NSTEMI from UA. The models indicate that the Hybiome hs-cTnI assays perform comparably well to the Beckman assays in predicting NSTEMI. Moreover, incorporating hs-cTnI measurements taken 1-2 h post-admission significantly enhances the model's effectiveness.
Affiliation(s)
- Hui Yuan
- Department of Clinical Laboratory in Beijing Anzhen Hospital, Affiliated Hospital of Capital Medical University, Beijing 100029, China; (J.W.); (Y.G.); (K.C.); (S.C.); (J.Y.)
3
Zhang X, Lin Y, He D, Sun M, Xu L, Chang Z, Liu Z, Li B. 18F-Fluoro-2-Deoxyglucose Positron Emission Tomography/Computed Tomography Measures of Spatial Heterogeneity for Predicting Platinum Resistance of High-Grade Serous Ovarian Cancer. Cancer Med 2024; 13:e70287. [PMID: 39435561 PMCID: PMC11494247 DOI: 10.1002/cam4.70287] [Received: 04/23/2024] [Revised: 08/02/2024] [Accepted: 09/22/2024] [Indexed: 10/23/2024]
Abstract
BACKGROUND The purpose of this study is to construct models for predicting platinum resistance in high-grade serous ovarian cancer (HGSOC) derived from quantitative spatial heterogeneity indicators obtained from 18F-FDG PET/CT images. METHODS A retrospective study was conducted on patients diagnosed with HGSOC. Quantitative indicators of spatial heterogeneity were generated using conventional features and Haralick texture features from both CT and PET images. Three groups of predictive models (conventional, heterogeneity, and integrated) were built. Each group's optimal model was the one with the highest area under the curve (AUC). Postoperative immunohistochemical staining for Ki-67 and p53 was conducted. The correlation between the heterogeneity indicators and scores for Ki-67 and p53 was assessed by Spearman's correlation coefficient (ρ). RESULTS A total of 286 patients (54.6 ± 9.3 years) were enrolled, and 107 spatial heterogeneity indicators were extracted. The optimal models for each group were obtained using the Gradient Boosting Machine (GBM) algorithm. For the validation set, the conventional model achieved an AUC of 0.790 (95% CI: 0.696, 0.885) and the heterogeneity model an AUC of 0.904 (95% CI: 0.842, 0.966). The integrated model achieved the highest predictive performance, with an AUC of 0.928 (95% CI: 0.872, 0.984) for the validation set. Spearman's correlation showed that HU_Kurtosis had the strongest correlation with p53 scores (ρ = 0.718), while cluster site entropy had the strongest correlation with Ki-67 scores (ρ = 0.753). CONCLUSIONS Adding quantitative spatial heterogeneity indicators derived from PET/CT images can improve the prediction of platinum resistance in patients with HGSOC. Spatial heterogeneity indicators were related to Ki-67 and p53 scores.
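Spearman's ρ, used in this abstract to relate heterogeneity indicators to Ki-67 and p53 scores, is the Pearson correlation computed on ranks. The sketch below shows that calculation in plain Python with average ranks for ties; the helper names and the toy vectors are illustrative assumptions and do not reproduce the study's data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank variance is zero (constant input)")
    return cov / sqrt(vx * vy)


# Hypothetical toy vectors (not study data): a monotone relation gives rho = 1.
rho_monotone = spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
```

Because only ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.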
Affiliation(s)
- Xin Zhang
- Department of General Surgery, Shengjing Hospital of China Medical University, Shenyang, Liaoning, People's Republic of China
- Yuhe Lin
- Department of Oncology, Shengjing Hospital of China Medical University, Shenyang, Liaoning, People's Republic of China
- Dianning He
- School of Health Management, China Medical University, Shenyang, Liaoning, People's Republic of China
- Mingli Sun
- Department of Obstetrics and Gynecology, Shengjing Hospital of China Medical University, Shenyang, Liaoning, People's Republic of China
- Lanlan Xu
- Department of Radiology, Shengjing Hospital of China Medical University, Shenyang, Liaoning, People's Republic of China
- Zhihui Chang
- Department of Radiology, Shengjing Hospital of China Medical University, Shenyang, Liaoning, People's Republic of China
- Zhaoyu Liu
- Department of Radiology, Shengjing Hospital of China Medical University, Shenyang, Liaoning, People's Republic of China
- Beibei Li
- Department of Radiology, Shengjing Hospital of China Medical University, Shenyang, Liaoning, People's Republic of China
4
Li W, Peng Y, Peng K. Diabetes prediction model based on GA-XGBoost and stacking ensemble algorithm. PLoS One 2024; 19:e0311222. [PMID: 39348356 PMCID: PMC11441666 DOI: 10.1371/journal.pone.0311222] [Received: 07/12/2024] [Accepted: 09/16/2024] [Indexed: 10/02/2024]
Abstract
Diabetes, as an incurable lifelong chronic disease, has profound and far-reaching effects on patients. Given this, early intervention is particularly crucial, as it can not only significantly improve the prognosis of patients but also provide valuable reference information for clinical treatment. This study selected the BRFSS (Behavioral Risk Factor Surveillance System) dataset, which is publicly available on the Kaggle platform, as the research object, aiming to provide a scientific basis for the early diagnosis and treatment of diabetes through advanced machine learning techniques. Firstly, the dataset was balanced using various sampling methods; secondly, a Stacking model based on GA-XGBoost (XGBoost model optimized by genetic algorithm) was constructed for the risk prediction of diabetes; finally, the interpretability of the model was deeply analyzed using Shapley values. The results show: (1) Random oversampling, ADASYN, SMOTE, and SMOTEENN were used for data balance processing, among which SMOTEENN showed better efficiency and effect in dealing with data imbalance. (2) The GA-XGBoost model optimized the hyperparameters of the XGBoost model through a genetic algorithm to improve the model's predictive accuracy. Combined with the better-performing LightGBM model and random forest model, a two-layer Stacking model was constructed. This model not only outperforms single machine learning models in predictive effect but also provides a new idea and method in the field of model integration. (3) Shapley value analysis identified features that have a significant impact on the prediction of diabetes, such as age and body mass index. This analysis not only enhances the transparency of the model but also provides more precise treatment decision support for doctors and patients. 
In summary, this study has not only improved the accuracy of predicting the risk of diabetes by adopting advanced machine learning techniques and model integration strategies but also provided a powerful tool for the early diagnosis and personalized treatment of diabetes.
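Of the four balancing methods named in this abstract, random oversampling is the simplest to show in a few lines: minority-class rows are duplicated at random until every class matches the largest one. The sketch below is a plain-Python illustration of that idea only. It is not SMOTEENN (the method the abstract reports as best), and the function name, seed, and toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size.

    Original rows are kept in place; sampled duplicates are appended at the end.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        candidates = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(candidates)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data (not BRFSS): four majority rows, one minority row.
X_toy = [[1], [2], [3], [4], [5]]
y_toy = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X_toy, y_toy)
```

In practice, resampling of this kind is usually applied to the training split only, so that evaluation data keep the original class distribution.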
Affiliation(s)
- Wenguang Li
- College of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin, China
- Yan Peng
- College of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin, China
- Ke Peng
- College of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin, China
5
Xu L, Li C, Gao S, Zhao L, Guan C, Shen X, Zhu Z, Guo C, Zhang L, Yang C, Bu Q, Zhou B, Xu Y. Personalized Prediction of Long-Term Renal Function Prognosis Following Nephrectomy Using Interpretable Machine Learning Algorithms: Case-Control Study. JMIR Med Inform 2024; 12:e52837. [PMID: 39303280 PMCID: PMC11452755 DOI: 10.2196/52837] [Received: 09/18/2023] [Revised: 04/08/2024] [Accepted: 07/21/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND Acute kidney injury (AKI) is a common adverse outcome following nephrectomy. The progression from AKI to acute kidney disease (AKD) and subsequently to chronic kidney disease (CKD) remains a concern; yet, the predictive mechanisms for these transitions are not fully understood. Interpretable machine learning (ML) models offer insights into how clinical features influence long-term renal function outcomes after nephrectomy, providing a more precise framework for identifying patients at risk and supporting improved clinical decision-making processes. OBJECTIVE This study aimed to (1) evaluate postnephrectomy rates of AKI, AKD, and CKD, analyzing long-term renal outcomes along different trajectories; (2) interpret AKD and CKD models using Shapley Additive Explanations values and Local Interpretable Model-Agnostic Explanations algorithm; and (3) develop a web-based tool for estimating AKD or CKD risk after nephrectomy. METHODS We conducted a retrospective cohort study involving patients who underwent nephrectomy between July 2012 and June 2019. Patient data were randomly split into training, validation, and test sets, maintaining a ratio of 76.5:8.5:15. Eight ML algorithms were used to construct predictive models for postoperative AKD and CKD. The performance of the best-performing models was assessed using various metrics. We used various Shapley Additive Explanations plots and Local Interpretable Model-Agnostic Explanations bar plots to interpret the model and generated directed acyclic graphs to explore the potential causal relationships between features. Additionally, we developed a web-based prediction tool using the top 10 features for AKD prediction and the top 5 features for CKD prediction. RESULTS The study cohort comprised 1559 patients. Incidence rates for AKI, AKD, and CKD were 21.7% (n=330), 15.3% (n=238), and 10.6% (n=165), respectively. 
Among the evaluated ML models, the Light Gradient-Boosting Machine (LightGBM) model demonstrated superior performance, with an area under the receiver operating characteristic curve of 0.97 for AKD prediction and 0.96 for CKD prediction. Performance metrics and plots highlighted the model's competence in discrimination, calibration, and clinical applicability. Operative duration, hemoglobin, blood loss, urine protein, and hematocrit were identified as the top 5 features associated with predicted AKD. Baseline estimated glomerular filtration rate, pathology, trajectories of renal function, age, and total bilirubin were the top 5 features associated with predicted CKD. Additionally, we developed a web application using the LightGBM model to estimate AKD and CKD risks. CONCLUSIONS An interpretable ML model effectively elucidated its decision-making process in identifying patients at risk of AKD and CKD following nephrectomy by enumerating critical features. The web-based calculator, built on the LightGBM model, can assist in formulating more personalized and evidence-based clinical strategies.
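The 76.5:8.5:15 ratio quoted in the methods is arithmetically what results from holding out 15% for testing and then taking 10% of the remaining 85% for validation (0.85 × 0.90 = 0.765 and 0.85 × 0.10 = 0.085). The abstract does not say the split was produced this way, nor does it report the set sizes, so the sketch below is only an illustration of the arithmetic; the function name and the rounding rule (floor for validation and test, remainder to training) are assumptions.

```python
def split_sizes(n, permille=(765, 85, 150)):
    """Split n records into (train, validation, test) counts.

    `permille` gives each share in parts per thousand. Integer arithmetic is
    used so the result does not depend on floating-point rounding; validation
    and test are floored and the remainder goes to training (an assumption).
    """
    train_pm, val_pm, test_pm = permille
    if train_pm + val_pm + test_pm != 1000:
        raise ValueError("shares must sum to 1000 per mille")
    n_val = n * val_pm // 1000
    n_test = n * test_pm // 1000
    n_train = n - n_val - n_test
    return n_train, n_val, n_test


# Two-stage view of the same ratio, in parts per 10,000:
# 85% kept after the test hold-out, then split 90/10 into train/validation.
two_stage = (85 * 90, 85 * 10, 15 * 100)

# Cohort size taken from the abstract; the resulting counts are illustrative
# only, since the abstract does not report the actual set sizes.
n_train, n_val, n_test = split_sizes(1559)
```

Assigning the rounding remainder to the training set keeps the three counts summing exactly to the cohort size.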
Affiliation(s)
- Lingyu Xu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Chenyu Li
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Medizinische Klinik und Poliklinik IV, Klinikum der Universität, Munich, Germany
- Shuang Gao
- Ocean University of China, Qingdao, China
- Long Zhao
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Chen Guan
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Xuefei Shen
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Zhihui Zhu
- Center of Structural Heart Disease, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Cheng Guo
- Allianz Technology, Allianz, Munich, Germany
- Liwei Zhang
- Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Center Munich, Neuherberg, Germany
- Chengyu Yang
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Quandong Bu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Bin Zhou
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Yan Xu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
6
Iftikhar M, Saqib M, Qayyum SN, Asmat R, Mumtaz H, Rehan M, Ullah I, Ud-Din I, Noori S, Khan M, Rehman E, Ejaz Z. Artificial intelligence-driven transformations in diabetes care: a comprehensive literature review. Ann Med Surg (Lond) 2024; 86:5334-5342. [PMID: 39238969 PMCID: PMC11374247 DOI: 10.1097/ms9.0000000000002369] [Received: 04/28/2024] [Accepted: 07/05/2024] [Indexed: 09/07/2024]
Abstract
Artificial intelligence (AI) has been applied in healthcare for diagnosis, treatment, and disease management, and for studying the underlying mechanisms and complications of diseases such as diabetes and metabolic disorders. This review is a comprehensive overview of various applications of AI in the healthcare system for managing diabetes. A literature search was conducted on PubMed to locate studies integrating AI in the diagnosis, treatment, management, and prevention of diabetes. As diabetes is now considered a pandemic, AI and machine learning approaches can be applied to limit diabetes in areas with higher prevalence. Machine learning algorithms can visualize big datasets and make predictions. AI-powered mobile apps and closed-loop systems that automate glucose monitoring and insulin delivery can lower the burden of insulin management. AI can also help identify disease markers and potential risk factors. While promising, AI's integration in the medical field is still challenging due to concerns about privacy, data security, bias, and transparency. Overall, AI's potential can be harnessed for better patient outcomes through personalized treatment.
Affiliation(s)
- Muhammad Rehan
- Al-Nafees Medical College and Hospital, Islamabad, Pakistan
- Samim Noori
- Nangarhar University, Faculty of Medicine, Nangarhar, Afghanistan
7
Wang T, Tan J, Wang T, Xiang S, Zhang Y, Jian C, Jian J, Zhao W. A Real-World Study on the Short-Term Efficacy of Amlodipine in Treating Hypertension Among Inpatients. Pragmat Obs Res 2024; 15:121-137. [PMID: 39130528 PMCID: PMC11316486 DOI: 10.2147/por.s464439] [Received: 02/16/2024] [Accepted: 07/12/2024] [Indexed: 08/13/2024]
Abstract
Purpose Hospitalized hypertensive patients rely on blood pressure medication, yet there is limited research on the sole use of amlodipine, despite its proven efficacy in protecting target organs and reducing mortality. This study aims to identify key indicators influencing the efficacy of amlodipine, thereby enhancing treatment outcomes. Patients and Methods In this multicenter retrospective study, 870 hospitalized patients with primary hypertension exclusively received amlodipine for the first 5 days after admission, and their medical records contained comprehensive blood pressure records. They were categorized into success (n=479) and failure (n=391) groups based on average blood pressure control efficacy. Predictive models were constructed using six machine learning algorithms. Evaluation metrics encompassed the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). SHapley Additive exPlanations (SHAP) analysis assessed feature contributions to efficacy. Results All six machine learning models demonstrated superior predictive performance. Following variable reduction, the model predicting amlodipine efficacy was reconstructed using these algorithms, with the light gradient boosting machine (LightGBM) model achieving the highest overall performance (AUC = 0.803). Notably, amlodipine showed enhanced efficacy in patients with low platelet distribution width (PDW) values, as well as high hematocrit (HCT) and thrombin time (TT) values. Conclusion This study utilized machine learning to predict amlodipine's effectiveness in hypertension treatment, pinpointing key factors: HCT, PDW, and TT levels. Lower PDW, along with higher HCT and TT, correlated with enhanced treatment outcomes. This facilitates personalized treatment, particularly for hospitalized hypertensive patients undergoing amlodipine monotherapy.
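Apart from the AUC, the evaluation metrics listed in this abstract (accuracy, sensitivity, specificity, PPV, NPV) all follow from the four cells of a binary confusion matrix. A minimal sketch of those formulas is given below; the function name is an assumption, and the counts are hypothetical values chosen for easy arithmetic, not results from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. there is at least one actual
    positive, one actual negative, one predicted positive and one predicted
    negative.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts (not from the paper).
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Sensitivity and specificity depend only on the classifier's behaviour within each true class, whereas PPV and NPV also shift with the prevalence of the positive class in the evaluated cohort.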
Affiliation(s)
- Tingting Wang
- College of Medical Informatics, Chongqing Medical University, Chongqing, 400016, People’s Republic of China
- Juntao Tan
- Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, 401320, People’s Republic of China
- Tiantian Wang
- Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, 401320, People’s Republic of China
- Shoushu Xiang
- Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, 401320, People’s Republic of China
- Yang Zhang
- College of Medical Informatics, Chongqing Medical University, Chongqing, 400016, People’s Republic of China
- Chang Jian
- College of Medical Informatics, Chongqing Medical University, Chongqing, 400016, People’s Republic of China
- Jie Jian
- College of Medical Informatics, Chongqing Medical University, Chongqing, 400016, People’s Republic of China
- Wenlong Zhao
- College of Medical Informatics, Chongqing Medical University, Chongqing, 400016, People’s Republic of China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, People’s Republic of China
8
Xie ZW, He Y, Feng YX, Wang XH. Identification of programmed cell death-related genes and diagnostic biomarkers in endometriosis using a machine learning and Mendelian randomization approach. Front Endocrinol (Lausanne) 2024; 15:1372221. [PMID: 39149122 PMCID: PMC11324423 DOI: 10.3389/fendo.2024.1372221] [Received: 01/17/2024] [Accepted: 07/15/2024] [Indexed: 08/17/2024]
Abstract
Background Endometriosis (EM) is a prevalent gynecological disorder frequently associated with irregular menstruation and infertility. Programmed cell death (PCD) is pivotal in the pathophysiological mechanisms underlying EM. Despite this, the precise pathogenesis of EM remains poorly understood, leading to diagnostic delays. Consequently, identifying biomarkers associated with PCD is critical for advancing the diagnosis and treatment of EM. Methods This study used datasets from the Gene Expression Omnibus (GEO) to identify differentially expressed genes (DEGs) following preprocessing. By cross-referencing these DEGs with genes associated with PCD, differentially expressed PCD-related genes (DPGs) were identified. Enrichment analyses for KEGG and GO pathways were conducted on these DPGs. Additionally, Mendelian randomization and machine learning techniques were applied to identify biomarkers strongly associated with EM. Results The study identified three pivotal biomarkers: TNFSF12, AP3M1, and PDK2, and established a diagnostic model for EM based on these genes. The results revealed a marked upregulation of TNFSF12 and PDK2 in EM samples, coupled with a significant downregulation of AP3M1. Single-cell analysis further underscored the potential of TNFSF12, AP3M1, and PDK2 as biomarkers for EM. Additionally, molecular docking studies demonstrated that these genes exhibit significant binding affinities with drugs currently utilized in clinical practice. Conclusion This study systematically elucidated the molecular characteristics of PCD in EM and identified TNFSF12, AP3M1, and PDK2 as key biomarkers. These findings provide new directions for the early diagnosis and personalized treatment of EM.
Affiliation(s)
- Zi-Wei Xie
- Department of Gynecology, People's Hospital Affiliated of Fujian University of Traditional Chinese Medicine, Fuzhou, China
- First Clinical Medical College, Fujian University of Traditional Chinese Medicine, Fuzhou, China
- Yue He
- Department of Gynecology, People's Hospital Affiliated of Fujian University of Traditional Chinese Medicine, Fuzhou, China
- First Clinical Medical College, Fujian University of Traditional Chinese Medicine, Fuzhou, China
- Yu-Xin Feng
- Department of Gynecology, People's Hospital Affiliated of Fujian University of Traditional Chinese Medicine, Fuzhou, China
- First Clinical Medical College, Fujian University of Traditional Chinese Medicine, Fuzhou, China
- Xiao-Hong Wang
- Department of Gynecology, People's Hospital Affiliated of Fujian University of Traditional Chinese Medicine, Fuzhou, China
9
Kurasawa H, Waki K, Seki T, Chiba A, Fujino A, Hayashi K, Nakahara E, Haga T, Noguchi T, Ohe K. Enhancing Type 2 Diabetes Treatment Decisions With Interpretable Machine Learning Models for Predicting Hemoglobin A1c Changes: Machine Learning Model Development. JMIR AI 2024; 3:e56700. [PMID: 39024008 PMCID: PMC11294778 DOI: 10.2196/56700] [Received: 01/24/2024] [Revised: 04/21/2024] [Accepted: 05/31/2024] [Indexed: 07/20/2024]
Abstract
BACKGROUND Type 2 diabetes (T2D) is a significant global health challenge. Physicians need to assess whether future glycemic control will be poor on the current trajectory of usual care and usual-care treatment intensifications so that they can consider taking extra treatment measures to prevent poor outcomes. Predicting poor glycemic control from trends in hemoglobin A1c (HbA1c) levels is difficult due to the influence of seasonal fluctuations and other factors. OBJECTIVE We sought to develop a model that accurately predicts poor glycemic control among patients with T2D receiving usual care. METHODS Our machine learning model predicts poor glycemic control (HbA1c≥8%) using the transformer architecture, incorporating an attention mechanism to process irregularly spaced HbA1c time series and quantify temporal relationships of past HbA1c levels at each time point. We assessed the model using HbA1c levels from 7787 patients with T2D seeing specialist physicians at the University of Tokyo Hospital. The training data include instances of poor glycemic control occurring during usual care with usual-care treatment intensifications. We compared prediction accuracy, assessed with the area under the receiver operating characteristic curve, the area under the precision-recall curve, and the accuracy rate, to that of LightGBM. RESULTS The area under the receiver operating characteristic curve, the area under the precision-recall curve, and the accuracy rate (95% confidence limits) of the proposed model were 0.925 (95% CI 0.923-0.928), 0.864 (95% CI 0.852-0.875), and 0.864 (95% CI 0.86-0.869), respectively. The proposed model achieved high prediction accuracy comparable to or surpassing LightGBM's performance. The model prioritized the most recent HbA1c levels for predictions. Older HbA1c levels in patients with poor glycemic control were slightly more influential in predictions compared to patients with good glycemic control. 
CONCLUSIONS The proposed model accurately predicts poor glycemic control for patients with T2D receiving usual care, including patients receiving usual-care treatment intensifications, allowing physicians to identify cases warranting extraordinary treatment intensifications. If used by a nonspecialist, the model's indication of likely future poor glycemic control may warrant a referral to a specialist. Future efforts could incorporate diverse and large-scale clinical data for improved accuracy.
Affiliation(s)
- Hisashi Kurasawa
- Nippon Telegraph and Telephone Corporation, Tokyo, Japan
- The University of Tokyo Hospital, Tokyo, Japan
- Kayo Waki
- The University of Tokyo Hospital, Tokyo, Japan
- Akihiro Chiba
- Nippon Telegraph and Telephone Corporation, Tokyo, Japan
- NTT DOCOMO, Inc, Tokyo, Japan
- Akinori Fujino
- Nippon Telegraph and Telephone Corporation, Tokyo, Japan
- Eri Nakahara
- Nippon Telegraph and Telephone Corporation, Tokyo, Japan
- The University of Tokyo Hospital, Tokyo, Japan
- Tsuneyuki Haga
- Nippon Telegraph and Telephone Corporation, Tokyo, Japan
- NTT-AT IPS Corporation, Kanagawa, Japan
- Takashi Noguchi
- National Center for Child Health and Development, Tokyo, Japan
10
Xu L, Li C, Zhang J, Guan C, Zhao L, Shen X, Zhang N, Li T, Yang C, Zhou B, Bu Q, Xu Y. Personalized prediction of mortality in patients with acute ischemic stroke using explainable artificial intelligence. Eur J Med Res 2024; 29:341. [PMID: 38902792 PMCID: PMC11188208 DOI: 10.1186/s40001-024-01940-2] [Received: 03/08/2024] [Accepted: 06/17/2024] [Indexed: 06/22/2024]
Abstract
BACKGROUND Research into acute kidney disease (AKD) after acute ischemic stroke (AIS) is rare, and how clinical features influence its prognosis remains unknown. We aim to employ interpretable machine learning (ML) models to study AIS and clarify their decision-making process in identifying the risk of mortality. METHODS We conducted a retrospective cohort study involving AIS patients from January 2020 to June 2021. Patient data were randomly divided into training and test sets. Eight ML algorithms were employed to construct predictive models for mortality. The performance of the best model was evaluated using various metrics. Furthermore, we created an artificial intelligence (AI)-driven web application that leveraged the ten most important features for mortality prediction. RESULTS The study cohort consisted of 1633 AIS patients, among whom 257 (15.74%) developed subacute AKD, 173 (10.59%) experienced recovery from acute kidney injury (AKI), and 65 (3.98%) met criteria for both AKI and AKD. The mortality rate was 4.84%. The LightGBM model performed best, with an AUROC of 0.96 for mortality prediction. The top five features linked to mortality were ACEI/ARE, renal function trajectories, neutrophil count, diuretics, and serum creatinine. Moreover, we designed a web application using the LightGBM model to estimate mortality risk. CONCLUSIONS Complete renal function trajectories, including AKI and AKD, are vital for modeling mortality in AIS patients. An interpretable ML model effectively clarified its decision-making process for identifying AIS patients at risk of mortality. The AI-driven web application has the potential to contribute to the development of personalized early mortality prevention.
Affiliation(s)
- Lingyu Xu
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Chenyu Li
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
  - Division of Nephrology, Medizinische Klinik Und Poliklinik IV, Klinikum der Universität, Munich, Germany
- Jiaqi Zhang
  - Yidu Central Hospital of Weifang, Weifang, China
- Chen Guan
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Long Zhao
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Xuefei Shen
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Ningxin Zhang
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Tianyang Li
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Chengyu Yang
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Bin Zhou
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Quandong Bu
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China
- Yan Xu
  - Department of Nephrology, The Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao, 266003, China.
|
11
|
Zhang Y, Zhu Y, Bao X, Dai Z, Shen Q, Wang L, Xue Y. Mining Bovine Milk Proteins for DPP-4 Inhibitory Peptides Using Machine Learning and Virtual Proteolysis. RESEARCH (WASHINGTON, D.C.) 2024; 7:0391. [PMID: 38887277 PMCID: PMC11182572 DOI: 10.34133/research.0391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/26/2024] [Indexed: 06/20/2024]
Abstract
Dipeptidyl peptidase-IV (DPP-4) enzyme inhibitors are a promising category of diabetes medications. Bioactive peptides, particularly those derived from bovine milk proteins, play crucial roles in inhibiting the DPP-4 enzyme. This study describes a comprehensive strategy for DPP-4 inhibitory peptide discovery and validation that combines machine learning and virtual proteolysis techniques. Five machine learning models, including GBDT, XGBoost, LightGBM, CatBoost, and RF, were trained. Notably, LightGBM demonstrated superior performance with an AUC value of 0.92 ± 0.01. Subsequently, LightGBM was employed to forecast the DPP-4 inhibitory potential of peptides generated through virtual proteolysis of milk proteins. Through a series of in silico screening steps and in vitro experiments, GPVRGPF and HPHPHL were found to exhibit good DPP-4 inhibitory activity. Molecular docking and molecular dynamics simulations further confirmed the inhibitory mechanisms of these peptides. By retracing the virtual proteolysis steps, it was found that GPVRGPF can be obtained from β-casein through enzymatic hydrolysis by chymotrypsin, while HPHPHL can be obtained from κ-casein through enzymatic hydrolysis by stem bromelain or papain. In summary, the integration of machine learning and virtual proteolysis techniques can aid in the preliminary determination of key hydrolysis parameters and facilitate the efficient screening of bioactive peptides.
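Virtual proteolysis can be pictured as cutting a protein sequence at the residues a protease prefers and collecting the resulting fragments for scoring. The sketch below is only an illustration under stated assumptions: the cleavage rule (cut after F, Y, or W) is a deliberately simplified chymotrypsin-like rule that ignores neighbouring-residue effects, the function name `virtual_digest` and its length limits are hypothetical, and the input string is an invented toy sequence, not a real casein.

```python
def virtual_digest(sequence, cleave_after, min_len=2, max_len=10):
    """Cut `sequence` after every residue in `cleave_after`.

    Returns the fragments whose length lies in [min_len, max_len].
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing fragment without a cut site
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Simplified, assumed rule: cleave C-terminal to aromatic residues.
CHYMOTRYPSIN_LIKE = {"F", "Y", "W"}

# Invented toy sequence for illustration only.
toy_protein = "MKAFGPVRGPFAIVYQ"
print(virtual_digest(toy_protein, CHYMOTRYPSIN_LIKE))
# ['MKAF', 'GPVRGPF', 'AIVY']
```

In a screening pipeline of the kind the abstract describes, each fragment would then be passed to the trained classifier; that step is omitted here because the model and its features are not given in the source.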
Affiliation(s)
- Yiyun Zhang
  - National Engineering and Technology Research Center for Fruits and Vegetables, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
- Yiqing Zhu
  - National Engineering and Technology Research Center for Fruits and Vegetables, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
- Xin Bao
  - National Engineering and Technology Research Center for Fruits and Vegetables, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
- Zijian Dai
  - National Engineering and Technology Research Center for Fruits and Vegetables, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
- Qun Shen
  - National Engineering and Technology Research Center for Fruits and Vegetables, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
  - National Center of Technology Innovation (Deep Processing of Highland Barley) in Food Industry, China Agricultural University, Haidian District, Beijing 100083, P.R. China
- Liyang Wang
  - National Engineering and Technology Research Center for Fruits and Vegetables, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
  - School of Clinical Medicine, Tsinghua University, Beijing 100084, P.R. China
- Yong Xue
  - National Engineering and Technology Research Center for Fruits and Vegetables, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, P.R. China
  - National Center of Technology Innovation (Deep Processing of Highland Barley) in Food Industry, China Agricultural University, Haidian District, Beijing 100083, P.R. China
|
12
|
Lee S, Kang M. A Data-Driven Approach to Predicting Recreational Activity Participation Using Machine Learning. RESEARCH QUARTERLY FOR EXERCISE AND SPORT 2024:1-13. [PMID: 38875156 DOI: 10.1080/02701367.2024.2343815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 04/07/2024] [Indexed: 06/16/2024]
Abstract
Purpose: With the popularity of recreational activities, the study aimed to develop prediction models for recreational activity participation and explore the key factors affecting participation in recreational activities. Methods: A total of 12,712 participants, excluding individuals under 20, were selected from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018. The mean age of the sample was 46.86 years (±16.97), with a gender distribution of 6,721 males and 5,991 females. The variables included demographic, physical-related, and lifestyle variables. This study developed 42 prediction models using six machine learning methods: logistic regression, Support Vector Machine (SVM), decision tree, random forest, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). The relative importance of each variable was evaluated by permutation feature importance. Results: The results showed that LightGBM was the most effective algorithm for predicting recreational activity participation (accuracy: .838, precision: .783, recall: .967, F1-score: .865, AUC: .826). In particular, prediction performance increased when the demographic and lifestyle datasets were used together. In the permutation feature importance analysis of the top models, education level and moderate-vigorous physical activity (MVPA) were found to be essential variables. Conclusion: These findings demonstrated the potential of a data-driven approach utilizing machine learning in a recreational discipline. Furthermore, this study interpreted the prediction model through feature importance analysis to overcome the limitation of machine learning interpretability.
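Permutation feature importance, the interpretation method named in this abstract, scores a variable by how much model performance drops when that variable's column is shuffled. The following is a minimal sketch of the idea, assuming accuracy as the performance measure; the toy data and the stand-in `model` (which simply returns the first feature) are hypothetical and unrelated to the NHANES variables.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled copy.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy setup (hypothetical): the label depends only on feature 0.
X = [[i % 2, (i // 2) % 2] for i in range(40)]
y = [row[0] for row in X]
model = lambda row: row[0]  # stand-in for a trained classifier

imp = permutation_importance(model, X, y)
print(imp)  # feature 0 gets a positive score, feature 1 scores exactly 0.0
```

Because the stand-in model ignores feature 1, shuffling that column never changes a prediction, which is why its importance is exactly zero.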
|
13
|
Kim SH, Oh YJ, Son J, Jung D, Kim D, Ryu SR, Na JY, Hwang JK, Kim TH, Park HK. Machine learning-based analysis for prediction of surgical necrotizing enterocolitis in very low birth weight infants using perinatal factors: a nationwide cohort study. Eur J Pediatr 2024; 183:2743-2751. [PMID: 38554173 PMCID: PMC11098869 DOI: 10.1007/s00431-024-05505-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/20/2024] [Accepted: 03/02/2024] [Indexed: 04/01/2024]
Abstract
Early prediction of surgical necrotizing enterocolitis (sNEC) in preterm infants is important. However, owing to the complexity of the disease, identifying infants with NEC at a high risk for surgical intervention is difficult. We developed a machine learning (ML) algorithm to predict sNEC using perinatal factors obtained from the national cohort registry of very low birth weight (VLBW) infants. Data were collected from the medical records of 16,385 VLBW infants registered in the Korean Neonatal Network (KNN). Infants who underwent surgical intervention were classified as having sNEC, and infants who received medical treatment as having medical NEC (mNEC). We used 38 variables, including maternal, prenatal, and postnatal factors that were obtained within 1 week of birth, for training. A total of 1085 patients had NEC (654 with sNEC and 431 with mNEC). VLBW infants showed a higher incidence of sNEC at a lower gestational age (GA) (p < 0.001). Our proposed ensemble model showed an area under the receiver operating characteristic curve of 0.721 for sNEC prediction. Conclusion: The proposed ensemble model may help predict which infants with NEC are likely to develop sNEC. Through early prediction and prompt intervention, the prognosis of sNEC may be improved. What is Known: • Machine learning (ML)-based techniques have been employed in NEC research for prediction, diagnosis, and prognosis, with promising outcomes. • While most studies have utilized abdominal radiographs and clinical manifestations of NEC as data sources, and have demonstrated their usefulness, they may prove weak in terms of early prediction. What is New: • We analyzed the perinatal factors of VLBW infants acquired within 7 days of birth and used ML-based analysis to identify which infants with NEC are vulnerable to clinical deterioration and at high risk for surgical intervention using nationwide cohort data.
Affiliation(s)
- Seung Hyun Kim
  - Department of Pediatrics, Hanyang University College of Medicine, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
  - Department of Pediatrics, Samsung Medical Center, Sungkyunkwan University School of Medicine, 81, Irwon-ro, Gangnam-gu, Seoul, 06351, Republic of Korea
- Yoon Ju Oh
  - Department of Artificial Intelligence, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
- Joonhyuk Son
  - Department of Pediatric Surgery, Hanyang University College of Medicine, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
- Donggoo Jung
  - Department of Artificial Intelligence, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
- Daehyun Kim
  - Department of Artificial Intelligence, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
- Soo Rack Ryu
  - Biostatistical Consulting and Research Lab, Medical Research Collaborating Center, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
- Jae Yoon Na
  - Department of Pediatrics, Hanyang University College of Medicine, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
- Jae Kyoon Hwang
  - Department of Pediatrics, Hanyang University College of Medicine, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea
- Tae Hyun Kim
  - Department of Computer Science, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea.
- Hyun-Kyung Park
  - Department of Pediatrics, Hanyang University College of Medicine, 222 Wangsimni-ro, Seongdong-gu, Seoul, 04763, Republic of Korea.
|
14
|
Ruiz D, Casas A, Escobar CA, Perez A, Gonzalez V. Advanced Machine Learning Techniques for Corrosion Rate Estimation and Prediction in Industrial Cooling Water Pipelines. SENSORS (BASEL, SWITZERLAND) 2024; 24:3564. [PMID: 38894355 PMCID: PMC11175261 DOI: 10.3390/s24113564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 05/24/2024] [Accepted: 05/29/2024] [Indexed: 06/21/2024]
Abstract
This paper presents the results of a study on data preprocessing and modeling for predicting corrosion in water pipelines of a steel industrial plant. The use case is a cooling circuit consisting of both direct and indirect cooling. In the direct cooling circuit, water comes into direct contact with the product, whereas in the indirect one, it does not. In this study, advanced machine learning techniques, such as extreme gradient boosting and deep neural networks, have been employed for two distinct applications. Firstly, a virtual sensor was created to estimate the corrosion rate based on influencing process variables, such as pH and temperature. Secondly, a predictive tool was designed to foresee the future evolution of the corrosion rate, considering past values of both influencing variables and the corrosion rate. The results show that the most suitable algorithm for the virtual sensor approach is the dense neural network, with MAPE values of (25 ± 4)% and (11 ± 4)% for the direct and indirect circuits, respectively. In contrast, different results are obtained for the two circuits when following the predictive tool approach. For the primary circuit, the convolutional neural network yields the best results, with MAPE = 4% on the testing set, whereas for the secondary circuit, the LSTM recurrent network shows the highest prediction accuracy, with MAPE = 9%. In general, models employing temporal windows have emerged as more suitable for corrosion prediction, with model performance significantly improving with a larger dataset.
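Two ideas in this abstract lend themselves to a short illustration: the MAPE metric used to compare models, and the "temporal windows" that turn a time series into supervised examples for a predictive tool. The sketch below assumes the standard definition of mean absolute percentage error and a simple sliding-window scheme; the function names, window length, and corrosion-rate values are hypothetical and not taken from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def make_windows(series, window, horizon=1):
    """Pair each run of `window` past values with the value `horizon` steps ahead."""
    pairs = []
    for t in range(len(series) - window - horizon + 1):
        past = series[t:t + window]
        target = series[t + window + horizon - 1]
        pairs.append((past, target))
    return pairs


# Hypothetical corrosion-rate readings (arbitrary units).
rates = [10.0, 12.0, 11.0, 13.0, 14.0]
print(make_windows(rates, window=3))
# [([10.0, 12.0, 11.0], 13.0), ([12.0, 11.0, 13.0], 14.0)]

print(mape([100.0, 200.0], [90.0, 220.0]))  # 10.0
```

Each `(past, target)` pair could feed a convolutional or recurrent network of the kind the abstract mentions. Extending `past` with the influencing variables (for example pH and temperature) follows the same pattern.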
Affiliation(s)
- Abraham Casas
  - Centro Tecnológico de Componentes-CTC, Scientific and Technological Park of Cantabria (PCTCAN), 39011 Santander, Spain; (D.R.); (C.A.E.); (V.G.)
|
15
|
Park JY, Lee SH, Kim YJ, Kim KG, Lee GJ. Machine learning model based on radiomics features for AO/OTA classification of pelvic fractures on pelvic radiographs. PLoS One 2024; 19:e0304350. [PMID: 38814948 PMCID: PMC11139281 DOI: 10.1371/journal.pone.0304350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Accepted: 05/10/2024] [Indexed: 06/01/2024] Open
Abstract
Depending on the degree of fracture, pelvic fracture can be accompanied by vascular damage, and in severe cases, it may progress to hemorrhagic shock. Pelvic radiography can quickly diagnose pelvic fractures, and the Association for Osteosynthesis Foundation and Orthopedic Trauma Association (AO/OTA) classification system is useful for evaluating pelvic fracture instability. This study aimed to develop a radiomics-based machine-learning algorithm to quickly diagnose fractures on pelvic X-ray and classify their instability. The data used were pelvic anteroposterior radiographs of 990 adults over 18 years of age diagnosed with pelvic fractures, and 200 normal subjects. A total of 93 features were extracted based on radiomics: 18 first-order, 24 GLCM, 16 GLRLM, 16 GLSZM, 5 NGTDM, and 14 GLDM features. To improve the performance of machine learning, the feature selection methods RFE, SFS, LASSO, and Ridge were used, and the machine learning models used were LR, SVM, RF, XGB, MLP, KNN, and LGBM. Performance was evaluated by the area under the receiver operating characteristic curve (AUC). The machine learning models were trained on the features selected by each of the four feature-selection methods. When the RFE feature selection method was used, the average AUC was higher than that of the other methods. Among them, the combination with the machine learning model SVM showed the best performance, with an average AUC of 0.75±0.06. By obtaining a feature-importance graph for the combination of RFE and SVM, it is possible to identify features with high importance. The AO/OTA classification of normal pelvic rings and pelvic fractures on pelvic AP radiographs using a radiomics-based machine learning model showed the highest AUC when using the SVM classification combination. Further research on the radiomic features of each part of the pelvic bone constituting the pelvic ring is needed.
Affiliation(s)
- Jun Young Park
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, Republic of Korea
- Seung Hwan Lee
  - Department of Trauma Surgery, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Traumatology, Gachon University College of Medicine, Gachon University, Incheon, Republic of Korea
- Young Jae Kim
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, Republic of Korea
  - Department of Medical Devices R&D Center, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Biomedical Engineering, Pre-medical Course, College of Medicine, Gachon University, Incheon, Republic of Korea
- Kwang Gi Kim
  - Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology (GAIHST), Gachon University, Incheon, Republic of Korea
  - Department of Medical Devices R&D Center, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Biomedical Engineering, Pre-medical Course, College of Medicine, Gachon University, Incheon, Republic of Korea
- Gil Jae Lee
  - Department of Trauma Surgery, Gachon University Gil Medical Center, Gachon University, Incheon, Republic of Korea
  - Department of Traumatology, Gachon University College of Medicine, Gachon University, Incheon, Republic of Korea
|
16
|
Lin N, Shao X, Wu H, Jiang R, Wu M. Heavy Metal Concentration Estimation for Different Farmland Soils Based on Projection Pursuit and LightGBM with Hyperspectral Images. SENSORS (BASEL, SWITZERLAND) 2024; 24:3251. [PMID: 38794105 PMCID: PMC11125194 DOI: 10.3390/s24103251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/12/2024] [Accepted: 05/19/2024] [Indexed: 05/26/2024]
Abstract
Heavy metal pollution in farmland soil threatens soil environmental quality. Quickly assessing the status of heavy metal pollution in farmland soil across a region is an important task. Hyperspectral remote sensing technology has been widely used in soil heavy metal concentration monitoring, and improving the accuracy and reliability of its estimation models is an active research topic. This study analyzed 440 soil samples from Sihe Town and the surrounding agricultural areas in Yushu City, Jilin Province. Considering the differences between different types of soils, a local regression model of heavy metal concentrations (As and Cu) was established based on projection pursuit (PP) and light gradient boosting machine (LightGBM) algorithms. Based on the estimations, a spatial distribution map of soil heavy metals in the region was drawn. The findings of this study showed that considering the differences between different soils to construct a local regression estimation model of soil heavy metal concentration improved the estimation accuracy. Specifically, the relative percent difference (RPD) of As and Cu element estimations in black soil increased the most, by 0.30 and 0.26, respectively. The regional spatial distribution map of heavy metal concentration derived from local regression showed high spatial variability. The number of characteristic bands screened by the PP method accounted for 10-13% of the total spectral bands, effectively reducing the model complexity. Compared with traditional machine learning models, the LightGBM model showed better estimation ability, and the highest determination coefficients (R²) of different soil validation sets reached 0.73 (As) and 0.75 (Cu), respectively.
In this study, the constructed PP-LightGBM estimation model takes into account the differences in soil types, which effectively improves the accuracy and reliability of hyperspectral image estimation of soil heavy metal concentration and provides a reference for drawing large-scale spatial distributions of heavy metals from hyperspectral images and mastering soil environmental quality.
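The abstract evaluates estimates with the coefficient of determination and stresses that models are built per soil type. A minimal sketch of that evaluation pattern is shown below, assuming the usual definition R² = 1 − SS_res/SS_tot. The grouping helper, the second soil-type label, and all concentration values are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def r_squared_by_group(groups, observed, predicted):
    """Compute R² separately for each group label (e.g. each soil type)."""
    buckets = defaultdict(lambda: ([], []))
    for g, o, p in zip(groups, observed, predicted):
        buckets[g][0].append(o)
        buckets[g][1].append(p)
    return {g: r_squared(obs, pred) for g, (obs, pred) in buckets.items()}


# Hypothetical validation samples: soil type, measured and estimated concentration.
soil = ["black", "black", "black", "other", "other", "other"]
measured = [10.0, 20.0, 30.0, 5.0, 6.0, 7.0]
estimated = [12.0, 20.0, 28.0, 5.0, 6.0, 7.0]

print(r_squared_by_group(soil, measured, estimated))
# {'black': 0.96, 'other': 1.0}
```

Scoring each soil type on its own validation set, rather than pooling all samples, mirrors the "local regression" framing of the study, although the paper's actual validation protocol is not described in this abstract.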
Affiliation(s)
- Nan Lin
  - College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China; (N.L.); (X.S.); (M.W.)
  - Jilin Province Natural Resources Remote Sensing Information Technology Innovation Laboratory, Changchun 130118, China
- Xiaofan Shao
  - College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China; (N.L.); (X.S.); (M.W.)
- Huizhi Wu
  - Henan Academy of Geology, Zhengzhou 450016, China
- Ranzhe Jiang
  - College of Biological and Agricultural Engineering, Jilin University, Changchun 130012, China
- Menghong Wu
  - College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China; (N.L.); (X.S.); (M.W.)
  - College of Resource and Environmental Science, Jilin Agricultural University, Changchun 130118, China
|
17
|
Wang X, Wang W, Ren H, Li X, Wen Y. Prediction and analysis of risk factors for diabetic retinopathy based on machine learning and interpretable models. Heliyon 2024; 10:e29497. [PMID: 38699007 PMCID: PMC11064081 DOI: 10.1016/j.heliyon.2024.e29497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/09/2024] [Accepted: 04/09/2024] [Indexed: 05/05/2024] Open
Abstract
Objective Diabetic retinopathy is one of the major complications of diabetes. In this study, a diabetic retinopathy risk prediction model integrating machine learning models and SHAP was established to increase the accuracy of risk prediction for diabetic retinopathy, explain the rationale behind the model's predictions, and improve the reliability of prediction results. Methods Data were preprocessed for missing values and outliers, features were selected through information gain, a diabetic retinopathy risk prediction model was established using CatBoost, and the outputs of the model were interpreted using SHAP. Results One thousand records from the diabetes complication early warning dataset of the National Clinical Medical Sciences Data Center were used in this study. The CatBoost-based model for diabetic retinopathy prediction performed the best in the comparative model test. ALB_CR, HbA1c, UPR_24, NEPHROPATHY and SCR were positively correlated with diabetic retinopathy, while CP, HB, ALB, DBILI and CRP were negatively correlated with diabetic retinopathy. The relationships between HEIGHT, WEIGHT and ESR characteristics and diabetic retinopathy were not significant. Conclusion The risk factors for diabetic retinopathy include poor renal function, elevated blood glucose level, liver disease, hematonosis and dysarteriotony, among others. Diabetic retinopathy can be prevented by monitoring and effectively controlling relevant indices. In this study, the relationships between the features were also analyzed to further explore the potential factors of diabetic retinopathy, which can provide new methods and new ideas for the early prevention and clinical diagnosis of diabetic retinopathy.
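The Methods mention feature selection through information gain. As a rough illustration of that criterion (not the authors' pipeline), information gain for a categorical feature is the entropy of the outcome minus the weighted entropy of the outcome within each feature value. The feature names and values below are invented, and continuous laboratory measurements such as HbA1c would first need to be discretized, which this sketch does not cover.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(sub) / total * entropy(sub) for sub in by_value.values())
    return entropy(labels) - conditional


# Invented toy data: 1 = retinopathy present, 0 = absent.
outcome = [1, 1, 0, 0]
informative = ["high", "high", "low", "low"]   # perfectly separates the outcome
uninformative = ["x", "y", "x", "y"]           # tells us nothing about the outcome

print(information_gain(informative, outcome))    # 1.0
print(information_gain(uninformative, outcome))  # 0.0
```

Ranking features by this score and keeping the highest-scoring ones is one common way to apply the criterion; the threshold the study used is not stated in the abstract.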
Affiliation(s)
- Xu Wang
  - Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Weijie Wang
  - Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Huiling Ren
  - Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xiaoying Li
  - Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Yili Wen
  - Institute of Medical Information/Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
|
18
|
Liu Z, Meng Z, Wei D, Qin Y, Lv Y, Xie L, Qiu H, Xie B, Li L, Wei X, Zhang D, Liang B, Li W, Qin S, Yan T, Meng Q, Wei H, Jiang G, Su L, Jiang N, Zhang K, Lv J, Hu Y. Predictive model and risk analysis for coronary heart disease in people living with HIV using machine learning. BMC Med Inform Decis Mak 2024; 24:110. [PMID: 38664736 PMCID: PMC11046885 DOI: 10.1186/s12911-024-02511-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 04/15/2024] [Indexed: 04/28/2024] Open
Abstract
OBJECTIVE This study aimed to construct a coronary heart disease (CHD) risk-prediction model for people living with human immunodeficiency virus (PLHIV) using machine learning (ML) applied to electronic medical records (EMRs). METHODS Sixty-one medical characteristics (including demographic information, laboratory measurements, and complicating diseases) readily available from EMRs were retained for clinical analysis. These characteristics were used to develop prediction models with seven ML algorithms [light gradient-boosting machine (LightGBM), support vector machine (SVM), eXtreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), decision tree, multilayer perceptron (MLP), and logistic regression]. The performance of each model was assessed using the area under the receiver operating characteristic curve (AUC). Shapley additive explanation (SHAP) was further applied to interpret the findings of the best-performing model. RESULTS The LightGBM model exhibited the highest AUC (0.849; 95% CI, 0.814-0.883). Additionally, the SHAP plot for the LightGBM model indicated that age, heart failure, hypertension, glucose, serum creatinine, indirect bilirubin, serum uric acid, and amylase can help identify PLHIV who were at a high or low risk of developing CHD. CONCLUSION This study developed a CHD risk prediction model for PLHIV utilizing ML techniques and EMR data. The LightGBM model exhibited improved comprehensive performance and thus had higher reliability in assessing the risk predictors of CHD. Hence, it can potentially facilitate the development of clinical management techniques for PLHIV care in the era of EMRs.
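SHAP attributes a prediction to individual features using Shapley values. To show the underlying idea without relying on any particular library, the sketch below computes exact Shapley values by enumerating feature subsets, replacing "absent" features with a baseline value. This is a brute-force illustration that is only practical for a handful of features, not the tree-based algorithm typically used with LightGBM. The toy risk function and its inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x`, relative to `baseline`.

    A feature outside the coalition takes its baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical risk score with an interaction between features 0 and 2.
def toy_risk(v):
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


x = [1.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 6) for p in phi])  # [3.0, 3.0, 1.0]
```

The attributions sum to the difference between the prediction at `x` (7.0) and at the baseline (0.0), and the interaction term `v[0] * v[2]` is split evenly between the two features involved.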
Affiliation(s)
- Zengjing Liu
- Information and Management College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Zhihao Meng
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China
| | - Di Wei
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China
| | - Yuan Qin
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China
| | - Yu Lv
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China
| | - Luman Xie
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China
| | - Hong Qiu
- Life Sciences College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Bo Xie
- Information and Management College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Lanxiang Li
- Basic Medical College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Xihua Wei
- Life Sciences College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Die Zhang
- Collaborative Innovation Centre of Regenerative Medicine and Medical BioResource Development and Application Co- constructed by the Province, Ministry of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Boying Liang
- Basic Medical College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Wen Li
- Life Sciences College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Shanfang Qin
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China
| | - Tengyue Yan
- Collaborative Innovation Centre of Regenerative Medicine and Medical BioResource Development and Application Co- constructed by the Province, Ministry of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Qiuxia Meng
- Information and Management College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Huilin Wei
- Life Sciences College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Guiyang Jiang
- Department of rehabilitation medicine, Department of the First affliated hospital of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Lingsong Su
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China
| | - Nili Jiang
- Life Sciences College of Guangxi Medical University, Nanning, Guangxi, 530021, China
| | - Kai Zhang
- Guangxi Clinical Center for AIDS Prevention and Treatment, Chest Hospital of Guangxi Zhuang Autonomous Region, No. 8 Yangjiaoshan Road, Liuzhou, Guangxi, 545005, China.
| | - Jiannan Lv
- Affiliate Hospital of Youjiang Medical University for Nationalities, Baise, Guangxi, 533000, China.
| | - Yanling Hu
- Information and Management College of Guangxi Medical University, Nanning, Guangxi, 530021, China.
- Life Sciences College of Guangxi Medical University, Nanning, Guangxi, 530021, China.
- Collaborative Innovation Centre of Regenerative Medicine and Medical BioResource Development and Application Co-constructed by the Province, Ministry of Guangxi Medical University, Nanning, Guangxi, 530021, China.
- Faculty of Data science, City University of Macau, 999078, Macau, China.
| |
|
19
|
Zemariam AB, Yimer A, Abebe GK, Wondie WT, Abate BB, Alamaw AW, Yilak G, Melaku TM, Ngusie HS. Employing supervised machine learning algorithms for classification and prediction of anemia among youth girls in Ethiopia. Sci Rep 2024; 14:9080. [PMID: 38643324 PMCID: PMC11032364 DOI: 10.1038/s41598-024-60027-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 04/18/2024] [Indexed: 04/22/2024] Open
Abstract
In developing countries, one-quarter of young women have suffered from anemia. However, the available studies in Ethiopia have usually used traditional statistical methods. Therefore, this study aimed to employ multiple machine learning algorithms to identify the most effective model for predicting anemia among young girls in Ethiopia. A total of 5642 weighted samples of young girls from the 2016 Ethiopian Demographic and Health Survey dataset were utilized. The data underwent preprocessing, with 80% of the observations used for training the model and 20% for testing. Eight machine learning algorithms were employed to build and compare models. Model performance was assessed using evaluation metrics in Python. Various data balancing techniques were applied, and the Boruta algorithm was used to select the most relevant features. In addition, association rule mining was conducted using the Apriori algorithm in R. The random forest classifier, with an AUC of 82%, outperformed all other tested classifiers in predicting anemia. Region, poor wealth index, no formal education, unimproved toilet facility, rural residence, no contraceptive use, religion, age, no media exposure, occupation, and a family size of more than 5 were the top attributes for predicting anemia. Association rule mining identified the seven rules most frequently associated with anemia. The random forest classifier performed best for predicting anemia. It is therefore potentially valuable as a decision-support tool for the relevant stakeholders, and emphasizing the identified predictors could be an important intervention to halt anemia among young girls.
Affiliation(s)
- Alemu Birara Zemariam
- Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Po. Box: 400, Woldia, Ethiopia.
| | - Ali Yimer
- Department of Public Health, School of Public Health, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
| | - Gebremeskel Kibret Abebe
- Department of Emergency and Critical Care Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
| | - Wubet Tazeb Wondie
- Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Ambo University, Ambo, Ethiopia
| | - Biruk Beletew Abate
- Department of Pediatrics and Child Health Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Po. Box: 400, Woldia, Ethiopia
| | - Addis Wondmagegn Alamaw
- Department of Emergency and Critical Care Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
| | - Gizachew Yilak
- Department of Nursing, School of Nursing, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
| | | | - Habtamu Setegn Ngusie
- Department of Health Informatics, School of Public Health, College of Medicine and Health Science, Woldia University, Woldia, Ethiopia
| |
|
20
|
Orimaye SO, Schmidtke KA. Combining artificial neural networks and a marginal structural model to predict the progression from depression to Alzheimer's disease. FRONTIERS IN DEMENTIA 2024; 3:1362230. [PMID: 39081615 PMCID: PMC11285640 DOI: 10.3389/frdem.2024.1362230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 03/21/2024] [Indexed: 08/02/2024]
Abstract
Introduction Decades of research in population health have established depression as a likely precursor to Alzheimer's disease. A combination of causal estimates and machine learning methods in artificial intelligence could identify internal and external mediating mechanisms that contribute to the likelihood of progression from depression to Alzheimer's disease. Methods We developed an integrated predictive model, combining the marginal structural model and an artificial intelligence predictive model, distinguishing between patients likely to progress from depressive states to Alzheimer's disease better than each model alone. Results The integrated predictive model achieved substantial clinical relevance when using the area under the curve measure. It performed better than the traditional statistical method or a single artificial intelligence method alone. Discussion The integrated predictive model could form a part of a clinical screening tool that identifies patients who are likely to progress from depression to Alzheimer's disease for early behavioral health interventions. Given the high costs of treating Alzheimer's disease, our model could serve as a cost-effective intervention for the early detection of depression before it progresses to Alzheimer's disease.
Affiliation(s)
- Sylvester O. Orimaye
- College of Global Population Health, University of Health Sciences and Pharmacy, St. Louis, MO, United States
| | - Kelly A. Schmidtke
- College of Arts and Sciences, University of Health Sciences and Pharmacy, St. Louis, MO, United States
| |
|
21
|
Qadri AM, Hashmi MSA, Raza A, Zaidi SAJ, Rehman AU. Heart failure survival prediction using novel transfer learning based probabilistic features. PeerJ Comput Sci 2024; 10:e1894. [PMID: 38660216 PMCID: PMC11042000 DOI: 10.7717/peerj-cs.1894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 01/30/2024] [Indexed: 04/26/2024]
Abstract
Heart failure is a complex cardiovascular condition characterized by the heart's inability to pump blood effectively, leading to a cascade of physiological changes. Predicting survival in heart failure patients is crucial for optimizing patient care and resource allocation. This research aims to develop a robust survival prediction model for heart failure patients using advanced machine learning techniques. We analyzed data from 299 hospitalized heart failure patients, addressing the issue of imbalanced data with the Synthetic Minority Oversampling Technique (SMOTE). Additionally, we proposed a novel transfer learning-based feature engineering approach that generates a new probabilistic feature set from patient data using ensemble trees. Nine fine-tuned machine learning models were built and compared to evaluate performance in patient survival prediction. Our novel transfer learning mechanism applied to the random forest model outperformed other models and state-of-the-art studies, achieving a remarkable accuracy of 0.975. All models were evaluated using 10-fold cross-validation and tuned through hyperparameter optimization. The findings of this study have the potential to advance the field of cardiovascular medicine by providing more accurate and personalized prognostic assessments for individuals with heart failure.
Affiliation(s)
- Azam Mehmood Qadri
- Institute of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
| | - Muhammad Shadab Alam Hashmi
- Institute of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
| | - Ali Raza
- Institute of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
| | - Syed Ali Jafar Zaidi
- Institute of Information Technology, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan
| | - Atiq ur Rehman
- Artificial Intelligence and Intelligent Systems Research Group, School of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden
| |
|
22
|
Zhang HQ, Liu SH, Li R, Yu JW, Ye DX, Yuan SS, Lin H, Huang CB, Tang H. MIBPred: Ensemble Learning-Based Metal Ion-Binding Protein Classifier. ACS OMEGA 2024; 9:8439-8447. [PMID: 38405489 PMCID: PMC10882704 DOI: 10.1021/acsomega.3c09587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/16/2024] [Accepted: 01/22/2024] [Indexed: 02/27/2024]
Abstract
In biological organisms, metal ion-binding proteins participate in numerous metabolic activities and are closely associated with various diseases. To accurately predict whether a protein binds to metal ions and the type of metal ion-binding protein, this study proposed a classifier named MIBPred. The classifier incorporated advanced Word2Vec technology from the field of natural language processing to extract semantic features of the protein sequence language and combined them with position-specific score matrix (PSSM) features. Furthermore, an ensemble learning model was employed for the metal ion-binding protein classification task. In the model, we independently trained XGBoost, LightGBM, and CatBoost algorithms and integrated the output results through an SVM voting mechanism. This innovative combination has led to a significant breakthrough in the predictive performance of our model. As a result, we achieved accuracies of 95.13% and 85.19%, respectively, in predicting metal ion-binding proteins and their types. Our research not only confirms the effectiveness of Word2Vec technology in extracting semantic information from protein sequences but also highlights the outstanding performance of the MIBPred classifier in the problem of metal ion-binding protein types. This study provides a reliable tool and method for the in-depth exploration of the structure and function of metal ion-binding proteins.
Affiliation(s)
- Hong-Qi Zhang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shang-Hua Liu
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Rui Li
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Jun-Wen Yu
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dong-Xin Ye
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba 623002, China
| | - Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Central Nervous System Drug Key Laboratory of Sichuan Province, Luzhou 646000, China
| |
|
23
|
Aygun U, Yagin FH, Yagin B, Yasar S, Colak C, Ozkan AS, Ardigò LP. Assessment of Sepsis Risk at Admission to the Emergency Department: Clinical Interpretable Prediction Model. Diagnostics (Basel) 2024; 14:457. [PMID: 38472930 DOI: 10.3390/diagnostics14050457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 02/18/2024] [Accepted: 02/19/2024] [Indexed: 03/14/2024] Open
Abstract
This study aims to develop an interpretable prediction model based on explainable artificial intelligence to predict bacterial sepsis and discover important biomarkers. A total of 1572 adult patients, 560 of whom were sepsis positive and 1012 of whom were negative, who were admitted to the emergency department with suspicion of sepsis, were examined. We investigated the performance characteristics of sepsis biomarkers alone and in combination for confirmed sepsis diagnosis using Sepsis-3 criteria. Three different tree-based algorithms, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Adaptive Boosting (AdaBoost), were used for sepsis prediction, and after examining comprehensive performance metrics, explanations of the optimal model were obtained with the SHAP method. The XGBoost model achieved an accuracy of 0.898 (95% CI: 0.868-0.929) and an area under the ROC curve (AUC) of 0.940 (95% CI: 0.898-0.980). The five biomarkers for predicting sepsis were age, respiratory rate, oxygen saturation, procalcitonin, and positive blood culture. SHAP results revealed that older age, higher respiratory rate, procalcitonin, neutrophil-lymphocyte count ratio, C-reactive protein, plaque, leukocyte particle concentration, as well as lower oxygen saturation, systolic blood pressure, and hemoglobin levels increased the risk of sepsis. As a result, the Explainable Artificial Intelligence (XAI)-based prediction model can guide clinicians in the early diagnosis and treatment of sepsis, providing more effective sepsis management and potentially reducing mortality rates and medical costs.
Affiliation(s)
- Umran Aygun
- Department of Anesthesiology and Reanimation, Malatya Yesilyurt Hasan Calık State Hospital, Malatya 44929, Turkey
| | - Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
| | - Burak Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
| | - Seyma Yasar
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
| | - Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
| | - Ahmet Selim Ozkan
- Department of Anesthesiology and Reanimation, Malatya Turgut Ozal University School of Medicine, Malatya 44210, Turkey
| | - Luca Paolo Ardigò
- Department of Teacher Education, NLA University College, 0166 Oslo, Norway
| |
|
24
|
Kebede SR, Waldamichael FG, Debelee TG, Aleme M, Bedane W, Mezgebu B, Merga ZC. Dual view deep learning for enhanced breast cancer screening using mammography. Sci Rep 2024; 14:3839. [PMID: 38360869 PMCID: PMC10869685 DOI: 10.1038/s41598-023-50797-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 12/26/2023] [Indexed: 02/17/2024] Open
Abstract
Breast cancer has the highest incidence rate among women in Ethiopia compared to other types of cancer. Unfortunately, many cases are detected at a stage where a cure is delayed or not possible. To address this issue, mammography-based screening is widely accepted as an effective technique for early detection. However, the interpretation of mammography images requires experienced radiologists in breast imaging, a resource that is limited in Ethiopia. In this research, we have developed a model to assist radiologists in mass screening for breast abnormalities and prioritizing patients. Our approach combines an ensemble of EfficientNet-based classifiers with YOLOv5, a suspicious mass detection method, to identify abnormalities. The inclusion of YOLOv5 detection is crucial in providing explanations for classifier predictions and improving sensitivity, particularly when the classifier fails to detect abnormalities. To further enhance the screening process, we have also incorporated an abnormality detection model. The classifier model achieves an F1-score of 0.87 and a sensitivity of 0.82. With the addition of suspicious mass detection, sensitivity increases to 0.89, albeit at the expense of a slightly lower F1-score of 0.79.
Affiliation(s)
- Samuel Rahimeto Kebede
- Research Development Cluster, Ethiopian Artificial Intelligence Institute, Addis Ababa, 40782, Ethiopia.
- College of Engineering, Debre Berhan University, Debre Berhan, Ethiopia.
| | - Fraol Gelana Waldamichael
- Research Development Cluster, Ethiopian Artificial Intelligence Institute, Addis Ababa, 40782, Ethiopia
| | - Taye Girma Debelee
- Research Development Cluster, Ethiopian Artificial Intelligence Institute, Addis Ababa, 40782, Ethiopia
- College of Electrical and Mechanical Engineering, Addis Ababa Science and Technology University, Addis Ababa, 120611, Ethiopia
| | | | - Wubalem Bedane
- Radiology, St. Pauli Millenium Medical College, Addis Ababa, Ethiopia
| | - Bethelhem Mezgebu
- Radiology, St. Pauli Millenium Medical College, Addis Ababa, Ethiopia
| | | |
|
25
|
Vargas J, Pease M, Snyder MH, Blalock J, Wu S, Nwachuku E, Mittal A, Okonkwo DO, Kellogg RT. Automated Preoperative and Postoperative Volume Estimates Risk of Retreatment in Chronic Subdural Hematoma: A Retrospective, Multicenter Study. Neurosurgery 2024; 94:317-324. [PMID: 37747231 DOI: 10.1227/neu.0000000000002667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 07/17/2023] [Indexed: 09/26/2023] Open
Abstract
BACKGROUND AND OBJECTIVES Several neurosurgical pathologies, ranging from glioblastoma to hemorrhagic stroke, use volume thresholds to guide treatment decisions. For chronic subdural hematoma (cSDH), with a risk of retreatment of 10%-30%, the relationship between preoperative and postoperative cSDH volume and retreatment is not well understood. We investigated the potential link between preoperative and postoperative cSDH volumes and retreatment. METHODS We performed a retrospective chart review of patients operated for unilateral cSDH from 4 level 1 trauma centers, February 2009-August 2021. We used a 3-dimensional deep learning, automated segmentation pipeline to calculate preoperative and postoperative cSDH volumes. To identify volume thresholds, we constructed a receiver operating curve with preoperative and postoperative volumes to predict cSDH retreatment rates and selected the threshold with the highest Youden index. Then, we developed a light gradient boosting machine to predict the risk of cSDH recurrence. RESULTS We identified 538 patients with unilateral cSDH, of whom 62 (12%) underwent surgical retreatment within 6 months of the index surgery. cSDH retreatment was associated with higher preoperative (122 vs 103 mL; P < .001) and postoperative (62 vs 35 mL; P < .001) volumes. Patients with >140 mL preoperative volume had nearly triple the risk of cSDH recurrence compared with those below 140 mL, while a postoperative volume >46 mL led to an increased risk for retreatment (22% vs 6%; P < .001). On multivariate modeling, our model had an area under the receiver operating curve of 0.76 (95% CI: 0.60-0.93) for predicting retreatment. The most important features were preoperative and postoperative volume, platelet count, and age. CONCLUSION Larger preoperative and postoperative cSDH volumes increase the risk of retreatment. Volume thresholds may allow identification of patients at high risk of cSDH retreatment who would benefit from adjunct treatments. 
Machine learning algorithms can quickly provide accurate estimates of preoperative and postoperative volumes.
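The threshold-selection step described in this abstract (choosing the cutoff with the highest Youden index, J = sensitivity + specificity - 1) can be illustrated with a minimal sketch. The function name `youden_threshold` and the toy volumes below are hypothetical and are not taken from the study; the sketch assumes a case is called positive when its value is strictly greater than the threshold.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its score is strictly greater than
    the threshold. Labels are 1 (event) or 0 (no event).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best_threshold, best_j = float("nan"), float("-inf")
    # Every observed score is a candidate cutoff.
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical toy data: postoperative volume in mL and retreatment (1 = yes).
volumes = [20, 30, 35, 40, 50, 60, 70, 90]
retreated = [0, 0, 0, 0, 1, 0, 1, 1]
threshold, j = youden_threshold(volumes, retreated)
print(f"threshold > {threshold} mL, Youden J = {j:.2f}")
```

On real data a library routine for ROC curves would normally replace the explicit loop; the loop is kept here only to make the definition of J visible.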
Affiliation(s)
- Jan Vargas
- Division of Neurosurgery, PRISMA Health, Greenville, South Carolina, USA
| | - Matthew Pease
- Department of Neurosurgery, Memorial Sloan Kettering Cancer Center, New York, New York, USA
| | - M Harrison Snyder
- Department of Neurosurgery, Tufts Medical Center, Boston, Massachusetts, USA
| | - Jonathan Blalock
- University of South Carolina School of Medicine Greenville, Greenville, South Carolina, USA
| | - Shandong Wu
- Department of Neurosurgery, UPMC Healthcare System, Pittsburgh, Pennsylvania, USA
| | - Enyinna Nwachuku
- Department of Neurosurgery, Cleveland Clinic Foundation, Cleveland, Ohio, USA
| | - Aditya Mittal
- Department of Neurosurgery, University of Pittsburgh Medical Center Medical School, Pittsburgh, Pennsylvania, USA
| | - David O Okonkwo
- Department of Neurosurgery, UPMC Healthcare System, Pittsburgh, Pennsylvania, USA
| | - Ryan T Kellogg
- Department of Neurosurgery, University of Virginia, Charlottesville, Virginia, USA
| |
|
26
|
Biju VG, Schmitt AM, Engelmann B. Assessing the Influence of Sensor-Induced Noise on Machine-Learning-Based Changeover Detection in CNC Machines. SENSORS (BASEL, SWITZERLAND) 2024; 24:330. [PMID: 38257422 PMCID: PMC10819623 DOI: 10.3390/s24020330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 12/18/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024]
Abstract
The noise in sensor data has a substantial impact on the reliability and accuracy of machine learning (ML) algorithms. A comprehensive framework is proposed to analyze the effects of diverse noise inputs in sensor data on the accuracy of ML models. Through extensive experimentation and evaluation, this research examines the resilience of a LightGBM ML model to ten different noise models, including Flicker, Impulse, Gaussian, Brown, and Periodic noise. A thorough analytical approach with various statistical metrics in a Monte Carlo simulation setting was followed. It was found that Gaussian and Colored noise were detrimental when compared to Flicker and Brown noise, which were identified as safe noise categories. A safe threshold limit of noise intensity was found for Gaussian noise but not for the other noise types. This research employed the use case of changeover detection in computer numerical control (CNC) manufacturing machines and the corresponding data from the publicly funded research project OBerA.
Affiliation(s)
| | | | - Bastian Engelmann
- Institute of Digital Engineering, Technical University of Applied Sciences Wuerzburg-Schweinfurt, 97421 Schweinfurt, Germany; (V.G.B.)
| |
|
27
|
Jain P, Gupta S. Enhancing blood flow prediction in multi-exposure laser speckle contrast imaging through ensemble learning with K-mean clustering. Biomed Phys Eng Express 2024; 10:025005. [PMID: 38109789 DOI: 10.1088/2057-1976/ad16c2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 12/18/2023] [Indexed: 12/20/2023]
Abstract
Purpose. Accurately visualizing and measuring blood flow is of utmost importance in maintaining optimal health and preventing the onset of various chronic diseases. One promising imaging technique that aids in visualizing perfusion in biological tissues is Multi-exposure Laser Speckle Contrast Imaging (MELSCI). The MELSCI technique allows real-time quantitative measurements using multiple exposure times to obtain precise and reliable blood flow data. Additionally, the application of machine learning (ML) techniques can further enhance the accuracy of blood flow prediction in this imaging modality. Method. Our study focused on developing and evaluating Ensemble Learning ML techniques along with clustering algorithms for predicting blood flow rates in MELSCI. The effectiveness of these techniques was assessed using performance parameters, including accuracy, F1-score, precision, recall, specificity, and classification error rate. Result. Notably, the study revealed that Ensemble Learning with clustering emerged as the most accurate technique, achieving an impressive accuracy rate of 98.5%. Furthermore, it demonstrated a recall of more than 91%, an F1-score and precision of more than 90%, a specificity of 99%, and the lowest classification error of 1.5%, highlighting its suitability and sustainability for flow prediction in MELSCI. Conclusion. The study's findings imply that Ensemble Learning can significantly contribute to enhancing the accuracy of blood flow prediction in MELSCI. This advancement holds substantial promise for healthcare professionals and researchers, as it facilitates improved understanding and assessment of perfusion within biological tissues, which will contribute to the maintenance of good health and prevention of chronic diseases.
Affiliation(s)
- Pankaj Jain
- National Institute of Technology Raipur, Raipur, CG, 492010, India
| | - Saurabh Gupta
- National Institute of Technology Raipur, Raipur, CG, 492010, India
| |
|
28
|
Ikuta S, Aihara T, Nakajima T, Yamanaka N. Predicting Pathological Response to Preoperative Chemotherapy in Pancreatic Ductal Adenocarcinoma Using Post-Chemotherapy Computed Tomography Radiomics. Cureus 2024; 16:e52193. [PMID: 38348011 PMCID: PMC10859726 DOI: 10.7759/cureus.52193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/12/2024] [Indexed: 02/15/2024] Open
Abstract
INTRODUCTION Assessing the response to preoperative treatment in pancreatic cancer provides valuable information for guiding subsequent treatment strategies. The present study aims to develop and validate a computed tomography (CT) radiomics-based machine learning (ML) model for predicting pathological response (PR) to preoperative chemotherapy in pancreatic ductal adenocarcinoma (PDAC). METHODS Retrospective data were analyzed from 86 PDAC patients undergoing neoadjuvant or conversion chemotherapy followed by surgical resection from January 2018 to May 2023. The cohort was randomly divided into training (70%, n = 60) and testing (30%, n = 26) sets. Favorable PR was defined as Evans grade IIb or greater. Radiomic features were extracted from post-chemotherapy CT images, and dimensionality reduction was performed using the least absolute shrinkage and selection operator (LASSO) logistic regression. Four ML classifiers (Light Gradient Boosting Machine (LGBM), Random Forest, AdaBoost, and Quadratic Discriminant Analysis) were evaluated for predicting a favorable PR. Model performance was primarily assessed using the area under the receiver operating characteristic curve (AUC), Brier score, and decision curve analysis. RESULTS Forty-one (47.7%) patients had a favorable PR. LASSO analysis on the training set identified five radiomic features. The LGBM model demonstrated the best performance, with a training AUC of 0.902 and a testing AUC of 0.923. It also exhibited the lowest Brier scores, both in training (0.136) and testing (0.135). Decision curve analysis further confirmed its clinical potential. CONCLUSION The CT radiomics-based ML model exhibited promising performance in predicting PR in PDAC after neoadjuvant/conversion chemotherapy. This suggests clinical utility in optimizing surgical candidates and timing of surgery, leading to personalized treatment strategies.
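The Brier score reported in this abstract is the mean squared difference between predicted probabilities and observed binary outcomes, so lower values indicate better calibrated and sharper predictions. The following is a minimal sketch of that definition; the function name `brier_score` and the toy probabilities are hypothetical and are not data from the study.

```python
from typing import Sequence


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or len(probabilities) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predicted probabilities of a favorable response and observed outcomes.
predicted = [0.9, 0.2, 0.7, 0.4]
observed = [1, 0, 1, 0]
model_score = brier_score(predicted, observed)

# Reference point: an uninformative model that always predicts 0.5 scores 0.25.
baseline_score = brier_score([0.5] * len(observed), observed)
print(f"model = {model_score:.3f}, baseline = {baseline_score:.3f}")
```

Comparing against the constant 0.5 baseline gives a rough sense of scale: scores well below 0.25 suggest the probabilities carry real information about the outcome.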
|
29
|
El-Sherbini AH, Shah A, Cheng R, Elsebaie A, Harby AA, Redfearn D, El-Diasty M. Machine Learning for Predicting Postoperative Atrial Fibrillation After Cardiac Surgery: A Scoping Review of Current Literature. Am J Cardiol 2023; 209:66-75. [PMID: 37871512 DOI: 10.1016/j.amjcard.2023.09.079] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/11/2023] [Revised: 09/12/2023] [Accepted: 09/21/2023] [Indexed: 10/25/2023]
Abstract
Postoperative atrial fibrillation (POAF) occurs in 20% to 55% of patients who undergo cardiac surgery. Machine learning (ML) has been increasingly employed in monitoring, screening, and identifying different cardiovascular clinical conditions. It was proposed that ML may be a useful tool for predicting POAF after cardiac surgery. An electronic database search was conducted on Medline, EMBASE, Cochrane, Google Scholar, and ClinicalTrials.gov to identify primary studies that investigated the role of ML in predicting POAF after cardiac surgery. A total of 5,955 citations were subjected to title and abstract screening, and ultimately 5 studies were included. The reported incidence of POAF ranged from 21.5% to 37.1%. The studied ML models included: deep learning, decision trees, logistic regression, support vector machines, gradient boosting decision tree, gradient-boosted machine, K-nearest neighbors, neural network, and random forest models. The sensitivity of the reported ML models ranged from 0.22 to 0.91, the specificity from 0.64 to 0.84, and the area under the receiver operating characteristic curve from 0.67 to 0.94. Age, gender, left atrial diameter, glomerular filtration rate, and duration of mechanical ventilation were significant clinical risk factors for POAF. Limited evidence suggests that machine learning models may play a role in predicting atrial fibrillation after cardiac surgery because of their ability to detect different patterns of correlations and to incorporate several demographic and clinical variables. However, the heterogeneity of the included studies and the lack of external validation are the most important limitations against the incorporation of these models into routine practice.
Keywords: artificial intelligence, cardiac surgery, decision tree, deep learning, gradient-boosted machine, gradient boosting decision tree, k-nearest neighbors, logistic regression, machine learning, neural network, postoperative atrial fibrillation, postoperative complications, random forest, risk scores, scoping review, support vector machine.
Affiliation(s)
| | - Aryan Shah
- School of Medicine, Queen's University, Kingston, Ontario, Canada
| | - Richard Cheng
- School of Medicine, Queen's University, Kingston, Ontario, Canada
| | | | - Ahmed A Harby
- The School of Computing, Queen's University, Kingston, Ontario, Canada
| | - Damian Redfearn
- Division of Cardiology, Department of Medicine, Queen's University, Kingston, Ontario, Canada
| | - Mohammad El-Diasty
- Division of Cardiac Surgery, University Hospitals Cleveland Medical Center, Cleveland, Ohio, USA.
30
Guldogan E, Yagin FH, Pinar A, Colak C, Kadry S, Kim J. A proposed tree-based explainable artificial intelligence approach for the prediction of angina pectoris. Sci Rep 2023; 13:22189. [PMID: 38092844 PMCID: PMC10719282 DOI: 10.1038/s41598-023-49673-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 12/11/2023] [Indexed: 12/17/2023] Open
Abstract
Cardiovascular diseases (CVDs) are a serious public health issue responsible for numerous fatalities and impairments. Ischemic heart disease (IHD) is one of the most prevalent and deadliest types of CVD and is responsible for 45% of all CVD-related fatalities. IHD occurs when the blood supply to the heart is reduced due to narrowed or blocked arteries, which causes the chest pain known as angina pectoris (AP). AP is a common symptom of IHD and can indicate a higher risk of heart attack or sudden cardiac death. Therefore, it is important to diagnose and treat AP promptly and effectively. To forecast AP in women, we constructed a novel artificial intelligence (AI) method employing the tree-based algorithm known as the Explainable Boosting Machine (EBM). EBM is a machine learning (ML) technique that combines the interpretability of linear models with the flexibility and accuracy of gradient boosting. We applied EBM to a dataset of 200 female patients, 100 with AP and 100 without AP, and extracted the most relevant features for AP prediction. We then evaluated the performance of EBM against other AI methods: Logistic Regression (LR), Categorical Boosting (CatBoost), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and Light Gradient Boosting Machine (LightGBM). We found that EBM was the most accurate and well-balanced technique for forecasting AP, with accuracy (0.925) and Youden's index (0.960). We also examined the global and local explanations provided by EBM to better understand how each feature affected the prediction and how each patient was classified. Our research showed that EBM is a useful AI method for predicting AP in women and identifying the risk factors related to it. This can help clinicians provide personalized and evidence-based care for female patients with AP.
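Both metrics named in this abstract can be derived from a binary confusion matrix. The sketch below is illustrative only: the function names and the counts are hypothetical and are not taken from the study.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


# Hypothetical counts for a test set of 50 positives and 50 negatives.
tp, fn, tn, fp = 45, 5, 40, 10
acc = accuracy(tp, fn, tn, fp)
j = youden_index(tp, fn, tn, fp)
print(f"accuracy={acc:.3f}, Youden's J={j:.3f}")
```

Accuracy weights every case equally, whereas Youden's J weights the two classes equally regardless of their sizes.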
Affiliation(s)
- Emek Guldogan
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey
| | - Fatma Hilal Yagin
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey.
| | - Abdulvahap Pinar
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey
| | - Cemil Colak
- Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, 44280, Malatya, Turkey
| | - Seifedine Kadry
- Noroff University College, Kristiansand, Norway
- Artificial Intelligence Research Center (AIRC), Ajman University, 346, Ajman, United Arab Emirates
- Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon
| | - Jungeun Kim
- Department of Software, Kongju National University, Cheonan, 31080, Korea.
31
Maulana A, Noviandy TR, Suhendra R, Earlia N, Bulqiah M, Idroes GM, Niode NJ, Sofyan H, Subianto M, Idroes R. Evaluation of atopic dermatitis severity using artificial intelligence. NARRA J 2023; 3:e511. [PMID: 38450339 PMCID: PMC10914065 DOI: 10.52225/narra.v3i3.511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 12/18/2023] [Indexed: 03/08/2024]
Abstract
Atopic dermatitis is a prevalent and persistent chronic inflammatory skin disorder whose severity is difficult to assess accurately. The aim of this study was to evaluate deep learning models for automated atopic dermatitis severity scoring using a dataset of individuals of Acehnese ethnicity in Indonesia. The dataset of clinical images was collected from 250 patients at Dr. Zainoel Abidin Hospital, Banda Aceh, Indonesia and labeled by dermatologists as mild, moderate, severe, or none. Five pretrained convolutional neural network (CNN) architectures were evaluated: ResNet50, VGGNet19, MobileNetV3, MnasNet, and EfficientNetB0. Accuracy, precision, sensitivity, specificity, and F1-score were used to assess the models. Among the models, ResNet50 emerged as the most proficient, demonstrating an accuracy of 89.8%, precision of 90.00%, sensitivity of 89.80%, specificity of 96.60%, and an F1-score of 89.85%. These results highlight the potential of incorporating advanced, data-driven models into the field of dermatology. These models can serve as invaluable tools to assist dermatologists in making early and precise assessments of atopic dermatitis severity and therefore improve patient care and outcomes.
Affiliation(s)
- Aga Maulana
- Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia
| | - Teuku R Noviandy
- Department of Informatics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia
| | - Rivansyah Suhendra
- Department of Information Technology, Faculty of Engineering, Universitas Teuku Umar, Meulaboh, Indonesia
| | - Nanda Earlia
- Dermatology Division, Dr. Zainoel Abidin Hospital, Banda Aceh, Indonesia
- Department of Dermatology and Venereology, Faculty of Medicine, Universitas Syiah Kuala, Banda Aceh, Indonesia
| | - Mikyal Bulqiah
- Dermatology Division, Dr. Zainoel Abidin Hospital, Banda Aceh, Indonesia
| | - Ghazi M Idroes
- Department of Occupational Health and Safety, Faculty of Health Sciences, Universitas Abulyatama, Aceh Besar, Indonesia
| | - Nurdjannah J Niode
- Department of Dermatology and Venereology, Faculty of Medicine, Sam Ratulangi University, Manado, Indonesia
| | - Hizir Sofyan
- Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia
| | - Muhammad Subianto
- Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia
| | - Rinaldi Idroes
- Department of Pharmacy, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia
- Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Syiah Kuala, Banda Aceh, Indonesia
32
Zuo D, Yang L, Jin Y, Qi H, Liu Y, Ren L. Machine learning-based models for the prediction of breast cancer recurrence risk. BMC Med Inform Decis Mak 2023; 23:276. [PMID: 38031071 PMCID: PMC10688055 DOI: 10.1186/s12911-023-02377-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 11/17/2023] [Indexed: 12/01/2023] Open
Abstract
Breast cancer is the most common malignancy diagnosed in women worldwide. The prevalence and incidence of breast cancer are increasing every year; therefore, early diagnosis along with suitable relapse detection is an important strategy for prognosis improvement. This study aimed to compare different machine learning algorithms to select the best model for predicting breast cancer recurrence. The prediction model was developed by using eleven different machine learning (ML) algorithms: logistic regression (LR), random forest (RF), support vector classification (SVC), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), decision tree, multilayer perceptron (MLP), linear discriminant analysis (LDA), adaptive boosting (AdaBoost), Gaussian naive Bayes (GaussianNB), and light gradient boosting machine (LightGBM). The area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F1 score were used to evaluate the performance of the prognostic model. Based on performance, the optimal ML model was selected, and feature importance was ranked by Shapley Additive Explanation (SHAP) values. Compared to the other 10 algorithms, the AdaBoost algorithm had the best performance for predicting breast cancer recurrence and was adopted in the establishment of the prediction model. Moreover, CA125, CEA, Fbg, and tumor diameter were found to be the most important features in our dataset to predict breast cancer recurrence. More importantly, our study is the first to use the SHAP method to make an AdaBoost-based breast cancer recurrence prediction model more interpretable for clinicians. The AdaBoost algorithm offers a clinical decision support model and successfully identifies the recurrence of breast cancer.
Affiliation(s)
- Duo Zuo
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
| | - Lexin Yang
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
| | - Yu Jin
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- Tongji University Cancer Center, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
| | - Huan Qi
- China Mobile Group Tianjin Company Limited, Tianjin, 300308, China
| | - Yahui Liu
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China
- National Clinical Research Center for Cancer, Tianjin, 300060, China
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China
| | - Li Ren
- Department of Clinical Laboratory, Tianjin Medical University Cancer Institute & Hospital, Tianjin, 300060, China.
- National Clinical Research Center for Cancer, Tianjin, 300060, China.
- Tianjin's Clinical Research Center for Cancer, Tianjin, 300060, China.
- Key Laboratory of Cancer Prevention and Therapy, Tianjin, 300060, China.
- Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, 300060, China.
33
Liu JX, Na RS, Yang LJ, Huang XR, Zhao X. Discovery of potential RIPK1 inhibitors by machine learning and molecular dynamics simulations. Phys Chem Chem Phys 2023; 25:31418-31430. [PMID: 37962373 DOI: 10.1039/d3cp03755j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Receptor-interacting serine/threonine-protein kinase 1 (RIPK1) plays a crucial role in inflammation and cell death, so it is a promising target for the treatment of autoimmune, inflammatory, neurodegenerative, and ischemic diseases. So far, there are no approved RIPK1 inhibitors available. In this study, four machine learning algorithms (random forest, extra trees, extreme gradient boosting and light gradient boosting machine) were employed to predict small molecule inhibitors of RIPK1. The statistical metrics revealed similar performance and demonstrated outstanding predictive capabilities in all four models. Molecular docking and clustering analysis were employed to confirm six compounds that are structurally distinct from existing RIPK1 inhibitors. Subsequent molecular dynamics simulations were performed to evaluate the binding ability of these compounds. Using the Shapley additive explanation (SHAP) method, bit 1855 was identified as the most significant molecular fingerprint fragment. The findings suggest that these six small molecules exhibit promising potential for targeting RIPK1 in associated diseases. Notably, the Cpd-1 small molecule (ZINC000085897746) originates from the natural product Musa acuminata, warranting further attention and investigation.
Affiliation(s)
- Ji-Xiang Liu
- Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Liutiao Road #2, Changchun 130021, China.
| | - Ri-Song Na
- Collaborative Innovation Center of Henan Grain Crops, National Key Laboratory of Wheat and Maize Crop Science, College of Plant Protection, Henan Agricultural University, Zhengzhou 450002, China
| | - Lian-Juan Yang
- Department of Medical Mycology, Shanghai Skin Disease Hospital, Tongji University School of Medicine, Shanghai, 200443, China
| | - Xu-Ri Huang
- Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Liutiao Road #2, Changchun 130021, China.
| | - Xi Zhao
- Institute of Theoretical Chemistry, College of Chemistry, Jilin University, Liutiao Road #2, Changchun 130021, China.
34
Debelee TG. Skin Lesion Classification and Detection Using Machine Learning Techniques: A Systematic Review. Diagnostics (Basel) 2023; 13:3147. [PMID: 37835889 PMCID: PMC10572538 DOI: 10.3390/diagnostics13193147] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/30/2023] [Revised: 09/22/2023] [Accepted: 09/24/2023] [Indexed: 10/15/2023] Open
Abstract
Skin lesion analysis is essential for the early detection and management of a number of dermatological disorders. Learning-based methods for skin lesion analysis have drawn much attention lately because of improvements in computer vision and machine learning techniques. This survey reviews the most recent methods for skin lesion classification, segmentation, and detection. It first discusses the significance of skin lesion analysis in healthcare and the difficulties of physical inspection. State-of-the-art papers targeting skin lesion classification are then covered in depth, with the goal of correctly identifying the type of skin lesion from dermoscopic, macroscopic, and other lesion image formats. The contributions and limitations of the techniques used in the selected papers, including deep learning architectures and conventional machine learning methods, are examined. The survey then looks into papers focused on skin lesion segmentation and detection techniques that aim to identify the precise borders of skin lesions and classify them accordingly. These techniques make it easier to conduct subsequent analyses and allow for precise measurements and quantitative evaluations. Well-known segmentation algorithms, including deep-learning-based, graph-based, and region-based ones, are discussed, along with the difficulties, datasets, and evaluation metrics particular to skin lesion segmentation. Throughout the survey, notable datasets, benchmark challenges, and evaluation metrics relevant to skin lesion analysis are highlighted, providing a comprehensive overview of the field. The paper concludes with a summary of the major trends, challenges, and potential future directions in skin lesion classification, segmentation, and detection, aiming to inspire further advancements in this critical domain of dermatological research.
Affiliation(s)
- Taye Girma Debelee
- Ethiopian Artificial Intelligence Institute, Addis Ababa 40782, Ethiopia;
- Department of Electrical and Computer Engineering, Addis Ababa Science and Technology University, Addis Ababa 16417, Ethiopia
35
Kuo CY, Kuo LJ, Lin YK. Artificial intelligence based system for predicting permanent stoma after sphincter saving operations. Sci Rep 2023; 13:16039. [PMID: 37749194 PMCID: PMC10519982 DOI: 10.1038/s41598-023-43211-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Accepted: 09/21/2023] [Indexed: 09/27/2023] Open
Abstract
Although the goal of rectal cancer treatment is to restore gastrointestinal continuity, some patients with rectal cancer develop a permanent stoma (PS) after sphincter-saving operations. Although many studies have identified the risk factors and causes of PS, few have precisely predicted the probability of PS formation before surgery. This study aimed to validate whether an artificial intelligence model can accurately predict PS formation in patients with rectal cancer after sphincter-saving operations. Patients with rectal cancer who underwent a sphincter-saving operation at Taipei Medical University Hospital between January 1, 2012, and December 31, 2021, were retrospectively included in this study. A machine learning technique was used to predict whether a PS would form after a sphincter-saving operation. We included 19 routinely available preoperative variables in the artificial intelligence analysis. To evaluate the performance of the model, 6 metrics were used: accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve. In our classification pipeline, the data were randomly divided into a training set (80% of the data) and a validation set (20% of the data). The artificial intelligence models were trained using the training dataset, and their performance was evaluated using the validation dataset. Synthetic minority oversampling was used to address the class imbalance. A total of 428 patients were included, and the PS rate was 13.6% (58/428) in the training set. The logistic regression (LR), Gaussian Naïve Bayes (GNB), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), random forest (RF), decision tree (DT) and light gradient boosting machine (LightGBM) algorithms were employed.
The accuracies of the LR, GNB, XGB, GB, RF, DT and LightGBM models were 70%, 76%, 89%, 93%, 95%, 79% and 93%, respectively. The area under the receiver operating characteristic curve values were 0.79 for the LR model, 0.84 for the GNB model, 0.95 for the XGB model, 0.95 for the GB model, 0.99 for the RF model, 0.79 for the DT model and 0.98 for the LightGBM model. The key predictors identified were the distance of the lesion from the anal verge, clinical N stage, age, sex, American Society of Anesthesiologists score, and preoperative albumin and carcinoembryonic antigen levels. Integration of artificial intelligence with available preoperative data can potentially predict stoma outcomes after sphincter-saving operations. Our model exhibited excellent predictive ability and can improve the process of obtaining informed consent.
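Synthetic minority oversampling, mentioned in this abstract, creates new minority-class samples by interpolating between an existing minority sample and a nearby minority neighbor. The following is a minimal sketch of that idea, not the authors' pipeline: it assumes a single nearest neighbor (standard SMOTE samples among k neighbors), and the function name and the two-feature data are hypothetical.

```python
import random


def smote_like(minority, n_new, seed=0):
    """Generate n_new synthetic points between minority samples.

    Simplification (assumption): each base point is paired with its single
    nearest minority neighbor. Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority point by squared Euclidean distance.
        neighbor = min(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class samples with two scaled features each.
minority = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Oversampling of this kind is normally applied to the training split only, so that the validation set keeps the original class distribution.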
Affiliation(s)
- Chih-Yu Kuo
- Department of Surgery, Taipei Medical University Hospital, Taipei, Taiwan
| | - Li-Jen Kuo
- Division of Colorectal Surgery, Department of Surgery, Taipei Medical University Hospital, Taipei Medical University, Taipei, Taiwan
- Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Taipei Cancer Center, Taipei Medical University, Taipei, Taiwan
| | - Yen-Kuang Lin
- Graduate Institute of Athletics and Coaching Science, National Taiwan Sport University, Taoyuan, Taiwan.
36
Wang DC, Xu WD, Qin Z, Fu L, Lan YY, Liu XY, Huang AF. Systemic lupus erythematosus with high disease activity identification based on machine learning. Inflamm Res 2023; 72:1909-1918. [PMID: 37725103 DOI: 10.1007/s00011-023-01793-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 08/22/2023] [Accepted: 08/28/2023] [Indexed: 09/21/2023] Open
Abstract
OBJECTIVE Clinical evaluation of systemic lupus erythematosus (SLE) disease activity is limited and inconsistent, and high disease activity seriously affects SLE patients. This study aims to generate a machine learning model to identify SLE patients with high disease activity. METHOD A total of 1014 SLE patients with low disease activity and 453 SLE patients with high disease activity were included. A total of 94 clinical and laboratory variables and 17 meteorological indicators were collected. After data preprocessing, we used mutual information and MultiSURF to evaluate feature importance and select features. The selected features were used for machine learning modeling. Model performance was evaluated and verified by a series of binary classification indicators. RESULTS Integrated feature selection identified hematuria, proteinuria, pyuria, low complement, precipitation, sunlight and other features for model construction. After hyperparameter optimization, the LGB model had the best performance (ROC: AUC = 0.930; PRC: AUC = 0.911, APS = 0.913; balanced accuracy: 0.856), and the naive Bayes model had the worst (ROC: AUC = 0.849; PRC: AUC = 0.719, APS = 0.714; balanced accuracy: 0.705). Finally, the selected features showed good consistency in the composite feature importance bar plot. CONCLUSION We identify SLE patients with high disease activity using a simple machine learning pipeline, in particular the LGB model based on proteinuria, hematuria, pyuria and other features screened out by collective feature selection.
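Mutual information, one of the two feature-selection scores named in this abstract, measures how much knowing a feature reduces uncertainty about the label. A minimal sketch for discrete features follows; the feature names and values are hypothetical and serve only to show the calculation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical binary data: 1 = finding present / high disease activity.
high_activity = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]    # tracks the label exactly
uninformative_feature = [0, 1, 0, 1]  # independent of the label

print(mutual_information(informative_feature, high_activity))
print(mutual_information(uninformative_feature, high_activity))
```

Features can then be ranked by this score and the top-ranked ones kept; continuous variables would need binning or a dedicated estimator first.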
Affiliation(s)
- Da-Cheng Wang
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
| | - Wang-Dong Xu
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China.
| | - Zhen Qin
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China
| | - Lu Fu
- Laboratory Animal Center, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
| | - You-Yu Lan
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China
| | - Xiao-Yan Liu
- Department of Evidence-Based Medicine, Southwest Medical University, 1 Xianglin Road, Luzhou, 646000, Sichuan, China
| | - An-Fang Huang
- Department of Rheumatology and Immunology, Affiliated Hospital of Southwest Medical University, 25 Taiping Road, Luzhou, 646000, Sichuan, China.
37
Shi Y, Zhang G, Ma C, Xu J, Xu K, Zhang W, Wu J, Xu L. Machine learning algorithms to predict intraoperative hemorrhage in surgical patients: a modeling study of real-world data in Shanghai, China. BMC Med Inform Decis Mak 2023; 23:156. [PMID: 37563676 PMCID: PMC10416513 DOI: 10.1186/s12911-023-02253-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/14/2023] [Accepted: 07/31/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND Prediction tools for various intraoperative bleeding events remain scarce. We aim to develop machine learning-based models and identify the most important predictors using real-world data from electronic medical records (EMRs). METHODS An established database of surgical inpatients in Shanghai was utilized for analysis. A total of 51,173 inpatients were assessed for eligibility. Of these, 48,543 inpatients were retained in the dataset and divided into haemorrhage (N = 9728) and without-haemorrhage (N = 38,815) groups according to their bleeding during the procedure. Candidate predictors were selected from 27 variables, including sex (N = 48,543), age (N = 48,543), BMI (N = 48,543), renal disease (N = 26), heart disease (N = 1309), hypertension (N = 9579), diabetes (N = 4165), coagulopathy (N = 47), and other features. The models were constructed with 7 machine learning algorithms: light gradient boosting (LGB), extreme gradient boosting (XGB), categorical boosting (CatB), adaptive boosting of decision trees (AdaB), logistic regression (LR), long short-term memory (LSTM), and multilayer perceptron (MLP). The area under the receiver operating characteristic curve (AUC) was used to evaluate model performance. RESULTS The mean age of the inpatients was 53 ± 17 years, and 57.5% were male. LGB showed the best predictive performance for intraoperative bleeding combining multiple indicators (AUC = 0.933, sensitivity = 0.87, specificity = 0.85, accuracy = 0.87) compared with XGB, CatB, AdaB, LR, MLP and LSTM. The three most important predictors identified by LGB were operative time, D-dimer (DD), and age. CONCLUSIONS We proposed LGB as the best Gradient Boosting Decision Tree (GBDT) algorithm for the evaluation of intraoperative bleeding. It is considered a simple and useful tool for predicting intraoperative bleeding in clinical settings. Operative time, DD, and age should receive attention.
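The AUC used here to compare models equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical, and the quadratic loop is for illustration rather than for datasets of the size described above.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted bleeding risks and observed outcomes (1 = haemorrhage).
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(auc_by_ranking(scores, labels))
```

Because it depends only on the ordering of scores, the AUC is unaffected by the classification threshold, unlike sensitivity, specificity, and accuracy.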
Affiliation(s)
- Ying Shi
- Hongqiao International Institute of Medicine, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai, 200336, China
| | - Guangming Zhang
- Department of Anesthesiology, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai, 200336, China
| | - Chiye Ma
- Shanghai Institute of Computing Technology, 546 YuYuan Road, Shanghai, 200040, China
| | - Jiading Xu
- Shanghai Institute of Computing Technology, 546 YuYuan Road, Shanghai, 200040, China
| | - Kejia Xu
- Department of Anesthesiology, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai, 200336, China
| | - Wenyi Zhang
- Department of Anesthesiology, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai, 200336, China
| | - Jianren Wu
- Shanghai Institute of Computing Technology, 546 YuYuan Road, Shanghai, 200040, China
| | - Liling Xu
- Hongqiao International Institute of Medicine, Tongren Hospital, Shanghai Jiao Tong University School of Medicine, 1111 XianXia Road, Shanghai, 200336, China.
38
Su Y, Huang C, Zhu W, Lyu X, Ji F. Multi-party Diabetes Mellitus risk prediction based on secure federated learning. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
39
Hashizume T, Ozawa Y, Ying BW. Employing active learning in the optimization of culture medium for mammalian cells. NPJ Syst Biol Appl 2023; 9:20. [PMID: 37253825 DOI: 10.1038/s41540-023-00284-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/15/2023] [Accepted: 05/18/2023] [Indexed: 06/01/2023] Open
Abstract
Medium optimization is a crucial step during cell culture for biopharmaceutics and regenerative medicine; however, this step remains challenging, as both media and cells are highly complex systems. Here, we addressed this issue by employing active learning. Specifically, we introduced machine learning to cell culture experiments to optimize culture medium. The cell line HeLa-S3 and the gradient-boosting decision tree algorithm were used to find optimized media as pilot studies. To acquire the training data, cell culture was performed in a large variety of medium combinations. The cellular NAD(P)H abundance, represented as A450, was used to indicate the goodness of culture media. In active learning, regular and time-saving modes were developed using culture data at 168 h and 96 h, respectively. Both modes successfully fine-tuned 29 components to generate a medium for improved cell culture. Intriguingly, the two modes provided different predictions for the concentrations of vitamins and amino acids, and a significant decrease was commonly predicted for fetal bovine serum (FBS) compared to the commercial medium. In addition, active learning-assisted medium optimization significantly increased the cellular concentration of NAD(P)H, an active chemical with a constant abundance in living cells. Our study demonstrated the efficiency and practicality of active learning for medium optimization and provided valuable information for employing machine learning technology in cell biology experiments.
Affiliation(s)
- Takamasa Hashizume
- School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan
| | - Yuki Ozawa
- School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan
| | - Bei-Wen Ying
- School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8572, Ibaraki, Japan.
40
An Ensemble of Light Gradient Boosting Machine and Adaptive Boosting for Prediction of Type-2 Diabetes. INT J COMPUT INT SYS 2023. [DOI: 10.1007/s44196-023-00184-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023] Open
Abstract
Machine learning helps construct predictive models in clinical data analysis, stock price prediction, picture recognition, financial modelling, disease prediction, and diagnostics. This paper proposes machine learning ensemble algorithms to forecast diabetes. The ensemble combines k-NN, Naive Bayes (Gaussian), Random Forest (RF), AdaBoost, and a recently designed Light Gradient Boosting Machine (LightGBM). The proposed ensembles inherit the detection ability of LightGBM to boost accuracy. Under fivefold cross-validation, the proposed ensemble models perform better than other recent models. The k-NN, AdaBoost, and LightGBM jointly achieve 90.76% detection accuracy. The receiver operating curve analysis shows that k-NN, RF, and LightGBM successfully solve the class imbalance issue of the underlying dataset.
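The abstract does not state how the base models' predictions are combined. One common option is hard majority voting, sketched below as an assumption rather than as the paper's method; the per-model predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class predictions from several models by hard majority voting.

    predictions_per_model: list of equal-length prediction lists, one per model.
    With an odd number of models and binary labels there are no ties.
    """
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted


# Hypothetical predictions from three base models (1 = diabetic).
knn_preds = [1, 0, 1, 0]
adaboost_preds = [1, 1, 0, 0]
lightgbm_preds = [0, 1, 1, 0]
print(majority_vote([knn_preds, adaboost_preds, lightgbm_preds]))
```

Under fivefold cross-validation, a vote like this would be taken on each held-out fold and the fold accuracies averaged.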
41
Su Y, Huang C, Yin W, Lyu X, Ma L, Tao Z. Diabetes Mellitus risk prediction using age adaptation models. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
42
|
Huang YH, Xie C, Chou CY, Jin Y, Li W, Wang M, Lu Y, Liu Z. Subtyping intractable functional constipation in children using clinical and laboratory data in a classification model. Front Pediatr 2023; 11:1148753. [PMID: 37168808 PMCID: PMC10165123 DOI: 10.3389/fped.2023.1148753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 04/03/2023] [Indexed: 05/13/2023] Open
Abstract
Background: Children with intractable functional constipation (IFC) who are refractory to traditional pharmacological intervention develop severe symptoms that can persist even in adulthood, resulting in a substantial deterioration in their quality of life. To better manage IFC patients, efficient early-stage subtyping of IFC into its three subtypes, normal transit constipation (NTC), outlet obstruction constipation (OOC), and slow transit constipation (STC), is crucial. Machine learning can classify IFC early using validated questionnaires and the serum concentrations of gastrointestinal motility-related hormones.
Method: One hundred and one children with IFC and 50 controls were enrolled in this study. Three supervised machine-learning methods, support vector machine, random forest, and light gradient boosting machine (LGBM), were used to classify children with IFC into the three subtypes based on their symptom severity, self-efficacy, and quality of life, which were quantified using certified questionnaires, and on their serum concentrations of gastrointestinal hormones, evaluated with enzyme-linked immunosorbent assay. The accuracy of machine learning subtyping was evaluated with respect to radiopaque markers.
Results: Of 101 IFC patients, 37 had NTC, 49 had OOC, and 15 had STC. The variables significant for IFC subtype classification, according to SelectKBest, were stool frequency, the satisfaction domain of the Patient Assessment of Constipation Quality of Life questionnaire (PAC-QOL), the emotional self-efficacy for Functional Constipation questionnaire (SEFCQ), motilin serum concentration, and vasoactive intestinal peptide serum concentration. Among the three models, the LGBM model demonstrated an accuracy of 83.8%, a precision of 84.5%, a recall of 83.6%, an F1-score of 83.4%, and an area under the receiver operating characteristic curve (AUROC) of 0.89 in discriminating IFC subtypes.
Conclusion: Using clinical characteristics measured by certified questionnaires and serum concentrations of the gastrointestinal hormones, machine learning can efficiently classify pediatric IFC into its three subtypes. Of the three models tested, the LGBM model is the most accurate for the classification of IFC, with an accuracy of 83.8%, demonstrating that machine learning is an efficient tool for the management of IFC in children.
Affiliation(s)
- Yi-Hsuan Huang
  - Department of Gastroenterology, Children’s Hospital of Nanjing Medical University, Nanjing, China
  - Medical School, Nanjing University, Nanjing, China
- Chenjia Xie
  - School of Electronic Science and Engineering, Nanjing University, Nanjing, China
- Chih-Yi Chou
  - College of Medicine, National Taiwan University, Taipei, Taiwan
- Yu Jin
  - Department of Gastroenterology, Children’s Hospital of Nanjing Medical University, Nanjing, China
  - Medical School, Nanjing University, Nanjing, China
- Wei Li
  - Department of Gastroenterology, Children’s Hospital of Nanjing Medical University, Nanjing, China
  - Department of Quality Management, Children’s Hospital of Nanjing Medical University, Nanjing, China
- Meng Wang
  - Department of Gastroenterology, Children’s Hospital of Nanjing Medical University, Nanjing, China
- Yan Lu
  - Department of Gastroenterology, Children’s Hospital of Nanjing Medical University, Nanjing, China
  - Correspondence: Yan Lu, Zhifeng Liu
- Zhifeng Liu
  - Department of Gastroenterology, Children’s Hospital of Nanjing Medical University, Nanjing, China
  - Medical School, Nanjing University, Nanjing, China
  - Correspondence: Yan Lu, Zhifeng Liu
|
43
|
Wang T, Yan Y, Xiang S, Tan J, Yang C, Zhao W. A comparative study of antihypertensive drugs prediction models for the elderly based on machine learning algorithms. Front Cardiovasc Med 2022; 9:1056263. [PMID: 36531716 PMCID: PMC9753549 DOI: 10.3389/fcvm.2022.1056263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 11/17/2022] [Indexed: 11/04/2023] Open
Abstract
Background: Globally, blood pressure management strategies have been ineffective, and a low percentage of patients receiving hypertension treatment have their blood pressure controlled. In this study, we aimed to build a medication prediction model by correlating patient attributes with medications to help physicians quickly and rationally match appropriate medications.
Methods: We collected clinical data from elderly hypertensive patients during hospitalization and combined statistical methods and machine learning (ML) algorithms to filter out typical indicators. We constructed five ML models, namely random forest (RF), support vector machine (SVM), light gradient boosting machine (LightGBM), artificial neural network (ANN), and naive Bayes (NB), and evaluated them on all datasets using 5-fold cross-validation. Model performance was evaluated using the micro-F1 score.
Results: Using statistical methods and ML algorithms for feature selection, we selected Age, SBP, DBP, Lymph, RBC, HCT, MCHC, PLT, AST, TBIL, Cr, UA, Urea, K, Na, Ga, TP, GLU, TC, TG, γ-GT, Gender, HTN CAD, and RI as the feature metrics of the models. LightGBM had the best prediction performance, with a micro-F1 of 78.45%, which was higher than that of the other four models.
Conclusion: The LightGBM model performs well in predicting antihypertensive medication regimens and can help improve the personalization of hypertension treatment.
Affiliation(s)
- Tiantian Wang
  - School of Medical Informatics, Chongqing Medical University, Chongqing, China
- Yongjie Yan
  - Medical Records and Statistics Office, The Third Affiliated Hospital of Army Medical University, Chongqing, China
- Shoushu Xiang
  - Medical Records and Statistics Room, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Juntao Tan
  - Operation Management Office, Affiliated Banan Hospital of Chongqing Medical University, Chongqing, China
- Chen Yang
  - School of Medical Informatics, Chongqing Medical University, Chongqing, China
- Wenlong Zhao
  - School of Medical Informatics, Chongqing Medical University, Chongqing, China
|
44
|
Li M, Chen H, Zhang H, Zeng M, Chen B, Guan L. Prediction of the Aqueous Solubility of Compounds Based on Light Gradient Boosting Machines with Molecular Fingerprints and the Cuckoo Search Algorithm. ACS OMEGA 2022; 7:42027-42035. [PMID: 36440111 PMCID: PMC9685740 DOI: 10.1021/acsomega.2c03885] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 10/18/2022] [Indexed: 06/16/2023]
Abstract
Aqueous solubility is one of the most important physicochemical properties in drug discovery. At present, the prediction of aqueous solubility of compounds is still a challenging problem. Machine learning has shown great potential in solubility prediction. Most machine learning models largely rely on the setting of hyperparameters, and their performance can be improved by setting the hyperparameters in a better way. In this paper, we used MACCS fingerprints to represent the structural features and optimized the hyperparameters of the light gradient boosting machine (LightGBM) with the cuckoo search algorithm (CS). Based on the above representation and optimization, the CS-LightGBM model was established to predict the aqueous solubility of 2446 organic compounds and the obtained prediction results were compared with those obtained with the other six different machine learning models (RF, GBDT, XGBoost, LightGBM, SVR, and BO-LightGBM). The comparison results showed that the CS-LightGBM model had a better prediction performance than the other six different models. RMSE, MAE, and R² of the CS-LightGBM model were, respectively, 0.7785, 0.5117, and 0.8575. In addition, this model has good scalability and can be used to solve solubility prediction problems in other fields such as solvent selection and drug screening.
|
45
|
Wang L, Wu M, Zhu C, Li R, Bao S, Yang S, Dong J. Ensemble learning based on efficient features combination can predict the outcome of recurrence-free survival in patients with hepatocellular carcinoma within three years after surgery. Front Oncol 2022; 12:1019009. [PMID: 36439437 PMCID: PMC9686395 DOI: 10.3389/fonc.2022.1019009] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/14/2022] [Accepted: 10/25/2022] [Indexed: 04/11/2024] Open
Abstract
Preoperative prediction of recurrence outcome in hepatocellular carcinoma (HCC) facilitates physicians' clinical decision-making. Preoperative imaging and related clinical baseline data of patients are valuable for evaluating prognosis. With the widespread application of machine learning techniques, the present study proposed an ensemble learning method based on efficient feature representations to predict recurrence outcomes within three years after surgery. Radiomics features during arterial phase (AP) and clinical data were selected for training the ensemble models. To improve the efficiency of the process, the lesion area was automatically segmented by 3D U-Net. The mIoU of the segmentation model was 0.8874, and the Light Gradient Boosting Machine (LightGBM) performed best, with an average accuracy of 0.7600, a recall of 0.7673, an F1 score of 0.7553, and an AUC of 0.8338 when given radiomics features during AP and clinical baseline indicators as input. The results show that the proposed strategy can predict the recurrence outcome within three years with reasonable accuracy, which helps physicians evaluate individual patients before surgery.
Affiliation(s)
- Liyang Wang
  - School of Clinical Medicine, Tsinghua University, Beijing, China
- Meilong Wu
  - Division of Hepatobiliary and Pancreas Surgery, Department of General Surgery, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, China
- Chengzhan Zhu
  - Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, Qingdao, China
- Rui Li
  - Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, Qingdao, China
- Shiyun Bao
  - Division of Hepatobiliary and Pancreas Surgery, Department of General Surgery, Shenzhen People’s Hospital (The Second Clinical Medical College, Jinan University; The First Affiliated Hospital, Southern University of Science and Technology), Shenzhen, Guangdong, China
- Shizhong Yang
  - Hepato-pancreato-biliary Center, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsing-hua University, Beijing, China
- Jiahong Dong
  - School of Clinical Medicine, Tsinghua University, Beijing, China
  - Hepato-pancreato-biliary Center, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsing-hua University, Beijing, China
|
46
|
Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, Wang F, Cheng F, Luo Y. Multimodal machine learning in precision health: A scoping review. NPJ Digit Med 2022; 5:171. [PMID: 36344814 PMCID: PMC9640667 DOI: 10.1038/s41746-022-00712-8] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 06/20/2022] [Accepted: 10/14/2022] [Indexed: 11/09/2022] Open
Abstract
Machine learning is frequently leveraged to tackle problems in the health sector, including clinical decision support. Its use has historically been focused on single-modal data. Attempts to improve prediction and mimic the multimodal nature of clinical expert decision-making have been met in the biomedical field of machine learning by fusing disparate data. This review was conducted to summarize the current studies in this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA extension for Scoping Reviews to characterize multi-modal data fusion in health. Search strings were established and used in databases: PubMed, Google Scholar, and IEEEXplore from 2011 to 2021. A final set of 128 articles was included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. Early fusion was the most common data merging strategy. Notably, there was an improvement in predictive performance when using data fusion. Lacking from the papers were clear clinical deployment strategies, FDA-approval, and analysis of how using multimodal approaches from diverse sub-populations may improve biases and healthcare disparities. These findings provide a summary on multimodal data fusion as applied to health diagnosis/prognosis problems. Few papers compared the outputs of a multimodal approach with a unimodal prediction. However, those that did achieved an average increase of 6.4% in predictive accuracy. Multi-modal machine learning, while more robust in its estimations over unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.
Affiliation(s)
- Adrienne Kline
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Hanyin Wang
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Yikuan Li
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Saya Dennis
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Meghan Hutch
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Zhenxing Xu
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Fei Wang
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Feixiong Cheng
  - Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, 44195, OH, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
|
47
|
Wang J, Zheng T, Liao Y, Geng S, Li J, Zhang Z, Shang D, Liu C, Yu P, Huang Y, Liu C, Liu Y, Liu S, Wang M, Liu D, Miao H, Li S, Zhang B, Huang A, Zhang Y, Qi X, Chen S. Machine learning prediction model for post-hepatectomy liver failure in hepatocellular carcinoma: A multicenter study. Front Oncol 2022; 12:986867. [PMID: 36408144 PMCID: PMC9667038 DOI: 10.3389/fonc.2022.986867] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/05/2022] [Accepted: 10/14/2022] [Indexed: 09/16/2023] Open
Abstract
Introduction: Post-hepatectomy liver failure (PHLF) is one of the most serious complications and causes of death in patients with hepatocellular carcinoma (HCC) after hepatectomy. This study aimed to develop a novel machine learning (ML) model based on the light gradient boosting machines (LightGBM) algorithm for predicting PHLF.
Methods: A total of 875 patients with HCC who underwent hepatectomy were randomized into a training cohort (n=612), a validation cohort (n=88), and a testing cohort (n=175). Shapley additive explanation (SHAP) was performed to determine the importance of individual variables. By combining these independent risk factors, an ML model for predicting PHLF was established. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and decision curve analyses (DCA) were used to evaluate the accuracy of the ML model and compare it to that of other noninvasive models.
Results: The AUCs of the ML model for predicting PHLF in the training cohort, validation cohort, and testing cohort were 0.944, 0.870, and 0.822, respectively. The ML model had a higher AUC for predicting PHLF than did other noninvasive models. The ML model for predicting PHLF was found to be more valuable than other noninvasive models.
Conclusion: A novel ML model for the prediction of PHLF using common clinical parameters was constructed and validated. The novel ML model performed better than did existing noninvasive models for the prediction of PHLF.
Affiliation(s)
- Jitao Wang
  - Xingtai Key Laboratory of Precision Medicine for Liver Cirrhosis and Portal Hypertension, Xingtai People’s Hospital, Xingtai, Hebei, China
  - Center of Portal Hypertension, Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, Jiangsu, China
- Tianlei Zheng
  - Artificial Intelligence Unit, Department of Medical Equipment Management, Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, Jiangsu, China
- Yong Liao
  - Xingtai Key Laboratory of Precision Medicine for Liver Cirrhosis and Portal Hypertension, Xingtai People’s Hospital, Xingtai, Hebei, China
- Shi Geng
  - Artificial Intelligence Unit, Department of Medical Equipment Management, Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China
- Jinlong Li
  - Xingtai Key Laboratory of Precision Medicine for Liver Cirrhosis and Portal Hypertension, Xingtai People’s Hospital, Xingtai, Hebei, China
- Zhanguo Zhang
  - Department of Hepatobiliary Surgery, Tongji Hospital Affiliated to Huazhong University of Science and Technology, Wuhan, Hubei, China
- Dong Shang
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Chengyu Liu
  - Xingtai Key Laboratory of Precision Medicine for Liver Cirrhosis and Portal Hypertension, Xingtai People’s Hospital, Xingtai, Hebei, China
- Peng Yu
  - Department of Hepatobiliary Surgery, Fifth Medical Center of People's Liberation Army (PLA) General Hospital, Beijing, China
- Yifei Huang
  - Institute of Portal Hypertension, The First Hospital of Lanzhou University, Lanzhou, China
- Chuan Liu
  - Center of Portal Hypertension, Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, Jiangsu, China
- Yanna Liu
  - Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Shanghao Liu
  - Center of Portal Hypertension, Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, Jiangsu, China
- Mingguang Wang
  - Center of Portal Hypertension, Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, Jiangsu, China
- Dengxiang Liu
  - Center of Portal Hypertension, Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, Jiangsu, China
- Hongrui Miao
  - Department of Hepatobiliary Surgery, Tongji Hospital Affiliated to Huazhong University of Science and Technology, Wuhan, Hubei, China
- Shuang Li
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Biao Zhang
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Anliang Huang
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Yewei Zhang
  - Department of Hepatobiliary Surgery, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu, China
- Xiaolong Qi
  - Center of Portal Hypertension, Department of Radiology, Zhongda Hospital, Medical School, Southeast University, Nanjing, Jiangsu, China
- Shubo Chen
  - Xingtai Key Laboratory of Precision Medicine for Liver Cirrhosis and Portal Hypertension, Xingtai People’s Hospital, Xingtai, Hebei, China
|
48
|
Kim YJ. Machine Learning Model Based on Radiomic Features for Differentiation between COVID-19 and Pneumonia on Chest X-ray. SENSORS (BASEL, SWITZERLAND) 2022; 22:6709. [PMID: 36081170 PMCID: PMC9460643 DOI: 10.3390/s22176709] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/29/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 06/15/2023]
Abstract
Machine learning approaches are employed to analyze differences in real-time reverse transcription polymerase chain reaction scans to differentiate between COVID-19 and pneumonia. However, these methods suffer from large training data requirements, unreliable images, and uncertain clinical diagnosis. Thus, in this paper, we used a machine learning model to differentiate between COVID-19 and pneumonia via radiomic features using a bias-minimized dataset of chest X-ray scans. We used logistic regression (LR), naive Bayes (NB), support vector machine (SVM), k-nearest neighbor (KNN), bagging, random forest (RF), extreme gradient boosting (XGB), and light gradient boosting machine (LGBM) to differentiate between COVID-19 and pneumonia based on training data. Further, we used a grid search to determine optimal hyperparameters for each machine learning model and 5-fold cross-validation to prevent overfitting. The identification performances of COVID-19 and pneumonia were compared with separately constructed test data for four machine learning models trained using the maximum probability, contrast, and difference variance of the gray level co-occurrence matrix (GLCM), and the skewness as input variables. The LGBM and bagging model showed the highest and lowest performances; the GLCM difference variance showed a high overall effect in all models. Thus, we confirmed that the radiomic features in chest X-rays can be used as indicators to differentiate between COVID-19 and pneumonia using machine learning.
Affiliation(s)
- Young Jae Kim
  - Department of Biomedical Engineering, Gachon University, 21, Namdong-daero 774 beon-gil, Namdong-gu, Inchon 21936, Korea
|
49
|
Machine learning of microvolt-level 12-lead electrocardiogram can help distinguish takotsubo syndrome and acute anterior myocardial infarction. CARDIOVASCULAR DIGITAL HEALTH JOURNAL 2022; 3:179-188. [PMID: 36046427 PMCID: PMC9422059 DOI: 10.1016/j.cvdhj.2022.07.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022] Open
|
50
|
Liu Q, Zhang M, He Y, Zhang L, Zou J, Yan Y, Guo Y. Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques. J Pers Med 2022; 12:jpm12060905. [PMID: 35743691 PMCID: PMC9224915 DOI: 10.3390/jpm12060905] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Revised: 05/21/2022] [Accepted: 05/27/2022] [Indexed: 02/04/2023] Open
Abstract
Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. The aim of this study is to build effective prediction models based on machine learning (ML) for the risk of type 2 diabetes mellitus (T2DM) in elderly Chinese adults. A retrospective cohort study was conducted using the health screening data of adults older than 65 years in Wuhan, China from 2018 to 2020. After strict data filtration, 127,031 records from the eligible participants were utilized. Overall, 8298 participants were diagnosed with incident T2DM during the 2-year follow-up (2019–2020). The dataset was randomly split into a training set (n = 101,625) and a test set (n = 25,406). We developed prediction models based on four ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Using LASSO regression, 21 prediction features were selected. Random under-sampling (RUS) was applied to address the class imbalance, and Shapley Additive Explanations (SHAP) were used to calculate and visualize feature importance. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The XGBoost model achieved the best performance (AUC = 0.7805, sensitivity = 0.6452, specificity = 0.7577, accuracy = 0.7503). Fasting plasma glucose (FPG), education, exercise, gender, and waist circumference (WC) were the top five important predictors. This study showed that the XGBoost model can be applied to screen individuals at high risk of T2DM in the early phase, which has strong potential for intelligent prevention and control of diabetes. The key features could also be useful for developing targeted diabetes prevention interventions.
Affiliation(s)
- Qing Liu
  - Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Miao Zhang
  - Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Yifeng He
  - School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Lei Zhang
  - School of Mathematics and Statistics, Wuhan University, Wuhan 430070, China
- Jingui Zou
  - School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Yaqiong Yan
  - Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
- Yan Guo
  - Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
  - Corresponding author
|