1
Chen CC, Ting WC, Lee HC, Chang CC, Lin TC, Yang SF. A Cost-Effective Model for Predicting Recurrent Gastric Cancer Using Clinical Features. Diagnostics (Basel) 2024; 14:842. [PMID: 38667487 PMCID: PMC11049390 DOI: 10.3390/diagnostics14080842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/14/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024]
Abstract
This study used artificial intelligence techniques to identify clinical biomarkers of recurrence in gastric cancer survivors. Data on recurrence and clinical risk features for 2476 gastric cancer survivors were drawn from a hospital-based cancer registry database in Taiwan. We benchmarked Random Forest against MLP, C4.5, AdaBoost, and Bagging, and used the synthetic minority oversampling technique (SMOTE) to address class imbalance, cost-sensitive learning for risk assessment, and SHapley Additive exPlanations (SHAP) for feature importance analysis. The proposed Random Forest outperformed the other models on the recurrent category, with an accuracy of 87.9%, a recall of 90.5%, a precision of 86%, and an F1 of 88.2% under 10-fold cross-validation on the balanced dataset. The top five clinical features of recurrent gastric cancer were stage, number of involved regional lymph nodes, Helicobacter pylori, BMI (body mass index), and gender; these features significantly affect the prediction model's output and merit attention in subsequent causal effect analysis. Using an artificial intelligence model, the risk factors for recurrent gastric cancer could be identified and cost-effectively ranked by feature importance. These features could also help physicians screen for high-risk patients among gastric cancer survivors.
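As a quick consistency check on the figures quoted in this abstract, F1 is the harmonic mean of precision and recall. The sketch below assumes the 86% figure denotes precision on the recurrent category (the abstract does not define it further), and the function name is illustrative rather than taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the recurrent category.
# Assumption: the 86% figure is precision.
f1 = f1_score(precision=0.86, recall=0.905)
print(f"F1 = {f1:.1%}")  # prints "F1 = 88.2%"
```

Under that assumption the recomputed value agrees with the reported F1 of 88.2%.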
Affiliation(s)
- Chun-Chia Chen
  - Institute of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
  - Division of Plastic Surgery, Department of Surgery, Chi Mei Medical Center, Tainan 704, Taiwan
  - Division of Colorectal Surgery, Department of Surgery, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
- Wen-Chien Ting
  - Division of Colorectal Surgery, Department of Surgery, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
  - School of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
- Hsi-Chieh Lee
  - Department of Computer Science and Information Engineering, National Quemoy University, Kinmen County 892, Taiwan
- Chi-Chang Chang
  - School of Medical Informatics, Chung Shan Medical University & IT Office, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
  - Department of Information Management, Ming Chuan University, Taoyuan City 33300, Taiwan
- Tsung-Chieh Lin
  - Department of Computer Science and Information Engineering, National Quemoy University, Kinmen County 892, Taiwan
- Shun-Fa Yang
  - Institute of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
2
Wang CK, Chang CY, Chu TW, Liang YJ. Using Machine Learning to Identify the Relationships between Demographic, Biochemical, and Lifestyle Parameters and Plasma Vitamin D Concentration in Healthy Premenopausal Chinese Women. Life (Basel) 2023; 13:2257. [PMID: 38137858 PMCID: PMC10744461 DOI: 10.3390/life13122257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/15/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION Vitamin D plays a vital role in maintaining homeostasis and enhancing the absorption of calcium, an essential component for strengthening bones and preventing osteoporosis. Many factors are known to relate to plasma vitamin D concentration (PVDC). However, most of the studies identifying them used traditional statistical methods. Machine learning methods (Mach-L) have since become new tools in medical research. In the present study, we used four Mach-L methods to explore the relationships between PVDC and demographic, biochemical, and lifestyle factors in a group of healthy premenopausal Chinese women. Our goals were (1) to evaluate and compare the predictive accuracy of Mach-L and multiple linear regression (MLR), and (2) to establish a hierarchy of the significance of the aforementioned factors related to PVDC. METHODS Five hundred ninety-three healthy Chinese women were enrolled. In total, 35 variables were recorded, including demographic, biochemical, and lifestyle information. The dependent variable was 25-OH vitamin D (PVDC), and all other variables were independent variables. MLR was regarded as the benchmark for comparison. Four Mach-L methods were applied: random forest (RF), stochastic gradient boosting (SGB), extreme gradient boosting (XGBoost), and elastic net. Each method produced several estimation errors; the smaller these errors, the better the model. RESULTS By Pearson's correlation, age, glycated hemoglobin, HDL-cholesterol, LDL-cholesterol, and hemoglobin were positively correlated with PVDC, whereas eGFR was negatively correlated with PVDC. The Mach-L methods yielded smaller estimation errors for all five parameters, which indicated that they were better methods than the MLR model. After averaging the importance percentages from the four Mach-L methods, a rank of importance was obtained. Age was the most important factor, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
CONCLUSIONS In a healthy Chinese premenopausal cohort using four different Mach-L methods, age was found to be the most important factor related to PVDC, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
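The ranking step described in this abstract (averaging each factor's importance percentage across the four methods) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the per-method normalization to percentages, and all numeric importances are hypothetical and not taken from the paper; only the method and factor names come from the abstract.

```python
def average_importance(importances_by_method):
    """Convert each method's raw importances to percentages, then average per feature.

    Returns (feature, mean percentage) pairs sorted from most to least important.
    """
    features = list(next(iter(importances_by_method.values())).keys())
    averaged = {}
    for feature in features:
        shares = []
        for scores in importances_by_method.values():
            total = sum(scores.values())
            shares.append(100.0 * scores[feature] / total)
        averaged[feature] = sum(shares) / len(shares)
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw importances on different scales (illustration only).
raw = {
    "RF": {"age": 40, "insulin": 30, "TSH": 30},
    "SGB": {"age": 5, "insulin": 3, "TSH": 2},
    "XGBoost": {"age": 6, "insulin": 1, "TSH": 3},
    "elastic_net": {"age": 2, "insulin": 1, "TSH": 1},
}

ranking = average_importance(raw)
for name, pct in ranking:
    print(f"{name}: {pct:.2f}%")
```

Normalizing within each method before averaging keeps a method with large raw importance values from dominating the combined rank.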
Affiliation(s)
- Chun-Kai Wang
  - Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan
- Ching-Yao Chang
  - Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Ta-Wei Chu
  - Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Chief Executive Officer’s Office, MJ Health Research Foundation, Taipei 114, Taiwan
- Yao-Jen Liang
  - Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan
3
Tzou SJ, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Chu TW. Comparison between linear regression and four different machine learning methods in selecting risk factors for osteoporosis in a Chinese female aged cohort. J Chin Med Assoc 2023; 86:1028-1036. [PMID: 37729604 DOI: 10.1097/jcma.0000000000000999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
BACKGROUND Population aging is emerging as an increasingly acute challenge for countries around the world. One particular manifestation of this phenomenon is the impact of osteoporosis on individuals and national health systems. Previous studies of risk factors for osteoporosis were conducted using traditional statistical methods, but more recent efforts have turned to machine learning approaches. Most such efforts, however, treat the target variable (bone mineral density [BMD] or fracture rate) as a categorical one, which provides no quantitative information. The present study uses five different machine learning methods to analyze the risk factors for T-score of BMD, seeking to (1) compare the prediction accuracy between different machine learning methods and traditional multiple linear regression (MLR) and (2) rank the importance of 25 different risk factors. METHODS The study sample includes 24 412 women older than 55 years with 25 related variables. Traditional MLR and five different machine learning methods were applied: classification and regression tree, Naïve Bayes, random forest, stochastic gradient boosting, and eXtreme gradient boosting. The metrics used for model performance comparisons are the symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error. RESULTS Machine learning approaches outperformed MLR for all four prediction errors. The average importance ranking of each factor generated by the machine learning methods indicates that age is the most important factor determining T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level. CONCLUSION In a group of women older than 55 years, we demonstrated that machine learning methods provide superior performance in estimating T-score, with age being the most important factor, followed by eGFR, BMI, UA, and education level.
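The four error metrics named in this abstract can be written out as below. This is a sketch using common textbook definitions; the abstract does not give the exact formulas, so the SMAPE variant in particular (mean of absolute error divided by the average of the absolute true and predicted values) is an assumption, and the T-score values are hypothetical.

```python
import math


def regression_errors(y_true, y_pred):
    """Return SMAPE, RAE, RRSE and RMSE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

    # SMAPE: a term contributes 0 when both values are 0, avoiding division by zero.
    smape_terms = []
    for err, t, p in zip(abs_err, y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        smape_terms.append(err / denom if denom else 0.0)

    return {
        "SMAPE": sum(smape_terms) / n,
        # RAE and RRSE compare the model against always predicting the mean.
        "RAE": sum(abs_err) / sum(abs(t - mean_true) for t in y_true),
        "RRSE": math.sqrt(sum(sq_err) / sum((t - mean_true) ** 2 for t in y_true)),
        "RMSE": math.sqrt(sum(sq_err) / n),
    }


# Hypothetical T-scores and predictions (illustration only).
errors = regression_errors([-2.0, -1.0, -3.0], [-2.0, -1.0, -4.0])
for name, value in errors.items():
    print(f"{name}: {value:.4f}")
```

RAE and RRSE values below 1 indicate a model that beats the mean-only baseline, which is why smaller values on all four metrics are read as better performance.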
Affiliation(s)
- Shiow-Jyu Tzou
  - Teaching and Researching Center, Kaohsiung Armed Forces General Hospital, Kaohsiung, Taiwan, ROC
  - Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan, ROC
- Chung-Hsin Peng
  - Department of Urology, Cardinal Tien Hospital, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Li-Ying Huang
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Fang-Yu Chen
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Chun-Heng Kuo
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Chung-Ze Wu
  - Department of Internal Medicine, Shuang Ho Hospital, New Taipei City, Division of Endocrinology and Metabolism, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan, ROC
- Ta-Wei Chu
  - Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, ROC
  - MJ Health Research Foundation, Taipei, Taiwan, ROC
4
Chen CH, Wang CK, Wang CY, Chang CF, Chu TW. Roles of biochemistry data, lifestyle, and inflammation in identifying abnormal renal function in old Chinese. World J Clin Cases 2023; 11:7004-7016. [PMID: 37946770 PMCID: PMC10631406 DOI: 10.12998/wjcc.v11.i29.7004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/05/2023] [Revised: 08/01/2023] [Accepted: 09/11/2023] [Indexed: 10/13/2023]
Abstract
BACKGROUND The incidence of chronic kidney disease (CKD) has dramatically increased in recent years, with significant impacts on patient mortality rates. Previous studies have identified multiple risk factors for CKD, but they mostly relied on traditional statistical methods such as logistic regression and focused on only a few risk factors. AIM To determine factors that can be used to identify subjects with a low estimated glomerular filtration rate (L-eGFR < 60 mL/min per 1.73 m²) in a cohort of 1236 Chinese people aged over 65. METHODS Twenty risk factors were divided into three models. Model 1 consisted of demographic and biochemistry data. Model 2 added lifestyle data to Model 1, and Model 3 added inflammatory markers to Model 2. Five machine learning methods were used: Multivariate adaptive regression splines, eXtreme Gradient Boosting, stochastic gradient boosting, Light Gradient Boosting Machine, and Categorical Features + Gradient Boosting. Evaluation criteria included accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), F-1 score, and balanced accuracy. RESULTS The AUC of each method increased from Model 1 to Model 3, and the trend reached statistical significance. Model 3 selected uric acid (UA) as the most important risk factor, followed by age, hemoglobin (Hb), body mass index (BMI), sport hours, and systolic blood pressure (SBP). CONCLUSION Among all the risk factors, including demographic, biochemistry, and lifestyle risk factors along with inflammation markers, UA is the most important risk factor for identifying L-eGFR, followed by age, Hb, BMI, sport hours, and SBP in a cohort of elderly Chinese people.
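Most of the evaluation criteria listed in this abstract follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical, and AUC is left out because it needs ranked prediction scores rather than a single set of counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive (L-eGFR) class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts (illustration only, not from the study).
metrics = classification_metrics(tp=80, fp=50, tn=150, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages the two per-class recalls, so it is less flattering than plain accuracy when one class is much larger than the other.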
Affiliation(s)
- Chao-Hung Chen
  - Division of Urology, Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan
  - Division of Urology, Department of Surgery, Chang Gung Memorial Hospital, Keelung 204, Taiwan
- Chun-Kai Wang
  - Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan
- Chen-Yu Wang
  - Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chun-Feng Chang
  - Division of Urology, Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan
  - Division of Urology, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Ta-Wei Chu
  - Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
  - Chief Executive Officer's Office, MJ Health Research Foundation, Taipei 114, Taiwan
5
Tsai MH, Jhou MJ, Liu TC, Fang YW, Lu CJ. An integrated machine learning predictive scheme for longitudinal laboratory data to evaluate the factors determining renal function changes in patients with different chronic kidney disease stages. Front Med (Lausanne) 2023; 10:1155426. [PMID: 37859858 PMCID: PMC10582636 DOI: 10.3389/fmed.2023.1155426] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Accepted: 09/19/2023] [Indexed: 10/21/2023]
Abstract
Background and objectives Chronic kidney disease (CKD) is a global health concern. This study aims to identify key factors associated with renal function changes using the proposed machine learning and important variable selection (ML&IVS) scheme on longitudinal laboratory data. The goal is to predict changes in the estimated glomerular filtration rate (eGFR) in a cohort of patients with CKD stages 3-5. Design A retrospective cohort study. Setting and participants A total of 710 outpatients who presented with stable nondialysis-dependent CKD stages 3-5 at the Shin-Kong Wu Ho-Su Memorial Hospital Medical Center from 2016 to 2021. Methods This study analyzed trimonthly laboratory data including 47 indicators. The proposed scheme used stochastic gradient boosting, multivariate adaptive regression splines, random forest, eXtreme gradient boosting, and light gradient boosting machine algorithms to evaluate the important factors for predicting the results of the fourth eGFR examination, especially in patients with CKD stage 3 and those with CKD stages 4-5, with or without diabetes mellitus (DM). Main outcome measurement Subsequent eGFR level after three consecutive laboratory data assessments. Results Our ML&IVS scheme demonstrated superior predictive capabilities and identified significant factors contributing to renal function changes in various CKD groups. The latest levels of eGFR, blood urea nitrogen (BUN), proteinuria, sodium, and systolic blood pressure as well as mean levels of eGFR, BUN, proteinuria, and triglyceride were the top 10 significantly important factors for predicting the subsequent eGFR level in patients with CKD stages 3-5. In individuals with DM, the latest levels of BUN and proteinuria, mean levels of phosphate and proteinuria, and variations in diastolic blood pressure levels emerged as important factors for predicting the decline of renal function. In individuals without DM, all phosphate patterns and latest albumin levels were found to be key factors in the advanced CKD group. Moreover, proteinuria was identified as an important factor in the CKD stage 3 group without DM and CKD stages 4-5 group with DM. Conclusion The proposed scheme highlighted factors associated with renal function changes in different CKD conditions, offering valuable insights to physicians for raising awareness about renal function changes.
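The abstract refers to the latest level, the mean level, and the variation of each indicator across three consecutive assessments. A minimal sketch of deriving such features for one indicator is shown below; the function name is hypothetical, the eGFR values are invented, and using the population standard deviation as the "variation" measure is an assumption, since the abstract does not say how variation was computed.

```python
from statistics import mean, pstdev


def summarize_series(values):
    """Summarize consecutive measurements of one indicator, ordered oldest to newest."""
    return {
        "latest": values[-1],
        "mean": mean(values),
        # Assumption: variation is measured as the population standard deviation.
        "variation": pstdev(values),
    }


# Hypothetical eGFR values from three trimonthly visits (illustration only).
egfr_features = summarize_series([42.0, 40.0, 38.0])
print(egfr_features)
```

Applying the same summary to each laboratory indicator turns a patient's three visits into one fixed-length feature row that a model can use to predict the fourth eGFR result.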
Affiliation(s)
- Ming-Hsien Tsai
  - Division of Nephrology, Department of Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
  - Department of Medicine, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Mao-Jhen Jhou
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Tzu-Chi Liu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Yu-Wei Fang
  - Division of Nephrology, Department of Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
  - Department of Medicine, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
6
Machine Learning Predictive Models for Evaluating Risk Factors Affecting Sperm Count: Predictions Based on Health Screening Indicators. J Clin Med 2023; 12:jcm12031220. [PMID: 36769868 PMCID: PMC9917545 DOI: 10.3390/jcm12031220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 01/13/2023] [Accepted: 02/01/2023] [Indexed: 02/05/2023]
Abstract
In many countries, especially developed nations, the fertility rate and birth rate have continually declined. Taiwan's fertility rate has paralleled this trend and reached its nadir in 2022. Therefore, the government uses many strategies to encourage more married couples to have children. However, couples marrying at an older age may have declining physical status, hypertension and other metabolic syndrome symptoms, and possibly excess weight, all of which have been studied for their influence on male and female gamete quality. Many previous studies were based on infertile people and are not truly representative of the general population. This study proposed a framework using five machine learning (ML) predictive algorithms (random forest, stochastic gradient boosting, least absolute shrinkage and selection operator regression, ridge regression, and extreme gradient boosting) to identify the major risk factors affecting male sperm count based on a major health screening database in Taiwan. Unlike traditional multiple linear regression, ML algorithms do not need statistical assumptions and can capture non-linear relationships or complex interactions between dependent and independent variables to generate promising performance. We analyzed annual health screening data of 1375 males from 2010 to 2017, including data on health screening indicators, sourced from the MJ Group, a major health screening center in Taiwan. The symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error were used as performance evaluation metrics. Our results show that sleep time (ST), alpha-fetoprotein (AFP), body fat (BF), systolic blood pressure (SBP), and blood urea nitrogen (BUN) are the top five risk factors associated with sperm count. ST is a known risk factor influencing reproductive hormone balance, which can affect spermatogenesis and final sperm count. BF and SBP are risk factors associated with metabolic syndrome, another known risk factor for altered male reproductive hormone systems. However, AFP has not been the focus of previous studies on male fertility or semen quality. BUN, an index of kidney function, is also identified as a risk factor by our established ML model. Our results support previous findings that metabolic syndrome has negative impacts on sperm count and semen quality. Sleep duration also has an impact on sperm generation in the testes. AFP and BUN are two novel risk factors linked to sperm count. These findings could help healthcare personnel and lawmakers devise strategies and environments that increase the country's fertility rate. This study should also be of value to follow-up research.
7
Imangaliyev S, Schlötterer J, Meyer F, Seifert C. Diagnosis of Inflammatory Bowel Disease and Colorectal Cancer through Multi-View Stacked Generalization Applied on Gut Microbiome Data. Diagnostics (Basel) 2022; 12:diagnostics12102514. [PMID: 36292203 PMCID: PMC9600435 DOI: 10.3390/diagnostics12102514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 10/08/2022] [Accepted: 10/11/2022] [Indexed: 12/02/2022]
Abstract
Most microbiome studies suggest that ensemble models such as Random Forest yield the best predictive power. In this study, we empirically evaluate a more powerful ensemble learning algorithm, multi-view stacked generalization, on cohorts of pediatric inflammatory bowel disease patients and adult colorectal cancer patients. We aim to check whether stacking leads to better results than using the single best machine learning algorithm. Stacking achieves the best test set Average Precision (AP) on the inflammatory bowel disease dataset, reaching AP = 0.69 and outperforming both the best base classifier (AP = 0.61) and the baseline meta learner built on top of the base classifiers (AP = 0.63). On the colorectal cancer dataset, the stacked classifier (AP = 0.81) also outperforms both the best base classifier (AP = 0.79) and the baseline meta learner (AP = 0.75). Stacking thus achieves the best predictive performance on the test set, outperforming the best classifiers on both patient cohorts. Applying stacking addresses the problem of choosing the most appropriate machine learning algorithm by automating the model selection procedure. Clinical application of such a model is not limited to diagnosis; it can also be extended to biomarker selection thanks to the feature selection procedure.
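Average Precision, the metric reported in this abstract, summarizes a precision-recall curve as the mean of the precision values at the rank of each true positive. The sketch below is a plain-Python illustration under the assumption that prediction scores are distinct (ties are not handled); the function name, labels, and scores are hypothetical and not taken from the study.

```python
def average_precision(y_true, scores):
    """Average Precision for binary labels (1 = positive), assuming distinct scores."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / n_pos


# Hypothetical labels and classifier scores (illustration only).
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(f"AP = {ap:.4f}")  # prints "AP = 0.8333"
```

Because AP depends only on the ranking of scores, it can compare a stacked classifier with its base classifiers even when their score scales differ.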
Affiliation(s)
- Sultan Imangaliyev
  - Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
  - Cancer Research Center Cologne Essen (CCCE), 45147 Essen, Germany
- Jörg Schlötterer
  - Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
  - Cancer Research Center Cologne Essen (CCCE), 45147 Essen, Germany
- Folker Meyer
  - Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
- Christin Seifert
  - Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany
  - Cancer Research Center Cologne Essen (CCCE), 45147 Essen, Germany
8
Huang LY, Chen FY, Jhou MJ, Kuo CH, Wu CZ, Lu CH, Chen YL, Pei D, Cheng YF, Lu CJ. Comparing Multiple Linear Regression and Machine Learning in Predicting Diabetic Urine Albumin–Creatinine Ratio in a 4-Year Follow-Up Study. J Clin Med 2022; 11:jcm11133661. [PMID: 35806944 PMCID: PMC9267784 DOI: 10.3390/jcm11133661] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Revised: 06/19/2022] [Accepted: 06/22/2022] [Indexed: 02/07/2023]
Abstract
The urine albumin–creatinine ratio (uACR) is a warning sign for the deterioration of renal function in type 2 diabetes (T2D). The early detection of an abnormal uACR has become an important issue. Multiple linear regression (MLR) has traditionally been used to explore the relationships between risk factors and endpoints. Recently, machine learning (ML) methods have been widely applied in medicine. In the present study, four ML methods were used to predict the uACR in a T2D cohort. We hypothesized that (1) ML outperforms traditional MLR and (2) different ranks of the importance of the risk factors will be obtained. A total of 1147 patients with T2D were followed up for four years. MLR, classification and regression tree, random forest, stochastic gradient boosting, and eXtreme gradient boosting methods were used. Our findings show that the prediction errors of the ML methods are smaller than those of MLR, which indicates that ML is more accurate. The first six most important factors were baseline creatinine level, systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose. In conclusion, ML might be more accurate than traditional MLR in predicting uACR in a T2D cohort, and the baseline creatinine level is the most important predictor, followed by systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose in Chinese patients with T2D.
Affiliation(s)
- Li-Ying Huang
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Fang-Yu Chen
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Mao-Jhen Jhou
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Chun-Heng Kuo
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Chung-Ze Wu
  - Division of Endocrinology, Department of Internal Medicine, Shuang Ho Hospital, New Taipei City 23561, Taiwan
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
- Chieh-Hua Lu
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, School of Medicine, National Defense Medical Center, Taipei 11490, Taiwan
- Yen-Lin Chen
  - Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei 11490, Taiwan
- Dee Pei
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Yu-Fang Cheng
  - Department of Endocrinology and Metabolism, Changhua Christian Hospital, Changhua 50051, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan
  - Correspondence: Tel.: +886-2-2905-2973
9
Associations between Sex and Risk Factors for Predicting Chronic Kidney Disease. International Journal of Environmental Research and Public Health 2022; 19:ijerph19031219. [PMID: 35162242 PMCID: PMC8835286 DOI: 10.3390/ijerph19031219] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/22/2021] [Revised: 01/12/2022] [Accepted: 01/19/2022] [Indexed: 11/16/2022]
Abstract
Gender is an important risk factor in predicting chronic kidney disease (CKD); however, it is under-researched. The purpose of this study was to examine whether gender differences affect the risk factors used for early CKD prediction. This study used data from 19,270 adult health screenings, including 5101 with CKD. Eleven independent variables were selected as risk factors and tested for significance with the Chi-square test, and seven machine learning techniques were used to train the predictive models. Performance indicators included classification accuracy, sensitivity, specificity, and precision. Class imbalance was addressed using three sampling methods: manual sampling, the synthetic minority oversampling technique, and SpreadSubsample. The Chi-square test revealed statistically significant results (p < 0.001) for gender, age, red blood cell count in urine, urine protein (PRO) content, and the PRO-to-urinary creatinine ratio. In terms of classifier prediction performance, logistic regression with manual sampling exhibited the highest average prediction accuracy (0.8053) for men, whereas linear discriminant analysis with manual sampling demonstrated the highest average prediction accuracy (0.8485) for women. The clinical features of a normal or abnormal PRO-to-urinary creatinine ratio indicated that PRO ratio, age, and urine red blood cell count are the most important risk factors with which to predict CKD in both genders. As a result, this study proposes a prediction model with acceptable prediction accuracy. The model supports doctors in diagnosis and treatment and serves the goal of early detection and treatment. Grounded in evidence-based medicine, this study used machine learning methods to develop a predictive model. The model has been shown to support the prediction of early clinical risk of CKD and can thereby improve the efficacy and quality of clinical decision making.
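The Chi-square test mentioned in this abstract can be illustrated for a single binary factor such as gender against CKD status. The sketch below computes the Pearson chi-square statistic for a 2x2 table without continuity correction; the function name and the counts are hypothetical and do not come from the study, which does not report its contingency tables in the abstract.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table

                 CKD   no CKD
        men       a      b
        women     c      d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts (illustration only).
chi2 = chi_square_2x2(a=300, b=700, c=200, d=800)
print(f"chi-square = {chi2:.2f}")

# With 1 degree of freedom, a statistic above about 10.83 corresponds to p < 0.001.
significant = chi2 > 10.83
```

With these invented counts the statistic is well above the 0.001 threshold, which is the kind of result the abstract reports for gender, age, and the urine markers.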