1
Hsia JY, Chang CC, Liu CF, Chou CL, Yang CC. Longitudinal Risk Analysis of Second Primary Cancer after Curative Treatment in Patients with Rectal Cancer. Diagnostics (Basel) 2024; 14:1461. [PMID: 39001350 PMCID: PMC11241612 DOI: 10.3390/diagnostics14131461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Accepted: 07/04/2024] [Indexed: 07/16/2024]
Abstract
Predicting second primary cancers (SPCs) in patients with rectal cancer, and improving their management, remains an active and challenging field of clinical research. Identifying predictive risk factors for SPCs will help guide more personalized treatment strategies. In this study, we propose that experience data be used as evidence to support patient-oriented decision-making. The proposed model consists of two main components: a pipeline for extraction and classification, and a clinical risk assessment. The study includes 4402 patient datasets, including 395 SPC patients, collected from the cancer registry databases of three medical centers; based on literature reviews and discussion with clinical experts, 10 predictive variables were considered risk factors for SPCs. The proposed extraction and classification pipeline identified age at diagnosis, chemotherapy, smoking behavior, combined stage group, and sex as important risk factors, consistent with previous studies. The C5 method had the highest predicted AUC (84.88%). In addition, the proposed classification pipeline achieved an acceptable testing accuracy of 80.85%, a recall of 79.97%, a specificity of 88.12%, a precision of 85.79%, and an F1 score of 79.88%. Our results indicate that chemotherapy is the most important prognostic risk factor for SPCs in rectal cancer survivors. Furthermore, our decision tree for clinical risk assessment suggests that the combined effect of these risk factors can be assessed. This proposed model may support evaluation and longitudinal monitoring for the personalized treatment of rectal cancer survivors in the future.
Affiliation(s)
- Jiun-Yi Hsia
- Division of Thoracic Surgery, Department of Surgery, Chung Shan Medical University Hospital, Taichung 402367, Taiwan
- School of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
- Chi-Chang Chang
- School of Medical Informatics, Chung Shan Medical University, IT Office, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
- Department of Information Management, Ming Chuan University, Taoyuan 33348, Taiwan
- Chung-Feng Liu
- Department of Medical Research, Chi Mei Medical Center, Tainan 710402, Taiwan
- Chia-Lin Chou
- Division of Colon & Rectal Surgery, Department of Surgery, Chi Mei Medical Center, Tainan 710402, Taiwan
- Department of Medical Laboratory Science and Biotechnology, Chung Hwa University of Medical Technology, Tainan 71703, Taiwan
- Ching-Chieh Yang
- Department of Radiation Oncology, Chi Mei Medical Center, Tainan 71004, Taiwan
- Department of Pharmacy, Chia-Nan University of Pharmacy and Science, Tainan 717301, Taiwan
- School of Medicine, College of Medicine, National Sun Yat-sen University, Kaohsiung 80404, Taiwan
2
Acosta-Angulo B, Lara-Ramos J, Niño-Vargas A, Diaz-Angulo J, Benavides-Guerrero J, Bhattacharya A, Cloutier S, Machuca-Martínez F. Unveiling the potential of machine learning in cost-effective degradation of pharmaceutically active compounds: A stirred photo-reactor study. Chemosphere 2024; 358:142222. [PMID: 38714249 DOI: 10.1016/j.chemosphere.2024.142222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 03/30/2024] [Accepted: 04/30/2024] [Indexed: 05/09/2024]
Abstract
In this study, neural networks and support vector regression (SVR) were employed to predict the degradation of three pharmaceutically active compounds (PhACs), ibuprofen (IBP), diclofenac (DCF), and caffeine (CAF), within a stirred reactor featuring a flotation cell with two non-concentric ultraviolet lamps. A total of 438 data points were collected from published works and split into 70% training and 30% test datasets, and cross-validation was used to assess training reliability. The models incorporated 15 input variables concerning reaction kinetics, molecular properties, hydrodynamic information, presence of radiation, and catalytic properties. SVR performed poorly because its ε hyperparameter caused large errors at low concentration levels to be ignored. Meanwhile, the artificial neural network (ANN) model was able to provide rough estimates of the expected degradation of the pollutants without requiring information on reaction rate constants. The multi-objective optimization analysis suggested that ozone kinetics play a leading role in the rapid degradation of the contaminants, and most of the results required intensification with hydrogen peroxide and the Fenton process. Although both models were affected by accuracy limitations, this work provides a lightweight model for evaluating different advanced oxidation processes (AOPs) from general information on the process operating conditions as well as known molecular and catalytic properties.
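The SVR failure mode described above follows from the ε-insensitive loss that SVR minimizes: any residual whose magnitude is at most ε contributes zero loss. The minimal sketch below assumes hypothetical normalized concentrations (not data from the study) and an assumed ε of 0.1; in scikit-learn's `SVR` this threshold is the `epsilon` parameter. Points at low concentration can carry large relative errors while still falling inside the ε-tube, so the fit receives no penalty for them.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample epsilon-insensitive loss: max(0, |y - y_hat| - epsilon)."""
    residuals = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residuals - epsilon)


# Hypothetical normalized concentrations C/C0 (illustrative only, not study data).
y_true = np.array([0.90, 0.50, 0.05, 0.02])
y_pred = np.array([0.70, 0.45, 0.12, 0.09])

loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
relative_error = np.abs(y_true - y_pred) / y_true

for c, l, r in zip(y_true, loss, relative_error):
    print(f"C/C0={c:.2f}  loss={l:.3f}  relative error={r:.0%}")
```

Here only the first point is penalized, although the last two predictions are off by roughly 140% and 350% of their true values.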
Affiliation(s)
- B Acosta-Angulo
- Escuela de Ingeniería Química, Universidad del Valle, Santiago de Cali, 760026, Valle del Cauca, Colombia
- J Lara-Ramos
- Escuela de Ingeniería Química, Universidad del Valle, Santiago de Cali, 760026, Valle del Cauca, Colombia
- A Niño-Vargas
- Escuela de Ingeniería Química, Universidad del Valle, Santiago de Cali, 760026, Valle del Cauca, Colombia
- J Diaz-Angulo
- Research and Technological Development in Water Treatment, Processes Modelling and Disposal of Residues - GITAM, Cauca, Colombia
- J Benavides-Guerrero
- Department of Electrical Engineering, École de technologie supérieure, 1100 Notre-Dame West, Montreal, H3C 1K3, Quebec, Canada
- A Bhattacharya
- Department of Electrical Engineering, École de technologie supérieure, 1100 Notre-Dame West, Montreal, H3C 1K3, Quebec, Canada
- S Cloutier
- Department of Electrical Engineering, École de technologie supérieure, 1100 Notre-Dame West, Montreal, H3C 1K3, Quebec, Canada
- F Machuca-Martínez
- Escuela de Ingeniería Química, Universidad del Valle, Santiago de Cali, 760026, Valle del Cauca, Colombia
3
Chang CC, Yeh JH, Chiu HC, Liu TC, Chen YM, Jhou MJ, Lu CJ. Assessing the length of hospital stay for patients with myasthenia gravis based on the data mining MARS approach. Front Neurol 2023; 14:1283214. [PMID: 38156090 PMCID: PMC10752965 DOI: 10.3389/fneur.2023.1283214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Predicting the length of hospital stay for myasthenia gravis (MG) patients is challenging because of the complex pathogenesis, high clinical variability, and non-linear relationships between variables. For the management of MG during hospitalization, it is important to conduct a risk assessment that predicts the length of hospital stay. The present study aimed to predict the length of hospital stay for MG using an extensible data mining technique, multivariate adaptive regression splines (MARS). Data from the hospitalizations of 196 MG patients were analyzed, and the MARS model was compared with classical multiple linear regression (MLR) and three other machine learning (ML) algorithms. The average hospital stay was 12.3 days. The MARS model, leveraging its ability to capture non-linearity, identified four significant factors: disease duration, age at admission, MGFA clinical classification, and daily prednisolone dose. Cut-off points and correlation curves were determined for these risk factors. The MARS model outperformed MLR and the other ML methods (least absolute shrinkage and selection operator MLR, classification and regression tree, and random forest) in assessing hospital stay length. This is the first study to use data mining methods to explore factors influencing hospital stay in patients with MG. The results highlight the effectiveness of the MARS model in identifying cut-off points and correlations for risk factors associated with MG hospitalization. Furthermore, a MARS-based formula was developed as a practical tool to assist in estimating hospital stay, which could feasibly serve as an extension of clinical risk assessment.
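MARS builds its prediction from hinge basis functions of the form max(0, x - c) and max(0, c - x), where each knot c acts as a cut-off point at which the slope of the fitted curve changes. The sketch below evaluates a one-variable MARS-style formula for length of stay versus age at admission; the intercept, coefficients, and the knot at 60 years are hypothetical values chosen for illustration, not the formula fitted in the study.

```python
import numpy as np


def hinge(x, knot, direction=1):
    """MARS hinge basis: max(0, x - knot) if direction=1, max(0, knot - x) if direction=-1."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, direction * (x - knot))


def toy_length_of_stay(age):
    """Hypothetical MARS-style formula (days); coefficients and knot are illustrative."""
    knot = 60.0  # assumed cut-off point (years)
    return 10.0 + 0.3 * hinge(age, knot, 1) - 0.1 * hinge(age, knot, -1)


ages = np.array([40.0, 60.0, 70.0])
stay = toy_length_of_stay(ages)
print(stay)
```

In this toy formula the predicted stay rises by 0.1 day per year of age below the knot and by 0.3 day per year above it, which is how a MARS model expresses a cut-off point with different slopes on either side.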
Affiliation(s)
- Che-Cheng Chang
- Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
- PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Jiann-Horng Yeh
- School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Neurology, Shin Kong Wu Ho-Su Memorial Hospital, Taipei City, Taiwan
- Department of Neurology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hou-Chang Chiu
- School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Neurology, Taipei Medical University, Shuang-Ho Hospital, New Taipei City, Taiwan
- Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Yen-Ming Chen
- Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
4
Wang CK, Chang CY, Chu TW, Liang YJ. Using Machine Learning to Identify the Relationships between Demographic, Biochemical, and Lifestyle Parameters and Plasma Vitamin D Concentration in Healthy Premenopausal Chinese Women. Life (Basel) 2023; 13:2257. [PMID: 38137858 PMCID: PMC10744461 DOI: 10.3390/life13122257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/15/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION Vitamin D plays a vital role in maintaining homeostasis and enhancing the absorption of calcium, an essential component for strengthening bones and preventing osteoporosis. Many factors are known to be related to plasma vitamin D concentration (PVDC); however, most studies of these factors were performed with traditional statistical methods. Machine learning methods (Mach-L) have now become new tools in medical research. In the present study, we used four Mach-L methods to explore the relationships between PVDC and demographic, biochemical, and lifestyle factors in a group of healthy premenopausal Chinese women. Our goals were (1) to evaluate and compare the predictive accuracy of Mach-L and multiple linear regression (MLR), and (2) to establish a hierarchy of the importance of these factors for PVDC. METHODS Five hundred ninety-three healthy Chinese women were enrolled. In total, 35 variables were recorded, including demographic, biochemical, and lifestyle information. The dependent variable was 25-OH vitamin D (PVDC), and all other variables were independent variables. MLR served as the benchmark for comparison. Four Mach-L methods were applied: random forest (RF), stochastic gradient boosting (SGB), extreme gradient boosting (XGBoost), and elastic net. Each method was evaluated with several estimation errors; the smaller the errors, the better the model. RESULTS In Pearson's correlation analysis, age, glycated hemoglobin, HDL-cholesterol, LDL-cholesterol, and hemoglobin were positively correlated with PVDC, whereas eGFR was negatively correlated with PVDC. The Mach-L methods yielded smaller estimation errors for all five error measures, indicating that they performed better than the MLR model. Averaging the importance percentages from the four Mach-L methods yielded a ranking of importance. Age was the most important factor, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
CONCLUSIONS In a cohort of healthy premenopausal Chinese women analyzed with four different Mach-L methods, age was the most important factor related to PVDC, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
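The ranking step in this abstract (averaging each factor's importance percentage over the four Mach-L methods, then sorting) can be sketched as below. The importance values are hypothetical, each model's importances are assumed to be already expressed as percentages, and a factor absent from a model's output is assumed to contribute 0%.

```python
def rank_by_mean_importance(importances):
    """Average each factor's importance (%) across models and rank, highest first.

    importances: dict mapping model name -> dict mapping factor -> importance (%).
    A factor missing from a model's output is treated as 0% (an assumption).
    """
    factors = sorted({f for scores in importances.values() for f in scores})
    n_models = len(importances)
    means = {
        f: sum(scores.get(f, 0.0) for scores in importances.values()) / n_models
        for f in factors
    }
    # Sort by descending mean importance; break ties alphabetically.
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical importance percentages, not values from the study.
importances = {
    "RF": {"age": 30.0, "insulin": 20.0, "TSH": 10.0},
    "SGB": {"age": 25.0, "insulin": 15.0, "TSH": 20.0},
    "XGBoost": {"age": 35.0, "insulin": 25.0, "TSH": 5.0},
    "elastic net": {"age": 20.0, "insulin": 20.0, "TSH": 15.0},
}
ranking = rank_by_mean_importance(importances)
print(ranking)  # [('age', 27.5), ('insulin', 20.0), ('TSH', 12.5)]
```

Averaging percentages rather than raw scores keeps models with differently scaled importance measures comparable.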
Affiliation(s)
- Chun-Kai Wang
- Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan
- Ching-Yao Chang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Chief Executive Officer’s Office, MJ Health Research Foundation, Taipei 114, Taiwan
- Yao-Jen Liang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan
5
Machine Learning Predictive Models for Evaluating Risk Factors Affecting Sperm Count: Predictions Based on Health Screening Indicators. J Clin Med 2023; 12:jcm12031220. [PMID: 36769868 PMCID: PMC9917545 DOI: 10.3390/jcm12031220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 01/13/2023] [Accepted: 02/01/2023] [Indexed: 02/05/2023]
Abstract
In many countries, especially developed nations, fertility and birth rates have continually declined. Taiwan's fertility rate has followed this trend and reached its nadir in 2022. The government therefore uses many strategies to encourage married couples to have more children. However, couples who marry at an older age may have declining physical status, hypertension and other symptoms of metabolic syndrome, and possibly overweight, all of which have been studied for their influence on male and female gamete quality. Many previous studies were based on infertile populations and are therefore not truly representative of the general population. This study proposed a framework using five machine learning (ML) predictive algorithms (random forest, stochastic gradient boosting, least absolute shrinkage and selection operator regression, ridge regression, and extreme gradient boosting) to identify the major risk factors affecting male sperm count based on a major health screening database in Taiwan. Unlike traditional multiple linear regression, ML algorithms do not require statistical assumptions and can capture non-linear relationships or complex interactions between dependent and independent variables to generate promising performance. We analyzed annual health screening data of 1375 males from 2010 to 2017, including data on health screening indicators, sourced from the MJ Group, a major health screening center in Taiwan. The symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error were used as performance evaluation metrics. Our results show that sleep time (ST), alpha-fetoprotein (AFP), body fat (BF), systolic blood pressure (SBP), and blood urea nitrogen (BUN) are the top five risk factors associated with sperm count. ST is a known risk factor influencing reproductive hormone balance, which can affect spermatogenesis and final sperm count.
BF and SBP are risk factors associated with metabolic syndrome, another known risk factor for altered male reproductive hormone systems. However, AFP has not been the focus of previous studies on male fertility or semen quality. BUN, an index of kidney function, was also identified as a risk factor by our ML model. Our results support previous findings that metabolic syndrome has negative impacts on sperm count and semen quality. Sleep duration also affects sperm production in the testes. AFP and BUN are two novel risk factors linked to sperm count. These findings could help healthcare personnel and lawmakers develop strategies to create environments that increase the country's fertility rate. This study should also be of value to follow-up research.
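The four evaluation metrics named in this abstract can be computed as in the minimal NumPy sketch below, which uses hypothetical values. SMAPE has several published variants; the one assumed here divides each absolute error by the mean of |actual| and |predicted| and reports a percentage. RAE and RRSE compare the model's errors with those of a naive predictor that always outputs the mean of the actual values, so values below 1 indicate an improvement over that baseline.

```python
import numpy as np


def regression_errors(actual, predicted):
    """Return SMAPE (%), RAE, RRSE and RMSE for two equal-length sequences."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(predicted, dtype=float)
    err = f - a
    dev = a - a.mean()  # errors of a naive predictor that always outputs the mean
    smape = 100.0 * np.mean(np.abs(err) / ((np.abs(a) + np.abs(f)) / 2.0))
    rae = np.sum(np.abs(err)) / np.sum(np.abs(dev))
    rrse = np.sqrt(np.sum(err ** 2) / np.sum(dev ** 2))
    rmse = np.sqrt(np.mean(err ** 2))
    return {"SMAPE": smape, "RAE": rae, "RRSE": rrse, "RMSE": rmse}


# Hypothetical sperm counts (arbitrary units) and model predictions.
scores = regression_errors([10, 20, 30, 40], [12, 18, 33, 37])
print(scores)
```

RMSE keeps the units of the target, whereas SMAPE, RAE, and RRSE are scale-free, which is why studies often report both kinds.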
6
Huang YC, Cheng YC, Jhou MJ, Chen M, Lu CJ. Integrated Machine Learning Decision Tree Model for Risk Evaluation in Patients with Non-Valvular Atrial Fibrillation When Taking Different Doses of Dabigatran. Int J Environ Res Public Health 2023; 20:2359. [PMID: 36767726 PMCID: PMC9915180 DOI: 10.3390/ijerph20032359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 01/24/2023] [Accepted: 01/24/2023] [Indexed: 06/18/2023]
Abstract
The new generation of non-vitamin K antagonist oral anticoagulants is widely used for stroke prevention because of its notable efficacy and safety. Our study aimed to develop suggestive rules for the use of dabigatran through an integrated machine learning (ML) decision tree model. Participants taking different doses of dabigatran in the Randomized Evaluation of Long-Term Anticoagulant Therapy trial were included in our analysis and assigned to the 110 mg and 150 mg groups. The proposed scheme integrated ML methods, namely naive Bayes, random forest (RF), classification and regression tree (CART), and extreme gradient boosting (XGBoost), to identify the essential variables for predicting vascular events in the 110 mg group and bleeding in the 150 mg group. RF (0.764 for 110 mg; 0.747 for 150 mg) and XGBoost (0.708 for 110 mg; 0.761 for 150 mg) had higher area under the receiver operating characteristic curve (AUC) values than logistic regression (the benchmark model; 0.683 for 110 mg; 0.739 for 150 mg). We then selected the top ten important variables as internal nodes of the CART decision tree. The two best CART models with ten important variables output tree-shaped rules for predicting vascular events in the 110 mg group and bleeding in the 150 mg group. Our model can provide more visual and interpretable suggestive rules to clinicians managing patients with non-valvular atrial fibrillation (NVAF) who are taking dabigatran.
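The AUC values compared in this abstract have a direct probabilistic reading: the AUC equals the probability that a randomly chosen positive case (for example, a patient who had a vascular event) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The dependency-free sketch below computes it by comparing all positive-negative pairs; the labels and scores are hypothetical. This pairwise form is quadratic in sample size, so library routines such as scikit-learn's `roc_auc_score` are preferable for real data.

```python
def pairwise_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = event) and model risk scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = pairwise_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking, so the reported values between roughly 0.68 and 0.76 indicate moderate discrimination.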
Affiliation(s)
- Yung-Chuan Huang
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Yu-Chen Cheng
- Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Mingchih Chen
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan
7
Akbulut S, Küçükakçalı Z, Çolak C. Classification of colorectal cancer based on gene sequencing data with the XGBoost model: a public health informatics application [in Turkish: XGBoost modeli ile gen dizileme verilerine dayalı kolorektal kanserin sınıflandırılması: Bir halk sağlığı bilişimi uygulaması]. Cukurova Medical Journal 2022. [DOI: 10.17826/cumj.1128653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Purpose: This study aims to classify open-access colorectal cancer gene data and to identify key genes using XGBoost, a machine learning method.
Materials and Methods: An open-access colorectal cancer gene dataset was used. The dataset contained the gene sequencing results of mucosa samples from 10 healthy controls and colon mucosa samples from 12 patients with colorectal cancer. XGBoost, a machine learning method, was used to classify the disease. Accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were evaluated as model performance metrics.
Results: Seventeen genes were selected by the variable selection method, and modeling was performed with these input variables. The accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score obtained from modeling were 95.5%, 95.8%, 91.7%, 100%, 100%, 90.9%, and 95.7%, respectively. Based on the variable importance values obtained from XGBoost, the CYR61, NR4A, FOSB, and NR4A2 genes could be used as biological markers for colorectal cancer.
Conclusion: This study identified genes that may be linked to colorectal cancer, as well as genetic biomarkers of the disease. In the future, the reliability of the identified genes can be verified, therapeutic procedures based on these genes can be developed, and their benefits in clinical practice can be documented.
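All of the metrics reported in this abstract derive from a single binary confusion matrix. In the sketch below, the counts (11 true positives, 1 false negative, 10 true negatives, 0 false positives) are a hypothetical assumption: the abstract does not report the matrix, but these counts are consistent with the 12 patients and 10 controls and reproduce the reported figures, reading the specificity and positive predictive value as 100%. The F1 score is the harmonic mean of PPV (precision) and sensitivity (recall), and balanced accuracy is the mean of sensitivity and specificity.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard metrics from a 2x2 confusion matrix (positive = colorectal cancer)."""
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value, precision
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "balanced accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "F1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts consistent with 12 patients and 10 controls.
metrics = binary_metrics(tp=11, fn=1, tn=10, fp=0)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With only 22 samples, a single misclassified patient moves sensitivity by about 8 percentage points, so these figures should be read with that granularity in mind.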
8
Huang LY, Chen FY, Jhou MJ, Kuo CH, Wu CZ, Lu CH, Chen YL, Pei D, Cheng YF, Lu CJ. Comparing Multiple Linear Regression and Machine Learning in Predicting Diabetic Urine Albumin–Creatinine Ratio in a 4-Year Follow-Up Study. J Clin Med 2022; 11:jcm11133661. [PMID: 35806944 PMCID: PMC9267784 DOI: 10.3390/jcm11133661] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Revised: 06/19/2022] [Accepted: 06/22/2022] [Indexed: 02/07/2023]
Abstract
The urine albumin–creatinine ratio (uACR) is a warning sign of deteriorating renal function in type 2 diabetes (T2D), and its early detection has become an important issue. Multiple linear regression (MLR) has traditionally been used to explore the relationships between risk factors and endpoints. Recently, machine learning (ML) methods have been widely applied in medicine. In the present study, four ML methods were used to predict uACR in a T2D cohort. We hypothesized that (1) ML outperforms traditional MLR and (2) different rankings of the importance of the risk factors would be obtained. A total of 1147 patients with T2D were followed up for four years. MLR, classification and regression tree, random forest, stochastic gradient boosting, and eXtreme gradient boosting methods were used. The prediction errors of the ML methods were smaller than those of MLR, indicating that ML is more accurate. The first six most important factors were baseline creatinine level, systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose. In conclusion, ML might predict uACR in a T2D cohort more accurately than traditional MLR, and baseline creatinine level is the most important predictor, followed by systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose in Chinese patients with T2D.
Affiliation(s)
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Chung-Ze Wu
- Division of Endocrinology, Department of Internal Medicine, Shuang Ho Hospital, New Taipei City 23561, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
- Chieh-Hua Lu
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, School of Medicine, National Defense Medical Center, Taipei 11490, Taiwan
- Yen-Lin Chen
- Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei 11490, Taiwan
- Dee Pei
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Yu-Fang Cheng
- Department of Endocrinology and Metabolism, Changhua Christian Hospital, Changhua 50051, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Correspondence: Tel.: +886-2-2905-2973
9
Survival Risk Prediction of Esophageal Cancer Based on the Kohonen Network Clustering Algorithm and Kernel Extreme Learning Machine. Mathematics 2022. [DOI: 10.3390/math10091367] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/25/2022]
Abstract
Accurate prediction of the survival risk level of patients with esophageal cancer is important for selecting appropriate treatment methods and contributes to improving patients' quality of life and chance of survival. However, because blood-index characteristics vary among individuals according to age, personal habits, living environment, and other factors, a single unified artificial intelligence prediction model is not sufficiently precise. To improve the precision of esophageal cancer survival risk prediction, this study proposes a model based on the Kohonen network clustering algorithm and the kernel extreme learning machine (KELM), which classifies the tested population into five categories and uses machine learning to improve efficiency. First, the Kohonen network clustering method was used to cluster the patient samples, yielding five types of samples. Second, patients were divided into two risk levels based on 5-year net survival. Then, a Taylor expansion was used to analyze theoretically the influence of different activation functions on KELM modeling performance, and the analysis was verified experimentally. The radial basis function (RBF) was selected as the activation function of the KELM. Finally, the adaptive mutation sparrow search algorithm (AMSSA) was used to optimize the model parameters. The experimental results were compared with those of an artificial bee colony-optimized support vector machine (ABC-SVM), a three-layer random forest (TLRF), a gray relational analysis–particle swarm optimization support vector machine (GP-SVM), and a mixed-effects Cox model (Cox-LMM). The results showed that the proposed model had certain advantages in prediction accuracy and running time and could help medical personnel choose the treatment mode for esophageal cancer patients.
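The KELM component has a closed-form solution: with kernel matrix Ω (where Ω_ij = K(x_i, x_j)) on the training samples, one-hot targets T, and regularization constant C, the output weights are β = (I/C + Ω)^(-1) T, and a new sample x is scored as [K(x, x_1), ..., K(x, x_N)] β. The NumPy sketch below implements this standard formulation with an RBF kernel K(u, v) = exp(-γ‖u - v‖²). It is an illustration under stated assumptions, not the authors' implementation: C, γ, and the toy data are hypothetical, and the Kohonen clustering and AMSSA parameter-tuning steps are not reproduced.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


class KernelELM:
    """Kernel extreme learning machine classifier (closed-form, RBF kernel)."""

    def __init__(self, C=1.0, gamma=1.0):
        self.C = C          # regularization constant (hypothetical default)
        self.gamma = gamma  # RBF width parameter (hypothetical default)

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        omega = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = omega.shape[0]
        # beta = (I/C + Omega)^(-1) T, computed with a linear solve, not an explicit inverse
        self.beta_ = np.linalg.solve(np.eye(n) / self.C + omega, T)
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return self.classes_[np.argmax(K @ self.beta_, axis=1)]


# Toy two-cluster data standing in for low- and high-risk patients (hypothetical).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]])
y = np.array([0, 0, 0, 1, 1, 1])
model = KernelELM(C=10.0, gamma=0.5).fit(X, y)
pred = model.predict(np.array([[0.1, 0.1], [3.1, 3.0]]))
print(pred)
```

Unlike a basic ELM, the kernel version needs no randomly drawn hidden layer, so the only parameters to tune are C and γ; this is the step an optimizer such as AMSSA would take over.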
10
Chiu YL, Jhou MJ, Lee TS, Lu CJ, Chen MS. Health Data-Driven Machine Learning Algorithms Applied to Risk Indicators Assessment for Chronic Kidney Disease. Risk Manag Healthc Policy 2021; 14:4401-4412. [PMID: 34737657 PMCID: PMC8558038 DOI: 10.2147/rmhp.s319405] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/19/2021] [Accepted: 09/30/2021] [Indexed: 01/02/2023]
Abstract
PURPOSE As global aging progresses, the health management of chronic diseases has become an important concern for governments. Influenced by the aging of its population and improvements in its medical system and healthcare in general, Taiwan's population of patients with chronic kidney disease (CKD) has grown year by year, including high-risk cases that pose major health hazards to elderly and middle-aged people. METHODS This study analyzed the annual health screening data of 65,394 people from 2010 to 2015, sourced from the MJ Group (a major health screening center in Taiwan), including data on 18 risk indicators. We used five prediction methods, namely logistic regression (LR), C5.0 decision tree (C5.0), stochastic gradient boosting (SGB), multivariate adaptive regression splines (MARS), and eXtreme gradient boosting (XGboost), with estimated glomerular filtration rate (e-GFR) data to determine risk factors for stage G3a, G3b, and G4 CKD. RESULTS LR (AUC = 0.848), SGB (AUC = 0.855), and XGboost (AUC = 0.858) achieved similar classification performance, and all outperformed C5.0 and MARS. Blood urea nitrogen (BUN) and uric acid (UA) were identified as the first and second most important CKD risk indicators in the models of all five methods, and they are also clinically recognized as major risk factors. The results for systolic blood pressure (SBP), SGPT, SGOT, and LDL were similar to those of a related study. Interestingly, education, a socioeconomic status indicator, was the third most important indicator in all three of the better-performing methods, indicating that it is more important than the other risk indicators in this study, whose importance varied across methods.
CONCLUSION The five prediction methods provided high and similar classification performance in this study. Based on these results, education, as an indicator of socioeconomic status, should be considered an important factor for CKD, as a high educational level showed a negative and highly significant correlation with CKD. The findings of this study should also be of value for further discussion and follow-up research.
Affiliation(s)
- Yen-Ling Chiu
- Graduate Institute of Medicine and Graduate School of Biomedical Informatics, Yuan Ze University, Taoyuan, 32003, Taiwan, Republic of China
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, 10002, Taiwan, Republic of China
- Department of Medical Research, Department of Medicine, Far Eastern Memorial Hospital, New Taipei, 22056, Taiwan, Republic of China
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei, 242062, Taiwan, Republic of China
- Tian-Shyug Lee
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei, 242062, Taiwan, Republic of China
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, 242062, Taiwan, Republic of China
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei, 242062, Taiwan, Republic of China
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, 242062, Taiwan, Republic of China
- Department of Information Management, Fu Jen Catholic University, New Taipei City, 242062, Taiwan, Republic of China
- Ming-Shu Chen
- Department of Healthcare Administration, College of Healthcare and Management, Asia Eastern University of Science and Technology, New Taipei, 22061, Taiwan, Republic of China
11
Comparison of Different Machine Learning Classifiers for Glaucoma Diagnosis Based on Spectralis OCT. Diagnostics (Basel) 2021; 11:diagnostics11091718. [PMID: 34574059 PMCID: PMC8471622 DOI: 10.3390/diagnostics11091718] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/16/2021] [Revised: 09/17/2021] [Accepted: 09/18/2021] [Indexed: 11/17/2022]
Abstract
Early detection is important in glaucoma management. Optical coherence tomography (OCT) can detect the subtle structural changes caused by glaucoma. Although OCT provides abundant parameters that give comprehensive information, clinicians may be confused when the results conflict. Machine learning classifiers (MLCs) are useful tools for weighing numerous parameters and generating reliable diagnoses in glaucoma practice. Here we compare different MLCs based on Spectralis OCT parameters, including circumpapillary retinal nerve fiber layer (cRNFL) thickness, Bruch's membrane opening-minimum rim width (BMO-MRW), Early Treatment Diabetic Retinopathy Study (ETDRS) macular thickness, and posterior pole asymmetry analysis (PPAA), in discriminating normal from glaucomatous eyes. Five MLCs were evaluated, namely conditional inference trees (CIT), logistic model tree (LMT), C5.0 decision tree, random forest (RF), and extreme gradient boosting (XGBoost), with logistic regression (LGR) as a benchmark. RF performed best. Ganglion cell layer measurements were the most important predictors for early glaucoma detection, whereas cRNFL measurements became more important as glaucoma severity increased. The global, temporal, inferior, superotemporal, and inferotemporal sites were relatively influential locations across all parameters. Clinicians should cautiously integrate Spectralis OCT results into the entire clinical picture when diagnosing glaucoma.
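Several of the compared classifiers are tree based. As a minimal sketch of how a single tree node can choose a cut-off on one OCT parameter, the Python below scores every candidate threshold by the weighted Gini impurity of the two child nodes, the split criterion of CART-style trees such as those in Breiman's random forest. C5.0 and conditional inference trees use different criteria (information-based measures and permutation tests, respectively), and the thickness values in the usage note are hypothetical, not data from the study.

```python
from typing import Dict, Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x: Sequence[float], y: Sequence[int]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted child impurity) of the best 'x <= t' split, or None."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

For hypothetical cRNFL thicknesses `[62, 68, 71, 85, 92, 98]` µm with labels `[1, 1, 1, 0, 0, 0]` (1 = glaucoma), `best_split` returns `(78.0, 0.0)`: the midpoint between 71 and 85 separates the classes perfectly. In a random forest, summing such impurity decreases over all splits on a feature is one common way to rank predictor importance.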
12
Improving Sports Outcome Prediction Process Using Integrating Adaptive Weighted Features and Machine Learning Techniques. Processes (Basel) 2021. [DOI: 10.3390/pr9091563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
Developing an effective sports performance analysis process is an important topic in sports team management. This study proposed an improved sports outcome prediction process that integrates adaptive weighted features and machine learning algorithms to predict basketball game scores. In the proposed process, feature engineering is used to construct designed features from game-lag information with adaptive weighting of variables. These features are then fed to five machine learning methods, namely classification and regression trees (CART), random forest (RF), stochastic gradient boosting (SGB), eXtreme gradient boosting (XGBoost), and extreme learning machine (ELM), to build prediction models. Empirical results on National Basketball Association (NBA) data showed that the proposed process produced promising predictions compared with competing models without adaptive weighted features. The results also showed that models using four game lags of information with adaptive weighting of power achieved better prediction performance.
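The abstract does not give the exact weighting formula, so the following Python is a minimal sketch under an assumed scheme: each game's feature is a weighted average of a team statistic over the previous k games, with weights proportional to j^power, where j = 1 for the oldest game in the window and j = k for the most recent (power = 0 reduces to a plain mean; larger powers emphasize recent games). The function names are hypothetical. Using only games before game t keeps information from the game being predicted out of its own features.

```python
from typing import List, Sequence


def lag_weights(k: int, power: float) -> List[float]:
    """Normalized weights for a k-game window, oldest game first.

    Hypothetical scheme: the j-th game (j = 1 oldest, j = k most recent)
    gets weight proportional to j ** power, so power = 0 gives equal weights.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    raw = [j ** power for j in range(1, k + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_lag_feature(history: Sequence[float], k: int, power: float) -> List[float]:
    """Weighted average of the k games before each game t, for t = k .. len(history) - 1.

    Only past games enter the window, so nothing leaks from game t itself.
    """
    w = lag_weights(k, power)
    features = []
    for t in range(k, len(history)):
        window = history[t - k:t]  # oldest ... most recent previous game
        features.append(sum(wi * xi for wi, xi in zip(w, window)))
    return features
```

For hypothetical defensive rebounds `[40, 44, 38, 46, 42, 50, 48]` with k = 4, `power=0` gives `[42.0, 42.5, 44.0]`, while `power=1` (weights 0.1, 0.2, 0.3, 0.4) gives approximately `[42.6, 42.6, 45.6]`, pulling the last feature toward the recent 50-rebound game.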
13
Hybrid Basketball Game Outcome Prediction Model by Integrating Data Mining Methods for the National Basketball Association. ENTROPY 2021; 23:e23040477. [PMID: 33920720 PMCID: PMC8073849 DOI: 10.3390/e23040477] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/15/2021] [Revised: 04/08/2021] [Accepted: 04/14/2021] [Indexed: 12/18/2022]
Abstract
The sports market has grown rapidly over the last several decades. Sports outcome prediction is an attractive analytics challenge because it provides useful information for operations in the sports market. In this study, a hybrid basketball game outcome prediction scheme is developed to predict the final scores of National Basketball Association (NBA) games by integrating five data mining techniques: extreme learning machine, multivariate adaptive regression splines, k-nearest neighbors, eXtreme gradient boosting (XGBoost), and stochastic gradient boosting. Designed features are generated by merging information from different game lags of fundamental basketball statistics and are used in the proposed scheme. The study collected data from all games of the 2018-2019 NBA season. The NBA has 30 teams, and each team plays 82 games per season, giving a total of 2460 game data points. Empirical results showed that the proposed hybrid scheme achieves high prediction performance and identifies suitable game-lag information and relevant game features (statistics). A two-stage XGBoost model using four game lags of information achieved the best prediction performance among all competing models. Six designed features averaged over four game lags (defensive rebounds, two-point field goal percentage, free throw percentage, offensive rebounds, assists, and three-point field goal attempts) had a greater effect on predicting the final scores of NBA games than features from other game lags. These findings provide relevant insights and guidance for outcome prediction research in other team or individual sports.
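The sample size reported above follows from the league schedule, assuming each data point is one team's statistics for one game: every game is then counted once for each of its two teams.

```latex
N = 30\ \text{teams} \times 82\ \text{games per team} = 2460\ \text{team-game records} = 2 \times 1230\ \text{distinct games}
```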
14
Lahoura V, Singh H, Aggarwal A, Sharma B, Mohammed MA, Damaševičius R, Kadry S, Cengiz K. Cloud Computing-Based Framework for Breast Cancer Diagnosis Using Extreme Learning Machine. Diagnostics (Basel) 2021; 11:241. [PMID: 33557132 PMCID: PMC7913821 DOI: 10.3390/diagnostics11020241] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 12/06/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 02/07/2023]
Abstract
Globally, breast cancer is one of the most significant causes of death among women. Early detection accompanied by prompt treatment can reduce the risk of death from breast cancer. Machine learning in cloud computing currently plays a pivotal role in disease diagnosis, particularly for people living in remote areas where medical facilities are scarce. Diagnosis systems based on machine learning act as secondary readers that assist radiologists in diagnosing diseases, whereas cloud-based systems can support telehealth services and remote diagnostics. Techniques based on artificial neural networks (ANN) have attracted many researchers to explore their capability for disease diagnosis. The extreme learning machine (ELM), a variant of ANN, has great potential for solving various classification problems. The framework proposed in this paper combines three research domains: first, ELM is applied to diagnose breast cancer; second, the gain ratio feature selection method is used to eliminate insignificant features; and third, a cloud computing-based system for remote diagnosis of breast cancer using ELM is proposed. The performance of the cloud-based ELM is compared with that of several state-of-the-art disease diagnosis techniques. Results on the Wisconsin Diagnostic Breast Cancer (WBCD) dataset indicate that the cloud-based ELM technique outperforms the other approaches. ELM achieved its best performance in both the standalone and cloud environments, which were compared with each other. The experimental results show an accuracy of 0.9868, a recall of 0.9130, a precision of 0.9054, and an F1-score of 0.8129.
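The four reported metrics are standard functions of a binary confusion matrix. The Python below is a minimal sketch of those definitions; the counts in the usage note are hypothetical, and returning 0.0 for a zero denominator is an assumed convention. Taking the reported precision (0.9054) and recall (0.9130) at face value, their harmonic mean is about 0.909 rather than the reported F1-score of 0.8129; the abstract does not state how its F1-score was aggregated, so the discrepancy cannot be resolved from the abstract alone.

```python
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall (sensitivity), specificity and F1 from counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "f1": f1_score(precision, recall),
    }
```

For example, `confusion_metrics(tp=40, fp=5, tn=50, fn=5)` gives an accuracy of 0.9, precision and recall of 40/45 (about 0.889), specificity of 50/55 (about 0.909), and an F1 equal to the shared precision and recall, while `f1_score(0.9054, 0.9130)` returns about 0.909.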
Affiliation(s)
- Vivek Lahoura
- Department of Computer Science and Engineering, DAV University, Jalandhar 144 012, Punjab, India
- Harpreet Singh
- Department of Computer Science and Engineering, DAV University, Jalandhar 144 012, Punjab, India
- Ashutosh Aggarwal
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala 147004, Punjab, India
- Bhisham Sharma
- Chitkara University School of Engineering and Technology, Chitkara University, Himachal Pradesh, India
- Mazin Abed Mohammed
- Information Systems Department, College of Computer Science and Information Technology, University of Anbar, 55431 Ramadi, Anbar, Iraq
- Robertas Damaševičius
- Department of Applied Informatics, Vytautas Magnus University, 44404 Kaunas, Lithuania
- Faculty of Applied Mathematics, Silesian University of Technology, 44-100 Gliwice, Poland
- Seifedine Kadry
- Faculty of Applied Computing and Technology (FACT), Noroff University College, 4608 Kristiansand, Norway
- Korhan Cengiz
- Department of Electrical-Electronics Engineering, Trakya University, Edirne 22030, Turkey
15
Shih CC, Lu CJ, Chen GD, Chang CC. Risk Prediction for Early Chronic Kidney Disease: Results from an Adult Health Examination Program of 19,270 Individuals. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17144973. [PMID: 32664271 PMCID: PMC7399976 DOI: 10.3390/ijerph17144973] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/30/2020] [Revised: 07/06/2020] [Accepted: 07/07/2020] [Indexed: 12/13/2022]
Abstract
Developing effective risk prediction models is a cost-effective approach to predicting chronic kidney disease (CKD) complications and mortality; however, the evidence supporting CKD screening is inadequate. In this study, four data mining algorithms, namely a classification and regression tree, a C4.5 decision tree, linear discriminant analysis, and an extreme learning machine, are used to predict early CKD. The study includes data from 19,270 patients, provided by an adult health examination program run by 32 chain clinics and three special physical examination centers between 2015 and 2019. There were 11 independent variables, and the glomerular filtration rate (GFR) was used as the outcome variable. The C4.5 decision tree outperformed the three comparison models in accuracy, sensitivity, specificity, and area under the curve (AUC), making it a promising method for early CKD prediction. The experimental results showed that the urine protein-to-creatinine ratio (UPCR), proteinuria (PRO), red blood cells (RBC), fasting glucose (GLU), triglycerides (TG), total cholesterol (T-CHO), age, and gender are important risk factors. CKD care is closely related to primary care and is recognized as a healthcare priority in national strategy. The proposed risk prediction models underscore the importance of personal characteristics and health examination data in predicting early CKD.
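C4.5, the best-performing model in this study, chooses splits by the gain ratio: the information gain of a split divided by its split information, which penalizes features with many distinct values. The same quantity underlies the gain ratio feature selection mentioned in the breast cancer abstract above. A minimal Python sketch for categorical features follows; C4.5 handles continuous variables such as UPCR by testing thresholds, which this sketch omits, and the proteinuria values in the usage note are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio = information gain / split information for a categorical feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0
```

For a hypothetical dipstick result `["neg", "neg", "neg", "pos", "pos", "pos", "neg", "pos"]` against early-CKD labels `[0, 0, 0, 1, 1, 1, 0, 1]`, `gain_ratio` returns 1.0 (a perfect binary split), whereas a feature that splits each class evenly, such as `["a", "b", "a", "b"]` against `[0, 0, 1, 1]`, scores 0.0.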
Affiliation(s)
- Chin-Chuan Shih
- Institute of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
- General Administrative Department, United Safety Medical Group, New Taipei City 24205, Taiwan
- Deputy Chairman, Taiwan Association of Family Medicine, Taipei 24200, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei 24205, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei 24205, Taiwan
- Gin-Den Chen
- Institute of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
- Department of Obstetrics and Gynecology, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
- Chi-Chang Chang
- School of Medical Informatics, Chung Shan Medical University & IT office, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
- Correspondence: Tel.: +886-4-24730022