1
|
Chellappan D, Rajaguru H. Machine Learning Meets Meta-Heuristics: Bald Eagle Search Optimization and Red Deer Optimization for Feature Selection in Type II Diabetes Diagnosis. Bioengineering (Basel) 2024; 11:766. [PMID: 39199724 PMCID: PMC11351847 DOI: 10.3390/bioengineering11080766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 07/10/2024] [Accepted: 07/22/2024] [Indexed: 09/01/2024] Open
Abstract
This article investigates how feature extraction and selection techniques affect classifier accuracy in Type II Diabetes Mellitus (DM) detection using microarray gene data. To address the inherent high dimensionality of the data, three feature extraction (FE) methods are used: Short-Time Fourier Transform (STFT), Ridge Regression (RR), and Pearson's Correlation Coefficient (PCC). To further refine the data, two meta-heuristic algorithms, Bald Eagle Search Optimization (BESO) and Red Deer Optimization (RDO), are used for feature selection. Seven classifiers are evaluated with and without feature selection: Non-Linear Regression (NLR), Linear Regression (LR), Gaussian Mixture Models (GMMs), Expectation Maximization (EM), Logistic Regression (LoR), Softmax Discriminant Classifier (SDC), and Support Vector Machine with Radial Basis Function kernel (SVM-RBF). The analysis shows that PCC combined with SVM-RBF achieved an accuracy of 92.85% even without feature selection, and that adding BESO maintained this accuracy. The highest overall accuracy, 97.14%, was achieved when RDO was used for feature selection alongside PCC and SVM-RBF. These findings highlight the potential of feature extraction and selection techniques, particularly RDO with PCC, for improving the accuracy of DM detection using microarray gene data.
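As a rough illustration of how Pearson's correlation coefficient can be used to reduce dimensionality, the sketch below scores each feature column by its absolute correlation with the class label and keeps the top few. This is not the authors' pipeline: the function names, the toy data, and the choice to rank by absolute correlation are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; scoring it 0 is an assumption.
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, labels, top_k):
    """Return the indices of the top_k columns by absolute correlation with labels."""
    scores = [abs(pearson(col, labels)) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]

# Hypothetical toy data: three "genes" measured in four samples.
genes = [
    [1.0, 2.0, 3.0, 4.0],  # rises with the label
    [5.0, 5.0, 5.0, 5.0],  # constant, carries no information
    [1.0, 3.0, 2.0, 4.0],  # weaker relationship with the label
]
labels = [0, 0, 1, 1]
selected = rank_features(genes, labels, top_k=2)
```

Here `selected` holds the column indices that would be passed on to a classifier; the constant column is scored lowest and dropped.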
Affiliation(s)
- Dinesh Chellappan
- Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore 641 407, Tamil Nadu, India;
| | - Harikumar Rajaguru
- Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638 401, Tamil Nadu, India
| |
|
2
|
Liu CH, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Cheng YF. Comparison of multiple linear regression and machine learning methods in predicting cognitive function in older Chinese type 2 diabetes patients. BMC Neurol 2024; 24:11. [PMID: 38166825 PMCID: PMC10759520 DOI: 10.1186/s12883-023-03507-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024] Open
Abstract
INTRODUCTION The prevalence of type 2 diabetes (T2D) has increased dramatically in recent decades, and there are increasing indications that dementia is related to T2D. Previous attempts to analyze such relationships principally relied on traditional multiple linear regression (MLR). However, recently developed machine learning methods (Mach-L) outperform MLR in capturing non-linear relationships. The present study applied four different Mach-L methods to analyze the relationships between risk factors and cognitive function in older T2D patients, seeking to compare the accuracy of MLR and Mach-L in predicting cognitive function and to rank the importance of risk factors for impaired cognitive function in T2D. METHODS We recruited older T2D patients aged 60-95 years without other major comorbidities. Demographic factors and biochemistry data were used as independent variables, and the cognitive function assessment (CFA) score, measured with the Montreal Cognitive Assessment, was the dependent variable. In addition to traditional MLR, we applied random forest (RF), stochastic gradient boosting (SGB), Naïve Bayes classifier (NB) and eXtreme gradient boosting (XGBoost). RESULTS In total, the test cohort consisted of 197 T2D patients (98 men and 99 women). All Mach-L methods outperformed MLR, with symmetric mean absolute percentage errors for MLR, RF, SGB, NB and XGBoost of 0.61, 0.599, 0.606, 0.599 and 0.2139, respectively. Education level, age, frailty score, fasting plasma glucose and body mass index were identified as key factors in descending order of importance. CONCLUSION Our study demonstrated that RF, SGB, NB and XGBoost are more accurate than MLR for predicting CFA score, and identified education level, age, frailty score, fasting plasma glucose, body fat and body mass index as important risk factors in an older Chinese T2D cohort.
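The symmetric mean absolute percentage error used to compare the models can be sketched as follows. Several SMAPE definitions are in circulation; this sketch assumes the common form with the mean of the absolute actual and predicted values in the denominator, reported as a fraction, and the abstract does not say which variant the authors used. The sample scores are hypothetical.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, returned as a fraction in [0, 2]."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # When both values are zero the term is counted as 0 (a convention, not a rule).
        total += 0.0 if denom == 0 else abs(a - p) / denom
    return total / len(actual)

scores = [24.0, 18.0, 27.0]     # hypothetical cognitive assessment scores
estimates = [22.0, 20.0, 27.0]  # hypothetical model predictions
error = smape(scores, estimates)
```

A lower value indicates predictions closer to the observed scores, which is the sense in which the abstract compares the five models.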
Affiliation(s)
- Chi-Hao Liu
- Department of Medicine, Division of Nephrology, Kaohsiung Armed Forces General Hospital, Kaohsiung, Taiwan, R.O.C
| | - Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan, R.O.C
| | - Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, R.O.C
| | - Fang-Yu Chen
- Department of Internal Medicine, Division of Endocrinology and Metabolism, Fu Jen Catholic University Hospital, New Taipei City, Taiwan, R.O.C
| | - Chun-Heng Kuo
- Department of Internal Medicine, Division of Endocrinology and Metabolism, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, R.O.C
| | - Chung-Ze Wu
- Department of Internal Medicine, Division of Endocrinology, Shuang Ho Hospital, New Taipei City, 23561, R.O.C
- Division of Endocrinology and Metabolism, School of Medicine, College of Medicine, Taipei Medical University, Taipei, 11031, Taiwan, R.O.C
| | - Yu-Fang Cheng
- Department of Endocrinology and Metabolism, Changhua Christian Hospital, 135 Nanhsiao Street, Changhua City, 50006, Taiwan, R.O.C.
- Department of Medicine, Taipei Medical University, Taipei, Taiwan, R.O.C.
| |
|
3
|
Wang CK, Chang CY, Chu TW, Liang YJ. Using Machine Learning to Identify the Relationships between Demographic, Biochemical, and Lifestyle Parameters and Plasma Vitamin D Concentration in Healthy Premenopausal Chinese Women. Life (Basel) 2023; 13:2257. [PMID: 38137858 PMCID: PMC10744461 DOI: 10.3390/life13122257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/15/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023] Open
Abstract
INTRODUCTION Vitamin D plays a vital role in maintaining homeostasis and enhancing the absorption of calcium, an essential component for strengthening bones and preventing osteoporosis. Many factors are known to relate to plasma vitamin D concentration (PVDC). However, most of the studies identifying them were performed with traditional statistical methods. Machine learning methods (Mach-L) have since become new tools in medical research. In the present study, we used four Mach-L methods to explore the relationships between PVDC and demographic, biochemical, and lifestyle factors in a group of healthy premenopausal Chinese women. Our goals were as follows: (1) to evaluate and compare the predictive accuracy of Mach-L and MLR, and (2) to establish a hierarchy of the significance of the aforementioned factors related to PVDC. METHODS Five hundred ninety-three healthy Chinese women were enrolled. In total, 35 variables were recorded, including demographic, biochemical, and lifestyle information. The dependent variable was 25-OH vitamin D (PVDC), and all other variables were the independent variables. Multiple linear regression (MLR) was regarded as the benchmark for comparison. Four Mach-L methods were applied: random forest (RF), stochastic gradient boosting (SGB), extreme gradient boosting (XGBoost), and elastic net. Each method produced several estimation errors; smaller errors indicated a better model. RESULTS On Pearson's correlation analysis, age, glycated hemoglobin, HDL-cholesterol, LDL-cholesterol, and hemoglobin were positively correlated with PVDC, whereas eGFR was negatively correlated with PVDC. The Mach-L methods yielded smaller estimation errors for all five parameters, which indicated that they were better methods than the MLR model. After averaging the importance percentage from the four Mach-L methods, a rank of importance could be obtained. Age was the most important factor, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
CONCLUSIONS In a healthy Chinese premenopausal cohort using four different Mach-L methods, age was found to be the most important factor related to PVDC, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
Affiliation(s)
- Chun-Kai Wang
- Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan;
| | - Ching-Yao Chang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan;
| | - Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Chief Executive Officer’s Office, MJ Health Research Foundation, Taipei 114, Taiwan;
| | - Yao-Jen Liang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan;
| |
|
4
|
Yang CC, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Hsia TL, Lin CY. Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes. World J Clin Cases 2023; 11:7951-7964. [DOI: 10.12998/wjcc.v11.i33.7951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/23/2023] [Accepted: 11/13/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND The prevalence of type 2 diabetes (T2D) has been increasing dramatically in recent decades, and 47.5% of T2D patients will die of cardiovascular disease. Thallium-201 myocardial perfusion scan (MPS) is a precise and non-invasive method to detect coronary artery disease (CAD). Most previous studies used traditional logistic regression (LGR) to evaluate the risks for abnormal CAD. Rapidly developing machine learning (Mach-L) techniques could potentially outperform LGR in capturing non-linear relationships.
AIM The aims were to (1) compare the accuracy of Mach-L methods and LGR; and (2) identify the most important factors for abnormal TMPS.
METHODS A total of 556 T2D patients were enrolled in the study (287 men and 269 women). Demographic and biochemistry data were used as independent variables and the sum of stressed score derived from the MPS scan was the dependent variable. Subjects with an MPS score ≥ 9 were defined as abnormal. In addition to traditional LGR, classification and regression tree (CART), random forest, Naïve Bayes, and eXtreme gradient boosting were also applied. Sensitivity, specificity, accuracy and area under the receiver operating characteristic curve were used to evaluate the respective accuracy of LGR and Mach-L methods.
RESULTS Except for CART, the other Mach-L methods outperformed LGR, with gender, body mass index, age, low-density lipoprotein cholesterol, glycated hemoglobin and smoking emerging as the most important factors to predict abnormal MPS.
CONCLUSION Four Mach-L methods are found to outperform LGR in predicting abnormal TMPS in Chinese T2D, with the most important risk factors being gender, body mass index, age, low-density lipoprotein cholesterol, glycated hemoglobin and smoking.
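The evaluation described in the methods can be sketched as follows: scan scores are turned into binary labels with the stated cutoff (a score of 9 or more is abnormal), and sensitivity, specificity, and accuracy are computed from the resulting confusion counts. The scores and predictions below are hypothetical, and the helper name is an assumption made for illustration.

```python
ABNORMAL_CUTOFF = 9  # from the abstract: MPS score >= 9 is defined as abnormal

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for 0/1 labels (1 = abnormal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        # Guard against empty classes; returning 0.0 in that case is a convention.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
    }

mps_scores = [12, 9, 10, 3, 0, 8, 5, 2]  # hypothetical stress scores
y_true = [1 if s >= ABNORMAL_CUTOFF else 0 for s in mps_scores]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]        # hypothetical classifier output
metrics = binary_metrics(y_true, y_pred)
```

The area under the receiver operating characteristic curve, also used in the study, needs predicted probabilities rather than hard labels and is left out of this sketch.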
Affiliation(s)
- Chung-Chi Yang
- Division of Cardiovascular Medicine, Taoyuan Armed Forces General Hospital, Taoyuan City 32551, Taiwan
- Division of Cardiovascular, Tri-service General Hospital, Taipei City 114202, Taiwan
| | - Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, New Taipei City 23148, Taiwan
- School of Medicine, Fu-Jen Catholic University, New Taipei City 242062, Taiwan
| | - Li-Ying Huang
- Department of Internal Medicine, Department of Medical Education, School of Medicine, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 243, Taiwan
| | - Fang Yu Chen
- Department of Endocrinology, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
| | - Chun-Heng Kuo
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 243, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
| | - Chung-Ze Wu
- Division of Endocrinology, Shuang Ho Hospital, New Taipei City 23561, Taiwan
- School of Medicine, Taipei Medical University, Taipei City 11031, Taiwan
| | - Te-Lin Hsia
- Department of Internal Medicine, Cardinal Tien Hospital, New Taipei City 23148, Taiwan
| | - Chung-Yu Lin
- Department of Cardiology, Fu Jen Catholic University Hospital, New Taipei City 24352, Taiwan
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
| |
|
5
|
Feng X, Cai Y, Xin R. Optimizing diabetes classification with a machine learning-based framework. BMC Bioinformatics 2023; 24:428. [PMID: 37957549 PMCID: PMC10644638 DOI: 10.1186/s12859-023-05467-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 09/04/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND Diabetes is a metabolic disorder usually caused by insufficient secretion of insulin from the pancreas or insensitivity of cells to insulin, resulting in long-term elevated blood sugar levels in patients. Patients usually present with frequent urination, thirst, and hunger. If left untreated, it can lead to various complications that can affect essential organs and even endanger life. Therefore, developing an intelligent diagnosis framework for diabetes is necessary. RESULT This paper proposes a machine learning-based diabetes classification framework, the machine learning optimized GAN. The framework combines several methods to address the challenges encountered during the analysis: a joint mean and median filling method for handling missing values, the cap method for outlier processing, and SMOTEENN to mitigate sample imbalance. The framework also incorporates the proposed Diabetes Classification Model based on Generative Adversarial Network and uses logistic regression for detailed feature analysis. The effectiveness of the framework is evaluated using both the PIMA dataset and the diabetes dataset obtained from the GEO database. The experiments show that the model achieved a binary classification accuracy of 96.27%, a three-class classification accuracy of 99.31%, precision and F1 score of 0.9698, recall of 0.9698, and an AUC of 0.9702. CONCLUSION The experimental results show that the framework proposed in this paper can accurately classify diabetes and provide new ideas for intelligent diagnosis of diabetes.
Affiliation(s)
- Xin Feng
- School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China
- State Key Laboratory of Inorganic Synthesis and Preparative Chemistry, College of Chemistry, Jilin University, Changchun, 130012, People's Republic of China
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, 130012, People's Republic of China
| | - Yihuai Cai
- School of Science, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China.
| | - Ruihao Xin
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 130000, People's Republic of China.
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, People's Republic of China.
| |
|
6
|
Tzou SJ, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Chu TW. Comparison between linear regression and four different machine learning methods in selecting risk factors for osteoporosis in a Chinese female aged cohort. J Chin Med Assoc 2023; 86:1028-1036. [PMID: 37729604 DOI: 10.1097/jcma.0000000000000999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND Population aging is emerging as an increasingly acute challenge for countries around the world. One particular manifestation of this phenomenon is the impact of osteoporosis on individuals and national health systems. Previous studies of risk factors for osteoporosis were conducted using traditional statistical methods, but more recent efforts have turned to machine learning approaches. Most such efforts, however, treat the target variable (bone mineral density [BMD] or fracture rate) as a categorical one, which provides no quantitative information. The present study uses five different machine learning methods to analyze the risk factors for T-score of BMD, seeking to (1) compare the prediction accuracy between different machine learning methods and traditional multiple linear regression (MLR) and (2) rank the importance of 25 different risk factors. METHODS The study sample includes 24 412 women older than 55 years with 25 related variables, applying traditional MLR and five different machine learning methods: classification and regression tree, Naïve Bayes, random forest, stochastic gradient boosting, and eXtreme gradient boosting. The metrics used for model performance comparisons are the symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error. RESULTS Machine learning approaches outperformed MLR for all four prediction errors. The average importance ranking of each factor generated by the machine learning methods indicates that age is the most important factor determining T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level. CONCLUSION In a group of women older than 55 years, we demonstrated that machine learning methods provide superior performance in estimating T-Score, with age being the most important impact factor, followed by eGFR, BMI, UA, and education level.
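Three of the error metrics named in the methods (relative absolute error, root relative squared error, and root mean squared error) can be sketched as below, using their usual textbook definitions relative to a predictor that always outputs the mean of the observed values. The abstract does not give the exact formulas, so these definitions are an assumption, and the T-scores are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rae(actual, predicted):
    """Relative absolute error: total absolute error over that of a mean-only predictor."""
    mean_a = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_a) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline

def rrse(actual, predicted):
    """Root relative squared error: squared error over that of a mean-only predictor."""
    mean_a = sum(actual) / len(actual)
    baseline = sum((a - mean_a) ** 2 for a in actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / baseline)

t_scores = [-1.0, -2.0, -3.0]   # hypothetical observed T-scores
estimates = [-1.5, -2.0, -2.5]  # hypothetical model predictions
results = {
    "RMSE": rmse(t_scores, estimates),
    "RAE": rae(t_scores, estimates),
    "RRSE": rrse(t_scores, estimates),
}
```

Values of RAE or RRSE below 1 mean the model beats the mean-only baseline; both functions divide by zero if all observed values are identical, which this sketch does not guard against.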
Affiliation(s)
- Shiow-Jyu Tzou
- Teaching and Researching Center, Kaohsiung Armed Forces General Hospital, Kaohsiung, Taiwan, ROC
- Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan, ROC
| | - Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
| | - Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
| | - Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
| | - Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
| | - Chung-Ze Wu
- Department of Internal Medicine, Shuang Ho Hospital, New Taipei City, Division of Endocrinology and Metabolism, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan, ROC
| | - Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, ROC
- MJ Health Research Foundation, Taipei, Taiwan, ROC
| |
|
7
|
Wu CZ, Huang LY, Chen FY, Kuo CH, Yeih DF. Using Machine Learning to Predict Abnormal Carotid Intima-Media Thickness in Type 2 Diabetes. Diagnostics (Basel) 2023; 13:diagnostics13111834. [PMID: 37296685 DOI: 10.3390/diagnostics13111834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/16/2023] [Accepted: 05/20/2023] [Indexed: 06/12/2023] Open
Abstract
Carotid intima-media thickness (c-IMT) is a reliable marker of cardiovascular disease risk in type 2 diabetes (T2D) patients. The present study aimed to compare the effectiveness of different machine learning methods and traditional multiple logistic regression in predicting c-IMT using baseline features and to establish the most significant risk factors in a T2D cohort. We followed up with 924 patients with T2D for four years, with 75% of the participants used for model development. Machine learning methods, including classification and regression tree, random forest, eXtreme gradient boosting, and Naïve Bayes classifier, were used to predict c-IMT. The results showed that all machine learning methods, except for classification and regression tree, were not inferior to multiple logistic regression in predicting c-IMT, as judged by a higher area under the receiver operating characteristic curve. The most significant risk factors for c-IMT were, in order, age, sex, creatinine, body mass index, diastolic blood pressure, and duration of diabetes. In conclusion, machine learning methods could improve the prediction of c-IMT in T2D patients compared to conventional logistic regression models. This could have crucial implications for the early identification and management of cardiovascular disease in T2D patients.
Affiliation(s)
- Chung-Ze Wu
- Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei City 11031, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan
| | - Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
| | - Fang-Yu Chen
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
| | - Chun-Heng Kuo
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
| | - Dong-Feng Yeih
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Division of Cardiology, Department of Internal Medicine, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
| |
|
8
|
Olusanya MO, Ogunsakin RE, Ghai M, Adeleke MA. Accuracy of Machine Learning Classification Models for the Prediction of Type 2 Diabetes Mellitus: A Systematic Survey and Meta-Analysis Approach. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph192114280. [PMID: 36361161 PMCID: PMC9655196 DOI: 10.3390/ijerph192114280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 05/13/2023]
Abstract
Soft-computing and statistical learning models have gained substantial momentum in predicting type 2 diabetes mellitus (T2DM). This paper reviews recent soft-computing and statistical learning models in T2DM using a meta-analysis approach. We searched three different search engines for papers using soft-computing and statistical learning models focused on T2DM published between 2010 and 2021. Of 1215 studies identified, 34 with 136,952 patients met our inclusion criteria. The pooled algorithms predicted T2DM with an overall accuracy of 0.86 (95% confidence interval [CI] of [0.82, 0.89]). The classification of diabetes prediction was significantly greater in models for screening and diagnosis (pooled proportion [95% CI] = 0.91 [0.74, 0.97]) when compared to models with nephropathy (pooled proportion = 0.48 [0.76, 0.89] to 0.88 [0.83, 0.91]). For the prediction of T2DM, the decision tree (DT) models had a pooled accuracy of 0.88 [95% CI: 0.82, 0.92], and the neural network (NN) models had a pooled accuracy of 0.85 [95% CI: 0.79, 0.89]. Meta-regression did not provide any statistically significant findings for the heterogeneous accuracy in studies with different diabetes predictions, sample sizes, and impact factors. Additionally, ML models showed high accuracy for the prediction of T2DM. The predictive accuracy of ML algorithms in T2DM is promising, mainly through DT and NN models. However, there is heterogeneity among ML models. We compared the results and models and concluded that this evidence might help clinicians interpret data and implement optimum models for their dataset for T2DM prediction.
Affiliation(s)
- Micheal O. Olusanya
- Department of Computer Science and Information Technology, Sol Plaatje University, Kimberley 8300, South Africa
- Correspondence:
| | - Ropo Ebenezer Ogunsakin
- Biostatistics Unit, Discipline of Public Health Medicine, School of Nursing & Public Health, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
| | - Meenu Ghai
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
| | - Matthew Adekunle Adeleke
- Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
| |
|
9
|
Comparison between Machine Learning and Multiple Linear Regression to Identify Abnormal Thallium Myocardial Perfusion Scan in Chinese Type 2 Diabetes. Diagnostics (Basel) 2022; 12:diagnostics12071619. [PMID: 35885524 PMCID: PMC9324130 DOI: 10.3390/diagnostics12071619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 11/17/2022] Open
Abstract
Type 2 diabetes mellitus (T2DM) patients have a high risk of coronary artery disease (CAD). Thallium-201 myocardial perfusion scan (Th-201 scan) is a non-invasive and extensively used tool for recognizing CAD in clinical settings. In this study, we compared the predictive accuracy of traditional multiple linear regression (MLR) with that of four machine learning (ML) methods in evaluating abnormal Th-201 scans. The study allows us to determine whether ML surpasses traditional MLR, to rank the clinical variables, and to compare them with previous reports. In total, 796 T2DM patients, including 368 men and 528 women, were enrolled. In addition to traditional MLR, classification and regression tree (CART), random forest (RF), stochastic gradient boosting (SGB) and eXtreme gradient boosting (XGBoost) were also used to analyze abnormal Th-201 scans. Stress sum score was used as the endpoint (dependent variable). Our findings show that the root mean square errors of all four ML methods are smaller than that of MLR, which implies that ML is more precise than MLR in determining abnormal Th-201 scans by using clinical parameters. The first seven factors, from the most important to the least, are: body mass index, hemoglobin, age, glycated hemoglobin, creatinine, systolic and diastolic blood pressure. In conclusion, ML is not inferior to traditional MLR in predicting abnormal Th-201 scans, and the most important factors are body mass index, hemoglobin, age, glycated hemoglobin, creatinine, systolic and diastolic blood pressure. ML methods are superior in these kinds of studies.
|
10
|
Huang LY, Chen FY, Jhou MJ, Kuo CH, Wu CZ, Lu CH, Chen YL, Pei D, Cheng YF, Lu CJ. Comparing Multiple Linear Regression and Machine Learning in Predicting Diabetic Urine Albumin-Creatinine Ratio in a 4-Year Follow-Up Study. J Clin Med 2022; 11:3661. [PMID: 35806944 PMCID: PMC9267784 DOI: 10.3390/jcm11133661] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Revised: 06/19/2022] [Accepted: 06/22/2022] [Indexed: 02/07/2023] Open
Abstract
The urine albumin-creatinine ratio (uACR) is a warning for the deterioration of renal function in type 2 diabetes (T2D). The early detection of ACR has become an important issue. Multiple linear regression (MLR) has traditionally been used to explore the relationships between risk factors and endpoints. Recently, machine learning (ML) methods have been widely applied in medicine. In the present study, four ML methods were used to predict the uACR in a T2D cohort. We hypothesized that (1) ML outperforms traditional MLR and (2) different ranks of the importance of the risk factors will be obtained. A total of 1147 patients with T2D were followed up for four years. MLR, classification and regression tree, random forest, stochastic gradient boosting, and eXtreme gradient boosting methods were used. Our findings show that the prediction errors of the ML methods are smaller than those of MLR, which indicates that ML is more accurate. The first six most important factors were baseline creatinine level, systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose. In conclusion, ML might be more accurate in predicting uACR in a T2D cohort than the traditional MLR, and the baseline creatinine level is the most important predictor, which is followed by systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose in Chinese patients with T2D.
Affiliation(s)
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan; (L.-Y.H.); (F.-Y.C.); (C.-H.K.); (D.P.)
| | - Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan; (L.-Y.H.); (F.-Y.C.); (C.-H.K.); (D.P.)
| | - Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan;
| | - Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan; (L.-Y.H.); (F.-Y.C.); (C.-H.K.); (D.P.)
| | - Chung-Ze Wu
- Division of Endocrinology, Department of Internal Medicine, Shuang Ho Hospital, New Taipei City 23561, Taiwan;
- Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Chieh-Hua Lu
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, School of Medicine, National Defense Medical Center, Taipei 11490, Taiwan;
| | - Yen-Lin Chen
- Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei 11490, Taiwan;
| | - Dee Pei
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan; (L.-Y.H.); (F.-Y.C.); (C.-H.K.); (D.P.)
| | - Yu-Fang Cheng
- Department of Endocrinology and Metabolism, Changhua Christian Hospital, Changhua 50051, Taiwan;
| | - Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan;
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan
| |
Collapse
|
11
|
Kanimozhi N, Singaravel G. Hybrid artificial fish particle swarm optimizer and kernel extreme learning machine for type-II diabetes predictive model. Med Biol Eng Comput 2021; 59:841-867. [PMID: 33738640 DOI: 10.1007/s11517-021-02333-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/15/2020] [Accepted: 02/03/2021] [Indexed: 10/21/2022]
Abstract
The World Health Organization (WHO) estimated that in 2016, 1.6 million deaths were caused by diabetes. Precise and timely diagnosis of type-II diabetes is crucial to reduce the risk of various diseases such as heart disease, stroke, kidney disease, diabetic retinopathy, diabetic neuropathy, and macrovascular problems. Non-invasive methods like machine learning are reliable and efficient in classifying people at risk of type-II diabetes and healthy people into two different categories. The present study aims to develop a stacking-based integrated kernel extreme learning machine (KELM) model for identifying the risk of type-II diabetes in patients based on the follow-up time in the diabetes research center dataset. The Pima Indian Diabetic Dataset (PIDD) and a Diabetic Research Center dataset are used in this study. Min-max normalization is used to preprocess the noisy datasets. The Hybrid Particle Swarm Optimization-Artificial Fish Swarm Optimization (HAFPSO) algorithm addresses the multi-objective problem by increasing the Classification Accuracy (CA) and decreasing the kernel complexity of the optimal learners (NBC) selected. Finally, the model is integrated by using the KELM as a meta-classifier that combines the predictions of the twenty Base Learners. The proposed classification method helps clinicians predict which patients are at high risk of type-II diabetes in the future, with a highest accuracy of 98.5%. The proposed method is evaluated with measures including accuracy, sensitivity, specificity, Matthews Correlation Coefficient, and Kappa Statistics. The results obtained show that the KELM-HAFPSO approach is a promising new tool for identifying type-II diabetes.
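The min-max normalization step named in the abstract above can be sketched as follows. This is a minimal illustration of the general technique, not the authors' preprocessing code; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0
        # instead of dividing by zero (a design choice for this sketch).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Made-up feature values, for illustration only.
glucose = [85.0, 148.0, 183.0, 89.0, 137.0]
scaled = min_max_normalize(glucose)
print(scaled)
```

Each feature column would be rescaled independently, so that features measured on large scales do not dominate those measured on small scales.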
Affiliation(s)
- N Kanimozhi
- Department of Computer Science and Engineering, GKM College of Engineering and Technology, Chennai, India.
- G Singaravel
- Department of Information Technology, K S Rangasamy College of Engineering, Tiruchengode, India
|
12
|
Machine Learning Applied to Diagnosis of Human Diseases: A Systematic Review. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10155135] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/20/2022]
Abstract
Human healthcare is one of the most important topics for society. It seeks correct, effective, and robust disease detection as early as possible so that patients receive appropriate care. Because this detection is often a difficult task, the medical field needs to seek support from other fields such as statistics and computer science. These disciplines are facing the challenge of exploring new techniques, going beyond the traditional ones. The large number of techniques that are emerging makes it necessary to provide a comprehensive overview that avoids very particular aspects. To this end, we propose a systematic review dealing with Machine Learning applied to the diagnosis of human diseases. This review focuses on modern techniques related to the development of Machine Learning applied to the diagnosis of human diseases in the medical field, in order to discover interesting patterns, make non-trivial predictions, and support decision-making. In this way, this work can help researchers to discover and, if necessary, determine the applicability of machine learning techniques in their particular specialties. We provide some examples of the algorithms used in medicine, analysing some trends that are focused on the goal pursued, the algorithm used, and the area of application. We detail the advantages and disadvantages of each technique to help choose the most appropriate one in each real-life situation, as several authors have reported. The authors searched the Scopus, Journal Citation Reports (JCR), Google Scholar, and MedLine databases from recent decades (from approximately the 1980s) up to the present, with English language restrictions, for studies according to the objectives mentioned above. Based on a protocol for data extraction defined and evaluated by all authors using the PRISMA methodology, 141 papers were included in this advanced review.
|
13
|
Design of an integrated model for diagnosis and classification of pediatric acute leukemia using machine learning. Proc Inst Mech Eng H 2020; 234:1051-1069. [DOI: 10.1177/0954411920938567] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 11/15/2022]
Abstract
Applying artificial intelligence techniques for diagnosing diseases in hospitals often provides advanced medical services to patients, such as the diagnosis of leukemia. On the other hand, surgery and bone marrow sampling, especially in the diagnosis of childhood leukemia, are even more complex and difficult, resulting in increased human error and procedure time, decreased patient satisfaction, and increased costs. This study investigates the use of neuro-fuzzy and group method of data handling for the diagnosis of acute leukemia in children based on the complete blood count test. Furthermore, a principal component analysis is applied to increase the accuracy of the diagnosis. The results show that distinguishing between patient and non-patient individuals can easily be done with an adaptive neuro-fuzzy inference system, whereas for classifying between the types of disease themselves, more pre-processing operations such as feature reduction may be needed. The proposed approach may help to distinguish between two types of leukemia, acute lymphoblastic leukemia and acute myeloid leukemia. Based on the sensitivity of the diagnosis, experts can use the proposed algorithm to help identify the disease earlier and lessen the cost.
|
14
|
Modeling the Research Landscapes of Artificial Intelligence Applications in Diabetes (GAP RESEARCH). INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17061982. [PMID: 32192211 PMCID: PMC7143845 DOI: 10.3390/ijerph17061982] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/18/2019] [Revised: 02/28/2020] [Accepted: 03/09/2020] [Indexed: 11/17/2022]
Abstract
The rising prevalence and global burden of diabetes fortify the need for more comprehensive and effective management to prevent, monitor, and treat diabetes and its complications. Applying artificial intelligence to complement the diagnosis, management, and prediction of the diabetes trajectory has become increasingly common over the years. This study aims to illustrate an inclusive landscape of the application of artificial intelligence in diabetes through a bibliographic analysis and offers future directions for research. Bibliometric analysis was combined with exploratory factor analysis and latent Dirichlet allocation to uncover emergent research domains and topics related to artificial intelligence and diabetes. Data were extracted from the Web of Science Core Collection database. The results showed a rising trend in the number of papers and citations concerning AI applications in diabetes, especially since 2010. The nucleus driving the research and development of AI in diabetes is centered around developed countries, mainly the United States, which contributed 44.1% of the publications. Our analyses uncovered the top five emerging research domains to be: (i) use of artificial intelligence in diagnosis of diabetes, (ii) risk assessment of diabetes and its complications, (iii) role of artificial intelligence in novel treatments and monitoring in diabetes, (iv) application of telehealth and wearable technology in the daily management of diabetes, and (v) robotic surgical outcomes with diabetes as a comorbidity. Despite the benefits of artificial intelligence, challenges with system accuracy, validity, and confidentiality breaches will need to be tackled before it can be widely applied for patients’ benefit.
|
15
|
Abhari S, Niakan Kalhori SR, Ebrahimi M, Hasannejadasl H, Garavand A. Artificial Intelligence Applications in Type 2 Diabetes Mellitus Care: Focus on Machine Learning Methods. Healthc Inform Res 2019; 25:248-261. [PMID: 31777668 PMCID: PMC6859270 DOI: 10.4258/hir.2019.25.4.248] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/14/2019] [Revised: 10/06/2019] [Accepted: 10/09/2019] [Indexed: 12/18/2022] Open
Abstract
Objectives: The incidence of type 2 diabetes mellitus has increased significantly in recent years. With the development of artificial intelligence applications in healthcare, they are used for diagnosis, therapeutic decision making, and outcome prediction, especially in type 2 diabetes mellitus. This study aimed to identify the artificial intelligence (AI) applications for type 2 diabetes mellitus care. Methods: This is a review conducted in 2018. We searched the PubMed, Web of Science, and Embase scientific databases, based on a combination of related MeSH terms. The article selection process was based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Finally, 31 articles were selected after inclusion and exclusion criteria were applied. Data gathering was done by using a data extraction form. Data were summarized and reported based on the study objectives. Results: The main applications of AI for type 2 diabetes mellitus care were screening and diagnosis in different stages. Among all of the reviewed AI methods, machine learning methods, at 71% (n = 22), were the most commonly applied techniques. Many applications were in multi-method forms (23%). Among the machine learning algorithms applied, support vector machine (21%) and naive Bayesian (19%) were the most commonly used methods. The most important variables that were used in the selected studies were body mass index, fasting blood sugar, blood pressure, HbA1c, triglycerides, low-density lipoprotein, high-density lipoprotein, and demographic variables. Conclusions: It is recommended to select optimal algorithms by testing various techniques. Support vector machine and naive Bayesian might achieve better performance than other applications due to the type of variables and targets in diabetes-related outcomes classification.
Affiliation(s)
- Shahabeddin Abhari
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Sharareh R Niakan Kalhori
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Mehdi Ebrahimi
- Department of Internal Medicine, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran; Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Hajar Hasannejadasl
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Ali Garavand
- Department of Health Information Management and Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
16
|
G. SS, K. M. Diagnosis of diabetes diseases using optimized fuzzy rule set by grey wolf optimization. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.06.005] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 01/06/2023]
|
17
|
Rigla M, García-Sáez G, Pons B, Hernando ME. Artificial Intelligence Methodologies and Their Application to Diabetes. J Diabetes Sci Technol 2018; 12:303-310. [PMID: 28539087 PMCID: PMC5851211 DOI: 10.1177/1932296817710475] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Indexed: 12/18/2022]
Abstract
In the past decade, diabetes management has been transformed by the addition of continuous glucose monitoring and insulin pump data. More recently, a wide variety of functions and physiologic variables, such as heart rate, hours of sleep, number of steps walked, and movement, have become available through wristbands or watches. New data, such as hydration, geolocation, and barometric pressure, among others, will be incorporated in the future. All these parameters, when analyzed, can be helpful for patients' and doctors' decision support. Similar new scenarios have appeared in most medical fields, so that in recent years there has been increased interest in the development and application of artificial intelligence (AI) methods for decision support and knowledge acquisition. Multidisciplinary research teams comprising computer engineers and doctors are increasingly frequent, mirroring the need for cooperation in this new topic. AI, as a science, can be defined as the ability to make computers do things that would require intelligence if done by humans. Increasingly, diabetes-related journals have been incorporating publications focused on AI tools applied to diabetes. In summary, diabetes management scenarios have undergone a deep transformation that forces diabetologists to incorporate skills from new areas. This newly needed knowledge includes AI tools, which have become part of diabetes health care. The aim of this article is to explain, in an easy and plain way, the most used AI methodologies to promote the involvement of health care providers (doctors and nurses) in this field.
Affiliation(s)
- Mercedes Rigla
- Endocrinology and Nutrition Department, Parc Tauli University Hospital, Sabadell, Spain
- Mercedes Rigla, MD, PhD, Endocrinology and Nutrition Department, Parc Tauli University Hospital, I3PT, Autonomous University of Barcelona, Parc Taulí, 1, Sabadell, 08208, Spain.
- Gema García-Sáez
- Bioengineering and Telemedicine Centre, Universidad Politécnica de Madrid, Spain
- CIBER-BBN: Networking Research Centre for Bioengineering, Biomaterials and Nanomedicine, Madrid, Spain
- Belén Pons
- Endocrinology and Nutrition Department, Parc Tauli University Hospital, Sabadell, Spain
- Maria Elena Hernando
- Bioengineering and Telemedicine Centre, Universidad Politécnica de Madrid, Spain
- CIBER-BBN: Networking Research Centre for Bioengineering, Biomaterials and Nanomedicine, Madrid, Spain
|
18
|
Siddiqui SA, Zhang Y, Lloret J, Song H, Obradovic Z. Pain-Free Blood Glucose Monitoring Using Wearable Sensors: Recent Advancements and Future Prospects. IEEE Rev Biomed Eng 2018; 11:21-35. [DOI: 10.1109/rbme.2018.2822301] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 11/06/2022]
|
19
|
Disease Diagnosis in Smart Healthcare: Innovation, Technologies and Applications. SUSTAINABILITY 2017. [DOI: 10.3390/su9122309] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Indexed: 11/16/2022]
|
20
|
Mohebian MR, Marateb HR, Mansourian M, Mañanas MA, Mokarian F. A Hybrid Computer-aided-diagnosis System for Prediction of Breast Cancer Recurrence (HPBCR) Using Optimized Ensemble Learning. Comput Struct Biotechnol J 2016; 15:75-85. [PMID: 28018557 PMCID: PMC5173316 DOI: 10.1016/j.csbj.2016.11.004] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Received: 08/30/2016] [Revised: 11/24/2016] [Accepted: 11/26/2016] [Indexed: 02/07/2023] Open
Abstract
Cancer is a collection of diseases that involves growing abnormal cells with the potential to invade or spread to other parts of the body. Breast cancer is the second leading cause of cancer death among women. A method for 5-year breast cancer recurrence prediction is presented in this manuscript. Clinicopathologic characteristics of 579 breast cancer patients (recurrence prevalence of 19.3%) were analyzed and discriminative features were selected using statistical feature selection methods. They were further refined by Particle Swarm Optimization (PSO) as the inputs of the classification system with ensemble learning (Bagged Decision Tree: BDT). The proper combination of selected categorical features and the weight (importance) of the selected interval-measurement-scale features were identified by the PSO algorithm. The performance of HPBCR (hybrid predictor of breast cancer recurrence) was assessed using holdout and 4-fold cross-validation. Three other classifiers, namely support vector machines, DT, and multilayer perceptron neural network, were used for comparison. The selected features were diagnosis age, tumor size, lymph node involvement ratio, number of involved axillary lymph nodes, progesterone receptor expression, having hormone therapy, and type of surgery. The minimum sensitivity, specificity, precision, and accuracy of HPBCR were 77%, 93%, 95%, and 85%, respectively, across all cross-validation folds and the hold-out test fold. HPBCR outperformed the other tested classifiers. It showed excellent agreement with the gold standard (i.e., the oncologist's opinion after blood tumor marker and imaging tests, and tissue biopsy). This algorithm is thus a promising online tool for the prediction of breast cancer recurrence.
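The four performance measures reported in the abstract above (sensitivity, specificity, precision, and accuracy) can all be derived from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the toy labels are hypothetical and do not reproduce the study's data or code.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a measure is undefined.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Standard definitions of the four measures named in the abstract."""
    return {
        "sensitivity": _ratio(tp, tp + fn),   # recall on the positive class
        "specificity": _ratio(tn, tn + fp),   # recall on the negative class
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy labels, for illustration only (1 = recurrence, 0 = no recurrence).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(binary_metrics(*counts))
```

In a cross-validation setting such as the one described, these measures would be computed once per fold, and the minimum across folds could then be reported.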
Key Words
- Breast cancer
- CAD, computer-aided diagnosis
- Cancer recurrence
- Computer-assisted diagnosis
- DT, decision tree
- FH, family history of cancer
- HPBCR, the proposed hybrid predictor of breast cancer recurrence
- HRT, hormone therapy
- I. Node, number of involved axillary lymph nodes
- Machine learning
- NR, lymph node involvement ratio
- Prognosis
- T. Node, number of dissected axillary lymph nodes
- TS, tumor size
- XRT, radiotherapy
Affiliation(s)
- Mohammad R. Mohebian
- Biomedical Engineering Department, Engineering Faculty, University of Isfahan, Hezar Jerib St., 81746-73441, Isfahan, Iran
- Hamid R. Marateb
- Biomedical Engineering Department, Engineering Faculty, University of Isfahan, Hezar Jerib St., 81746-73441, Isfahan, Iran
- Department of Automatic Control, Biomedical Engineering Research Center, Universitat Politècnica de Catalunya, BarcelonaTech (UPC), C. Pau Gargallo, 5, 08028 Barcelona, Spain
- Marjan Mansourian
- Department of Biostatistics and Epidemiology, School of Public Health, Isfahan University of Medical Sciences, Hezar Jerib St., 81745 Isfahan, Iran
- Corresponding author.
- Miguel Angel Mañanas
- Department of Automatic Control, Biomedical Engineering Research Center, Universitat Politècnica de Catalunya, BarcelonaTech (UPC), C. Pau Gargallo, 5, 08028 Barcelona, Spain
- Fariborz Mokarian
- Cancer Prevention Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
- Department of Internal Medicine, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
|
21
|
Rule extraction using Recursive-Rule extraction algorithm with J48graft combined with sampling selection techniques for the diagnosis of type 2 diabetes mellitus in the Pima Indian dataset. INFORMATICS IN MEDICINE UNLOCKED 2016. [DOI: 10.1016/j.imu.2016.02.001] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Indexed: 12/20/2022] Open
|