1
Chang CY, Peng CH, Chen FY, Huang LY, Kuo CH, Chu TW, Liang YJ. The risk factors determined by four machine learning methods for the change of difference of bone mineral density in post-menopausal women after three years follow-up. Sci Rep 2024; 14:23234. [PMID: 39369003 PMCID: PMC11455928 DOI: 10.1038/s41598-024-73799-6] [Received: 09/01/2023] [Accepted: 09/20/2024] [Indexed: 10/07/2024]
Abstract
The prevalence of osteoporosis has increased drastically in recent years. It is not only the most common bone disease but also a major global public health problem because of its high morbidity. Many risk factors associated with osteoporosis have been identified. However, most studies have used traditional multiple linear regression (MLR) to explore these relationships. Recently, machine learning (Mach-L) has emerged as a new modality for data analysis because it enables machines to learn from past data or experience without being explicitly programmed and can better capture nonlinear relationships. These methods have the potential to outperform conventional MLR in disease prediction. In the present study, we enrolled a Chinese postmenopausal cohort followed up for 4 years. The difference in T-score (δ-T score) was the dependent variable, and demographic, biochemical, and lifestyle information served as the independent variables. Our goals were: (1) to compare the prediction accuracy of Mach-L and traditional MLR for the δ-T score; and (2) to rank the importance of the risk factors (independent variables) for predicting the δ-T score. In total, 1698 postmenopausal women were enrolled from the MJ Health Database. Four different Mach-L methods, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and stochastic gradient boosting (SGB), were used to construct predictive models for δ-BMD after four years of follow-up. The dataset was randomly divided into an 80% training dataset for model building and a 20% testing dataset for model testing. Ten-fold cross-validation was used for hyperparameter tuning. For each Mach-L method, the model with the lowest root mean square error on the validation dataset was taken as the best model. The averaged metrics of the RF, SGB, NB, and XGBoost models were compared with those of the benchmark MLR model, which used the same training and testing datasets as the Mach-L methods. 
We ranked the risk factors in each model by priority, with 1 denoting the most critical risk factor and 22 the last selected. In the Pearson correlation analysis, age, education, BMI, HDL-C, and TSH were positively correlated, and plasma calcium level and baseline T-score were negatively correlated, with the δ-T score. All four Mach-L methods yielded lower prediction errors than MLR and were all convincing Mach-L models. Our results indicate that education level was the most important factor for the δ-T score, followed by DBP, smoking, SBP, UA, age, and LDL-C. All four Mach-L methods outperformed traditional MLR. Using Mach-L, the six most important risk factors selected were, from most to least important: DBP, SBP, UA, education level, TG, and sleeping hours. The δ-T score was positively related to SBP, education level, UA, and TG and negatively related to DBP and sleeping hours in postmenopausal Chinese women.
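The rank-based importance described in this abstract (1 = most critical factor, 22 = last selected) can be combined across the four models by averaging each factor's position. The following is a minimal Python sketch under that assumption; the function name `average_ranks` and the toy factors "A", "B", "C" are hypothetical illustrations, not the authors' code or results, and the study may have aggregated the rankings differently.

```python
def average_ranks(rankings):
    """Average each factor's position (1 = most important) across models.

    `rankings` maps a model name to its factors ordered from most to least
    important. Returns (factor, mean_rank) pairs, best first.
    """
    positions = {}
    for ordered in rankings.values():
        for rank, factor in enumerate(ordered, start=1):
            positions.setdefault(factor, []).append(rank)
    n_models = len(rankings)
    for factor, ranks in positions.items():
        if len(ranks) != n_models:
            raise ValueError(f"{factor!r} is missing from some model rankings")
    averaged = [(factor, sum(ranks) / len(ranks)) for factor, ranks in positions.items()]
    # Sort by mean rank; ties are broken alphabetically only for determinism.
    return sorted(averaged, key=lambda item: (item[1], item[0]))


# Hypothetical rankings from four models over three placeholder factors.
toy_rankings = {
    "RF": ["A", "B", "C"],
    "SGB": ["B", "A", "C"],
    "NB": ["A", "C", "B"],
    "XGBoost": ["A", "B", "C"],
}
print(average_ranks(toy_rankings))  # [('A', 1.25), ('B', 2.0), ('C', 2.75)]
```

Averaging positions treats the gap between ranks 1 and 2 the same as between ranks 21 and 22; averaging importance percentages instead would weight factors by how much each model relies on them.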
Affiliation(s)
- Ching-Yao Chang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan, ROC
- Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei City, Taiwan, ROC
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Chief Executive Officer's Office, MJ Health Research Foundation, Taipei, 114, Taiwan
- Yao-Jen Liang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Department and Institute of Life Science, Fu Jen Catholic University, New Taipei City, Taiwan, ROC

2
Madakkatel I, Lumsden AL, Mulugeta A, Mäenpää J, Oehler MK, Hyppönen E. Large-scale analysis to identify risk factors for ovarian cancer. Int J Gynecol Cancer 2024:ijgc-2024-005424. [PMID: 39084694 DOI: 10.1136/ijgc-2024-005424] [Indexed: 08/02/2024]
Abstract
OBJECTIVE Ovarian cancer is characterized by late-stage diagnoses and poor prognosis. We aimed to identify factors that can inform prevention and early detection of ovarian cancer. METHODS We used a data-driven machine learning approach to identify predictors of epithelial ovarian cancer from 2920 input features measured 12.6 years (IQR 11.9 to 13.3 years) before diagnoses. Analyses included 221 732 female participants in the UK Biobank without a history of cancer. During the follow-up 1441 women developed ovarian cancer. For factors that contributed to model prediction, we used multivariate logistic regression to evaluate the association with ovarian cancer, with evidence for causality tested by Mendelian randomization (MR) analyses in the Ovarian Cancer Genetics Consortium (25 509 cases). RESULTS Greater parity and ever-use of oral contraception were associated with lower ovarian cancer risk (ever vs never OR 0.74, 95% CI 0.66 to 0.84). After adjustment for established risk factors, greater height, weight, and greater red blood cell distribution width were associated with increased ovarian cancer risk, while higher aspartate aminotransferase levels and mean corpuscular volume were associated with lower risk. MR analyses confirmed observational associations with anthropometric/adiposity traits (eg, body fat percentage per standard deviation (SD); OR inverse-variance weighted (ORIVW) 1.28, 95% CI 1.13 to 1.46) and aspartate aminotransferase (ORIVW 0.87, 95% CI 0.78 to 0.98). MR also provided genetic evidence for a protective association of higher total serum protein on ovarian cancer, higher lymphocyte count on serous and endometrioid ovarian cancer, and greater forced expiratory volume in 1 s on serous ovarian cancer among other findings. CONCLUSIONS This study shows that certain risk factors for ovarian cancer are modifiable, suggesting that weight reduction and interventions to reduce the number of ovulations may provide potential for future prevention. 
We also identified blood biomarkers associated with ovarian cancer years before diagnoses, warranting further investigation.
Affiliation(s)
- Iqbal Madakkatel
- Australian Centre for Precision Health, Unit of Clinical and Health Sciences, University of South Australia, Adelaide, South Australia, Australia
- South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
- Amanda L Lumsden
- Australian Centre for Precision Health, Unit of Clinical and Health Sciences, University of South Australia, Adelaide, South Australia, Australia
- South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
- Anwar Mulugeta
- Australian Centre for Precision Health, Unit of Clinical and Health Sciences, University of South Australia, Adelaide, South Australia, Australia
- South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
- Department of Pharmacology and Clinical Pharmacy, College of Health Science, Addis Ababa University, Addis Ababa, Ethiopia
- Johanna Mäenpää
- Faculty of Medicine and Medical Technology, Tampere University, Tampere, Finland
- Martin K Oehler
- Department of Gynaecological Oncology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
- Adelaide Medical School, Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
- Elina Hyppönen
- Australian Centre for Precision Health, Unit of Clinical and Health Sciences, University of South Australia, Adelaide, South Australia, Australia
- South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia

3
Wang CK, Chang CY, Chu TW, Liang YJ. Using Machine Learning to Identify the Relationships between Demographic, Biochemical, and Lifestyle Parameters and Plasma Vitamin D Concentration in Healthy Premenopausal Chinese Women. Life (Basel) 2023; 13:2257. [PMID: 38137858 PMCID: PMC10744461 DOI: 10.3390/life13122257] [Received: 10/23/2023] [Revised: 11/15/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION Vitamin D plays a vital role in maintaining homeostasis and enhancing the absorption of calcium, an essential component for strengthening bones and preventing osteoporosis. Many factors are known to be related to plasma vitamin D concentration (PVDC). However, most studies of these factors used traditional statistical methods. Machine learning methods (Mach-L) have now become new tools in medical research. In the present study, we used four Mach-L methods to explore the relationships between PVDC and demographic, biochemical, and lifestyle factors in a group of healthy premenopausal Chinese women. Our goals were: (1) to evaluate and compare the predictive accuracy of Mach-L and multiple linear regression (MLR); and (2) to establish a hierarchy of the importance of these factors in relation to PVDC. METHODS Five hundred ninety-three healthy Chinese women were enrolled. In total, 35 variables were recorded, including demographic, biochemical, and lifestyle information. The dependent variable was 25-OH vitamin D (PVDC), and all other variables were independent variables. MLR was used as the benchmark for comparison. Four Mach-L methods were applied: random forest (RF), stochastic gradient boosting (SGB), extreme gradient boosting (XGBoost), and elastic net. Each method produced several estimation errors; the smaller the errors, the better the model. RESULTS In Pearson's correlation analysis, age, glycated hemoglobin, HDL-cholesterol, LDL-cholesterol, and hemoglobin were positively correlated with PVDC, whereas eGFR was negatively correlated with PVDC. The Mach-L methods yielded smaller estimation errors for all five parameters, indicating that they performed better than the MLR model. A ranking of importance was obtained by averaging the importance percentages from the four Mach-L methods. Age was the most important factor, followed by plasma insulin level, TSH, spouse status, LDH, and ALP. 
CONCLUSIONS In a healthy Chinese premenopausal cohort using four different Mach-L methods, age was found to be the most important factor related to PVDC, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
Affiliation(s)
- Chun-Kai Wang
- Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan
- Ching-Yao Chang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Chief Executive Officer’s Office, MJ Health Research Foundation, Taipei 114, Taiwan
- Yao-Jen Liang
- Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan

4
Yang CC, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Hsia TL, Lin CY. Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes. World J Clin Cases 2023; 11:7951-7964. [DOI: 10.12998/wjcc.v11.i33.7951] [Received: 08/14/2023] [Revised: 10/23/2023] [Accepted: 11/13/2023] [Indexed: 11/24/2023]
Abstract
BACKGROUND The prevalence of type 2 diabetes (T2D) has been increasing dramatically in recent decades, and 47.5% of T2D patients will die of cardiovascular disease. Thallium-201 myocardial perfusion scan (MPS) is a precise and non-invasive method to detect coronary artery disease (CAD). Most previous studies used traditional logistic regression (LGR) to evaluate the risks for abnormal CAD. Rapidly developing machine learning (Mach-L) techniques could potentially outperform LGR in capturing non-linear relationships.
AIM The aims were: (1) to compare the accuracy of Mach-L methods and LGR; and (2) to identify the most important factors for abnormal TMPS.
METHODS A total of 556 patients with T2D were enrolled in the study (287 men and 269 women). Demographic and biochemistry data were used as independent variables, and the summed stress score derived from the MPS scan was the dependent variable. Subjects with an MPS score ≥ 9 were defined as abnormal. In addition to traditional LGR, classification and regression tree (CART), random forest, Naïve Bayes, and eXtreme gradient boosting were applied. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve were used to evaluate the accuracy of LGR and the Mach-L methods.
RESULTS Except for CART, the other Mach-L methods outperformed LGR, with gender, body mass index, age, low-density lipoprotein cholesterol, glycated hemoglobin and smoking emerging as the most important factors to predict abnormal MPS.
CONCLUSION Four Mach-L methods are found to outperform LGR in predicting abnormal TMPS in Chinese T2D, with the most important risk factors being gender, body mass index, age, low-density lipoprotein cholesterol, glycated hemoglobin and smoking.
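The outcome definition in this abstract (MPS score ≥ 9 = abnormal) and its use of the area under the ROC curve can be illustrated with a short Python sketch. It assumes each model outputs a continuous risk score per patient; the helper names and the four toy patients are hypothetical. The AUC here uses the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve, but the authors may well have used a library routine instead.

```python
def label_abnormal(summed_stress_scores, threshold=9):
    """Binarize scores: 1 = abnormal (score >= threshold), 0 = normal."""
    return [1 if score >= threshold else 0 for score in summed_stress_scores]


def roc_auc(labels, predicted_scores):
    """AUC as the probability that a random abnormal case is scored above a
    random normal case, with ties counted as one half. O(n_pos * n_neg)."""
    positives = [s for y, s in zip(labels, predicted_scores) if y == 1]
    negatives = [s for y, s in zip(labels, predicted_scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one abnormal and one normal case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scan scores and model risk scores for four patients.
labels = label_abnormal([3, 9, 12, 5])       # [0, 1, 1, 0]
auc = roc_auc(labels, [0.2, 0.8, 0.6, 0.7])  # 0.75
print(labels, auc)
```

The pairwise loop is chosen for readability; for cohorts of several hundred patients it is still fast, and a sort-based rank computation gives the same value more efficiently.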
Affiliation(s)
- Chung-Chi Yang
- Division of Cardiovascular Medicine, Taoyuan Armed Forces General Hospital, Taoyuan City 32551, Taiwan
- Division of Cardiovascular, Tri-service General Hospital, Taipei City 114202, Taiwan
- Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, New Taipei City 23148, Taiwan
- School of Medicine, Fu-Jen Catholic University, New Taipei City 242062, Taiwan
- Li-Ying Huang
- Department of Internal Medicine, Department of Medical Education, School of Medicine, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 243, Taiwan
- Fang Yu Chen
- Department of Endocrinology, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
- Chun-Heng Kuo
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 243, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
- Chung-Ze Wu
- Division of Endocrinology, Shuang Ho Hospital, New Taipei City 23561, Taiwan
- School of Medicine, Taipei Medical University, Taipei City 11031, Taiwan
- Te-Lin Hsia
- Department of Internal Medicine, Cardinal Tien Hospital, New Taipei City 23148, Taiwan
- Chung-Yu Lin
- Department of Cardiology, Fu Jen Catholic University Hospital, New Taipei City 24352, Taiwan
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan

5
Tzou SJ, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Chu TW. Comparison between linear regression and four different machine learning methods in selecting risk factors for osteoporosis in a Chinese female aged cohort. J Chin Med Assoc 2023; 86:1028-1036. [PMID: 37729604 DOI: 10.1097/jcma.0000000000000999] [Indexed: 09/22/2023]
Abstract
BACKGROUND Population aging is emerging as an increasingly acute challenge for countries around the world. One particular manifestation of this phenomenon is the impact of osteoporosis on individuals and national health systems. Previous studies of risk factors for osteoporosis were conducted using traditional statistical methods, but more recent efforts have turned to machine learning approaches. Most such efforts, however, treat the target variable (bone mineral density [BMD] or fracture rate) as a categorical one, which provides no quantitative information. The present study uses five different machine learning methods to analyze the risk factors for T-score of BMD, seeking to (1) compare the prediction accuracy between different machine learning methods and traditional multiple linear regression (MLR) and (2) rank the importance of 25 different risk factors. METHODS The study sample includes 24 412 women older than 55 years with 25 related variables, applying traditional MLR and five different machine learning methods: classification and regression tree, Naïve Bayes, random forest, stochastic gradient boosting, and eXtreme gradient boosting. The metrics used for model performance comparisons are the symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error. RESULTS Machine learning approaches outperformed MLR for all four prediction errors. The average importance ranking of each factor generated by the machine learning methods indicates that age is the most important factor determining T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level. CONCLUSION In a group of women older than 55 years, we demonstrated that machine learning methods provide superior performance in estimating T-Score, with age being the most important impact factor, followed by eGFR, BMI, UA, and education level.
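The four error metrics named in this abstract can be written out as a short Python sketch. The formulas below are common textbook definitions and are an assumption: SMAPE in particular has several variants, and the paper may define it differently (for example, without the factor of 2 in the denominator or without percent scaling). The function name `regression_errors` and the toy values are hypothetical.

```python
import math


def regression_errors(y_true, y_pred):
    """Return SMAPE (%), RAE, RRSE and RMSE for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Baselines for the relative metrics: always predicting the observed mean.
    abs_dev = sum(abs(t - mean_true) for t in y_true)
    sq_dev = sum((t - mean_true) ** 2 for t in y_true)
    if sq_dev == 0:
        raise ValueError("RAE and RRSE are undefined when all observed values are equal")
    # Symmetric percentage term; a pair where both values are 0 contributes no error.
    smape_terms = [
        0.0 if t == p == 0 else e / ((abs(t) + abs(p)) / 2)
        for t, p, e in zip(y_true, y_pred, abs_err)
    ]
    return {
        "SMAPE": 100 * sum(smape_terms) / n,
        "RAE": sum(abs_err) / abs_dev,
        "RRSE": math.sqrt(sum(sq_err) / sq_dev),
        "RMSE": math.sqrt(sum(sq_err) / n),
    }


# Hypothetical observed vs predicted T-scores (not study data).
print(regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
# SMAPE ~ 9.52, RAE 0.5, RRSE ~ 0.707, RMSE ~ 0.577
```

RAE and RRSE are scaled against a model that always predicts the observed mean, so values below 1 mean the model beats that constant baseline; RMSE stays in the units of the T-score itself.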
Affiliation(s)
- Shiow-Jyu Tzou
- Teaching and Researching Center, Kaohsiung Armed Forces General Hospital, Kaohsiung, Taiwan, ROC
- Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan, ROC
- Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Chung-Ze Wu
- Department of Internal Medicine, Shuang Ho Hospital, New Taipei City, Division of Endocrinology and Metabolism, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan, ROC
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, ROC
- MJ Health Research Foundation, Taipei, Taiwan, ROC

6
Jiang Y, Wang C, Zhou S. Artificial intelligence-based risk stratification, accurate diagnosis and treatment prediction in gynecologic oncology. Semin Cancer Biol 2023; 96:82-99. [PMID: 37783319 DOI: 10.1016/j.semcancer.2023.09.005] [Received: 12/17/2022] [Revised: 08/27/2023] [Accepted: 09/25/2023] [Indexed: 10/04/2023]
Abstract
As a data-driven science, artificial intelligence (AI) has paved a promising path toward an evolving health system teeming with opportunities for precision oncology. Notwithstanding the tremendous success of oncological AI in fields such as lung carcinoma, breast tumors, and brain malignancies, less attention has been devoted to investigating the influence of AI on gynecologic oncology. This review sheds light on the ever-increasing contribution of state-of-the-art AI techniques to refined risk stratification and whole-course management of patients with gynecologic tumors, in particular cervical, ovarian, and endometrial cancer. It centers on information and features extracted from clinical data (electronic health records), cancer imaging (including radiological imaging, colposcopic images, and cytological and histopathological digital images), and molecular profiling (genomics, transcriptomics, metabolomics, and so forth). However, noteworthy challenges remain beyond performance validation. This work therefore also describes the limitations and challenges faced in the real-world implementation of AI models, as well as potential solutions to these issues.
Affiliation(s)
- Yuting Jiang
- Department of Obstetrics and Gynecology, Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE and State Key Laboratory of Biotherapy, West China Second Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan 610041, China; Department of Pulmonary and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Chengdi Wang
- Department of Obstetrics and Gynecology, Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE and State Key Laboratory of Biotherapy, West China Second Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan 610041, China; Department of Pulmonary and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Shengtao Zhou
- Department of Obstetrics and Gynecology, Key Laboratory of Birth Defects and Related Diseases of Women and Children of MOE and State Key Laboratory of Biotherapy, West China Second Hospital, Sichuan University and Collaborative Innovation Center, Chengdu, Sichuan 610041, China; Department of Pulmonary and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China

7
Chen CH, Wang CK, Wang CY, Chang CF, Chu TW. Roles of biochemistry data, lifestyle, and inflammation in identifying abnormal renal function in old Chinese. World J Clin Cases 2023; 11:7004-7016. [PMID: 37946770 PMCID: PMC10631406 DOI: 10.12998/wjcc.v11.i29.7004] [Received: 07/05/2023] [Revised: 08/01/2023] [Accepted: 09/11/2023] [Indexed: 10/13/2023]
Abstract
BACKGROUND The incidence of chronic kidney disease (CKD) has increased dramatically in recent years, with significant impacts on patient mortality. Previous studies have identified multiple risk factors for CKD, but they mostly relied on traditional statistical methods such as logistic regression and focused on only a few risk factors. AIM To determine factors that can be used to identify subjects with a low estimated glomerular filtration rate (L-eGFR < 60 mL/min per 1.73 m2) in a cohort of 1236 Chinese people aged over 65. METHODS Twenty risk factors were divided into three models. Model 1 consisted of demographic and biochemistry data. Model 2 added lifestyle data to Model 1, and Model 3 added inflammatory markers to Model 2. Five machine learning methods were used: multivariate adaptive regression splines, eXtreme Gradient Boosting, stochastic gradient boosting, Light Gradient Boosting Machine, and Categorical Features + Gradient Boosting. Evaluation criteria included accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), F1 score, and balanced accuracy. RESULTS An increasing trend in AUC for each method was observed from Model 1 to Model 3 and reached statistical significance. Model 3 selected uric acid (UA) as the most important risk factor, followed by age, hemoglobin (Hb), body mass index (BMI), sport hours, and systolic blood pressure (SBP). CONCLUSION Among all the risk factors, including demographic, biochemistry, and lifestyle factors along with inflammation markers, UA is the most important risk factor for identifying L-eGFR, followed by age, Hb, BMI, sport hours, and SBP in a cohort of elderly Chinese people.
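Apart from AUC, the evaluation criteria listed in this abstract all derive from the 2x2 confusion matrix. A minimal Python sketch, assuming binary labels where 1 marks L-eGFR (eGFR < 60 mL/min per 1.73 m2) and 0 marks normal function; the function names, eGFR values, and predictions are hypothetical, and returning 0.0 for an empty denominator is an arbitrary convention chosen for the sketch.

```python
def _safe_div(num, den):
    # Convention for this sketch: a ratio with an empty denominator is 0.0.
    return num / den if den else 0.0


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, F1 and balanced accuracy (1 = L-eGFR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _safe_div(tp, tp + fn)  # recall for the L-eGFR class
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "F1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


egfr = [45.0, 52.0, 58.0, 75.0, 90.0, 66.0, 81.0, 70.0]  # hypothetical values
y_true = [1 if value < 60 else 0 for value in egfr]      # L-eGFR cut-off from the abstract
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]                         # hypothetical model output
print(confusion_metrics(y_true, y_pred))
```

When L-eGFR cases are a minority, plain accuracy can look high even for a model that rarely flags them; balanced accuracy averages sensitivity and specificity so both classes count equally.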
Affiliation(s)
- Chao-Hung Chen
- Division of Urology, Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan
- Division of Urology, Department of Surgery, Chang Gung Memorial Hospital, Keelung 204, Taiwan
- Chun-Kai Wang
- Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan
- Chen-Yu Wang
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chun-Feng Chang
- Division of Urology, Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan
- Division of Urology, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chief Executive Officer's Office, MJ Health Research Foundation, Taipei 114, Taiwan

8
Tsai MH, Jhou MJ, Liu TC, Fang YW, Lu CJ. An integrated machine learning predictive scheme for longitudinal laboratory data to evaluate the factors determining renal function changes in patients with different chronic kidney disease stages. Front Med (Lausanne) 2023; 10:1155426. [PMID: 37859858 PMCID: PMC10582636 DOI: 10.3389/fmed.2023.1155426] [Received: 02/03/2023] [Accepted: 09/19/2023] [Indexed: 10/21/2023]
Abstract
Background and objectives Chronic kidney disease (CKD) is a global health concern. This study aims to identify key factors associated with renal function changes using the proposed machine learning and important variable selection (ML&IVS) scheme on longitudinal laboratory data. The goal is to predict changes in the estimated glomerular filtration rate (eGFR) in a cohort of patients with CKD stages 3-5. Design A retrospective cohort study. Setting and participants A total of 710 outpatients who presented with stable nondialysis-dependent CKD stages 3-5 at the Shin-Kong Wu Ho-Su Memorial Hospital Medical Center from 2016 to 2021. Methods This study analyzed trimonthly laboratory data including 47 indicators. The proposed scheme used stochastic gradient boosting, multivariate adaptive regression splines, random forest, eXtreme gradient boosting, and light gradient boosting machine algorithms to evaluate the important factors for predicting the results of the fourth eGFR examination, especially in patients with CKD stage 3 and those with CKD stages 4-5, with or without diabetes mellitus (DM). Main outcome measurement Subsequent eGFR level after three consecutive laboratory data assessments. Results Our ML&IVS scheme demonstrated superior predictive capabilities and identified significant factors contributing to renal function changes in various CKD groups. The latest levels of eGFR, blood urea nitrogen (BUN), proteinuria, sodium, and systolic blood pressure as well as mean levels of eGFR, BUN, proteinuria, and triglyceride were the top 10 significantly important factors for predicting the subsequent eGFR level in patients with CKD stages 3-5. In individuals with DM, the latest levels of BUN and proteinuria, mean levels of phosphate and proteinuria, and variations in diastolic blood pressure levels emerged as important factors for predicting the decline of renal function. 
In individuals without DM, all phosphate patterns and latest albumin levels were found to be key factors in the advanced CKD group. Moreover, proteinuria was identified as an important factor in the CKD stage 3 group without DM and CKD stages 4-5 group with DM. Conclusion The proposed scheme highlighted factors associated with renal function changes in different CKD conditions, offering valuable insights to physicians for raising awareness about renal function changes.
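This abstract distinguishes the latest level, the mean level, and the variation of each indicator across three consecutive trimonthly assessments used to predict the fourth eGFR. A minimal Python sketch of how such summary features might be derived, assuming "variation" means the sample standard deviation (the abstract does not define it) and that visits are ordered oldest first; `summarize_visits`, `build_features`, and the sample values are hypothetical and are not the ML&IVS scheme itself.

```python
from statistics import mean, stdev


def summarize_visits(values):
    """Latest, mean and variation (assumed: sample SD) of one indicator, oldest visit first."""
    if len(values) < 2:
        raise ValueError("at least two visits are needed to estimate variation")
    return {"latest": values[-1], "mean": mean(values), "variation": stdev(values)}


def build_features(history):
    """Flatten {indicator: [visit values]} into {"<indicator>_<summary>": value}."""
    features = {}
    for indicator, values in history.items():
        for summary, value in summarize_visits(values).items():
            features[f"{indicator}_{summary}"] = value
    return features


# Hypothetical three-visit history for one patient; the target would be the fourth eGFR.
history = {"eGFR": [40.0, 38.0, 36.0], "BUN": [30.0, 32.0, 34.0]}
print(build_features(history))
```

Keeping the latest value and the mean as separate features lets a model weigh the most recent state against the longer-term level, which mirrors how the abstract reports "latest" and "mean" factors separately.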
Affiliation(s)
- Ming-Hsien Tsai
- Division of Nephrology, Department of Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
- Department of Medicine, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Yu-Wei Fang
- Division of Nephrology, Department of Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
- Department of Medicine, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan

9
Wu CZ, Huang LY, Chen FY, Kuo CH, Yeih DF. Using Machine Learning to Predict Abnormal Carotid Intima-Media Thickness in Type 2 Diabetes. Diagnostics (Basel) 2023; 13:diagnostics13111834. [PMID: 37296685 DOI: 10.3390/diagnostics13111834] [Received: 04/02/2023] [Revised: 05/16/2023] [Accepted: 05/20/2023] [Indexed: 06/12/2023]
Abstract
Carotid intima-media thickness (c-IMT) is a reliable risk factor for cardiovascular disease in patients with type 2 diabetes (T2D). The present study aimed to compare the effectiveness of different machine learning methods and traditional multiple logistic regression in predicting c-IMT using baseline features, and to identify the most significant risk factors in a T2D cohort. We followed 924 patients with T2D for four years, using 75% of the participants for model development. Machine learning methods, including classification and regression tree, random forest, eXtreme gradient boosting, and the Naïve Bayes classifier, were used to predict c-IMT. All machine learning methods except classification and regression tree were not inferior to multiple logistic regression in predicting c-IMT, as judged by a higher area under the receiver operating characteristic curve. The most significant risk factors for c-IMT were, in order, age, sex, creatinine, body mass index, diastolic blood pressure, and duration of diabetes. In conclusion, machine learning methods could improve the prediction of c-IMT in T2D patients compared with conventional logistic regression models, which could have crucial implications for the early identification and management of cardiovascular disease in T2D patients.
Affiliation(s)
- Chung-Ze Wu
- Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei City 11031, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Fang-Yu Chen
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Chun-Heng Kuo
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Dong-Feng Yeih
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Division of Cardiology, Department of Internal Medicine, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
10
Huang HH, Hsieh SJ, Chen MS, Jhou MJ, Liu TC, Shen HL, Yang CT, Hung CC, Yu YY, Lu CJ. Machine Learning Predictive Models for Evaluating Risk Factors Affecting Sperm Count: Predictions Based on Health Screening Indicators. J Clin Med 2023; 12:1220. [PMID: 36769868 PMCID: PMC9917545 DOI: 10.3390/jcm12031220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 01/13/2023] [Accepted: 02/01/2023] [Indexed: 02/05/2023]
Abstract
In many countries, especially developed nations, the fertility rate and birth rate have continually declined. Taiwan's fertility rate has paralleled this trend and reached its nadir in 2022. The government has therefore adopted many strategies to encourage more married couples to have children. However, couples who marry at an older age may have declining physical health, hypertension and other components of metabolic syndrome, and excess body weight, all of which have been studied for their influence on male and female gamete quality. Many previous studies were based on infertile individuals and are therefore not truly representative of the general population. This study proposed a framework using five machine learning (ML) predictive algorithms (random forest, stochastic gradient boosting, least absolute shrinkage and selection operator regression, ridge regression, and extreme gradient boosting) to identify the major risk factors affecting male sperm count, based on a major health screening database in Taiwan. Unlike traditional multiple linear regression, ML algorithms do not depend on its statistical assumptions and can capture non-linear relationships or complex interactions between dependent and independent variables, which can yield promising performance. We analyzed annual health screening data of 1375 males from 2010 to 2017, including data on health screening indicators, sourced from the MJ Group, a major health screening center in Taiwan. The symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error were used as performance evaluation metrics. Our results show that sleep time (ST), alpha-fetoprotein (AFP), body fat (BF), systolic blood pressure (SBP), and blood urea nitrogen (BUN) are the top five risk factors associated with sperm count. ST is a known risk factor influencing reproductive hormone balance, which can affect spermatogenesis and final sperm count.
BF and SBP are risk factors associated with metabolic syndrome, another known risk factor for altered male reproductive hormone systems. However, AFP has not been the focus of previous studies on male fertility or semen quality. BUN, an index of kidney function, was also identified as a risk factor by our ML model. Our results support previous findings that metabolic syndrome has negative impacts on sperm count and semen quality. Sleep duration also affects sperm generation in the testes. AFP and BUN are two novel risk factors linked to sperm count. These findings could help healthcare personnel and lawmakers devise strategies that create environments to increase the country's fertility rate. This study should also be of value to follow-up research.
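The abstract names four error metrics (SMAPE, RAE, RRSE, RMSE) without defining them. The sketch below uses common textbook definitions and is an assumption, not the study's implementation: the helper `regression_errors` is hypothetical, and SMAPE in particular has several published variants, so the denominator used here, (|y| + |ŷ|) / 2, may differ from the one the authors chose. RAE and RRSE express the model's error relative to a naive model that always predicts the mean, so values below 1 mean the model beats that baseline.

```python
import math


def regression_errors(actual, predicted):
    """Return SMAPE (%), RAE, RRSE and RMSE for paired observations.

    Hypothetical helper for illustration. SMAPE follows one common variant,
    mean(|y - y_hat| / ((|y| + |y_hat|) / 2)) * 100; other papers use
    different denominators. RAE and RRSE are undefined (division by zero)
    when every actual value is identical.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [y - p for y, p in zip(actual, predicted)]

    # SMAPE: a pair in which both values are zero contributes no error.
    smape_total = 0.0
    for y, p in zip(actual, predicted):
        denom = (abs(y) + abs(p)) / 2
        if denom:
            smape_total += abs(y - p) / denom
    smape = 100 * smape_total / n

    # RAE and RRSE scale the model's errors by those of a naive model
    # that always predicts the mean of the actual values.
    sum_abs_err = sum(abs(e) for e in errors)
    sum_sq_err = sum(e * e for e in errors)
    rae = sum_abs_err / sum(abs(y - mean_actual) for y in actual)
    rrse = math.sqrt(sum_sq_err / sum((y - mean_actual) ** 2 for y in actual))

    rmse = math.sqrt(sum_sq_err / n)
    return {"SMAPE": smape, "RAE": rae, "RRSE": rrse, "RMSE": rmse}


# Example with made-up values (not data from the study).
print(regression_errors([2, 4, 6], [3, 4, 5]))
```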
Affiliation(s)
- Hung-Hsiang Huang
- Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei City 220, Taiwan
- Shang-Ju Hsieh
- Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei City 220, Taiwan
- Ming-Shu Chen
- Department of Healthcare Administration, College of Healthcare & Management, Asia Eastern University of Science and Technology, New Taipei City 220, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Hsiang-Li Shen
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Chih-Te Yang
- Department of Business Administration, Tamkang University, New Taipei City 251, Taiwan
- Chung-Chih Hung
- Department of Laboratory Medicine, Taipei Hospital, Ministry of Health and Welfare, New Taipei City 242, Taiwan
- Ya-Yen Yu
- Department of Medical Laboratory, Chang-Hua Hospital, Ministry of Health and Welfare, Chang Hua County 513, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
11
Fiste O, Liontos M, Zagouri F, Stamatakos G, Dimopoulos MA. Machine learning applications in gynecological cancer: A critical review. Crit Rev Oncol Hematol 2022; 179:103808. [PMID: 36087852 DOI: 10.1016/j.critrevonc.2022.103808] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/02/2022] [Revised: 08/18/2022] [Accepted: 09/05/2022] [Indexed: 11/30/2022]
Abstract
Machine learning (ML) is a branch of computer science that generates predictive models through exposure to raw training data, without being rigidly programmed. Over the last few years, ML has gained attention in oncology, making considerable strides across the diagnostic, predictive, and prognostic spectrum of malignancies and acting as a catalyst for cancer research. In this review, we discuss the state of ML applications in gynecologic oncology and systematically address major technical and ethical concerns regarding their translation into real-world medical practice. Undoubtedly, advances in ML will enable the analysis of large, complex datasets for improved, cost-effective, and efficient clinical decisions.
Affiliation(s)
- Oraianthi Fiste
- Department of Clinical Therapeutics, School of Medicine, National and Kapodistrian University of Athens, Alexandra Hospital, 80 Vasilissis Sophias, 11528 Athens, Greece
- Michalis Liontos
- Department of Clinical Therapeutics, School of Medicine, National and Kapodistrian University of Athens, Alexandra Hospital, 80 Vasilissis Sophias, 11528 Athens, Greece
- Flora Zagouri
- Department of Clinical Therapeutics, School of Medicine, National and Kapodistrian University of Athens, Alexandra Hospital, 80 Vasilissis Sophias, 11528 Athens, Greece
- Georgios Stamatakos
- In Silico Oncology and In Silico Medicine Group, Institute of Communication and Computer Systems, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Meletios Athanasios Dimopoulos
- Department of Clinical Therapeutics, School of Medicine, National and Kapodistrian University of Athens, Alexandra Hospital, 80 Vasilissis Sophias, 11528 Athens, Greece
12
Ahmed MIB, Alotaibi S, Atta-ur-Rahman, Dash S, Nabil M, AlTurki AO. A Review on Machine Learning Approaches in Identification of Pediatric Epilepsy. SN COMPUTER SCIENCE 2022; 3:437. [PMID: 35965953 PMCID: PMC9364307 DOI: 10.1007/s42979-022-01358-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Accepted: 06/26/2022] [Indexed: 10/26/2022]
Abstract
Epilepsy is the second most common neurological disease after Alzheimer's disease. It is a brain disorder that results in recurrent seizures. Although epilepsy in general is considered a serious disorder, its effects in children are particularly dangerous, mainly because it causes a slower rate of development and a failure to develop certain skills in affected children. Seizures are the most common symptom of epilepsy. As a routine medical procedure, specialists record brain activity using an electroencephalogram (EEG) to observe epileptic seizures. These seizures are detected by specialists, but the results might not be accurate and depend on the specialist's experience; therefore, automated detection of pediatric epileptic seizures might be an optimal solution. In this regard, several techniques have been investigated in the literature. This research aims to review approaches to identifying pediatric epileptic seizures, especially those based on machine learning, as well as the techniques applied to the CHB-MIT scalp EEG database of pediatric epileptic signals.
Affiliation(s)
- Mohammed Imran Basheer Ahmed
- Department of Computer Engineering, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam, 31441 Saudi Arabia
- Shamsah Alotaibi
- Department of Computer Science, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam, 31441 Saudi Arabia
- Atta-ur-Rahman
- Department of Computer Science, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam, 31441 Saudi Arabia
- Sujata Dash
- Department of Computer Application, Maharaja Srirama Chandra Bhanj Deo University, Baripada, India
- Majed Nabil
- Department of Computer Science, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam, 31441 Saudi Arabia
- Abdullah Omar AlTurki
- Department of Computer Science, College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam, 31441 Saudi Arabia
13
Comparison between Machine Learning and Multiple Linear Regression to Identify Abnormal Thallium Myocardial Perfusion Scan in Chinese Type 2 Diabetes. Diagnostics (Basel) 2022; 12:diagnostics12071619. [PMID: 35885524 PMCID: PMC9324130 DOI: 10.3390/diagnostics12071619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 11/17/2022]
Abstract
Patients with type 2 diabetes mellitus (T2DM) have a high risk of coronary artery disease (CAD). The thallium-201 myocardial perfusion scan (Th-201 scan) is a non-invasive and extensively used tool for recognizing CAD in clinical settings. In this study, we compared the accuracy of traditional multiple linear regression (MLR) and four machine learning (ML) methods in predicting abnormal Th-201 scans. This allowed us to determine whether ML surpasses traditional MLR, to rank the clinical variables, and to compare the ranking with previous reports. In total, 796 patients with T2DM, including 368 men and 528 women, were enrolled. In addition to traditional MLR, classification and regression tree (CART), random forest (RF), stochastic gradient boosting (SGB), and eXtreme gradient boosting (XGBoost) were used to analyze abnormal Th-201 scans. The stress sum score was used as the endpoint (dependent variable). The root mean square errors of all four ML methods were smaller than that of MLR, which implies that ML is more precise than MLR in identifying abnormal Th-201 scans from clinical parameters. The seven most important factors, in descending order, were body mass index, hemoglobin, age, glycated hemoglobin, creatinine, and systolic and diastolic blood pressure. In conclusion, ML is not inferior to traditional MLR in predicting abnormal Th-201 scans, and the most important factors are body mass index, hemoglobin, age, glycated hemoglobin, creatinine, and systolic and diastolic blood pressure. ML methods are superior in these kinds of studies.
14
Huang LY, Chen FY, Jhou MJ, Kuo CH, Wu CZ, Lu CH, Chen YL, Pei D, Cheng YF, Lu CJ. Comparing Multiple Linear Regression and Machine Learning in Predicting Diabetic Urine Albumin-Creatinine Ratio in a 4-Year Follow-Up Study. J Clin Med 2022; 11:3661. [PMID: 35806944 PMCID: PMC9267784 DOI: 10.3390/jcm11133661] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Revised: 06/19/2022] [Accepted: 06/22/2022] [Indexed: 02/07/2023]
Abstract
The urine albumin-creatinine ratio (uACR) is a warning sign of deteriorating renal function in type 2 diabetes (T2D), so its early detection has become an important issue. Multiple linear regression (MLR) has traditionally been used to explore the relationships between risk factors and endpoints. Recently, machine learning (ML) methods have been widely applied in medicine. In the present study, four ML methods were used to predict the uACR in a T2D cohort. We hypothesized that (1) ML outperforms traditional MLR and (2) the methods would yield different rankings of risk-factor importance. A total of 1147 patients with T2D were followed up for four years. MLR, classification and regression tree, random forest, stochastic gradient boosting, and eXtreme gradient boosting methods were used. The prediction errors of the ML methods were smaller than those of MLR, which indicates that ML is more accurate. The first six most important factors were baseline creatinine level, systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose. In conclusion, ML might be more accurate than traditional MLR in predicting uACR in a T2D cohort, and the baseline creatinine level is the most important predictor, followed by systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose in Chinese patients with T2D.
Affiliation(s)
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Chung-Ze Wu
- Division of Endocrinology, Department of Internal Medicine, Shuang Ho Hospital, New Taipei City 23561, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
- Chieh-Hua Lu
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, School of Medicine, National Defense Medical Center, Taipei 11490, Taiwan
- Yen-Lin Chen
- Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei 11490, Taiwan
- Dee Pei
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Yu-Fang Cheng
- Department of Endocrinology and Metabolism, Changhua Christian Hospital, Changhua 50051, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan
15
Famitha S, Moorthi M. Intelligent and novel multi-type cancer prediction model using optimized ensemble learning. Comput Methods Biomech Biomed Engin 2022; 25:1879-1903. [PMID: 35695463 DOI: 10.1080/10255842.2022.2081504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Cancer is a highly severe disease that can become incurable even when treatment starts at the time of diagnosis, owing to the presence of cancer cells. Diverse machine learning approaches have been implemented to predict cancer recurrence, and they need to be evaluated to identify the most appropriate approach for cancer prediction. This paper provides intelligent optimized ensemble learning for predicting multiple types of cancer. First, data on different types of cancer are collected and cleansed. Then, features are extracted using statistical features, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA). From these features, a new Adaptive Condition Searched-Harris hawks Whale Optimization (ACS-HWO) algorithm selects the optimal features, which are transformed into weighted features through a meta-heuristic update. Prediction is carried out by Optimized Ensemble-based Multi-disease Detection (OEMD), which combines a Support Vector Machine (SVM), an autoencoder, AdaBoost, a Deep Neural Network (DNN), and a Recurrent Neural Network (RNN) using a high-ranking strategy. The same ACS-HWO is used to improve both the weighted feature selection and the optimized ensemble learning. A comparative analysis against existing models shows that the proposed method is highly applicable to healthcare systems for consistently predicting multiple types of cancer.
Affiliation(s)
- S Famitha
- Associate Professor, Computer Science and Engineering, Prathyusha Engineering College, Anna University, Tiruvallur, India
- M Moorthi
- Professor & HOD, BME & Medical Electronics, Saveetha Engineering College, Anna University, Chennai, India
16
Nan N. Integration and Development of Enterprise Internal Audit and Big Data Based on Data Mining Technology. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:8138046. [PMID: 35498211 PMCID: PMC9054413 DOI: 10.1155/2022/8138046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 03/28/2022] [Indexed: 11/20/2022]
Abstract
Auditing based on big data is the future direction of audit development. First, the technical environment provides a technical support platform for continuous auditing. As information technology has developed and promoted the integration of financial services, companies' business operations have been digitized, and traditional paper-based auditing is also facing change. This article studies the integration and development of enterprise internal audit and big data based on data mining technology. To this end, it proposes a big data audit system, improves and optimizes the clustering algorithm (the key data mining algorithm), and designs experiments and analyses to explore the resulting effects and performance gains, so that the algorithm better suits the research topic. The experimental results show that the improved big data audit system improves the resource perfection of internal audit by 17.4%. The improved algorithm's accuracy rate increased by 31.4%, and its best clustering ability improved by 20.7%, so it can be applied well to companies' internal audits.
Affiliation(s)
- Nan Nan
- School of Accountancy, Xijing University, Xi'an 710123, Shaanxi, China
17
Kao HY, Chang CC, Chang CF, Chen YC, Cheewakriangkrai C, Tu YL. Associations between Sex and Risk Factors for Predicting Chronic Kidney Disease. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:1219. [PMID: 35162242 PMCID: PMC8835286 DOI: 10.3390/ijerph19031219] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/22/2021] [Revised: 01/12/2022] [Accepted: 01/19/2022] [Indexed: 11/16/2022]
Abstract
Gender is an important risk factor in predicting chronic kidney disease (CKD); however, it is under-researched. The purpose of this study was to examine whether gender differences affect the risk factors for early CKD prediction. This study used data from 19,270 adult health screenings, including 5101 participants with CKD. Eleven independent variables were selected as risk factors and tested for significance with the chi-square test, and seven machine learning techniques were used to train the predictive models. Performance indicators included classification accuracy, sensitivity, specificity, and precision. Class imbalance was addressed using three sampling methods: manual sampling, the synthetic minority oversampling technique, and SpreadSubsample. The chi-square test revealed statistically significant results (p < 0.001) for gender, age, red blood cell count in urine, urine protein (PRO) content, and the PRO-to-urinary creatinine ratio. In terms of classifier performance, logistic regression with manual sampling exhibited the highest average prediction accuracy (0.8053) for men, whereas linear discriminant analysis with manual sampling demonstrated the highest average prediction accuracy (0.8485) for women. The clinical features of a normal or abnormal PRO-to-urinary creatinine ratio indicated that the PRO ratio, age, and urine red blood cell count are the most important risk factors for predicting CKD in both genders. As a result, this study proposes a prediction model with acceptable prediction accuracy. The model supports doctors in diagnosis and treatment and serves the goal of early detection and treatment. Grounded in evidence-based medicine, this study used machine learning methods to develop the predictive model. The model has been shown to support the prediction of early clinical CKD risk, thereby improving the efficacy and quality of clinical decision-making.
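The four performance indicators listed in the abstract all derive from the binary confusion matrix. The sketch below shows the standard definitions, assuming CKD is coded as the positive class 1; the function name `classification_metrics` is hypothetical, and the sketch does not reproduce the study's resampling (manual sampling, SMOTE, SpreadSubsample) or its seven classifiers.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and precision (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # correctly flagged CKD
            else:
                fp += 1  # flagged CKD, actually non-CKD
        else:
            if t == positive:
                fn += 1  # missed CKD
            else:
                tn += 1  # correctly cleared
    total = tp + fp + tn + fn

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
    }


# Made-up labels for illustration, not data from the study.
print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Accuracy alone can look high on imbalanced data (here, far fewer CKD than non-CKD screenings), which is why sensitivity and specificity are reported alongside it.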
Affiliation(s)
- Hao-Yun Kao
- Department of Healthcare Administration and Medical Informatics, College of Health Sciences, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Chi-Chang Chang
- School of Medical Informatics, Chung Shan Medical University & IT Office, Chung Shan Medical University Hospital, Taichung City 40201, Taiwan
- Department of Information Management, Ming Chuan University, Taoyuan City 33300, Taiwan
- Chin-Fang Chang
- Department of Otorhinolaryngology, Head and Neck Surgery, Jen-Ai Hospital, Taichung City 41222, Taiwan
- Cancer Medicine Center, Jen-Ai Hospital, Taichung City 41222, Taiwan
- Basic Medical Education Center, Central Taiwan University of Science and Technology, Taichung City 40601, Taiwan
- Department of Medical Education and Research, Jen-Ai Hospital, Taichung City 41222, Taiwan
- Ying-Chen Chen
- School of Medical Informatics, Chung Shan Medical University & IT Office, Chung Shan Medical University Hospital, Taichung City 40201, Taiwan
- Chalong Cheewakriangkrai
- Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
- Ya-Ling Tu
- Center for General Education, National Taichung University of Science and Technology, Taichung City 40401, Taiwan
18
Sato M, Sato S, Shintani D, Hanaoka M, Ogasawara A, Miwa M, Yabuno A, Kurosaki A, Yoshida H, Fujiwara K, Hasegawa K. Clinical significance of metabolism-related genes and FAK activity in ovarian high-grade serous carcinoma. BMC Cancer 2022; 22:59. [PMID: 35027024 PMCID: PMC8756654 DOI: 10.1186/s12885-021-09148-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/11/2021] [Accepted: 12/22/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Administration of poly (ADP-ribose) polymerase (PARP) inhibitors after achieving a response to platinum-containing drugs significantly prolonged relapse-free survival compared with placebo. PARP inhibitors have been used in clinical practice. However, patients with platinum-resistant relapsed ovarian cancer still have a poor prognosis, and there remains an unmet need. The purpose of this study was to examine the clinical significance of metabolic genes and focal adhesion kinase (FAK) activity in advanced ovarian high-grade serous carcinoma (HGSC). METHODS The RNA sequencing (RNA-seq) data and clinical data of HGSC patients were obtained from the Genomic Data Commons (GDC) Data Portal (https://portal.gdc.cancer.gov/) and analysed. In addition, tumour tissue was sampled by laparotomy or screening laparoscopy prior to treatment initiation from patients diagnosed with stage IIIC ovarian cancer (International Federation of Gynecology and Obstetrics (FIGO) classification, 2014) at the Saitama Medical University International Medical Center; among the patients diagnosed with HGSC, 16 with available cryopreserved specimens were included in this study. The present study was reviewed and approved by the Institutional Review Board of Saitama Medical University International Medical Center (Saitama, Japan). Among the 6307 variable genes detected in both The Cancer Genome Atlas-Ovarian (TCGA-OV) data and the clinical specimen data, 35 genes related to metabolism and FAK activity were used. RNA-seq data were analysed using the Subio Platform (Subio Inc, Japan). JMP 15 (SAS, USA) was used for statistical analysis and various types of machine learning. The Kaplan-Meier method was used for survival analysis, and the Wilcoxon test was used to analyse significant differences. P < 0.05 was considered significant.
RESULTS In the TCGA-OV data, patients with stage IIIC disease and a residual tumour diameter of 1-10 mm were selected for K-means clustering and classified into groups with significant prognostic correlations (p = 0.0444). These groups were significantly associated with platinum sensitivity/resistance in clinical cases (χ² test, p = 0.0408) and showed significant relationships with progression-free survival (p = 0.0307). CONCLUSION In the TCGA-OV data, two groups classified by clustering focused on metabolism-related genes and FAK activity were shown to be associated with platinum resistance and a poor prognosis.
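The abstract reports that survival was analysed with the Kaplan-Meier method in JMP 15. To show what that estimator computes, here is a minimal Python sketch of the product-limit formula S(t) = Π (1 − dᵢ / nᵢ), where dᵢ is the number of events at time tᵢ and nᵢ the number still at risk. It assumes right-censored data and the usual convention that patients censored at an event time are still counted at risk at that time. The function name `kaplan_meier` is hypothetical and this is not the authors' JMP workflow; it also omits confidence intervals and the group comparison tests reported in the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate (hypothetical helper).

    times:  follow-up time for each patient
    events: 1 if the event (e.g. progression) was observed, 0 if censored
    Returns a list of (event_time, survival_probability) steps.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone leaving the risk set at t (events + censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            # Censored patients at time t are still in at_risk here.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Made-up follow-up data for illustration, not data from the study.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```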
Affiliation(s)
- Masakazu Sato
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Sho Sato
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Daisuke Shintani
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Mieko Hanaoka
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Aiko Ogasawara
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Maiko Miwa
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Akira Yabuno
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Akira Kurosaki
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Hiroyuki Yoshida
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
- Kosei Hasegawa
- Department of Gynecologic Oncology, Saitama Medical University International Medical Center, 1397-1 Yamane, Hidaka, Saitama, 350-1298, Japan
19
An Integrated Approach for Cancer Survival Prediction Using Data Mining Techniques. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2021:6342226. [PMID: 34992648 PMCID: PMC8727098 DOI: 10.1155/2021/6342226] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/04/2021] [Accepted: 11/27/2021] [Indexed: 12/31/2022]
Abstract
Ovarian cancer is the third most common gynecologic cancer worldwide. Patients with advanced ovarian cancer face a significant mortality rate. Survival estimation is essential to help clinicians and patients better understand and cope with future outcomes. The present study investigates different survival predictors available for cancer prognosis using data mining techniques. A dataset of 140 patients with advanced ovarian cancer, containing data from different profiles (clinical, treatment, and overall quality of life), was collected and used to predict the patients' survival. Attributes from each data profile were processed accordingly. Clinical data were preprocessed to handle missing values and outliers. Treatment data covering varying time periods were processed with sequence mining techniques to identify the treatments given to the patients. Finally, different comorbidities were combined into a single factor by computing the Charlson Comorbidity Index for each patient. After appropriate preprocessing, the integrated dataset was classified using appropriate machine learning algorithms. The proposed integrated approach achieved its highest accuracy, 76.4%, using an ensemble technique with sequential pattern mining that included 2-month intervals between treatments. Thus, treatment sequences and, most importantly, quality-of-life attributes contribute significantly to the survival prediction of cancer patients.
20
Wu ZS, Huang SM, Wang YC. Palmitate Enhances the Efficacy of Cisplatin and Doxorubicin against Human Endometrial Carcinoma Cells. Int J Mol Sci 2021; 23:ijms23010080. [PMID: 35008502 PMCID: PMC8744704 DOI: 10.3390/ijms23010080] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/01/2021] [Revised: 12/14/2021] [Accepted: 12/17/2021] [Indexed: 12/13/2022]
Abstract
Endometrial cancer is the most common gynecological cancer worldwide. At present, there is no effective screening test for its early detection and no curative treatment for women with advanced-stage or recurrent disease. Overexpression of fatty acid synthase is a common molecular feature of a subgroup of sex steroid-related cancers with poor prognoses, including endometrial cancers. Disrupting this fatty acid synthesis leads to cell apoptosis, making it a potential therapeutic target. The saturated fatty acid palmitate reportedly induces lipotoxicity and cell death in many cell types by inducing oxidative stress. Here, we explored the effects of palmitate combined with doxorubicin or cisplatin in the HEC-1-A and RL95-2 human endometrial cancer cell lines. Physiological concentrations of exogenous palmitate significantly increased cell cycle arrest, DNA damage, autophagy, and apoptosis in both RL95-2 and HEC-1-A cells, and also increased the chemosensitivity of both cell lines. Notably, palmitate lipotoxicity was not accompanied by increased levels of reactive oxygen species, suggesting that palmitate acts via a different mechanism in endometrial cancer. This study thus provides a potential therapeutic strategy in which palmitate is used as an adjuvant in the treatment of endometrial cancer.
Affiliation(s)
- Zih-Syuan Wu
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei City 114, Taiwan
- Shih-Ming Huang
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei City 114, Taiwan
- Department of Biochemistry, National Defense Medical Center, Taipei City 114, Taiwan
- Yu-Chi Wang
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei City 114, Taiwan
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei City 114, Taiwan
21
Su K, Wu J, Gu D, Yang S, Deng S, Khakimova AK. An Adaptive Deep Ensemble Learning Method for Dynamic Evolving Diagnostic Task Scenarios. Diagnostics (Basel) 2021; 11:2288. [PMID: 34943525 PMCID: PMC8700766 DOI: 10.3390/diagnostics11122288] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/19/2021] [Revised: 12/04/2021] [Accepted: 12/06/2021] [Indexed: 12/19/2022]
Abstract
Increasingly, machine learning methods have been applied to aid diagnosis, with good results. However, some complex models can confuse physicians because they are difficult to understand, and data differences across diagnostic tasks and institutions can cause model performance to fluctuate. To address these challenges, we combined the Deep Ensemble Model (DEM) and the Tree-structured Parzen Estimator (TPE) and proposed an adaptive deep ensemble learning method (TPE-DEM) for dynamically evolving diagnostic task scenarios. Unlike previous research, which focuses on achieving better performance with a fixed-structure model, our proposed model uses TPE to efficiently aggregate simple models that physicians understand more easily and that require less training data. In addition, based on the data distribution and characteristics of the current diagnostic task, our proposed model can choose the optimal number of layers and the type and number of base learners to achieve the best performance in different diagnostic task scenarios. We tested our model on one dataset constructed with a partner hospital and on five UCI public datasets with different characteristics and volumes, covering various diagnostic tasks. Our performance evaluation shows that the proposed model outperforms other baseline models on the different datasets. Our study provides a novel approach to building simple, understandable machine learning models for tasks with variable datasets and feature sets, and the findings have important implications for applying machine learning models in computer-aided diagnosis.
Affiliation(s)
- Kaixiang Su
- School of Management, Hefei University of Technology, Hefei 230009, China
- Jiao Wu
- School of Business, Northern Illinois University, DeKalb, IL 60115, USA
- Dongxiao Gu
- School of Management, Hefei University of Technology, Hefei 230009, China
- Key Laboratory of Process Optimization and Intelligent Decision-Making of Ministry of Education, Hefei 230009, China
- Shanlin Yang
- School of Management, Hefei University of Technology, Hefei 230009, China
- Key Laboratory of Process Optimization and Intelligent Decision-Making of Ministry of Education, Hefei 230009, China
- Aida K. Khakimova
- Scientific-Research Center for Physical-Technical Informatics, Russian New University, Radio St., 22, 105005 Moscow, Russia
22
Akazawa M, Hashimoto K. Artificial intelligence in gynecologic cancers: Current status and future challenges - A systematic review. Artif Intell Med 2021; 120:102164. [PMID: 34629152 DOI: 10.1016/j.artmed.2021.102164] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 02/19/2021] [Revised: 05/28/2021] [Accepted: 08/31/2021] [Indexed: 11/30/2022]
Abstract
OBJECTIVE Over the past years, the application of artificial intelligence (AI) in medicine has increased rapidly, especially in diagnostics, and in the near future the role of AI in medicine will become progressively more important. In this study, we elucidated the state of AI research on gynecologic cancers. METHODS Three databases (PubMed, Web of Science, and Scopus) were searched for research papers dated between January 2010 and December 2020. As keywords, we used "artificial intelligence," "deep learning," "machine learning," and "neural network," combined with "cervical cancer," "endometrial cancer," "uterine cancer," and "ovarian cancer." We excluded genomic and molecular research, as well as automated pap-smear diagnosis and digital colposcopy. RESULTS Of 1632 articles, 71 were eligible: 34 on cervical cancer, 13 on endometrial cancer, three on uterine sarcoma, and 21 on ovarian cancer. A total of 35 studies (49%) used imaging data and 36 studies (51%) used value-based data as input. Magnetic resonance imaging (MRI), computed tomography (CT), ultrasound, cytology, and hysteroscopy data were used as imaging data, and the patients' backgrounds, blood examinations, tumor markers, and indices from pathological examination were used as value-based data. The prediction targets were definitive diagnosis and prognostic outcomes, including overall survival and lymph node metastasis. Datasets were relatively small: 64 studies (90%) included fewer than 1000 cases, and the median size was 214 cases. The models were evaluated by accuracy, area under the receiver operating characteristic curve (AUC), and sensitivity/specificity. Owing to the heterogeneity, a quantitative synthesis was not appropriate in this review. CONCLUSIONS In gynecologic oncology, more studies have been conducted on cervical cancer than on ovarian and endometrial cancers.
Studies of cervical cancer mainly targeted prognosis, whereas studies of ovarian cancer primarily targeted diagnosis. The quality of study design for endometrial cancer and uterine sarcoma was unclear because few studies had been conducted. Small datasets and the lack of datasets for external validation were identified as the main challenges of the studies.
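The review notes that models were commonly evaluated by AUC. The sketch below is a minimal one and uses only the standard library. It computes the ROC AUC from its probabilistic reading: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical. The pairwise loop is quadratic and is meant for illustration only, not for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 1, 0]  # hypothetical outcomes (1 = event)
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]  # hypothetical model scores
print(roc_auc(labels, scores))  # about 0.833 (7.5 of 9 positive-negative pairs)
```

This pairwise definition is equivalent to the area under the empirical ROC curve. It makes clear that AUC depends only on how the scores rank the cases, not on any decision threshold.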
Affiliation(s)
- Munetoshi Akazawa
- Department of Obstetrics and Gynecology, Tokyo Women's Medical University Medical Center East, Tokyo, Japan
- Kazunori Hashimoto
- Department of Obstetrics and Gynecology, Tokyo Women's Medical University Medical Center East, Tokyo, Japan
23
Risk Prediction of Second Primary Endometrial Cancer in Obese Women: A Hospital-Based Cancer Registry Study. International Journal of Environmental Research and Public Health 2021; 18:ijerph18178997. [PMID: 34501584 PMCID: PMC8431143 DOI: 10.3390/ijerph18178997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/13/2021] [Revised: 08/06/2021] [Accepted: 08/16/2021] [Indexed: 12/15/2022]
Abstract
Owing to the high effectiveness of cancer screening and therapies, diagnoses of second primary cancers (SPCs) have increased in women with endometrial cancer (EC). However, adequate evidence to support screening for SPCs in endometrial cancer is lacking. This study aimed to develop effective risk prediction models for second primary endometrial cancer (SPEC) in women with obesity (body mass index (BMI) > 25), using data on the incidence of SPEC and other SPEC risks in 4480 primary cancer survivors from a hospital-based cancer registry database. We found that obesity plays a key role in SPEC. We used 10 independent variables correlated with obesity as predictors; these should be monitored for the early detection of SPEC in endometrial cancer. Our proposed scheme is promising for SPEC prediction and demonstrates the important influence of obesity and of clinical data representation in all cases following primary treatment. Our results suggest that obesity remains a crucial risk factor for SPEC in endometrial cancer.
24
Spatial Prediction of Groundwater Potentiality in Large Semi-Arid and Karstic Mountainous Region Using Machine Learning Models. Water 2021. [DOI: 10.3390/w13162273] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Scarcity of drinking and irrigation water is a major global issue, particularly in arid and semi-arid zones. In rural areas, groundwater could serve as an alternative or supplementary water supply to reduce the hardship caused by water scarcity. In this context, the present study aims to facilitate groundwater potentiality mapping using spatial-modelling techniques with individual and ensemble machine-learning models. Random forest (RF), logistic regression (LR), decision tree (DT) and artificial neural networks (ANNs) are the main algorithms used in this study. Groundwater potentiality maps were prepared with 11 model ensembles. Overall, about 374 groundwater springs were identified and inventoried in the mountain area. The spring inventory data were randomly divided into training (75%) and testing (25%) datasets. Twenty-four groundwater influencing factors (GIFs) were selected based on a multicollinearity test and information gain calculation. The groundwater potentiality maps were validated using statistical measures and the receiver operating characteristic (ROC) curve. Finally, the 15 models were ranked with the prioritization rank method using the compound factor (CF) method. Based on success and prediction rates, the model ensembles are more stable and more suitable for groundwater potentiality mapping in mountainous aquifers than the individual models. According to the area under the curve (AUC), the most efficient model is the RF-LR-DT-ANN ensemble. Moreover, the prioritization ranking indicates that the best models are the RF-DT and RF-LR-DT ensembles.
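The abstract says the 24 influencing factors were screened partly by information gain. The following sketch is a minimal, generic version and is not the authors' procedure. It computes the information gain of one categorical factor with respect to a binary spring/no-spring label: the label entropy minus the size-weighted entropy within each factor class. The factor values and function names are hypothetical. A continuous factor such as slope would need to be discretised into classes first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of labels minus weighted entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


spring = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical: 1 = spring present in cell
lithology = ["karst", "karst", "karst", "marl", "marl", "marl", "karst", "marl"]
noise = ["a", "a", "b", "b", "a", "b", "b", "a"]
print(information_gain(lithology, spring))  # 1.0: factor separates the classes perfectly
print(information_gain(noise, spring))      # 0.0: factor carries no information
```

Factors with near-zero gain contribute little to the split. They are natural candidates for removal before model training.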
25
Kuo CC, Wang HH, Tseng LP. Using data mining technology to predict medication-taking behaviour in women with breast cancer: A retrospective study. Nurs Open 2021; 9:2646-2656. [PMID: 34156764 PMCID: PMC9584494 DOI: 10.1002/nop2.963] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/03/2020] [Revised: 05/10/2021] [Accepted: 05/27/2021] [Indexed: 11/19/2022]
Abstract
Aims Medication‐taking behaviours of breast cancer survivors undergoing adjuvant hormone therapy have received considerable attention. This study aimed to determine factors affecting medication‐taking behaviours in people with breast cancer using data mining. Design A longitudinal, observational, retrospective cohort study with a hospital‐based survey. Methods A total of 385 subjects were surveyed, and existing data from January 2010 to December 2017 in Taiwan were analysed. Three data mining approaches (multiple logistic regression, decision tree and artificial neural network) were used to build the prediction models and rank the importance of influencing factors. Accuracy, specificity and sensitivity were used as assessment indicators for the prediction models. Results Multiple logistic regression was the most effective approach, achieving an accuracy of 96.37%, specificity of 96.75% and sensitivity of 96.12%. Data mining identified the duration of adjuvant hormone therapy discontinuation, the duration of adjuvant hormone therapy use and age at diagnosis as the three most critical factors influencing the medication‐taking behaviours of people with breast cancer.
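The three assessment indicators named above all come from the confusion matrix of a binary classifier. This is a minimal sketch, assuming labels coded 1 (for example, adherent) and 0 (non-adherent). The labels and the function name `classification_metrics` are hypothetical, and the study's own label coding is not stated.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Raises ZeroDivisionError if a class is absent; acceptable for a sketch.
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical observed behaviour
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # hypothetical model output
m = classification_metrics(y_true, y_pred)
print(m)  # accuracy 0.8, sensitivity 0.75, specificity about 0.833
```

Reporting sensitivity and specificity alongside accuracy matters when classes are imbalanced. In that case a model can reach high accuracy while missing most cases of the minority class.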
Affiliation(s)
- Chen-Chen Kuo
- The Cancer Prevention and Treatment Center, St. Martin De Porres Hospital, Chiayi, Taiwan
- School of Nursing, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hsiu-Hung Wang
- School of Nursing, Kaohsiung Medical University, Kaohsiung, Taiwan
- Li-Ping Tseng
- Management Center, St. Martin De Porres Hospital, Chiayi, Taiwan
- Department of Industrial Engineering and Management, National Yunlin University of Science and Technology, Douliu, Taiwan
26
Abstract
Importance Artificial intelligence (AI) will play an increasing role in health care. In gynecologic oncology, it can advance tailored screening, precision surgery, and personalized targeted therapies. Objective The aim of this study was to review the role of AI in gynecologic oncology. Evidence Acquisition Artificial intelligence publications in gynecologic oncology were identified by searching "gynecologic oncology AND artificial intelligence" in the PubMed database. The literature was reviewed for the history of AI, its fundamentals, and its current applications in the diagnosis and treatment of cervical, uterine, and ovarian cancers. Results A PubMed literature search covering the period since 2000 showed a significant increase in publications related to AI and oncology. Early studies focused on using AI to interrogate electronic health records to improve clinical outcomes and facilitate clinical research. In cervical cancer, AI algorithms can enhance image analysis of cytology and of visual inspection with acetic acid or colposcopy. In uterine cancers, AI can improve the diagnostic accuracy of radiologic imaging and the predictive/prognostic capabilities of clinicopathologic characteristics. Artificial intelligence has also been used to better detect early-stage ovarian cancer and to predict surgical outcomes and treatment response. Conclusions and Relevance Artificial intelligence has been shown to enhance diagnosis, refine clinical decision making, and advance personalized therapies in gynecologic cancers. The rapid adoption of AI in gynecologic oncology will depend on overcoming challenges related to data transparency, quality, and interpretation. Artificial intelligence is rapidly transforming health care. However, many physicians are unaware that this technology is being used in their practices and could benefit from a better understanding of the statistics and computer science behind these algorithms.
This review provides a summary of AI, its applicability, and its limitations in gynecologic oncology.
27
Lu H, Gao H, Ye M, Wang X. A Hybrid Ensemble Algorithm Combining AdaBoost and Genetic Algorithm for Cancer Classification with Gene Expression Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:863-870. [PMID: 31722484 DOI: 10.1109/tcbb.2019.2952102] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/10/2023]
Abstract
The diversity of base classifiers and the integration of multiple classifiers are two key issues in ensemble learning. This paper puts forward a hybrid ensemble algorithm combining AdaBoost and a genetic algorithm (GA) for cancer classification with gene expression data. Decision groups are designed to increase the diversity of the base classifier pool, and the GA assigns a weight to each base classifier, improving classification performance by avoiding local extrema. The decision groups are composed of base classifiers including K-nearest neighbor (KNN), Naïve Bayes (NB), and Decision Tree (C4.5). Experimental results show that the proposed algorithm outperforms existing ensemble learning methods such as Bagging, Random Forest (RF), Rotation Forest (RoF), AdaBoost, AdaBoost-BPNN, AdaBoost-SVM, and AdaBoost-RF, especially on small-sample and imbalanced gene expression data.
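In the abstract, a genetic algorithm assigns a weight to each base classifier. The sketch below covers only the combination step that such weights feed into, a weighted majority vote over the base classifiers' predicted labels. The GA search itself is not implemented, and the weights are simply supplied. The function name, labels and weight values are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted label per base classifier using per-classifier weights.

    The class with the largest total weight wins; ties go to the label that
    sorts first, so the result is deterministic.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per classifier is required")
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(sorted(score), key=lambda lbl: score[lbl])


# Hypothetical outputs of KNN, NB and C4.5 for one sample.
preds = ["tumor", "normal", "normal"]
print(weighted_vote(preds, [0.6, 0.25, 0.15]))  # "tumor": 0.6 outweighs 0.4
print(weighted_vote(preds, [1, 1, 1]))          # "normal": plain majority
```

The example shows why the weights matter. A single highly trusted classifier can overturn a simple majority, and the weight-search step in the abstract is meant to decide how much each classifier should be trusted.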
28
Sathipati SY, Ho SY. Identification of the miRNA signature associated with survival in patients with ovarian cancer. Aging (Albany NY) 2021; 13:12660-12690. [PMID: 33910165 PMCID: PMC8148489 DOI: 10.18632/aging.202940] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/30/2020] [Accepted: 03/23/2021] [Indexed: 12/22/2022]
Abstract
Ovarian cancer is a major gynaecological malignant tumor associated with a high mortality rate. Identifying survival-related variants may improve treatment and survival in patients with ovarian cancer. In this work, we proposed a support vector regression (SVR)-based method called OV-SURV, which incorporates an inheritable bi-objective combinatorial genetic algorithm for feature selection, to identify a miRNA signature associated with survival in patients with ovarian cancer. The miRNA expression profiles and survival information of 209 patients with ovarian cancer were retrieved from The Cancer Genome Atlas database. OV-SURV achieved a mean correlation coefficient of 0.77±0.01 and a mean absolute error of 0.69±0.02 years using 10-fold cross-validation. Analysis of the top-ranked miRNAs revealed that hsa-let-7f, hsa-miR-1237, hsa-miR-98, hsa-miR-933, and hsa-miR-889 were significantly associated with survival in patients with ovarian cancer. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that four signature miRNAs (hsa-miR-182, hsa-miR-34a, hsa-miR-342, and hsa-miR-1304) were highly enriched in fatty acid biosynthesis, and five miRNAs (hsa-let-7f, hsa-miR-34a, hsa-miR-342, hsa-miR-1304, and hsa-miR-24) were highly enriched in fatty acid metabolism. The prediction model with the identified miRNA signature of prognostic biomarkers can benefit therapeutic decision making in ovarian cancer.
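OV-SURV is reported with a mean correlation coefficient and a mean absolute error (MAE) under 10-fold cross-validation. The sketch below shows that evaluation scaffolding only. It makes a seeded shuffle of 209 sample indices (the cohort size in the abstract) into 10 test folds, and defines the two metrics. The SVR model and the genetic feature selection are not shown. All function names are hypothetical. In each round, one fold is the test set and the remaining indices form the training set. Per-fold scores would then be averaged to obtain figures like the reported mean ± SD.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) reproducibly and split it into k nearly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


folds = k_fold_indices(209, k=10)
print([len(f) for f in folds])  # nine folds of 21 and one of 20
```

Fixing the shuffle seed keeps the fold assignment reproducible, so different models can be compared on identical splits.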
Affiliation(s)
- Srinivasulu Yerukala Sathipati
- Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI 54449, USA
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Center For Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu, Taiwan
29
Hybrid Basketball Game Outcome Prediction Model by Integrating Data Mining Methods for the National Basketball Association. Entropy 2021; 23:e23040477. [PMID: 33920720 PMCID: PMC8073849 DOI: 10.3390/e23040477] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/15/2021] [Revised: 04/08/2021] [Accepted: 04/14/2021] [Indexed: 12/18/2022]
Abstract
The sports market has grown rapidly over the last several decades. Sports outcome prediction is an attractive sports analytics challenge because it provides useful information for operations in the sports market. In this study, a hybrid basketball game outcome prediction scheme is developed to predict the final scores of National Basketball Association (NBA) games by integrating five data mining techniques: extreme learning machine, multivariate adaptive regression splines, k-nearest neighbors, eXtreme gradient boosting (XGBoost), and stochastic gradient boosting. Designed features are generated by merging information from different game lags of fundamental basketball statistics and are used in the proposed scheme. This study collected data from all games of the NBA 2018-2019 seasons. There are 30 teams in the NBA, and each team plays 82 games per season. A total of 2460 NBA game data points were collected. Empirical results show that the proposed hybrid scheme achieves high prediction performance and identifies suitable game-lag information and relevant game features (statistics). Our findings suggest that a two-stage XGBoost model using information from four game lags achieves the best prediction performance among all competing models. Six designed features computed over four game lags (averaged defensive rebounds, averaged two-point field goal percentage, averaged free throw percentage, averaged offensive rebounds, averaged assists, and averaged three-point field goal attempts) have a greater effect on predicting the final scores of NBA games than features from other game lags. The findings of this study provide relevant insights and guidance for outcome prediction research in other team or individual sports.
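The designed features are averages of basic statistics over "game lags", for example averaged defensive rebounds over four game lags. The sketch below rests on an assumption, not a definition taken from the paper: a lag of k means averaging a team's statistic over its k preceding games. The current game is excluded so that no information from it leaks into its own prediction. The function name and the rebound values are hypothetical.

```python
def lagged_averages(values, lag):
    """For each game, average a statistic over the `lag` games played before it.

    Games without `lag` earlier games get None; the current game is excluded
    so the feature uses only information available before tip-off.
    """
    out = []
    for t in range(len(values)):
        if t < lag:
            out.append(None)
        else:
            window = values[t - lag:t]
            out.append(sum(window) / lag)
    return out


defensive_rebounds = [30, 34, 32, 36, 28, 40]  # hypothetical per-game values for one team
print(lagged_averages(defensive_rebounds, 4))  # [None, None, None, None, 33.0, 32.5]
```

Computing the same statistic for several lag lengths and letting the model compare them is one way to search for the most informative lag, which is the kind of comparison the abstract reports.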
30
Liu Y, Xu B, Liu M, Qiao H, Zhang S, Qiu J, Ying X. Long non-coding RNA SNHG25 promotes epithelial ovarian cancer progression by up-regulating COMP. J Cancer 2021; 12:1660-1668. [PMID: 33613753 PMCID: PMC7890321 DOI: 10.7150/jca.47344] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/23/2020] [Accepted: 12/24/2020] [Indexed: 12/11/2022]
Abstract
Long non-coding RNAs (lncRNAs) play a pivotal role in the genesis and development of cancer. The role and molecular mechanisms of SNHG25 in epithelial ovarian cancer (EOC) have not been investigated. In the present study, we showed that SNHG25 expression was up-regulated in EOC tissues relative to normal ovarian tissues. In vitro functional experiments demonstrated that high expression of SNHG25 promoted proliferation, migration and invasion, and decreased apoptosis, in ovarian cancer cell lines. In vivo, downregulation of SNHG25 inhibited the growth (tumor volume) of subcutaneous xenografts in nude mice. High-throughput sequencing and western blot analysis showed a significant decrease in COMP mRNA and protein expression in SNHG25-knockdown cells compared with control ovarian cancer cells. These data suggest that SNHG25 promotes EOC progression by regulating COMP and may serve as a potential biomarker for EOC.
Affiliation(s)
- Yinglei Liu
- Department of Obstetrics and Gynecology, the Second Affiliated Hospital of Nanjing Medical University, 262 Zhongshan North Road, Nanjing, 210000, China
- Department of Obstetrics and Gynecology, the Second Affiliated Hospital of Nantong University, 6 Haierxiang North Road, Nantong, 226000, China
- Boqun Xu
- Department of Obstetrics and Gynecology, the Second Affiliated Hospital of Nanjing Medical University, 262 Zhongshan North Road, Nanjing, 210000, China
- Manhua Liu
- Department of Obstetrics and Gynecology, the Second Affiliated Hospital of Nantong University, 6 Haierxiang North Road, Nantong, 226000, China
- Haifeng Qiao
- Department of Obstetrics and Gynecology, the Second Affiliated Hospital of Nantong University, 6 Haierxiang North Road, Nantong, 226000, China
- Siming Zhang
- Department of Obstetrics and Gynecology, the Second Affiliated Hospital of Nantong University, 6 Haierxiang North Road, Nantong, 226000, China
- Junjun Qiu
- Department of Gynecology, Obstetrics and Gynecology Hospital, Fudan University, 419 Fangxie Road, Shanghai, 200011, China
- Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, 413 Zhaozhou Road, Shanghai, 200011, China
- Xiaoyan Ying
- Department of Obstetrics and Gynecology, the Second Affiliated Hospital of Nanjing Medical University, 262 Zhongshan North Road, Nanjing, 210000, China
31
Shih CC, Lu CJ, Chen GD, Chang CC. Risk Prediction for Early Chronic Kidney Disease: Results from an Adult Health Examination Program of 19,270 Individuals. International Journal of Environmental Research and Public Health 2020; 17:ijerph17144973. [PMID: 32664271 PMCID: PMC7399976 DOI: 10.3390/ijerph17144973] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 05/30/2020] [Revised: 07/06/2020] [Accepted: 07/07/2020] [Indexed: 12/13/2022]
Abstract
Developing effective risk prediction models is a cost-effective approach to predicting complications of chronic kidney disease (CKD) and mortality rates; however, there is inadequate evidence to support screening for CKD. In this study, four data mining algorithms (a classification and regression tree, a C4.5 decision tree, a linear discriminant analysis, and an extreme learning machine) are used to predict early CKD. The study includes data from 19,270 patients, provided by an adult health examination program of 32 chain clinics and three special physical examination centers between 2015 and 2019. There were 11 independent variables, and the glomerular filtration rate (GFR) was used as the outcome variable. The C4.5 decision tree algorithm outperformed the three comparison models for predicting early CKD in terms of accuracy, sensitivity, specificity, and area under the curve, and is therefore a promising method for early CKD prediction. The experimental results showed that the urine protein-to-creatinine ratio (UPCR), proteinuria (PRO), red blood cells (RBC), fasting glucose (GLU), triglycerides (TG), total cholesterol (T-CHO), age, and gender are important risk factors. CKD care is closely related to the primary care level and is recognized as a healthcare priority in national strategy. The proposed risk prediction models highlight the important influence of personal characteristics and health examination data in predicting early CKD.
Affiliation(s)
- Chin-Chuan Shih
- Institute of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
- General Administrative Department, United Safety Medical Group, New Taipei City 24205, Taiwan
- Deputy Chairman, Taiwan Association of Family Medicine, Taipei 24200, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei 24205, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei 24205, Taiwan
- Gin-Den Chen
- Institute of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
- Department of Obstetrics and Gynecology, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
- Chi-Chang Chang
- School of Medical Informatics, Chung Shan Medical University & IT office, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
- Correspondence: Tel.: +886-4-24730022
32
Landslide Susceptibility Prediction Considering Regional Soil Erosion Based on Machine-Learning Models. ISPRS International Journal of Geo-Information 2020. [DOI: 10.3390/ijgi9060377] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 11/16/2022]
Abstract
Soil erosion (SE) supplies sliding mass for landslide formation and reflects the long-term destructive effect of rainfall erosion on landslides. Therefore, more reliable landslide susceptibility prediction results may be obtained by introducing SE as a geology- and hydrology-related predisposing factor. Ningdu County, China, is taken as the study area. First, 446 landslides are obtained from government disaster survey reports. Second, the SE amount in Ningdu County is calculated, and nine other conventional predisposing factors are obtained under both 30 m and 60 m grid resolutions to determine the effects of SE on landslide susceptibility prediction. Third, four types of machine-learning predictors, namely C5.0 decision tree (C5.0 DT), logistic regression (LR), multilayer perceptron (MLP) and support vector machine (SVM), are applied with 30 m and 60 m grid resolutions to construct landslide susceptibility prediction models that include the SE factor (the SE-C5.0 DT, SE-LR, SE-MLP and SE-SVM models); C5.0 DT, LR, MLP and SVM models without SE are also used for comparison. Finally, the area under the receiver operating characteristic curve is used to verify the prediction accuracy of these models, and the relative importance of all 10 predisposing factors is ranked. The results indicate that: (1) the SE factor plays the most important role in landslide susceptibility prediction among all 10 predisposing factors under both 30 m and 60 m resolutions; (2) the SE-based models predict landslide susceptibility more accurately than the single models without the SE factor; (3) all models with 30 m resolution have higher landslide susceptibility prediction accuracy than those with 60 m resolution; and (4) the C5.0 DT and SVM models show higher landslide susceptibility prediction performance than the MLP and LR models.
33
Doja M, Kaur I, Ahmad T. Age-specific survival in prostate cancer using machine learning. Data Technologies and Applications 2020. [DOI: 10.1108/dta-10-2019-0189] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/09/2022]
Abstract
Purpose The incidence of prostate cancer has been increasing over the past few decades. Various studies have tried to determine patient survival, but metastatic prostate cancer is still not extensively explored. The survival rate of metastatic prostate cancer is much lower than that of earlier stages. The study aims to investigate the survivability of metastatic prostate cancer based on the age group to which a patient belongs, and how the significance of the attributes differs across age groups. Design/methodology/approach Data of metastatic prostate cancer patients were collected from a cancer hospital in India. Two predictive models were built for the analysis: one for the complete dataset and the other for separate age groups. Machine learning was applied to both models, and their accuracies were compared. Information gain was also evaluated for each model to determine the significant predictors for each age group. Findings The ensemble approach gave the best result (81.4%) on the complete dataset and was therefore used for the age-specific models. The age-specific model had a direct average accuracy of 83.74% and a weighted average accuracy of 79.9%, with the highest accuracy for patients younger than 60. Originality/value The study developed a model that predicts the survival of metastatic prostate cancer based on age. It can assist clinicians in determining the best course of treatment for each patient based on Eastern Cooperative Oncology Group (ECOG) performance status, age and comorbidities.
34
Landslide Susceptibility Prediction Modeling Based on Remote Sensing and a Novel Deep Learning Algorithm of a Cascade-Parallel Recurrent Neural Network. SENSORS 2020; 20:s20061576. [PMID: 32178235 PMCID: PMC7146231 DOI: 10.3390/s20061576] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 01/27/2020] [Revised: 02/26/2020] [Accepted: 03/06/2020] [Indexed: 11/17/2022]
Abstract
Landslide susceptibility prediction (LSP) modeling is an important and challenging problem. Landslide features are generally uncorrelated or nonlinearly correlated, which limits LSP performance when conventional machine learning models are used. In this study, a deep-learning-based model combining a long short-term memory (LSTM) recurrent neural network and a conditional random field (CRF) in cascade-parallel form was proposed for making LSPs based on remote sensing (RS) images and a geographic information system (GIS). The RS images are the main data sources for landslide-related environmental factors, and the GIS is used to analyze, store, and display spatial big data. The cascade-parallel LSTM-CRF takes frequency ratio values of environmental factors in its input layers, uses cascade-parallel LSTM for feature extraction in its hidden layers, and uses cascade-parallel full connection for classification and a CRF for landslide/non-landslide state modeling in its output layers. The cascade-parallel form of LSTM can extract features from different layers and merge them into concrete features. The CRF is used to calculate the energy relationship between two grid points, further smoothing and optimizing the extracted features. As a case study, the cascade-parallel LSTM-CRF was applied to Shicheng County, Jiangxi Province, China. A total of 2709 landslide grid cells were recorded, and 2709 non-landslide grid cells were randomly selected from the study area. The results show that, compared with mainstream traditional machine learning algorithms such as the multilayer perceptron, logistic regression, and decision tree, the proposed cascade-parallel LSTM-CRF had a higher landslide prediction rate (positive predictive rate: 72.44%, negative predictive rate: 80%, total predictive rate: 75.67%).
In conclusion, the proposed cascade-parallel LSTM-CRF is a novel data-driven deep learning model that overcomes the limitations of traditional machine learning algorithms and achieves promising results for making LSPs.
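The input layers of the LSTM-CRF take frequency ratio (FR) values of the environmental factors. Below is a minimal sketch of the usual FR definition, assuming each grid cell has already been assigned to one class of a factor: FR is the share of landslide cells falling in a class divided by the share of all cells in that class, so FR > 1 marks classes over-represented among landslides. The function name and toy slope classes are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def frequency_ratio(classes: Sequence[Hashable], is_landslide: Sequence[int]) -> Dict[Hashable, float]:
    """FR per class = (landslide cells in class / all landslide cells)
                      / (cells in class / all cells)."""
    total_cells = len(classes)
    total_slides = sum(is_landslide)
    if total_slides == 0:
        raise ValueError("no landslide cells")
    cells = Counter(classes)
    slides = Counter(c for c, s in zip(classes, is_landslide) if s)
    return {c: (slides[c] / total_slides) / (cells[c] / total_cells) for c in cells}


# Hypothetical slope classes for 8 grid cells, 4 of which are landslides.
slope = ["gentle"] * 4 + ["steep"] * 4
landslide = [0, 0, 0, 1, 1, 1, 1, 0]
print(frequency_ratio(slope, landslide))  # {'gentle': 0.5, 'steep': 1.5}
```

Replacing each cell's class by its FR value turns every factor into a numeric input on a common scale.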
35
Li X, Zhao L, Meng T. Upregulated CXCL14 is associated with poor survival outcomes and promotes ovarian cancer cells proliferation. Cell Biochem Funct 2020; 38:613-620. [PMID: 32077118 DOI: 10.1002/cbf.3516] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/05/2019] [Revised: 01/24/2020] [Accepted: 02/05/2020] [Indexed: 12/12/2022]
Abstract
Ovarian cancer is one of the most common malignant tumours of the female reproductive organs. Owing to the difficulty of early diagnosis and the lack of effective treatment at late stages, ovarian cancer has the highest mortality rate among malignancies of the female reproductive system. Therefore, finding reliable early diagnostic indicators and new therapeutic targets for ovarian cancer is an urgent problem. Chemokine (C-X-C motif) ligand 14 (CXCL14) is a small cytokine of the CXC chemokine family that has been found to have multiple effects on tumourigenesis and tumour development. Here, we report that CXCL14 is preferentially expressed in ovarian cancer. By analysing the TCGA database, we found that CXCL14 was highly expressed in patients with advanced ovarian cancer and correlated with poor prognosis. In addition, abnormally high CXCL14 levels were observed in the serum and ovarian tissue of ovarian cancer patients by qRT-PCR and ELISA. Both in vitro and in vivo experiments confirmed that overexpression of CXCL14 promoted ovarian cancer cell proliferation. Moreover, transfection of CXCL14 increased the phosphorylation level of signal transducer and activator of transcription 3 (STAT3), and administration of STAT3 inhibitor III inhibited the tumour-promoting effects of CXCL14. Therefore, our study suggests that CXCL14 could be used as a novel adjunct biomarker for the early diagnosis of ovarian cancer and provides new targets and ideas for the treatment of advanced ovarian cancer. SIGNIFICANCE PARAGRAPH: CXCL14 could be used as a novel adjunct biomarker for the early diagnosis of ovarian cancer and provides new targets and ideas for the treatment of advanced ovarian cancer.
Affiliation(s)
- Xue Li: Department of Obstetrics and Gynecology, Liaocheng People's Hospital, Liaocheng, China
- Longjun Zhao: Department of Obstetrics and Gynecology, Liaocheng People's Hospital, Liaocheng, China
- Tengteng Meng: Department of Obstetrics and Gynecology, Liaocheng People's Hospital, Liaocheng, China
36
Developing a Novel Machine Learning-Based Classification Scheme for Predicting SPCs in Colorectal Cancer Survivors. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10041355] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/23/2022]
Abstract
Colorectal cancer is ranked third and fourth in the world in terms of mortality and cancer incidence. While advances in treatment strategies have given cancer patients longer survival, potentially harmful second primary cancers (SPCs) can occur. Therefore, analysis of second primary colorectal cancer is an important issue for clinical management. In this study, a novel predictive scheme was developed for identifying the risk factors associated with second colorectal cancer in patients with colorectal cancer by integrating five machine learning classification techniques: support vector machine, random forest, multivariate adaptive regression splines, extreme learning machine, and extreme gradient boosting. Data on a total of 4287 patients from three hospital tumor registries were used. Our empirical results revealed that the proposed scheme provided promising classification results, as measured by accuracy, sensitivity, specificity and area under the curve, and identified important risk factors for predicting second colorectal cancer. Collectively, our clinical findings suggested that the most important risk factors were combined stage, age at diagnosis, BMI, surgical margins of the primary site, tumor size, sex, regional lymph nodes positive, grade/differentiation, primary site, and drinking behavior. Accordingly, these risk factors should be monitored for the early detection of second primary tumors in order to improve treatment and intervention strategies.
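The scheme is evaluated with accuracy, sensitivity and specificity (alongside AUC). Below is a minimal sketch of how the first three follow from a binary confusion matrix, assuming label 1 marks a patient who developed a second primary cancer; the function name and data are illustrative.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall of class 1) and specificity (recall of class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# 3 patients with a second primary cancer, 5 without (illustrative only).
m = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])
print(m["accuracy"], m["specificity"])  # 0.75 0.8
```

Sensitivity here is 2/3: one of the three SPC patients is missed, which plain accuracy alone would hide.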
37
Talaei-Khoei A, Tavana M, Wilson JM. A predictive analytics framework for identifying patients at risk of developing multiple medical complications caused by chronic diseases. Artif Intell Med 2019; 101:101750. [PMID: 31813486 DOI: 10.1016/j.artmed.2019.101750] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/14/2018] [Revised: 07/07/2019] [Accepted: 10/30/2019] [Indexed: 01/22/2023]
Abstract
Chronic diseases often cause several medical complications. This paper aims to predict multiple complications among patients with a chronic disease. The literature uses single-task learning algorithms to predict complications independently, assuming no correlation among the complications of chronic diseases. We propose two methods (independent prediction of complications with single-task learning, and concurrent prediction of complications with multi-task learning) and show that medical complications of chronic diseases can be correlated. Using a case study, we compare the performance of these two methods by predicting complications of hypertrophic cardiomyopathy from 106 predictors in 1078 electronic medical records dated April 2009 to April 2017, inclusive. The methods are implemented using logistic regression, artificial neural networks, decision trees, and support vector machines. The results show that multi-task learning with logistic regression improves prediction performance in terms of both discrimination and calibration.
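The abstract does not specify how its multi-task logistic regression shares information across complications. The sketch below swaps in one common formulation, mean-regularized multi-task learning, purely as an assumption: each complication (task) t gets weights `w0 + V[:, t]`, where `w0` is shared and an L2 penalty `lam` shrinks the task-specific part `V`, so correlated tasks borrow strength from one another. It is trained by plain gradient descent with NumPy; all names, hyperparameters and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def fit_multitask_logreg(X, Y, lam=1.0, lr=0.1, epochs=2000):
    """Mean-regularized multi-task logistic regression (one common MTL variant).

    X: (n, d) predictors shared by all tasks.
    Y: (n, T) binary labels, one column per complication (task).
    Task t uses weights w0 + V[:, t] and its own bias b[t]. The loss is the
    sum over tasks of the mean log-loss plus (lam / 2) * ||V||^2.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    T = Y.shape[1]
    w0, V, b = np.zeros(d), np.zeros((d, T)), np.zeros(T)
    for _ in range(epochs):
        P = sigmoid(X @ (w0[:, None] + V) + b)  # (n, T) predicted probabilities
        G = X.T @ (P - Y) / n                   # (d, T) log-loss gradient per task
        w0 -= lr * G.sum(axis=1)                # shared part receives every task's gradient
        V -= lr * (G + lam * V)                 # task-specific part is shrunk by lam
        b -= lr * (P - Y).mean(axis=0)
    return w0, V, b


def predict_proba(X, w0, V, b):
    return sigmoid(np.asarray(X, dtype=float) @ (w0[:, None] + V) + b)


# Synthetic example: two correlated "complications" driven by the same signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
signal = X @ np.array([2.0, -1.0, 0.0])
Y = np.column_stack([signal > 0, signal > 0.3]).astype(float)
w0, V, b = fit_multitask_logreg(X, Y)
accuracy = ((predict_proba(X, w0, V, b) > 0.5) == (Y > 0.5)).mean(axis=0)
print(accuracy)  # per-task training accuracy
```

A very large `lam` forces all tasks onto the shared weights, while `lam = 0` leaves each task's weights unconstrained, which behaves like fitting independent single-task models.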
Affiliation(s)
- Amir Talaei-Khoei: Department of Information Systems, University of Nevada, Reno, USA; School of Software, University of Technology Sydney, Australia.
- Madjid Tavana: Business Systems and Analytics Department, Distinguished Chair of Business Analytics, La Salle University, Philadelphia, USA; Business Information Systems Department, Faculty of Business Administration and Economics, University of Paderborn, Paderborn, Germany.
- James M Wilson: School of Community Health Sciences, University of Nevada, Reno, USA.
38
Chang CC, Chen SH. Developing a Novel Machine Learning-Based Classification Scheme for Predicting SPCs in Breast Cancer Survivors. Front Genet 2019; 10:848. [PMID: 31620166 PMCID: PMC6759630 DOI: 10.3389/fgene.2019.00848] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 07/05/2019] [Accepted: 08/14/2019] [Indexed: 11/13/2022]
Abstract
Owing to the high effectiveness of cancer screening and therapies, diagnoses of second primary cancers (SPCs) have increased in women with breast cancer. The present study was conducted to develop a novel machine learning-based classification scheme for predicting the risk factors of SPCs in breast cancer survivors. The proposed scheme was based on the XGBoost classifier combined with four alternative strategies (transformation, resampling, clustering, and ensemble learning) to improve the training balanced accuracy. The results suggested that, for the empirical case, the best prediction accuracy was achieved by XGBoost combined with the resampling and clustering strategies. The experimental results showed that age, sequence of radiotherapy and surgery, surgical margins of the primary site, human epidermal growth factor, high-dose clinical target volume, and estrogen receptors are relatively more important risk factors associated with SPCs in patients with breast cancer. These risk factors should be monitored for the early detection of breast cancer. In conclusion, the proposed scheme can support the important influence of personality and clinical symptom representations in all phases of the primary treatment trajectory. Our results further suggested that adaptive machine learning techniques require the incorporation of significant variables for optimal predictions.
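The scheme targets training balanced accuracy and uses resampling as one of its strategies. Below is a minimal sketch, assuming random oversampling with replacement as the resampling method (the abstract does not say which one was used): `balanced_accuracy` averages per-class recall, so a model that always predicts the majority class scores 0.5 on a binary problem regardless of its raw accuracy, and `random_oversample` duplicates minority-class rows until every class matches the largest one. Names and data are illustrative.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


def random_oversample(X: Sequence, y: Sequence[Hashable], seed: int = 0) -> Tuple[List, List]:
    """Duplicate randomly chosen rows of smaller classes until all classes
    are as large as the biggest one (sampling with replacement)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


# 8 survivors without an SPC (label 0), 2 with one (label 1): always predicting 0
# gives 80% plain accuracy but only 0.5 balanced accuracy.
y = [0] * 8 + [1] * 2
print(balanced_accuracy(y, [0] * 10))  # 0.5
X_bal, y_bal = random_oversample(list(range(10)), y)
print(y_bal.count(0), y_bal.count(1))  # 8 8
```

Oversampling should be applied to the training split only, so duplicated rows never leak into the evaluation data.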
Affiliation(s)
- Chi-Chang Chang: School of Medical Informatics, Chung Shan Medical University, Taichung, Taiwan; IT Office, Chung Shan Medical University Hospital, Taichung, Taiwan
- Ssu-Han Chen: Department of Industrial Engineering and Management, Ming Chi University of Technology, New Taipei City, Taiwan
39
Ye X, Li H, Sakurai T, Shueng PW. Ensemble Feature Learning to Identify Risk Factors for Predicting Secondary Cancer. Int J Med Sci 2019; 16:949-959. [PMID: 31341408 PMCID: PMC6643128 DOI: 10.7150/ijms.33820] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/03/2019] [Accepted: 04/24/2019] [Indexed: 12/11/2022]
Abstract
Background: In recent years, the development and diagnosis of secondary cancer have become a primary concern of cancer survivors. A number of studies have developed strategies to extract knowledge from clinical data, aiming to identify important risk factors that can be used to prevent disease recurrence. However, these studies do not focus on secondary cancer, for which strategies for clinical treatment and for identifying risk factors to prevent its occurrence are still lacking. Methods: We propose an effective ensemble feature learning method that identifies the risk factors for predicting secondary cancer while accounting for class imbalance and patient heterogeneity. We first divide the patients into heterogeneous groups using spectral clustering. In each group, we apply an oversampling method to balance the number of samples in each class and use the balanced samples as training data for ensemble feature learning. The purpose of ensemble feature learning is to identify the risk factors and construct a diagnosis model for each group. The importance of risk factors is measured separately from the properties of the patients in each group. We predict secondary cancer by assigning a patient to the corresponding group and applying that group's diagnosis model. Results: The decision tree obtains the best results for predicting secondary cancer among the three classifiers. Its best results are an AUC of 0.72 when the patients are divided into 15 groups and an F1 score of 0.38 when they are divided into 20 groups. In terms of AUC, the decision tree achieves a 67.4% improvement compared with using all 20 predictor variables and a 28.6% improvement compared with no group division. In terms of F1 score, the decision tree achieves a 216.7% improvement compared with using all 20 predictor variables and an 80.9% improvement compared with no group division.
Different groups yield different rankings of the predictor variables. Conclusion: The accuracies of predicting secondary cancer using k-nearest neighbor, decision tree and support vector machine classifiers increased after the selected important risk factors were used as predictors. Dividing patients into groups and predicting secondary cancer with separate models can further improve prediction accuracy. The information discovered in the experiments can provide important references to the personality and clinical symptom representations on all phases of guide interventions, with the complexities of multiple symptoms associated with secondary cancer in all phases of the recurrent trajectory.
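The results are reported as F1 scores and as percentage improvements over baselines. Below is a minimal sketch of both quantities, assuming "improvement" means relative change, (new - baseline) / baseline * 100; the abstract does not state the baseline values, so the numbers used here are made up.

```python
from typing import Hashable, Sequence


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage change of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Made-up predictions: 2 true positives, 1 false positive, 1 false negative.
f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(round(f1, 3))  # 0.667
print(round(relative_improvement(0.30, 0.20), 1))  # 50.0
```

Because F1 ignores true negatives, it stays low on an imbalanced outcome such as secondary cancer even when AUC looks moderate, which is consistent with the much lower F1 values reported above.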
Affiliation(s)
- Xiucai Ye: Department of Computer Science, University of Tsukuba, Tsukuba, Japan; Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
- Hongmin Li: Department of Computer Science, University of Tsukuba, Tsukuba, Japan
- Tetsuya Sakurai: Department of Computer Science, University of Tsukuba, Tsukuba, Japan; Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
- Pei-Wei Shueng: Division of Radiation Oncology, Far Eastern Memorial Hospital, New Taipei City, Taiwan; Faculty of Medicine, School of Medicine, National Yang-Ming University, Taipei, Taiwan
40
Xu L, Liang G, Liao C, Chen GD, Chang CC. An Efficient Classifier for Alzheimer's Disease Genes Identification. Molecules 2018; 23:molecules23123140. [PMID: 30501121 PMCID: PMC6321377 DOI: 10.3390/molecules23123140] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 10/24/2018] [Revised: 11/17/2018] [Accepted: 11/19/2018] [Indexed: 11/16/2022]
Abstract
Alzheimer’s disease (AD) is considered one of the 10 leading causes of death in humans. AD is considered the main cause of brain degeneration and leads to dementia. It is beneficial for affected patients to be diagnosed at an early stage so that efforts to manage the disease can begin as soon as possible. Most existing protocols diagnose AD by means of magnetic resonance imaging (MRI). However, because the images produced are large, existing MRI-based techniques are expensive and time-consuming. With this in mind, in the current study AD is instead predicted using a support vector machine (SVM) method based on the sequence information of gene-coding proteins. In the proposed method, the frequencies of pairs of consecutive amino acids are used to describe the sequence information. The experimental results show that the proposed method identifies AD with an accuracy of 85.7%, and that the sequence information of gene-coding proteins can be used to predict AD.
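The method describes each protein by the frequency of two consecutive amino acids, that is, its dipeptide composition. Below is a minimal sketch, assuming the 20 standard amino acids (giving a 400-dimensional vector) and that pairs containing any other letter are skipped; normalizing by the number of counted pairs is also an assumption, not a detail from the abstract. The resulting vectors could then be passed to an SVM such as scikit-learn's `SVC`.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence: str) -> List[float]:
    """Relative frequency of each of the 400 consecutive amino-acid pairs.

    Pairs containing a non-standard letter (e.g. X) are skipped, and the
    counts are normalized by the number of pairs actually counted.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


features = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
print(len(features), features[INDEX["AC"]], features[INDEX["CA"]])  # 400 features; AC = 2/3, CA = 1/3
```

A fixed-length vector is what lets proteins of very different lengths be compared by a single SVM.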
Affiliation(s)
- Lei Xu: School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China.
- Guangmin Liang: School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China.
- Changrui Liao: Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China.
- Gin-Den Chen: Department of Obstetrics and Gynecology, Chung Shan Medical University Hospital, Taichung 40201, Taiwan.
- Chi-Chang Chang: School of Medical Informatics, Chung Shan Medical University, Taichung 40201, Taiwan; IT Office, Chung Shan Medical University Hospital, Taichung 40201, Taiwan.
41
Tahir M, Jan B, Hayat M, Shah SU, Amin M. Efficient computational model for classification of protein localization images using Extended Threshold Adjacency Statistics and Support Vector Machines. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 157:205-215. [PMID: 29477429 DOI: 10.1016/j.cmpb.2018.01.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/13/2017] [Revised: 01/02/2018] [Accepted: 01/24/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Discriminative and informative feature extraction is the core requirement for accurate and efficient classification of protein subcellular localization images, which in turn can make drug development more effective. The objective of this paper is to propose a novel modification of the Threshold Adjacency Statistics (TAS) technique that enhances its discriminative power. METHODS In this work, we used Threshold Adjacency Statistics from a novel perspective to enhance its discriminative power and efficiency. Specifically, we used seven threshold ranges to produce seven distinct feature spaces, which are then used to train seven SVMs. The final prediction is obtained through a majority voting scheme. The proposed ETAS-SubLoc system is tested on two benchmark datasets using 5-fold cross-validation. RESULTS We observed that our novel use of the TAS technique improved the discriminative power of the classifier. The ETAS-SubLoc system achieved 99.2% accuracy, 99.3% sensitivity and 99.1% specificity on the Endogenous dataset, outperforming the classical Threshold Adjacency Statistics technique. Similarly, 91.8% accuracy, 96.3% sensitivity and 91.6% specificity were achieved on the Transfected dataset. CONCLUSIONS Simulation results validated the effectiveness of ETAS-SubLoc, which provides superior prediction performance compared with the existing technique. The proposed methodology aims to support the pharmaceutical industry and the research community in better drug design and innovation in bioinformatics and computational biology. The implementation code for replicating the experiments presented in this paper is available at: https://drive.google.com/file/d/0B7IyGPObWbSqRTRMcXI2bG5CZWs/view?usp=sharing.
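ETAS-SubLoc builds one feature space per threshold range, trains one SVM per space and combines them by majority voting. The sketch below is a simplified, hypothetical version of the two building blocks, not the paper's implementation: `adjacency_histogram` keeps pixels whose intensity lies in [lo, hi), counts how many of each kept pixel's 8 neighbours are also kept, and returns the normalized histogram of those counts (9 values); `majority_vote` returns the most frequent label. The paper's seven threshold ranges and exact feature layout are not given in the abstract, so none are assumed here.

```python
from collections import Counter
from typing import Hashable, Sequence

import numpy as np


def adjacency_histogram(image: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Simplified threshold adjacency statistic for one threshold range.

    Pixels with lo <= intensity < hi are kept; for each kept pixel, count how
    many of its 8 neighbours are also kept (0..8) and return the normalized
    histogram of those counts (9 values).
    """
    mask = (image >= lo) & (image < hi)
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    neighbours = np.zeros((h, w), dtype=int)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # Shifted view of the padded mask: neighbour at offset (dy, dx).
            neighbours += padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    counts = np.bincount(neighbours[mask], minlength=9).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Toy 3x3 image where every pixel falls in the range [1, 2):
# corners have 3 kept neighbours, edge pixels 5, the centre 8.
hist = adjacency_histogram(np.ones((3, 3)), 1.0, 2.0)
print(hist[3], hist[5], hist[8])
print(majority_vote(["nucleus", "golgi", "nucleus"]))  # nucleus
```

In a full pipeline, one histogram per threshold range would feed its own classifier, and `majority_vote` would combine the per-range predictions for each image.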
Affiliation(s)
- Muhammad Tahir: College of Computing and Informatics, Saudi Electronic University, Al-Madinah Branch, Saudi Arabia
- Bismillah Jan: Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan; Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar Campus, Pakistan
- Maqsood Hayat: Department of Computer Science, University College of Sciences, Shanker, Abdul Wali Khan University, Mardan, Pakistan.
- Shakir Ullah Shah: Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar Campus, Pakistan
- Muhammad Amin: Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar Campus, Pakistan
42
Li H, Zhang P, Yuan S, Tian H, Tian D, Liu M. Modeling analysis of the relationship between atherosclerosis and related inflammatory factors. Saudi J Biol Sci 2017; 24:1803-1809. [PMID: 29551927 PMCID: PMC5851939 DOI: 10.1016/j.sjbs.2017.11.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/30/2017] [Revised: 11/07/2017] [Accepted: 11/07/2017] [Indexed: 11/05/2022]
Abstract
Objective: To establish early diagnosis models of atherosclerosis (AS) based on inflammatory factors, providing theoretical evidence for the early detection of AS and plaque development. Methods: Serum samples were collected to detect the inflammatory factors CysC, Hcy, hs-CRP, UA, FIB, D-D, LP(a), IL-6, SAA, sCD40L and MDA. Logistic regression analysis was used to screen the inflammatory factors for modeling, and AS early diagnosis models were then established based on the receiver operating characteristic (ROC) curve, a support vector machine and a BP neural network, respectively. Results: There was no significant difference in general baseline data between the two groups. All 11 inflammatory factors had higher levels in the AS group than in the control group. As shown by the ROC curves, all inflammatory factors were helpful in AS diagnosis. In terms of sensitivity, UA ranked first (98) and FIB ranked last (55.5); in terms of specificity, UA ranked first (99) and FIB ranked last (78); in terms of area under the curve, UA and SAA ranked first (both 0.995) and FIB ranked last (0.721). Based on the logistic regression equation, six factors were screened out: Hcy, hs-CRP, IL-6, D-D, CysC and MDA. In the classification results, the final (sixth) step had a prediction accuracy of 99%. When the six inflammatory factors included in the logistic regression equation were detected jointly, the sensitivity, specificity and area under the curve were 57%, 97% and 0.821, respectively, while those of the model excluding D-D were 64%, 90% and 0.828, generally superior to the joint detection of all six factors. The ROC curve based on Hcy, hs-CRP and MDA had a sensitivity of 87%, a specificity of 94% and an area under the curve of 0.869, inferior to the ROC curve based on IL-6, D-D and CysC, whose values were 87%, 92% and 0.936, respectively. The accuracies of the SVM-based AS diagnosis model and the BP neural network model were 82.5% and 77.5%, respectively.
Conclusion: All 11 inflammatory factors are valuable in AS diagnosis. AS early diagnosis models based on Logistic regression analysis, ROC curve, support vector machine and BP neural network possess diagnostic value and can provide reference for clinical diagnosis.
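The abstract reports a sensitivity and specificity for each factor's ROC curve but not how the cut-off was chosen. One common choice, assumed here, is the cut-off that maximizes Youden's index J = sensitivity + specificity - 1. Below is a minimal sketch, assuming higher factor levels indicate AS (as reported for all 11 factors); the function name and data are illustrative.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Cut-off maximizing J = sensitivity + specificity - 1.

    A sample is called AS-positive when its factor level >= cut-off
    (labels: 1 = AS group, 0 = control group). Returns
    (cut-off, sensitivity, specificity) at the best J.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < c and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Illustrative serum levels for 3 controls and 3 AS patients.
print(youden_cutoff([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1]))  # (4.0, 1.0, 1.0)
```

Each candidate cut-off is one point on the ROC curve, so this search picks the point farthest above the chance diagonal.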
Affiliation(s)
- Huidong Li: Department of Hypertension, The Second Affiliated Hospital of Zhengzhou University, Henan Province, China
- Pei Zhang: Department of Hypertension, Henan Provincial People's Hospital, Henan Province, China
- Shuaifang Yuan: Department of Hypertension, Henan Provincial People's Hospital, Henan Province, China
- Huiyuan Tian: Department of Hypertension, Henan Provincial People's Hospital, Henan Province, China
- Dandan Tian: Department of Hypertension, Henan Provincial People's Hospital, Henan Province, China
- Min Liu: Department of Hypertension, Henan Provincial People's Hospital, Henan Province, China