Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tseng C, Lu C, Chang C, Chen G, Cheewakriangkrai C. Integration of data mining classification techniques and ensemble learning to identify risk factors and diagnose ovarian cancer recurrence. Artif Intell Med 2017;78:47-54. [DOI: 10.1016/j.artmed.2017.06.003] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 03/01/2017] [Revised: 05/30/2017] [Accepted: 06/04/2017] [Indexed: 11/22/2022]



Cited by Other Article(s)

Chang CY, Peng CH, Chen FY, Huang LY, Kuo CH, Chu TW, Liang YJ. The risk factors determined by four machine learning methods for the change of difference of bone mineral density in post-menopausal women after three years follow-up. Sci Rep 2024;14:23234. [PMID: 39369003 PMCID: PMC11455928 DOI: 10.1038/s41598-024-73799-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 09/20/2024] [Indexed: 10/07/2024] Open

Abstract

The prevalence of osteoporosis has increased drastically in recent years. It is not only the most frequent but also a major global public health problem due to its high morbidity. Many risk factors associated with osteoporosis have been identified. However, most studies have used traditional multiple linear regression (MLR) to explore these relationships. Recently, machine learning (Mach-L) has become a new modality for data analysis because it enables machines to learn from past data or experience without being explicitly programmed and can capture nonlinear relationships better. These methods have the potential to outperform conventional MLR in disease prediction. In the present study, we enrolled a Chinese postmenopausal cohort followed up for 4 years. The difference in T-score (δ-T score) was the dependent variable. Demographic, biochemical, and lifestyle information served as the independent variables. Our goals were to (1) compare the prediction accuracy of Mach-L and traditional MLR for δ-T score and (2) rank the importance of risk factors (independent variables) for predicting δ-T score. In total, 1698 postmenopausal women were enrolled from the MJ Health Database. Four Mach-L methods, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and stochastic gradient boosting (SGB), were used to construct models for predicting δ-BMD after four years of follow-up. The dataset was randomly divided into an 80% training dataset for model building and a 20% testing dataset for model testing. Ten-fold cross-validation was used for hyperparameter tuning. For each Mach-L method, the model with the lowest root mean square error on the validation dataset was regarded as the best model. The averaged metrics of the RF, SGB, NB, and XGBoost models were compared with those of the benchmark MLR model, which used the same training and testing datasets as the Mach-L methods.
Within each model, risk factors were ranked by priority, with 1 denoting the most critical risk factor and 22 the last selected one. In Pearson correlation analysis, age, education, BMI, HDL-C, and TSH were positively correlated with δ-T score, whereas plasma calcium level and baseline T-score were negatively correlated. All four Mach-L methods yielded lower prediction errors than MLR and were all convincing Mach-L models. From our results, education level was the most important factor for δ-T score, followed by DBP, smoking, SBP, UA, age, and LDL-C. All four Mach-L methods outperformed traditional MLR. Using Mach-L, the six most important risk factors selected were, from most to least important, DBP, SBP, UA, education level, TG, and sleeping hours. δ-T score was positively related to SBP, education level, UA, and TG and negatively related to DBP and sleeping hours in postmenopausal Chinese women.
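
The evaluation protocol described in this abstract (an 80/20 split, 10-fold cross-validation for hyperparameter tuning, and selection by lowest validation RMSE) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: every function name, the toy "shrunken mean" model, and the hyperparameter grid are assumptions made only for the example.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and return (train_indices, test_indices)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def k_folds(indices, k=10):
    """Partition indices into k disjoint, nearly equal validation folds."""
    return [indices[i::k] for i in range(k)]

def fit_shrunken_mean(y, shrink):
    """Hypothetical toy model: predict the training mean scaled by (1 - shrink)."""
    return (1.0 - shrink) * sum(y) / len(y)

def cross_validate(y, train_idx, shrink, k=10):
    """Mean validation RMSE across k folds for one hyperparameter value."""
    scores = []
    for fold in k_folds(train_idx, k):
        held_out = set(fold)
        fit_y = [y[i] for i in train_idx if i not in held_out]
        pred = fit_shrunken_mean(fit_y, shrink)
        scores.append(rmse([y[i] for i in fold], [pred] * len(fold)))
    return sum(scores) / len(scores)

def select_and_test(y, grid=(0.0, 0.25, 0.5), k=10, seed=0):
    """Tune on the 80% training part, then report RMSE on the 20% test part."""
    train_idx, test_idx = train_test_split(len(y), 0.2, seed)
    best = min(grid, key=lambda s: cross_validate(y, train_idx, s, k))
    pred = fit_shrunken_mean([y[i] for i in train_idx], best)
    return best, rmse([y[i] for i in test_idx], [pred] * len(test_idx))

# Synthetic data for demonstration only; not taken from the study.
rng = random.Random(1)
demo_y = [5.0 + rng.gauss(0, 1) for _ in range(200)]
print(select_and_test(demo_y))
```

In a real analysis the toy model would be replaced by the learners named in the abstract, and the same train and test indices would be reused for the MLR benchmark so that the comparison is like for like.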


Madakkatel I, Lumsden AL, Mulugeta A, Mäenpää J, Oehler MK, Hyppönen E. Large-scale analysis to identify risk factors for ovarian cancer. Int J Gynecol Cancer 2024:ijgc-2024-005424. [PMID: 39084694 DOI: 10.1136/ijgc-2024-005424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2024] Open

Abstract

OBJECTIVE

Ovarian cancer is characterized by late-stage diagnoses and poor prognosis. We aimed to identify factors that can inform prevention and early detection of ovarian cancer.

METHODS

We used a data-driven machine learning approach to identify predictors of epithelial ovarian cancer from 2920 input features measured 12.6 years (IQR 11.9 to 13.3 years) before diagnoses. Analyses included 221 732 female participants in the UK Biobank without a history of cancer. During the follow-up 1441 women developed ovarian cancer. For factors that contributed to model prediction, we used multivariate logistic regression to evaluate the association with ovarian cancer, with evidence for causality tested by Mendelian randomization (MR) analyses in the Ovarian Cancer Genetics Consortium (25 509 cases).

RESULTS

Greater parity and ever-use of oral contraception were associated with lower ovarian cancer risk (ever vs never OR 0.74, 95% CI 0.66 to 0.84). After adjustment for established risk factors, greater height, weight, and greater red blood cell distribution width were associated with increased ovarian cancer risk, while higher aspartate aminotransferase levels and mean corpuscular volume were associated with lower risk. MR analyses confirmed observational associations with anthropometric/adiposity traits (eg, body fat percentage per standard deviation (SD); OR inverse-variance weighted (ORIVW) 1.28, 95% CI 1.13 to 1.46) and aspartate aminotransferase (ORIVW 0.87, 95% CI 0.78 to 0.98). MR also provided genetic evidence for a protective association of higher total serum protein on ovarian cancer, higher lymphocyte count on serous and endometrioid ovarian cancer, and greater forced expiratory volume in 1 s on serous ovarian cancer among other findings.
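
The inverse-variance weighted odds ratios (ORIVW) quoted above come from combining per-variant effect estimates. As a rough sketch, assuming the common fixed-effect IVW estimator used in two-sample Mendelian randomization (the abstract does not say which variant the authors applied), the calculation looks like this. The function names and all input numbers are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal log-odds and its standard error.

    Each variant j contributes the ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    return beta, math.sqrt(1.0 / total)

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate into an odds ratio with an approximate 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical summary statistics for three variants (illustration only).
bx = [0.10, 0.08, 0.12]      # variant-exposure effects
by = [0.025, 0.018, 0.031]   # variant-outcome log-odds effects
se = [0.010, 0.012, 0.009]   # standard errors of the outcome effects
beta, beta_se = ivw_estimate(bx, by, se)
print(odds_ratio_with_ci(beta, beta_se))
```

An odds ratio above 1 with a confidence interval excluding 1 would be read as evidence that higher exposure raises risk, which is how figures such as "OR 1.28, 95% CI 1.13 to 1.46" in the abstract are interpreted.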

CONCLUSIONS

This study shows that certain risk factors for ovarian cancer are modifiable, suggesting that weight reduction and interventions to reduce the number of ovulations may provide potential for future prevention. We also identified blood biomarkers associated with ovarian cancer years before diagnoses, warranting further investigation.


Wang CK, Chang CY, Chu TW, Liang YJ. Using Machine Learning to Identify the Relationships between Demographic, Biochemical, and Lifestyle Parameters and Plasma Vitamin D Concentration in Healthy Premenopausal Chinese Women. Life (Basel) 2023;13:2257. [PMID: 38137858 PMCID: PMC10744461 DOI: 10.3390/life13122257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 11/15/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023] Open

Abstract

INTRODUCTION

Vitamin D plays a vital role in maintaining homeostasis and enhancing the absorption of calcium, an essential component for strengthening bones and preventing osteoporosis. There are many factors known to relate to plasma vitamin D concentration (PVDC). However, most of these studies were performed with traditional statistical methods. Nowadays, machine learning methods (Mach-L) have become new tools in medical research. In the present study, we used four Mach-L methods to explore the relationships between PVDC and demographic, biochemical, and lifestyle factors in a group of healthy premenopausal Chinese women. Our goals were as follows: (1) to evaluate and compare the predictive accuracy of Mach-L and MLR, and (2) to establish a hierarchy of the significance of the aforementioned factors related to PVDC.

METHODS

Five hundred ninety-three healthy Chinese women were enrolled. In total, there were 35 variables recorded, including demographic, biochemical, and lifestyle information. The dependent variable was 25-OH vitamin D (PVDC), and all other variables were the independent variables. Multiple linear regression (MLR) was regarded as the benchmark for comparison. Four Mach-L methods were applied (random forest (RF), stochastic gradient boosting (SGB), extreme gradient boosting (XGBoost), and elastic net). Each method would produce several estimation errors. The smaller these errors were, the better the model was.

RESULTS

In Pearson's correlation analysis, age, glycated hemoglobin, HDL-cholesterol, LDL-cholesterol, and hemoglobin were positively correlated with PVDC, whereas eGFR was negatively correlated with PVDC. The Mach-L methods yielded smaller estimation errors for all five parameters, which indicated that they were better methods than the MLR model. After averaging the importance percentage from the four Mach-L methods, a rank of importance could be obtained. Age was the most important factor, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
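
The ranking step described here, averaging each variable's importance percentage across the four Mach-L methods and sorting, can be illustrated with a short sketch. The helper name and every number below are hypothetical; they are not values reported by the study.

```python
def rank_by_mean_importance(importance_by_method):
    """Average per-feature importance across methods and sort descending.

    importance_by_method maps a method name to {feature: importance percent}.
    A feature missing from a method is treated as 0 importance for that method.
    """
    features = sorted({f for scores in importance_by_method.values() for f in scores})
    n_methods = len(importance_by_method)
    mean_importance = {
        f: sum(scores.get(f, 0.0) for scores in importance_by_method.values()) / n_methods
        for f in features
    }
    return sorted(mean_importance.items(), key=lambda item: item[1], reverse=True)

# Made-up importance percentages for three variables (illustration only).
example = {
    "RF": {"age": 100.0, "insulin": 60.0, "TSH": 40.0},
    "SGB": {"age": 90.0, "insulin": 70.0, "TSH": 30.0},
    "XGBoost": {"age": 100.0, "insulin": 50.0, "TSH": 55.0},
    "elastic net": {"age": 80.0, "insulin": 65.0, "TSH": 45.0},
}
for feature, score in rank_by_mean_importance(example):
    print(f"{feature}: {score:.1f}")
```

Averaging assumes the importance scores of the different methods are on a comparable scale, for example each normalized so that the top variable scores 100; whether the authors normalized this way is not stated in the abstract.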

CONCLUSIONS

In a healthy Chinese premenopausal cohort using four different Mach-L methods, age was found to be the most important factor related to PVDC, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.


Yang CC, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Hsia TL, Lin CY. Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes. World J Clin Cases 2023;11:7951-7964. [DOI: 10.12998/wjcc.v11.i33.7951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/23/2023] [Accepted: 11/13/2023] [Indexed: 11/24/2023] Open

Tzou SJ, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Chu TW. Comparison between linear regression and four different machine learning methods in selecting risk factors for osteoporosis in a Chinese female aged cohort. J Chin Med Assoc 2023;86:1028-1036. [PMID: 37729604 DOI: 10.1097/jcma.0000000000000999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023] Open

Abstract

BACKGROUND

Population aging is emerging as an increasingly acute challenge for countries around the world. One particular manifestation of this phenomenon is the impact of osteoporosis on individuals and national health systems. Previous studies of risk factors for osteoporosis were conducted using traditional statistical methods, but more recent efforts have turned to machine learning approaches. Most such efforts, however, treat the target variable (bone mineral density [BMD] or fracture rate) as a categorical one, which provides no quantitative information. The present study uses five different machine learning methods to analyze the risk factors for T-score of BMD, seeking to (1) compare the prediction accuracy between different machine learning methods and traditional multiple linear regression (MLR) and (2) rank the importance of 25 different risk factors.

METHODS

The study sample includes 24 412 women older than 55 years with 25 related variables, applying traditional MLR and five different machine learning methods: classification and regression tree, Naïve Bayes, random forest, stochastic gradient boosting, and eXtreme gradient boosting. The metrics used for model performance comparisons are the symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error.
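
For readers unfamiliar with the four error metrics named above, the sketch below computes them under commonly used definitions. These definitions are an assumption: SMAPE in particular has several variants, and the abstract does not say which one the authors used. The function name is hypothetical.

```python
import math

def regression_errors(y_true, y_pred):
    """Return SMAPE (%), RAE, RRSE, and RMSE for paired observations.

    Assumes y_true is not constant and that no pair has both values equal to 0,
    since either case would make a denominator zero.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    # Symmetric mean absolute percentage error.
    smape = 100.0 / n * sum(
        e / ((abs(t) + abs(p)) / 2) for e, t, p in zip(abs_err, y_true, y_pred)
    )
    # Relative absolute error: total error relative to always predicting the mean.
    rae = sum(abs_err) / sum(abs(t - mean_y) for t in y_true)
    # Root relative squared error: squared-error analogue of RAE.
    rrse = math.sqrt(sum(sq_err) / sum((t - mean_y) ** 2 for t in y_true))
    # Root mean squared error, in the units of the target variable.
    rmse = math.sqrt(sum(sq_err) / n)
    return {"SMAPE": smape, "RAE": rae, "RRSE": rrse, "RMSE": rmse}

# Made-up T-score-like values (illustration only).
print(regression_errors([-1.2, -2.5, -0.8, -1.9], [-1.0, -2.2, -1.1, -1.7]))
```

RAE and RRSE equal 1 for a model that always predicts the mean of the observed values, so values below 1 indicate an improvement over that trivial baseline; lower is better for all four metrics.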

RESULTS

Machine learning approaches outperformed MLR for all four prediction errors. The average importance ranking of each factor generated by the machine learning methods indicates that age is the most important factor determining T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level.

CONCLUSION

In a group of women older than 55 years, we demonstrated that machine learning methods provide superior performance in estimating T-Score, with age being the most important impact factor, followed by eGFR, BMI, UA, and education level.


Jiang Y, Wang C, Zhou S. Artificial intelligence-based risk stratification, accurate diagnosis and treatment prediction in gynecologic oncology. Semin Cancer Biol 2023;96:82-99. [PMID: 37783319 DOI: 10.1016/j.semcancer.2023.09.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 08/27/2023] [Accepted: 09/25/2023] [Indexed: 10/04/2023]

Chen CH, Wang CK, Wang CY, Chang CF, Chu TW. Roles of biochemistry data, lifestyle, and inflammation in identifying abnormal renal function in old Chinese. World J Clin Cases 2023;11:7004-7016. [PMID: 37946770 PMCID: PMC10631406 DOI: 10.12998/wjcc.v11.i29.7004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/05/2023] [Revised: 08/01/2023] [Accepted: 09/11/2023] [Indexed: 10/13/2023] Open

Tsai MH, Jhou MJ, Liu TC, Fang YW, Lu CJ. An integrated machine learning predictive scheme for longitudinal laboratory data to evaluate the factors determining renal function changes in patients with different chronic kidney disease stages. Front Med (Lausanne) 2023;10:1155426. [PMID: 37859858 PMCID: PMC10582636 DOI: 10.3389/fmed.2023.1155426] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/03/2023] [Accepted: 09/19/2023] [Indexed: 10/21/2023] Open

Abstract

Background and objectives

Chronic kidney disease (CKD) is a global health concern. This study aims to identify key factors associated with renal function changes using the proposed machine learning and important variable selection (ML&IVS) scheme on longitudinal laboratory data. The goal is to predict changes in the estimated glomerular filtration rate (eGFR) in a cohort of patients with CKD stages 3-5.

Design

A retrospective cohort study.

Setting and participants

A total of 710 outpatients who presented with stable nondialysis-dependent CKD stages 3-5 at the Shin-Kong Wu Ho-Su Memorial Hospital Medical Center from 2016 to 2021.

Methods

This study analyzed trimonthly laboratory data including 47 indicators. The proposed scheme used stochastic gradient boosting, multivariate adaptive regression splines, random forest, eXtreme gradient boosting, and light gradient boosting machine algorithms to evaluate the important factors for predicting the results of the fourth eGFR examination, especially in patients with CKD stage 3 and those with CKD stages 4-5, with or without diabetes mellitus (DM).

Main outcome measurement

Subsequent eGFR level after three consecutive laboratory data assessments.

Results

Our ML&IVS scheme demonstrated superior predictive capabilities and identified significant factors contributing to renal function changes in various CKD groups. The latest levels of eGFR, blood urea nitrogen (BUN), proteinuria, sodium, and systolic blood pressure as well as mean levels of eGFR, BUN, proteinuria, and triglyceride were the top 10 significantly important factors for predicting the subsequent eGFR level in patients with CKD stages 3-5. In individuals with DM, the latest levels of BUN and proteinuria, mean levels of phosphate and proteinuria, and variations in diastolic blood pressure levels emerged as important factors for predicting the decline of renal function. In individuals without DM, all phosphate patterns and latest albumin levels were found to be key factors in the advanced CKD group. Moreover, proteinuria was identified as an important factor in the CKD stage 3 group without DM and CKD stages 4-5 group with DM.

Conclusion

The proposed scheme highlighted factors associated with renal function changes in different CKD conditions, offering valuable insights to physicians for raising awareness about renal function changes.


Wu CZ, Huang LY, Chen FY, Kuo CH, Yeih DF. Using Machine Learning to Predict Abnormal Carotid Intima-Media Thickness in Type 2 Diabetes. Diagnostics (Basel) 2023;13:diagnostics13111834. [PMID: 37296685 DOI: 10.3390/diagnostics13111834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/16/2023] [Accepted: 05/20/2023] [Indexed: 06/12/2023] Open

Huang HH, Hsieh SJ, Chen MS, Jhou MJ, Liu TC, Shen HL, Yang CT, Hung CC, Yu YY, Lu CJ. Machine Learning Predictive Models for Evaluating Risk Factors Affecting Sperm Count: Predictions Based on Health Screening Indicators. J Clin Med 2023;12:1220. [PMID: 36769868 PMCID: PMC9917545 DOI: 10.3390/jcm12031220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 01/13/2023] [Accepted: 02/01/2023] [Indexed: 02/05/2023] Open

Abstract

In many countries, especially developed nations, the fertility rate and birth rate have continually declined. Taiwan's fertility rate has paralleled this trend and reached its nadir in 2022. Therefore, the government uses many strategies to encourage more married couples to have children. However, couples marrying at an older age may have declining physical status, as well as hypertension and other metabolic syndrome symptoms, in addition to possibly being overweight, which have been the focus of the studies for their influences on male and female gamete quality. Many previous studies based on infertile people are not truly representative of the general population. This study proposed a framework using five machine learning (ML) predictive algorithms (random forest, stochastic gradient boosting, least absolute shrinkage and selection operator regression, ridge regression, and extreme gradient boosting) to identify the major risk factors affecting male sperm count based on a major health screening database in Taiwan. Unlike traditional multiple linear regression, ML algorithms do not need statistical assumptions and can capture non-linear relationships or complex interactions between dependent and independent variables to generate promising performance. We analyzed annual health screening data of 1375 males from 2010 to 2017, including data on health screening indicators, sourced from the MJ Group, a major health screening center in Taiwan. The symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error were used as performance evaluation metrics. Our results show that sleep time (ST), alpha-fetoprotein (AFP), body fat (BF), systolic blood pressure (SBP), and blood urea nitrogen (BUN) are the top five risk factors associated with sperm count. ST is a known risk factor influencing reproductive hormone balance, which can affect spermatogenesis and final sperm count.
BF and SBP are risk factors associated with metabolic syndrome, another known risk factor of altered male reproductive hormone systems. However, AFP has not been the focus of previous studies on male fertility or semen quality. BUN, the index for kidney function, is also identified as a risk factor by our established ML model. Our results support previous findings that metabolic syndrome has negative impacts on sperm count and semen quality. Sleep duration also has an impact on sperm generation in the testes. AFP and BUN are two novel risk factors linked to sperm counts. These findings could help healthcare personnel and law makers create strategies for creating environments to increase the country's fertility rate. This study should also be of value to follow-up research.


Fiste O, Liontos M, Zagouri F, Stamatakos G, Dimopoulos MA. Machine learning applications in gynecological cancer: A critical review. Crit Rev Oncol Hematol 2022;179:103808. [PMID: 36087852 DOI: 10.1016/j.critrevonc.2022.103808] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/02/2022] [Revised: 08/18/2022] [Accepted: 09/05/2022] [Indexed: 11/30/2022] Open

Ahmed MIB, Alotaibi S, Atta-ur-Rahman, Dash S, Nabil M, AlTurki AO. A Review on Machine Learning Approaches in Identification of Pediatric Epilepsy. SN COMPUTER SCIENCE 2022;3:437. [PMID: 35965953 PMCID: PMC9364307 DOI: 10.1007/s42979-022-01358-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Accepted: 06/26/2022] [Indexed: 10/26/2022]

Comparison between Machine Learning and Multiple Linear Regression to Identify Abnormal Thallium Myocardial Perfusion Scan in Chinese Type 2 Diabetes. Diagnostics (Basel) 2022;12:diagnostics12071619. [PMID: 35885524 PMCID: PMC9324130 DOI: 10.3390/diagnostics12071619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 11/17/2022] Open

Huang LY, Chen FY, Jhou MJ, Kuo CH, Wu CZ, Lu CH, Chen YL, Pei D, Cheng YF, Lu CJ. Comparing Multiple Linear Regression and Machine Learning in Predicting Diabetic Urine Albumin-Creatinine Ratio in a 4-Year Follow-Up Study. J Clin Med 2022;11:3661. [PMID: 35806944 PMCID: PMC9267784 DOI: 10.3390/jcm11133661] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/29/2022] [Revised: 06/19/2022] [Accepted: 06/22/2022] [Indexed: 02/07/2023] Open

Affiliation(s)

Li-Ying Huang: Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
Fang-Yu Chen: Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
Mao-Jhen Jhou: Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
Chun-Heng Kuo: Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
Chung-Ze Wu: Division of Endocrinology, Department of Internal Medicine, Shuang Ho Hospital, New Taipei City 23561, Taiwan; Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
Chieh-Hua Lu: Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, School of Medicine, National Defense Medical Center, Taipei 11490, Taiwan
Yen-Lin Chen: Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei 11490, Taiwan
Dee Pei: Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
Yu-Fang Cheng: Department of Endocrinology and Metabolism, Changhua Christian Hospital, Changhua 50051, Taiwan
Chi-Jie Lu: Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan; Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan; Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan


Famitha S, Moorthi M. Intelligent and novel multi-type cancer prediction model using optimized ensemble learning. Comput Methods Biomech Biomed Engin 2022;25:1879-1903. [PMID: 35695463 DOI: 10.1080/10255842.2022.2081504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]

Nan N. Integration and Development of Enterprise Internal Audit and Big Data Based on Data Mining Technology. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:8138046. [PMID: 35498211 PMCID: PMC9054413 DOI: 10.1155/2022/8138046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Accepted: 03/28/2022] [Indexed: 11/20/2022]

Kao HY, Chang CC, Chang CF, Chen YC, Cheewakriangkrai C, Tu YL. Associations between Sex and Risk Factors for Predicting Chronic Kidney Disease. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022;19:1219. [PMID: 35162242 PMCID: PMC8835286 DOI: 10.3390/ijerph19031219] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/22/2021] [Revised: 01/12/2022] [Accepted: 01/19/2022] [Indexed: 11/16/2022]

Abstract

Gender is an important risk factor in predicting chronic kidney disease (CKD); however, it is under-researched. The purpose of this study was to examine whether gender differences affect the risk factors of early CKD prediction. This study used data from 19,270 adult health screenings, including 5101 with CKD. Eleven independent variables were selected as candidate risk factors, their significance was assessed with the Chi-square test, and seven machine learning techniques were used to train the predictive models. Performance indicators included classification accuracy, sensitivity, specificity, and precision. Class imbalance was addressed using three extraction methods: manual sampling, the synthetic minority oversampling technique, and SpreadSubsample. The Chi-square test revealed statistically significant results (p < 0.001) for gender, age, red blood cell count in urine, urine protein (PRO) content, and the PRO-to-urinary creatinine ratio. In terms of classifier prediction performance, logistic regression with the manual extraction method exhibited the highest average prediction accuracy (0.8053) for men, whereas linear discriminant analysis with the manual extraction method demonstrated the highest average prediction accuracy (0.8485) for women. The clinical features of a normal or abnormal PRO-to-urinary creatinine ratio indicated that PRO ratio, age, and urine red blood cell count are the most important risk factors with which to predict CKD in both genders. As a result, this study proposes a prediction model with acceptable prediction accuracy. The model supports doctors in diagnosis and treatment and achieves the goal of early detection and treatment. Grounded in evidence-based medicine, this study used machine learning methods to develop the predictive model. The model has been shown to support the prediction of early clinical risk of CKD, with the aim of improving the efficacy and quality of clinical decision making.
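
The four performance indicators listed in this abstract all derive from the binary confusion matrix. The sketch below shows the standard definitions; the function name and the example labels are hypothetical and unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, and precision for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    def ratio(num, den):
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }

# Made-up labels: 1 = CKD, 0 = no CKD (illustration only).
truth = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(classification_metrics(truth, predicted))
```

With imbalanced classes, as in a cohort where roughly a quarter of participants have CKD, accuracy alone can look high for a model that rarely predicts the positive class, which is one reason sensitivity and precision are usually reported alongside it.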


Sato M, Sato S, Shintani D, Hanaoka M, Ogasawara A, Miwa M, Yabuno A, Kurosaki A, Yoshida H, Fujiwara K, Hasegawa K. Clinical significance of metabolism-related genes and FAK activity in ovarian high-grade serous carcinoma. BMC Cancer 2022;22:59. [PMID: 35027024 PMCID: PMC8756654 DOI: 10.1186/s12885-021-09148-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/11/2021] [Accepted: 12/22/2021] [Indexed: 12/13/2022] Open

Abstract

BACKGROUND

Administration of poly (ADP-ribose) polymerase (PARP) inhibitors after achieving a response to platinum-containing drugs significantly prolonged relapse-free survival compared to placebo administration. PARP inhibitors have been used in clinical practice. However, patients with platinum-resistant relapsed ovarian cancer still have a poor prognosis and there is an unmet need. The purpose of this study was to examine the clinical significance of metabolic genes and focal adhesion kinase (FAK) activity in advanced ovarian high-grade serous carcinoma (HGSC).

METHODS

The RNA sequencing (RNA-seq) data and clinical data of HGSC patients were obtained from the Genomic Data Commons (GDC) Data Portal and analysed ( https://portal.gdc.cancer.gov/ ). In addition, tumour tissue was sampled by laparotomy or screening laparoscopy prior to treatment initiation from patients diagnosed with stage IIIC ovarian cancer (International Federation of Gynecology and Obstetrics (FIGO) classification, 2014) at the Saitama Medical University International Medical Center, and among the patients diagnosed with HGSC, 16 cases of available cryopreserved specimens were included in this study. The present study was reviewed and approved by the Institutional Review Board of Saitama Medical University International Medical Center (Saitama, Japan). Among the 6307 variable genes detected in both The Cancer Genome Atlas-Ovarian (TCGA-OV) data and clinical specimen data, 35 genes related to metabolism and FAK activity were applied. RNA-seq data were analysed using the Subio Platform (Subio Inc, Japan). JMP 15 (SAS, USA) was used for statistical analysis and various types of machine learning. The Kaplan-Meier method was used for survival analysis, and the Wilcoxon test was used to analyse significant differences. P < 0.05 was considered significant.

RESULTS

In the TCGA-OV data, patients with stage IIIC with a residual tumour diameter of 1-10 mm were selected for K means clustering and classified into groups with significant prognostic correlations (p = 0.0444). These groups were significantly associated with platinum sensitivity/resistance in clinical cases (χ2 test, p = 0.0408) and showed significant relationships with progression-free survival (p = 0.0307).

CONCLUSION

In the TCGA-OV data, 2 groups classified by clustering focusing on metabolism-related genes and FAK activity were shown to be associated with platinum resistance and a poor prognosis.
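The clustering step described in this abstract can be illustrated with a minimal sketch of k-means (Lloyd's algorithm). This is not the JMP 15 implementation the authors used: the function names, the deterministic choice of the first k points as starting centroids, and the toy two-dimensional points are all assumptions made for illustration, whereas the study clustered patients on 35 gene-expression values.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm.

    Assumption: the first k points serve as initial centroids so the
    result is deterministic; real tools usually randomise this step.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical toy data: two well-separated groups of "patients".
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(toy, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In a study like the one above, the resulting group labels would then be compared against outcomes such as progression-free survival; that downstream analysis is not shown here.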


An Integrated Approach for Cancer Survival Prediction Using Data Mining Techniques. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2021:6342226. [PMID: 34992648 PMCID: PMC8727098 DOI: 10.1155/2021/6342226] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Accepted: 11/27/2021] [Indexed: 12/31/2022]

Wu ZS, Huang SM, Wang YC. Palmitate Enhances the Efficacy of Cisplatin and Doxorubicin against Human Endometrial Carcinoma Cells. Int J Mol Sci 2021;23:ijms23010080. [PMID: 35008502 PMCID: PMC8744704 DOI: 10.3390/ijms23010080] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2021] [Revised: 12/14/2021] [Accepted: 12/17/2021] [Indexed: 12/13/2022] Open

Su K, Wu J, Gu D, Yang S, Deng S, Khakimova AK. An Adaptive Deep Ensemble Learning Method for Dynamic Evolving Diagnostic Task Scenarios. Diagnostics (Basel) 2021;11:2288. [PMID: 34943525 PMCID: PMC8700766 DOI: 10.3390/diagnostics11122288] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Revised: 12/04/2021] [Accepted: 12/06/2021] [Indexed: 12/19/2022] Open

Akazawa M, Hashimoto K. Artificial intelligence in gynecologic cancers: Current status and future challenges - A systematic review. Artif Intell Med 2021;120:102164. [PMID: 34629152 DOI: 10.1016/j.artmed.2021.102164] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Revised: 05/28/2021] [Accepted: 08/31/2021] [Indexed: 11/30/2022]

Abstract

OBJECTIVE

Over the past years, the application of artificial intelligence (AI) in medicine has increased rapidly, especially in diagnostics, and in the near future, the role of AI in medicine will become progressively more important. In this study, we elucidated the state of AI research on gynecologic cancers.

METHODS

A search was conducted in three databases (PubMed, Web of Science, and Scopus) for research papers dated between January 2010 and December 2020. As keywords, we used "artificial intelligence," "deep learning," "machine learning," and "neural network," combined with "cervical cancer," "endometrial cancer," "uterine cancer," and "ovarian cancer." We excluded genomic and molecular research, as well as automated pap-smear diagnoses and digital colposcopy.

RESULTS

Of 1632 articles, 71 were eligible, including 34 on cervical cancer, 13 on endometrial cancer, three on uterine sarcoma, and 21 on ovarian cancer. A total of 35 studies (49%) used imaging data and 36 studies (51%) used value-based data as the input data. Magnetic resonance imaging (MRI), computed tomography (CT), ultrasound, cytology, and hysteroscopy data were used as imaging data, and the patients' backgrounds, blood examinations, tumor markers, and indices in pathological examination were used as value-based data. The targets of prediction were definitive diagnosis and prognostic outcome, including overall survival and lymph node metastasis. The size of the dataset was relatively small because 64 studies (90%) included fewer than 1000 cases, and the median size was 214 cases. The models were evaluated by accuracy scores, area under the receiver operating characteristic curve (AUC), and sensitivity/specificity. Owing to the heterogeneity, a quantitative synthesis was not appropriate in this review.

CONCLUSIONS

In gynecologic oncology, more studies have been conducted on cervical cancer than on ovarian and endometrial cancers. Prognoses were mainly used in the study of cervical cancer, whereas diagnoses were primarily used for studying ovarian cancer. The proficiency of the study design for endometrial cancer and uterine sarcoma was unclear because of the small number of studies conducted. The small size of the dataset and the lack of a dataset for external validation were indicated as the challenges of the studies.
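The evaluation measures named in this review (accuracy, sensitivity/specificity, and AUC) can be computed from labels and scores as in the following minimal sketch. The function names, the 0.5 decision threshold, and the eight example cases are illustrative assumptions, not data from any reviewed study. The AUC is computed here as the fraction of positive/negative pairs ranked correctly, with ties counted as half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives correctly ruled out."""
    return tn / (tn + fp)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 4 positive and 4 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, tn, fn))  # 0.75
print(sensitivity(tp, fn))       # 0.75
print(specificity(tn, fp))       # 0.75
print(auc(y_true, scores))       # 0.9375
```

Unlike the other three measures, the AUC does not depend on the chosen threshold, which is one reason it is often reported alongside them.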


Risk Prediction of Second Primary Endometrial Cancer in Obese Women: A Hospital-Based Cancer Registry Study. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021;18:ijerph18178997. [PMID: 34501584 PMCID: PMC8431143 DOI: 10.3390/ijerph18178997] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/13/2021] [Revised: 08/06/2021] [Accepted: 08/16/2021] [Indexed: 12/15/2022]

Spatial Prediction of Groundwater Potentiality in Large Semi-Arid and Karstic Mountainous Region Using Machine Learning Models. WATER 2021. [DOI: 10.3390/w13162273] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Kuo CC, Wang HH, Tseng LP. Using data mining technology to predict medication-taking behaviour in women with breast cancer: A retrospective study. Nurs Open 2021;9:2646-2656. [PMID: 34156764 PMCID: PMC9584494 DOI: 10.1002/nop2.963] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2020] [Revised: 05/10/2021] [Accepted: 05/27/2021] [Indexed: 11/19/2022] Open

Applying Artificial Intelligence to Gynecologic Oncology: A Review. Obstet Gynecol Surv 2021;76:292-301. [PMID: 34032861 DOI: 10.1097/ogx.0000000000000902] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Abstract

Importance

Artificial intelligence (AI) will play an increasing role in health care. In gynecologic oncology, it can advance tailored screening, precision surgery, and personalized targeted therapies.

Objective

The aim of this study was to review the role of AI in gynecologic oncology.

Evidence Acquisition

Artificial intelligence publications in gynecologic oncology were identified by searching "gynecologic oncology AND artificial intelligence" in the PubMed database. A review of the literature was performed on the history of AI, its fundamentals, and current applications as related to diagnosis and treatment of cervical, uterine, and ovarian cancers.

Results

A PubMed literature search since the year 2000 showed a significant increase in oncology publications related to AI and oncology. Early studies focused on using AI to interrogate electronic health records in order to improve clinical outcome and facilitate clinical research. In cervical cancer, AI algorithms can enhance image analysis of cytology and visual inspection with acetic acid or colposcopy. In uterine cancers, AI can improve the diagnostic accuracies of radiologic imaging and predictive/prognostic capabilities of clinicopathologic characteristics. Artificial intelligence has also been used to better detect early-stage ovarian cancer and predict surgical outcomes and treatment response.

Conclusions and Relevance

Artificial intelligence has been shown to enhance diagnosis, refine clinical decision making, and advance personalized therapies in gynecologic cancers. The rapid adoption of AI in gynecologic oncology will depend on overcoming the challenges related to data transparency, quality, and interpretation. Artificial intelligence is rapidly transforming health care. However, many physicians are unaware that this technology is being used in their practices and could benefit from a better understanding of the statistics and computer science behind these algorithms. This review provides a summary of AI, its applicability, and its limitations in gynecologic oncology.


Lu H, Gao H, Ye M, Wang X. A Hybrid Ensemble Algorithm Combining AdaBoost and Genetic Algorithm for Cancer Classification with Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:863-870. [PMID: 31722484 DOI: 10.1109/tcbb.2019.2952102] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Sathipati SY, Ho SY. Identification of the miRNA signature associated with survival in patients with ovarian cancer. Aging (Albany NY) 2021;13:12660-12690. [PMID: 33910165 PMCID: PMC8148489 DOI: 10.18632/aging.202940] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2020] [Accepted: 03/23/2021] [Indexed: 12/22/2022]

Hybrid Basketball Game Outcome Prediction Model by Integrating Data Mining Methods for the National Basketball Association. ENTROPY 2021;23:e23040477. [PMID: 33920720 PMCID: PMC8073849 DOI: 10.3390/e23040477] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 04/08/2021] [Accepted: 04/14/2021] [Indexed: 12/18/2022]

Liu Y, Xu B, Liu M, Qiao H, Zhang S, Qiu J, Ying X. Long non-coding RNA SNHG25 promotes epithelial ovarian cancer progression by up-regulating COMP. J Cancer 2021;12:1660-1668. [PMID: 33613753 PMCID: PMC7890321 DOI: 10.7150/jca.47344] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2020] [Accepted: 12/24/2020] [Indexed: 12/11/2022] Open

Shih CC, Lu CJ, Chen GD, Chang CC. Risk Prediction for Early Chronic Kidney Disease: Results from an Adult Health Examination Program of 19,270 Individuals. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020;17:ijerph17144973. [PMID: 32664271 PMCID: PMC7399976 DOI: 10.3390/ijerph17144973] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/30/2020] [Revised: 07/06/2020] [Accepted: 07/07/2020] [Indexed: 12/13/2022]

Landslide Susceptibility Prediction Considering Regional Soil Erosion Based on Machine-Learning Models. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2020. [DOI: 10.3390/ijgi9060377] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Abstract

Soil erosion (SE) provides slide mass sources for landslide formation, and reflects long-term rainfall erosion destruction of landslides. Therefore, it is possible to obtain more reliable landslide susceptibility prediction results by introducing SE as a geology and hydrology-related predisposing factor. The Ningdu County of China is taken as a research area. Firstly, 446 landslides are obtained through government disaster survey reports. Secondly, the SE amount in Ningdu County is calculated and nine other conventional predisposing factors are obtained under both 30 m and 60 m grid resolutions to determine the effects of SE on landslide susceptibility prediction. Thirdly, four types of machine-learning predictors with 30 m and 60 m grid resolutions, namely C5.0 decision tree (C5.0 DT), logistic regression (LR), multilayer perceptron (MLP) and support vector machine (SVM), are applied to construct the landslide susceptibility prediction models considering the SE factor as SE-C5.0 DT, SE-LR, SE-MLP and SE-SVM models; C5.0 DT, LR, MLP and SVM models with no SE are also used for comparisons. Finally, the area under the receiver operating characteristic curve is used to verify the prediction accuracy of these models, and the relative importance of all the 10 predisposing factors is ranked. The results indicate that: (1) SE factor plays the most important role in landslide susceptibility prediction among all 10 predisposing factors under both 30 m and 60 m resolutions; (2) the SE-based models have more accurate landslide susceptibility prediction than the single models with no SE factor; (3) all the models with 30 m resolutions have higher landslide susceptibility prediction accuracy than those with 60 m resolutions; and (4) the C5.0 DT and SVM models show higher landslide susceptibility prediction performance than the MLP and LR models.

Doja M, Kaur I, Ahmad T. Age-specific survival in prostate cancer using machine learning. DATA TECHNOLOGIES AND APPLICATIONS 2020. [DOI: 10.1108/dta-10-2019-0189] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/09/2022]

Landslide Susceptibility Prediction Modeling Based on Remote Sensing and a Novel Deep Learning Algorithm of a Cascade-Parallel Recurrent Neural Network. SENSORS 2020;20:s20061576. [PMID: 32178235 PMCID: PMC7146231 DOI: 10.3390/s20061576] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/27/2020] [Revised: 02/26/2020] [Accepted: 03/06/2020] [Indexed: 11/17/2022]

Abstract

Landslide susceptibility prediction (LSP) modeling is an important and challenging problem. Landslide features are generally uncorrelated or nonlinearly correlated, resulting in limited LSP performance when leveraging conventional machine learning models. In this study, a deep-learning-based model using the long short-term memory (LSTM) recurrent neural network and conditional random field (CRF) in cascade-parallel form was proposed for making LSPs based on remote sensing (RS) images and a geographic information system (GIS). The RS images are the main data sources of landslide-related environmental factors, and a GIS is used to analyze, store, and display spatial big data. The cascade-parallel LSTM-CRF consists of frequency ratio values of environmental factors in the input layers, cascade-parallel LSTM for feature extraction in the hidden layers, and cascade-parallel full connection for classification and CRF for landslide/non-landslide state modeling in the output layers. The cascade-parallel form of LSTM can extract features from different layers and merge them into concrete features. The CRF is used to calculate the energy relationship between two grid points, and the extracted features are further smoothed and optimized. As a case study, the cascade-parallel LSTM-CRF was applied to Shicheng County of Jiangxi Province in China. A total of 2709 landslide grid cells were recorded and 2709 non-landslide grid cells were randomly selected from the study area. The results show that, compared with existing main traditional machine learning algorithms, such as multilayer perceptron, logistic regression, and decision tree, the proposed cascade-parallel LSTM-CRF had a higher landslide prediction rate (positive predictive rate: 72.44%, negative predictive rate: 80%, total predictive rate: 75.67%). In conclusion, the proposed cascade-parallel LSTM-CRF is a novel data-driven deep learning model that overcomes the limitations of traditional machine learning algorithms and achieves promising results for making LSPs.


Li X, Zhao L, Meng T. Upregulated CXCL14 is associated with poor survival outcomes and promotes ovarian cancer cells proliferation. Cell Biochem Funct 2020;38:613-620. [PMID: 32077118 DOI: 10.1002/cbf.3516] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2019] [Revised: 01/24/2020] [Accepted: 02/05/2020] [Indexed: 12/12/2022]

Developing a Novel Machine Learning-Based Classification Scheme for Predicting SPCs in Colorectal Cancer Survivors. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10041355] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]

Talaei-Khoei A, Tavana M, Wilson JM. A predictive analytics framework for identifying patients at risk of developing multiple medical complications caused by chronic diseases. Artif Intell Med 2019;101:101750. [PMID: 31813486 DOI: 10.1016/j.artmed.2019.101750] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2018] [Revised: 07/07/2019] [Accepted: 10/30/2019] [Indexed: 01/22/2023]

Chang CC, Chen SH. Developing a Novel Machine Learning-Based Classification Scheme for Predicting SPCs in Breast Cancer Survivors. Front Genet 2019;10:848. [PMID: 31620166 PMCID: PMC6759630 DOI: 10.3389/fgene.2019.00848] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2019] [Accepted: 08/14/2019] [Indexed: 11/13/2022] Open

Ye X, Li H, Sakurai T, Shueng PW. Ensemble Feature Learning to Identify Risk Factors for Predicting Secondary Cancer. Int J Med Sci 2019;16:949-959. [PMID: 31341408 PMCID: PMC6643128 DOI: 10.7150/ijms.33820] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/03/2019] [Accepted: 04/24/2019] [Indexed: 12/11/2022] Open

Abstract

Background: In recent years, the development and diagnosis of secondary cancer have become the primary concern of cancer survivors. A number of studies have developed strategies to extract knowledge from clinical data, aiming to identify important risk factors that can be used to prevent the recurrence of diseases. However, these studies do not focus on secondary cancer, which lacks strategies for clinical treatment as well as for risk factor identification to prevent its occurrence. Methods: We propose an effective ensemble feature learning method to identify the risk factors for predicting secondary cancer by considering class imbalance and patient heterogeneity. We first divide the patients into several heterogeneous groups based on spectral clustering. In each group, we apply the oversampling method to balance the number of samples in each class and use them as training data for ensemble feature learning. The purpose of ensemble feature learning is to identify the risk factors and construct a diagnosis model for each group. The importance of risk factors is measured based on the properties of patients in each group separately. We predict secondary cancer by assigning the patient to a corresponding group and applying the diagnosis model of that group. Results: Analysis of the results shows that, among the three classifiers, the decision tree obtains the best results for predicting secondary cancer. The best results of the decision tree are 0.72 in terms of AUC when dividing the patients into 15 groups and 0.38 in terms of F₁ score when dividing the patients into 20 groups. In terms of AUC, the decision tree achieves a 67.4% improvement compared to using all 20 predictor variables and a 28.6% improvement compared to no group division. In terms of F₁ score, the decision tree achieves a 216.7% improvement compared to using all 20 predictor variables and an 80.9% improvement compared to no group division. Different groups provide different ranking results for the predictor variables. Conclusion: The accuracies of predicting secondary cancer using k-nearest neighbor, decision tree, and support vector machine increased after using the selected important risk factors as predictors. Dividing patients into groups and predicting secondary cancer with separate models can further improve the prediction accuracies. The information discovered in the experiments can provide important references to the personality and clinical symptom representations on all phases of guide interventions, with the complexities of multiple symptoms associated with secondary cancer in all phases of the recurrent trajectory.
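The percentage improvements quoted in this abstract can be read as relative improvements, (new − baseline) / baseline. Under that assumption, which the abstract does not state explicitly, the short sketch below back-calculates the baseline scores implied by the reported figures. The function names are illustrative, and the baselines are derived values, not numbers reported by the authors.

```python
def relative_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


def implied_baseline(new, improvement_pct):
    """Baseline that would yield `new` after the given relative improvement."""
    return new / (1 + improvement_pct / 100)


# Reported best decision-tree results: AUC 0.72, F1 score 0.38.
print(round(implied_baseline(0.72, 67.4), 2))   # 0.43 (AUC, all 20 predictors)
print(round(implied_baseline(0.72, 28.6), 2))   # 0.56 (AUC, no group division)
print(round(implied_baseline(0.38, 216.7), 2))  # 0.12 (F1, all 20 predictors)
print(round(implied_baseline(0.38, 80.9), 2))   # 0.21 (F1, no group division)
```

Read this way, the large 216.7% gain in F₁ score corresponds to an absolute change from roughly 0.12 to 0.38, which shows how a low baseline inflates relative percentages.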


Xu L, Liang G, Liao C, Chen GD, Chang CC. An Efficient Classifier for Alzheimer's Disease Genes Identification. Molecules 2018;23:molecules23123140. [PMID: 30501121 PMCID: PMC6321377 DOI: 10.3390/molecules23123140] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2018] [Revised: 11/17/2018] [Accepted: 11/19/2018] [Indexed: 11/16/2022] Open

Tahir M, Jan B, Hayat M, Shah SU, Amin M. Efficient computational model for classification of protein localization images using Extended Threshold Adjacency Statistics and Support Vector Machines. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018;157:205-215. [PMID: 29477429 DOI: 10.1016/j.cmpb.2018.01.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/13/2017] [Revised: 01/02/2018] [Accepted: 01/24/2018] [Indexed: 06/08/2023]

Li H, Zhang P, Yuan S, Tian H, Tian D, Liu M. Modeling analysis of the relationship between atherosclerosis and related inflammatory factors. Saudi J Biol Sci 2017;24:1803-1809. [PMID: 29551927 PMCID: PMC5851939 DOI: 10.1016/j.sjbs.2017.11.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2017] [Revised: 11/07/2017] [Accepted: 11/07/2017] [Indexed: 11/05/2022] Open

Abstract

Objective: To establish an early diagnosis model of inflammatory factors for atherosclerosis (AS), providing theoretical evidence for early detection of AS and development of plaques. Methods: Serum samples were collected to detect the inflammatory factors including CysC, Hcy, hs-CRP, UA, FIB, D-D, LP (a), IL-6, SAA, sCD40L and MDA. Using Logistic regression analysis, the inflammatory factors used for modeling were screened out, and then the AS early diagnosis models were established based on receiver operating characteristic (ROC) curve, support vector machine and BP neural network respectively. Results: No significant difference was found between the general data of the two groups. All 11 inflammatory factors had higher levels in the AS group than in the control group. As shown by the ROC curves, all inflammatory factors were helpful in AS diagnosis. In terms of sensitivity, UA ranked first (98) and FIB ranked last (55.5); in terms of specificity, UA ranked first (99) and FIB ranked last (78); in terms of area under the curve, UA and SAA ranked first (both were 0.995) and FIB ranked last (0.721). Based on the Logistic regression equation, six factors were screened out, including Hcy, Hs-CRP, IL-6, D-D, CysC and MDA. According to classification, the final sixth step had a prediction accuracy of 99%. When the six inflammatory factors included in the Logistic regression equation were detected jointly, the sensitivity, specificity and area under the curve were 57%, 97% and 0.821 respectively, while those of the model excluding D-D were 64%, 90% and 0.828, generally superior to the results of joint detection including six factors. The ROC curve based on Hcy, Hs-CRP and MDA had a sensitivity of 87%, a specificity of 94% and an area under the curve of 0.869, being inferior to those of the ROC curve based on IL-6, D-D and Cys C, which were 87%, 92% and 0.936 respectively. The accuracies of the SVM-AS diagnosis model and the BP neural network model were 82.5% and 77.5% respectively. Conclusion: All 11 inflammatory factors are valuable in AS diagnosis. AS early diagnosis models based on Logistic regression analysis, ROC curve, support vector machine and BP neural network possess diagnostic value and can provide reference for clinical diagnosis.
