1
Lu Y, Ning Y, Li Y, Zhu B, Zhang J, Yang Y, Chen W, Yan Z, Chen A, Shen B, Fang Y, Wang D, Song N, Ding X. Risk factor mining and prediction of urine protein progression in chronic kidney disease: a machine learning-based study. BMC Med Inform Decis Mak 2023; 23:173. [PMID: 37653403 PMCID: PMC10472702 DOI: 10.1186/s12911-023-02269-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Accepted: 08/17/2023] [Indexed: 09/02/2023]
Abstract
BACKGROUND Chronic kidney disease (CKD) is a global public health concern. To provide timely intervention for non-hospitalized high-risk patients and to allocate limited clinical resources rationally, it is important to mine the key factors when designing a CKD prediction model. METHODS This study included data from 1,358 patients with CKD that was pathologically confirmed at Zhongshan Hospital between December 2017 and September 2020. A machine learning-based framework for CKD prediction and interpretation was proposed. From 100 variables, 17 were selected for model construction by recursive feature elimination with logistic regression. Several machine learning classifiers, including extreme gradient boosting, Gaussian naive Bayes, a neural network, ridge regression, and logistic regression (LR), were trained, and an ensemble model was developed to predict 24-hour urine protein. The detailed relationship between the risk of CKD progression and these predictors was determined using a global interpretation, and a patient-specific analysis was conducted using a local interpretation. RESULTS Among the single machine learning models, LR achieved the best performance, with an area under the curve (AUC) of 0.850. The ensemble model constructed using the voting integration method further improved the AUC to 0.856. The major predictors of moderate-to-severe severity included lower levels of 25-OH-vitamin, albumin, transferrin in males, and higher levels of cystatin C. CONCLUSIONS Compared with the single clinical kidney function indicators (eGFR, Scr), the machine learning model proposed in this study improved the prediction accuracy of CKD progression by 17.6% and 24.6%, respectively, and improved the AUC by 0.250 and 0.236, respectively. Our framework can achieve a good predictive interpretation and provide effective clinical decision support.
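The voting step and the AUC comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes soft voting (averaging predicted probabilities), which the abstract does not specify, and every label and probability below is invented toy data, not data from the study. The model names are only labels for the toy lists.

```python
from statistics import mean


def soft_vote(prob_lists):
    """Average the positive-class probability of each sample across models."""
    return [mean(sample_probs) for sample_probs in zip(*prob_lists)]


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and per-model probabilities (assumed, illustrative only).
y_true = [1, 0, 1, 0, 1, 0]
model_probs = {
    "logistic_regression": [0.90, 0.40, 0.45, 0.20, 0.70, 0.50],
    "gradient_boosting": [0.80, 0.30, 0.60, 0.55, 0.50, 0.35],
    "naive_bayes": [0.70, 0.60, 0.65, 0.10, 0.60, 0.30],
}

ensemble_probs = soft_vote(list(model_probs.values()))

for name, probs in model_probs.items():
    print(f"{name}: AUC = {roc_auc(y_true, probs):.3f}")
print(f"soft-voting ensemble: AUC = {roc_auc(y_true, ensemble_probs):.3f}")
```

The toy probabilities were chosen so that each model misranks a different pair of samples. Averaging cancels those errors, which is the usual motivation for voting ensembles; it says nothing about the size of the gain in the study itself.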
Affiliation(s)
- Yufei Lu
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Yichun Ning
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Yang Li
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Bowen Zhu
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Jian Zhang
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Yan Yang
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Weize Chen
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Zhixin Yan
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Annan Chen
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Bo Shen
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Yi Fang
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China
- Dong Wang
- School of Computer Science & Information Engineering, Shanghai Institute of Technology, Shanghai, China.
- Nana Song
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China.
- Xiaoqiang Ding
- Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai Clinical Research Center for Kidney Disease, Shanghai Medical Center of Kidney, Shanghai Institute of Kidney and Dialysis, Shanghai Key Laboratory of Kidney and Blood Purification, Hemodialysis Quality Control Center of Shanghai, Shanghai, China.
2
Han ISM, Abramson D, Thayer KM. Insights into Rational Design of a New Class of Allosteric Effectors with Molecular Dynamics Markov State Models and Network Theory. ACS Omega 2022; 7:2831-2841. [PMID: 35097279 PMCID: PMC8792916 DOI: 10.1021/acsomega.1c05624] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/08/2021] [Accepted: 12/16/2021] [Indexed: 05/12/2023]
Abstract
The development of drugs to restore protein function has been a major advance facilitated by molecular medicine. Allosteric regulation, a phenomenon widely observed in nature in which a molecule binds to control a distant active site, holds great promise for regulating proteins, yet how to rationally design such a molecule remains a mystery. Over the past few years, we and others have developed several techniques based on molecular dynamics (MD) simulations: MD-Markov state models to capture global conformational substates, and a network theory approach that uses the interaction energy within the protein to confer local allosteric control. We focus on the key case study of the restoration of the p53 Y220C mutant by PK11000, a compound experimentally shown to reactivate native p53 function in tumors carrying the Y220C mutant. We gain insights into the mutation and the allosteric reactivation of the protein, which we anticipate will be applicable to de novo design of new compounds, not only for this mutation but for other macromolecular systems as well.
3
Wu H, Wu Y, Jiang Y, Zhou B, Zhou H, Chen Z, Xiong Y, Liu Q, Zhang H. scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding. Brief Bioinform 2021; 23:6374065. [PMID: 34553746 DOI: 10.1093/bib/bbab396] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/17/2021] [Revised: 08/25/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022]
Abstract
Single-cell Hi-C data are a common data source for studying differences in the three-dimensional structure of cell chromosomes. The development of single-cell Hi-C technology makes it possible to obtain batches of single-cell Hi-C data, and how to discriminate cell types quickly and effectively has become an active research topic. However, existing computational methods for predicting cell types from Hi-C data have low accuracy. Therefore, we propose a high-accuracy cell classification algorithm, called scHiCStackL, based on single-cell Hi-C data. We first improve the existing data preprocessing method for single-cell Hi-C data, which allows the generated cell embedding to better represent cells. Then, we construct a two-layer stacking ensemble model for classifying cells. Experimental results show that the cell embedding generated by our data preprocessing method improves by 0.23, 1.22, 1.46 and 1.61% over the cell embedding generated by the previously published method scHiCluster, in terms of the Acc, MCC, F1 and Precision confidence intervals, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When using the two-layer stacking ensemble framework with the cell embedding, scHiCStackL improves by 13.33, 19, 19.27 and 14.5 over scHiCluster, in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types using single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.
Affiliation(s)
- Hao Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; School of Software, Shandong University, Jinan, 250101, Shandong, China
- Yingfu Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yuhong Jiang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Bing Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Haoru Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Zhongli Chen
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Hongming Zhang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
4
Fan J, Chen M, Luo J, Yang S, Shi J, Yao Q, Zhang X, Du S, Qu H, Cheng Y, Ma S, Zhang M, Xu X, Wang Q, Zhan S. The prediction of asymptomatic carotid atherosclerosis with electronic health records: a comparative study of six machine learning models. BMC Med Inform Decis Mak 2021; 21:115. [PMID: 33820531 PMCID: PMC8020544 DOI: 10.1186/s12911-021-01480-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/14/2020] [Accepted: 03/26/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Screening carotid B-mode ultrasonography is a frequently used method to detect subjects with carotid atherosclerosis (CAS). Because CAS progresses asymptomatically in most patients, early identification is challenging for clinicians, and the disease may trigger ischemic stroke. Recently, machine learning has shown a strong ability to classify data and a potential for prediction in the medical field. The combined use of machine learning and the electronic health records of patients could provide clinicians with a more convenient and precise method to identify asymptomatic CAS. METHODS This retrospective cohort study used routine clinical data of medical check-up subjects from April 19, 2010 to November 15, 2019. Six machine learning models (logistic regression [LR], random forest [RF], decision tree [DT], eXtreme Gradient Boosting [XGB], Gaussian Naïve Bayes [GNB], and K-Nearest Neighbour [KNN]) were used to predict asymptomatic CAS, and their predictive performance was compared in terms of the area under the receiver operating characteristic curve (AUCROC), accuracy (ACC), and F1 score (F1). RESULTS Of the 18,441 subjects, 6553 were diagnosed with asymptomatic CAS. Compared to DT (AUCROC 0.628, ACC 65.4%, and F1 52.5%), the other five models improved prediction: KNN +7.6% (0.704, 68.8%, and 50.9%, respectively), GNB +12.5% (0.753, 67.0%, and 46.8%, respectively), XGB +16.0% (0.788, 73.4%, and 55.7%, respectively), RF +16.6% (0.794, 74.5%, and 56.8%, respectively) and LR +18.1% (0.809, 74.7%, and 59.9%, respectively). The best-performing model, LR, correctly predicted 1045/1966 cases (sensitivity 53.2%) and 3088/3566 non-cases (specificity 86.6%). A tenfold cross-validation scheme further verified the predictive ability of LR. CONCLUSIONS Among the machine learning models, LR showed the best performance in predicting asymptomatic CAS. Our findings set the stage for an early automatic alarm system, allowing a more precise allocation of CAS prevention measures to the individuals most likely to benefit.
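The LR figures quoted in this abstract can be checked from the reported counts alone. The short Python sketch below recomputes sensitivity, specificity, accuracy and F1 from 1045/1966 correctly predicted cases and 3088/3566 correctly predicted non-cases. It assumes these counts describe a single evaluation set, which the abstract implies but does not state; the function name `binary_metrics` is our own.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard confusion-matrix metrics for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),             # recall on true cases
        "specificity": tn / (tn + fp),             # recall on true non-cases
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),         # harmonic mean of precision and recall
    }


# Counts taken from the abstract.
cases, non_cases = 1966, 3566
true_positives, true_negatives = 1045, 3088

metrics = binary_metrics(
    tp=true_positives,
    fn=cases - true_positives,
    tn=true_negatives,
    fp=non_cases - true_negatives,
)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under that assumption the recomputed values (53.2%, 86.6%, 74.7% and 59.9%) agree with the sensitivity, specificity, ACC and F1 reported for LR.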
Affiliation(s)
- Jiaxin Fan
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Mengying Chen
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Jian Luo
- Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Shusen Yang
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
- Jinming Shi
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Qingling Yao
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Xiaodong Zhang
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Shuang Du
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Huiyang Qu
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Yuxuan Cheng
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Shuyin Ma
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Meijuan Zhang
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Xi Xu
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China
- Qian Wang
- Department of Health Management, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Shuqin Zhan
- Department of Neurology, The Second Affiliated Hospital of Xi'an Jiaotong University, No. 157 West Five Road, Xi'an, 710004, Shaanxi, China.
5
Shan G, Bernick C, Caldwell JZK, Ritter A. Machine learning methods to predict amyloid positivity using domain scores from cognitive tests. Sci Rep 2021; 11:4822. [PMID: 33649452 PMCID: PMC7921140 DOI: 10.1038/s41598-021-83911-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/14/2020] [Accepted: 02/09/2021] [Indexed: 01/31/2023]
Abstract
Amyloid-β (Aβ) is the target in many clinical trials for Alzheimer's disease (AD). Preclinical AD patients are heterogeneous with regard to background and diagnosis. Accurately predicting the Aβ status of participants with machine learning (ML) models based on easily accessible data could improve the effectiveness of AD clinical trials. We develop optimal ML models for each subpopulation, stratified by sex and disease stage, using subscores from screening neurological tests. Data from the AD Neuroimaging Initiative (ADNI) were used to build the ML models for three groups: individuals with significant memory concern (SMC), early mild cognitive impairment (MCI), and late MCI (LMCI). Data were further separated into 6 groups by disease stage (3 levels) and sex (2 categories). The outcome was defined as the Aβ status confirmed by PET imaging, and the features included demographic data, newly identified risk factors, screening tests, and the domain scores from screening tests. Monte Carlo simulation studies were used together with the k-fold cross-validation technique to compute model performance metrics. We also develop a new feature selection method based on stochastic ordering to avoid searching all possible combinations of features. The accuracy of the identified optimal model for SMC males was over 90% when using domain scores, and the accuracy for LMCI females was above 86%. Domain scores can improve ML model prediction compared with total scores. Accurate ML prediction models can identify the proper population for AD clinical trials.
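One common way to combine Monte Carlo repetition with k-fold cross-validation is to reshuffle the data before each round of k folds and average the score over all rounds. The Python sketch below illustrates that reading only; the abstract does not describe the exact procedure, and the labels, the majority-class baseline and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, rng):
    """Yield (train, test) index lists for one shuffled k-fold partition."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def repeated_k_fold_score(n_samples, k, n_repeats, evaluate, seed=0):
    """Average evaluate(train, test) over n_repeats reshuffled k-fold rounds."""
    rng = random.Random(seed)
    scores = [
        evaluate(train, test)
        for _ in range(n_repeats)
        for train, test in k_fold_indices(n_samples, k, rng)
    ]
    return sum(scores) / len(scores)


# Hypothetical binary labels standing in for amyloid status (assumed toy data).
labels = [1] * 12 + [0] * 8


def majority_baseline(train, test):
    """Accuracy on the test fold of always predicting the training majority class."""
    majority = int(2 * sum(labels[i] for i in train) >= len(train))
    return sum(labels[i] == majority for i in test) / len(test)


mean_accuracy = repeated_k_fold_score(
    len(labels), k=5, n_repeats=100, evaluate=majority_baseline
)
print(f"mean accuracy over 100 x 5 folds: {mean_accuracy:.3f}")
```

Any model can be plugged in through `evaluate`; the baseline here only keeps the sketch dependency-free. Repeating the split reduces the variance that a single random partition introduces, which matters for small strata such as the six sex-by-stage groups.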
Affiliation(s)
- Guogen Shan
- Department of Epidemiology and Biostatistics, School of Public Health, University of Nevada Las Vegas, Las Vegas, NV, 89154, USA.
- Charles Bernick
- Cleveland Clinic Lou Ruvo Center for Brain Health, 888 W. Bonneville Avenue, Las Vegas, NV, 89106, USA
- Jessica Z K Caldwell
- Cleveland Clinic Lou Ruvo Center for Brain Health, 888 W. Bonneville Avenue, Las Vegas, NV, 89106, USA
- Aaron Ritter
- Cleveland Clinic Lou Ruvo Center for Brain Health, 888 W. Bonneville Avenue, Las Vegas, NV, 89106, USA
6
Choi BK, Kim MS, Kim SH. Risk prediction models for the development of oral-mucosal pressure injuries in intubated patients in intensive care units: A prospective observational study. J Tissue Viability 2020; 29:252-257. [PMID: 32800513 DOI: 10.1016/j.jtv.2020.06.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/04/2020] [Revised: 06/04/2020] [Accepted: 06/10/2020] [Indexed: 01/16/2023]
Abstract
PURPOSE Oral-mucosal pressure injury (PI) is the most commonly encountered medical device-related PI. This study was performed to identify risk factors and construct a risk prediction model for oral-mucosal PI development in intubated patients in the intensive care unit. METHODS The study design was prospective and observational, with medical record review. The inclusion criteria stipulated that 1) participants should be > 18 years of age, and 2) an endotracheal tube (ETT) should be in use with holding methods including adhesive tape, gauze tying, and commercial devices. Data of 194 patient-days were analysed. Risk model development and validation were performed using SPSS and the scikit-learn platform. RESULTS Among 10 significant input variables, the risk prediction logistic models were composed of three factors (bite-block/airway, commercial ETT holder, and corticosteroid use) for lower oral-mucosal PI development and four factors (commercial ETT holder, vasopressor use, haematocrit, and serum albumin level) for upper oral-mucosal PI development. The sensitivity and specificity for lower oral-mucosal PI development were 85.2% and 76.0%, respectively, and those for upper oral-mucosal PI development were 60.0% and 89.1%, respectively. Based on the results of the machine learning, the upper oral-mucosal PI development model had an accuracy of 79%, F1 score of 88%, precision of 86%, and recall of 91%. CONCLUSIONS The development of lower oral-mucosal PIs is affected by immobility-related factors and corticosteroid use, and that of upper oral-mucosal PIs by undernutrition-related factors and ETT holder use. The high sensitivities of the two logit models comprise important minimum data for positively predicting oral-mucosal PIs.
Affiliation(s)
- Byung Kwan Choi
- Department of Neurosurgery, College of Medicine, Pusan National University, Busan, South Korea.
- Myoung Soo Kim
- Department of Nursing, Pukyong National University, Busan, South Korea.
- Soo Hyun Kim
- The Artificial Kidney Room, Busan Medical Center, Busan, South Korea.
7
Lakhani B, Thayer KM, Black E, Beveridge DL. Spectral analysis of molecular dynamics simulations on PDZ: MD sectors. J Biomol Struct Dyn 2020; 38:781-790. [PMID: 31262238 PMCID: PMC7307555 DOI: 10.1080/07391102.2019.1588169] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/26/2018] [Accepted: 02/23/2019] [Indexed: 02/06/2023]
Abstract
The idea of protein "sectors" posits that sparse subsets of amino acid residues form cooperative networks that are key elements of protein stability, ligand binding, and allosterism. To date, protein sectors have been calculated by the statistical coupling analysis (SCA) method of Ranganathan and co-workers via the spectral analysis of conservation-weighted evolutionary covariance matrices obtained from multiple sequence alignments of homologous families of proteins. SCA sectors, a knowledge-based protocol, have been identified with functional properties and allosterism for a number of systems. In this study, we investigate the utility of the sector idea for the analysis of physics-based molecular dynamics (MD) trajectories of proteins. Our test case for this procedure is PSD95-PDZ3, one of the smallest proteins for which allosterism has been observed. It has served previously as a model system for a number of prediction algorithms, and is well characterized by X-ray crystallography, NMR spectroscopy and site-specific mutagenesis. All-atom MD simulations were performed for a total of 500 nanoseconds using AMBER, and MD-calculated covariance matrices for the fluctuations of residue displacements and non-bonded interaction energies were subjected to spectral analysis in a manner analogous to that of SCA. The composition of MD sectors was compared with results from SCA, site-specific mutagenesis, and allosterism. The concordance indicates that MD sectors are a viable protocol for analyzing MD trajectories and provide insight into the physical origin of the phenomenon. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Bharat Lakhani
- Program in Molecular Biophysics, Wesleyan University, Middletown CT 06459, USA
- Department of Molecular Biology & Biochemistry, Wesleyan University, Middletown CT 06459, USA
- Kelly M. Thayer
- Program in Molecular Biophysics, Wesleyan University, Middletown CT 06459, USA
- Chemistry Department, Wesleyan University, Middletown CT 06459, USA
- Department of Mathematics and Computer Science, Wesleyan University, Middletown CT 06459, USA
- Emily Black
- Program in Molecular Biophysics, Wesleyan University, Middletown CT 06459, USA
- David L. Beveridge
- Program in Molecular Biophysics, Wesleyan University, Middletown CT 06459, USA
- Chemistry Department, Wesleyan University, Middletown CT 06459, USA
8
Zhang K, Liu X, Jiang J, Li W, Wang S, Liu L, Zhou X, Wang L. Prediction of postoperative complications of pediatric cataract patients using data mining. J Transl Med 2019; 17:2. [PMID: 30602368 PMCID: PMC6317183 DOI: 10.1186/s12967-018-1758-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/01/2018] [Accepted: 12/21/2018] [Indexed: 12/31/2022]
Abstract
BACKGROUND The common treatment for pediatric cataracts is to replace the cloudy lens with an artificial one. However, patients may suffer complications (severe lens proliferation into the visual axis and abnormally high intraocular pressure; SLPVA and AHIP) within 1 year after surgery, and the factors causing these complications are unknown. METHODS The Apriori algorithm is employed to find association rules related to complications. We use random forest (RF) and Naïve Bayes (NB) classifiers to predict the complications with datasets preprocessed by SMOTE (synthetic minority oversampling technique). Genetic feature selection is used to find the features truly related to complications. RESULTS First, average classification accuracies in three binary classification problems are over 75%. Second, the relationship between the classification performance and the number of random forest trees is studied. Results show that all attributes except gender and age at surgery (AS) are related to complications; all attributes except secondary IOL placement, operation mode, AS and area of cataracts are related to SLPVA; and all attributes except gender, operation mode, and laterality are related to AHIP. Next, the association rules related to the complications are mined. Then an additional 50 records were used to test the performance of RF and NB, both of which achieved accuracies of over 65% for the three classification problems. Finally, we developed a webserver to assist doctors. CONCLUSIONS The postoperative complications of pediatric cataract patients can be predicted, the factors related to the complications can be found, and the association rules concerning the complications can provide a reference for doctors.
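The SMOTE preprocessing named in this abstract balances classes by creating synthetic minority samples on the line segment between a minority sample and one of its nearest minority neighbours. The Python sketch below shows only that interpolation step. It is a simplified illustration, not the authors' pipeline: the neighbour count `k`, the toy points and the function name `smote_sample` are assumptions, and a maintained implementation such as the one in imbalanced-learn would normally be used instead.

```python
import random


def smote_sample(minority, k, n_new, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points, by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority-class samples (assumed toy data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sample(minority, k=2, n_new=4)

for point in new_points:
    print(tuple(round(value, 3) for value in point))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies, which is why SMOTE tends to be preferred over simple duplication of rare cases.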
Affiliation(s)
- Kai Zhang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 510060, China
- Xiyang Liu
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; Institute of Software Engineering, Xidian University, Xi'an, 710071, China; School of Software, Xidian University, Xi'an, 710071, China.
- Jiewei Jiang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 510060, China
- Wangting Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 510060, China
- Shuai Wang
- School of Software, Xidian University, Xi'an, 710071, China
- Lin Liu
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China
- Xiaojing Zhou
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Liming Wang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; Institute of Software Engineering, Xidian University, Xi'an, 710071, China; School of Software, Xidian University, Xi'an, 710071, China