1. Furxhi I, Bengalli R, Motta G, Mantecca P, Kose O, Carriere M, Haq EU, O’Mahony C, Blosi M, Gardini D, Costa A. Data-Driven Quantitative Intrinsic Hazard Criteria for Nanoproduct Development in a Safe-by-Design Paradigm: A Case Study of Silver Nanoforms. ACS Appl Nano Mater 2023; 6:3948-3962. PMID: 36938492; PMCID: PMC10012170; DOI: 10.1021/acsanm.3c00173.
Abstract
The current European (EU) policies, that is, the Green Deal, envisage safe and sustainable practices for chemicals, including nanoforms (NFs), at the earliest stages of innovation. A theoretical safe and sustainable by design (SSbD) framework has been established through EU collaborative efforts toward the definition of quantitative criteria in each SSbD dimension, namely the human and environmental safety dimension and the environmental, social, and economic sustainability dimensions. In this study, we target the safety dimension, and we demonstrate the journey toward quantitative intrinsic hazard criteria derived from findable, accessible, interoperable, and reusable data. Data were curated and merged for the development of new approach methodologies, that is, quantitative structure-activity relationship models based on regression and classification machine learning algorithms, with the intent to predict a hazard class. The models utilize system-dependent (i.e., hydrodynamic size and polydispersity index) and non-system-dependent (i.e., elemental composition and core size) nanoscale features in combination with biological in vitro attributes and experimental conditions for various silver NFs, functional antimicrobial textiles, and cosmetics applications. In a second step, interpretable rules (criteria) followed by a certainty factor were obtained by exploiting a Bayesian network structure crafted by expert reasoning. The probabilistic model shows a predictive capability of ≈78% (average accuracy across all hazard classes). In this work, we show how we shifted from the conceptualization of the SSbD framework toward realistic implementation with pragmatic instances. This study reveals (i) quantitative intrinsic hazard criteria to be considered in the safety aspects during the synthesis stage, (ii) the challenges within, and (iii) future directions for the generation and distillation of such criteria that can feed SSbD paradigms. Specifically, the criteria can guide material engineers to synthesize NFs that are inherently safer than alternative nanoformulations, at the earliest stages of innovation, while the models enable fast and cost-efficient in silico toxicological screening of previously synthesized NFs and of hypothetical, yet-to-be-synthesized NFs.
Affiliation(s)
- Irini Furxhi: Transgero Ltd, Limerick V42V384, Ireland; Department of Accounting and Finance, Kemmy Business School, University of Limerick, Limerick V94T9PX, Ireland
- Rossella Bengalli: Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, Milano 20126, Italy
- Giulia Motta: Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, Milano 20126, Italy
- Paride Mantecca: Department of Earth and Environmental Sciences, University of Milano-Bicocca, Piazza della Scienza 1, Milano 20126, Italy
- Ozge Kose: Univ. Grenoble Alpes, CEA, CNRS, Grenoble INP, IRIG, SYMMES, Grenoble 38000, France
- Marie Carriere: Univ. Grenoble Alpes, CEA, CNRS, Grenoble INP, IRIG, SYMMES, Grenoble 38000, France
- Ehtsham Ul Haq: Department of Physics and Bernal Institute, University of Limerick, Limerick V94TC9PX, Ireland
- Charlie O’Mahony: Department of Physics and Bernal Institute, University of Limerick, Limerick V94TC9PX, Ireland
- Magda Blosi: Istituto di Scienza e Tecnologia dei Materiali Ceramici (CNR-ISTEC), Via Granarolo 64, Faenza 48018, Ravenna, Italy
- Davide Gardini: Istituto di Scienza e Tecnologia dei Materiali Ceramici (CNR-ISTEC), Via Granarolo 64, Faenza 48018, Ravenna, Italy
- Anna Costa: Istituto di Scienza e Tecnologia dei Materiali Ceramici (CNR-ISTEC), Via Granarolo 64, Faenza 48018, Ravenna, Italy
2. Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing. Sensors 2021; 21(8):2849. PMID: 33919558; PMCID: PMC8074051; DOI: 10.3390/s21082849.
Abstract
Due to recent advances in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need to leverage such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention as a way to implement reliable manufacturing systems and identify the root causes of faults. However, tree-based models trade off accuracy against interpretability. In order to improve the tree's performance while maintaining its interpretability, an evolutionary algorithm for the discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. Experimental results with two real-world sensor datasets showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive with ensemble methods, which combine multiple decision trees. Even though the ensemble methods could produce slightly better performance, the proposed DIMPLED has a more interpretable structure while maintaining an appropriate performance level.
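The evolutionary-discretization idea can be sketched in a few lines (a toy, not the DIMPLED algorithm: DIMPLED evolves cut points over multiple attributes jointly, while this sketch optimizes a single cut point for a one-split stump, and all names and data are invented):

```python
import random

def stump_accuracy(threshold, xs, ys):
    """Accuracy of a one-cut discretization used by a decision stump:
    predict class 1 whenever x >= threshold."""
    hits = sum((x >= threshold) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)

def evolve_cut(xs, ys, pop_size=20, generations=30, seed=0):
    """Toy evolutionary search for a single global cut point."""
    rng = random.Random(seed)
    s = sorted(xs)
    # Seed the population with midpoints between neighbours, then random fill.
    pop = [(a + b) / 2 for a, b in zip(s, s[1:])]
    pop += [rng.uniform(s[0], s[-1]) for _ in range(pop_size - len(pop))]
    for _ in range(generations):
        pop.sort(key=lambda t: stump_accuracy(t, xs, ys), reverse=True)
        parents = pop[: pop_size // 2]                     # truncation selection
        sigma = (s[-1] - s[0]) * 0.05
        pop = parents + [p + rng.gauss(0, sigma) for p in parents]  # Gaussian mutation
    return max(pop, key=lambda t: stump_accuracy(t, xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]  # separable at any cut in (4, 6]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
best = evolve_cut(xs, ys)
print(stump_accuracy(best, xs, ys))  # 1.0
```

Because the fittest half survives each generation, any perfect cut found early is never lost, which is the elitist behaviour a real implementation would also want.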
3. Esmaeilyfard R, Paknahad M, Dokohaki S. Sex classification of first molar teeth in cone beam computed tomography images using data mining. Forensic Sci Int 2020; 318:110633. PMID: 33279763; DOI: 10.1016/j.forsciint.2020.110633.
Abstract
OBJECTIVE The teeth have been used as a supplementary tool for sex differentiation as they are resistant to post-mortem degradation. The present study aimed to develop a novel informatics framework for predicting sex from linear tooth dimension measurements obtained from cone beam computed tomography (CBCT) images. METHOD AND MATERIALS A clinical workflow using different machine learning methods was employed to predict sex. The CBCT images of 485 subjects (245 men and 240 women) were evaluated for sex differentiation. Nine parameters were measured in both the buccolingual and mesiodistal aspects of the teeth. We applied our dataset to Naïve Bayesian (NB), Random Forest (RF), and Support Vector Machine (SVM) classifiers for prediction. Genetic feature selection was used to discover the features relevant to sex classification. RESULTS The 10-fold cross-validation results indicated that NB had higher accuracy than SVM and RF for sex classification. The genetic algorithm (GA) indicated that the model could fit the data without using enamel thickness and pulp height. The average classification accuracy of our clinical workflow was 92.31%. CONCLUSION The results showed that NB was the best method for sex classification. The first molar teeth showed an acceptable level of accuracy for sex prediction. Therefore, these odontometric parameters can be applied as an additional tool for sex determination in forensic anthropology.
Affiliation(s)
- Rasool Esmaeilyfard: Computer Engineering and Information Technology Department, Shiraz University of Technology, Shiraz, Iran
- Maryam Paknahad: Oral and Dental Disease Research Center, Oral and Maxillofacial Radiology Department, Dental School, Shiraz University of Medical Sciences, Shiraz, Iran
- Sonia Dokohaki: Oral and Maxillofacial Radiology Department, Dental School, Shiraz University of Medical Sciences, Shiraz, Iran
4. Li Y, Jann T, Vera-Licona P. Benchmarking time-series data discretization on inference methods. Bioinformatics 2019; 35:3102-3109. PMID: 30657860; DOI: 10.1093/bioinformatics/btz036.
Abstract
SUMMARY The rapid development in quantitatively measuring DNA, RNA and protein has generated a great interest in the development of reverse-engineering methods, that is, data-driven approaches to infer the network structure or dynamical model of the system. Many reverse-engineering methods require discrete quantitative data as input, while many experimental data are continuous. Some studies have started to reveal the impact that the choice of data discretization has on the performance of reverse-engineering methods. However, more comprehensive studies are still greatly needed to systematically and quantitatively understand the impact that discretization methods have on inference methods. Furthermore, there is an urgent need for systematic comparative methods that can help select between discretization methods. In this work, we consider four published intracellular networks inferred with their respective time-series datasets. We discretized the data using different discretization methods. Across all datasets, changing the data discretization to a more appropriate one improved the reverse-engineering methods' performance. We observed no universal best discretization method across different time-series datasets. Thus, we propose DiscreeTest, a two-step evaluation metric for ranking discretization methods for time-series data. The underlying assumption of DiscreeTest is that an optimal discretization method should preserve the dynamic patterns observed in the original data across all variables. We used the same datasets and networks to show that DiscreeTest is able to identify an appropriate discretization among several candidate methods. To our knowledge, this is the first time that a method for benchmarking and selecting an appropriate discretization method for time-series data has been proposed. 
AVAILABILITY AND IMPLEMENTATION All the datasets, reverse-engineering methods and source code used in this paper are available in the Vera-Licona lab GitHub repository: https://github.com/VeraLiconaResearchGroup/Benchmarking_TSDiscretizations. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
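The paper's core assumption, that a good discretization preserves the dynamic patterns of the raw series, can be sketched with a toy retention score (illustrative only: DiscreeTest's published two-step metric is different, and all names and data here are invented):

```python
def equal_width(series, k):
    """Discretize into k equal-width bins over the observed range."""
    lo, hi = min(series), max(series)
    w = (hi - lo) / k
    return [min(int((v - lo) / w), k - 1) for v in series]

def dynamics_retained(series, labels):
    """Fraction of consecutive raw-value changes still visible as a label change."""
    moves = [(a != b, la != lb)
             for a, b, la, lb in zip(series, series[1:], labels, labels[1:])]
    raw_changes = [label_changed for raw_changed, label_changed in moves if raw_changed]
    return sum(raw_changes) / len(raw_changes)

series = [1, 2, 3, 4, 5, 6]
coarse = equal_width(series, 2)  # [0, 0, 0, 1, 1, 1]
fine = equal_width(series, 3)    # [0, 0, 1, 1, 2, 2]
print(dynamics_retained(series, coarse))  # 0.2
print(dynamics_retained(series, fine))    # 0.4
```

A coarser binning erases most of the series' movement, which is exactly the kind of information loss a discretization-ranking metric needs to penalize.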
Affiliation(s)
- Yuezhe Li: R.D. Berlin Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT, USA
- Tiffany Jann: Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Paola Vera-Licona: Center for Quantitative Medicine; Department of Cell Biology; Department of Pediatrics; and Institute for Systems Genomics, University of Connecticut School of Medicine, Farmington, CT, USA
5. Ray SS, Misra S. Genetic algorithm for assigning weights to gene expressions using functional annotations. Comput Biol Med 2018; 104:149-162. PMID: 30472497; DOI: 10.1016/j.compbiomed.2018.11.011.
Abstract
A method, named genetic algorithm for assigning weights to gene expressions using functional annotations (GAAWGEFA), is developed to assign proper weights to the gene expressions at each time point. The weights are estimated using functional annotations of the genes in a genetic algorithm framework. The method captures gene similarity better than other existing methods because it takes advantage of the genes' existing functional annotations. The weight combination for the expressions at different time points is determined by maximizing the fitness function of GAAWGEFA in terms of the positive predictive value (PPV) for the top 10,000 gene pairs. The performance of the proposed method is primarily compared with biweight midcorrelation (BICOR) and original expression values for six Saccharomyces cerevisiae datasets and one Bacillus subtilis dataset. The utility of GAAWGEFA is shown in predicting the functions of 48 unclassified genes (using a p-value cutoff of 10^-13) from Saccharomyces cerevisiae microarray data, where the expressions are weighted using GAAWGEFA and clustered using the k-medoids algorithm. The related code along with various parameters is available at http://sampa.droppages.com/GAAWGEFA.html.
Affiliation(s)
- Shubhra Sankar Ray: Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700108, India
- Sampa Misra: Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700108, India
6. Balasubramanian JB, Gopalakrishnan V. Tunable structure priors for Bayesian rule learning for knowledge integrated biomarker discovery. World J Clin Oncol 2018; 9:98-109. PMID: 30254965; PMCID: PMC6153126; DOI: 10.5306/wjco.v9.i5.98.
Abstract
AIM To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine.
METHODS Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief networks (BN) to find the optimal BN explaining the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension BRLp. The structure prior has a λ hyperparameter that allows the user to tune the degree of incorporation of prior knowledge into the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, measuring the degree of incorporation of our specified prior knowledge, and we monitored its effect on model predictive performance. Finally, we compared BRLp to other state-of-the-art classifiers commonly used in biomedicine.
RESULTS We evaluated the degree of incorporation of prior knowledge into BRLp with simulated data by measuring the graph edit distance between the true data-generating model and the model learned by BRLp. We specified the true model using informative structure priors. We observed that increasing the value of λ increased the influence of the specified structure priors on model learning; with a sufficiently large λ, BRLp returned the true model. This also led to a gain in predictive performance measured by area under the receiver operating characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from the literature [the epidermal growth factor receptor (EGFR) gene]. We again observed that larger values of λ led to an increased incorporation of EGFR into the final BRLp model. This relevant background knowledge also led to a gain in AUC.
CONCLUSION BRLp enables tunable structure priors to be incorporated during Bayesian classification rule learning, integrating data and knowledge, as demonstrated using lung cancer biomarker data.
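The tunable structure prior can be illustrated with a toy score (a sketch under assumptions: the likelihood values are made up, and BRLp's actual score is a Bayesian marginal likelihood over network parameters, not constants):

```python
def structure_log_prior(edges, prior_edges, lam):
    """Log of an informative structure prior: every edge shared with the
    user-specified prior network adds lam to the log prior.
    lam = 0 recovers a uniform (uninformative) prior; larger lam pulls
    model selection toward the specified structure."""
    return lam * len(set(edges) & set(prior_edges))

def network_score(log_likelihood, edges, prior_edges, lam):
    """Posterior-proportional score: data fit plus structure prior."""
    return log_likelihood + structure_log_prior(edges, prior_edges, lam)

# Hypothetical candidates: A fits the data slightly better,
# B contains the edge an expert expects (e.g. EGFR -> outcome).
prior = {("EGFR", "outcome")}
a = {("geneX", "outcome")}
b = {("EGFR", "outcome")}

print(network_score(-100.0, a, prior, lam=0.0) > network_score(-101.0, b, prior, lam=0.0))  # True
print(network_score(-101.0, b, prior, lam=2.0) > network_score(-100.0, a, prior, lam=2.0))  # True
```

At λ = 0 the data alone decides; at λ = 2 the prior knowledge tips the selection toward the expert-specified edge, mirroring the paper's observation that larger λ increases incorporation of EGFR.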
Affiliation(s)
- Jeya Balaji Balasubramanian: Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Vanathi Gopalakrishnan: Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15206, United States
7. Sriwanna K, Boongoen T, Iam-On N. Graph clustering-based discretization approach to microarray data. Knowl Inf Syst 2018. DOI: 10.1007/s10115-018-1249-z.
8. Finding optimum width of discretization for gene expressions using functional annotations. Comput Biol Med 2017; 90:59-67. DOI: 10.1016/j.compbiomed.2017.09.010.
9. Rajappan S, Rangasamy D. Estimation of incomplete values in heterogeneous attribute large datasets using discretized Bayesian max–min ant colony optimization. Knowl Inf Syst 2017. DOI: 10.1007/s10115-017-1123-4.
10. Liu Y, Gopalakrishnan V. An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data. Data 2017; 2(1):8. PMID: 28243594; PMCID: PMC5325161; DOI: 10.3390/data2010008.
Abstract
Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.
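Two of the four compared methods, mean imputation and k-nearest-neighbors imputation, can be sketched in a few lines of stdlib Python (illustrative only: names and data are invented, and the paper's experiments use full implementations, not this toy):

```python
import math

def mean_impute(X, j):
    """Replace every missing value (None) in column j with the column's observed mean."""
    observed = [row[j] for row in X if row[j] is not None]
    m = sum(observed) / len(observed)
    return [[m if (c == j and v is None) else v for c, v in enumerate(row)] for row in X]

def knn_impute(X, i, j, k=2):
    """Fill X[i][j] with the mean of feature j over the k complete rows nearest
    to row i (Euclidean distance on row i's observed columns)."""
    target = X[i]
    cols = [c for c, v in enumerate(target) if v is not None]
    complete = [row for row in X if row is not target and None not in row]
    complete.sort(key=lambda row: math.dist([target[c] for c in cols],
                                            [row[c] for c in cols]))
    return sum(row[j] for row in complete[:k]) / k

X = [[1.0, 10.0],
     [2.0, None],   # value to impute
     [3.0, 30.0],
     [9.0, 90.0]]
print(mean_impute(X, 1)[1][1])   # ~43.33: global mean, ignores locality
print(knn_impute(X, 1, 1, k=2))  # 20.0: the two nearest rows give a local estimate
```

The contrast between the global and the local estimate is the usual argument for kNN over mean imputation; the paper's finding is that, on its dataset, this difference did not translate into better downstream models.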
Affiliation(s)
- Yuzhe Liu: Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA; Medical Scientist Training Program, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Vanathi Gopalakrishnan: Department of Biomedical Informatics; Medical Scientist Training Program; Department of Computational and Systems Biology; and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, USA
11. Learning Parsimonious Classification Rules from Gene Expression Data Using Bayesian Networks with Local Structure. Data 2017; 2(1):5. PMID: 28331847; PMCID: PMC5358670; DOI: 10.3390/data2010005.
Abstract
The comprehensibility of good predictive models learned from high-dimensional gene expression data is attractive because it can lead to biomarker discovery. Several good classifiers provide comparable predictive performance but differ in their abilities to summarize the observed data. We extend a Bayesian Rule Learning (BRL-GSS) algorithm, previously shown to be a significantly better predictor than other classical approaches in this domain. It searches a space of Bayesian networks using a decision tree representation of its parameters with global constraints, and infers a set of IF-THEN rules. The number of parameters, and therefore the number of rules, grows combinatorially with the number of predictor variables in the model. We relax these global constraints to a more generalizable local structure (BRL-LSS). BRL-LSS entails a more parsimonious set of rules because it does not have to generate all combinatorial rules. The search space of local structures is much richer than the space of global structures. We design BRL-LSS with the same worst-case time complexity as BRL-GSS while exploring a richer and more complex model space. We measure predictive performance using area under the ROC curve (AUC) and accuracy, and model parsimony by the average number of rules and variables needed to describe the observed data. We evaluate the predictive and parsimony performance of BRL-GSS, BRL-LSS, and the state-of-the-art C4.5 decision tree algorithm across 10-fold cross-validation on ten microarray gene-expression diagnostic datasets. In these experiments, we observe that BRL-LSS is similar to BRL-GSS in terms of predictive performance while generating a much more parsimonious set of rules to explain the same observed data. BRL-LSS also needs fewer variables than C4.5 to explain the data with similar predictive performance. We also conduct a feasibility study to demonstrate the general applicability of our BRL methods to the newer RNA-sequencing gene-expression data.
12. Gallo CA, Cecchini RL, Carballido JA, Micheletto S, Ponzoni I. Discretization of gene expression data revised. Brief Bioinform 2015; 17:758-70. PMID: 26438418; DOI: 10.1093/bib/bbv074.
Abstract
Gene expression measurements represent the most important source of biological data used to unveil the interaction and functionality of genes. In this regard, several data mining and machine learning algorithms have been proposed that require, in a number of cases, some kind of data discretization to perform the inference. Selection of an appropriate discretization process has a major impact on the design and outcome of the inference algorithms, as there are a number of relevant issues that need to be considered. This study presents a revision of the current state-of-the-art discretization techniques, together with the key subjects that need to be considered when designing or selecting a discretization approach for gene expression data.
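Two of the most common unsupervised techniques such reviews cover, equal-width and equal-frequency binning, behave very differently on skewed expression values (an illustrative stdlib sketch; the data are invented):

```python
def equal_width_bins(values, k):
    """Split the observed range [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    w = (hi - lo) / k
    return [min(int((v - lo) / w), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Rank-based binning: each bin receives roughly len(values)/k observations."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels

# Skewed 'expression' vector: a single outlier dominates the range.
expr = [0.1, 0.2, 0.3, 0.4, 10.0]
print(equal_width_bins(expr, 2))      # [0, 0, 0, 0, 1] -- the outlier claims a bin alone
print(equal_frequency_bins(expr, 2))  # [0, 0, 0, 1, 1] -- bins balanced by count
```

This sensitivity to outliers and skew is one of the design issues the review flags when selecting a discretization approach for expression data.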
13. Gopalakrishnan V, Menon PG, Madan S. cMRI-BED: A novel informatics framework for cardiac MRI biomarker extraction and discovery applied to pediatric cardiomyopathy classification. Biomed Eng Online 2015; 14 Suppl 2:S7. PMID: 26329721; PMCID: PMC4547147; DOI: 10.1186/1475-925x-14-s2-s7.
Abstract
Background Pediatric cardiomyopathies are a rare, yet heterogeneous group of pathologies of the myocardium that are routinely examined clinically using Cardiovascular Magnetic Resonance Imaging (cMRI). This powerful, non-invasive gold-standard tool yields high-resolution temporal images that characterize myocardial tissue. The complexities associated with the annotation of images and extraction of markers necessitate the development of efficient workflows to acquire, manage, and transform this data into actionable knowledge for patient care, to reduce mortality and morbidity. Methods We develop and test a novel informatics framework called cMRI-BED for biomarker extraction and discovery from such complex pediatric cMRI data, which includes the use of a suite of tools for image processing, marker extraction, and predictive modeling. We applied our workflow to obtain and analyze a dataset of 83 de-identified cases and controls containing cMRI-derived biomarkers for classifying positive versus negative findings of cardiomyopathy in children. Bayesian rule learning (BRL) methods were applied to derive understandable models in the form of propositional rules with posterior probabilities pertaining to their validity. Popular machine learning methods in the WEKA data mining toolkit were applied using default parameters to assess the cross-validation performance of this dataset using accuracy and percentage area under the ROC curve (AUC) measures. Results The best 10-fold cross-validation predictive performance obtained on this cMRI-derived biomarker dataset was 80.72% accuracy and 79.6% AUC, by a BRL decision tree model, which is promising for this type of rare data. Moreover, we were able to verify that myocardial delayed enhancement (MDE) status, which is known to be an important qualitative factor in the classification of cardiomyopathies, is picked up by our rule models as an important variable for prediction. Conclusions Preliminary results show the feasibility of our framework for processing such data while also yielding actionable predictive classification rules that can augment knowledge conveyed in cardiac radiology outcome reports. Interactions between MDE status and other cMRI parameters that are depicted in our rules warrant further investigation and validation. Predictive rules learned from cMRI data to classify positive and negative findings of cardiomyopathy can enhance scientific understanding of the underlying interactions among imaging-derived parameters.
14. Ogoe HA, Visweswaran S, Lu X, Gopalakrishnan V. Knowledge transfer via classification rules using functional mapping for integrative modeling of gene expression data. BMC Bioinformatics 2015. PMID: 26202217; PMCID: PMC4512094; DOI: 10.1186/s12859-015-0643-8.
Abstract
Background Most ‘transcriptomic’ data from microarrays are generated from small sample sizes compared to the large number of measured biomarkers, making it very difficult to build accurate and generalizable disease state classification models. Integrating information from different, but related, ‘transcriptomic’ data may help build better classification models. However, most proposed methods for integrative analysis of ‘transcriptomic’ data cannot incorporate domain knowledge, which can improve model performance. To this end, we have developed a methodology that leverages transfer rule learning and functional modules, which we call TRL-FM, to capture and abstract domain knowledge in the form of classification rules to facilitate integrative modeling of multiple gene expression data. TRL-FM is an extension of the transfer rule learner (TRL) that we developed previously. The goal of this study was to test our hypothesis that “an integrative model obtained via the TRL-FM approach outperforms traditional models based on single gene expression data sources”. Results To evaluate the feasibility of the TRL-FM framework, we compared the area under the ROC curve (AUC) of models developed with TRL-FM and other traditional methods, using 21 microarray datasets generated from three studies on brain cancer, prostate cancer, and lung disease, respectively. The results show that TRL-FM statistically significantly outperforms TRL as well as traditional models based on single source data. In addition, TRL-FM performed better than other integrative models driven by meta-analysis and cross-platform data merging. Conclusions The capability of utilizing transferred abstract knowledge derived from source data using feature mapping enables the TRL-FM framework to mimic the human process of learning and adaptation when performing related tasks. The novel TRL-FM methodology for integrative modeling of multiple ‘transcriptomic’ datasets is able to intelligently incorporate domain knowledge that traditional methods might disregard, to boost predictive power and generalization performance. In this study, TRL-FM’s abstraction of knowledge is achieved in the form of functional modules, but the overall framework is generalizable in that different approaches of acquiring abstract knowledge can be integrated into this framework. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0643-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Henry A Ogoe: Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
- Shyam Visweswaran: Department of Biomedical Informatics; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, USA
- Xinghua Lu: Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, USA
- Vanathi Gopalakrishnan: Department of Biomedical Informatics; Intelligent Systems Program; Department of Computational & Systems Biology, University of Pittsburgh, Pittsburgh, USA
15. Naeini MP, Cooper GF, Hauskrecht M. Binary Classifier Calibration Using a Bayesian Non-Parametric Approach. Proceedings of the SIAM International Conference on Data Mining 2015; 2015:208-216. PMID: 26613068; DOI: 10.1137/1.9781611974010.24.
Abstract
Learning probabilistic predictive models that are well calibrated is critical for many prediction and decision-making tasks in data mining. This paper presents two new non-parametric methods for calibrating the outputs of binary classification models: one based on Bayes optimal selection and one based on Bayesian model averaging. The advantage of these methods is that they are independent of the algorithm used to learn a predictive model and can be applied in a post-processing step, after the model is learned. This makes them applicable to a wide variety of machine learning models and methods. These calibration methods, along with other methods, are tested on a variety of datasets in terms of both discrimination and calibration performance. The results show the new methods either outperform or are comparable to the state-of-the-art calibration methods.
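The classic non-parametric baseline these Bayesian methods build on, histogram binning, fits in a few lines (a sketch with invented data; the paper's contribution is to average over many binnings rather than fix one):

```python
def histogram_binning(scores, labels, n_bins=2):
    """Fit a calibration map: each score bin [i/n, (i+1)/n) is mapped to the
    empirical positive rate observed in that bin."""
    sums = [0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    rates = [sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
             for b in range(n_bins)]
    return lambda s: rates[min(int(s * n_bins), n_bins - 1)]

# Overconfident raw scores: the high-score group is right only 75% of the time.
scores = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
calibrate = histogram_binning(scores, labels, n_bins=2)
print(calibrate(0.9))  # 0.75
print(calibrate(0.1))  # 0.25
```

Because the fitted map only reads the model's scores, it is learner-agnostic and runs as a post-processing step, which is the property the abstract highlights.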
16
17
Zaidi AH, Gopalakrishnan V, Kasi PM, Zeng X, Malhotra U, Balasubramanian J, Visweswaran S, Sun M, Flint MS, Davison JM, Hood BL, Conrads TP, Bergman JJ, Bigbee WL, Jobe BA. Evaluation of a 4-protein serum biomarker panel-biglycan, annexin-A6, myeloperoxidase, and protein S100-A9 (B-AMP)-for the detection of esophageal adenocarcinoma. Cancer 2014; 120:3902-13. [PMID: 25100294] [DOI: 10.1002/cncr.28963]
Abstract
BACKGROUND: Esophageal adenocarcinoma (EAC) is associated with a dismal prognosis. The identification of cancer biomarkers can advance the possibility for early detection and better monitoring of tumor progression and/or response to therapy. The authors present results from the development of a serum-based, 4-protein (biglycan, myeloperoxidase, annexin-A6, and protein S100-A9) biomarker panel for EAC.
METHODS: A vertically integrated, proteomics-based biomarker discovery approach was used to identify candidate serum biomarkers for the detection of EAC. Liquid chromatography-tandem mass spectrometry analysis was performed on formalin-fixed, paraffin-embedded tissue samples collected from across the Barrett esophagus (BE)-EAC disease spectrum. The mass spectrometry-based spectral count data were used to guide the selection of candidate serum biomarkers. The serum enzyme-linked immunosorbent assay data were then validated in an independent cohort and used to develop a multiparametric risk-assessment model to predict the presence of disease.
RESULTS: With a minimum threshold of 10 spectral counts, 351 proteins were identified as differentially abundant along the spectrum of BE, high-grade dysplasia, and EAC (P<.05). Eleven proteins from this data set were then tested using enzyme-linked immunosorbent assays in serum samples, of which 5 were significantly elevated in abundance among patients with EAC compared with normal controls, mirroring the trends across the disease spectrum present in the tissue data. Using the serum data, a Bayesian rule-learning predictive model with 4 biomarkers was developed to accurately classify disease class; the cross-validation results for the merged data set yielded an accuracy of 87% and an area under the receiver operating characteristic curve of 93%.
CONCLUSIONS: Serum biomarkers hold significant promise for the early, noninvasive detection of EAC.
Affiliation(s)
- Ali H Zaidi
- Institute for the Treatment of Esophageal and Thoracic Disease, Allegheny Health Network, Pittsburgh, Pennsylvania
18
AbdelRahman SE, Zhang M, Bray BE, Kawamoto K. A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study. BMC Med Inform Decis Mak 2014; 14:41. [PMID: 24886637] [PMCID: PMC4074427] [DOI: 10.1186/1472-6947-14-41]
Abstract
BACKGROUND: The aim of this study was to propose an analytical approach to developing high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time.
METHODS: Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed, and the dataset was divided into a validation dataset and derivation datasets, which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, models were developed by combining (i) statistical analyses to explore the relationships between the validation and derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as strong, regular, or weak.
RESULTS: The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier that averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%.
CONCLUSION: The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
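The pre-processing step described in the abstract (removing variables absent in more than 50% of records) can be sketched as follows; the record structure and field names are invented for illustration.

```python
# Sketch of the abstract's first pre-processing step: drop any
# variable missing in more than half of the records.

def drop_sparse_variables(records, max_missing=0.5):
    """Remove variables absent (None) in more than max_missing of records."""
    n = len(records)
    variables = {k for r in records for k in r}
    keep = {
        v for v in variables
        if sum(1 for r in records if r.get(v) is None) / n <= max_missing
    }
    return [{k: r.get(k) for k in keep} for r in records]

# Hypothetical CHF records: 'bnp' is missing in 3 of 4 rows (75%),
# so it is dropped; 'ef' is missing in only 1 of 4 (25%), so it stays.
records = [
    {"age": 71, "bnp": None, "ef": 35},
    {"age": 64, "bnp": None, "ef": None},
    {"age": 80, "bnp": 900, "ef": 50},
    {"age": 59, "bnp": None, "ef": 45},
]
cleaned = drop_sparse_variables(records)
print(sorted(cleaned[0]))  # ['age', 'ef']
```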
Affiliation(s)
- Samir E AbdelRahman
- Department of Biomedical Informatics, University of Utah, 615 Arapeen Way, Suite 208, Salt Lake City, UT 84092, USA.
19
Balasubramanian JB, Visweswaran S, Cooper GF, Gopalakrishnan V. Selective model averaging with Bayesian rule learning for predictive biomedicine. AMIA Jt Summits Transl Sci Proc 2014; 2014:17-22. [PMID: 25717394] [PMCID: PMC4333697]
Abstract
Accurate disease classification and biomarker discovery remain challenging tasks in biomedicine. In this paper, we develop and test a practical approach to combining evidence from multiple models when making predictions using selective Bayesian model averaging of probabilistic rules. This method is implemented within a Bayesian Rule Learning system and compared to model selection when applied to twelve biomedical datasets using the area under the ROC curve measure of performance. Cross-validation results indicate that selective Bayesian model averaging statistically significantly outperforms model selection on average in these experiments, suggesting that combining predictions from multiple models may lead to more accurate quantification of classifier uncertainty. This approach would directly impact the generation of robust predictions on unseen test data, while also increasing knowledge for biomarker discovery and mechanisms that underlie disease.
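The contrast between model selection and model averaging drawn in this abstract can be illustrated with a minimal sketch. This is not the paper's selective Bayesian model averaging over rule models; it only shows how weighted averaging of several models' probabilities differs from trusting the single highest-scoring model. The weights and predictions below are invented.

```python
# Sketch: model selection vs. weighted model averaging for one
# binary prediction P(class = 1).

def select_best(predictions, weights):
    """Model selection: keep only the highest-weight model's prediction."""
    best = max(range(len(weights)), key=lambda i: weights[i])
    return predictions[best]

def model_average(predictions, weights):
    """Model averaging: combine all predictions, weighted by model score."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total

# Three hypothetical models' probabilities for one test case, with
# unnormalized posterior-like weights.
preds = [0.9, 0.4, 0.3]
weights = [0.5, 0.3, 0.2]
print(select_best(preds, weights))              # 0.9
print(round(model_average(preds, weights), 2))  # 0.63
```

The averaged prediction is pulled toward the dissenting models, which is one way combining models can yield the less over-confident uncertainty estimates the abstract describes.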
Affiliation(s)
- Jeya B. Balasubramanian
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA
- Gregory F. Cooper
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA
- Vanathi Gopalakrishnan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA
20
Maslove DM, Podchiyska T, Lowe HJ. Discretization of continuous features in clinical datasets. J Am Med Inform Assoc 2012; 20:544-53. [PMID: 23059731] [DOI: 10.1136/amiajnl-2012-000929]
Abstract
BACKGROUND: The increasing availability of clinical data from electronic medical records (EMRs) has created opportunities for secondary uses of health information. When used in machine learning classification, many data features must first be transformed by discretization.
OBJECTIVE: To evaluate six discretization strategies, both supervised and unsupervised, using EMR data.
MATERIALS AND METHODS: We classified laboratory data (arterial blood gas (ABG) measurements) and physiologic data (cardiac output (CO) measurements) derived from adult patients in the intensive care unit using decision trees and naïve Bayes classifiers. Continuous features were partitioned using two supervised and four unsupervised discretization strategies. The resulting classification accuracy was compared with that obtained with the original, continuous data.
RESULTS: Supervised methods were more accurate and consistent than unsupervised methods, but tended to produce larger decision trees. Among the unsupervised methods, equal frequency and k-means performed well overall, while equal width was significantly less accurate.
DISCUSSION: This is, we believe, the first dedicated evaluation of discretization strategies using EMR data. It is unlikely that any one discretization method applies universally to EMR data. Performance was influenced by the choice of class labels and, in the case of unsupervised methods, the number of intervals. In selecting the number of intervals there is generally a trade-off between greater accuracy and greater consistency.
CONCLUSIONS: In general, supervised methods yield higher accuracy but are constrained to a single specific application. Unsupervised methods do not require class labels and can produce discretized data that can be used for multiple purposes.
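Two of the unsupervised strategies compared in this abstract, equal-width and equal-frequency binning, can be sketched in plain Python. The toy data below (skewed, like many laboratory values) suggests one reason equal width can perform worse: outliers dominate the bin boundaries. All values are invented.

```python
# Sketch of two unsupervised discretization strategies from the abstract.

def equal_width_bins(values, k):
    """Split the value range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign roughly len(values)/k points to each interval."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    per_bin = len(values) / k
    for rank, i in enumerate(order):
        bins[i] = min(int(rank / per_bin), k - 1)
    return bins

values = [1, 2, 3, 4, 100, 101]  # skewed distribution
print(equal_width_bins(values, 2))      # [0, 0, 0, 0, 1, 1]
print(equal_frequency_bins(values, 2))  # [0, 0, 0, 1, 1, 1]
```

With equal width, the two outliers compress most points into one bin; equal frequency keeps the bin counts balanced, consistent with the abstract's finding that it performed well overall.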
Affiliation(s)
- David M Maslove
- Center for Clinical Informatics, Stanford University School of Medicine, Stanford, CA 94305, USA.
21
Lustgarten JL, Gopalakrishnan V, Grover H, Visweswaran S. Improving classification performance with discretization on biomedical datasets. AMIA Annu Symp Proc 2008; 2008:445-9. [PMID: 18999186] [PMCID: PMC2656082]
Abstract
Discretization acts as a variable selection method in addition to transforming the continuous values of a variable to discrete ones. Machine learning algorithms such as Support Vector Machines and Random Forests have been used for classification in high-dimensional genomic and proteomic data because of their robustness to the dimensionality of the data. We show that discretization can significantly improve the classification performance of these algorithms, as well as of algorithms such as Naïve Bayes that are sensitive to the dimensionality of the data.