Reference Citation Analysis

For: Johansson R, Strålfors P, Cedersund G. Combining test statistics and models in bootstrapped model rejection: it is a balancing act. BMC Syst Biol 2014;8:46. [PMID: 24742065 PMCID: PMC4022267 DOI: 10.1186/1752-0509-8-46] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/11/2013] [Accepted: 04/01/2014] [Indexed: 11/29/2022]



Cited by Other Article(s)

Validation-based model selection for 13C metabolic flux analysis with uncertain measurement errors. PLoS Comput Biol 2022;18:e1009999. [PMID: 35404953 PMCID: PMC9022838 DOI: 10.1371/journal.pcbi.1009999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 04/21/2022] [Accepted: 03/07/2022] [Indexed: 11/26/2022]

Abstract

Accurate measurements of metabolic fluxes in living cells are central to metabolism research and metabolic engineering. The gold standard method is model-based metabolic flux analysis (MFA), where fluxes are estimated indirectly from mass isotopomer data with the use of a mathematical model of the metabolic network. A critical step in MFA is model selection: choosing what compartments, metabolites, and reactions to include in the metabolic network model. Model selection is often done informally during the modelling process, based on the same data that is used for model fitting (estimation data). This can lead to either overly complex models (overfitting) or too simple ones (underfitting), in both cases resulting in poor flux estimates. Here, we propose a method for model selection based on independent validation data. We demonstrate in simulation studies that this method consistently chooses the correct model in a way that is independent of errors in measurement uncertainty. This independence is beneficial, since estimating the true magnitude of these errors can be difficult. In contrast, commonly used model selection methods based on the χ²-test choose different model structures depending on the believed measurement uncertainty; this can lead to errors in flux estimates, especially when the magnitude of the error is substantially off. We present a new approach for quantification of prediction uncertainty of mass isotopomer distributions in other labelling experiments, to check for problems with too much or too little novelty in the validation data. Finally, in an isotope tracing study on human mammary epithelial cells, the validation-based model selection method identified pyruvate carboxylase as a key model component. Our results argue that validation-based model selection should be an integral part of MFA model development.

Measuring metabolic reaction fluxes in living cells is difficult, yet important. The gold standard is to label extracellular metabolites with ¹³C, to use mass spectrometry to find out where the ¹³C-atoms end up, and finally to use mathematical modelling to calculate how quickly each reaction must have flowed for the ¹³C-atoms to end up there. This measurement thus relies on using the right mathematical model, which must be selected among various candidate models. In this manuscript, we present a new way to do this model selection step, utilizing validation data. Using an adopted approach to calculate the uncertainty of model predictions, we identify new validation experiments, which are neither too similar, nor too dissimilar, compared to the previous training data. The model candidate that is best at predicting this new validation data is the one chosen. Tests on simulated data, where the true model is known, show that the validation-based method is robust when the magnitude of the error in the measurement uncertainty is unknown, something that conventional methods are not. This improvement is important since true uncertainties can be difficult to estimate for these data. Finally, we demonstrate how the new method can be used on real data, to identify fluxes and important reactions.
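
The selection step described above (fit each candidate on estimation data, then keep the candidate that best predicts independent validation data) can be illustrated with a toy sketch. This is not the authors' MFA implementation: polynomial degrees stand in for candidate network models, and the function name `select_by_validation`, the simulated data, and the noise level are all assumptions made for illustration.

```python
import numpy as np

def select_by_validation(x_est, y_est, x_val, y_val, degrees):
    """Fit each candidate on estimation data; pick the lowest validation SSR."""
    scores = {}
    for d in degrees:
        coeffs = np.polyfit(x_est, y_est, d)        # fit on estimation data only
        resid = y_val - np.polyval(coeffs, x_val)   # predict the validation data
        scores[d] = float(np.sum(resid ** 2))       # sum of squared residuals
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical simulated data: the "true" model is quadratic.
rng = np.random.default_rng(0)
true_model = lambda x: 1.0 + 2.0 * x - 1.5 * x ** 2
x_est = np.linspace(-1.0, 1.0, 15)
x_val = np.linspace(-0.9, 0.9, 10)
y_est = true_model(x_est) + rng.normal(0.0, 0.1, x_est.size)
y_val = true_model(x_val) + rng.normal(0.0, 0.1, x_val.size)

best, scores = select_by_validation(x_est, y_est, x_val, y_val, degrees=[1, 2, 6])
print(best, scores)
```

In this sketch the ranking of candidates does not involve an assumed measurement standard deviation: dividing every validation SSR by the same believed variance would rescale all scores equally and leave the chosen model unchanged, which is one way to read the robustness property the abstract describes.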


Walsh CG, Sharman K, Hripcsak G. Beyond discrimination: A comparison of calibration methods and clinical usefulness of predictive models of readmission risk. J Biomed Inform 2017;76:9-18. [PMID: 29079501 DOI: 10.1016/j.jbi.2017.10.008] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 12/20/2016] [Revised: 09/11/2017] [Accepted: 10/14/2017] [Indexed: 11/26/2022]

Abstract

BACKGROUND

Prior to implementing predictive models in novel settings, analyses of calibration and clinical usefulness remain as important as discrimination, but they are not frequently discussed. Calibration is a model's reflection of actual outcome prevalence in its predictions. Clinical usefulness refers to the utilities, costs, and harms of using a predictive model in practice. A decision analytic approach to calibrating and selecting an optimal intervention threshold may help maximize the impact of readmission risk and other preventive interventions.

OBJECTIVES

To select a pragmatic means of calibrating predictive models that requires a minimum amount of validation data and that performs well in practice. To evaluate the impact of miscalibration on utility and cost via clinical usefulness analyses.

MATERIALS AND METHODS

Observational, retrospective cohort study with electronic health record data from 120,000 inpatient admissions at an urban, academic center in Manhattan. The primary outcome was thirty-day readmission for three causes: all-cause, congestive heart failure, and chronic coronary atherosclerotic disease. Predictive modeling was performed via L1-regularized logistic regression. Calibration methods were compared including Platt Scaling, Logistic Calibration, and Prevalence Adjustment. Performance of predictive modeling and calibration was assessed via discrimination (c-statistic), calibration (Spiegelhalter Z-statistic, Root Mean Square Error [RMSE] of binned predictions, Sanders and Murphy Resolutions of the Brier Score, Calibration Slope and Intercept), and clinical usefulness (utility terms represented as costs). The amount of validation data necessary to apply each calibration algorithm was also assessed.

RESULTS

C-statistics by diagnosis ranged from 0.7 for all-cause readmission to 0.86 (0.78-0.93) for congestive heart failure. Logistic Calibration and Platt Scaling performed best; detecting this difference required analyzing multiple calibration metrics simultaneously, in particular Calibration Slopes and Intercepts. Clinical usefulness analyses provided optimal risk thresholds, which varied by reason for readmission, outcome prevalence, and calibration algorithm. Utility analyses also suggested maximum tolerable intervention costs, e.g., $1720 for all-cause readmissions based on a published cost of readmission of $11,862.

CONCLUSIONS

Choice of calibration method depends on availability of validation data and on performance. Improperly calibrated models may contribute to higher costs of intervention as measured via clinical usefulness. Decision-makers must understand the underlying utilities or costs inherent in the use case at hand to assess usefulness; doing so yields the optimal risk threshold to trigger intervention, along with intervention cost limits.
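
As a rough illustration of one calibration method named in the abstract, the sketch below fits a sigmoid p = 1 / (1 + exp(-(a·s + b))) to raw model scores s, which is the basic idea behind Platt scaling. It is a minimal sketch under stated assumptions, not the study's procedure: the helper names `platt_scale` and `apply_platt`, the simulated overconfident scores, and the use of plain 0/1 labels (Platt's original method uses smoothed targets) are all choices made here for illustration.

```python
import numpy as np

def platt_scale(scores, labels, n_iter=50):
    """Fit p = sigmoid(a * score + b) by Newton's method on the log-loss."""
    X = np.column_stack([scores, np.ones_like(scores)])
    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - labels)                       # gradient of the log-loss
        hess = X.T @ (X * (p * (1.0 - p))[:, None])     # Hessian of the log-loss
        w -= np.linalg.solve(hess + 1e-9 * np.eye(2), grad)
    return w  # (a, b)

def apply_platt(scores, w):
    """Map raw scores to calibrated probabilities."""
    return 1.0 / (1.0 + np.exp(-(w[0] * scores + w[1])))

# Hypothetical validation set: raw scores are overconfident (logits inflated 3x).
rng = np.random.default_rng(1)
true_logit = rng.normal(-1.5, 1.0, 2000)
labels = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
raw_scores = 3.0 * true_logit

w = platt_scale(raw_scores, labels)
calibrated = apply_platt(raw_scores, w)
print(w, calibrated.mean(), labels.mean())
```

Because the fit includes an intercept, the mean calibrated probability matches the observed outcome prevalence in the calibration set once the fit has converged, which loosely mirrors the abstract's description of calibration as reflecting actual outcome prevalence.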


Identifying Novel Transcriptional Regulators with Circadian Expression. Mol Cell Biol 2015;36:545-58. [PMID: 26644408 DOI: 10.1128/mcb.00701-15] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/16/2015] [Accepted: 11/19/2015] [Indexed: 01/06/2023]

Hasdemir D, Hoefsloot HCJ, Smilde AK. Validation and selection of ODE based systems biology models: how to arrive at more reliable decisions. BMC Syst Biol 2015;9:32. [PMID: 26152206 PMCID: PMC4493957 DOI: 10.1186/s12918-015-0180-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/17/2014] [Accepted: 06/16/2015] [Indexed: 01/07/2023]

Abstract

Background

Most ordinary differential equation (ODE) based modeling studies in systems biology involve a hold-out validation step for model validation. In this framework, a pre-determined part of the data is used as validation data and is therefore not used for estimating the parameters of the model. The model is assumed to be validated if the model predictions on the validation dataset show good agreement with the data. Model selection between alternative model structures can also be performed in the same setting, based on the predictive power of the model structures on the validation dataset. However, drawbacks associated with this approach are usually underestimated.

Results

We have carried out simulations by using a recently published High Osmolarity Glycerol (HOG) pathway from S. cerevisiae to demonstrate these drawbacks. We have shown that it is very important how the data is partitioned and which part of the data is used for validation purposes. The hold-out validation strategy leads to biased conclusions, since it can lead to different validation and selection decisions when different partitioning schemes are used. Furthermore, finding sensible partitioning schemes that would lead to reliable decisions is heavily dependent on the biology and unknown model parameters, which turns the problem into a paradox. This brings the need for alternative validation approaches that offer flexible partitioning of the data. For this purpose, we have introduced a stratified random cross-validation (SRCV) approach that successfully overcomes these limitations.

Conclusions

SRCV leads to more stable decisions for both validation and selection which are not biased by underlying biological phenomena. Furthermore, it is less dependent on the specific noise realization in the data. Therefore, it proves to be a promising alternative to the standard hold-out validation strategy.
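
The abstract does not spell out how SRCV partitions the data, so the following is only a generic sketch of stratified random fold assignment, offered to make the contrast with a fixed hold-out split concrete. The function name `stratified_random_folds`, the choice of experimental condition as the stratum, and the example labels are assumptions, not details from the paper.

```python
import numpy as np

def stratified_random_folds(strata, k, seed=0):
    """Randomly assign samples to k folds so each stratum is spread across folds."""
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    folds = np.empty(strata.size, dtype=int)
    for s in np.unique(strata):
        # Shuffle the members of this stratum, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(strata == s))
        folds[idx] = np.arange(idx.size) % k
    return folds

# Hypothetical example: 18 measurements from three experimental conditions.
strata = np.repeat(["low", "mid", "high"], 6)
folds = stratified_random_folds(strata, k=3)

for f in range(3):
    held_out = strata[folds == f]
    print(f, sorted(held_out))
```

Each fold then serves once as validation data while the remaining folds are used for estimation, so no single pre-determined partition decides the outcome; changing `seed` gives a different random partition with the same per-stratum balance.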



Comulada WS. Model specification and bootstrapping for multiply imputed data: An application to count models for the frequency of alcohol use. Stata J 2015;15:833-844. [PMID: 26973439 PMCID: PMC4782976] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]