151
Gerds TA, Andersen PK, Kattan MW. Calibration plots for risk prediction models in the presence of competing risks. Stat Med 2014; 33:3191-203. [DOI: 10.1002/sim.6152] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 07/02/2013] [Revised: 12/20/2013] [Accepted: 02/27/2014] [Indexed: 11/07/2022]
Affiliation(s)
- Thomas A. Gerds
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Per K. Andersen
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
- Michael W. Kattan
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
152
Pezzini A, Grassi M, Lodigiani C, Patella R, Gandolfo C, Zini A, Delodovici ML, Paciaroni M, Del Sette M, Toriello A, Musolino R, Calabrò RS, Bovi P, Adami A, Silvestrelli G, Sessa M, Cavallini A, Marcheselli S, Bonifati DM, Checcarelli N, Tancredi L, Chiti A, Del Zotto E, Spalloni A, Giossi A, Volonghi I, Costa P, Giacalone G, Ferrazzi P, Poli L, Morotti A, Rasura M, Simone AM, Gamba M, Cerrato P, Micieli G, Melis M, Massucco D, De Giuli V, Iacoviello L, Padovani A. Predictors of long-term recurrent vascular events after ischemic stroke at young age: the Italian Project on Stroke in Young Adults. Circulation 2014; 129:1668-76. [PMID: 24508827 DOI: 10.1161/circulationaha.113.005663] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 11/16/2022]
Abstract
BACKGROUND Data on long-term risk and predictors of recurrent thrombotic events after ischemic stroke at a young age are limited. METHODS AND RESULTS We followed 1867 patients with first-ever ischemic stroke who were 18 to 45 years of age (mean age, 36.8±7.1 years; women, 49.0%), as part of the Italian Project on Stroke in Young Adults (IPSYS). Median follow-up was 40 months (25th to 75th percentile, 53). The primary end point was a composite of ischemic stroke, transient ischemic attack, myocardial infarction, or other arterial events. One hundred sixty-three patients had recurrent thrombotic events (average rate, 2.26 per 100 person-years at risk). At 10 years, cumulative risk was 14.7% (95% confidence interval, 12.2%-17.9%) for primary end point, 14.0% (95% confidence interval, 11.4%-17.1%) for brain ischemia, and 0.7% (95% confidence interval, 0.4%-1.3%) for myocardial infarction or other arterial events. Familial history of stroke, migraine with aura, circulating antiphospholipid antibodies, discontinuation of antiplatelet and antihypertensive medications, and any increase of 1 traditional vascular risk factor were independent predictors of the composite end point in multivariable Cox proportional hazards analysis. A point-scoring system for each variable was generated by their β-coefficients, and a predictive score (IPSYS score) was calculated as the sum of the weighted scores. The area under the receiver operating characteristic curve of the 0- to 5-year score was 0.66 (95% confidence interval, 0.61-0.71; mean, 10-fold internally cross-validated area under the receiver operating characteristic curve, 0.65). CONCLUSIONS Among patients with ischemic stroke aged 18 to 45 years, the long-term risk of recurrent thrombotic events is associated with modifiable, age-specific risk factors. The IPSYS score may serve as a simple tool for risk estimation.
Affiliation(s)
- Alessandro Pezzini
- Dipartimento di Scienze Mediche e Chirurgiche, Clinica Neurologica, Università degli Studi di Brescia, Brescia, Italia (A. Pezzini, P.C., L.P., A.M., V.D.G., A. Padovani; Dipartimento di Scienze del Sistema Nervoso e del Comportamento, Unità di Statistica Medica e Genomica, Università di Pavia, Pavia, Italia (M.G.); Centro Trombosi, IRCCS Istituto Clinico Humanitas, Rozzano-Milano, Italia (C.L., P.F.); Stroke Unit, Azienda Ospedaliera Sant'Andrea, Roma, Italia (R.P., A.S., M.R.); Dipartimento di Neuroscienze, Riabilitazione, Oftalmologia, Genetica e Scienze Materno-Infantili, Università di Genova, Genova, Italia (C.G., D.M.); Stroke Unit, Clinica Neurologica, Nuovo Ospedale Civile "S. Agostino Estense", AUSL Modena, Italia (A.Z., A.M.S.); Unità di Neurologia, Ospedale di Circolo, Università dell'Insubria, Varese, Italia (M.L.D.); Stroke Unit, Divisione di Medicina Cardiovascolare, Università di Perugia, Perugia, Italia (M.P.); Unità di Neurologia, Ospedale S. Andrea, La Spezia, Italia (M.D.S.); U.O.C. Neurologia, A.O Universitaria "San Giovanni di Dio e Ruggi d'Aragona", Salerno, Italia (A.T.); Dipartimento di Neuroscienze, Scienze Psichiatriche e Anestesiologiche, Clinica Neurologica, Università di Messina, Messina, Italia (R.M.); Istituto di Ricovero e Cura a Carattere Scientifico, Centro Neurolesi Bonino-Pulejo, Messina, Italia (R.S.C.); UO Neurologia, Azienda Ospedaliera-Universitaria Borgo Trento, Verona, Italia (P.B.); Stroke Center, Dipartimento di Neurologia, Ospedale Sacro Cuore Negrar, Verona, Italia (A.A.); Stroke Unit, U.O Neurologia, Azienda Ospedaliera "C. Poma", Mantova, Italia (G.S.); Stroke Unit, U.O Neurologia, IRCCS Ospedale S. Raffaele, Milano, Italia (M.S., G.G.); U.C Malattie Cerebrovascolari e Stroke Unit (A.C.) and U.C Neurologia d'Urgenza (G.M.), IRCCS Fondazione Istituto Neurologico Nazionale "C. Mondino," Pavia, Italia; Neurologia d'Urgenza and Stroke Unit, IRCCS Istituto Clinico Humanitas, Rozzano-Milano, Italia (S.M.); Stroke Un
153
Wolbers M, Blanche P, Koller MT, Witteman JCM, Gerds TA. Concordance for prognostic models with competing risks. Biostatistics 2014; 15:526-39. [PMID: 24493091 PMCID: PMC4059461 DOI: 10.1093/biostatistics/kxt059] [Citation(s) in RCA: 129] [Impact Index Per Article: 12.9] [Indexed: 11/17/2022] Open
Abstract
The concordance probability is a widely used measure to assess discrimination of prognostic models with binary and survival endpoints. We formally define the concordance probability for a prognostic model of the absolute risk of an event of interest in the presence of competing risks and relate it to recently proposed time-dependent area under the receiver operating characteristic curve measures. For right-censored data, we investigate inverse probability of censoring weighted (IPCW) estimates of a truncated concordance index based on a working model for the censoring distribution. We demonstrate consistency and asymptotic normality of the IPCW estimate if the working model is correctly specified and derive an explicit formula for the asymptotic variance under independent censoring. The small sample properties of the estimator are assessed in a simulation study also against misspecification of the working model. We further illustrate the methods by computing the concordance probability for a prognostic model of coronary heart disease (CHD) events in the presence of the competing risk of non-CHD death.
Affiliation(s)
- Marcel Wolbers
- Oxford University Clinical Research Unit, Wellcome Trust Major Overseas Programme, Ho Chi Minh City, Viet Nam and Centre for Tropical Medicine, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7FZ, UK
- Paul Blanche
- Université Bordeaux Segalen, ISPED, INSERM U897, F-33000 Bordeaux, France
- Michael T Koller
- Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, 4031 Basel, Switzerland
- Thomas A Gerds
- Department of Biostatistics, University of Copenhagen, 1014 Copenhagen K, Denmark
154
Waljee AK, Higgins PDR, Singal AG. A primer on predictive models. Clin Transl Gastroenterol 2014; 5:e44. [PMID: 24384866 PMCID: PMC3912317 DOI: 10.1038/ctg.2013.19] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 06/20/2013] [Revised: 09/26/2013] [Accepted: 11/06/2013] [Indexed: 12/21/2022] Open
Abstract
Prediction research is becoming increasingly popular; however, the differences between traditional explanatory research and prediction research are often poorly understood, resulting in a wide variation in the methodologic quality of prediction research. This primer describes the basic methods for conducting prediction research in gastroenterology and highlights differences between traditional explanatory research and predictive research.
Affiliation(s)
- Akbar K Waljee
- [1] Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA [2] Veterans Affairs Center for Clinical Management Research, Ann Arbor, Michigan, USA
- Peter D R Higgins
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Amit G Singal
- [1] Department of Internal Medicine, UT Southwestern Medical Center, Dallas, Texas, USA [2] Department of Clinical Sciences, University of Texas Southwestern, Dallas, Texas, USA
155
Predictive modeling for diagnostic tests with high specificity, but low sensitivity: a study of the glycerol test in patients with suspected Menière's disease. PLoS One 2013; 8:e79315. [PMID: 24260193 PMCID: PMC3832512 DOI: 10.1371/journal.pone.0079315] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/16/2013] [Accepted: 09/20/2013] [Indexed: 11/20/2022] Open
Abstract
A high specificity does not ensure that the expected benefit of a diagnostic test outweighs its cost. Problems arise, in particular, when the investigation is expensive, the prevalence of a positive test result is relatively small for the candidate patients, and the sensitivity of the test is low so that the information provided by a negative result is virtually negligible. The consequence may be that a potentially useful test does not gain broader acceptance. Here we show how predictive modeling can help to identify patients for whom the ratio of expected benefit and cost reaches an acceptable level so that testing these patients is reasonable even though testing all patients might be considered wasteful. Our application example is based on a retrospective study of the glycerol test, which is used to corroborate a suspected diagnosis of Menière’s disease. Using the pretest hearing thresholds at up to 10 frequencies, predictions were made by K-nearest neighbor classification or logistic regression. Both methods estimate, based on results from previous patients, the posterior probability that performing the considered test in a new patient will have a positive outcome. The quality of the prediction was evaluated using leave-one-out cross-validation, making various assumptions about the costs and benefits of testing. With reference to all 356 cases, the probability of a positive test result was almost 0.4. For subpopulations selected by K-nearest neighbor classification, which was clearly superior to logistic regression, this probability could be increased up to about 0.6. Thus, the odds of a positive test result were more than doubled.
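The closing claim of this abstract (a probability rising from almost 0.4 to about 0.6 means the odds more than doubled) can be checked with a short calculation. The sketch below uses the rounded figures quoted in the abstract, so the resulting ratio is only approximate; the function name `odds` is an illustrative choice, not taken from the paper.

```python
def odds(p: float) -> float:
    """Convert a probability p into odds p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


# Rounded probabilities quoted in the abstract (assumed exact here for illustration).
baseline_odds = odds(0.4)   # all 356 cases
selected_odds = odds(0.6)   # subpopulation selected by K-nearest neighbor classification

# Ratio of the two odds; a value above 2 matches "more than doubled".
odds_ratio = selected_odds / baseline_odds
print(f"baseline odds = {baseline_odds:.3f}, selected odds = {selected_odds:.3f}, ratio = {odds_ratio:.2f}")
```

With these rounded inputs the odds go from about 0.67 to 1.5, a ratio of about 2.25.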
156
Wang W, Baggerly KA, Knudsen S, Askaa J, Mazin W, Coombes KR. Independent validation of a model using cell line chemosensitivity to predict response to therapy. J Natl Cancer Inst 2013; 105:1284-91. [PMID: 23964133 DOI: 10.1093/jnci/djt202] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Methods using cell line microarray and drug sensitivity data to predict patients' chemotherapy response are appealing, but groups may be reluctant to release details to preserve intellectual property. Here we describe a case study to validate predictions while treating the methods as a "black box." METHODS Medical Prognosis Institute (MPI) constructed cell-line-derived sensitivity scores (SSs) and combined scores (CSs) that incorporate clinical variables. MD Anderson researchers evaluated their predictions. We searched the Gene Expression Omnibus (GEO) to identify validation datasets, and we performed statistical evaluation of the agreement between prediction and clinical observation. RESULTS We identified 3 suitable datasets: GSE16446 (n = 120; binary outcome), GSE17920 (n = 130; binary outcome), and GSE10255 (n = 161; continuous and time-to-event outcomes). The SS was statistically significantly associated with primary treatment responses for all studies (GSE16446: P = .02; GSE17920: P = .02; GSE10255: P = .02). Dichotomized SSs performed no better than chance for GSE16446 and GSE17920, and categorized SSs did not predict disease-free survival (GSE10255). SSs sometimes improved on predictions using clinical variables (GSE16446: P = .05; GSE17920: P = .31; GSE10255: P = .045), but gains were limited (95% confidence intervals for GSE16446 and GSE17920 include 0). The CS did not predict treatment response for GSE16446 (P = .55), but it did for GSE17920 (P < .001). Coefficients of clinical variables provided by MPI for CSs agree with estimates for GSE17920 better than estimates for GSE16446. CONCLUSIONS Model predictions were better than chance in all three datasets. However, these scores added little to existing clinical predictors; statistically significant contributions were likely to be too small to change clinical practice. 
These findings suggest that discovering better predictors will require both cell line data and a clinical training dataset of patient samples.
Affiliation(s)
- Wenting Wang
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA
157
Cohen ME, Ko CY, Bilimoria KY, Zhou L, Huffman K, Wang X, Liu Y, Kraemer K, Meng X, Merkow R, Chow W, Matel B, Richards K, Hart AJ, Dimick JB, Hall BL. Optimizing ACS NSQIP modeling for evaluation of surgical quality and risk: patient risk adjustment, procedure mix adjustment, shrinkage adjustment, and surgical focus. J Am Coll Surg 2013; 217:336-46.e1. [PMID: 23628227 DOI: 10.1016/j.jamcollsurg.2013.02.027] [Citation(s) in RCA: 426] [Impact Index Per Article: 38.7] [Received: 02/18/2013] [Accepted: 02/26/2013] [Indexed: 12/17/2022]
Abstract
The American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) collects detailed clinical data from participating hospitals using standardized data definitions, analyzes these data, and provides participating hospitals with reports that permit risk-adjusted comparisons with a surgical quality standard. Since its inception, the ACS NSQIP has worked to refine surgical outcomes measurements and enhance statistical methods to improve the reliability and validity of this hospital profiling. From an original focus on controlling for between-hospital differences in patient risk factors with logistic regression, ACS NSQIP has added a variable to better adjust for the complexity and risk profile of surgical procedures (procedure mix adjustment) and stabilized estimates derived from small samples by using a hierarchical model with shrinkage adjustment. New models have been developed focusing on specific surgical procedures (eg, "Procedure Targeted" models), which provide opportunities to incorporate indication and other procedure-specific variables and outcomes to improve risk adjustment. In addition, comparative benchmark reports given to participating hospitals have been expanded considerably to allow more detailed evaluations of performance. Finally, procedures have been developed to estimate surgical risk for individual patients. This article describes the development of, and justification for, these new statistical methods and reporting strategies in ACS NSQIP.
Affiliation(s)
- Mark E Cohen
- Division of Research and Optimal Patient Care, American College of Surgeons, Chicago, IL 60611-3211, USA.
158
Prediction Models for Postpartum Urinary and Fecal Incontinence in Primiparous Women. Female Pelvic Med Reconstr Surg 2013; 19:110-8. [DOI: 10.1097/spv.0b013e31828508f0] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
159
Leusink FKJ, van Es RJJ, de Bree R, Baatenburg de Jong RJ, van Hooff SR, Holstege FCP, Slootweg PJ, Brakenhoff RH, Takes RP. Novel diagnostic modalities for assessment of the clinically node-negative neck in oral squamous-cell carcinoma. Lancet Oncol 2013. [PMID: 23182196 DOI: 10.1016/s1470-2045(12)70395-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/14/2022]
Abstract
Oral squamous-cell carcinomas arise in mucosal linings of the oral cavity and frequently metastasise to regional lymph nodes in the neck. The presence of nodal metastases is a determinant of prognosis and clinical management. The neck is staged by palpation and imaging, but accuracy of these techniques to detect small metastases is low. In general, 30-40% of patients will have occult nodal disease and will develop clinically detectable lymph-node metastases when the neck is left untreated. The choice at present is either elective treatment or careful observation followed by treatment of the neck in patients who develop manifest metastases. These unsatisfying therapeutic options have been the subject of debate for decades. Recent developments in staging of the neck, including expression profiling and sentinel lymph-node biopsy, will allow more personalised management of the neck.
Affiliation(s)
- Frank K J Leusink
- Department of Oral and Maxillofacial Surgery, University Medical Centre Utrecht, Utrecht, Netherlands.
160
Debray TPA, Moons KGM, Ahmed I, Koffijberg H, Riley RD. A framework for developing, implementing, and evaluating clinical prediction models in an individual participant data meta-analysis. Stat Med 2013; 32:3158-80. [PMID: 23307585 DOI: 10.1002/sim.5732] [Citation(s) in RCA: 130] [Impact Index Per Article: 11.8] [Received: 06/20/2012] [Accepted: 12/18/2012] [Indexed: 11/10/2022]
Abstract
The use of individual participant data (IPD) from multiple studies is an increasingly popular approach when developing a multivariable risk prediction model. Corresponding datasets, however, typically differ in important aspects, such as baseline risk. This has driven the adoption of meta-analytical approaches for appropriately dealing with heterogeneity between study populations. Although these approaches provide an averaged prediction model across all studies, little guidance exists about how to apply or validate this model to new individuals or study populations outside the derivation data. We consider several approaches to develop a multivariable logistic regression model from an IPD meta-analysis (IPD-MA) with potential between-study heterogeneity. We also propose strategies for choosing a valid model intercept for when the model is to be validated or applied to new individuals or study populations. These strategies can be implemented by the IPD-MA developers or future model validators. Finally, we show how model generalizability can be evaluated when external validation data are lacking using internal-external cross-validation and extend our framework to count and time-to-event data. In an empirical evaluation, our results show how stratified estimation allows study-specific model intercepts, which can then inform the intercept to be used when applying the model in practice, even to a population not represented by included studies. In summary, our framework allows the development (through stratified estimation), implementation in new individuals (through focused intercept choice), and evaluation (through internal-external validation) of a single, integrated prediction model from an IPD-MA in order to achieve improved model performance and generalizability.
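The internal-external cross-validation mentioned in this abstract can be pictured as a leave-one-study-out loop: fit on all studies but one, evaluate on the held-out study, and repeat for each study. The sketch below is only an illustration under strong simplifying assumptions. The function names, the toy data, and the intercept-only "model" (the pooled outcome prevalence) are hypothetical and are not the authors' framework, which uses stratified multivariable logistic regression.

```python
from typing import Callable, Dict, List


def internal_external_cv(
    studies: Dict[str, List[int]],
    fit: Callable[[List[int]], float],
    evaluate: Callable[[float, List[int]], float],
) -> Dict[str, float]:
    """Hold out each study in turn, fit on the rest, evaluate on the held-out study."""
    results = {}
    for held_out in studies:
        # Pool the outcomes of every study except the held-out one.
        train = [y for name, ys in studies.items() if name != held_out for y in ys]
        model = fit(train)
        results[held_out] = evaluate(model, studies[held_out])
    return results


def fit_prevalence(outcomes: List[int]) -> float:
    """Toy 'model': predicted risk equals the pooled outcome prevalence."""
    return sum(outcomes) / len(outcomes)


def brier(risk: float, outcomes: List[int]) -> float:
    """Mean squared difference between a constant predicted risk and the outcomes."""
    return sum((risk - y) ** 2 for y in outcomes) / len(outcomes)


# Hypothetical binary outcomes for three studies with different baseline risks.
toy_studies = {
    "A": [0, 0, 1, 1],
    "B": [0, 0, 0, 1],
    "C": [1, 1, 1, 0],
}
cv_results = internal_external_cv(toy_studies, fit_prevalence, brier)
print(cv_results)
```

In this toy setting the held-out performance is worse for studies B and C, whose baseline risk differs from the pooled prevalence, which is the kind of between-study heterogeneity the abstract addresses through the choice of intercept.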
Affiliation(s)
- Thomas P A Debray
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
161
Ling Y, Johnson MK, Kiely DG, Condliffe R, Elliot CA, Gibbs JSR, Howard LS, Pepke-Zaba J, Sheares KKK, Corris PA, Fisher AJ, Lordan JL, Gaine S, Coghlan JG, Wort SJ, Gatzoulis MA, Peacock AJ. Changing Demographics, Epidemiology, and Survival of Incident Pulmonary Arterial Hypertension. Am J Respir Crit Care Med 2012; 186:790-6. [DOI: 10.1164/rccm.201203-0383oc] [Citation(s) in RCA: 382] [Impact Index Per Article: 31.8] [Indexed: 12/17/2022] Open
162
Petretta M, Cuocolo A. Prediction models for risk classification in cardiovascular disease. Eur J Nucl Med Mol Imaging 2012; 39:1959-69. [PMID: 23053326 DOI: 10.1007/s00259-012-2254-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 06/28/2012] [Accepted: 09/12/2012] [Indexed: 10/27/2022]
Abstract
Risk stratification is an increasingly important tool for the management of patients with different diseases and also for decision making in subjects not yet with overt disease but who are at risk of disease in the short or long term or during their lifetime. Careful risk assessment in the individual patient, based on clinical, laboratory and imaging data, can be helpful for making decisions about treatment or other prevention strategies. As regards cardiovascular disease, many models have been suggested and are available for the prediction of diagnosis and prognosis and there are several algorithms for risk prediction. However, current risk screening methods are not perfect. This review evaluates relative strengths and limitations of traditional and more recent methods for assessing the performance of prediction models.
Affiliation(s)
- Mario Petretta
- Department of Internal Medicine, Cardiovascular and Immunological Sciences, University Federico II, Naples, Italy
163
Parast L, Cheng SC, Cai T. Landmark Prediction of Long Term Survival Incorporating Short Term Event Time Information. J Am Stat Assoc 2012; 107:1492-1501. [PMID: 23293405 DOI: 10.1080/01621459.2012.721281] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 10/28/2022]
Abstract
In recent years, a wide range of markers have become available as potential tools to predict risk or progression of disease. In addition to such biological and genetic markers, short term outcome information may be useful in predicting long term disease outcomes. When such information is available, it would be desirable to combine this along with predictive markers to improve the prediction of long term survival. Most existing methods for incorporating censored short term event information in predicting long term survival focus on modeling the disease process and are derived under restrictive parametric models in a multi-state survival setting. When such model assumptions fail to hold, the resulting prediction of long term outcomes may be invalid or inaccurate. When there is only a single discrete baseline covariate, a fully non-parametric estimation procedure to incorporate short term event time information has been previously proposed. However, such an approach is not feasible for settings with one or more continuous covariates due to the curse of dimensionality. In this paper, we propose to incorporate short term event time information along with multiple covariates collected up to a landmark point via a flexible varying-coefficient model. To evaluate and compare the prediction performance of the resulting landmark prediction rule, we use robust non-parametric procedures which do not require the correct specification of the proposed varying coefficient model. Simulation studies suggest that the proposed procedures perform well in finite samples. We illustrate them here using a dataset of post-dialysis patients with end-stage renal disease.
Affiliation(s)
- Layla Parast
- Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115
164
Sauerbrei W, Boulesteix AL, Binder H. Stability investigations of multivariable regression models derived from low- and high-dimensional data. J Biopharm Stat 2012; 21:1206-31. [PMID: 22023687 DOI: 10.1080/10543406.2011.629890] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Indexed: 10/16/2022]
Abstract
Multivariable regression models can link a potentially large number of variables to various kinds of outcomes, such as continuous, binary, or time-to-event endpoints. Selection of important variables and selection of the functional form for continuous covariates are key parts of building such models but are notoriously difficult due to several reasons. Caused by multicollinearity between predictors and a limited amount of information in the data, (in)stability can be a serious issue of models selected. For applications with a moderate number of variables, resampling-based techniques have been developed for diagnosing and improving multivariable regression models. Deriving models for high-dimensional molecular data has led to the need for adapting these techniques to settings where the number of variables is much larger than the number of observations. Three studies with a time-to-event outcome, of which one has high-dimensional data, are used to illustrate several techniques. Investigations at the covariate level and at the predictor level are seen to provide considerable insight into model stability and performance. While some areas are indicated where resampling techniques for model building still need further refinement, our case studies illustrate that these techniques can already be recommended for wider use.
Affiliation(s)
- Willi Sauerbrei
- Institute of Medical Biometry and Informatics, University Medical Center Freiburg, Freiburg, Germany.
165
Van Calster B, Van Belle V, Vergouwe Y, Steyerberg EW. Discrimination ability of prediction models for ordinal outcomes: relationships between existing measures and a new measure. Biom J 2012; 54:674-85. [PMID: 22711459 DOI: 10.1002/bimj.201200026] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 01/24/2012] [Revised: 04/12/2012] [Accepted: 04/23/2012] [Indexed: 11/08/2022]
Abstract
In this paper, we focus on measures to evaluate discrimination of prediction models for ordinal outcomes. We review existing extensions of the dichotomous c-index (which is equivalent to the area under the receiver operating characteristic (ROC) curve), suggest a new measure, and study their relationships. The volume under the ROC surface (VUS) scores sets of cases including one case from each outcome category. VUS considers sets as either correctly or incorrectly ordered by the model. All other existing measures assess pairs of cases. We propose an ordinal c-index (ORC) that is set-based but, contrary to VUS, scores sets more gradually by indicating the closeness of the model-based ordering to the perfect ordering. As a result, the ORC does not decrease rapidly as the number of outcome categories increases. It turns out that the ORC can be rewritten as the average of pairwise c-indexes. Hence, the ORC has both a set- and pair-based interpretation. There are several relationships between the existing measures, leading to only two types of existing measures: a prevalence-weighted average of pairwise c-indexes and the VUS. Our suggested measure ORC positions itself in between as it is set-based but turns out to equal an unweighted average of pairwise c-indexes. The measures are demonstrated through a case study on the prediction of six-month outcome after traumatic brain injury. In conclusion, the set-based nature and graded scoring system make the ORC an attractive measure with a simple interpretation, and its prevalence independence is a natural property of a discrimination measure.
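The pair-based reading of the ORC stated in this abstract (an unweighted average of pairwise c-indexes) can be sketched in a few lines. This is a minimal illustration, assuming the model yields one scalar score per case and that tied scores count as one half; both are assumptions made here, and the function names are hypothetical rather than taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def pairwise_c_index(scores_low: Sequence[float], scores_high: Sequence[float]) -> float:
    """Share of (low, high) case pairs in which the higher-category case scores higher.

    Ties in score count as 0.5 (an assumption of this sketch).
    """
    total = 0.0
    for lo in scores_low:
        for hi in scores_high:
            if hi > lo:
                total += 1.0
            elif hi == lo:
                total += 0.5
    return total / (len(scores_low) * len(scores_high))


def ordinal_c_index(scores: Sequence[float], categories: Sequence[int]) -> float:
    """Unweighted average of the c-indexes over all pairs of outcome categories."""
    groups: Dict[int, List[float]] = {}
    for score, category in zip(scores, categories):
        groups.setdefault(category, []).append(score)
    levels = sorted(groups)
    if len(levels) < 2:
        raise ValueError("need at least two outcome categories")
    pairs = list(combinations(levels, 2))
    return sum(pairwise_c_index(groups[a], groups[b]) for a, b in pairs) / len(pairs)


# Hypothetical scores for six cases in three ordered outcome categories.
example_scores = [0.1, 0.4, 0.3, 0.2, 0.5, 0.6]
example_categories = [1, 1, 2, 2, 3, 3]
example_orc = ordinal_c_index(example_scores, example_categories)
print(round(example_orc, 4))
```

Because each category pair contributes equally regardless of how many cases it contains, the average does not depend on category prevalence, which is the property the abstract highlights.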
Affiliation(s)
- Ben Van Calster
- Department of Development and Regeneration, KU Leuven - University of Leuven, Herestraat 49 box 7003, B-3000 Leuven, Belgium.
166
Chen X, Ishwaran H. Random forests for genomic data analysis. Genomics 2012; 99:323-9. [PMID: 22546560 PMCID: PMC3387489 DOI: 10.1016/j.ygeno.2012.04.003] [Citation(s) in RCA: 389] [Impact Index Per Article: 32.4] [Received: 02/09/2012] [Revised: 04/11/2012] [Accepted: 04/14/2012] [Indexed: 11/25/2022]
Abstract
Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.
Affiliation(s)
- Xi Chen
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA.
167
Kattan MW, Gerds TA. Stages of prediction model comparison. Eur Urol 2012; 62:597-9; discussion 599-600. [PMID: 22579048 DOI: 10.1016/j.eururo.2012.04.053] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/19/2012] [Accepted: 04/26/2012] [Indexed: 11/16/2022]
168
169
Beetz I, Schilstra C, van Luijk P, Christianen MEMC, Doornaert P, Bijl HP, Chouvalova O, van den Heuvel ER, Steenbakkers RJHM, Langendijk JA. External validation of three dimensional conformal radiotherapy based NTCP models for patient-rated xerostomia and sticky saliva among patients treated with intensity modulated radiotherapy. Radiother Oncol 2011; 105:94-100. [PMID: 22169766 DOI: 10.1016/j.radonc.2011.11.006] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 07/20/2011] [Revised: 11/11/2011] [Accepted: 11/16/2011] [Indexed: 10/14/2022]
Abstract
PURPOSE The purpose of this study was to investigate the ability of predictive models for patient-rated xerostomia (XER(6M)) and sticky saliva (STIC(6M)) at 6 months after completion of primary (chemo)radiation developed in head and neck cancer patients treated with 3D-conformal radiotherapy (3D-CRT) to predict outcome in patients treated with intensity modulated radiotherapy (IMRT). METHODS AND MATERIALS Recently, we published the results of a prospective study on predictive models for patient-rated xerostomia and sticky saliva in head and neck cancer patients treated with 3D-CRT (3D-CRT based NTCP models). The 3D-CRT based model for XER(6M) consisted of three factors, including the mean parotid dose, age, and baseline xerostomia (none versus a bit). The 3D-CRT based model for STIC(6M) consisted of the mean submandibular dose, age, the mean sublingual dose, and baseline sticky saliva (none versus a bit). In the current study, a population consisting of 162 patients treated with IMRT was used to test the external validity of these 3D-CRT based models. External validity was described by the explained variation (R(2) Nagelkerke) and the Brier score. The discriminative abilities of the models were calculated using the area under the receiver operating curve (AUC) and calibration (i.e. the agreement between predicted and observed outcome) was assessed with the Hosmer-Lemeshow "goodness-of-fit" test. RESULTS Overall model performance of the 3D-CRT based predictive models for XER(6M) and STIC(6M) was significantly worse in terms of the Brier score and R(2) Nagelkerke among patients treated with IMRT. Moreover the AUC for both 3D-CRT based models in the IMRT treated patients were markedly lower. The Hosmer-Lemeshow test showed a significant disagreement for both models between predicted risk and observed outcome. 
CONCLUSION 3D-CRT based models for patient-rated xerostomia and sticky saliva among head and neck cancer patients treated with primary radiotherapy or chemoradiation turned out to be less valid for patients treated with IMRT. The main message from these findings is that models developed in a population treated with a specific technique cannot be generalised and extrapolated to a population treated with another technique without external validation.
Affiliation(s)
- Ivo Beetz
- Department of Radiation Oncology University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.
|
170
|
Gorodeski EZ, Ishwaran H, Kogalur UB, Blackstone EH, Hsich E, Zhang ZM, Vitolins MZ, Manson JE, Curb JD, Martin LW, Prineas RJ, Lauer MS. Use of hundreds of electrocardiographic biomarkers for prediction of mortality in postmenopausal women: the Women's Health Initiative. Circ Cardiovasc Qual Outcomes 2011; 4:521-32. [PMID: 21862719 DOI: 10.1161/circoutcomes.110.959023] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
BACKGROUND- Simultaneous contribution of hundreds of electrocardiographic (ECG) biomarkers to prediction of long-term mortality in postmenopausal women with clinically normal resting ECGs is unknown. METHODS AND RESULTS- We analyzed ECGs and all-cause mortality in 33 144 women enrolled in the Women's Health Initiative trials who were without baseline cardiovascular disease or cancer and had normal ECGs by Minnesota and Novacode criteria. Four hundred and seventy-seven ECG biomarkers, encompassing global and individual ECG findings, were measured with computer algorithms. During a median follow-up of 8.1 years (range for survivors, 0.5 to 11.2 years), 1229 women died. For analyses, the cohort was randomly split into derivation (n=22 096; deaths, 819) and validation (n=11 048; deaths, 410) subsets. ECG biomarkers and demographic and clinical characteristics were simultaneously analyzed using both traditional Cox regression and random survival forest, a novel algorithmic machine-learning approach. Regression modeling failed to converge. Random survival forest variable selection yielded 20 variables that were independently predictive of long-term mortality, 14 of which were ECG biomarkers related to autonomic tone, atrial conduction, and ventricular depolarization and repolarization. CONCLUSIONS- We identified 14 ECG biomarkers from among hundreds that were associated with long-term prognosis using a novel random forest variable selection methodology. These biomarkers were related to autonomic tone, atrial conduction, ventricular depolarization, and ventricular repolarization. Quantitative ECG biomarkers have prognostic importance and may be markers of subclinical disease in apparently healthy postmenopausal women.
|
171
|
Moisan F, Gourlet V, Mazurie JL, Dupupet JL, Houssinot J, Goldberg M, Imbernon E, Tzourio C, Elbaz A. Prediction model of Parkinson's disease based on antiparkinsonian drug claims. Am J Epidemiol 2011; 174:354-63. [PMID: 21606234 DOI: 10.1093/aje/kwr081] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
Drug claims databases are increasingly available and provide opportunities to investigate epidemiologic questions. The authors used computerized drug claims databases from a social security system in 5 French districts to predict the probability that a person had Parkinson's disease (PD) based on patterns of antiparkinsonian drug (APD) use. Clinical information for a population-based sample of persons using APDs in 2007 was collected. The authors built a prediction model using demographic variables and APDs as predictors and investigated the additional predictive benefit of including information on dose and regularity of use. Among 1,114 APD users, 320 (29%) had PD and 794 (71%) had another diagnosis as determined by study neurologists. A logistic model including information on cumulative APD dose and regularity of use showed good performance (c statistic = 0.953, sensitivity = 92.5%, specificity = 86.4%). Predicted PD prevalence (among persons aged ≥18 years) was 6.66/1,000; correcting this estimate using sensitivity/specificity led to a similar figure (6.04/1,000). These data demonstrate that drug claims databases can be used to estimate the probability that a person is being treated for PD and that information on APD dose and regularity of use improves models' performances. Similar approaches could be developed for other conditions.
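The abstract does not say which formula was used to correct the prevalence estimate for sensitivity and specificity. One common choice for this kind of correction is the Rogan-Gladen estimator, sketched below as an illustration only. The function name `rogan_gladen` and the apparent proportion of 0.35 are hypothetical; only the sensitivity (92.5%) and specificity (86.4%) come from the abstract, and the result is not meant to reproduce the reported 6.04/1,000.

```python
def rogan_gladen(apparent, sensitivity, specificity):
    """Correct an apparent (test-positive) proportion for misclassification.

    Rogan-Gladen estimator:
        true = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is clipped to [0, 1].
    """
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("classifier must perform better than chance")
    corrected = (apparent + specificity - 1.0) / youden
    return min(1.0, max(0.0, corrected))


# Hypothetical apparent proportion of predicted PD cases among APD users.
apparent_proportion = 0.35
corrected = rogan_gladen(apparent_proportion, sensitivity=0.925, specificity=0.864)
print(round(corrected, 3))
```

With high sensitivity and specificity the corrected value stays close to the apparent one, which is in line with the abstract's remark that the corrected figure was similar.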
|
172
|
Evaluation of SCORTEN on a cohort of patients with Stevens-Johnson syndrome and toxic epidermal necrolysis included in the RegiSCAR study. J Burn Care Res 2011; 32:237-45. [PMID: 21228709 DOI: 10.1097/bcr.0b013e31820aafbc] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
The purpose of this study was to evaluate the severity-of-illness score called SCORTEN with respect to its predictive ability and by using data obtained in the RegiSCAR study, the most comprehensive European registry of patients with Stevens-Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN). For advanced comparisons, an auxiliary score (AS) was defined using data obtained in a previous study. Three hundred sixty-nine patients with SJS/TEN were included in RegiSCAR between 2003 and 2005. The data needed for calculation of SCORTEN were available for 45% of patients. The score revealed a moderate predictive ability with a slight underestimation of the total number of in-hospital deaths by 11%, an area under the receiver operating characteristic curve of 0.75, and a Brier score of 0.14. Problems could be seen by analyzing subgroups such as patients with TEN. The AS was better calibrated but discriminated worse (area under the receiver operating characteristic curve: 0.72; Brier score: 0.14). With the help of a refined score derived from SCORTEN and AS, potential for a possible improvement could be demonstrated. The authors were able to show that the predictive ability of SCORTEN is acceptable. Although improvement might be possible, SCORTEN remains the tool of choice, whereas AS might be an alternative in retrospective settings with missing laboratory data.
|
173
|
Ritter AV, Shugars DA, Bader JD. Root caries risk indicators: a systematic review of risk models. Community Dent Oral Epidemiol 2011; 38:383-97. [PMID: 20545716 DOI: 10.1111/j.1600-0528.2010.00551.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
OBJECTIVE To identify risk indicators that are associated with root caries incidence in published predictive risk models. METHODS Abstracts (n = 472) identified from a MEDLINE, EMBASE, and Cochrane registry search were screened independently by two investigators to exclude articles not in English (n = 39), published prior to 1970 (none), or containing no information on either root caries incidence, risk indicators, or risk models (n = 209). A full-article duplicate review of the remaining articles (n = 224) selected those reporting predictive risk models based on original/primary longitudinal root caries incidence studies. The quality of the included articles was assessed based both on selected criteria of methodological standards for observational studies and on the statistical quality of the modeling strategy. Data from these included studies were extracted and compiled into evidence tables, with information about the cohort location, incidence period, sample size, age of the study participants, risk indicators included in the model, root caries incidence, modeling strategy, significant risk indicators/predictors, and parameter estimates and statistical findings. RESULTS Thirteen articles were selected for data extraction. The overall quality of the included articles was poor to moderate. Root caries incidence ranged from 12% to 77% (mean ± SD = 45 ± 17%); follow-up time of the published studies was ≤ 10 years (range = 9; median = 3); sample size ranged from 23-723 (mean ± SD = 264 ± 203; median = 261); person-years ranged from 23 to 1540 (mean ± SD = 760 ± 556; median = 746). Variables most frequently tested and significantly associated with root caries incidence were (times tested; % significant; directionality): baseline root caries (12; 58%; positive); number of teeth (7; 71%; three times positive, twice negative), and plaque index (4; 100%; positive). 
Ninety-two other clinical and nonclinical variables were tested: 27 were tested three times or more and were significant between 9% and 100% of the times tested; and 65 were tested but never significant. CONCLUSIONS The root caries incidence indicators/predictors most frequently reported were root caries prevalence at baseline, number of teeth, and plaque index. This finding can guide targeted root caries prevention. There was substantial variation among published models of root caries risk in terms of variable selection, sample size, cohort location, assessment methods, incidence periods, association directionality, and analytical techniques. Future studies should emphasize variables frequently tested and often significant, and validate existing models in independent databases.
Affiliation(s)
- André V Ritter
- Department of Operative Dentistry, University of North Carolina School of Dentistry, Chapel Hill, NC 27599-7450, USA.
|
174
|
Parast L, Cheng SC, Cai T. Incorporating short-term outcome information to predict long-term survival with discrete markers. Biom J 2011; 53:294-307. [PMID: 21337601 DOI: 10.1002/bimj.201000150] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2010] [Revised: 12/22/2010] [Accepted: 01/04/2011] [Indexed: 11/11/2022]
Abstract
In disease screening and prognosis studies, an important task is to determine useful markers for identifying high-risk subgroups. Once such markers are established, they can be incorporated into public health practice to provide appropriate strategies for treatment or disease monitoring based on each individual's predicted risk. In the recent years, genetic and biological markers have been examined extensively for their potential to signal progression or risk of disease. In addition to these markers, it has often been argued that short-term outcomes may be helpful in making a better prediction of disease outcomes in clinical practice. In this paper we propose model-free non-parametric procedures to incorporate short-term event information to improve the prediction of a long-term terminal event. We include the optional availability of a single discrete marker measurement and assess the additional information gained by including the short-term outcome. We focus on the semi-competing risk setting where the short-term event is an intermediate event that may be censored by the terminal event while the terminal event is only subject to administrative censoring. Simulation studies suggest that the proposed procedures perform well in finite samples. Our procedures are illustrated using a data set of post-dialysis patients with end-stage renal disease.
Affiliation(s)
- Layla Parast
- Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.
|
175
|
Gerds TA, van de Wiel MA. Confidence scores for prediction models. Biom J 2011; 53:259-74. [PMID: 21328604 DOI: 10.1002/bimj.201000157] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2010] [Revised: 12/01/2010] [Accepted: 12/21/2010] [Indexed: 11/06/2022]
Abstract
In medical statistics, many alternative strategies are available for building a prediction model based on training data. Prediction models are routinely compared by means of their prediction performance in independent validation data. If only one data set is available for training and validation, then rival strategies can still be compared based on repeated bootstraps of the same data. Often, however, the overall performance of rival strategies is similar and it is thus difficult to decide for one model. Here, we investigate the variability of the prediction models that results when the same modelling strategy is applied to different training sets. For each modelling strategy we estimate a confidence score based on the same repeated bootstraps. A new decomposition of the expected Brier score is obtained, as well as the estimates of population average confidence scores. The latter can be used to distinguish rival prediction models with similar prediction performances. Furthermore, on the subject level a confidence score may provide useful supplementary information for new patients who want to base a medical decision on predicted risk. The ideas are illustrated and discussed using data from cancer studies, also with high-dimensional predictor space.
|
176
|
Porzelius C, Schumacher M, Binder H. The benefit of data-based model complexity selection via prediction error curves in time-to-event data. Comput Stat 2011. [DOI: 10.1007/s00180-011-0236-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
177
|
Multidimensionality of microarrays: statistical challenges and (im)possible solutions. Mol Oncol 2011; 5:190-6. [PMID: 21349780 DOI: 10.1016/j.molonc.2011.01.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2010] [Accepted: 01/27/2011] [Indexed: 11/20/2022] Open
Abstract
A typical array experiment yields at least tens of thousands of measurements on often not more than a hundred patients, a situation often denoted as the curse of dimensionality. With a focus on prognostic multi-biomarker scores derived from microarrays, we highlight the multidimensionality of the problem and the issues in the multidimensionality of the data. We go over several statistical challenges raised by this curse occurring in each step of microarray analysis on patient data, from the hypothesis and the experimental design to the analysis methods, interpretation of results and clinical utility. Different analytical tools and solutions to answer these challenges are provided and discussed.
|
178
|
Rücker G, Schumacher M. Summary ROC curve based on a weighted Youden index for selecting an optimal cutpoint in meta-analysis of diagnostic accuracy. Stat Med 2010; 29:3069-78. [DOI: 10.1002/sim.3937] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
|
179
|
Lyman GH, Kuderer NM, Crawford J, Wolff DA, Culakova E, Poniewierski MS, Dale DC. Predicting individual risk of neutropenic complications in patients receiving cancer chemotherapy. Cancer 2010; 117:1917-27. [PMID: 21509769 DOI: 10.1002/cncr.25691] [Citation(s) in RCA: 162] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2010] [Revised: 08/23/2010] [Accepted: 09/02/2010] [Indexed: 11/08/2022]
Abstract
BACKGROUND A prospective cohort study was undertaken to develop and validate a risk model for neutropenic complications in cancer patients receiving chemotherapy. METHODS The study population consisted of 3760 patients with common solid tumors or malignant lymphoma who were beginning a new chemotherapy regimen at 115 practice sites throughout the United States. A regression model for neutropenic complications was developed and then validated by using a random split-sample selection process. RESULTS No significant differences in the derivation and validation populations were observed. The risk of neutropenic complications was greatest in cycle 1 with no significant difference in predicted risk between the 2 cohorts in univariate analysis. After adjustment for cancer type and age, major independent risk factors in multivariate analysis included: prior chemotherapy, abnormal hepatic and renal function, low white blood count, chemotherapy and planned delivery ≥85%. At a predicted risk cutpoint of 10%, model test performance included: sensitivity 90%, specificity 59%, and predictive value positive and negative of 34% and 96%, respectively. Further analysis confirmed model discrimination for risk of febrile neutropenia over multiple chemotherapy cycles. CONCLUSIONS A risk model for neutropenic complications was developed and validated in a large prospective cohort of patients who were beginning cancer chemotherapy that may guide the effective and cost-effective use of available supportive care.
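The predictive values quoted at the 10% cutpoint depend on how common neutropenic complications are in the cohort, which the abstract does not state. The sketch below shows the standard relationship between sensitivity, specificity, prevalence, and predictive values. The function name `predictive_values` is hypothetical, and the prevalence of 0.19 is an assumed value, chosen only because it gives numbers close to the reported 34% and 96%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity are taken from the abstract;
# the prevalence is an assumption, not a reported figure.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.59, prevalence=0.19)
print(round(ppv, 2), round(npv, 2))
```

A low cutpoint trades specificity for sensitivity, so the negative predictive value is high while the positive predictive value stays modest.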
Affiliation(s)
- Gary H Lyman
- Department of Medicine, Duke University, Durham, North Carolina 27710, USA.
|
180
|
la Cour Freiesleben N, Gerds TA, Forman JL, Silver JD, Nyboe Andersen A, Popovic-Todorovic B. Risk charts to identify low and excessive responders among first-cycle IVF/ICSI standard patients. Reprod Biomed Online 2010; 22:50-8. [PMID: 21115267 DOI: 10.1016/j.rbmo.2010.08.010] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2010] [Revised: 07/05/2010] [Accepted: 08/25/2010] [Indexed: 10/19/2022]
Abstract
Ovarian stimulation carries a risk of either low or excessive ovarian response. The aim was to develop prognostic models for identification of standard (ovulatory and normal basal FSH) patients’ risks of low and excessive response to conventional stimulation for IVF/intracytoplasmic sperm injection. Prospectively collected data on 276 first-cycle patients treated with 150 IU recombinant FSH (rFSH)/day in a long agonist protocol were analysed. Logistic regression analysis was applied to the outcome variables: low (seven or fewer follicles) and excessive (20 or more follicles) response. Variables were woman’s age, menstrual cycle length, weight or body mass index, ovarian volume, antral follicle count (AFC) and basal FSH. The predictive performance of the models was evaluated from the prediction error (Brier score, %) where zero corresponds to a perfect prediction. Model stability was assessed using 1000 bootstrap cross-validation steps. The best prognostic model to predict low response included AFC and age (Brier score 7.94) and the best model to predict excessive response included AFC and cycle length (Brier score 15.82). Charts were developed to identify risks of low and excessive ovarian response. They can be used for evidence-based risk assessment before ovarian stimulation and may assist clinicians in individual dosage of their patients.
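For readers unfamiliar with the prediction error used here, the Brier score for a binary outcome is the mean squared difference between predicted risk and observed outcome. The following is a minimal sketch with made-up risks and outcomes, not the study data; the function name `brier_score` is hypothetical. Multiplying by 100 gives the percentage scale used in the abstract.

```python
def brier_score(predicted, observed):
    """Mean squared difference between predicted risks and 0/1 outcomes."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted, observed)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only.
risks = [0.1, 0.8, 0.3, 0.6]
outcomes = [0, 1, 0, 1]
score = brier_score(risks, outcomes)
print(f"Brier score: {100 * score:.2f}%")

# A model that predicts every outcome with certainty scores zero.
perfect = brier_score([0.0, 1.0, 0.0, 1.0], outcomes)
```

Lower is better, and a constant prediction of 0.5 gives 25% regardless of the outcomes, which is a useful reference point when reading values such as 7.94 and 15.82.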
Affiliation(s)
- N la Cour Freiesleben
- The Fertility Clinic, Department 4071, Rigshospitalet, Copenhagen University Hospital, Blegdamsvej 9, 2100 Copenhagen, Denmark.
|
181
|
Kopec JA, Finès P, Manuel DG, Buckeridge DL, Flanagan WM, Oderkirk J, Abrahamowicz M, Harper S, Sharif B, Okhmatovskaia A, Sayre EC, Rahman MM, Wolfson MC. Validation of population-based disease simulation models: a review of concepts and methods. BMC Public Health 2010; 10:710. [PMID: 21087466 PMCID: PMC3001435 DOI: 10.1186/1471-2458-10-710] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2010] [Accepted: 11/18/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Computer simulation models are used increasingly to support public health research and policy, but questions about their quality persist. The purpose of this article is to review the principles and methods for validation of population-based disease simulation models. METHODS We developed a comprehensive framework for validating population-based chronic disease simulation models and used this framework in a review of published model validation guidelines. Based on the review, we formulated a set of recommendations for gathering evidence of model credibility. RESULTS Evidence of model credibility derives from examining: 1) the process of model development, 2) the performance of a model, and 3) the quality of decisions based on the model. Many important issues in model validation are insufficiently addressed by current guidelines. These issues include a detailed evaluation of different data sources, graphical representation of models, computer programming, model calibration, between-model comparisons, sensitivity analysis, and predictive validity. The role of external data in model validation depends on the purpose of the model (e.g., decision analysis versus prediction). More research is needed on the methods of comparing the quality of decisions based on different models. CONCLUSION As the role of simulation modeling in population health is increasing and models are becoming more complex, there is a need for further improvements in model validation methodology and common standards for evaluating model credibility.
Affiliation(s)
- Jacek A Kopec
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Arthritis Research Centre of Canada, Vancouver, BC, Canada
- Philippe Finès
- Health Analysis Division, Statistics Canada, Ottawa, ON, Canada
- Douglas G Manuel
- Epidemiology Division, Ottawa Health Research Institute, University of Ottawa, Ottawa, ON, Canada
- David L Buckeridge
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Michal Abrahamowicz
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Samuel Harper
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Behnam Sharif
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Arthritis Research Centre of Canada, Vancouver, BC, Canada
- Anya Okhmatovskaia
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Eric C Sayre
- Arthritis Research Centre of Canada, Vancouver, BC, Canada
- M Mushfiqur Rahman
- School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Arthritis Research Centre of Canada, Vancouver, BC, Canada
- Michael C Wolfson
- Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, ON, Canada
|
182
|
Vergouwe Y, Moons KGM, Steyerberg EW. External validity of risk models: Use of benchmark values to disentangle a case-mix effect from incorrect coefficients. Am J Epidemiol 2010; 172:971-80. [PMID: 20807737 DOI: 10.1093/aje/kwq223] [Citation(s) in RCA: 175] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Various performance measures related to calibration and discrimination are available for the assessment of risk models. When the validity of a risk model is assessed in a new population, estimates of the model's performance can be influenced in several ways. The regression coefficients can be incorrect, which indeed results in an invalid model. However, the distribution of patient characteristics (case mix) may also influence the performance of the model. Here the authors consider a number of typical situations that can be encountered in external validation studies. Theoretical relations between differences in development and validation samples and performance measures are studied by simulation. Benchmark values for the performance measures are proposed to disentangle a case-mix effect from incorrect regression coefficients, when interpreting the model's estimated performance in validation samples. The authors demonstrate the use of the benchmark values using data on traumatic brain injury obtained from the International Tirilazad Trial and the North American Tirilazad Trial (1991-1994).
Affiliation(s)
- Yvonne Vergouwe
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands.
|
183
|
Schmid M, Hielscher T, Augustin T, Gefeller O. A Robust Alternative to the Schemper-Henderson Estimator of Prediction Error. Biometrics 2010; 67:524-35. [DOI: 10.1111/j.1541-0420.2010.01459.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
184
|
Validation of the high-dose heparin confirmatory step for the diagnosis of heparin-induced thrombocytopenia. Blood 2010; 116:1761-6. [PMID: 20508160 DOI: 10.1182/blood-2010-01-262659] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open
Abstract
The diagnosis of heparin-induced thrombocytopenia (HIT) requires detection of antibodies to the heparin/platelet factor 4 (PF4) complexes via enzyme-linked immunosorbent assay. Addition of excess heparin to the sample decreases the optical density by 50% or more and confirms the presence of these antibodies. One hundred fifteen patients with anti-heparin/PF4 antibodies detected by enzyme-linked immunosorbent assay were classified as clinically HIT-positive or HIT-negative, followed by confirmation with excess heparin. A multivariate logistic regression model was fitted to estimate relationships between patient characteristics, laboratory findings, and clinical HIT status. This model was validated on an independent sample of 97 patients with anti-heparin/PF4 antibodies. No relationship between age, race, or sex and clinical HIT status was found. Maximal optical density and confirmatory positive status independently predicted HIT in multivariate analysis. Predictive accuracy on the training set (c-index 0.78, Brier score 0.17) was maintained when the algorithm was applied to the independent validation population (c-index 0.80, Brier score 0.20). This study quantifies the clinical utility of the confirmatory test to diagnose HIT. On the basis of data from the heparin/PF4 enzyme-linked immunosorbent assay and confirmatory assays, a predictive computer algorithm could distinguish patients likely to have HIT from those who do not.
|
185
|
Held L, Rufibach K, Balabdaoui F. A Score Regression Approach to Assess Calibration of Continuous Probabilistic Predictions. Biometrics 2010; 66:1295-305. [DOI: 10.1111/j.1541-0420.2010.01406.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
186
|
Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010; 21:128-38. [PMID: 20010215 PMCID: PMC3575184 DOI: 10.1097/ede.0b013e3181c30fb2] [Citation(s) in RCA: 3071] [Impact Index Per Article: 219.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
The performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic [ROC] curve), and goodness-of-fit statistics for calibration. Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions. We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration, we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n = 544 for model development, n = 273 for external validation). We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.
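Two of the measures named in this abstract can be sketched in a few lines for a binary outcome: the c statistic as the proportion of event/non-event pairs ranked correctly, and the net benefit at a single risk threshold, which is one point on a decision curve. The data below are invented for illustration, and the function names `c_statistic` and `net_benefit` are hypothetical; this is not the testicular cancer case study.

```python
def c_statistic(predicted, observed):
    """Share of (event, non-event) pairs where the event has the higher risk.

    Ties in predicted risk count as half a concordant pair.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    nonevents = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in nonevents
    )
    return concordant / (len(events) * len(nonevents))


def net_benefit(predicted, observed, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold.

    Computed as TP/n - FP/n * threshold / (1 - threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(observed)
    pairs = list(zip(predicted, observed))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)


# Illustrative values only.
risks = [0.1, 0.8, 0.3, 0.6, 0.7, 0.2]
outcomes = [0, 1, 0, 1, 0, 0]
c_stat = c_statistic(risks, outcomes)
nb = net_benefit(risks, outcomes, threshold=0.5)
print(round(c_stat, 3), round(nb, 3))
```

Repeating the net benefit calculation over a range of thresholds and plotting the results gives a decision curve; the threshold encodes how many false positives one is willing to accept per true positive.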
|
187
|
Pers TH, Albrechtsen A, Holst C, Sørensen TIA, Gerds TA. The validation and assessment of machine learning: a game of prediction from high-dimensional data. PLoS One 2009; 4:e6287. [PMID: 19652722 PMCID: PMC2716515 DOI: 10.1371/journal.pone.0006287] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2009] [Accepted: 06/15/2009] [Indexed: 11/24/2022] Open
Abstract
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide to the appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.
Affiliation(s)
- Tune H. Pers
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kongens Lyngby, Denmark
| | - Anders Albrechtsen
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
| | - Claus Holst
- Institute of Preventive Medicine, Copenhagen University Hospitals, Center for Health and Society, Copenhagen, Denmark
| | - Thorkild I. A. Sørensen
- Institute of Preventive Medicine, Copenhagen University Hospitals, Center for Health and Society, Copenhagen, Denmark
| | - Thomas A. Gerds
- Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
| |
|
188
|
Kohlmann M, Held L, Grunert VP. Classification of Therapy Resistance Based on Longitudinal Biomarker Profiles. Biom J 2009; 51:610-26. [DOI: 10.1002/bimj.200800157] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
|
189
|
Chandana S, Leung H, Trpkov K. Staging of prostate cancer using automatic feature selection, sampling and Dempster-Shafer fusion. Cancer Inform 2009; 7:57-73. [PMID: 19352459 PMCID: PMC2664701 DOI: 10.4137/cin.s819] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
A novel technique for automatically selecting the best pairs of features and sampling techniques to predict the stage of prostate cancer is proposed in this study. The problem of class imbalance, which is prominent in most medical data sets, is also addressed here. Three feature subsets obtained by principal components analysis (PCA), genetic algorithm (GA), and rough sets (RS) based approaches were used in the study. Under-sampling, the synthetic minority over-sampling technique (SMOTE), and a combination of the two were investigated, and the performance of the resulting models was compared. To combine the classifier outputs, we used the Dempster-Shafer (DS) theory, whereas the actual choice of combined models was made using a GA. We found that the best performance for the overall system resulted from the use of under-sampled data combined with rough-sets-based features modeled as a support vector machine (SVM).
Affiliation(s)
- Sandeep Chandana
- Department of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada
| |
|
190
|
|