1
Sadatsafavi M, Petkau J. Non-parametric inference on calibration of predicted risks. Stat Med 2024; 43:3524-3538. [PMID: 38863133 DOI: 10.1002/sim.10138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 05/24/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024]
Abstract
Moderate calibration, the expected event probability among observations with predicted probability z being equal to z, is a desired property of risk prediction models. Current graphical and numerical techniques for evaluating moderate calibration of risk prediction models are mostly based on smoothing or grouping the data. As well, there is no widely accepted inferential method for the null hypothesis that a model is moderately calibrated. In this work, we discuss recently-developed, and propose novel, methods for the assessment of moderate calibration for binary responses. The methods are based on the limiting distributions of functions of standardized partial sums of prediction errors converging to the corresponding laws of Brownian motion. The novel method relies on well-known properties of the Brownian bridge which enables joint inference on mean and moderate calibration, leading to a unified "bridge" test for detecting miscalibration. Simulation studies indicate that the bridge test is more powerful, often substantially, than the alternative test. As a case study we consider a prediction model for short-term mortality after a heart attack, where we provide suggestions on graphical presentation and the interpretation of results. Moderate calibration can be assessed without requiring arbitrary grouping of data or using methods that require tuning of parameters.
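The partial-sum idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that observations are ordered by predicted risk, that prediction errors are accumulated in that order, and that the sums are scaled by the square root of the total Bernoulli variance, with "time" defined as the cumulative share of that variance. The function name `standardized_partial_sums` and the simulated data are hypothetical, and no p-values are computed.

```python
import math
import random


def standardized_partial_sums(pred, y):
    """Return (times, walk) for prediction errors ordered by predicted risk.

    Assumption for this sketch: every prediction lies strictly between 0 and 1,
    so the total variance used for scaling is positive.
    """
    pairs = sorted(zip(pred, y))  # order observations by predicted risk
    total_var = sum(p * (1.0 - p) for p, _ in pairs)
    scale = math.sqrt(total_var)
    times, walk = [], []
    cum_var = 0.0
    cum_err = 0.0
    for p, outcome in pairs:
        cum_var += p * (1.0 - p)       # variance accumulated so far
        cum_err += outcome - p         # prediction error accumulated so far
        times.append(cum_var / total_var)  # "time" on a 0-to-1 scale
        walk.append(cum_err / scale)       # standardized partial sum
    return times, walk


# Hypothetical data simulated from a calibrated model (outcome ~ Bernoulli(pred)).
random.seed(1)
pred = [random.uniform(0.05, 0.95) for _ in range(2000)]
y = [1 if random.random() < p else 0 for p in pred]

times, walk = standardized_partial_sums(pred, y)
terminal = walk[-1]                  # relates to mean calibration
max_abs = max(abs(s) for s in walk)  # largest excursion of the path
print(f"terminal value: {terminal:.3f}, maximum absolute value: {max_abs:.3f}")
```

Plotting `walk` against `times` gives a path that should show no systematic drift when predictions are calibrated; a sustained drift over some range of `times` would point to miscalibration among the corresponding predicted risks.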
Affiliation(s)
- Mohsen Sadatsafavi
  - Faculty of Pharmaceutical Sciences and Faculty of Medicine, The University of British Columbia, Vancouver, British Columbia, Canada
- John Petkau
  - Department of Statistics, The University of British Columbia, Vancouver, British Columbia, Canada
2
Ledger A, Ceusters J, Valentin L, Testa A, Van Holsbeke C, Franchi D, Bourne T, Froyman W, Timmerman D, Van Calster B. Multiclass risk models for ovarian malignancy: an illustration of prediction uncertainty due to the choice of algorithm. BMC Med Res Methodol 2023; 23:276. [PMID: 38001421 PMCID: PMC10668424 DOI: 10.1186/s12874-023-02103-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 11/14/2023] [Indexed: 11/26/2023] Open
Abstract
BACKGROUND Assessing malignancy risk is important to choose appropriate management of ovarian tumors. We compared six algorithms to estimate the probabilities that an ovarian tumor is benign, borderline malignant, stage I primary invasive, stage II-IV primary invasive, or secondary metastatic. METHODS This retrospective cohort study used 5909 patients recruited from 1999 to 2012 for model development, and 3199 patients recruited from 2012 to 2015 for model validation. Patients were recruited at oncology referral or general centers and underwent an ultrasound examination and surgery ≤ 120 days later. We developed models using standard multinomial logistic regression (MLR), Ridge MLR, random forest (RF), XGBoost, neural networks (NN), and support vector machines (SVM). We used nine clinical and ultrasound predictors but developed models with or without CA125. RESULTS Most tumors were benign (3980 in development and 1688 in validation data), secondary metastatic tumors were least common (246 and 172). The c-statistic (AUROC) to discriminate benign from any type of malignant tumor ranged from 0.89 to 0.92 for models with CA125, from 0.89 to 0.91 for models without. The multiclass c-statistic ranged from 0.41 (SVM) to 0.55 (XGBoost) for models with CA125, and from 0.42 (SVM) to 0.51 (standard MLR) for models without. Multiclass calibration was best for RF and XGBoost. Estimated probabilities for a benign tumor in the same patient often differed by more than 0.2 (20% points) depending on the model. Net Benefit for diagnosing malignancy was similar for algorithms at the commonly used 10% risk threshold, but was slightly higher for RF at higher thresholds. Comparing models, between 3% (XGBoost vs. NN, with CA125) and 30% (NN vs. SVM, without CA125) of patients fell on opposite sides of the 10% threshold. CONCLUSION Although several models had similarly good performance, individual probability estimates varied substantially.
Affiliation(s)
- Ashleigh Ledger
  - Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000, Belgium
- Jolien Ceusters
  - Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000, Belgium
  - Department of Oncology, Leuven Cancer Institute, Laboratory of Tumor Immunology and Immunotherapy, KU Leuven, Leuven, Belgium
- Lil Valentin
  - Department of Obstetrics and Gynecology, Skåne University Hospital, Malmö, Sweden
  - Department of Clinical Sciences Malmö, Lund University, Malmö, Sweden
- Antonia Testa
  - Department of Woman, Child and Public Health, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
  - Dipartimento Universitario Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, Rome, Italy
- Dorella Franchi
  - Preventive Gynecology Unit, Division of Gynecology, European Institute of Oncology IRCCS, Milan, Italy
- Tom Bourne
  - Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000, Belgium
  - Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
  - Queen Charlotte's and Chelsea Hospital, Imperial College, London, UK
- Wouter Froyman
  - Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000, Belgium
  - Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Dirk Timmerman
  - Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000, Belgium
  - Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000, Belgium
  - Department of Biomedical Data Sciences, Leiden University Medical Centre (LUMC), Leiden, Netherlands
  - Leuven Unit for Health Technology Assessment Research (LUHTAR), KU Leuven, Leuven, Belgium
3
Zhang K, Jiang Y, Zeng H, Zhu H. Application and risk prediction of thrombolytic therapy in cardio-cerebrovascular diseases: a review. Thromb J 2023; 21:90. [PMID: 37667349 PMCID: PMC10476453 DOI: 10.1186/s12959-023-00532-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 08/18/2023] [Indexed: 09/06/2023] Open
Abstract
Cardiocerebrovascular diseases (CVDs) are the leading cause of death worldwide and consume a large share of healthcare budgets. For CVD patients, prompt assessment and appropriate management are crucial to saving lives and improving prognosis. Thrombolytic therapy, as a non-invasive approach to achieving recanalization, is a basic component of CVD treatment. Still, there are risks that limit its application. The objective of this review is to introduce the use of thrombolytic therapy in cardiocerebrovascular blockage diseases, including coronary heart disease and ischemic stroke, and to review developments in risk assessment for thrombolytic therapy, comparing the performance of traditional scales and novel artificial intelligence-based risk assessment models.
Affiliation(s)
- Kexin Zhang
  - Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Yao Jiang
  - Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Hesong Zeng
  - Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Hongling Zhu
  - Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
4
Tuinman PR, de Grooth HJ. Elderly patients in the ICU: getting from epidemiological studies to clinical decisions. Minerva Anestesiol 2022; 88:434-435. [DOI: 10.23736/s0375-9393.22.16680-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
5
Consistency of ranking was evaluated as new measure for prediction model stability: longitudinal cohort study. J Clin Epidemiol 2021; 138:168-177. [PMID: 34224835 DOI: 10.1016/j.jclinepi.2021.06.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2020] [Revised: 06/17/2021] [Accepted: 06/29/2021] [Indexed: 11/22/2022]
Abstract
OBJECTIVE Clinical risk prediction models are generally assessed on population level with a lack of measures that evaluate their stability at predicting risks of individual patients. This study evaluated the use of ranking as a measure to assess individual level stability between risk prediction models. STUDY DESIGN AND SETTING A large patient cohort (3.66 million patients with 0.11 million cardiovascular events) extracted from the Clinical Practice Research Datalink was used in the exemplar of cardiovascular disease risk prediction. RESULTS It was found that 15 models (including machine learning and statistical models) had similar population-level model performance (C statistics about 0.88). For patients with high absolute risks, the models were more consistent in ranking of risk predictions (interquartile range (IQR) of differences in rank percentiles -0.6 to 1.0), but inconsistent in absolute risk (IQR of differences in absolute risk -18.8 to 9.0). At low risk, the reverse was true with inconsistent ranking but more consistent absolute risk. CONCLUSION Consistency of ranking of individual risk predictions is a useful measure to assess risk prediction models providing complementary information to absolute risk stability. Model developing guidelines including "TRIPOD" and "PROBAST" should incorporate ranking to assess individual level stability between risk prediction models.
6
Ten Haaf K, van der Aalst CM, de Koning HJ, Kaaks R, Tammemägi MC. Personalising lung cancer screening: An overview of risk-stratification opportunities and challenges. Int J Cancer 2021; 149:250-263. [PMID: 33783822 PMCID: PMC8251929 DOI: 10.1002/ijc.33578] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 12/18/2020] [Revised: 03/04/2021] [Accepted: 03/12/2021] [Indexed: 12/17/2022]
Abstract
Randomised clinical trials have shown the efficacy of computed tomography lung cancer screening, initiating discussions on whether and how to implement population‐based screening programs. Due to smoking behaviour being the primary risk‐factor for lung cancer and part of the criteria for determining screening eligibility, lung cancer screening is inherently risk‐based. In fact, the selection of high‐risk individuals has been shown to be essential in implementing lung cancer screening in a cost‐effective manner. Furthermore, studies have shown that further risk‐stratification may improve screening efficiency, allow personalisation of the screening interval and reduce health disparities. However, implementing risk‐based lung cancer screening programs also requires overcoming a number of challenges. There are indications that risk‐based approaches can negatively influence the trade‐off between individual benefits and harms if not applied thoughtfully. Large‐scale implementation of targeted, risk‐based screening programs has been limited thus far. Consequently, questions remain on how to efficiently identify and invite high‐risk individuals from the general population. Finally, while risk‐based approaches may increase screening program efficiency, efficiency should be balanced with the overall impact of the screening program. In this review, we will address the opportunities and challenges in applying risk‐stratification in different aspects of lung cancer screening programs, as well as the balance between screening program efficiency and impact.
Affiliation(s)
- Kevin Ten Haaf
  - Department of Public Health, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Carlijn M van der Aalst
  - Department of Public Health, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Harry J de Koning
  - Department of Public Health, Erasmus MC-University Medical Center Rotterdam, Rotterdam, The Netherlands
- Rudolf Kaaks
  - Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Translational Lung Research Center (TLRC) Heidelberg, Member of the German Center for Lung Research (DZL), Heidelberg, Germany
- Martin C Tammemägi
  - Department of Health Sciences, Brock University, St. Catharines, Ontario, Canada
7
Pate A, Emsley R, Sperrin M, Martin GP, van Staa T. Impact of sample size on the stability of risk scores from clinical prediction models: a case study in cardiovascular disease. Diagn Progn Res 2020; 4:14. [PMID: 32944655 PMCID: PMC7487849 DOI: 10.1186/s41512-020-00082-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/25/2020] [Accepted: 08/12/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Stability of risk estimates from prediction models may be highly dependent on the sample size of the dataset available for model derivation. In this paper, we evaluate the stability of cardiovascular disease risk scores for individual patients when using different sample sizes for model derivation; such sample sizes include those similar to models recommended in the national guidelines, and those based on a recently published sample size formula for prediction models. METHODS We mimicked the process of sampling N patients from a population to develop a risk prediction model by sampling patients from the Clinical Practice Research Datalink. A cardiovascular disease risk prediction model was developed on this sample and used to generate risk scores for an independent cohort of patients. This process was repeated 1000 times, giving a distribution of risks for each patient. N = 100,000, 50,000, 10,000, N_min (derived from the sample size formula) and N_epv10 (meets the 10 events per predictor rule) were considered. The 5-95th percentile range of risks across these models was used to evaluate instability. Patients were grouped by a risk derived from a model developed on the entire population (population-derived risk) to summarise results. RESULTS For a sample size of 100,000, the median 5-95th percentile range of risks for patients across the 1000 models was 0.77%, 1.60%, 2.42% and 3.22% for patients with population-derived risks of 4-5%, 9-10%, 14-15% and 19-20% respectively; for N = 10,000, it was 2.49%, 5.23%, 7.92% and 10.59%, and for N using the formula-derived sample size, it was 6.79%, 14.41%, 21.89% and 29.21%. Restricting this analysis to models with high discrimination, good calibration or small mean absolute prediction error reduced the percentile range, but high levels of instability remained. CONCLUSIONS Widely used cardiovascular disease risk prediction models suffer from high levels of instability induced by sampling variation. Many models will also suffer from overfitting (a closely linked concept), but at acceptable levels of overfitting, there may still be high levels of instability in individual risk. Stability of risk estimates should be a criterion when determining the minimum sample size to develop models.
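The instability measure described in this abstract, the 5-95th percentile range of one patient's predicted risk across repeatedly developed models, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's code: the function names `percentile_range` and `median_instability` are hypothetical, the percentile interpolation rule (the standard library's "inclusive" method) is an assumption, and the toy risks are invented purely to show the call pattern.

```python
import statistics


def percentile_range(risks):
    """5th-to-95th percentile range of one patient's risks across models."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%; the first and last
    # are the 5th and 95th percentiles.
    cuts = statistics.quantiles(risks, n=20, method="inclusive")
    return cuts[-1] - cuts[0]


def median_instability(risks_by_patient):
    """Median of the per-patient percentile ranges within a group of patients."""
    return statistics.median([percentile_range(r) for r in risks_by_patient])


# Toy input (invented): each inner list holds one patient's predicted risk
# from five hypothetical models developed on different samples.
toy_group = [
    [0.045, 0.050, 0.048, 0.052, 0.047],
    [0.100, 0.180, 0.140, 0.220, 0.120],
    [0.090, 0.110, 0.100, 0.105, 0.095],
]
print(f"median 5-95th percentile range: {median_instability(toy_group):.3f}")
```

In the setting of the abstract, each inner list would hold 1000 risks (one per developed model) and patients would be grouped by population-derived risk before taking the median.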
Affiliation(s)
- Alexander Pate
  - Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK
- Richard Emsley
  - Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London, SE5 8AF UK
- Matthew Sperrin
  - Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK
- Glen P. Martin
  - Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK
- Tjeerd van Staa
  - Centre for Health Informatics, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, M13 9PL UK
8
ten Haaf K, Jeon J, Tammemägi MC, Han SS, Kong CY, Plevritis SK, Feuer EJ, de Koning HJ, Steyerberg EW, Meza R. Risk prediction models for selection of lung cancer screening candidates: A retrospective validation study. PLoS Med 2017; 14:e1002277. [PMID: 28376113 PMCID: PMC5380315 DOI: 10.1371/journal.pmed.1002277] [Citation(s) in RCA: 186] [Impact Index Per Article: 26.6] [Received: 09/12/2016] [Accepted: 02/27/2017] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Selection of candidates for lung cancer screening based on individual risk has been proposed as an alternative to criteria based on age and cumulative smoking exposure (pack-years). Nine previously established risk models were assessed for their ability to identify those most likely to develop or die from lung cancer. All models considered age and various aspects of smoking exposure (smoking status, smoking duration, cigarettes per day, pack-years smoked, time since smoking cessation) as risk predictors. In addition, some models considered factors such as gender, race, ethnicity, education, body mass index, chronic obstructive pulmonary disease, emphysema, personal history of cancer, personal history of pneumonia, and family history of lung cancer. METHODS AND FINDINGS Retrospective analyses were performed on 53,452 National Lung Screening Trial (NLST) participants (1,925 lung cancer cases and 884 lung cancer deaths) and 80,672 Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO) ever-smoking participants (1,463 lung cancer cases and 915 lung cancer deaths). Six-year lung cancer incidence and mortality risk predictions were assessed for (1) calibration (graphically) by comparing the agreement between the predicted and the observed risks, (2) discrimination (area under the receiver operating characteristic curve [AUC]) between individuals with and without lung cancer (death), and (3) clinical usefulness (net benefit in decision curve analysis) by identifying risk thresholds at which applying risk-based eligibility would improve lung cancer screening efficacy. To further assess performance, risk model sensitivities and specificities in the PLCO were compared to those based on the NLST eligibility criteria. Calibration was satisfactory, but discrimination ranged widely (AUCs from 0.61 to 0.81). The models outperformed the NLST eligibility criteria over a substantial range of risk thresholds in decision curve analysis, with a higher sensitivity for all models and a slightly higher specificity for some models. The PLCOm2012, Bach, and Two-Stage Clonal Expansion incidence models had the best overall performance, with AUCs >0.68 in the NLST and >0.77 in the PLCO. These three models had the highest sensitivity and specificity for predicting 6-y lung cancer incidence in the PLCO chest radiography arm, with sensitivities >79.8% and specificities >62.3%. In contrast, the NLST eligibility criteria yielded a sensitivity of 71.4% and a specificity of 62.2%. Limitations of this study include the lack of identification of optimal risk thresholds, as this requires additional information on the long-term benefits (e.g., life-years gained and mortality reduction) and harms (e.g., overdiagnosis) of risk-based screening strategies using these models. In addition, information on some predictor variables included in the risk prediction models was not available. CONCLUSIONS Selection of individuals for lung cancer screening using individual risk is superior to selection criteria based on age and pack-years alone. The benefits, harms, and feasibility of implementing lung cancer screening policies based on risk prediction models should be assessed and compared with those of current recommendations.
Affiliation(s)
- Kevin ten Haaf
  - Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- Jihyoun Jeon
  - Department of Epidemiology, University of Michigan, Ann Arbor, Michigan, United States of America
- Martin C. Tammemägi
  - Department of Health Sciences, Brock University, St. Catharines, Ontario, Canada
- Summer S. Han
  - Department of Radiology, Stanford University, Palo Alto, California, United States of America
  - Department of Medicine, Stanford University, Palo Alto, California, United States of America
- Chung Yin Kong
  - Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Sylvia K. Plevritis
  - Department of Radiology, Stanford University, Palo Alto, California, United States of America
- Eric J. Feuer
  - Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland, United States of America
- Harry J. de Koning
  - Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- Ewout W. Steyerberg
  - Department of Public Health, Erasmus MC University Medical Center Rotterdam, Rotterdam, the Netherlands
- Rafael Meza
  - Department of Epidemiology, University of Michigan, Ann Arbor, Michigan, United States of America
9
Matsubara Y, Kimachi M, Fukuma S, Onishi Y, Fukuhara S. Development of a new risk model for predicting cardiovascular events among hemodialysis patients: Population-based hemodialysis patients from the Japan Dialysis Outcome and Practice Patterns Study (J-DOPPS). PLoS One 2017; 12:e0173468. [PMID: 28273175 PMCID: PMC5342257 DOI: 10.1371/journal.pone.0173468] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 11/25/2016] [Accepted: 02/22/2017] [Indexed: 01/30/2023] Open
Abstract
Background Cardiovascular (CV) events are the primary cause of death and of becoming bedridden among hemodialysis (HD) patients. The Framingham risk score (FRS) is useful for predicting the incidence of CV events in the general population, but is considered unsuitable for predicting the incidence of CV events in HD patients because of the atypical relationships between conventional risk factors and outcomes in these patients. We therefore aimed to develop a new prognostic prediction model for prevention and early detection of CV events among hemodialysis patients. Methods We enrolled 3,601 maintenance HD patients based on their data from the Japan Dialysis Outcomes and Practice Patterns Study (J-DOPPS), phases 3 and 4. We longitudinally assessed the association between several potential candidate predictors and composite CV events in the year after study initiation. Potential candidate predictors included the component factors of FRS and other HD-specific risk factors. We used multivariable logistic regression with backward stepwise selection to develop our new prediction model and generated a calibration plot. Additionally, we performed bootstrapping to assess the internal validity. Results We observed 328 composite CV events during 1-year follow-up. The final prediction model contained six variables: age, diabetes status, history of CV events, dialysis time per session, and serum phosphorus and albumin levels. The new model showed significantly better discrimination than the FRS, in both men (c-statistics: 0.76 for new model, 0.64 for FRS) and women (c-statistics: 0.77 for new model, 0.60 for FRS). Additionally, we confirmed the consistency between the observed results and predicted results using the calibration plot. Further, we found similar discrimination and calibration to the derivation model in the bootstrapping cohort. Conclusions We developed a new risk model consisting of only six predictors. Our new model predicted CV events more accurately than the FRS.
Affiliation(s)
- Yukiko Matsubara
  - Department of Artificial Organs, Akane-Foundation Omachi Tsuchiya Clinic, and Hiroshima Medical University, Hiroshima, Japan
- Miho Kimachi
  - Department of Healthcare Epidemiology, School of Public Health in the Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Institute for Health Outcomes and Process Evaluation Research (iHope International), Kyoto, Japan
- Shingo Fukuma
  - Department of Healthcare Epidemiology, School of Public Health in the Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Institute for Health Outcomes and Process Evaluation Research (iHope International), Kyoto, Japan
  - Center for Innovative Research for Communities and Clinical Excellence, Fukushima Medical University, Fukushima, Japan
- Yoshihiro Onishi
  - Institute for Health Outcomes and Process Evaluation Research (iHope International), Kyoto, Japan
- Shunichi Fukuhara
  - Department of Healthcare Epidemiology, School of Public Health in the Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Center for Innovative Research for Communities and Clinical Excellence, Fukushima Medical University, Fukushima, Japan
10
Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014; 35:1925-31. [PMID: 24898551 DOI: 10.1093/eurheartj/ehu207] [Citation(s) in RCA: 1102] [Impact Index Per Article: 110.2] [Indexed: 12/25/2022] Open
Abstract
Clinical prediction models provide risk estimates for the presence of disease (diagnosis) or an event in the future course of disease (prognosis) for individual patients. Although publications that present and evaluate such models are becoming more frequent, the methodology is often suboptimal. We propose that seven steps should be considered in developing prediction models: (i) consideration of the research question and initial data inspection; (ii) coding of predictors; (iii) model specification; (iv) model estimation; (v) evaluation of model performance; (vi) internal validation; and (vii) model presentation. The validity of a prediction model is ideally assessed in fully independent data, where we propose four key measures to evaluate model performance: calibration-in-the-large, or the model intercept (A); calibration slope (B); discrimination, with a concordance statistic (C); and clinical usefulness, with decision-curve analysis (D). As an application, we develop and validate prediction models for 30-day mortality in patients with an acute myocardial infarction. This illustrates the usefulness of the proposed framework to strengthen the methodological rigour and quality for prediction models in cardiovascular research.
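Two of the four validation measures named in this abstract, the concordance statistic (C) and the net benefit used in decision-curve analysis (D), are simple enough to sketch directly. The Python below is a minimal illustration, not code from the paper: the function names and the four-patient toy data are hypothetical, the c-statistic is computed by brute-force pair comparison, and net benefit uses the usual definition of true positives per patient minus false positives per patient weighted by the odds of the risk threshold. The calibration intercept (A) and slope (B) require fitting a logistic model and are left out.

```python
def c_statistic(pred, y):
    """Share of event/non-event pairs in which the event has the higher risk.

    Ties count as half. Assumes at least one event and one non-event.
    """
    events = [p for p, outcome in zip(pred, y) if outcome == 1]
    nonevents = [p for p, outcome in zip(pred, y) if outcome == 0]
    concordant = 0.0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1.0
            elif p_event == p_nonevent:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def net_benefit(pred, y, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    Assumes 0 < threshold < 1.
    """
    n = len(y)
    true_pos = sum(1 for p, o in zip(pred, y) if p >= threshold and o == 1)
    false_pos = sum(1 for p, o in zip(pred, y) if p >= threshold and o == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Toy data (invented): predicted risks and observed binary outcomes.
toy_pred = [0.1, 0.4, 0.3, 0.8]
toy_y = [0, 0, 1, 1]
print(f"c-statistic: {c_statistic(toy_pred, toy_y):.2f}")
print(f"net benefit at 20% threshold: {net_benefit(toy_pred, toy_y, 0.2):.4f}")
```

A decision curve would repeat `net_benefit` over a range of thresholds and compare the result with the treat-all and treat-none strategies.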
Affiliation(s)
- Ewout W Steyerberg
  - Department of Public Health, Erasmus MC, University Medical Center Rotterdam, PO Box 2040, 3000 CA Rotterdam, The Netherlands
- Yvonne Vergouwe
  - Department of Public Health, Erasmus MC, University Medical Center Rotterdam, PO Box 2040, 3000 CA Rotterdam, The Netherlands
11
Paganetti H, van Luijk P. Biological considerations when comparing proton therapy with photon therapy. Semin Radiat Oncol 2013; 23:77-87. [PMID: 23473684 DOI: 10.1016/j.semradonc.2012.11.002] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Indexed: 12/25/2022]
Abstract
Owing to the limited availability of data on the outcome of proton therapy, treatments are generally optimized based on broadly available data on photon-based treatments. However, the microscopic pattern of energy deposition of protons differs from that of photons, leading to a different biological effect. Consequently, proton therapy needs a correction factor (relative biological effectiveness) to relate proton doses to photon doses, and currently, a generic value is used. Moreover, the macroscopic distribution of dose in proton therapy differs compared with photon treatments. Although this may offer new opportunities to reduce dose to normal tissues, it raises the question whether data obtained from photon-based treatments offer sufficient information on dose-volume effects to optimally use unique features of protons. In addition, there are potential differences in late effects due to low doses of secondary radiation outside the volume irradiated by the primary beam. This article discusses the controversies associated with these 3 issues when comparing proton and photon therapy.
Affiliation(s)
- Harald Paganetti
  - Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
12
The EuroSCORE and a local model consistently predicted coronary surgery mortality and showed complementary properties. J Clin Epidemiol 2008; 61:663-70. [DOI: 10.1016/j.jclinepi.2006.10.025] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/05/2006] [Revised: 08/04/2006] [Accepted: 10/02/2006] [Indexed: 11/18/2022]
13
Steyerberg EW, Eijkemans MJC, Boersma E, Habbema JDF. Applicability of clinical prediction models in acute myocardial infarction: a comparison of traditional and empirical Bayes adjustment methods. Am Heart J 2005; 150:920. [PMID: 16290963 DOI: 10.1016/j.ahj.2005.07.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/09/2005] [Accepted: 07/12/2005] [Indexed: 11/29/2022]
Abstract
INTRODUCTION Several clinical prediction models have been developed to predict outcome after acute myocardial infarction. Updating to local circumstances may be required to make such models better applicable. We aimed to compare traditional and empirical Bayes (EB) methods to perform such updating. METHODS We focused on 16 geographical regions within the GUSTO-I trial, which included 40,830 patients with acute myocardial infarction; of whom, 2851 (7.0%) had died by 30 days. Differences in mortality between regions were studied with traditional adjustment for case mix in logistic regression models and with EB methods. These methods updated predictions for new patients while accounting for the uncertainty in the traditionally estimated mortality differences. RESULTS The case mix in the regions differed with respect to important predictive characteristics such as age, presence of shock, and anterior infarct location (all P < .001). These differences did not explain regional differences in 30-day mortality, which varied between 80% and 120% with traditional analyses (P < .01). The EB estimates for regional differences were much smaller (between 93% and 107%). CONCLUSIONS Statistically significant differences in case mix and 30-day mortality were noted between geographical regions. The practical implications of this heterogeneity were, however, limited when model predictions were updated with EB methods.
Affiliation(s)
- Ewout W Steyerberg
  - Department of Public Health, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands