1. Wan W, Murugesan M, Nocon RS, Bolton J, Konetzka RT, Chin MH, Huang ES. Comparison of two propensity score-based methods for balancing covariates: the overlap weighting and fine stratification methods in real-world claims data. BMC Med Res Methodol 2024; 24:122. PMID: 38831393; PMCID: PMC11145799; DOI: 10.1186/s12874-024-02228-z.
Abstract
BACKGROUND Two propensity score (PS) based covariate-balancing methods, the overlap weighting method (OW) and the fine stratification method (FS), produce superb covariate balance. OW has been compared with various weighting methods, while FS has been compared with the traditional stratification method and various matching methods. However, no study has yet compared OW and FS. In addition, OW has not yet been evaluated in large claims data with low-prevalence exposure and low-frequency outcomes, a context in which optimal use of balancing methods is critical. In this study, we aimed to compare OW and FS using real-world data and simulations with low-prevalence exposure and low-frequency outcomes. METHODS We used the Texas State Medicaid claims data on adult beneficiaries with diabetes in 2012 as an empirical example (N = 42,628). Based on its real-world research question, we estimated the average treatment effect of health center vs. non-health center attendance in the total population. We also performed simulations to evaluate the two methods' relative performance. To preserve associations between covariates, we used the plasmode approach to simulate outcomes and/or exposures with N = 4,000. We simulated both homogeneous and heterogeneous treatment effects with various outcome risks (1-30%, or observed: 27.75%) and/or exposure prevalences (2.5-30%, or observed: 10.55%). We used a weighted generalized linear model to estimate the exposure effect and the cluster-robust standard error (SE) method to estimate its SE. RESULTS In the empirical example, we found that OW had smaller standardized mean differences in all covariates (range: OW: 0.0-0.02 vs. FS: 0.22-3.26) and a smaller Mahalanobis balance distance (MB) (< 0.001 vs. > 0.049) than FS. In simulations, OW also achieved smaller MB (homogeneity: < 0.04 vs. > 0.04; heterogeneity: 0.0-0.11 vs. 0.07-0.29), smaller relative bias (homogeneity: 4.04-56.20 vs. 20-61.63; heterogeneity: 7.85-57.6 vs. 15.0-60.4), smaller square root of mean squared error (homogeneity: 0.332-1.308 vs. 0.385-1.365; heterogeneity: 0.263-0.526 vs. 0.313-0.620), and coverage probability closer to the nominal level (homogeneity: 0.0-80.4% vs. 0.0-69.8%; heterogeneity: 0.0-97.6% vs. 0.0-92.8%) than FS in most cases. CONCLUSIONS These findings suggest that OW can yield nearly perfect covariate balance and thereby enhance the accuracy of average-treatment-effect estimation in the total population.
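The weighting scheme behind this abstract is simple to sketch. Below is a minimal Python illustration (not the authors' code; it uses simulated data and an assumed logistic PS model) of how overlap weighting assigns each exposed patient a weight of 1 − PS and each unexposed patient a weight of PS, and of the near-exact covariate balance this produces:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                      # two confounders
logit = -2.5 + 0.8 * x[:, 0] + 0.5 * x[:, 1]     # low-prevalence exposure
z = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# propensity score from a (nearly unpenalized) logistic model
ps = LogisticRegression(C=1e6).fit(x, z).predict_proba(x)[:, 1]

# overlap weights: 1 - PS for the exposed, PS for the unexposed
w = np.where(z == 1, 1 - ps, ps)

def smd(v, wt):
    """Weighted standardized mean difference for one covariate."""
    m1 = np.average(v[z == 1], weights=wt[z == 1])
    m0 = np.average(v[z == 0], weights=wt[z == 0])
    s = np.sqrt((v[z == 1].var() + v[z == 0].var()) / 2)
    return (m1 - m0) / s

raw = smd(x[:, 0], np.ones(n))   # imbalanced before weighting
bal = smd(x[:, 0], w)            # near zero after overlap weighting
```

A known property of overlap weights motivates the near-zero SMDs reported above: when the PS is fit by maximum-likelihood logistic regression, the weighted means of the modeled covariates are balanced exactly.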
Affiliation(s)
- Wen Wan
- Section of General Internal Medicine, Department of Medicine, The University of Chicago, 5841 S. Maryland Ave, MC 2007, Chicago, IL 60637, USA.
- Manoradhan Murugesan
- Department of Public Health Sciences, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Robert S Nocon
- Health Systems Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA, USA
- Joshua Bolton
- Department of Information Systems, University of Maryland, Baltimore, MD, USA
- R Tamara Konetzka
- Department of Public Health Sciences, Department of Medicine, The University of Chicago, Chicago, IL, USA
- Marshall H Chin
- Section of General Internal Medicine, Department of Medicine, The University of Chicago, 5841 S. Maryland Ave, MC 2007, Chicago, IL 60637, USA
- Elbert S Huang
- Section of General Internal Medicine, Department of Medicine, The University of Chicago, 5841 S. Maryland Ave, MC 2007, Chicago, IL 60637, USA
2. Weberpals J, Raman SR, Shaw PA, Lee H, Russo M, Hammill BG, Toh S, Connolly JG, Dandreo KJ, Tian F, Liu W, Li J, Hernández-Muñoz JJ, Glynn RJ, Desai RJ. A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records. Clin Epidemiol 2024; 16:329-343. PMID: 38798915; PMCID: PMC11127690; DOI: 10.2147/clep.s436131.
Abstract
Objective Partially observed confounder data pose challenges to the statistical analysis of electronic health records (EHR), and systematic assessments of the potentially underlying missingness mechanisms are lacking. We aimed to provide a principled approach to empirically characterize missing-data processes and to investigate the performance of analytic methods. Methods Three empirical sub-cohorts of diabetic SGLT2- or DPP4-inhibitor initiators with complete information on HbA1c, BMI and smoking as confounders of interest (COI) formed the basis of data simulation under a plasmode framework. We simulated a true null treatment effect, with the COI included in the outcome-generation model, and four missingness mechanisms for the COI: completely at random (MCAR), at random (MAR), and two not-at-random (MNAR) mechanisms, in which missingness depended on an unmeasured confounder and on the value of the COI itself, respectively. We evaluated the ability of three groups of diagnostics to differentiate between mechanisms: 1) differences in characteristics between patients with and without the observed COI (using averaged standardized mean differences [ASMD]), 2) predictive ability of the missingness indicator based on observed covariates, and 3) association of the missingness indicator with the outcome. We then compared analytic methods, including complete-case analysis, inverse probability weighting, and single and multiple imputation, in their ability to recover true treatment effects. Results The diagnostics successfully identified characteristic patterns of the simulated missingness mechanisms. For MAR, but not MCAR, the patient characteristics showed substantial differences (median ASMD 0.20 vs 0.05) and, consequently, discrimination of the prediction models for missingness was also higher (0.59 vs 0.50). For MNAR, but not MAR or MCAR, missingness was significantly associated with the outcome even in models adjusting for other observed covariates. Comparing analytic methods, multiple imputation using a random forest algorithm resulted in the lowest root-mean-squared error. Conclusion Principled diagnostics provided reliable insights into missingness mechanisms. When assumptions allow, multiple imputation with nonparametric models could help reduce bias.
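The second diagnostic above (predictive ability of a missingness-indicator model) is easy to illustrate. The following Python sketch (illustrative only; simulated data and an assumed logistic model for the missingness indicator) shows why the model's discrimination stays near 0.5 under MCAR but rises under MAR:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=(n, 3))                    # fully observed covariates

# MCAR: missingness ignores the data; MAR: driven by an observed covariate
m_mcar = rng.binomial(1, 0.3, size=n)
m_mar = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 1.5 * x[:, 0]))))

def missingness_auc(m):
    """Discrimination of a model predicting the missingness indicator."""
    p = LogisticRegression().fit(x, m).predict_proba(x)[:, 1]
    return roc_auc_score(m, p)

auc_mcar = missingness_auc(m_mcar)   # close to 0.5
auc_mar = missingness_auc(m_mar)     # clearly above 0.5
```

Under MNAR the same model would also hover near 0.5, which is why the paper pairs this diagnostic with the missingness-outcome association check.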
Affiliation(s)
- Janick Weberpals
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Sudha R Raman
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
- Pamela A Shaw
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Hana Lee
- Office of Biostatistics, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Massimiliano Russo
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Bradley G Hammill
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
- Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
- John G Connolly
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Kimberly J Dandreo
- Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Fang Tian
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Wei Liu
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Jie Li
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- José J Hernández-Muñoz
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Robert J Glynn
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Rishi J Desai
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
3. Martin GL, Petri C, Rozenberg J, Simon N, Hajage D, Kirchgesner J, Tubach F, Létinier L, Dechartres A. A methodological review of the high-dimensional propensity score in comparative-effectiveness and safety-of-interventions research finds incomplete reporting relative to algorithm development and robustness. J Clin Epidemiol 2024; 169:111305. PMID: 38417583; DOI: 10.1016/j.jclinepi.2024.111305.
Abstract
OBJECTIVES The use of secondary databases has become popular for evaluating the effectiveness and safety of interventions in real-life settings. However, the absence of important confounders in these databases is challenging. To address this issue, the high-dimensional propensity score (hdPS) algorithm was developed in 2009. This algorithm uses proxy variables for mitigating confounding by combining information available across several healthcare dimensions. This study assessed the methodology and reporting of the hdPS in comparative effectiveness and safety research. STUDY DESIGN AND SETTING In this methodological review, we searched PubMed and Google Scholar from July 2009 to May 2022 for studies that used the hdPS for evaluating the effectiveness or safety of healthcare interventions. Two reviewers independently extracted study characteristics and assessed how the hdPS was applied and reported. Risk of bias was evaluated with the Risk Of Bias In Non-randomised Studies - of Interventions (ROBINS-I) tool. RESULTS In total, 136 studies met the inclusion criteria; the median publication year was 2018 (Q1-Q3 2016-2020). The studies included 192 datasets, mostly North American databases (n = 132, 69%). The hdPS was used in primary analysis in 120 studies (88%). Dimensions were defined in 101 studies (74%), with a median of 5 (Q1-Q3 4-6) dimensions included. A median of 500 (Q1-Q3 200-500) empirically identified covariates were selected. Regarding hdPS reporting, only 11 studies (8%) reported all recommended items. Most studies (n = 81, 60%) had a moderate overall risk of bias. CONCLUSION There is room for improvement in the reporting of hdPS studies, especially regarding the transparency of methodological choices that underpin the construction of the hdPS.
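As a concrete illustration of the hdPS idea of empirically generated proxy covariates, the sketch below (a simplified Python rendering, not the published hdPS macros; the claims counts are simulated) expands each candidate code in one data dimension into three binary frequency indicators in the spirit of the original 2009 algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
n_pat, n_codes = 1000, 50
# toy claims dimension: per-patient occurrence counts of each candidate code
counts = rng.poisson(0.3, size=(n_pat, n_codes))

def expand(c):
    """hdPS-style indicators for one code: ever, >= median, >= 75th pct."""
    nz = c[c > 0]
    if nz.size == 0:                       # code never observed
        return np.zeros((c.size, 3), dtype=int)
    med, q3 = np.median(nz), np.quantile(nz, 0.75)
    return np.stack([c >= 1, c >= med, c >= q3], axis=1).astype(int)

covariates = np.hstack([expand(counts[:, j]) for j in range(n_codes)])
# three binary empirical covariates per candidate code
```

In the full algorithm these empirical covariates are then prioritized (e.g., by an estimated confounding impact) and the top few hundred, as the review notes a median of 500, enter the PS model.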
Affiliation(s)
- Guillaume Louis Martin
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Pitié Salpêtrière, Département de Santé Publique, Paris, France; Synapse Medicine, Bordeaux, France.
- Camille Petri
- UKRI Centre for Doctoral Training in AI for Healthcare, Imperial College London, London, UK; National Heart and Lung Institute, Imperial College London, London, UK
- Noémie Simon
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Pitié Salpêtrière, Département de Santé Publique, Paris, France
- David Hajage
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Pitié Salpêtrière, Département de Santé Publique, Paris, France
- Julien Kirchgesner
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Saint-Antoine, Département de Gastroentérologie et Nutrition, Paris, France
- Florence Tubach
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Pitié Salpêtrière, Département de Santé Publique, Paris, France
- Agnès Dechartres
- Sorbonne Université, INSERM, Institut Pierre Louis d'Epidémiologie et de Santé Publique, AP-HP, Hôpital Pitié Salpêtrière, Département de Santé Publique, Paris, France
4. Brooks TG, Lahens NF, Mrčela A, Grant GR. Challenges and best practices in omics benchmarking. Nat Rev Genet 2024; 25:326-339. PMID: 38216661; DOI: 10.1038/s41576-023-00679-6.
Abstract
Technological advances enabling massively parallel measurement of biological features - such as microarrays, high-throughput sequencing and mass spectrometry - have ushered in the omics era, now in its third decade. The resulting complex landscape of analytical methods has naturally fostered the growth of an omics benchmarking industry. Benchmarking refers to the process of objectively comparing and evaluating the performance of different computational or analytical techniques when processing and analysing large-scale biological data sets, such as transcriptomics, proteomics and metabolomics. With thousands of omics benchmarking studies published over the past 25 years, the field has matured to the point where the foundations of benchmarking have been established and well described. However, generating meaningful benchmarking data and properly evaluating performance in this complex domain remains challenging. In this Review, we highlight some common oversights and pitfalls in omics benchmarking. We also establish a methodology to bring the issues that can be addressed into focus and to be transparent about those that cannot: this takes the form of a spreadsheet template of guidelines for comprehensive reporting, intended to accompany publications. In addition, a survey of recent developments in benchmarking is provided as well as specific guidance for commonly encountered difficulties.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
- Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA.
5. Schreck N, Slynko A, Saadati M, Benner A. Statistical plasmode simulations - Potentials, challenges and recommendations. Stat Med 2024; 43:1804-1825. PMID: 38356231; DOI: 10.1002/sim.10012.
Abstract
Statistical data simulation is essential in the development of statistical models and methods, as well as in their performance evaluation. To capture complex data structures, particularly for high-dimensional data, a variety of simulation approaches have been introduced, including parametric simulations and the so-called plasmode simulations. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes come very close to reality, with some aspects of the "truth" known. However, there are no explicit guidelines or established state of the art for performing plasmode data simulations. In the present paper, we first review the existing literature and introduce the concept of statistical plasmode simulation. We then discuss advantages and challenges of statistical plasmodes and provide a step-wise procedure for their generation, including key steps to their implementation and reporting. Finally, we illustrate the concept of statistical plasmodes, as well as the proposed plasmode generation procedure, by means of a public real RNA data set on breast carcinoma patients.
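The core of a statistical plasmode simulation can be sketched in a few lines. The Python fragment below is a schematic under assumed models, not the paper's procedure: it resamples real covariate rows to preserve their joint structure and then simulates outcomes from an outcome model estimated on the real data, so that the simulation "truth" is known:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# stand-in for the real data (in practice: the empirical covariate matrix)
real_x = rng.normal(size=(800, 4))
true_logit = 0.3 * real_x[:, 0] - 0.2 * real_x[:, 2]
real_y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# step 1: estimate the outcome-generating model on the real data
fit = LogisticRegression().fit(real_x, real_y)

# step 2: resample covariate rows, preserving their joint distribution
idx = rng.integers(0, real_x.shape[0], size=2000)
sim_x = real_x[idx]

# step 3: simulate outcomes from the fitted model; its coefficients
# now play the role of the known simulation "truth"
sim_y = rng.binomial(1, fit.predict_proba(sim_x)[:, 1])
```

Method comparisons can then be run on many such (sim_x, sim_y) replicates and judged against the known coefficients.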
Affiliation(s)
- Nicholas Schreck
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Alla Slynko
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Maral Saadati
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Axel Benner
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
6. DiPrete BL, Girman CJ, Mavros P, Breskin A, Brookhart MA. Characterizing Imbalance in the Tails of the Propensity Score Distribution. Am J Epidemiol 2024; 193:389-403. PMID: 37830395; DOI: 10.1093/aje/kwad200.
Abstract
Understanding the characteristics of patients with propensity scores in the tails of the propensity score (PS) distribution is relevant for inverse-probability-of-treatment-weighted and PS-based estimation in observational studies. Here we outline a method for identifying the variables most responsible for extreme propensity scores. The approach is illustrated in 3 scenarios: 1) a plasmode simulation of adult patients in the National Ambulatory Medical Care Survey (2011-2015); 2) timing of dexamethasone initiation; and 3) timing of remdesivir initiation, the latter two in patients hospitalized for coronavirus disease 2019 from February 2020 through January 2021. PS models were fitted using relevant baseline covariates, and the tails of the PS distribution were defined using asymmetric 1st and 99th percentiles. After fitting of the PS model in each original data set, values of each key covariate were permuted and model-agnostic variable importance measures were examined. Visualization and variable importance techniques were helpful in identifying the variables most responsible for extreme propensity scores and may help identify individual characteristics that make patients inappropriate for inclusion in a study (e.g., off-label use). Subsetting or restricting the study sample based on variables identified using this approach may help investigators avoid the need for trimming or overlap weights in studies.
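The permutation step described above can be sketched as follows (an illustrative Python version with simulated data; the paper examined richer model-agnostic importance measures): after fitting the PS model and flagging observations in the asymmetric 1st/99th-percentile tails, each covariate is permuted in turn and the resulting disturbance of the predicted scores among tail patients is taken as its importance:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=(n, 3))
z = rng.binomial(1, 1 / (1 + np.exp(-(2.0 * x[:, 0] + 0.2 * x[:, 1]))))

model = LogisticRegression().fit(x, z)
ps = model.predict_proba(x)[:, 1]
lo, hi = np.quantile(ps, [0.01, 0.99])
tail = (ps < lo) | (ps > hi)                 # asymmetric 1st/99th tails

def importance(j):
    """Mean |change in PS| among tail patients after permuting covariate j."""
    xp = x.copy()
    xp[:, j] = rng.permutation(xp[:, j])
    return np.abs(model.predict_proba(xp)[:, 1] - ps)[tail].mean()

imp = [importance(j) for j in range(3)]
# the strongly prognostic covariate (column 0) drives the extreme scores
```

Covariates whose permutation most disturbs the tail scores are the natural candidates for the subsetting or restriction strategy the abstract recommends.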
7. Friedrich S, Friede T. On the role of benchmarking data sets and simulations in method comparison studies. Biom J 2024; 66:e2200212. PMID: 36810737; DOI: 10.1002/bimj.202200212.
Abstract
Method comparisons are essential to provide recommendations and guidance for applied researchers, who often have to choose from a plethora of available approaches. While many comparisons exist in the literature, they are often not neutral but favor a novel method. Apart from the choice of design and proper reporting of the findings, there are different approaches concerning the underlying data for such method comparison studies. Most manuscripts on statistical methodology rely on simulation studies and provide a single real-world data set as an example to motivate and illustrate the methodology investigated. In the context of supervised learning, in contrast, methods are often evaluated using so-called benchmarking data sets, that is, real-world data that serve as a gold standard in the community. Simulation studies, on the other hand, are much less common in this context. The aim of this paper is to investigate differences and similarities between these approaches, to discuss their advantages and disadvantages, and ultimately to develop new approaches to method evaluation that pick the best of both worlds. To this end, we borrow ideas from different contexts, such as mixed methods research and Clinical Scenario Evaluation.
Affiliation(s)
- Sarah Friedrich
- Institute of Mathematics, University of Augsburg, Augsburg, Germany
- Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, Augsburg, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Humboldtallee, Göttingen, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Göttingen, Göttingen, Germany
8. Ayilara OF, Platt RW, Dahl M, Coulombe J, Ginestet PG, Chateau D, Lix LM. Generating synthetic data from administrative health records for drug safety and effectiveness studies. Int J Popul Data Sci 2023; 8:2176. PMID: 38414538; PMCID: PMC10898503; DOI: 10.23889/ijpds.v8i1.2176.
Abstract
Introduction Administrative health records (AHRs) are used to conduct population-based post-market drug safety and comparative effectiveness studies to inform healthcare decision making. However, the cost of data extraction and the challenges associated with privacy and securing approvals can make it difficult for researchers to conduct methodological research in a timely manner using real data. Generating synthetic AHRs that reasonably represent the real-world data is beneficial for developing analytic methods and for training analysts to rapidly implement study protocols. We generated synthetic AHRs using two methods, compared these synthetic AHRs to real-world AHRs, and described the challenges associated with using synthetic AHRs for real-world studies. Methods The real-world AHRs comprised prescription drug records for individuals with healthcare insurance coverage in the Population Research Data Repository (PRDR) from Manitoba, Canada, for the 10-year period from 2008 to 2017. Synthetic data were generated using the Observational Medical Dataset Simulator II (OSIM2) and a modification of it (ModOSIM). Synthetic and real-world data were described using frequencies and percentages. Agreement of prescription drug use measures in PRDR, OSIM2 and ModOSIM was estimated with the concordance coefficient. Results The PRDR cohort included 169,586,633 drug records and 1,395 drug types for 1,604,734 individuals. Synthetic data for 1,000,000 individuals were generated using OSIM2 and ModOSIM. Sex and age group distributions were similar in the real-world and synthetic AHRs. However, both OSIM2 and ModOSIM differed significantly from PRDR in the number of drug records and the number of unique drugs per person. For the average number of days of drug use, concordance with the PRDR was 16% (95% confidence interval [CI]: 12%-19%) for OSIM2 and 88% (95% CI: 87%-90%) for ModOSIM. Conclusions ModOSIM data were more similar to PRDR than OSIM2 data on many measures. Synthetic AHRs consistent with those found in real-world settings can be generated using ModOSIM. Synthetic data will benefit rapid implementation of methodological studies and data analyst training.
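The concordance coefficient used above to quantify agreement can be illustrated with Lin's concordance correlation coefficient (our assumption; the abstract does not name the exact estimator). A minimal Python sketch with toy measure vectors:

```python
import numpy as np

def ccc(a, b):
    """Lin's concordance correlation coefficient between paired measures."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return 2 * cov / (a.var() + b.var() + (a.mean() - b.mean()) ** 2)

real = np.array([5.0, 7.0, 9.0, 11.0])    # e.g. mean days of drug use (real)
close = real + 0.1                        # synthetic data tracking the real
off = real[::-1]                          # synthetic data systematically off

high, low = ccc(real, close), ccc(real, off)
```

Unlike Pearson correlation, CCC penalizes location and scale shifts as well as scatter, which is why near-identical measures score close to 1 while systematically discordant ones do not.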
Affiliation(s)
- Olawale F Ayilara
- Department of Community Health Sciences, University of Manitoba, Winnipeg, Canada
- Robert W Platt
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Canada
- Matt Dahl
- Manitoba Centre for Health Policy, University of Manitoba, Winnipeg, Canada
- Janie Coulombe
- Department of Mathematics and Statistics, Université de Montréal, Montreal, Canada
- Pablo Gonzalez Ginestet
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Dan Chateau
- College of Health & Medicine, Australian National University, Canberra, Australia
- Lisa M Lix
- Department of Community Health Sciences, University of Manitoba, Winnipeg, Canada
9. Souli Y, Trudel X, Diop A, Brisson C, Talbot D. Longitudinal plasmode algorithms to evaluate statistical methods in realistic scenarios: an illustration applied to occupational epidemiology. BMC Med Res Methodol 2023; 23:242. PMID: 37853309; PMCID: PMC10585912; DOI: 10.1186/s12874-023-02062-9.
Abstract
INTRODUCTION Plasmode simulations are a type of simulation that uses real data to determine the synthetic data-generating equations. Such simulations thus allow evaluating statistical methods under realistic conditions. As far as we know, no plasmode algorithm has been proposed for simulating longitudinal data. In this paper, we propose a longitudinal plasmode framework to generate realistic data with both a time-varying exposure and time-varying covariates. This work was motivated by the objective of comparing different methods for estimating the causal effect of a cumulative exposure to psychosocial stressors at work over time. METHODS We developed two longitudinal plasmode algorithms: a parametric and a nonparametric algorithm. Data from the PROspective Québec (PROQ) Study on Work and Health were used as an input to generate data with the proposed plasmode algorithms. We evaluated the performance of multiple estimators of the parameters of marginal structural models (MSMs): inverse probability of treatment weighting, g-computation and targeted maximum likelihood estimation. These estimators were also compared to standard regression approaches with adjustment either for baseline covariates only or for both baseline and time-varying covariates. RESULTS Standard regression methods were susceptible to yielding biased estimates, with confidence intervals whose coverage probability was lower than the nominal level. The bias was much lower, and the coverage of confidence intervals much closer to the nominal level, when considering MSMs. Among MSM estimators, g-computation overall produced the best results with respect to bias, root mean squared error and coverage of confidence intervals. No method produced unbiased estimates with adequate coverage for all parameters in the more realistic nonparametric plasmode simulation. CONCLUSION The proposed longitudinal plasmode algorithms can be important methodological tools for evaluating and comparing analytical methods in realistic simulation scenarios. To facilitate the use of these algorithms, we provide R functions on GitHub. We also recommend using MSMs when estimating the effect of cumulative exposure to psychosocial stressors at work.
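Among the MSM estimators compared above, the inverse-probability-of-treatment-weighting idea for a time-varying exposure can be sketched briefly. The Python fragment below is only an illustration, with simulated data, two time points and unstabilized weights (the authors' R functions on GitHub are the reference implementation); it multiplies per-period treatment weights, each conditioning on the history up to that period:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 4000

# time 0: covariate L0 affects exposure A0; time 1: L1 depends on A0
L0 = rng.normal(size=n)
A0 = rng.binomial(1, 1 / (1 + np.exp(-L0)))
L1 = L0 + 0.5 * A0 + rng.normal(size=n)
A1 = rng.binomial(1, 1 / (1 + np.exp(-L1)))

def iptw(a, covs):
    """Unstabilized inverse-probability-of-treatment weight for one period."""
    p = LogisticRegression().fit(covs, a).predict_proba(covs)[:, 1]
    return np.where(a == 1, 1 / p, 1 / (1 - p))

# MSM weight: product of the per-period weights
w = iptw(A0, L0.reshape(-1, 1)) * iptw(A1, np.column_stack([L0, A0, L1]))
# w would then be used in a weighted regression of the outcome on (A0, A1)
```

Stabilized weights (numerator models for the marginal treatment probabilities) are usually preferred in practice to tame extreme weights.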
Affiliation(s)
- Youssra Souli
- Institute for Stochastics, Johannes Kepler University, Linz, Austria
- Xavier Trudel
- Université Laval, Département de médecine sociale et préventive, Québec, Canada
- Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada
- Awa Diop
- Université Laval, Département de médecine sociale et préventive, Québec, Canada
- Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada
- Chantal Brisson
- Université Laval, Département de médecine sociale et préventive, Québec, Canada
- Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada
- Denis Talbot
- Université Laval, Département de médecine sociale et préventive, Québec, Canada.
- Centre de recherche du CHU de Québec - Université Laval, Axe santé des populations et pratiques optimales en santé, Québec, Canada.
10. Oh IS, Jeong HE, Lee H, Filion KB, Noh Y, Shin JY. Validating an approach to overcome the immeasurable time bias in cohort studies: a real-world example and Monte Carlo simulation study. Int J Epidemiol 2023; 52:1534-1544. PMID: 37172269; DOI: 10.1093/ije/dyad049.
Abstract
BACKGROUND Immeasurable time bias arises from the lack of in-hospital medication information. It has been suggested that time-varying adjustment for hospitalization may minimize this potential bias. However, we previously examined this issue in only one case study, and the validity of the approach remains to be assessed in other settings. METHODS Using a Monte Carlo simulation, we generated synthetic immeasurable time-varying hospitalization-related factors of duration, frequency and timing. Nine scenarios were created by combining three frequency scenarios and three duration scenarios, with the empirical cohort distribution of hospitalization used to simulate the timing. We used Korea's healthcare database and a case example of β-blocker use and mortality among patients with heart failure. We estimated the gold-standard hazard ratio (HR) with 95% CI using inpatient and outpatient drug data, and the corresponding estimate for a pseudo-outpatient setting using outpatient data only. We assessed the validity of adjusting for time-varying hospitalization in the nine scenarios using relative bias, confidence limit ratio (CLR) and mean squared error (MSE), compared with the empirical gold-standard estimate across bootstrap resamples. RESULTS With the real-world gold standard (HR 0.73; 95% CI 0.67-0.80) as the reference estimate, adjusting for time-varying hospitalization (0.71; 0.63-0.80) effectively reduced the immeasurable time bias and had the following performance metrics across the nine scenarios: relative bias (range: -7.08% to 0.61%), CLR (1.28 to 1.36) and MSE (0.0005 to 0.0031). CONCLUSIONS The approach of adjusting for time-varying hospitalization consistently reduced the immeasurable time bias in Monte Carlo simulated data.
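The three performance metrics used above are straightforward to compute. The Python helper below is our own sketch of standard definitions (the abstract does not specify the scale for relative bias, so the log-HR scale is an assumption) for evaluating one estimate against a gold-standard log hazard ratio:

```python
import numpy as np

def performance(est_log_hr, ci_low_hr, ci_high_hr, gold_log_hr):
    """Relative bias (%), confidence limit ratio, and squared error of an
    estimate versus a gold-standard log hazard ratio (squared errors are
    averaged across resamples to obtain the MSE)."""
    rel_bias = 100 * (est_log_hr - gold_log_hr) / gold_log_hr
    clr = ci_high_hr / ci_low_hr          # ratio of the CI bounds (HR scale)
    sq_err = (est_log_hr - gold_log_hr) ** 2
    return rel_bias, clr, sq_err

# the abstract's empirical example: gold-standard HR 0.73, adjusted HR 0.71
rb, clr, se = performance(np.log(0.71), 0.63, 0.80, np.log(0.73))
```

A CLR near 1 indicates a tight interval; the abstract's reported CLR range of 1.28 to 1.36 corresponds to fairly precise estimates.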
Affiliation(s)
- In-Sun Oh
- School of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Quebec, Canada
- Centre for Clinical Epidemiology, Lady Davis Research Institute-Jewish General Hospital, Montreal, Quebec, Canada
- Han Eol Jeong
- School of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Department of Biohealth Regulatory Science, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Hyesung Lee
- School of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Department of Biohealth Regulatory Science, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Kristian B Filion
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Quebec, Canada
- Centre for Clinical Epidemiology, Lady Davis Research Institute-Jewish General Hospital, Montreal, Quebec, Canada
- Department of Medicine, McGill University, Montreal, Quebec, Canada
- Yunha Noh
- School of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Quebec, Canada
- Centre for Clinical Epidemiology, Lady Davis Research Institute-Jewish General Hospital, Montreal, Quebec, Canada
- Ju-Young Shin
- School of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Department of Biohealth Regulatory Science, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Department of Clinical Research Design & Evaluation, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, South Korea
11
Williamson BD, Wyss R, Stuart EA, Dang LE, Mertens AN, Neugebauer RS, Wilson A, Gruber S. An application of the Causal Roadmap in two safety monitoring case studies: Causal inference and outcome prediction using electronic health record data. J Clin Transl Sci 2023; 7:e208. [PMID: 37900347] [PMCID: PMC10603358] [DOI: 10.1017/cts.2023.632]
Abstract
Background Real-world data, such as administrative claims and electronic health records, are increasingly used for safety monitoring and to help guide regulatory decision-making. In these settings, it is important to document analytic decisions transparently and objectively to assess and ensure that analyses meet their intended goals. Methods The Causal Roadmap is an established framework that can guide and document analytic decisions through each step of the analytic pipeline, which will help investigators generate high-quality real-world evidence. Results In this paper, we illustrate the utility of the Causal Roadmap using two case studies previously led by workgroups sponsored by the Sentinel Initiative - a program for actively monitoring the safety of regulated medical products. Each case example focuses on different aspects of the analytic pipeline for drug safety monitoring. The first case study shows how the Causal Roadmap encourages transparency, reproducibility, and objective decision-making for causal analyses. The second case study highlights how this framework can guide analytic decisions beyond inference on causal parameters, improving outcome ascertainment in clinical phenotyping. Conclusion These examples provide a structured framework for implementing the Causal Roadmap in safety surveillance and guide transparent, reproducible, and objective analysis.
Affiliation(s)
- Brian D. Williamson
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Richard Wyss
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Elizabeth A. Stuart
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Lauren E. Dang
  - Department of Biostatistics, University of California, Berkeley, CA, USA
- Andrew N. Mertens
  - Department of Biostatistics, University of California, Berkeley, CA, USA
12
Vader DT, Mamtani R, Li Y, Griffith SD, Calip GS, Hubbard RA. Inverse Probability of Treatment Weighting and Confounder Missingness in Electronic Health Record-based Analyses: A Comparison of Approaches Using Plasmode Simulation. Epidemiology 2023; 34:520-530. [PMID: 37155612] [PMCID: PMC10231933] [DOI: 10.1097/ede.0000000000001618]
Abstract
BACKGROUND Electronic health record (EHR) data represent a critical resource for comparative effectiveness research, allowing investigators to study intervention effects in real-world settings with large patient samples. However, high levels of missingness in confounder variables are common, challenging the perceived validity of EHR-based investigations. METHODS We investigated the performance of multiple imputation and propensity score (PS) calibration when conducting inverse probability of treatment weighting (IPTW)-based comparative effectiveness research using EHR data with missingness in confounder variables and outcome misclassification. Our motivating example compared effectiveness of immunotherapy versus chemotherapy treatment of advanced bladder cancer with missingness in a key prognostic variable. We captured complexity in EHR data structures using a plasmode simulation approach to spike investigator-defined effects into resamples of a cohort of 4361 patients from a nationwide deidentified EHR-derived database. We characterized statistical properties of IPTW hazard ratio estimates when using multiple imputation or PS calibration missingness approaches. RESULTS Multiple imputation and PS calibration performed similarly, maintaining ≤0.05 absolute bias in the marginal hazard ratio even when ≥50% of subjects had missing at random or missing not at random confounder data. Multiple imputation required greater computational resources, taking nearly 40 times as long as PS calibration to complete. Outcome misclassification minimally increased bias of both methods. CONCLUSION Our results support multiple imputation and PS calibration approaches to missingness in missing completely at random or missing at random confounder variables in EHR-based IPTW comparative effectiveness analyses, even with missingness ≥50%. PS calibration represents a computationally efficient alternative to multiple imputation.
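For readers unfamiliar with the IPTW step underlying these comparisons, a minimal sketch of stabilized ATE weights computed from an already-fitted propensity score (the paper's imputation and calibration machinery is not reproduced here):

```python
import numpy as np

def stabilized_iptw(treated, ps):
    """Stabilized inverse-probability-of-treatment weights for the ATE.
    treated: 0/1 array; ps: estimated P(treated = 1 | covariates)."""
    p_treat = treated.mean()  # marginal treatment prevalence stabilizes the weights
    return np.where(treated == 1, p_treat / ps, (1 - p_treat) / (1 - ps))

# Toy example with four patients and hand-picked propensity scores
treated = np.array([1, 1, 0, 0])
ps = np.array([0.8, 0.4, 0.4, 0.2])
w = stabilized_iptw(treated, ps)
```

The weighted outcome model (here, a weighted Cox model) is then fit on the pseudo-population these weights create.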
Affiliation(s)
- Daniel T. Vader
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA
- Ronac Mamtani
  - Division of Hematology and Oncology, University of Pennsylvania, Philadelphia, PA
- Yun Li
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA
- Rebecca A. Hubbard
  - Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA
13
Sarayani A, Brown JD, Hampp C, Donahoo WT, Winterstein AG. Adaptability of High Dimensional Propensity Score Procedure in the Transition from ICD-9 to ICD-10 in the US Healthcare System. Clin Epidemiol 2023; 15:645-660. [PMID: 37274833] [PMCID: PMC10237200] [DOI: 10.2147/clep.s405165]
Abstract
Background The High-Dimensional Propensity Score (HDPS) procedure is a data-driven approach to assist control for confounding in pharmacoepidemiologic research. The transition from the International Classification of Diseases, Ninth Revision (ICD-9) to the Tenth Revision (ICD-10) in the US health system may pose uncertainty in applying the HDPS procedure. Methods We assembled a base cohort of patients in MarketScan® Commercial Claims Database who had newly initiated celecoxib or traditional NSAIDs to compare gastrointestinal bleeding risk. We then created bootstrapped hypothetical cohorts from the base cohort with predefined patient selection patterns from the ICD eras. Three strategies for HDPS deployment were tested: 1) split the cohort by ICD era, deploy HDPS twice, and pool the relative risks (pooled RR), 2) consider codes from each ICD era as a separate data dimension and deploy HDPS in the entire cohort (data dimensions), and 3) map ICD codes from both eras to Clinical Classifications Software (CCS) concepts before deploying HDPS in the entire cohort (CCS mapping). We calculated percent bias and root-mean-squared error to compare the strategies. Results A similar bias reduction was observed in cohorts where the patient selection pattern from each ICD era was comparable between the exposure groups. In the presence of considerable disparity in patient selection, we observed a bimodal distribution of propensity scores in the data dimensions strategy, indicating instrument-like covariates. Moreover, the CCS mapping strategy resulted in at least 30% less bias than the pooled RR and data dimensions strategies (RMSE: 0.14, 0.19, 0.21, respectively) in this scenario. Conclusion Mapping ICD codes to a stable terminology like CCS serves as a helpful strategy to reduce residual bias when deploying HDPS in pharmacoepidemiologic studies spanning both ICD eras.
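The CCS-mapping strategy amounts to translating each claim's ICD-9 or ICD-10 code into a version-stable concept before HDPS counts covariate prevalence. A toy sketch with a hypothetical four-code crosswalk (real CCS maps are large lookup files distributed by AHRQ; the codes below are only illustrative):

```python
# Hypothetical fragment of an ICD-to-CCS crosswalk (illustrative only).
ICD_TO_CCS = {
    ("icd9", "530.81"): "CCS:138",   # esophageal disorders (incl. GERD)
    ("icd10", "K21.9"): "CCS:138",
    ("icd9", "578.9"): "CCS:153",    # gastrointestinal hemorrhage
    ("icd10", "K92.2"): "CCS:153",
}

def to_ccs_dimension(claims):
    """Collapse (patient, icd_version, code) claims into per-patient sets of
    version-stable CCS concepts, forming one combined HDPS data dimension."""
    out = {}
    for patient, version, code in claims:
        concept = ICD_TO_CCS.get((version, code))
        if concept is not None:
            out.setdefault(patient, set()).add(concept)
    return out

claims = [(1, "icd9", "530.81"), (1, "icd10", "K21.9"), (2, "icd10", "K92.2")]
concepts = to_ccs_dimension(claims)
```

Note how patient 1's ICD-9 and ICD-10 GERD codes collapse into a single concept, which is what keeps the covariate dimension stable across the transition.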
Affiliation(s)
- Amir Sarayani
  - Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA
  - Center for Drug Safety and Evaluation, University of Florida, Gainesville, FL, USA
- Joshua D Brown
  - Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA
  - Center for Drug Safety and Evaluation, University of Florida, Gainesville, FL, USA
- Christian Hampp
  - Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA
  - Regeneron Pharmaceuticals Inc., Tarrytown, NY, USA
- William T Donahoo
  - Division of Endocrinology, Diabetes, & Metabolism, College of Medicine, University of Florida, Gainesville, FL, USA
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Almut G Winterstein
  - Department of Pharmaceutical Outcomes and Policy, College of Pharmacy, University of Florida, Gainesville, FL, USA
  - Center for Drug Safety and Evaluation, University of Florida, Gainesville, FL, USA
14
Laurent T, Lambrelli D, Wakabayashi R, Hirano T, Kuwatsuru R. Strategies to Address Current Challenges in Real-World Evidence Generation in Japan. Drugs Real World Outcomes 2023. [PMID: 37178273] [PMCID: PMC10182751] [DOI: 10.1007/s40801-023-00371-5]
Abstract
The generation of real-world evidence (RWE), which describes patient characteristics or treatment patterns using real-world data (RWD), is rapidly growing more popular as a tool for decision-making in Japan. The aim of this review was to summarize challenges to RWE generation in Japan related to pharmacoepidemiology, and to propose strategies to address some of these challenges. We first focused on data-related issues, including the lack of transparency of RWD sources, linkage across different care settings, definitions of clinical outcomes, and the overall assessment framework of RWD when used for research purposes. Next, we reviewed methodology-related challenges. Because lack of design transparency impairs study reproducibility, transparent reporting of study design is critical for stakeholders. For this review, we considered different sources of bias and time-varying confounding, along with potential study design and methodological solutions. Additionally, the implementation of robust assessment of definition uncertainty, misclassification, and unmeasured confounders would enhance RWE credibility in light of RWD source-related limitations, and is being strongly considered by task forces in Japan. Overall, the development of guidance for best practices on data source selection, design transparency, and analytical methods to address different sources of bias and robustness in the process of RWE generation will enhance credibility for stakeholders and local decision-makers.
Affiliation(s)
- Thomas Laurent
  - Real-World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, Hongo 2-1-1, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Clinical Study Support Inc., 2F Daiei Bldg., 1-11-20 Nishiki Naka-ku, Nagoya, 460-0003, Japan
- Dimitra Lambrelli
  - Real-World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, Hongo 2-1-1, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Real-World Evidence, Evidera, The Ark, 2nd Floor, 201 Talgarth Road, London, W6 8BJ, UK
- Ryozo Wakabayashi
  - Real-World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, Hongo 2-1-1, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Clinical Study Support Inc., 2F Daiei Bldg., 1-11-20 Nishiki Naka-ku, Nagoya, 460-0003, Japan
- Takahiro Hirano
  - Real-World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, Hongo 2-1-1, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Clinical Study Support Inc., 2F Daiei Bldg., 1-11-20 Nishiki Naka-ku, Nagoya, 460-0003, Japan
- Ryohei Kuwatsuru
  - Real-World Evidence and Data Assessment (READS), Graduate School of Medicine, Juntendo University, Hongo 2-1-1, Bunkyo-ku, Tokyo, 113-8421, Japan
  - Department of Radiology, School of Medicine, Juntendo University, Hongo 2-1-1, Bunkyo-ku, Tokyo, 113-8421, Japan
15
Getz K, Hubbard RA, Linn KA. Performance of Multiple Imputation Using Modern Machine Learning Methods in Electronic Health Records Data. Epidemiology 2023; 34:206-215. [PMID: 36722803] [DOI: 10.1097/ede.0000000000001578]
Abstract
BACKGROUND Missing data are common in studies using electronic health record (EHR)-derived data. Missingness in EHR data is related to healthcare utilization patterns, resulting in complex and potentially missing not at random missingness mechanisms. Prior research has suggested that machine learning-based multiple imputation methods may outperform traditional methods and may perform well even in settings of missing not at random missingness. METHODS We used plasmode simulations based on a nationwide EHR-derived de-identified database for patients with metastatic urothelial carcinoma to compare the performance of multiple imputation using chained equations, random forests, and denoising autoencoders in terms of bias and precision of hazard ratio estimates under varying proportions of observations with missing values and missingness mechanisms (missing completely at random, missing at random, and missing not at random). RESULTS Multiple imputation by chained equations and random forest methods had low bias and similar standard errors for parameter estimates under missingness completely at random. Under missingness at random, denoising autoencoders had higher bias than multiple imputation by chained equations and random forests. Contrary to results of prior studies of denoising autoencoders, all methods exhibited substantial bias under missingness not at random, with bias increasing in direct proportion to the amount of missing data. CONCLUSIONS We found no advantage of denoising autoencoders for multiple imputation in the setting of an epidemiologic study conducted using EHR data. Results suggested that denoising autoencoders may overfit the data, leading to poor confounder control. Use of more flexible imputation approaches does not mitigate bias induced by missingness not at random and can produce estimates with spurious precision.
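Whichever imputation engine is used (chained equations, random forests, or denoising autoencoders), the M completed-data estimates are combined the same way, with Rubin's rules. A minimal sketch of the pooling step:

```python
def rubin_pool(estimates, variances):
    """Pool M completed-data point estimates and their within-imputation
    variances using Rubin's rules; returns (pooled estimate, total variance)."""
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled point estimate
    ubar = sum(variances) / m                              # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total_var = ubar + (1 + 1 / m) * b
    return qbar, total_var

# e.g. log hazard ratios and their variances from three imputed datasets
est, var = rubin_pool([0.10, 0.14, 0.12], [0.04, 0.05, 0.045])
```

The between-imputation term is what distinguishes an honest multiply-imputed variance from the spuriously precise single-imputation variance the abstract warns about.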
Affiliation(s)
- Kylie Getz
  - Department of Biostatistics and Epidemiology, School of Public Health, Rutgers University, Piscataway, NJ
- Rebecca A Hubbard
  - Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
  - Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA
- Kristin A Linn
  - Department of Biostatistics, Epidemiology & Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
16
Targeted learning: Towards a future informed by real-world evidence. Stat Biopharm Res 2023. [DOI: 10.1080/19466315.2023.2182356]
17
Abrahamowicz M, Beauchamp ME, Moura CS, Bernatsky S, Ferreira Guerra S, Danieli C. Adapting SIMEX to correct for bias due to interval-censored outcomes in survival analysis with time-varying exposure. Biom J 2022; 64:1467-1485. [PMID: 36065586] [DOI: 10.1002/bimj.202100013]
Abstract
Many clinical and epidemiological applications of survival analysis focus on interval-censored events that can be ascertained only at discrete times of clinic visits. This implies that the values of time-varying covariates are not correctly aligned with the true, unknown event times, inducing a bias in the estimated associations. To address this issue, we adapted the simulation-extrapolation (SIMEX) methodology, based on assessing how the estimates change with the artificially increased time between clinic visits. We propose diagnostics to choose the extrapolating function. In simulations, the SIMEX-corrected estimates considerably reduced the bias toward the null and generally yielded a better bias/variance trade-off than conventional estimates. In a real-life pharmacoepidemiological application, the proposed method increased by 27% the estimated excess hazard for the association between a time-varying exposure, representing the 2-year cumulative duration of past use of an antihypertensive medication, and the hazard of nonmelanoma skin cancer (an interval-censored event). These simulation-based and real-life results suggest that the proposed SIMEX-based correction may help improve the accuracy of estimated associations between time-varying exposures and the hazard of interval-censored events in large cohort studies where the events are recorded only at relatively sparse times of clinic visits/assessments. However, these advantages may be less certain for smaller studies and/or weak associations.
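The core of SIMEX is the extrapolation step: estimates obtained at artificially inflated error levels λ ≥ 0 are fitted with a parametric curve and extrapolated back to λ = -1, the error-free limit. A minimal numpy sketch with a quadratic extrapolant (the paper's diagnostics for choosing the extrapolating function are not reproduced); the inputs below are fabricated to lie on an exact quadratic so the extrapolation is exact:

```python
import numpy as np

def simex_extrapolate(lambdas, estimates, degree=2):
    """Fit estimate(lambda) with a polynomial and extrapolate to lambda = -1,
    the SIMEX approximation of the error-free estimate."""
    coefs = np.polyfit(lambdas, estimates, degree)
    return float(np.polyval(coefs, -1.0))

# Illustrative estimates on the exact quadratic 2 + 0.5*lam + 0.1*lam**2
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
ests = 2 + 0.5 * lams + 0.1 * lams**2
corrected = simex_extrapolate(lams, ests)  # value of the quadratic at lambda = -1
```

In the interval-censoring adaptation, λ indexes the artificially widened spacing between clinic visits rather than classical measurement-error variance.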
Affiliation(s)
- Michal Abrahamowicz
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Marie-Eve Beauchamp
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Cristiano Soares Moura
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Sasha Bernatsky
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Steve Ferreira Guerra
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Coraline Danieli
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, QC, Canada
18
Weinstein SM, Vandekar SN, Baller EB, Tu D, Adebimpe A, Tapera TM, Gur RC, Gur RE, Detre JA, Raznahan A, Alexander-Bloch AF, Satterthwaite TD, Shinohara RT, Park JY. Spatially-enhanced clusterwise inference for testing and localizing intermodal correspondence. Neuroimage 2022; 264:119712. [PMID: 36309332] [PMCID: PMC10062374] [DOI: 10.1016/j.neuroimage.2022.119712]
Abstract
With the increasing availability of neuroimaging data from multiple modalities-each providing a different lens through which to study brain structure or function-new techniques for comparing, integrating, and interpreting information within and across modalities have emerged. Recent developments include hypothesis tests of associations between neuroimaging modalities, which can be used to determine the statistical significance of intermodal associations either throughout the entire brain or within anatomical subregions or functional networks. While these methods provide a crucial foundation for inference on intermodal relationships, they cannot be used to answer questions about where in the brain these associations are most pronounced. In this paper, we introduce a new method, called CLEAN-R, that can be used both to test intermodal correspondence throughout the brain and also to localize this correspondence. Our method involves first adjusting for the underlying spatial autocorrelation structure within each modality before aggregating information within small clusters to construct a map of enhanced test statistics. Using structural and functional magnetic resonance imaging data from a subsample of children and adolescents from the Philadelphia Neurodevelopmental Cohort, we conduct simulations and data analyses where we illustrate the high statistical power and nominal type I error levels of our method. By constructing an interpretable map of group-level correspondence using spatially-enhanced test statistics, our method offers insights beyond those provided by earlier methods.
Affiliation(s)
- Sarah M Weinstein
  - Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Simon N Vandekar
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Erica B Baller
  - Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Danni Tu
  - Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Azeez Adebimpe
  - Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
  - Strategy Innovation & Deployment Section, Johnson and Johnson, Raritan, NJ, 08869, USA
- Tinashe M Tapera
  - Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Ruben C Gur
  - Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Raquel E Gur
  - Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- John A Detre
  - Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Armin Raznahan
  - Section on Developmental Neurogenomics, National Institute of Mental Health Intramural Research Program, Bethesda, MD 20892, USA
- Aaron F Alexander-Bloch
  - Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
  - Department of Child and Adolescent Psychiatry and Behavioral Science, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Theodore D Satterthwaite
  - Department of Psychiatry, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Russell T Shinohara
  - Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Jun Young Park
  - Department of Statistical Sciences and Department of Psychology, University of Toronto, Toronto, ON, M5G 1Z5, Canada
19
Duchesneau ED, Jackson BE, Webster-Clark M, Lund JL, Reeder-Hayes KE, Nápoles AM, Strassle PD. The Timing, the Treatment, the Question: Comparison of Epidemiologic Approaches to Minimize Immortal Time Bias in Real-World Data Using a Surgical Oncology Example. Cancer Epidemiol Biomarkers Prev 2022; 31:2079-2086. [PMID: 35984990] [PMCID: PMC9627261] [DOI: 10.1158/1055-9965.epi-22-0495]
Abstract
BACKGROUND Studies evaluating the effects of cancer treatments are prone to immortal time bias that, if unaddressed, can lead to treatments appearing more beneficial than they are. METHODS To demonstrate the impact of immortal time bias, we compared results across several analytic approaches (dichotomous exposure, dichotomous exposure excluding immortal time, time-varying exposure, landmark analysis, clone-censor-weight method), using surgical resection among women with metastatic breast cancer as an example. All adult women diagnosed with incident metastatic breast cancer from 2013-2016 in the National Cancer Database were included. To quantify immortal time bias, we also conducted a simulation study where the "true" relationship between surgical resection and mortality was known. RESULTS Overall, 24,329 women (median age 61, IQR 51-71) were included, and 24% underwent surgical resection. The largest association between resection and mortality was observed when using a dichotomized exposure [HR, 0.54; 95% confidence interval (CI), 0.51-0.57], followed by dichotomous exposure with exclusion of immortal time (HR, 0.62; 95% CI, 0.59-0.65). Results from the time-varying exposure, landmark, and clone-censor-weight analyses were closer to the null (HR, 0.67-0.84). In the plasmode simulation, the time-varying exposure, landmark, and clone-censor-weight models all produced unbiased HRs (bias, -0.003 to 0.016). Both the standard dichotomous exposure (HR, 0.84; bias, -0.177) and dichotomous exposure with exclusion of immortal time (HR, 0.93; bias, -0.074) produced meaningfully biased estimates. CONCLUSIONS Researchers should use time-varying exposures with a treatment assessment window or the clone-censor-weight method when immortal time is present. IMPACT Using methods that appropriately account for immortal time will improve evidence and decision-making from research using real-world data.
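Of the approaches compared, the landmark analysis is the simplest to sketch: exposure is classified at a fixed landmark time and follow-up restarts there, so no pre-treatment person-time is misattributed to the treated group. An illustrative pandas sketch with hypothetical column names (not the study's actual code):

```python
import pandas as pd

def landmark_cohort(df, landmark_day):
    """Keep patients still at risk at the landmark, classify exposure by
    whether surgery occurred on or before it, and restart the clock there."""
    at_risk = df[df["followup_days"] > landmark_day].copy()
    at_risk["exposed"] = (
        at_risk["surgery_day"].notna() & (at_risk["surgery_day"] <= landmark_day)
    )
    at_risk["time_since_landmark"] = at_risk["followup_days"] - landmark_day
    return at_risk

# Toy cohort: patient 2 dies before the 90-day landmark and is excluded;
# patient 4's later surgery does NOT count as exposure at the landmark.
df = pd.DataFrame({
    "followup_days": [400, 60, 300, 500],
    "surgery_day": [45, 30, None, 150],
})
cohort = landmark_cohort(df, landmark_day=90)
```

Treating post-landmark surgery as unexposed is exactly the trade-off of landmark analysis: it eliminates immortal time at the cost of some exposure misclassification after the landmark.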
Affiliation(s)
- Emilie D. Duchesneau
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Bradford E. Jackson
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Michael Webster-Clark
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Jennifer L. Lund
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Katherine E. Reeder-Hayes
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  - Division of Oncology, Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Anna M. Nápoles
  - Division of Intramural Research, National Institute on Minority Health and Health Disparities, NIH, Bethesda, Maryland
- Paula D. Strassle
  - Division of Intramural Research, National Institute on Minority Health and Health Disparities, NIH, Bethesda, Maryland
  - Corresponding Author: Paula D. Strassle, Division of Intramural Research, National Institute on Minority Health and Health Disparities, NIH, Bethesda, MD 20892. Phone: 301-594-5175
20
Shi J, Wang D, Tesei G, Norgeot B. Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments. Front Artif Intell 2022; 5:918813. [PMID: 36187323] [PMCID: PMC9515575] [DOI: 10.3389/frai.2022.918813]
Abstract
In the past decade, there has been exponentially growing interest in the use of observational data collected as a part of routine healthcare practice to determine the effect of a treatment with causal inference models. Validation of these models, however, has been a challenge because the ground truth is unknown: only one treatment-outcome pair for each person can be observed. There have been multiple efforts to fill this void using synthetic data where the ground truth can be generated. However, to date, these datasets have been severely limited in their utility either by being modeled after small non-representative patient populations, being dissimilar to real target populations, or only providing known effects for two cohorts (treated vs. control). In this work, we produced a large-scale and realistic synthetic dataset that provides ground truth effects for over 10 hypertension treatments on blood pressure outcomes. The synthetic dataset was created by modeling a nationwide cohort of more than 580,000 hypertension patient data including each person's multi-year history of diagnoses, medications, and laboratory values. We designed a data generation process by combining an adapted ADS-GAN model for fictitious patient information generation and a neural network for treatment outcome generation. Wasserstein distance of 0.35 demonstrates that our synthetic data follows a nearly identical joint distribution to the patient cohort used to generate the data. Patient privacy was a primary concern for this study; the ϵ-identifiability metric, which estimates the probability of actual patients being identified, is 0.008%, ensuring that our synthetic data cannot be used to identify any actual patients. To demonstrate its usage, we tested the bias in causal effect estimation of four well-established models using this dataset. The approach we used can be readily extended to other types of diseases in the clinical domain, and to datasets in other domains as well.
21
Wyss R, Schneeweiss S, Lin KJ, Miller DP, Kalilani L, Franklin JM. Synthetic Negative Controls: Using Simulation to Screen Large-scale Propensity Score Analyses. Epidemiology 2022; 33:541-550. [PMID: 35439779] [PMCID: PMC9156547] [DOI: 10.1097/ede.0000000000001482]
Abstract
The propensity score has become a standard tool to control for large numbers of variables in healthcare database studies. However, little has been written on the challenge of comparing large-scale propensity score analyses that use different methods for confounder selection and adjustment. In these settings, balance diagnostics are useful but do not inform researchers on which variables balance should be assessed or quantify the impact of residual covariate imbalance on bias. Here, we propose a framework to supplement balance diagnostics when comparing large-scale propensity score analyses. Instead of focusing on results from any single analysis, we suggest conducting and reporting results for many analytic choices and using both balance diagnostics and synthetically generated control studies to screen analyses that show signals of bias caused by measured confounding. To generate synthetic datasets, the framework does not require simulating the outcome-generating process. In healthcare database studies, outcome events are often rare, making it difficult to identify and model all predictors of the outcome to simulate a confounding structure closely resembling the given study. Therefore, the framework uses a model for treatment assignment to divide the comparator population into pseudo-treatment groups where covariate differences resemble those in the study cohort. The partially simulated datasets have a confounding structure approximating the study population under the null (synthetic negative control studies). The framework is used to screen analyses that likely violate partial exchangeability due to lack of control for measured confounding. We illustrate the framework using simulations and an empirical example.
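The pseudo-treatment construction described here (splitting the comparator arm so covariate differences mimic the real treated/comparator contrast, with a null effect by design) can be sketched as follows; the cohort, model, and coefficients are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 20_000

# Invented cohort: one confounder x drives both treatment and outcome,
# and there is NO true treatment effect
x = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-(-1.0 + 0.8 * x)))
y = rng.random(n) < 1 / (1 + np.exp(-(-2.0 + 0.5 * x)))

# Model treatment assignment from covariates in the full cohort
ps_model = LogisticRegression().fit(x.reshape(-1, 1), treated)

# Restrict to the comparator arm and assign *pseudo*-treatment from the
# fitted model, so pseudo-group covariate differences resemble the real
# treated vs. comparator contrast while the effect stays null by construction
comp_x = x[~treated]
comp_y = y[~treated]
pseudo = rng.random(comp_x.size) < ps_model.predict_proba(comp_x.reshape(-1, 1))[:, 1]

# A nonzero crude "effect" here reflects measured confounding only; an
# analysis pipeline that still shows it after adjustment fails the screen
rd = comp_y[pseudo].mean() - comp_y[~pseudo].mean()
print(f"pseudo-treated fraction: {pseudo.mean():.3f}")
print(f"crude risk difference under the null: {rd:.3f}")
```

The crude risk difference is positive purely because the pseudo-treated group inherits higher values of the confounder; a candidate propensity score analysis would be run on (`comp_x`, `pseudo`, `comp_y`) and screened out if its adjusted estimate stays away from zero.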
Affiliation(s)
- Richard Wyss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Division of General Internal Medicine, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Jessica M Franklin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
22.
Robertson SE, Steingrimsson JA, Dahabreh IJ. Using Numerical Methods to Design Simulations: Revisiting the Balancing Intercept. Am J Epidemiol 2022; 191:1283-1289. [PMID: 34736280 DOI: 10.1093/aje/kwab264]
Abstract
In this paper, we consider methods for generating draws of a binary random variable whose expectation conditional on covariates follows a logistic regression model with known covariate coefficients. We examine approximations for finding a "balancing intercept," that is, a value for the intercept of the logistic model that leads to a desired marginal expectation for the binary random variable. We show that a recently proposed analytical approximation can produce inaccurate results, especially when targeting more extreme marginal expectations or when the linear predictor of the regression model has high variance. We then formulate the balancing intercept as a solution to an integral equation, implement a numerical approximation for solving the equation based on Monte Carlo methods, and show that the approximation works well in practice. Our approach to the basic problem of the balancing intercept provides an example of a broadly applicable strategy for formulating and solving problems that arise in the design of simulation studies used to evaluate or teach epidemiologic methods.
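The numerical approach described here (formulating the balancing intercept as the root of an equation and solving it over a Monte Carlo sample of the linear predictor) can be sketched with simple bisection; the covariate model and target prevalence below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def balancing_intercept(lin_pred, target, lo=-20.0, hi=20.0, tol=1e-8):
    """Find b0 so that mean(expit(b0 + lin_pred)) equals `target`, by bisection.

    `lin_pred` holds Monte Carlo draws of the linear predictor X @ beta
    (without intercept); the sample mean stands in for the integral over
    the covariate distribution."""
    def marginal(b0):
        return np.mean(1 / (1 + np.exp(-(b0 + lin_pred))))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal(mid) < target:   # marginal prevalence increases in b0
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
lin_pred = 1.5 * x                   # high-variance linear predictor

b0 = balancing_intercept(lin_pred, target=0.05)
achieved = np.mean(1 / (1 + np.exp(-(b0 + lin_pred))))
print(f"b0 = {b0:.3f}, achieved marginal prevalence = {achieved:.4f}")

# The naive analytical guess logit(0.05) ignores the nonlinearity and misses:
naive = np.log(0.05 / 0.95)
naive_prev = np.mean(1 / (1 + np.exp(-(naive + lin_pred))))
print(f"naive logit(0.05) = {naive:.3f}, prevalence it yields = {naive_prev:.4f}")
```

The naive intercept overshoots the 5% target precisely in the regime the paper flags (extreme marginal expectation, high-variance linear predictor), while the numerically solved intercept hits it.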
23.
Wyss R, Yanover C, El-Hay T, Bennett D, Platt RW, Zullo AR, Sari G, Wen X, Ye Y, Yuan H, Gokhale M, Patorno E, Lin KJ. Machine learning for improving high-dimensional proxy confounder adjustment in healthcare database studies: an overview of the current literature. Pharmacoepidemiol Drug Saf 2022; 31:932-943. [PMID: 35729705 PMCID: PMC9541861 DOI: 10.1002/pds.5500]
Abstract
Controlling for large numbers of variables that collectively serve as 'proxies' for unmeasured factors can often improve confounding control in pharmacoepidemiologic studies utilizing administrative healthcare databases. There is a growing body of evidence showing that data-driven machine learning algorithms for high-dimensional proxy confounder adjustment can supplement investigator-specified variables to improve confounding control compared to adjustment based on investigator-specified variables alone. Consequently, there has been a recent focus on the development of data-driven methods for high-dimensional proxy confounder adjustment. In this paper, we discuss the considerations underpinning three areas for data-driven high-dimensional proxy confounder adjustment: (1) feature generation: transforming raw data into covariates (or features) to be used for proxy adjustment; (2) covariate prioritization, selection, and adjustment; and (3) diagnostic assessment. We survey current approaches and recent advancements within each area, including the most widely used approach to proxy confounder adjustment in healthcare database studies (the high-dimensional propensity score, or hdPS). We also discuss limitations of the hdPS and outline recent advancements that incorporate the principles of proxy adjustment with machine learning extensions to improve performance. We further discuss challenges and avenues of future development within each area. This manuscript is endorsed by the International Society for Pharmacoepidemiology (ISPE).
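As a concrete illustration of the feature-generation step, the hdPS algorithm expands each claims code into recurrence-based indicator features. Below is a minimal sketch with toy counts and simplified cutoffs (the published algorithm additionally prioritizes features by their potential for confounding, which is omitted here):

```python
import pandas as pd

# Toy claims counts: rows = patients, columns = diagnosis/drug codes
counts = pd.DataFrame(
    {"dx_E11": [0, 1, 4, 0, 9], "rx_C09": [2, 0, 1, 3, 0]},
    index=[f"pt{i}" for i in range(5)],
)

def hdps_features(counts):
    """hdPS-style recurrence features: for each code, flag patients whose
    count reaches 1, the median, and the 75th percentile of the nonzero
    counts observed for that code."""
    feats = {}
    for code in counts.columns:
        nonzero = counts[code][counts[code] > 0]
        for label, cut in [("once", 1),
                           ("sporadic", nonzero.median()),
                           ("frequent", nonzero.quantile(0.75))]:
            feats[f"{code}_{label}"] = (counts[code] >= cut).astype(int)
    return pd.DataFrame(feats, index=counts.index)

features = hdps_features(counts)
print(features)
```

Each raw code thus yields up to three binary covariates, which is how a few thousand codes become the tens of thousands of candidate proxies that the prioritization step then ranks.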
Affiliation(s)
- Richard Wyss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Tal El-Hay
- KI Research Institute, Kfar Malal, Israel; IBM Research-Haifa Labs, Haifa, Israel
- Dimitri Bennett
- Global Evidence and Outcomes, Takeda Pharmaceutical Company Ltd., Cambridge, MA, USA
- Andrew R Zullo
- Department of Health Services, Policy, and Practice, Brown University School of Public Health and Center of Innovation in Long-Term Services and Supports, Providence Veterans Affairs Medical Center, Providence, RI, USA
- Grammati Sari
- Real World Evidence Strategy Lead, Visible Analytics Ltd, Oxford, UK
- Xuerong Wen
- Health Outcomes, Pharmacy Practice, College of Pharmacy, University of Rhode Island, Kingston, RI, USA
- Yizhou Ye
- Global Epidemiology, AbbVie Inc., North Chicago, IL, USA
- Hongbo Yuan
- Canadian Agency for Drugs and Technologies in Health, Ottawa, Canada
- Mugdha Gokhale
- Pharmacoepidemiology, Center for Observational and Real-world Evidence, Merck, PA, USA
- Elisabetta Patorno
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Kueiyu Joshua Lin
- Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA; Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
24.
Rodriguez PJ, Veenstra DL, Heagerty PJ, Goss CH, Ramos KJ, Bansal A. A Framework for Using Real-World Data and Health Outcomes Modeling to Evaluate Machine Learning-Based Risk Prediction Models. Value Health 2022; 25:350-358. [PMID: 35227445 PMCID: PMC9311314 DOI: 10.1016/j.jval.2021.11.1360]
Abstract
OBJECTIVES We propose a framework of health outcomes modeling with dynamic decision making and real-world data (RWD) to evaluate the potential utility of novel risk prediction models in clinical practice. Lung transplant (LTx) referral decisions in cystic fibrosis offer a complex case study. METHODS We used longitudinal RWD for a cohort of adults (n = 4247) from the Cystic Fibrosis Foundation Patient Registry to compare outcomes of an LTx referral policy based on machine learning (ML) mortality risk predictions with referral based on (1) forced expiratory volume in 1 second (FEV1) alone and (2) heterogeneous usual care (UC). We then developed a patient-level simulation model to project the number of patients referred for LTx and 5-year survival, accounting for transplant availability, organ allocation policy, and heterogeneous treatment effects. RESULTS Only 12% of patients (95% confidence interval 11%-13%) were referred for LTx over 5 years under UC, compared with 19% (18%-20%) under FEV1 and 20% (19%-22%) under ML. Of 309 patients who died before LTx referral under UC, 31% (27%-36%) would have been referred under FEV1 and 40% (35%-45%) would have been referred under ML. Given a fixed supply of organs, differences in referral time did not lead to significant differences in transplants, pre- or post-transplant deaths, or overall survival in 5 years. CONCLUSIONS Health outcomes modeling with RWD may help to identify novel ML risk prediction models with high potential real-world clinical utility and rule out further investment in models that are unlikely to offer meaningful real-world benefits.
Affiliation(s)
- Patricia J Rodriguez
- The Comparative Health Outcomes, Policy & Economics (CHOICE) Institute, University of Washington, Seattle, WA, USA
- David L Veenstra
- The Comparative Health Outcomes, Policy & Economics (CHOICE) Institute, University of Washington, Seattle, WA, USA
- Christopher H Goss
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, Seattle, WA, USA; Division of Pulmonology, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Kathleen J Ramos
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University of Washington, Seattle, WA, USA
- Aasthaa Bansal
- The Comparative Health Outcomes, Policy & Economics (CHOICE) Institute, University of Washington, Seattle, WA, USA
25.
Shan M, Faries D, Dang A, Zhang X, Cui Z, Sheffield KM. A Simulation-Based Evaluation of Statistical Methods for Hybrid Real-World Control Arms in Clinical Trials. Stat Biosci 2022. [DOI: 10.1007/s12561-022-09334-w]
26.
Stopsack KH, Tyekucheva S, Wang M, Gerke TA, Vaselkiv JB, Penney KL, Kantoff PW, Finn SP, Fiorentino M, Loda M, Lotan TL, Parmigiani G, Mucci LA. Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays. eLife 2021; 10:71265. [PMID: 34939926 PMCID: PMC8849344 DOI: 10.7554/elife.71265]
Abstract
Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. The extent to which batch effects (measurement error in biomarker levels between slides) affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1448 primary prostate cancers. In half of the biomarkers, more than 10% of biomarker variance was attributable to between-TMA differences (range, 1–48%). We implemented different methods to mitigate batch effects (R package batchtma), tested in plasmode simulation. Biomarker levels were more similar between mitigation approaches compared to uncorrected values. For some biomarkers, associations with clinical features changed substantially after addressing batch effects. Batch effects and resulting bias are not an error of an individual study but an inherent feature of TMA-based protein biomarker studies. They always need to be considered during study design and addressed analytically in studies using more than one TMA. To understand cancer, researchers need to know which molecules tumor cells use. These so-called ‘biomarkers’ tag cancer cells as being different from healthy cells, and can be used to predict how aggressive a tumor may be, or how well it might respond to treatment. A popular technique for assessing biomarkers across multiple tumors is to use tissue microarrays. This involves taking samples from different tumors and embedding them in a block of wax, which is then cut into micro-thin slices and stained with reagents that can detect specific biomarkers, such as proteins. Each block contains hundreds of samples, which all experience the same conditions. So, any patterns detected in the staining are likely to represent real variations in the biomarkers present.
Many cancer studies, however, often compare samples from multiple tissue microarrays, which may increase the risk of technical artifacts: for example, staining may look stronger in one batch of tissue samples than another, even though the amount of biomarker present in these different arrays is roughly the same. These ‘batch effects’ could potentially bias the results of the experiment and lead to the identification of misleading patterns. To evaluate how batch effects impact tissue microarray studies, Stopsack et al. examined 14 wax blocks which contained tumor samples from 1,448 men with prostate cancer. This revealed that for some biomarkers, but not others, there were noticeable differences between tissue microarrays that were clearly the result of batch effects. Stopsack et al. then tested six different ways of fixing these discrepancies using statistical methods. All six approaches were successful, even if the arrays included tumors with different characteristics, such as tumors that had been diagnosed more or less recently. This work highlights the importance of considering batch effects when using tissue microarrays to study cancer. Stopsack et al. have used their statistical approaches to develop freely available software which can reduce the biases that sometimes arise from these technical artifacts. This could help researchers avoid misleading patterns in their data and make it easier to detect real variations in the biomarkers present between tumor samples.
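One of the simplest mitigation strategies, mean-centering biomarker levels per TMA, can be sketched in a few lines. The data below are synthetic with invented additive batch shifts; the batchtma R package referenced above implements this and more covariate-aware variants:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical biomarker measurements from three TMAs with additive batch shifts
tma = np.repeat(["TMA1", "TMA2", "TMA3"], 200)
shift = {"TMA1": 0.0, "TMA2": 1.2, "TMA3": -0.8}
marker = rng.normal(5.0, 1.0, size=600) + pd.Series(tma).map(shift).to_numpy()

df = pd.DataFrame({"tma": tma, "marker": marker})

# Mean-centering correction: remove each TMA's mean, restore the grand mean
grand = df["marker"].mean()
df["corrected"] = (df["marker"]
                   - df.groupby("tma")["marker"].transform("mean")
                   + grand)

print(df.groupby("tma")[["marker", "corrected"]].mean().round(2))
```

A caveat the article makes explicit: plain mean-centering assumes the TMAs have comparable case mix; when tumors on different arrays genuinely differ (e.g., by diagnosis era), corrections that adjust for clinical covariates are preferable.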
Affiliation(s)
- Konrad H Stopsack
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Molin Wang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Travis A Gerke
- Department of Cancer Epidemiology, Moffitt Cancer Center
- J Bailey Vaselkiv
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Massimo Loda
- Department of Pathology, Weill Cornell Medical Center
- Lorelei A Mucci
- Department of Epidemiology, Harvard T.H. Chan School of Public Health
27.
Madjar K, Zucknick M, Ickstadt K, Rahnenführer J. Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression. BMC Bioinformatics 2021; 22:586. [PMID: 34895139 PMCID: PMC8665528 DOI: 10.1186/s12859-021-04483-z]
Abstract
BACKGROUND Important objectives in cancer research are the prediction of a patient’s risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g., genes). In clinical practice, this is often challenging because patient cohorts are typically small and can be heterogeneous. In classical subgroup analysis, a separate prediction model is fitted using only the data of one specific cohort. However, this can lead to a loss of power when the sample size is small. Simple pooling of all cohorts, on the other hand, can lead to biased results, especially when the cohorts are heterogeneous. RESULTS We propose a new Bayesian approach suitable for continuous molecular measurements and survival outcomes that identifies the important predictors and provides a separate risk prediction model for each cohort. It allows sharing information between cohorts to increase power by assuming a graph linking predictors within and across different cohorts. The graph helps to identify pathways of functionally related genes and genes that are simultaneously prognostic in different cohorts. CONCLUSIONS Results demonstrate that our proposed approach is superior to the standard approaches in terms of prediction performance and increased power in variable selection when the sample size is small. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s12859-021-04483-z.
Affiliation(s)
- Katrin Madjar
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany
- Manuela Zucknick
- Department of Biostatistics, Oslo Centre for Biostatistics and Epidemiology, University of Oslo, 0317, Oslo, Norway
- Katja Ickstadt
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany
- Jörg Rahnenführer
- Department of Statistics, TU Dortmund University, 44221, Dortmund, Germany
28.
Soeorg H, Sverrisdóttir E, Andersen M, Lund TM, Sessa M. The PHARMACOM-EPI Framework for Integrating Pharmacometric Modelling Into Pharmacoepidemiological Research Using Real-World Data: Application to Assess Death Associated With Valproate. Clin Pharmacol Ther 2021; 111:840-856. [PMID: 34860420 DOI: 10.1002/cpt.2502]
Abstract
In pharmacoepidemiology, it is usually expected that the observed association should be directly or indirectly related to the pharmacological effects of the drug(s) under investigation. Pharmacological effects are, in turn, strongly connected to the pharmacokinetic and pharmacodynamic properties of a drug, which can be characterized and investigated using pharmacometric models. Recently, the use of pharmacometrics has been proposed to provide pharmacological substantiation of pharmacoepidemiological findings derived from real-world data. However, validated frameworks suggesting how to combine these two disciplines for the aforementioned purpose are missing. Therefore, we propose PHARMACOM-EPI, a framework that provides a structured approach to identifying, characterizing, and applying pharmacometric models, with practical details on how to choose software, format datasets, handle missing covariate/dosing data, perform the external evaluation of pharmacometric models in real-world data, and provide pharmacological substantiation of pharmacoepidemiological findings. PHARMACOM-EPI was tested in a proof-of-concept study to pharmacologically substantiate death associated with valproate use in the Danish population aged ≥ 65 years. Pharmacological substantiation of death during a follow-up period of 1 year showed that all individuals who died (n = 169) had individual predictions within the subtherapeutic range, compared with 52.8% of those who did not die (n = 1,084). Of individuals who died, 66.3% (n = 112) had a cause of death possibly related to valproate and 33.7% (n = 57) had a well-defined cause of death unlikely to be related to valproate. This proof-of-concept study showed that PHARMACOM-EPI was able to provide pharmacological substantiation for death associated with valproate use in the study population.
Affiliation(s)
- Hiie Soeorg
- Department of Drug Design and Pharmacology, Pharmacovigilance Research Center, University of Copenhagen, Copenhagen, Denmark; Department of Drug Design and Pharmacology, Pharmacometrics Research Group, University of Copenhagen, Copenhagen, Denmark
- Eva Sverrisdóttir
- Department of Drug Design and Pharmacology, Pharmacometrics Research Group, University of Copenhagen, Copenhagen, Denmark
- Morten Andersen
- Department of Drug Design and Pharmacology, Pharmacovigilance Research Center, University of Copenhagen, Copenhagen, Denmark
- Trine Meldgaard Lund
- Department of Drug Design and Pharmacology, Pharmacometrics Research Group, University of Copenhagen, Copenhagen, Denmark
- Maurizio Sessa
- Department of Drug Design and Pharmacology, Pharmacovigilance Research Center, University of Copenhagen, Copenhagen, Denmark
29.
Naimi AI, Mishler AE, Kennedy EH. Practical Strategies for Mitigating the Unknowable. Am J Epidemiol 2021; 192:kwab202. [PMID: 34268571 DOI: 10.1093/aje/kwab202]
Affiliation(s)
- Alan E Mishler
- Department of Statistics & Data Science, Carnegie Mellon University
- Edward H Kennedy
- Department of Statistics & Data Science, Carnegie Mellon University
30.
Filion KB, Yu YH. Invited Commentary: The Prevalent New-User Design in Pharmacoepidemiology-Challenges and Opportunities. Am J Epidemiol 2021; 190:1349-1352. [PMID: 33350439 DOI: 10.1093/aje/kwaa284]
Abstract
The prevalent new-user design includes a broader study population than the traditional new-user approach that is frequently used in pharmacoepidemiologic research. In an article appearing in this issue (Am J Epidemiol. 2021;190(7):1341-1348), Webster-Clark et al. describe the treatment initiator types included in the prevalent new-user design and contrast the causal questions assessed using a prevalent new-user design versus a new-user design. They further applied a series of simulation studies showing the importance of accounting for treatment history in addition to time since initiation of the comparator in the prevalent new-user design. In this commentary, we put their findings in the broader context with a discussion of the strengths and limitations of the prevalent new-user design and settings where it would be most useful. The prevalent new-user design and new-user design both address unique questions of clinical and public health importance. Real-world evidence generated by pharmacoepidemiologic research is increasingly being used by regulators and other knowledge users to inform their decision-making. Understanding the causal questions addressed by different designs is crucial in this process; the study by Webster-Clark et al. represents an important step in addressing this issue.
31.
Webster-Clark M, Ross RK, Lund JL. Initiator Types and the Causal Question of the Prevalent New-User Design: A Simulation Study. Am J Epidemiol 2021; 190:1341-1348. [PMID: 33350433 DOI: 10.1093/aje/kwaa283]
Abstract
New-user designs restricting to treatment initiators have become the preferred design for studying drug comparative safety and effectiveness using nonexperimental data. This design reduces confounding by indication and healthy-adherer bias at the cost of smaller study sizes and reduced external validity, particularly when assessing a newly approved treatment compared with standard treatment. The prevalent new-user design includes adopters of a new treatment who switched from or previously used standard treatment (i.e., the comparator), expanding study sample size and potentially broadening the study population for inference. Previous work has suggested the use of time-conditional propensity-score matching to mitigate prevalent user bias. In this study, we describe 3 "types" of initiators of a treatment: new users, direct switchers, and delayed switchers. Using these initiator types, we articulate the causal questions answered by the prevalent new-user design and compare them with those answered by the new-user design. We then show, using simulation, how conditioning on time since initiating the comparator (rather than full treatment history) can still result in a biased estimate of the treatment effect. When implemented properly, the prevalent new-user design estimates new and important causal effects distinct from the new-user design.
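The three initiator types can be operationalized from each patient's treatment history. Below is a deliberately simplified sketch in which a single hypothetical grace period separates direct from delayed switchers; real implementations also handle stockpiling, multiple comparators, and censoring:

```python
def initiator_type(days_since_last_a, grace=30):
    """Classify an initiator of study drug B by prior use of comparator A.

    days_since_last_a: days from the most recent A dispensing to B initiation
                       (None if A was never dispensed).
    grace: hypothetical maximum gap, in days, for a 'direct' switch.
    """
    if days_since_last_a is None:
        return "new user"          # no comparator history at all
    if days_since_last_a <= grace:
        return "direct switcher"   # B started right after stopping A
    return "delayed switcher"      # prior A use, then a treatment gap

# Toy patients: (id, days since last comparator dispensing)
for pid, gap in [("p1", None), ("p2", 10), ("p3", 200)]:
    print(pid, initiator_type(gap))
```

The paper's simulation point is that conditioning only on `days_since_last_a` (time since the comparator) while discarding the fuller treatment history can still leave the prevalent new-user estimate biased.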
32.
Abstract
BACKGROUND In perinatal epidemiology, the development of risk prediction models is complicated by parity; how repeat pregnancies influence the predictive accuracy of models that include obstetrical history is unclear. METHODS To assess the influence of repeat pregnancies on the association between predictors and the outcomes, as well as the influence of ignoring the nonindependence between pregnancies, we created four analytical cohorts using the Clinical Practice Research Datalink. The cohorts included (1) first deliveries, (2) a random sample of one delivery per woman, (3) all eligible deliveries per woman, and (4) all eligible deliveries and censoring of follow-up at subsequent pregnancies. Using Plasmode simulations, we varied the predictor-outcome association across cohorts. RESULTS We found minimal differences in the relative contribution of predictors to the overall predictions and the discriminative accuracy of models in the cohort of randomly sampled deliveries versus the all deliveries cohort (C-statistic: 0.62 vs. 0.63; Nagelkerke's R2: 0.03 for both). Accounting for clustering and censoring upon subsequent pregnancies also had negligible influence on model performance. We found important differences in model performance between the models developed in the cohort of first deliveries and the random sample of deliveries. CONCLUSIONS In our study, a model including first deliveries had the best predictive accuracy but was not generalizable to women of varying parities. Moreover, including repeat pregnancies did not improve the predictive accuracy of the models. Multiple models may be needed to improve the transportability and accuracy of prediction models when the outcome of interest is influenced by parity.
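Plasmode simulation, used in this study and in several others in this list, keeps the real covariate structure while generating outcomes with a known, investigator-controlled effect size. A minimal sketch with invented data and coefficients:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5_000

# Stand-in "real" cohort: one predictor, binary outcome
x = rng.normal(size=n)
y = rng.random(n) < 1 / (1 + np.exp(-(-1.0 + 0.4 * x)))

# Plasmode idea: preserve the empirical covariate distribution by resampling
# real rows, but simulate outcomes from a fitted model whose coefficient is
# set by the investigator (here, doubled to vary the association strength)
fit = LogisticRegression().fit(x.reshape(-1, 1), y)
b0, b1 = fit.intercept_[0], fit.coef_[0, 0]

boot = rng.choice(x, size=n, replace=True)   # keep real covariate structure
beta_target = 2 * b1                         # known "true" effect by design
y_sim = rng.random(n) < 1 / (1 + np.exp(-(b0 + beta_target * boot)))

refit = LogisticRegression().fit(boot.reshape(-1, 1), y_sim)
print(f"target beta = {beta_target:.2f}, recovered = {refit.coef_[0, 0]:.2f}")
```

Because the "true" coefficient is known by construction, bias and coverage of any candidate analysis can be evaluated against it, which is exactly how the varied predictor-outcome associations above were produced.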
33.
Weberpals J, Becker T, Davies J, Schmich F, Rüttinger D, Theis FJ, Bauer-Mehren A. Deep Learning-based Propensity Scores for Confounding Control in Comparative Effectiveness Research: A Large-scale, Real-world Data Study. Epidemiology 2021; 32:378-388. [PMID: 33591049 DOI: 10.1097/ede.0000000000001338]
Abstract
BACKGROUND Due to the non-randomized nature of real-world data, prognostic factors need to be balanced, which is often done by propensity scores (PSs). This study aimed to investigate whether autoencoders, which are unsupervised deep learning architectures, might be leveraged to compute PS. METHODS We selected patient-level data of 128,368 first-line treated cancer patients from the Flatiron Health EHR-derived de-identified database. We trained an autoencoder architecture to learn a lower-dimensional patient representation, which we used to compute PS. To compare the performance of an autoencoder-based PS with established methods, we performed a simulation study. We assessed the balancing and adjustment performance using standardized mean differences, root mean square errors (RMSE), percent bias, and confidence interval coverage. To illustrate the application of the autoencoder-based PS, we emulated the PRONOUNCE trial by applying the trial's protocol elements within an observational database setting, comparing two chemotherapy regimens. RESULTS All methods but the manual variable selection approach led to well-balanced cohorts with average standardized mean differences <0.1. LASSO yielded on average the lowest deviation of resulting estimates (RMSE 0.0205), followed by the autoencoder approach (RMSE 0.0248). Altering the hyperparameter setup in sensitivity analysis, the autoencoder approach led to results similar to LASSO (RMSE 0.0203 and 0.0205, respectively). In the case study, all methods provided a similar conclusion, with point estimates clustered around the null (e.g., HR_autoencoder 1.01 [95% confidence interval = 0.80, 1.27] vs. HR_PRONOUNCE 1.07 [0.83, 1.36]). CONCLUSIONS Autoencoder-based PS computation was a feasible approach to control for confounding but did not perform better than some established approaches like LASSO.
Affiliation(s)
- Janick Weberpals
- Data Science, Pharmaceutical Research and Early Development Informatics (pREDi), Roche Innovation Center Munich (RICM), Penzberg, Germany
- Tim Becker
- xValue GmbH, Willich, Germany, on behalf of Data Science IV, Pharmaceutical Research and Early Development Informatics (pREDi), Roche Innovation Center Munich (RICM), Penzberg, Germany
- Jessica Davies
- F. Hoffmann-La Roche Ltd, Welwyn Garden City, United Kingdom
- Fabian Schmich
- Data Science, Pharmaceutical Research and Early Development Informatics (pREDi), Roche Innovation Center Munich (RICM), Penzberg, Germany
- Dominik Rüttinger
- Early Clinical Development Oncology, Pharmaceutical Research and Early Development (pRED), Roche Innovation Center Munich (RICM), Penzberg, Germany
- Fabian J Theis
- Institute of Computational Biology, German Research Center for Environmental Health, Helmholtz Center Munich, Neuherberg, Germany; Department of Mathematics, Technical University of Munich, Garching, Germany
- Anna Bauer-Mehren
- Data Science, Pharmaceutical Research and Early Development Informatics (pREDi), Roche Innovation Center Munich (RICM), Penzberg, Germany
34.
Acton EK, Hennessy S. Use of prescription drug samples in the US and implications for pharmacoepidemiologic research: a systematic search of the literature. Expert Rev Pharmacoecon Outcomes Res 2021; 21:541-551. [PMID: 33730962 DOI: 10.1080/14737167.2021.1905528]
Abstract
INTRODUCTION Free drug samples are not captured in the pharmacy claims databases used in many pharmacoepidemiologic studies, which could lead to misclassification of drug exposure status and thus bias study results. AREAS COVERED We systematically searched the literature in PubMed/MEDLINE, Embase, and Scopus from database inception to August 2020 for studies assessing the magnitude of exposure misclassification in pharmacy claims data associated with uncaptured drug sample utilization. Our review identified five US-based studies with substantially different characteristics, contexts, methods, and results. Taken together, these studies suggest that the risk of sample-related bias may be higher for (1) studies of newly approved, patented brand-only drugs in specific classes and contexts; (2) studies of populations where sample use is common and the unexposed cohort is small; and (3) studies where the outcomes of interest are expected to be early-onset or acute, with non-constant hazards. EXPERT OPINION In light of declining overall trends in sample use, future research on sample-related exposure misclassification should focus on delineating bias across those modern contexts where sample use remains high and optimizing bias quantification methods to create a more standardized approach. Additionally, further assessment is warranted for other sources of misclassified exposure status in claims-based pharmacoepidemiology research.
Affiliation(s)
- Emily K Acton
- Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA; Department of Neurology, Translational Center of Excellence for Neuroepidemiology and Neurology Outcomes Research, University of Pennsylvania School of Medicine, Philadelphia, PA, USA; Center for Pharmacoepidemiology Research and Training, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Sean Hennessy
- Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA; Center for Pharmacoepidemiology Research and Training, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
35
Methods to Account for Uncertainty in Latent Class Assignments When Using Latent Classes as Predictors in Regression Models, with Application to Acculturation Strategy Measures. Epidemiology 2021; 31:194-204. [PMID: 31809338 DOI: 10.1097/ede.0000000000001139] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Abstract
Latent class models have become a popular means of summarizing survey questionnaires and other large sets of categorical variables. Often these classes are of primary interest to better understand complex patterns in data. Increasingly, these latent classes are reified into predictors of other outcomes of interest, treating the most likely class as the true class to which an individual belongs even though there is uncertainty in class membership. This uncertainty can be viewed as a form of measurement error in predictors, leading to bias in the estimates of the regression parameters associated with the latent classes. Despite this fact, there is very limited literature treating latent class predictors as a measurement error problem. Most applications ignore this issue and fit a two-stage model that treats the modal class prediction as truth. Here, we develop two approaches-one likelihood-based, the other Bayesian-to implement a joint model for latent class analysis and outcome prediction. We apply these methods to an analysis of how acculturation behaviors predict depression in South Asian immigrants to the United States. A simulation study gives guidance for when a two-stage model can be safely implemented and when the joint model may be required.
36
Conover MM, Rothman KJ, Stürmer T, Ellis AR, Poole C, Jonsson Funk M. Propensity score trimming mitigates bias due to covariate measurement error in inverse probability of treatment weighted analyses: A plasmode simulation. Stat Med 2021; 40:2101-2112. [PMID: 33622016 DOI: 10.1002/sim.8887] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2019] [Revised: 11/15/2020] [Accepted: 01/08/2021] [Indexed: 11/12/2022]
Abstract
BACKGROUND Inverse probability of treatment weighting (IPTW) may be biased by influential observations, which can occur from misclassification of strong exposure predictors. METHODS We evaluated bias and precision of IPTW estimators in the presence of a misclassified confounder and assessed the effect of propensity score (PS) trimming. We generated 1000 plasmode cohorts of size N = 10 000, sampled with replacement from 6063 NHANES respondents (1999-2014) age 40 to 79 with labs and no statin use. We simulated statin exposure as a function of demographics and CVD risk factors; and outcomes as a function of 10-year CVD risk score and statin exposure (rate ratio [RR] = 0.5). For 5% of the people in selected populations (eg, all patients, exposed, those with outcomes), we randomly misclassified a confounder that strongly predicted exposure. We fit PS models and estimated RRs using IPTW and 1:1 PS matching, with and without asymmetric trimming. RESULTS IPTW bias was substantial when misclassification was differential by outcome (RR range: 0.38-0.63) and otherwise minimal (RR range: 0.51-0.53). However, trimming reduced bias for IPTW, nearly eliminating it at 5% trimming (RR range: 0.49-0.52). In one scenario, when the confounder was misclassified for 5% of those with outcomes (0.3% of cohort), untrimmed IPTW was more biased and less precise (RR = 0.37 [SE(logRR) = 0.21]) than matching (RR = 0.50 [SE(logRR) = 0.13]). After 1% trimming, IPTW estimates were unbiased and more precise (RR = 0.49 [SE(logRR) = 0.12]) than matching (RR = 0.51 [SE(logRR) = 0.14]). CONCLUSIONS Differential misclassification of a strong predictor of exposure resulted in biased and imprecise IPTW estimates. Asymmetric trimming reduced bias, with more precise estimates than matching.
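The asymmetric trimming evaluated here is easy to illustrate on synthetic data. The sketch below is not the paper's plasmode setup: the cohort, the 5% non-differential misclassification, and the Stürmer-style percentile cutoffs are illustrative assumptions. It shows the mechanism, namely that a misclassified strong exposure predictor produces extreme propensity scores, and hence extreme weights, which asymmetric trimming removes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# One strong binary predictor of exposure (C) and one ordinary confounder (X).
C = rng.binomial(1, 0.3, n)
X = rng.normal(size=n)
T = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 3.0 * C + 0.5 * X))))

# Non-differentially misclassify C for 5% of subjects; the analyst only
# ever sees C_obs and fits the PS to it.
flip = rng.random(n) < 0.05
C_obs = np.where(flip, 1 - C, C)

# Logistic-regression PS fitted by gradient ascent on (1, C_obs, X).
D = np.column_stack([np.ones(n), C_obs, X])
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-D @ beta))
    beta += 0.5 * D.T @ (T - p) / n
ps = 1 / (1 + np.exp(-D @ beta))

w = np.where(T == 1, 1 / ps, 1 / (1 - ps))  # IPTW (ATE) weights

# Asymmetric trimming: drop everyone below the 5th percentile of PS among
# the treated or above the 95th percentile of PS among the untreated.
lo = np.percentile(ps[T == 1], 5)
hi = np.percentile(ps[T == 0], 95)
keep = (ps >= lo) & (ps <= hi)

print(w.max(), w[keep].max(), keep.mean())  # the most extreme weights are removed
```

The subjects generating the largest weights sit in exactly the PS tails that the asymmetric rule discards, which is why trimming stabilizes the IPTW estimator in the paper's differential-misclassification scenarios.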
Affiliation(s)
- Mitchell M Conover
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kenneth J Rothman
- RTI Health Solutions, RTI International, Research Triangle Park, North Carolina, USA; Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
- Til Stürmer
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Alan R Ellis
- School of Social Work, North Carolina State University, Raleigh, North Carolina, USA
- Charles Poole
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Michele Jonsson Funk
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
37
Tao R, Mercaldo ND, Haneuse S, Maronge JM, Rathouz PJ, Heagerty PJ, Schildcrout JS. Two-wave two-phase outcome-dependent sampling designs, with applications to longitudinal binary data. Stat Med 2021; 40:1863-1876. [PMID: 33442883 DOI: 10.1002/sim.8876] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Revised: 12/07/2020] [Accepted: 12/25/2020] [Indexed: 12/26/2022]
Abstract
Two-phase outcome-dependent sampling (ODS) designs are useful when resource constraints prohibit expensive exposure ascertainment on all study subjects. One class of ODS designs for longitudinal binary data stratifies subjects into three strata according to those who experience the event at none, some, or all follow-up times. For time-varying covariate effects, exclusively selecting subjects with response variation can yield highly efficient estimates. However, if interest lies in the association of a time-invariant covariate, or the joint associations of time-varying and time-invariant covariates with the outcome, then the optimal design is unknown. Therefore, we propose a class of two-wave two-phase ODS designs for longitudinal binary data. We split the second-phase sample selection into two waves, between which an interim design evaluation analysis is conducted. The interim design evaluation analysis uses first-wave data to conduct a simulation-based search for the optimal second-wave design that will improve the likelihood of study success. Although we focus on longitudinal binary response data, the proposed design is general and can be applied to other response distributions. We believe that the proposed designs can be useful in settings where (1) the expected second-phase sample size is fixed and one must tailor stratum-specific sampling probabilities to maximize estimation efficiency, or (2) relative sampling probabilities are fixed across sampling strata and one must tailor sample size to achieve a desired precision. We describe the class of designs, examine finite sampling operating characteristics, and apply the designs to an exemplar longitudinal cohort study, the Lung Health Study.
Affiliation(s)
- Ran Tao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Nathaniel D Mercaldo
- Departments of Radiology and Neurology, Massachusetts General Hospital and Harvard University, Boston, Massachusetts, USA
- Sebastien Haneuse
- Department of Biostatistics, Harvard University, Boston, Massachusetts, USA
- Jacob M Maronge
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Paul J Rathouz
- Department of Population Health, University of Texas, Austin, Texas, USA
- Patrick J Heagerty
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Jonathan S Schildcrout
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
38
Garrido MM, Lum J, Pizer SD. Vector-based kernel weighting: A simple estimator for improving precision and bias of average treatment effects in multiple treatment settings. Stat Med 2020; 40:1204-1223. [PMID: 33327037 DOI: 10.1002/sim.8836] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2019] [Revised: 10/27/2020] [Accepted: 11/14/2020] [Indexed: 11/08/2022]
Abstract
Treatment effect estimation must account for observed confounding, in which factors affect treatment assignment and outcomes simultaneously. Ignoring observed confounding risks concluding that a helpful treatment is not beneficial or that a treatment is safe when actually harmful. Propensity score matching or weighting adjusts for observed confounding, but the best way to use propensity scores for multiple treatments is unknown. It is unclear when choice of a different weighting or matching strategy leads to divergent inferences. We used Monte Carlo simulations (1000 replications) to examine sensitivity of multivalued treatment inferences to propensity score weighting or matching strategies. We consider five variants of propensity score adjustment: inverse probability of treatment weights, generalized propensity score matching, kernel weights (KW), vector matching, and a new hybrid that is easily implemented-vector-based kernel weighting (VBKW). VBKW matches observations with similar propensity score vectors, assigning greater KW to observations with similar probabilities within a given bandwidth. We varied degree of propensity score model misspecification, sample size, treatment effect heterogeneity, initial covariate imbalance, and sample distribution across treatment groups. We evaluated sensitivity of results to propensity score estimation technique (multinomial logit or multinomial probit). Across simulations, VBKW performed equally or better than the other methods in terms of bias, efficiency, and covariate balance measured via prognostic scores. Our simulations suggest that VBKW is amenable to full automation and is less sensitive to PS model misspecification than other methods used to account for observed confounding in multivalued treatment analyses.
Affiliation(s)
- Melissa M Garrido
- Partnered Evidence-based Policy Resource Center, Boston VA Healthcare System, Boston, Massachusetts, USA; Department of Health Law, Policy and Management, Boston University School of Public Health, Boston, Massachusetts, USA
- Jessica Lum
- Partnered Evidence-based Policy Resource Center, Boston VA Healthcare System, Boston, Massachusetts, USA
- Steven D Pizer
- Partnered Evidence-based Policy Resource Center, Boston VA Healthcare System, Boston, Massachusetts, USA; Department of Health Law, Policy and Management, Boston University School of Public Health, Boston, Massachusetts, USA
39
Abstract
Supplemental Digital Content is available in the text. We use simulated data to examine the consequences of depletion of susceptibles for hazard ratio (HR) estimators based on a propensity score (PS). First, we show that the depletion of susceptibles attenuates marginal HRs toward the null by amounts that increase with the incidence of the outcome, the variance of susceptibility, and the impact of susceptibility on the outcome. If susceptibility is binary then the Bross bias multiplier, originally intended to quantify bias in a risk ratio from a binary confounder, also quantifies the ratio of the instantaneous marginal HR to the conditional HR as susceptibles are depleted differentially. Second, we show how HR estimates that are conditioned on a PS tend to be between the true conditional and marginal HRs, closer to the conditional HR if treatment status is strongly associated with susceptibility and closer to the marginal HR if treatment status is weakly associated with susceptibility. We show that associations of susceptibility with the PS matter to the marginal HR in the treated (ATT) though not to the marginal HR in the entire cohort (ATE). Third, we show how the PS can be updated periodically to reduce depletion-of-susceptibles bias in conditional estimators. Although marginal estimators can hit their ATE or ATT targets consistently without updating the PS, we show how their targets themselves can be misleading as they are attenuated toward the null. Finally, we discuss implications for the interpretation of HRs and their relevance to underlying scientific and clinical questions. See video Abstract: http://links.lww.com/EDE/B727.
40
Bykov K, Wang SV, Hallas J, Pottegård A, Maclure M, Gagne JJ. Bias in case-crossover studies of medications due to persistent use: A simulation study. Pharmacoepidemiol Drug Saf 2020; 29:1079-1085. [PMID: 32548875 DOI: 10.1002/pds.5031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2019] [Revised: 05/01/2020] [Accepted: 05/05/2020] [Indexed: 11/09/2022]
Abstract
PURPOSE The case-crossover design is increasingly used to evaluate the effects of chronic medications; however, as traditionally implemented in pharmacoepidemiology, with referent period preceding the outcome, it may lead to bias in the presence of persistent exposures. We aimed to evaluate the extent and magnitude of bias in case-crossover analyses of chronic and persistent exposures, using simulations. METHODS We simulated cohorts with either 30-day, 180-day, or 2-year exposure duration; and with varying degrees of persistence (10%, 30%, 50%, 70%, or 90% of patients not stopping exposure). We evaluated all scenarios under the null and the scenario with 30% persistence under varying exposure effects (odds ratios of 0.25 to 4.0). Cohorts were analyzed using conditional logistic regression that compared the odds of exposure on the outcome day to the odds of exposure on a referent day 30 days prior to the outcome. We further implemented the case-time-control design to evaluate its ability to adjust for bias from persistence. RESULTS Case-crossover analyses produced unbiased estimates across all scenarios without persistent users, regardless of exposure duration. In scenarios where some patients persisted on treatment, case-crossover analyses resulted in upward bias, which increased with increasing proportion of persistent users, but did not vary substantially in relation to the magnitude of the true effect. Case-time-control analyses removed bias in all scenarios. CONCLUSIONS Investigators should be aware of bias due to treatment persistence in unidirectional case-crossover analyses of chronic medications, which can be remedied with a control group of similarly persistent noncases.
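The core of a unidirectional case-crossover analysis, and the persistence bias this abstract describes, reduces to counting discordant exposure pairs: with one referent period per case, the Mantel-Haenszel odds ratio is simply the ratio of the two discordant counts. A minimal simulation under the null (all parameters illustrative, not the paper's scenarios):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50000

# Null scenario: the outcome day is independent of drug exposure.
start = rng.uniform(0, 365, n)           # day each subject initiates the drug
persistent = rng.random(n) < 0.30        # 30% of users never stop
stop = np.where(persistent, np.inf, start + 30)  # others stop after a 30-day supply
event = rng.uniform(60, 365, n)          # outcome day (room for a prior referent)

def exposed(day):
    return (start <= day) & (day < stop)

e_case = exposed(event)       # exposure status on the outcome day
e_ref = exposed(event - 30)   # exposure status on the referent day, 30 days earlier

# Mantel-Haenszel estimator for 1:1 case-crossover data: only discordant
# person-moments contribute.
n10 = np.sum(e_case & ~e_ref)  # exposed at outcome, unexposed at referent
n01 = np.sum(~e_case & e_ref)  # unexposed at outcome, exposed at referent
or_cc = n10 / n01
print(or_cc)                   # above 1.0 despite the null: bias from persistence
```

The asymmetry arises because everyone who initiates in the 30 days before the event is exposed on the outcome day, while only the non-persistent stoppers can be exposed at the referent and not at the outcome; setting the persistence fraction to zero restores an odds ratio near 1.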
Affiliation(s)
- Katsiaryna Bykov
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Shirley V Wang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Jesper Hallas
- Clinical Pharmacology and Pharmacy, Department of Public Health, University of Southern Denmark, Odense, Denmark; Department of Clinical Biochemistry and Clinical Pharmacology, Odense University Hospital, Odense, Denmark
- Anton Pottegård
- Clinical Pharmacology and Pharmacy, Department of Public Health, University of Southern Denmark, Odense, Denmark
- Malcolm Maclure
- Department of Anesthesiology, Pharmacology and Therapeutics, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Joshua J Gagne
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
41
Ripollone JE, Huybrechts KF, Rothman KJ, Ferguson RE, Franklin JM. Evaluating the Utility of Coarsened Exact Matching for Pharmacoepidemiology Using Real and Simulated Claims Data. Am J Epidemiol 2020; 189:613-622. [PMID: 31845719 DOI: 10.1093/aje/kwz268] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2019] [Revised: 11/14/2019] [Accepted: 11/19/2019] [Indexed: 01/27/2023] Open
Abstract
Coarsened exact matching (CEM) is a matching method proposed as an alternative to other techniques commonly used to control confounding. We compared CEM with 3 techniques that have been used in pharmacoepidemiology: propensity score matching, Mahalanobis distance matching, and fine stratification by propensity score (FS). We evaluated confounding control and effect-estimate precision using insurance claims data from the Pharmaceutical Assistance Contract for the Elderly (1999-2002) and Medicaid Analytic eXtract (2000-2007) databases (United States) and from simulated claims-based cohorts. CEM generally achieved the best covariate balance. However, it often led to high bias and low precision of the risk ratio due to extreme losses in study size and numbers of outcomes (i.e., sparse data bias)-especially with larger covariate sets. FS usually was optimal with respect to bias and precision and always created good covariate balance. Propensity score matching usually performed almost as well as FS, especially with higher index exposure prevalence. The performance of Mahalanobis distance matching was relatively poor. These findings suggest that CEM, although it achieves good covariate balance, might not be optimal for large claims-database studies with rich covariate information; it might be ideal if only a few (<10) strong confounders must be controlled.
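CEM itself is only a few steps: coarsen each covariate into bins, exact-match on the coarsened signature, and reweight the comparison group within retained strata. A hedged sketch on simulated data (the covariates and bin cutpoints are illustrative, not from the paper):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)
n = 5000

age = rng.uniform(40, 80, n)
sev = rng.normal(size=n)  # a disease-severity score
T = rng.binomial(1, 1 / (1 + np.exp(-(0.04 * (age - 60) + 0.8 * sev))))

# Step 1: coarsen each covariate into a few substantively chosen bins.
age_bin = np.digitize(age, [50, 60, 70])
sev_bin = np.digitize(sev, [-1.0, 0.0, 1.0])
keys = list(zip(age_bin, sev_bin))

# Step 2: exact-match on the coarsened signature; only strata containing
# both treated and untreated subjects are retained.
counts = defaultdict(lambda: [0, 0])
for key, t in zip(keys, T):
    counts[key][t] += 1
matched = {k for k, (c0, c1) in counts.items() if c0 > 0 and c1 > 0}

# Step 3: CEM weights. Treated subjects keep weight 1; untreated subjects in
# each retained stratum are reweighted to mirror that stratum's treated count.
w = np.zeros(n)
for i, key in enumerate(keys):
    if key in matched:
        c0, c1 = counts[key]
        w[i] = 1.0 if T[i] == 1 else c1 / c0

# Balance check on age: raw difference vs. CEM-weighted difference in means.
raw_diff = age[T == 1].mean() - age[T == 0].mean()
sel = (w > 0) & (T == 0)
cem_diff = age[(w > 0) & (T == 1)].mean() - np.average(age[sel], weights=w[sel])
print(raw_diff, cem_diff)
```

The sparse-data problem the authors report is visible in this mechanism: with many covariates the number of coarsened strata grows multiplicatively, so ever fewer strata contain both exposure groups and the retained sample shrinks.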
42
Izem R, Liao J, Hu M, Wei Y, Akhtar S, Wernecke M, MaCurdy TE, Kelman J, Graham DJ. Comparison of propensity score methods for pre-specified subgroup analysis with survival data. J Biopharm Stat 2020; 30:734-751. [DOI: 10.1080/10543406.2020.1730868] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Affiliation(s)
- Rima Izem
- Division of Biostatistics and Epidemiology, Children’s National Research Institute, and Department of Pediatrics, George Washington University, Washington, USA
- Mao Hu
- Acumen LLC, Burlingame, CA, USA
- Thomas E. MaCurdy
- Acumen LLC, Burlingame, CA, USA; Department of Economics, Stanford University
- Jeffrey Kelman
- The Center for Medicaid at the Centers for Medicare and Medicaid Services, Baltimore, MD, USA
- David J Graham
- Food and Drug Administration, Center for Drug Evaluations and Research, Silver Spring, MD, USA
43
Shi X, Wellman R, Heagerty PJ, Nelson JC, Cook AJ. Safety surveillance and the estimation of risk in select populations: Flexible methods to control for confounding while targeting marginal comparisons via standardization. Stat Med 2020; 39:369-386. [PMID: 31823406 PMCID: PMC7768802 DOI: 10.1002/sim.8410] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2017] [Revised: 09/04/2019] [Accepted: 09/26/2019] [Indexed: 11/07/2022]
Abstract
We consider the critical problem of pharmacosurveillance for adverse events once a drug or medical product is incorporated into routine clinical care. When making inference on comparative safety using large-scale electronic health records, we often encounter an extremely rare binary adverse outcome with a large number of potential confounders. In this context, it is challenging to offer flexible methods to adjust for high-dimensional confounders, whereas use of the propensity score (PS) can help address this challenge by providing both confounding control and dimension reduction. Among PS methods, regression adjustment using the PS as a covariate in an outcome model has been incompletely studied and potentially misused. Previous studies have suggested that simple linear adjustment may not provide sufficient control of confounding. Moreover, no formal representation of the statistical procedure and associated inference has been detailed. In this paper, we characterize a three-step procedure, which performs flexible regression adjustment of the estimated PS followed by standardization to estimate the causal effect in a select population. We also propose a simple variance estimation method for performing inference. Through a realistic simulation mimicking data from the Food and Drug Administration's Sentinel Initiative comparing the effect of angiotensin-converting enzyme inhibitors and beta blockers on incidence of angioedema, we show that flexible regression on the PS resulted in less bias without loss of efficiency, and can outperform other methods when the PS model is correctly specified. In addition, the direct variance estimation method is a computationally fast and reliable approach for inference.
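The three-step procedure characterized here (fit the PS, regress the outcome flexibly on the estimated PS, then standardize) can be sketched as follows. The simulation, the cubic-polynomial choice of "flexible" adjustment, and the continuous outcome are simplifying assumptions for illustration, not the paper's Sentinel-like setup or its variance estimator:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10000

X1 = rng.normal(size=n)
X2 = rng.binomial(1, 0.4, n)
T = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X1 - 0.5 * X2))))
# Outcome depends nonlinearly on the confounders; the true effect of T is 1.0.
Y = 1.0 * T + np.sin(X1) + X2 * X1 + rng.normal(size=n)

# Step 1: estimate the PS (logistic regression by gradient ascent).
D = np.column_stack([np.ones(n), X1, X2])
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-D @ beta))
    beta += 0.5 * D.T @ (T - p) / n
ps = 1 / (1 + np.exp(-D @ beta))

# Step 2: flexible regression adjustment. Outcome model with treatment, a
# cubic polynomial in the estimated PS, and a treatment-by-PS interaction.
B = np.column_stack([np.ones(n), T, ps, ps**2, ps**3, T * ps])
coef, *_ = np.linalg.lstsq(B, Y, rcond=None)

# Step 3: standardization. Predict for everyone under T=1 and under T=0 and
# average over the chosen target population (here: everyone, i.e. the ATE).
B1 = np.column_stack([np.ones(n), np.ones(n), ps, ps**2, ps**3, ps])
B0 = np.column_stack([np.ones(n), np.zeros(n), ps, ps**2, ps**3, np.zeros(n)])
ate = np.mean(B1 @ coef) - np.mean(B0 @ coef)
print(ate)  # close to the true effect of 1.0
```

Because the PS is a balancing score, conditioning on a sufficiently flexible function of it removes the confounding even though the raw covariates never enter the outcome model; the simple linear-in-PS adjustment the paper warns about corresponds to dropping the polynomial and interaction terms.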
Affiliation(s)
- Xu Shi
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Robert Wellman
- Department of Biostatistics, University of Washington, Seattle, Washington
- Patrick J. Heagerty
- Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Jennifer C. Nelson
- Department of Biostatistics, University of Washington, Seattle, Washington; Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
- Andrea J. Cook
- Department of Biostatistics, University of Washington, Seattle, Washington; Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington
44
Ding LJ, Schlüter HM, Szucs MJ, Ahmad R, Wu Z, Xu W. Comparison of Statistical Tests and Power Analysis for Phosphoproteomics Data. J Proteome Res 2020; 19:572-582. [PMID: 31789524 DOI: 10.1021/acs.jproteome.9b00280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Advances in protein tagging and mass spectrometry have enabled generation of large quantitative proteome and phosphoproteome data sets for identifying differentially expressed targets in case-control studies. Studying the statistical power of candidate tests is critical for designing strategies for effective target identification and control of experimental cost. Here, we develop a simulation framework to generate realistic phospho-peptide data with known changes between cases and controls. Using this framework, we quantify the performance of traditional t-tests, Bayesian tests, and the ranking-by-fold-change test. Bayesian tests, which share variance information among peptides, outperform the traditional t-tests. Although ranking-by-fold-change has power similar to that of the Bayesian tests, its type I error rate cannot be properly controlled without proper permutation analysis; therefore, simply relying on the ranking is likely to introduce false positives. Two-sample Bayesian tests considering dependencies between intensity and variance are superior for data sets with complex variance. While increasing the sample size enhances the statistical tests' performance, balanced numbers of cases and controls are recommended over a design weighted toward one group. Further, higher peptide standard deviations require higher fold changes to achieve the same statistical power. Together, these results highlight the importance of model-informed experimental design and principled statistical analyses when working with large-scale proteomics and phosphoproteomics data.
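The variance-sharing idea behind the Bayesian tests can be illustrated with a fixed-weight shrinkage of per-peptide variances toward their across-peptide mean. This is a deliberate simplification of limma-style empirical-Bayes fitting (the prior weight d0 is fixed by hand rather than estimated), and all data parameters are invented:

```python
import numpy as np

rng = np.random.default_rng(5)
n_pep, n_rep = 2000, 3                 # many peptides, few replicates per group

sigma2 = rng.gamma(2.0, 0.05, n_pep)   # true per-peptide variances
changed = rng.random(n_pep) < 0.05     # 5% of peptides truly differ
delta = np.where(changed, 1.0, 0.0)

ctrl = rng.normal(0.0, np.sqrt(sigma2)[:, None], (n_pep, n_rep))
case = rng.normal(delta[:, None], np.sqrt(sigma2)[:, None], (n_pep, n_rep))

diff = case.mean(axis=1) - ctrl.mean(axis=1)
s2 = (case.var(axis=1, ddof=1) + ctrl.var(axis=1, ddof=1)) / 2  # pooled variance

# Ordinary two-sample t: unstable with only 2 + 2 degrees of freedom, so
# peptides with accidentally tiny variance estimates get huge statistics.
t_ord = diff / np.sqrt(s2 * 2 / n_rep)

# Moderated t: shrink each peptide's variance toward the across-peptide mean
# with a fixed prior weight d0 (pseudo-replicates borrowed from all peptides).
d0, s0_2, d = 4.0, s2.mean(), 2 * (n_rep - 1)
s2_mod = (d0 * s0_2 + d * s2) / (d0 + d)
t_mod = diff / np.sqrt(s2_mod * 2 / n_rep)

# Compare how many truly changed peptides land in the top-k of each ranking.
k = int(changed.sum())
top = lambda t: np.argsort(-np.abs(t))[:k]
rec_ord = changed[top(t_ord)].mean()
rec_mod = changed[top(t_mod)].mean()
print(rec_ord, rec_mod)
```

With three replicates per group, the raw variance estimates are so noisy that null peptides routinely dominate the ordinary-t ranking; shrinkage tames those spurious extremes, which is the mechanism behind the abstract's finding that variance-sharing tests outperform the traditional t-test.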
Affiliation(s)
- Hannah M Schlüter
- Department of Computing, Imperial College London, South Kensington, London SW7 2AZ, United Kingdom
- Matthew J Szucs
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02139, United States
- Rushdy Ahmad
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02139, United States
- Zheyang Wu
- Department of Mathematical Sciences and Program of Bioinformatics and Computational Biology and Program of Data Science, Worcester Polytechnic Institute (WPI), 100 Institute Road, Worcester, Massachusetts 01609, United States
45
Jagdhuber R, Lang M, Stenzl A, Neuhaus J, Rahnenführer J. Cost-Constrained feature selection in binary classification: adaptations for greedy forward selection and genetic algorithms. BMC Bioinformatics 2020; 21:26. [PMID: 31992203 PMCID: PMC6986087 DOI: 10.1186/s12859-020-3361-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2019] [Accepted: 01/10/2020] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND With modern methods in biotechnology, the search for biomarkers has advanced to a challenging statistical task exploring high dimensional data sets. Feature selection is a widely researched preprocessing step to handle huge numbers of biomarker candidates and has special importance for the analysis of biomedical data. Such data sets often include many input features not related to the diagnostic or therapeutic target variable. A less researched, but also relevant aspect for medical applications are costs of different biomarker candidates. These costs are often financial costs, but can also refer to other aspects, for example the decision between a painful biopsy marker and a simple urine test. In this paper, we propose extensions to two feature selection methods to control the total amount of such costs: greedy forward selection and genetic algorithms. In comprehensive simulation studies of binary classification tasks, we compare the predictive performance, the run-time and the detection rate of relevant features for the new proposed methods and five baseline alternatives to handle budget constraints. RESULTS In simulations with a predefined budget constraint, our proposed methods outperform the baseline alternatives, with just minor differences between them. Only in the scenario without an actual budget constraint, our adapted greedy forward selection approach showed a clear drop in performance compared to the other methods. However, introducing a hyperparameter to adapt the benefit-cost trade-off in this method could overcome this weakness. CONCLUSIONS In feature cost scenarios, where a total budget has to be met, common feature selection algorithms are often not suitable to identify well performing subsets for a modelling task. Adaptations of these algorithms such as the ones proposed in this paper can help to tackle this problem.
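Budget-constrained greedy forward selection can be sketched directly. Scoring candidates by fit improvement per unit cost is one reasonable adaptation of the kind described (the exact benefit-cost trade-off used in the paper may differ), and all data, costs, and the budget below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 600, 12

X = rng.normal(size=(n, p))
# Only the first four features carry signal; each feature has its own cost.
beta = np.array([2.0, 1.5, 1.0, 0.8] + [0.0] * (p - 4))
y = X @ beta + rng.normal(size=n)
cost = rng.uniform(1, 5, p)
budget = 8.0

def rss(feats):
    """Residual sum of squares of an OLS fit on the chosen features."""
    if not feats:
        return np.sum((y - y.mean()) ** 2)
    A = np.column_stack([np.ones(n), X[:, feats]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2)

# Greedy forward selection under a total-cost budget: at each step add the
# affordable feature with the largest RSS improvement per unit cost; stop
# when no remaining feature fits in the budget.
selected, spent = [], 0.0
while True:
    best, best_gain = None, 0.0
    base = rss(selected)
    for j in range(p):
        if j in selected or spent + cost[j] > budget:
            continue
        gain = (base - rss(selected + [j])) / cost[j]
        if gain > best_gain:
            best, best_gain = j, gain
    if best is None:
        break
    selected.append(best)
    spent += cost[best]
print(selected, round(spent, 2))
```

Because any added feature reduces RSS at least slightly, a plain greedy rule keeps spending until the budget is exhausted; this is the behavior behind the paper's observation that, without a tunable benefit-cost hyperparameter, the greedy approach degrades when no real budget constraint binds.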
Affiliation(s)
- Rudolf Jagdhuber
- Department of Statistics, TU Dortmund, Vogelpothsweg 87, Dortmund, 44227 Germany
- numares AG, Am BioPark 9, Regensburg, 93053 Germany
- Michel Lang
- Department of Statistics, TU Dortmund, Vogelpothsweg 87, Dortmund, 44227 Germany
- Arnulf Stenzl
- Klinik für Urologie, Universitätsklinikum Tübingen, Hoppe-Seyler-Str. 3, Tübingen, 72076 Germany
- Jochen Neuhaus
- Universitätsklinikum Leipzig AöR, Department für Operative Medizin, Klinik und Poliklinik für Urologie, Liebigstr. 20, Leipzig, 04103 Germany
- Jörg Rahnenführer
- Department of Statistics, TU Dortmund, Vogelpothsweg 87, Dortmund, 44227 Germany
46
Visualization tool of variable selection in bias-variance tradeoff for inverse probability weights. Ann Epidemiol 2020; 41:56-59. [PMID: 31982245 DOI: 10.1016/j.annepidem.2019.12.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Revised: 11/27/2019] [Accepted: 12/10/2019] [Indexed: 11/23/2022]
Abstract
PURPOSE Inverse probability weighted (IPW) estimators are commonly used to adjust for time-fixed or time-varying confounders. However, in high-dimensional settings, including all identified confounders may result in unstable weights, leading to higher variance. We aimed to develop a visualization tool demonstrating the impact of each confounder on the bias and variance of IPW estimates, as well as on the propensity score overlap. METHODS A SAS macro was developed for this visualization tool, and we demonstrate how it can be used to identify potentially problematic confounders of the association of statin use after myocardial infarction with one-year mortality in a plasmode simulation study using a cohort of 39,792 patients from the UK (1998-2012). RESULTS Through the tool's output, we can identify problematic confounders (two instrumental variables) and important confounders by comparing the estimated pseudo-MSE with that of the fully adjusted model and by inspecting the propensity score overlap plot. CONCLUSION Our results suggest that the analytic impact of all confounders should be considered carefully when fitting IPW estimators.
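The bias-variance trade-off this tool visualizes can be reproduced in miniature: adding an instrument-like variable to the PS model leaves the effect estimate roughly unbiased but inflates the variance of the weights. Everything below is a simulated assumption, not the macro's UK cohort or its pseudo-MSE diagnostic:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000

C = rng.normal(size=n)  # true confounder: affects treatment and outcome
Z = rng.normal(size=n)  # instrument-like variable: affects treatment only
T = rng.binomial(1, 1 / (1 + np.exp(-(0.6 * C + 1.5 * Z))))
Y = 2.0 * T + 1.0 * C + rng.normal(size=n)  # true treatment effect = 2.0

def fit_ps(D):
    """Logistic regression by gradient ascent; returns the fitted PS."""
    b = np.zeros(D.shape[1])
    for _ in range(2000):
        p = 1 / (1 + np.exp(-D @ b))
        b += 0.5 * D.T @ (T - p) / n
    return 1 / (1 + np.exp(-D @ b))

results = {}
pt = T.mean()
for label, D in [("C only", np.column_stack([np.ones(n), C])),
                 ("C + instrument", np.column_stack([np.ones(n), C, Z]))]:
    ps = fit_ps(D)
    sw = np.where(T == 1, pt / ps, (1 - pt) / (1 - ps))  # stabilized weights
    est = (np.sum(sw * T * Y) / np.sum(sw * T)
           - np.sum(sw * (1 - T) * Y) / np.sum(sw * (1 - T)))
    results[label] = (est, sw.var())
    print(label, results[label])
```

Plotting the weight variance (or a pseudo-MSE) as each candidate variable enters the PS model is exactly the kind of per-confounder diagnostic the abstract advocates; the instrument adds weight variance without removing any confounding.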
47
Use of Time-Dependent Propensity Scores to Adjust Hazard Ratio Estimates in Cohort Studies with Differential Depletion of Susceptibles. Epidemiology 2020; 31:82-89. [DOI: 10.1097/ede.0000000000001107]
48
Edelmann D, Hummel M, Hielscher T, Saadati M, Benner A. Marginal variable screening for survival endpoints. Biom J 2019; 62:610-626. [DOI: 10.1002/bimj.201800269]
Affiliation(s)
- Dominic Edelmann
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Manuela Hummel
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Thomas Hielscher
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Maral Saadati
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Axel Benner
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
49
Missing Data in Marginal Structural Models: A Plasmode Simulation Study Comparing Multiple Imputation and Inverse Probability Weighting. Med Care 2019; 57:237-243. [PMID: 30664611] [DOI: 10.1097/mlr.0000000000001063]
Abstract
BACKGROUND The use of marginal structural models (MSMs) to adjust for time-varying confounding has increased in epidemiologic studies. However, in the setting of MSMs, recommendations for how best to handle missing data are contradictory. We present a plasmode simulation study to compare the validity and precision of MSM estimates using complete case analysis (CC), multiple imputation (MI), and inverse probability weighting (IPW) in the presence of missing data on time-independent and time-varying confounders. MATERIALS AND METHODS Simulations were based on a cohort substudy using data from the Osteoarthritis Initiative, which estimated the marginal causal effect of intra-articular injection use on yearly changes in knee pain. We simulated 81 scenarios with parameter values varied on missingness mechanism (MCAR, MAR, and MNAR), percentage of missing data (10%, 20%, and 30%), type of confounder affected (time-independent, time-varying, or both), and analytical approach (CC, IPW, and MI). The performance of the CC, IPW, and MI methods was compared using relative bias, mean squared error of the estimates of interest, and empirical power. RESULTS Across scenarios defined by missing data mechanism, extent of missing data, and confounder type, MI generally produced less biased estimates (range: 1.2%-6.7%) with better precision (range: 0.17-0.18) compared with IPW (relative bias: -5.3% to 8.0%; precision: 0.19-0.53). Empirical power was constant across the scenarios using MI. CONCLUSIONS Under simple yet realistically constructed scenarios, MI seems to confer an advantage over IPW in MSM applications.
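As background to the IPW arm of this comparison: MSMs are typically fitted with stabilized inverse probability weights, which replace the plain 1/P(A|L) weights to tame weight variability. A minimal single-time-point sketch in Python on simulated data (hypothetical variables, not the Osteoarthritis Initiative cohort):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 3000
L = rng.normal(size=n)                           # confounder
a = rng.binomial(1, 1 / (1 + np.exp(-1.2 * L)))  # exposure depends on L

# Denominator: conditional P(A = a_i | L_i); numerator: marginal P(A = a_i).
ps = LogisticRegression().fit(L.reshape(-1, 1), a).predict_proba(L.reshape(-1, 1))[:, 1]
num = np.where(a == 1, a.mean(), 1 - a.mean())
den = np.where(a == 1, ps, 1 - ps)
sw = num / den  # for time-varying exposures, MSMs take the product over visits

print(f"mean stabilized weight: {sw.mean():.3f}")  # close to 1 by construction
```

Under IPW handling of missing confounder data, the denominator model can only use complete records, which is one route by which the weights lose precision relative to MI in the scenarios above.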
50
Tian Y, Schuemie MJ, Suchard MA. Evaluating large-scale propensity score performance through real-world and synthetic data experiments. Int J Epidemiol 2019; 47:2005-2014. [PMID: 29939268] [DOI: 10.1093/ije/dyy120]
Abstract
Background Propensity score adjustment is a popular approach for confounding control in observational studies. Reliable frameworks are needed to determine relative propensity score performance in large-scale studies, and to establish optimal propensity score model selection methods. Methods We detail a propensity score evaluation framework that includes synthetic and real-world data experiments. Our synthetic experimental design extends the 'plasmode' framework and simulates survival data under known effect sizes, and our real-world experiments use a set of negative control outcomes with presumed null effect sizes. In reproductions of two published cohort studies, we compare two propensity score estimation methods that contrast in their model selection approach: L1-regularized regression, which conducts a penalized likelihood regression, and the 'high-dimensional propensity score' (hdPS), which employs a univariate covariate screen. We evaluate the methods on a range of outcome-dependent and outcome-independent metrics. Results L1-regularized propensity score methods achieve superior model fit, covariate balance and negative control bias reduction compared with the hdPS. Simulation results are mixed and fluctuate with simulation parameters, revealing a limitation of simulation under the proportional hazards framework. Including regularization with the hdPS reduces commonly reported non-convergence issues but has little effect on propensity score performance. Conclusions L1-regularization incorporates all covariates simultaneously into the propensity score model and offers propensity score performance superior to the hdPS marginal screen.
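The L1-regularized approach this abstract favors can be sketched with off-the-shelf tools. The snippet below is an illustration on simulated covariates, not the authors' pipeline, and the penalty strength `C=0.1` is an arbitrary choice for the demonstration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, p = 2000, 200                        # many candidate covariates
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                          # only five truly drive exposure
a = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))

# The L1 penalty considers all covariates jointly and shrinks irrelevant
# coefficients to exactly zero, unlike a univariate (hdPS-style) screen.
ps_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, a)
n_selected = int((ps_model.coef_ != 0).sum())
print(f"{n_selected} of {p} covariates retained in the propensity model")
```

In practice the penalty strength would be chosen by cross-validation rather than fixed, which is what makes the joint-selection approach scale to the large covariate sets the abstract targets.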
Affiliation(s)
- Yuxi Tian
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA
- Martijn J Schuemie
- Epidemiology Department, Janssen Research and Development LLC, Titusville, NJ, USA
- Marc A Suchard
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA; Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, CA, USA