1
Xiong W, Roy J, Liu H, Hu L. Leveraging machine learning: Covariate-adjusted Bayesian adaptive randomization and subgroup discovery in multi-arm survival trials. Contemp Clin Trials 2024; 142:107547. [PMID: 38688389 DOI: 10.1016/j.cct.2024.107547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/17/2024] [Accepted: 04/25/2024] [Indexed: 05/02/2024]
Abstract
Clinical trials evaluate the safety and efficacy of treatments for specific diseases. Ensuring these studies are well-powered is crucial for identifying superior treatments. With the rise of personalized medicine, treatment efficacy may vary based on biomarker profiles. However, researchers often lack prior knowledge about which biomarkers are linked to varied treatment effects. Fixed or response-adaptive designs may not sufficiently account for heterogeneous patient characteristics, such as genetic diversity, potentially reducing the chance of selecting the optimal treatment for individuals. Recent advances in Bayesian nonparametric modeling pave the way for innovative trial designs that not only maintain robust power but also offer the flexibility to identify subgroups deriving greater benefits from specific treatments. Building on this inspiration, we introduce a Bayesian adaptive design for multi-arm trials focusing on time-to-event endpoints. We introduce a covariate-adjusted response adaptive randomization, updating treatment allocation probabilities grounded on causal effect estimates using a random intercept accelerated failure time BART model. After the trial concludes, we suggest employing a multi-response decision tree to pinpoint subgroups with varying treatment impacts. The performance of our design is then assessed via comprehensive simulations.
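The allocation update described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the design above derives its updates from riAFT-BART causal effect estimates, whereas this sketch assumes posterior draws of an arm-level summary are already available and applies a generic rule (the posterior probability that each arm is best, tempered by an exponent). The function name `allocation_probabilities` and the `power` parameter are hypothetical.

```python
from typing import List, Sequence


def allocation_probabilities(
    draws: Sequence[Sequence[float]], power: float = 0.5
) -> List[float]:
    """Map posterior draws to randomization probabilities.

    draws[s][k] is the s-th posterior draw of a "larger is better" summary
    (for example, expected log survival time) for arm k. These draws are an
    assumed input; how they are produced is outside this sketch.
    """
    n_arms = len(draws[0])
    best_counts = [0] * n_arms
    for draw in draws:
        best_arm = max(range(n_arms), key=lambda k: draw[k])
        best_counts[best_arm] += 1
    # Monte Carlo estimate of P(arm k is best | data).
    p_best = [count / len(draws) for count in best_counts]
    # An exponent below 1 pulls the allocation toward equal randomization.
    tempered = [p ** power for p in p_best]
    total = sum(tempered)
    return [t / total for t in tempered]


# Four toy posterior draws for three arms (illustrative numbers only).
posterior_draws = [
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 2.0],
    [2.0, 1.0, 3.0],
    [1.0, 2.0, 3.0],
]
probs = allocation_probabilities(posterior_draws, power=1.0)
print(probs)  # [0.0, 0.25, 0.75]
```

With `power=1.0` the allocation equals the estimated probability of being best. In practice a floor on each arm's probability is often added so that no arm is starved of patients early on; that refinement is omitted here.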
Affiliation(s)
- Wenxuan Xiong
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA
- Jason Roy
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA
- Hao Liu
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA; Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA

2
Han S, Goh J, Meng F, Leow MKS, Rubin DB. Contrast-specific propensity scores for causal inference with multiple interventions. Stat Methods Med Res 2024; 33:825-837. [PMID: 38499338 DOI: 10.1177/09622802241236952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2024]
Abstract
Existing methods that use propensity scores for heterogeneous treatment effect estimation on non-experimental data do not readily extend to the case of more than two treatment options. In this work, we develop a new propensity score-based method for heterogeneous treatment effect estimation when there are three or more treatment options, and prove that it generates unbiased estimates. We demonstrate our method on a real patient registry of patients in Singapore with diabetic dyslipidemia. On this dataset, our method generates heterogeneous treatment recommendations for patients among three options: Statins, fibrates, and non-pharmacological treatment to control patients' lipid ratios (total cholesterol divided by high-density lipoprotein level). In our numerical study, our proposed method generated more stable estimates compared to a benchmark method based on a multi-dimensional propensity score.
Affiliation(s)
- Shasha Han
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Joel Goh
  - NUS Business School, National University of Singapore, Singapore
  - Global Asia Institute, National University of Singapore, Singapore
  - Institute of Operations Research and Analytics, National University of Singapore, Singapore
- Fanwen Meng
  - Department of Health Services & Outcomes Research, National Healthcare Group, Singapore
- Melvin Khee-Shing Leow
  - Cardiovascular & Metabolic Disorders Programme, Duke-NUS Medical School, Singapore
  - Department of Endocrinology, Tan Tock Seng Hospital, Singapore
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore
- Donald B Rubin
  - Department of Statistics, Harvard University, Cambridge, MA, USA
  - Department of Statistical Science, Fox Business School, Temple University, Philadelphia, PA, USA
  - Yau Mathematical Center, Tsinghua University, Beijing, China

3
Dandl S, Bender A, Hothorn T. Heterogeneous treatment effect estimation for observational data using model-based forests. Stat Methods Med Res 2024; 33:392-413. [PMID: 38332489 PMCID: PMC10981193 DOI: 10.1177/09622802231224628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
The estimation of heterogeneous treatment effects has attracted considerable interest in many disciplines, most prominently in medicine and economics. Contemporary research has so far primarily focused on continuous and binary responses where heterogeneous treatment effects are traditionally estimated by a linear model, which allows the estimation of constant or heterogeneous effects even under certain model misspecifications. More complex models for survival, count, or ordinal outcomes require stricter assumptions to reliably estimate the treatment effect. Most importantly, the noncollapsibility issue necessitates the joint estimation of treatment and prognostic effects. Model-based forests allow simultaneous estimation of covariate-dependent treatment and prognostic effects, but only for randomized trials. In this paper, we propose modifications to model-based forests to address the confounding issue in observational data. In particular, we evaluate an orthogonalization strategy originally proposed by Robinson (1988, Econometrica) in the context of model-based forests targeting heterogeneous treatment effect estimation in generalized linear models and transformation models. We found that this strategy reduces confounding effects in a simulated study with various outcome distributions. We demonstrate the practical aspects of heterogeneous treatment effect estimation for survival and ordinal outcomes by an assessment of the potentially heterogeneous effect of Riluzole on the progress of Amyotrophic Lateral Sclerosis.
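The orthogonalization idea attributed to Robinson (1988) in this abstract can be sketched in its simplest linear form: residualize both the outcome and the treatment on the covariates, then regress one residual on the other. The sketch below is an assumption-laden toy, not the paper's procedure: it uses a single covariate, ordinary least squares for both nuisance fits, and a constant treatment effect, whereas the paper applies the strategy inside model-based forests. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y on a single covariate x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Residuals of y after removing its linear dependence on x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def robinson_effect(
    x: Sequence[float], w: Sequence[float], y: Sequence[float]
) -> float:
    """Residual-on-residual estimate of a constant treatment effect."""
    w_res = residuals(x, w)  # treatment with covariate signal removed
    y_res = residuals(x, y)  # outcome with covariate signal removed
    numerator = sum(a * b for a, b in zip(w_res, y_res))
    denominator = sum(a * a for a in w_res)
    return numerator / denominator


# Toy data: treatment w is related to x (confounding), true effect is 2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
y = [1.0 + 3.0 * xi + 2.0 * wi for xi, wi in zip(x, w)]
tau = robinson_effect(x, w, y)
print(round(tau, 6))
```

Because the toy outcome is exactly linear in `x` and `w`, the residual-on-residual slope recovers the effect of 2 up to floating-point error; with flexible nuisance models and noisy data the estimate would only be approximate.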
Affiliation(s)
- Susanne Dandl
  - Institut für Statistik, Ludwig-Maximilians-Universität München, Munich, Germany
  - Munich Center for Machine Learning (MCML), Germany
- Andreas Bender
  - Institut für Statistik, Ludwig-Maximilians-Universität München, Munich, Germany
  - Munich Center for Machine Learning (MCML), Germany
- Torsten Hothorn
  - Institut für Epidemiologie, Biostatistik und Prävention, Universität Zürich, Zurich, Switzerland

4
Chen X, Harhay MO, Tong G, Li F. A Bayesian machine learning approach for estimating heterogeneous survivor causal effects: Applications to a critical care trial. Ann Appl Stat 2024; 18:350-374. [PMID: 38455841 PMCID: PMC10919396 DOI: 10.1214/23-aoas1792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2024]
Abstract
Assessing heterogeneity in the effects of treatments has become increasingly popular in the field of causal inference and carries important implications for clinical decision-making. While extensive literature exists for studying treatment effect heterogeneity when outcomes are fully observed, there has been limited development in tools for estimating heterogeneous causal effects when patient-centered outcomes are truncated by a terminal event, such as death. Due to mortality occurring during study follow-up, the outcomes of interest are unobservable, undefined, or not fully observed for many participants in which case principal stratification is an appealing framework to draw valid causal conclusions. Motivated by the Acute Respiratory Distress Syndrome Network (ARDSNetwork) ARDS respiratory management (ARMA) trial, we developed a flexible Bayesian machine learning approach to estimate the average causal effect and heterogeneous causal effects among the always-survivors stratum when clinical outcomes are subject to truncation. We adopted Bayesian additive regression trees (BART) to flexibly specify separate mean models for the potential outcomes and latent stratum membership. In the analysis of the ARMA trial, we found that the low tidal volume treatment had an overall benefit for participants sustaining acute lung injuries on the outcome of time to returning home but substantial heterogeneity in treatment effects among the always-survivors, driven most strongly by biologic sex and the alveolar-arterial oxygen gradient at baseline (a physiologic measure of lung function and degree of hypoxemia). These findings illustrate how the proposed methodology could guide the prognostic enrichment of future trials in the field.
Affiliation(s)
- Xinyuan Chen
  - Department of Mathematics and Statistics, Mississippi State University
- Michael O. Harhay
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania
- Guangyu Tong
  - Department of Biostatistics, Yale School of Public Health
- Fan Li
  - Department of Biostatistics, Yale School of Public Health

5
Xue W, Zhang X, Chan KCG, Wong RKW. RKHS-based covariate balancing for survival causal effect estimation. Lifetime Data Analysis 2024; 30:34-58. [PMID: 36821062 DOI: 10.1007/s10985-023-09590-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 01/10/2023] [Indexed: 06/18/2023]
Abstract
Survival causal effect estimation based on right-censored data is of key interest in both survival analysis and causal inference. Propensity score weighting is one of the most popular methods in the literature. However, since it involves the inverse of propensity score estimates, its practical performance may be very unstable, especially when the covariate overlap is limited between treatment and control groups. To address this problem, a covariate balancing method is developed in this paper to estimate the counterfactual survival function. The proposed method is nonparametric and balances covariates in a reproducing kernel Hilbert space (RKHS) via weights that are counterparts of inverse propensity scores. The uniform rate of convergence for the proposed estimator is shown to be the same as that for the classical Kaplan-Meier estimator. The appealing practical performance of the proposed method is demonstrated by a simulation study as well as two real data applications to study the causal effect of smoking on survival time of stroke patients and that of endotoxin on survival time for female patients with lung cancer respectively.
Affiliation(s)
- Wu Xue
  - Meta Platforms Inc., Menlo Park, CA, 94025, USA
- Xiaoke Zhang
  - Department of Statistics, George Washington University, Washington, DC, 20052, USA
- Raymond K W Wong
  - Department of Statistics, Texas A&M University, College Station, TX, 77843, USA

6
Hu L. A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection. Biom J 2024; 66:e2200178. [PMID: 38072661 PMCID: PMC10953775 DOI: 10.1002/bimj.202200178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 07/31/2023] [Accepted: 08/11/2023] [Indexed: 01/30/2024]
Abstract
We recently developed a new method random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART) to draw causal inferences about population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of population average treatment effect. In this work, we exposit how riAFT-BART can be used to solve two important statistical questions with clustered survival data: estimating the treatment effect heterogeneity and variable selection. Leveraging the likelihood-based machine learning, we describe a way in which we can draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and use the drawn posterior samples to perform an exploratory treatment effect heterogeneity analysis to identify subpopulations who may experience differential treatment effects than population average effects. There is sparse literature on methods for variable selection among clustered and censored survival data, particularly ones using flexible modeling techniques. We propose a permutation-based approach using the predictor's variable inclusion proportion supplied by the riAFT-BART model for variable selection. To address the missing data issue frequently encountered in health databases, we propose a strategy to combine bootstrap imputation and riAFT-BART for variable selection among incomplete clustered survival data. We conduct an expansive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that our proposed methods perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study of predictors for in-hospital mortality among severe COVID-19 patients and estimating the heterogeneous treatment effects of three COVID-specific medications. 
The methods developed in this work are readily available in the R package riAFTBART.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey 08854

7
Yang S, Zhou R, Li F, Thomas LE. Propensity score weighting methods for causal subgroup analysis with time-to-event outcomes. Stat Methods Med Res 2023; 32:1919-1935. [PMID: 37559475 DOI: 10.1177/09622802231188517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2023]
Abstract
Evaluating causal effects of an intervention in pre-specified subgroups is a standard goal in comparative effectiveness research. Despite recent advancements in causal subgroup analysis, research on time-to-event outcomes has been lacking. This article investigates the propensity score weighting method for causal subgroup survival analysis. We introduce two causal estimands, the subgroup marginal hazard ratio and subgroup restricted average causal effect, and provide corresponding propensity score weighting estimators. We analytically established that the bias of subgroup-restricted average causal effect is determined by subgroup covariate balance. Using extensive simulations, we compare the performance of various combinations of propensity score models (logistic regression, random forests, least absolute shrinkage and selection operator, and generalized boosted models) and weighting schemes (inverse probability weighting, and overlap weighting) for estimating the causal estimands. We find that the logistic model with subgroup-covariate interactions selected by least absolute shrinkage and selection operator consistently outperforms other propensity score models. Also, overlap weighting generally outperforms inverse probability weighting in terms of balance, bias and variance, and the advantage is particularly pronounced in small subgroups and/or in the presence of poor overlap. We applied the methods to the observational Comparing Options for Management: PAtient-centered REsults for Uterine Fibroids study to evaluate the causal effects of myomectomy versus hysterectomy on the time to disease recurrence in a number of pre-specified subgroups of patients with uterine fibroids.
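The two weighting schemes compared in this abstract differ only in how a unit's weight is computed from its propensity score. The minimal sketch below uses the standard definitions (inverse probability weights of 1/e for treated and 1/(1-e) for control units; overlap weights of 1-e for treated and e for control units). The function name `balancing_weight` is hypothetical, and the propensity score is assumed to have been estimated elsewhere.

```python
def balancing_weight(treated: bool, propensity: float, scheme: str) -> float:
    """Weight for one unit given its estimated propensity score.

    propensity is P(treated | covariates) and must lie strictly in (0, 1).
    scheme is "ipw" (inverse probability weighting) or "overlap".
    """
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity must be strictly between 0 and 1")
    if scheme == "ipw":
        return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)
    if scheme == "overlap":
        return 1.0 - propensity if treated else propensity
    raise ValueError(f"unknown scheme: {scheme}")


# A control unit that looked almost certain to be treated (poor overlap).
ipw_weight = balancing_weight(treated=False, propensity=0.99, scheme="ipw")
overlap_weight = balancing_weight(treated=False, propensity=0.99, scheme="overlap")
print(ipw_weight, overlap_weight)
```

For this unit the inverse probability weight is roughly 100, while the overlap weight stays below 1. Bounded weights of this kind are one reason overlap weighting tends to be more stable when overlap is poor, as the abstract reports.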
Affiliation(s)
- Siyun Yang
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Ruiwen Zhou
  - Division of Biostatistics, Washington University in St. Louis, Missouri, USA
- Fan Li
  - Department of Statistical Science, Duke University, Durham, NC, USA
- Laine E Thomas
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
  - Duke Clinical Research Institute, Durham, NC, USA

8
Galetzka W, Kowall B, Jusi C, Huessler EM, Stang A. Distance-Metric Learning for Personalized Survival Analysis. Entropy (Basel, Switzerland) 2023; 25:1404. [PMID: 37895525 PMCID: PMC10606222 DOI: 10.3390/e25101404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 09/21/2023] [Accepted: 09/26/2023] [Indexed: 10/29/2023]
Abstract
Personalized time-to-event or survival prediction with right-censored outcomes is a pervasive challenge in healthcare research. Although various supervised machine learning methods, such as random survival forests or neural networks, have been adapted to handle such outcomes effectively, they do not provide explanations for their predictions, lacking interpretability. In this paper, an alternative method for survival prediction by weighted nearest neighbors is proposed. Fitting this model to data entails optimizing the weights by learning a metric. An individual prediction of this method can be explained by providing the user with the most influential data points for this prediction, i.e., the closest data points and their weights. The strengths and weaknesses in terms of predictive performance are highlighted on simulated data and an application of the method on two different real-world datasets of breast cancer patients shows its competitiveness with established methods.
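One way to picture survival prediction by weighted nearest neighbors is a Kaplan-Meier estimator in which each training point contributes according to its closeness to the query point. The sketch below is an illustration under stated assumptions, not the paper's estimator: it assumes an exponential kernel on a per-feature scaled Euclidean distance, and it takes the scaling vector `scale` as given rather than learning it. The names `neighbor_weights` and `weighted_kaplan_meier` are hypothetical.

```python
import math
from typing import List, Sequence


def neighbor_weights(
    query: Sequence[float], points: Sequence[Sequence[float]], scale: Sequence[float]
) -> List[float]:
    """Kernel weights from a per-feature scaled Euclidean distance."""
    weights = []
    for point in points:
        dist = math.sqrt(
            sum((s * (q - v)) ** 2 for s, q, v in zip(scale, query, point))
        )
        weights.append(math.exp(-dist))
    return weights


def weighted_kaplan_meier(
    times: Sequence[float], events: Sequence[int], weights: Sequence[float], t: float
) -> float:
    """Weighted Kaplan-Meier estimate of the survival probability S(t)."""
    survival = 1.0
    # Distinct observed event times up to t, in increasing order.
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= u)
        failed = sum(
            w for ti, ei, w in zip(times, events, weights) if ti == u and ei == 1
        )
        survival *= 1.0 - failed / at_risk
    return survival


times = [1.0, 2.0, 3.0]
events = [1, 0, 1]  # 1 = event observed, 0 = right-censored
points = [[0.0], [1.0], [5.0]]

# Equal weights give the ordinary Kaplan-Meier estimate.
km_equal = weighted_kaplan_meier(times, events, [1.0, 1.0, 1.0], t=2.0)

# Weights centred on a query at x = 0 emphasise the first (early-event) point.
w = neighbor_weights([0.0], points, scale=[1.0])
km_local = weighted_kaplan_meier(times, events, w, t=2.0)
print(km_equal, km_local)
```

The weights double as an explanation: the training points with the largest entries in `w` are the ones driving the prediction, which is the interpretability argument made in the abstract.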
Affiliation(s)
- Wolfgang Galetzka
  - Institute of Medical Informatics, Biometrics and Epidemiology, University Hospital Essen, 45130 Essen, Germany
- Bernd Kowall
  - Institute of Medical Informatics, Biometrics and Epidemiology, University Hospital Essen, 45130 Essen, Germany
- Cynthia Jusi
  - Nisso Chemical Europe GmbH, 40212 Düsseldorf, Germany
- Eva-Maria Huessler
  - Institute of Medical Informatics, Biometrics and Epidemiology, University Hospital Essen, 45130 Essen, Germany
- Andreas Stang
  - Institute of Medical Informatics, Biometrics and Epidemiology, University Hospital Essen, 45130 Essen, Germany

9
Jawadekar N, Kezios K, Odden MC, Stingone JA, Calonico S, Rudolph K, Zeki Al Hazzouri A. Practical Guide to Honest Causal Forests for Identifying Heterogeneous Treatment Effects. Am J Epidemiol 2023; 192:1155-1165. [PMID: 36843042 DOI: 10.1093/aje/kwad043] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/01/2021] [Revised: 12/05/2022] [Accepted: 02/20/2023] [Indexed: 02/28/2023]
Abstract
"Heterogeneous treatment effects" is a term which refers to conditional average treatment effects (i.e., CATEs) that vary across population subgroups. Epidemiologists are often interested in estimating such effects because they can help detect populations that may particularly benefit from or be harmed by a treatment. However, standard regression approaches for estimating heterogeneous effects are limited by preexisting hypotheses, test a single effect modifier at a time, and are subject to the multiple-comparisons problem. In this article, we aim to offer a practical guide to honest causal forests, an ensemble tree-based learning method which can discover as well as estimate heterogeneous treatment effects using a data-driven approach. We discuss the fundamentals of tree-based methods, describe how honest causal forests can identify and estimate heterogeneous effects, and demonstrate an implementation of this method using simulated data. Our implementation highlights the steps required to simulate data sets, build honest causal forests, and assess model performance across a variety of simulation scenarios. Overall, this paper is intended for epidemiologists and other population health researchers who lack an extensive background in machine learning yet are interested in utilizing an emerging method for identifying and estimating heterogeneous treatment effects.

10
Li F, Ding P, Mealli F. Bayesian causal inference: a critical review. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2023; 381:20220153. [PMID: 36970828 DOI: 10.1098/rsta.2022.0153] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/28/2022] [Accepted: 10/23/2022] [Indexed: 06/18/2023]
Abstract
This paper provides a critical review of the Bayesian perspective of causal inference based on the potential outcomes framework. We review the causal estimands, assignment mechanism, the general structure of Bayesian inference of causal effects and sensitivity analysis. We highlight issues that are unique to Bayesian causal inference, including the role of the propensity score, the definition of identifiability, the choice of priors in both low- and high-dimensional regimes. We point out the central role of covariate overlap and more generally the design stage in Bayesian causal inference. We extend the discussion to two complex assignment mechanisms: instrumental variable and time-varying treatments. We identify the strengths and weaknesses of the Bayesian approach to causal inference. Throughout, we illustrate the key concepts via examples. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
Affiliation(s)
- Fan Li
  - Duke University, Durham, NC, USA
- Peng Ding
  - University of California, Berkeley, CA, USA

11
Blette BS, Granholm A, Li F, Shankar-Hari M, Lange T, Munch MW, Møller MH, Perner A, Harhay MO. Causal Bayesian machine learning to assess treatment effect heterogeneity by dexamethasone dose for patients with COVID-19 and severe hypoxemia. Sci Rep 2023; 13:6570. [PMID: 37085591 PMCID: PMC10120498 DOI: 10.1038/s41598-023-33425-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/03/2023] [Accepted: 04/12/2023] [Indexed: 04/23/2023]
Abstract
The currently recommended dose of dexamethasone for patients with severe or critical COVID-19 is 6 mg per day (mg/d) regardless of patient features and variation. However, patients with severe or critical COVID-19 are heterogenous in many ways (e.g., age, weight, comorbidities, disease severity, and immune features). Thus, it is conceivable that a standardized dosing protocol may not be optimal. We assessed treatment effect heterogeneity in the COVID STEROID 2 trial, which compared 6 mg/d to 12 mg/d, using a causal inference framework with Bayesian Additive Regression Trees, a flexible modeling method that detects interactive effects and nonlinear relationships among multiple patient characteristics simultaneously. We found that 12 mg/d of dexamethasone, relative to 6 mg/d, was probably associated with better long-term outcomes (days alive without life support and mortality after 90 days) among the entire trial population (i.e., no signals of harm), and probably more beneficial among those without diabetes mellitus, that were older, were not using IL-6 inhibitors at baseline, weighed less, or had higher level respiratory support at baseline. This adds more evidence supporting the use of 12 mg/d in practice for most patients not receiving other immunosuppressants and that additional study of dosing could potentially optimize clinical outcomes.
Affiliation(s)
- Bryan S Blette
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anders Granholm
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Fan Li
  - Department of Biostatistics, Yale University School of Public Health, New Haven, CT, USA
  - Center for Methods in Implementation and Prevention Science, Yale University School of Public Health, New Haven, CT, USA
- Manu Shankar-Hari
  - Centre for Inflammation Research, University of Edinburgh, Edinburgh, UK
- Theis Lange
  - Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Marie Warrer Munch
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Morten Hylander Møller
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Anders Perner
  - Department of Intensive Care, Rigshospitalet-Copenhagen University Hospital, Copenhagen, Denmark
  - Collaboration for Research in Intensive Care, Copenhagen, Denmark
- Michael O Harhay
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Clinical Trials Methods and Outcomes Lab, Palliative and Advanced Illness Research (PAIR) Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Division of Pulmonary and Critical Care, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, 304 Blockley Hall, 423 Guardian Drive, Philadelphia, PA, 19104-6021, USA

12
Hu L, Li L. Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series. International Journal of Environmental Research and Public Health 2022; 19:16080. [PMID: 36498153 PMCID: PMC9736500 DOI: 10.3390/ijerph192316080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 11/22/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
Tree-based machine learning methods have gained traction in the statistical and data science fields. They have been shown to provide better solutions to various research questions than traditional analysis approaches. To encourage the uptake of tree-based methods in health research, we review the methodological fundamentals of three key tree-based machine learning methods: random forests, extreme gradient boosting and Bayesian additive regression trees. We further conduct a series of case studies to illustrate how these methods can be properly used to solve important health research problems in four domains: variable selection, estimation of causal effects, propensity score weighting and missing data. We exposit that the central idea of using ensemble tree methods for these research questions is accurate prediction via flexible modeling. We applied ensemble trees methods to select important predictors for the presence of postoperative respiratory complication among early stage lung cancer patients with resectable tumors. We then demonstrated how to use these methods to estimate the causal effects of popular surgical approaches on postoperative respiratory complications among lung cancer patients. Using the same data, we further implemented the methods to accurately estimate the inverse probability weights for a propensity score analysis of the comparative effectiveness of the surgical approaches. Finally, we demonstrated how random forests can be used to impute missing data using the Study of Women's Health Across the Nation data set. To conclude, the tree-based methods are a flexible tool and should be properly used for health investigations.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ 08854, USA
- Lihua Li
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA

13
Hu L, Ji J, Liu H, Ennis R. A Flexible Approach for Assessing Heterogeneity of Causal Treatment Effects on Patient Survival Using Large Datasets with Clustered Observations. International Journal of Environmental Research and Public Health 2022; 19:14903. [PMID: 36429621 PMCID: PMC9690785 DOI: 10.3390/ijerph192214903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 11/08/2022] [Accepted: 11/09/2022] [Indexed: 06/16/2023]
Abstract
Personalized medicine requires an understanding of treatment effect heterogeneity. Evolving toward causal evidence for scenarios not studied in randomized trials necessitates a methodology using real-world evidence. Herein, we demonstrate a methodology that generates causal effects, assesses the heterogeneity of the effects and adjusts for the clustered nature of the data. This study uses a state-of-the-art machine learning survival model, riAFT-BART, to draw causal inferences about individual survival treatment effects, while accounting for the variability in institutional effects; further, it proposes a data-driven approach to agnostically (as opposed to a priori hypotheses) ascertain which subgroups exhibit an enhanced treatment effect from which intervention, relative to global evidence-average treatment effects measured at the population level. Comprehensive simulations show the advantages of the proposed method in terms of bias, efficiency and precision in estimating heterogeneous causal effects. The empirically validated method was then used to analyze the National Cancer Database.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
- Jiayi Ji
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
- Hao Liu
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
  - Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 07102, USA
- Ronald Ennis
  - Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 07102, USA
  - Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ 07102, USA

14
Hu L, Ji J, Ennis RD, Hogan JW. A flexible approach for causal inference with multiple treatments and clustered survival outcomes. Stat Med 2022; 41:4982-4999. [PMID: 35948011 PMCID: PMC9588538 DOI: 10.1002/sim.9548] [Received: 02/16/2022] [Revised: 07/20/2022] [Accepted: 07/22/2022] [Indexed: 01/07/2023]
Abstract
When drawing causal inferences about the effects of multiple treatments on clustered survival outcomes using observational data, we need to address implications of the multilevel data structure, multiple treatments, censoring, and unmeasured confounding for causal analyses. Few off-the-shelf causal inference tools are available to simultaneously tackle these issues. We develop a flexible random-intercept accelerated failure time model, in which we use Bayesian additive regression trees to capture arbitrarily complex relationships between censored survival times and pre-treatment covariates and use the random intercepts to capture cluster-specific main effects. We develop an efficient Markov chain Monte Carlo algorithm to draw posterior inferences about the population survival effects of multiple treatments and examine the variability in cluster-level effects. We further propose an interpretable sensitivity analysis approach to evaluate the sensitivity of drawn causal inferences about treatment effect to the potential magnitude of departure from the causal assumption of no unmeasured confounding. Expansive simulations empirically validate and demonstrate good practical operating characteristics of our proposed methods. Applying the proposed methods to a dataset on older high-risk localized prostate cancer patients drawn from the National Cancer Database, we evaluate the comparative effects of three treatment approaches on patient survival, and assess the ramifications of potential unmeasured confounding. The methods developed in this work are readily available in the R package riAFTBART.
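As a reading aid, the random-intercept accelerated failure time model described in this abstract can be written schematically as below. The notation ($T_{ik}$ for the survival time of individual $i$ in cluster $k$, $A_{ik}$ for treatment, $X_{ik}$ for pre-treatment covariates, $b_k$ for the cluster intercept) is assumed here for illustration, and the normal distributions are a common modelling choice rather than something the abstract states.

```latex
\log T_{ik} = f(A_{ik}, X_{ik}) + b_k + \varepsilon_{ik},
\qquad b_k \sim N(0, \tau^2),
\qquad \varepsilon_{ik} \sim N(0, \sigma^2)
```

In this sketch $f$ stands for the sum-of-trees (BART) component that captures the covariate and treatment relationships, and $b_k$ carries the cluster-specific main effect.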
Affiliation(s)
- Liangyuan Hu
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA
- Jiayi Ji
- Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA
- Ronald D. Ennis
- Department of Radiation Oncology, Cancer Institute of New Jersey of Rutgers University, New Brunswick, New Jersey, USA
- Joseph W. Hogan
- Department of Biostatistics, Brown University, Providence, Rhode Island, USA
|
15
|
Wang Y, Li F, Blaha O, Meng C, Esserman D. Design and analysis of partially randomized preference trials with propensity score stratification. Stat Methods Med Res 2022; 31:1515-1537. [PMID: 35469503 PMCID: PMC10530658 DOI: 10.1177/09622802221095673] [Indexed: 11/16/2022]
Abstract
While the two-stage randomized design allows us to unbiasedly evaluate the impact of patients' treatment preference on the outcome of interest, it may not always be practical to implement in clinical practice; patients with a strong preference may not be willing to be randomized. The more pragmatic, partially randomized preference design (PRPD) allows patients who are unwilling to be randomized, but willing to state their preference, to receive their preferred treatment in lieu of the first-stage randomization in the two-stage design, at the cost of potentially introducing bias in estimating the effects of interest. In this article, we consider the application of propensity score stratification (PSS) in a PRPD to recreate a conditional first-stage randomization based on observed covariates, enabling the estimation and inference of the overall treatment, selection and preference effects with minimum bias. We additionally derive a set of closed-form sample size formulas for detecting all three effects of interest in a PSS-PRPD. Simulation studies demonstrate the bias reduction properties of the PSS-PRPD, and validate the accuracy of the proposed sample size formulas. Our results show that 5 to 10 propensity score strata may be needed to correct for biases in effect estimates, and the exact number of strata needed to achieve the best match between the empirical power and formula prediction may depend on the degree of effect heterogeneity. Finally, we demonstrate our proposed formulas by estimating the required sample sizes to detect treatment, selection and preference effects in the context of the Harapan Study.
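The stratification idea in this abstract can be illustrated with a generic propensity score stratification estimate of a difference in means. This is a minimal sketch, not the estimator or sample size formulas of the cited paper: the function name `stratified_effect` is hypothetical, propensity scores are assumed to be already estimated, and strata are formed as equal-sized groups after sorting by score.

```python
from statistics import mean


def stratified_effect(ps, treated, outcome, n_strata=5):
    """Propensity-score-stratified difference in mean outcomes.

    ps, treated and outcome are equal-length lists; treated holds 0/1 flags.
    Strata are equal-sized groups formed after sorting by propensity score.
    Strata that lack one of the two arms are skipped (a simplification).
    """
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    total_weight = 0
    weighted_sum = 0.0
    for s in range(n_strata):
        # Indices of the units that fall into stratum s.
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        y1 = [outcome[i] for i in idx if treated[i] == 1]
        y0 = [outcome[i] for i in idx if treated[i] == 0]
        if not y1 or not y0:
            continue
        # Weight each within-stratum contrast by the stratum size.
        weighted_sum += len(idx) * (mean(y1) - mean(y0))
        total_weight += len(idx)
    if total_weight == 0:
        raise ValueError("no stratum contains both arms")
    return weighted_sum / total_weight
```

The default of five strata sits at the lower end of the 5 to 10 strata the abstract reports may be needed; `n_strata` can be raised to explore that range.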
Affiliation(s)
- Yumin Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Fan Li
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Ondrej Blaha
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Can Meng
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Denise Esserman
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
|
16
|
Health status balancing weights for estimation of health care disparities. Health Serv Outcomes Res Methodol 2022. [DOI: 10.1007/s10742-022-00287-7] [Indexed: 11/26/2022]
|
17
|
Hu L, Zou J, Gu C, Ji J, Lopez M, Kale M. A flexible sensitivity analysis approach for unmeasured confounding with multiple treatments and a binary outcome with application to SEER-Medicare lung cancer data. Ann Appl Stat 2022; 16:1014-1037. [PMID: 36644682 PMCID: PMC9835106 DOI: 10.1214/21-aoas1530] [Indexed: 02/01/2023]
Abstract
In the absence of a randomized experiment, a key assumption for drawing causal inference about treatment effects is the ignorable treatment assignment. Violations of the ignorability assumption may lead to biased treatment effect estimates. Sensitivity analysis helps gauge how causal conclusions will be altered in response to the potential magnitude of departure from the ignorability assumption. However, sensitivity analysis approaches for unmeasured confounding in the context of multiple treatments and binary outcomes are scarce. We propose a flexible Monte Carlo sensitivity analysis approach for causal inference in such settings. We first derive the general form of the bias introduced by unmeasured confounding, with emphasis on theoretical properties uniquely relevant to multiple treatments. We then propose methods to encode the impact of unmeasured confounding on potential outcomes and adjust the estimates of causal effects in which the presumed unmeasured confounding is removed. Our proposed methods embed nested multiple imputation within the Bayesian framework, which allow for seamless integration of the uncertainty about the values of the sensitivity parameters and the sampling variability, as well as use of the Bayesian Additive Regression Trees for modeling flexibility. Expansive simulations validate our methods and gain insight into sensitivity analysis with multiple treatments. We use the SEER-Medicare data to demonstrate sensitivity analysis using three treatments for early stage non-small cell lung cancer. The methods developed in this work are readily available in the R package SAMTx.
Affiliation(s)
- Liangyuan Hu
- Department of Biostatistics and Epidemiology, Rutgers University
- Jungang Zou
- Department of Biostatistics, Columbia University
- Jiayi Ji
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai
- Minal Kale
- Department of Medicine, Icahn School of Medicine at Mount Sinai
|
18
|
Lin J, Trinquart L. Doubly-robust estimator of the difference in restricted mean times lost with competing risks data. Stat Methods Med Res 2022; 31:1881-1903. [PMID: 35607287 DOI: 10.1177/09622802221102625] [Indexed: 11/16/2022]
Abstract
In the context of competing risks data, the subdistribution hazard ratio has limited clinical interpretability to measure treatment effects. An alternative is the difference in restricted mean times lost (RMTL), which gives the mean time lost to a specific cause of failure between treatment groups. In non-randomized studies, the average causal effect is conventionally used for decision-making about treatment and public health policies. We show how the difference in RMTL can be estimated by contrasting the integrated cumulative incidence functions from a Fine-Gray model. We also show how the difference in RMTL can be estimated by using inverse probability of treatment weighting and contrasts between weighted non-parametric estimators of the area below the cumulative incidence. We use pseudo-observation approaches to estimate both component models and we integrate them into a doubly-robust estimator. We demonstrate that this estimator is consistent when either component is correctly specified. We conduct simulation studies to assess its finite-sample performance and demonstrate its inherited consistency property from its component models. We also examine the performance of this estimator under varying degrees of covariate overlap and under a model misspecification of nonlinearity. We apply the proposed method to assess biomarker-treatment interaction in subpopulations of the POPLAR and OAK randomized controlled trials of second-line therapy for advanced non-small-cell lung cancer.
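To make the quantity concrete, the restricted mean time lost up to a horizon can be computed as the area under a cumulative incidence function (CIF). The sketch below assumes the CIF has already been estimated and is supplied as a right-continuous step function; the function names and the horizon argument `tau` are hypothetical, and this is not the doubly-robust estimator proposed in the cited paper.

```python
def restricted_mean_time_lost(times, cif, tau):
    """Area under a right-continuous step CIF on the interval [0, tau].

    times: increasing jump times; cif[j] is the CIF value from times[j] onward.
    The CIF is taken to be 0 before the first jump.
    """
    area = 0.0
    for j, (t, value) in enumerate(zip(times, cif)):
        if t >= tau:
            break
        # The step lasts until the next jump or the horizon, whichever is first.
        next_t = times[j + 1] if j + 1 < len(times) else tau
        area += value * (min(next_t, tau) - t)
    return area


def rmtl_difference(times1, cif1, times0, cif0, tau):
    """Difference in restricted mean time lost between two groups."""
    return (restricted_mean_time_lost(times1, cif1, tau)
            - restricted_mean_time_lost(times0, cif0, tau))
```

A positive difference means the first group loses more time, on average, to the cause of interest before `tau`.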
Affiliation(s)
- Jingyi Lin
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Ludovic Trinquart
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA; Tufts Clinical and Translational Science Institute, Tufts University, Boston, MA, USA
|
19
|
Lin JYJ, Hu L, Huang C, Jiayi J, Lawrence S, Govindarajulu U. A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data. BMC Med Res Methodol 2022; 22:132. [PMID: 35508974 PMCID: PMC9066834 DOI: 10.1186/s12874-022-01608-7] [Received: 11/04/2021] [Accepted: 04/19/2022] [Indexed: 12/17/2022]
Abstract
Background: Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can provide good performances achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach however is computationally expensive, especially on large-scale datasets. Methods: We propose an inference-based method, called RR-BART, which leverages the likelihood-based Bayesian machine learning technique, Bayesian additive regression trees, and uses Rubin’s rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of MAR data. We conduct a representative simulation study to investigate the practical operating characteristics of RR-BART, and compare it with the bootstrap imputation based methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome among middle-aged women using data from the Study of Women’s Health Across the Nation (SWAN). Results: The simulation study suggests that even in complex conditions of nonlinearity and nonadditivity with a large percentage of missingness, RR-BART can reasonably recover both prediction and variable selection performances, achievable on the fully observed data. RR-BART provides the best performance that the bootstrap imputation based methods can achieve with the optimal selection threshold value. In addition, RR-BART demonstrates a substantially stronger ability of detecting discrete predictors. Furthermore, RR-BART offers substantial computational savings. When implemented on the SWAN data, RR-BART adds to the literature by selecting a set of predictors that had been less commonly identified as risk factors but had substantial biological justifications. Conclusion: The proposed variable selection method for MAR data, RR-BART, offers both computational efficiency and good operating characteristics and is utilitarian in large-scale healthcare database studies.
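The pooling step named in this abstract, Rubin's rule, combines a per-imputation estimate and its variance across multiply imputed datasets. The sketch below shows the standard pooling formulas only; the function name `rubin_pool` is hypothetical, and how RR-BART derives the variable importance estimates that feed into it is not shown here.

```python
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool estimates and within-imputation variances from m imputed datasets.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation sample variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The `(1 + 1/m)` factor inflates the between-imputation variance to account for using a finite number of imputations.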
Affiliation(s)
- Jung-Yi Joyce Lin
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, 10029, USA
- Liangyuan Hu
- Department of Biostatistics and Epidemiology, Rutgers University, 683 Hoes Lane West, Piscataway, 08854, USA
- Chuyue Huang
- Primary Research Solution LLC., 115 W 18th St, New York, 10011, USA
- Ji Jiayi
- Department of Biostatistics and Epidemiology, Rutgers University, 683 Hoes Lane West, Piscataway, 08854, USA
- Steven Lawrence
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, 10029, USA
- Usha Govindarajulu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, 10029, USA
|
20
|
Meid AD, Wirbka L, Groll A, Haefeli WE. Can Machine Learning from Real-World Data Support Drug Treatment Decisions? A Prediction Modeling Case for Direct Oral Anticoagulants. Med Decis Making 2021; 42:587-598. [PMID: 34911402 PMCID: PMC9189725 DOI: 10.1177/0272989x211064604] [Indexed: 11/16/2022]
Abstract
BACKGROUND Decision making for the "best" treatment is particularly challenging in situations in which individual patient response to drugs can largely differ from average treatment effects. By estimating individual treatment effects (ITEs), we aimed to demonstrate how strokes, major bleeding events, and a composite of both could be reduced by model-assisted recommendations for a particular direct oral anticoagulant (DOAC). METHODS In German claims data for the calendar years 2014-2018, we selected 29 901 new users of the DOACs rivaroxaban and apixaban. Random forests considered binary events within 1 y to estimate ITEs under each DOAC according to the X-learner algorithm with 29 potential effect modifiers; treatment recommendations were based on these estimated ITEs. Model performance was evaluated by the c-for-benefit statistics, absolute risk reduction (ARR), and absolute risk difference (ARD) by trial emulation. RESULTS A significant proportion of patients would be recommended a different treatment option than they actually received. The stroke model significantly discriminated patients for higher benefit and thus indicated improved decisions by reduced outcomes (c-for-benefit: 0.56; 95% confidence interval [0.52; 0.60]). In the group with apixaban recommendation, the model also improved the composite endpoint (ARR: 1.69 % [0.39; 2.97]). In trial emulations, model-assisted recommendations significantly reduced the composite event rate (ARD: -0.78 % [-1.40; -0.03]). CONCLUSIONS If prescribers are undecided about the potential benefits of different treatment options, ITEs can support decision making, especially if evidence is inconclusive, risk-benefit profiles of therapeutic alternatives differ significantly, and the patients' complexity deviates from "typical" study populations. 
In the exemplary case for DOACs and potentially in other situations, the significant impact could also become practically relevant if recommendations were available in an automated way as part of decision making.
Highlights
- It was possible to calculate individual treatment effects (ITEs) from routine claims data for rivaroxaban and apixaban, and the characteristics between the groups with recommendation for one or the other option differed significantly.
- ITEs resulted in recommendations that were significantly superior to usual (observed) treatment allocations in terms of absolute risk reduction, both separately for stroke and in the composite endpoint of stroke and major bleeding.
- When similar patients from routine data were selected (precision cohorts) for patients with a strong recommendation for one option or the other, those similar patients under the respective recommendation showed a significantly better prognosis compared with the alternative option.
- Many steps may still be needed on the way to clinical practice, but the principle of decision support developed from routine data may point the way toward future decision-making processes.
Affiliation(s)
- Andreas D Meid
- Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg, Germany
- Lucas Wirbka
- Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg, Germany
- Andreas Groll
- Department of Statistics, TU Dortmund University, Dortmund, Germany
- Walter E Haefeli
- Department of Clinical Pharmacology and Pharmacoepidemiology, University of Heidelberg, Heidelberg, Germany
|
21
|
Hu L, Joyce Lin JY, Ji J. Variable selection with missing data in both covariates and outcomes: Imputation and machine learning. Stat Methods Med Res 2021; 30:2651-2671. [PMID: 34696650 DOI: 10.1177/09622802211046385] [Indexed: 11/15/2022]
Abstract
Variable selection in the presence of both missing covariates and outcomes is an important statistical research topic. Parametric regression models are susceptible to misspecification, and as a result are sub-optimal for variable selection. Flexible machine learning methods mitigate the reliance on the parametric assumptions, but do not provide a variable importance measure as naturally defined as the covariate effect native to parametric models. We investigate a general variable selection approach when both the covariates and outcomes can be missing at random and have general missing data patterns. This approach exploits the flexibility of machine learning models and bootstrap imputation, which is amenable to nonparametric methods in which the covariate effects are not directly available. We conduct expansive simulations investigating the practical operating characteristics of the proposed variable selection approach, when combined with four tree-based machine learning methods (extreme gradient boosting, random forests, Bayesian additive regression trees, and conditional random forests) and two commonly used parametric methods (lasso and backward stepwise selection). Numeric results suggest that extreme gradient boosting and Bayesian additive regression trees have the overall best variable selection performance with respect to the F1 score and Type I error, while the lasso and backward stepwise selection have subpar performance across various settings. There is no significant difference in the variable selection performance due to imputation methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome with data from the Study of Women's Health Across the Nation.
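The two performance measures named in this abstract can be computed for a single selection run as sketched below. The function name `selection_metrics` is hypothetical, and the Type I error definition used here (the share of truly irrelevant candidate variables that were selected) is an assumption; the cited paper may define it differently. Selected variables are assumed to be drawn from the candidate set.

```python
def selection_metrics(selected, truth, candidates):
    """F1 score and Type I error of one variable selection result.

    selected: variables chosen by the method
    truth: variables that are truly useful predictors
    candidates: all variables offered to the method
    """
    selected, truth, candidates = set(selected), set(truth), set(candidates)
    tp = len(selected & truth)   # useful variables that were selected
    fp = len(selected - truth)   # irrelevant variables that were selected
    fn = len(truth - selected)   # useful variables that were missed
    noise = candidates - truth   # truly irrelevant candidates
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    type1 = fp / len(noise) if noise else 0.0
    return f1, type1
```

In a simulation study these values would typically be averaged over many replicated datasets.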
Affiliation(s)
- Liangyuan Hu
- Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, USA
- Jung-Yi Joyce Lin
- Department of Population Health Science & Policy, Icahn School of Medicine at Mount Sinai, USA
- Jiayi Ji
- Department of Population Health Science & Policy, Icahn School of Medicine at Mount Sinai, USA
|
22
|
Hu L, Lin JY, Sigel K, Kale M. Estimating heterogeneous survival treatment effects of lung cancer screening approaches: A causal machine learning analysis. Ann Epidemiol 2021; 62:36-42. [PMID: 34157399 PMCID: PMC8463451 DOI: 10.1016/j.annepidem.2021.06.008] [Received: 04/08/2021] [Revised: 05/18/2021] [Accepted: 06/14/2021] [Indexed: 12/20/2022]
Abstract
The National Lung Screening Trial (NLST) found that low-dose computed tomography (LDCT) screening provided lung cancer (LC) mortality benefit compared to chest radiography (CXR). Considerable research concerns identifying the differential treatment effects that may exist in certain subpopulations. We shed light on several important issues in existing research and highlight the need for further investigation of the heterogeneous comparative effect of LDCT versus CXR, using more flexible and rigorous statistical approaches. We used a high-performance Bayesian machine learning approach designed for censored survival data, accelerated failure time Bayesian additive regression trees model (AFT-BART), to flexibly capture the relationships between the failure time and predictors. We then used the counterfactual framework to draw Markov chain Monte Carlo samples of the individual treatment effect for each participant. Using these posterior samples, we explored the possible treatment effect heterogeneity via a stepwise binary tree approach. When re-analyzed with AFT-BART, LDCT did not have a statistically significant LC or overall mortality benefit compared to CXR. The Asian and Black (particularly those with pack-year ≥ 37 years and without emphysema) NLST population were shown to have enhanced overall mortality benefit from LDCT than the population average. Although inconclusive for LC mortality benefit, Asians, Blacks and Whites with history of chronic obstructive pulmonary disease showed a small trend towards benefit from LDCT. Causal inference with flexible machine learning modeling can provide valuable knowledge for informing treatment decision and planning targeted clinical trials emphasizing personalized medicine approaches.
Affiliation(s)
- Liangyuan Hu
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY; Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ
- Jung-Yi Lin
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY; Icahn School of Medicine at Mount Sinai, Institute for Health Care Delivery Science, New York, NY
- Keith Sigel
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Minal Kale
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
|