1
Li J, Wisnivesky JP, Lin JJ, Campbell KN, Hu L, Kale MS. Examining the Trajectory of Health-Related Quality of Life among Coronavirus Disease Patients. J Gen Intern Med 2024; 39:1820-1827. [PMID: 38169022 PMCID: PMC11282031 DOI: 10.1007/s11606-023-08575-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND: Recent studies have reported a reduction in health-related quality of life (HR-QoL) among post-coronavirus disease 2019 (COVID-19) patients. However, there remains a gap in research examining the heterogeneity and determinants of HR-QoL trajectory in these patients.
OBJECTIVE: To describe and identify factors explaining the variability in HR-QoL trajectories among a cohort of patients with history of COVID-19.
DESIGN: A prospective study using data from a cohort of COVID-19 patients enrolled into a registry established at a health system in New York City.
PARTICIPANTS: Participants were enrolled from July 2020 to June 2022, and completed a baseline evaluation and two follow-up visits at 6 and 12 months.
METHODS: We assessed HR-QoL with the 29-item Patient Reported Outcomes Measurement Information System instrument, which was summarized into mental and physical health domains. We performed latent class growth and multinomial logistic regression to examine trajectories of HR-QoL and identify factors associated with specific trajectories.
RESULTS: The study included 588 individuals with a median age of 52 years, 65% female, 54% White, 18% Black, and 18% Hispanic. We identified five physical health trajectories and four mental health trajectories. Female gender, having pre-existing hypertension, cardiovascular disease, asthma, and hospitalization for acute COVID-19 were independently associated with lower physical health. In addition, patients with increasing body mass index were more likely to experience lower physical health over time. Female gender, younger age, pre-existing asthma, arthritis and cardiovascular disease were associated with poor mental health.
CONCLUSIONS: We found significant heterogeneity of HR-QoL after COVID-19, with women and patients with specific comorbidities at increased risk of lower HR-QoL. Implementation of targeted psychological and physical interventions is crucial for enhancing the quality of life of this patient population.
Affiliation(s)
- Jia Li
  - Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY, 10029, USA
- Juan P Wisnivesky
  - Division of General Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Division of Pulmonary and Critical Care Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jenny J Lin
  - Division of General Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Kirk N Campbell
  - Division of Nephrology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, New Brunswick, NJ, USA
- Minal S Kale
  - Division of General Internal Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
2
Xiong W, Roy J, Liu H, Hu L. Leveraging machine learning: Covariate-adjusted Bayesian adaptive randomization and subgroup discovery in multi-arm survival trials. Contemp Clin Trials 2024; 142:107547. [PMID: 38688389 DOI: 10.1016/j.cct.2024.107547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/17/2024] [Accepted: 04/25/2024] [Indexed: 05/02/2024]
Abstract
Clinical trials evaluate the safety and efficacy of treatments for specific diseases. Ensuring these studies are well-powered is crucial for identifying superior treatments. With the rise of personalized medicine, treatment efficacy may vary based on biomarker profiles. However, researchers often lack prior knowledge about which biomarkers are linked to varied treatment effects. Fixed or response-adaptive designs may not sufficiently account for heterogeneous patient characteristics, such as genetic diversity, potentially reducing the chance of selecting the optimal treatment for individuals. Recent advances in Bayesian nonparametric modeling pave the way for innovative trial designs that not only maintain robust power but also offer the flexibility to identify subgroups deriving greater benefits from specific treatments. Building on these advances, we introduce a Bayesian adaptive design for multi-arm trials with time-to-event endpoints. The design uses covariate-adjusted response-adaptive randomization, updating treatment allocation probabilities based on causal effect estimates from a random-intercept accelerated failure time model with Bayesian additive regression trees (BART). After the trial concludes, we suggest employing a multi-response decision tree to pinpoint subgroups with varying treatment effects. The performance of our design is assessed via comprehensive simulations.
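To make the idea of response-adaptive allocation concrete, the following is a minimal sketch and not the authors' design. It assumes that posterior draws of a "higher is better" effect (for example, expected log survival time for an incoming patient's covariates) are already available for each arm. The function name `allocation_probabilities` and the tempering exponent `power` are hypothetical, and the rule shown (allocate in proportion to a power of the posterior probability that an arm is best) is a generic Bayesian adaptive randomization rule rather than the one in the paper.

```python
import numpy as np

def allocation_probabilities(effect_draws, power=0.5):
    """Turn posterior draws into randomization probabilities.

    effect_draws: array of shape (n_draws, n_arms); larger values are better.
    power: hypothetical tempering exponent in (0, 1]; smaller values keep
           the allocation closer to equal randomization.
    """
    draws = np.asarray(effect_draws, dtype=float)
    n_draws, n_arms = draws.shape
    # Posterior probability that each arm is the best one.
    best_arm = np.argmax(draws, axis=1)
    p_best = np.bincount(best_arm, minlength=n_arms) / n_draws
    # Temper and renormalize so the probabilities sum to one.
    tempered = p_best ** power
    return tempered / tempered.sum()

# Simulated draws standing in for model output (illustrative only).
rng = np.random.default_rng(0)
draws = rng.normal(loc=[0.0, 0.5, 1.0], scale=1.0, size=(4000, 3))
probs = allocation_probabilities(draws)
```

In a covariate-adjusted design, the draws would be recomputed for each new patient's covariate profile, so two patients enrolled at the same time could face different allocation probabilities.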
Affiliation(s)
- Wenxuan Xiong
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA
- Jason Roy
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA
- Hao Liu
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA; Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, USA
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University School of Public Health, Piscataway, NJ, USA
3
Hu L. A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection. Biom J 2024; 66:e2200178. [PMID: 38072661 PMCID: PMC10953775 DOI: 10.1002/bimj.202200178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Revised: 07/31/2023] [Accepted: 08/11/2023] [Indexed: 01/30/2024]
Abstract
We recently developed a new method, the random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), to draw causal inferences about the population treatment effect on patient survival from clustered and censored survival data while accounting for the multilevel data structure. The practical utility of this method goes beyond the estimation of the population average treatment effect. In this work, we show how riAFT-BART can be used to address two important statistical questions with clustered survival data: estimating treatment effect heterogeneity and variable selection. Leveraging likelihood-based machine learning, we describe a way to draw posterior samples of the individual survival treatment effect from riAFT-BART model runs, and use the drawn posterior samples to perform an exploratory treatment effect heterogeneity analysis to identify subpopulations who may experience treatment effects that differ from the population average effects. There is sparse literature on methods for variable selection among clustered and censored survival data, particularly ones using flexible modeling techniques. We propose a permutation-based approach to variable selection that uses each predictor's variable inclusion proportion supplied by the riAFT-BART model. To address the missing data issue frequently encountered in health databases, we propose a strategy to combine bootstrap imputation and riAFT-BART for variable selection among incomplete clustered survival data. We conduct an expansive simulation study to examine the practical operating characteristics of our proposed methods, and provide empirical evidence that our proposed methods perform better than several existing methods across a wide range of data scenarios. Finally, we demonstrate the methods via a case study of predictors for in-hospital mortality among severe COVID-19 patients and estimating the heterogeneous treatment effects of three COVID-specific medications. The methods developed in this work are readily available in the R package riAFTBART.
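The permutation-based selection idea can be sketched as follows. This is an illustration under strong simplifying assumptions and does not use riAFT-BART or the riAFTBART package: the importance score is a stand-in (absolute Pearson correlation with an uncensored, unclustered outcome) in place of BART variable inclusion proportions, and the names `importance` and `permutation_select` are hypothetical. The shared logic is that the outcome is permuted to build a null distribution of each predictor's importance, and a predictor is kept only if its observed importance exceeds a high quantile of that null.

```python
import numpy as np

def importance(X, y):
    """Stand-in importance: absolute Pearson correlation of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

def permutation_select(X, y, n_perm=200, alpha=0.05, seed=0):
    """Return indices of predictors whose importance beats a permutation null."""
    rng = np.random.default_rng(seed)
    observed = importance(X, y)
    null = np.empty((n_perm, X.shape[1]))
    for b in range(n_perm):
        # Permuting y breaks any association between predictors and outcome.
        null[b] = importance(X, rng.permutation(y))
    # Per-predictor threshold: the (1 - alpha) quantile of its null importance.
    threshold = np.quantile(null, 1 - alpha, axis=0)
    return np.flatnonzero(observed > threshold)

# Simulated data: only the first two of six predictors affect the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=500)
selected = permutation_select(X, y)
```

Because each noise predictor is tested at level `alpha`, an occasional false selection is expected; the two true predictors should be selected reliably at this sample size.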
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey 08854
4
Hu L, Li L. Using Tree-Based Machine Learning for Health Studies: Literature Review and Case Series. Int J Environ Res Public Health 2022; 19:16080. [PMID: 36498153 PMCID: PMC9736500 DOI: 10.3390/ijerph192316080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 11/22/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
Tree-based machine learning methods have gained traction in the statistical and data science fields. They have been shown to provide better solutions to various research questions than traditional analysis approaches. To encourage the uptake of tree-based methods in health research, we review the methodological fundamentals of three key tree-based machine learning methods: random forests, extreme gradient boosting and Bayesian additive regression trees. We further conduct a series of case studies to illustrate how these methods can be properly used to solve important health research problems in four domains: variable selection, estimation of causal effects, propensity score weighting and missing data. We show that the central idea of using ensemble tree methods for these research questions is accurate prediction via flexible modeling. We apply ensemble tree methods to select important predictors for the presence of postoperative respiratory complications among early stage lung cancer patients with resectable tumors. We then demonstrate how to use these methods to estimate the causal effects of popular surgical approaches on postoperative respiratory complications among lung cancer patients. Using the same data, we further implement the methods to accurately estimate the inverse probability weights for a propensity score analysis of the comparative effectiveness of the surgical approaches. Finally, we demonstrate how random forests can be used to impute missing data using the Study of Women's Health Across the Nation data set. To conclude, tree-based methods are a flexible tool and should be properly used for health investigations.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, NJ 08854, USA
- Lihua Li
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
5
Hu L, Ji J, Liu H, Ennis R. A Flexible Approach for Assessing Heterogeneity of Causal Treatment Effects on Patient Survival Using Large Datasets with Clustered Observations. Int J Environ Res Public Health 2022; 19:14903. [PMID: 36429621 PMCID: PMC9690785 DOI: 10.3390/ijerph192214903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 11/08/2022] [Accepted: 11/09/2022] [Indexed: 06/16/2023]
Abstract
Personalized medicine requires an understanding of treatment effect heterogeneity. Evolving toward causal evidence for scenarios not studied in randomized trials necessitates a methodology using real-world evidence. Herein, we demonstrate a methodology that generates causal effects, assesses the heterogeneity of the effects and adjusts for the clustered nature of the data. This study uses a state-of-the-art machine learning survival model, riAFT-BART, to draw causal inferences about individual survival treatment effects, while accounting for the variability in institutional effects. Further, it proposes a data-driven approach to ascertain agnostically (rather than from a priori hypotheses) which subgroups exhibit an enhanced treatment effect from which intervention, relative to global evidence, that is, average treatment effects measured at the population level. Comprehensive simulations show the advantages of the proposed method in terms of bias, efficiency and precision in estimating heterogeneous causal effects. The empirically validated method was then used to analyze the National Cancer Database.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
- Jiayi Ji
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
- Hao Liu
  - Department of Biostatistics and Epidemiology, Rutgers University, New Brunswick, NJ 07102, USA
  - Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 07102, USA
- Ronald Ennis
  - Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 07102, USA
  - Robert Wood Johnson Medical School, Rutgers University, New Brunswick, NJ 07102, USA
6
Hu L, Ji J, Ennis RD, Hogan JW. A flexible approach for causal inference with multiple treatments and clustered survival outcomes. Stat Med 2022; 41:4982-4999. [PMID: 35948011 PMCID: PMC9588538 DOI: 10.1002/sim.9548] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/16/2022] [Revised: 07/20/2022] [Accepted: 07/22/2022] [Indexed: 01/07/2023]
Abstract
When drawing causal inferences about the effects of multiple treatments on clustered survival outcomes using observational data, we need to address implications of the multilevel data structure, multiple treatments, censoring, and unmeasured confounding for causal analyses. Few off-the-shelf causal inference tools are available to simultaneously tackle these issues. We develop a flexible random-intercept accelerated failure time model, in which we use Bayesian additive regression trees to capture arbitrarily complex relationships between censored survival times and pre-treatment covariates and use the random intercepts to capture cluster-specific main effects. We develop an efficient Markov chain Monte Carlo algorithm to draw posterior inferences about the population survival effects of multiple treatments and examine the variability in cluster-level effects. We further propose an interpretable sensitivity analysis approach to evaluate the sensitivity of drawn causal inferences about treatment effect to the potential magnitude of departure from the causal assumption of no unmeasured confounding. Expansive simulations empirically validate and demonstrate good practical operating characteristics of our proposed methods. Applying the proposed methods to a dataset on older high-risk localized prostate cancer patients drawn from the National Cancer Database, we evaluate the comparative effects of three treatment approaches on patient survival, and assess the ramifications of potential unmeasured confounding. The methods developed in this work are readily available in the R package riAFTBART.
Affiliation(s)
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA
- Jiayi Ji
  - Department of Biostatistics and Epidemiology, Rutgers University, Piscataway, New Jersey, USA
- Ronald D. Ennis
  - Department of Radiation Oncology, Cancer Institute of New Jersey of Rutgers University, New Brunswick, New Jersey, USA
- Joseph W. Hogan
  - Department of Biostatistics, Brown University, Providence, Rhode Island, USA
7
Hu L, Ji J. CIMTx: An R Package for Causal Inference with Multiple Treatments using Observational Data. The R Journal 2022; 14:213-230. [PMID: 39310290 PMCID: PMC11415261 DOI: 10.32614/rj-2022-058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
CIMTx provides efficient and unified functions to implement modern methods for causal inference with multiple treatments using observational data, with a focus on binary outcomes. The methods include regression adjustment, inverse probability of treatment weighting, Bayesian additive regression trees, regression adjustment with multivariate spline of the generalized propensity score, vector matching and targeted maximum likelihood estimation. In addition, CIMTx illustrates ways in which users can simulate data adhering to the complex data structures in the multiple treatment setting. Furthermore, the CIMTx package offers a unique set of features to address the key causal assumptions: positivity and ignorability. For the positivity assumption, CIMTx demonstrates techniques to identify the common support region for retaining inferential units using inverse probability of treatment weighting, Bayesian additive regression trees and vector matching. To handle the ignorability assumption, CIMTx provides a flexible Monte Carlo sensitivity analysis approach to evaluate how causal conclusions would be altered in response to different magnitudes of departure from ignorable treatment assignment.
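As an illustration of one method named above, inverse probability of treatment weighting with more than two treatments can be sketched in a few lines. This is not the CIMTx interface (CIMTx is an R package and its functions are not reproduced here); the function name `iptw_means` is hypothetical, and the generalized propensity scores are assumed to have been estimated already. Each unit is weighted by the inverse of the estimated probability of the treatment it actually received, and the weighted mean outcome in each arm estimates the mean potential outcome under that treatment.

```python
import numpy as np

def iptw_means(outcome, treatment, gps):
    """Weighted (Hajek-type) mean outcome per treatment arm.

    outcome:   length-n array of outcomes (binary here).
    treatment: length-n array of integer labels in 0..K-1.
    gps:       (n, K) array; gps[i, a] is the estimated probability that
               unit i receives treatment a (rows sum to one).
    """
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment)
    gps = np.asarray(gps, dtype=float)
    n, k = gps.shape
    # Weight = 1 / probability of the treatment actually received.
    weights = 1.0 / gps[np.arange(n), treatment]
    means = np.empty(k)
    for a in range(k):
        mask = treatment == a
        means[a] = np.sum(weights[mask] * outcome[mask]) / np.sum(weights[mask])
    return means

# Toy data with three treatments (values are illustrative only).
outcome = [1, 0, 1, 0, 1, 1]
treatment = [0, 0, 1, 1, 2, 2]
gps = [[0.50, 0.30, 0.20],
       [0.25, 0.50, 0.25],
       [0.20, 0.50, 0.30],
       [0.10, 0.80, 0.10],
       [0.30, 0.30, 0.40],
       [0.25, 0.25, 0.50]]
means = iptw_means(outcome, treatment, gps)
risk_difference_1_vs_0 = means[1] - means[0]
```

Pairwise contrasts such as `means[1] - means[0]` then give risk differences between treatments. Units with very small propensity scores receive very large weights, which is one reason the positivity assumption and the common support region matter in practice.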
Affiliation(s)
- Liangyuan Hu
  - Rutgers University School of Public Health, Department of Biostatistics and Epidemiology, 683 Hoes Lane West, Piscataway, NJ 08854, United States of America
- Jiayi Ji
  - Rutgers University School of Public Health, Department of Biostatistics and Epidemiology, 683 Hoes Lane West, Piscataway, NJ 08854, United States of America
8
Lin JYJ, Hu L, Huang C, Ji J, Lawrence S, Govindarajulu U. A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data. BMC Med Res Methodol 2022; 22:132. [PMID: 35508974 PMCID: PMC9066834 DOI: 10.1186/s12874-022-01608-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/04/2021] [Accepted: 04/19/2022] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can recover the good performance achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach, however, is computationally expensive, especially on large-scale datasets.
METHODS: We propose an inference-based method, called RR-BART, which leverages the likelihood-based Bayesian machine learning technique, Bayesian additive regression trees, and uses Rubin's rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of MAR data. We conduct a representative simulation study to investigate the practical operating characteristics of RR-BART, and compare it with the bootstrap imputation based methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome among middle-aged women using data from the Study of Women's Health Across the Nation (SWAN).
RESULTS: The simulation study suggests that even in complex conditions of nonlinearity and nonadditivity with a large percentage of missingness, RR-BART can reasonably recover both the prediction and variable selection performance achievable on the fully observed data. RR-BART provides the best performance that the bootstrap imputation based methods can achieve with the optimal selection threshold value. In addition, RR-BART demonstrates a substantially stronger ability to detect discrete predictors. Furthermore, RR-BART offers substantial computational savings. When implemented on the SWAN data, RR-BART adds to the literature by selecting a set of predictors that had been less commonly identified as risk factors but had substantial biological justifications.
CONCLUSION: The proposed variable selection method for MAR data, RR-BART, offers both computational efficiency and good operating characteristics and is useful in large-scale healthcare database studies.
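The pooling step that RR-BART relies on, Rubin's rule, can be written down compactly. The sketch below shows only the standard combining formulas for a single quantity estimated on m multiply imputed datasets; it does not fit BART or reproduce the paper's selection rule, and the function name `rubin_pool` and the example numbers are hypothetical. The pooled estimate is the mean of the per-imputation estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rule.

    estimates: length-m array, one estimate per imputed dataset.
    variances: length-m array, the variance of each estimate.
    Returns (pooled_estimate, total_variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("Rubin's rule needs at least two imputed datasets")
    q_bar = q.mean()               # pooled point estimate
    within = u.mean()              # average within-imputation variance
    between = q.var(ddof=1)        # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Example: a variable importance measure estimated on five imputed datasets.
q_bar, total = rubin_pool([0.10, 0.12, 0.08, 0.11, 0.09], [0.0004] * 5)
```

With a pooled estimate and total variance for each predictor's importance measure, an interval or test can be formed per predictor, which is what makes the approach inference-based rather than dependent on a tuned selection threshold.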
Affiliation(s)
- Jung-Yi Joyce Lin
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, NY 10029, USA
- Liangyuan Hu
  - Department of Biostatistics and Epidemiology, Rutgers University, 683 Hoes Lane West, Piscataway, NJ 08854, USA
- Chuyue Huang
  - Primary Research Solution LLC., 115 W 18th St, New York, NY 10011, USA
- Jiayi Ji
  - Department of Biostatistics and Epidemiology, Rutgers University, 683 Hoes Lane West, Piscataway, NJ 08854, USA
- Steven Lawrence
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, NY 10029, USA
- Usha Govindarajulu
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, 1425 Madison Ave, New York, NY 10029, USA