1
Hartman H, Schipper M, Kidwell K. A sequential, multiple assignment, randomized trial design with a tailoring function. Stat Med 2024; 43:4055-4072. [PMID: 38973591 DOI: 10.1002/sim.10161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Revised: 05/30/2024] [Accepted: 06/17/2024] [Indexed: 07/09/2024]
Abstract
We present a trial design for sequential multiple assignment randomized trials (SMARTs) that use a tailoring function instead of a binary tailoring variable, allowing for simultaneous development of the tailoring variable and estimation of dynamic treatment regimens (DTRs). We apply methods for developing DTRs from observational data: tree-based regression learning and Q-learning. We compare this to a balanced randomized SMART with equal re-randomization probabilities and a typical SMART design where re-randomization depends on a binary tailoring variable and DTRs are analyzed with weighted and replicated regression. This project addresses a gap in clinical trial methodology by presenting SMARTs where second stage treatment is based on a continuous outcome, removing the need for a binary tailoring variable. We demonstrate that data from a SMART using a tailoring function can be used to efficiently estimate DTRs and is more flexible under varying scenarios than a SMART using a tailoring variable.
Affiliation(s)
- Holly Hartman
  - Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Matthew Schipper
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Kelley Kidwell
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA

2
McGinty EE, Seewald NJ, Bandara S, Cerdá M, Daumit GL, Eisenberg MD, Griffin BA, Igusa T, Jackson JW, Kennedy-Hendricks A, Marsteller J, Miech EJ, Purtle J, Schmid I, Schuler MS, Yuan CT, Stuart EA. Scaling Interventions to Manage Chronic Disease: Innovative Methods at the Intersection of Health Policy Research and Implementation Science. Prev Sci 2024; 25:96-108. [PMID: 36048400 PMCID: PMC11042861 DOI: 10.1007/s11121-022-01427-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 08/16/2022] [Indexed: 10/14/2022]
Abstract
Policy implementation is a key component of scaling effective chronic disease prevention and management interventions. Policy can support scale-up by mandating or incentivizing intervention adoption, but enacting a policy is only the first step. Fully implementing a policy designed to facilitate implementation of health interventions often requires a range of accompanying implementation structures, like health IT systems, and implementation strategies, like training. Decision makers need to know what policies can support intervention adoption and how to implement those policies, but to date research on policy implementation is limited and innovative methodological approaches are needed. In December 2021, the Johns Hopkins ALACRITY Center for Health and Longevity in Mental Illness and the Johns Hopkins Center for Mental Health and Addiction Policy convened a forum of research experts to discuss approaches for studying policy implementation. In this report, we summarize the ideas that came out of the forum. First, we describe a motivating example focused on an Affordable Care Act Medicaid health home waiver policy used by some US states to support scale-up of an evidence-based integrated care model shown in clinical trials to improve cardiovascular care for people with serious mental illness. Second, we define key policy implementation components including structures, strategies, and outcomes. Third, we provide an overview of descriptive, predictive and associational, and causal approaches that can be used to study policy implementation. 
We conclude with discussion of priorities for methodological innovations in policy implementation research, with three key areas identified by forum experts: effect modification methods for making causal inferences about how policies' effects on outcomes vary based on implementation structures/strategies; causal mediation approaches for studying policy implementation mechanisms; and characterizing uncertainty in systems science models. We conclude with discussion of overarching methods considerations for studying policy implementation, including measurement of policy implementation, strategies for studying the role of context in policy implementation, and the importance of considering when establishing causality is the goal of policy implementation research.
Affiliation(s)
- Emma E McGinty
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Nicholas J Seewald
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Sachini Bandara
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Magdalena Cerdá
  - Department of Population Health, New York University Grossman School of Medicine, New York, NY, USA
- Gail L Daumit
  - Division of General Internal Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Matthew D Eisenberg
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Tak Igusa
  - Department of Engineering, Johns Hopkins University, Baltimore, MD, USA
- John W Jackson
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Alene Kennedy-Hendricks
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jill Marsteller
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Edward J Miech
  - Indiana University School of Medicine, Indianapolis, USA
- Jonathan Purtle
  - Department of Public Health Policy and Management, New York University School of Global Public Health, New York City, New York, USA
- Ian Schmid
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Christina T Yuan
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Elizabeth A Stuart
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA

3
Batorsky A, Anstrom KJ, Zeng D. Integrating randomized and observational studies to estimate optimal dynamic treatment regimes. Biometrics 2024; 80:ujae046. [PMID: 38804219 PMCID: PMC11130757 DOI: 10.1093/biomtc/ujae046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 04/17/2024] [Accepted: 05/03/2024] [Indexed: 05/29/2024]
Abstract
Sequential multiple assignment randomized trials (SMARTs) are the gold standard for estimating optimal dynamic treatment regimes (DTRs), but are costly and require a large sample size. We introduce the multi-stage augmented Q-learning estimator (MAQE) to improve efficiency of estimation of optimal DTRs by augmenting SMART data with observational data. Our motivating example comes from the Back Pain Consortium, where one of the overarching aims is to learn how to tailor treatments for chronic low back pain to individual patient phenotypes, knowledge which is lacking clinically. The Consortium-wide collaborative SMART and observational studies within the Consortium collect data on the same participant phenotypes, treatments, and outcomes at multiple time points, which can easily be integrated. Previously published single-stage augmentation methods for integration of trial and observational study (OS) data were adapted to estimate optimal DTRs from SMARTs using Q-learning. Simulation studies show the MAQE, which integrates phenotype, treatment, and outcome information from multiple studies over multiple time points, more accurately estimates the optimal DTR, and has a higher average value than a comparable Q-learning estimator without augmentation. We demonstrate this improvement is robust to a wide range of trial and OS sample sizes, addition of noise variables, and effect sizes.
Affiliation(s)
- Anna Batorsky
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Kevin J Anstrom
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Donglin Zeng
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA

4
Montoya LM, van der Laan MJ, Luedtke AR, Skeem JL, Coyle JR, Petersen ML. The optimal dynamic treatment rule superlearner: considerations, performance, and application to criminal justice interventions. Int J Biostat 2023; 19:217-238. [PMID: 35708222 PMCID: PMC10238854 DOI: 10.1515/ijb-2020-0127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2020] [Accepted: 05/06/2022] [Indexed: 11/15/2022]
Abstract
The optimal dynamic treatment rule (ODTR) framework offers an approach for understanding which kinds of patients respond best to specific treatments - in other words, treatment effect heterogeneity. Recently, there has been a proliferation of methods for estimating the ODTR. One such method is an extension of the SuperLearner algorithm - an ensemble method to optimally combine candidate algorithms extensively used in prediction problems - to ODTRs. Following the "causal roadmap," we causally and statistically define the ODTR and provide an introduction to estimating it using the ODTR SuperLearner. Additionally, we highlight practical choices when implementing the algorithm, including choice of candidate algorithms, metalearners to combine the candidates, and risk functions to select the best combination of algorithms. Using simulations, we illustrate how estimating the ODTR using this SuperLearner approach can uncover treatment effect heterogeneity more effectively than traditional approaches based on fitting a parametric regression of the outcome on the treatment, covariates and treatment-covariate interactions. We investigate the implications of choices in implementing an ODTR SuperLearner at various sample sizes. Our results show the advantages of: (1) including a combination of both flexible machine learning algorithms and simple parametric estimators in the library of candidate algorithms; (2) using an ensemble metalearner to combine candidates rather than selecting only the best-performing candidate; (3) using the mean outcome under the rule as a risk function. Finally, we apply the ODTR SuperLearner to the "Interventions" study, an ongoing randomized controlled trial, to identify which justice-involved adults with mental illness benefit most from cognitive behavioral therapy to reduce criminal re-offending.
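The abstract above names the mean outcome under the rule as a risk function. As a rough illustration only, and not the authors' SuperLearner implementation, the sketch below estimates that quantity for a candidate rule with a normalized inverse-probability-weighted average. It assumes a binary treatment randomized with a known, constant probability; the function name `ipw_value`, the parameter `p_treat`, and the simulated data are all hypothetical.

```python
import numpy as np

def ipw_value(y, a, rule_a, p_treat=0.5):
    """Inverse-probability-weighted estimate of the mean outcome under a rule.

    y: observed outcomes; a: observed binary treatments (0/1);
    rule_a: treatment the candidate rule would assign (0/1);
    p_treat: randomization probability of a == 1 (assumed known and constant).
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    rule_a = np.asarray(rule_a)
    # Probability of receiving the treatment that was actually received.
    p_obs = np.where(a == 1, p_treat, 1.0 - p_treat)
    # Only participants whose observed treatment matches the rule contribute.
    w = (a == rule_a) / p_obs
    # Normalized weights: divide by the sum of weights rather than by n.
    return float(np.sum(w * y) / np.sum(w))

# Simulated randomized trial (hypothetical): treatment helps only when x > 0.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
a = rng.integers(0, 2, size=n)
y = 1.0 + a * x + rng.normal(size=n)

value_opt = ipw_value(y, a, (x > 0).astype(int))     # treat only if x > 0
value_all = ipw_value(y, a, np.ones(n, dtype=int))   # treat everyone
value_none = ipw_value(y, a, np.zeros(n, dtype=int)) # treat no one
```

In this simulation the covariate-based rule should have a higher estimated value than either static rule, which is the sense in which such an estimate can rank candidate rules.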
Affiliation(s)
- Lina M. Montoya
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jennifer L. Skeem
  - School of Social Work and Goldman School of Public Policy, University of California Berkeley, Berkeley, USA
- Jeremy R. Coyle
  - Division of Biostatistics, University of California Berkeley, Berkeley, USA
- Maya L. Petersen
  - Divisions of Biostatistics and Epidemiology, University of California Berkeley, Berkeley, USA

5
Coulombe J, Moodie EEM, Shortreed SM, Renoux C. Estimating individualized treatment rules in longitudinal studies with covariate-driven observation times. Stat Methods Med Res 2023; 32:868-884. [PMID: 36927216 PMCID: PMC10248307 DOI: 10.1177/09622802231158733] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/18/2023]
Abstract
The sequential treatment decisions made by physicians to treat chronic diseases are formalized in the statistical literature as dynamic treatment regimes. To date, methods for dynamic treatment regimes have been developed under the assumption that observation times, that is, treatment and outcome monitoring times, are determined by study investigators. That assumption is often not satisfied in electronic health records data in which the outcome, the observation times, and the treatment mechanism are associated with patients' characteristics. The treatment and observation processes can lead to spurious associations between the treatment of interest and the outcome to be optimized under the dynamic treatment regime if not adequately considered in the analysis. We address these associations by incorporating two inverse weights that are functions of a patient's covariates into dynamic weighted ordinary least squares to develop optimal single stage dynamic treatment regimes, known as individualized treatment rules. We show empirically that our methodology yields consistent, multiply robust estimators. In a cohort of new users of antidepressant drugs from the United Kingdom's Clinical Practice Research Datalink, the proposed method is used to develop an optimal treatment rule that chooses between two antidepressants to optimize a utility function related to the change in body mass index.
Affiliation(s)
- Janie Coulombe
  - Department of Mathematics and Statistics, Université de Montréal, Montreal, Canada
- Erica EM Moodie
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
- Susan M Shortreed
  - Biostatistics Unit, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
  - Biostatistics Department, University of Washington, Seattle, Washington, USA
- Christel Renoux
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada
  - Department of Neurology and Neurosurgery, McGill University, Montreal, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada

6
Rudolph KE, Díaz I. When the Ends do not Justify the Means: Learning Who is Predicted to Have Harmful Indirect Effects. J R Stat Soc Ser A Stat Soc 2022; 185:S573-S589. [PMID: 37397280 PMCID: PMC10312488 DOI: 10.1111/rssa.12951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
There is a growing literature on finding rules by which to assign treatment based on an individual's characteristics such that a desired outcome under the intervention is maximized. A related goal entails identifying a subpopulation of individuals predicted to have a harmful indirect effect (the effect of treatment on an outcome through mediators), perhaps even in the presence of a predicted beneficial total treatment effect. In some cases, the implications of a likely harmful indirect effect may outweigh an anticipated beneficial total treatment effect, and would motivate further discussion of whether to treat identified individuals. We build on the mediation and optimal treatment rule literatures to propose a method of identifying a subgroup for which the treatment effect through the mediator is expected to be harmful. Our approach is nonparametric, incorporates post-treatment confounders of the mediator-outcome relationship, and does not make restrictions on the distribution of baseline covariates, mediating variables, or outcomes. We apply the proposed approach to identify a subgroup of boys in the MTO housing voucher experiment who are predicted to have a harmful indirect effect of housing voucher receipt on subsequent psychiatric disorder incidence through aspects of their school and neighborhood environments.
Affiliation(s)
- Kara E Rudolph
  - Department of Epidemiology, Mailman School of Public Health, Columbia University
- Iván Díaz
  - Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine

7
Shah SIH, De Pietro G, Paragliola G, Coronato A. Projection based inverse reinforcement learning for the analysis of dynamic treatment regimes. Appl Intell 2022. [DOI: 10.1007/s10489-022-04173-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Dynamic Treatment Regimes (DTRs) are adaptive treatment strategies that allow clinicians to personalize dynamically the treatment for each patient based on their step-by-step response to their treatment. There are a series of predefined alternative treatments for each disease and any patient may associate with one of these treatments according to his/her demographics. DTRs for a certain disease are studied and evaluated by means of statistical approaches where patients are randomized at each step of the treatment and their responses are observed. Recently, the Reinforcement Learning (RL) paradigm has also been applied to determine DTRs. However, such approaches may be limited by the need to design a true reward function, which may be difficult to formalize when the expert knowledge is not well assessed, as when the DTR is in the design phase. To address this limitation, an extension of the RL paradigm, namely Inverse Reinforcement Learning (IRL), has been adopted to learn the reward function from data, such as those derived from DTR trials. In this paper, we define a Projection Based Inverse Reinforcement Learning (PB-IRL) approach to learn the true underlying reward function for given demonstrations (DTR trials). Such a reward function can be used both to evaluate the set of DTRs determined for a certain disease, as well as to enable an RL-based intelligent agent to self-learn the best way and then act as a decision support system for the clinician.

8
Di S, Petch J, Gerstein HC, Zhu R, Sherifali D. Optimizing Health Coaching for Patients With Type 2 Diabetes Using Machine Learning: Model Development and Validation Study. JMIR Form Res 2022; 6:e37838. [PMID: 36099006 PMCID: PMC9516374 DOI: 10.2196/37838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 06/06/2022] [Accepted: 09/07/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Health coaching is an emerging intervention that has been shown to improve clinical and patient-relevant outcomes for type 2 diabetes. Advances in artificial intelligence may provide an avenue for developing a more personalized, adaptive, and cost-effective approach to diabetes health coaching.
OBJECTIVE: We aim to apply Q-learning, a widely used reinforcement learning algorithm, to a diabetes health-coaching data set to develop a model for recommending an optimal coaching intervention at each decision point that is tailored to a patient's accumulated history.
METHODS: In this pilot study, we fit a two-stage reinforcement learning model on 177 patients from the intervention arm of a community-based randomized controlled trial conducted in Canada. The policy produced by the reinforcement learning model can recommend a coaching intervention at each decision point that is tailored to a patient's accumulated history and is expected to maximize the composite clinical outcome of hemoglobin A1c reduction and quality of life improvement (normalized to [0, 1], with a higher score being better). Our data, models, and source code are publicly available.
RESULTS: Among the 177 patients, the coaching intervention recommended by our policy mirrored the observed diabetes health coach's interventions in 17.5% (n=31) of the patients in stage 1 and 14.1% (n=25) of the patients in stage 2. Where there was agreement in both stages, the average cumulative composite outcome (0.839, 95% CI 0.460-1.220) was better than those for whom the optimal policy agreed with the diabetes health coach in only one stage (0.791, 95% CI 0.747-0.836) or differed in both stages (0.755, 95% CI 0.728-0.781). Additionally, the average cumulative composite outcome predicted for the policy's recommendations was significantly better than that of the observed diabetes health coach's recommendations (t_{n-1}=10.040; P<.001).
CONCLUSIONS: Applying reinforcement learning to diabetes health coaching could allow for both the automation of health coaching and an improvement in health outcomes produced by this type of intervention.
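Several abstracts in this list rely on two-stage Q-learning. The following is a minimal sketch of the general backward-induction idea with linear working models, not the model fitted in the study above. It assumes binary treatments coded 0/1 at each stage and a single covariate per stage; the function names, the model form, and the simulated data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def two_stage_q_learning(x1, a1, x2, a2, y):
    """Backward induction with linear Q-functions and 0/1 treatments."""
    ones = np.ones(len(y))
    # Stage 2: regress the final outcome on the full history.
    X2 = np.column_stack([ones, x1, a1, a1 * x1, x2, a2, a2 * x2])
    b2 = fit_ols(X2, y)
    # Predicted outcome with a2 = 0, and the gain from switching to a2 = 1.
    q2_untreated = X2[:, :5] @ b2[:5]
    contrast2 = b2[5] + b2[6] * x2
    # Pseudo-outcome: predicted outcome under the better stage-2 option.
    pseudo = q2_untreated + np.maximum(contrast2, 0.0)
    # Stage 1: regress the pseudo-outcome on stage-1 history.
    X1 = np.column_stack([ones, x1, a1, a1 * x1])
    b1 = fit_ols(X1, pseudo)
    return b1, b2

# Simulated two-stage trial (hypothetical): each treatment helps when its
# stage-specific covariate is positive.
rng = np.random.default_rng(1)
n = 4000
x1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n)
x2 = rng.normal(size=n)
a2 = rng.integers(0, 2, size=n)
y = a1 * x1 + a2 * x2 + rng.normal(size=n)

b1, b2 = two_stage_q_learning(x1, a1, x2, a2, y)
# Estimated rules: treat when the estimated treatment contrast is positive.
rule1 = (b1[2] + b1[3] * x1 > 0).astype(int)
rule2 = (b2[5] + b2[6] * x2 > 0).astype(int)
agree1 = np.mean(rule1 == (x1 > 0))
agree2 = np.mean(rule2 == (x2 > 0))
```

The stage-2 fit is done first so that the stage-1 regression targets the outcome expected when the later decision is made well; this ordering is what lets the stage-1 rule account for what follows it.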
Affiliation(s)
- Shuang Di
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Jeremy Petch
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, ON, Canada
  - Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
  - Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Hertzel C Gerstein
  - Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada
  - Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Ruoqing Zhu
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, United States
- Diana Sherifali
  - Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada
  - School of Nursing, McMaster University, Hamilton, ON, Canada

9
Liu M, Shen X, Pan W. Deep reinforcement learning for personalized treatment recommendation. Stat Med 2022; 41:4034-4056. [PMID: 35716038 PMCID: PMC9427729 DOI: 10.1002/sim.9491] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/14/2021] [Revised: 05/22/2022] [Accepted: 05/25/2022] [Indexed: 12/12/2022]
Abstract
In precision medicine, the ultimate goal is to recommend the most effective treatment to an individual patient based on patient-specific molecular and clinical profiles, possibly high-dimensional. To advance cancer treatment, large-scale screenings of cancer cell lines against chemical compounds have been performed to help better understand the relationship between genomic features and drug response; existing machine learning approaches use exclusively supervised learning, including penalized regression and recommender systems. However, it would be more efficient to apply reinforcement learning to sequentially learn as data accrue, including selecting the most promising therapy for a patient given individual molecular and clinical features and then collecting and learning from the corresponding data. In this article, we propose a novel personalized ranking system called Proximal Policy Optimization Ranking (PPORank), which ranks the drugs based on their predicted effects per cell line (or patient) in the framework of deep reinforcement learning (DRL). Modeled as a Markov decision process, the proposed method learns to recommend the most suitable drugs sequentially and continuously over time. As a proof-of-concept, we conduct experiments on two large-scale cancer cell line data sets in addition to simulated data. The results demonstrate that the proposed DRL-based PPORank outperforms the state-of-the-art competitors based on supervised learning. Taken together, we conclude that novel methods in the framework of DRL have great potential for precision medicine and should be further studied.
Affiliation(s)
- Mingyang Liu
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
- Xiaotong Shen
  - School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
- Wei Pan
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA

10
Jiang C, Wallace MP, Thompson ME. Dynamic treatment regimes with interference. Can J Stat 2022. [DOI: 10.1002/cjs.11702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Cong Jiang
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Michael P. Wallace
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Mary E. Thompson
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada

11
Tardini E, Zhang X, Canahuate G, Wentzel A, Mohamed ASR, Van Dijk L, Fuller CD, Marai GE. Optimal Treatment Selection in Sequential Systemic and Locoregional Therapy of Oropharyngeal Squamous Carcinomas: Deep Q-Learning With a Patient-Physician Digital Twin Dyad. J Med Internet Res 2022; 24:e29455. [PMID: 35442211 PMCID: PMC9069283 DOI: 10.2196/29455] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/07/2021] [Revised: 09/03/2021] [Accepted: 02/09/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Currently, selection of patients for sequential versus concurrent chemotherapy and radiation regimens lacks evidentiary support and it is based on locally optimal decisions for each step.
OBJECTIVE: We aim to optimize the multistep treatment of patients with head and neck cancer and predict multiple patient survival and toxicity outcomes, and we develop, apply, and evaluate a first application of deep Q-learning (DQL) and simulation to this problem.
METHODS: The treatment decision DQL digital twin and the patient's digital twin were created, trained, and evaluated on a data set of 536 patients with oropharyngeal squamous cell carcinoma with the goal of, respectively, determining the optimal treatment decisions with respect to survival and toxicity metrics and predicting the outcomes of the optimal treatment on the patient. Of the data set of 536 patients, the models were trained on a subset of 402 (75%) patients (split randomly) and evaluated on a separate set of 134 (25%) patients. Training and evaluation of the digital twin dyad was completed in August 2020. The data set includes 3-step sequential treatment decisions and complete relevant history of the patient cohort treated at MD Anderson Cancer Center between 2005 and 2013, with radiomics analysis performed for the segmented primary tumor volumes.
RESULTS: On the test set, we found mean 87.35% (SD 11.15%) and median 90.85% (IQR 13.56%) accuracies in treatment outcome prediction, matching the clinicians' outcomes and improving the (predicted) survival rate by +3.73% (95% CI -0.75% to 8.96%) and the dysphagia rate by +0.75% (95% CI -4.48% to 6.72%) when following DQL treatment decisions.
CONCLUSIONS: Given the prediction accuracy and predicted improvement regarding the medically relevant outcomes yielded by this approach, this digital twin dyad of the patient-physician dynamic treatment problem has the potential of aiding physicians in determining the optimal course of treatment and in assessing its outcomes.
Affiliation(s)
- Elisa Tardini
  - Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Xinhua Zhang
  - Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Guadalupe Canahuate
  - Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States
- Andrew Wentzel
  - Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Abdallah S R Mohamed
  - MD Anderson Cancer Center, Houston, TX, United States
  - Department of Radiation Oncology, The University of Texas, Austin, TX, United States
- Clifton D Fuller
  - MD Anderson Cancer Center, Houston, TX, United States
  - Department of Radiation Oncology, The University of Texas, Austin, TX, United States
- G Elisabeta Marai
  - Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States

12
Survival Augmented Patient Preference Incorporated Reinforcement Learning to Evaluate Tailoring Variables for Personalized Healthcare. Stats 2021. [DOI: 10.3390/stats4040046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
In this paper, we consider personalized treatment decision strategies in the management of chronic diseases, such as chronic kidney disease, which typically consists of sequential and adaptive treatment decision making. We investigate a two-stage treatment setting with a survival outcome that could be right censored. This can be formulated through a dynamic treatment regime (DTR) framework, where the goal is to tailor treatment to each individual based on their own medical history in order to maximize a desirable health outcome. We develop a new method, Survival Augmented Patient Preference incorporated reinforcement Q-Learning (SAPP-Q-Learning) to decide between quality of life and survival restricted at maximal follow-up. Our method incorporates the latent patient preference into a weighted utility function that balances between quality of life and survival time, in a Q-learning model framework. We further propose a corresponding m-out-of-n Bootstrap procedure to accurately make statistical inferences and construct confidence intervals on the effects of tailoring variables, whose values can guide personalized treatment strategies.

13
Zhang B, Weiss J, Small DS, Zhao Q. Selecting and Ranking Individualized Treatment Rules With Unmeasured Confounding. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2020.1736083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Affiliation(s)
- Bo Zhang
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
- Jordan Weiss
  - Department of Sociology, University of Pennsylvania, Philadelphia, PA
- Dylan S. Small
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
- Qingyuan Zhao
  - Statistical Laboratory, University of Cambridge, Cambridge, UK

14
Luckett DJ, Laber EB, Kim S, Kosorok MR. Estimation and Optimization of Composite Outcomes. J Mach Learn Res 2021; 22:167. [PMID: 34733120 PMCID: PMC8562677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
There is tremendous interest in precision medicine as a means to improve patient outcomes by tailoring treatment to individual characteristics. An individualized treatment rule formalizes precision medicine as a map from patient information to a recommended treatment. A treatment rule is defined to be optimal if it maximizes the mean of a scalar outcome in a population of interest, e.g., symptom reduction. However, clinical and intervention scientists often seek to balance multiple and possibly competing outcomes, e.g., symptom reduction and the risk of an adverse event. One approach to precision medicine in this setting is to elicit a composite outcome which balances all competing outcomes; unfortunately, eliciting a composite outcome directly from patients is difficult without a high-quality instrument, and an expert-derived composite outcome may not account for heterogeneity in patient preferences. We propose a new paradigm for the study of precision medicine using observational data that relies solely on the assumption that clinicians are approximately (i.e., imperfectly) making decisions to maximize individual patient utility. Estimated composite outcomes are subsequently used to construct an estimator of an individualized treatment rule which maximizes the mean of patient-specific composite outcomes. The estimated composite outcomes and estimated optimal individualized treatment rule provide new insights into patient preference heterogeneity, clinician behavior, and the value of precision medicine in a given domain. We derive inference procedures for the proposed estimators under mild conditions and demonstrate their finite sample performance through a suite of simulation experiments and an illustrative application to data from a study of bipolar depression.
Affiliation(s)
| | - Eric B Laber
- Department of Statistics, North Carolina State University, Raleigh, NC 27607, USA
| | - Siyeon Kim
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27607, USA
| | - Michael R Kosorok
- Departments of Biostatistics and Statistics & Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| |
|
15
|
Affiliation(s)
- Yilun Sun
- Department of Biostatistics, University of Michigan, Ann Arbor, MI
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI
| | - Lu Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI
| |
|
16
|
Wang S, Moodie EE, Stephens DA, Nijjar JS. Adaptive treatment strategies for chronic conditions: shared-parameter G-estimation with an application to rheumatoid arthritis. Biostatistics 2020; 23:kxaa033. [PMID: 32851395 DOI: 10.1093/biostatistics/kxaa033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/21/2020] [Revised: 07/15/2020] [Accepted: 07/17/2020] [Indexed: 11/13/2022] Open
Abstract
Most estimation algorithms for adaptive treatment strategies assume that treatment rules at each decision point are independent from one another in the sense that they do not possess any common parameters. This is often unrealistic, as the same decisions may be made repeatedly over time. Sharing treatment-decision parameters across decision points offers several advantages, including estimation of fewer parameters and the clinical ease of a single, time-invariant decision to implement. We propose a new computational approach to estimation of shared-parameter G-estimation, which is efficient and shares the double robustness of the "unshared" sequential G-estimation. We use this approach to analyze data from the Scottish Early Rheumatoid Arthritis (SERA) Inception Cohort.
Affiliation(s)
- Shouao Wang
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC Canada, H3A 1A2
| | - Erica E M Moodie
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC Canada, H3A 1A2
| | - David A Stephens
- Department of Mathematics and Statistics, McGill University, Montreal, QC Canada, H3A 0B9
| | | |
|
17
|
Abstract
Precision medicine seeks to maximize the quality of healthcare by individualizing the healthcare process to the uniquely evolving health status of each patient. This endeavor spans a broad range of scientific areas including drug discovery, genetics/genomics, health communication, and causal inference all in support of evidence-based, i.e., data-driven, decision making. Precision medicine is formalized as a treatment regime which comprises a sequence of decision rules, one per decision point, which map up-to-date patient information to a recommended action. The potential actions could be the selection of which drug to use, the selection of dose, timing of administration, specific diet or exercise recommendation, or other aspects of treatment or care. Statistics research in precision medicine is broadly focused on methodological development for estimation of and inference for treatment regimes which maximize some cumulative clinical outcome. In this review, we provide an overview of this vibrant area of research and present important and emerging challenges.
Affiliation(s)
- Michael R Kosorok
- Department of Biostatistics and Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, 27599, U.S.A.
| | - Eric B Laber
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, 27695, U.S.A.
| |
|
18
|
Zhao YQ, Laber EB, Ning Y, Saha S, Sands BE. Efficient augmentation and relaxation learning for individualized treatment rules using observational data. J Mach Learn Res 2019; 20:48. [PMID: 31440118 PMCID: PMC6705615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Individualized treatment rules aim to identify if, when, which, and to whom treatment should be applied. A globally aging population, rising healthcare costs, and increased access to patient-level data have created an urgent need for high-quality estimators of individualized treatment rules that can be applied to observational data. A recent and promising line of research for estimating individualized treatment rules recasts the problem of estimating an optimal treatment rule as a weighted classification problem. We consider a class of estimators for optimal treatment rules that are analogous to convex large-margin classifiers. The proposed class applies to observational data and is doubly-robust in the sense that correct specification of either a propensity or outcome model leads to consistent estimation of the optimal individualized treatment rule. Using techniques from semiparametric efficiency theory, we derive rates of convergence for the proposed estimators and use these rates to characterize the bias-variance trade-off for estimating individualized treatment rules with classification-based methods. Simulation experiments informed by these results demonstrate that it is possible to construct new estimators within the proposed framework that significantly outperform existing ones. We illustrate the proposed methods using data from a labor training program and a study of inflammatory bowel syndrome.
Affiliation(s)
- Ying-Qi Zhao
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | - Eric B Laber
- Department of Statistics, North Carolina State University, Raleigh, NC, 27695, USA
| | - Yang Ning
- Department of Statistical Science, Cornell University, Ithaca, NY, 14853, USA
| | - Sumona Saha
- School of Medicine and Public Health, University of Wisconsin, Madison, WI, 53705, USA
| | - Bruce E Sands
- Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| |
|
19
|
Tao Y, Wang L, Almirall D. Tree-based reinforcement learning for estimating optimal dynamic treatment regimes. Ann Appl Stat 2018; 12:1914-1938. [PMID: 30984321 PMCID: PMC6457899 DOI: 10.1214/18-aoas1137] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
Abstract
Dynamic treatment regimes (DTRs) are sequences of treatment decision rules, in which treatment may be adapted over time in response to the changing course of an individual. Motivated by the substance use disorder (SUD) study, we propose a tree-based reinforcement learning (T-RL) method to directly estimate optimal DTRs in a multi-stage multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles the problem of optimization with multiple treatment comparisons, through a purity measure constructed with augmented inverse probability weighted estimators. For the multiple stages, the algorithm is implemented recursively using backward induction. By combining semiparametric regression with flexible tree-based learning, T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs, as shown in the simulation studies. With the proposed method, we identify dynamic SUD treatment regimes for adolescents.
Affiliation(s)
- Yebin Tao
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Lu Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Daniel Almirall
- Institute for Social Research, University of Michigan, Ann Arbor, Michigan 48104, USA
| |
|
20
|
Moodie EEM, Stephens DA, Alam S, Zhang MJ, Logan B, Arora M, Spellman S, Krakow EF. A cure-rate model for Q-learning: Estimating an adaptive immunosuppressant treatment strategy for allogeneic hematopoietic cell transplant patients. Biom J 2018; 61:442-453. [PMID: 29766558 DOI: 10.1002/bimj.201700181] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/26/2017] [Revised: 02/26/2018] [Accepted: 03/23/2018] [Indexed: 11/11/2022]
Abstract
Cancers treated by transplantation are often curative, but immunosuppressive drugs are required to prevent and (if needed) to treat graft-versus-host disease. Estimation of an optimal adaptive treatment strategy when treatment at either one of two stages of treatment may lead to a cure has not yet been considered. Using a sample of 9563 patients treated for blood and bone cancers by allogeneic hematopoietic cell transplantation drawn from the Center for Blood and Marrow Transplant Research database, we provide a case study of a novel approach to Q-learning for survival data in the presence of a potentially curative treatment, and demonstrate the results differ substantially from an implementation of Q-learning that fails to account for the cure-rate.
Affiliation(s)
- Erica E M Moodie
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, H3A 1A2, Canada
| | - David A Stephens
- Department of Mathematics and Statistics, McGill University, Montreal, QC, H3A 1A2, Canada
| | - Shomoita Alam
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, H3A 1A2, Canada
| | - Mei-Jie Zhang
- Medical College of Wisconsin, Milwaukee, WI, 53226, USA
| | - Brent Logan
- Medical College of Wisconsin, Milwaukee, WI, 53226, USA
| | - Mukta Arora
- Department of Medicine, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Stephen Spellman
- Center for International Blood and Marrow Transplant Research, Minneapolis, MN, 55401, USA
| | | |
|
21
|
Laber EB, Wu F, Munera C, Lipkovich I, Colucci S, Ripa S. Identifying optimal dosage regimes under safety constraints: An application to long term opioid treatment of chronic pain. Stat Med 2018; 37:1407-1418. [PMID: 29468702 PMCID: PMC6293986 DOI: 10.1002/sim.7566] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/14/2016] [Revised: 08/26/2017] [Accepted: 10/30/2017] [Indexed: 11/08/2022]
Abstract
There is growing interest and investment in precision medicine as a means to provide the best possible health care. A treatment regime formalizes precision medicine as a sequence of decision rules, one per clinical intervention period, that specify if, when and how current treatment should be adjusted in response to a patient's evolving health status. It is standard to define a regime as optimal if, when applied to a population of interest, it maximizes the mean of some desirable clinical outcome, such as efficacy. However, in many clinical settings, a high-quality treatment regime must balance multiple competing outcomes; eg, when a high dose is associated with substantial symptom reduction but a greater risk of an adverse event. We consider the problem of estimating the most efficacious treatment regime subject to constraints on the risk of adverse events. We combine nonparametric Q-learning with policy-search to estimate a high-quality yet parsimonious treatment regime. This estimator applies to both observational and randomized data, as well as settings with variable, outcome-dependent follow-up, mixed treatment types, and multiple time points. This work is motivated by and framed in the context of dosing for chronic pain; however, the proposed framework can be applied generally to estimate a treatment regime which maximizes the mean of one primary outcome subject to constraints on one or more secondary outcomes. We illustrate the proposed method using data pooled from 5 open-label flexible dosing clinical trials for chronic pain.
|
22
|
Butler EL, Laber EB, Davis SM, Kosorok MR. Incorporating Patient Preferences into Estimation of Optimal Individualized Treatment Rules. Biometrics 2018; 74:18-26. [PMID: 28742260 PMCID: PMC5785589 DOI: 10.1111/biom.12743] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 04/01/2016] [Revised: 05/01/2017] [Accepted: 06/01/2017] [Indexed: 11/29/2022]
Abstract
Precision medicine seeks to provide treatment only if, when, to whom, and at the dose it is needed. Thus, precision medicine is a vehicle by which healthcare can be made both more effective and efficient. Individualized treatment rules operationalize precision medicine as a map from current patient information to a recommended treatment. An optimal individualized treatment rule is defined as maximizing the mean of a pre-specified scalar outcome. However, in settings with multiple outcomes, choosing a scalar composite outcome by which to define optimality is difficult. Furthermore, when there is heterogeneity across patient preferences for these outcomes, it may not be possible to construct a single composite outcome that leads to high-quality treatment recommendations for all patients. We simultaneously estimate the optimal individualized treatment rule for all composite outcomes representable as a convex combination of the (suitably transformed) outcomes. For each patient, we use a preference elicitation questionnaire and item response theory to derive the posterior distribution over preferences for these composite outcomes and subsequently derive an estimator of an optimal individualized treatment rule tailored to patient preferences. We prove that as the number of subjects and items on the questionnaire diverge, our estimator is consistent for an oracle optimal individualized treatment rule wherein each patient's preference is known a priori. We illustrate the proposed method using data from a clinical trial on antipsychotic medications for schizophrenia.
Affiliation(s)
- Emily L Butler
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - Eric B Laber
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A
| | - Sonia M Davis
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - Michael R Kosorok
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| |
|
23
|
Grant S, Agniel D, Almirall D, Burkhart Q, Hunter SB, McCaffrey DF, Pedersen ER, Ramchand R, Griffin BA. Developing adaptive interventions for adolescent substance use treatment settings: protocol of an observational, mixed-methods project. Addict Sci Clin Pract 2017; 12:35. [PMID: 29254500 PMCID: PMC5735877 DOI: 10.1186/s13722-017-0099-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/28/2017] [Accepted: 11/11/2017] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND Over 1.6 million adolescents in the United States meet criteria for substance use disorders (SUDs). While there are promising treatments for SUDs, adolescents respond to these treatments differentially in part based on the setting in which treatments are delivered. One way to address such individualized response to treatment is through the development of adaptive interventions (AIs): sequences of decision rules for altering treatment based on an individual's needs. This protocol describes a project with the overarching goal of beginning the development of AIs that provide recommendations for altering the setting of an adolescent's substance use treatment. This project has three discrete aims: (1) explore the views of various stakeholders (parents, providers, policymakers, and researchers) on deciding the setting of substance use treatment for an adolescent based on individualized need, (2) generate hypotheses concerning candidate AIs, and (3) compare the relative effectiveness among candidate AIs and non-adaptive interventions commonly used in everyday practice. METHODS This project uses a mixed-methods approach. First, we will conduct an iterative stakeholder engagement process, using RAND's ExpertLens online system, to assess the importance of considering specific individual needs and clinical outcomes when deciding the setting for an adolescent's substance use treatment. Second, we will use results from the stakeholder engagement process to analyze an observational longitudinal data set of 15,656 adolescents in substance use treatment, supported by the Substance Abuse and Mental Health Services Administration, using the Global Appraisal of Individual Needs questionnaire. We will utilize methods based on Q-learning regression to generate hypotheses about candidate AIs. Third, we will use robust statistical methods that aim to appropriately handle casemix adjustment on a large number of covariates (marginal structural modeling and inverse probability of treatment weights) to compare the relative effectiveness among candidate AIs and non-adaptive decision rules that are commonly used in everyday practice. DISCUSSION This project begins filling a major gap in clinical and research efforts for adolescents in substance use treatment. Findings could be used to inform the further development and revision of influential multi-dimensional assessment and treatment planning tools, or lay the foundation for subsequent experiments to further develop or test AIs for treatment planning.
Affiliation(s)
- Sean Grant
- RAND Corporation, 1776 Main Street, Santa Monica, CA 90407 USA
| | - Denis Agniel
- RAND Corporation, 1776 Main Street, Santa Monica, CA 90407 USA
| | - Daniel Almirall
- Institute for Social Research, University of Michigan, 426 Thompson Street, Ann Arbor, MI 48104-2321 USA
| | - Q. Burkhart
- RAND Corporation, 1776 Main Street, Santa Monica, CA 90407 USA
| | - Sarah B. Hunter
- RAND Corporation, 1776 Main Street, Santa Monica, CA 90407 USA
| | | | | | - Rajeev Ramchand
- RAND Corporation, 1200 South Hayes Street, Arlington, VA 22202-5050 USA
| | - Beth Ann Griffin
- RAND Corporation, 1200 South Hayes Street, Arlington, VA 22202-5050 USA
| |
|
24
|
Fu SS, Rothman AJ, Vock DM, Lindgren B, Almirall D, Begnaud A, Melzer A, Schertz K, Glaeser S, Hammett P, Joseph AM. Program for lung cancer screening and tobacco cessation: Study protocol of a sequential, multiple assignment, randomized trial. Contemp Clin Trials 2017; 60:86-95. [PMID: 28687349 PMCID: PMC5558455 DOI: 10.1016/j.cct.2017.07.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 03/03/2017] [Revised: 06/20/2017] [Accepted: 07/03/2017] [Indexed: 12/17/2022]
Affiliation(s)
- Steven S Fu
- VA HSR&D Center for Chronic Disease Outcomes Research, Minneapolis VA Health Care System, Minneapolis, MN, United States; Department of Medicine, University of Minnesota, Minneapolis, MN, United States.
| | - Alexander J Rothman
- Department of Psychology, University of Minnesota, Minneapolis, MN, United States
| | - David M Vock
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, United States
| | - Bruce Lindgren
- Biostatistics and Bioinformatics Core, Masonic Cancer Center, University of Minnesota, Minneapolis, MN, United States
| | - Daniel Almirall
- Survey Research Center, Institute for Social Research, University of Michigan, United States
| | - Abbie Begnaud
- Department of Medicine, University of Minnesota, Minneapolis, MN, United States
| | - Anne Melzer
- VA HSR&D Center for Chronic Disease Outcomes Research, Minneapolis VA Health Care System, Minneapolis, MN, United States; Department of Medicine, University of Minnesota, Minneapolis, MN, United States
| | - Kelsey Schertz
- Department of Medicine, University of Minnesota, Minneapolis, MN, United States
| | - Susan Glaeser
- Department of Medicine, University of Minnesota, Minneapolis, MN, United States
| | - Patrick Hammett
- VA HSR&D Center for Chronic Disease Outcomes Research, Minneapolis VA Health Care System, Minneapolis, MN, United States; Department of Medicine, University of Minnesota, Minneapolis, MN, United States; Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN, United States
| | - Anne M Joseph
- Department of Medicine, University of Minnesota, Minneapolis, MN, United States
| |
|
25
|
Goldberg Y, Pollak M, Mitelpunkt A, Orlovsky M, Weiss-Meilik A, Gorfine M. Change-point detection for infinite horizon dynamic treatment regimes. Stat Methods Med Res 2017; 26:1590-1604. [DOI: 10.1177/0962280217708655] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
A dynamic treatment regime is a set of decision rules for how to treat a patient at multiple time points. At each time point, a treatment decision is made depending on the patient’s medical history up to that point. We consider the infinite-horizon setting in which the number of decision points is very large. Specifically, we consider long trajectories of patients’ measurements recorded over time. At each time point, the decision whether to intervene or not is conditional on whether or not there was a change in the patient’s trajectory. We present change-point detection tools and show how to use them in defining dynamic treatment regimes. The performance of these regimes is assessed using an extensive simulation study. We demonstrate the utility of the proposed change-point detection approach using two case studies: detection of sepsis in preterm infants in the intensive care unit and detection of a change in glucose levels of a diabetic patient.
Affiliation(s)
| | - Moshe Pollak
- The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Alexis Mitelpunkt
- Tel-Aviv University, Tel-Aviv, Israel
- Tel-Aviv Sourasky Medical Center, Tel-Aviv, Israel
| | | | | | | |
|
26
|
Krakow EF, Hemmer M, Wang T, Logan B, Arora M, Spellman S, Couriel D, Alousi A, Pidala J, Last M, Lachance S, Moodie EEM. Tools for the Precision Medicine Era: How to Develop Highly Personalized Treatment Recommendations From Cohort and Registry Data Using Q-Learning. Am J Epidemiol 2017; 186:160-172. [PMID: 28472335 DOI: 10.1093/aje/kwx027] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 12/09/2015] [Accepted: 08/02/2017] [Indexed: 01/01/2023] Open
Abstract
Q-learning is a method of reinforcement learning that employs backwards stagewise estimation to identify sequences of actions that maximize some long-term reward. The method can be applied to sequential multiple-assignment randomized trials to develop personalized adaptive treatment strategies (ATSs): longitudinal practice guidelines highly tailored to time-varying attributes of individual patients. Sometimes, the basis for choosing which ATSs to include in a sequential multiple-assignment randomized trial (or randomized controlled trial) may be inadequate. Nonrandomized data sources may inform the initial design of ATSs, which could later be prospectively validated. In this paper, we illustrate challenges involved in using nonrandomized data for this purpose with a case study from the Center for International Blood and Marrow Transplant Research registry (1995-2007) aimed at 1) determining whether the sequence of therapeutic classes used in graft-versus-host disease prophylaxis and in refractory graft-versus-host disease is associated with improved survival and 2) identifying donor and patient factors with which to guide individualized immunosuppressant selections over time. We discuss how to communicate the potential benefit derived from following an ATS at the population and subgroup levels and how to evaluate its robustness to modeling assumptions. This worked example may serve as a model for developing ATSs from registries and cohorts in oncology and other fields requiring sequential treatment decisions.
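The backwards stagewise estimation described in this abstract can be sketched for a two-stage setting with binary treatments. This is a minimal illustration under strong assumptions and not the analysis from the paper: the linear working models, the single covariate per stage, and the names `stage_design`, `two_stage_q_learning`, and `recommend` are all hypothetical, and the demo uses simulated randomized data rather than registry data.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def stage_design(x, a):
    """Working-model design matrix [1, x, a, a*x] for a binary action a coded 0/1."""
    return np.column_stack([np.ones_like(x), x, a, a * x])

def two_stage_q_learning(x1, a1, x2, a2, y):
    """Backward induction: fit stage 2 first, then stage 1 on the pseudo-outcome."""
    # Stage 2: model the final outcome given the stage-2 covariate and action.
    beta2 = fit_ols(stage_design(x2, a2), y)
    q2_if_0 = stage_design(x2, np.zeros_like(x2)) @ beta2
    q2_if_1 = stage_design(x2, np.ones_like(x2)) @ beta2
    # Pseudo-outcome: predicted outcome under the best stage-2 action.
    pseudo = np.maximum(q2_if_0, q2_if_1)
    # Stage 1: model the pseudo-outcome given the stage-1 covariate and action.
    beta1 = fit_ols(stage_design(x1, a1), pseudo)
    return beta1, beta2

def recommend(beta, x):
    """Recommend action 1 where the estimated contrast beta[2] + beta[3]*x is positive."""
    return (beta[2] + beta[3] * x > 0).astype(int)

# Simulated example (hypothetical data-generating model, randomized actions).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
a1 = rng.integers(0, 2, size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
a2 = rng.integers(0, 2, size=n)
y = x2 + a2 * x2 + rng.normal(scale=0.5, size=n)  # stage-2 action helps when x2 > 0

beta1, beta2 = two_stage_q_learning(x1, a1, x2, a2, y)
print(recommend(beta2, np.array([-1.0, 1.0])))
```

With nonrandomized data, as in the registry case study, the working models would also need to adjust for confounders of treatment assignment, which this sketch omits.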
|
27
|
Jalalimanesh A, Haghighi HS, Ahmadi A, Hejazian H, Soltani M. Multi-objective optimization of radiotherapy: distributed Q-learning and agent-based simulation. J Exp Theor Artif Intell 2017. [DOI: 10.1080/0952813x.2017.1292319] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
Affiliation(s)
- Ammar Jalalimanesh
- Department of Industrial Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
| | - Hamidreza Shahabi Haghighi
- Department of Industrial Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
| | - Abbas Ahmadi
- Department of Industrial Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
| | - Hossein Hejazian
- Department of Industrial Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
| | - Madjid Soltani
- Department of Mechanical Engineering, K. N. Toosi University of Technology, Tehran, Iran
- Department of Radiology, Johns Hopkins University, Baltimore, MD, USA
| |
|
28
|
Chen G, Zeng D, Kosorok MR. Personalized Dose Finding Using Outcome Weighted Learning. J Am Stat Assoc 2017; 111:1509-1521. [PMID: 28255189 PMCID: PMC5327863 DOI: 10.1080/01621459.2016.1148611] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 12/01/2014] [Revised: 12/01/2015] [Indexed: 10/22/2022]
Abstract
In dose-finding clinical trials, it is becoming increasingly important to account for individual level heterogeneity while searching for optimal doses to ensure an optimal individualized dose rule (IDR) maximizes the expected beneficial clinical outcome for each individual. In this paper, we advocate a randomized trial design where candidate dose levels assigned to study subjects are randomly chosen from a continuous distribution within a safe range. To estimate the optimal IDR using such data, we propose an outcome weighted learning method based on a nonconvex loss function, which can be solved efficiently using a difference of convex functions algorithm. The consistency and convergence rate for the estimated IDR are derived, and its small-sample performance is evaluated via simulation studies. We demonstrate that the proposed method outperforms competing approaches. Finally, we illustrate this method using data from a cohort study for Warfarin (an anti-thrombotic drug) dosing.
Collapse
Affiliation(s)
- Guanhua Chen
- Assistant Professor, Department of Biostatistics, Vanderbilt University, Nashville, TN 37203
| | - Donglin Zeng
- Professor, Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
| | - Michael R Kosorok
- W. R. Kenan, Jr. Distinguished Professor and Chair, Department of Biostatistics, and Professor, Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599
| |
|
29
|
Tao Y, Wang L. Adaptive contrast weighted learning for multi-stage multi-treatment decision-making. Biometrics 2016; 73:145-155. [DOI: 10.1111/biom.12539] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 11/01/2015] [Revised: 03/01/2016] [Accepted: 04/01/2016] [Indexed: 11/28/2022]
Affiliation(s)
- Yebin Tao
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
| | - Lu Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.
| |
|
30
|
Ogburn EL, Zeger SL. Statistical Reasoning and Methods in Epidemiology to Promote Individualized Health: In Celebration of the 100th Anniversary of the Johns Hopkins Bloomberg School of Public Health. Am J Epidemiol 2016; 183:427-34. [PMID: 26867776 DOI: 10.1093/aje/kwv453] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/10/2015] [Accepted: 12/23/2015] [Indexed: 11/12/2022] Open
Abstract
Epidemiology is concerned with determining the distribution and causes of disease. Throughout its history, epidemiology has drawn upon statistical ideas and methods to achieve its aims. Because of the exponential growth in our capacity to measure and analyze data on the underlying processes that define each person's state of health, there is an emerging opportunity for population-based epidemiologic studies to influence health decisions made by individuals in ways that take into account the individuals' characteristics, circumstances, and preferences. We refer to this endeavor as "individualized health." The present article comprises 2 sections. In the first, we describe how graphical, longitudinal, and hierarchical models can inform the project of individualized health. We propose a simple graphical model for informing individual health decisions using population-based data. In the second, we review selected topics in causal inference that we believe to be particularly useful for individualized health. Epidemiology and biostatistics were 2 of the 4 founding departments in the world's first graduate school of public health at Johns Hopkins University, the centennial of which we honor. This survey of a small part of the literature is intended to demonstrate that the 2 fields remain just as inextricably linked today as they were 100 years ago.
|
31
|
Zhang Y, Laber EB, Tsiatis A, Davidian M. Using decision lists to construct interpretable and parsimonious treatment regimes. Biometrics 2015; 71:895-904. [PMID: 26193819 PMCID: PMC4715597 DOI: 10.1111/biom.12354] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 10/01/2014] [Revised: 04/01/2015] [Accepted: 05/01/2015] [Indexed: 11/26/2022]
Abstract
A treatment regime formalizes personalized medicine as a function from individual patient characteristics to a recommended treatment. A high-quality treatment regime can improve patient outcomes while reducing cost, resource consumption, and treatment burden. Thus, there is tremendous interest in estimating treatment regimes from observational and randomized studies. However, the development of treatment regimes for application in clinical practice requires the long-term, joint effort of statisticians and clinical scientists. In this collaborative process, the statistician must integrate clinical science into the statistical models underlying a treatment regime and the clinician must scrutinize the estimated treatment regime for scientific validity. To facilitate meaningful information exchange, it is important that estimated treatment regimes be interpretable in a subject-matter context. We propose a simple, yet flexible class of treatment regimes whose members are representable as a short list of if-then statements. Regimes in this class are immediately interpretable and are therefore an appealing choice for broad application in practice. We derive a robust estimator of the optimal regime within this class and demonstrate its finite sample performance using simulation experiments. The proposed method is illustrated with data from two clinical trials.
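To make the idea of a regime written as "a short list of if-then statements" concrete, here is a minimal Python sketch of how such a decision list could be represented and applied to one patient. It illustrates only the form of the regime, not the robust estimator proposed in the paper. The covariate names (`severity`, `age`), thresholds, and treatment labels are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A patient is a mapping from covariate name to value (hypothetical covariates).
Patient = Dict[str, float]
# A clause pairs an "if" condition with the treatment recommended when it holds.
Clause = Tuple[Callable[[Patient], bool], str]


def apply_decision_list(clauses: List[Clause], default: str, patient: Patient) -> str:
    """Return the treatment of the first clause whose condition is true.

    Clauses are checked in order; if none applies, the default ("else") treatment is returned.
    """
    for condition, treatment in clauses:
        if condition(patient):
            return treatment
    return default


# Hypothetical regime:
#   if severity > 7                      then treatment_A
#   else if age < 40 and severity > 4    then treatment_B
#   else                                      treatment_C
example_regime: List[Clause] = [
    (lambda p: p["severity"] > 7.0, "treatment_A"),
    (lambda p: p["age"] < 40.0 and p["severity"] > 4.0, "treatment_B"),
]

recommended = apply_decision_list(
    example_regime, "treatment_C", {"severity": 5.0, "age": 35.0}
)
```

Because the clauses are evaluated in order, the list reads the same way a clinician would read the rule aloud, which is the interpretability property the abstract emphasizes.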
Affiliation(s)
- Yichi Zhang
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, U.S.A
| | - Eric B Laber
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, U.S.A
| | - Anastasios Tsiatis
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, U.S.A
| | - Marie Davidian
- Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, U.S.A
| |
|
32
|
Abstract
Chronic illness treatment strategies must adapt to the evolving health status of the patient receiving treatment. Data-driven dynamic treatment regimes can offer guidance for clinicians and intervention scientists on how to treat patients over time in order to bring about the most favorable clinical outcome on average. Methods for estimating optimal dynamic treatment regimes, such as Q-learning, typically require modeling nonsmooth, nonmonotone transformations of data. Thus, building well-fitting models can be challenging and in some cases may result in a poor estimate of the optimal treatment regime. Interactive Q-learning (IQ-learning) is an alternative to Q-learning that only requires modeling smooth, monotone transformations of the data. The R package iqLearn provides functions for implementing both the IQ-learning and Q-learning algorithms. We demonstrate how to estimate a two-stage optimal treatment policy with iqLearn using a generated data set bmiData which mimics a two-stage randomized body mass index reduction trial with binary treatments at each stage.
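As a rough illustration of the standard Q-learning procedure that the abstract contrasts with IQ-learning, the following Python sketch fits a two-stage Q-learning analysis with linear working models. It is not the iqLearn R package interface and does not use bmiData. The function names (`design`, `q_learning_two_stage`, `recommend`), the -1/+1 treatment coding, the choice of linear models, and the simulated data are all assumptions made for this example. The absolute value in the pseudo-outcome is an instance of the nonsmooth, nonmonotone transformation the abstract refers to.

```python
import numpy as np


def main_effects(H):
    """Prepend an intercept column to the history matrix H (n x p)."""
    return np.hstack([np.ones((H.shape[0], 1)), H])


def design(H, a):
    """Main-effect columns of H, followed by the same columns multiplied by treatment a (-1/+1)."""
    M = main_effects(H)
    return np.hstack([M, M * a[:, None]])


def fit_linear(X, y):
    """Ordinary least squares coefficients for y regressed on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def q_learning_two_stage(x1, a1, x2, a2, y):
    """Backward induction: fit the stage-2 Q-function, then the stage-1 Q-function."""
    # Stage 2: regress the final outcome on history (x1, a1, x2) and treatment a2.
    H2 = np.column_stack([x1, a1, x2])
    beta2 = fit_linear(design(H2, a2), y)
    M2 = main_effects(H2)
    k2 = M2.shape[1]
    # Pseudo-outcome: predicted outcome under the best stage-2 treatment.
    # With -1/+1 coding, maximizing over a2 turns the interaction term into an absolute value.
    pseudo = M2 @ beta2[:k2] + np.abs(M2 @ beta2[k2:])
    # Stage 1: regress the pseudo-outcome on baseline history x1 and treatment a1.
    H1 = np.column_stack([x1])
    beta1 = fit_linear(design(H1, a1), pseudo)
    return beta1, beta2


def recommend(beta, H):
    """Recommend +1 when the estimated treatment contrast is nonnegative, else -1."""
    M = main_effects(H)
    k = M.shape[1]
    return np.where(M @ beta[k:] >= 0, 1, -1)


# Simulated two-stage trial with equal randomization (illustrative data only).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
a1 = rng.choice([-1, 1], size=n)
x2 = rng.normal(size=n)
a2 = rng.choice([-1, 1], size=n)
y = 1 + 0.5 * x1 + 0.3 * a1 + 0.8 * x2 * a2

beta1, beta2 = q_learning_two_stage(x1, a1, x2, a2, y)
stage2_rule = recommend(beta2, np.column_stack([x1, a1, x2]))
```

In this simulated setting the stage-2 model is correctly specified, so the estimated stage-2 rule should closely follow the sign of `x2`. The stage-1 regression, by contrast, has to approximate a term involving `|x2|`, which is the modeling difficulty that motivates IQ-learning.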
|
33
|
Chakraborty B, Laber EB, Zhao YQ. Inference about the expected performance of a data-driven dynamic treatment regime. Clin Trials 2014; 11:408-417. [PMID: 24925083 DOI: 10.1177/1740774514537727] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/17/2022]
Abstract
BACKGROUND: A dynamic treatment regime (DTR) comprises a sequence of decision rules, one per stage of intervention, that recommends how to individualize treatment to patients based on evolving treatment and covariate history. These regimes are useful for managing chronic disorders, and fit into the larger paradigm of personalized medicine. The Value of a DTR is the expected outcome when the DTR is used to assign treatments to a population of interest.
PURPOSE: The Value of a data-driven DTR, estimated using data from a Sequential Multiple Assignment Randomized Trial, is both a data-dependent parameter and a non-smooth function of the underlying generative distribution. These features introduce additional variability that is not accounted for by standard methods for conducting statistical inference, for example, the bootstrap or normal approximations, if applied without adjustment. Our purpose is to provide a feasible method for constructing valid confidence intervals (CIs) for this quantity of practical interest.
METHODS: We propose a conceptually simple and computationally feasible method for constructing valid CIs for the Value of an estimated DTR based on subsampling. The method is self-tuning by virtue of an approach called the double bootstrap. We demonstrate the proposed method using a series of simulated experiments.
RESULTS: The proposed method offers considerable improvement in terms of coverage rates of the CIs over the standard bootstrap approach.
LIMITATIONS: In this article, we have restricted our attention to Q-learning for estimating the optimal DTR. However, other methods can be employed for this purpose; to keep the discussion focused, we have not explored these alternatives.
CONCLUSION: Subsampling-based CIs provide much better performance compared to standard bootstrap for the Value of an estimated DTR.
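The definition of the Value of a DTR can be illustrated with a small Python sketch that computes a point estimate from two-stage trial data by normalized inverse probability weighting. This is a standard estimator used here only as an assumption for illustration. It is not the subsampling confidence interval method proposed in the article, and it treats the regime as fixed rather than data-driven. The function name `ipw_value`, its arguments, and the toy data are hypothetical, and known, constant randomization probabilities are assumed.

```python
import numpy as np


def ipw_value(y, a1, a2, d1, d2, p1=0.5, p2=0.5):
    """Normalized inverse-probability-weighted estimate of the Value of a two-stage regime.

    y       observed final outcomes
    a1, a2  treatments actually received at stages 1 and 2
    d1, d2  treatments the regime would recommend for each participant
    p1, p2  known randomization probabilities of the received treatments (assumed constant)
    """
    y, a1, a2, d1, d2 = (np.asarray(v) for v in (y, a1, a2, d1, d2))
    # Only participants whose received treatments match the regime at both stages contribute.
    consistent = (a1 == d1) & (a2 == d2)
    weights = consistent / (p1 * p2)
    # Returns nan if no participant followed the regime.
    return float(np.sum(weights * y) / np.sum(weights))


# Toy data: four participants, regime "always assign +1 at both stages".
y = [1.0, 2.0, 3.0, 4.0]
a1 = [1, 1, -1, -1]
a2 = [1, -1, 1, -1]
always_plus = [1, 1, 1, 1]
value_hat = ipw_value(y, a1, a2, always_plus, always_plus)
```

With constant randomization probabilities the normalized estimator reduces to the mean outcome among participants whose observed treatments agree with the regime. The inferential difficulty described in the abstract arises once the regime itself is estimated from the same data.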
Affiliation(s)
- Bibhas Chakraborty
- Centre for Quantitative Medicine, Duke-NUS Graduate Medical School, Singapore, Singapore; Department of Biostatistics, Columbia University, New York, NY, USA
| | - Eric B Laber
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
| | - Ying-Qi Zhao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
| |
|
34
|
Abstract
A dynamic treatment regime consists of a sequence of decision rules, one per stage of intervention, that dictate how to individualize treatments to patients based on evolving treatment and covariate history. These regimes are particularly useful for managing chronic disorders, and fit well into the larger paradigm of personalized medicine. They provide one way to operationalize a clinical decision support system. Statistics plays a key role in the construction of evidence-based dynamic treatment regimes - informing best study design as well as efficient estimation and valid inference. Due to the many novel methodological challenges it offers, this area has been growing in popularity among statisticians in recent years. In this article, we review the key developments in this exciting field of research. In particular, we discuss the sequential multiple assignment randomized trial designs, estimation techniques like Q-learning and marginal structural models, and several inference techniques designed to address the associated non-standard asymptotics. We reference software, whenever available. We also outline some important future directions.
Affiliation(s)
| | - Susan A Murphy
- Department of Statistics and Institute for Social Research, University of Michigan, Ann Arbor, USA, 48109
| |
|
35
|
Moodie EEM, Dean N, Sun YR. Q-Learning: Flexible Learning About Useful Utilities. STATISTICS IN BIOSCIENCES 2013. [DOI: 10.1007/s12561-013-9103-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
|