1
Li W, Li R, Feng Z, Ning J. Dynamic and concordance-assisted learning for risk stratification with application to Alzheimer's disease. Biostatistics 2024:kxae036. [PMID: 39255368 DOI: 10.1093/biostatistics/kxae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 08/07/2024] [Accepted: 08/08/2024] [Indexed: 09/12/2024] Open
Abstract
Dynamic prediction models capable of retaining accuracy by evolving over time could play a significant role in monitoring disease progression in clinical practice. In biomedical studies with long-term follow-up, participants are often monitored through periodic clinical visits with repeat measurements until an occurrence of the event of interest (e.g. disease onset) or the study end. Acknowledging the dynamic nature of disease risk and clinical information contained in the longitudinal markers, we propose an innovative concordance-assisted learning algorithm to derive a real-time risk stratification score. The proposed approach bypasses the need to fit regression models, such as joint models of the longitudinal markers and time-to-event outcome, and hence enjoys the desirable property of model robustness. Simulation studies confirmed that the proposed method has satisfactory performance in dynamically monitoring the risk of developing disease and differentiating high-risk and low-risk populations over time. We apply the proposed method to the Alzheimer's Disease Neuroimaging Initiative data and develop a dynamic risk score of Alzheimer's Disease for patients with mild cognitive impairment using multiple longitudinal markers and baseline prognostic factors.
Affiliation(s)
- Wen Li
  - Division of Clinical and Translational Sciences, Department of Internal Medicine, The University of Texas McGovern Medical School, Houston, TX 77030, United States
- Ruosha Li
  - Department of Biostatistics and Data Science, The University of Texas School of Public Health, Houston, TX 77030, United States
- Ziding Feng
  - Department of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, United States
- Jing Ning
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
2
Rhodes G, Davidian M, Lu W. DYNAMIC PREDICTION OF RESIDUAL LIFE WITH LONGITUDINAL COVARIATES USING LONG SHORT-TERM MEMORY NETWORKS. Ann Appl Stat 2023; 17:2039-2058. [PMID: 38037614 PMCID: PMC10688566 DOI: 10.1214/22-aoas1706] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/02/2023]
Abstract
Sepsis, a complex medical condition that involves severe infections with life-threatening organ dysfunction, is a leading cause of death worldwide. Treatment of sepsis is highly challenging. When making treatment decisions, clinicians and patients desire accurate predictions of mean residual life (MRL) that leverage all available patient information, including longitudinal biomarker data. Biomarkers are biological, clinical, and other variables reflecting disease progression that are often measured repeatedly on patients in the clinical setting. Dynamic prediction methods leverage accruing biomarker measurements to improve performance, providing updated predictions as new measurements become available. We introduce two methods for dynamic prediction of MRL using longitudinal biomarkers. In both methods, we begin by using long short-term memory networks (LSTMs) to construct encoded representations of the biomarker trajectories, referred to as "context vectors." In our first method, the LSTM-GLM, we dynamically predict MRL via a transformed MRL model that includes the context vectors as covariates. In our second method, the LSTM-NN, we dynamically predict MRL from the context vectors using a feed-forward neural network. We demonstrate the improved performance of both proposed methods relative to competing methods in simulation studies. We apply the proposed methods to dynamically predict the restricted mean residual life (RMRL) of septic patients in the intensive care unit using electronic medical record data. We demonstrate that the LSTM-GLM and the LSTM-NN are useful tools for producing individualized, real-time predictions of RMRL that can help inform the treatment decisions of septic patients.
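For readers unfamiliar with the quantities named in this abstract, the standard definitions of MRL and RMRL can be written as follows. The notation is an assumption made for illustration and may differ from the paper's own: $T$ is the event time, $S$ its survival function, $\tau$ a fixed restriction time, and $\mathcal{H}(t)$ the biomarker history observed up to time $t$.

```latex
% Mean residual life at time t: expected remaining time given survival to t
m(t) = E[\,T - t \mid T > t\,] = \frac{\int_t^{\infty} S(u)\,du}{S(t)}

% Restricted mean residual life, truncated at tau (for t < tau)
m_\tau(t) = E[\,\min(T,\tau) - t \mid T > t\,] = \frac{\int_t^{\tau} S(u)\,du}{S(t)}

% Dynamic version: additionally condition on the biomarker history up to t
m_\tau(t \mid \mathcal{H}(t)) = E[\,\min(T,\tau) - t \mid T > t,\ \mathcal{H}(t)\,]
```

The restricted version is often preferred with censored data because the tail of $S$ beyond the follow-up period cannot be estimated reliably.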
Affiliation(s)
- Grace Rhodes
  - Department of Statistics, North Carolina State University
- Marie Davidian
  - Department of Statistics, North Carolina State University
- Wenbin Lu
  - Department of Statistics, North Carolina State University
3
Li W, Li L, Astor BC. A comparison of two approaches to dynamic prediction: Joint modeling and landmark modeling. Stat Med 2023; 42:2101-2115. [PMID: 36938960 DOI: 10.1002/sim.9713] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/07/2022] [Revised: 02/18/2023] [Accepted: 03/07/2023] [Indexed: 03/21/2023]
Abstract
Joint modeling and landmark modeling are two mainstream approaches to dynamic prediction in longitudinal studies, that is, the prediction of a clinical event using longitudinally measured predictor variables available up to the time of prediction. It is an important research question to the methodological research field and also to practical users to understand which approach can produce more accurate prediction. There were few previous studies on this topic, and the majority of results seemed to favor joint modeling. However, these studies were conducted in scenarios where the data were simulated from the joint models, partly due to the widely recognized methodological difficulty of establishing whether there exists a general joint distribution of longitudinal and survival data so that the landmark models, which consist of infinitely many working regression models for survival, hold simultaneously. As a result, the landmark models always worked under misspecification, which caused difficulty in interpreting the comparison. In this paper, we solve this problem by using a novel algorithm to generate longitudinal and survival data that satisfy the working assumptions of the landmark models. This innovation makes possible a "fair" comparison of joint modeling and landmark modeling in terms of model specification. Our simulation results demonstrate that the relative performance of these two modeling approaches depends on the data settings and one does not always dominate the other in terms of prediction accuracy. These findings stress the importance of methodological development for both approaches. The related methodology is illustrated with a kidney transplantation dataset.
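To make the landmark idea concrete, the following is a minimal sketch of how a landmark dataset might be assembled at a single prediction time. It is an illustration only, not the authors' algorithm: the `Subject` class, the `landmark_rows` function, and the field names are all hypothetical. Subjects who are no longer at risk at the landmark time are dropped, only measurements taken up to the landmark are kept, and follow-up is censored at the end of the prediction window.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    event_time: float                  # observed follow-up time (event or censoring)
    event: bool                        # True if the event was observed
    visits: list[tuple[float, float]]  # (visit time, marker value), sorted by time


def landmark_rows(subjects, landmark, horizon):
    """Build one row per subject still at risk at `landmark` (hypothetical helper)."""
    rows = []
    for s in subjects:
        if s.event_time <= landmark:
            continue  # event or censoring already occurred: not at risk
        # Keep only measurements available up to the time of prediction.
        history = [value for t, value in s.visits if t <= landmark]
        if not history:
            continue  # no predictor information yet
        window_end = landmark + horizon
        rows.append({
            "history": history,
            # Residual time from the landmark, censored at the window end.
            "time": min(s.event_time, window_end) - landmark,
            "event": s.event and s.event_time <= window_end,
        })
    return rows
```

A working regression model for survival would then be fitted to these rows, with a separate dataset (and possibly separate coefficients) for each landmark time.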
Affiliation(s)
- Wenhao Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Liang Li
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Brad C Astor
  - Departments of Medicine and Population Health Sciences, University of Wisconsin, Madison, Wisconsin
4
Sun Y, Chiou SH, Wu CO, McGarry M, Huang CY. DYNAMIC RISK PREDICTION TRIGGERED BY INTERMEDIATE EVENTS USING SURVIVAL TREE ENSEMBLES. Ann Appl Stat 2023; 17:1375-1397. [PMID: 37284167 PMCID: PMC10241448 DOI: 10.1214/22-aoas1674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, where an updated prediction can be performed when new information becomes available. Compared to conventional landmark prediction with fixed landmark times, our methods allow the landmark times to be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. In our framework, both the longitudinal predictors and the event time outcome are subject to right censoring, and thus existing tree-based approaches cannot be directly applied. To tackle the analytical challenges, we propose a risk-set-based ensemble procedure by averaging martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to the Cystic Fibrosis Foundation Patient Registry (CFFPR) data to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognostic factors.
Affiliation(s)
- Yifei Sun
  - Department of Biostatistics, Columbia University
- Sy Han Chiou
  - Department of Mathematical Sciences, University of Texas at Dallas
- Colin O Wu
  - National Heart, Lung, and Blood Institute, National Institutes of Health
- Meghan McGarry
  - Department of Pediatrics, University of California San Francisco
- Chiung-Yu Huang
  - Department of Epidemiology and Biostatistics, University of California San Francisco
5
Yao Y, Li L, Astor B, Yang W, Greene T. Predicting the risk of a clinical event using longitudinal data: the generalized landmark analysis. BMC Med Res Methodol 2023; 23:5. [PMID: 36611147 PMCID: PMC9824910 DOI: 10.1186/s12874-022-01828-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/02/2022] [Accepted: 12/22/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: In the development of prediction models for a clinical event, it is common to use static prediction modeling (SPM), a regression model that relates baseline predictors to the time to event. In many situations, the data used in training and validation are from longitudinal studies, where predictor variables are time-varying and measured at clinical visits. But these data are not used in SPM. The landmark analysis (LA), previously proposed for dynamic prediction with longitudinal data, has interpretational difficulty when the baseline is not a risk-changing clinical milestone, as is often the case in observational studies of chronic disease without intervention. METHODS: This paper studies the generalized landmark analysis (GLA), a statistical framework to develop prediction models for longitudinal data. The GLA includes the LA as a special case, and generalizes it to situations where the baseline is not a risk-changing clinical milestone with a more useful interpretation. Unlike the LA, the landmark variable does not have to be time since baseline in the GLA, but can be any time-varying prognostic variable. The GLA can also be viewed as a longitudinal generalization of localized prediction, which has been studied in the context of low-dimensional cross-sectional data. We studied the GLA using data from the Chronic Renal Insufficiency Cohort (CRIC) Study and the Wisconsin Allograft Replacement Database (WisARD) and compared the prediction performance of SPM and GLA. RESULTS: In various validation populations from longitudinal data, the GLA generally had similar or better predictive performance than SPM, with notable improvement being seen when the validation population deviated from the baseline population. The GLA also demonstrated similar or better predictive performance than LA, due to its more general model specification. CONCLUSIONS: GLA is a generalization of the LA such that the landmark variable does not have to be the time since baseline. It has a better interpretation when the baseline is not a risk-changing clinical milestone. The GLA is more adaptive to the validation population than SPM and is more flexible than LA, which may help produce more accurate prediction.
Affiliation(s)
- Yi Yao
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, US
- Liang Li
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, US
- Brad Astor
  - School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, US
- Wei Yang
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, US
- Tom Greene
  - School of Medicine, University of Utah, Salt Lake City, UT, US
6
Wang S, Li Z, Lan L, Zhao J, Zheng WJ, Li L. GPU Accelerated Estimation of a Shared Random Effect Joint Model for Dynamic Prediction. Comput Stat Data Anal 2022; 174:107528. [PMID: 39257897 PMCID: PMC11384271 DOI: 10.1016/j.csda.2022.107528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In longitudinal cohort studies, it is often of interest to predict the risk of a terminal clinical event using longitudinal predictor data among subjects at risk by the time of the prediction. The at-risk population changes over time; so does the association between predictors and the outcome, as well as the accumulating longitudinal predictor history. The dynamic nature of this prediction problem has received increasing interest in the literature, but computation often poses a challenge. The widely used joint model of longitudinal and survival data often comes with intensive computation and excessive model fitting time, due to numerical optimization and the analytically intractable high-dimensional integral in the likelihood function. This problem is exacerbated when the model is fit to a large dataset or the model involves multiple longitudinal predictors with nonlinear trajectories. This challenge can be addressed from an algorithmic perspective, by a novel two-stage estimation procedure, and from a computing perspective, by Graphics Processing Unit (GPU) programming. The latter is implemented through PyTorch, an emerging deep learning framework. The numerical studies demonstrate that the proposed algorithm and software can substantially speed up the estimation of the joint model, particularly with large datasets. The numerical studies also show that accounting for nonlinearity in longitudinal predictor trajectories can improve the prediction accuracy in comparison to joint modeling that ignores nonlinearity.
Affiliation(s)
- Shikun Wang
  - Department of Biostatistics, the University of Texas MD Anderson Cancer Center, United States
- Zhao Li
  - School of Biomedical Informatics, the University of Texas Health Science Center at Houston, United States
- Lan Lan
  - School of Biomedical Informatics, the University of Texas Health Science Center at Houston, United States
- Jieyi Zhao
  - School of Biomedical Informatics, the University of Texas Health Science Center at Houston, United States
- W Jim Zheng
  - School of Biomedical Informatics, the University of Texas Health Science Center at Houston, United States
- Liang Li
  - Department of Biostatistics, the University of Texas MD Anderson Cancer Center, United States
7
Kheirandish M, Catanzaro D, Crudu V, Zhang S. Integrating landmark modeling framework and machine learning algorithms for dynamic prediction of tuberculosis treatment outcomes. J Am Med Inform Assoc 2022; 29:900-908. [PMID: 35139541 PMCID: PMC9006704 DOI: 10.1093/jamia/ocac003] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/27/2021] [Revised: 12/15/2021] [Accepted: 01/27/2022] [Indexed: 11/12/2022] Open
Abstract
OBJECTIVE: This study aims to establish an informative dynamic prediction model of treatment outcomes using follow-up records of tuberculosis (TB) patients, one that can promptly detect cases in which the current treatment plan may not be effective. MATERIALS AND METHODS: We used 122 267 follow-up records from 17 958 new cases of pulmonary TB in the Republic of Moldova. A dynamic prediction framework integrating landmark modeling and machine learning algorithms was designed to predict patient outcomes during the course of treatment. Sensitivity and positive predictive value (PPV) were calculated to evaluate performance of the model at critical time points. New measures were defined to determine when follow-up laboratory tests should be conducted to obtain the most informative results. RESULTS: The random-forest algorithm performed better than support vector machine and penalized multinomial logistic regression models for predicting TB treatment outcomes. For all 3 outcome classes (ie, cured, not cured, and died after 24 months following treatment initiation), sensitivity and PPV of prediction models improved as more follow-up information was collected. Specifically, sensitivity and PPV increased from 0.55 to 0.84 and from 0.32 to 0.88, respectively, for the not cured class. CONCLUSION: The dynamic prediction framework utilizes longitudinal laboratory test results to predict patient outcomes at various landmarks. Sputum culture and smear results are among the important variables for prediction; however, the most recent sputum result is not always the most informative one. This framework can potentially facilitate a more effective treatment monitoring program and provide insights for policymakers toward improved guidelines on follow-up tests.
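The two evaluation metrics named in this abstract have standard per-class definitions for a multi-class outcome: sensitivity is the share of true members of a class that are predicted as that class, and PPV is the share of predictions of a class that are correct. The sketch below illustrates those definitions only. The function name `sensitivity_and_ppv` is hypothetical, and this is not the study's own code.

```python
from collections import Counter


def sensitivity_and_ppv(y_true, y_pred):
    """Return {class: (sensitivity, ppv)} for a multi-class prediction."""
    # Correct predictions, counted per class.
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    actual = Counter(y_true)     # how often each class truly occurs
    predicted = Counter(y_pred)  # how often each class is predicted
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        sens = true_pos[label] / actual[label] if actual[label] else float("nan")
        ppv = true_pos[label] / predicted[label] if predicted[label] else float("nan")
        result[label] = (sens, ppv)
    return result
```

Computing these values separately at each landmark time gives the kind of trajectory the abstract reports, where both metrics change as more follow-up information accrues.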
Affiliation(s)
- Maryam Kheirandish
  - Department of Industrial Engineering, University of Arkansas, Fayetteville, Arkansas, USA
- Donald Catanzaro
  - Department of Biological Sciences, University of Arkansas, Fayetteville, Arkansas, USA
- Valeriu Crudu
  - Institute of Phthisiopneumology “Chiril Draganiuc,” Chisinau, Moldova
  - State University of Medicine and Pharmacy “Nicolae Testemitanu,” Chisinau, Moldova
- Shengfan Zhang
  - Department of Industrial Engineering, University of Arkansas, Fayetteville, Arkansas, USA
8
Wu C, Li L, Li R. Dynamic prediction of competing risk events using landmark sub-distribution hazard model with multiple longitudinal biomarkers. Stat Methods Med Res 2020; 29:3179-3191. [PMID: 32419611 PMCID: PMC10469606 DOI: 10.1177/0962280220921553] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
The cause-specific cumulative incidence function quantifies the subject-specific disease risk with competing risk outcomes. With longitudinally collected biomarker data, it is of interest to dynamically update the predicted cumulative incidence function by incorporating the most recent biomarker as well as the accumulating longitudinal history. Motivated by a longitudinal cohort study of chronic kidney disease, we propose a framework for dynamic prediction of end-stage renal disease using multivariate longitudinal biomarkers, accounting for the competing risk of death. The proposed framework extends the local estimation-based landmark survival modeling to competing risks data, and implies that a distinct sub-distribution hazard regression model is defined at each biomarker measurement time. The model parameters, prediction horizon, longitudinal history and at-risk population are allowed to vary over the landmark time. When the measurement times of biomarkers are irregularly spaced, the predictor variable may not be observed at the time of prediction. Local polynomial estimation is used to estimate the model parameters without explicitly imputing the predictor or modeling its longitudinal trajectory. The proposed model leads to simple interpretation of the regression coefficients and closed-form calculation of the predicted cumulative incidence function. The estimation and prediction can be implemented through standard statistical software with tractable computation. We conducted simulations to evaluate the performance of the estimation procedure and predictive accuracy. The methodology is illustrated with data from the African American Study of Kidney Disease and Hypertension.
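The two quantities at the center of this abstract have standard textbook definitions, written out here for orientation. The notation is an assumption for illustration and is not taken from the paper: $T$ is the time to the first event of any cause, $\epsilon$ the cause indicator, $s$ the landmark time, $\tau$ the prediction horizon, and $\mathcal{H}(s)$ the biomarker history up to $s$.

```latex
% Cumulative incidence function for cause k
F_k(t) = P(T \le t,\ \epsilon = k)

% Sub-distribution hazard for cause k (Fine-Gray)
\lambda_k(t) = -\frac{d}{dt}\,\log\{1 - F_k(t)\}

% Dynamic (landmark) version: risk of cause k within the horizon tau,
% among subjects still event-free at the landmark time s
F_k(s + \tau \mid s) = P(T \le s + \tau,\ \epsilon = k \mid T > s,\ \mathcal{H}(s))
```

Modeling the sub-distribution hazard rather than the cause-specific hazard gives a one-to-one link between the hazard and $F_k$, which is what allows a closed-form predicted cumulative incidence.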
Affiliation(s)
- Cai Wu
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, USA
  - Department of Biostatistics, University of Texas School of Public Health, Houston, USA
- Liang Li
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, USA
- Ruosha Li
  - Department of Biostatistics, University of Texas School of Public Health, Houston, USA
9
Zhu Y, Huang X, Li L. Dynamic prediction of time to a clinical event with sparse and irregularly measured longitudinal biomarkers. Biom J 2020; 62:1371-1393. [PMID: 32196728 PMCID: PMC7502505 DOI: 10.1002/bimj.201900112] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/12/2019] [Revised: 12/13/2019] [Accepted: 12/18/2019] [Indexed: 12/21/2022]
Abstract
In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post-baseline time. A simple solution is the last-value-carry-forward method, but this may result in bias for the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high-dimensional integrals without a closed-form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of residual time-to-event occurrence. The measurement schemes for multiple biomarkers are allowed to be different within subject and across subjects. Dynamic prediction can be performed in a real-time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal functions.
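The last-value-carry-forward method mentioned in this abstract is simple to state in code, which also makes its weakness visible. The sketch below is illustrative only: the function name `last_value_carried_forward` is hypothetical, and it assumes the visit times are sorted in ascending order. Besides the carried-forward value, it returns how old that measurement is at the prediction time.

```python
from bisect import bisect_right


def last_value_carried_forward(times, values, t):
    """Return (value, staleness) of the latest measurement at or before t.

    `times` must be sorted ascending; returns None if nothing was measured yet.
    """
    i = bisect_right(times, t)  # number of visits with time <= t
    if i == 0:
        return None
    return values[i - 1], t - times[i - 1]
```

With irregular visit schedules the staleness differs from subject to subject, so the carried-forward value is a poorer proxy for the current biomarker level in some subjects than in others. That is one intuitive reading of the bias the abstract refers to.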
Affiliation(s)
- Yayuan Zhu
  - The Department of Epidemiology and Biostatistics, University of Western Ontario, London, ON, Canada
- Xuelin Huang
  - The Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Liang Li
  - The Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States