Devaux A, Helmer C, Genuer R, Proust-Lima C. Random survival forests with multivariate longitudinal endogenous covariates. Stat Methods Med Res 2023; 32:2331-2346. [PMID: 37886845 DOI: 10.1177/09622802231206477]
[Indexed: 10/28/2023]
Abstract
Predicting the individual risk of clinical events using the complete patient history is a major challenge in personalized medicine. Analytical methods have to account for a possibly large number of time-dependent predictors, which are often characterized by irregular and error-prone measurements, and are truncated early by the event. In this work, we extended competing-risk random survival forests to handle such endogenous longitudinal predictors when predicting event probabilities. The method, implemented in the R package DynForest, internally transforms the time-dependent predictors at each node of each tree into time-fixed features (using mixed models) that can then be used as splitting candidates. The final individual event probability is computed as the average of leaf-specific Aalen-Johansen estimators over the trees. Using simulations, we compared the predictive performance of DynForest with (i) a joint modeling alternative when considering only two longitudinal predictors, and (ii) a regression calibration method that ignores the informative truncation by the event when dealing with a large number of longitudinal predictors. Through an application in dementia research, we also illustrated how DynForest can be used to develop a dynamic prediction tool for dementia from multimodal repeated markers, and to quantify the importance of each marker.
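The final averaging step described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the DynForest API (DynForest is an R package). The function names `aalen_johansen_cif` and `forest_cif`, the coding of censoring as cause 0, and the representation of each tree by the training subjects in the leaf reached by the new individual are all assumptions made here. Tree growing and the mixed-model feature extraction are not shown.

```python
from typing import List, Sequence, Tuple


def aalen_johansen_cif(
    times: Sequence[float],
    causes: Sequence[int],
    cause: int,
    horizon: float,
) -> float:
    """Aalen-Johansen cumulative incidence of `cause` by time `horizon`.

    Assumed coding (hypothetical): causes[i] == 0 means censored,
    any positive integer is an event type.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # probability of being free of any event just before t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_cause = d_any = n_leave = 0
        # Group all subjects sharing the same observed time t.
        while i < len(data) and data[i][0] == t:
            c = data[i][1]
            n_leave += 1
            if c != 0:
                d_any += 1
            if c == cause:
                d_cause += 1
            i += 1
        # Increment: P(event-free before t) * cause-specific hazard at t.
        cif += surv * d_cause / n_at_risk
        # Update overall event-free survival with all-cause events at t.
        surv *= 1.0 - d_any / n_at_risk
        n_at_risk -= n_leave
    return cif


def forest_cif(
    leaves: List[Tuple[Sequence[float], Sequence[int]]],
    cause: int,
    horizon: float,
) -> float:
    """Average the leaf-specific estimates over trees.

    `leaves` holds one (times, causes) pair per tree: the training
    subjects in the leaf where the new individual lands (assumed input).
    """
    per_tree = [aalen_johansen_cif(t, c, cause, horizon) for t, c in leaves]
    return sum(per_tree) / len(per_tree)


# Toy example with two trees and made-up leaf contents.
leaf_a = ([1.0, 2.0, 3.0, 4.0], [1, 0, 2, 1])
leaf_b = ([2.0, 5.0], [1, 1])
risk = forest_cif([leaf_a, leaf_b], cause=1, horizon=5.0)
```

Within a leaf, the cumulative incidences of all causes sum to at most one, which is what distinguishes the Aalen-Johansen estimator from applying one minus Kaplan-Meier to each cause separately.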