van de Wiel MA, Amestoy M, Hoogland J. Linked shrinkage to improve estimation of interaction effects in regression models. EPIDEMIOLOGIC METHODS 2024;13:20230039. [PMID: 38989109; PMCID: PMC11232106; DOI: 10.1515/em-2023-0039]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 06/22/2024] [Indexed: 07/12/2024]
Abstract
Objectives
Adding two-way interactions to a regression model is a classic problem in statistics, and it comes with the challenge that the number of candidate terms grows quadratically with the number of covariates. We aim a) to devise an estimation method that can handle this challenge and b) to aid interpretation of the resulting model by developing computational tools for quantifying variable importance.
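To make the dimensionality challenge concrete, the following minimal sketch counts the terms in a model with all main effects and all two-way interactions. The function name `n_terms` and the example values of p are illustrative choices, not taken from the paper.

```python
from math import comb


def n_terms(p: int) -> dict:
    """Count main effects and two-way interactions for p covariates."""
    interactions = comb(p, 2)  # p * (p - 1) / 2 unordered pairs
    return {"main": p, "interactions": interactions, "total": p + interactions}


# Illustrative values of p (hypothetical, not from the paper).
for p in (5, 10, 20, 40):
    print(p, n_terms(p))
```

Doubling p roughly quadruples the number of interaction terms, which is why unstructured estimation quickly becomes unstable at a moderate sample size.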
Methods
Existing strategies typically overcome the dimensionality problem by only allowing interactions between relevant main effects. Building on this philosophy, and aiming for settings with a moderate n-to-p ratio (sample size relative to the number of covariates), we develop a local shrinkage model that links the shrinkage of interaction effects to the shrinkage of their corresponding main effects. In addition, we derive a new analytical formula for the Shapley value, which allows rapid assessment of individual-specific variable importance scores and their uncertainties.
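The idea of linking shrinkage can be illustrated with a small prior simulation. The abstract does not state the functional form of the link, so the sketch below assumes one purely for illustration: the prior standard deviation of an interaction is a scale `tau_int` times the geometric mean of the local prior standard deviations of its two main effects. The names `linked_prior_sds`, `draw_coefficients`, `local_sd`, and `tau_int` are hypothetical, and this is not the paper's RStan implementation.

```python
import math
import random


def linked_prior_sds(local_sd, tau_int):
    """Return prior sds for main effects and for each interaction (j, k), j < k.

    ASSUMPTION (illustrative only): the interaction sd equals
    tau_int * sqrt(local_sd[j] * local_sd[k]), so an interaction is
    shrunk strongly whenever either parent main effect is shrunk strongly.
    """
    p = len(local_sd)
    inter = {
        (j, k): tau_int * math.sqrt(local_sd[j] * local_sd[k])
        for j in range(p)
        for k in range(j + 1, p)
    }
    return list(local_sd), inter


def draw_coefficients(local_sd, tau_int, seed=1):
    """Draw one set of coefficients from the zero-mean Gaussian prior."""
    rng = random.Random(seed)
    main_sd, inter_sd = linked_prior_sds(local_sd, tau_int)
    beta_main = [rng.gauss(0.0, s) for s in main_sd]
    beta_int = {pair: rng.gauss(0.0, s) for pair, s in inter_sd.items()}
    return beta_main, beta_int


# Hypothetical local scales: covariate 2 is shrunk almost to zero.
local_sd = [2.0, 1.0, 0.01]
main_sd, inter_sd = linked_prior_sds(local_sd, tau_int=0.5)
for pair, s in sorted(inter_sd.items()):
    print(pair, round(s, 4))
```

Under this assumed link, every interaction involving the heavily shrunk covariate inherits a small prior scale, whereas the interaction between the two weakly shrunk covariates remains free to be large. In a full Bayesian model the local scales would themselves receive priors and be learned from the data.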
Results
We empirically demonstrate that our approach provides accurate estimates of the model parameters and very competitive predictive accuracy. In our Bayesian framework, estimation inherently comes with inference, which facilitates variable selection. Comparisons with key competitors are provided. Large-scale cohort data are used to provide realistic illustrations and evaluations. The implementation of our method in RStan is relatively straightforward and flexible, allowing for adaptation to specific needs.
Conclusions
Our method is an attractive alternative to existing strategies for handling interactions in epidemiological and/or clinical studies, as its linked local shrinkage can improve parameter accuracy, prediction and variable selection. Moreover, it provides appropriate inference and interpretation, and may compete well with less interpretable machine learners in terms of prediction.