Grogan TR, Elashoff DA. A simulation based method for assessing the statistical significance of logistic regression models after common variable selection procedures.
COMMUN STAT-SIMUL C 2016;46:7180-7193. [PMID: 29225408 PMCID: PMC5722241 DOI: 10.1080/03610918.2016.1230216]
[Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/26/2016] [Accepted: 08/22/2016] [Indexed: 10/20/2022]
Abstract
Classification models can demonstrate apparent prediction accuracy even when there is no underlying relationship between the predictors and the response. Variable selection procedures can lead to false positive variable selections and overestimation of true model performance. A simulation study was conducted using logistic regression with forward stepwise, best subsets, and LASSO variable selection methods with varying total sample sizes (20, 50, 100, 200) and numbers of random noise predictor variables (3, 5, 10, 15, 20, 50). Using our critical values can help reduce needless follow-up on variables having no true association with the outcome.
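The kind of null simulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it replaces forward stepwise, best subsets, and LASSO with a single selection step (keeping the one noise predictor with the highest apparent AUC), assumes a balanced binary outcome and standard normal predictors, and treats a "critical value" as the empirical 95th percentile of the best apparent AUC under the null. The function names and these settings are hypothetical.

```python
import random

def auc(scores, labels):
    """Apparent AUC via the Mann-Whitney pair count (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def best_null_auc(n, n_noise, rng):
    """Simulate one null data set and return the best single-predictor AUC."""
    # Assumption: balanced outcome, generated independently of every predictor.
    labels = [1] * (n // 2) + [0] * (n - n // 2)
    best = 0.5
    for _ in range(n_noise):
        # Assumption: pure-noise predictors drawn from a standard normal.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a = auc(x, labels)
        # A fitted one-variable model may use either direction of the predictor.
        best = max(best, a, 1.0 - a)
    return best

def null_critical_value(n, n_noise, n_sims=200, level=0.95, seed=0):
    """Empirical `level` quantile of the best apparent AUC under the null."""
    rng = random.Random(seed)
    sims = sorted(best_null_auc(n, n_noise, rng) for _ in range(n_sims))
    return sims[min(n_sims - 1, int(level * n_sims))]

if __name__ == "__main__":
    # Sample sizes and predictor counts taken from the grid in the abstract.
    for n, k in [(20, 20), (50, 10), (200, 3)]:
        print(n, k, round(null_critical_value(n, k), 3))
```

Under these assumptions, an apparent AUC below the simulated threshold for a given sample size and number of candidate predictors would be consistent with selection from pure noise. A fuller version would refit the logistic model with the actual selection method inside each simulated data set.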