Cheng X, Wang H. A generic model-free feature screening procedure for ultra-high dimensional data with categorical response. Computer Methods and Programs in Biomedicine 2023; 229:107269. [PMID: 36463676 DOI: 10.1016/j.cmpb.2022.107269]
[Received: 07/12/2022] [Revised: 11/22/2022] [Accepted: 11/23/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND AND OBJECTIVE
Identifying active features from ultra-high dimensional data is one of the primary and vital tasks in statistical learning and biological discovery.
METHODS
In this paper, we develop a generic concordance index screening (CI-SIS) procedure for ultra-high dimensional data with a categorical response. The proposed procedure is model-free and nonparametric, and it is built on the concordance index. It enjoys both the sure screening and ranking consistency properties under relatively weak assumptions. We investigate the flexibility of the procedure in several challenging settings commonly encountered in biomedical studies, such as category-adaptive data and extremely unbalanced response distributions. A data-driven threshold selection procedure based on knockoff features is also presented.
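The abstract does not give the exact CI-SIS statistic, so the following is only a minimal sketch of what marginal screening with a concordance index could look like, not the authors' implementation. It assumes a one-vs-rest concordance index (equivalent to the AUC, computed from Mann-Whitney rank sums), scores each feature by its largest deviation from 0.5 across classes, and keeps the top d features. The function names, the one-vs-rest aggregation, the fixed cutoff d, and the synthetic data are all illustrative assumptions; the knockoff-based threshold is not covered.

```python
import numpy as np
from scipy.stats import rankdata


def concordance_index(x, is_pos):
    """Estimate P(x_pos > x_neg) + 0.5 * P(tie) from rank sums (Mann-Whitney)."""
    n_pos = int(is_pos.sum())
    n_neg = is_pos.size - n_pos
    ranks = rankdata(x)  # ties receive average ranks
    u = ranks[is_pos].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def ci_screen_scores(X, y):
    """Assumed marginal utility: max over classes of |one-vs-rest CI - 0.5|."""
    classes = np.unique(y)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        scores[j] = max(
            abs(concordance_index(X[:, j], y == c) - 0.5) for c in classes
        )
    return scores


def top_features(X, y, d):
    """Return the indices of the d features with the largest scores."""
    return np.argsort(-ci_screen_scores(X, y))[:d]


# Synthetic illustration (hypothetical data): 3 balanced classes,
# 500 features, of which only features 0 and 1 depend on the class label.
rng = np.random.default_rng(0)
n, p = 120, 500
y = np.arange(n) % 3
X = rng.standard_normal((n, p))
X[:, 0] += 2.0 * (y == 0)   # feature 0 is shifted upward in class 0
X[:, 1] -= 2.0 * (y == 2)   # feature 1 is shifted downward in class 2
selected = top_features(X, y, d=10)
```

Because the score depends only on ranks, it is unchanged by monotone transformations of a feature, which is one reason a rank-based utility fits a model-free screening setting.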
RESULTS
On the real lung dataset, our method achieves a lower prediction error, with mean errors of 0.107 using linear discriminant analysis (LDA) and 0.117 using random forest (RF). In addition, it improves accuracy by 3% with LDA and 5% with RF compared to the runner-up method. On the more challenging SRBCT (small round blue cell tumours) dataset, CI-SIS yields a performance improvement of at least 8% over all other competing methods.
CONCLUSION
Experimental results show that the proposed method can efficiently identify genes that are associated with certain types of diseases. The features that survive screening, after irrelevant features have been filtered out, can therefore help doctors make more precise diagnoses and refine treatments for patients.