Cao X, Tsang IW, Xu J. Cold-Start Active Sampling Via γ-Tube. IEEE Transactions on Cybernetics 2022; 52:6034-6045. [PMID: 33878008 DOI: 10.1109/tcyb.2021.3069956]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Active learning (AL) improves the generalization performance of the current classification hypothesis by querying labels from a pool of unlabeled data. The sampling process is typically assessed by an informative, representative, or diverse evaluation policy. However, such a policy needs an initial labeled set to start, so its performance may degrade under a cold-start hypothesis. In this article, we first show that typical AL sampling can be equivalently formulated as geometric sampling over minimum enclosing balls (MEBs) of clusters. (In this article, an MEB denotes a conceptual geometry over the cluster in generalization analysis. In the SVM community, it is related to hard-margin support vector data description.) Following the γ-tube structure in geometric clustering, we then divide one MEB covering a cluster into two parts: 1) a γ-tube and 2) a γ-ball. By estimating the error disagreement between sampling in the MEB and in the γ-ball, our theoretical insight reveals that the γ-tube can effectively measure the disagreement between hypotheses in the original space over the MEB and in the sampling space over the γ-ball. To tighten this insight, we present a generalization analysis, and the results show that sampling in the γ-tube yields a higher probability bound for achieving a nearly zero generalization error. With these analyses, we finally apply the informative sampling policy of AL over the γ-tube to present a tube AL (TAL) algorithm that addresses the cold-start sampling issue. As a result, the dependency between the querying process and the evaluation policy of active sampling can be alleviated. Experimental results show that, by using the γ-tube structure to deal with cold-start sampling, TAL achieves substantial accuracy improvements over standard AL evaluation baselines. An application to image edge recognition extends our theoretical results.
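The split of one enclosing ball into an inner ball and an outer tube can be illustrated with a small sketch. This is only one possible reading of the abstract, not the TAL algorithm itself. The function name `split_ball` is hypothetical, the cluster centroid stands in for the true MEB center, and the inner ball is assumed to be concentric with radius (1 - γ) times the outer radius. The abstract confirms none of these choices.

```python
import math
import random


def split_ball(points, gamma):
    """Split a cluster into an inner ball and an outer tube (shell).

    Assumptions (not taken from the paper):
      * the centroid approximates the center of the enclosing ball;
      * the inner ball is concentric with radius (1 - gamma) * radius;
      * a point belongs to the tube if it lies outside the inner ball.
    """
    n = len(points)
    dim = len(points[0])
    # Centroid as a stand-in for the enclosing-ball center.
    center = [sum(p[i] for p in points) / n for i in range(dim)]
    # Distance of every point to that center.
    dists = [math.dist(p, center) for p in points]
    # Radius of the smallest centroid-centered ball covering the cluster.
    radius = max(dists)
    inner_radius = (1.0 - gamma) * radius
    # True for points in the outer shell, False for points in the inner ball.
    in_tube = [d > inner_radius for d in dists]
    return center, radius, in_tube


# Toy usage: a synthetic 2-D cluster; tube points are the query candidates.
rng = random.Random(0)
cluster = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
center, radius, in_tube = split_ball(cluster, gamma=0.3)
tube_candidates = [p for p, flag in zip(cluster, in_tube) if flag]
```

Under these assumptions, `tube_candidates` is the subset an informative sampling policy would draw queries from. A real implementation would need the paper's own definitions of the MEB and of γ.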