Nock R, Nielsen F. Bregman divergences and surrogates for learning.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2009;
31:2048-2059. [PMID:
19762930 DOI:
10.1109/tpami.2008.225]
[Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Bartlett et al. (2006) recently proved that a ground condition for surrogates, classification calibration, ties up their consistent minimization to that of the classification risk, and left as an important problem the algorithmic questions about their minimization. In this paper, we address this problem for a wide set which lies at the intersection of classification calibrated surrogates and those of Murata et al. (2004). This set coincides with those satisfying three common assumptions about surrogates. Equivalent expressions for the members-sometimes well known-follow for convex and concave surrogates, frequently used in the induction of linear separators and decision trees. Most notably, they share remarkable algorithmic features: for each of these two types of classifiers, we give a minimization algorithm provably converging to the minimum of any such surrogate. While seemingly different, we show that these algorithms are offshoots of the same "master" algorithm. This provides a new and broad unified account of different popular algorithms, including additive regression with the squared loss, the logistic loss, and the top-down induction performed in CART, C4.5. Moreover, we show that the induction enjoys the most popular boosting features, regardless of the surrogate. Experiments are provided on 40 readily available domains.
Collapse