NIPS Workshop on The Accuracy-Regularization Frontier

Event Detail

General Information
Dates:
Friday, December 9, 2005
Days of Week:
Friday
Target Audience:
Academic and Practice
Location:
INFORMS Optimization Society Westin Resort and Spa, Whistler, BC, Canada
Event Details/Other Comments:

CALL FOR CONTRIBUTIONS
A prevalent approach in machine learning for achieving good generalization performance is to seek a predictor that, on one hand, attains low empirical error, and on the other hand, is "simple", as measured by some regularizer, and is therefore expected to generalize well.
Consider, for example, support vector machines, where one seeks a linear classifier with low empirical error and low L2-norm (corresponding to a large geometrical margin). The appropriate trade-off between the empirical error and the regularizer (e.g. L2-norm) is not known in advance. But since we would like to minimize both, we can limit our attention to extreme solutions, i.e. classifiers such that one cannot reduce both the empirical error and the regularizer (norm) at the same time.
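The notion of an "extreme" solution can be illustrated with a small sketch. It assumes the standard Pareto-dominance reading (a pair is dropped when some other pair is at least as good in both coordinates and differs from it); the function name `pareto_frontier` and the sample (error, norm) values are hypothetical and serve only as illustration.

```python
def pareto_frontier(points):
    """Return the (error, norm) pairs that no other pair dominates.

    A pair p dominates q when p is no worse than q in both coordinates
    and p differs from q (so p is strictly better in at least one).
    """
    frontier = []
    for q in points:
        dominated = any(
            p[0] <= q[0] and p[1] <= q[1] and p != q
            for p in points
        )
        if not dominated:
            frontier.append(q)
    # Sort by error so the frontier reads from low error to high error.
    return sorted(set(frontier))


# Hypothetical (empirical error, norm) pairs for five candidate classifiers.
candidates = [(0.30, 1.0), (0.20, 2.0), (0.25, 2.5), (0.10, 4.0), (0.10, 5.0)]
frontier = pareto_frontier(candidates)
print(frontier)
```

Here (0.25, 2.5) is discarded because (0.20, 2.0) improves on it in both coordinates, and (0.10, 5.0) is discarded because (0.10, 4.0) matches its error with a smaller norm.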
Considering the set of attainable (error, norm) combinations, we are interested only in the extreme "frontier" (or "regularization path") of this set. The typical approach is to evaluate classifiers along the frontier on held-out validation data (or to cross-validate) and choose the classifier minimizing the validation error.
Classifiers along the frontier are typically found by minimizing some parametric combination of the empirical error and the regularizer, e.g. norm^2 + C*err, for varying C, in the case of SVMs. Different values of C yield different classifiers along the frontier, and C can be thought of as parameterizing the frontier. This particular parametric function of the empirical error and the regularizer is chosen because it leads to a convenient optimization problem, but minimizing any other monotone function of the empirical error and regularizer (in this case, the L2-norm) would also lead to classifiers on the frontier.
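As a rough illustration of how varying C traces out the frontier, the sketch below minimizes norm^2 + C*err over a grid of C values. To keep it short and closed-form, it swaps the SVM's hinge loss for squared error (so it is a ridge-regression analogue, not an SVM), and the data are synthetic. The function name `trace_frontier` and all parameter choices are assumptions made for the example.

```python
import numpy as np


def trace_frontier(X, y, Cs):
    """For each C, minimize ||w||^2 + C * ||Xw - y||^2 in closed form.

    Returns a list of (C, err, norm_sq) triples, one per value of C.
    """
    d = X.shape[1]
    points = []
    for C in Cs:
        # Setting the gradient to zero gives (I + C X^T X) w = C X^T y.
        w = np.linalg.solve(np.eye(d) + C * X.T @ X, C * X.T @ y)
        err = float(np.sum((X @ w - y) ** 2))  # empirical (squared) error
        norm_sq = float(w @ w)                 # regularizer ||w||^2
        points.append((float(C), err, norm_sq))
    return points


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)

path = trace_frontier(X, y, Cs=np.logspace(-3, 3, 13))
for C, err, norm_sq in path:
    print(f"C={C:9.3f}  err={err:10.4f}  norm^2={norm_sq:8.4f}")
```

As C grows, the error term shrinks while the norm grows, so each C picks out a different point on the (error, norm) frontier.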
Recently, methods have been proposed for obtaining the entire frontier in computation time that is comparable to obtaining a single classifier along the frontier.
The proposed workshop is concerned with optimization and statistical issues related to viewing the entire frontier, rather than a single predictor along it, as an object of interest in machine learning.
Specific issues to be addressed include:
1. Characterizing the "frontier" in a way independent of a specific
trade-off, and its properties as such, e.g. convexity, smoothness,
piecewise linearity/polynomial behavior.
2. What parametric trade-offs capture the entire frontier? Minimizing
any monotone trade-off leads to a predictor on the frontier, but what
conditions must be met to ensure all predictors along the frontier
are obtained when the regularization parameter is varied? Study of
this question is motivated by scenarios in which minimizing a
non-standard parametric trade-off leads to a more convenient
optimization problem.
3. Methods for obtaining the frontier:
3a. Direct methods relying on a characterization, e.g. Hastie et al.'s
(2004) work on the entire regularization path of Support Vector
Machines.
3b.