I had an interesting exchange with Bob regarding the differences between statistics and machine learning. If it were just differences in jargon, it would be no big deal—you could just translate back and forth—but it’s trickier than that, because the two subfields also have different priorities and concepts.
Learning linear predictors with the logistic loss—both in stochastic and online settings—is a fundamental task in machine learning and statistics, with direct connections to classification and boosting. Existing “fast rates” for this setting exhibit exponential dependence on the predictor norm, and Hazan et al. (2014) showed that this is unfortunately unimprovable. Starting with the simple observation that the logistic loss is 1-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm. This provides a positive resolution to a variant of the COLT 2012 open problem of McMahan and Streeter when improper learning is allowed. This improvement is obtained both in the online setting and, with some extra work, in the batch statistical setting with high probability. Leveraging this improved dependency on the predictor norm yields algorithms with tighter regret bounds for online bandit multiclass learning with the logistic loss, and for online multiclass boosting. Finally, we give information-theoretic bounds on the optimal rates for improper logistic regression with general function classes, thereby characterizing the extent to which our improvement for linear classes extends to other parametric and even nonparametric settings. This is joint work with Dylan J. Foster, Haipeng Luo, Mehryar Mohri and Karthik Sridharan.
“Learning linear predictors”: does this mean estimating coefficients, or deciding what predictors to include in the model?