using scoring rules as loss functions
Oct. 8th, 2007 02:19 pm
The way classification problems are usually framed, the algorithm outputs a class label, and the loss function is a function of the number of errors of each type, such that all partial derivatives are positive (i.e. an extra error is always bad, regardless of its type, but some types may be worse than others).
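To make the usual framing concrete, here is a minimal sketch of one such loss: a weighted sum of error counts. The function name `error_count_loss`, the labels, and the cost values are all made up for illustration. A linear loss is only the simplest case in which every partial derivative is positive.

```python
from collections import Counter

def error_count_loss(y_true, y_pred, costs):
    """Weighted sum of error counts.

    costs maps (true_label, predicted_label) -> a positive cost, so each
    extra error of any type increases the loss.
    """
    # Count how many errors of each (true, predicted) type occurred.
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return sum(costs[pair] * n for pair, n in errors.items())

# Hypothetical costs: flagging real mail as spam is 5x worse than the reverse.
costs = {("spam", "ham"): 1.0, ("ham", "spam"): 5.0}
y_true = ["spam", "ham", "ham", "spam"]
y_pred = ["ham", "ham", "spam", "spam"]

loss = error_count_loss(y_true, y_pred, costs)  # one error of each type
```

Note that this loss only ever sees the labels, not how confident the algorithm was in them.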
But this is a lossy process: when an algorithm outputs a label, we don't know if it was 99% confident or 60% confident. It might often be better for the algorithm to output its full belief, especially when there is little data.
I propose that we generalize the classification framework, so that algorithms output a multinomial probability distribution, instead of a single label. The loss function now becomes defined by a scoring rule.
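As a rough sketch of what this could look like, here are two standard proper scoring rules, the logarithmic score and the Brier score, written as losses over a predicted distribution. The choice of these two rules, the function names, and the example probabilities are my own assumptions for illustration; the proposal itself does not commit to a particular rule.

```python
import math

def log_loss(probs, true_label):
    """Logarithmic scoring rule as a loss: -log of the probability
    assigned to the label that actually occurred."""
    return -math.log(probs[true_label])

def brier_loss(probs, true_label):
    """Brier (quadratic) scoring rule: squared distance between the
    predicted distribution and the one-hot outcome."""
    return sum(
        (p - (1.0 if label == true_label else 0.0)) ** 2
        for label, p in probs.items()
    )

# Two hypothetical predictions that a label-only classifier would
# both report as "cat".
confident = {"cat": 0.99, "dog": 0.01}
hesitant = {"cat": 0.60, "dog": 0.40}

for truth in ("cat", "dog"):
    print(truth,
          log_loss(confident, truth), log_loss(hesitant, truth),
          brier_loss(confident, truth), brier_loss(hesitant, truth))
```

When the true label is "cat", the confident prediction gets the lower loss. When it is "dog", the confident prediction is punished much harder than the hesitant one. An error-count loss would treat the two predictions identically in both cases.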
I came up with this idea because I've been thinking about how Robin Hanson's work (prediction markets, market scoring rules) may be relevant to Machine Learning.
Q: has anyone thought of this before?
A: yes! Here's a Google search.
This paper is from Wharton, Penn's business school, which seems like an unusual place for Machine Learning:
Andreas Buja, Werner Stuetzle, Yi Shen (2005) - Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications
(no subject)
Date: 2007-10-08 08:16 pm (UTC)
(no subject)
Date: 2007-10-08 08:43 pm (UTC)
In any case, I'm no longer sure who came up with this idea first.