ML Theory class
Jan. 18th, 2007, 12:37 pm

I just went to Machine Learning Theory. The lecture was:
The Mistake-Bound model. Combining expert advice. Connections to info theory and game theory.
It's neat stuff! It reminded me of Kevin Kelly's work on the consequences of learning theory for scientific methodology. One difference is that here you are not required to commit to *one* hypothesis: instead, you can combine advice from multiple hypotheses or multiple experts, e.g. with weighted majority, and the goal is to do nearly as well as the best expert. In Kelly's framework, you always hold a single hypothesis at any given time, and the goal is to minimize the number of retractions along the way.
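To fix the idea for myself, here is a minimal sketch of deterministic weighted majority in Python; the halving penalty beta = 0.5 and the toy interface (lists of 0/1 predictions per round) are my own assumptions, not anything from the lecture.

    # Deterministic weighted majority (Littlestone & Warmuth style sketch).
    # Predict by weighted vote; multiply the weight of every wrong expert by beta.
    def weighted_majority(expert_predictions, outcomes, beta=0.5):
        """expert_predictions: one list of 0/1 expert predictions per round.
        outcomes: the true 0/1 label for each round.
        Returns the number of mistakes the combined predictor makes."""
        n = len(expert_predictions[0])
        weights = [1.0] * n
        mistakes = 0
        for preds, y in zip(expert_predictions, outcomes):
            # Weighted vote: predict 1 iff the weight behind "1" is at least half the total.
            weight_for_1 = sum(w for w, p in zip(weights, preds) if p == 1)
            prediction = 1 if weight_for_1 >= sum(weights) / 2 else 0
            if prediction != y:
                mistakes += 1
            # Multiply the weight of each expert that was wrong this round by beta.
            weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
        return mistakes

The point is that no single expert is ever trusted outright; the weights just keep track of who has been wrong how often.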
One thing I don't like so much is that they're always interested in the "worst case". My objection is that since nature isn't an "evil genie", these worst-case bounds don't tell us much. OTOH, being bound to worst-case criteria is good in the sense that it forces me to formalize my assumptions mathematically, in this case that "nature isn't an evil genie". That way, the worst-case bound becomes more informative about the average case.
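For concreteness, the kind of worst-case guarantee I mean is the standard weighted-majority bound (with the beta = 1/2 halving rule sketched above), where m is the number of mistakes of the best expert and n is the number of experts:

\[ M \le \frac{m \ln(1/\beta) + \ln n}{\ln\!\left(\frac{2}{1+\beta}\right)} \approx 2.41\,(m + \log_2 n) \quad \text{for } \beta = \tfrac{1}{2} \]

It holds against any sequence of outcomes, which is exactly the "evil genie" reading.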
I have thought of some interesting questions:
* how to design a language (i.e. feature predicates) so that you get a good prior; cf. Goodman's "grue"
* how to combine features into chunks, for better performance (what I would normally call "feature selection")