Exponentiated Gradient; list etiquette
Apr. 16th, 2009 01:41 am
I feel like posting a question to the UAI list, but the list seems to be for announcements (except for one thread in February).
So I'm pasting it here instead:
Exponentiated Gradient
If we're trying to learn a set of weights {w_i} via EG updates, then what we're doing is updating log(w_i) additively. Suppose the true w_i is negative. Then log(w_i) is a complex number, and we can only get there if the update terms are complex, which can't happen under ordinary loss functions.
This would compromise the consistency of EG in cases where any coefficient is negative. Where am I going wrong?
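To make the worry concrete, here's a minimal sketch of the kind of EG step I mean (squared loss and the step size eta are just placeholders, not anything fixed by the problem):

import numpy as np

def eg_step(w, x, y, eta=0.1):
    # one exponentiated-gradient step for squared loss (illustrative sketch)
    y_hat = np.dot(w, x)                 # prediction with the current weights
    grad = 2.0 * (y_hat - y) * x         # gradient of (y_hat - y)^2 w.r.t. w
    w = w * np.exp(-eta * grad)          # multiplicative update: log(w) moves additively
    return w / w.sum()                   # renormalize onto the probability simplex

Since exp of a real number is always positive, a weight that starts positive can shrink toward zero but never cross it.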
It would be nice to have a list explicitly for discussion... like in the old days of the Internet, with newsgroups.
(no subject)
Date: 2009-04-16 10:29 pm (UTC)
Initially, there is a set of n weights, which might have values (1/n, …, 1/n) (or any other values forming a probability distribution).
Thus you shouldn't have a true weight that is negative. If you were thinking negative weights were necessary, then maybe you missed a step where you adjust/bias the data to ensure non-negativity of the weights.
(no subject)
Date: 2009-04-17 08:47 am (UTC)
Would RBMs with non-negative weights be similar to non-negative PCA?
What about http://en.wikipedia.org/wiki/Non-negative_matrix_factorization ?
Using EG in my problem was a suggestion made by the prof, apparently without too much thought. Sure, you could choose a large negative value to call 0, but that still compromises consistency.
Somehow this needs to become a project report and a 5-minute talk very soon.
(no subject)
Date: 2009-04-28 05:28 pm (UTC)
However, if you add in a (non-exponentiated) gradient update every so often, that additive term can flip a weight's sign. Or you could supplement your feature set with negatives of the original features.
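(For what it's worth, the feature-doubling version might look something like this; squared loss and the step size are again just placeholders:)

import numpy as np

def eg_pm_step(w, x, y, eta=0.1):
    # EG on the doubled feature set [x, -x]: the first half of w weights x and
    # the second half weights -x, so the effective coefficient on feature i is
    # w[i] - w[n + i], which can go negative even though every entry of w stays positive
    x_aug = np.concatenate([x, -x])
    y_hat = np.dot(w, x_aug)
    grad = 2.0 * (y_hat - y) * x_aug
    w = w * np.exp(-eta * grad)
    return w / w.sum()

Starting from the uniform w = np.full(2 * n, 1.0 / (2 * n)), the effective weights w[:n] - w[n:] are free to drift to either sign.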
(no subject)
Date: 2009-04-28 06:48 pm (UTC)
I have defined a condensation of the real numbers which is an oddly-symmetric version of exp. It works better than an additive update when training RBMs with a CD gradient.
By the way, who is this?
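(The map itself isn't given above, so purely as a hypothetical illustration of the flavour: one odd-symmetric exp-like function and the kind of update it would give, with grad standing in for whatever gradient, e.g. a CD gradient, gets plugged in, and eta an arbitrary step size:)

import numpy as np

def odd_exp(t):
    # a hypothetical odd-symmetric exp-like map: sign(t) * (exp(|t|) - 1)
    return np.sign(t) * np.expm1(np.abs(t))

def odd_log(w):
    # its inverse: sign(w) * log(1 + |w|)
    return np.sign(w) * np.log1p(np.abs(w))

def odd_eg_step(w, grad, eta=0.01):
    # additive step in the transformed space, mapped back through odd_exp,
    # so weights of either sign are reachable (ordinary EG is the same scheme
    # with exp and log in place of odd_exp and odd_log)
    return odd_exp(odd_log(w) - eta * grad)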