Exponentiated Gradient; list etiquette
Apr. 16th, 2009 01:41 am
I feel like posting a question to the UAI list, but the list seems to be for announcements (except for one thread in February).
So I'm pasting it here instead:
Exponentiated Gradient
If we're trying to learn a set of weights {w_i} via EG updates, then what we're doing is updating log(w_i) additively. Suppose the true w_i is negative. Then log(w_i) is a complex number, and we can only get there if the update terms are complex, which can't happen under ordinary loss functions.
This would compromise the consistency of EG in cases where any coefficient is negative. Where am I going wrong?
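To make the worry concrete, here's a minimal sketch of the kind of EG step I mean (squared loss and the step size eta are just placeholders, not anything fixed by the problem):

import numpy as np

def eg_step(w, x, y, eta=0.1):
    # one exponentiated-gradient step for squared loss (illustrative sketch)
    y_hat = np.dot(w, x)                 # prediction with the current weights
    grad = 2.0 * (y_hat - y) * x         # gradient of (y_hat - y)^2 w.r.t. w
    w = w * np.exp(-eta * grad)          # multiplicative update: log(w) moves additively
    return w / w.sum()                   # renormalize onto the probability simplex

Since exp of a real number is always positive, a weight that starts positive can shrink toward zero but never cross it.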
It would be nice to have a list explicitly for discussion... like in the old days of the Internet, with newsgroups.
(no subject)
Date: 2009-04-16 10:29 pm (UTC)
Initially, there is a set of n weights, which might have values (1/n, …, 1/n) (or any other values forming a probability distribution).
Thus you shouldn't have a true weight that is negative. If you were thinking negative weights were necessary, then maybe you missed a step where you adjust/bias the data to ensure non-negativity of the weights.
(no subject)
Date: 2009-04-17 08:47 am (UTC)
Would RBMs with non-negative weights be similar to non-negative PCA?
What about http://en.wikipedia.org/wiki/Non-negative_matrix_factorization ?
Using EG in my problem was a suggestion made by the prof, apparently without too much thought. Sure, you could choose a large negative value to call 0, but that still compromises consistency.
Somehow this needs to become a project report and a 5-minute talk very soon.
(no subject)
Date: 2009-04-28 05:28 pm (UTC)
However, if you add in a (non-exponentiated) gradient update every so often, that additive term can flip a weight's sign. Or you could supplement your feature set with negatives of the original features.
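(For what it's worth, the feature-doubling version might look something like this; squared loss and the step size are again just placeholders:)

import numpy as np

def eg_pm_step(w, x, y, eta=0.1):
    # EG on the doubled feature set [x, -x]: the first half of w weights x and
    # the second half weights -x, so the effective coefficient on feature i is
    # w[i] - w[n + i], which can go negative even though every entry of w stays positive
    x_aug = np.concatenate([x, -x])
    y_hat = np.dot(w, x_aug)
    grad = 2.0 * (y_hat - y) * x_aug
    w = w * np.exp(-eta * grad)
    return w / w.sum()

Starting from the uniform w = np.full(2 * n, 1.0 / (2 * n)), the effective weights w[:n] - w[n:] are free to drift to either sign.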
(no subject)
Date: 2009-04-28 06:48 pm (UTC)
I have defined a condensation of the real numbers which is an oddly-symmetric version of exp. It works better than an additive update when training RBMs with a CD gradient.
By the way, who is this?
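(The map itself isn't given above, so purely as a hypothetical illustration of the flavour: one odd-symmetric exp-like function and the kind of update it would give, with grad standing in for whatever gradient, e.g. a CD gradient, gets plugged in, and eta an arbitrary step size:)

import numpy as np

def odd_exp(t):
    # a hypothetical odd-symmetric exp-like map: sign(t) * (exp(|t|) - 1)
    return np.sign(t) * np.expm1(np.abs(t))

def odd_log(w):
    # its inverse: sign(w) * log(1 + |w|)
    return np.sign(w) * np.log1p(np.abs(w))

def odd_eg_step(w, grad, eta=0.01):
    # additive step in the transformed space, mapped back through odd_exp,
    # so weights of either sign are reachable (ordinary EG is the same scheme
    # with exp and log in place of odd_exp and odd_log)
    return odd_exp(odd_log(w) - eta * grad)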