gusl: (Default)
[personal profile] gusl
According to the sources I've read, Gibbs sampling doesn't really tell you how to sample... it's just divide-and-conquer: you still need to sample each variable from its full conditional.

In my problem, that would be more trouble than it's worth, since no denominators are canceling. So I'm doing good old Metropolis-Hastings, with a proposal that makes its trajectory resemble Gibbs: it randomly chooses one variable to modify, and then randomly chooses a value for that variable. In other words, the proposal is uniform over the neighbors in the "generalized hypercube"*.
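In code, that proposal-plus-acceptance step might look like the sketch below (a minimal illustration, not the post's actual implementation; `log_post`, a user-supplied log-posterior over label vectors, and all other names are mine). Because the proposal is uniform over neighbors and the neighborhood relation is symmetric, the Hastings correction cancels and the acceptance ratio reduces to a plain posterior ratio.

```python
import math
import random

def mh_step(state, k, log_post):
    """One Metropolis-Hastings step with a Gibbs-like proposal:
    pick one coordinate uniformly at random, then propose a uniformly
    random *different* value for it. The proposal is symmetric, so the
    acceptance probability reduces to a ratio of posteriors."""
    i = random.randrange(len(state))                   # which variable to modify
    proposal = list(state)
    proposal[i] = random.choice([v for v in range(k) if v != state[i]])
    log_alpha = log_post(proposal) - log_post(state)   # symmetric proposal
    if math.log(random.random()) < log_alpha:
        return proposal                                # accept the move
    return state                                       # reject: keep current state
```

One chain is then just repeated calls to `mh_step`, recording every returned state (including repeats after rejections).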

I can easily compute the posterior. In traditional MCMC, I think you would weight the samples by how often they appear. But doesn't it make more sense to directly compute the posterior in all sampled models?

Should I be using MCMC at all?
How else am I going to find the set of high-probability models? (Maybe what I want is a mode-oriented stochastic search, as in my EA project.)

Believe it or not, it's my first time actually implementing MCMC.

Also, I'd like to know what proportion of the posterior mass my sampled models account for... but this is probably VeryHard if I can't enumerate the models (if I could enumerate them, I could also check whether the chain has mixed).



* - what do you call a non-binary hypercube? I mean: let each node be a string in {0,...,k-1}^n, with the neighborhood relation defined by the set of strings that differ in exactly 1 character. When k=2, we get the n-dim hypercube. What's the word when k>2?
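(For what it's worth, this graph is usually called the Hamming graph H(n, k), the Cartesian product of n copies of the complete graph K_k.) A quick sketch of the neighborhood structure, in Python; the function name is mine:

```python
def neighbors(s, k):
    """All strings over {0,...,k-1} that differ from tuple s in exactly
    one position. Each node has n*(k-1) neighbors; k=2 recovers the
    ordinary n-dimensional hypercube."""
    result = []
    for i in range(len(s)):
        for v in range(k):
            if v != s[i]:
                result.append(s[:i] + (v,) + s[i + 1:])
    return result
```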

(no subject)

Date: 2009-11-13 04:36 pm (UTC)
From: [identity profile] lordspaz.livejournal.com
I think the nice thing about Gibbs sampling is you always accept.

(no subject)

Date: 2009-11-13 04:41 pm (UTC)
From: [identity profile] stepleton.livejournal.com
I'm doing good old Metropolis-Hastings, with a proposal that makes its trajectory resemble Gibbs: it randomly chooses one variable to modify, and then randomly chooses a value for that variable.

Another way to look at it then is that you are doing Gibbs sampling, but you're using a fancy sampler (MH) to sample from the (complicated) conditional distribution for that variable.

I can easily compute the posterior. In traditional MCMC, I think you would weight the samples by how often they appear. But doesn't it make more sense to directly compute the posterior in all sampled models?

Are you talking about running your sampler, then assigning a weight to all of your posterior samples based on the posterior? Don't do this---if you do that, you'll be representing a density which is your original density squared. You can try this for yourself with a simpler model. Draw a whole bunch of normal samples. If you took their histogram, you'd get something like a normal distribution back. Now weight the samples by the normal distribution PDF value at their location, then take the histogram again where each sample contributes its weight to its bin. This is your new probability distribution, and it's not normal.
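The histogram experiment described above is easy to run. Here is a small stdlib-Python version: reweighting N(0,1) samples by their own density value represents something proportional to the pdf squared, which for a normal is another normal with *half* the variance, not the original distribution.

```python
import math
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(200_000)]

# Unweighted, the samples represent N(0, 1): empirical variance ~ 1.
var_plain = sum(x * x for x in samples) / len(samples)

# Weighting each sample by its own density value represents a density
# proportional to pdf^2 = exp(-x^2), i.e. N(0, 1/2): variance ~ 0.5.
w = [math.exp(-x * x / 2.0) for x in samples]  # proportional to the N(0,1) pdf
var_weighted = sum(wi * x * x for wi, x in zip(w, samples)) / sum(w)
```

The weighted "histogram" has visibly shrunk toward the mode, which is exactly the density-squared effect described above.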

Also, I'd like to know what proportion of the posterior mass my sampled models account for...

Just find a reason to add a continuous variable to your posterior, and then you know your answer: 0.

...

I'm having a hard time mapping this "generalized hypercube" onto the problem I read about in your paper, but I'm going to go out on a limb and assume that you are trying to assign labels to nodes in your graph, or perhaps to edges in your graph. If this is the case, you may find that Gibbs sampling doesn't mix very well---that it's easy for the sampler to get stuck in a mode, and that jumping to another likely mode would require an unlikely sequence of individual label changes. In situations like these, it can help to have a sampling technique that can change multiple labels per iteration, not just one. Auxiliary variable approaches are one way to do this. Look up "Swendsen-Wang" for an auxiliary variable approach to drawing samples from Ising models; I've adapted it gainfully to my own work.

Radford Neal has a very long document describing a large number of sampling techniques. It could pay to check that out.

(no subject)

Date: 2009-11-13 08:17 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
<< Another way to look at it then is that you are doing Gibbs sampling, but you're using a fancy sampler (MH) to sample from the (complicated) conditional distribution for that variable. >>

(namely the 1D full conditional for that variable)

yeah, I see what you're saying! And it would be true, except for a minor technicality.

The way I implemented this, if the MH move is rejected, most of the time the proposal will be changing a different variable. So we're not sampling from the 1D full conditional.

(no subject)

Date: 2009-11-13 08:21 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
<< Are you talking about running your sampler, then assigning a weight to all of your posterior samples based on the posterior? Don't do this---if you do that, you'll be representing a density which is your original density squared. >>

if the posterior samples have lots of duplication (i.e. states are visited multiple times), and I remove the duplication, I'm effectively normalizing, i.e. dividing by the posterior density.

(no subject)

Date: 2009-11-13 08:29 pm (UTC)
From: [identity profile] stepleton.livejournal.com
I would think that it's a better idea to just preserve the duplicates. If it really bothers you, then remove the duplicates and weight the samples by how many duplicates there were. Otherwise, be very careful with reweighting schemes---you can easily abandon the density you were trying to represent.

(no subject)

Date: 2009-11-13 08:25 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
I'm assigning labels to nodes.
Thanks for the tips!


<< In situations like these, it can help to have a sampling technique that can change multiple labels per iteration, not just one. >>

MH with the right proposal is like Blocked Gibbs with a fancy sampler! :-)

(no subject)

Date: 2009-11-13 08:38 pm (UTC)
From: [identity profile] stepleton.livejournal.com
If you like, I could send you relevant draft chapters of my thesis.

(no subject)

Date: 2009-11-14 12:48 am (UTC)
(deleted comment)

(no subject)

Date: 2009-11-13 08:27 pm (UTC)
From: [identity profile] stepleton.livejournal.com
No, I think you are still sampling from the variable's conditional even if it rejects.

Think about it like this. Instead of running only a single MH iteration per variable, you could run multiple MH iterations per variable. Some of those iterations would reject, some would accept. Nevertheless, the collection of samples you would get, some of which are repeated, would approximate the conditional distribution of the variable. (Think again in terms of histograms if that helps.)

The fewer MH iterations you run, the sparser your approximation becomes, until finally you have only a single sample. It's still a sample from the conditional, though, at least theoretically speaking.

Actually, you could be engaging in risky behavior by sampling only once. It depends on your problem, of course, but many MCMC approaches often discard the first few samples to allow the sampler to "burn in", i.e. converge on an area of high probability from a less likely initialization state. (As you probably know.) It might be better for you to run a few iterations of the sampler per variable as you perform your Gibbs sampling; that way you can be more certain that you're drawing from the conditional distribution of the variable.
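The suggestion above (run several single-variable Metropolis iterations before moving to the next variable) is sometimes called Metropolis-within-Gibbs; a rough sketch, with `log_post` and all names being my own assumptions:

```python
import math
import random

def sample_coordinate(state, i, k, log_post, inner_iters=5):
    """Approximately sample variable i from its full conditional by
    running several Metropolis iterations that only touch coordinate i.
    More inner iterations means a closer approximation to an exact
    Gibbs update."""
    for _ in range(inner_iters):
        proposal = list(state)
        proposal[i] = random.choice([v for v in range(k) if v != state[i]])
        if math.log(random.random()) < log_post(proposal) - log_post(state):
            state = proposal
    return state

def gibbs_sweep(state, k, log_post, inner_iters=5):
    """One systematic-scan sweep: update each variable in turn."""
    for i in range(len(state)):
        state = sample_coordinate(state, i, k, log_post, inner_iters)
    return state
```

With `inner_iters=1` this reduces to the single-proposal scheme from the original post (minus the random choice of variable).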

(no subject)

Date: 2009-11-14 01:02 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
<< No, I think you are still sampling from the variable's conditional even if it rejects.

Think about it like this. Instead of running only a single MH iteration per variable, you could run multiple MH iterations per variable. Some of those iterations would reject, some would accept. >>

hopefully... but anyway, it's not what I'm doing.


<< Nevertheless, the collection of samples you would get, some of which are repeated ... >>

hang on, when the proposed move is rejected, you don't get the current state again, do you? My sources were ambiguous about this.


<< ..., would approximate the conditional distribution of the variable. (Think again in terms of histograms if that helps.) >>

sure.


<< but many MCMC approaches often discard the first few samples to allow the sampler to "burn in", i.e. converge on an area of high probability from a less likely initialization state. (As you probably know.) It might be better for you to run a few iterations of the sampler per variable as you perform your Gibbs sampling; that way you can be more certain that you're drawing from the conditional distribution of the variable. >>

I never considered that I might need to burn in when sampling from the full conditional. If I need to burn in at two levels, that's a big handicap for Gibbs / single-variable-proposal MH... my reckless intuition is that you don't need to.

If I need to do two-level burn-in here, then presumably I'd still need to do it when I make the proposal distribution a little fuzzier (which makes more sense in continuous hypothesis spaces).

(no subject)

Date: 2009-11-14 01:07 am (UTC)
From: [identity profile] stepleton.livejournal.com
when the proposed move is rejected, you don't get the current state again, do you? My sources were ambiguous about this.

Actually, you do get the current state again: a rejection means the current state is recorded as the next sample.

(no subject)

Date: 2009-11-14 01:21 am (UTC)
From: [identity profile] stepleton.livejournal.com
Whoops, hit post by accident.

I never considered that I might need to burn-in when sampling from the full conditional. If I need to burn-in at two levels, that's a big handicap for Gibbs / single-variable-proposal MH... my reckless intuition is that you don't need to.

Well... you might not need to. Backing away from the Gibbs sampling interpretation, your proposal could be viewed as a two-step operation, where you sample which variable to update and then sample a proposed new value for the variable. So, no problem, that's a perfectly valid MH proposal. I guess I was thinking too hard about Gibbs sampling.

The variable choice should cancel out for forward and backward proposals, and it sounds like your proposed new value will too, since you're talking about uniform probabilities over the "generalized hypercube" neighbors. So that's nice, at least.

As usual, it all boils down to the nature of your problem. It could be that you really need to nail each label one at a time to get anywhere---in that case, take some time to burn in. It could be that this is not necessary, or that it doesn't matter in the end. Thinking about your problem a little bit, I suspect it may not really matter. My guess is that you'll get about the same number of rejections either way.

If I need to do two-level burn-in here, then I should still need to do it when I make the proposal distribution a little fuzzier (makes more sense in continuous hypothesis spaces).

Dunno what this means...

Of course if you have a truly righteous proposal distribution, you don't need to burn in much at all!

(no subject)

Date: 2009-11-14 01:34 am (UTC)
From: [identity profile] stepleton.livejournal.com
In fact, if the proposal of the new label for a node is not conditioned on the old label for the node, I think you can show that the expected proportion of rejections is the same for both the burn-in and the no burn-in strategies.
