MCMC and principal eigenvectors
Feb. 1st, 2010 03:18 am

I am wondering if, instead of running MCMC and hoping that it has "mixed" ("achieved stationarity"), there are approaches based on computing (or approximating) the principal left eigenvector of the transition matrix.
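In the finite discrete case this principal left eigenvector is just the stationary distribution, and power iteration finds it directly. A minimal sketch with a made-up 3-state chain (numpy assumed):

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
T = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])

# Power iteration on the left: pi <- pi T converges to the principal
# left eigenvector (eigenvalue 1), i.e. the stationary distribution.
pi = np.full(3, 1.0 / 3.0)
for _ in range(1000):
    pi = pi @ T
pi /= pi.sum()

print(pi)  # satisfies pi @ T == pi (up to floating point)
```

The same update is what running the chain does in expectation; the difference is that here we propagate the whole distribution rather than a single sample.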
Of course, in continuous spaces, this "matrix" has an entry for every pair of points in S × S, where S is the space our parameters live in... so our "principal eigenvector" becomes a "principal eigenfunction". Functional analysts, how do you compute this?
If it helps, we might want to choose a sparse proposal (such as the one corresponding to Gibbs sampling, in which all transitions changing more than one parameter have probability density zero).
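As a sanity check of that idea, here is a sketch of the random-scan Gibbs transition matrix for a made-up target over two binary variables: moves that change both coordinates get probability zero, and the target is the matrix's principal left eigenvector.

```python
import numpy as np

# Hypothetical unnormalized target over two binary variables (x1, x2).
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = np.array([1.0, 2.0, 3.0, 4.0])
P = weights / weights.sum()

# Random-scan Gibbs: pick a coordinate uniformly at random, resample it
# from its conditional given the other. Moves changing both coordinates
# have probability zero, so T is sparse.
n = len(states)
T = np.zeros((n, n))
for i, s in enumerate(states):
    for coord in range(2):
        # States reachable by changing only `coord` (including staying put).
        compat = [j for j, t in enumerate(states)
                  if t[1 - coord] == s[1 - coord]]
        cond = P[compat] / P[compat].sum()  # conditional distribution
        for j, p in zip(compat, cond):
            T[i, j] += 0.5 * p  # each coordinate chosen with prob 1/2

assert np.allclose(T.sum(axis=1), 1.0)
assert T[0, 3] == 0 and T[3, 0] == 0  # (0,0) <-> (1,1) is forbidden
assert np.allclose(P @ T, P)          # P is the principal left eigenvector
```

The sparsity pattern scales: with d coordinates each taking k values, every row has at most d·k nonzero entries rather than k^d.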
(no subject)
Date: 2010-02-01 07:40 pm (UTC)

It appears that the problem more or less reduces to solving differential equations in the space of interest. As with solving DEs, you may be able to get a general solution for a tiny but useful class of problems (like linear DEs or one-dimensional Brownian motion), but in general you have to either discover rare, completely ungeneral tricks or fall back on numerical approximation.
(no subject)
Date: 2010-02-01 07:42 pm (UTC)

I'm thinking I should ask a statistical physicist about this.
(no subject)
Date: 2010-02-03 10:24 am (UTC)

Our goal is to minimize the distance between the estimate and the posterior, i.e.:
Let f be our estimate. We want to minimize a functional like:
D(f) = \int |f(x) - (tf)(x)|^2 \, dx
where tf is the result of applying transition function t to f.
Transition function t is defined in terms of the proposal g:
t(i,j) = g(i,j) min(1, [P(j) g(j,i)] / [P(i) g(i,j)]) for i ≠ j, as per Metropolis-Hastings; for a symmetric proposal this reduces to min(1, P(j)/P(i)) g(i,j).
g(i,j) is the proposal density: the probability density of proposing state j when the chain is at state i. (The full transition also puts a point mass at i for rejected proposals.)
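For what it's worth, one way to make this concrete is to discretize onto a grid, so the transition "function" becomes an ordinary matrix and D(f) a finite sum. The target, proposal width, and grid below are all made up for illustration:

```python
import numpy as np

# Discretize the space onto a grid (assumed setup).
x = np.linspace(-5.0, 5.0, 101)
dx = x[1] - x[0]

# Target (posterior) P: a standard normal, normalized on the grid.
P = np.exp(-0.5 * x**2)
P /= P.sum() * dx

# Symmetric Gaussian random-walk proposal g(i,j). Dividing by the largest
# row sum keeps g symmetric with row sums <= 1 (leftover mass stays put).
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
np.fill_diagonal(K, 0.0)
g = K / K.sum(axis=1).max()

# Metropolis transition t(i,j) = min(1, P(j)/P(i)) g(i,j) off the diagonal;
# rejected moves and unused proposal mass remain at i.
t = g * np.minimum(1.0, P[None, :] / P[:, None])
np.fill_diagonal(t, 1.0 - t.sum(axis=1))

def D(f):
    # D(f) = \int |f(x) - (tf)(x)|^2 dx, with (tf) = f @ t for row vectors.
    return np.sum((f - f @ t) ** 2) * dx

f_uniform = np.full_like(x, 1.0 / (x[-1] - x[0]))
print(D(P), D(f_uniform))  # D(P) is ~0: the posterior is the fixed point
```

Since g here is symmetric, detailed balance gives P t(i,j)·P(i) = t(j,i)·P(j), so D vanishes exactly at the posterior, and minimizing D over a family of estimates f is then an ordinary (finite-dimensional) optimization problem.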
Minimizing a functional is a typical form of a variational problem.
This confirms my suspicion. (though they use reverse KL rather than L2 distance)