MCMC for frequentists?
Apr. 2nd, 2009, 02:09 am

MCMC is a class of methods, typically used by "full" Bayesians to sample from the posterior distribution. Some Bayesians (the "non-full" ones) may be happy with the Maximum a Posteriori (MAP) estimate as a point estimate. The latter face "merely" an optimization problem, and have no use for MCMC[1].
Running an MCMC method amounts to doing a random walk according to a Markov chain whose state space is the parameter space. Steps are drawn from a "proposal distribution". After some "burn-in" time, we wave our hands and declare that the chain has "mixed" (i.e., the walk's randomness has erased essentially all the information we had about the initial state), so that the states visited afterwards can be treated as samples from the posterior. (I wonder whether formally minded people analytically bound the mixing time as a way to justify this assertion.)
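For concreteness, here is the Metropolis-Hastings acceptance rule (assuming that's the flavor of MCMC in mind; the notation is mine): a move from $x$ to $x'$, drawn from the proposal $q(x' \mid x)$, is accepted with probability

$$\alpha(x, x') = \min\left(1,\ \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right).$$

Since $p$ enters only through the ratio $p(x')/p(x)$, its normalizing constant never needs to be computed, which is exactly the next point.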
MCMC methods are built on ratios of probabilities, and are useful because those ratios cancel out the intractable normalization term in energy-based models (such as Undirected Graphical Models)... but why should they only be used by Bayesians?
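To make the cancellation concrete, here is a minimal random-walk Metropolis sketch in Python (my own illustration, not from the post; the target exp(-x^4) and all names are made up). Note that the sampler only ever sees the log density up to an additive constant:

import numpy as np

def metropolis(log_unnorm, x0, n_steps, step=1.0, rng=None):
    """Random-walk Metropolis; needs the target's log density only up to a constant."""
    rng = np.random.default_rng() if rng is None else rng
    x, lp = x0, log_unnorm(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal()  # symmetric Gaussian proposal
        lp_new = log_unnorm(x_new)
        # Accept with prob min(1, p(x')/p(x)); the normalizer Z cancels in this ratio.
        if np.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        samples.append(x)
    return np.array(samples)

# Unnormalized target exp(-x^4); its normalizing constant is irrelevant to the sampler.
chain = metropolis(lambda x: -x**4, x0=0.0, n_steps=10_000)
burned_in = chain[2_000:]  # hand-wavy burn-in, as described above

Because the proposal is symmetric, the Hastings correction q(x|x')/q(x'|x) drops out and only the ratio of (unnormalized) target densities remains.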
Frequentists are typically interested in finding the maximum likelihood estimate... but I see no reason why they should restrict themselves to a point estimate. Shouldn't they be interested in the regions where the likelihood function is high (its level sets)?
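As a hypothetical illustration of that frequentist reading (reusing the metropolis() sketch above; the data and numbers are made up): treat the log-likelihood itself as the unnormalized log density, and the chain's visited states map out the high-likelihood region. (With a flat prior this coincides with sampling the Bayesian posterior, which is rather the point.)

# Data assumed drawn from N(mu=1.5, sigma=1); we explore the likelihood in mu.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=50)

def log_lik(mu):
    # Gaussian log-likelihood in mu (sigma fixed at 1), up to a constant
    return -0.5 * np.sum((data - mu) ** 2)

chain = metropolis(log_lik, x0=0.0, n_steps=20_000, step=0.3)[5_000:]
# The chain's spread sketches out a high-likelihood level set of mu, e.g.:
lo, hi = np.quantile(chain, [0.025, 0.975])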
Tangentially, Monte Carlo methods were invented by physicists, who in this setting are typically interested in simulating quantities like the decay of spin correlations with distance. Nothing Bayesian about that.
Here's a causal diagram showing the connection between Monte Carlo methods and Bayesianism:

Bayesian --> hard integrals --> Monte Carlo methods
(Thanks to Cosma Shalizi for the interesting discussion; the errors, if any, are all mine.)
[1] They could cool all the way down to 0 Kelvin, but that's already called "Simulated Annealing".
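(To make the footnote concrete, a hypothetical sketch reusing metropolis() and log_lik() from above: raising the density to the power 1/T, i.e. dividing the log density by T, and cooling T toward 0 concentrates the chain around the mode.)

# Simulated annealing as MCMC with a cooling schedule (illustrative values).
x = 0.0
for T in np.geomspace(10.0, 0.01, num=200):  # geometric cooling toward T ~ 0
    # At temperature T the chain targets p(x)^(1/T); as T -> 0 it
    # collapses onto the maximizer (here, the MLE / flat-prior MAP).
    x = metropolis(lambda z: log_lik(z) / T, x0=x, n_steps=100, step=0.3)[-1]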