numerical optimization
Dec. 27th, 2008 09:32 pm

Let A be a linear space.
Let A_m be the subset of A on which a given non-linear constraint is satisfied (drawn above as a curved 2D surface).
Goal: maximize an objective D within the manifold A_m.
Ideally, we would have a coordinate system for addressing points in A_m, and we could optimize as if it were a Euclidean space. But I don't know how to do that.
Still, I can see how I could explore A_m by following tangent directions, evaluating D, and optimizing much as in the standard setting. For example, to do gradient descent: at any point in A_m, compute the gradient in A and project it onto the tangent plane; this gives the gradient in A_m. When following this gradient we will leave A_m slightly, and we'll need to snap back to A_m somehow. I suspect there's room for cleverness in these details.
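For concreteness, here is a minimal sketch of that tangent-step-then-snap-back scheme on a toy case of my own choosing (none of these specifics are from the post): A_m is the unit sphere in R^3, D is the height coordinate, and "snapping back" is simple renormalization.

```python
# Toy instance of the scheme described above: A is R^3, A_m is the unit
# sphere, and D(p) = p_z, which is maximized on A_m at the north pole.
import math

def grad_D(p):
    # gradient of the toy objective D(p) = p_z in the ambient space A
    return (0.0, 0.0, 1.0)

def tangent_step(p, step=0.1):
    g = grad_D(p)
    # project the ambient gradient onto the tangent plane at p
    radial = sum(pi * gi for pi, gi in zip(p, g))
    t = tuple(gi - radial * pi for pi, gi in zip(p, g))
    # step along the tangent direction; this leaves the sphere slightly
    q = tuple(pi + step * ti for pi, ti in zip(p, t))
    # snap back onto A_m by renormalizing
    n = math.sqrt(sum(qi * qi for qi in q))
    return tuple(qi / n for qi in q)

p = (1.0, 0.0, 0.0)          # arbitrary start point on the sphere
for _ in range(200):
    p = tangent_step(p)
# p climbs toward the maximizer (0, 0, 1)
```

The "snap back" here is the easy part because the sphere has an obvious retraction (renormalization); a general A_m would need its own projection step.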
(no subject)
Date: 2008-12-28 06:30 am (UTC)You can discuss tangent spaces to manifolds independently of an embedding in a Euclidean space by defining tangent vectors as equivalence classes of curves in the manifold. This also avoids the issue you seem to be raising, in which following the gradient tangent vector leaves the surface: instead of following a vector in A, follow a member of the corresponding equivalence class of curves in A_m selected in some reasonable way. Of course, I don't have a reasonable way on tap...
(no subject)
Date: 2008-12-28 06:35 am (UTC)

(no subject)
Date: 2008-12-28 07:35 pm (UTC)

A book that I like pretty well is Practical Methods of Optimization (http://www.amazon.com/Practical-Methods-Optimization-R-Fletcher/dp/0471494631) by Fletcher (the "F" in BFGS).
There are descriptions of the main methods in various places around the internet.
What exactly are you trying to optimize? What is the constraint manifold? Sometimes there are tricks for specific kinds of constraints.
Bear in mind that most of the other methods are faster than gradient descent because they somehow take the second derivative into account (e.g. Gauss-Newton, conjugate gradient, BFGS, even Barzilai-Borwein).
There's room for lots of cleverness in the details, but you may have to take advantage of the specifics of your system.
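As a small illustration of the Barzilai-Borwein rule mentioned above, here is a sketch on a toy quadratic of my own choosing. BB is still a gradient method, but the step length is inferred from differences of successive iterates and gradients, which is where the second-order information sneaks in:

```python
# Barzilai-Borwein (BB1) step-size sketch on a made-up quadratic:
# D(x) = 0.5 * (x0^2 + 10 * x1^2), whose minimizer is (0, 0).

def grad(x):
    return (x[0], 10.0 * x[1])

x_prev = (10.0, 1.0)
g_prev = grad(x_prev)
x = tuple(xi - 0.01 * gi for xi, gi in zip(x_prev, g_prev))  # one plain step
for _ in range(100):
    g = grad(x)
    s = tuple(a - b for a, b in zip(x, x_prev))      # iterate difference
    y = tuple(a - b for a, b in zip(g, g_prev))      # gradient difference
    denom = sum(si * yi for si, yi in zip(s, y))
    if denom == 0.0:                                 # already converged
        break
    step = sum(si * si for si in s) / denom          # BB1 step length
    x_prev, g_prev = x, g
    x = tuple(xi - step * gi for xi, gi in zip(x, g))
# x approaches (0, 0) far faster than a fixed small step would
```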
One different trick that works sometimes: suppose your constraint is given by f(x)=0 with f(x)>0 outside of A_m. For example, on the 2-sphere x^2+y^2+z^2=1 you could use f = (x^2+y^2+z^2-1)^2. Then try an unconstrained minimization on D + k*f where k is some positive constant. After it converges, increase k. When k is big enough it should force all the test points to stay close to A_m.
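A sketch of this penalty trick on the sphere, with a made-up objective D(p) = |p - (2,0,0)|^2 (mine, not from the discussion); the constrained minimizer is (1, 0, 0):

```python
# Penalty method sketch: minimize D + k*f by plain gradient descent, where
# f(p) = (x^2 + y^2 + z^2 - 1)^2 penalizes leaving the unit sphere A_m,
# and k is increased after each unconstrained minimization converges.

def grad_penalized(p, k):
    x, y, z = p
    s = x * x + y * y + z * z - 1.0       # constraint residual, 0 on A_m
    # gradient of D(p) = |p - (2,0,0)|^2 plus gradient of k*f = 4*k*s*p
    return (2.0 * (x - 2.0) + 4.0 * k * s * x,
            2.0 * y + 4.0 * k * s * y,
            2.0 * z + 4.0 * k * s * z)

def minimize_penalized(p, k, step, iters=2000):
    for _ in range(iters):
        g = grad_penalized(p, k)
        p = tuple(pi - step * gi for pi, gi in zip(p, g))
    return p

p = (0.5, 0.0, 0.0)                       # arbitrary start
for k in (1.0, 10.0, 100.0):              # tighten the penalty in stages
    p = minimize_penalized(p, k, step=0.1 / (1.0 + k))  # shrink step as k grows
# p ends up close to (1, 0, 0) and nearly on the sphere
```

Note the step size has to shrink as k grows, since the penalty term makes the problem increasingly stiff; that trade-off is the usual practical cost of big k.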
(no subject)
Date: 2008-12-28 08:08 pm (UTC)

<< Then try an unconstrained minimization on D + k*f where k is some positive constant. After it converges, increase k. When k is big enough it should force all the test points to stay close to A_m. >>
It seems that all we need to do is penalize distance from the manifold (i.e., penalize big failures to meet the constraint). To overcome maxima of the objective that lie outside the manifold, we need a big k.
(no subject)
Date: 2008-12-28 08:10 pm (UTC)

(no subject)

Date: 2008-12-28 08:40 pm (UTC)

(no subject)
Date: 2008-12-29 03:20 pm (UTC)

A is the linear 4D space of joint Markov Chains whose state-space is the Cartesian product of the state-spaces of Q and R, and that are compatible with the constraint that Q and R are partial views.
A_m is the subset of A in which the mutual information of the Q and R views is equal to m.
The function I'm trying to optimize is the mixing rate of the joint Markov Chain (Doeblin contraction coefficient).
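For concreteness, here is a sketch of the quantity the constraint pins down: the mutual information between the Q-view and the R-view, computed from a joint distribution over the product state space (the two distributions below are toy numbers of mine, not from the post):

```python
# Mutual information (in nats) between the two views of a joint distribution
# given as a dict mapping (q, r) -> probability.
import math

def mutual_information(joint):
    pq, pr = {}, {}
    for (q, r), p in joint.items():       # marginals of the two views
        pq[q] = pq.get(q, 0.0) + p
        pr[r] = pr.get(r, 0.0) + p
    mi = 0.0
    for (q, r), p in joint.items():
        if p > 0.0:
            mi += p * math.log(p / (pq[q] * pr[r]))
    return mi

independent = {("a", "c"): 0.25, ("a", "d"): 0.25,
               ("b", "c"): 0.25, ("b", "d"): 0.25}
correlated = {("a", "c"): 0.5, ("b", "d"): 0.5}
# independent views give mutual information 0; perfectly correlated views, log 2
```

A_m would then be the set of chains whose views give mutual_information(...) == m, which makes clear how non-linear the constraint surface is.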
(no subject)
Date: 2008-12-29 03:28 pm (UTC)

Yes. And computation is not free.
But there are situations where you can avoid the waste (e.g. if the constraint yields a linear subspace)... and to be a bit more general, where the curvature is small, you can use the linear spline idea: approximate the constraint manifold by linear meshes.
(no subject)
Date: 2008-12-29 07:47 pm (UTC)

(no subject)
Date: 2008-12-29 08:41 pm (UTC)

Let the state-space of Q be {a,b} and that of R be {c,d}; then the state-space of Z is {(a,c),(a,d),(b,c),(b,d)}.
Z is in state (x,y) <=> Q is in state x and R is in state y.
We call Q and R "partial views" of Z.
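In code, this definition is just the product construction with coordinate projections (a toy illustration using the states from this example):

```python
# The joint chain Z lives on the product state space; Q and R are the
# coordinate projections ("partial views") of Z's state.
from itertools import product

Q_states = ["a", "b"]
R_states = ["c", "d"]
Z_states = list(product(Q_states, R_states))
# Z_states == [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]

def view_Q(z):
    return z[0]    # Q's state is the first coordinate of Z's state

def view_R(z):
    return z[1]    # R's state is the second coordinate
```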
(no subject)
Date: 2008-12-29 08:42 pm (UTC)

How did you find my blog?
(no subject)
Date: 2008-12-29 09:42 pm (UTC)

(no subject)
Date: 2008-12-29 09:49 pm (UTC)

In your original post you said something like "compatible with the constraint that Q and R are partial views of A," which does not seem to add to the original definition of A.
Or are you saying that the state space is the product, and the transition probabilities have to induce the correct probabilities for Q and R?
Am I missing anything?
(no subject)
Date: 2008-12-29 10:08 pm (UTC)