numerical optimization
Dec. 27th, 2008 09:32 pm

Let A be a linear space.
Let A_m be the subset of A on which a given non-linear constraint is satisfied (drawn above as a curved 2D surface).
Goal: maximize an objective D within the manifold A_m.
Ideally, we would have a coordinate system for addressing points in A_m, and we could optimize as if it were a Euclidean space. But I don't know how to do that.
Still, I can see how I could explore A_m by following tangent directions, evaluating D, and optimizing much as in the standard setting. For example, to do gradient descent: at any point in A_m, compute the gradient in A and project it onto the tangent plane; this gives the gradient in A_m. When following this gradient we will leave A_m slightly, and we'll need to snap back to A_m somehow. I suspect there's room for cleverness in these details.
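For concreteness, here is a minimal sketch of that tangent-step-then-snap-back scheme on a toy case of my own choosing (none of these specifics are from the post): A_m is the unit sphere in R^3, D is the height coordinate, and "snapping back" is simple renormalization.

```python
# Toy instance of the scheme described above: A is R^3, A_m is the unit
# sphere, and D(p) = p_z, which is maximized on A_m at the north pole.
import math

def grad_D(p):
    # gradient of the toy objective D(p) = p_z in the ambient space A
    return (0.0, 0.0, 1.0)

def tangent_step(p, step=0.1):
    g = grad_D(p)
    # project the ambient gradient onto the tangent plane at p
    radial = sum(pi * gi for pi, gi in zip(p, g))
    t = tuple(gi - radial * pi for pi, gi in zip(p, g))
    # step along the tangent direction; this leaves the sphere slightly
    q = tuple(pi + step * ti for pi, ti in zip(p, t))
    # snap back onto A_m by renormalizing
    n = math.sqrt(sum(qi * qi for qi in q))
    return tuple(qi / n for qi in q)

p = (1.0, 0.0, 0.0)          # arbitrary start point on the sphere
for _ in range(200):
    p = tangent_step(p)
# p climbs toward the maximizer (0, 0, 1)
```

The "snap back" here is the easy part because the sphere has an obvious retraction (renormalization); a general A_m would need its own projection step.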
(no subject)
Date: 2008-12-28 06:30 am (UTC)You can discuss tangent spaces to manifolds independently of an embedding in a Euclidean space by defining tangent vectors as equivalence classes of curves in the manifold. This also avoids the issue you seem to be raising, in which following the gradient tangent vector leaves the surface: instead of following a vector in A, follow a member of the corresponding equivalence class of curves in A_m selected in some reasonable way. Of course, I don't have a reasonable way on tap...
(no subject)
Date: 2008-12-28 06:35 am (UTC)

(no subject)
Date: 2008-12-28 07:35 pm (UTC)

A book that I like pretty well is Practical Methods of Optimization (http://www.amazon.com/Practical-Methods-Optimization-R-Fletcher/dp/0471494631) by Fletcher (the "F" in BFGS).
There are descriptions of the main methods in various places around the internet.
What exactly are you trying to optimize? What is the constraint manifold? Sometimes there are tricks for specific kinds of constraints.
Bear in mind that most of the other methods are faster than gradient descent because they somehow take the second derivative into account (e.g. Gauss-Newton, conjugate gradient, BFGS, even Barzilai-Borwein).
There's room for lots of cleverness in the details, but you may have to take advantage of the specifics of your system.
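As a small illustration of the Barzilai-Borwein rule mentioned above, here is a sketch on a toy quadratic of my own choosing. BB is still a gradient method, but the step length is inferred from differences of successive iterates and gradients, which is where the second-order information sneaks in:

```python
# Barzilai-Borwein (BB1) step-size sketch on a made-up quadratic:
# D(x) = 0.5 * (x0^2 + 10 * x1^2), whose minimizer is (0, 0).

def grad(x):
    return (x[0], 10.0 * x[1])

x_prev = (10.0, 1.0)
g_prev = grad(x_prev)
x = tuple(xi - 0.01 * gi for xi, gi in zip(x_prev, g_prev))  # one plain step
for _ in range(100):
    g = grad(x)
    s = tuple(a - b for a, b in zip(x, x_prev))      # iterate difference
    y = tuple(a - b for a, b in zip(g, g_prev))      # gradient difference
    denom = sum(si * yi for si, yi in zip(s, y))
    if denom == 0.0:                                 # already converged
        break
    step = sum(si * si for si in s) / denom          # BB1 step length
    x_prev, g_prev = x, g
    x = tuple(xi - step * gi for xi, gi in zip(x, g))
# x approaches (0, 0) far faster than a fixed small step would
```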
One different trick that works sometimes: suppose your constraint is given by f(x)=0 with f(x)>0 outside of A_m. For example, on the 2-sphere x^2+y^2+z^2=1 you could use f = (x^2+y^2+z^2-1)^2. Then try an unconstrained minimization on D + k*f where k is some positive constant. After it converges, increase k. When k is big enough it should force all the test points to stay close to A_m.
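A sketch of this penalty trick on the sphere, with a made-up objective D(p) = |p - (2,0,0)|^2 (mine, not from the discussion); the constrained minimizer is (1, 0, 0):

```python
# Penalty method sketch: minimize D + k*f by plain gradient descent, where
# f(p) = (x^2 + y^2 + z^2 - 1)^2 penalizes leaving the unit sphere A_m,
# and k is increased after each unconstrained minimization converges.

def grad_penalized(p, k):
    x, y, z = p
    s = x * x + y * y + z * z - 1.0       # constraint residual, 0 on A_m
    # gradient of D(p) = |p - (2,0,0)|^2 plus gradient of k*f = 4*k*s*p
    return (2.0 * (x - 2.0) + 4.0 * k * s * x,
            2.0 * y + 4.0 * k * s * y,
            2.0 * z + 4.0 * k * s * z)

def minimize_penalized(p, k, step, iters=2000):
    for _ in range(iters):
        g = grad_penalized(p, k)
        p = tuple(pi - step * gi for pi, gi in zip(p, g))
    return p

p = (0.5, 0.0, 0.0)                       # arbitrary start
for k in (1.0, 10.0, 100.0):              # tighten the penalty in stages
    p = minimize_penalized(p, k, step=0.1 / (1.0 + k))  # shrink step as k grows
# p ends up close to (1, 0, 0) and nearly on the sphere
```

Note the step size has to shrink as k grows, since the penalty term makes the problem increasingly stiff; that trade-off is the usual practical cost of big k.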
(no subject)
Date: 2008-12-28 08:08 pm (UTC)

<< Then try an unconstrained minimization on D + k*f where k is some positive constant. After it converges, increase k. When k is big enough it should force all the test points to stay close to A_m. >>
It seems that all we need to do is penalize distance from the manifold (i.e., penalize big failures to meet the constraint). To overcome maxima of the objective that lie outside the manifold, we need a big k.
(no subject)
Date: 2008-12-28 08:10 pm (UTC)

(no subject)

Date: 2008-12-28 08:40 pm (UTC)

(no subject)
Date: 2008-12-29 03:20 pm (UTC)

A is the linear 4D space of joint Markov Chains whose state-space is the Cartesian product of the state-spaces of Q and R, and that are compatible with the constraint that Q and R are partial views.
A_m is the subset of A in which the mutual information of the Q and R views is equal to m.
The function I'm trying to optimize is the mixing rate of the joint Markov Chain (Doeblin contraction coefficient).
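For concreteness, here is a sketch of the quantity the constraint pins down: the mutual information between the Q-view and the R-view, computed from a joint distribution over the product state space (the two distributions below are toy numbers of mine, not from the post):

```python
# Mutual information (in nats) between the two views of a joint distribution
# given as a dict mapping (q, r) -> probability.
import math

def mutual_information(joint):
    pq, pr = {}, {}
    for (q, r), p in joint.items():       # marginals of the two views
        pq[q] = pq.get(q, 0.0) + p
        pr[r] = pr.get(r, 0.0) + p
    mi = 0.0
    for (q, r), p in joint.items():
        if p > 0.0:
            mi += p * math.log(p / (pq[q] * pr[r]))
    return mi

independent = {("a", "c"): 0.25, ("a", "d"): 0.25,
               ("b", "c"): 0.25, ("b", "d"): 0.25}
correlated = {("a", "c"): 0.5, ("b", "d"): 0.5}
# independent views give mutual information 0; perfectly correlated views, log 2
```

A_m would then be the set of chains whose views give mutual_information(...) == m, which makes clear how non-linear the constraint surface is.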
(no subject)
Date: 2008-12-29 03:28 pm (UTC)

Yes. And computation is not free.
But there are situations where you can avoid the waste (e.g. if the constraint yields a linear subspace)... and to be a bit more general, where the curvature is small, you can use the linear spline idea: approximate the constraint manifold by linear meshes.
(no subject)
Date: 2008-12-29 07:47 pm (UTC)

(no subject)
Date: 2008-12-29 08:41 pm (UTC)

Let the state-space of Q be {a,b} and that of R be {c,d}; then the state-space of Z is {(a,c),(a,d),(b,c),(b,d)}.
Z is in state (x,y) <=> Q is in state x and R is in state y.
We call Q and R "partial views" of Z.
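In code, this definition is just the product construction with coordinate projections (a toy illustration using the states from this example):

```python
# The joint chain Z lives on the product state space; Q and R are the
# coordinate projections ("partial views") of Z's state.
from itertools import product

Q_states = ["a", "b"]
R_states = ["c", "d"]
Z_states = list(product(Q_states, R_states))
# Z_states == [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]

def view_Q(z):
    return z[0]    # Q's state is the first coordinate of Z's state

def view_R(z):
    return z[1]    # R's state is the second coordinate
```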
(no subject)
Date: 2008-12-29 08:42 pm (UTC)

How did you find my blog?
(no subject)
Date: 2008-12-29 09:42 pm (UTC)

(no subject)
Date: 2008-12-29 09:49 pm (UTC)

In your original post you said something like "compatible with the constraint that Q and R are partial views of A," which does not seem to add to the original definition of A.
Or are you saying that the state space is the product, and the transition probabilities have to induce the correct probabilities for Q and R?
Am I missing anything?
(no subject)
Date: 2008-12-29 10:08 pm (UTC)