convex optimization
Apr. 23rd, 2011 06:41 pm

Is there a name for the following gradient ascent algorithm?
The trick of shrinking the step size only when the objective degrades is due to random_walker. (Hm, maybe I should use 0.5 rather than 0.9.)
gradientAscent <- function(f, state, stepSize, epsilon) {
  L2norm <- function(v) sqrt(sum(v^2))
  oldObj <- f(state)          ## objective at the starting point
  oldState <- state
  increment <- Inf            ## make sure the loop runs at least once
  while (L2norm(increment) > epsilon) {
    grad <- computeGrad(f, state)
    increment <- grad * stepSize
    state <- state + increment
    newObj <- f(state)        ## objective
    if (newObj < oldObj) {    ## in case the objective degrades: go back, decrease step size
      state <- oldState
      stepSize <- stepSize * 0.9
    } else {                  ## otherwise, keep going
      oldObj <- newObj
      oldState <- state
    }
  }
  return(state)
}
Although my epsilon is 1e-10 (for both the gradient computation and the above optimization algorithm), different starting points give answers within 1e-3 of each other... I can't see why this would happen... if a path has reached an increment size of 1e-10, that should mean that steps larger than that were overshooting, and therefore the true local maximum can't be further than 1e-10 away. Where am I going wrong?
---
I'm finding that this algorithm often keeps moving in the same direction for a long time, a sign that the step size has gotten too small... so I'm thinking I should also increase the step size whenever the objective is not degraded, though not as fast as the decrease when it is degraded: e.g. multiply by 0.5 when it's degraded and by 1.2 otherwise. If we have smoothness, then near the optimum we should see degradation about half the time, so this way we still get convergence.
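A minimal 1-D Python sketch of this up/down variant (the 0.5/1.2 factors come from the idea above; the function names, the stopping rule, and the toy objective at the end are stand-ins I've assumed for illustration):

```python
def gradient_ascent_updown(f, grad, x, step=1.0, eps=1e-10,
                           down=0.5, up=1.2, max_iter=100000):
    """1-D gradient ascent: shrink the step by `down` after a degrading
    step, grow it by `up` after an accepted one."""
    old_obj = f(x)
    for _ in range(max_iter):
        inc = step * grad(x)
        if abs(inc) <= eps:        # proposed step is tiny: stop
            break
        new_x = x + inc
        new_obj = f(new_x)
        if new_obj < old_obj:      # objective degraded: go back, shrink step
            step *= down
        else:                      # otherwise keep going, grow step
            x, old_obj = new_x, new_obj
            step *= up
    return x

# toy check: maximize f(x) = -(x - 3)^2, whose unique maximum is at x = 3
f = lambda x: -(x - 3.0) ** 2
g = lambda x: -2.0 * (x - 3.0)
x_star = gradient_ascent_updown(f, g, 0.0)
```

Since 0.5 * 1.2 = 0.6 < 1, alternating rejections and acceptances near the optimum still shrink the step size on net, which matches the convergence intuition above.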