William liked my ideas about RHS-search optimization by learning productive RHS-states. After some thought, we decided that we're not sure whether this problem fits into a reinforcement-learning formulation (because that assumes that the optimal policy is known?). But we came up with a plan anyway:

To Do:
* implement best-first search (using a priority queue) (if memory problems appear, read this)
* look at previous successful searches, and use the frequency of the operators to estimate the value of the search nodes.
* 1-gram model: the probability of a node being successful is the probability of its parent being successful times the success probability associated with the last operator added (estimated as above). (Use the log formulation; a rough sketch combining this scoring with the best-first search appears after this list.)
* optionally, we could also condition on the skill-name, and see if that gives us an improvement!
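
Here is a minimal sketch of how I imagine the first three items fitting together, assuming we have the operator sequences from previous successful searches available. All of the names (estimate_operator_probs, expand, is_goal, etc.) are placeholders rather than existing code; a node's priority is just the cumulative log success probability of the operators on its path:

import heapq
import math
from collections import Counter

def estimate_operator_probs(successful_paths, smoothing=1.0):
    # successful_paths: list of operator-name sequences from past solutions
    counts = Counter(op for path in successful_paths for op in path)
    total = sum(counts.values()) + smoothing * len(counts)
    return {op: (c + smoothing) / total for op, c in counts.items()}

def best_first_search(start, expand, is_goal, op_logprob, default_logprob=-10.0):
    # Best-first search over a priority queue: heapq is a min-heap, so we
    # push the negated cumulative log-probability; the counter breaks ties
    # so nodes themselves are never compared.
    frontier = [(0.0, 0, start)]
    counter = 0
    while frontier:
        neg_logp, _, node = heapq.heappop(frontier)
        if is_goal(node):
            return node
        for operator, child in expand(node):  # expand yields (operator, child-node) pairs
            counter += 1
            logp = op_logprob.get(operator, default_logprob)
            heapq.heappush(frontier, (neg_logp - logp, counter, child))
    return None

# op_logprob would be built as
#   {op: math.log(p) for op, p in estimate_operator_probs(paths).items()}

Conditioning on the skill-name (the last item) would just mean keeping one such probability table per skill and picking the right one before searching.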

Next step: since there are dependencies between the operators, we may want to use an n-gram model.
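
As a rough sketch of what that might look like for n = 2 (again with placeholder names, and the same successful_paths as above), each operator would be conditioned on the one applied just before it, and a node's score becomes the sum of log P(op_i | op_{i-1}):

from collections import Counter, defaultdict

def estimate_bigram_probs(successful_paths, smoothing=1.0):
    # pair_counts[prev][op] = how often op followed prev on a successful path
    pair_counts = defaultdict(Counter)
    vocab = set(op for path in successful_paths for op in path)
    for path in successful_paths:
        for prev, op in zip(["<start>"] + list(path), path):
            pair_counts[prev][op] += 1
    # add-one-style smoothing so unseen pairs don't get probability zero
    return {
        prev: {op: (counts.get(op, 0) + smoothing) / (sum(counts.values()) + smoothing * len(vocab))
               for op in vocab}
        for prev, counts in pair_counts.items()
    }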

He suggested using an n-state HMM as a way to model the memory. (I don't understand this well).

On Learning HMMs:
If you can observe the states of the HMM (i.e. it's a non-hidden MM), then you can learn the transition probabilities directly.
If you can only observe the states of the HMM indirectly (i.e. only through the emitted signal sigma), you need to find the most likely sequence of states (e.g. using Viterbi) (the E-step?), and use that to do the above (the M-step?). This is an instance of EM. (But wouldn't it be more correct to use multiple high-likelihood state sequences, weighting each one appropriately, instead of only using the maximum-likelihood one?)
See http://en.wikipedia.org/wiki/Baum-Welch_algorithm
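
For concreteness, here is a minimal sketch of the fully observed case (a non-hidden Markov model), where the maximum-likelihood transition probabilities are just normalized transition counts. When only the signal is visible, Baum-Welch replaces these hard counts with expected counts computed from the forward-backward posteriors, which is essentially the "weight multiple high-likelihood sequences" idea above, while using only the single Viterbi path is the hard-EM approximation:

from collections import Counter, defaultdict

def estimate_transitions(state_sequences, smoothing=1.0):
    # state_sequences: list of fully observed state sequences
    counts = defaultdict(Counter)
    states = set(s for seq in state_sequences for s in seq)
    for seq in state_sequences:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    # normalized (and lightly smoothed) transition counts
    return {
        s: {t: (counts[s].get(t, 0) + smoothing) / (sum(counts[s].values()) + smoothing * len(states))
            for t in states}
        for s in states
    }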