William liked my ideas about RHS-search optimization by learning productive RHS-states. After some thought, we decided that we're not sure whether this problem fits into a reinforcement-learning formulation (because that assumes that the optimal policy is known?). But we came up with a plan anyway:

To Do:
* implement best-first search (using a priority queue) (if memory problems appear, read this)
* look at previous successful searches, and use the frequency of the operators to estimate the value of the search nodes.
* 1-gram model: the probability of a node being successful is the probability of its parent being successful times the success probability associated with the last operator added (estimated as above). (Use the log formulation; a rough sketch combining this scoring with the best-first search appears after this list.)
* optionally, we could also condition on the skill-name, and see if that gives us an improvement!
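
Here is a minimal sketch of how I imagine the first three items fitting together, assuming we have the operator sequences from previous successful searches available. All of the names (estimate_operator_probs, expand, is_goal, etc.) are placeholders rather than existing code; a node's priority is just the cumulative log success probability of the operators on its path:

import heapq
import math
from collections import Counter

def estimate_operator_probs(successful_paths, smoothing=1.0):
    # successful_paths: list of operator-name sequences from past solutions
    counts = Counter(op for path in successful_paths for op in path)
    total = sum(counts.values()) + smoothing * len(counts)
    return {op: (c + smoothing) / total for op, c in counts.items()}

def best_first_search(start, expand, is_goal, op_logprob, default_logprob=-10.0):
    # Best-first search over a priority queue: heapq is a min-heap, so we
    # push the negated cumulative log-probability; the counter breaks ties
    # so nodes themselves are never compared.
    frontier = [(0.0, 0, start)]
    counter = 0
    while frontier:
        neg_logp, _, node = heapq.heappop(frontier)
        if is_goal(node):
            return node
        for operator, child in expand(node):  # expand yields (operator, child-node) pairs
            counter += 1
            logp = op_logprob.get(operator, default_logprob)
            heapq.heappush(frontier, (neg_logp - logp, counter, child))
    return None

# op_logprob would be built as
#   {op: math.log(p) for op, p in estimate_operator_probs(paths).items()}

Conditioning on the skill-name (the last item) would just mean keeping one such probability table per skill and picking the right one before searching.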

Next step: since there are dependencies between the operators, we may want to use an n-gram model.
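
As a rough sketch of what that might look like for n = 2 (again with placeholder names, and the same successful_paths as above), each operator would be conditioned on the one applied just before it, and a node's score becomes the sum of log P(op_i | op_{i-1}):

from collections import Counter, defaultdict

def estimate_bigram_probs(successful_paths, smoothing=1.0):
    # pair_counts[prev][op] = how often op followed prev on a successful path
    pair_counts = defaultdict(Counter)
    vocab = set(op for path in successful_paths for op in path)
    for path in successful_paths:
        for prev, op in zip(["<start>"] + list(path), path):
            pair_counts[prev][op] += 1
    # add-one-style smoothing so unseen pairs don't get probability zero
    return {
        prev: {op: (counts.get(op, 0) + smoothing) / (sum(counts.values()) + smoothing * len(vocab))
               for op in vocab}
        for prev, counts in pair_counts.items()
    }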

He suggested using an n-state HMM as a way to model the memory. (I don't understand this well).

On Learning HMMs:
If you can observe the states of the HMM (i.e. it's a non-hidden MM), then you can learn the transition probabilities directly.
If you can only observe the states of the HMM indirectly (i.e. only through the emitted signal sigma), you need to find the most likely sequence of states (e.g. using Viterbi) (the E-step?), and use that to do the above (the M-step?). This is an instance of EM. (But wouldn't it be more correct to use multiple high-likelihood state sequences, weighting each one appropriately, instead of only using the maximum-likelihood one?)
See http://en.wikipedia.org/wiki/Baum-Welch_algorithm
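
For concreteness, here is a minimal sketch of the fully observed case (a non-hidden Markov model), where the maximum-likelihood transition probabilities are just normalized transition counts. When only the signal is visible, Baum-Welch replaces these hard counts with expected counts computed from the forward-backward posteriors, which is essentially the "weight multiple high-likelihood sequences" idea above, while using only the single Viterbi path is the hard-EM approximation:

from collections import Counter, defaultdict

def estimate_transitions(state_sequences, smoothing=1.0):
    # state_sequences: list of fully observed state sequences
    counts = defaultdict(Counter)
    states = set(s for seq in state_sequences for s in seq)
    for seq in state_sequences:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    # normalized (and lightly smoothed) transition counts
    return {
        s: {t: (counts[s].get(t, 0) + smoothing) / (sum(counts[s].values()) + smoothing * len(states))
            for t in states}
        for s in states
    }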