gusl: (Default)
[personal profile] gusl
I need to solve one of the above.

Goal: get all cyclic graphs that are admitted by the output of ICA, before looking at the coefficients. For edges that are not there, the entry will be close to 0 (if we had infinitely much data, they would be exactly zero). We want to permute the columns of K*W in such a way that all entries in the diagonal are non-zero.

I can think of two ways to do this:

* The lossless way: in the acyclic case, we are looking for a unique permutation, since the matrix will be permutable into (non-strict) lower-triangularity. For this, we use the Hungarian algorithm (linear assignment problem (LAP)) to find the permutation that maximizes SUM_m:diagonal(|1/m|). This was used by Shimizu, Hoyer, Hyvarinen in their JMLR2006 paper.

If we had an algorithm that solves the LAP to get the k best permutations, and can figure out what k is, that would solve my problem. But how to figure out k? If we have enough data, all acceptable permutations should cluster together at the top, so there should be a big gap above the first wrong permutation. So |1/m| seems like a good loss function.

I don't have a P-time algorithm that solves the LAP for the k best permutations. One heuristic is to do greedy swaps with random restarting, etc, but there are no guarantees this way. I have my fingers crossed that some Computer Scientist will point me to a P-time algorithm with guarantees.


* the lossy way: do significance tests to figure out which entries are meant to be zero (bootstrap sampling is the easy way to do this). Then solve the problem logically (constrained n-rooks problem).


It would seem that the lossless way is better, because it doesn't explicitly throw out information. OTOH, doing significance tests seems like a good idea, because if we don't do them, zeros with a large deviation will not be treated like zeros (and we end up accepting permutations that contain these zeros).

Here's an idea for combining the two: only for the purposes of deciding which permutations to accept, divide each entry by the its s.d. in the bootstrap samples. Now, all entries have the same s.d... and then go the lossless way, by finding solutions to the LAP.

February 2020

S M T W T F S
      1
2345678
9101112131415
16171819202122
23242526272829

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags