Lately, I've been faced with problems of the following type:
* there are many ways to represent an object
* a canonical representation is either hard to define, or unnatural to work with

(1) optimizing over the space of k-dimensional linear subspaces (a.k.a. k-flats). They can be represented by bases (more canonically, orthonormal bases), but a basis does not identify a subspace uniquely. We can define k-flats to be points in a Grassmannian, which is the appropriate quotient space (in which bases of the same subspace form an equivalence class). But optimizing there involves doing calculus on manifolds, which requires knowledge of differential geometry... or, more easily, one could use Lagrange multipliers, but I don't have enough constraints (I don't know how to constrain the rotation of the orthonormal basis).
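One way around the quotient is to optimize over orthonormal bases directly, and after each gradient step "retract" back onto the set of orthonormal matrices with a QR decomposition. Here is a minimal sketch with numpy, using a toy objective (maximize tr(U^T M U), whose optimum is the span of the top-k eigenvectors of M, so we can check the answer); the step size and iteration count are arbitrary choices, not tuned:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2

# Toy objective: maximize f(U) = tr(U^T M U) over n-by-k matrices with
# orthonormal columns.  f depends only on span(U), so this is really an
# optimization over the Grassmannian Gr(k, n); its maximizer is the span
# of the top-k eigenvectors of M.
A = rng.standard_normal((n, n))
M = A @ A.T  # symmetric positive semidefinite

U = np.linalg.qr(rng.standard_normal((n, k)))[0]  # random starting basis
for _ in range(1000):
    G = 2 * M @ U                      # Euclidean gradient of tr(U^T M U)
    G = G - U @ (U.T @ G)              # project out the within-subspace part
    U = np.linalg.qr(U + 0.01 * G)[0]  # take a step, retract via QR

# Compare projection matrices, which are basis-independent representatives
# of the subspaces (two bases span the same k-flat iff U U^T agrees).
w, V = np.linalg.eigh(M)
Vk = V[:, -k:]  # top-k eigenvectors
print(np.allclose(U @ U.T, Vk @ Vk.T, atol=1e-5))
```

The projection matrix UU^T is itself a canonical (if redundant) representation of the subspace: it is the same for every basis of the same k-flat, which is why the final comparison makes sense.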

(2) suppose you have an unknown probability distribution over clusterings of nodes. You can easily assign a label to each node such that two nodes are in the same cluster iff they have the same label. Of course, these labels are just names, and permuting them results in an equivalent structure, which necessarily has the same probability. Now, suppose you get several samples from this distribution (say, by MCMC), and you want to estimate the distribution. For each sample, you could simply add the other n!-1 relabelings ("mirror images"), but can you do better?
This is known as the "label-switching problem", and in some cases it can be addressed by "artificial identifiability constraints" (e.g. of the form: WLOG, node 1 is always labeled "a"), but this too can become unmanageable and unnatural.
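Instead of expanding each sample into all its relabelings, one can map every labeling to a canonical representative: relabel clusters by order of first appearance. This is really just a systematic version of the identifiability constraint above. A minimal sketch:

```python
def canonical(labels):
    """Relabel clusters by order of first appearance, so that any two
    labelings inducing the same partition map to the same tuple."""
    first_seen = {}
    out = []
    for lab in labels:
        if lab not in first_seen:
            first_seen[lab] = len(first_seen)
        out.append(first_seen[lab])
    return tuple(out)

# Two relabelings of the same partition collapse to one canonical form:
print(canonical(["a", "a", "b", "c", "b"]))  # (0, 0, 1, 2, 1)
print(canonical(["x", "x", "z", "a", "z"]))  # (0, 0, 1, 2, 1)
```

To estimate the distribution from MCMC samples, one would then count canonical forms (e.g. with collections.Counter) rather than raw labelings, which collapses each equivalence class to a single bin for free.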

Tangentially, is there a nice way to represent graphs modulo isomorphism? I'm aware of spectral techniques (e.g. graph Laplacian), but these are usually lossy, i.e. querying whether two graphs are the "same" (i.e. isomorphic) will result in some false positives.
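For concreteness, here is what such a lossy spectral fingerprint looks like: the sorted Laplacian spectrum is invariant under relabeling the nodes, but non-isomorphic "cospectral mates" share it too, so spectrum equality is only a necessary condition for isomorphism. A small sketch:

```python
import numpy as np

def laplacian_spectrum(adj):
    """Sorted Laplacian eigenvalues: invariant under graph isomorphism,
    but not a complete invariant (cospectral non-isomorphic graphs exist,
    hence the false positives mentioned above)."""
    adj = np.asarray(adj, dtype=float)
    L = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(L)).round(8)

# A 4-cycle and a relabeled copy of it have identical spectra:
C4 = np.array([[0, 1, 0, 1],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [1, 0, 1, 0]])
perm = [2, 0, 3, 1]
C4_perm = C4[np.ix_(perm, perm)]
print(np.array_equal(laplacian_spectrum(C4), laplacian_spectrum(C4_perm)))  # True
```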

(no subject)

Date: 2009-10-16 01:47 am (UTC)
From: (Anonymous)
did it... if you use the orthogonal distance (regular euclidean distance), it turns out that the solution is the same as that given by PCA... that's where an eigendecomposition solves this problem... ;)

just to fix notation, the solution (k-flat) corresponds to the vector space spanned by the eigenvectors associated to the k largest eigenvalues of the matrix \sum_i{x_i x_i^T}, x_i being the points on that sphere...
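A quick numerical check of this claim (a sketch with synthetic unit vectors; the rows of `X` play the role of the x_i): the top-k eigenvectors of sum_i x_i x_i^T do minimize the total squared orthogonal distance, beating random k-dim subspaces.

```python
import numpy as np

rng = np.random.default_rng(1)
n, P, k = 3, 50, 2
X = rng.standard_normal((P, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # points on the unit sphere

M = X.T @ X              # = sum_i x_i x_i^T, an n-by-n matrix
w, V = np.linalg.eigh(M)
U = V[:, -k:]            # eigenvectors of the k largest eigenvalues

def sq_dist(U):
    """Total squared orthogonal distance from the rows of X to span(U)."""
    R = X - X @ U @ U.T  # residuals after orthogonal projection
    return (R ** 2).sum()

# The eigenvector subspace beats random k-dim subspaces:
best = sq_dist(U)
others = [sq_dist(np.linalg.qr(rng.standard_normal((n, k)))[0])
          for _ in range(20)]
print(best <= min(others))  # True
```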

if you really want the proof, I guess I could take some time later to write it down for you or just meet me at UBC some day...

nice exercise, though... I guess the geodesic distance would be way more complicated...

cheers,
Ives

(no subject)

Date: 2009-10-16 02:03 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Thanks for working on this. The geodesic distance is easy to write down, and probably convex.
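(For a unit vector x and a k-flat through the origin with orthonormal basis U, the nearest point of the flat's intersection with the sphere is the normalized projection of x, so the geodesic distance is arccos(||U^T x||). A sketch under that assumption:)

```python
import numpy as np

def geodesic_dist(x, U):
    """Great-circle distance from a unit vector x to the intersection of
    the unit sphere with span(U), where U has orthonormal columns: the
    angle between x and its normalized projection, arccos(||U^T x||)."""
    return float(np.arccos(np.clip(np.linalg.norm(U.T @ x), 0.0, 1.0)))

U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])  # the xy-plane in R^3
print(geodesic_dist(np.array([0.0, 0.0, 1.0]), U))  # pi/2: the pole
print(geodesic_dist(np.array([1.0, 0.0, 0.0]), U))  # 0.0: already in the plane
```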

(no subject)

Date: 2009-10-16 02:15 am (UTC)
From: (Anonymous)
you're welcome... regarding the geodesic distance, it is simple to write down, but optimizing it I guess wouldn't be so simple... regarding convexity, I don't know what you could do, since the sphere is not convex. I have some ideas to circumvent this but I'm not sure how useful they would be...

Ives

(no subject)

Date: 2009-10-16 02:32 am (UTC)
From: (Anonymous)
nope... but the closed ball is... I mean, a convex subset of the euclidean space...
maybe we are adopting the same name for different things... let me make it clear...
for me, the sphere is the set of x such that ||x||=1, and the closed ball is the set of x such that ||x|| <= 1
as a subset of the euclidean vector space (inheriting the usual addition and scalar multiplication), the sphere is not convex, but the closed ball is...
are you ok with that?

cheers,
Ives


(no subject)

Date: 2009-10-16 04:07 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
This square matrix, is it n-by-n (where n is the dimension of the Euclidean space), or P-by-P (where P is the number of points)?
I think the former. When you take all the eigenvectors, you end up with the original hypersphere.

I now understand why PCA is the solution here.
