manifold learning
May 14th, 2010 02:13 pm
This PCA plot of European genomes matches the map of Europe almost perfectly.
[Figure: European genomes plotted on the first two principal components, shown next to a map of Europe]
Interestingly, PC1 is fairly well aligned with the North<->South axis, i.e. the main axis of genetic variation among these samples runs North<->South. This suggests that (a) most migrations happened East<->West, and/or (b) there has been less evolutionary pressure on East<->West migrations(?). It would be good to test the statistical significance of this directionality, e.g. with a bootstrap. And of course, to see whether it holds in other parts of the world: how about diagonally aligned landmasses, such as Sumatra?
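One way to put a number on that directionality: compute PC1 by SVD, regress each individual's PC1 score on longitude and latitude, read off the compass bearing of that gradient, and bootstrap over individuals. This is a minimal sketch under assumptions, not the analysis behind the plot. It assumes a genotype matrix `genotypes` (individuals x SNPs, coded 0/1/2) plus per-individual `lon`/`lat` arrays, and all function names are hypothetical. The demo runs on simulated data, not real genomes.

```python
import numpy as np


def top_pcs(genotypes, k=2):
    """Scores of each individual on the top-k principal components (SVD of centred data)."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :k] * s[:k]


def pc1_bearing(genotypes, lon, lat):
    """Compass bearing (degrees; 0 = East, 90 = North) along which PC1 changes fastest.

    Fits PC1 ~ 1 + lon + lat by least squares and takes the angle of the gradient.
    PC signs are arbitrary, so the angle is folded into [0, 180).
    """
    pc1 = top_pcs(genotypes, k=1)[:, 0]
    design = np.column_stack([np.ones_like(lon), lon, lat])
    coef, *_ = np.linalg.lstsq(design, pc1, rcond=None)
    return np.degrees(np.arctan2(coef[2], coef[1])) % 180.0


def bootstrap_bearing(genotypes, lon, lat, n_boot=200, seed=0):
    """Bootstrap distribution of the bearing, resampling individuals with replacement."""
    rng = np.random.default_rng(seed)
    n = genotypes.shape[0]
    angles = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        angles[b] = pc1_bearing(genotypes[idx], lon[idx], lat[idx])
    return angles


if __name__ == "__main__":
    # Simulated data only: allele frequencies drift strongly with latitude, weakly with longitude.
    rng = np.random.default_rng(1)
    n, m = 300, 500
    lon, lat = rng.uniform(-10, 30, n), rng.uniform(35, 60, n)
    z_lon, z_lat = (lon - 10) / 20, (lat - 47.5) / 12.5
    p = 0.5 + 0.3 * np.outer(z_lat, rng.normal(size=m)) + 0.05 * np.outer(z_lon, rng.normal(size=m))
    genotypes = rng.binomial(2, np.clip(p, 0.05, 0.95)).astype(float)
    angles = bootstrap_bearing(genotypes, lon, lat)
    lo, hi = np.percentile(angles, [2.5, 97.5])
    print(f"PC1 bearing: {pc1_bearing(genotypes, lon, lat):.1f} deg, 95% CI [{lo:.1f}, {hi:.1f}]")
```

Two caveats of this simple design: degrees of longitude shrink towards the poles, so a real analysis would project coordinates first; and the folded angle wraps around at 0/180, so percentile intervals only make sense when the bearing is away from pure East<->West.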
You can also see that some national borders are sharper than others: e.g. the Iberian Peninsula and Italy seem fairly isolated from the rest (probably due to the Pyrenees and the Alps).
Looking at the "residuals" (where a country's samples land relative to its actual location), Romania appears a lot further South than its geographical location. Bulgaria does so less, despite having more Romani per capita. (Is there software to draw cartograms from this type of data?)
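The "residuals" can be made concrete by aligning the PC coordinates to the map with a Procrustes fit (rotation, uniform scaling, translation), then averaging, per country, the offset between where its samples land and where they were collected. A hedged sketch: `pcs` (individuals x 2), `geo` (longitude, latitude per individual) and `countries` are assumed inputs, the function names are hypothetical, and the fit allows a reflection because PC signs are arbitrary.

```python
import numpy as np


def procrustes_align(pcs, geo):
    """Map PC coordinates onto the map by the least-squares similarity transform.

    Finds the rotation (or reflection) and uniform scale minimising
    ||scale * (pcs - pc_mean) @ rot + geo_mean - geo||^2. Both inputs are (n, 2).
    """
    pc_mean, geo_mean = pcs.mean(axis=0), geo.mean(axis=0)
    a, b = pcs - pc_mean, geo - geo_mean
    u, s, vt = np.linalg.svd(a.T @ b)
    rot = u @ vt  # orthogonal; may include a reflection, since PC signs are arbitrary
    scale = s.sum() / (a ** 2).sum()
    return scale * (a @ rot) + geo_mean


def country_residuals(pcs, geo, countries):
    """Mean (d_lon, d_lat) offset of each country's aligned samples from their sampling locations."""
    resid = procrustes_align(pcs, geo) - geo
    return {c: resid[countries == c].mean(axis=0) for c in np.unique(countries)}
```

In this convention, a country that "appears further South" would show a negative second (latitude) component in its residual.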
Italy looks like the most diverse country (at least in the first two PCs), since it has the most spread-out cloud. (Also note the 5 individuals who look like they're from Sardinia!)
Placing Slovakia in Southern Italy is probably an artifact of a very small sample size.
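Both of the last two observations can be quantified per country: the spread of a country's cloud (mean squared distance to its centroid in PC space) and the standard error of its centroid, which shrinks roughly as 1/sqrt(n), so a country with only a handful of samples can land far from its true position by chance. A minimal sketch with hypothetical names, assuming `pcs` (individuals x PCs) and an aligned array `countries`:

```python
import numpy as np


def country_summary(pcs, countries):
    """Per-country sample count, centroid, spread and centroid standard error in PC space.

    spread = mean squared distance of a country's samples to its centroid.
    se     = per-axis standard error of the centroid (NaN when only one sample).
    """
    summary = {}
    for c in np.unique(countries):
        pts = pcs[countries == c]
        n = len(pts)
        centroid = pts.mean(axis=0)
        spread = float(((pts - centroid) ** 2).sum(axis=1).mean())
        if n > 1:
            se = pts.std(axis=0, ddof=1) / np.sqrt(n)
        else:
            se = np.full(pcs.shape[1], np.nan)
        summary[c] = {"n": n, "centroid": centroid, "spread": spread, "se": se}
    return summary
```

Reporting `n` and `se` next to each centroid makes it easy to see when a surprising placement rests on very few samples.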
See also: manifold learning, the fascinating idea of discovering topological structure in data, structure that can be interpreted as things like phylogenetic trees or developmental paths (e.g. in the face dataset).
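For a concrete taste of manifold learning, here is a compact sketch of Isomap, one classic method (a swapped-in illustration, not something the post prescribes): build a k-nearest-neighbour graph, approximate geodesic distances by shortest paths through it, then embed those distances with classical MDS. The function name and the default `n_neighbors` are assumptions, and the dense Floyd-Warshall step is O(n^3), so it only suits a few hundred points.

```python
import numpy as np


def isomap(X, n_neighbors=6, n_components=1):
    """Embed the rows of X using geodesic (graph shortest-path) distances plus classical MDS."""
    n = X.shape[0]
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Symmetrised k-nearest-neighbour graph; non-edges start at infinity.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    for i in range(n):
        graph[i, nearest[i]] = dist[i, nearest[i]]
        graph[nearest[i], i] = dist[i, nearest[i]]
    # Floyd-Warshall: geodesic distance approximated by the shortest path through the graph.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])
    if np.isinf(graph).any():
        raise ValueError("neighbour graph is disconnected; increase n_neighbors")
    # Classical MDS: double-centre the squared distances and keep the top eigenvectors.
    centring = np.eye(n) - 1.0 / n
    b = -0.5 * centring @ (graph ** 2) @ centring
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

On points sampled along a spiral with more than one turn, the 1-D Isomap coordinate follows position along the curve, which no single linear projection such as PC1 can do.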