manifold learning
May. 14th, 2010 02:13 pm
This plot of European genomes is an almost perfect match with the map of Europe.

Interestingly, PC1 is pretty well aligned with North<->South, i.e. the genomes vary most along latitude. This suggests that (a) most migrations happened East<->West, and/or (b) there has been less evolutionary pressure in East<->West migrations(?). It would be good to see the statistical significance of this directionality, e.g. by running a bootstrap. And of course, to see whether this holds in other parts of the world... e.g. how about diagonally-aligned landmasses, such as Sumatra?
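A minimal sketch of the bootstrap suggested above, on synthetic data. Everything in it is an assumption made for illustration: the toy genotype matrix `X`, the coordinates `lonlat`, the helper names, and the choice to measure "directionality" as the angle of the least-squares geographic gradient of the PC1 scores (0 = East<->West, 90 = North<->South). It is not the method used to produce the plot.

```python
import numpy as np

def pc1_scores(X):
    """Scores of each row of X on the first principal component."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

def pc1_geographic_angle(X, lonlat):
    """Angle in degrees, in [0, 180), of the geographic direction along
    which the PC1 score changes fastest (0 = East<->West, 90 = North<->South)."""
    scores = pc1_scores(X)
    G = lonlat - lonlat.mean(axis=0)
    # Least-squares fit: score ~ a*lon + b*lat; (a, b) is the gradient.
    coef, *_ = np.linalg.lstsq(G, scores, rcond=None)
    # The sign of a PC is arbitrary, so directions are taken modulo 180.
    return np.degrees(np.arctan2(coef[1], coef[0])) % 180.0

def bootstrap_angles(X, lonlat, n_boot=200, seed=0):
    """Recompute the angle on resampled sets of individuals."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    angles = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample individuals with replacement
        angles[b] = pc1_geographic_angle(X[idx], lonlat[idx])
    return angles

# Synthetic stand-in for real data (assumption): 300 individuals, 50 markers,
# with a strong latitude signal and a weaker longitude signal.
rng = np.random.default_rng(1)
n, p = 300, 50
lonlat = rng.uniform(-1, 1, size=(n, 2))              # columns: lon, lat
w_lat = np.concatenate([np.ones(25), np.zeros(25)])   # markers tracking latitude
w_lon = np.concatenate([np.zeros(25), np.ones(25)])   # markers tracking longitude
X = (3.0 * np.outer(lonlat[:, 1], w_lat)
     + 1.0 * np.outer(lonlat[:, 0], w_lon)
     + 0.5 * rng.normal(size=(n, p)))

angles = bootstrap_angles(X, lonlat)
lo, hi = np.percentile(angles, [2.5, 97.5])
print(f"median angle: {np.median(angles):.1f} deg, 95% interval: [{lo:.1f}, {hi:.1f}]")
```

On real data, a narrow percentile interval around 90 degrees would support the North<->South reading; a wide one would not. The modulo-180 step assumes the angles are not clustered near 0/180, where a circular statistic would be needed instead of a plain percentile.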
You can also see that some national borders are sharper than others, e.g. the Iberian peninsula and Italy seem to be fairly isolated from the rest (probably due to the Pyrenees and the Alps, respectively).
Looking at the "residuals", Romania appears a lot further South than its geographical location. Bulgaria doesn't as much, despite having more Romani per capita. (Is there software to draw cartograms from this type of data?)
Italy looks like the most diverse country (at least, in the first two PCs), since it has the most spread-out cloud. (Also note the 5 individuals who look like they're from Sardinia!)
Placing Slovakia in Southern Italy is probably an artifact of a very small sample size.
See also: manifold learning, the fascinating idea of discovering topological structure from data, which can be interpreted as things like phylogenetic trees or developmental paths (e.g. in the face dataset).

(no subject)
Date: 2010-05-14 11:55 pm (UTC)
(no subject)
Date: 2010-05-15 12:58 am (UTC)
(no subject)
Date: 2010-05-15 01:06 am (UTC)
(no subject)
Date: 2010-05-15 01:30 am (UTC)
What is plotted here are the individuals after projecting them onto the plane defined by the first two PCs. What is remarkable is that:
(a) if you orient/flip it the right way, it looks a lot like a map of Europe. (PCA is up to rotations and flips)
(b) PC1 is very close to North<->South. (PC2 is orthogonal to that by definition)
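To make the projection concrete, here is a small sketch of PCA via SVD on a made-up genotype matrix (the data, `pca_project`, and all variable names are hypothetical). It also checks the "flips" part: negating the components negates the scores, and nothing else changes.

```python
import numpy as np

def pca_project(X, k=2):
    """Project the rows of X onto the first k principal components."""
    Xc = X - X.mean(axis=0)                      # center each column (marker)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # orthonormal directions, one per row
    scores = Xc @ components.T                   # coordinates in the PC plane
    explained = S[:k] ** 2 / np.sum(S ** 2)      # fraction of variance per PC
    return scores, components, explained

# Toy data (assumption): 100 individuals, 20 markers coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 20)).astype(float)

scores, components, explained = pca_project(X)

# Flipping the sign of the components gives an equally valid PCA:
# the scores are mirrored, the variance along each axis is unchanged.
scores_flipped = (X - X.mean(axis=0)) @ (-components).T
print(np.allclose(scores_flipped, -scores))
print(np.allclose(scores_flipped.var(axis=0), scores.var(axis=0)))
```

Each row of `scores` is one individual, so plotting column 0 against column 1 gives the kind of scatter discussed here.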
(no subject)
Date: 2010-05-15 01:41 am (UTC)
(no subject)
Date: 2010-05-15 10:11 am (UTC)
But the statement "historically, most French people have had French parents" is not a terribly surprising one, nor can you draw any strong evolutionary inferences from it.
Of course it will. It has *always* been the case that people are more likely to marry people near them than far away.
(no subject)
Date: 2010-05-15 11:12 am (UTC)
As for the selective pressure thing, when I say "A suggests B", I mean: for laymen like me, B is a plausible scenario for generating observations A. In this case, I was speculating that being at the "wrong" latitude makes a population unfit and thus quicker to change... but this is probably much less significant than (a).
(Annoyingly, I've previously argued over the meaning of "suspect" :-p maybe I need a better set of words.)
(no subject)
Date: 2010-05-15 03:52 pm (UTC)
Also, I understand "up to flips", but "up to rotations" is not true, right? The fact that PC1 is nearly north-south axis-aligned is definitely not true "up to rotations".
(no subject)
Date: 2010-05-15 05:51 pm (UTC)
I'm asking you to define a way of assigning "North" in a randomly distorted map of Europe. What I suspect you mean by a "correspondence" means that a perfect correspondence isn't possible if the map has nonlinear distortions.
* PCA is not based on similarity.
* Never mind what I said about rotations.
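On the rotation question in this thread, a quick numerical check may help. The sketch below uses made-up 2-D data (all names and numbers are assumptions) and scans every direction in the plane: the variance along a direction peaks at PC1, so rotating the axes away from it gives axes that are no longer the principal ones, as long as the two variances differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D cloud (assumption): spread 3 along one axis, 1 along the other,
# then rotated by 30 degrees so the long axis is not aligned with x or y.
Z0 = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
t = np.radians(30.0)
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
Z = Z0 @ R.T
Z -= Z.mean(axis=0)

def variance_along(Z, theta):
    """Variance of the centered data along the direction at angle theta (radians)."""
    d = np.array([np.cos(theta), np.sin(theta)])
    return np.var(Z @ d)

# Scan directions on a 1-degree grid over [0, 180].
thetas = np.linspace(0.0, np.pi, 181)
variances = np.array([variance_along(Z, th) for th in thetas])
best = thetas[np.argmax(variances)]

# PC1 from the SVD; its sign is arbitrary, so take the angle modulo 180 degrees.
_, S, Vt = np.linalg.svd(Z, full_matrices=False)
pc1_angle = np.arctan2(Vt[0, 1], Vt[0, 0]) % np.pi

print(f"direction of maximum variance: {np.degrees(best):.0f} deg")
print(f"PC1 direction:                 {np.degrees(pc1_angle):.1f} deg")
```

So the axes are pinned down by the data up to sign flips. Rotating the finished plot for display is a separate matter and changes nothing about which direction is PC1.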
(no subject)
Date: 2010-05-16 01:34 am (UTC)
(no subject)
Date: 2010-05-16 07:46 am (UTC)
... which makes it even more probable that East<->West diffusion was more prevalent than North<->South diffusion.
(no subject)
Date: 2010-05-16 01:34 pm (UTC)
The following article offers some empirical support for his suggestion:
http://www.eeb.uconn.edu/people/turchin/PDF/Turchin_Adams_Hall_2006.pdf
(Turchin's interesting, btw)
Jon
(no subject)
Date: 2010-05-16 04:41 pm (UTC)
(no subject)
Date: 2010-05-16 06:28 pm (UTC)
(no subject)
Date: 2010-05-19 04:22 pm (UTC)
(no subject)
Date: 2010-05-22 01:12 am (UTC)
I wonder what the requirements were for the people these genomes were taken from. For example, if you are marked as "IT" on there, did you just have to be born in Italy, or did your parents have to be from there, or did your family have to live there for generations?