gusl: (Default)
[personal profile] gusl
This plot of European genomes is an almost perfect match with the map of Europe.



Interestingly, PC1 is pretty well aligned with North<->South. This suggests that: (a) most migrations happened East<->West (b) there has been less evolutionary pressure in East<->West migrations(?). It would be good to see the statistical significance of this directionality, e.g. by running a bootstrap. And of course, to see whether this holds in other parts of the world... e.g. how about diagonally-aligned landmasses, such as Sumatra?
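The bootstrap mentioned above could look roughly like the sketch below. Everything in it is an assumption for illustration: the data are synthetic stand-ins (not the real genomes), longitude and latitude are treated as flat coordinates, and the helper name `pc1_angle_from_north` is made up. It resamples individuals with replacement, recomputes PC1 each time, and reports a percentile interval for the angle between PC1's geographic gradient and the North<->South axis (0 degrees would mean perfectly North<->South).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 50  # individuals, genetic features (e.g. SNPs)

# Synthetic stand-in data: sampling locations and features that depend
# linearly on location (more strongly on latitude, by construction) plus noise.
lon = rng.uniform(-10, 30, n)
lat = rng.uniform(36, 60, n)
geo = np.column_stack([lon, lat])
loadings = rng.normal(size=(2, p)) * np.array([[0.5], [1.0]])
G = (geo - geo.mean(axis=0)) @ loadings + rng.normal(scale=5.0, size=(n, p))

def pc1_angle_from_north(G, geo):
    """Angle in degrees (0..90) between PC1's geographic gradient and North<->South."""
    Gc = G - G.mean(axis=0)
    _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
    pc1_scores = Gc @ Vt[0]
    A = geo - geo.mean(axis=0)
    # Least-squares fit: pc1_scores ~ a * lon + b * lat
    coef, *_ = np.linalg.lstsq(A, pc1_scores, rcond=None)
    # Absolute values fold away the arbitrary sign of PC1.
    return np.degrees(np.arctan2(abs(coef[0]), abs(coef[1])))

# Bootstrap: resample individuals with replacement and recompute the angle.
boot = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot.append(pc1_angle_from_north(G[idx], geo[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap interval for the angle from North: {lo:.1f} to {hi:.1f} degrees")
```

A narrow interval near 0 would support the North<->South reading; a wide one would mean the alignment could easily be a sampling accident.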

You can also see that some national borders are sharper than others, e.g. the Iberian peninsula and Italy seem to be fairly isolated from the rest (probably due to the Pyrenees and the Alps).

Looking at the "residuals", Romania appears a lot further South than its geographical location. Bulgaria doesn't as much, despite having more Romani per capita. (Is there software to draw cartograms from this type of data?)

Italy looks like the most diverse country (at least, in the first two PCs), since it has the most spread-out cloud. (Also note the 5 individuals who look like they're from Sardinia!)

Placing Slovakia in Southern Italy is probably an artifact of a very small sample size.

See also: manifold learning, the fascinating idea of discovering topological structure from data, which can be interpreted as things like phylogenetic trees or developmental paths (e.g. in the face dataset).

(no subject)

Date: 2010-05-14 11:55 pm (UTC)
From: [identity profile] gwillen.livejournal.com
Can you give some more background on what this means? E.g. what the significance of the PC1 and PC2 axes are?

(no subject)

Date: 2010-05-15 12:58 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
PC = Principal Component, i.e. they run PCA on the set of genomes (where each genome is encoded as a set of "features" or variables, e.g. SNPs). PC1 is the first principal component, i.e. the "direction" that accounts for the most variance in the data.
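For the curious, here is a minimal sketch of what "running PCA" means, using plain NumPy on toy data. The data and the helper name `pca` are assumptions for illustration; this is not the pipeline used for the plot.

```python
import numpy as np

def pca(X, k=2):
    """Return the top-k principal directions and the fraction of variance each explains."""
    Xc = X - X.mean(axis=0)  # centre each feature (e.g. each SNP)
    # SVD of the centred data: the rows of Vt are the principal directions,
    # ordered from most to least variance explained.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:k], explained[:k]

rng = np.random.default_rng(0)
# Toy data: 200 "individuals", 3 features, with most of the spread along feature 0.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

components, explained = pca(X)
print(components[0])  # PC1: the direction that accounts for the most variance
print(explained)      # fraction of total variance along PC1 and PC2
```

Since the toy data are stretched along the first feature, PC1 should come out nearly parallel to that axis (up to sign).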

(no subject)

Date: 2010-05-15 01:06 am (UTC)
From: [identity profile] gwillen.livejournal.com
Ah!! Wow, it amazes me that the first two principal components are geographic axes, even more so that they are nearly axis-aligned. That's really neat.

(no subject)

Date: 2010-05-15 01:30 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
I think you misunderstood slightly.

What is plotted here are the individuals after projecting them onto the plane defined by the first two PCs. What is remarkable is that:

(a) if you orient/flip it the right way, it looks a lot like a map of Europe. (PCA is up to rotations and flips)

(b) PC1 is very close to North<->South. (PC2 is orthogonal to that by definition)
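A small sketch of the projection and of the "flip" ambiguity, on made-up data (the arrays and variable names here are assumptions, not the real dataset). Each individual becomes a point in the plane of the first two PCs, and negating a PC gives a mirrored but equally valid picture.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 100 "individuals", 5 features with decreasing spread.
X = rng.normal(size=(100, 5)) * np.array([4.0, 2.0, 1.0, 0.5, 0.5])
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

W = Vt[:2].T     # 5x2 matrix whose columns are PC1 and PC2
scores = Xc @ W  # each individual's 2-D coordinates in the plot

# Negating PC1 is just as valid an answer: the plot is mirrored,
# but the variance along each axis is unchanged.
W_flipped = W * np.array([-1.0, 1.0])
scores_flipped = Xc @ W_flipped

print(np.allclose(scores_flipped[:, 0], -scores[:, 0]))             # mirrored along PC1
print(np.allclose(scores.var(axis=0), scores_flipped.var(axis=0)))  # same variances
print(abs(W[:, 0] @ W[:, 1]) < 1e-10)                               # PC1 and PC2 are orthogonal
```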
Edited Date: 2010-05-15 01:38 am (UTC)

(no subject)

Date: 2010-05-15 01:41 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
btw, I added a little sentence about Italy.

(no subject)

Date: 2010-05-15 10:11 am (UTC)
From: [personal profile] neelk
"Interestingly, PC1 is pretty well aligned with North<->South. This suggests that: (a) most migrations happened East<->West (b) there has been less evolutionary pressure in East<->West migrations(?)."
You're misinterpreting the data -- this data tells us nothing about selective pressures. I went and looked at the paper in Nature about this, and what they were measuring was variations in single nucleotide polymorphisms. SNPs are just what they sound like: mutations in single base pairs, which typically have no phenotypic effect. Everybody is born with some new ones, which they pass on to their kids. What this does is give a way of using phylogenetic algorithms to infer genealogical trees.

But the statement "historically, most French people have had French parents" is not a terribly surprising one, nor can you draw any strong evolutionary inferences from this fact.

"And of course, to see whether this holds in other parts of the world... e.g. how about diagonally-aligned landmasses, such as Sumatra?"
Of course it will. It has *always* been the case that people are more likely to marry people near them than far away.

(no subject)

Date: 2010-05-15 11:12 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
What I am highlighting is the alignment of PC1 with North<->South. In Sumatra, the North-South alignment seems implausible, since it's more likely that migrations happened from one tip to the other.

As for the selective pressure thing, when I say "A suggests B", I mean: for laymen like me, B is a plausible scenario for generating observations A. In this case, I was speculating that being in the "wrong" latitude makes a population unfit and thus quicker to change... but this is probably much less significant than (a).

(Annoyingly, I've previously argued over the meaning of "suspect" :-p maybe I need a better set of words.)
Edited Date: 2010-05-15 11:16 am (UTC)

(no subject)

Date: 2010-05-15 03:52 pm (UTC)
From: [identity profile] gwillen.livejournal.com
Hmm, I don't think I misunderstood? Would you agree if I rephrased that as "it amazes me that the first two principal components happened to correspond so well to geographic axes"? I.e. I am amazed that the first two principal components of whatever measure of genetic similarity you are using happen to match geography so closely. (Although a later comment that "French people tend to have French parents" covers some of it; but that doesn't explain why geographically proximate countries are consistently proximate in gene-space as well.)

Also, I understand "up to flips", but "up to rotations" is not true, right? The fact that PC1 is nearly north-south axis-aligned is definitely not true "up to rotations".
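To make the rotation point concrete, here is a toy check (synthetic 2-D data, so purely an illustrative assumption): rotating the PC axes by 45 degrees keeps the total variance, but the first axis no longer carries the maximum variance, so the rotated axes are not principal components.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy 2-D data with clearly more spread along one direction.
X = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # coordinates along PC1 and PC2

# Rotate the PC coordinate system by 45 degrees.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = scores @ R.T

var_pc1 = scores[:, 0].var()
var_rotated_axis = rotated[:, 0].var()
print(var_pc1, var_rotated_axis)  # the rotated first axis captures less variance
print(np.isclose(scores.var(axis=0).sum(), rotated.var(axis=0).sum()))  # total is preserved
```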

(no subject)

Date: 2010-05-15 05:51 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
* how do you measure correspondence to "geographic axes"? "North" and "South" is something that exists in the real world, not in the space of possible PCAs.
I'm asking you to define a way of assigning "North" in a randomly distorted map of Europe. I suspect that "correspondence" in the sense you mean cannot be perfect if the map has nonlinear distortions.

* PCA is not based on similarity.

* Never mind what I said about rotations.
Edited Date: 2010-05-15 06:23 pm (UTC)

(no subject)

Date: 2010-05-16 01:34 am (UTC)
From: [identity profile] radiata-prime.livejournal.com
The answer to this seems pretty simple. If you move East-West you don't have to learn a whole new set of farming techniques.

(no subject)

Date: 2010-05-16 07:46 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
(completing your answer)
... which makes it even more probable that East<->West diffusion was more prevalent than North<->South diffusion.

(no subject)

Date: 2010-05-16 01:34 pm (UTC)
From: [identity profile] trufflesniffer.livejournal.com
I think this is in line with Jared Diamond's suggestion that empires tended to expand east-west due to being dependent on food crops (which are dependent on climatic conditions that are more consistent latitudinally than longitudinally).

The following article offers some empirical support for his suggestion:
http://www.eeb.uconn.edu/people/turchin/PDF/Turchin_Adams_Hall_2006.pdf

(Turchin's interesting, btw)

Jon

(no subject)

Date: 2010-05-16 04:41 pm (UTC)

(no subject)

Date: 2010-05-16 06:28 pm (UTC)
From: [identity profile] radiata-prime.livejournal.com
Yeh. This is known to be true.

(no subject)

Date: 2010-05-19 04:22 pm (UTC)

(no subject)

Date: 2010-05-22 01:12 am (UTC)
From: [identity profile] spoonless.livejournal.com
Wow, that is totally amazing! Can't wait to get my 23-and-me results back (they received my sample earlier this week).

I wonder what the requirements were for the people these genomes were taken from. For example, if you are marked as "IT" on there, did you just have to be born in Italy, or did your parents have to be from there, or did your family have to live there for generations?
