[personal profile] gusl
Hypothesis: most of the variation in adult human height is due to variation in leg length.

I'd like to test this. Where can I find a dataset?

What statistical method would I use? It's easy to measure the s.d. of total height and the s.d. of leg length, and since the legs make an additive contribution to height, this should be an easy problem. But having just the two s.d.'s isn't as good as having leg-length data paired with the height data for the same people.
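A minimal sketch of why the paired data matters, assuming height can be written as leg length plus everything else ("rest"). Under that assumption, Var(height) = Cov(leg, height) + Cov(rest, height), so the leg share of the variance depends on the leg/rest covariance, which the two s.d.'s alone don't give you. The function name `variance_shares` and all the numbers are made up for illustration; this is not real anthropometric data.

```python
import numpy as np

def variance_shares(leg, height):
    """Split Var(height) into a leg share and a non-leg share.

    Assumes height = leg + rest, so that
    Var(height) = Cov(leg, height) + Cov(rest, height).
    Returns (leg_share, rest_share), which sum to 1.
    """
    leg = np.asarray(leg, dtype=float)
    height = np.asarray(height, dtype=float)
    rest = height - leg
    # 3x3 sample covariance matrix of (leg, rest, height).
    cov = np.cov(np.vstack([leg, rest, height]))
    var_h = cov[2, 2]
    return cov[0, 2] / var_h, cov[1, 2] / var_h

# Synthetic, made-up measurements in cm (illustrative only).
rng = np.random.default_rng(0)
leg = rng.normal(80.0, 5.0, size=1000)
rest = rng.normal(90.0, 3.0, size=1000)
leg_share, rest_share = variance_shares(leg, leg + rest)
print(f"leg share: {leg_share:.2f}, rest share: {rest_share:.2f}")
```

On this reading, the hypothesis "most of the variation is due to leg length" would correspond to a leg share above 0.5.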

But in general, explaining variation in terms of variation of other factors sounds like factor analysis.

I really should study more statistics.

(no subject)

Date: 2006-10-30 01:14 am (UTC)
From: [identity profile] smandal.livejournal.com
I think what you want is correlation (http://en.wikipedia.org/wiki/Correlation).

As for obtaining data, sounds like a good excuse for a party! :P

(no subject)

Date: 2006-10-30 01:18 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Correlation?? OMG, what a revolutionary concept!

Seriously, it is obvious that there should be a correlation. My claim is stronger than that.

(no subject)

Date: 2006-10-30 01:25 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
If all people were proportional to each other, you would have a perfect correlation between leg length and height (correlation coefficient of 1).

My claim is that tall people have disproportionately long legs.

(no subject)

Date: 2006-10-30 01:27 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Of course, you could look at the correlation between relative-leg-length (i.e. leg length measured as percentage of height) and height. Is this what you meant?
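A rough sketch of that check, on synthetic data only. The helper name `relative_leg_correlation` and the coefficients (0.47, 0.6, 22) are invented for illustration; they are not estimates from any real dataset.

```python
import numpy as np

def relative_leg_correlation(leg, height):
    """Pearson correlation between leg/height ratio and height."""
    leg = np.asarray(leg, dtype=float)
    height = np.asarray(height, dtype=float)
    ratio = leg / height
    return np.corrcoef(ratio, height)[0, 1]

rng = np.random.default_rng(0)
height = rng.normal(170.0, 10.0, size=2000)

# Case 1: proportional bodies plus noise; the ratio should not track height.
leg_proportional = 0.47 * height + rng.normal(0.0, 2.0, size=2000)
r_proportional = relative_leg_correlation(leg_proportional, height)

# Case 2: tall people get disproportionately long legs
# (ratio = 0.6 - 22 / height, which rises with height).
grid = np.linspace(150.0, 200.0, 51)
leg_disproportional = 0.6 * grid - 22.0
r_disproportional = relative_leg_correlation(leg_disproportional, grid)

print(r_proportional, r_disproportional)
```

Note that in the second case leg length is an exact linear function of height, so the plain leg-versus-height correlation is still 1 even though proportions change with height. The ratio is what reveals the disproportion.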

(no subject)

Date: 2006-10-30 01:53 am (UTC)
From: [identity profile] smandal.livejournal.com
All I was saying is that you could plot leg length versus height. Indeed, if people were perfectly proportionate, you would get a straight line and a coefficient of 1. However, if tall people had disproportionately long legs, you would get an upward trend in the distribution, and that would answer your question. Plotting relative leg length against height would yield the same information: a flat horizontal line if people were proportionate, or (hypothetically) an upward trend if tall people have disproportionately long legs.

If your hypothesis is correct, a correlation coefficient is not a good representation of the information. If you do in fact perform PCA with two factors (legs, not legs) and zero error, you would get one number for percentage of total height variance due to legs, and another number equal to one minus that for not legs. Even if your hypothesis is correct, a correlation figure only tells you that you might be right, not that you are.

If you insist on numbers, you can look at the cross-distribution with moments and whatnot ...

(no subject)

Date: 2006-10-30 03:24 am (UTC)
From: [identity profile] en-ki.livejournal.com
The US Army has loads of anthropometry data they have gathered that they make public. I believe it's based on people actually in the Army, so there is some selection bias, but it's voluminous and easy to find. Ex-employer uses it a lot.

(no subject)

Date: 2006-10-31 06:25 pm (UTC)
From: [identity profile] bondage-and-tea.livejournal.com
What else could the variation be due to? Body length? Head length? Neck length? Are these the sorts of measurements you want?

Also, "most of the variation is due to" seems ill-defined. I guess you're looking for linear relationships? So you could use, say, a Pearson's product-moment correlation between total height and each of the other measurements (leg length, body length, etc.). You'd expect r = 0.8 and statistical significance for leg length and body length, but for the others either a smaller correlation or no statistical significance.

Does that seem right?

You could also use multiple regression: lots of independent variables for the various measurements, and the dependent variable is total height. Tree models (R does 'em) allow you to determine which of the variables is the most important factor affecting total height.
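A minimal sketch of the multiple-regression idea using ordinary least squares, on synthetic data with hypothetical variable names (`leg`, `trunk`, `head_neck`). One caveat it illustrates: if the measurements add up exactly to total height, every fitted coefficient comes out as 1 and the fit is perfect, so the coefficients alone can't rank the body parts by importance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Made-up body-part lengths in cm (illustrative only, not real data).
leg = rng.normal(80.0, 5.0, n)
trunk = rng.normal(60.0, 3.0, n)
head_neck = rng.normal(30.0, 1.5, n)
height = leg + trunk + head_neck  # parts sum exactly to height

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), leg, trunk, head_neck])
coef, *_ = np.linalg.lstsq(X, height, rcond=None)

# coef holds (intercept, leg, trunk, head_neck).
print(np.round(coef, 3))
```

With real data the parts would be measured with error and might not sum exactly, so the coefficients would differ somewhat from 1, but the same caution about reading importance off them applies.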

(no subject)

Date: 2006-10-31 06:36 pm (UTC)
From: [identity profile] bondage-and-tea.livejournal.com
the r = 0.8 was meant to be a bit more fuzzy, e.g. "You'd expect r = 0.8, say, ..."
