bleg: biometric data
Oct. 29th, 2006 07:28 pm
Hypothesis: most of the variation in adult human height is due to variation in leg length.
I'd like to test this. Where can I find a dataset?
What statistical method would I use? It's easy to measure the s.d. of total height and the s.d. of leg length, and since leg length makes an additive contribution to height, this is an easy problem. But just having the two s.d.'s isn't as good as having each person's leg length paired with their height.
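A minimal sketch of why the paired data matters: if height = legs + rest for each person, then Var(height) = Var(legs) + Var(rest) + 2 Cov(legs, rest), and the covariance term can only be estimated when each person's legs and rest are measured together. The data below is synthetic (random numbers with made-up means and spreads, not real measurements), and the helper names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (n - 1 denominator); cov(xs, xs) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance_terms(legs, rest):
    """Split Var(height) into Var(legs) + Var(rest) + 2 Cov(legs, rest),
    assuming height = legs + rest for each person."""
    height = [l + r for l, r in zip(legs, rest)]
    return {
        "var_height": cov(height, height),
        "var_legs": cov(legs, legs),
        "var_rest": cov(rest, rest),
        "two_cov": 2 * cov(legs, rest),
    }

# Synthetic stand-in data (NOT real measurements), in cm. A shared
# "common" term makes legs and rest covary, as they plausibly would.
random.seed(0)
common = [random.gauss(0, 4) for _ in range(500)]
legs = [80 + c + random.gauss(0, 3) for c in common]
rest = [90 + 0.5 * c + random.gauss(0, 2) for c in common]

terms = variance_terms(legs, rest)
print(terms)
```

With only the two s.d.'s you know var_height and var_legs, but var_rest and two_cov stay tangled together.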
But in general, explaining variation in terms of variation of other factors sounds like factor analysis.
I really should study more statistics.
(no subject)
Date: 2006-10-30 01:14 am (UTC)
As for obtaining data, sounds like a good excuse for a party! :P
(no subject)
Date: 2006-10-30 01:18 am (UTC)
Seriously, it is obvious that there should be a correlation. My claim is stronger than that.
(no subject)
Date: 2006-10-30 01:23 am (UTC)
(no subject)
Date: 2006-10-30 01:25 am (UTC)
My claim is that tall people have disproportionately long legs.
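One hedged way to check "tall people have disproportionately long legs" is to ask whether the leg-to-height ratio rises with height, i.e. whether the correlation between height and legs/height is positive. This sketch uses synthetic data, not real measurements; the function and variable names are made up for illustration.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic stand-in data (NOT real measurements), in cm.
random.seed(1)
common = [random.gauss(0, 4) for _ in range(500)]
legs = [80 + c + random.gauss(0, 3) for c in common]
rest = [90 + 0.5 * c + random.gauss(0, 2) for c in common]
height = [l + r for l, r in zip(legs, rest)]

# Fraction of each person's height that is leg.
leg_ratio = [l / h for l, h in zip(legs, height)]

# Positive r: taller people tend to have a larger leg fraction,
# i.e. disproportionately long legs, not merely longer ones.
r = pearson_r(height, leg_ratio)
print(f"r(height, legs/height) = {r:.2f}")
```

A plain r between height and leg length would be positive even if everyone had the same proportions, so the ratio is the more direct test of the stronger claim.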
(no subject)
Date: 2006-10-30 01:27 am (UTC)
(no subject)
Date: 2006-10-30 01:53 am (UTC)
If your hypothesis is correct, a correlation coefficient is not a good representation of the information. If you do in fact perform PCA with two factors (legs, not legs) and zero error, you would get one number for the percentage of total height variance due to legs, and another number, equal to one minus that, for not-legs. Even if your hypothesis is correct, a correlation figure only tells you that you might be right, not that you are.
If you insist on numbers, you can look at the joint distribution, its moments, and whatnot ...
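PCA itself isn't sketched here. Instead, a simpler swapped-in way to get the same kind of pair of numbers (a legs share, and one minus that for not-legs) is to split Var(height) by covariance: since height = legs + rest, Var(H) = Cov(legs, H) + Cov(rest, H), so the shares Cov(legs, H)/Var(H) and Cov(rest, H)/Var(H) add up to exactly 1. Synthetic data again; the names are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def height_variance_shares(legs, rest):
    """Fractions of Var(height) attributable to legs and to everything else,
    assuming height = legs + rest exactly. The two fractions sum to 1."""
    height = [l + r for l, r in zip(legs, rest)]
    var_h = cov(height, height)
    return cov(legs, height) / var_h, cov(rest, height) / var_h

# Synthetic stand-in data (NOT real measurements), in cm.
random.seed(2)
common = [random.gauss(0, 4) for _ in range(500)]
legs = [80 + c + random.gauss(0, 3) for c in common]
rest = [90 + 0.5 * c + random.gauss(0, 2) for c in common]

leg_share, rest_share = height_variance_shares(legs, rest)
print(f"legs: {leg_share:.2f}, not legs: {rest_share:.2f}")
```

Cov(legs, H)/Var(H) is also the slope from regressing leg length on height, so "most of the variation is due to legs" becomes the concrete statement leg_share > 0.5.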
(no subject)
Date: 2006-10-30 03:24 am (UTC)
(no subject)
Date: 2006-10-31 06:25 pm (UTC)
Also, "most of the variation is due to" seems ill-defined. I guess you're looking for linear relationships? So you could use, say, Pearson's product-moment correlation between total height and each of the other measurements (leg length, body length, etc.). You'd expect r around 0.8 and statistical significance for leg length and body length, but for the others either a smaller correlation or no statistical significance.
Does that seem right?
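A minimal sketch of the Pearson approach, assuming one list per measurement for the same people. It computes r between total height and each measurement, plus the usual t statistic t = r * sqrt((n - 2) / (1 - r^2)) for testing r = 0; with a few hundred people, |t| above roughly 2 corresponds to p < 0.05 (two-sided). The measurement names and data are synthetic and hypothetical; "unrelated" is a made-up measurement generated independently of height.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Synthetic stand-in data (NOT real measurements), in cm.
random.seed(3)
n = 500
common = [random.gauss(0, 4) for _ in range(n)]
measurements = {
    "legs": [80 + c + random.gauss(0, 3) for c in common],
    "rest": [90 + 0.5 * c + random.gauss(0, 2) for c in common],
    "unrelated": [random.gauss(10, 1) for _ in range(n)],
}
height = [l + r for l, r in zip(measurements["legs"], measurements["rest"])]

results = {}
for name, values in measurements.items():
    r = pearson_r(height, values)
    results[name] = (r, t_statistic(r, n))
    print(f"{name}: r = {r:.2f}, t = {results[name][1]:.1f}")
```

Because height here is literally legs + rest, both parts correlate strongly with height; r alone doesn't say which part carries most of the variance, which is the objection raised in the 01:53 comment.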
You could also use multiple regression: lots of independent variables for the various measurements, with total height as the dependent variable. Tree models (R does 'em) let you determine which of the variables is the most important factor affecting total height.
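A minimal multiple-regression sketch in plain Python (no R or NumPy), assuming the same kind of per-person lists. Every variable is z-scored first, so each coefficient is in s.d. units and the coefficients can be compared across measurements. The data, measurement names, and the small `solve` helper are hypothetical; tree models are not shown.

```python
import math
import random

def standardize(xs):
    """Z-score a list using the sample s.d. (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - m) / sd for x in xs]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def standardized_coefficients(predictors, y):
    """Least-squares regression of z-scored y on z-scored predictors.
    On z-scores the normal equations use the correlation matrix."""
    names = list(predictors)
    zs = [standardize(predictors[k]) for k in names]
    zy = standardize(y)
    n = len(y)
    xtx = [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in zs] for zi in zs]
    xty = [sum(a * b for a, b in zip(zi, zy)) / (n - 1) for zi in zs]
    return dict(zip(names, solve(xtx, xty)))

# Synthetic stand-in data (NOT real measurements), in cm.
random.seed(4)
n = 500
common = [random.gauss(0, 4) for _ in range(n)]
predictors = {
    "legs": [80 + c + random.gauss(0, 3) for c in common],
    "rest": [90 + 0.5 * c + random.gauss(0, 2) for c in common],
    "unrelated": [random.gauss(10, 1) for _ in range(n)],
}
height = [l + r for l, r in zip(predictors["legs"], predictors["rest"])]

betas = standardized_coefficients(predictors, height)
print({k: round(v, 3) for k, v in betas.items()})
```

With exactly additive parts, the raw coefficients for legs and rest would both be 1; the standardized ones differ because each is scaled by that part's s.d. relative to height's, so the larger one points at the part contributing more of the spread.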
(no subject)
Date: 2006-10-31 06:36 pm (UTC)