[personal profile] gusl
"Most bell curves have thick tails", by Bart Kosko (via MR) is a must-read. Main points:

* a great deal of science and engineering assumes a normal (Gaussian) distribution far too quickly.

* there are many bell curves. The Gaussian is rather thin-tailed, when compared with real-world distributions. It is the thinnest-tailed in the family of stable distributions.

* I quote:
the classical central limit theorem [link mine] result rests on a critical assumption that need not hold and that often does not hold in practice. The theorem assumes that the random dispersion about the mean is so comparatively slight that a particular measure of this dispersion — the variance or the standard deviation — is finite or does not blow up to infinity in a mathematical sense. Most bell curves have infinite or undefined variance even though they have a finite dispersion about their center point. The error is not in the bell curves but in the two-hundred-year-old assumption that variance equals dispersion. It does not in general.


* Standard deviation as a measure of dispersion is a dogma. Squaring the deviations means outliers are weighted too heavily.
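The finite-variance issue in the quote is easy to see numerically. A minimal sketch using NumPy (my own illustration, not from Kosko's article): sample means of Gaussian draws concentrate as n grows, while Cauchy draws, a stable distribution with undefined variance, resist averaging entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian draws: the sample mean concentrates as n grows (classical CLT).
gauss_means = [abs(rng.standard_normal(n).mean()) for n in (100, 10_000, 1_000_000)]

# Cauchy draws (alpha-stable with alpha = 1): the sample mean of n draws
# is itself standard Cauchy, so averaging buys nothing, and the sample
# standard deviation blows up as larger samples catch bigger outliers.
cauchy_std = rng.standard_cauchy(1_000_000).std()
print(gauss_means, cauchy_std)
```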

(no subject)

Date: 2006-07-08 07:30 pm (UTC)
From: [identity profile] smandal.livejournal.com
Thank you very much for the link. Physicists share this chauvinism for the most basic probability distributions because of quantum mechanics and decay phenomena.

(no subject)

Date: 2006-07-08 07:53 pm (UTC)
From: [identity profile] perspectivism.livejournal.com

Yes, great topic!

NN Taleb (link) is fantastic on this topic. It's maybe the strongest theme in his work.

(no subject)

Date: 2006-07-08 08:19 pm (UTC)
From: [identity profile] jcreed.livejournal.com
I'm not at all convinced this is a "dangerous idea".

Why am I supposed to believe that "most" unimodal distributions have thick tails, and why is the assumption of finite variance for the iid variables going into the central-limit-theorem sum so unreasonable for the zillions of real-world applications where the variance really is finite?

(no subject)

Date: 2006-07-08 09:21 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
"Why am I supposed to believe that 'most' unimodal distributions have thick tails"

What kind of evidence could possibly convince you?

(no subject)

Date: 2006-07-08 09:29 pm (UTC)
From: [identity profile] jcreed.livejournal.com
A good question!

I think my objection is that the use of the word "most" implicitly involves Kosko's notion of which distributions count for more. So I would claim that the burden is really on the other side of the argument from me, that Kosko should make his assertion more clear before I'm expected to believe it.

Though actually, is there even one nice example you know of where people used to approximate things with a Gaussian where it's (a) clearly a pretty bad approximation for infinite-variance reasons like he describes and (b) there's a better approximation that is still feasible to work with?

It sounds like Kosko has a bunch of such examples in mind, and thinks that they are widespread, but for space reasons hasn't listed any in particular; I might still quibble after they were given about how centrally important they are, but really I am curious what these cases look like.

As a converse example where Gaussian distributions are totally appropriate, I could point to something basic like the number of heads in N fair coin flips.
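The coin-flip case does check out numerically. A quick sketch (standard library only, my own numbers) comparing the exact binomial tail against the continuity-corrected normal approximation:

```python
import math

# de Moivre-Laplace: Binomial(n, 1/2) is well approximated by N(n/2, n/4),
# a case where the Gaussian really is the right tool.
n = 100

def binom_cdf(k):
    # Exact P(at most k heads in n fair flips).
    return sum(math.comb(n, i) for i in range(k + 1)) / 2**n

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

exact = binom_cdf(60)                                # exact binomial tail
approx = normal_cdf(60.5, n / 2, math.sqrt(n) / 2)   # continuity-corrected
print(exact, approx)
```

The two agree to about three decimal places at n = 100 already.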

(no subject)

Date: 2006-07-08 09:49 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Maybe the better/"correct" distributions are not very nice to work with, but nevertheless, people are overconfident about their Gaussian approximations.

I personally don't mind numerical methods, but there are people out there who think that everything should be nice and analytic.

(no subject)

Date: 2006-07-08 09:53 pm (UTC)
From: [identity profile] jcreed.livejournal.com
Well, but people are not always overconfident; there are cases where thin-tailed Gaussian approximations are the right approximation. There are cases where they're not. I'm not persuaded that there is an endemic problem of people using Gaussians when they oughtn't. I might be persuaded by more specific examples.

(no subject)

Date: 2006-07-08 09:35 pm (UTC)
From: [identity profile] jcreed.livejournal.com
Actually now that I've read the wikipedia article on stable distributions you linked to, this all seems even more interesting, although I still think Kosko is over-selling things a bit.

One reason why the (special) central limit theorem is nice is that it easily explains why we might find Gaussian distributions all around us: if at the bottom we have a lot of extremely simple processes (let's say they have not only finite variance, but finitely many different outcomes) and we sum them up, we'll get a Gaussian.

So exactly what stability means is that if we sum up a bunch of little processes that have a stable distribution, the result will, too — but I wonder what kind of basic processes (or other ways of aggregating besides summation) have these Lévy distributions in the first place?
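The stability property itself is easy to check numerically. A sketch (my own, assuming NumPy's `standard_cauchy`, which is the alpha = 1 stable case): averaging Cauchy variables leaves the distribution unchanged, with none of the 1/sqrt(k) shrinkage a Gaussian would show.

```python
import numpy as np

rng = np.random.default_rng(42)

# Average k iid standard Cauchy variables, many times over. For a Gaussian,
# the spread of the average would shrink like 1/sqrt(k); for the Cauchy,
# stability means the average is again *standard* Cauchy: no shrinkage.
k, trials = 50, 50_000
averages = rng.standard_cauchy((trials, k)).mean(axis=1)

# Quartiles of a standard Cauchy sit at -1 and +1.
q1, q3 = np.quantile(averages, [0.25, 0.75])
print(q1, q3)
```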

(no subject)

Date: 2006-07-08 09:56 pm (UTC)
From: [identity profile] gustavolacerda.livejournal.com
Stable distributions are a nice idea. I wonder if there is a multiplicative equivalent.

(no subject)

Date: 2006-07-09 12:33 am (UTC)
From: [identity profile] mauitian.livejournal.com
Hmm, what do you get if you plot the distribution of all distributions -- is it a bell curve?

(no subject)

Date: 2006-07-09 02:39 am (UTC)
From: [identity profile] gustavolacerda.livejournal.com
hm... so you mean plotting the distribution of real-world alpha-stable distributions, by alpha?

(no subject)

Date: 2006-07-10 12:06 am (UTC)
From: [identity profile] en-ki.livejournal.com
"Most bell curves have infinite or undefined variance even though they have a finite dispersion about their center point."

What is this "dispersion" of which the author speaks? Is it rigorously defined somewhere? He doesn't write as though it is.

I am kind of ignorant of statistics. Naively, one might use the mean absolute-value deviation instead of the root-mean-square. Why, uh, doesn't one?
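There are distributions where the two measures genuinely come apart. A sketch (my own illustration, using a Pareto with tail index 1.5, which has a finite mean but infinite variance): the mean absolute deviation settles down while the sample standard deviation keeps climbing.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pareto with tail index 1.5 (x_min = 1): finite mean (3), infinite variance.
# The mean absolute deviation converges to its true value (about 2.31), while
# the sample standard deviation keeps growing as larger samples catch
# ever-bigger outliers.
results = {}
for n in (1_000, 100_000, 10_000_000):
    x = 1.0 + rng.pareto(1.5, n)  # numpy's pareto is Lomax; shift so x_min = 1
    mad = np.abs(x - x.mean()).mean()
    results[n] = (mad, x.std())
    print(n, round(mad, 2), round(x.std(), 2))
```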

(no subject)

Date: 2006-07-11 02:21 am (UTC)
From: [identity profile] en-ki.livejournal.com
Yeah, that's what I saw, and it's clearly too broad for the author's mention of "infinite standard deviation but finite dispersion" to make much sense. For any probability density function that is nonzero over a domain of infinite measure, I am certain I can find a measure of dispersion which will diverge.

Measures of dispersion would seem to correspond to families of metrics: you have n samples, so you subtract the mean from each one, treat that as a point in R^n, and consider its distance from the origin. Each family of metrics is scaled so that adding an identical sample doesn't change the "distance", so the uncorrected standard deviation is just the scaled Euclidean metric.

Each metric then presumably yields a central limit theorem. So to me, the interesting question to take from this is: is there a single "natural" metric that is better than the Euclidean (which, of course, was the first that came to mind), or is the metric different for each problem, and if so, how do you find it? Which I suppose is more or less what the author of the original paper was asking, but I wish they'd asked it better.
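The family of scaled metrics described above can be sketched as scaled Lp norms on the deviations (a hypothetical `lp_dispersion` helper, not anything from the article); p = 2 recovers the uncorrected standard deviation exactly.

```python
import numpy as np

def lp_dispersion(x, p):
    # Scaled Lp metric on deviations: (mean of |x - mean|^p)^(1/p).
    # Dividing by n keeps the measure from growing with sample size;
    # p = 2 gives exactly the uncorrected standard deviation, and
    # p = 1 gives the mean absolute deviation.
    d = x - x.mean()
    return (np.abs(d) ** p).mean() ** (1 / p)

x = np.array([1.0, 2.0, 4.0, 9.0])
print(lp_dispersion(x, 1), lp_dispersion(x, 2), x.std())
```

By the power-mean inequality the p = 1 version is always at most the p = 2 version, which is one way of seeing that squaring weights outliers more heavily.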
