I've always found it unnatural that, in Probability theory, random variables are functions. I used to think of "joint random variables" as primary, with their density cloud, and of simple random variables as secondary, obtained by marginalization.

The common undergraduate-level notation, which is shorter and more convenient, may be to blame for my difficulty in making this conceptual shift. Here's the translation:

P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A})
P(X = x) = P({ω ∈ Ω : X(ω) = x})

The standard formulation of Probability theory starts with an abstract space Ω containing random outcomes. ω is our randomness generator, and takes a random value in Ω. My "density cloud", which contained all the information about the joint distribution, is replaced by the notion of a "base measure" on Ω. And the lossy projection onto the real line that gives the 1D random variable its distribution is not an act of marginalization: it is the random variable X itself.

Anyway, tonight I just had a flash of insight! The advantage of the standard approach is that the random variable X + Y is just X(ω) + Y(ω)... and more generally f(X,Y) = f(X(ω), Y(ω)), and similarly for any function over sequences of random variables.
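
Here is a minimal Python sketch of that insight (entirely my own illustration; draw_omega, X, Y, Z are hypothetical names, not any standard API): random variables are plain functions of ω, and new ones arise by composing them pointwise.

```python
import random

# One "random outcome" omega: here, a pair of independent uniform draws.
# This stands in for the abstract space Ω with its base measure.
def draw_omega():
    return (random.random(), random.random())

# Random variables are just functions of omega.
def X(omega):
    return 2.0 * omega[0]

def Y(omega):
    return omega[1] + 0.5

# New random variables arise by pointwise composition; no extra machinery.
def Z(omega):
    return X(omega) + Y(omega)    # the random variable X + Y

# "Sampling Z" means drawing omega once and evaluating every function on it.
# X and Y are automatically coupled because they share the same omega.
omega = draw_omega()
print(X(omega), Y(omega), Z(omega))
```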

This seems to allow for more spontaneous creation of random variables, even infinite-dimensional ones, which supposedly comes in handy when you do Non-Parametric Bayes.
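
As a toy illustration of that (again my own construction, not actual non-parametric-Bayes machinery): a single draw of ω can determine an entire random function, which we evaluate lazily at whichever points we happen to ask about.

```python
import hashlib
import random
import struct

def random_function(omega):
    # An "infinite-dimensional" random variable: a whole random function
    # t -> [0, 1), fully determined by the single outcome omega.
    # Every value is carved deterministically out of omega by hashing,
    # so the function exists "all at once", even though we only ever
    # evaluate it at finitely many points.
    def f(t):
        digest = hashlib.sha256(f"{omega}:{t}".encode()).digest()
        (u,) = struct.unpack(">Q", digest[:8])
        return u / 2**64
    return f

omega = random.random()            # one draw of the underlying randomness
f = random_function(omega)
print(f(0.0), f(3.14), f(0.0))     # same omega and t give the same value
```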

There is still frequent notational ambiguity ("do you mean the function, or the output of the function?"), but I guess that's the price of convenience.

"Most bell curves have thick tails", by Bart Kosko (via MR) is a must-read. Main points:

* a great deal of science and engineering assumes a normal (Gaussian) distribution far too quickly.

* there are many bell curves. The Gaussian is rather thin-tailed compared with real-world distributions; it is the thinnest-tailed member of the family of stable distributions (see the first sketch after this list).

* I quote:
the classical central limit theorem [link mine] result rests on a critical assumption that need not hold and that often does not hold in practice. The theorem assumes that the random dispersion about the mean is so comparatively slight that a particular measure of this dispersion — the variance or the standard deviation — is finite or does not blow up to infinity in a mathematical sense. Most bell curves have infinite or undefined variance even though they have a finite dispersion about their center point. The error is not in the bell curves but in the two-hundred-year-old assumption that variance equals dispersion. It does not in general.


* Standard deviation as a measure of dispersion is a dogma. Squaring deviations means you weight outliers too heavily (the second sketch below makes this concrete).
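
First sketch: a concrete look at the infinite-variance point, using scipy.stats.levy_stable (my own illustration; the choice alpha = 1.5 is arbitrary). For the Gaussian, which is the alpha = 2 stable law, the sample standard deviation settles down as the sample grows; for a heavier-tailed stable law with alpha < 2, whose variance is infinite, it never settles.

```python
import numpy as np
from scipy.stats import levy_stable, norm

rng = np.random.default_rng(0)

# alpha = 2 is the Gaussian, the thinnest-tailed stable law;
# any alpha < 2 gives a stable law with infinite variance.
gaussian = norm(loc=0, scale=1)
heavy = levy_stable(alpha=1.5, beta=0)

for n in [10**3, 10**4, 10**5]:
    g = gaussian.rvs(size=n, random_state=rng)
    h = heavy.rvs(size=n, random_state=rng)
    # The Gaussian sample std converges to 1; the stable sample std
    # converges to nothing -- it is trying to estimate infinity.
    print(f"n={n:>6}  gaussian std={g.std():6.3f}  alpha=1.5 std={h.std():10.3f}")
```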
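
Second sketch: the squaring complaint (again my own illustration). A single wild observation drags the standard deviation far more than it drags a dispersion measure that doesn't square, such as the median absolute deviation.

```python
import numpy as np

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1])
with_outlier = np.append(data, 100.0)   # one wild observation

def mad(x):
    # Median absolute deviation: a dispersion measure that does not
    # square deviations, so a lone outlier barely moves it.
    return np.median(np.abs(x - np.median(x)))

print("std  without/with outlier:", data.std(), with_outlier.std())
print("MAD  without/with outlier:", mad(data), mad(with_outlier))
```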
