random variables as functions
Sep. 20th, 2011 01:23 am

I've always found it unnatural how, in Probability theory, random variables are functions. I always used to think of "joint random variables" as primary, with their density cloud, and of simple random variables as secondary, obtained by marginalization.
The common undergraduate-level notation, which is shorter and more convenient, may be to blame for my difficulty in making this conceptual shift. Here's the translation:
P(X ∈ A) = P({ω : X(ω) ∈ A})
P(X = x) = P({ω : X(ω) = x})
The standard formulation of Probability theory starts with an abstract space Ω containing all possible outcomes. The outcome ω is our randomness generator: it takes a random value in Ω. My "density cloud", which contained all the information about the joint distribution, is replaced by a probability measure on Ω, the "base measure". And the lossy projection onto the real line that gives the distribution of the 1D random variable X is no longer the operation of marginalization: it is the random variable itself, the function X, which pushes the base measure forward onto the real line.
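To make this concrete for myself, here's a tiny sketch in Python (a made-up finite Ω of two coin flips, with names of my own choosing, not anybody's official formulation): the base measure lives on Ω, a random variable is just a function on Ω, and P(X ∈ A) is the measure of the preimage {ω : X(ω) ∈ A}.

    from fractions import Fraction

    # A toy finite outcome space Ω: two fair coin flips.
    # The "base measure" P assigns a probability to each outcome ω.
    Omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
    P = {omega: Fraction(1, 4) for omega in Omega}

    # A random variable is just a function Ω → ℝ.
    def X(omega):
        # Number of heads among the two flips.
        return sum(1 for flip in omega if flip == "H")

    # P(X ∈ A) is the base measure of the preimage {ω : X(ω) ∈ A}.
    def prob(rv, A):
        return sum(P[omega] for omega in Omega if rv(omega) in A)

    print(prob(X, {1}))       # P(X = 1)      -> 1/2
    print(prob(X, {0, 2}))    # P(X ∈ {0, 2}) -> 1/2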
Anyway, tonight I just had a flash of insight! The advantage of the standard approach is that the random variable X + Y is just the function ω ↦ X(ω) + Y(ω)... and more generally f(X, Y) is just ω ↦ f(X(ω), Y(ω)), and similarly for any function of a whole sequence of random variables.
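Continuing the same toy sketch, creating a new random variable really is just writing a new function of the same ω, and its distribution comes for free from the base measure:

    # A second random variable on the same Ω.
    def Y(omega):
        # 1 if the first flip is heads, else 0.
        return 1 if omega[0] == "H" else 0

    # X + Y is just the pointwise sum: both read the same ω.
    def Z(omega):
        return X(omega) + Y(omega)

    print(prob(Z, {3}))   # P(X + Y = 3) -> 1/4 (both flips heads)
    print(prob(Z, {0}))   # P(X + Y = 0) -> 1/4 (both flips tails)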
This seems to allow for more spontaneous creation of random variables, even infinite-dimensional ones, which supposedly comes in handy when you do Non-Parametric Bayes.
There is still frequent notational ambiguity ("do you mean the function X, or its value X(ω)?"), but I guess that's the price of convenience.