types in Statistics
Oct. 3rd, 2008 04:21 pm
My Stats homework has a question of the type: "Given this joint distribution over X and Y, compute E(E(X|Y))." This notation is extremely confusing. Can you tell what this means?
Let's see:
E(X|Y=y) is a real-valued function of y.
Thus, E(X|Y) is a random variable.
Therefore, E(E(X|Y)) is a real number.
Type-checking passes.
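To make the three "types" concrete, here is a minimal Python sketch on a small discrete joint distribution. The pmf values and the helper names (`marginal_y`, `cond_mean_x`, `outer_expectation`, `mean_x`) are made up for illustration; they are not part of the homework. It also checks the standard identity E(E(X|Y)) = E(X).

```python
from fractions import Fraction

# Hypothetical joint pmf over (x, y); the numbers are invented for illustration.
joint = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}

def marginal_y(joint):
    """P(Y = y) for every y that has positive probability."""
    p = {}
    for (_, y), pr in joint.items():
        p[y] = p.get(y, Fraction(0)) + pr
    return p

def cond_mean_x(joint, y):
    """E(X | Y = y): a plain number once y is fixed."""
    p_y = marginal_y(joint)[y]
    return sum(x * pr for (x, yy), pr in joint.items() if yy == y) / p_y

def outer_expectation(joint):
    """E(E(X | Y)): average the function y -> E(X | Y = y) over Y."""
    return sum(p * cond_mean_x(joint, y) for y, p in marginal_y(joint).items())

def mean_x(joint):
    """E(X) computed directly from the joint pmf."""
    return sum(x * pr for (x, _), pr in joint.items())

print(cond_mean_x(joint, 0))     # 3/4
print(outer_expectation(joint))  # 7/8
print(mean_x(joint))             # 7/8
```

Here `cond_mean_x` plays the role of the real-valued function of y, and `outer_expectation` is what you get by feeding it a random y and averaging.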
Given how unclear the notation is, I decided to do something about it, using the formal(ish?) language that I designed yesterday:
Type specifications:
Type RandomVariable alias RV;
Type JointRandomVariable alias JointRV has X:RV; Y:RV;

function expectedValue : RV -> Real;
//function: given X, it computes E(X)

function expectedValue : (a -> RV) -> (a -> Real);
//mathematical abstraction: we are now allowing the user to specify the RV in terms of unknown variables.
//does this subsume the first specification of 'expectedValue'?

function given : (XY:JointRV * X:RV * Y:RV) -> y:Real -> X:RV;
//in this syntax "X:RV" is simply a more human-readable version of "RV"
//given a joint RV, the "output" RV, and the "conditioning" RV, this function returns
//the function that, given a value of y, returns the RV X.

function applyFunctionToRV : (f:(Real -> Real) * Y:RV) -> RV;
//given a function from Reals to Reals, and a RV over the Reals, returns a RV that
//can be sampled by sampling from Y, and then applying f.
//the part of the resulting RV's domain with positive probability will be a subset
//of range of f.
Full type derivation, in a fictitious shell:
>> let X : RV;
X : RandomVariable (no value)
>> let Y : RV;
Y : RandomVariable (no value)
>> XY = makeJointIndependentRV(X,Y)
XY : JointRandomVariable (no value)
>> XgivenYisy = given(XY,X,Y) /* Note: since I didn't pass the last argument, this
returns a function */
XgivenYisy : Real -> RandomVariable (no value) //namely, y:Real -> X:RV
>> innerE = expectedValue(XgivenYisy) //'expectedValue' is polymorphic: this is the 2nd
innerE : Real -> Real (no value) //namely, y:Real -> meanX:Real
>> innerE_RV = applyFunctionToRV(innerE, Y)
innerE_RV : RandomVariable (no value)
>> outerE = expectedValue(innerE_RV)
outerE : Real (no value)
Note that I'm passing value-less arguments to functions, and they don't complain.
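For comparison, here is a rough Python sketch of the same pipeline for finite discrete RVs. Everything in it is an assumption made for illustration: the `RV` and `JointRV` classes, the snake_case function names, and the simplification that `given` takes only the joint RV (X and Y are identified by their position in the pair). Unlike the fictitious shell, it needs actual values up front and does no value-less type derivation.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class RV:
    pmf: dict  # value -> probability (finite support only)

@dataclass
class JointRV:
    pmf: dict  # (x, y) -> probability

def make_joint_independent_rv(x, y):
    """Joint pmf of independent X and Y: product of the marginals."""
    return JointRV({(a, b): p * q
                    for a, p in x.pmf.items() for b, q in y.pmf.items()})

def given(xy):
    """Returns the function y -> (X | Y = y)."""
    p_y = {}
    for (_, b), p in xy.pmf.items():
        p_y[b] = p_y.get(b, 0) + p

    def x_given_y_is(y):
        pmf = {}
        for (a, b), p in xy.pmf.items():
            if b == y:
                pmf[a] = pmf.get(a, 0) + p / p_y[y]
        return RV(pmf)
    return x_given_y_is

def expected_value(arg):
    """RV -> Real, or (a -> RV) -> (a -> Real), mimicking the two signatures."""
    if isinstance(arg, RV):
        return sum(v * p for v, p in arg.pmf.items())
    return lambda a: expected_value(arg(a))

def apply_function_to_rv(f, y):
    """The RV obtained by sampling from Y and then applying f."""
    pmf = {}
    for v, p in y.pmf.items():
        pmf[f(v)] = pmf.get(f(v), 0) + p
    return RV(pmf)

X = RV({0: Fraction(1, 2), 1: Fraction(1, 2)})
Y = RV({1: Fraction(1, 3), 2: Fraction(2, 3)})
XY = make_joint_independent_rv(X, Y)
x_given_y_is = given(XY)                        # Real -> RV
inner_e = expected_value(x_given_y_is)          # Real -> Real
inner_e_rv = apply_function_to_rv(inner_e, Y)   # RV
outer_e = expected_value(inner_e_rv)            # Real
print(outer_e)  # 1/2
```

Because X and Y are independent here, `inner_e` is constant in y, so `inner_e_rv` puts all its mass on a single value.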
If I now specify the distributions (i.e. give values to the RandomVariable objects), and implement the methods used above ('makeJointIndependentRV', 'given', 'expectedValue', 'applyFunctionToRV'), it should compute outerE:
>> X = Gaussian(0,1)
X : RandomVariable
>> Y = Gaussian(0,1)
Y : RandomVariable
>> outerE
outerE : Real
outerE = 0
Note that it remembers that outerE = expectedValue(innerE_RV), rather than outerE = 0. So whenever you give new values to X and Y, and ask for outerE, you will get an updated value. I call this "spreadsheet logic".
We could have a lazy strategy and do all the intermediate computations (except for type inference) at the very last step, as the value is requested (i.e. query triggers update). This whole thing reminds me of Excel (except that the latter is eager, i.e. change triggers update).
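A minimal Python sketch of the lazy variant, where a query triggers the computation. The `Cell` class is hypothetical. To keep it short, the formula for `outer_e` leans on the assumption that X and Y are independent, so that E(E(X|Y)) is just the mean of X; `NormalDist` from the standard library stands in for 'Gaussian'.

```python
from statistics import NormalDist

class Cell:
    """A slot holding either a plain value or a formula over other cells."""

    def __init__(self, formula=None, inputs=()):
        self.value = None
        self.formula = formula
        self.inputs = inputs

    def set(self, value):
        self.value = value

    def get(self):
        # Plain cell: return the stored value, if there is one.
        if self.formula is None:
            if self.value is None:
                raise ValueError("cell has no value yet")
            return self.value
        # Formula cell: recompute from the inputs on every query (lazy).
        return self.formula(*(c.get() for c in self.inputs))

X = Cell()
Y = Cell()
# Assumption: X and Y are independent, so E(E(X|Y)) reduces to E(X).
outer_e = Cell(formula=lambda x, y: x.mean, inputs=(X, Y))

X.set(NormalDist(0, 1))
Y.set(NormalDist(0, 1))
print(outer_e.get())  # 0.0

X.set(NormalDist(3, 2))
print(outer_e.get())  # 3.0, because the formula was remembered, not the value
```

An eager, Excel-like version would instead push updates to dependents inside `set`.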
Note 1: Coming up with 'applyFunctionToRV' was hard, because Statisticians have no such concept explicitly. It's implicit. Working this out and writing it up just cost me the last hour.
Note 2: I never used X|Y ('XGivenY') in my very thorough derivation. This is because, despite the traditional notation, X|Y is not a meaningful unit. Remember: X|Y=y is a random variable (after you pass a Real value for y), and thus E(X|Y=y) is a Real number (after you pass a Real value for y). "E(X|Y)" is the RV that results from making y random.
Note 3: we could go inside RandomVariable, and specify it as being of the type Real[0,1] -> Real (i.e. the RV is defined by the inverse of its cdf). But since we'd have encapsulation, you'd be free to change this type specification later, by typing it just once. I almost want to call it "type implementation", but it doesn't run.
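Here is a hedged Python sketch of what that representation could look like. The names `gaussian`, `sample`, `expected_value` and `apply_function_to_rv` are my own stand-ins; `NormalDist.inv_cdf` is the standard-library quantile function. The expected value uses the fact that E(X) is the integral of the inverse cdf over [0,1], approximated with a midpoint rule.

```python
import random
from statistics import NormalDist

# Assumed representation: an RV is a function u -> value, for u in (0, 1).

def gaussian(mu, sigma):
    """A Gaussian RV, represented by its inverse cdf."""
    return NormalDist(mu, sigma).inv_cdf

def sample(rv, rng):
    """Inverse transform sampling: push a uniform draw through the RV."""
    u = rng.random()
    while u == 0.0:  # inv_cdf is undefined at the endpoints
        u = rng.random()
    return rv(u)

def expected_value(rv, n=10_000):
    """Midpoint-rule approximation of the integral of rv over [0, 1]."""
    return sum(rv((i + 0.5) / n) for i in range(n)) / n

def apply_function_to_rv(f, rv):
    """Sample from rv, then apply f. The result is still u -> value,
    though it is no longer an inverse cdf unless f is increasing."""
    return lambda u: f(rv(u))

X = gaussian(2, 3)
print(round(expected_value(X), 6))  # 2.0
Z = apply_function_to_rv(lambda v: 2 * v + 1, gaussian(0, 1))
print(round(expected_value(Z), 6))  # 1.0
```

Swapping this representation for a pmf table or a density would only touch these few functions, which is the encapsulation point above.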
(no subject)
Date: 2008-10-09 04:17 pm (UTC)
i'll see if i can code this up after my advisor meeting today.