gusl | Feb. 22nd, 2009

* The product rule (the identity behind Bayes' rule): "P(A,B) = P(A|B) P(B)" means "forall a,b . P(A=a, B=b) = P(A=a|B=b) P(B=b)". This is very standard.

* "variance of the estimator" means "variance of the sampling distribution of the estimator". AFAICT, this is unambiguous, and the only reasonable interpretation is for "estimator" to mean the random variable. To make this even more explicit: the estimator(RV) is the result of applying the estimator(function) to the random data.

* "estimate the parameters" means "estimate the values of the parameters"; more confusingly, "choose the parameters" can mean "choose the values of the parameters". This may just be the econometricians I've been reading.

* "distribution" to mean "family of distributions". Very standard. No one bats an eye at "the Gaussian distribution". I think "family" is typically only used to describe families for which mean and variance are not sufficient statistics.

* "sample" to mean "data point". One should be careful here: in standard usage, a "sample" is a collection of data points. Sometimes, though, one samples just one point, and metonymically calls it "the sample".

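* using "correlated" to mean "dependent". This is incorrect: correlated variables are always dependent, but dependent variables can be uncorrelated. The two notions coincide only in special circumstances, such as jointly Gaussian variables (as in multivariate Gaussian models).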

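* using "sufficient statistics" to mean "summary statistics" (e.g. in the context of mean-field approximations). This is incorrect: a summary statistic is sufficient only if it retains all the information the data carry about the parameter of interest.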
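The product-rule and "correlated" items can be made concrete on a tiny discrete joint distribution. What follows is a minimal sketch built on an assumed toy example (X uniform on {-1, 0, 1}, Y = X²; the helper names p_x, p_y, p_x_given_y and expect are hypothetical). It checks the product rule value by value, as the "forall a,b" reading demands, and then exhibits a pair of RVs with zero covariance that are nonetheless dependent.

```python
from fractions import Fraction
from itertools import product

# Toy joint pmf (an assumed example): X uniform on {-1, 0, 1}, Y = X**2.
# joint[(x, y)] holds P(X=x, Y=y); exact Fractions avoid rounding noise.
xs = [-1, 0, 1]
ys = [0, 1]
joint = {(x, y): Fraction(0) for x, y in product(xs, ys)}
for x in xs:
    joint[(x, x ** 2)] += Fraction(1, 3)

def p_x(x):
    """Marginal P(X=x)."""
    return sum(joint[(x, y)] for y in ys)

def p_y(y):
    """Marginal P(Y=y)."""
    return sum(joint[(x, y)] for x in xs)

def p_x_given_y(x, y):
    """Conditional P(X=x | Y=y); every y here has P(Y=y) > 0."""
    return joint[(x, y)] / p_y(y)

# "P(X,Y) = P(X|Y) P(Y)" read as a claim about every pair of values (x, y).
product_rule_holds = all(
    joint[(x, y)] == p_x_given_y(x, y) * p_y(y) for x, y in product(xs, ys)
)

def expect(f):
    """E[f(X, Y)] under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Independence needs P(X=x, Y=y) = P(X=x) P(Y=y) for every pair,
# which is much stronger than zero covariance.
independent = all(
    joint[(x, y)] == p_x(x) * p_y(y) for x, y in product(xs, ys)
)

print(product_rule_holds, cov_xy, independent)  # True 0 False
```

Y is a deterministic function of X, so the two are as dependent as can be, yet their covariance is exactly zero because the dependence is not linear.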

---

UPDATE: I should write a SIGBOVIK paper titled "Introduction to Statistical Pedantics".

I was just looking back at my proposal for formalizing basic Statistics, and at chrisamaphone's twelfing.

There, I treated "Random Variable" (RV) as a basic type, which you never go inside. I think this was the right choice for the purposes of that formalization: seeing RV as (Real -> Real) isn't helpful, and loses important information.
I want to think of RV as a type that contains a (Real -> Real) (i.e. its pdf): an RV "has" a pdf rather than "is" one, though I can sorta imagine such choices being somewhat arbitrary in some situations. (Speaking of metonymy...)

But there are cases where you'd want to look inside. You may have a value x and want to talk about the density of various RVs at x (for a continuous RV, the probability of being exactly x is zero, so the pdf value is what you actually need). This would require accessing the pdf object. No problem here...

The alternative is for RV to be an alias for (Real -> Real). This would be a transparent / glass-box way to model that, and amounts to saying that an RV is its pdf.
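The two design options can be contrasted in a few lines. This is a minimal sketch under my own assumptions: Pdf, RVAlias, RV, density_at and std_normal_pdf are hypothetical names, and "two distinct RVs that share a pdf" is offered as one example of the information an alias throws away, not necessarily the one originally meant.

```python
import math
from typing import Callable

# A pdf is modelled as a plain function from reals to reals: (Real -> Real).
Pdf = Callable[[float], float]

# Glass-box option: RV is just an alias for its pdf ("an RV is its pdf").
RVAlias = Pdf

class RV:
    """Opaque option: an RV "has" a pdf, stored behind an interface.

    Python only marks _pdf as private by convention; a language with
    abstract types could actually enforce the data hiding.
    """

    def __init__(self, name: str, pdf: Pdf) -> None:
        self.name = name
        self._pdf = pdf

    def density_at(self, x: float) -> float:
        """Controlled access to the pdf, for the cases where you do need it."""
        return self._pdf(x)

def std_normal_pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Two RVs that happen to share a distribution.
x_alias: RVAlias = std_normal_pdf
y_alias: RVAlias = std_normal_pdf
x = RV("X", std_normal_pdf)
y = RV("Y", std_normal_pdf)

print(x_alias == y_alias)                      # True: as bare pdfs, X and Y coincide
print(x == y)                                  # False: distinct RVs that share a pdf
print(x.density_at(0.0) == y.density_at(0.0))  # True: same density at 0
```

With the alias, identity collapses onto the pdf; with the opaque type, density_at is the one sanctioned way to look inside, which covers the "density at x" use case without giving up the distinction between RVs.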

---

Now, math is all about metaphors (through which one tries to transfer results), so mathematical objects sometimes have multiple meanings (i.e. they simultaneously instantiate multiple structures). In information geometry, for example, distributions are points on a statistical manifold.

So I want to say that:

p : point.
p : distr.

are both true.

But this won't jibe with normal type systems (AFAIK): since point and distr aren't subtypes of each other, these two statements contradict each other. So we should add information about the context, i.e. the metaphor through which you see the object, as a point or as a distribution.

Something like:

p [prob_theory] : distr.
p [info_geo] : point.

I'm trying to keep identity sacred here, but there might be good reasons to sacrifice it, and instead treat the mapping between metaphors as a bijective function.
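Both options (context-indexed views that keep identity, versus a bijection between separate objects) can be sketched for univariate Gaussians. The context names prob_theory and info_geo come from the notation above; Gaussian, as_distr, as_point, VIEWS, view and point_to_distr are hypothetical names, and using (mu, sigma) as the point's coordinates is just one chart choice I'm assuming.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Gaussian:
    """One mathematical object: a univariate Gaussian with mean mu and std sigma."""
    mu: float
    sigma: float

# Identity-preserving option: p stays the same object, and each metaphor
# (context) is a way of viewing it.
def as_distr(p: Gaussian):
    """[prob_theory] view: p as a distribution, i.e. something with a pdf."""
    def pdf(x: float) -> float:
        z = (x - p.mu) / p.sigma
        return math.exp(-z * z / 2) / (p.sigma * math.sqrt(2 * math.pi))
    return pdf

def as_point(p: Gaussian) -> tuple:
    """[info_geo] view: p as a point, in the (mu, sigma) coordinate chart."""
    return (p.mu, p.sigma)

VIEWS = {"prob_theory": as_distr, "info_geo": as_point}

def view(p: Gaussian, context: str):
    """Look at the same object p through the metaphor named by context."""
    return VIEWS[context](p)

# Identity-sacrificing option: points are separate objects, and changing
# metaphor is a bijection; point_to_distr is the inverse of as_point.
def point_to_distr(pt: tuple) -> Gaussian:
    mu, sigma = pt
    return Gaussian(mu, sigma)

p = Gaussian(0.0, 1.0)
print(view(p, "info_geo"))                       # (0.0, 1.0)
print(point_to_distr(view(p, "info_geo")) == p)  # True: the round trip is lossless
```

In the first option there is a single p and the context only selects the structure it is seen through; in the second, "p as a point" and "p as a distribution" are two values linked by an invertible map, so equality has to be checked through the round trip.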

Gustavo Lacerda

examples of metonymy and ambiguities in statistics, with varying degrees of harm

type rewrapping and data hiding; multi-typing?