Evil Genie: Is it possible to come up with an infinite sequence of Gaussians X_k centered around 0 such that adding each new term never changes the distribution of the total sum Σ_{k≤i} X_k? (i.e., the total s.d. stays at 1 forever) By how much will the s.d. of each successive Gaussian need to shrink? What kind of progression is this?
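One concrete reading, sketched below under the assumption that "stays at 1" means the partial-sum s.d. converges to 1 from below: variances of independent Gaussians add, so it is enough to make the variances a geometric progression summing to 1, e.g. Var(X_k) = 2^-k, in which case the s.d.s shrink by a factor of sqrt(2) each step.

```python
import numpy as np

# Variances of independent Gaussians add, so pick Var(X_k) = 2^-k:
# a geometric progression whose infinite sum is exactly 1.
variances = 0.5 ** np.arange(1, 31)         # Var(X_1), ..., Var(X_30)
partial_sd = np.sqrt(np.cumsum(variances))  # s.d. of X_1 + ... + X_i

for i in (1, 5, 10, 20, 30):
    print(f"s.d. after {i:2d} terms: {partial_sd[i - 1]:.10f}")
# The partial-sum s.d. climbs toward 1 but never exceeds it.
```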


Limit Distribution for Products:
Is there a limit distribution for products of random variables, i.e. an analog of the Central Limit Theorem?
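For strictly positive variables there is a standard baseline answer: taking logs turns the product into a sum, the CLT applies to the logs, and the product of i.i.d. positive variables is asymptotically log-normal:

```latex
\log \prod_{i=1}^{n} X_i \;=\; \sum_{i=1}^{n} \log X_i
\;\approx\; \mathcal{N}(n\mu,\, n\sigma^2)
\quad\Longrightarrow\quad
\prod_{i=1}^{n} X_i \;\text{is asymptotically log-normal},
```

where mu and sigma^2 are the mean and variance of log X_i. The trick fails for variables that straddle 0, like the mean-zero Gaussians below, so the question stays open there.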

In the product case, adding a constant term, i.e. shifting the mean of the distribution to the right or left, will affect the shape of the total distribution dramatically. It is also not scale-invariant: multiplying distributions that fall mostly within [-1,1] will make the s.d. smaller, whereas multiplying distributions that fall mostly outside of that range will make the s.d. even larger. A natural question is: at what s.d. is multiplication stable, i.e. for what value of sd(F) is it the case that sd(F*F) = sd(F)?
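For mean-zero F there is a clean answer, assuming F*F means the product of two independent draws X, Y ~ F: then E[XY] = 0 and Var(XY) = E[X^2]E[Y^2] = Var(X)·Var(Y), so sd(F*F) = sd(F)^2 and the fixed point is exactly sd(F) = 1. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)

# For independent mean-zero X, Y: E[XY] = 0 and
# Var(XY) = E[X^2 Y^2] = E[X^2] E[Y^2] = Var(X) Var(Y),
# so sd(XY) = sd(X) * sd(Y) and sd = 1 is the fixed point.
for sd in (0.5, 1.0, 2.0):
    x = rng.normal(0.0, sd, 1_000_000)
    y = rng.normal(0.0, sd, 1_000_000)
    print(f"sd(F) = {sd}: sd(F*F) ~ {np.std(x * y):.3f} (predicted {sd * sd})")
```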

I don't know what the product of 2 Gaussians looks like, or how to find out, other than by programming a simulation.
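For what it's worth, the simulation is only a few lines (a sketch below, with numpy and matplotlib), and the answer is known in closed form: the product of two independent standard Gaussians has density K0(|x|)/pi, where K0 is the modified Bessel function of the second kind (scipy.special.k0), with a logarithmic spike at 0. Pleasingly, K0 is also called the Macdonald function.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import k0

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000) * rng.standard_normal(1_000_000)

# Histogram of the simulated product vs. the exact density K0(|x|)/pi,
# which blows up (logarithmically) at 0 -- a spike, not a dip.
xs = np.linspace(-4, 4, 801)
plt.hist(z, bins=400, range=(-4, 4), density=True, alpha=0.5, label="simulated product")
plt.plot(xs, k0(np.abs(xs) + 1e-9) / np.pi, label="K0(|x|)/pi")
plt.legend()
plt.show()
```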

Claim: Either this stable distribution is symmetric around 0, or it is concentrated on the positive values (i.e. the density is 0 for all negative values).
Argument: a product of two negatives is positive, so sign asymmetry feeds the positive side: if F puts unequal mass on the two sides of 0, then F*F puts strictly more mass on the positives than on the negatives, and no amount of further multiplication (F*F*F*F and so forth) ever restores the balance. A stable F therefore has to put equal mass on each side, or all of its mass on the positives.
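To make the sign bookkeeping explicit, assume F has no mass at 0, write p = P(X > 0), and let X, Y be independent draws from F:

```latex
P(XY > 0) \;=\; p^2 + (1-p)^2 \;=\; \tfrac{1}{2} + 2\bigl(p - \tfrac{1}{2}\bigr)^2 \;\ge\; \tfrac{1}{2},
\qquad
p = p^2 + (1-p)^2 \;\Longleftrightarrow\; (2p-1)(p-1) = 0.
```

The only solutions are p = 1/2 and p = 1: equal mass on each side of 0 (though not necessarily a symmetric density), or all mass on the positives.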

My intuition says that multiplying two Gaussians centered around 0 will give you a shape that looks like a McDonald's M. I don't know why.
---
The Chi-squared statistic gives us a test of statistical independence between n variables with "unordered" values (i.e. there are no relations between the possible values, e.g. {apple, orange, tomato} rather than {0, 0.5, 1}).

The Chi-squared statistic sums, over each possible value combination in the joint distribution, the squared difference between the observed count and the count expected under the independence assumption (i.e. computed by multiplying the marginals), with each squared difference divided by the expected count: Σ (O − E)² / E over all cells.
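A minimal sketch with scipy, on a made-up 2×3 table of counts (the numbers are arbitrary):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up counts: rows = two groups, columns = {apple, orange, tomato}.
observed = np.array([[30, 14, 6],
                     [20, 26, 4]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# The same statistic by hand: sum over cells of (O - E)^2 / E,
# where E comes from multiplying the marginals.
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
E = row @ col / observed.sum()
print("by hand:", ((observed - E) ** 2 / E).sum())
```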

How can you do better if you have structure over the values of these variables (over some of them, or all of them)? Remember: a correlation coefficient of 0 does not imply independence.
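The standard example of that last point, as a one-line check: Y = X² is a deterministic function of a sign-symmetric X, yet uncorrelated with it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2                      # y is a deterministic function of x...
print(np.corrcoef(x, y)[0, 1])  # ...yet the correlation is ~0 (E[X^3] = 0)
```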

In order to do better, I think it's necessary to use some sort of continuity assumption.

---

I'm imagining an analog of the Chi-squared in 2 dimensions (a code sketch follows the list), in which you:

* estimate the joint density using kernels (i.e. fuzzy blobs at each data point, possibly sharper in higher-density regions (this would need some bootstrapping))
* make a grid (i.e. a 2D histogram)
* perform a Chi-squared test, using the "weight" of each little rectangle as the joint frequency, and the weights of whole rows and columns as the marginal frequencies.
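Here is a rough sketch of that recipe; the bandwidth, the 10×10 grid, and the pseudo-count scaling are all arbitrary choices, so this is an illustration of the mechanics rather than a calibrated test.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy data: y depends on x nonlinearly, so correlation ~ 0 but not independent.
n = 2000
x = rng.standard_normal(n)
y = x ** 2 + 0.5 * rng.standard_normal(n)

# Step 1: kernel density estimate of the joint (one fixed bandwidth;
# the "sharper kernels in denser regions" refinement is not attempted).
kde = gaussian_kde(np.vstack([x, y]))

# Step 2: a grid -- a 10x10 "2D histogram" of KDE mass at cell centers.
cx = np.linspace(x.min(), x.max(), 10)
cy = np.linspace(y.min(), y.max(), 10)
X, Y = np.meshgrid(cx, cy, indexing="ij")
W = kde(np.vstack([X.ravel(), Y.ravel()])).reshape(10, 10)
W /= W.sum()                    # cell weights, used as joint frequencies

# Step 3: Chi-squared of the grid weights against the product of marginals.
O = n * W                       # pseudo-counts
row = O.sum(axis=1, keepdims=True)
col = O.sum(axis=0, keepdims=True)
E = row @ col / n
chi2 = ((O - E) ** 2 / E).sum()
print(f"Chi-squared on the KDE grid: {chi2:.1f}  (dof = {(10 - 1) * (10 - 1)})")
```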

In the limit, as the 2D histogram gets very fine, I think this will converge to the "optimal" statistic. But the degrees of freedom get pretty big that way ((r−1)(c−1) for an r×c grid). I don't know if that's a problem.

It seems I should be able to do better by somehow getting rid of the arbitrariness of where the histogram makes the cutoffs. Maybe the solution is to make the weight of a rectangle depend on the weights of the neighboring rectangles in a fuzzy, continuous way. But don't the kernels already do this? To be truly continuous, though, it seems like I should get rid of the rectangles altogether.

Another idea is to modify the Chi-squared statistic so that it takes neighboring points into account. But again, it seems like using kernels already does the same thing.
