gusl | Jan. 2nd, 2009

Suppose you have points on the real line, from two classes, e.g. random samples from the sampling distribution of a certain statistic, under two experimental conditions, C1 and C2.

You want to confidently state that, within a certain p-value, having observed a sample from the sampling distribution under C1 and a sample from the sampling distribution under C2, the two populations really are different wrt the statistic.

So you want to count how often the trend is broken (how often a random red will be bigger than a random blue). This is the p-value.

In the above diagram, the proportion of white is an estimate of the p-value. This estimate is consistent in the sense that it converges to the true p-value when both n1 and n2 go to infinity.

Naively, you could make a comparison for each of the n1*n2 combinations, but it's more efficient to do the following:

* sort them
* keep only the order information. The above data becomes the string 001011001111011.
* loop over the characters
** keep a count of the 0s
** whenever we see a 1, we print the number of zeros we have seen so far

it's essentially computing a discrete CDF of the distribution of the 0s, where the 1s define the measure of the sample space.

---

UPDATE (15Jan): this can be used to test whether two samples come from the same distribution, e.g. the Two-Sample Cramer-von Mises test.

S	M	T	W	T	F	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29

Gustavo Lacerda

Jan. 2nd, 2009

algorithm for frequentist order statistics

Profile

February 2020

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags