bootstrap without replacement
Oct. 22nd, 2009 03:33 pm
What do you call a bootstrap in which you sample without replacement? This way the resamples will tend to look like plausible samples from the population, albeit smaller than the original sample.
Besides, you don't want to sample with replacement if you're:
(a) working with Chinese Restaurant Processes: you could be falsely led to believe that customers are more gregarious than they really are if you count multiple clones of the same customer at the same table as different customers.
or are interested in local statistics such as:
(b) cluster tightness: cloned points form tight clusters indeed!
(c) mode of a distribution: same idea.
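To make the clone problem concrete, here is a minimal sketch using only the Python standard library. The data (200 Gaussian draws), the seed, and the helper names `min_gap` and `distinct_fraction` are assumptions made for illustration. It compares a same-size resample drawn with replacement against a half-size resample drawn without replacement, using the smallest gap between neighbouring points as a crude stand-in for a local statistic.

```python
import random

def min_gap(values):
    """Smallest distance between neighbouring points after sorting."""
    s = sorted(values)
    return min(b - a for a, b in zip(s, s[1:]))

def distinct_fraction(resample, n_original):
    """Fraction of the original points that appear at least once in the resample."""
    return len(set(resample)) / n_original

rng = random.Random(0)            # fixed seed so the sketch is repeatable
n = 200
data = [rng.gauss(0.0, 1.0) for _ in range(n)]   # made-up sample

with_repl = rng.choices(data, k=n)        # classic bootstrap resample
without_repl = rng.sample(data, n // 2)   # smaller resample, no clones

gap_with = min_gap(with_repl)         # a cloned point gives a gap of exactly 0
gap_without = min_gap(without_repl)   # stays positive: every point is distinct
frac = distinct_fraction(with_repl, n)  # expected value is 1 - (1 - 1/n)**n, about 0.63
```

Any statistic that reacts to coincident points (nearest-neighbour distances, table occupancy counts, the height of a mode) would see the zero gap as real structure, which is the concern raised in (a) through (c).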
Why is the standard to sample with replacement? And if you're sampling with replacement, why is it important for the resample to have the same size as the original sample? Why not smaller? Why not bigger?
Although a smaller resample size leads to a broader sampling distribution of the statistic across resamples, a big resample size makes the resamples arbitrarily similar to the original sample (thanks to large-sample results)*. It's unclear what exactly you want to optimize.
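The trade-off can be seen in a small experiment. The following is a sketch under stated assumptions: the statistic is the sample mean, the data are 100 made-up Gaussian draws, and `bootstrap_mean_sd` is a hypothetical helper name. For the mean, the spread of the resampled statistic shrinks roughly like 1/sqrt(m) in the resample size m, so only m equal to the original size n gives a spread on the scale of the standard error of the original sample.

```python
import random
import statistics

def bootstrap_mean_sd(data, m, n_resamples, rng):
    """Standard deviation of the mean over resamples of size m, drawn with replacement."""
    means = [statistics.fmean(rng.choices(data, k=m)) for _ in range(n_resamples)]
    return statistics.stdev(means)

rng = random.Random(1)            # fixed seed so the sketch is repeatable
data = [rng.gauss(0.0, 1.0) for _ in range(100)]   # made-up sample
n = len(data)

sd_small = bootstrap_mean_sd(data, n // 4, 2000, rng)  # broader than the m = n case
sd_same = bootstrap_mean_sd(data, n, 2000, rng)        # the conventional choice
sd_big = bootstrap_mean_sd(data, 4 * n, 2000, rng)     # narrower: resamples hug the original

# Plug-in standard error of the mean for the original sample, for comparison with sd_same
plug_in_se = statistics.pstdev(data) / n ** 0.5
```

With a larger m the resampled means crowd around the mean of the original sample, not around the population mean, which is why the narrow spread should not be read as extra information about the population.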
Tangentially, I've never heard of deterministic resampling: surely it is better to use some sort of combinatorial design than to pick your resamples randomly.
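One simple deterministic scheme, sketched here as an illustration and not as the combinatorial design the question has in mind, is to enumerate every subset of a fixed size without replacement. With subset size n - 1 this is the leave-one-out pattern used by the jackknife. The helper name `exhaustive_subsample_means` and the four-point data set are assumptions made for the example.

```python
from itertools import combinations
from statistics import fmean

def exhaustive_subsample_means(data, k):
    """Mean of every size-k subset of data, in a fixed (non-random) order."""
    return [fmean(subset) for subset in combinations(data, k)]

data = [2.0, 4.0, 6.0, 8.0]   # tiny made-up sample

# Leave-one-out: all subsets of size n - 1, one per omitted point
loo_means = exhaustive_subsample_means(data, len(data) - 1)
```

Exhaustive enumeration only works for tiny samples or for subset sizes near 0 or n, since the number of subsets grows combinatorially; a designed selection of subsets would be needed beyond that.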
* Furthermore, this can lead some people to be falsely confident (by mistaking a large resample from the original sample for a large sample from the population).