
Consider a contrived scenario where an opaque jar contains N distinguishable marbles. You take one out, note its type, and put it back in. You repeat this n times. The number k of distinct types among the n draws conveys information about N.

If, for example, k=1, then N is likely small. On the other hand, if k=n, then N is likely large.
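To make the k-versus-n intuition concrete, here is a minimal sketch assuming all N types are equally likely. With replacement, the expected number of distinct types in n draws is N(1 - (1 - 1/N)^n), which grows with N; a simple method-of-moments estimate solves that equation for N given the observed k. The function names are illustrative, not from any library.

```python
import math


def expected_distinct(N: float, n: int) -> float:
    """Expected number of distinct types seen in n draws with replacement
    from N equally likely types (assumption: uniform type frequencies)."""
    return N * (1.0 - (1.0 - 1.0 / N) ** n)


def estimate_types(n: int, k: float, tol: float = 1e-9) -> float:
    """Method-of-moments estimate of N: solve expected_distinct(N, n) == k."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if k == n:
        # No repeats at all: the data give no finite upper bound on N.
        return math.inf
    lo, hi = 1.0, 2.0
    # expected_distinct is increasing in N, so grow hi until it brackets the root.
    while expected_distinct(hi, n) < k:
        lo, hi = hi, hi * 2
    # Bisection on N.
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if expected_distinct(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

This matches the two extremes above: k=1 drives the estimate toward 1, and k=n gives no finite estimate at all.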

The most computer-sciencey way is to look at the draw n at which you first get a repeat: ah! a hash collision.
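A minimal sketch of the first-repeat idea, again assuming N equally likely types: by the birthday-paradox argument, the first repeat arrives on average after about sqrt(πN/2) + 2/3 draws, so averaging the first-repeat index over several independent runs and inverting that approximation gives an estimate of N. A single run is very noisy, hence the averaging. `expected_first_repeat` computes the exact expectation so the inversion can be checked; both names are hypothetical.

```python
import math
from statistics import mean


def expected_first_repeat(N: int) -> float:
    """Exact expected index T of the draw that first repeats an earlier type,
    drawing with replacement from N equally likely types."""
    total = 1.0        # P(T > 0)
    p_distinct = 1.0   # P(first j draws are all distinct)
    for j in range(1, N + 1):
        p_distinct *= 1.0 - (j - 1) / N
        total += p_distinct  # adds P(T > j)
    return total


def estimate_from_first_repeats(repeat_indices) -> float:
    """Invert the approximation E[T] ~ sqrt(pi * N / 2) + 2/3."""
    t = mean(repeat_indices)
    return 2.0 * (t - 2.0 / 3.0) ** 2 / math.pi
```

With N = 365 this is the classic birthday paradox: the first shared birthday shows up, on average, with roughly the 25th person.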

One can make these ideas more quantitative under assumptions about the number of marbles of each type.

The math of hashing, the birthday paradox, coupon collecting, and HyperLogLog are good places to start.

Then there are other ways. Say two of you count the typos in a tedious text. One finds N, the other finds n, and only k of them are common to both. From this you can estimate the likely total number of typos in the text.
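This is the capture-recapture idea from ecology. A sketch assuming the two proofreaders work independently and every typo is equally easy to spot: the fraction of the second reader's typos that the first reader also caught, k/n, estimates the first reader's catch rate, so the total is about N / (k/n) = N·n/k (the Lincoln-Petersen estimator). To avoid clashing with the jar example, the code uses `found_a`, `found_b`, `found_both` (names are illustrative).

```python
def lincoln_petersen(found_a: int, found_b: int, found_both: int) -> float:
    """Estimated total = found_a * found_b / found_both (needs some overlap)."""
    if found_both == 0:
        raise ValueError("no overlap: estimate is unbounded")
    return found_a * found_b / found_both


def chapman(found_a: int, found_b: int, found_both: int) -> float:
    """Chapman's bias-corrected variant; finite even with zero overlap."""
    return (found_a + 1) * (found_b + 1) / (found_both + 1) - 1
```

For example, if one reader finds 30 typos, the other 20, and 12 are shared, Lincoln-Petersen estimates 50 typos in total (so roughly 12 went unnoticed by both). Chapman's variant is usually preferred when the counts are small.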




Right. That makes sense in the contrived scenario (although in that contrived scenario we know the probabilities with absolute surety).

But TFA's estimate is perplexing because it is NOT a contrived scenario. We don't have marbles; we have some territory to cover. The territory isn't randomly distributed, and we can't adequately sample it at random (presumably?).

It feels like the estimate could be wildly, wildly off, in which case why estimate at all?


The contrived scenario is just a starting point. One can make more and more sophisticated ecological statistics models about the situation.

As for why estimate at all, knowing estimates can be wrong: estimates are very useful for planning. Sophisticated models also yield the probabilities of over- and underestimation; combined with the costs of over- and underestimation errors, these are very useful for decision making.
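One standard way to combine those probabilities with costs, sketched under assumptions: suppose the model's uncertainty is summarized by samples of plausible values, and errors cost linearly, `cost_over` per unit of overestimate and `cost_under` per unit of underestimate. The estimate minimizing expected cost is then the cost_under / (cost_under + cost_over) quantile of those samples (the same result as the newsvendor problem). Function names and the linear-cost assumption are illustrative.

```python
import math


def expected_loss(estimate, samples, cost_over, cost_under):
    """Average linear cost of using `estimate` when the truth is drawn from `samples`."""
    total = 0.0
    for x in samples:
        if estimate > x:
            total += cost_over * (estimate - x)   # overestimated
        else:
            total += cost_under * (x - estimate)  # underestimated
    return total / len(samples)


def best_estimate(samples, cost_over, cost_under):
    """Empirical quantile q = cost_under / (cost_under + cost_over)."""
    q = cost_under / (cost_under + cost_over)
    ordered = sorted(samples)
    idx = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[idx]
```

If underestimating is three times as costly as overestimating, the best point estimate sits at the 75th percentile rather than the median: deliberately erring high is the rational choice.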

See the German tank problem. It turns out the Allied forces overestimated the number of tanks left, but the estimate still helped in planning.
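For reference, the textbook frequentist answer to the German tank problem, assuming tanks carry serial numbers 1..N and the captured ones are a uniform sample without replacement: with k observed serials and largest serial m, the minimum-variance unbiased estimate of N is m + m/k - 1. The function name is illustrative.

```python
def german_tank_estimate(serials):
    """Estimate the total N from observed serial numbers assumed drawn from 1..N."""
    k = len(serials)
    m = max(serials)
    # The average gap between observed serials is about m/k, so add one gap past the max.
    return m + m / k - 1
```

Capturing tanks numbered 19, 40, 42, and 60 gives an estimate of 74.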


It also makes sense in non-contrived scenarios ... the contrivance was just pedagogical.

Great explanation, thanks!



