My understanding of the HLL algorithm (which may be flawed, in which case please correct me and close this issue) is that for any fixed set of input values, the accuracy of any cardinality estimate from an HLL built from those values should increase as the "m" value (the number of registers) used in the HLL increases.
I.e.: if you build two HLL instances with different log2m settings, and add the exact same set of (raw) values to both, then the HLL with the larger log2m should give you more accurate results than the HLL with the smaller log2m setting.
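For concreteness, here is a minimal sketch of the kind of comparison I mean. This is not my actual test case; it assumes the java-hll API (`net.agkn.hll.HLL`), a fixed regwidth of 5, and values hashed with Guava's MurmurHash3 as in that project's README, and the log2m values 11 and 14 are arbitrary. Since the relative standard error of an HLL estimate is roughly 1.04/sqrt(m) with m = 2^log2m, I'd expect the log2m=14 instance to land closer to the true count than the log2m=11 one:

```java
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import net.agkn.hll.HLL;

public class Log2mComparison {
    public static void main(String[] args) {
        // MurmurHash3 as suggested in the java-hll README; the seed is arbitrary.
        final HashFunction hash = Hashing.murmur3_128(123456);

        // Two HLLs that differ only in log2m (register width fixed at 5).
        final HLL smaller = new HLL(11 /* log2m */, 5 /* regwidth */);
        final HLL larger = new HLL(14 /* log2m */, 5 /* regwidth */);

        // Add the exact same set of raw (pre-hashed) values to both.
        final long trueCardinality = 1_000_000L;
        for (long i = 0; i < trueCardinality; i++) {
            final long raw = hash.hashLong(i).asLong();
            smaller.addRaw(raw);
            larger.addRaw(raw);
        }

        // Expectation: the log2m=14 estimate is closer to the true count,
        // since the relative standard error shrinks as ~1.04/sqrt(2^log2m).
        System.out.printf("true=%d  log2m=11 -> %d  log2m=14 -> %d%n",
                trueCardinality, smaller.cardinality(), larger.cardinality());
    }
}
```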
In my testing, however, I'm frequently encountering situations where the "smaller" HLL instance produces the more accurate cardinality estimate, which I can't explain.
I've created a reproducible test case that demonstrates the problem, which I will post as a separate comment.