Error rate is more than theoretical error rate 

The research paper claims the error rate to be `1.04/sqrt(m)`

For `log2m = 11 and w = 5` the theoretical error rate should be ~2.3% but in our tests we have seen a variance of upto 3.5% for inserting 100K unique alphanumeric UUIDs. Below is our pseudo code : 

```
HLL hll = new HLL( 11, 5 );
HashFunction hashFunction = Hashing.murmur3_128();

for (String str : UUIDs) {
  Hasher hasher = hashFunction.newHasher();
  hasher.putString(str, StandardCharsets.UTF_8);
  hll.addRaw(hasher.hash().asLong());
}
```

How can we reduce this additional 1.2% variance and fall in the theoretically claimed error rate ? 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Error rate is more than theoretical error rate #28

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Error rate is more than theoretical error rate #28

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions