Skip to content

Error rate is more than theoretical error rate  #28

@shreshthagarg21

Description

@shreshthagarg21

The research paper claims the error rate to be 1.04/sqrt(m)

For log2m = 11 and w = 5 the theoretical error rate should be ~2.3% but in our tests we have seen a variance of upto 3.5% for inserting 100K unique alphanumeric UUIDs. Below is our pseudo code :

HLL hll = new HLL( 11, 5 );
HashFunction hashFunction = Hashing.murmur3_128();

for (String str : UUIDs) {
  Hasher hasher = hashFunction.newHasher();
  hasher.putString(str, StandardCharsets.UTF_8);
  hll.addRaw(hasher.hash().asLong());
}

How can we reduce this additional 1.2% variance and fall in the theoretically claimed error rate ?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions