-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Describe the bug
HashJoinExec and RepartitionExec both instantiate a ahash::RandomState
with the same seed:
datafusion/datafusion/physical-plan/src/repartition/mod.rs
Lines 210 to 216 in 64889c0
Partitioning::Hash(exprs, num_partitions) => BatchPartitionerState::Hash { | |
exprs, | |
num_partitions, | |
// Use fixed random hash | |
random_state: ahash::RandomState::with_seeds(0, 0, 0, 0), | |
hash_buffer: vec![], | |
}, |
datafusion/datafusion/physical-plan/src/joins/hash_join.rs
Lines 387 to 389 in 64889c0
let random_state = RandomState::with_seeds(0, 0, 0, 0); | |
This means that both operators compute the same hash functions. This is problematic when a HashJoinExec has a RepartitionExec as a child: Because the RepartitionExec partitions based on the lowest k bits of the hash, each HashJoinStream finds that all the hashes it computes have the same k lowest bits! In theory this could make the HashTable work less efficiently / have more collisions than expected.
Why in theory? Because despite my best attempts, I could not construct a benchmark that showed that changing the hash seed for HJ made a performance difference. I suspect this is due to a combination of the fact that the underlying hashtable uses open addressing, hashbrown storing a bitmask based on the highest bits in the buckets to do some early filtering and other bottlenecks in the HJ.
Is there anything I missed?
To Reproduce
Use this branch: https://github.com/ctsk/datafusion/tree/experiment/collision-reproducer
I patched the HashJoinExec so that it emits a warning when all row hashes of a batch share common least significant bits.
Run RUST_LOG=warn ./bench.sh run tpch
and see lots of warning emitted.
Expected behavior
No response
Additional context
Fix is simple: Change the hash seed in HashJoinExec - or don't provide any seed (use Default::default()
like we do in AggregationExec.