-
Notifications
You must be signed in to change notification settings - Fork 1.7k
perf: Optimize hash joins with an empty build side #16716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice to me Maybe we should have some more tests for correctness of results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @nuno-faria!
| match join_type { | ||
| // these join types only return data if the left side is not empty, so we return an | ||
| // empty RecordBatch | ||
| JoinType::Inner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, how about cross join
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cross joins with an empty relation already appear to run well in the CrossJoinExec operator.
Here is the CrossJoinExec operator for SELECT * FROM t1, t2, where t1 has 100M rows and t2 has none:
CrossJoinExec, metrics=[
output_rows=0,
elapsed_compute=351.714µs,
build_input_batches=0,
build_input_rows=0,
input_batches=0,
input_rows=0,
output_batches=0,
build_mem_used=0,
build_time=351.7µs,
join_time=12ns
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this makes sense, cross join is not a join type that would go through creating hash table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this, I think a more generic version of this would be switching small left sides (e.g < 10 rows) to using cross join 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this, I think a more generic version of this would be switching small left sides (e.g < 10 rows) to using cross join 🤔
Is this including for equijoin conditions? I think the performance seemed slow when there was a larger right table for doing this with nested loop join which follows a similar algorithm. It is probably a memory issue due to the cartesian product.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be relatively fast to do a cross join / NLJ instead of a hash join for those cases, but of course depends how the nested loop join is implemented, probably there is more room for optimization of the nested loop join.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of opening a proposal to make nested loop join faster, there are definitely some issues to work on there. I'll try to get to that when I have the time
|
@Dandandan I've added one more test where both tables are empty. Do you have suggestions for more? |
|
@jonathanc-n Since after #16434 the hash map is not directly accessible, I've added an |
|
|
||
| let timer = self.join_metrics.join_time.timer(); | ||
|
|
||
| // if the left side is empty, we can skip the (potentially expensive) join operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we would check the left side being empty before retrieving probe batches, we could also remove hash repartition 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do this in a follow up pr wdyt @nuno-faria?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Can you point out where the probe repartition is being triggered? In the process_probe_batch itself I think we can also skip creating the hashes when the build side is empty, but I measured and it didn't have a relatively big impact on performance.
Yes this looks good |
|
@nuno-faria We can return early from collect_left_input after intaking batches and checking the number of batches if batches.len() == 0 {
return Ok(JoinLeftData::new(
Box::new(JoinHashMapU32::with_capacity(0)),
RecordBatch::new_empty(schema),
Vec::new(),
Mutex::new(BooleanBufferBuilder::new(0)),
AtomicUsize::new(probe_threads_count),
reservation,
));
}; |
|
Looks like this PR was good to go and had no outstanding todos so I merged it in |
Which issue does this PR close?
N/A.
Rationale for this change
When executing hash joins, the build side is first built from the left relation and then the right relation is joined with it. However, when the build side has no rows, the join operation can be mostly skipped, improving performance.
For example, here is a simple anti join query, where
t1has 100M rows andt2has none:Here is the hash join operator in the current implementation:
HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[ output_rows=100000000, build_input_batches=0, build_input_rows=0, input_batches=11733, input_rows=100000000, output_batches=23403, build_mem_used=876, build_time=2.8693ms, join_time=216.251396302s ]And here is the optimized hash join operation:
HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[ output_rows=100000000, build_input_batches=0, build_input_rows=0, input_batches=11733, input_rows=100000000, output_batches=11733, build_mem_used=876, build_time=2.4597ms, join_time=36.038306ms ]The total join time went from 216s to just 36ms.
What changes are included in this PR?
process_probe_batchinphysical-plan/hash_join.rsto optimized the join.Are these changes tested?
Yes.
Are there any user-facing changes?
No.