Skip to content

Conversation

@nuno-faria
Copy link
Contributor

Which issue does this PR close?

N/A.

Rationale for this change

When executing hash joins, the build side is first built from the left relation and then the right relation is joined with it. However, when the build side has no rows, the join operation can be mostly skipped, improving performance.

For example, here is a simple anti join query, where t1 has 100M rows and t2 has none:

SELECT *
FROM t1
LEFT ANTI JOIN t2 on t1.k = t2.k

Here is the hash join operator in the current implementation:

HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[
    output_rows=100000000,
    build_input_batches=0,
    build_input_rows=0,
    input_batches=11733,
    input_rows=100000000,
    output_batches=23403,
    build_mem_used=876,
    build_time=2.8693ms,
    join_time=216.251396302s
]

And here is the optimized hash join operation:

HashJoinExec: mode=Partitioned, join_type=RightAnti, on=[(k@0, k@0)], metrics=[
    output_rows=100000000,
    build_input_batches=0,
    build_input_rows=0,
    input_batches=11733,
    input_rows=100000000,
    output_batches=11733,
    build_mem_used=876,
    build_time=2.4597ms,
    join_time=36.038306ms
]

The total join time went from 216s to just 36ms.

What changes are included in this PR?

  • Changed process_probe_batch in physical-plan/hash_join.rs to optimized the join.
  • Added multiple sqllogictests.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Jul 8, 2025
Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice to me Maybe we should have some more tests for correctness of results.

Copy link
Contributor

@jonathanc-n jonathanc-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nuno-faria!

@xudong963 xudong963 added the performance Make DataFusion faster label Jul 9, 2025
match join_type {
// these join types only return data if the left side is not empty, so we return an
// empty RecordBatch
JoinType::Inner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, how about cross join

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross joins with an empty relation already appear to run well in the CrossJoinExec operator.

Here is the CrossJoinExec operator for SELECT * FROM t1, t2, where t1 has 100M rows and t2 has none:

CrossJoinExec, metrics=[
    output_rows=0,
    elapsed_compute=351.714µs,
    build_input_batches=0,
    build_input_rows=0,
    input_batches=0,
    input_rows=0,
    output_batches=0,
    build_mem_used=0,
    build_time=351.7µs,
    join_time=12ns
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this makes sense, cross join is not a join type that would go through creating hash table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this, I think a more generic version of this would be switching small left sides (e.g < 10 rows) to using cross join 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this, I think a more generic version of this would be switching small left sides (e.g < 10 rows) to using cross join 🤔

Is this including for equijoin conditions? I think the performance seemed slow when there was a larger right table for doing this with nested loop join which follows a similar algorithm. It is probably a memory issue due to the cartesian product.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be relatively fast to do a cross join / NLJ instead of a hash join for those cases, but of course depends how the nested loop join is implemented, probably there is more room for optimization of the nested loop join.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of opening a proposal to make nested loop join faster, there are definitely some issues to work on there. I'll try to get to that when I have the time

@nuno-faria
Copy link
Contributor Author

@Dandandan I've added one more test where both tables are empty. Do you have suggestions for more?

@nuno-faria
Copy link
Contributor Author

@jonathanc-n Since after #16434 the hash map is not directly accessible, I've added an is_empty method to JoinHashMapType. Please check if this is the preferred approach.


let timer = self.join_metrics.join_time.timer();

// if the left side is empty, we can skip the (potentially expensive) join operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we would check the left side being empty before retrieving probe batches, we could also remove hash repartition 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do this in a follow up pr wdyt @nuno-faria?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Can you point out where the probe repartition is being triggered? In the process_probe_batch itself I think we can also skip creating the hashes when the build side is empty, but I measured and it didn't have a relatively big impact on performance.

@jonathanc-n
Copy link
Contributor

@jonathanc-n Since after #16434 the hash map is not directly accessible, I've added an is_empty method to JoinHashMapType. Please check if this is the preferred approach.

Yes this looks good

@jonathanc-n
Copy link
Contributor

@nuno-faria We can return early from collect_left_input after intaking batches and checking the number of batches

if batches.len() == 0 {
        return Ok(JoinLeftData::new(
            Box::new(JoinHashMapU32::with_capacity(0)),
            RecordBatch::new_empty(schema),
            Vec::new(),
            Mutex::new(BooleanBufferBuilder::new(0)),
            AtomicUsize::new(probe_threads_count),
            reservation,
        ));
    };

@alamb alamb merged commit 04b006c into apache:main Jul 14, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 14, 2025

Looks like this PR was good to go and had no outstanding todos so I merged it in

@nuno-faria nuno-faria deleted the optimize_empty_hashjoins branch July 14, 2025 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Make DataFusion faster physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants