Case evaluation improvements #17898

pepijnve · 2025-10-03T12:28:06Z

Which issue does this PR close?

None yet

Rationale for this change

Speculative performance improvements for case evaluation

What changes are included in this PR?

Short circuit evaluation when selection vector is all false
Avoid filtering record batch when selection vector is all true

Are these changes tested?

(Hopefully) covered by SQL logic tests

Are there any user-facing changes?

No

- Short circuit evaluation when selection vector is all false - Avoid filtering record batch when selection vector is all true

pepijnve · 2025-10-03T12:29:56Z

@alamb could you trigger a benchmark run on this PR (once CI gives the green light)?

alamb · 2025-10-03T13:41:37Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (3b75814) to 10a437b diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

alamb · 2025-10-03T13:48:30Z

🤖: Benchmark completed

Details

group                          case_improvements                      main
-----                          -----------------                      ----
case_when: CASE expr           1.00     23.7±0.23µs        ? ?/sec    1.02     24.2±0.24µs        ? ?/sec
case_when: column or null      1.00   1401.1±2.72ns        ? ?/sec    1.01   1409.5±6.15ns        ? ?/sec
case_when: expr or expr        1.00     30.7±0.21µs        ? ?/sec    1.03     31.6±0.22µs        ? ?/sec
case_when: scalar or scalar    1.00      8.0±0.01µs        ? ?/sec    1.02      8.1±0.02µs        ? ?/sec

pepijnve · 2025-10-03T14:08:29Z

Small improvement (which is what I had expected). WDYT, worth integrating?
Could just be noise too though. The 'expr or expr' and 'scalar or scalar' code paths were not modified.

Edit: I had a closer look at the case_when benchmark. It's probably too much of a microbenchmark to see any difference there. It does show that the extra conditionals don't tank performance.

The example I'm looking at locally is a projection where a classification category is being associated with each row using a large case expression. Something like
SELECT *, CASE WHEN predicate_1 THEN 0 WHEN predicate_2 THEN 1 WHEN predicate_2 THEN 2 ... END FROM table. If my reading of the code is correct, there's much more data being manipulated in a scenario like that.

alamb

Seems like it is worth pursing to me -- thanks @pepijnve

I wonder if we can factor out the pattern of "don't call evaluate_selection if the selection is still the entire remainder (mostly so we can document the behavior / make sure this it doesn't get lost in some future refactoring)

Maybe something we could wrap the count in some structure or a function

🤔

let mut remainder = Remainder::new(batch.num_rows());
remainder.update(&self.when_then_expr[i], &batch);

alamb · 2025-10-03T20:40:44Z

datafusion/physical-expr/src/expressions/case.rs

        let mut current_value = new_null_array(&return_type, batch.num_rows());
        // We only consider non-null values while comparing with whens
        let mut remainder = not(&base_nulls)?;
+        let mut remainder_count = remainder.true_count();


we have found in past evaluations, that the code generated for true_count is astonishingly fast (it uses some special hardware instruction) so I am not surprised this works well

Good to know that that’s fairly cheap.

What I’m experimenting with is retaining the filtered record batch from one loop iteration to the next so that the amount of data to be churned through each iteration shrinks.

alamb · 2025-10-03T20:49:11Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (3b75814) to 10a437b diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

alamb · 2025-10-03T20:55:49Z

🤖: Benchmark completed

Details

group                          case_improvements                      main
-----                          -----------------                      ----
case_when: CASE expr           1.00     24.0±0.10µs        ? ?/sec    1.00     23.9±0.14µs        ? ?/sec
case_when: column or null      1.00   1411.3±1.41ns        ? ?/sec    1.00   1411.1±5.67ns        ? ?/sec
case_when: expr or expr        1.01     31.2±0.25µs        ? ?/sec    1.00     31.0±0.28µs        ? ?/sec
case_when: scalar or scalar    1.00      8.0±0.02µs        ? ?/sec    1.00      8.0±0.02µs        ? ?/sec

github-actions bot added the physical-expr Changes to the physical-expr crates label Oct 3, 2025

Case evaluation improvements

3b75814

- Short circuit evaluation when selection vector is all false - Avoid filtering record batch when selection vector is all true

pepijnve force-pushed the case_improvements branch from bb647a5 to 3b75814 Compare October 3, 2025 12:28

alamb reviewed Oct 3, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Case evaluation improvements #17898

Case evaluation improvements #17898

Uh oh!

pepijnve commented Oct 3, 2025

Uh oh!

pepijnve commented Oct 3, 2025 •

edited

Loading

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

pepijnve commented Oct 3, 2025 •

edited

Loading

Uh oh!

alamb left a comment

Uh oh!

alamb Oct 3, 2025

Uh oh!

pepijnve Oct 4, 2025

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Case evaluation improvements #17898

Are you sure you want to change the base?

Case evaluation improvements #17898

Uh oh!

Conversation

pepijnve commented Oct 3, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

pepijnve commented Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

pepijnve commented Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alamb left a comment

Choose a reason for hiding this comment

Uh oh!

alamb Oct 3, 2025

Choose a reason for hiding this comment

Uh oh!

pepijnve Oct 4, 2025

Choose a reason for hiding this comment

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

alamb commented Oct 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pepijnve commented Oct 3, 2025 •

edited

Loading

pepijnve commented Oct 3, 2025 •

edited

Loading