Skip to content

Conversation

pepijnve
Copy link
Contributor

@pepijnve pepijnve commented Oct 3, 2025

Which issue does this PR close?

None yet

Rationale for this change

Speculative performance improvements for case evaluation

What changes are included in this PR?

  • Short circuit evaluation when selection vector is all false
  • Avoid filtering record batch when selection vector is all true

Are these changes tested?

(Hopefully) covered by SQL logic tests

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Oct 3, 2025
- Short circuit evaluation when selection vector is all false
- Avoid filtering record batch when selection vector is all true
@pepijnve
Copy link
Contributor Author

pepijnve commented Oct 3, 2025

@alamb could you trigger a benchmark run on this PR (once CI gives the green light)?

@alamb
Copy link
Contributor

alamb commented Oct 3, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (3b75814) to 10a437b diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 3, 2025

🤖: Benchmark completed

Details

group                          case_improvements                      main
-----                          -----------------                      ----
case_when: CASE expr           1.00     23.7±0.23µs        ? ?/sec    1.02     24.2±0.24µs        ? ?/sec
case_when: column or null      1.00   1401.1±2.72ns        ? ?/sec    1.01   1409.5±6.15ns        ? ?/sec
case_when: expr or expr        1.00     30.7±0.21µs        ? ?/sec    1.03     31.6±0.22µs        ? ?/sec
case_when: scalar or scalar    1.00      8.0±0.01µs        ? ?/sec    1.02      8.1±0.02µs        ? ?/sec

@pepijnve
Copy link
Contributor Author

pepijnve commented Oct 3, 2025

Small improvement (which is what I had expected). WDYT, worth integrating?
Could just be noise too though. The 'expr or expr' and 'scalar or scalar' code paths were not modified.

Edit: I had a closer look at the case_when benchmark. It's probably too much of a microbenchmark to see any difference there. It does show that the extra conditionals don't tank performance.

The example I'm looking at locally is a projection where a classification category is being associated with each row using a large case expression. Something like
SELECT *, CASE WHEN predicate_1 THEN 0 WHEN predicate_2 THEN 1 WHEN predicate_2 THEN 2 ... END FROM table. If my reading of the code is correct, there's much more data being manipulated in a scenario like that.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it is worth pursing to me -- thanks @pepijnve

I wonder if we can factor out the pattern of "don't call evaluate_selection if the selection is still the entire remainder (mostly so we can document the behavior / make sure this it doesn't get lost in some future refactoring)

Maybe something we could wrap the count in some structure or a function

🤔

let mut remainder = Remainder::new(batch.num_rows());
remainder.update(&self.when_then_expr[i], &batch);

let mut current_value = new_null_array(&return_type, batch.num_rows());
// We only consider non-null values while comparing with whens
let mut remainder = not(&base_nulls)?;
let mut remainder_count = remainder.true_count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have found in past evaluations, that the code generated for true_count is astonishingly fast (it uses some special hardware instruction) so I am not surprised this works well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know that that’s fairly cheap.

What I’m experimenting with is retaining the filtered record batch from one loop iteration to the next so that the amount of data to be churned through each iteration shrinks.

@alamb
Copy link
Contributor

alamb commented Oct 3, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (3b75814) to 10a437b diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Oct 3, 2025

🤖: Benchmark completed

Details

group                          case_improvements                      main
-----                          -----------------                      ----
case_when: CASE expr           1.00     24.0±0.10µs        ? ?/sec    1.00     23.9±0.14µs        ? ?/sec
case_when: column or null      1.00   1411.3±1.41ns        ? ?/sec    1.00   1411.1±5.67ns        ? ?/sec
case_when: expr or expr        1.01     31.2±0.25µs        ? ?/sec    1.00     31.0±0.28µs        ? ?/sec
case_when: scalar or scalar    1.00      8.0±0.02µs        ? ?/sec    1.00      8.0±0.02µs        ? ?/sec

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants