perf: optimize CASE WHEN lookup table (2.5-22.5 times faster) #18183
I don't like that it is in `sort_properties.rs`, but this struct is used in `get_properties`, which seems like the most appropriate place.
…to improve-performance-for-literal-mapping # Conflicts: # datafusion/physical-expr/benches/case_when.rs
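For readers unfamiliar with the idea in the title, here is a minimal sketch of what a lookup-table `CASE WHEN` could look like, using the `company` mapping from the PR description. It is an illustration only: `case_when_linear` and `case_when_lookup` are hypothetical names, the code uses plain `std` types rather than Arrow arrays, and it is not the implementation in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical baseline: compare each row against every WHEN literal in order.
fn case_when_linear(
    input: &[Option<i32>],
    whens: &[(i32, &'static str)],
    else_value: &'static str,
) -> Vec<&'static str> {
    input
        .iter()
        .map(|v| match v {
            // NULL never equals a WHEN literal, so it falls through to ELSE.
            None => else_value,
            Some(x) => whens
                .iter()
                .find(|(k, _)| k == x)
                .map(|(_, s)| *s)
                .unwrap_or(else_value),
        })
        .collect()
}

/// Hypothetical lookup-table variant: build the map once, then do one hash probe per row.
fn case_when_lookup(
    input: &[Option<i32>],
    whens: &[(i32, &'static str)],
    else_value: &'static str,
) -> Vec<&'static str> {
    let mut table: HashMap<i32, &'static str> = HashMap::with_capacity(whens.len());
    for (k, v) in whens {
        // Keep the first occurrence so duplicate WHEN literals match the linear scan.
        table.entry(*k).or_insert(*v);
    }
    input
        .iter()
        .map(|v| v.and_then(|x| table.get(&x).copied()).unwrap_or(else_value))
        .collect()
}

fn main() {
    let whens = [(1, "Apple"), (5, "Samsung"), (2, "Motorola"), (3, "LG")];
    let input = [Some(5), None, Some(7), Some(1)];
    let linear = case_when_linear(&input, &whens, "Other");
    let lookup = case_when_lookup(&input, &whens, "Other");
    // Both strategies must agree row by row.
    assert_eq!(linear, lookup);
    println!("{:?}", lookup);
}
```

The linear version does work proportional to the number of `WHEN` branches per row, while the lookup version does roughly constant work per row after a one-time table build. That is presumably why the benchmarks vary the table size (5, 10, 20).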
## Which issue does this PR close?

N/A

But extracted from:
- #18183

## Rationale for this change

I want to add an optimization for lookup-based `CASE WHEN` expressions like:

```sql
CASE company
    WHEN 1 THEN 'Apple'
    WHEN 5 THEN 'Samsung'
    WHEN 2 THEN 'Motorola'
    WHEN 3 THEN 'LG'
    ELSE 'Other'
END
```

## What changes are included in this PR?

Added multiple benchmarks that test lookup tables from int to string and vice versa, with different lookup table sizes (5, 10, 20), different probabilities that a generated value exists in the lookup map, and different probabilities of nulls.

## Are these changes tested?

N/A

## Are there any user-facing changes?

nope

<details>
<summary>Current results</summary>

```
lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [572.27 µs 572.51 µs 572.78 µs] change: [-0.4311% -0.1953% +0.0524%] (p = 0.09 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [808.11 µs 808.63 µs 809.30 µs] change: [+0.2857% +0.4440% +0.6288%] (p = 0.00 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [341.97 µs 342.52 µs 343.21 µs] change: [-0.0740% +0.1913% +0.4541%] (p = 0.13 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [446.34 µs 446.58 µs 446.83 µs] change: [-0.0381% +0.1947% +0.3941%] (p = 0.08 > 0.05) No change in performance detected.
Found 12 outliers among 100 measurements (12.00%) 8 (8.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [804.30 µs 804.79 µs 805.33 µs] change: [+0.7523% +0.9613% +1.1731%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [1.3972 ms 1.3979 ms 1.3987 ms] change: [-0.4150% -0.2455% -0.0613%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 7 (7.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [342.75 µs 342.96 µs 343.22 µs] change: [+0.0122% +0.2433% +0.4781%] (p = 0.02 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 2 (2.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [445.39 µs 445.56 µs 445.75 µs] change: [-0.4254% -0.2538% -0.0547%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. 
You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [1.1731 ms 1.1738 ms 1.1746 ms] change: [+0.3589% +0.5605% +0.7873%] (p = 0.00 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [2.3819 ms 2.3832 ms 2.3845 ms] change: [+0.0355% +0.1016% +0.1663%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [341.53 µs 341.68 µs 341.85 µs] change: [-0.4361% -0.2031% +0.0398%] (p = 0.08 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0 time: [444.97 µs 445.12 µs 445.30 µs] change: [-0.3776% -0.2148% -0.0460%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [728.94 µs 729.23 µs 729.58 µs] change: [+0.2335% +0.4019% +0.5791%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 14 outliers among 100 measurements (14.00%) 1 (1.00%) low mild 8 (8.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [966.21 µs 967.02 µs 968.03 µs] change: [+0.2251% +0.3997% +0.6015%] (p = 0.00 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [485.63 µs 485.86 µs 486.11 µs] change: [+0.0990% +0.2832% +0.4909%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [565.86 µs 566.24 µs 566.70 µs] change: [+0.1038% +0.2811% +0.4814%] (p = 0.00 < 0.05) Change within noise threshold. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [1.0237 ms 1.0243 ms 1.0250 ms] change: [+0.3817% +0.6095% +0.8366%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. 
You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [1.6139 ms 1.6145 ms 1.6151 ms] change: [-0.3413% -0.1799% -0.0094%] (p = 0.03 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [485.50 µs 485.75 µs 486.07 µs] change: [+0.0362% +0.2597% +0.4990%] (p = 0.02 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [564.83 µs 565.13 µs 565.49 µs] change: [-0.2292% -0.0443% +0.1429%] (p = 0.68 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [1.5206 ms 1.5214 ms 1.5223 ms] change: [+0.1219% +0.2845% +0.4382%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [2.7355 ms 2.7372 ms 2.7392 ms] change: [+0.1862% +0.2633% +0.3492%] (p = 0.00 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 6 (6.00%) high mild 10 (10.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [485.64 µs 485.87 µs 486.12 µs] change: [-0.1317% +0.0342% +0.1974%] (p = 0.72 > 0.05) No change in performance detected. Found 17 outliers among 100 measurements (17.00%) 10 (10.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.1 time: [564.95 µs 565.22 µs 565.52 µs] change: [-0.2459% -0.0804% +0.1093%] (p = 0.42 > 0.05) No change in performance detected. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [613.55 µs 613.90 µs 614.30 µs] change: [-0.1978% +0.0206% +0.2512%] (p = 0.88 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) high mild 11 (11.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [804.94 µs 805.27 µs 805.64 µs] change: [-0.3371% -0.2017% -0.0566%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [451.36 µs 451.55 µs 451.75 µs] change: [-0.1076% +0.0692% +0.2464%] (p = 0.48 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [516.12 µs 516.36 µs 516.64 µs] change: [-0.2179% +0.0030% +0.2181%] (p = 0.96 > 0.05) No change in performance detected. Found 15 outliers among 100 measurements (15.00%) 5 (5.00%) high mild 10 (10.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [831.48 µs 831.89 µs 832.37 µs] change: [+0.2730% +0.4416% +0.6047%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [1.2999 ms 1.3006 ms 1.3014 ms] change: [+0.1551% +0.3508% +0.5933%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 10 outliers among 100 measurements (10.00%) 7 (7.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [451.54 µs 451.76 µs 452.00 µs] change: [-0.0486% +0.1303% +0.3100%] (p = 0.17 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [516.47 µs 516.73 µs 517.04 µs] change: [-0.2732% -0.0578% +0.1455%] (p = 0.66 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [1.2276 ms 1.2283 ms 1.2290 ms] change: [+0.3032% +0.4998% +0.6974%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [2.1643 ms 2.1656 ms 2.1671 ms] change: [-0.2215% -0.1512% -0.0757%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 16 outliers among 100 measurements (16.00%) 7 (7.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [451.48 µs 451.71 µs 451.96 µs] change: [-0.4533% -0.2697% -0.0701%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 7 (7.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.1, nulls: 0.5 time: [516.17 µs 516.42 µs 516.71 µs] change: [-0.3951% -0.1759% +0.0285%] (p = 0.10 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [581.53 µs 581.85 µs 582.22 µs] change: [-0.4681% -0.2375% +0.0205%] (p = 0.03 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [806.04 µs 806.40 µs 806.83 µs] change: [-0.3328% -0.1863% -0.0288%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [341.42 µs 341.59 µs 341.78 µs] change: [-0.5843% -0.3707% -0.1573%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [445.02 µs 445.20 µs 445.42 µs] change: [-0.3079% -0.1435% +0.0361%] (p = 0.10 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [804.03 µs 804.52 µs 805.10 µs] change: [+0.0833% +0.2480% +0.4237%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [1.3958 ms 1.3964 ms 1.3970 ms] change: [-0.3882% -0.2616% -0.1751%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [341.66 µs 341.85 µs 342.06 µs] change: [-0.5154% -0.3585% -0.2438%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [445.08 µs 445.29 µs 445.53 µs] change: [-0.1558% +0.0148% +0.1858%] (p = 0.88 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [1.1700 ms 1.1709 ms 1.1719 ms] change: [+0.0124% +0.2123% +0.4009%] (p = 0.02 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [2.3757 ms 2.3775 ms 2.3800 ms] change: [+0.0656% +0.1579% +0.2573%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [341.47 µs 341.77 µs 342.10 µs] change: [-0.2178% +0.0125% +0.2362%] (p = 0.92 > 0.05) No change in performance detected. 
Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0 time: [445.57 µs 445.82 µs 446.09 µs] change: [-0.0565% +0.1035% +0.2671%] (p = 0.23 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [728.22 µs 728.56 µs 728.96 µs] change: [-0.3534% -0.1796% +0.0062%] (p = 0.03 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [965.27 µs 965.67 µs 966.17 µs] change: [-0.2803% -0.1597% -0.0369%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [485.15 µs 485.40 µs 485.69 µs] change: [-0.1787% +0.0020% +0.1849%] (p = 0.98 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [566.91 µs 567.13 µs 567.37 µs] change: [-0.1075% +0.0721% +0.2537%] (p = 0.47 > 0.05) No change in performance detected. 
Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [1.0272 ms 1.0278 ms 1.0286 ms] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [1.6216 ms 1.6224 ms 1.6232 ms] Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [485.95 µs 486.18 µs 486.46 µs] Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [566.82 µs 567.07 µs 567.34 µs] Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. 
You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [1.5252 ms 1.5263 ms 1.5276 ms] Found 10 outliers among 100 measurements (10.00%) 7 (7.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [2.7479 ms 2.7492 ms 2.7507 ms] Found 15 outliers among 100 measurements (15.00%) 8 (8.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [485.18 µs 485.54 µs 486.05 µs] Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.1 time: [566.81 µs 567.09 µs 567.43 µs] Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [614.34 µs 614.75 µs 615.29 µs] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [808.13 µs 808.56 µs 809.05 µs] Found 11 outliers among 100 measurements (11.00%) 7 (7.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [452.21 µs 452.46 µs 452.79 µs] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 
time: [518.11 µs 518.36 µs 518.62 µs] Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [834.45 µs 834.88 µs 835.36 µs] Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [1.3037 ms 1.3045 ms 1.3053 ms] Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [451.25 µs 451.57 µs 451.99 µs] Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [517.62 µs 517.86 µs 518.12 µs] Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60. 
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [1.2297 ms 1.2310 ms 1.2328 ms] Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [2.1666 ms 2.1676 ms 2.1686 ms] Found 8 outliers among 100 measurements (8.00%) 7 (7.00%) high mild 1 (1.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [452.46 µs 452.66 µs 452.88 µs] Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.5, nulls: 0.5 time: [517.20 µs 517.44 µs 517.72 µs] Found 17 outliers among 100 measurements (17.00%) 6 (6.00%) high mild 11 (11.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [582.53 µs 583.31 µs 584.58 µs] Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [807.24 µs 807.65 µs 808.09 µs] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [341.89 µs 342.06 µs 342.25 µs] Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [445.32 µs 445.54 µs 445.80 µs] Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) 
high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [804.09 µs 804.53 µs 805.00 µs] Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [1.3968 ms 1.3975 ms 1.3983 ms] Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [341.27 µs 341.43 µs 341.60 µs] Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [445.54 µs 445.86 µs 446.24 µs] Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) high mild 8 (8.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. 
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [1.1703 ms 1.1710 ms 1.1717 ms] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [2.3708 ms 2.3724 ms 2.3743 ms] Found 11 outliers among 100 measurements (11.00%) 8 (8.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [341.94 µs 342.15 µs 342.41 µs] Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0 time: [445.15 µs 445.42 µs 445.74 µs] Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [729.65 µs 729.94 µs 730.26 µs] Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [966.46 µs 966.97 µs 967.58 µs] Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [486.14 µs 486.36 µs 486.61 µs] Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [566.81 µs 567.07 µs 567.34 µs] Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 
(7.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [1.0273 ms 1.0278 ms 1.0283 ms] Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [1.6239 ms 1.6248 ms 1.6258 ms] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [485.73 µs 486.04 µs 486.43 µs] Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) high mild 8 (8.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [567.22 µs 567.54 µs 567.93 µs] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50. 
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [1.5275 ms 1.5282 ms 1.5290 ms] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [2.7513 ms 2.7532 ms 2.7553 ms] Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [486.38 µs 486.58 µs 486.78 µs] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 0.9, nulls: 0.1 time: [566.15 µs 566.42 µs 566.75 µs] Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [582.47 µs 582.74 µs 583.04 µs] Found 14 outliers among 100 measurements (14.00%) 9 (9.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [807.70 µs 808.17 µs 808.74 µs] Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) high mild 9 (9.00%) high severe lookup_table_case_when/case when i32 -> utf8, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [341.62 µs 341.80 µs 342.03 µs] Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when utf8 -> i32, 5 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [445.48 µs 445.87 µs 446.40 µs] Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) 
high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [804.56 µs 805.03 µs 805.52 µs] Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Benchmarking lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50. lookup_table_case_when/case when utf8 -> i32, 10 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [1.3995 ms 1.4004 ms 1.4015 ms] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe lookup_table_case_when/case when i32 -> utf8, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [341.46 µs 341.64 µs 341.85 µs] Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 10 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [445.91 µs 446.16 µs 446.47 µs] Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe Benchmarking lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60. 
lookup_table_case_when/case when i32 -> utf8, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [1.1708 ms 1.1716 ms 1.1725 ms] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, all equally true/case_when 8192 rows: in_range: 1, nulls: 0 time: [2.3735 ms 2.3748 ms 2.3763 ms] Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe lookup_table_case_when/case when i32 -> utf8, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [342.14 µs 342.31 µs 342.53 µs] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe lookup_table_case_when/case when utf8 -> i32, 20 entries, only first 2 are true/case_when 8192 rows: in_range: 1, nulls: 0 time: [444.92 µs 445.09 µs 445.29 µs] Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) high mild 9 (9.00%) high severe
```
</details>
@alamb I've completed a first read through of the code. Since this is the first time I'm reviewing code in the context of DataFusion, any meta comments on the review are more than welcome.
Sorry @pepijnve for the delay, and thank you for the review. I will address it now.
# Conflicts: # datafusion/physical-expr/src/expressions/case/mod.rs
@pepijnve Can you please re-review? I've addressed your comments.
    let is_scalar = matches!(evaluated_expression, ColumnarValue::Scalar(_));
    let evaluated_expression = evaluated_expression.to_array(1)?;

    let output = scalars_or_null_lookup.evaluate_input(&evaluated_expression)?;
Still struggling a bit with this line. It doesn't really read like 'perform lookup in table'. `lookup_table.get(...)`, similar to `HashMap::get`, would seem to better describe what we're doing here.
changed to map_input_to_output
    pub(in super::super) fn map_input_to_output(
        &self,
        expr_array: &ArrayRef,
This isn't really an array of expressions, but an array of values (or keys for the lookup table).
the naming came from the CASE <expr>, but changed
Ok, I understand where you're coming from. For the lookup tables, I was mostly trying to understand them as general purpose associative arrays without taking the usage context into account. When looked at that way, `expr_array` makes a reference to context that you don't really have at this point in the code. Sorry if I'm being too nitpicky about this stuff.
    let is_scalar = matches!(evaluated_expression, ColumnarValue::Scalar(_));
    let evaluated_expression = evaluated_expression.to_array(1)?;

    let output = lookup_table.map_input_to_output(&evaluated_expression)?;
I don't want to make you keep on renaming this thing, since this is going to look like bike shedding. Just wondering why you want to keep some reference to 'input' and/or 'output' in this name.
I'm looking at the LookupTable concept as a fairly generic 'associative array'-like data structure where you're looking up values that correspond to keys. I think the public interface can reflect that.
Wouldn't just `map` or `lookup` suffice? Anyway, this is just my opinion. Might be useful to get a third opinion.
renamed
    ) -> datafusion_common::Result<Vec<u32>>;
    }

    pub(crate) fn try_creating_lookup_table(
I don't think this function needs to be pub(crate), does it?
you are right
    }
    }

    /// Lookup table for mapping literal values to their corresponding indices in the THEN clauses
The implementation code itself all makes sense to me, but the lack of naming consistency sometimes makes it harder than necessary to follow along.
In the comments here, for instance,
"Lookup table for mapping literal values to their corresponding indices in the THEN clauses"
is followed by
"Return indices to take from the literals based on the values in the given array".
The word 'literal' is used here twice to refer to two different things. The first statement is talking about the evaluated WHEN expression values. They're not really literals at all.
The second usage refers to the evaluated THEN expression values. These don't even have to be literals per se, just const-evaluatable.
I think it would be beneficial for future readers to try to get things clearer or more consistent, but as I said in the other comment I don't want this to come across as bike shedding either. Maybe I'm getting hung up on something not all that important. Would be good to hear from another voice.
updated
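To make the two roles discussed in this thread concrete, here is a minimal, std-only sketch. It is not the PR's implementation: it only assumes, based on the doc comments quoted above, that the evaluated WHEN values act as keys, that the evaluated THEN values live in a separate list, and that the lookup returns one index per input row which is then used to pick the result. `WhenKeyLookup`, `lookup_indices`, and `take` are hypothetical names chosen for illustration, and plain `Vec`s and slices stand in for Arrow arrays.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the lookup table under review: maps each
/// evaluated WHEN value (the key) to the position of its THEN value.
struct WhenKeyLookup {
    key_to_then_index: HashMap<i32, u32>,
    /// Index used for NULL keys and for keys without a matching WHEN branch.
    else_index: u32,
}

impl WhenKeyLookup {
    fn new(when_keys: &[i32]) -> Self {
        let mut key_to_then_index = HashMap::with_capacity(when_keys.len());
        for (i, key) in when_keys.iter().enumerate() {
            // Keep the earliest branch if a key is listed more than once.
            key_to_then_index.entry(*key).or_insert(i as u32);
        }
        Self {
            key_to_then_index,
            // Assumption: the ELSE value is stored right after the THEN values.
            else_index: when_keys.len() as u32,
        }
    }

    /// Returns, for every input row, the index of the value to take.
    fn lookup_indices(&self, keys: &[Option<i32>]) -> Vec<u32> {
        keys.iter()
            .map(|&key| {
                key.and_then(|k| self.key_to_then_index.get(&k).copied())
                    .unwrap_or(self.else_index)
            })
            .collect()
    }
}

/// Picks `values[i]` for each index, similar in spirit to a "take" kernel.
fn take<'a>(values: &[&'a str], indices: &[u32]) -> Vec<&'a str> {
    indices.iter().map(|&i| values[i as usize]).collect()
}
```

With a table built from the WHEN keys `[1, 5, 2, 3]` and the values `["Apple", "Samsung", "Motorola", "LG", "Other"]` (ELSE value last), the input `[Some(5), None, Some(7), Some(1)]` maps to the indices `[1, 4, 4, 0]`, and `take` then yields `["Samsung", "Other", "Other", "Apple"]`. Seen this way, the first quoted doc comment describes the keys and the second describes the values being taken.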
@pepijnve updated again, can you please re-review?
@pepijnve can you please re-review before I have more conflicts :)
It's on my todo list for today or tomorrow. Sorry for the churn; nothing else incoming from me.
@rluvaton in an attempt to speed things up, I've bundled everything I wanted to suggest in a PR instead, at rluvaton#6. The intent here is to split the scalar value key/value maps and the usage of this for case a bit further, and then use consistent terminology and naming across all the types and comments. All the case specific bits are now consolidated in the
As I said before, I'm not a committer and I'm only trying to express my point of view wrt this code. I'll leave it up to you to decide if you want to integrate these suggestions or not. In terms of the implementation itself nothing has changed (or at least not intentionally). That was already fine as is.
@rluvaton The test cases I added in #18872 trigger a panic in the
@pepijnve I appreciate the review and the PR, but I think we should make your changes a separate PR, as I have some comments on them and it is fine to handle them separately. Is that OK with you?
Half of the changes are new tests
Which issue does this PR close?
N/A
Part of:
Benchmark is in:
CASE WHEN #18203
Rationale for this change
Optimize for lookup-table-like `CASE WHEN`:
`CASE company WHEN 1 THEN 'Apple' WHEN 5 THEN 'Samsung' WHEN 2 THEN 'Motorola' WHEN 3 THEN 'LG' ELSE 'Other' END`
What changes are included in this PR?
Implement the `CASE WHEN` as a lookup table
Are these changes tested?
Yes, a lot
Are there any user-facing changes?
Nope
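As a rough illustration of the idea in this description (a sketch under stated assumptions, not the code in this PR): evaluating the `CASE` branch by branch costs up to one comparison per WHEN for every row, while a hash map built once from the WHEN/THEN pairs needs a single probe per row. The function names below are hypothetical, and `Option<i32>` stands in for a nullable column value. Two SQL rules are kept: the first matching WHEN wins, and a NULL input never matches, so it falls through to ELSE.

```rust
use std::collections::HashMap;

/// Evaluates the CASE branch by branch: the first matching WHEN wins.
fn case_sequential<'a>(
    branches: &[(i32, &'a str)],
    else_value: &'a str,
    input: Option<i32>,
) -> &'a str {
    match input {
        // `NULL = x` is never true, so a NULL input falls through to ELSE.
        None => else_value,
        Some(v) => branches
            .iter()
            .find(|(when, _)| *when == v)
            .map(|(_, then)| *then)
            .unwrap_or(else_value),
    }
}

/// Builds the map once; `or_insert` keeps the first branch for a repeated
/// key, matching the first-match-wins behaviour of the sequential version.
fn build_lookup<'a>(branches: &[(i32, &'a str)]) -> HashMap<i32, &'a str> {
    let mut map = HashMap::with_capacity(branches.len());
    for &(when, then) in branches {
        map.entry(when).or_insert(then);
    }
    map
}

/// Evaluates the CASE with one hash probe instead of a scan over branches.
fn case_lookup<'a>(
    map: &HashMap<i32, &'a str>,
    else_value: &'a str,
    input: Option<i32>,
) -> &'a str {
    input
        .and_then(|v| map.get(&v).copied())
        .unwrap_or(else_value)
}
```

For the branches `(1, 'Apple'), (5, 'Samsung'), (2, 'Motorola'), (3, 'LG')` with ELSE `'Other'`, both functions return the same value for every input, including NULL and unmatched keys. Using `entry(..).or_insert(..)` rather than `insert` is the design choice that keeps the two versions equivalent when a WHEN value is repeated.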
Benchmark results
(run against main before #18152 was merged)
Env
Formatted output
Lookup table from `i32` to `utf8`
Lookup table from `utf8` to `i32`
criterion output:
Benchmark results