Perf: Add prefix compare for inlined compare and change use of inline_value to inline it to a u128 #7748

zhuqi-lucas · 2025-06-23T14:42:26Z

Which issue does this PR close?

Closes #7743

Rationale for this change

Change the fast path for the lt case to compare values as u128, and use a u128 comparison for the inline case (12 bytes or fewer) as well.

For the data buffer case (more than 12 bytes), compare the 4-byte prefix with a single u32 comparison instead of comparing byte by byte.

What changes are included in this PR?

The lt fast path now compares values as u128, and the inline case (12 bytes or fewer) also uses a u128 comparison.

In the data buffer case (more than 12 bytes), the 4-byte prefix is compared with a single u32 comparison instead of byte by byte.
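
As a rough illustration of why a single u32 comparison can replace a byte-by-byte prefix comparison, here is a minimal sketch. The function names are hypothetical and are not taken from the PR; the only assumption is that both prefixes are exactly 4 bytes (zero-padded if the string is shorter).

```rust
use std::cmp::Ordering;

/// Baseline: compare two 4-byte prefixes one byte at a time.
fn cmp_prefix_bytewise(a: [u8; 4], b: [u8; 4]) -> Ordering {
    for i in 0..4 {
        match a[i].cmp(&b[i]) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    Ordering::Equal
}

/// Single integer comparison: reading the bytes as a big-endian u32 puts
/// the first byte in the most significant position, so numeric order
/// matches lexicographic byte order.
fn cmp_prefix_u32(a: [u8; 4], b: [u8; 4]) -> Ordering {
    u32::from_be_bytes(a).cmp(&u32::from_be_bytes(b))
}
```

Both functions return the same ordering for any pair of prefixes; the second one does it with one comparison and no loop.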

Are there any user-facing changes?

No

zhuqi-lucas · 2025-06-23T14:43:57Z

Performance result:

2.53x faster for lt StringViewArray StringViewArray inlined bytes
1.16x faster for lt scalar StringViewArray

critcmp  main issue_7743  --filter "iew"
group                                                                                                    issue_7743                             main
-----                                                                                                    ----------                             ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00  1250.8±16.53µs        ? ?/sec    1.02  1270.1±34.61µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.01  1224.8±12.97µs        ? ?/sec    1.00  1218.6±18.65µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00    673.7±3.52µs        ? ?/sec    1.00    676.2±6.47µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.00    615.0±6.85µs        ? ?/sec    1.00    614.5±6.27µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.04      6.0±0.05ms        ? ?/sec    1.00      5.8±0.06ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00      2.7±0.03ms        ? ?/sec    1.01      2.7±0.05ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.02    512.6±8.67µs        ? ?/sec    1.00    503.9±9.59µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00      3.6±0.04ms        ? ?/sec    1.02      3.7±0.03ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.02      3.9±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.02      3.9±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
like_utf8view scalar complex                                                                             1.01     81.0±0.86ms        ? ?/sec    1.00     80.3±0.62ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     76.5±0.59ms        ? ?/sec    1.00     76.3±0.95ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.01     19.0±0.25ms        ? ?/sec    1.00     18.9±0.26ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     19.0±0.22ms        ? ?/sec    1.00     19.0±0.23ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.00     18.9±0.18ms        ? ?/sec    1.00     19.0±0.24ms        ? ?/sec
like_utf8view scalar equals                                                                              1.01     14.0±0.15ms        ? ?/sec    1.00     13.9±0.16ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.01     18.3±0.24ms        ? ?/sec    1.00     18.2±0.27ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.7±0.18ms        ? ?/sec    1.00     10.7±0.23ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.5±0.22ms        ? ?/sec    1.00     18.4±0.24ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.01   617.5±11.35µs        ? ?/sec    1.00    610.9±7.55µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.01  1834.8±21.26µs        ? ?/sec    1.00  1810.7±21.24µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    605.1±9.37µs        ? ?/sec    1.00    603.2±6.33µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.01    196.9±5.64µs        ? ?/sec    1.00    194.2±4.78µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00    669.1±5.56µs        ? ?/sec    1.00    666.8±5.22µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     19.0±0.39ms        ? ?/sec    2.53     48.1±1.10ms        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.00    485.9±8.31µs        ? ?/sec    1.04    504.3±8.39µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     22.3±1.39ms        ? ?/sec    1.16     25.8±1.28ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    506.5±7.39µs        ? ?/sec    1.00   506.4±10.21µs        ? ?/sec

zhuqi-lucas · 2025-06-23T14:45:18Z

Note:

I have not changed the 4-byte inline compare yet. In that case we still need to extract the prefix from both sides, so the benefit would be small: there are at most 4 bytes to compare.

I will try to optimize the 4-byte inline compare later.

Dandandan · 2025-06-25T05:26:25Z

arrow-array/src/array/byte_view_array.rs

+            const DATA_MASK: u128 = !0u128 << 32;
+
+            // Remove the length bits, leaving only the data
+            let l_data = (l_bits & DATA_MASK) >> 32;


I wonder if we can avoid some of the bit masking by using ByteView here ?

Thank you @Dandandan, I reduced the bit masking in the latest commit. I think the ByteView From impl converts every struct field, but we only use some of them, so I only used the relevant part of that logic.

impl From<u128> for ByteView {
    #[inline]
    fn from(value: u128) -> Self {
        Self {
            length: value as u32,
            prefix: (value >> 32) as u32,
            buffer_index: (value >> 64) as u32,
            offset: (value >> 96) as u32,
        }
    }
}

I think the compiler might be smart enough to see it is actually the same (I changed it somewhere else, but couldn't detect performance difference).

Interesting @Dandandan , let me try using ByteView and compare performance!
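
To make the masking discussion concrete, the sketch below uses a hypothetical `View` struct that mirrors the field layout quoted above (it is a stand-in, not arrow's `ByteView`). It shows that the explicit `DATA_MASK` is redundant, because the right shift already discards the low 32 length bits.

```rust
/// Hypothetical stand-in for `ByteView`, using the layout quoted in this thread.
#[derive(Debug, PartialEq, Eq)]
struct View {
    length: u32,
    prefix: u32,
    buffer_index: u32,
    offset: u32,
}

impl From<u128> for View {
    fn from(value: u128) -> Self {
        Self {
            length: value as u32,
            prefix: (value >> 32) as u32,
            buffer_index: (value >> 64) as u32,
            offset: (value >> 96) as u32,
        }
    }
}

/// Data bits extracted with an explicit mask, as in the first revision.
fn data_with_mask(bits: u128) -> u128 {
    const DATA_MASK: u128 = !0u128 << 32;
    (bits & DATA_MASK) >> 32
}

/// Same result without the mask: the shift already drops the low 32 bits.
fn data_with_shift(bits: u128) -> u128 {
    bits >> 32
}
```

Whether the compiler emits identical machine code for both forms is something to confirm with a benchmark or by inspecting the assembly; the sketch only shows that the results are equal.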

Dandandan · 2025-06-25T05:28:51Z

arrow-ord/src/cmp.rs

+            let l_data = (l_bits & DATA_MASK) >> 32;
+            let r_data = (r_bits & DATA_MASK) >> 32;


Same here :)

Dandandan · 2025-06-25T05:29:58Z

lt StringViewArray StringViewArray inlined bytes 1.00 19.0±0.39ms ? ?/sec 2.53 48.1±1.10ms ? ?/sec

Amazing 😎 !! Perhaps we can use some less manual bit masking/shifting while still producing roughly the same code?

zhuqi-lucas · 2025-06-25T10:26:39Z

lt StringViewArray StringViewArray inlined bytes 1.00 19.0±0.39ms ? ?/sec 2.53 48.1±1.10ms ? ?/sec

Amazing 😎 !! Perhaps we can use some less manual bit masking/shifting while still producing roughly the same code?

Thank you @Dandandan for the review. I polished the code, and the performance is now even better:

lt StringViewArray StringViewArray inlined bytes    1.00     16.4±0.50ms        ? ?/sec    2.98     48.7±0.71ms        ? ?/sec

About 3x faster compared to the main branch.

jhorstmann · 2025-06-25T10:28:43Z

arrow-array/src/array/byte_view_array.rs

+            let min_len = l_len.min(r_len);
+            // We have all 12 bytes in the high bits, but only want the top min_len
+            let shift = (12 - min_len) * 8;
+            let l_partial = l_be >> shift;


It might be possible to OR the length back into the lower bits, which would then allow getting a result with a single u128 comparison. I think it would also be beneficial to extract this code block into a shared helper function, and add some unit tests for it. The generic code here might not be well covered by tests because of the fast path for inline buffers elsewhere.

Thank you @jhorstmann for review, good suggestion, i will address it!
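
A minimal sketch of the single-comparison idea suggested above, under two assumptions: the inline view stores the length in its low 32 bits followed by up to 12 data bytes in little-endian memory order, and unused data bytes are zero-padded. The helper names `make_inline_view` and `inline_key` are hypothetical and the bodies are an illustration, not the code merged in this PR.

```rust
/// Packs up to 12 bytes into a view: length in the low 32 bits,
/// zero-padded data bytes above it (little-endian memory order).
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "inline views hold at most 12 bytes");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Builds a key whose plain u128 ordering matches byte-wise ordering of
/// the data: data bytes first (most significant), length last as tie-breaker.
fn inline_key(view: u128) -> u128 {
    let raw = view.to_le_bytes();
    let length = view as u32;
    let mut key = [0u8; 16];
    // The 12 data bytes go into the most significant positions, in order.
    key[0..12].copy_from_slice(&raw[4..16]);
    // The length is OR-ed back into the low 32 bits, stored big-endian so
    // that it compares numerically when the data bytes are equal.
    key[12..16].copy_from_slice(&length.to_be_bytes());
    u128::from_be_bytes(key)
}
```

With such a key, `inline_key(a) < inline_key(b)` is a single integer comparison. The length only matters when the padded data bytes are identical, for example `"a"` versus `"a\0"`, which is why the zero-padding assumption is essential.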

Dandandan · 2025-06-25T11:01:10Z

arrow-ord/src/cmp.rs

+        let shift = (12 - min_len) * 8;
+        let l_partial = l_be >> shift;
+        let r_partial = r_be >> shift;
+        if l_partial < r_partial {


This can be written as l_partial.cmp(&r_partial).then_with(|| l_len.cmp(&r_len))
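
For reference, a self-contained sketch of the shift-then-compare approach using `cmp` and `then_with`. It assumes the 12 zero-padded data bytes are held as a big-endian integer in the low 96 bits of a u128; `load_be96` and `cmp_inline` are hypothetical names chosen for this example.

```rust
use std::cmp::Ordering;

/// Loads up to 12 bytes, zero-padded, as a big-endian integer in the low 96 bits.
fn load_be96(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "inline values hold at most 12 bytes");
    let mut buf = [0u8; 16];
    buf[4..4 + data.len()].copy_from_slice(data);
    u128::from_be_bytes(buf)
}

/// Compares two inline values: first the shared leading bytes, then the lengths.
fn cmp_inline(l: &[u8], r: &[u8]) -> Ordering {
    let (l_len, r_len) = (l.len(), r.len());
    let min_len = l_len.min(r_len);
    // Keep only the top `min_len` bytes of each side.
    let shift = (12 - min_len) * 8;
    let l_partial = load_be96(l) >> shift;
    let r_partial = load_be96(r) >> shift;
    // Fall back to the length comparison only when the shared bytes tie.
    l_partial.cmp(&r_partial).then_with(|| l_len.cmp(&r_len))
}
```

The `then_with` closure only runs when the partial values are equal, so the common case of differing bytes stays a single comparison.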

Dandandan · 2025-06-25T11:32:52Z

arrow-array/src/array/byte_view_array.rs

+            return l_len.cmp(&r_len);
        }

        // one of the string is larger than 12 bytes,


You can change this code below to use (l_view >> 32) as u32 as well (or ByteView if it generates the same code). It seems that is a bit faster for the prefix comparison:

lt scalar StringViewArray
        time:   [34.533 ms 34.567 ms 34.601 ms]
        change: [−11.030% −10.827% −10.620%] (p = 0.00 < 0.05)
        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)

Good point @Dandandan, I will change it to use the ByteView prefix.
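
A small sketch of the prefix comparison for longer strings, assuming the view layout quoted earlier in the thread (prefix in bits 32..64, stored in little-endian byte order). The names `prefix_key`, `cmp_with_prefix`, and `view_with_prefix` are hypothetical, and the `l_data`/`r_data` slices stand in for bytes that would otherwise be fetched from the data buffers.

```rust
use std::cmp::Ordering;

/// Extracts the 4-byte prefix from bits 32..64 of a view and byte-swaps it
/// so that integer order equals byte order.
fn prefix_key(view: u128) -> u32 {
    ((view >> 32) as u32).swap_bytes()
}

/// Compares the prefixes first; touches the full data only when they tie.
fn cmp_with_prefix(l_view: u128, l_data: &[u8], r_view: u128, r_data: &[u8]) -> Ordering {
    prefix_key(l_view)
        .cmp(&prefix_key(r_view))
        .then_with(|| l_data.cmp(r_data))
}

/// Hypothetical helper: fills in only the length and prefix fields of a view.
fn view_with_prefix(data: &[u8]) -> u128 {
    let mut prefix = [0u8; 4];
    let n = data.len().min(4);
    prefix[..n].copy_from_slice(&data[..n]);
    ((u32::from_le_bytes(prefix) as u128) << 32) | data.len() as u128
}
```

When the first four bytes differ, the comparison finishes without dereferencing either data buffer, which is where the benefit for the scalar lt benchmark would be expected to come from.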

zhuqi-lucas · 2025-06-25T14:36:45Z

Addressed the comments in the latest commit, thanks!

Updated @Dandandan @jhorstmann @alamb, amazing result with the latest changes:

6.2x faster for lt inlined bytes

and

1.3x faster for lt scalar StringViewArray

lt StringViewArray StringViewArray inlined bytes                                                         1.00      7.8±0.29ms        ? ?/sec    6.21     48.7±0.71ms        ? ?/sec
lt scalar StringViewArray                                                                                1.00     19.5±1.97ms        ? ?/sec    1.32     25.8±1.28ms        ? ?/sec

critcmp  main issue_7743  --filter "iew"
group                                                                                                    issue_7743                             main
-----                                                                                                    ----------                             ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00  1208.3±22.24µs        ? ?/sec    1.05  1270.1±34.61µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00  1221.9±32.61µs        ? ?/sec    1.00  1218.6±18.65µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00    677.7±9.00µs        ? ?/sec    1.00    676.2±6.47µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.03   629.9±17.64µs        ? ?/sec    1.00    614.5±6.27µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00      5.7±0.06ms        ? ?/sec    1.02      5.8±0.06ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00      2.7±0.01ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.00    503.7±9.42µs        ? ?/sec    1.00    503.9±9.59µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.02      3.8±0.03ms        ? ?/sec    1.00      3.7±0.03ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.01      3.9±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00      3.8±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
like_utf8view scalar complex                                                                             1.01     81.2±1.62ms        ? ?/sec    1.00     80.3±0.62ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     75.7±0.91ms        ? ?/sec    1.01     76.3±0.95ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     18.8±0.23ms        ? ?/sec    1.00     18.9±0.26ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     18.7±0.24ms        ? ?/sec    1.01     19.0±0.23ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.00     18.8±0.24ms        ? ?/sec    1.01     19.0±0.24ms        ? ?/sec
like_utf8view scalar equals                                                                              1.00     13.9±0.11ms        ? ?/sec    1.01     13.9±0.16ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.02     18.5±0.37ms        ? ?/sec    1.00     18.2±0.27ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.5±0.16ms        ? ?/sec    1.02     10.7±0.23ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.3±0.17ms        ? ?/sec    1.01     18.4±0.24ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.00    601.3±2.97µs        ? ?/sec    1.02    610.9±7.55µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.00  1786.0±21.54µs        ? ?/sec    1.01  1810.7±21.24µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    591.8±4.87µs        ? ?/sec    1.02    603.2±6.33µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.00    193.7±4.41µs        ? ?/sec    1.00    194.2±4.78µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00    667.8±5.49µs        ? ?/sec    1.00    666.8±5.22µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00      7.8±0.29ms        ? ?/sec    6.21     48.7±0.71ms        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.00   486.2±12.05µs        ? ?/sec    1.04    504.3±8.39µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     19.5±1.97ms        ? ?/sec    1.32     25.8±1.28ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00   502.9±12.25µs        ? ?/sec    1.01   506.4±10.21µs        ? ?/sec

alamb · 2025-06-27T14:01:29Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue_7743 (96fd53a) to b269422 diff
BENCH_NAME=comparison_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench comparison_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=issue_7743
Results will be posted here when complete

alamb

Thank you @zhuqi-lucas and @Dandandan - this is really exciting to see.

I am not sure about the length handling, but I think that concern can be addressed with a few more tests.

alamb · 2025-06-27T13:54:24Z

arrow-array/src/array/byte_view_array.rs

        l_full_data.cmp(r_full_data)
    }
+
+    /// Builds a 128-bit composite key for an inline value:


Sorry -- I left comments for this function on the wrong PR: #7792 (comment)

Basically I think it would be very helpful to explain what properties the resulting u128 has

alamb · 2025-06-27T13:59:29Z

arrow-array/src/array/byte_view_array.rs

+            let raw = make_raw_inline(input.len() as u32, input);
+            let key = GenericByteViewArray::<BinaryViewType>::inline_key_fast(raw);
+
+            // Validate that keys are monotonically increasing in lexicographic+length order


I recommend updating this test with:

Strings that are the same length but compare lexically different (aaa vs aab for example)

That the comparison is the same as using the GenericBinaryArray accessors

So for example like

let array = GenericBinaryArray::from(test_inputs);
...
// compare using &str semantics
assert!(array.value(i) < array.value(i+1));
// and then compare using the fast key comparison
assert!(make_raw_inline(array.views()[i]) < make_raw_inline(array.views()[i+1]));

Thank you @alamb for the good suggestion, I will try to add more tests.

Addressed in the latest commit, thank you @alamb!
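
The kind of test suggested above can be sketched without the arrow crate by checking a key function against plain slice ordering for every pair of inputs. Everything here is an assumption made for illustration: `raw_inline` and `key` are hypothetical stand-ins (one possible arithmetic formulation of a composite key), not the PR's `make_raw_inline` or `inline_key_fast`.

```rust
/// Hypothetical view builder: length in the low 32 bits, zero-padded data
/// bytes above it, first data byte in the lowest position.
fn raw_inline(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "inline views hold at most 12 bytes");
    let mut raw = data.len() as u128;
    for (i, &byte) in data.iter().enumerate() {
        raw |= (byte as u128) << (32 + 8 * i);
    }
    raw
}

/// Hypothetical composite key: reverse the bytes so the first data byte is
/// most significant, shift the old length bytes out, then OR the length
/// back into the low 32 bits.
fn key(raw: u128) -> u128 {
    (raw.swap_bytes() << 32) | (raw as u32 as u128)
}

/// Returns true if key order agrees with slice order for every pair of inputs.
fn keys_agree_with_slices(inputs: &[&[u8]]) -> bool {
    inputs.iter().all(|&a| {
        inputs
            .iter()
            .all(|&b| key(raw_inline(a)).cmp(&key(raw_inline(b))) == a.cmp(b))
    })
}
```

Useful inputs include same-length strings that differ lexically (`aaa` vs `aab`), strings where one is a prefix of the other, and values containing trailing zero bytes, since those exercise the length tie-breaker.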

alamb · 2025-06-27T14:24:35Z

BTW I think this is a really nice PR @zhuqi-lucas -- it is an amazing result, quite clever, and a nice illustration of the level of attention to detail required for high-quality performance engineering.

alamb · 2025-06-27T14:37:36Z

🤖: Benchmark completed

Details

group                                                                                                    issue_7743                             main
-----                                                                                                    ----------                             ----
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar complex                    1.00      2.7±0.03ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar contains                   1.00      2.8±0.03ms        ? ?/sec    1.02      2.8±0.03ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar ends with                  1.00      2.2±0.03ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar starts with                1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00      2.8±0.03ms        ? ?/sec    1.01      2.8±0.03ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00      2.9±0.04ms        ? ?/sec    1.01      2.9±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00      2.2±0.03ms        ? ?/sec    1.00      2.2±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.01      2.2±0.02ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
eq Float32                                                                                               1.00     44.3±0.17µs        ? ?/sec    1.00     44.3±0.17µs        ? ?/sec
eq Int32                                                                                                 1.00     44.3±0.15µs        ? ?/sec    1.00     44.2±0.15µs        ? ?/sec
eq MonthDayNano                                                                                          1.02     94.8±4.86µs        ? ?/sec    1.00     92.6±3.55µs        ? ?/sec
eq StringArray StringArray                                                                               1.00     34.4±0.21ms        ? ?/sec    1.00     34.5±0.37ms        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00     26.5±0.07ms        ? ?/sec    1.00     26.6±0.21ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00     24.1±0.20ms        ? ?/sec    1.01     24.3±0.11ms        ? ?/sec
eq dictionary[10] string[4])                                                                             1.00    810.9±1.75µs        ? ?/sec    1.00    813.2±1.57µs        ? ?/sec
eq long same prefix strings StringArray                                                                  1.00    570.1±5.28µs        ? ?/sec    1.00    569.1±7.57µs        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.00    980.1±3.24µs        ? ?/sec    1.00    975.8±3.46µs        ? ?/sec
eq scalar Float32                                                                                        1.00     44.1±0.14µs        ? ?/sec    1.00     44.1±0.08µs        ? ?/sec
eq scalar Int32                                                                                          1.00     44.1±0.05µs        ? ?/sec    1.00     44.2±0.13µs        ? ?/sec
eq scalar MonthDayNano                                                                                   1.00     51.0±0.36µs        ? ?/sec    1.01     51.5±0.73µs        ? ?/sec
eq scalar StringArray                                                                                    1.00     26.0±0.42ms        ? ?/sec    1.00     26.0±0.37ms        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00     17.3±0.11ms        ? ?/sec    1.01     17.5±0.10ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.00     17.5±0.14ms        ? ?/sec    1.01     17.8±0.12ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00     17.5±0.17ms        ? ?/sec    1.02     17.8±0.13ms        ? ?/sec
eq_dyn_utf8_scalar dictionary[10] string[4])                                                             1.00     77.1±0.12µs        ? ?/sec    1.00     77.1±0.18µs        ? ?/sec
gt Float32                                                                                               1.00     57.1±0.15µs        ? ?/sec    1.01     57.7±0.90µs        ? ?/sec
gt Int32                                                                                                 1.00     44.2±0.08µs        ? ?/sec    1.00     44.2±0.13µs        ? ?/sec
gt scalar Float32                                                                                        1.00     45.8±0.12µs        ? ?/sec    1.00     45.8±0.09µs        ? ?/sec
gt scalar Int32                                                                                          1.00     44.1±0.09µs        ? ?/sec    1.00     44.2±0.13µs        ? ?/sec
gt_eq Float32                                                                                            1.00     57.0±0.11µs        ? ?/sec    1.00     57.3±0.13µs        ? ?/sec
gt_eq Int32                                                                                              1.00     44.2±0.12µs        ? ?/sec    1.00     44.2±0.12µs        ? ?/sec
gt_eq scalar Float32                                                                                     1.00     46.5±0.10µs        ? ?/sec    1.00     46.5±0.13µs        ? ?/sec
gt_eq scalar Int32                                                                                       1.00     44.1±0.06µs        ? ?/sec    1.00     44.2±0.10µs        ? ?/sec
gt_eq_dyn_utf8_scalar scalar dictionary[10] string[4])                                                   1.00     77.1±0.24µs        ? ?/sec    1.00     77.2±0.19µs        ? ?/sec
ilike_utf8 scalar complex                                                                                1.00      2.9±0.06ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
ilike_utf8 scalar contains                                                                               1.00      4.4±0.06ms        ? ?/sec    1.02      4.5±0.08ms        ? ?/sec
ilike_utf8 scalar ends with                                                                              1.02  1126.8±36.47µs        ? ?/sec    1.00  1099.5±20.51µs        ? ?/sec
ilike_utf8 scalar equals                                                                                 1.01   639.2±45.96µs        ? ?/sec    1.00   630.9±15.37µs        ? ?/sec
ilike_utf8 scalar starts with                                                                            1.01  1053.8±32.81µs        ? ?/sec    1.00  1043.5±43.25µs        ? ?/sec
ilike_utf8_scalar_dyn dictionary[10] string[4])                                                          1.00     77.5±0.10µs        ? ?/sec    1.00     77.6±0.17µs        ? ?/sec
like_utf8 scalar complex                                                                                 1.00      2.1±0.03ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
like_utf8 scalar contains                                                                                1.00  1821.9±18.50µs        ? ?/sec    1.00  1818.4±18.68µs        ? ?/sec
like_utf8 scalar ends with                                                                               1.00    397.7±7.45µs        ? ?/sec    1.06   422.1±12.90µs        ? ?/sec
like_utf8 scalar equals                                                                                  1.08     80.9±0.30µs        ? ?/sec    1.00     75.2±0.23µs        ? ?/sec
like_utf8 scalar starts with                                                                             1.00    321.2±8.46µs        ? ?/sec    1.03    330.0±6.60µs        ? ?/sec
like_utf8_scalar_dyn dictionary[10] string[4])                                                           1.00     77.3±0.13µs        ? ?/sec    1.00     77.4±0.16µs        ? ?/sec
like_utf8view scalar complex                                                                             1.02    207.1±1.25ms        ? ?/sec    1.00    202.8±0.65ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00    160.9±0.34ms        ? ?/sec    1.02    164.1±0.36ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     51.1±0.28ms        ? ?/sec    1.00     51.0±0.34ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     51.3±0.37ms        ? ?/sec    1.01     51.7±0.34ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.00     51.1±0.24ms        ? ?/sec    1.01     51.6±0.29ms        ? ?/sec
like_utf8view scalar equals                                                                              1.00     34.7±0.12ms        ? ?/sec    1.01     35.0±0.12ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.03     50.4±0.38ms        ? ?/sec    1.00     49.1±0.26ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     28.5±0.15ms        ? ?/sec    1.01     28.7±0.17ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     50.4±0.28ms        ? ?/sec    1.01     50.6±0.29ms        ? ?/sec
long same prefix strings like_utf8 scalar complex                                                        1.00   1537.4±4.40µs        ? ?/sec    1.02  1566.6±36.18µs        ? ?/sec
long same prefix strings like_utf8 scalar contains                                                       1.03      4.2±0.01ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
long same prefix strings like_utf8 scalar ends with                                                      1.00   1535.6±4.68µs        ? ?/sec    1.01   1558.4±2.92µs        ? ?/sec
long same prefix strings like_utf8 scalar equals                                                         1.00    488.5±2.55µs        ? ?/sec    1.01    493.8±2.94µs        ? ?/sec
long same prefix strings like_utf8 scalar starts with                                                    1.06  1953.4±13.30µs        ? ?/sec    1.00   1839.4±6.24µs        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.00   1608.0±4.15µs        ? ?/sec    1.00   1601.9±6.48µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.01      4.2±0.01ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.01   1605.8±3.19µs        ? ?/sec    1.00   1597.0±4.15µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.00    574.7±2.67µs        ? ?/sec    1.00    573.1±2.10µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00   1888.2±5.04µs        ? ?/sec    1.01  1914.6±12.17µs        ? ?/sec
lt Float32                                                                                               1.00     57.2±0.28µs        ? ?/sec    1.00     57.4±0.19µs        ? ?/sec
lt Int32                                                                                                 1.00     44.2±0.09µs        ? ?/sec    1.00     44.2±0.08µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     29.2±0.14ms        ? ?/sec    1.66     48.4±0.12ms        ? ?/sec
lt long same prefix strings StringArray                                                                  1.04    671.6±4.46µs        ? ?/sec    1.00    645.9±3.70µs        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.02    900.2±6.07µs        ? ?/sec    1.00    883.0±3.22µs        ? ?/sec
lt scalar Float32                                                                                        1.00     46.4±0.08µs        ? ?/sec    1.00     46.5±0.32µs        ? ?/sec
lt scalar Int32                                                                                          1.00     44.1±0.06µs        ? ?/sec    1.00     44.2±0.16µs        ? ?/sec
lt scalar StringArray                                                                                    1.00     46.8±0.25ms        ? ?/sec    1.00     47.0±0.27ms        ? ?/sec
lt scalar StringViewArray                                                                                1.00     49.8±0.12ms        ? ?/sec    1.30     64.8±0.14ms        ? ?/sec
lt_eq Float32                                                                                            1.00     57.3±0.26µs        ? ?/sec    1.01     57.7±0.21µs        ? ?/sec
lt_eq Int32                                                                                              1.00     44.2±0.11µs        ? ?/sec    1.00     44.3±0.16µs        ? ?/sec
lt_eq scalar Float32                                                                                     1.00     45.8±0.08µs        ? ?/sec    1.00     45.8±0.09µs        ? ?/sec
lt_eq scalar Int32                                                                                       1.00     44.1±0.05µs        ? ?/sec    1.00     44.1±0.07µs        ? ?/sec
neq Float32                                                                                              1.00     44.2±0.12µs        ? ?/sec    1.00     44.2±0.10µs        ? ?/sec
neq Int32                                                                                                1.00     44.2±0.07µs        ? ?/sec    1.00     44.3±0.21µs        ? ?/sec
neq long same prefix strings StringArray                                                                 1.00    564.7±2.83µs        ? ?/sec    1.00    566.3±3.40µs        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    980.0±3.81µs        ? ?/sec    1.00    975.7±3.67µs        ? ?/sec
neq scalar Float32                                                                                       1.00     44.1±0.07µs        ? ?/sec    1.00     44.1±0.06µs        ? ?/sec
neq scalar Int32                                                                                         1.00     44.1±0.11µs        ? ?/sec    1.00     44.1±0.06µs        ? ?/sec
nilike_utf8 scalar complex                                                                               1.00      2.9±0.06ms        ? ?/sec    1.02      2.9±0.06ms        ? ?/sec
nilike_utf8 scalar contains                                                                              1.00      4.4±0.07ms        ? ?/sec    1.00      4.4±0.05ms        ? ?/sec
nilike_utf8 scalar ends with                                                                             1.00  1150.7±35.76µs        ? ?/sec    1.01  1166.2±33.92µs        ? ?/sec
nilike_utf8 scalar equals                                                                                1.00   636.9±23.30µs        ? ?/sec    1.10   700.1±29.45µs        ? ?/sec
nilike_utf8 scalar starts with                                                                           1.00  1064.2±44.17µs        ? ?/sec    1.01  1076.6±38.68µs        ? ?/sec
nlike_utf8 scalar complex                                                                                1.01      2.2±0.03ms        ? ?/sec    1.00      2.2±0.05ms        ? ?/sec
nlike_utf8 scalar contains                                                                               1.00  1813.1±18.29µs        ? ?/sec    1.00  1811.9±19.05µs        ? ?/sec
nlike_utf8 scalar ends with                                                                              1.00   403.2±12.28µs        ? ?/sec    1.01   407.6±12.70µs        ? ?/sec
nlike_utf8 scalar equals                                                                                 1.08     80.9±0.27µs        ? ?/sec    1.00     74.9±0.23µs        ? ?/sec
nlike_utf8 scalar starts with                                                                            1.00   334.2±12.52µs        ? ?/sec    1.00   335.0±11.12µs        ? ?/sec

zhuqi-lucas · 2025-06-27T14:42:14Z

lt StringViewArray StringViewArray inlined bytes 1.00 29.2±0.14ms ? ?/sec 1.66 48.4±0.12ms ? ?/sec

Thank you @alamb. The benchmark shows less improvement than my local Mac result, but it is still good to see 66% and 30% improvements.

lt StringViewArray StringViewArray inlined bytes                                                         1.00     29.2±0.14ms        ? ?/sec    1.66     48.4±0.12ms        ? ?/sec
lt scalar StringViewArray                                                                                1.00     49.8±0.12ms        ? ?/sec    1.30     64.8±0.14ms        ? ?/sec

alamb · 2025-06-27T14:50:20Z

Thank you @alamb. The benchmark shows less improvement than my local Mac result, but it is still good to see 66% and 30% improvements.

Indeed those are pretty amazing results for something like comparisons

alamb

Nice work @zhuqi-lucas -- thank you

alamb · 2025-06-27T15:59:05Z

arrow-array/src/array/byte_view_array.rs

+            previous_key = Some(key);
+        }
+
+        // 2) Cross-check against GenericBinaryArray comparison


I think technically speaking this second loop does everything the first loop does and thus the first loop is redundant

Good point @alamb, thank you. Addressed in the latest PR!

zhuqi-lucas · 2025-06-28T07:53:17Z

The integration CI seems flaky recently; I am not sure why.

alamb · 2025-06-28T23:43:21Z

The integration CI seems flaky recently; I am not sure why.

Yeah, @etseidl and I have seen that too -- I filed a ticket

Intermittent CI failure of integration / Archery test With other arrows (pull_request) fails #7817

alamb · 2025-06-29T09:02:40Z

Thanks again @zhuqi-lucas and @Dandandan

# Which issue does this PR close?

This is a follow-up for #7748. In theory we can customize the string view comparison and make it much faster.

- Closes [#7790](#7790)

# Rationale for this change

In theory we can customize the string view comparison and make it much faster.

# What changes are included in this PR?

In theory we can customize the string view comparison and make it much faster.

# Are these changes tested?

Yes

# Are there any user-facing changes?

No

alamb · 2025-07-05T14:41:18Z

I believe this PR introduced a bug in comparisons, see

Incorrect inlined string view comparison after " Add prefix compare for inlined" #7874

zhuqi-lucas added 3 commits June 23, 2025 21:26

Perf: Change use of inline_value to inline it to a u128

5968c4f

improve lt scalar

061243c

comments

59ede5b

github-actions bot added the arrow Changes to the arrow crate label Jun 23, 2025

zhuqi-lucas mentioned this pull request Jun 24, 2025

Use prefix first for comparisons, resort to data buffer for remaining data on equal values #7744

Closed

zhuqi-lucas changed the title ~~Perf: Change use of inline_value to inline it to a u128~~ Perf: Add prefix compare for inlined compare and change use of inline_value to inline it to a u128 Jun 25, 2025

Dandandan reviewed Jun 25, 2025

zhuqi-lucas added 4 commits June 25, 2025 15:40

Merge remote-tracking branch 'upstream/main' into issue_7743

e673331

continue improve performance

b805dfe

fix

5980297

optimize u64 to u32

6fb2f84

jhorstmann reviewed Jun 25, 2025

reduce memory access count

3428523

Dandandan reviewed Jun 25, 2025

arrow-ord/src/cmp.rs (outdated)

Dandandan reviewed Jun 25, 2025

continue improve performance

1b5a1da

zhuqi-lucas force-pushed the issue_7743 branch from a43bcdc to 1b5a1da Compare June 25, 2025 13:32

zhuqi-lucas added 6 commits June 25, 2025 21:34

address comments

8d4da6a

clean code

52c2fc3

optimize

c8c7038

Add more comments

7194e8d

Add unit test

1a858a1

fmt

442402b

Dandandan approved these changes Jun 25, 2025

Merge remote-tracking branch 'upstream/main' into issue_7743

96fd53a

This was referenced Jun 26, 2025

Perf: optimize sort string_view performance #7790

Closed

Perf: Make sort string view fast(1.5X ~ 3X faster) #7792

Merged

alamb reviewed Jun 27, 2025

zhuqi-lucas added 4 commits June 27, 2025 23:06

Add more testing and comments, and address comments

9580de1

fix doc

5f80dc7

Merge remote-tracking branch 'upstream/main' into issue_7743

039e38b

fix clippy

918a789

alamb approved these changes Jun 27, 2025

zhuqi-lucas added 2 commits June 28, 2025 08:19

address comments

153cf85

Merge remote-tracking branch 'upstream/main' into issue_7743

d650fb3

alamb mentioned this pull request Jun 28, 2025

Intermittent CI failure of integration / Archery test With other arrows (pull_request) fails #7817

Open

alamb merged commit aa96097 into apache:main Jun 29, 2025
29 of 30 checks passed

This was referenced Jun 30, 2025

Continue optimizing the CursorValues compare for StringViewArray apache/datafusion#16629

Closed

Perf: fast CursorValues compare for StringViewArray using inline_key_… apache/datafusion#16630

Merged

alamb mentioned this pull request Jul 5, 2025

Incorrect inlined string view comparison after " Add prefix compare for inlined" #7874

Closed

alamb mentioned this pull request Jul 28, 2025

Change use of inline_value to inline it to a u128 #7743

Closed

		let l_data = (l_bits & DATA_MASK) >> 32;
		let r_data = (r_bits & DATA_MASK) >> 32;
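The two lines above appear to mask off and shift out the 4-byte length so that only the inlined data bytes remain for comparison. As a rough, self-contained sketch of the general idea (not the code merged in this PR), the following shows one way a 16-byte inlined view can be turned into a `u128` whose integer order matches the byte-wise order of the value. It assumes the Arrow view layout for short values (4-byte little-endian length followed by up to 12 zero-padded data bytes). The names `make_inline_view`, `comparable_key`, and `compare_inlined` are hypothetical and are not part of arrow-rs; `DATA_MASK` from the snippet above is not used here.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: pack a short value (at most 12 bytes) into a 16-byte
/// view, assuming the layout "4-byte little-endian length, then zero-padded data".
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "this sketch only handles inlined values");
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: build a key whose integer order matches the
/// lexicographic byte order of the inlined value.
fn comparable_key(view: u128) -> u128 {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let mut key = [0u8; 16];
    // The 12 data bytes go first, so the first data byte is the most
    // significant byte once the key is read as a big-endian integer.
    key[..12].copy_from_slice(&bytes[4..16]);
    // The length goes last and only breaks ties between values whose padded
    // data is identical, for example "a" versus "a\0".
    key[12..].copy_from_slice(&len.to_be_bytes());
    u128::from_be_bytes(key)
}

/// Compare two short values with a single integer comparison per pair.
fn compare_inlined(a: &[u8], b: &[u8]) -> Ordering {
    comparable_key(make_inline_view(a)).cmp(&comparable_key(make_inline_view(b)))
}
```

Placing the data ahead of the length matters: if the length occupied the most significant bits, a shorter value would always sort before a longer one regardless of content. Cross-checking `compare_inlined` against plain slice comparison, as the unit test discussed in the review does against `GenericBinaryArray`, is a cheap way to catch byte-order mistakes of this kind.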
