zhuqi-lucas
Contributor

@zhuqi-lucas zhuqi-lucas commented Jun 23, 2025

Which issue does this PR close?

Closes #7743

Rationale for this change

Change the fast path for the lt case to compare views as u128 values, and use a u128 comparison for inlined values (12 bytes or fewer) as well.

For values longer than 12 bytes, which are stored in the data buffers, compare the 4-byte prefix as a single u32 instead of byte by byte.

What changes are included in this PR?

The fast path for the lt case now compares views as u128 values, and inlined values (12 bytes or fewer) are compared as u128 as well.

For values longer than 12 bytes, which are stored in the data buffers, the 4-byte prefix is compared as a single u32 instead of byte by byte.
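
As a rough illustration of the prefix change described above (not the PR's actual code), the sketch below compares two 4-byte prefixes first byte by byte and then as a single big-endian u32. The function names `cmp_prefix_bytewise` and `cmp_prefix_u32` are hypothetical. The only assumption is that the first byte must be the most significant one, which is what `u32::from_be_bytes` provides.

```rust
use std::cmp::Ordering;

/// Baseline: compare two 4-byte prefixes one byte at a time.
fn cmp_prefix_bytewise(a: [u8; 4], b: [u8; 4]) -> Ordering {
    for i in 0..4 {
        match a[i].cmp(&b[i]) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    Ordering::Equal
}

/// Same result with one integer comparison. Reading the bytes as big-endian
/// makes byte 0 the most significant, so numeric order equals lexicographic
/// byte order.
fn cmp_prefix_u32(a: [u8; 4], b: [u8; 4]) -> Ordering {
    u32::from_be_bytes(a).cmp(&u32::from_be_bytes(b))
}
```

Both functions should agree on every input; the integer form simply replaces up to four byte comparisons and branches with one.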

Are there any user-facing changes?

No

@github-actions github-actions bot added the "arrow" label (Changes to the arrow crate) Jun 23, 2025
@zhuqi-lucas
Contributor Author

Performance result:

2.53x faster for lt StringViewArray StringViewArray inlined bytes
1.16x faster for lt scalar StringViewArray

critcmp  main issue_7743  --filter "iew"
group                                                                                                    issue_7743                             main
-----                                                                                                    ----------                             ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00  1250.8±16.53µs        ? ?/sec    1.02  1270.1±34.61µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.01  1224.8±12.97µs        ? ?/sec    1.00  1218.6±18.65µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00    673.7±3.52µs        ? ?/sec    1.00    676.2±6.47µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.00    615.0±6.85µs        ? ?/sec    1.00    614.5±6.27µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.04      6.0±0.05ms        ? ?/sec    1.00      5.8±0.06ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00      2.7±0.03ms        ? ?/sec    1.01      2.7±0.05ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.02    512.6±8.67µs        ? ?/sec    1.00    503.9±9.59µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00      3.6±0.04ms        ? ?/sec    1.02      3.7±0.03ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.02      3.9±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.02      3.9±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
like_utf8view scalar complex                                                                             1.01     81.0±0.86ms        ? ?/sec    1.00     80.3±0.62ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     76.5±0.59ms        ? ?/sec    1.00     76.3±0.95ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.01     19.0±0.25ms        ? ?/sec    1.00     18.9±0.26ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     19.0±0.22ms        ? ?/sec    1.00     19.0±0.23ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.00     18.9±0.18ms        ? ?/sec    1.00     19.0±0.24ms        ? ?/sec
like_utf8view scalar equals                                                                              1.01     14.0±0.15ms        ? ?/sec    1.00     13.9±0.16ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.01     18.3±0.24ms        ? ?/sec    1.00     18.2±0.27ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.7±0.18ms        ? ?/sec    1.00     10.7±0.23ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.5±0.22ms        ? ?/sec    1.00     18.4±0.24ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.01   617.5±11.35µs        ? ?/sec    1.00    610.9±7.55µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.01  1834.8±21.26µs        ? ?/sec    1.00  1810.7±21.24µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    605.1±9.37µs        ? ?/sec    1.00    603.2±6.33µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.01    196.9±5.64µs        ? ?/sec    1.00    194.2±4.78µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00    669.1±5.56µs        ? ?/sec    1.00    666.8±5.22µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     19.0±0.39ms        ? ?/sec    2.53     48.1±1.10ms        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.00    485.9±8.31µs        ? ?/sec    1.04    504.3±8.39µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     22.3±1.39ms        ? ?/sec    1.16     25.8±1.28ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    506.5±7.39µs        ? ?/sec    1.00   506.4±10.21µs        ? ?/sec

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Jun 23, 2025

Note:

I have not changed the 4-byte inline compare yet. In that case we still need to extract the prefix from both sides, so the gain would be small: the inline compare covers at most 4 bytes.

I will try to optimize the 4-byte inline compare later.

@zhuqi-lucas zhuqi-lucas changed the title from "Perf: Change use of inline_value to inline it to a u128" to "Perf: Add prefix compare for inlined compare and change use of inline_value to inline it to a u128" Jun 25, 2025
const DATA_MASK: u128 = !0u128 << 32;

// Remove the length bits, leaving only the data
let l_data = (l_bits & DATA_MASK) >> 32;
Contributor

I wonder if we can avoid some of the bit masking by using ByteView here ?

Contributor Author

@zhuqi-lucas zhuqi-lucas Jun 25, 2025

Thank you @Dandandan, I reduced the bit masking in the latest revision of this PR. I think ByteView's From impl converts every field of the struct, but we only use some of them, so I only use part of that logic in the latest revision.

impl From<u128> for ByteView {
    #[inline]
    fn from(value: u128) -> Self {
        Self {
            length: value as u32,
            prefix: (value >> 32) as u32,
            buffer_index: (value >> 64) as u32,
            offset: (value >> 96) as u32,
        }
    }
}

Contributor

I think the compiler might be smart enough to see it is actually the same (I changed it somewhere else, but couldn't detect performance difference).
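
To make the equivalence being discussed concrete, here is a small sketch. `ViewFields` is a hypothetical local mirror of the four 32-bit fields (it is not the real `ByteView` type), and `prefix_masked` and `prefix_shifted` are illustrative names. It shows that masking before the shift is redundant, and that unpacking all four fields yields the same prefix as the plain shift.

```rust
/// Hypothetical local mirror of the four 32-bit fields packed into a u128 view.
#[derive(Debug, PartialEq, Eq)]
struct ViewFields {
    length: u32,
    prefix: u32,
    buffer_index: u32,
    offset: u32,
}

/// Unpack every field with the same shifts as the `From<u128>` impl quoted above.
fn unpack(value: u128) -> ViewFields {
    ViewFields {
        length: value as u32,
        prefix: (value >> 32) as u32,
        buffer_index: (value >> 64) as u32,
        offset: (value >> 96) as u32,
    }
}

/// Mask-then-shift, in the style of the earlier revision of the diff.
fn prefix_masked(view: u128) -> u32 {
    const DATA_MASK: u128 = !0u128 << 32;
    ((view & DATA_MASK) >> 32) as u32
}

/// Shift only: the right shift already discards the low 32 length bits.
fn prefix_shifted(view: u128) -> u32 {
    (view >> 32) as u32
}
```

Whether the compiler emits identical machine code for the three forms is something only a benchmark or a look at the generated assembly can confirm; the sketch only shows that they compute the same value.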

Contributor Author

Interesting @Dandandan , let me try using ByteView and compare performance!

Comment on lines 608 to 609
let l_data = (l_bits & DATA_MASK) >> 32;
let r_data = (r_bits & DATA_MASK) >> 32;
Contributor

Same here :)

@Dandandan
Contributor

lt StringViewArray StringViewArray inlined bytes 1.00 19.0±0.39ms ? ?/sec 2.53 48.1±1.10ms ? ?/sec

Amazing 😎 !! Perhaps we can use some less manual bit masking/shifting while still producing roughly the same code?

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Jun 25, 2025

Thank you @Dandandan for the review. I polished the code, and the performance is now even better:

lt StringViewArray StringViewArray inlined bytes    1.00     16.4±0.50ms        ? ?/sec    2.98     48.7±0.71ms        ? ?/sec

About 3x faster compared to the main branch.

let min_len = l_len.min(r_len);
// We have all 12 bytes in the high bits, but only want the top min_len
let shift = (12 - min_len) * 8;
let l_partial = l_be >> shift;
Contributor

It might be possible to OR the length back into the lower bits, which would then allow getting a result with a single u128 comparison. I think it would also be beneficial to extract this code block into a shared helper function, and add some unit tests for it. The generic code here might not be well covered by tests because of the fast path for inline buffers elsewhere.
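
One way to read this suggestion is sketched below. It is not the PR's implementation: `make_inline_view` and `inline_key` are hypothetical names, and the sketch assumes the raw view keeps the length in its low 32 bits, followed by up to 12 data bytes in memory order, with unused bytes zeroed. The key puts the data bytes in the high 96 bits (first byte most significant) and the length in the low 32 bits, so one u128 comparison orders by bytes first and by length on ties.

```rust
/// Hypothetical helper: pack an inline value (at most 12 bytes) into the
/// assumed raw layout: little-endian length, then the data, zero padded.
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inline values are supported");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Build a composite key whose numeric order matches byte-slice order.
fn inline_key(view: u128) -> u128 {
    // The length lives in the low 32 bits of the raw view.
    let len = view as u32;
    // Drop the length, then reverse the bytes so that data byte 0 becomes
    // the most significant byte. The low 32 bits of the result are zero.
    let data_be = (view >> 32).swap_bytes();
    // OR the length back in as the tie-breaker.
    data_be | len as u128
}
```

The zero padding matters: it makes a proper prefix compare no greater than the longer value on the data bits, and the length in the low bits then breaks ties such as `a` versus `a\0`.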

Contributor Author

Thank you @jhorstmann for the review. Good suggestion, I will address it!

let shift = (12 - min_len) * 8;
let l_partial = l_be >> shift;
let r_partial = r_be >> shift;
if l_partial < r_partial {
Contributor

@Dandandan Dandandan Jun 25, 2025

This can be written as l_partial.cmp(&r_partial).then_with(|| l_len.cmp(&r_len))
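
For context, a self-contained sketch of the shift-then-compare idea with `then_with` follows. It is a variant written for illustration, not the code under review: `load_be96` and `cmp_inline` are hypothetical names, and the 12 bytes are loaded into the low 96 bits of a u128 rather than taken from a raw view.

```rust
use std::cmp::Ordering;

/// Load up to 12 bytes into the low 96 bits of a u128, first byte most
/// significant, zero padded on the right.
fn load_be96(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inline values are supported");
    let mut buf = [0u8; 16];
    buf[4..4 + data.len()].copy_from_slice(data);
    u128::from_be_bytes(buf)
}

/// Compare the first `min_len` bytes as integers, then fall back to length.
fn cmp_inline(l: &[u8], r: &[u8]) -> Ordering {
    let min_len = l.len().min(r.len());
    // Shifting right drops the trailing (12 - min_len) bytes of each side.
    let shift = (12 - min_len) * 8;
    let l_partial = load_be96(l) >> shift;
    let r_partial = load_be96(r) >> shift;
    // `then_with` only evaluates the length comparison when the shared
    // prefix is equal, replacing the explicit if/else chain.
    l_partial.cmp(&r_partial).then_with(|| l.len().cmp(&r.len()))
}
```

The closure form keeps the tie-break lazy, so the length comparison is skipped whenever the shared bytes already decide the order.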

return l_len.cmp(&r_len);
}

// one of the string is larger than 12 bytes,
Contributor

You can change this code below to use (l_view >> 32) as u32 as well (or ByteView if it generates the same code). It seems that is a bit faster for the prefix comparison:

lt scalar StringViewArray
                        time:   [34.533 ms 34.567 ms 34.601 ms]
                        change: [−11.030% −10.827% −10.620%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)

Contributor Author

Good point @Dandandan, I will change it to use the ByteView prefix.

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Jun 25, 2025

Addressed the comments in the latest revision, thanks!

Updated @Dandandan @jhorstmann @alamb, amazing results for the latest revision:

6.2x faster for lt inlined bytes

and

1.3x faster for lt scalar StringViewArray

lt StringViewArray StringViewArray inlined bytes                                                         1.00      7.8±0.29ms        ? ?/sec    6.21     48.7±0.71ms        ? ?/sec
lt scalar StringViewArray                                                                                1.00     19.5±1.97ms        ? ?/sec    1.32     25.8±1.28ms        ? ?/sec
critcmp  main issue_7743  --filter "iew"
group                                                                                                    issue_7743                             main
-----                                                                                                    ----------                             ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00  1208.3±22.24µs        ? ?/sec    1.05  1270.1±34.61µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00  1221.9±32.61µs        ? ?/sec    1.00  1218.6±18.65µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00    677.7±9.00µs        ? ?/sec    1.00    676.2±6.47µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.03   629.9±17.64µs        ? ?/sec    1.00    614.5±6.27µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00      5.7±0.06ms        ? ?/sec    1.02      5.8±0.06ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00      2.7±0.01ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.00    503.7±9.42µs        ? ?/sec    1.00    503.9±9.59µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.02      3.8±0.03ms        ? ?/sec    1.00      3.7±0.03ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.01      3.9±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00      3.8±0.03ms        ? ?/sec    1.00      3.8±0.04ms        ? ?/sec
like_utf8view scalar complex                                                                             1.01     81.2±1.62ms        ? ?/sec    1.00     80.3±0.62ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     75.7±0.91ms        ? ?/sec    1.01     76.3±0.95ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     18.8±0.23ms        ? ?/sec    1.00     18.9±0.26ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     18.7±0.24ms        ? ?/sec    1.01     19.0±0.23ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.00     18.8±0.24ms        ? ?/sec    1.01     19.0±0.24ms        ? ?/sec
like_utf8view scalar equals                                                                              1.00     13.9±0.11ms        ? ?/sec    1.01     13.9±0.16ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.02     18.5±0.37ms        ? ?/sec    1.00     18.2±0.27ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.5±0.16ms        ? ?/sec    1.02     10.7±0.23ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.3±0.17ms        ? ?/sec    1.01     18.4±0.24ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.00    601.3±2.97µs        ? ?/sec    1.02    610.9±7.55µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.00  1786.0±21.54µs        ? ?/sec    1.01  1810.7±21.24µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    591.8±4.87µs        ? ?/sec    1.02    603.2±6.33µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.00    193.7±4.41µs        ? ?/sec    1.00    194.2±4.78µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00    667.8±5.49µs        ? ?/sec    1.00    666.8±5.22µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00      7.8±0.29ms        ? ?/sec    6.21     48.7±0.71ms        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.00   486.2±12.05µs        ? ?/sec    1.04    504.3±8.39µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     19.5±1.97ms        ? ?/sec    1.32     25.8±1.28ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00   502.9±12.25µs        ? ?/sec    1.01   506.4±10.21µs        ? ?/sec

@alamb
Contributor

alamb commented Jun 27, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue_7743 (96fd53a) to b269422 diff
BENCH_NAME=comparison_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench comparison_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=issue_7743
Results will be posted here when complete

Contributor

@alamb alamb left a comment

Thank you @zhuqi-lucas and @Dandandan - this is really exciting to see.

I am not sure about the length handling, but I think that concern can be addressed with a few more tests.

l_full_data.cmp(r_full_data)
}

/// Builds a 128-bit composite key for an inline value:
Contributor

Sorry -- I left comments for this function on the wrong PR: #7792 (comment)

Basically I think it would be very helpful to explain what properties the resulting u128 has

let raw = make_raw_inline(input.len() as u32, input);
let key = GenericByteViewArray::<BinaryViewType>::inline_key_fast(raw);

// Validate that keys are monotonically increasing in lexicographic+length order
Contributor

I recommend updating this test with:

  1. Strings that are the same length but compare lexically different (aaa vs aab for example)
  2. That the comparison is the same as using the GenericBinaryArray accessors

So for example like

let array = GenericBinaryArray::from(test_inputs);
...
// compare using &str semantics
assert!(array.value(i) < array.value(i+1))
// and then compare using the fast key comparison
assert!(make_raw_inline(array.views()[i]) < make_raw_inline(array.views()[i+1]));
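
A standalone sketch of this kind of check is shown below. It does not use the arrow crate: `key_for` is a hypothetical stand-in for the key under test (data bytes first, big-endian length last), and `SAMPLES` is an illustrative input set that includes same-length values that differ lexically, prefixes, and embedded NUL bytes. The check compares every pair, so the inputs do not need to be pre-sorted.

```rust
/// Hypothetical stand-in for the key under test: the inline bytes occupy the
/// 12 most significant bytes and the length occupies the low 4 bytes.
fn key_for(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inline values are supported");
    let mut buf = [0u8; 16];
    buf[..data.len()].copy_from_slice(data);
    buf[12..].copy_from_slice(&(data.len() as u32).to_be_bytes());
    u128::from_be_bytes(buf)
}

/// Illustrative inputs: empty, NUL bytes, shared prefixes, same-length
/// values that differ in the last byte, and full 12-byte values.
const SAMPLES: [&[u8]; 12] = [
    b"", b"\0", b"a", b"a\0", b"aa", b"aaa", b"aab", b"ab", b"b",
    b"abcdefghijkl", b"abcdefghijkm", b"\xff",
];

/// Return every index pair whose key order disagrees with byte-slice order.
fn mismatches(inputs: &[&[u8]]) -> Vec<(usize, usize)> {
    let mut bad = Vec::new();
    for (i, a) in inputs.iter().enumerate() {
        for (j, b) in inputs.iter().enumerate() {
            if key_for(a).cmp(&key_for(b)) != a.cmp(b) {
                bad.push((i, j));
            }
        }
    }
    bad
}
```

Returning the mismatching pairs rather than asserting inside the loop makes a failure easier to diagnose, since the offending inputs can be printed together.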

Contributor Author

Thank you @alamb for the good suggestion, I will try to add more tests.

Contributor Author

Addressed in latest PR, thank you @alamb !

@alamb
Contributor

alamb commented Jun 27, 2025

BTW I think this is a really nice PR @zhuqi-lucas -- it is an amazing result and quite clever, and it is a nice illustration of the level of attention to detail required for high-quality performance engineering.

@alamb
Contributor

alamb commented Jun 27, 2025

🤖: Benchmark completed

Details

group                                                                                                    issue_7743                             main
-----                                                                                                    ----------                             ----
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar complex                    1.00      2.7±0.03ms        ? ?/sec    1.00      2.7±0.03ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar contains                   1.00      2.8±0.03ms        ? ?/sec    1.02      2.8±0.03ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar ends with                  1.00      2.2±0.03ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
StringArray: regexp_matches_utf8 scalar benchmarks/regexp_matches_utf8 scalar starts with                1.00      2.1±0.02ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00      2.8±0.03ms        ? ?/sec    1.01      2.8±0.03ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00      2.9±0.04ms        ? ?/sec    1.01      2.9±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00      2.2±0.03ms        ? ?/sec    1.00      2.2±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.01      2.2±0.02ms        ? ?/sec    1.00      2.2±0.03ms        ? ?/sec
eq Float32                                                                                               1.00     44.3±0.17µs        ? ?/sec    1.00     44.3±0.17µs        ? ?/sec
eq Int32                                                                                                 1.00     44.3±0.15µs        ? ?/sec    1.00     44.2±0.15µs        ? ?/sec
eq MonthDayNano                                                                                          1.02     94.8±4.86µs        ? ?/sec    1.00     92.6±3.55µs        ? ?/sec
eq StringArray StringArray                                                                               1.00     34.4±0.21ms        ? ?/sec    1.00     34.5±0.37ms        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00     26.5±0.07ms        ? ?/sec    1.00     26.6±0.21ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00     24.1±0.20ms        ? ?/sec    1.01     24.3±0.11ms        ? ?/sec
eq dictionary[10] string[4])                                                                             1.00    810.9±1.75µs        ? ?/sec    1.00    813.2±1.57µs        ? ?/sec
eq long same prefix strings StringArray                                                                  1.00    570.1±5.28µs        ? ?/sec    1.00    569.1±7.57µs        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.00    980.1±3.24µs        ? ?/sec    1.00    975.8±3.46µs        ? ?/sec
eq scalar Float32                                                                                        1.00     44.1±0.14µs        ? ?/sec    1.00     44.1±0.08µs        ? ?/sec
eq scalar Int32                                                                                          1.00     44.1±0.05µs        ? ?/sec    1.00     44.2±0.13µs        ? ?/sec
eq scalar MonthDayNano                                                                                   1.00     51.0±0.36µs        ? ?/sec    1.01     51.5±0.73µs        ? ?/sec
eq scalar StringArray                                                                                    1.00     26.0±0.42ms        ? ?/sec    1.00     26.0±0.37ms        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00     17.3±0.11ms        ? ?/sec    1.01     17.5±0.10ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.00     17.5±0.14ms        ? ?/sec    1.01     17.8±0.12ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00     17.5±0.17ms        ? ?/sec    1.02     17.8±0.13ms        ? ?/sec
eq_dyn_utf8_scalar dictionary[10] string[4])                                                             1.00     77.1±0.12µs        ? ?/sec    1.00     77.1±0.18µs        ? ?/sec
gt Float32                                                                                               1.00     57.1±0.15µs        ? ?/sec    1.01     57.7±0.90µs        ? ?/sec
gt Int32                                                                                                 1.00     44.2±0.08µs        ? ?/sec    1.00     44.2±0.13µs        ? ?/sec
gt scalar Float32                                                                                        1.00     45.8±0.12µs        ? ?/sec    1.00     45.8±0.09µs        ? ?/sec
gt scalar Int32                                                                                          1.00     44.1±0.09µs        ? ?/sec    1.00     44.2±0.13µs        ? ?/sec
gt_eq Float32                                                                                            1.00     57.0±0.11µs        ? ?/sec    1.00     57.3±0.13µs        ? ?/sec
gt_eq Int32                                                                                              1.00     44.2±0.12µs        ? ?/sec    1.00     44.2±0.12µs        ? ?/sec
gt_eq scalar Float32                                                                                     1.00     46.5±0.10µs        ? ?/sec    1.00     46.5±0.13µs        ? ?/sec
gt_eq scalar Int32                                                                                       1.00     44.1±0.06µs        ? ?/sec    1.00     44.2±0.10µs        ? ?/sec
gt_eq_dyn_utf8_scalar scalar dictionary[10] string[4])                                                   1.00     77.1±0.24µs        ? ?/sec    1.00     77.2±0.19µs        ? ?/sec
ilike_utf8 scalar complex                                                                                1.00      2.9±0.06ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
ilike_utf8 scalar contains                                                                               1.00      4.4±0.06ms        ? ?/sec    1.02      4.5±0.08ms        ? ?/sec
ilike_utf8 scalar ends with                                                                              1.02  1126.8±36.47µs        ? ?/sec    1.00  1099.5±20.51µs        ? ?/sec
ilike_utf8 scalar equals                                                                                 1.01   639.2±45.96µs        ? ?/sec    1.00   630.9±15.37µs        ? ?/sec
ilike_utf8 scalar starts with                                                                            1.01  1053.8±32.81µs        ? ?/sec    1.00  1043.5±43.25µs        ? ?/sec
ilike_utf8_scalar_dyn dictionary[10] string[4])                                                          1.00     77.5±0.10µs        ? ?/sec    1.00     77.6±0.17µs        ? ?/sec
like_utf8 scalar complex                                                                                 1.00      2.1±0.03ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
like_utf8 scalar contains                                                                                1.00  1821.9±18.50µs        ? ?/sec    1.00  1818.4±18.68µs        ? ?/sec
like_utf8 scalar ends with                                                                               1.00    397.7±7.45µs        ? ?/sec    1.06   422.1±12.90µs        ? ?/sec
like_utf8 scalar equals                                                                                  1.08     80.9±0.30µs        ? ?/sec    1.00     75.2±0.23µs        ? ?/sec
like_utf8 scalar starts with                                                                             1.00    321.2±8.46µs        ? ?/sec    1.03    330.0±6.60µs        ? ?/sec
like_utf8_scalar_dyn dictionary[10] string[4])                                                           1.00     77.3±0.13µs        ? ?/sec    1.00     77.4±0.16µs        ? ?/sec
like_utf8view scalar complex                                                                             1.02    207.1±1.25ms        ? ?/sec    1.00    202.8±0.65ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00    160.9±0.34ms        ? ?/sec    1.02    164.1±0.36ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     51.1±0.28ms        ? ?/sec    1.00     51.0±0.34ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     51.3±0.37ms        ? ?/sec    1.01     51.7±0.34ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.00     51.1±0.24ms        ? ?/sec    1.01     51.6±0.29ms        ? ?/sec
like_utf8view scalar equals                                                                              1.00     34.7±0.12ms        ? ?/sec    1.01     35.0±0.12ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.03     50.4±0.38ms        ? ?/sec    1.00     49.1±0.26ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     28.5±0.15ms        ? ?/sec    1.01     28.7±0.17ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     50.4±0.28ms        ? ?/sec    1.01     50.6±0.29ms        ? ?/sec
long same prefix strings like_utf8 scalar complex                                                        1.00   1537.4±4.40µs        ? ?/sec    1.02  1566.6±36.18µs        ? ?/sec
long same prefix strings like_utf8 scalar contains                                                       1.03      4.2±0.01ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
long same prefix strings like_utf8 scalar ends with                                                      1.00   1535.6±4.68µs        ? ?/sec    1.01   1558.4±2.92µs        ? ?/sec
long same prefix strings like_utf8 scalar equals                                                         1.00    488.5±2.55µs        ? ?/sec    1.01    493.8±2.94µs        ? ?/sec
long same prefix strings like_utf8 scalar starts with                                                    1.06  1953.4±13.30µs        ? ?/sec    1.00   1839.4±6.24µs        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.00   1608.0±4.15µs        ? ?/sec    1.00   1601.9±6.48µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.01      4.2±0.01ms        ? ?/sec    1.00      4.1±0.01ms        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.01   1605.8±3.19µs        ? ?/sec    1.00   1597.0±4.15µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.00    574.7±2.67µs        ? ?/sec    1.00    573.1±2.10µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00   1888.2±5.04µs        ? ?/sec    1.01  1914.6±12.17µs        ? ?/sec
lt Float32                                                                                               1.00     57.2±0.28µs        ? ?/sec    1.00     57.4±0.19µs        ? ?/sec
lt Int32                                                                                                 1.00     44.2±0.09µs        ? ?/sec    1.00     44.2±0.08µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     29.2±0.14ms        ? ?/sec    1.66     48.4±0.12ms        ? ?/sec
lt long same prefix strings StringArray                                                                  1.04    671.6±4.46µs        ? ?/sec    1.00    645.9±3.70µs        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.02    900.2±6.07µs        ? ?/sec    1.00    883.0±3.22µs        ? ?/sec
lt scalar Float32                                                                                        1.00     46.4±0.08µs        ? ?/sec    1.00     46.5±0.32µs        ? ?/sec
lt scalar Int32                                                                                          1.00     44.1±0.06µs        ? ?/sec    1.00     44.2±0.16µs        ? ?/sec
lt scalar StringArray                                                                                    1.00     46.8±0.25ms        ? ?/sec    1.00     47.0±0.27ms        ? ?/sec
lt scalar StringViewArray                                                                                1.00     49.8±0.12ms        ? ?/sec    1.30     64.8±0.14ms        ? ?/sec
lt_eq Float32                                                                                            1.00     57.3±0.26µs        ? ?/sec    1.01     57.7±0.21µs        ? ?/sec
lt_eq Int32                                                                                              1.00     44.2±0.11µs        ? ?/sec    1.00     44.3±0.16µs        ? ?/sec
lt_eq scalar Float32                                                                                     1.00     45.8±0.08µs        ? ?/sec    1.00     45.8±0.09µs        ? ?/sec
lt_eq scalar Int32                                                                                       1.00     44.1±0.05µs        ? ?/sec    1.00     44.1±0.07µs        ? ?/sec
neq Float32                                                                                              1.00     44.2±0.12µs        ? ?/sec    1.00     44.2±0.10µs        ? ?/sec
neq Int32                                                                                                1.00     44.2±0.07µs        ? ?/sec    1.00     44.3±0.21µs        ? ?/sec
neq long same prefix strings StringArray                                                                 1.00    564.7±2.83µs        ? ?/sec    1.00    566.3±3.40µs        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    980.0±3.81µs        ? ?/sec    1.00    975.7±3.67µs        ? ?/sec
neq scalar Float32                                                                                       1.00     44.1±0.07µs        ? ?/sec    1.00     44.1±0.06µs        ? ?/sec
neq scalar Int32                                                                                         1.00     44.1±0.11µs        ? ?/sec    1.00     44.1±0.06µs        ? ?/sec
nilike_utf8 scalar complex                                                                               1.00      2.9±0.06ms        ? ?/sec    1.02      2.9±0.06ms        ? ?/sec
nilike_utf8 scalar contains                                                                              1.00      4.4±0.07ms        ? ?/sec    1.00      4.4±0.05ms        ? ?/sec
nilike_utf8 scalar ends with                                                                             1.00  1150.7±35.76µs        ? ?/sec    1.01  1166.2±33.92µs        ? ?/sec
nilike_utf8 scalar equals                                                                                1.00   636.9±23.30µs        ? ?/sec    1.10   700.1±29.45µs        ? ?/sec
nilike_utf8 scalar starts with                                                                           1.00  1064.2±44.17µs        ? ?/sec    1.01  1076.6±38.68µs        ? ?/sec
nlike_utf8 scalar complex                                                                                1.01      2.2±0.03ms        ? ?/sec    1.00      2.2±0.05ms        ? ?/sec
nlike_utf8 scalar contains                                                                               1.00  1813.1±18.29µs        ? ?/sec    1.00  1811.9±19.05µs        ? ?/sec
nlike_utf8 scalar ends with                                                                              1.00   403.2±12.28µs        ? ?/sec    1.01   407.6±12.70µs        ? ?/sec
nlike_utf8 scalar equals                                                                                 1.08     80.9±0.27µs        ? ?/sec    1.00     74.9±0.23µs        ? ?/sec
nlike_utf8 scalar starts with                                                                            1.00   334.2±12.52µs        ? ?/sec    1.00   335.0±11.12µs        ? ?/sec

@zhuqi-lucas
Contributor Author

lt StringViewArray StringViewArray inlined bytes 1.00 29.2±0.14ms ? ?/sec 1.66 48.4±0.12ms ? ?/sec

Thank you @alamb. The benchmark shows less improvement than my local Mac results, but it is still good to see 66% and 30% improvements.

lt StringViewArray StringViewArray inlined bytes                                                         1.00     29.2±0.14ms        ? ?/sec    1.66     48.4±0.12ms        ? ?/sec
lt scalar StringViewArray                                                                                1.00     49.8±0.12ms        ? ?/sec    1.30     64.8±0.14ms        ? ?/sec
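
For readers unfamiliar with the critcmp columns: each ratio is that run's mean time divided by the fastest mean in the row. The sketch below recomputes the two ratios quoted above from the reported means; the function names are illustrative only and are not part of critcmp or arrow-rs.

```rust
/// Ratio of the slower mean to the faster mean, as shown in the critcmp ratio column.
/// Hypothetical helper for illustration.
fn slowdown_ratio(slower_ms: f64, faster_ms: f64) -> f64 {
    slower_ms / faster_ms
}

/// Fraction of wall-clock time saved when moving from `before_ms` to `after_ms`.
/// Hypothetical helper for illustration.
fn time_saved_fraction(before_ms: f64, after_ms: f64) -> f64 {
    1.0 - after_ms / before_ms
}
```

With the means from the two rows above, `slowdown_ratio(48.4, 29.2)` is about 1.66 and `slowdown_ratio(64.8, 49.8)` is about 1.30. Expressed as time saved, those correspond to roughly 40% and 23% less time per run.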

@alamb
Contributor

alamb commented Jun 27, 2025

Thank you @alamb. The benchmark shows less improvement than my local Mac results, but it is still good to see 66% and 30% improvements.

Indeed those are pretty amazing results for something like comparisons

Contributor

@alamb alamb left a comment

Nice work @zhuqi-lucas -- thank you

previous_key = Some(key);
}

// 2) Cross-check against GenericBinaryArray comparison
Contributor

I think technically speaking this second loop does everything the first loop does and thus the first loop is redundant

Contributor Author

Good point @alamb, thank you. I addressed it in the latest PR!
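
For context, the kind of cross-check discussed in this thread can be sketched without the arrow crate. The code below is a minimal illustration, not the code in this PR: `inline_key` and `keys_agree_with_bytes` are hypothetical names, and the key layout (12 zero-padded data bytes in the high bits, then the length as a big-endian `u32`) is an assumption chosen so that integer order matches lexicographic byte order.

```rust
/// Hypothetical helper (not the arrow-rs API): packs a value of at most 12 bytes
/// into a u128 whose integer order matches lexicographic byte order.
fn inline_key(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inlined (<= 12 byte) values are handled");
    let mut buf = [0u8; 16];
    // Data goes first so it occupies the most significant bytes; unused bytes stay zero.
    buf[..data.len()].copy_from_slice(data);
    // The length goes last so it only breaks ties, e.g. b"a" versus b"a\0".
    buf[12..].copy_from_slice(&(data.len() as u32).to_be_bytes());
    u128::from_be_bytes(buf)
}

/// Cross-check: for every pair of samples, comparing keys must give the same
/// ordering as comparing the raw byte slices.
fn keys_agree_with_bytes(samples: &[&[u8]]) -> bool {
    for &a in samples {
        for &b in samples {
            if inline_key(a).cmp(&inline_key(b)) != a.cmp(b) {
                return false;
            }
        }
    }
    true
}
```

Because the pairwise check compares every value against every other, it also covers what a separate "keys are monotonic over sorted input" loop would verify, which is the redundancy noted above.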

@zhuqi-lucas
Contributor Author

The integration CI has seemed flaky recently; I am not sure why.

@alamb
Contributor

alamb commented Jun 28, 2025

The integration CI has seemed flaky recently; I am not sure why.

Yeah, @etseidl and I have seen that too -- I filed a ticket

@alamb alamb merged commit aa96097 into apache:main Jun 29, 2025
29 of 30 checks passed
@alamb
Contributor

alamb commented Jun 29, 2025

Thanks again @zhuqi-lucas and @Dandandan

Dandandan pushed a commit that referenced this pull request Jul 1, 2025
# Which issue does this PR close?

This is a follow-up for #7748

In theory we can customize the string view comparison and make it much faster.

- Closes [#7790](#7790)

# Rationale for this change

In theory we can customize the string view comparison and make it much faster.

# What changes are included in this PR?

In theory we can customize the string view comparison and make it much faster.

# Are these changes tested?

Yes

# Are there any user-facing changes?

No
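
The follow-up description above is brief. As a rough illustration of one idea from the original PR description (comparing the 4-byte prefix as a single `u32` rather than byte by byte for values longer than 12 bytes), here is a minimal sketch. The names `prefix_u32` and `compare_with_prefix` are hypothetical, and this is not the code that was merged.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: reads up to the first four bytes as a big-endian u32,
/// zero-padding shorter inputs, so integer order follows byte order.
fn prefix_u32(s: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    let n = s.len().min(4);
    buf[..n].copy_from_slice(&s[..n]);
    u32::from_be_bytes(buf)
}

/// Compares the prefixes with one integer comparison and falls back to a
/// full byte-slice comparison only when the prefixes are equal.
fn compare_with_prefix(a: &[u8], b: &[u8]) -> Ordering {
    match prefix_u32(a).cmp(&prefix_u32(b)) {
        Ordering::Equal => a.cmp(b),
        other => other,
    }
}
```

In a view array the prefix is already stored in the view itself, so the fast path can decide most comparisons without touching the data buffer; the sketch recomputes the prefix from the slice only to stay self-contained.
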
@alamb
Contributor

alamb commented Jul 5, 2025

I believe this PR introduced a bug in comparisons, see

Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

Change use of inline_value to inline it to a u128

4 participants