@rluvaton
Member

@rluvaton rluvaton commented Jul 24, 2025

Which issue does this PR close?

Rationale for this change

See issue

What changes are included in this PR?

Only encode the child values that the sliced list actually references, and shift the offsets accordingly during encoding.
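As a rough illustration of the offset shift, here is a minimal sketch in plain Rust with no arrow-rs dependency. The helper name `rebase_offsets` is hypothetical and is not taken from the PR; it only shows the arithmetic of re-indexing offsets once the child values have been narrowed to start at the first offset.

```rust
/// Rebase list offsets so they index into a child slice that starts at
/// `offsets[0]` instead of at the start of the full child array.
/// NOTE: `rebase_offsets` is a hypothetical helper, not an arrow-rs function.
fn rebase_offsets(offsets: &[i32]) -> Vec<i32> {
    let first = offsets.first().copied().unwrap_or(0);
    offsets.iter().map(|o| o - first).collect()
}

fn main() {
    // Offsets of a list whose leading rows were sliced away:
    // the remaining rows cover child values 5..7 and 7..10.
    let sliced_offsets = [5, 7, 10];
    let rebased = rebase_offsets(&sliced_offsets);
    // The same rows now cover 0..2 and 2..5 of the narrowed child.
    println!("{:?}", rebased);
}
```

The row lengths are unchanged by the shift; only the base position moves.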

Are these changes tested?

Yes in:

Waiting for it to be merged first

Are there any user-facing changes?

No, only a performance improvement.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 24, 2025
Comment on lines 523 to 529
let first_offset = list_array.offsets()[0] as usize;
let last_offset =
    list_array.offsets()[list_array.offsets().len() - 1] as usize;

list_array
    .values()
    .slice(first_offset, last_offset - first_offset)
Member Author

Contributor

Could you also please add some comments explaining the context of this code (which, as you have said, is non-obvious) for future readers?

For example, something like

// values can include more data than referenced in the ListArray, only encode
// the referenced values. 

Member Author

done
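To make the point of that comment concrete, the following sketch models a list's child buffer and offsets with plain Rust slices. It is an assumption-laden stand-in for the arrow-rs types: `referenced_values` is a hypothetical helper, and real arrays also carry null buffers that are ignored here.

```rust
/// Return only the child values that the (possibly sliced) list refers to.
/// `offsets` has one more entry than the list has rows.
/// NOTE: hypothetical helper for illustration, not part of arrow-rs.
fn referenced_values<'a>(offsets: &[usize], values: &'a [u64]) -> &'a [u64] {
    match (offsets.first(), offsets.last()) {
        (Some(&first), Some(&last)) => &values[first..last],
        _ => &values[0..0],
    }
}

fn main() {
    // Child buffer shared by three rows: [1], [2, 3], [4, 5, 6].
    let values = [1u64, 2, 3, 4, 5, 6];
    let all_offsets = [0usize, 1, 3, 6];
    // Slicing the list to its middle row narrows only the offsets;
    // the child buffer still holds all six values.
    let sliced_offsets = &all_offsets[1..3];
    println!("{:?}", referenced_values(sliced_offsets, &values));
}
```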

Contributor

@alamb alamb left a comment

Thanks @rluvaton -- this code looks good to me

I think we need to add some tests as well prior to merging this PR

For example:

  1. A round trip encoded sliced list (if it doesn't already exist)
  2. A test specifically for encoding a smaller part of the list -- for example, a test that creates a ListArray with 2 rows: a single-element list [1], and a list with 1000 values, [1,2,3,4...1000], sliced to include only the single-element list. Then encode it and ensure the encoded size is reasonable
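The scenario in item 2 can be sketched without arrow-rs by modelling the list as offsets plus child values. `ListModel` and its methods are hypothetical names introduced only for this illustration; a real test would build a `ListArray` and run it through the row converter instead.

```rust
/// Minimal stand-in for a list array: `offsets[i]..offsets[i + 1]` is row `i`.
/// NOTE: hypothetical model, not the arrow-rs `ListArray` type.
struct ListModel {
    offsets: Vec<usize>,
    values: Vec<u64>,
}

impl ListModel {
    /// Keep rows `start..start + len`. The child values are left untouched
    /// (cloned here for simplicity), so slicing narrows only the offsets.
    fn slice(&self, start: usize, len: usize) -> ListModel {
        ListModel {
            offsets: self.offsets[start..=start + len].to_vec(),
            values: self.values.clone(),
        }
    }

    /// Number of child values an encoder has to visit when it restricts
    /// itself to the range referenced by the offsets.
    fn referenced_len(&self) -> usize {
        self.offsets[self.offsets.len() - 1] - self.offsets[0]
    }
}

/// Row 0 is [1]; row 1 is [1, 2, ..., 1000].
fn two_row_list() -> ListModel {
    let mut values = vec![1u64];
    values.extend(1..=1000u64);
    ListModel { offsets: vec![0, 1, 1001], values }
}

fn main() {
    let sliced = two_row_list().slice(0, 1);
    println!("child values held: {}", sliced.values.len());
    println!("child values referenced: {}", sliced.referenced_len());
}
```

In this model the sliced list still holds every child value but references only one, which is the gap the suggested test is meant to guard.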

@alamb
Contributor

alamb commented Jul 26, 2025

Ooops, sorry -- I didn't see #7994 -- looking at that one

Contributor

@alamb alamb left a comment

Thanks @rluvaton -- I still worry that this PR has no tests and thus we run the risk of potentially breaking / reverting it in some future PR.

Would it be possible to add some coverage, either in the form of the "verify the size of the output rows" or a performance benchmark?

@rluvaton
Member Author

rluvaton commented Jul 27, 2025

Pr:

Added the tests.

Round trip is included in the fuzz test.

Number of output rows: already covered in the existing tests

The output row size test would pass on both main and this branch, because the list encoder encodes the entire child array only as intermediate data and later picks just the visible list values.
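A toy model of that behaviour, under the assumption that each child value is encoded independently (here simply as big-endian bytes): both strategies below produce identical output rows and differ only in how many child values they encode along the way. The function names are hypothetical, and the offsets are assumed to be non-empty, as list offsets are.

```rust
/// Toy encoding of one child value: its big-endian bytes.
fn encode_value(v: u64) -> [u8; 8] {
    v.to_be_bytes()
}

/// Encode every child value first, then pick the visible ones per row.
/// Returns the rows and the number of child values that were encoded.
fn encode_all_children(offsets: &[usize], values: &[u64]) -> (Vec<Vec<u8>>, usize) {
    let encoded: Vec<[u8; 8]> = values.iter().map(|v| encode_value(*v)).collect();
    let rows: Vec<Vec<u8>> = offsets
        .windows(2)
        .map(|w| encoded[w[0]..w[1]].iter().flatten().copied().collect())
        .collect();
    (rows, encoded.len())
}

/// Encode only the referenced child range and shift the offsets to match.
fn encode_referenced(offsets: &[usize], values: &[u64]) -> (Vec<Vec<u8>>, usize) {
    let first = offsets[0];
    let last = offsets[offsets.len() - 1];
    let encoded: Vec<[u8; 8]> =
        values[first..last].iter().map(|v| encode_value(*v)).collect();
    let rows: Vec<Vec<u8>> = offsets
        .windows(2)
        .map(|w| encoded[w[0] - first..w[1] - first].iter().flatten().copied().collect())
        .collect();
    (rows, encoded.len())
}

fn main() {
    // Rows [1], [2, 3], [4, 5, 6], sliced to the middle row only.
    let values = [1u64, 2, 3, 4, 5, 6];
    let sliced_offsets = [1usize, 3];
    let (rows_a, work_a) = encode_all_children(&sliced_offsets, &values);
    let (rows_b, work_b) = encode_referenced(&sliced_offsets, &values);
    println!("same rows: {}", rows_a == rows_b);
    println!("child values encoded: {} vs {}", work_a, work_b);
}
```

Because the final rows match, a size check on the output cannot tell the two apart, whereas a timing benchmark can.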

I can add a performance benchmark that shows how much slower it was before.

@rluvaton
Member Author

@alamb created in #8008

@rluvaton
Member Author

I ran the benchmark against this branch and:

These are the results locally:

The sliced-list encoding benchmarks are 16 to 26 times faster:

group                                                                 add-benchmark-encoding-sliced-list     perf-only-encode-actual-list-values
-----                                                                 ----------------------------------     -----------------------------------
append_rows 10 large_list(0) of u64(0)                                1.00   342.2±10.73ns        ? ?/sec    1.17   400.1±13.97ns        ? ?/sec
append_rows 10 list(0) of u64(0)                                      1.00   404.6±16.02ns        ? ?/sec    1.10   444.1±26.33ns        ? ?/sec
append_rows 4096 large_list(0) of u64(0)                              1.00     72.9±3.35µs        ? ?/sec    1.02     74.7±2.26µs        ? ?/sec
append_rows 4096 large_list(0) sliced to 10 of u64(0)                 26.98    14.8±2.63µs        ? ?/sec    1.00   548.9±14.61ns        ? ?/sec
append_rows 4096 list(0) of u64(0)                                    1.00     71.9±3.32µs        ? ?/sec    1.02     73.5±3.44µs        ? ?/sec
append_rows 4096 list(0) sliced to 10 of u64(0)                       22.10    13.8±3.40µs        ? ?/sec    1.00   626.4±14.46ns        ? ?/sec
convert_columns 10 large_list(0) of u64(0)                            1.00   638.2±18.00ns        ? ?/sec    1.03   658.0±22.13ns        ? ?/sec
convert_columns 10 list(0) of u64(0)                                  1.00   663.0±17.46ns        ? ?/sec    1.06   703.9±32.63ns        ? ?/sec
convert_columns 4096 large_list(0) of u64(0)                          1.00     77.5±2.68µs        ? ?/sec    1.01     78.1±4.30µs        ? ?/sec
convert_columns 4096 large_list(0) sliced to 10 of u64(0)             17.97    14.9±3.14µs        ? ?/sec    1.00   830.4±23.24ns        ? ?/sec
convert_columns 4096 list(0) of u64(0)                                1.00     70.3±1.55µs        ? ?/sec    1.05     73.9±2.49µs        ? ?/sec
convert_columns 4096 list(0) sliced to 10 of u64(0)                   16.38    14.3±2.74µs        ? ?/sec    1.00   871.2±22.19ns        ? ?/sec
convert_columns_prepared 10 large_list(0) of u64(0)                   1.00    422.5±8.89ns        ? ?/sec    1.09   462.1±10.78ns        ? ?/sec
convert_columns_prepared 10 list(0) of u64(0)                         1.00   466.6±16.83ns        ? ?/sec    1.10   512.7±21.83ns        ? ?/sec
convert_columns_prepared 4096 large_list(0) of u64(0)                 1.00     77.0±3.19µs        ? ?/sec    1.02     78.4±4.27µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) sliced to 10 of u64(0)    20.54    12.8±3.06µs        ? ?/sec    1.00   622.3±12.86ns        ? ?/sec
convert_columns_prepared 4096 list(0) of u64(0)                       1.00     69.5±2.28µs        ? ?/sec    1.05     73.2±1.54µs        ? ?/sec
convert_columns_prepared 4096 list(0) sliced to 10 of u64(0)          19.09    12.9±2.62µs        ? ?/sec    1.00   673.8±13.70ns        ? ?/sec
convert_rows 10 large_list(0) of u64(0)                               1.00  1041.9±13.96ns        ? ?/sec    1.01  1050.4±21.63ns        ? ?/sec
convert_rows 10 list(0) of u64(0)                                     1.00  1077.8±25.09ns        ? ?/sec    1.03  1105.9±45.31ns        ? ?/sec
convert_rows 4096 large_list(0) of u64(0)                             1.01    153.2±6.28µs        ? ?/sec    1.00    152.1±4.65µs        ? ?/sec
convert_rows 4096 large_list(0) sliced to 10 of u64(0)                1.00  1345.3±24.29ns        ? ?/sec    1.01  1361.6±23.10ns        ? ?/sec
convert_rows 4096 list(0) of u64(0)                                   1.00    145.8±6.87µs        ? ?/sec    1.02    148.7±7.07µs        ? ?/sec
convert_rows 4096 list(0) sliced to 10 of u64(0)                      1.00  1432.5±27.76ns        ? ?/sec    1.01  1446.0±37.15ns        ? ?/sec
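For reference, the speedup figures can be recomputed from the mean times in the table. The sketch below copies the six "sliced to 10" encoding rows (converted to nanoseconds) and divides before by after. Because the displayed times are rounded, the ratios land close to, but not exactly on, the table's ratio column.

```rust
/// Speedup of `after_ns` relative to `before_ns`.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

fn main() {
    // (benchmark, before in ns, after in ns), mean times from the table above.
    let sliced = [
        ("append_rows 4096 large_list(0) sliced to 10", 14_800.0, 548.9),
        ("append_rows 4096 list(0) sliced to 10", 13_800.0, 626.4),
        ("convert_columns 4096 large_list(0) sliced to 10", 14_900.0, 830.4),
        ("convert_columns 4096 list(0) sliced to 10", 14_300.0, 871.2),
        ("convert_columns_prepared 4096 large_list(0) sliced to 10", 12_800.0, 622.3),
        ("convert_columns_prepared 4096 list(0) sliced to 10", 12_900.0, 673.8),
    ];
    for (name, before, after) in sliced {
        println!("{}: {:.1}x", name, speedup(before, after));
    }
}
```

The `convert_rows` sliced cases are left out because they decode rows rather than encode them, and the table shows them essentially unchanged.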

@rluvaton rluvaton changed the title perf: only encode actual list values in RowConverter perf: only encode actual list values in RowConverter (16-26 times faster for small sliced list) Jul 27, 2025
alamb pushed a commit that referenced this pull request Jul 28, 2025
#8008)

# Which issue does this PR close?

N/A

# Rationale for this change


#7996 (review)

# What changes are included in this PR?

Added list/large list and sliced list/large list cases to the row format conversion benchmarks.

# Are these changes tested?

Not needed

# Are there any user-facing changes?

Nope

----

Related to:
- #7996
@alamb
Contributor

alamb commented Jul 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing perf-only-encode-actual-list-values (27faadd) to a65a984 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=perf-only-encode-actual-list-values
Results will be posted here when complete

Contributor

@alamb alamb left a comment

Thank you @rluvaton

@alamb
Contributor

alamb commented Jul 28, 2025

🤖: Benchmark completed

Details

group                                                                                                                         main                                   perf-only-encode-actual-list-values
-----                                                                                                                         ----                                   -----------------------------------
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.00    377.1±1.74µs        ? ?/sec    1.02    383.3±1.56µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.00      8.6±0.28µs        ? ?/sec    1.53     13.2±0.02µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.00     16.1±0.07µs        ? ?/sec    1.00     16.1±0.06µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.00      7.8±0.11µs        ? ?/sec    1.00      7.8±0.11µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.00     14.7±0.10µs        ? ?/sec    1.00     14.7±0.11µs        ? ?/sec
append_rows 4096 string view(10, 0)                                                                                           1.19     53.4±0.26µs        ? ?/sec    1.00     44.8±0.27µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.00     79.5±0.17µs        ? ?/sec    1.00     79.6±0.41µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.05     84.0±0.24µs        ? ?/sec    1.00     80.2±0.20µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.10     55.0±0.14µs        ? ?/sec    1.00     50.2±0.13µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.11     50.1±0.25µs        ? ?/sec    1.00     45.2±0.06µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.00     79.1±0.36µs        ? ?/sec    1.01     79.6±0.31µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.01     85.8±0.20µs        ? ?/sec    1.00     84.7±0.25µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.03    250.4±1.41µs        ? ?/sec    1.00    243.2±1.40µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.05     50.9±0.33µs        ? ?/sec    1.00     48.6±0.12µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.05     77.9±0.50µs        ? ?/sec    1.00     74.4±0.08µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.00    151.2±1.44µs        ? ?/sec    1.00    151.8±0.94µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.00    117.9±0.51µs        ? ?/sec    1.01    118.5±0.29µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.01     81.1±0.27µs        ? ?/sec    1.00     80.2±0.22µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.00     29.0±0.06µs        ? ?/sec    1.01     29.3±0.90µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.00     47.5±0.09µs        ? ?/sec    1.00     47.6±0.08µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.02     29.6±0.19µs        ? ?/sec    1.00     29.2±0.15µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.00      7.6±0.11µs        ? ?/sec    1.00      7.7±0.11µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.01     14.8±0.07µs        ? ?/sec    1.00     14.6±0.11µs        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.01    389.2±1.87µs        ? ?/sec    1.00    386.9±1.80µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.00      8.9±0.01µs        ? ?/sec    1.51     13.4±0.03µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.00     16.4±0.09µs        ? ?/sec    1.00     16.3±0.07µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.00      7.9±0.01µs        ? ?/sec    1.00      7.9±0.10µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.00     14.9±0.09µs        ? ?/sec    1.00     14.9±0.13µs        ? ?/sec
convert_columns 4096 string view(10, 0)                                                                                       1.19     53.1±0.25µs        ? ?/sec    1.00     44.7±0.16µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.01     80.2±0.31µs        ? ?/sec    1.00     79.2±0.58µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.05     85.1±0.24µs        ? ?/sec    1.00     80.9±0.20µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.09     55.0±0.99µs        ? ?/sec    1.00     50.4±0.19µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.08     49.2±0.22µs        ? ?/sec    1.00     45.5±0.07µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.00     78.7±0.35µs        ? ?/sec    1.00     78.4±0.23µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.01     86.3±0.33µs        ? ?/sec    1.00     85.1±0.22µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.01    247.0±1.23µs        ? ?/sec    1.00    243.9±2.30µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.04     50.8±0.16µs        ? ?/sec    1.00     48.8±0.11µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.02     78.0±0.30µs        ? ?/sec    1.00     76.8±0.11µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.01    154.9±1.59µs        ? ?/sec    1.00    153.3±1.11µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.00    120.9±0.24µs        ? ?/sec    1.01    121.6±0.28µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.01     81.7±0.19µs        ? ?/sec    1.00     80.5±0.17µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.00     30.2±0.10µs        ? ?/sec    1.01     30.5±0.09µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.01     49.0±0.14µs        ? ?/sec    1.00     48.6±0.08µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.00     30.6±0.25µs        ? ?/sec    1.00     30.6±0.16µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.01      7.9±0.12µs        ? ?/sec    1.00      7.7±0.08µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.01     15.0±0.07µs        ? ?/sec    1.00     14.9±0.12µs        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.00    381.8±2.80µs        ? ?/sec    1.01    385.1±1.88µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.00      8.7±0.01µs        ? ?/sec    1.53     13.3±0.03µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.00     16.2±0.15µs        ? ?/sec    1.01     16.3±0.08µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.03      7.9±0.11µs        ? ?/sec    1.00      7.7±0.12µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.00     14.7±0.14µs        ? ?/sec    1.01     14.9±0.15µs        ? ?/sec
convert_columns_prepared 4096 string view(10, 0)                                                                              1.19     53.0±0.35µs        ? ?/sec    1.00     44.7±0.12µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.00     79.6±0.31µs        ? ?/sec    1.00     79.6±0.30µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.04     84.0±0.19µs        ? ?/sec    1.00     80.7±0.16µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.09     55.2±0.31µs        ? ?/sec    1.00     50.4±0.18µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.08     48.8±0.47µs        ? ?/sec    1.00     45.4±0.10µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.00     79.5±0.28µs        ? ?/sec    1.00     79.3±0.26µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.01     86.1±0.20µs        ? ?/sec    1.00     85.0±0.23µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.01    244.6±1.05µs        ? ?/sec    1.00    242.4±1.10µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.04     50.6±0.14µs        ? ?/sec    1.00     48.7±0.11µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.04     78.8±0.43µs        ? ?/sec    1.00     75.6±0.12µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    151.8±0.96µs        ? ?/sec    1.00    151.4±0.92µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.00    118.1±0.46µs        ? ?/sec    1.01    119.1±0.46µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.02     81.6±0.29µs        ? ?/sec    1.00     80.0±0.13µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.00     29.2±0.07µs        ? ?/sec    1.01     29.4±0.04µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.00     47.7±0.06µs        ? ?/sec    1.00     47.9±0.11µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.02     29.8±0.06µs        ? ?/sec    1.00     29.3±0.04µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.8±0.11µs        ? ?/sec    1.00      7.8±0.09µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.01     14.9±0.08µs        ? ?/sec    1.00     14.7±0.09µs        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.00    298.5±2.73µs        ? ?/sec    1.00    297.2±2.81µs        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.02     16.4±0.05µs        ? ?/sec    1.00     16.0±0.03µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.02     16.4±0.04µs        ? ?/sec    1.00     16.0±0.02µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.01     33.3±0.06µs        ? ?/sec    1.00     32.9±0.06µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.01     33.3±0.11µs        ? ?/sec    1.00     33.0±0.07µs        ? ?/sec
convert_rows 4096 string view(10, 0)                                                                                          1.01     73.5±0.20µs        ? ?/sec    1.00     72.8±0.18µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.02    122.0±0.35µs        ? ?/sec    1.00    119.1±0.38µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.00    111.8±0.23µs        ? ?/sec    1.00    111.3±0.25µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.01     82.6±0.36µs        ? ?/sec    1.00     81.7±0.22µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.00     61.4±0.10µs        ? ?/sec    1.00     61.5±0.36µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.00    108.0±0.52µs        ? ?/sec    1.00    107.9±0.45µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.01    103.8±0.18µs        ? ?/sec    1.00    102.7±0.31µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.00    296.2±2.49µs        ? ?/sec    1.01    298.7±3.89µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.02     73.9±0.14µs        ? ?/sec    1.00     72.5±0.24µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.02     61.9±0.11µs        ? ?/sec    1.00     60.9±0.27µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.00    107.7±0.32µs        ? ?/sec    1.00    107.4±0.31µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.01    104.1±0.38µs        ? ?/sec    1.00    102.9±0.22µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.02     74.2±0.14µs        ? ?/sec    1.00     72.4±0.31µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.01     61.9±0.10µs        ? ?/sec    1.00     61.0±0.35µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.00    108.2±0.41µs        ? ?/sec    1.00    107.8±0.49µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.01     74.0±0.14µs        ? ?/sec    1.00     73.3±0.47µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.01     30.5±0.04µs        ? ?/sec    1.00     30.1±0.05µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.01     30.5±0.06µs        ? ?/sec    1.00     30.1±0.07µs        ? ?/sec
iterate rows                                                                                                                  1.00      2.6±0.00µs        ? ?/sec    1.00      2.6±0.00µs        ? ?/sec

@rluvaton
Member Author

@alamb the benchmark results do not include the new benchmarks

@alamb
Contributor

alamb commented Jul 28, 2025

@alamb the benchmark results do not include the new benchmarks

Yeah I probably need to merge this PR up from main (so git merge-base includes the benchmarks). Will do

@alamb
Contributor

alamb commented Jul 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing perf-only-encode-actual-list-values (7b2df90) to 9d26336 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=perf-only-encode-actual-list-values
Results will be posted here when complete

@alamb
Contributor

alamb commented Jul 28, 2025

🤖: Benchmark completed

Details

group                                                                                                                         main                                   perf-only-encode-actual-list-values
-----                                                                                                                         ----                                   -----------------------------------
append_rows 10 large_list(0) of u64(0)                                                                                        1.00    616.0±0.73ns        ? ?/sec    1.09    672.6±1.21ns        ? ?/sec
append_rows 10 list(0) of u64(0)                                                                                              1.00    673.2±1.89ns        ? ?/sec    1.08    726.2±1.21ns        ? ?/sec
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.00    390.4±8.16µs        ? ?/sec    1.00    392.2±2.55µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.00     12.1±0.03µs        ? ?/sec    1.00     12.1±0.02µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.00     16.2±0.20µs        ? ?/sec    1.00     16.2±0.07µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.00      7.6±0.23µs        ? ?/sec    1.00      7.6±0.13µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.00     14.5±0.12µs        ? ?/sec    1.02     14.7±0.09µs        ? ?/sec
append_rows 4096 large_list(0) of u64(0)                                                                                      1.00    189.8±0.33µs        ? ?/sec    1.01    191.2±0.39µs        ? ?/sec
append_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                         16.60    16.4±0.25µs        ? ?/sec    1.00    988.9±1.61ns        ? ?/sec
append_rows 4096 list(0) of u64(0)                                                                                            1.02    191.2±0.58µs        ? ?/sec    1.00    187.8±0.30µs        ? ?/sec
append_rows 4096 list(0) sliced to 10 of u64(0)                                                                               14.47    16.5±0.26µs        ? ?/sec    1.00   1141.9±2.75ns        ? ?/sec
append_rows 4096 string view(10, 0)                                                                                           1.00     52.7±0.09µs        ? ?/sec    1.00     52.7±0.09µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.02     78.3±0.18µs        ? ?/sec    1.00     76.9±0.43µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.00     85.7±0.28µs        ? ?/sec    1.00     85.6±0.23µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.00     57.1±0.09µs        ? ?/sec    1.02     58.2±0.16µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.02     54.8±0.11µs        ? ?/sec    1.00     53.7±0.06µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.01     75.0±0.27µs        ? ?/sec    1.00     74.1±0.15µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.00     82.6±0.24µs        ? ?/sec    1.00     82.4±0.19µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.01    245.3±0.84µs        ? ?/sec    1.00    242.4±0.88µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.02     57.0±0.13µs        ? ?/sec    1.00     56.1±0.16µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.01     80.9±0.11µs        ? ?/sec    1.00     79.8±0.18µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.01    150.4±1.08µs        ? ?/sec    1.00    149.0±0.46µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.00    113.9±0.39µs        ? ?/sec    1.01    115.1±0.40µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.00     86.7±0.18µs        ? ?/sec    1.01     87.9±0.22µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.04     26.0±0.05µs        ? ?/sec    1.00     25.0±0.06µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.00     47.8±0.15µs        ? ?/sec    1.00     47.6±0.07µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.00     26.9±0.09µs        ? ?/sec    1.00     26.8±0.06µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.01      7.7±0.11µs        ? ?/sec    1.00      7.6±0.10µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.00     14.9±0.08µs        ? ?/sec    1.01     15.1±0.09µs        ? ?/sec
convert_columns 10 large_list(0) of u64(0)                                                                                    1.00    882.7±1.07ns        ? ?/sec    1.05    928.1±2.34ns        ? ?/sec
convert_columns 10 list(0) of u64(0)                                                                                          1.00    947.4±8.19ns        ? ?/sec    1.04    987.1±4.92ns        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.01    395.2±3.10µs        ? ?/sec    1.00    392.9±3.25µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.00     12.4±0.03µs        ? ?/sec    1.00     12.4±0.02µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.00     16.4±0.07µs        ? ?/sec    1.01     16.5±0.06µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.00      8.0±0.13µs        ? ?/sec    1.00      8.0±0.12µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.00     14.7±0.09µs        ? ?/sec    1.02     15.0±0.07µs        ? ?/sec
convert_columns 4096 large_list(0) of u64(0)                                                                                  1.00    190.4±0.49µs        ? ?/sec    1.00    191.3±0.40µs        ? ?/sec
convert_columns 4096 large_list(0) sliced to 10 of u64(0)                                                                     13.04    16.6±0.25µs        ? ?/sec    1.00   1273.2±2.51ns        ? ?/sec
convert_columns 4096 list(0) of u64(0)                                                                                        1.01    191.1±0.41µs        ? ?/sec    1.00    188.4±0.45µs        ? ?/sec
convert_columns 4096 list(0) sliced to 10 of u64(0)                                                                           11.87    16.7±0.24µs        ? ?/sec    1.00   1404.7±7.48ns        ? ?/sec
convert_columns 4096 string view(10, 0)                                                                                       1.01     53.3±0.08µs        ? ?/sec    1.00     52.9±0.06µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.00     77.8±0.16µs        ? ?/sec    1.03     80.2±0.45µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.00     86.1±0.27µs        ? ?/sec    1.00     86.1±0.26µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.00     57.5±0.12µs        ? ?/sec    1.02     58.4±0.13µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.02     55.4±0.92µs        ? ?/sec    1.00     54.1±0.10µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.00     74.4±0.34µs        ? ?/sec    1.03     76.8±0.80µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.00     83.0±0.18µs        ? ?/sec    1.00     82.9±0.23µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.00    242.1±0.87µs        ? ?/sec    1.01    244.0±1.23µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.02     57.2±0.13µs        ? ?/sec    1.00     56.3±0.06µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.01     82.0±0.23µs        ? ?/sec    1.00     80.8±0.13µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.00    152.3±0.85µs        ? ?/sec    1.00    152.0±1.02µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.00    115.9±0.41µs        ? ?/sec    1.00    116.2±0.49µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.02     89.0±0.22µs        ? ?/sec    1.00     87.7±0.18µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.04     26.8±0.06µs        ? ?/sec    1.00     25.8±0.10µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.02     49.8±0.20µs        ? ?/sec    1.00     48.7±0.17µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.00     27.4±0.09µs        ? ?/sec    1.00     27.5±0.07µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.00      7.8±0.11µs        ? ?/sec    1.00      7.8±0.11µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.00     15.1±0.08µs        ? ?/sec    1.01     15.3±0.07µs        ? ?/sec
convert_columns_prepared 10 large_list(0) of u64(0)                                                                           1.00    673.7±0.68ns        ? ?/sec    1.10    739.0±1.59ns        ? ?/sec
convert_columns_prepared 10 list(0) of u64(0)                                                                                 1.00    731.3±4.75ns        ? ?/sec    1.08    792.7±2.35ns        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.01    391.8±2.48µs        ? ?/sec    1.00    389.0±2.07µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.00     12.2±0.02µs        ? ?/sec    1.00     12.3±0.02µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.00     16.3±0.08µs        ? ?/sec    1.00     16.4±0.08µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.00      7.7±0.08µs        ? ?/sec    1.02      7.9±0.13µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.00     14.6±0.11µs        ? ?/sec    1.02     14.9±0.10µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) of u64(0)                                                                         1.00    190.7±1.08µs        ? ?/sec    1.00    191.4±0.43µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) sliced to 10 of u64(0)                                                            15.16    16.4±0.24µs        ? ?/sec    1.00   1078.3±1.39ns        ? ?/sec
convert_columns_prepared 4096 list(0) of u64(0)                                                                               1.02    191.3±0.52µs        ? ?/sec    1.00    188.0±0.43µs        ? ?/sec
convert_columns_prepared 4096 list(0) sliced to 10 of u64(0)                                                                  13.61    16.7±0.20µs        ? ?/sec    1.00   1228.9±3.14ns        ? ?/sec
convert_columns_prepared 4096 string view(10, 0)                                                                              1.01     53.3±0.12µs        ? ?/sec    1.00     52.8±0.10µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.02     78.5±0.23µs        ? ?/sec    1.00     77.3±0.75µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.00     86.0±0.32µs        ? ?/sec    1.00     86.1±0.34µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.00     57.3±0.14µs        ? ?/sec    1.02     58.3±0.12µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.02     55.0±0.16µs        ? ?/sec    1.00     53.9±0.08µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.01     74.6±0.18µs        ? ?/sec    1.00     74.0±0.22µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.01     83.4±0.20µs        ? ?/sec    1.00     82.5±0.21µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.00    242.5±1.22µs        ? ?/sec    1.00    243.5±0.87µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.02     57.2±0.19µs        ? ?/sec    1.00     56.2±0.11µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.01     81.7±0.08µs        ? ?/sec    1.00     81.2±0.10µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    150.0±0.84µs        ? ?/sec    1.00    149.7±1.07µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.00    114.6±0.30µs        ? ?/sec    1.01    115.3±0.23µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.00     86.9±0.24µs        ? ?/sec    1.01     87.8±0.31µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.05     26.3±0.04µs        ? ?/sec    1.00     25.1±0.04µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.00     48.0±0.26µs        ? ?/sec    1.00     47.8±0.08µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.00     27.0±0.06µs        ? ?/sec    1.00     26.9±0.08µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.7±0.11µs        ? ?/sec    1.00      7.7±0.10µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.00     15.0±0.09µs        ? ?/sec    1.02     15.2±0.06µs        ? ?/sec
convert_rows 10 large_list(0) of u64(0)                                                                                       1.07   1691.5±5.10ns        ? ?/sec    1.00   1585.0±4.88ns        ? ?/sec
convert_rows 10 list(0) of u64(0)                                                                                             1.00   1719.0±4.68ns        ? ?/sec    1.01   1743.2±6.51ns        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.02    290.1±1.76µs        ? ?/sec    1.00    285.3±2.15µs        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.00     17.2±0.14µs        ? ?/sec    1.10     18.9±0.05µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.00     17.2±0.04µs        ? ?/sec    1.10     19.0±0.03µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.02     34.6±0.06µs        ? ?/sec    1.00     33.7±0.04µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.02     34.6±0.10µs        ? ?/sec    1.00     33.8±0.07µs        ? ?/sec
convert_rows 4096 large_list(0) of u64(0)                                                                                     1.00    265.8±0.58µs        ? ?/sec    1.00    267.1±0.67µs        ? ?/sec
convert_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                        1.00   1916.5±7.47ns        ? ?/sec    1.06      2.0±0.01µs        ? ?/sec
convert_rows 4096 list(0) of u64(0)                                                                                           1.00    255.5±0.56µs        ? ?/sec    1.00    256.5±0.85µs        ? ?/sec
convert_rows 4096 list(0) sliced to 10 of u64(0)                                                                              1.00      2.1±0.01µs        ? ?/sec    1.02      2.2±0.01µs        ? ?/sec
convert_rows 4096 string view(10, 0)                                                                                          1.02     71.0±0.22µs        ? ?/sec    1.00     69.5±0.12µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.02    121.4±0.42µs        ? ?/sec    1.00    119.5±1.79µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.02    113.1±0.30µs        ? ?/sec    1.00    111.2±0.27µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.00     82.1±0.22µs        ? ?/sec    1.01     83.1±0.21µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.05     56.8±0.11µs        ? ?/sec    1.00     54.1±0.08µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.02    106.2±0.48µs        ? ?/sec    1.00    104.5±0.44µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.02    102.4±0.33µs        ? ?/sec    1.00    100.8±0.34µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.04    295.2±3.35µs        ? ?/sec    1.00    284.2±1.58µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.04     70.4±0.20µs        ? ?/sec    1.00     68.0±0.32µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.05     56.9±0.16µs        ? ?/sec    1.00     54.3±0.09µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.01    105.6±0.37µs        ? ?/sec    1.00    104.4±0.38µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.02    102.9±0.40µs        ? ?/sec    1.00    100.9±0.30µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.03     70.4±0.28µs        ? ?/sec    1.00     68.1±0.23µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.05     56.9±0.15µs        ? ?/sec    1.00     54.2±0.13µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.00    105.6±0.56µs        ? ?/sec    1.00    105.3±0.45µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.03     70.0±0.22µs        ? ?/sec    1.00     68.0±0.20µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.01     33.5±0.07µs        ? ?/sec    1.00     33.2±0.07µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.01     33.6±0.09µs        ? ?/sec    1.00     33.2±0.05µs        ? ?/sec
iterate rows                                                                                                                  1.00      3.3±0.01µs        ? ?/sec    1.00      3.3±0.00µs        ? ?/sec
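
For readers wondering why only the `sliced to 10` rows of `append_rows` and `convert_columns*` change by more than 10x: slicing a list array shrinks the offsets window but leaves the child values buffer untouched, so encoding the whole child does work proportional to the unsliced array. Below is a minimal std-only sketch of that idea. It is not the arrow-rs implementation; `SimpleList`, `slice`, `referenced_values` and `rebased_offsets` are hypothetical names used only for illustration.

```rust
/// Hypothetical, simplified stand-in for a list array (not an arrow-rs type):
/// row `i` covers `values[offsets[i]..offsets[i + 1]]`.
struct SimpleList<'a> {
    offsets: &'a [usize],
    values: &'a [u64],
}

impl<'a> SimpleList<'a> {
    /// Zero-copy row slice: only the offsets window shrinks,
    /// the shared child `values` buffer is kept as-is.
    fn slice(&self, start: usize, len: usize) -> SimpleList<'a> {
        let offsets: &'a [usize] = self.offsets;
        SimpleList {
            // `len` rows need `len + 1` offsets.
            offsets: &offsets[start..=start + len],
            values: self.values,
        }
    }

    /// The child values actually referenced by the remaining rows.
    /// This is the only part that needs to be encoded.
    fn referenced_values(&self) -> &'a [u64] {
        let values: &'a [u64] = self.values;
        let first = self.offsets[0];
        let last = self.offsets[self.offsets.len() - 1];
        &values[first..last]
    }

    /// Offsets shifted by the first offset so that they index into
    /// `referenced_values()` instead of the full child buffer.
    fn rebased_offsets(&self) -> Vec<usize> {
        let first = self.offsets[0];
        self.offsets.iter().map(|o| o - first).collect()
    }
}
```

With two rows, `[1]` and `[1, 2, ..., 1000]`, sliced to the first row, `values` still holds 1001 elements while `referenced_values()` holds one, which is roughly the situation the `sliced to 10` benchmarks exercise.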

@alamb alamb merged commit 4fcffa5 into apache:main Jul 28, 2025
13 checks passed
@alamb
Contributor

alamb commented Jul 28, 2025

Thank you @rluvaton

Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

RowConverter on list should only encode the sliced list values and not the entire data