-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
SortPreservingMerge now makes use of the arrow row format and this has yielded significant performance improvements over the prior DynComparator based approach. We can likely signifcantly improve the performance of SortExec by modifying sort_batch to also make use of the row format when performing multi-column sorts, instead of lexsort_to_indices which internally uses DynComparator.
For single-column sorts lexsort_to_indices will call through to sort_to_indices which will be faster than the row format, we should make sure to keep this special case.
Describe the solution you'd like
A first iteration could simply modify sort_batch to use the row format for multi-column sorts, as demonstrated here, falling back to sort_to_indices if only a single column.
A second iteration could then look to find a way to convert to the row format once, and preserve this encoding when feeding sorted batches into SortPreservingMerge.
Describe alternatives you've considered
We could not do this
Additional context