Closed
Labels
arrow, enhancement, performance
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The BatchCoalescer's push_batch API incrementally builds up an array and produces a final output.
- GenericInProgressArray is a generic implementation that works by buffering ArrayRef values and then calling concat
- There are specialized implementations, such as InProgressByteViewArray, that are more efficient for certain data types (implemented in Optimize coalesce kernel for StringView (10-50% faster) #7650)
The specialized implementations can go quite a bit faster (30-50%, depending on the case).
Describe the solution you'd like
Improved performance, as measured by benchmarks for the data type named above
cargo bench --bench coalesce_kernels
Describe alternatives you've considered
For primitive arrays, I think we could use a Vec<NativeType> to build up the in-progress data and then convert that to the appropriate array type pretty easily.
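To make the Vec-based idea concrete, here is a minimal, standard-library-only sketch. It is not the arrow-rs implementation: the names InProgressPrimitive, Chunk, target_len, and take_chunk are hypothetical, input batches are modeled as slices of Option<T> rather than real arrays, and the validity mask is a plain Vec<bool> standing in for a null buffer. The point it illustrates is that values are copied straight into one contiguous buffer as batches arrive, so no list of buffered arrays and no final concat call is needed.

```rust
/// Hypothetical in-progress builder for one primitive column.
/// This is an illustration only, not the arrow-rs API.
#[derive(Debug)]
pub struct InProgressPrimitive<T: Copy + Default> {
    /// Number of rows each output chunk should contain.
    target_len: usize,
    /// Contiguous value buffer (the `Vec<NativeType>` from the proposal).
    values: Vec<T>,
    /// One flag per row: true = valid, false = null.
    validity: Vec<bool>,
}

/// A finished output chunk: values plus validity mask
/// (a stand-in for a primitive array with a null buffer).
#[derive(Debug, PartialEq)]
pub struct Chunk<T> {
    pub values: Vec<T>,
    pub validity: Vec<bool>,
}

impl<T: Copy + Default> InProgressPrimitive<T> {
    pub fn new(target_len: usize) -> Self {
        assert!(target_len > 0, "target_len must be non-zero");
        Self {
            target_len,
            values: Vec::with_capacity(target_len),
            validity: Vec::with_capacity(target_len),
        }
    }

    /// Copy an input batch into the buffer and return every chunk
    /// that reached `target_len` rows while doing so.
    pub fn push_batch(&mut self, mut input: &[Option<T>]) -> Vec<Chunk<T>> {
        let mut done = Vec::new();
        while !input.is_empty() {
            // Take only as many rows as still fit in the current chunk.
            let room = self.target_len - self.values.len();
            let (head, rest) = input.split_at(room.min(input.len()));
            for &v in head {
                // Null slots store a default value and a `false` flag.
                self.values.push(v.unwrap_or_default());
                self.validity.push(v.is_some());
            }
            input = rest;
            if self.values.len() == self.target_len {
                done.push(self.take_chunk());
            }
        }
        done
    }

    /// Flush any remaining rows as a final, possibly short, chunk.
    pub fn finish(&mut self) -> Option<Chunk<T>> {
        if self.values.is_empty() {
            None
        } else {
            Some(self.take_chunk())
        }
    }

    /// Hand out the filled buffers and start fresh ones.
    fn take_chunk(&mut self) -> Chunk<T> {
        let values = std::mem::replace(&mut self.values, Vec::with_capacity(self.target_len));
        let validity = std::mem::replace(&mut self.validity, Vec::with_capacity(self.target_len));
        Chunk { values, validity }
    }
}
```

With target_len set to 4, pushing a 3-row batch returns nothing, pushing 2 more rows returns one full 4-row chunk, and finish flushes the single leftover row. In a real implementation the finished Vec would presumably be handed to the matching primitive array type without another copy, which is the "convert pretty easily" step mentioned above.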
Additional context
- The use case is described in detail in Optimize take/filter/concat from multiple input arrays to a single large output array #6692