Closed
Labels
arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog), performance
Description
Is your feature request related to a problem or challenge?
I ran the sort_tpch benchmark in DataFusion and saw that interleave_views takes up a large share of the time.

interleave_views shows up in roughly 17% of the samples, all under SortPreservingMergeExec (77% of the samples), so it is about 25% of that operator's samples.
The samples show that much of that time is spent managing a hashmap: rehashing, allocating, etc.
Describe the solution you'd like
I am not 100% sure what the purpose of the hashmap is here, but we should be able to optimize this to a great extent.
I think we can combine it with the improvements made to concat and coalesce, @alamb.
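To make the idea concrete, here is a minimal std-only sketch. It assumes (this is not confirmed above) that the hashmap exists to remap each source (array, buffer) pair to a buffer index in the output, so that only referenced buffers are carried over. The `View` and `Source` types and both function names are hypothetical stand-ins, not the arrow crate's API. The second function does the same remapping with a dense table allocated once, which avoids hashing and rehashing in the per-row loop.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view: only records which buffer of its
/// source array it points into (real views carry more fields).
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    buffer: u32,
}

/// Hypothetical source array: its views and the number of buffers it owns.
struct Source {
    views: Vec<View>,
    num_buffers: usize,
}

/// Remap via a HashMap keyed on (array index, buffer index).
/// Every row pays for a hash lookup, and the map may rehash as it grows.
fn remap_with_hashmap(
    sources: &[Source],
    indices: &[(usize, usize)],
) -> (Vec<View>, Vec<(usize, u32)>) {
    let mut lookup: HashMap<(usize, u32), u32> = HashMap::new();
    let mut out_buffers = Vec::new();
    let mut out_views = Vec::with_capacity(indices.len());
    for &(array, row) in indices {
        let view = sources[array].views[row];
        let new_idx = *lookup.entry((array, view.buffer)).or_insert_with(|| {
            out_buffers.push((array, view.buffer));
            (out_buffers.len() - 1) as u32
        });
        out_views.push(View { buffer: new_idx });
    }
    (out_views, out_buffers)
}

/// Marker for "this source buffer has no output index yet".
const UNSEEN: u32 = u32::MAX;

/// Same remapping with one pre-sized slot per (array, buffer):
/// a plain indexed load per row, no hashing.
fn remap_with_dense_table(
    sources: &[Source],
    indices: &[(usize, usize)],
) -> (Vec<View>, Vec<(usize, u32)>) {
    let mut lookup: Vec<Vec<u32>> = sources
        .iter()
        .map(|s| vec![UNSEEN; s.num_buffers])
        .collect();
    let mut out_buffers = Vec::new();
    let mut out_views = Vec::with_capacity(indices.len());
    for &(array, row) in indices {
        let view = sources[array].views[row];
        let slot = &mut lookup[array][view.buffer as usize];
        if *slot == UNSEEN {
            *slot = out_buffers.len() as u32;
            out_buffers.push((array, view.buffer));
        }
        out_views.push(View { buffer: *slot });
    }
    (out_views, out_buffers)
}

fn main() {
    let sources = vec![
        Source { views: vec![View { buffer: 0 }, View { buffer: 1 }, View { buffer: 1 }], num_buffers: 2 },
        Source { views: vec![View { buffer: 0 }, View { buffer: 2 }], num_buffers: 3 },
    ];
    // (array index, row index) pairs, as an interleave kernel would receive.
    let indices = [(1, 1), (0, 2), (0, 0), (1, 1), (0, 1)];
    let hashed = remap_with_hashmap(&sources, &indices);
    let dense = remap_with_dense_table(&sources, &indices);
    assert_eq!(hashed, dense);
    println!("{:?}", dense);
}
```

Both functions produce the same views and the same output buffer list; whether the dense table is actually faster for sort_tpch would need to be benchmarked.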
Describe alternatives you've considered
No response
Additional context
No response