interleave_views is really slow #7688

@Dandandan

Is your feature request related to a problem or challenge?

I ran the sort_tpch benchmark in DataFusion and saw that interleave_views takes up a large share of the time spent sorting.

It accounts for roughly 17% of all samples. SortPreservingMergeExec accounts for 77% in total, so interleave_views is somewhere between a fifth and a quarter of that operator's samples (17/77 ≈ 22%).

Looking at the samples, it shows that a lot of time is spent managing a hashmap, rehashing, allocating, etc.

Describe the solution you'd like

I am not 100% sure what the purpose of the hashmap is here, but we should be able to remove most of this overhead.

I think this can be combined with the improvements being made to concat and coalesce. @alamb
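
To make the idea concrete, here is a minimal sketch of one way hashing could be taken out of the hot loop. It is not the arrow-rs implementation. It assumes the hashmap exists to map each referenced (array, buffer) pair to a buffer slot in the output, which this issue does not confirm. `View`, `remap_with_hashmap` and `remap_with_table` are hypothetical names, and the view layout is simplified (real Arrow views can also hold short values inline).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view: a slice of one of the array's data buffers.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    buffer_index: u32,
    offset: u32,
    len: u32,
}

/// Assumed baseline: a hashmap keyed by (array index, buffer index)
/// assigns each referenced buffer a slot in the output.
/// Returns the remapped views and the source buffers in output order.
fn remap_with_hashmap(
    views: &[Vec<View>],
    indices: &[(usize, usize)],
) -> (Vec<View>, Vec<(usize, u32)>) {
    let mut map: HashMap<(usize, u32), u32> = HashMap::new();
    let mut buffers: Vec<(usize, u32)> = Vec::new();
    let mut out = Vec::with_capacity(indices.len());
    for &(array_idx, row) in indices {
        let v = views[array_idx][row];
        let key = (array_idx, v.buffer_index);
        // One hash computation (and possibly a rehash) per output row.
        let new_idx = *map.entry(key).or_insert_with(|| {
            buffers.push(key);
            (buffers.len() - 1) as u32
        });
        out.push(View { buffer_index: new_idx, ..v });
    }
    (out, buffers)
}

/// Alternative sketch: one flat lookup table per input array, sized by
/// its buffer count, so each row costs two index operations and no hashing.
fn remap_with_table(
    views: &[Vec<View>],
    buffer_counts: &[usize],
    indices: &[(usize, usize)],
) -> (Vec<View>, Vec<(usize, u32)>) {
    const UNSET: u32 = u32::MAX; // sentinel: buffer not yet seen
    let mut table: Vec<Vec<u32>> =
        buffer_counts.iter().map(|&n| vec![UNSET; n]).collect();
    let mut buffers: Vec<(usize, u32)> = Vec::new();
    let mut out = Vec::with_capacity(indices.len());
    for &(array_idx, row) in indices {
        let v = views[array_idx][row];
        let slot = &mut table[array_idx][v.buffer_index as usize];
        if *slot == UNSET {
            // First use of this buffer: give it the next output slot.
            *slot = buffers.len() as u32;
            buffers.push((array_idx, v.buffer_index));
        }
        out.push(View { buffer_index: *slot, ..v });
    }
    (out, buffers)
}
```

Both functions produce the same output for the same input; the table version trades a small up-front allocation per input array for constant-time lookups. Whether that is a net win for interleave_views would need to be measured on the sort_tpch benchmark.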


Labels: arrow, enhancement, performance
