Run SortPreservingMerge Inputs Concurrently #6162

@tustvold

Description

Is your feature request related to a problem or challenge?

SortPreservingMerge currently interleaves the execution of its input streams, so all of its inputs are evaluated serially.

This is unfortunate if its inputs perform a potentially expensive operation, such as decoding Parquet.

Describe the solution you'd like

Perhaps we should do something similar to RepartitionExec and spawn the inputs into separate tokio tasks, allowing their execution to be scheduled in parallel.
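To make the idea concrete, here is a rough, standard-library-only sketch of the shape this could take. It is not DataFusion code: OS threads and bounded `std::sync::mpsc` channels stand in for tokio tasks and async channels, plain `i64` values stand in for record batches, and the names `spawn_input` and `merge_concurrently` are hypothetical. Each input is assumed to be sorted already.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for an expensive input stream (e.g. decoding a file).
/// The input is assumed to be sorted ascending. It is produced on its own
/// thread and handed over through a bounded channel, so a fast producer can
/// only run `buffer` items ahead of the merge.
fn spawn_input(sorted: Vec<i64>, buffer: usize) -> Receiver<i64> {
    let (tx, rx) = sync_channel(buffer);
    thread::spawn(move || {
        for value in sorted {
            // Stop early if the merging side has been dropped.
            if tx.send(value).is_err() {
                break;
            }
        }
    });
    rx
}

/// Merge already-sorted inputs while each one is produced concurrently.
fn merge_concurrently(inputs: Vec<Vec<i64>>, buffer: usize) -> Vec<i64> {
    // Start every input before pulling from any of them.
    let receivers: Vec<Receiver<i64>> = inputs
        .into_iter()
        .map(|input| spawn_input(input, buffer))
        .collect();

    // Min-heap of (current head value, input index).
    let mut heap = BinaryHeap::new();
    for (idx, rx) in receivers.iter().enumerate() {
        if let Ok(value) = rx.recv() {
            heap.push(Reverse((value, idx)));
        }
    }

    let mut merged = Vec::new();
    while let Some(Reverse((value, idx))) = heap.pop() {
        merged.push(value);
        // Refill from the input that just yielded; `Err` means it is exhausted.
        if let Ok(next) = receivers[idx].recv() {
            heap.push(Reverse((next, idx)));
        }
    }
    merged
}
```

With this sketch, `merge_concurrently(vec![vec![1, 4, 7], vec![2, 5, 8], vec![3, 6, 9]], 2)` returns the values 1 through 9 in order. The merge loop is unchanged from a serial merge; the difference is that the work of producing each input happens off the merging thread, and the bounded buffer keeps memory use in check.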

Describe alternatives you've considered

Longer term, a true morsel-driven concurrency story might be nice (#2504) and would avoid needing to manually parallelise operators such as SortPreservingMerge. However, this might be a shorter-term mechanism to boost performance.

Additional context

No response

Metadata

Labels

enhancement (New feature or request)
