Add an option to avoid grouping hash-partitioning #10257

@NGA-TRAN

Is your feature request related to a problem or challenge?

We have run into an issue in IOx, described here: a Union is converted to an Interleave if its inputs can be interleaved. The Interleave's job is to send the corresponding input partitions to the right output partition. If I understand correctly, its purpose is to keep data grouped in the same partitions, which is useful if downstream operators want data in that shape. As a result, pushing a sort down is not useful, because we do not need or want the data sorted in that case.
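To make the difference concrete, here is a minimal toy model of the two ways of combining partitioned inputs. It is not DataFusion code: `Partition`, `union`, and `interleave` are names made up for illustration, and rows are reduced to a single integer key. It only assumes that a Union exposes the partitions of all its inputs side by side, while an Interleave merges partition `i` of every input into output partition `i`.

```rust
/// Toy stand-in for a partition: a list of rows, each row reduced to one key.
/// (Illustrative type only; not a DataFusion type.)
type Partition = Vec<i64>;

/// Union-style combination: the output simply exposes every input partition,
/// so the output partition count is the sum of the input partition counts.
fn union(inputs: &[Vec<Partition>]) -> Vec<Partition> {
    inputs.iter().flatten().cloned().collect()
}

/// Interleave-style combination: output partition `i` is built from
/// partition `i` of every input, so rows stay grouped by partition index.
/// Assumes all inputs have the same number of partitions.
fn interleave(inputs: &[Vec<Partition>]) -> Vec<Partition> {
    let n = inputs.first().map_or(0, |input| input.len());
    assert!(
        inputs.iter().all(|input| input.len() == n),
        "all inputs must have the same partition count"
    );
    (0..n)
        .map(|i| {
            inputs
                .iter()
                .flat_map(|input| input[i].iter().copied())
                .collect()
        })
        .collect()
}
```

With two inputs of two partitions each, `union` yields four output partitions, whereas `interleave` yields two, each holding the rows that shared a partition index.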

However, in our IOx use case, even though the input data to the Union can be interleaved, a Projection above it (see the plan here) adds different constants ("m0" and "m1" in the example). If we add that constant as a column in the output, the data no longer interleave, even though their output_partitioning says they do. Furthermore, we do want that data sorted and hence need the sort pushdown.

Given the opposing needs described in the two paragraphs above, converting a Union to an Interleave is not always desirable, even when the data can interleave. This ticket is a request for a way to prevent that conversion.

Describe the solution you'd like

After chatting with @alamb, we propose adding a config option that tells enforce_distribution not to use Interleave at this step.
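A rough sketch of the shape such an option could take is below. Everything here is hypothetical: the flag name `prefer_existing_union`, the `OptimizerOptions` struct, and the `choose_union_operator` helper are assumptions for illustration, not the existing DataFusion configuration or the actual enforce_distribution code. The sketch only shows the decision being gated by a flag that defaults to the current behavior.

```rust
/// Which operator the (toy) optimizer step ends up using.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnionOperator {
    Union,
    Interleave,
}

/// Hypothetical optimizer options; not the real DataFusion config struct.
#[derive(Debug, Clone, Copy, Default)]
struct OptimizerOptions {
    /// Hypothetical flag name. When true, keep the existing Union even if
    /// its inputs could be interleaved. Defaults to false, which keeps the
    /// current behavior of converting to Interleave when possible.
    prefer_existing_union: bool,
}

/// Toy version of the decision: convert to Interleave only when the inputs
/// can interleave AND the user has not asked to keep the Union.
fn choose_union_operator(
    inputs_can_interleave: bool,
    options: &OptimizerOptions,
) -> UnionOperator {
    if inputs_can_interleave && !options.prefer_existing_union {
        UnionOperator::Interleave
    } else {
        UnionOperator::Union
    }
}
```

Defaulting the flag to `false` would leave existing plans unchanged, so only users with a case like the IOx one above would need to opt in.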

Describe alternatives you've considered

No response

Additional context

No response

Metadata

    Labels

    enhancement (New feature or request)
