Default datafusion.optimizer.prefer_existing_sort to true #8572

@tustvold

Description

Is your feature request related to a problem or challenge?

Whilst working on #8540, I was surprised to find that removing unbounded causes the DataFusion optimizer to leave the SortExec in the plan below rather than removing it:

    "SortPreservingMergeExec: [a@0 ASC NULLS LAST]",
    "  SortExec: expr=[a@0 ASC NULLS LAST]",
    "    RepartitionExec: partitioning=Hash([c@1], 8), input_partitions=8",
    "      RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1",
    "        CsvExec: file_groups={1 group: [[file_path]]}, projection=[a, c, d], output_ordering=[a@0 ASC NULLS LAST], has_header=true",

After some spelunking, this appears to be a regression introduced by #7671 (comment).

Describe the solution you'd like

I can't see an obvious reason not to enable datafusion.optimizer.prefer_existing_sort by default. It seems like the more reasonable default, and it is consistent with how I remember DataFusion behaving historically.
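
For reference, the option can already be turned on explicitly per session. A minimal sketch, assuming a SQL interface that accepts SET statements for configuration options (for example datafusion-cli):

```sql
-- Opt in to preserving an existing input ordering instead of re-sorting.
-- Assumes the session accepts SET for datafusion.* configuration keys.
SET datafusion.optimizer.prefer_existing_sort = true;
```

With the change proposed here, this statement would no longer be needed, since true would be the default.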

Describe alternatives you've considered

No response

Additional context

FYI @alamb @ozankabak

Labels: enhancement
