Skip to content

Conversation

@mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #7794.

Rationale for this change

What changes are included in this PR?

In the issue, EnforceSorting moves up the sort that satisfies the window. In this case, operators that are below the SortExec do not have ordering at their inputs. However, some of them (SortPreservingRepartitionExec) expects its input to be ordered. While removing SortExec down the plan, we also replace operators with their corresponding variants that do not preserve order. With this change I believe SortPreservingRepartitionExec will be replaced with RepartitionExec in the issue.

Are these changes tested?

Yes, a new test is added to replicate problem in the issue.

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @mustafasrepo -- this would have taken me quite a while to figure out.

I reviewed this PR carefully and verified it solved my problem downstream.

// corresponding SortExecs together. Also, the inputs of these `SortExec`s
// are not necessarily the same to be able to remove them.
let expected_input = ["BoundedWindowAggExec: wdw=[count: Ok(Field { name: \"count\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(NULL), end_bound: CurrentRow }], mode=[Sorted]",
let expected_input = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff made it hard to see -- but I am pretty sure this test did not change, just the formatting did. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just inserted new line

" SortExec: expr=[a@0 ASC,b@1 ASC]",
" CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",
];
let expected_optimized = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the code in this PR, I verified this test failed. The plan showed the same pattern as described in #7794

    "BoundedWindowAggExec: wdw=[count: Ok(Field { name: \"count\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(NULL), end_bound: CurrentRow }], mode=[Sorted]",
    "  SortPreservingMergeExec: []",
    "    SortExec: expr=[a@0 ASC,b@1 ASC]",
    "      SortPreservingRepartitionExec: partitioning=RoundRobinBatch(10), input_partitions=10",
    "        RepartitionExec: partitioning=RoundRobinBatch(10), input_partitions=1",
    "          CsvExec: file_groups={1 group: [[x]]}, projection=[a, b, c, d, e], has_header=false",

@alamb
Copy link
Contributor

alamb commented Oct 12, 2023

I don't see any reason to delay merging this fix. Thanks again @mustafasrepo

@alamb alamb merged commit 036dc48 into apache:main Oct 12, 2023
" SortExec: expr=[nullable_col@0 DESC NULLS LAST]",
" ParquetExec: file_groups={1 group: [[x]]}, projection=[nullable_col, non_nullable_col], output_ordering=[nullable_col@0 ASC]"];
let expected_optimized = ["WindowAggExec: wdw=[count: Ok(Field { name: \"count\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: CurrentRow, end_bound: Following(NULL) }]",
let expected_optimized = [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to above case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Invalid plans result in SortPreservingRepartitionExec with no SortExprs

2 participants