Add support for multiple partitions with SortExec (#362) #378
Conversation
Currently sort needs a single partition, as otherwise the output partitions are not sorted with respect to each other. A MergeExec is currently added based on this requirement. So I think this won't work until we have the implementation to merge the sorted partitions, which you are working on in #379.
As @Dandandan pointed out, at the moment there is an assumption in the planner that a SortExec produces a single sorted partition as its output. When SortExec reports that it requires a single stream for its input, part of the planner puts a MergeExec (confusing name, I know) before it to get a single stream. This is why the unit tests are failing on this PR (because the output order is wrong in some queries).
Perhaps one way to allow a SortExec to run on multiple partitions would be to pass the desired output partitioning to the SortExec constructor, and then use that request to decide what the input to the SortExec should be.
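A minimal sketch of that suggestion: the operator records whether it should preserve its input partitioning, and reports its input requirement accordingly. All names below (`Distribution`, `new_with_partitioning`, etc.) are illustrative, not the actual DataFusion API.

```rust
// Hypothetical sketch: SortExec records the desired partitioning behaviour
// at construction time, and the planner consults the reported input
// distribution to decide whether to insert a MergeExec.
#[derive(Debug, PartialEq)]
enum Distribution {
    SinglePartition,          // input must be merged into one stream first
    UnspecifiedDistribution,  // any partitioning is acceptable
}

struct SortExec {
    preserve_partitioning: bool,
}

impl SortExec {
    /// Original behaviour: requires a single input partition.
    fn new() -> Self {
        SortExec { preserve_partitioning: false }
    }

    /// Opt-in behaviour: sort each input partition independently.
    fn new_with_partitioning(preserve_partitioning: bool) -> Self {
        SortExec { preserve_partitioning }
    }

    /// What this operator requires of its input to be correct.
    fn required_input_distribution(&self) -> Distribution {
        if self.preserve_partitioning {
            Distribution::UnspecifiedDistribution
        } else {
            Distribution::SinglePartition
        }
    }
}

fn main() {
    // The default constructor still demands a single partition...
    assert_eq!(
        SortExec::new().required_input_distribution(),
        Distribution::SinglePartition
    );
    // ...while the opt-in constructor accepts any input partitioning.
    assert_eq!(
        SortExec::new_with_partitioning(true).required_input_distribution(),
        Distribution::UnspecifiedDistribution
    );
}
```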
I've added a new constructor that allows opting into the new behaviour. I wasn't aware of the way that MergeExec is plumbed into the plans and that this change would break it. I do wonder if, instead of relying on an optimizer pass to insert the MergeExec, the planner should produce a correct plan directly.
I think you are right about that; we should not rely on the optimizer to make the execution plan correct. I think it would be better if the planner adds the MergeExec itself.
I did some more digging into this and created #412 to track the fact that PhysicalPlanner currently creates plans that are incorrect, and that only become correct after the optimizer passes have run. However, I think the issue is actually a bit more subtle than I first realised. The operators inserted by the PhysicalPlanner must somehow remember the partitioning they need in order to be correct, so that the optimiser cannot break them; simply adding a MergeExec when generating the initial plan is insufficient. There are a couple of ways I can see this being handled:
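The hazard described above can be sketched as follows: an optimizer pass that repartitions inputs for parallelism must consult each operator's declared requirement, otherwise it silently breaks correctness. The names here are illustrative, not the real DataFusion APIs.

```rust
// Hypothetical sketch: operators declare the input distribution they need,
// and a repartitioning optimizer pass must respect that declaration.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Distribution {
    SinglePartition,
    Unspecified,
}

trait ExecPlan {
    /// What the operator needs from its input to produce correct results.
    fn required_input_distribution(&self) -> Distribution;
}

/// A sort that produces one globally sorted stream needs a single partition.
struct Sort;
impl ExecPlan for Sort {
    fn required_input_distribution(&self) -> Distribution {
        Distribution::SinglePartition
    }
}

/// A filter is correct under any partitioning.
struct Filter;
impl ExecPlan for Filter {
    fn required_input_distribution(&self) -> Distribution {
        Distribution::Unspecified
    }
}

/// A well-behaved repartitioning pass only adds parallelism below operators
/// that do not demand a single partition.
fn may_repartition(op: &dyn ExecPlan) -> bool {
    op.required_input_distribution() != Distribution::SinglePartition
}

fn main() {
    assert!(!may_repartition(&Sort));
    assert!(may_repartition(&Filter));
}
```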
I therefore think the addition of a new constructor is a step in the right direction. However, it is unfortunately insufficient, as nothing prevents the optimiser from subsequently altering the partitioning. Going to mark this as a draft for now, as the above will have implications for the best way forward here.
I agree with @tustvold and @Dandandan on this -- I think the plan should generate correct results without requiring optimizer passes being run. The optimizer passes should just (potentially) make the plans faster.
I agree
Is there any reason we can't call …
I think this is "as correct as current master" and therefore marking this as ready for review. It is impacted by #423 (the issue alluded to above regarding the optimiser passes).
alamb
left a comment
The only thing I think would make this PR better is tests, but I believe tests are added in #362, so I think we should merge this PR as is.
@Dandandan any thoughts?
Dandandan
left a comment
I think it's OK for the tests to follow later 👍 thanks @tustvold
…e#378)
* Add support for multiple partitions with SortExec
* make SortExec partitioning optional
Which issue does this PR close?
re #362
Rationale for this change
Once an order-preserving merge operator is added as part of #362, it will be possible to combine multiple sorted partitions into a single sorted partition, effectively yielding a partitioned sort. Loosening the restriction of SortExec to a single partition allows it to form the sort part of this.
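The merge step described above can be illustrated with a simplified in-memory k-way merge: several individually sorted partitions are combined into one sorted sequence by repeatedly taking the smallest head element. This is only a sketch of the idea behind the #362 operator, not the actual streaming implementation.

```rust
// Simplified k-way merge of sorted partitions using a min-heap
// (std::cmp::Reverse turns Rust's max-heap into a min-heap).
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn merge_sorted_partitions(partitions: Vec<Vec<i32>>) -> Vec<i32> {
    // Heap entries are (next value, partition index, offset in partition);
    // the tuple ordering means the smallest pending value is popped first.
    let mut heap = BinaryHeap::new();
    for (i, p) in partitions.iter().enumerate() {
        if let Some(&v) = p.first() {
            heap.push(Reverse((v, i, 0usize)));
        }
    }

    let mut out = Vec::new();
    while let Some(Reverse((v, i, off))) = heap.pop() {
        out.push(v);
        // Advance within the partition the value came from.
        if let Some(&next) = partitions[i].get(off + 1) {
            heap.push(Reverse((next, i, off + 1)));
        }
    }
    out
}

fn main() {
    let merged = merge_sorted_partitions(vec![vec![1, 4, 7], vec![2, 5], vec![3, 6]]);
    assert_eq!(merged, vec![1, 2, 3, 4, 5, 6, 7]);
}
```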
What changes are included in this PR?
SortExec is no longer restricted to a single partition; instead it preserves the partitioning of its inputs.
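Concretely, the partition-preserving behaviour means each input partition is sorted independently, so the output has the same number of partitions as the input, each individually sorted. A minimal in-memory sketch (not the real operator):

```rust
// Sketch of partition-preserving sort: sort each partition on its own,
// leaving the partition count and membership unchanged.
fn sort_each_partition(mut partitions: Vec<Vec<i32>>) -> Vec<Vec<i32>> {
    for p in partitions.iter_mut() {
        p.sort();
    }
    partitions
}

fn main() {
    let out = sort_each_partition(vec![vec![3, 1, 2], vec![9, 7]]);
    // Two partitions in, two (sorted) partitions out.
    assert_eq!(out, vec![vec![1, 2, 3], vec![7, 9]]);
}
```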