Skip to content

Conversation

@ruchirK
Copy link
Contributor

@ruchirK ruchirK commented Nov 13, 2025

Previously, it was not obvious reading the plan diagram when a Repartition operator maintained sortedness by virtue of having a single input partition even if preserve_sort order was false. This commit makes the implicit sortedness preservation explicit in the plan diagram. This commit does not change anything for the case when preserve sort order is false.

Which issue does this PR close?

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Nov 13, 2025
@gabotechs
Copy link
Contributor

Taking a look at this one shortly

Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to say I'm not fully convinced about this change. The extra added information is redundant with information already present in the plan display, as it's not really adding anything new, and it can make the plans more verbose without adding any real information.

I'd love to weight opinions of other contributors though.

if self.preserve_order {
write!(f, ", preserve_order=true")?;
} else if input_partition_count <= 1 {
write!(f, ", maintains_sort_order=true")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the impression that this maintains_sort_order here might be confusing, as it's not a real property of RepartitionExec, and instead is something artificially derived from other properties that are already displayed, and therefore, redundant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this was confusing at first. Reading @ruchirK 's explanation in the description makes sense, but I think it is somewhat subtle. Maybe we can add some comments to the code to make it clearer.

I'll leave a suggestion

@ruchirK
Copy link
Contributor Author

ruchirK commented Nov 14, 2025

I have to say I'm not fully convinced about this change. The extra added information is redundant with information already present in the plan display, as it's not really adding anything new, and it can make the plans more verbose without adding any real information.

I'd love to weight opinions of other contributors though.

Thank you for the thoughtful and prompt review @gabotechs ! My understanding of the original task was that in some cases it might be helpful to display some redundant information to make plan diagrams easier to parse at a glance.

The way I see it, there's two related questions here.

  1. Should we add some new text to the plan diagram for cases when the RepartitionExec operator preserves sortedness? There's at least 3 answers here.
    a. No, we already display input_partitions=1 which is sufficient.
    b. Yes, but only in the implicit case. Display maintains_sort_order=true when there's only 1 partition. If preserve_order is true just display that, and don't show maintains_sort_order. That's the intended behavior of this PR.
    c. Yes, but go further. Fully remove preserve_sort_order=true and only display maintains_sort_order. This would, as mentioned in the gh issue, conflate the implicit and explicit decisions.

  2. When sortedness is implicitly maintained in the single partition case, should we show something in the plan diagram indicating what the implicitly maintained sort order is? Note that this is a little bit independent of question 1 and this was my intent with the sort_exprs change.

My personal take is to generally agree with you that these changes are not super big usability wins. I'd love to here @gene-bordegaray 's take as well. For now, I'll update the PR to remove the sort_exprs change and then we can go from there.

@ruchirK ruchirK force-pushed the repartition-display branch from 26a1c57 to 4cd99c3 Compare November 14, 2025 02:42
@gabotechs
Copy link
Contributor

gabotechs commented Nov 14, 2025

I think at this point there's a lot of subjectivity here.

I personally prefer a less cluttered display that does not contain redundant information. Following that same rule I'd also expect other plans like FilterExec or CoalesceBatchesExec to display wether they are implicitly maintaining the input ordering or not, but we don't seem to be doing that here, and I agree that it would make the cluttering even worst.

My point of view is that adding the extra information proposed in this PR is making the plan slightly less readable and slightly more confusing, so I cannot in good conscience give a +1 to this myself, but I'm not the only one reading plans here, and I don't have a strong opinion, so would love to hear opinions from other contributors and agree to what others think is best. Maybe @adriangb or @Jefffrey?

@adriangb
Copy link
Contributor

I agree I wouldn’t want to see this in every plan but if the input is sorted it would be helpful to see that RepartitionExec preserves that.

@Jefffrey
Copy link
Contributor

I agree I wouldn’t want to see this in every plan but if the input is sorted it would be helpful to see that RepartitionExec preserves that.

I'm of a similar mind here; as the PR currently is, the added information isn't particularly helpful for a lot of the plans I'm seeing changed

@ruchirK ruchirK force-pushed the repartition-display branch 4 times, most recently from 679c489 to 9539583 Compare November 19, 2025 18:05
Copy link
Contributor

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still maintain the personal preference of https://github.com/apache/datafusion/pull/18673/files#r2524672693, but as some people find this useful then +1 on my side. Thanks for this work!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ruchirK and @gabotechs

I have a few suggestions, but I also think this PR could be merged as is if we want

if self.preserve_order {
write!(f, ", preserve_order=true")?;
} else if input_partition_count <= 1 {
write!(f, ", maintains_sort_order=true")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- this was confusing at first. Reading @ruchirK 's explanation in the description makes sense, but I think it is somewhat subtle. Maybe we can add some comments to the code to make it clearer.

I'll leave a suggestion

Previously, it was not obvious reading the plan diagram when a Repartition
operator maintained sortedness by virtue of having a single input partition even if
preserve_sort order was false. This commit makes the implicit sortedness preservation
explicit in the plan diagram. This commit does not change anything for the case when
preserve sort order is false, or if the input was not originally sorted.
@ruchirK ruchirK force-pushed the repartition-display branch from b63f77a to 31d954e Compare November 21, 2025 18:44
@ruchirK
Copy link
Contributor Author

ruchirK commented Nov 21, 2025

@alamb thanks for the review I've followed all of your suggestions!

@gabotechs
Copy link
Contributor

Thanks @ruchirK for the PR and @adriangb, @Jefffrey, @NGA-TRAN and @alamb for the input!

@gabotechs gabotechs added this pull request to the merge queue Nov 22, 2025
Merged via the queue into apache:main with commit 5f22722 Nov 22, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Display sort order in Hash Repartition

8 participants