-
Notifications
You must be signed in to change notification settings - Fork 1.8k
chore: update Repartition DisplayAs to indicate maintained sort order #18673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Taking a look at this one shortly |
gabotechs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say I'm not fully convinced about this change. The extra added information is redundant with information already present in the plan display, as it's not really adding anything new, and it can make the plans more verbose without adding any real information.
I'd love to weight opinions of other contributors though.
| if self.preserve_order { | ||
| write!(f, ", preserve_order=true")?; | ||
| } else if input_partition_count <= 1 { | ||
| write!(f, ", maintains_sort_order=true")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the impression that this maintains_sort_order here might be confusing, as it's not a real property of RepartitionExec, and instead is something artificially derived from other properties that are already displayed, and therefore, redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- this was confusing at first. Reading @ruchirK 's explanation in the description makes sense, but I think it is somewhat subtle. Maybe we can add some comments to the code to make it clearer.
I'll leave a suggestion
Thank you for the thoughtful and prompt review @gabotechs ! My understanding of the original task was that in some cases it might be helpful to display some redundant information to make plan diagrams easier to parse at a glance. The way I see it, there's two related questions here.
My personal take is to generally agree with you that these changes are not super big usability wins. I'd love to here @gene-bordegaray 's take as well. For now, I'll update the PR to remove the |
26a1c57 to
4cd99c3
Compare
|
I think at this point there's a lot of subjectivity here. I personally prefer a less cluttered display that does not contain redundant information. Following that same rule I'd also expect other plans like My point of view is that adding the extra information proposed in this PR is making the plan slightly less readable and slightly more confusing, so I cannot in good conscience give a +1 to this myself, but I'm not the only one reading plans here, and I don't have a strong opinion, so would love to hear opinions from other contributors and agree to what others think is best. Maybe @adriangb or @Jefffrey? |
datafusion/core/tests/physical_optimizer/enforce_distribution.rs
Outdated
Show resolved
Hide resolved
datafusion/core/tests/physical_optimizer/enforce_distribution.rs
Outdated
Show resolved
Hide resolved
|
I agree I wouldn’t want to see this in every plan but if the input is sorted it would be helpful to see that RepartitionExec preserves that. |
I'm of a similar mind here; as the PR currently is, the added information isn't particularly helpful for a lot of the plans I'm seeing changed |
679c489 to
9539583
Compare
gabotechs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still maintain the personal preference of https://github.com/apache/datafusion/pull/18673/files#r2524672693, but as some people find this useful then +1 on my side. Thanks for this work!
9539583 to
b63f77a
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ruchirK and @gabotechs
I have a few suggestions, but I also think this PR could be merged as is if we want
| if self.preserve_order { | ||
| write!(f, ", preserve_order=true")?; | ||
| } else if input_partition_count <= 1 { | ||
| write!(f, ", maintains_sort_order=true")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- this was confusing at first. Reading @ruchirK 's explanation in the description makes sense, but I think it is somewhat subtle. Maybe we can add some comments to the code to make it clearer.
I'll leave a suggestion
Previously, it was not obvious reading the plan diagram when a Repartition operator maintained sortedness by virtue of having a single input partition even if preserve_sort order was false. This commit makes the implicit sortedness preservation explicit in the plan diagram. This commit does not change anything for the case when preserve sort order is false, or if the input was not originally sorted.
b63f77a to
31d954e
Compare
|
@alamb thanks for the review I've followed all of your suggestions! |
Previously, it was not obvious reading the plan diagram when a Repartition operator maintained sortedness by virtue of having a single input partition even if preserve_sort order was false. This commit makes the implicit sortedness preservation explicit in the plan diagram. This commit does not change anything for the case when preserve sort order is false.
Which issue does this PR close?
Are these changes tested?
Yes
Are there any user-facing changes?
No