-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix Partial Sort Get Slice Point Between Batches #16881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @berkaysynnada -- I went over the PR carefully and I think it makes sense to me
The only thing I would also like to see is a SQL level test. However, I was not able to write one as I can't seem to get PartialSortExec
to appear in the tests. I filed a ticket about this:
Can you check my last commit? I think it's what you want but I'm not sure about the location |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thanks @berkaysynnada
* Update partial_sort.rs * Update partial_sort.rs * Update partial_sort.rs * add sql test
Which issue does this PR close?
Rationale for this change
PartialSortExec had a missing functionality where it failed to detect slice points when consecutive batches contained homogeneous sort key values (e.g., first batch: all rows with a=1, second batch: all rows with a=2). The get_slice_point method only checked within individual batches, missing cross-batch transitions.
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?