-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Add partial_sort.slt test for partially sorted data #16900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Do you want to see some PartialSortExec in these new tests? |
01)Sort: data.a ASC NULLS LAST, data.b ASC NULLS LAST | ||
02)--TableScan: data projection=[a, b] | ||
physical_plan | ||
01)SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], preserve_partitioning=[false] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to see some PartialSortExec in these new tests?
@berkaysynnada Yes, I expect to see PartialSortExec here specifically
5 | ||
|
||
statement ok | ||
CREATE EXTERNAL TABLE data ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then you need to create the table as "CREATE UNBOUNDED EXTERNAL TABLE"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to use PartialSortExec for normal tables? It seems like it is more efficient for bounded tables too (it doesn't have to buffer the entire input, for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems @EeshanBembi has already created a PR to use PartialSortExec in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a reason not to use PartialSortExec for normal tables? It seems like it is more efficient for bounded tables too (it doesn't have to buffer the entire input, for example)
I'm not sure about the efficiency. If there is no spilling work etc., collecting all batches and sorting them in one shot could be better, but needs a benchmark to be sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense -- thank you for the clarification
run extended tests |
Which issue does this PR close?
PartialSortExec
) #16899Rationale for this change
I was trying to write some tests for #16881 from @berkaysynnada but I hit #16899. So let's add some tests to show what is happening
What changes are included in this PR?
Add a new
partial_sorts.slt
test showing whenPartialSort
should be being usedAre these changes tested?
Only tests
Are there any user-facing changes?
No