Skip to content

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 24, 2025

Which issue does this PR close?

Rationale for this change

I was trying to write some tests for #16881 from @berkaysynnada but I hit #16899. So let's add some tests to show what is happening

What changes are included in this PR?

Add a new partial_sorts.slt test showing when PartialSort should be being used

Are these changes tested?

Only tests

Are there any user-facing changes?

No

@berkaysynnada
Copy link
Contributor

Do you want to see some PartialSortExec in these new tests?

01)Sort: data.a ASC NULLS LAST, data.b ASC NULLS LAST
02)--TableScan: data projection=[a, b]
physical_plan
01)SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], preserve_partitioning=[false]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to see some PartialSortExec in these new tests?

@berkaysynnada Yes, I expect to see PartialSortExec here specifically

5

statement ok
CREATE EXTERNAL TABLE data (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then you need to create the table as "CREATE UNBOUNDED EXTERNAL TABLE"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use PartialSortExec for normal tables? It seems like it is more efficient for bounded tables too (it doesn't have to buffer the entire input, for example)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems @EeshanBembi has already created a PR to use PartialSortExec in this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a reason not to use PartialSortExec for normal tables? It seems like it is more efficient for bounded tables too (it doesn't have to buffer the entire input, for example)

I'm not sure about the efficiency. If there is no spilling work etc., collecting all batches and sorting them in one shot could be better, but needs a benchmark to be sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense -- thank you for the clarification

@alamb
Copy link
Contributor Author

alamb commented Aug 10, 2025

run extended tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants