Skip to content

Conversation

berkaysynnada
Copy link
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

PartialSortExec had a missing functionality where it failed to detect slice points when consecutive batches contained homogeneous sort key values (e.g., first batch: all rows with a=1, second batch: all rows with a=2). The get_slice_point method only checked within individual batches, missing cross-batch transitions.

What changes are included in this PR?

  • Modified the batching logic to accumulate incoming batches into a single buffer before checking for slice points
  • Added proper handling of fetch limits to avoid processing after the limit is reached

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jul 24, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @berkaysynnada -- I went over the PR carefully and I think it makes sense to me

The only thing I would also like to see is a SQL level test. However, I was not able to write one as I can't seem to get PartialSortExec to appear in the tests. I filed a ticket about this:

@github-actions github-actions bot added core Core DataFusion crate catalog Related to the catalog crate labels Jul 25, 2025
@berkaysynnada
Copy link
Contributor Author

The only thing I would also like to see is a SQL level test. However, I was not able to write one as I can't seem to get PartialSortExec to appear in the tests. I filed a ticket about this:

Can you check my last commit? I think it's what you want but I'm not sure about the location

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thanks @berkaysynnada

@berkaysynnada berkaysynnada merged commit 8bf7123 into apache:main Jul 27, 2025
27 checks passed
adriangb pushed a commit to pydantic/datafusion that referenced this pull request Jul 28, 2025
* Update partial_sort.rs

* Update partial_sort.rs

* Update partial_sort.rs

* add sql test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
catalog Related to the catalog crate core Core DataFusion crate physical-plan Changes to the physical-plan crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants