Skip to content

Conversation

kosiew
Copy link
Contributor

@kosiew kosiew commented Jun 5, 2025

Which issue does this PR close?

Rationale for this change

Intermittent test failures were observed in the limit.slt SQL logic test file on the main branch, specifically related to a query that uses a LIMIT clause without an ORDER BY. Since SQL LIMIT behavior is non-deterministic without a defined order, this led to flaky test results.

This change explicitly adds an ORDER BY part_key to the LIMIT subquery to ensure deterministic results and resolve test flakiness.

What changes are included in this PR?

  • Modified the SQL logic test to include ORDER BY part_key before the LIMIT clause.
  • Updated the expected logical and physical plans accordingly:
    • The logical plan now includes a Sort operator before the limit.
    • The physical plan includes a SortPreservingMergeExec to maintain global ordering across partitions.
    • Replaces the limited DataSourceExec with a combination of SortExec and TopK on each partition to correctly apply the limit post-sort.

Are these changes tested?

Yes. The change modifies an existing SQL logic test to enforce a deterministic result, which now consistently passes in CI runs.

Tested with:

for i in {1..100}; do
  echo "Run #$i"
  cargo test -p datafusion-sqllogictest limit || break
done

Are there any user-facing changes?

No. This change only affects internal test stability and does not impact the user-facing API or behavior.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jun 5, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kosiew

with selection as (
select *
from test_limit_with_partitions
order by part_key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @kosiew -- can you please also update the query above (I think the EXPLAIN and query are supposed to remain in sync

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb
Thanks for pointing this out.
Updated.

Copy link
Contributor

@jonathanc-n jonathanc-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, nice find @kosiew!

@kosiew
Copy link
Contributor Author

kosiew commented Jun 6, 2025

Thanks @jonathanc-n for the review.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kosiew and @jonathanc-n

@andygrove andygrove merged commit 4cf1148 into apache:main Jun 6, 2025
29 checks passed
kosiew added a commit to kosiew/datafusion that referenced this pull request Jun 9, 2025
…BY clause (apache#16257)

* Add order by clause to limit query for consistent results

* test: update explain plan
@kosiew kosiew deleted the flaky-16180 branch July 16, 2025 03:21
crepererum pushed a commit to influxdata/arrow-datafusion that referenced this pull request Jul 17, 2025
…BY clause (apache#16257)

* Add order by clause to limit query for consistent results

* test: update explain plan
findepi pushed a commit to sdf-labs/datafusion that referenced this pull request Aug 13, 2025
…ing ORDER BY clause (apache#16257)

* Add order by clause to limit query for consistent results

* test: update explain plan

(cherry picked from commit 4cf1148)
xudong963 pushed a commit to polygon-io/arrow-datafusion that referenced this pull request Aug 25, 2025
…BY clause (apache#16257)

* Add order by clause to limit query for consistent results

* test: update explain plan
findepi pushed a commit to sdf-labs/datafusion that referenced this pull request Sep 5, 2025
…ing ORDER BY clause (apache#16257)

* Add order by clause to limit query for consistent results

* test: update explain plan

(cherry picked from commit 4cf1148)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Intermittent failures in CI in test_files/limit.slt

4 participants