-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Update code comment for the cases of regularized RANGE frame and add tests for ORDER BY cases with RANGE frame #8410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -148,12 +148,22 @@ impl WindowFrame { | |
| pub fn regularize(mut frame: WindowFrame, order_bys: usize) -> Result<WindowFrame> { | ||
| if frame.units == WindowFrameUnits::Range && order_bys != 1 { | ||
| // Normally, RANGE frames require an ORDER BY clause with exactly one | ||
| // column. However, an ORDER BY clause may be absent in two edge cases. | ||
| // column. However, an ORDER BY clause may be absent or present but with | ||
| // more than one column in two edge cases: | ||
| // 1. start bound is UNBOUNDED or CURRENT ROW | ||
| // 2. end bound is CURRENT ROW or UNBOUNDED. | ||
| // In these cases, we regularize the RANGE frame to be equivalent to a ROWS | ||
| // frame with the UNBOUNDED bounds. | ||
| // Note that this follows Postgres behavior. | ||
| if (frame.start_bound.is_unbounded() | ||
| || frame.start_bound == WindowFrameBound::CurrentRow) | ||
| && (frame.end_bound == WindowFrameBound::CurrentRow | ||
| || frame.end_bound.is_unbounded()) | ||
| { | ||
| // If an ORDER BY clause is absent, the frame is equivalent to a ROWS | ||
| // frame with the UNBOUNDED bounds. | ||
| // If an ORDER BY clause is present but has more than one column, the | ||
| // frame is unchanged. | ||
|
Comment on lines
+163
to
+166
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, based on Postgres behavior, it doesn't change the RANGE frame to ROWS frame (see the added tests in sqllogictest which return different results than Postgres). Based on Postgres doc, without |
||
| if order_bys == 0 { | ||
| frame.units = WindowFrameUnits::Rows; | ||
| frame.start_bound = | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3727,3 +3727,61 @@ FROM score_board s | |
|
|
||
| statement ok | ||
| DROP TABLE score_board; | ||
|
|
||
| # Regularize RANGE frame | ||
| query error DataFusion error: Error during planning: RANGE requires exactly one ORDER BY column | ||
| select a, | ||
| rank() over (order by a, a + 1 RANGE BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) rnk | ||
| from (select 1 a union select 2 a) q ORDER BY a | ||
|
|
||
| query II | ||
| select a, | ||
| rank() over (order by a RANGE BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) rnk | ||
| from (select 1 a union select 2 a) q ORDER BY a | ||
| ---- | ||
| 1 1 | ||
| 2 2 | ||
|
|
||
| query error DataFusion error: Error during planning: RANGE requires exactly one ORDER BY column | ||
| select a, | ||
| rank() over (RANGE BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) rnk | ||
| from (select 1 a union select 2 a) q ORDER BY a | ||
|
|
||
| query II | ||
| select a, | ||
| rank() over (order by a, a + 1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk | ||
| from (select 1 a union select 2 a) q ORDER BY a | ||
| ---- | ||
| 1 1 | ||
| 2 2 | ||
|
|
||
| query II | ||
| select a, | ||
| rank() over (order by a RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk | ||
| from (select 1 a union select 2 a) q ORDER BY a | ||
| ---- | ||
| 1 1 | ||
| 2 2 | ||
|
|
||
| # TODO: this is different to Postgres which returns [1, 1] for `rnk`. | ||
| # Comment it because it is flaky now as it depends on the order of the `a` column. | ||
| # query II | ||
| # select a, | ||
| # rank() over (RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk | ||
| # from (select 1 a union select 2 a) q ORDER BY rnk | ||
| # ---- | ||
| # 1 1 | ||
| # 2 2 | ||
|
|
||
| # TODO: this works in Postgres which returns [1, 1]. | ||
| query error DataFusion error: Arrow error: Invalid argument error: must either specify a row count or at least one column | ||
| select rank() over (RANGE between UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk | ||
| from (select 1 a union select 2 a) q; | ||
|
|
||
| # TODO: this is different to Postgres which returns [1, 1] for `rnk`. | ||
| query I | ||
| select rank() over (order by 1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk | ||
| from (select 1 a union select 2 a) q ORDER BY rnk | ||
| ---- | ||
| 1 | ||
| 2 | ||
|
Comment on lines
+3781
to
+3787
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a fix for the issue locally. But this PR focuses on doc and test change, so I will submit the fix after this is merged. |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that actually Spark doesn't allow empty ORDER BY clause for RANGE frame in any cases. I updated this according to the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So DataFusion is following Postgres behavior on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, actually Postgres behavior allows empty ORDER BY clause, one column or multiple column ORDER BY clause with RANGE frame if start bound/end bound is UNBOUNDED or CURRENT ROW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short, DataFusion currently allows various
ORDER BYclause cases (empty, 1 or more column) with RANGE frame, similar to Postgres.But for empty
ORDER BYclause case, the window function result is different to Postgres.See added tests.