-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-40813][CONNECT] Add limit and offset to Connect DSL #38275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
R: @cloud-fan |
|
cc @HyukjinKwon @zhengruifeng as I updated the python side accordingly. |
|
Can one of the admins verify this patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq why is this Fetch? There are two separate logical plans Limit and Offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is from the original PR which I believe follows SQL semantic. In SQL, offset is something that working with limit. It is limit first then offset.
However in DataFrame this is not the case where offset is an independent operation. I updated PR to decouple this two.
grundprinzip
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. Maybe add some python tests as well? Or do you want to do this in a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why two relations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the PR description. TLDR DataFrame does not follow SQL semantic (e.g. df.offset(2) or df.offset(2).limit(10) are both valid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use logical.Offset to make sure where it comes from similar to the proto.Offset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I will follow up for python. Waiting for some python testing framework improvements to go in first. |
### What changes were proposed in this pull request? The mypy check in master seems to be broken: <img width="676" alt="Screen Shot 2022-10-17 at 1 09 13 PM" src="https://user-images.githubusercontent.com/1938382/196273759-64372862-6caa-4a9e-b700-b665c7ff7e6c.png"> (see it on #38279 and #38275, also reproducible locally). This PR propose to remove the relevant `type: ignore` comments. ### Why are the changes needed? Fix python lint check. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #38287 from amaliujia/test_python_lint. Authored-by: Rui Wang <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
f0922ab to
aceb40f
Compare
aceb40f to
5be9745
Compare
|
Merged to master. |
|
Late LGTM, it's easier to follow existing expressions |
… Python client ### What changes were proposed in this pull request? Following up after #38275, improve limit and offset in Python client. ### Why are the changes needed? Improve API coverage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #38314 from amaliujia/python_test_limit_offset. Authored-by: Rui Wang <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request? The mypy check in master seems to be broken: <img width="676" alt="Screen Shot 2022-10-17 at 1 09 13 PM" src="https://user-images.githubusercontent.com/1938382/196273759-64372862-6caa-4a9e-b700-b665c7ff7e6c.png"> (see it on apache#38279 and apache#38275, also reproducible locally). This PR propose to remove the relevant `type: ignore` comments. ### Why are the changes needed? Fix python lint check. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes apache#38287 from amaliujia/test_python_lint. Authored-by: Rui Wang <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
### What changes were proposed in this pull request? This PR decouples `LIMIT` and `OFFSET` to two plan which matches with DataFrame semantic where `OFFSET` is an independent operation and can happen before a LIMIT (or any other DataFrame). ### Why are the changes needed? Improve the support for LIMIT and OFFSET. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes apache#38275 from amaliujia/add_limit_offset_to_dsl. Authored-by: Rui Wang <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
… Python client ### What changes were proposed in this pull request? Following up after apache#38275, improve limit and offset in Python client. ### Why are the changes needed? Improve API coverage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes apache#38314 from amaliujia/python_test_limit_offset. Authored-by: Rui Wang <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
This PR decouples
LIMITandOFFSETto two plan which matches with DataFrame semantic whereOFFSETis an independent operation and can happen before a LIMIT (or any other DataFrame).Why are the changes needed?
Improve the support for LIMIT and OFFSET.
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT