Skip to content

Conversation

@amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Oct 16, 2022

What changes were proposed in this pull request?

This PR decouples LIMIT and OFFSET to two plan which matches with DataFrame semantic where OFFSET is an independent operation and can happen before a LIMIT (or any other DataFrame).

Why are the changes needed?

Improve the support for LIMIT and OFFSET.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@amaliujia
Copy link
Contributor Author

R: @cloud-fan

@amaliujia
Copy link
Contributor Author

cc @HyukjinKwon @zhengruifeng as I updated the python side accordingly.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq why is this Fetch? There are two separate logical plans Limit and Offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from the original PR which I believe follows SQL semantic. In SQL, offset is something that working with limit. It is limit first then offset.

However in DataFrame this is not the case where offset is an independent operation. I updated PR to decouple this two.

Copy link
Contributor

@grundprinzip grundprinzip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. Maybe add some python tests as well? Or do you want to do this in a follow up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why two relations?

Copy link
Contributor Author

@amaliujia amaliujia Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the PR description. TLDR DataFrame does not follow SQL semantic (e.g. df.offset(2) or df.offset(2).limit(10) are both valid).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use logical.Offset to make sure where it comes from similar to the proto.Offset

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@amaliujia
Copy link
Contributor Author

amaliujia commented Oct 17, 2022

Thanks for doing this. Maybe add some python tests as well? Or do you want to do this in a follow up?

I will follow up for python. Waiting for some python testing framework improvements to go in first.

HyukjinKwon pushed a commit that referenced this pull request Oct 18, 2022
### What changes were proposed in this pull request?

The mypy check in master seems to be broken:

<img width="676" alt="Screen Shot 2022-10-17 at 1 09 13 PM" src="https://user-images.githubusercontent.com/1938382/196273759-64372862-6caa-4a9e-b700-b665c7ff7e6c.png">

(see it on #38279 and #38275, also reproducible locally).

This PR propose to remove the relevant `type: ignore` comments.

### Why are the changes needed?

Fix python lint check.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38287 from amaliujia/test_python_lint.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
@amaliujia amaliujia force-pushed the add_limit_offset_to_dsl branch 2 times, most recently from f0922ab to aceb40f Compare October 18, 2022 05:33
@amaliujia amaliujia force-pushed the add_limit_offset_to_dsl branch from aceb40f to 5be9745 Compare October 19, 2022 03:10
@HyukjinKwon
Copy link
Member

Merged to master.

@zhengruifeng
Copy link
Contributor

Late LGTM, it's easier to follow existing expressions

HyukjinKwon pushed a commit that referenced this pull request Oct 21, 2022
… Python client

### What changes were proposed in this pull request?

Following up after #38275, improve limit and offset in Python client.

### Why are the changes needed?

Improve API coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #38314 from amaliujia/python_test_limit_offset.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

The mypy check in master seems to be broken:

<img width="676" alt="Screen Shot 2022-10-17 at 1 09 13 PM" src="https://user-images.githubusercontent.com/1938382/196273759-64372862-6caa-4a9e-b700-b665c7ff7e6c.png">

(see it on apache#38279 and apache#38275, also reproducible locally).

This PR propose to remove the relevant `type: ignore` comments.

### Why are the changes needed?

Fix python lint check.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes apache#38287 from amaliujia/test_python_lint.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

This PR decouples `LIMIT` and `OFFSET` to two plan which matches with DataFrame semantic where `OFFSET` is an independent operation and can happen before a LIMIT (or any other DataFrame).

### Why are the changes needed?

Improve the support for LIMIT and OFFSET.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes apache#38275 from amaliujia/add_limit_offset_to_dsl.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
… Python client

### What changes were proposed in this pull request?

Following up after apache#38275, improve limit and offset in Python client.

### Why are the changes needed?

Improve API coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes apache#38314 from amaliujia/python_test_limit_offset.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants