-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-40314][SQL][PYTHON] Add scala and python bindings for inline and inline_outer #37770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can one of the admins verify this patch? |
python/pyspark/sql/functions.py
Outdated
| Examples | ||
| -------- | ||
| >>> from pyspark.sql import Row | ||
| >>> from pyspark.sql.functions import inline_outer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this since we're in this package, and assume that users know how to import this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. I saw all the PRs about self contained examples and thought that meant copy and pasting the code should run as is. Looked and some of those examples though and saw it was mostly adding those two sections you mentioned, will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
python/pyspark/sql/functions.py
Outdated
| Examples | ||
| -------- | ||
| >>> from pyspark.sql import Row | ||
| >>> from pyspark.sql.functions import inline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| Unlike inline, if the array is null or empty then null is produced for each nested column. | ||
| .. versionadded:: 3.4.0 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a Parameters and Returns sections? See also https://numpydoc.readthedocs.io/en/latest/format.html#sections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also we can add See Also ?
e.g.
See Also
--------
:meth:`explode_outer`
:meth:`inline`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other See Also I see right now are for element_at/get. The existing generators don't have that yet to each other, is that something that should be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added parameters/returns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no strict rules where See Also must to be placed, but I suggested just because it would be good to show related functions for user convenience.
WDYT, @HyukjinKwon ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it's good to have
| return _invoke_function_over_columns("posexplode", col) | ||
|
|
||
|
|
||
| def inline(col: "ColumnOrName") -> Column: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to that file
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im fine with this. looks fine
|
|
||
| test("inline raises exception on array of null type") { | ||
| val m = intercept[AnalysisException] { | ||
| spark.range(2).selectExpr("inline(array())") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we leave existing UTs alone and add extra tests in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to update things to match how the explode's were being tested, which is mostly scala functions. I can duplicate some of them with the scala function instead if that would be preferred
|
also, what about adding some tests in |
zhengruifeng
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
Thought I found all the places there were explode tests to add inline as well but missed these somehow, I'll add one |
|
Merged into master, thanks all! |
What changes were proposed in this pull request?
Adds Scala and Python bindings for SQL functions inline and inline_outer
Why are the changes needed?
Currently these functions can only be used via SQL or through
exprfunctions. This makes it a little easier to use them with the DataFrame APIs.Does this PR introduce any user-facing change?
Exposes new functions directly instead of only through SQL.
How was this patch tested?
Updated existing inline tests to use the new Scala binding instead of being called through SQL expressions