Skip to content

Conversation

@Kimahriman
Copy link
Contributor

What changes were proposed in this pull request?

Adds Scala and Python bindings for SQL functions inline and inline_outer

Why are the changes needed?

Currently these functions can only be used via SQL or through expr functions. This makes it a little easier to use them with the DataFrame APIs.

Does this PR introduce any user-facing change?

Exposes new functions directly instead of only through SQL.

How was this patch tested?

Updated existing inline tests to use the new Scala binding instead of being called through SQL expressions

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Examples
--------
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import inline_outer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this since we're in this package, and assume that users know how to import this.

Copy link
Contributor Author

@Kimahriman Kimahriman Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. I saw all the PRs about self contained examples and thought that meant copy and pasting the code should run as is. Looked and some of those examples though and saw it was mostly adding those two sections you mentioned, will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Examples
--------
>>> from pyspark.sql import Row
>>> from pyspark.sql.functions import inline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Unlike inline, if the array is null or empty then null is produced for each nested column.
.. versionadded:: 3.4.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a Parameters and Returns sections? See also https://numpydoc.readthedocs.io/en/latest/format.html#sections

Copy link
Contributor

@itholic itholic Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also we can add See Also ?

e.g.

See Also
--------
:meth:`explode_outer`
:meth:`inline`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other See Also I see right now are for element_at/get. The existing generators don't have that yet to each other, is that something that should be added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added parameters/returns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no strict rules where See Also must to be placed, but I suggested just because it would be good to show related functions for user convenience.

WDYT, @HyukjinKwon ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's good to have

@HyukjinKwon HyukjinKwon changed the title [SPARK-40314][SQL] Add scala and python bindings for inline and inline_outer [SPARK-40314][SQL][PYTHON] Add scala and python bindings for inline and inline_outer Sep 5, 2022
return _invoke_function_over_columns("posexplode", col)


def inline(col: "ColumnOrName") -> Column:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to that file

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im fine with this. looks fine


test("inline raises exception on array of null type") {
val m = intercept[AnalysisException] {
spark.range(2).selectExpr("inline(array())")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we leave existing UTs alone and add extra tests in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to update things to match how the explode's were being tested, which is mostly scala functions. I can duplicate some of them with the scala function instead if that would be preferred

@zhengruifeng
Copy link
Contributor

also, what about adding some tests in python/pyspark/sql/tests/test_functions.py?

Copy link
Contributor

@zhengruifeng zhengruifeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@Kimahriman
Copy link
Contributor Author

also, what about adding some tests in python/pyspark/sql/tests/test_functions.py?

Thought I found all the places there were explode tests to add inline as well but missed these somehow, I'll add one

@zhengruifeng
Copy link
Contributor

Merged into master, thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants