Skip to content

Conversation

@andygrove
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

is_not_null: column is all nulls
                        time:   [51.571 ns 51.892 ns 52.260 ns]
                        change: [-65.802% -65.423% -65.068%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

is_not_null: column is never null
                        time:   [98.535 ns 98.892 ns 99.258 ns]
                        change: [-34.450% -34.256% -34.098%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

is_not_null: column is mix of values and nulls
                        time:   [54.555 ns 54.634 ns 54.714 ns]
                        change: [-65.402% -65.352% -65.300%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jul 21, 2024
@andygrove andygrove marked this pull request as ready for review July 21, 2024 19:22
@andygrove andygrove requested review from Dandandan and comphead July 22, 2024 13:40
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove -- I think it would be good to add basic test coverage for Union but otherwise this looks great to me

};
compute::not(&is_null).map_err(Into::into)
} else {
compute::is_not_null(array.as_ref()).map_err(Into::into)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes faster because it calls a single kernel (compute::is_not_null) rather than 2 (is_null and not)?

Could we add some basic tests for union? Perhaps following the model in #11321 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes faster because it calls a single kernel (compute::is_not_null) rather than 2 (is_null and not)?

Yes, exactly. It avoids creating an interim vector that is then discarded.

Could we add some basic tests for union? Perhaps following the model in #11321 ?

We do already have at least one test for IS NOT NULL for union, that was added in #11321.

There is no functional change for union in this PR. The code in compute_is_not_null for union is copied from the compute_is_null method, and adds a call to not, so it is doing the same thing as before but the flow changed a little.

Union is the only case that this PR does not optimize for, because I didn't want to mess with the temporary workaround that is in place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb I pushed a change to simplify the PR and remove the duplicated union code. Let me know if that makes things clearer (or not).


/// workaround <https://github.com/apache/arrow-rs/issues/6017>,
/// this can be replaced with a direct call to `arrow::compute::is_not_null` once it's fixed.
pub(crate) fn compute_is_not_null(array: ArrayRef) -> Result<BooleanArray> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code looks good to me -- I think it also ends up supporting UnionArray in IS NOT NULL

cc @samuelcolvin who added this code in #11321

/// this can be replaced with a direct call to `arrow::compute::is_not_null` once it's fixed.
pub(crate) fn compute_is_not_null(array: ArrayRef) -> Result<BooleanArray> {
if array.as_any().is::<UnionArray>() {
compute::not(&compute_is_null(array)?).map_err(Into::into)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not optimize the union handling - this still calls is_null and not

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is lgtm, thanks @andygrove
For UNION all we can do a followup PR

Copy link
Contributor

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great to me -- thanks everyone!

@alamb alamb merged commit 20b298e into apache:main Jul 24, 2024
@andygrove andygrove deleted the is-null branch July 24, 2024 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants