Skip to content

Conversation

@findepi
Copy link

@findepi findepi commented Oct 22, 2024

Cherry pick apache#13029

Before the change, the ValuesExec containing NullArray would incorrectly report column statistics as being non-null, which would misinform AggregateStatistics optimizer and fold count(always_null) into row count instead of 0.

This commit fixes the column statistics derivation for values with NullArray and therefore fixes execution of logical plans with count over such values.

Note that the bug was not reproducible using DataFusion SQL frontend, because in DataFusion SQL the VALUES (NULL) doesn't have type DataType:Null (it has some apparently arbitrarily picked type instead).

As a follow-up, all usages of Array:null_count should be inspected. The function can easily be misused (it returns "physical nulls", which do not exist for null type).

@findepi findepi requested review from milevin and vgapeyev October 22, 2024 06:51
@github-actions github-actions bot added the core label Oct 22, 2024
Copy link

@vgapeyev vgapeyev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing cross-repo work!

@findepi findepi force-pushed the findepi/count-null-only-41 branch from e46a004 to 3545fbe Compare October 23, 2024 11:46
@findepi
Copy link
Author

findepi commented Oct 23, 2024

rebased on top of #71

* Test Count accumulator with all-nulls

* Fix count on null values

Before the change, the `ValuesExec` containing `NullArray` would
incorrectly report column statistics as being non-null, which would
misinform `AggregateStatistics` optimizer and fold `count(always_null)`
into row count instead of 0.

This commit fixes the column statistics derivation for values with
`NullArray` and therefore fixes execution of logical plans with count
over such values.

Note that the bug was not reproducible using DataFusion SQL frontend,
because in DataFusion SQL the `VALUES (NULL)` doesn't have type
`DataType:Null` (it has some apparently arbitrarily picked type
instead).

As a follow-up, all usages of `Array:null_count` should be inspected.
The function can easily be misused (it returns "physical nulls", which
do not exist for null type).
@findepi findepi force-pushed the findepi/count-null-only-41 branch from 3545fbe to 3ff9b52 Compare October 24, 2024 18:55
@findepi findepi merged commit 89d1c8e into sdf-labs:findepi/41 Oct 25, 2024
24 checks passed
@findepi findepi deleted the findepi/count-null-only-41 branch October 25, 2024 09:04
findepi pushed a commit that referenced this pull request Mar 24, 2025
* Customize window frame support for dialect (#70)

* Customize window frame support for dialect

* fix: ignore frame only when frame implies no frame

* Add comments. move the window frame determine logic to dialect method

* Update datafusion/sql/src/unparser/dialect.rs

* Update test case

* fix

---------

Co-authored-by: Phillip LeBlanc <[email protected]>

* fix clippy

---------

Co-authored-by: Phillip LeBlanc <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants