fix: respect inexact flags in row group metadata #16412
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Hi @alamb, this PR extracts the exactness flags from row group metadata. Could you please take a look? :)
/// The value `0` appears at indices `[0, 2, 4]`. The corresponding exactness
/// values are `[true, false, false]`. Since at least one is `true`, the
/// function returns `Some(true)`.
fn has_any_exact_match(
Do we have a test for this?
Updated a unit test to cover the 4 possible scenarios. Also switched to a struct to make clippy happy. PTAL :)
Thank you, this is a good finding and nice fix!
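For readers following the review thread, the behavior described in the doc comment above can be illustrated with a simplified sketch. It works on plain slices rather than Arrow arrays, so the signature, the sample data, and the `None` case are assumptions made for illustration and are not taken from the PR's implementation.

```rust
/// Simplified, hypothetical stand-in for the helper discussed above.
///
/// Returns `Some(true)` if any element equal to `target` is flagged exact,
/// `Some(false)` if `target` occurs but only with inexact flags, and `None`
/// if `target` does not occur or the slices differ in length (assumption).
fn has_any_exact_match(target: i64, values: &[i64], exactness: &[bool]) -> Option<bool> {
    if values.len() != exactness.len() {
        return None;
    }
    let mut matched = false;
    for (value, is_exact) in values.iter().zip(exactness) {
        if *value == target {
            if *is_exact {
                // One exact occurrence is enough.
                return Some(true);
            }
            matched = true;
        }
    }
    if matched {
        Some(false)
    } else {
        None
    }
}

fn main() {
    // Assumed sample data: `0` appears at indices 0, 2 and 4, and the
    // exactness flags at those indices are true, false and false.
    let values = [0, 1, 0, 3, 0, 5];
    let exactness = [true, false, false, false, false, false];

    assert_eq!(has_any_exact_match(0, &values, &exactness), Some(true));
    // `1` occurs once, with an inexact flag.
    assert_eq!(has_any_exact_match(1, &values, &exactness), Some(false));
    // `7` never occurs.
    assert_eq!(has_any_exact_match(7, &values, &exactness), None);
}
```

The early return reflects the "at least one is `true`" rule from the doc comment: a single exact occurrence of the value is enough to treat it as exact.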
Thank you @xudong963 and @CookiePieWw
This reverts commit afc90f7.
* Revert "Upgrade arrow/parquet to 56.0.0 (apache#16690)"
  This reverts commit fa1f8c1.
* Revert "refactor: use upstream inline_key_fast (apache#17044)"
  This reverts commit 71b92bc.
* Revert "fix: respect inexact flags in row group metadata (apache#16412)"
  This reverts commit afc90f7.
* Revert "Test grouping by FixedSizeList (apache#17415)"
  This reverts commit 03f39e5.
* Spelling (got reverted)
* Also allow Byt from tests
* Adjust sqllogictests
Which issue does this PR close?
Rationale for this change
Currently, DataFusion treats all max and min values in column stats as exact, even though some of them may be inexact.
What changes are included in this PR?
When the max or min value is computed across row groups, also retrieve each row group's exactness flag. The exactness of the row group that supplies the final max or min value becomes the final exactness flag. The max and min stats are then wrapped with `Inexact` or `Exact` based on that final flag.
Are these changes tested?
Are there any user-facing changes?
DataFusion now correctly reports the exactness of column max and min values.
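
The approach described under "What changes are included in this PR?" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `RowGroupMin` and `merge_min` are hypothetical names, the local `Precision` enum only mirrors the `Exact` / `Inexact` wrapping the description mentions, and treating a tie as exact when any tied row group is exact is inferred from the `has_any_exact_match` doc comment.

```rust
/// Local stand-in for an exact/inexact statistics wrapper (assumption).
#[derive(Debug, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Hypothetical per-row-group minimum together with its exactness flag.
struct RowGroupMin {
    min: i64,
    is_exact: bool,
}

/// Folds per-row-group minimums into one value and carries along the
/// exactness flag of the row group(s) that supplied it.
fn merge_min(groups: &[RowGroupMin]) -> Precision<i64> {
    let mut best: Option<(i64, bool)> = None;
    for group in groups {
        best = match best {
            // First row group seen.
            None => Some((group.min, group.is_exact)),
            // Strictly smaller value: it takes over, with its own flag.
            Some((current, _)) if group.min < current => Some((group.min, group.is_exact)),
            // Tie: exact if at least one tied row group is exact.
            Some((current, exact)) if group.min == current => {
                Some((current, exact || group.is_exact))
            }
            // Larger value: keep what we have.
            other => other,
        };
    }
    match best {
        Some((value, true)) => Precision::Exact(value),
        Some((value, false)) => Precision::Inexact(value),
        None => Precision::Absent,
    }
}

fn main() {
    let groups = [
        RowGroupMin { min: 5, is_exact: true },
        RowGroupMin { min: 3, is_exact: false },
        RowGroupMin { min: 9, is_exact: true },
    ];
    // The smallest value comes from an inexact row group.
    assert_eq!(merge_min(&groups), Precision::Inexact(3));
}
```

The same fold applies to the max by flipping the comparison. Before this change, the result in the example would have been reported as exact even though the winning row group's statistic was not.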