Skip to content

Conversation

@jeffreyssmith2nd
Copy link
Contributor

@jeffreyssmith2nd jeffreyssmith2nd commented May 29, 2024

Which issue does this PR close?

Closes #7925.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label May 29, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking definitely on the right path

@jeffreyssmith2nd jeffreyssmith2nd marked this pull request as ready for review May 31, 2024 14:17
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jeffreyssmith2nd

I went over the code and test carefully and I think it looks good. I took the liberty of merging up to resolve a conflict.

I also verified that the test covers the code change by removing the code change locally and running the test. When I did so the test fails like this:


assertion failed: filtered.is_ok()
thread 'datasource::physical_plan::parquet::row_filter::test::test_filter_type_coercion' panicked at datafusion/core/src/datasource/physical_plan/parquet/row_filter.rs:549:9:
assertion failed: filtered.is_ok()
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:145:5
   3: datafusion::datasource::physical_plan::parquet::row_filter::test::test_filter_type_coercion
             at ./src/datasource/physical_plan/parquet/row_filter.rs:549:9
   4: datafusion::datasource::physical_plan::parquet::row_filter::test::test_filter_type_coercion::{{closure}}
             at ./src/datasource/physical_plan/parquet/row_filter.rs:486:35
   5: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: test failed, to rerun pass `--lib`
error: 1 target failed:
    `--lib`

I have some suggestions for the test, but otherwise I think this PR is ready to go.

@Ted-Jiang and @thinkharderdev I wonder if you have a moment to review this too as I believe you actively use the filter pushdown into parquet

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @jeffreyssmith2nd

I plan to leave this open for another day in case there are additional comments

Copy link
Contributor

@thinkharderdev thinkharderdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. lgtm

@alamb alamb merged commit 70256ba into apache:main Jun 5, 2024
@alamb
Copy link
Contributor

alamb commented Jun 5, 2024

Thank you @jeffreyssmith2nd and @thinkharderdev

@alamb alamb deleted the smith/rowfilter branch June 5, 2024 16:17
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
)

* test: Add a failing test to show the lack of type coercion in row filters

* feat: update parquet row filter to handle type coercion

* chore: lint/fmt

* chore: test improvements and cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parquet Predicate Pushdown Does Not Handle Type Coercion

3 participants