-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Resolve ListingScan
projection against table schema including partition columns
#17911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ListingTable
table_schema incl. partitio…ListingScan
projection against table schema including partition columns
A new bug report would be helpful for sure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @mach-kernel -- this looks great
The only thing I think it needs is to avoid changing the existing tests and it will be ready to go.
I verified the test fails without the code in this PR:
failures:
---- cases::roundtrip_logical_plan::roundtrip_custom_listing_tables_schema stdout ----
Error: ArrowError(SchemaError("Unable to get field named \"part\". Valid fields: [\"id\", \"value\"]"), Some(""))
failures:
cases::roundtrip_logical_plan::roundtrip_custom_listing_tables_schema
ctx.register_table("hive_style", Arc::new(ListingTable::try_new(config)?))?; | ||
let listing_table: Arc<dyn TableProvider> = Arc::new(ListingTable::try_new(config)?); | ||
|
||
let plan = ctx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please leave the existing test case so it is clear there is no loss of coverage, and add the new plan as a new test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original test passes for me (where it should fail, after cargo clean
) on main
refs:
daeb6597a0c7344735460bb2dce13879fd89d7bd
182d5dc5e456322664da921f446018a0549e60bc
At any rate, the old plan was 'supposed' to be the new plan! I am happy to make the change if I got this wrong, just wanted to double check 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I think it best is if the changes to the tests illustrate the changes in the code on the behavior of the code
If you change the test and the code in the same PR it is harder to evaluate the impact of your change on behavior.
For example, if the original query now fails, perhaps you can change it to
let err = thing_that_errors.unwrap_err().to_string();
assert_contains!(err, <expected message>)
To show that a test that (incorrectly) used to work no longer does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new test alongside the existing one. The difference between the tests is that the new one will have its projection pushed down to the TableScan
node whereas the original doesn't, which covers the projection case for this bug.
I could not figure out what kind of assertion change to make to the existing test (it will continue to pass with/without these changes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mach-kernel
ctx.register_table("hive_style", Arc::new(ListingTable::try_new(config)?))?; | ||
let listing_table: Arc<dyn TableProvider> = Arc::new(ListingTable::try_new(config)?); | ||
|
||
let plan = ctx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I think it best is if the changes to the tests illustrate the changes in the code on the behavior of the code
If you change the test and the code in the same PR it is harder to evaluate the impact of your change on behavior.
For example, if the original query now fails, perhaps you can change it to
let err = thing_that_errors.unwrap_err().to_string();
assert_contains!(err, <expected message>)
To show that a test that (incorrectly) used to work no longer does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mach-kernel
} | ||
|
||
#[tokio::test] | ||
async fn roundtrip_custom_listing_tables_schema_table_scan_projection() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified this test fails without the code changes in this PR:
Error: ArrowError(SchemaError("Unable to get field named \"part\". Valid fields: [\"id\", \"value\"]"), Some(""))
Thanks again @mach-kernel |
Which issue does this PR close?
Resolves: #17920
#15718
Rationale for this change
Since table partition columns are serialized in
table_partition_cols
for ListingScan, #15737 made a change to remove them from theTableScan
schema so they would not be duplicated. Unfortunately, queries with projections that reference the partition columns would fail to resolve here:datafusion/datafusion/proto/src/logical_plan/mod.rs
Lines 382 to 390 in daeb659
The round-trip test used a partition column in its
SELECT
to exercise this, however the unoptimized plan did not have projections decorated (so they would not get serialized, and there would be no lookup to fail). As in,scan.projection
above would beNone
.What changes are included in this PR?
ListingTable::schema
, at which point the partition columns are decoratedLogicalPlanBuilder
that explicitly specifies a projection to ensure we're testing this functionalitymain