Skip to content

Conversation

mach-kernel
Copy link
Contributor

@mach-kernel mach-kernel commented Oct 4, 2025

Which issue does this PR close?

Resolves: #17920

#15718

Rationale for this change

Since table partition columns are serialized in table_partition_cols for ListingScan, #15737 made a change to remove them from the TableScan schema so they would not be duplicated. Unfortunately, queries with projections that reference the partition columns would fail to resolve here:

let mut projection = None;
if let Some(columns) = &scan.projection {
let column_indices = columns
.columns
.iter()
.map(|name| schema.index_of(name))
.collect::<Result<Vec<usize>, _>>()?;
projection = Some(column_indices);
}

The round-trip test used a partition column in its SELECT to exercise this, however the unoptimized plan did not have projections decorated (so they would not get serialized, and there would be no lookup to fail). As in, scan.projection above would be None.

What changes are included in this PR?

  • Do the projection name -> index binding using ListingTable::schema, at which point the partition columns are decorated
  • Since we're testing the round-tripping of the plan, make a plan with LogicalPlanBuilder that explicitly specifies a projection to ensure we're testing this functionality
    • This test fails on main

@github-actions github-actions bot added the proto Related to proto crate label Oct 4, 2025
@mach-kernel mach-kernel changed the title resolve projection against ListingTable table_schema incl. partitio… Resolve ListingScan projection against table schema including partition columns Oct 4, 2025
@alamb
Copy link
Contributor

alamb commented Oct 4, 2025

#15718, though I can file a new bug report. Should be easy to on a round-tripped plan against a Hive-partitioned table.

A new bug report would be helpful for sure

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @mach-kernel -- this looks great

The only thing I think it needs is to avoid changing the existing tests and it will be ready to go.

I verified the test fails without the code in this PR:

failures:

---- cases::roundtrip_logical_plan::roundtrip_custom_listing_tables_schema stdout ----
Error: ArrowError(SchemaError("Unable to get field named \"part\". Valid fields: [\"id\", \"value\"]"), Some(""))


failures:
    cases::roundtrip_logical_plan::roundtrip_custom_listing_tables_schema

ctx.register_table("hive_style", Arc::new(ListingTable::try_new(config)?))?;
let listing_table: Arc<dyn TableProvider> = Arc::new(ListingTable::try_new(config)?);

let plan = ctx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please leave the existing test case so it is clear there is no loss of coverage, and add the new plan as a new test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original test passes for me (where it should fail, after cargo clean) on main refs:

  • daeb6597a0c7344735460bb2dce13879fd89d7bd
  • 182d5dc5e456322664da921f446018a0549e60bc

At any rate, the old plan was 'supposed' to be the new plan! I am happy to make the change if I got this wrong, just wanted to double check 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I think it best is if the changes to the tests illustrate the changes in the code on the behavior of the code

If you change the test and the code in the same PR it is harder to evaluate the impact of your change on behavior.

For example, if the original query now fails, perhaps you can change it to

let err = thing_that_errors.unwrap_err().to_string();
assert_contains!(err, <expected message>)

To show that a test that (incorrectly) used to work no longer does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new test alongside the existing one. The difference between the tests is that the new one will have its projection pushed down to the TableScan node whereas the original doesn't, which covers the projection case for this bug.

I could not figure out what kind of assertion change to make to the existing test (it will continue to pass with/without these changes).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mach-kernel

ctx.register_table("hive_style", Arc::new(ListingTable::try_new(config)?))?;
let listing_table: Arc<dyn TableProvider> = Arc::new(ListingTable::try_new(config)?);

let plan = ctx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I think it best is if the changes to the tests illustrate the changes in the code on the behavior of the code

If you change the test and the code in the same PR it is harder to evaluate the impact of your change on behavior.

For example, if the original query now fails, perhaps you can change it to

let err = thing_that_errors.unwrap_err().to_string();
assert_contains!(err, <expected message>)

To show that a test that (incorrectly) used to work no longer does

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mach-kernel

}

#[tokio::test]
async fn roundtrip_custom_listing_tables_schema_table_scan_projection() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this test fails without the code changes in this PR:

Error: ArrowError(SchemaError("Unable to get field named \"part\". Valid fields: [\"id\", \"value\"]"), Some(""))

@alamb alamb added this pull request to the merge queue Oct 7, 2025
@alamb
Copy link
Contributor

alamb commented Oct 7, 2025

Thanks again @mach-kernel

Merged via the queue into apache:main with commit 1cc4daf Oct 7, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ListingTable reads that project partition columns fail on deserialization
2 participants