Partitioned ListingTable read fails after logical plan ser/de #15718

@milenkovicm

Describe the bug

A logical plan round trip for a partitioned ListingTable fails to produce a valid schema after deserialisation, with:

Error: SchemaError(DuplicateQualifiedField { qualifier: Bare { table: "hive_style" }, name: "year" }, Some(""))

To Reproduce

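        // Assumes `ctx` (a SessionContext) and `test_data` (path to the test data directory) are in scope.
        let session_state = ctx.state();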
        let data = format!("{test_data}/hive_style/");
        let listing_table_url = ListingTableUrl::parse(data)?;

        let table_partition_cols = vec![
            ("year".to_owned(), DataType::Int64),
            ("month".to_owned(), DataType::Int64),
        ];

        let file_format = ParquetFormat::new()
            .with_enable_pruning(true)
            .with_skip_metadata(true);

        let listing_options = ListingOptions::new(Arc::new(file_format))
            .with_table_partition_cols(table_partition_cols);

        let config = ListingTableConfig::new(listing_table_url)
            .with_listing_options(listing_options)
            .infer_schema(&session_state)
            .await?;

        ctx.register_table("hive_style", Arc::new(ListingTable::try_new(config)?))?;
        // The query below produces this logical plan:
        //
        // Limit: skip=0, fetch=1
        //  Projection: hive_style.year, hive_style.month
        //   TableScan: hive_style
        //
        let plan = ctx
            .sql("SELECT year, month FROM hive_style LIMIT 1")
            .await?
            .logical_plan()
            .clone();

        let expected = [
            "+------+-------+",
            "| year | month |",
            "+------+-------+",
            "| 2024 | 1     |",
            "+------+-------+",
        ];

        // Executing the original plan works as expected.
        let result = ctx
            .execute_logical_plan(plan.clone())
            .await?
            .collect()
            .await?;
        assert_batches_eq!(expected, &result);

        let bytes = logical_plan_to_bytes(&plan)?;

        // Deserialising the logical plan from bytes fails with:
        //
        // Error: SchemaError(DuplicateQualifiedField { qualifier: Bare { table: "hive_style" }, name: "year" }, Some(""))
        let plan = logical_plan_from_bytes(&bytes, &ctx)?;

        let result = ctx.execute_logical_plan(plan).await?.collect().await?;
        assert_batches_eq!(expected, &result);

Expected behavior

The plan should deserialise correctly, and executing the deserialised plan should return the same result as executing the original.

Additional context

I believe the schema used at

should not contain the partition columns, as the original table does not have them.
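
To make the suspected cause concrete, here is a simplified model of it, assuming the deserialised table appends the partition columns to a schema that already contains them. It uses plain column-name lists rather than DataFusion types. The function names and the file columns `id` and `value` are hypothetical and only serve the illustration.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a schema: an ordered list of column names.
type Schema = Vec<String>;

/// Helper that builds a schema from string literals.
fn cols(names: &[&str]) -> Schema {
    names.iter().map(|s| s.to_string()).collect()
}

/// Hypothetical model of table construction: the partition columns are
/// appended to the given schema to form the full table schema.
fn append_partition_cols(schema: &[String], partition_cols: &[String]) -> Schema {
    schema.iter().chain(partition_cols.iter()).cloned().collect()
}

/// Returns the first column name that occurs more than once, if any.
fn first_duplicate(schema: &[String]) -> Option<&str> {
    let mut seen = HashSet::new();
    for name in schema {
        if !seen.insert(name.as_str()) {
            return Some(name.as_str());
        }
    }
    None
}

/// Possible remedy (an assumption, not a confirmed fix): drop the partition
/// columns from the full table schema before it is handed back to the table.
fn strip_partition_cols(table_schema: &[String], partition_cols: &[String]) -> Schema {
    table_schema
        .iter()
        .filter(|name| !partition_cols.contains(*name))
        .cloned()
        .collect()
}

fn main() {
    let file_schema = cols(&["id", "value"]); // hypothetical file columns
    let partition_cols = cols(&["year", "month"]);

    // Building the table once yields a schema without duplicates.
    let table_schema = append_partition_cols(&file_schema, &partition_cols);
    assert_eq!(first_duplicate(&table_schema), None);

    // Appending the partition columns to the full table schema duplicates them.
    let rebuilt = append_partition_cols(&table_schema, &partition_cols);
    assert_eq!(first_duplicate(&rebuilt), Some("year"));

    // Stripping the partition columns first restores the original schema.
    let stripped = strip_partition_cols(&table_schema, &partition_cols);
    assert_eq!(append_partition_cols(&stripped, &partition_cols), table_schema);
}
```

In this model the first duplicated name is `year`, the same field reported in the error above. Whether the real code path behaves this way still needs to be confirmed against the serialisation code.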

We should probably add a new test to cover this case at

async fn roundtrip_custom_listing_tables() -> Result<()> {

More details about the original bug can be found at apache/datafusion-ballista#1239.

Labels: bug, good first issue