SELECT ... ORDER BY query fails on data with int64 timestamp and timezone field #959

@sergiimk

Describe the bug
When trying to query a Parquet file produced by Apache Flink I get an error:

ArrowError(InvalidArgumentError("column types must match schema types, expected Timestamp(Millisecond, Some(\"UTC\")) but found Timestamp(Millisecond, None) at column index 0"))

Output of Java parquet-schema:

message Row {
  optional int64 system_time (TIMESTAMP(MILLIS,true));
  optional int64 reported_date (TIMESTAMP(MILLIS,true));
  optional binary province (STRING);
  optional int64 total_daily;
}
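
A possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the `true` flag in `TIMESTAMP(MILLIS,true)` marks the columns as UTC-adjusted, so the schema carries a `UTC` timezone, while some step in the ORDER BY path seems to produce arrays without one. The timezone is part of the Arrow timestamp type, so the two types compare as unequal. The sketch below models that comparison with simplified stand-in types; `DataType`, `TimeUnit`, `check_column`, `utc_millis`, and `naive_millis` are hypothetical names for illustration and are not the real arrow API.

```rust
/// Simplified stand-in for Arrow's time unit (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum TimeUnit {
    Millisecond,
}

/// Simplified stand-in for Arrow's data type: the optional timezone
/// is part of the type, so it takes part in equality checks.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Timestamp(TimeUnit, Option<String>),
}

/// Hypothetical check mirroring the kind of validation that raises the error:
/// a column's type must equal the type declared in the schema.
pub fn check_column(expected: &DataType, found: &DataType, index: usize) -> Result<(), String> {
    if expected == found {
        Ok(())
    } else {
        Err(format!(
            "column types must match schema types, expected {:?} but found {:?} at column index {}",
            expected, found, index
        ))
    }
}

/// Type declared by the schema: millisecond timestamp adjusted to UTC.
pub fn utc_millis() -> DataType {
    DataType::Timestamp(TimeUnit::Millisecond, Some("UTC".to_string()))
}

/// Same unit, but the timezone has been dropped.
pub fn naive_millis() -> DataType {
    DataType::Timestamp(TimeUnit::Millisecond, None)
}
```

Under these assumptions, `check_column(&utc_millis(), &utc_millis(), 0)` succeeds, while `check_column(&utc_millis(), &naive_millis(), 0)` returns an error, even though both columns store the same int64 millisecond values.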

To Reproduce
Download and extract the sample data: data.tar.gz.

Run:

use datafusion::arrow::util::pretty::print_batches;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let mut ctx = ExecutionContext::new();
    ctx.register_parquet("test", "flink.parquet")?;
    let df = ctx.table("test")?;

    // A plain SELECT works:
    // let df = ctx.sql("select * from test")?;

    // Adding ORDER BY triggers the error:
    let df = ctx.sql("select * from test order by reported_date desc")?;

    let records = df.collect().await?;
    print_batches(&records)?;
    Ok(())
}

Note that a simple SELECT works fine, but the same query with ORDER BY fails.

Expected behavior
Query executes without errors.
