Closed
Description
Describe the bug
When trying to query a Parquet file produced by Apache Flink I get an error:
ArrowError(InvalidArgumentError("column types must match schema types, expected Timestamp(Millisecond, Some(\"UTC\")) but found Timestamp(Millisecond, None) at column index 0"))
Output of Java parquet-schema:
message Row {
  optional int64 system_time (TIMESTAMP(MILLIS,true));
  optional int64 reported_date (TIMESTAMP(MILLIS,true));
  optional binary province (STRING);
  optional int64 total_daily;
}
To Reproduce
Download and extract the sample data: data.tar.gz.
Run:
use datafusion::arrow::util::pretty::print_batches;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let mut ctx = ExecutionContext::new();
    ctx.register_parquet("test", "flink.parquet")?;
    let df = ctx.table("test")?;
    //let df = ctx.sql("select * from test")?;
    let df = ctx.sql("select * from test order by reported_date desc")?;
    let records = df.collect().await?;
    print_batches(&records)?;
    Ok(())
}
Note that a simple SELECT works fine, but ORDER BY fails.
Expected behavior
Query executes without errors.
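For illustration, here is a minimal, self-contained model of the kind of check that seems to produce the error above. It assumes the failure comes from a strict equality comparison between the declared schema type and the actual column type, so a timestamp with timezone `Some("UTC")` does not match one with `None`. The `TimeUnit` and `DataType` enums and the `check_columns` function are hypothetical stand-ins written for this sketch, not the real Arrow or DataFusion API.

```rust
// Simplified stand-ins for Arrow's type enums; NOT the real `arrow` crate types.
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Millisecond,
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    // The optional timezone takes part in equality, which is the crux here.
    Timestamp(TimeUnit, Option<String>),
}

/// Hypothetical validation: every column type must equal the schema type exactly.
/// Returns an error naming the first mismatching column index.
fn check_columns(schema: &[DataType], columns: &[DataType]) -> Result<(), String> {
    for (i, (expected, found)) in schema.iter().zip(columns).enumerate() {
        if expected != found {
            return Err(format!(
                "column types must match schema types, expected {:?} but found {:?} at column index {}",
                expected, found, i
            ));
        }
    }
    Ok(())
}
```

Calling `check_columns` with a schema whose first field is `Timestamp(Millisecond, Some("UTC"))` and a column of type `Timestamp(Millisecond, None)` returns an `Err` for column index 0, even though the two types differ only in timezone metadata. Under this assumption, a plain SELECT could succeed while ORDER BY fails if only the sort path builds a new batch and drops the timezone on the way.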