-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Closed
apache/arrow-rs
#2814Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
Given a TimestampNanosecondArray which pretty-prints as follows:
// +---------------------+
// | time |
// +---------------------+
// | 2021-07-20 23:28:50 |
// | 2021-07-20 23:30:30 |
// +---------------------+
Queries involving a predicate such as time < to_timestamp('2021-07-20 23:29:30') do not filter any rows (even though they should filter the row with 2021-07-20 23:30:30)
To Reproduce
async fn datafusion_reproducer() {
let array = TimestampNanosecondArray::from(vec![1626823730000000000, 1626823830000000000]);
let array: ArrayRef = Arc::new(array);
println!("array[0]: {:?}", array_value_to_string(&array, 0).unwrap());
println!("array[1]: {:?}", array_value_to_string(&array, 1).unwrap());
let batch = RecordBatch::try_from_iter(vec![("time", array)]).unwrap();
let table = MemTable::try_new(batch.schema(), vec![vec![batch]]).unwrap();
let table = Arc::new(table);
// select * from t
// +---------------------+
// | time |
// +---------------------+
// | 2021-07-20 23:28:50 |
// | 2021-07-20 23:30:30 |
// +---------------------+
run_query(table.clone(), "select * from t").await;
// Using the following predicate should result in a single row,
// but instead results in both
//
// select * from t where time < to_timestamp('2021-07-20 23:29:30')
// +---------------------+
// | time |
// +---------------------+
// | 2021-07-20 23:28:50 |
// | 2021-07-20 23:30:30 |
// +---------------------+
run_query(table.clone(), "select * from t where time < to_timestamp('2021-07-20 23:29:30')").await;
// explain select * from t where time < to_timestamp('2021-07-20 23:29:30')
// +---------------+---------------------------------------------------------------+
// | plan_type | plan |
// +---------------+---------------------------------------------------------------+
// | logical_plan | Projection: #t.time |
// | | Filter: #t.time Lt TimestampNanosecond(1626838170000000000) |
// | | TableScan: t projection=Some([0]) |
// | physical_plan | ProjectionExec: expr=[time@0 as time] |
// | | CoalesceBatchesExec: target_batch_size=4096 |
// | | FilterExec: time@0 < 1626838170000000000 |
// | | RepartitionExec: partitioning=RoundRobinBatch(16) |
// | | MemoryExec: partitions=1, partition_sizes=[1] |
// +---------------+---------------------------------------------------------------+
run_query(table.clone(), "explain select * from t where time < to_timestamp('2021-07-20 23:29:30')").await;
}
#[allow(dead_code)]
async fn run_query(csvdata: Arc<dyn TableProvider>, query: &str) {
let mut ctx = ExecutionContext::new();
ctx.register_table("t", csvdata).unwrap();
let results = ctx.sql(query)
.unwrap()
.collect()
.await
.unwrap();
let pretty = pretty_format_batches(&results).unwrap();
println!("{}\n{}", query, pretty);
}Expected behavior
The query should produce a single row with timestamp 2021-07-20 23:28:50
However the actual query returns both rows
Additional context
We saw this in IOx: https://github.com/influxdata/influxdb_iox/issues/2071
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working