Describe the bug
When a RecordBatch is written via IPC and the stream is then read from Python, the dates are not correct.
I have a date column ranging from 2000-02-01 to 2021-11-02 on the Rust side, but after the IPC stream is read into a pyarrow table I see dates from 1998-01-16 to 2019-11-12. The dates are stored in a Date64Array.
Writing the same RecordBatch as CSV displays the correct dates.
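To help narrow this down, here is a minimal standard-library sketch of what the raw values should look like, assuming Date64 stores milliseconds since the Unix epoch as described in the Arrow format. The helper names `date64_ms` and `from_date64_ms` are hypothetical and not part of any Arrow API. It also computes how far the observed dates are from the expected ones.

```python
from datetime import date, timedelta

MS_PER_DAY = 86_400_000
EPOCH = date(1970, 1, 1)


def date64_ms(d: date) -> int:
    """Hypothetical helper: milliseconds since the Unix epoch for midnight of `d`."""
    return (d - EPOCH).days * MS_PER_DAY


def from_date64_ms(ms: int) -> date:
    """Hypothetical helper: inverse of date64_ms, truncating to whole days."""
    return EPOCH + timedelta(days=ms // MS_PER_DAY)


# Dates written on the Rust side and dates seen in pyarrow (from the report).
expected = [date(2000, 2, 1), date(2021, 11, 2)]
observed = [date(1998, 1, 16), date(2019, 11, 12)]

# Raw int64 values a Date64 column would be expected to hold for the written dates.
expected_raw = [date64_ms(d) for d in expected]

# Shift in days between what was written and what was read back.
shift_days = [(e - o).days for e, o in zip(expected, observed)]

print(expected_raw)  # [949363200000, 1635811200000]
print(shift_days)    # [746, 721]
```

The shift is not the same at both ends of the range (746 days versus 721 days), so this does not look like a single fixed offset. This sketch does not identify the cause; comparing the raw int64 values on the Rust side and on the pyarrow side against the expected values above may show where they diverge.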
To Reproduce
I can't really share the full code, but here is the Rust -> Python part (using pyo3).
fn df<'a>(py: Python<'a>, batch: &RecordBatch) -> Result<&'a PyAny, Error> {
    // Rust side: write the batch to an in-memory IPC stream
    let buf: Vec<u8> = Vec::new();
    let mut writer = StreamWriter::try_new(buf, &batch.schema())?;
    writer.write(batch)?;
    writer.finish()?;
    let buf = writer.into_inner()?;

    // Python side: read the IPC stream back with pyarrow
    let bytes = PyBytes::new(py, &buf);
    let locals = PyDict::new(py);
    locals.set_item("buf", bytes)?;
    locals.set_item("pa", PyModule::import(py, "pyarrow")?)?;
    locals.set_item("pd", PyModule::import(py, "pandas")?)?;
    py.run(
        r#"
reader = pa.ipc.open_stream(buf)
batches = [b for b in reader]
table = pa.Table.from_batches(batches) # table.column("date") shows wrong values
df = table.to_pandas(split_blocks=True, self_destruct=True)
df["date"] = pd.to_datetime(df["date"])
df.set_index("date", inplace=True)
"#,
        Some(&locals),
        None,
    )?;
    Ok(locals.get_item("df").unwrap())
}
Expected behavior
The dates read in Python should be the same as the dates written from Rust.
Additional context
I am new to Arrow, so there is a high chance I'm doing something wrong.