IPC date serialization issue #902

@tafia

Describe the bug

When writing a RecordBatch via IPC (and then reading the stream from Python), the dates are not correct.
I have a date column ranging from 2000-02-01 to 2021-11-02 on the Rust side, but once the IPC stream is read into a pyarrow table, I see dates from 1998-01-16 to 2019-11-12. The dates are stored in a Date64Array.
Writing the same RecordBatch as CSV displays the correct dates.
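
To make the raw values easier to compare on both sides, here is a small stdlib-only sketch. It assumes the Arrow definition of Date64 (milliseconds since the UNIX epoch, expected to be a whole number of days); the helper names `date_to_date64` and `date64_to_date` are made up for illustration and are not part of any Arrow API.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
MS_PER_DAY = 86_400_000  # Date64 counts milliseconds since the UNIX epoch


def date_to_date64(d: date) -> int:
    """Hypothetical helper: the raw int64 a Date64 slot should hold for `d`."""
    return (d - EPOCH).days * MS_PER_DAY


def date64_to_date(ms: int) -> date:
    """Hypothetical helper: decode a raw Date64 value back to a calendar date."""
    return EPOCH + timedelta(milliseconds=ms)


# Dates from the report: what was written in Rust vs. what pyarrow showed.
expected = [date(2000, 2, 1), date(2021, 11, 2)]
observed = [date(1998, 1, 16), date(2019, 11, 12)]

# Shift in days between the written and the observed date, per endpoint.
shifts = [(e - o).days for e, o in zip(expected, observed)]

for e, o, s in zip(expected, observed, shifts):
    print(f"{e} -> raw {date_to_date64(e)}; observed {o} -> raw {date_to_date64(o)}; shift {s} days")
```

With the dates above, the two endpoints are shifted by 746 and 721 days respectively, so the error is not a constant offset. Comparing these raw values against the actual int64 contents of the array before writing and after reading may help narrow down where the values change.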

To Reproduce

I can't really share the full code, but here is the Rust -> Python part (using pyo3).

fn df<'a>(py: Python<'a>, batch: &RecordBatch) -> Result<&'a PyAny, Error> {
    // rust ipc write
    let buf: Vec<u8> = Vec::new();
    let mut writer = StreamWriter::try_new(buf, &batch.schema())?;
    writer.write(batch)?;
    writer.finish()?;
    let buf = writer.into_inner()?;

    // python reading
    let bytes = PyBytes::new(py, &buf);
    let locals = PyDict::new(py);
    locals.set_item("buf", bytes)?;
    locals.set_item("pa", PyModule::import(py, "pyarrow")?)?;
    locals.set_item("pd", PyModule::import(py, "pandas")?)?;
    py.run(
        r#"
reader = pa.ipc.open_stream(buf)
batches = [b for b in reader]
table = pa.Table.from_batches(batches) # table.column("date") shows wrong values
df = table.to_pandas(split_blocks=True, self_destruct=True)
df["date"] = pd.to_datetime(df["date"])
df.set_index("date", inplace=True)
    "#,
        Some(&locals),
        None,
    )?;

    Ok(locals.get_item("df").unwrap())
}

Expected behavior

The dates read in Python should be the same as the dates written from Rust.

Additional context

I am new to Arrow, so there is a high chance I'm doing something wrong.
