Description
Environment
Delta-rs version: 0.13.0
Binding: python bindings
Environment:
- Cloud provider: n/a
- OS: ubuntu 22.04
- Other: testing locally via jupyter notebook
Bug
What happened:
When a string predicate is passed to DeltaTable.delete, it appears to be parsed into a DataFusion expression without taking the table schema into account. For example, if a column has type pa.int32() and you use a predicate like price = 100, it raises ValueError: Invalid comparison operation: Int32 <= Int64. I assume this is because the literal 100 is parsed as an int64. Supporting this, passing price = CAST(100 as INT) instead works as expected.
What you expected to happen:
The parser should be schema-aware when converting the string predicate to a DataFusion expression, so that the literal 100 is coerced to the column's type (int32 here).
How to reproduce it:
This is a minimal reproduction:
import tempfile

import pandas as pd
import pyarrow as pa
from deltalake import DeltaTable
from deltalake.writer import write_deltalake

pandas_data = pd.DataFrame.from_dict(
    {
        "price": [100, 150],
        "qty": [15, 20],
    }
)
schema = pa.schema(
    [
        ("price", pa.int32()),
        ("qty", pa.int32()),
    ]
)

with tempfile.TemporaryDirectory() as path:
    table = pa.Table.from_pandas(pandas_data, schema)
    write_deltalake(
        table_or_uri=path,
        data=table,
        mode="error",
    )
    delta_table = DeltaTable(path)
    # this does not work
    delta_table.delete(predicate="price = 100")
    # this works
    delta_table.delete(predicate="price = CAST(100 as INT)")
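
Until the parser is schema-aware, the CAST workaround can be generated rather than hand-written. Below is a minimal sketch of that idea. `cast_predicate` and `ARROW_TO_SQL` are hypothetical names and not part of deltalake, and only the int32 to INT pairing is confirmed by the reproduction above; the other SQL type names are my assumption about what the parser accepts.

```python
# Hypothetical helper (not part of deltalake): builds a predicate string whose
# integer literal is explicitly cast to the column's type.

# Assumed mapping from Arrow integer type names to SQL type names.
# Only "int32" -> "INT" is confirmed by the reproduction; the rest are assumptions.
ARROW_TO_SQL = {
    "int8": "TINYINT",
    "int16": "SMALLINT",
    "int32": "INT",
    "int64": "BIGINT",
}


def cast_predicate(column: str, op: str, value: int, arrow_type: str) -> str:
    """Return e.g. 'price = CAST(100 AS INT)' for an int32 column."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an integer literal, got {type(value).__name__}")
    if arrow_type not in ARROW_TO_SQL:
        raise ValueError(f"no SQL type mapping for Arrow type {arrow_type!r}")
    return f"{column} {op} CAST({value} AS {ARROW_TO_SQL[arrow_type]})"


predicate = cast_predicate("price", "=", 100, "int32")
print(predicate)
```

The resulting string could then be passed as `delta_table.delete(predicate=predicate)`. The Arrow type name could be taken from the schema in the reproduction, for example `str(schema.field("price").type)`.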