-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Describe the bug
date_bin('1 month', column1) does not work, but (confusingly) date_bin('1 day', column1) does
To Reproduce
In datafusion-cli:
❯ select date_bin('1 month', column1)
from (values
(timestamp '2022-01-01 00:00:00'),
(timestamp '2022-01-01 01:00:00'),
(timestamp '2022-01-02 00:00:00')
) as sq;
Error during planning: Coercion from [Interval(YearMonth), Timestamp(Nanosecond, None)] to the signature OneOf([Exact([Interval(DayTime), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None)]), Exact([Interval(DayTime), Timestamp(Nanosecond, None)])]) failed.Note using 1 hour works great:
select date_bin('1 hour', column1)
from (values
(timestamp '2022-01-01 00:00:00'),
(timestamp '2022-01-01 01:00:00'),
(timestamp '2022-01-02 00:00:00')
) as sq;
+------------------------------------+
| datebin(Utf8("1 hour"),sq.column1) |
+------------------------------------+
| 2022-01-01T00:00:00 |
| 2022-01-01T01:00:00 |
| 2022-01-02T00:00:00 |
+------------------------------------+
3 rows in set. Query took 0.001 seconds.Expected behavior
I expect date_bin('1 month') to work and return the same value for all three rows
| 2022-01-01T00:00:00 |
Workarounds
One can use DATE_TRUNC('month',time) to get monthly aggregates. It doesn't work with intervals of multiple months though. (thanks to our user for this idea!)
Additional context
This was reported by an IOx user
If you look at the error, the problem is that Interval(YearMonth) has a different unit than Interval(DayTime)
That happens because the code that parses '1 hour' (source link) determines it should be Interval(DayTime) but that '1 month' is Interval(YearMonth) (as 1 month can not correctly be represented as a Interval(DayTime)).
Thanfully, the Interval(MonthDayNano) type (doc link) was introduced to solve this problem and can represent the full range.
Thus my desired solution is:
- Change the signature of date_bin to take a
Interval(MonthDayNano) - Change the parsing of literal intervals to produce
Interval(MonthDayNano)always -- see Support::intervalcasting /INTERVALSQL datatype #5651 - Add equivalent coercion rules for Interval(MonthDayNano)