Skip to content

date_bin doesn't work with months or years #5689

@alamb

Description

@alamb

Describe the bug

date_bin('1 month', column1) does not work, but (confusingly) date_bin('1 day', column1) does

To Reproduce

In datafusion-cli:

select date_bin('1 month', column1)
from (values
  (timestamp '2022-01-01 00:00:00'),
  (timestamp '2022-01-01 01:00:00'),
(timestamp '2022-01-02 00:00:00')
) as sq;
Error during planning: Coercion from [Interval(YearMonth), Timestamp(Nanosecond, None)] to the signature OneOf([Exact([Interval(DayTime), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None)]), Exact([Interval(DayTime), Timestamp(Nanosecond, None)])]) failed.

Note using 1 hour works great:

select date_bin('1 hour', column1) 
from (values 
  (timestamp '2022-01-01 00:00:00'), 
  (timestamp '2022-01-01 01:00:00'), 
(timestamp '2022-01-02 00:00:00')
) as sq;

+------------------------------------+
| datebin(Utf8("1 hour"),sq.column1) |
+------------------------------------+
| 2022-01-01T00:00:00                |
| 2022-01-01T01:00:00                |
| 2022-01-02T00:00:00                |
+------------------------------------+
3 rows in set. Query took 0.001 seconds.

Expected behavior

I expect date_bin('1 month') to work and return the same value for all three rows

| 2022-01-01T00:00:00                |

Workarounds

One can use DATE_TRUNC('month',time) to get monthly aggregates. It doesn't work with intervals of multiple months though. (thanks to our user for this idea!)

Additional context

This was reported by an IOx user

If you look at the error, the problem is that Interval(YearMonth) has a different unit than Interval(DayTime)

That happens because the code that parses '1 hour' (source link) determines it should be Interval(DayTime) but that '1 month' is Interval(YearMonth) (as 1 month can not correctly be represented as a Interval(DayTime)).

Thanfully, the Interval(MonthDayNano) type (doc link) was introduced to solve this problem and can represent the full range.

Thus my desired solution is:

  1. Change the signature of date_bin to take a Interval(MonthDayNano)
  2. Change the parsing of literal intervals to produce Interval(MonthDayNano) always -- see Support ::interval casting / INTERVAL SQL datatype #5651
  3. Add equivalent coercion rules for Interval(MonthDayNano)

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions