Describe the bug
I'm not sure I'd call it a "bug" per se, but all of the to_timestamp*() functions output Arrow Timestamp types whose timezone field is None. The issues here are:
- There is currently no way to specify a timezone to convert to
- Timestamp types with different timezones are not comparable / compatible types. For example, if I have data with a timezone of Some("UTC"), the following fails due to incompatible types: WHERE timestamp_col > to_timestamp("2021-06-21T12:00Z") (because timestamp_col has Some("UTC") but to_timestamp returns None)
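
To make the mismatch concrete, here is a minimal sketch of why the two sides do not line up. The `TimeUnit` and `TimestampType` definitions below are simplified stand-ins written for this example, not the real `arrow` crate types, and `same_type` is a hypothetical helper that mimics a strict type-equality check.

```rust
// Simplified stand-ins for Arrow's timestamp type (assumption: NOT the real
// arrow crate API, just enough structure to show the mismatch).
#[derive(Debug, Clone, PartialEq)]
pub enum TimeUnit {
    Millisecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TimestampType {
    pub unit: TimeUnit,
    pub timezone: Option<String>,
}

/// Hypothetical strict check: two timestamp types are only treated as the
/// same type when both the unit and the timezone field are identical.
pub fn same_type(lhs: &TimestampType, rhs: &TimestampType) -> bool {
    lhs == rhs
}

/// Type of a column whose data carries a UTC timezone.
pub fn utc_column_type() -> TimestampType {
    TimestampType {
        unit: TimeUnit::Nanosecond,
        timezone: Some("UTC".to_string()),
    }
}

/// Type of the value produced by to_timestamp(...), as described in this
/// issue: the timezone field is always None.
pub fn to_timestamp_output_type() -> TimestampType {
    TimestampType {
        unit: TimeUnit::Nanosecond,
        timezone: None,
    }
}
```

Under these assumptions, `same_type(&utc_column_type(), &to_timestamp_output_type())` is false even though both sides use the same unit, which is the shape of the failure in the WHERE clause above.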
I'm aware there was a recent vote to treat the None timestamp like a local timestamp, but that interpretation does not always hold either. For example, we have been using the None-timezone timestamps that DataFusion outputs, but we treat all our timestamps as UTC internally.
To Reproduce
Steps to reproduce the behavior:
I can create a PR with a test that shows the above comparison failing. Take some data whose timestamp column has a UTC timezone and run the comparison above.
Expected behavior
The best solution I can think of would be for to_timestamp(...) to support a second, optional argument where the timezone can be specified. This allows for casting/conversion and generation of timestamps with a specific timezone requirement. There are some technical challenges which can be overcome:
- The current code in datetime_expressions.rs, for example, uses macros which assume a static, const type, which makes it difficult to pass in a variable timezone argument
- The function signature checks would be really complicated
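
As a rough sketch of what the signature check and return-type resolution could look like with an optional second argument: everything below is hypothetical (`ScalarArg`, `ResolvedTimestamp`, and `to_timestamp_return_type` are names invented for this example and are not DataFusion APIs). The point it illustrates is that the return type would depend on the value of an argument, not only on the argument types, which is what a macro assuming a const type cannot express.

```rust
/// Hypothetical literal argument passed to to_timestamp.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarArg {
    Utf8(String),
    Int64(i64),
}

/// Hypothetical resolved return type: a timestamp with an optional timezone.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedTimestamp {
    pub timezone: Option<String>,
}

/// Resolves the return type from the arguments.
/// - one argument: current behavior, timezone is None
/// - two arguments: the second must be a string naming the timezone
pub fn to_timestamp_return_type(args: &[ScalarArg]) -> Result<ResolvedTimestamp, String> {
    match args {
        // to_timestamp(value)
        [_value] => Ok(ResolvedTimestamp { timezone: None }),
        // to_timestamp(value, 'UTC')
        [_value, ScalarArg::Utf8(tz)] => Ok(ResolvedTimestamp {
            timezone: Some(tz.clone()),
        }),
        // Second argument present but not a string.
        [_value, other] => Err(format!(
            "timezone argument must be a string, got {:?}",
            other
        )),
        // Zero arguments, or more than two.
        _ => Err(format!(
            "to_timestamp expects 1 or 2 arguments, got {}",
            args.len()
        )),
    }
}
```

With this shape, a call with a single argument keeps today's None timezone, while a call with a second argument of "UTC" resolves to a type that would compare cleanly against a Some("UTC") column. This sketch does not validate that the timezone name is a real timezone.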
There are other "hacks" which are possible:
- For our use case, Some("UTC") is equivalent to None, so we could make Some("UTC") cast to None and vice versa, but this would lose timezone information for many people.
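
For illustration only, the "hack" could be sketched as a normalization step applied before comparing timezone fields. `normalize_timezone` and `timezones_compatible` are hypothetical helpers, and matching "UTC" case-insensitively is an assumption made for this example.

```rust
/// Hypothetical helper: collapses an explicit "UTC" timezone to None so that
/// Some("UTC") and None end up being treated as the same timezone.
pub fn normalize_timezone(tz: Option<&str>) -> Option<&str> {
    match tz {
        Some(name) if name.eq_ignore_ascii_case("UTC") => None,
        other => other,
    }
}

/// Two timezone fields are considered compatible when they are equal after
/// normalization. Any other pair of differing timezones stays incompatible.
pub fn timezones_compatible(lhs: Option<&str>, rhs: Option<&str>) -> bool {
    normalize_timezone(lhs) == normalize_timezone(rhs)
}
```

The drawback is visible in the sketch itself: after normalization there is no way to tell a timestamp that really was UTC from one that had no timezone at all, which is a problem for anyone who treats None as local time.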