-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix discrepancy in Float64 to timestamp(9) casts for constants #16639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix discrepancy in Float64 to timestamp(9) casts for constants #16639
Conversation
Before the change, when casting `Float64` value to `Timestamp(Nanosecond, None)`, the result would depend on whether the source value is constant-foldable scalar. This is because `ScalarValue.cast_to` had a special treatment for that source & destination type pair, producing a different result from the canonical one.
cc @Omega359 @spaydar per #16636 (comment) |
@findepi
what is the expected output ? |
@jatin510 thanks for feedback. however, i am not really changing |
I just did a comparison of to_timestamp calls that were in the slt file that were changed to what postgresql generates and they are different: SELECT to_timestamp(1.1) as a, to_timestamp(-1.1) as b, to_timestamp(0.0) as c, to_timestamp(1.23456789) as d, to_timestamp(123456789.123456789) as e;
|
@Omega359 thanks. This totally escaped my notice that to_timestamp also changed, not only the casts (that are part of the same test). |
the function was not meant to be changed
83c9c5e
to
5c4a062
Compare
That strikes me as wrong. |
@Omega359 i am not exactly fan of the cast semantics. If i were to choose, i would maybe choose it to be different. Consider
from these 16 combinations, this PR changes only one, to make it consistent with other 15 |
Ok. Well considering I couldn't even figure out how to cast a float to a timestamp in postgres without using
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @findepi -- I think we are close
SELECT to_timestamp(1.1) as c1, cast(1.1 as timestamp) as c2, 1.1::timestamp as c3; | ||
---- | ||
1970-01-01T00:00:01.100 1970-01-01T00:00:01.100 1970-01-01T00:00:01.100 | ||
1970-01-01T00:00:01.100 1970-01-01T00:00:00.000000001 1970-01-01T00:00:00.000000001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I read this difference right as now cast(float_col AS timestamp)
is treated as though it is 1.1 ns where as before it was treated as 1.1
sec?
What do we think about simply not supporting explicit conversion from float --> timestamp to follow the duckdb/postgres model. That feels far more defensible to me than this behavior which is both different than it was previously AND not consistent with other engines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both different than it was previously
it's NOT!
it's different ONLY in constant folding context.
BTW to_timestamp(1.1)
works the way it seems to work on constant folding context only too, i filed #16678 for this, but this PR fixes this problem as well.
What do we think about simply not supporting explicit conversion from float --> timestamp
I am supportive of that. Same for decimals and perhaps ints.
But it's definitely more work and more controversial change.
So we should first fix #16636, #16531 and #16678 which are code bugs bringing embarrassment to the project and data corruption to the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you, I double checked and reminded myself of what was going on:
DataFusion CLI v48.0.0
> select cast(1.1 as timestamp);
+-------------------------+
| Float64(1.1) |
+-------------------------+
| 1970-01-01T00:00:01.100 |
+-------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.
> select cast(column1 as timestamp) from values (1.1);
+-------------------------------+
| column1 |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I tried to capture this in the linked issue #16636
Thanks for bring this example here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me -- thank you @findepi
SELECT to_timestamp(1.1) as c1, cast(1.1 as timestamp) as c2, 1.1::timestamp as c3; | ||
---- | ||
1970-01-01T00:00:01.100 1970-01-01T00:00:01.100 1970-01-01T00:00:01.100 | ||
1970-01-01T00:00:01.100 1970-01-01T00:00:00.000000001 1970-01-01T00:00:00.000000001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you, I double checked and reminded myself of what was going on:
DataFusion CLI v48.0.0
> select cast(1.1 as timestamp);
+-------------------------+
| Float64(1.1) |
+-------------------------+
| 1970-01-01T00:00:01.100 |
+-------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.
> select cast(column1 as timestamp) from values (1.1);
+-------------------------------+
| column1 |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.
Thanks again @findepi |
…e#16639) * Fix discrepancy in Float64 to timestamp(9) casts Before the change, when casting `Float64` value to `Timestamp(Nanosecond, None)`, the result would depend on whether the source value is constant-foldable scalar. This is because `ScalarValue.cast_to` had a special treatment for that source & destination type pair, producing a different result from the canonical one. * Test Float32 cast to timestamp ntz too * restore to_timestamp(double) behavior the function was not meant to be changed (cherry picked from commit 4e32ab9)
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #16678. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> The issue has been fixed in #16639, this PR just adds a testcase for it. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Add a test case for `to_timestamp(double)` with vectorized input. Similar to the one presented in the issue. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> No
Before the change, when casting
Float64
value toTimestamp(Nanosecond, None)
, the result would depend on whether the source value is constant-foldable scalar. This is becauseScalarValue.cast_to
had a special treatment for that source & destination type pair, producing a different result from the canonical one.So, this is not really changing
cast(a_double as timestamp)
cast. It is only changing a specialized and incorrect code path that's taken by that cast under very specific and narrow circumstances (constant folding and only for nano precision). This should be a non-controversial change.