Skip to content

Conversation

findepi
Copy link
Member

@findepi findepi commented Jul 1, 2025

Before the change, when casting Float64 value to Timestamp(Nanosecond, None), the result would depend on whether the source value is constant-foldable scalar. This is because ScalarValue.cast_to had a special treatment for that source & destination type pair, producing a different result from the canonical one.

So, this is not really changing cast(a_double as timestamp) cast. It is only changing a specialized and incorrect code path that's taken by that cast under very specific and narrow circumstances (constant folding and only for nano precision). This should be a non-controversial change.

Before the change, when casting `Float64` value to
`Timestamp(Nanosecond, None)`, the result would depend on whether the
source value is constant-foldable scalar. This is because
`ScalarValue.cast_to` had a special treatment for that source &
destination type pair, producing a different result from the canonical
one.
@findepi
Copy link
Member Author

findepi commented Jul 1, 2025

cc @chenkovsky @alamb @jatin510

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Jul 1, 2025
@findepi
Copy link
Member Author

findepi commented Jul 1, 2025

cc @Omega359 @spaydar per #16636 (comment)

@jatin510
Copy link
Contributor

jatin510 commented Jul 1, 2025

@findepi
How duckdb works in case of decimal/float input:

D  SELECT to_timestamp(1.1) as c1;
┌─────────────────────────────┐
│             c1              │
│  timestamp with time zone   │
├─────────────────────────────┤
│ 1970-01-01 05:30:01.1+05:30 │
└─────────────────────────────┘

what is the expected output ?

@findepi
Copy link
Member Author

findepi commented Jul 1, 2025

@jatin510 thanks for feedback.
to_timestamp(a_double), to_timestamp_seconds(a_double) and cast(a_double as timestamp) are 3 different things that do not have to behave the same. it's nice if they do

however,
cast(a_double as timestamp) must behave the same as cast(a_double as timestamp)... which clearly is a problem now: #16636
moreover, cast(a_double as timestamp(p)) must behave reasonably consistently for various p values (another clear problem today)
also cast from a double (Float64) must behave consistently with a cast from a float (Float32) (will add added test coverage for that)

i am not really changing cast(a_double as timestamp) cast. I am only changing a specialized incorrect code path that's taken by that cast under very specific and narrow cirumstances. this should be a non-controversial

@Omega359
Copy link
Contributor

Omega359 commented Jul 2, 2025

I just did a comparison of to_timestamp calls that were in the slt file that were changed to what postgresql generates and they are different:

SELECT to_timestamp(1.1) as a, to_timestamp(-1.1) as b, to_timestamp(0.0) as c, to_timestamp(1.23456789) as d, to_timestamp(123456789.123456789) as e;
a b c d e
1970-01-01 00:00:01.100000 +00:00 1969-12-31 23:59:58.900000 +00:00 1970-01-01 00:00:00.000000 +00:00 1970-01-01 00:00:01.234568 +00:00 1973-11-29 21:33:09.123457 +00:00

@findepi
Copy link
Member Author

findepi commented Jul 2, 2025

@Omega359 thanks. This totally escaped my notice that to_timestamp also changed, not only the casts (that are part of the same test).

@github-actions github-actions bot added the functions Changes to functions implementation label Jul 2, 2025
the function was not meant to be changed
@findepi findepi force-pushed the findepi/fix-discrepancy-in-float64-to-timestamp-9-casts-d3834c branch from 83c9c5e to 5c4a062 Compare July 2, 2025 19:19
@findepi
Copy link
Member Author

findepi commented Jul 2, 2025

I now realized that's exactly what @jatin510 pointed our earlier, just i didn't understand.
I pushed a commit restoring the to_timestamp(double) behavior to whatever it was before.

@Omega359 @jatin510 thanks again

@Omega359
Copy link
Contributor

Omega359 commented Jul 2, 2025

cast(123456789.123456789 as timestamp) => 1970-01-01T00:00:00.123456789

That strikes me as wrong.

@findepi
Copy link
Member Author

findepi commented Jul 2, 2025

@Omega359 i am not exactly fan of the cast semantics. If i were to choose, i would maybe choose it to be different.
Note that it's pre-existing though:

Consider

  • cast source type float 32, float 64
  • cast target type timestamp seconds / millis / micros / nanos
  • in scalar (constant folding) or query context

from these 16 combinations, this PR changes only one, to make it consistent with other 15

@Omega359
Copy link
Contributor

Omega359 commented Jul 2, 2025

Ok. Well considering I couldn't even figure out how to cast a float to a timestamp in postgres without using to_timestamp and duckdb outright says it's not implemented (see below) I guess it's consistent (must be nanos?) - it's just going to surprise pretty much everyone.

Use ".open FILENAME" to reopen on a persistent database.
D select 1.1::timestamp;
Conversion Error:
Unimplemented type for cast (DECIMAL(2,1) -> TIMESTAMP)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this @findepi -- I think we are close

SELECT to_timestamp(1.1) as c1, cast(1.1 as timestamp) as c2, 1.1::timestamp as c3;
----
1970-01-01T00:00:01.100 1970-01-01T00:00:01.100 1970-01-01T00:00:01.100
1970-01-01T00:00:01.100 1970-01-01T00:00:00.000000001 1970-01-01T00:00:00.000000001
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I read this difference right as now cast(float_col AS timestamp) is treated as though it is 1.1 ns where as before it was treated as 1.1 sec?

What do we think about simply not supporting explicit conversion from float --> timestamp to follow the duckdb/postgres model. That feels far more defensible to me than this behavior which is both different than it was previously AND not consistent with other engines

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both different than it was previously

it's NOT!

it's different ONLY in constant folding context.

BTW to_timestamp(1.1) works the way it seems to work on constant folding context only too, i filed #16678 for this, but this PR fixes this problem as well.

What do we think about simply not supporting explicit conversion from float --> timestamp

I am supportive of that. Same for decimals and perhaps ints.

But it's definitely more work and more controversial change.
So we should first fix #16636, #16531 and #16678 which are code bugs bringing embarrassment to the project and data corruption to the users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you, I double checked and reminded myself of what was going on:

DataFusion CLI v48.0.0
> select cast(1.1 as timestamp);
+-------------------------+
| Float64(1.1)            |
+-------------------------+
| 1970-01-01T00:00:01.100 |
+-------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

> select cast(column1 as timestamp) from values (1.1);
+-------------------------------+
| column1                       |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I tried to capture this in the linked issue #16636
Thanks for bring this example here.

@alamb alamb changed the title Fix discrepancy in Float64 to timestamp(9) casts Fix discrepancy in Float64 to timestamp(9) casts for constants Jul 17, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me -- thank you @findepi

SELECT to_timestamp(1.1) as c1, cast(1.1 as timestamp) as c2, 1.1::timestamp as c3;
----
1970-01-01T00:00:01.100 1970-01-01T00:00:01.100 1970-01-01T00:00:01.100
1970-01-01T00:00:01.100 1970-01-01T00:00:00.000000001 1970-01-01T00:00:00.000000001
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you, I double checked and reminded myself of what was going on:

DataFusion CLI v48.0.0
> select cast(1.1 as timestamp);
+-------------------------+
| Float64(1.1)            |
+-------------------------+
| 1970-01-01T00:00:01.100 |
+-------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

> select cast(column1 as timestamp) from values (1.1);
+-------------------------------+
| column1                       |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+
1 row(s) fetched.
Elapsed 0.001 seconds.

@alamb alamb merged commit 4e32ab9 into apache:main Jul 17, 2025
29 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 17, 2025

Thanks again @findepi

@findepi findepi deleted the findepi/fix-discrepancy-in-float64-to-timestamp-9-casts-d3834c branch July 17, 2025 19:40
alexanderbianchi pushed a commit to alexanderbianchi/datafusion that referenced this pull request Jul 31, 2025
…e#16639)

* Fix discrepancy in Float64 to timestamp(9) casts

Before the change, when casting `Float64` value to
`Timestamp(Nanosecond, None)`, the result would depend on whether the
source value is constant-foldable scalar. This is because
`ScalarValue.cast_to` had a special treatment for that source &
destination type pair, producing a different result from the canonical
one.

* Test Float32 cast to timestamp ntz too

* restore to_timestamp(double) behavior

the function was not meant to be changed

(cherry picked from commit 4e32ab9)
github-merge-queue bot pushed a commit that referenced this pull request Oct 18, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #16678.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

The issue has been fixed in #16639, this PR just adds a testcase for it.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Add a test case for `to_timestamp(double)` with vectorized input.
Similar to the one presented in the issue.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

No
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Different result of double to timestamp(9) cast when source value is constant

4 participants