Skip to content

Conversation

@andre-cc-natzka
Copy link
Contributor

@andre-cc-natzka andre-cc-natzka commented Nov 21, 2022

Which issue does this PR close?

Closes #4311.

Rationale for this change

Currently DataFusion supports type coercion for (DataType::Time, DataType::Utf8) and (DataType::Date, DataType::Utf8) pairs, but is still missing type coercion for (DataType::Timestamp, DataType::Utf8).

What changes are included in this PR?

Two lines of code are added in the temporal_coercion function from datafusion/expr/src/type_coercion/binary.rs to account for type coercion for a (DataType::Timestamp, DataType::Utf8) pair. The output type is DataType::Timestamp(TimeUnit::Nanosecond, _), because it is the only time unit supported by Arrow, which forces us to change the original time unit to nanoseconds. Although not ideal, this is not a problem, as both the left and right hand side of an expression are converted into DataType::Timestamp(TimeUnit::Nanosecond, _) and the expression can be properly evaluated.

Are these changes tested?

Yes, corresponding test cases are added to the test function test_type_coercion from datafusion/expr/src/type_coercion/binary.rs.

Are there any user-facing changes?

No.

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions labels Nov 21, 2022
Copy link
Contributor

@waitingkuo waitingkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @andre-cc-natzka

here's my comments:

  1. rebase master, your time32/64 pr is merged, thanks again
  2. i'll suggest align the text with the prettyprint result (what you see in datafusion-cli)
    i.e.
    align the result for
select now()::text;
+-----------------------------------+
| now()                             |
+-----------------------------------+
| 2022-11-21 21:42:52.196874 +00:00 |
+-----------------------------------+

to

select now();
+----------------------------------+
| now()                            |
+----------------------------------+
| 2022-11-21T21:42:48.800795+00:00 |
+----------------------------------+

and

select now()::timestamp::text;
+----------------------------+
| now()                      |
+----------------------------+
| 2022-11-21 21:44:48.386022 |
+----------------------------+
1 row in set. Query took 0.003 seconds.

to

❯ select now()::timestamp;
+----------------------------+
| now()                      |
+----------------------------+
| 2022-11-21T21:44:52.195395 |
+----------------------------+
  1. delete the generated datafusion.rs in the proto folder, it's generated by the prost crate https://github.com/tokio-rs/prost

async fn test_cast_to_time() -> Result<()> {
let ctx = SessionContext::new();
let sql = "SELECT 0::TIME";
let sql = "SELECT 0::TIME64";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure why do we have this. but select 0::time64 doesn't work for me

IntervalMonthDayNanoValue interval_month_day_nano = 31;
StructValue struct_value = 32;
ScalarFixedSizeBinary fixed_size_binary_value = 34;
>>>>>>> upstream/master
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>>>>>> upstream/master

Comment on lines 773 to 786
message ScalarTime32Value {
oneof value {
int32 time32_second_value = 1;
int32 time32_millisecond_value = 2;
};
}

message ScalarTime64Value {
oneof value {
int64 time64_microsecond_value = 1;
int64 time64_nanosecond_value = 2;
};
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you should rebase master, the pr for time32/64 is merged

Comment on lines 817 to 818
// filtering with Time32 and Time64 types

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite sure what it is, i guess we need to delete ths

@alamb
Copy link
Contributor

alamb commented Nov 23, 2022

Marking this PR as draft as it still needs some work prior to passing CI

@alamb alamb marked this pull request as draft November 23, 2022 18:02
@github-actions github-actions bot removed the core Core DataFusion crate label Dec 5, 2022
@andre-cc-natzka
Copy link
Contributor Author

Hello @waitingkuo, @alamb. Thanks for your feedback!

It's been a long time, I apologize for that. I took some days of vacation actually. I am also sorry for the state of this PR, I should indeed have rebased master in advance. I have been through the process now and I believe the problems pointed out by @waitingkuo have been solved.

In fact, the changes I implemented here are very simple, and are limited to the ones in datafusion/expr/src/type_coercion/binary.rs, which aim at enabling type coercion to a timestamp (to nanosecond accuracy, since it is the only one supported by Arrow) when a string and a timestamp (with arbitrary precision) are provided. This could be very useful in my opinion.

Thank you again,
André.

@andre-cc-natzka andre-cc-natzka marked this pull request as ready for review December 5, 2022 09:38
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me. Thank you @andre-cc-natzka

I would recommend (perhaps as a follow on) to add some SQL level tests but since there is coverage here I think it looks good to me

@waitingkuo
Copy link
Contributor

hi @andre-cc-natzka thank you, it looks good to me.

I suggest that add some SQL level tests

e.g.

select (timestamp '2000-01-01T00:00:00') = '2000-01-01T00:00:00';
+-----------------------------------------------------------+
| Utf8("2000-01-01T00:00:00") = Utf8("2000-01-01T00:00:00") |
+-----------------------------------------------------------+
| true                                                      |
+-----------------------------------------------------------+
1 row in set. Query took 0.002 seconds.

and this should raise error until we could cast TimestampTz to Utf8

select (timestamptz '2000-01-01T00:00:00+00:00') = '2000-01-01T00:00:00+08:00';
NotImplemented("Unsupported CAST from Utf8 to Timestamp(Nanosecond, Some(\"+00:00\"))")

@alamb
Copy link
Contributor

alamb commented Dec 7, 2022

I plan to write some SQL level tests for this feature as they will help us in IOx.

@alamb alamb merged commit 8547fd8 into apache:master Dec 7, 2022
@alamb
Copy link
Contributor

alamb commented Dec 7, 2022

Thanks again @andre-cc-natzka -- here are some SQL level tests #4545

@ursabot
Copy link

ursabot commented Dec 7, 2022

Benchmark runs are scheduled for baseline = fbadebb and contender = 8547fd8. 8547fd8 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@andre-cc-natzka
Copy link
Contributor Author

Hello again @alamb , @waitingkuo. Thank you very much for your second revision and for merging the PR!

Thank you @alamb also for adding the SQL tests that test this new functionality, they look really nice!

All the best,
André.

@alamb
Copy link
Contributor

alamb commented Dec 8, 2022

Thank you for the contribution @andre-cc-natzka

@waitingkuo
Copy link
Contributor

thank you @andre-cc-natzka

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for type coercion for a (Timestamp, Utf8) pair

4 participants