-
Notifications
You must be signed in to change notification settings - Fork 1.7k
fix: Timestamp with timezone not considered join on
#8150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you @acking-you - can you please add a test for this behavior -- perhaps in https://github.com/apache/arrow-datafusion/blob/main/datafusion/sqllogictest/test_files/joins.slt ? |
👌 I'm done testing. |
4ba7a8b to
37564aa
Compare
|
|
||
| # test timestamp with timezone join on nanos datatype | ||
| query PPPPTPPPPT rowsort | ||
| SELECT * FROM test_timestamps_tz_table as t1 JOIN (SELECT * FROM test_timestamps_tz_table ) as t2 ON t1.nanos = t2.nanos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SELECT * FROM test_timestamps_tz_table as t1 JOIN (SELECT * FROM test_timestamps_tz_table ) as t2 ON t1.nanos = t2.nanos; | |
| SELECT * FROM test_timestamps_tz_table AS t1 JOIN test_timestamps_tz_table AS t2 ON t1.nanos = t2.nanos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original test on timestamp used as, do we need to change it to AS?
|
|
||
| # test timestamp with timezone join on micros datatype | ||
| query PPPPTPPPPT rowsort | ||
| SELECT * FROM test_timestamps_tz_table as t1 JOIN (SELECT * FROM test_timestamps_tz_table ) as t2 ON t1.micros = t2.micros |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SELECT * FROM test_timestamps_tz_table as t1 JOIN (SELECT * FROM test_timestamps_tz_table ) as t2 ON t1.micros = t2.micros | |
| SELECT * FROM test_timestamps_tz_table AS t1 JOIN test_timestamps_tz_table AS t2 ON t1.micros = t2.micros |
|
|
||
| # test timestamp with timezone join on millis datatype | ||
| query PPPPTPPPPT rowsort | ||
| SELECT * FROM test_timestamps_tz_table as t1 JOIN (SELECT * FROM test_timestamps_tz_table ) as t2 ON t1.millis = t2.millis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SELECT * FROM test_timestamps_tz_table as t1 JOIN (SELECT * FROM test_timestamps_tz_table ) as t2 ON t1.millis = t2.millis | |
| SELECT * FROM test_timestamps_tz_table AS t1 JOIN test_timestamps_tz_table AS t2 ON t1.millis = t2.millis |
join onjoin on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR can be affected by #8193
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @acking-you -- thank you very much 🙏
| DataType::Float32 => true, | ||
| DataType::Float64 => true, | ||
| DataType::Timestamp(time_unit, None) => match time_unit { | ||
| DataType::Timestamp(time_unit, _) => match time_unit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like there is no reason to check the timeunit either (as all branches are true)
However that is not something introduced in this PR
| ----TableScan: test_timestamps_tz_table projection=[nanos, micros, millis, secs, names] | ||
| physical_plan | ||
| CoalesceBatchesExec: target_batch_size=2 | ||
| --HashJoinExec: mode=Partitioned, join_type=Inner, on=[(millis@2, millis@2)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified that this test covers the code, and without the changes in this PR and it results in NestedLoopsJoin 👍
+ NestedLoopJoinExec: join_type=Inner, filter=millis@0 = millis@1
+ --RepartitionExec: partitioning=RoundRobinBatch(2), input_partitions=1
+ ----MemoryExec: partitions=1, partition_sizes=[1]
+ --MemoryExec: partitions=1, partition_sizes=[1]
Which issue does this PR close?
part of #8147
Rationale for this change
The join filters all use the ExtractEquijoinPredicate optimizer to make the eq expression the key for the join on, and then the presence of
join ondetermines whether to useHashJoinExecorNestedJoinExec. In this process,can _hashfunction determines whether an expression type can be hashed, but currently this function does not support timestamps with timezone.What changes are included in this PR?
Modified the
can_hashfunction to support timestamps with timezone.Are these changes tested?
Are there any user-facing changes?
No