-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Closed
Closed
Copy link
Labels
bugSomething isn't workingSomething isn't working
Description
Describe the bug
Timestamp types with timezone are not considered in Join's equivalent expression
To Reproduce
We have a test_table with a date field of type Timestamp(TimeUnit::Millisecond, Some("UTC".into())), and the SQL select * from test_table a join test_table b on a.date = b.date produces the following Optimized LogicalPlan and PhysicalPlan:
Optimized LogicalPlan:
Inner Join: Filter: a.date = b.date
SubqueryAlias: a
TableScan: test_table projection=[a, b, date]
SubqueryAlias: b
TableScan: test_table projection=[a, b, date]
Optimized PhysicalPlan:
NestedLoopJoinExec: join_type=Inner, filter=date@0 = date@1
RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1
CustomExec
CustomExec
Expected behavior
As expected, the equal filter condition is optimized to be an on condition and eventually generates a HashJoinExec.
Optimized LogicalPlan:
Inner Join: a.date = b.date
SubqueryAlias: a
TableScan: test_table projection=[a, b, date]
SubqueryAlias: b
TableScan: test_table projection=[a, b, date]
Optimized PhysicalPlan:
HashJoinExec: mode=Partitioned, join_type=Inner, on=[(date@2, date@2)]
xxx
CustomExec
xxx
CustomExec
Additional context
I tried to look at the detailed source code implementation and found that it should be caused by not considering the timezone of the timestamp type in the function can_hash. If that's the case, I might try to fix it
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working