Skip to content

Timestamp types with timezone are not considered in Join's equivalent expression #8147

@acking-you

Description

@acking-you

Describe the bug

Timestamp types with timezone are not considered in Join's equivalent expression

To Reproduce

We have a test_table with a date field of type Timestamp(TimeUnit::Millisecond, Some("UTC".into())), and the SQL select * from test_table a join test_table b on a.date = b.date produces the following Optimized LogicalPlan and PhysicalPlan:

Optimized LogicalPlan:
Inner Join:  Filter: a.date = b.date
  SubqueryAlias: a
    TableScan: test_table projection=[a, b, date]
  SubqueryAlias: b
    TableScan: test_table projection=[a, b, date]

Optimized PhysicalPlan:
NestedLoopJoinExec: join_type=Inner, filter=date@0 = date@1
  RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1
    CustomExec
  CustomExec

Expected behavior

As expected, the equal filter condition is optimized to be an on condition and eventually generates a HashJoinExec.

Optimized LogicalPlan:
Inner Join: a.date = b.date
  SubqueryAlias: a
    TableScan: test_table projection=[a, b, date]
  SubqueryAlias: b
    TableScan: test_table projection=[a, b, date]

Optimized PhysicalPlan:
  HashJoinExec: mode=Partitioned, join_type=Inner, on=[(date@2, date@2)]
    xxx
     CustomExec
    xxx
      CustomExec

Additional context

I tried to look at the detailed source code implementation and found that it should be caused by not considering the timezone of the timestamp type in the function can_hash. If that's the case, I might try to fix it

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions