Skip to content

Conflicting optimization rules: common_sub_expression_eliminate and push_down_projection #8296

@jonahgao

Description

@jonahgao

Describe the bug

Rule push_down_projection negates the optimization results of common_sub_expression_eliminate, but leaves the useless aliases introduced by common_sub_expression_eliminate. Those added aliases change the signature of the logical plan, also causing the optimizer to never reach the fixed point.

In summary, there are two problems:

  1. The optimization of common_sub_expression_elimination did not work.
  2. Ineffective optimization leads to the optimizer not being able to exit early until it reaches the limit of datafusion.optimizer.max_passes.

To Reproduce

Run explain verbose select a/2, a/2 + 1 from t in CLI

DataFusion CLI v33.0.0

❯ create table t(a bigint);
0 rows in set. Query took 0.006 seconds.

❯ explain verbose select a/2, a/2 + 1 from t;
| logical_plan after common_sub_expression_eliminate         | Projection: t.a / Int64(2)Int64(2)t.a AS t.a / Int64(2), t.a / Int64(2)Int64(2)t.a AS t.a / Int64(2) + Int64(1)                                                                         |
|                                                            |   Projection: t.a / Int64(2) AS t.a / Int64(2)Int64(2)t.a, t.a                                                                                                                          |
|                                                            |     TableScan: t
| ...
|
| logical_plan after push_down_projection                    | Projection: t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2), t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) + Int64(1)                                                           |
|                                                            |   TableScan: t projection=[a]

❯ explain select a/2, a/2 + 1 from t;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                              |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2), t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) + Int64(1) |
|               |   TableScan: t projection=[a]                                                                                                                                     |
| physical_plan | ProjectionExec: expr=[a@0 / 2 as t.a / Int64(2), a@0 / 2 + 1 as t.a / Int64(2) + Int64(1)]                                                                        |
|               |   MemoryExec: partitions=1, partition_sizes=[0]                                                                                                                   |
|               |                                                                                                                                                                   |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+

common_sub_expression_eliminate generates a new child projection which includes common expressions under the original projection, and push_down_projection merges them into one.

The final projection is:

  • a/2: t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2)
  • a/2+1: t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) + Int64(1)

Three duplicate aliases appeared, corresponding to datafusion.optimizer.max_passes=3.

If I set datafusion.optimizer.max_passes=10

set datafusion.optimizer.max_passes=10;
0 rows in set. Query took 0.002 seconds.

❯ explain select a/2, a/2 + 1 from t;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                                                                          |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2), t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) + Int64(1) |
|               |   TableScan: t projection=[a]                                                                                                                                                                                                                                                                                                                                                                                                 |
| physical_plan | ProjectionExec: expr=[a@0 / 2 as t.a / Int64(2), a@0 / 2 + 1 as t.a / Int64(2) + Int64(1)]                                                                                                                                                                                                                                                                                                                                    |
|               |   MemoryExec: partitions=1, partition_sizes=[0]                                                                                                                                                                                                                                                                                                                                                                               |
|               |                                                                                                                                                                                                                                                                                                                                                                                                                               |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.025 seconds.

One of the expr in the final projection will be t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2) AS t.a / Int64(2).
There are too many unnecessary aliases.

Expected behavior

The logic merge_projection inside the rule push_down_projection should not undo common_sub_expression_eliminate.

Additional context

I plan to work on this in the next few days.

My initial thought is to fix it in the push_down_projection side.

Do not execute merge_projection if an expression in the child projection satisfies the following conditions:

  1. It has been referenced by the parent projection two or more times. Keep the optimization effect of common_sub_expression_eliminate.
  2. Its evaluation is non-trivial. Its type is not Column or Literal.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions