Skip to content

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Feb 20, 2023

What changes were proposed in this pull request?

This PR enhances ConstantPropagation to support more cases.

  1. Propagated through other binary comparisons.
  2. Propagated across equality comparisons. This can be further optimized to false.

For example:

a = 1 and b > a + 2 ==> a = 1 && b > 3
a = 1 and a = 2 ==> 2 = 1 and 1 = 2 ==> false

Why are the changes needed?

Improve query performance. Denodo also has a similar optimization. For example:

CREATE TABLE t1(a int, b int) using parquet;
CREATE TABLE t2(x int, y int) using parquet;

CREATE TEMP VIEW v1 AS                                        
SELECT * FROM t1 JOIN t2 WHERE a = x AND a = 0                
UNION ALL                                                     
SELECT * FROM t1 JOIN t2 WHERE a = x AND (a IS NULL OR a <> 0);

SELECT * FROM v1 WHERE x > 1;

Before this PR:

== Optimized Logical Plan ==
Union false, false
:- Project [a#0 AS a#12, b#1 AS b#13, x#2 AS x#14, y#3 AS y#15]
:  +- Join Inner
:     :- Filter (isnotnull(a#0) AND (a#0 = 0))
:     :  +- Relation spark_catalog.default.t1[a#0,b#1] parquet
:     +- Filter (isnotnull(x#2) AND ((0 = x#2) AND (x#2 > 1)))
:        +- Relation spark_catalog.default.t2[x#2,y#3] parquet
+- Join Inner, (a#16 = x#18)
   :- Filter ((isnull(a#16) OR NOT (a#16 = 0)) AND ((a#16 > 1) AND isnotnull(a#16)))
   :  +- Relation spark_catalog.default.t1[a#16,b#17] parquet
   +- Filter ((isnotnull(x#18) AND (x#18 > 1)) AND (isnull(x#18) OR NOT (x#18 = 0)))
      +- Relation spark_catalog.default.t2[x#18,y#19] parquet

After this PR:

== Optimized Logical Plan ==
Join Inner, (a#16 = x#18)
:- Filter ((isnull(a#16) OR NOT (a#16 = 0)) AND ((a#16 > 1) AND isnotnull(a#16)))
:  +- Relation spark_catalog.default.t1[a#16,b#17] parquet
+- Filter ((isnotnull(x#18) AND (x#18 > 1)) AND (isnull(x#18) OR NOT (x#18 = 0)))
   +- Relation spark_catalog.default.t2[x#18,y#19] parquet

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Feb 20, 2023
condition transform {
case e @ EqualTo(_, _) if !predicates.contains(e) => replaceConstants0(e)
case e @ EqualNullSafe(_, _) if !predicates.contains(e) => replaceConstants0(e)
case b: BinaryComparison =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support other binary comparisons. For example: >, <.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we look at the commit history and understand why constant propagation only supports equality predicates before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial commit did not support it. It seems other binary comparisons were not considered at the time:
#17993

Later we tried to enhance it. It seems it is not easy because there are too many changes:
#24553

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can replace constants in other expressions too, #24553 does the same (and more).

Copy link
Contributor

@peter-toth peter-toth Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm not sure I get the issue. If a complex expression is equal to a constant why we can't use that constant where the complex expression appears? I don't think it matters if some parts of the complex expression can also be replaced to another constat. The replace happens using transformDown so larger complex expressions are replaced to constants first. E.g. a = 1 AND (a + b) = 1 AND (a + b) > 2 -> a = 1 AND (1 + b) = 1 AND 1 > 2 -> false makes sense to me.

  2. Yes we can. The reason why I didn't do those kind of shortcuts in ConstantPropagation in my PR is because there are other, similar cases where it isn't that easy to do a shortcut (or simplification has been handled in other rules so why doing it here as well) . E.g. a = 1 AND a > 2 -> a = 1 AND 1 > 2 -> false where the 2nd step (1 > 2 -> false) could also be handled in ConstantPropagation but those foldable expressions are already handled in ConstantFolding and then in BooleanSimplification when contant propagation has been done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me open a PR tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #40268.
I didn't want to overcomplicate the change so took only a few parts from #24553. But if you think 1. or 2. or any other feature from the original PR is also needed I can add it to the PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan This is why conflicting equality predicates are not supported: #17993 (comment)
It is correct if it only optimize on the filter condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote: In my old PR (https://github.com/apache/spark/pull/24553/files#diff-d43484d56a4d9991066b5c00d12ec2465c75131e055fc02ee7fb6dfd45b5006fR76-R79) I proposed #27309 to detect equijoins better and to allow constant propagation in Joins too.

if (!allPredicates.contains(b)) {
replaceConstants0(b, allConstantsMap)
} else {
val excludedEqualityPredicates = equalityPredicates.filterNot(_._2.semanticEquals(b))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exclude current binary comparison to support the following case:

a = 1 and a = 2 ==> 2 = 1 and 1 = 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't look like constant propagation but a different optimization to simplify predicates:

  • a = 1 and a = 2 ==> false
  • a > 1 and a > 2 ==> a > 2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR does not support a > 1 and a > 2 ==> a > 2. It must contain an equality predicate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we exclude it first? We should add a new optimizer rule later to do more advanced predicate simplification.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean only support a = 1 and a = 2 ==> 2 = 1 and 1 = 2 ==> false, and do not support a = 1 and b > a + 2 ==> a = 1 && b > 3?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, the rule name is ConstantPropagation, so both optimizations fit it.

@wangyum
Copy link
Member Author

wangyum commented Feb 21, 2023

cc @cloud-fan

@wangyum
Copy link
Member Author

wangyum commented Feb 26, 2023

TiDB also supports these optimizations:
image

@github-actions
Copy link

github-actions bot commented Jul 8, 2023

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 8, 2023
@github-actions github-actions bot closed this Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants