Skip to content

Conversation

peter-toth
Copy link
Contributor

@peter-toth peter-toth commented May 8, 2019

What changes were proposed in this pull request?

This PR improves ConstantPropagation rule, the new implementation:

  • Collects constants into a map directly, instead of a list that is then transformed into a map.
  • Supports substituting in conflicting equalities:
    e.g. a = 1 AND a = 2 => a = 1 AND 1 = 2
    Before this PR this unsatisfiable filter was not reduced:
    == Physical Plan ==
    *(1) Project [value#219 AS a#222]
    +- *(1) Filter ((isnotnull(value#219) AND (value#219 = 1)) AND (value#219 = 2))
       +- *(1) LocalTableScan [value#219]
    
    But after this PR it is:
    == Physical Plan ==
    LocalTableScan <empty>, [a#222]
    
  • Extends propagation from attribute => constant mapping to deterministic expression => constant mapping:
    e.g. abs(a) = 5 AND b = abs(a) + 3 => abs(a) = 5 AND b = 8
  • Allows substitution in other than EqualTo and EqualNullSafe expressions:
    e.g. a = 5 AND b < a + 3 => a = 5 AND b < 8
  • Allows propagation of constant non-nullable expressions in Project and other nodes (except for Join):
    e.g. SELECT a = 5 AND b = a + 3 => SELECT a = 5 AND b = 8
    Before this PR only Filter was supported.
  • Allows deep constant propagation:
    e.g. ... WHERE IF(..., a = 5 AND b = a + 3, ...) => ... WHERE IF(..., a = 5 AND b = 8, ...)
    Before this PR only top level And/Or/Not nodes were supported.
  • During expression tree traversal tracks a boolean context that controls if constant propagation of a nullable expression can be safely applied.
    • E.g. in the case of ... WHERE a = c AND f(a) or IF(a = c AND f(a), ..., ...) where a is a nullable expression and c is a constant the null result of a = c AND f(a) means the same as if it resulted false therefore constant propagation can be safely applied (a = c AND f(a) => a = c AND f(c)).
    • In the case of SELECT a = c AND f(a) the null result really means null. In this context constant propagation can't be applied safely.
    • There is also a 3rd context due to an enclosing Not in which the context flips. E.g. constant propagation can't be applied on WHERE NOT(a = c AND f(a)) but can be again on WHERE NOT(IF(..., NOT(a = c AND f(a)), ...).

Why are the changes needed?

Improve performance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UTs.

@peter-toth
Copy link
Contributor Author

This PR is extracted from #24495 as @dongjoon-hyun suggested to make review easier.
Please note that new UTs added in #24495 depend on this change.

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peter-toth may you please also add some UTs? eg. the example you added in the scaladoc may be also formalized with a UT... thanks.

@peter-toth
Copy link
Contributor Author

@mgaido91, I've moved the relevant UTs from the other PR here.

@dongjoon-hyun
Copy link
Member

ok to test

1 similar comment
@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented May 9, 2019

Test build #105274 has finished for PR 24553 at commit 7a29f09.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

mgaido91 commented May 9, 2019

retest this please

@SparkQA
Copy link

SparkQA commented May 9, 2019

Test build #105275 has finished for PR 24553 at commit 7a29f09.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2019

Test build #105278 has finished for PR 24553 at commit 96e54a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 9, 2019

Test build #105283 has finished for PR 24553 at commit 8be22b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth
Copy link
Contributor Author

Any comments or suggestions @dongjoon-hyun or @gatorsmile ?

@peter-toth
Copy link
Contributor Author

peter-toth commented May 23, 2019

@dongjoon-hyun @gatorsmile @mgaido91 @viirya @HyukjinKwon any feedback or suggestion is very welcome.

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the change makes sense in general, just some questions on the tests. Thanks @peter-toth

@peter-toth
Copy link
Contributor Author

Thanks so much @mgaido91 for the review. Please let me know if you have any more questions.

@SparkQA
Copy link

SparkQA commented May 24, 2019

Test build #105769 has finished for PR 24553 at commit 6bccd84.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 25, 2019

Test build #105780 has finished for PR 24553 at commit 827cfe5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91
Copy link
Contributor

LGTM, thanks!

@peter-toth
Copy link
Contributor Author

@dongjoon-hyun @gatorsmile @cloud-fan @viirya @HyukjinKwon could you please help me to review this PR?

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a cursory look, seems making sense but I guess @hvanhovell should take a look as well.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jun 14, 2019

Test build #106493 has finished for PR 24553 at commit 827cfe5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 22, 2019

Test build #106794 has finished for PR 24553 at commit 054624e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth
Copy link
Contributor Author

retest this please

@peter-toth
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116717 has finished for PR 24553 at commit 39c563d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116721 has finished for PR 24553 at commit 39c563d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth peter-toth changed the title [WIP][SPARK-27604][SQL] Enhance constant propagation [SPARK-27604][SQL] Enhance constant propagation Jan 15, 2020
@peter-toth peter-toth changed the title [SPARK-27604][SQL] Enhance constant propagation [WIP][SPARK-27604][SQL] Enhance constant propagation Jan 15, 2020
@peter-toth peter-toth force-pushed the SPARK-27604-enhanced-constant-propagation branch from 39c563d to 85d5430 Compare January 15, 2020 16:15
@SparkQA
Copy link

SparkQA commented Jan 15, 2020

Test build #116792 has finished for PR 24553 at commit 85d5430.

  • This patch fails from timeout after a configured wait of 400m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2020

Test build #116861 has finished for PR 24553 at commit 0c227c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2020

Test build #116866 has finished for PR 24553 at commit 48802b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2020

Test build #116980 has finished for PR 24553 at commit 4f0a89a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth peter-toth force-pushed the SPARK-27604-enhanced-constant-propagation branch from 4f0a89a to 5d67ffd Compare January 24, 2020 12:55
@SparkQA
Copy link

SparkQA commented Jan 24, 2020

Test build #117354 has finished for PR 24553 at commit 5d67ffd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth peter-toth force-pushed the SPARK-27604-enhanced-constant-propagation branch from 5d67ffd to 767233e Compare January 29, 2020 16:18
@SparkQA
Copy link

SparkQA commented Jan 29, 2020

Test build #117514 has finished for PR 24553 at commit 767233e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth peter-toth force-pushed the SPARK-27604-enhanced-constant-propagation branch from 767233e to 6c33c71 Compare January 31, 2020 15:12
@SparkQA
Copy link

SparkQA commented Jan 31, 2020

Test build #117661 has finished for PR 24553 at commit 6c33c71.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@peter-toth peter-toth force-pushed the SPARK-27604-enhanced-constant-propagation branch from 6c33c71 to 63ea687 Compare January 31, 2020 15:30
@SparkQA
Copy link

SparkQA commented Jan 31, 2020

Test build #117663 has finished for PR 24553 at commit 63ea687.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth peter-toth changed the title [WIP][SPARK-27604][SQL] Enhance constant propagation [SPARK-27604][SQL] Enhance constant propagation Feb 3, 2020
@peter-toth
Copy link
Contributor Author

@cloud-fan, @dongjoon-hyun, @maropu I improved ConstantPropagation a bit in this PR. Do you think you could review it?

@SparkQA
Copy link

SparkQA commented Feb 13, 2020

Test build #118355 has finished for PR 24553 at commit 6d994ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants