[SPARK-32717][SQL] Add a AQEOptimizer for AdaptiveSparkPlanExec #29559

Ngone51 · 2020-08-27T13:02:10Z

What changes were proposed in this pull request?

This PR adds a dedicated AQEOptimizer for AdaptiveSparkPlanExec, replacing the anonymous RuleExecutor it currently uses. It also adds the configuration spark.sql.adaptive.optimizer.excludedRules, which follows the same pattern as the regular Optimizer, to make the AQEOptimizer more flexible for users and developers.

Why are the changes needed?

Currently, AdaptiveSparkPlanExec uses an anonymous RuleExecutor to apply the AQE optimization rules to the plan. An anonymous class, however, tends to be inconvenient to maintain and extend in the long term.
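
As a rough illustration of why a named class helps, here is a minimal, Spark-free sketch of an optimizer that filters its rule batches by an exclusion list. Every name in it (Rule, Batch, SketchOptimizer, the two toy rules) is hypothetical and only mirrors the shape of the idea; it is not the AQEOptimizer code from this PR.

```scala
// Hypothetical, Spark-free model. A "plan" is a plain String and a rule is a
// named rewrite; none of these types are Spark's own.
object AqeOptimizerSketch {
  final case class Rule(ruleName: String, rewrite: String => String)
  final case class Batch(name: String, rules: Seq[Rule])

  // Two toy rules, invented for this sketch.
  val trimWhitespace: Rule = Rule("TrimWhitespace", _.trim)
  val upperCaseSelect: Rule = Rule("UpperCaseSelect", _.replace("select", "SELECT"))

  // A named class gives one obvious place to hang configuration such as an
  // exclusion list, which is awkward to do with an anonymous executor.
  final class SketchOptimizer(excludedRulesConf: Option[String]) {
    private val defaultBatches: Seq[Batch] = Seq(
      Batch("Cleanup", Seq(trimWhitespace)),
      Batch("Normalize", Seq(upperCaseSelect)))

    // Parse a comma-separated list such as "a, b,c" into Set("a", "b", "c").
    private val excluded: Set[String] =
      excludedRulesConf.toSeq
        .flatMap(_.split(",").toSeq)
        .map(_.trim)
        .filter(_.nonEmpty)
        .toSet

    // Drop excluded rules, then drop any batch that became empty.
    val batches: Seq[Batch] = defaultBatches.flatMap { batch =>
      val kept = batch.rules.filterNot(rule => excluded.contains(rule.ruleName))
      if (kept.isEmpty) None else Some(batch.copy(rules = kept))
    }

    // Apply the remaining rules once, in batch order.
    def execute(plan: String): String =
      batches.flatMap(_.rules).foldLeft(plan)((current, rule) => rule.rewrite(current))
  }

  def main(args: Array[String]): Unit = {
    val all = new SketchOptimizer(None)
    val partial = new SketchOptimizer(Some("UpperCaseSelect"))
    println(all.execute("  select 1  "))     // prints "SELECT 1"
    println(partial.execute("  select 1  ")) // prints "select 1"
  }
}
```

The exclusion list is resolved once when the optimizer is constructed, so the set of active batches is fixed for the lifetime of that instance.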

Does this PR introduce any user-facing change?

No.

How was this patch tested?

It's a pure refactor, so passing the existing tests should be sufficient.

Ngone51 · 2020-08-27T13:03:07Z

@cloud-fan @maryannxue Please take a look, thanks!

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala

SparkQA · 2020-08-27T17:44:44Z

Test build #127955 has finished for PR 29559 at commit 291db45.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-08-27T19:31:08Z

Test build #127956 has finished for PR 29559 at commit 9f4d3b5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

c21 · 2020-08-27T19:42:38Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

+  val ADAPTIVE_OPTIMIZER_EXCLUDED_RULES =
+    buildConf("spark.sql.adaptive.optimizer.excludedRules")
+      .doc("Configures a list of rules to be disabled in the adaptive optimizer, in which the " +
+        "rules are specified by their rule names and separated by comma. The optimizer will log " +
+        "the rules that have indeed been excluded.")
+      .version("3.1.0")
+      .stringConf
+      .createOptional


Just wondering why we need to introduce a config for users to disable rules in AQE. Is there a use case we are considering now? I see that spark.sql.optimizer.excludedRules was introduced to work around https://issues.apache.org/jira/browse/SPARK-24624 (#21764 (comment)), but I'm not sure what the use case is here for AQE.

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Ngone51 · 2020-08-28T05:55:02Z

@c21 @maropu Thank you for your comments. spark.sql.adaptive.optimizer.excludedRules doesn't have a real use case yet, but I think it's quite intuitive to follow the normal optimizer for the same reason (more flexibility for users). I also think it can be very useful for developers writing certain tests in AQE, for example when they want to avoid applying the rule DemoteBroadcastHashJoin. Previously, they had to tweak the size of the test data and the number of partitions, which can be troublesome. With this configuration, they can simply exclude the rule.
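
To illustrate the testing use case described above, the following is a minimal sketch of excluding a rule through the new configuration. It assumes Spark 3.1.0 or later on the classpath, and it assumes the rule is identified by its fully qualified class name, org.apache.spark.sql.execution.adaptive.DemoteBroadcastHashJoin, as it existed around the time of this PR. The object name and app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; requires the spark-sql dependency (3.1.0+).
object ExcludeAqeRuleExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("exclude-aqe-rule-sketch")
      .config("spark.sql.adaptive.enabled", "true")
      .getOrCreate()

    // Assumption: rules are listed by fully qualified class name, separated
    // by commas, as with spark.sql.optimizer.excludedRules.
    spark.conf.set(
      "spark.sql.adaptive.optimizer.excludedRules",
      "org.apache.spark.sql.execution.adaptive.DemoteBroadcastHashJoin")

    // Queries planned after this point would go through the adaptive
    // optimizer with the listed rule excluded.
    println(spark.conf.get("spark.sql.adaptive.optimizer.excludedRules"))

    spark.stop()
  }
}
```

Inside Spark's own test suites, the same key could instead be scoped to a single test, which avoids leaking the setting into other tests.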

maropu · 2020-08-28T06:24:52Z

yea, adding it itself looks okay to me.

c21 · 2020-08-28T06:26:27Z

@Ngone51 - thanks, the rationale for developers makes sense to me.

Ngone51 · 2020-08-28T07:42:30Z

Thanks, I've added a unit test and updated the PR description for the rule excluding configuration.

SparkQA · 2020-08-28T12:18:39Z

Test build #127991 has finished for PR 29559 at commit cfd348e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2020-08-28T12:23:46Z

Merged to master.

Ngone51 · 2020-08-28T12:44:56Z

thanks all!

Ngone51 added 2 commits August 27, 2020 20:47

add AQEOptimizer

5e3c6ef

end of line

291db45

probot-autolabeler bot added the SQL label Aug 27, 2020

cloud-fan approved these changes Aug 27, 2020

HyukjinKwon reviewed Aug 27, 2020

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala

HyukjinKwon approved these changes Aug 27, 2020

add final

9f4d3b5

c21 reviewed Aug 27, 2020

maropu reviewed Aug 28, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

maropu reviewed Aug 28, 2020

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

Ngone51 added 2 commits August 28, 2020 14:57

fix indent

a1d1cd1

add test

cfd348e

maropu approved these changes Aug 28, 2020

HyukjinKwon closed this in c3b9404 Aug 28, 2020
