[SPARK-33621][SQL] Add a way to inject data source rewrite rules #30577

aokolnychyi · 2020-12-02T21:24:36Z

What changes were proposed in this pull request?

This PR adds a way to inject data source rewrite rules.

Why are the changes needed?

Right now, SparkSessionExtensions allows us to inject optimizer rules, but they are added to the operator optimization batch. There are cases when users need to run rules after the operator optimization batch (e.g., when a rule relies on expressions having already been optimized). Currently, this is not possible.
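The motivation can be illustrated with a toy model of batch ordering. This is a minimal sketch, not Catalyst code: `Expr`, `constantFolding`, `recordingRule`, `Batch`, and `execute` are hypothetical stand-ins that loosely mimic a rule executor with a fixed-point batch (like operator optimization) followed by a run-once batch (like the proposed data source rewrite batch). In the first setup, the recording rule shares the fixed-point batch with folding and is deliberately placed ahead of it, so it observes an unfolded intermediate expression; where Spark actually positions injected rules inside its batch is not modeled. In the second setup, the rule runs once in its own batch and only sees the fully folded expression.

```scala
import scala.annotation.tailrec
import scala.collection.mutable

object BatchOrderingSketch {
  // A toy expression language (hypothetical, not Catalyst).
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class Attr(name: String) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr

  type Rule = Expr => Expr

  sealed trait Strategy
  case object Once extends Strategy
  final case class FixedPoint(maxIterations: Int) extends Strategy
  final case class Batch(name: String, strategy: Strategy, rules: Seq[Rule])

  // Constant folding: collapse Add(Lit, Lit) bottom-up.
  private def fold(e: Expr): Expr = e match {
    case Add(l, r) =>
      (fold(l), fold(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (fl, fr)         => Add(fl, fr)
      }
    case EqualTo(l, r) => EqualTo(fold(l), fold(r))
    case other         => other
  }
  val constantFolding: Rule = e => fold(e)

  // Stand-in for an injected rewrite rule: records every expression it is given.
  def recordingRule(seen: mutable.Buffer[Expr]): Rule = { e =>
    seen += e
    e
  }

  // Applies batches in order; a FixedPoint batch reruns its rules until nothing changes.
  def execute(batches: Seq[Batch], plan: Expr): Expr =
    batches.foldLeft(plan) { (current, batch) =>
      def runRules(e: Expr): Expr = batch.rules.foldLeft(e)((acc, rule) => rule(acc))
      batch.strategy match {
        case Once => runRules(current)
        case FixedPoint(max) =>
          @tailrec
          def loop(e: Expr, iteration: Int): Expr = {
            val next = runRules(e)
            if (next == e || iteration + 1 >= max) next else loop(next, iteration + 1)
          }
          loop(current, 0)
      }
    }

  val input: Expr = EqualTo(Attr("id"), Add(Lit(1), Lit(2)))

  // The recording rule shares the fixed-point batch with folding (placed first here).
  def seenInsideFixedPointBatch(): List[Expr] = {
    val seen = mutable.Buffer.empty[Expr]
    execute(Seq(Batch("Operator Optimization", FixedPoint(100), Seq(recordingRule(seen), constantFolding))), input)
    seen.toList
  }

  // The recording rule runs once, in its own batch after the fixed-point batch.
  def seenInSeparateBatch(): List[Expr] = {
    val seen = mutable.Buffer.empty[Expr]
    execute(
      Seq(
        Batch("Operator Optimization", FixedPoint(100), Seq(constantFolding)),
        Batch("Data Source Rewrite", Once, Seq(recordingRule(seen)))),
      input)
    seen.toList
  }

  def main(args: Array[String]): Unit = {
    println(seenInsideFixedPointBatch())
    println(seenInSeparateBatch())
  }
}
```

Running `main` prints the expressions each setup's rule observed. The design point is only that a separate batch placed after the fixed-point batch gives a rule a stable, already-optimized input and runs it exactly once.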

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

This PR comes with a new test.

aokolnychyi · 2020-12-02T21:26:17Z

cc @holdenk @dbtsai @dongjoon-hyun @rdblue @cloud-fan @sunchao @viirya @HyukjinKwon

aokolnychyi · 2020-12-02T22:01:08Z

We should probably wait for clarity on the discussion here before moving on with this one.

dongjoon-hyun · 2020-12-02T22:47:31Z

cc @gatorsmile

rdblue · 2020-12-02T22:58:55Z

This looks good to me, but I agree that we should change the name if anyone comes up with a better one.

sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala

SparkQA · 2020-12-03T01:06:29Z

Test build #132072 has finished for PR 30577 at commit a76f4c2.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

aokolnychyi · 2020-12-03T09:50:23Z

I'll wait to update this until we resolve the name issue.

dongjoon-hyun · 2020-12-03T17:17:05Z

@aokolnychyi, could you address the existing comments first without waiting for the others? They are orthogonal, so you don't need to wait. For example:

This PR breaks the Scala 2.13 build.
The test case prefix.

aokolnychyi · 2020-12-04T17:11:42Z

Will do today, @dongjoon-hyun!

dongjoon-hyun · 2020-12-06T09:31:12Z

Gentle ping, @aokolnychyi.

maropu · 2020-12-07T01:34:49Z

sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala

  def injectOptimizerRule(builder: RuleBuilder): Unit = {
    optimizerRules += builder
  }



Could you update the description above, too?

spark/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala

Lines 37 to 48 in 29096a8

* This current provides the following extension points:
*
* <ul>
* <li>Analyzer Rules.</li>
* <li>Check Analysis Rules.</li>
* <li>Optimizer Rules.</li>
* <li>Planning Strategies.</li>
* <li>Customized Parser.</li>
* <li>(External) Catalog listeners.</li>
* <li>Columnar Rules.</li>
* <li>Adaptive Query Stage Preparation Rules.</li>
* </ul>

Done, thanks for catching this!
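The `injectOptimizerRule` snippet quoted in this review reflects a builder-registry pattern: extensions record functions that create rules, and the session later turns them into rule instances. The following is a minimal sketch of that pattern under stated assumptions, not Spark's implementation: `Session`, `Rule`, and `injectDataSourceRewriteRule` are hypothetical stand-ins (the last name merely echoes this PR's title and may not match the merged API). Keeping the two kinds of builders in separate buffers is what allows each set to be handed to a different optimizer batch.

```scala
import scala.collection.mutable

object ExtensionsSketch {
  // Hypothetical stand-ins: in Spark, a RuleBuilder is SparkSession => Rule[LogicalPlan].
  final case class Session(name: String)
  final case class Rule(ruleName: String, transform: String => String)
  type RuleBuilder = Session => Rule

  class Extensions {
    // Builders are stored, not rules: each one is applied once a session exists.
    private val optimizerRules = mutable.Buffer.empty[RuleBuilder]
    private val dataSourceRewriteRules = mutable.Buffer.empty[RuleBuilder]

    // Rules destined for the operator optimization batch.
    def injectOptimizerRule(builder: RuleBuilder): Unit = {
      optimizerRules += builder
    }

    // Hypothetical name: rules destined for a separate batch run after operator optimization.
    def injectDataSourceRewriteRule(builder: RuleBuilder): Unit = {
      dataSourceRewriteRules += builder
    }

    // On Scala 2.13, scala.Seq is immutable, so the mutable buffer is converted
    // with .toSeq; this also compiles on Scala 2.12.
    def buildOptimizerRules(session: Session): Seq[Rule] =
      optimizerRules.map(_.apply(session)).toSeq

    def buildDataSourceRewriteRules(session: Session): Seq[Rule] =
      dataSourceRewriteRules.map(_.apply(session)).toSeq
  }

  def main(args: Array[String]): Unit = {
    val extensions = new Extensions
    extensions.injectOptimizerRule(s => Rule(s"opt-${s.name}", identity))
    extensions.injectDataSourceRewriteRule(s => Rule(s"rewrite-${s.name}", _.toUpperCase))
    val session = Session("demo")
    println(extensions.buildOptimizerRules(session).map(_.ruleName))
    println(extensions.buildDataSourceRewriteRules(session).map(_.ruleName))
  }
}
```

Here `main` injects one builder of each kind and prints the rule names built for a session. Deferring construction until a session is available presumably lets rules depend on session state, which is why the registry stores builders rather than rules.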

aokolnychyi · 2020-12-07T11:25:07Z

Sorry for the delay, @dongjoon-hyun! I've updated the PR now.

aokolnychyi · 2020-12-07T11:26:13Z

Gentle ping @gatorsmile on the name suggestion discussed here.

SparkQA · 2020-12-07T12:16:14Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36961/

SparkQA · 2020-12-07T12:43:58Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36961/

SparkQA · 2020-12-07T15:15:59Z

Test build #132361 has finished for PR 30577 at commit 6b38928.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

aokolnychyi · 2020-12-07T15:49:35Z

Test failures are in HiveThriftHttpServerSuite.

dongjoon-hyun

+1, LGTM. Thank you for updating, @aokolnychyi.
Merged to master for Apache Spark 3.2.0.

[SPARK-33621][SQL] Add a way to inject data source rewrite rules

a76f4c2

github-actions bot added the SQL label Dec 2, 2020

rdblue approved these changes Dec 2, 2020

HyukjinKwon reviewed Dec 3, 2020

sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala (outdated, resolved)

HyukjinKwon reviewed Dec 3, 2020

sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala (outdated, resolved)

maropu reviewed Dec 7, 2020

Review round 1

6b38928

dongjoon-hyun approved these changes Dec 7, 2020

dongjoon-hyun closed this in 02508b6 Dec 7, 2020

HyukjinKwon mentioned this pull request Dec 15, 2020

[SPARK-23889][SQL] DataSourceV2: required sorting and clustering for writes #29066

Closed

aokolnychyi mentioned this pull request Dec 15, 2020

[SPARK-33612][SQL] Add dataSourceRewriteRules batch to Optimizer #30558

Closed

7 participants
