Skip to content

Conversation

@aokolnychyi
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds a way to inject data source rewrite rules.

Why are the changes needed?

Right now SparkSessionExtensions allow us to inject optimization rules but they are added to operator optimization batch. There are cases when users need to run rules after the operator optimization batch (e.g. cases when a rule relies on the fact that expressions have been optimized). Currently, this is not possible.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

This PR comes with a new test.

@github-actions github-actions bot added the SQL label Dec 2, 2020
@aokolnychyi
Copy link
Contributor Author

@aokolnychyi
Copy link
Contributor Author

We should probably wait for clarity on the discussion here before moving on with this one.

@dongjoon-hyun
Copy link
Member

cc @gatorsmile

@rdblue
Copy link
Contributor

rdblue commented Dec 2, 2020

This looks good to me, but I agree that we should change the name if anyone comes up with a better one.

@SparkQA
Copy link

SparkQA commented Dec 3, 2020

Test build #132072 has finished for PR 30577 at commit a76f4c2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aokolnychyi
Copy link
Contributor Author

I'll wait to update this until we resolve the name issue.

@dongjoon-hyun
Copy link
Member

@aokolnychyi . Could you address the existing comments first without waiting for the others? They are orthogonal. You don't need to wait. For example,

  • This PR breaks Scala 2.13 build.
  • Test case prefix.

@aokolnychyi
Copy link
Contributor Author

Will do today, @dongjoon-hyun!

@dongjoon-hyun
Copy link
Member

Gentle ping, @aokolnychyi .

Will do today, @dongjoon-hyun!

def injectOptimizerRule(builder: RuleBuilder): Unit = {
optimizerRules += builder
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the description above, too?

* This current provides the following extension points:
*
* <ul>
* <li>Analyzer Rules.</li>
* <li>Check Analysis Rules.</li>
* <li>Optimizer Rules.</li>
* <li>Planning Strategies.</li>
* <li>Customized Parser.</li>
* <li>(External) Catalog listeners.</li>
* <li>Columnar Rules.</li>
* <li>Adaptive Query Stage Preparation Rules.</li>
* </ul>

Copy link
Contributor Author

@aokolnychyi aokolnychyi Dec 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for catching this!

@aokolnychyi
Copy link
Contributor Author

Sorry for the delay, @dongjoon-hyun! I've updated the PR now.

@aokolnychyi
Copy link
Contributor Author

Gentle ping @gatorsmile on the name suggestion discussed here.

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36961/

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36961/

@SparkQA
Copy link

SparkQA commented Dec 7, 2020

Test build #132361 has finished for PR 30577 at commit 6b38928.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@aokolnychyi
Copy link
Contributor Author

Test failures are in HiveThriftHttpServerSuite.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you for updating, @aokolnychyi .
Merged to master for Apache Spark 3.2.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants