[SPARK-37518][SQL] Inject an early scan pushdown rule #34779
SparkSessionExtensions.scala

```diff
@@ -42,6 +42,7 @@ import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}
  * <li>Check Analysis Rules.</li>
  * <li>Optimizer Rules.</li>
  * <li>Pre CBO Rules.</li>
+ * <li>Early Scan Push-Down</li>
  * <li>Planning Strategies.</li>
  * <li>Customized Parser.</li>
  * <li>(External) Catalog listeners.</li>
```
```diff
@@ -226,6 +227,24 @@ class SparkSessionExtensions {
     preCBORules += builder
   }
 
+  private[this] val earlyScanPushDownRules = mutable.Buffer.empty[RuleBuilder]
+
+  private[sql] def buildEarlyScanPushDownRules(session: SparkSession): Seq[Rule[LogicalPlan]] = {
+    earlyScanPushDownRules.map(_.apply(session)).toSeq
+  }
+
+  /**
+   * Inject an optimizer `Rule` builder that rewrites logical plans into the [[SparkSession]].
+   * The injected rules will be executed once after the operator optimization batch and
+   * after any push down optimization rules.
+   * 'Pre CBO Rules' and 'Early Scan Push-Down' are executed before and after
+   * `V2ScanRelationPushDown`. So the user can apply the custom rules related to pushdown
+   * after `V2ScanRelationPushDown` fails.
+   */
+  def injectEarlyScanPushDownRule(builder: RuleBuilder): Unit = {
+    earlyScanPushDownRules += builder
+  }
+
   private[this] val plannerStrategyBuilders = mutable.Buffer.empty[StrategyBuilder]
 
   private[sql] def buildPlannerStrategies(session: SparkSession): Seq[Strategy] = {
```

Review comments on `injectEarlyScanPushDownRule`:

Contributor: Let's update the classdoc, which only mentions …

Contributor: We also need to clarify how this is different from …
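For context, here is a minimal, hedged sketch of how an application might use the new extension point. Only `injectEarlyScanPushDownRule` and the standard Spark APIs (`SparkSession.builder().withExtensions`, `Rule`, `LogicalPlan`) come from this PR or from Spark itself; the rule name `AuditScanAfterPushDown`, the example object, and the sample query are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule: it is injected into the early scan push-down phase, i.e.
// it runs after the built-in V2ScanRelationPushDown, so it sees the plan once
// the built-in pushdown has already been attempted. Here it only logs the plan.
case class AuditScanAfterPushDown(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    logInfo(s"Plan after built-in scan push-down:\n$plan")
    plan
  }
}

object EarlyScanPushDownExample {
  def main(args: Array[String]): Unit = {
    // Because AuditScanAfterPushDown is a case class with a single SparkSession
    // parameter, its companion object is a SparkSession => Rule[LogicalPlan]
    // and matches the RuleBuilder type the injector expects.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("early-scan-pushdown-example")
      .withExtensions(_.injectEarlyScanPushDownRule(AuditScanAfterPushDown))
      .getOrCreate()

    // Any query triggers optimization; the injected rule is applied as part of
    // the scan push-down phase.
    spark.range(10).filter("id > 5").collect()
    spark.stop()
  }
}
```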
SparkSessionExtensionSuite.scala

```diff
@@ -95,6 +95,12 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
     }
   }
 
+  test("SPARK-37518: inject a early scan push down rule") {
+    withSession(Seq(_.injectEarlyScanPushDownRule(MyRule))) { session =>
+      assert(session.sessionState.optimizer.earlyScanPushDownRules.contains(MyRule(session)))
+    }
+  }
+
   test("inject spark planner strategy") {
     withSession(Seq(_.injectPlannerStrategy(MySparkStrategy))) { session =>
       assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
```

Review comments on the new test:

Member: nit: I think we only require a JIRA id for bug fixes and regressions?

Contributor (Author): Just references spark/sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala (line 92 in a3bc014).
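The test reuses the suite's existing `MyRule` helper, which is not shown in this diff. As a rough sketch (an assumption based on how the suite's other injection tests are written), such a helper could look like this:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// No-op rule in the style of the suite's MyRule helper: a case class taking a
// SparkSession, so its companion object can be passed directly wherever a
// RuleBuilder (SparkSession => Rule[LogicalPlan]) is expected, as the test
// above does with injectEarlyScanPushDownRule(MyRule).
case class MyRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}
```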
Overall review discussion:

After so many discussions in #30808, I'm really worried about the naming of this new extension point. In general, this new extension point allows people to inject custom data source operator pushdown rules, which run after the built-in ones. But then the existing `Pre CBO Rules` becomes a confusing name, as the pushdown rules are also pre-CBO. We may need more time to think about the naming, or to think about whether we really need custom data source pushdown rules.
I feel that the name `Pre-CBO Rules` is too broad and does not match the meaning described in its comment. In my opinion, we should not limit the flexibility because of this name.