-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-13263] [SQL] SQL Generation Support for Tablesample #11148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #51028 has finished for PR 11148 at commit
|
|
|
||
| // TABLESAMPLE is part of tableSource clause in the parser, | ||
| // and thus we must handle it with subquery. | ||
| case Sample(lb, ub, withReplacement, _, child @ Subquery(alias, grandChild)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems pretty brittle. i wonder if we can change sample to make it very clear this is either a table sample, or a normal sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @rxin. Especially, I don't think MetastoreRelation is wrapped by a Subquery in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Will follow your suggestions!
# Conflicts: # sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala
|
@rxin After searching the code base, all the samples are table sample. Could you give me more hints? I also agree it looks brittle. I did my best to make it more robust/general. Since
Let me know which option you like. The current solution is based on option 2. |
|
You can for example in the Sample operator add a method that returns true if it is created from table sample in the parser. |
|
Basically it'd be better to have something very explicit, rather than relying on some conditions that might change in the future. |
| val aliasName = if (s.child.isInstanceOf[Subquery]) s.alias else "" | ||
| val plan = if (s.child.isInstanceOf[Subquery]) s.child else s | ||
| build(toSQL(plan), "TABLESAMPLE(" + fraction + " PERCENT)", aliasName) | ||
| case _ => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unable to create a test case for this branch. I do not know if we should throw an exception here or building a SQL.
|
@rxin I see. Will do it. Thanks! |
|
Test build #51236 has finished for PR 11148 at commit
|
|
Test build #51243 has finished for PR 11148 at commit
|
|
Looks more robust this way. I will let @liancheng do the final reviews and merge. |
|
Thanks for your reviews! @rxin |
# Conflicts: # sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala
|
Test build #51326 has finished for PR 11148 at commit
|
# Conflicts: # sql/hive/src/test/scala/org/apache/spark/sql/hive/LogicalPlanToSQLSuite.scala
|
Test build #51679 has finished for PR 11148 at commit
|
|
LGTM, thanks! Merged to master. There was a minor conflict. Resolved it manually while merging. |
|
Thank you! @liancheng @rxin |
…rom Alias and AttributeReference ## What changes were proposed in this pull request? `isTableSample` and `isGenerated ` were introduced for SQL Generation respectively by #11148 and #11050 Since SQL Generation is removed, we do not need to keep `isTableSample`. ## How was this patch tested? The existing test cases Author: Xiao Li <[email protected]> Closes #18379 from gatorsmile/CleanSample.
…rom Alias and AttributeReference ## What changes were proposed in this pull request? `isTableSample` and `isGenerated ` were introduced for SQL Generation respectively by apache#11148 and apache#11050 Since SQL Generation is removed, we do not need to keep `isTableSample`. ## How was this patch tested? The existing test cases Author: Xiao Li <[email protected]> Closes apache#18379 from gatorsmile/CleanSample.
In the parser, tableSample clause is part of tableSource.
Two typical query samples using TABLESAMPLE are:
FYI, the logical plan of a TABLESAMPLE query:
Thanks! cc @liancheng