Skip to content

Conversation

@gatorsmile
Copy link
Member

In the parser, tableSample clause is part of tableSource.

tableSource
@init { gParent.pushMsg("table source", state); }
@after { gParent.popMsg(state); }
    : tabname=tableName 
    ((tableProperties) => props=tableProperties)?
    ((tableSample) => ts=tableSample)? 
    ((KW_AS) => (KW_AS alias=Identifier) 
    |
    (Identifier) => (alias=Identifier))?
    -> ^(TOK_TABREF $tabname $props? $ts? $alias?)
    ;

Two typical query samples using TABLESAMPLE are:

    "SELECT s.id FROM t0 TABLESAMPLE(10 PERCENT) s"
    "SELECT * FROM t0 TABLESAMPLE(0.1 PERCENT)"

FYI, the logical plan of a TABLESAMPLE query:

sql("SELECT * FROM t0 TABLESAMPLE(0.1 PERCENT)").explain(true)

== Analyzed Logical Plan ==
id: bigint
Project [id#16L]
+- Sample 0.0, 0.001, false, 381
   +- Subquery t0
      +- Relation[id#16L] ParquetRelation

Thanks! cc @liancheng

@gatorsmile gatorsmile changed the title [Spark-13263] [SQL] SQL Generation Support for Tablesample [SPARK-13263] [SQL] SQL Generation Support for Tablesample Feb 10, 2016
@SparkQA
Copy link

SparkQA commented Feb 10, 2016

Test build #51028 has finished for PR 11148 at commit 58b6038.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// TABLESAMPLE is part of tableSource clause in the parser,
// and thus we must handle it with subquery.
case Sample(lb, ub, withReplacement, _, child @ Subquery(alias, grandChild))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems pretty brittle. i wonder if we can change sample to make it very clear this is either a table sample, or a normal sample.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @rxin. Especially, I don't think MetastoreRelation is wrapped by a Subquery in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Will follow your suggestions!

# Conflicts:
#	sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala
@gatorsmile
Copy link
Member Author

@rxin After searching the code base, all the samples are table sample. Could you give me more hints? I also agree it looks brittle. I did my best to make it more robust/general.

Since TABLESAMPLE is part of tableSource clause, I am facing two options:

  1. Add the extra handling of TABLESAMPLE into MetastoreRelation and Subquery
  2. Add a new case for TABLESAMPLE.

Let me know which option you like. The current solution is based on option 2.

@rxin
Copy link
Contributor

rxin commented Feb 13, 2016

You can for example in the Sample operator add a method that returns true if it is created from table sample in the parser.

@rxin
Copy link
Contributor

rxin commented Feb 13, 2016

Basically it'd be better to have something very explicit, rather than relying on some conditions that might change in the future.

val aliasName = if (s.child.isInstanceOf[Subquery]) s.alias else ""
val plan = if (s.child.isInstanceOf[Subquery]) s.child else s
build(toSQL(plan), "TABLESAMPLE(" + fraction + " PERCENT)", aliasName)
case _ =>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unable to create a test case for this branch. I do not know if we should throw an exception here or building a SQL.

@gatorsmile
Copy link
Member Author

@rxin I see. Will do it. Thanks!

@SparkQA
Copy link

SparkQA commented Feb 13, 2016

Test build #51236 has finished for PR 11148 at commit 12be2c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 13, 2016

Test build #51243 has finished for PR 11148 at commit fc9a156.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Feb 14, 2016

Looks more robust this way. I will let @liancheng do the final reviews and merge.

@gatorsmile
Copy link
Member Author

Thanks for your reviews! @rxin

# Conflicts:
#	sql/hive/src/main/scala/org/apache/spark/sql/hive/SQLBuilder.scala
@SparkQA
Copy link

SparkQA commented Feb 16, 2016

Test build #51326 has finished for PR 11148 at commit e4e1b72.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 22, 2016

Test build #51679 has finished for PR 11148 at commit aae08b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

LGTM, thanks!

Merged to master. There was a minor conflict. Resolved it manually while merging.

@asfgit asfgit closed this in 8725058 Feb 23, 2016
@gatorsmile
Copy link
Member Author

Thank you! @liancheng @rxin

@gatorsmile gatorsmile deleted the tablesplitsample branch February 23, 2016 08:55
asfgit pushed a commit that referenced this pull request Jun 23, 2017
…rom Alias and AttributeReference

## What changes were proposed in this pull request?
`isTableSample` and `isGenerated ` were introduced for SQL Generation respectively by #11148 and #11050

Since SQL Generation is removed, we do not need to keep `isTableSample`.

## How was this patch tested?
The existing test cases

Author: Xiao Li <[email protected]>

Closes #18379 from gatorsmile/CleanSample.
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
…rom Alias and AttributeReference

## What changes were proposed in this pull request?
`isTableSample` and `isGenerated ` were introduced for SQL Generation respectively by apache#11148 and apache#11050

Since SQL Generation is removed, we do not need to keep `isTableSample`.

## How was this patch tested?
The existing test cases

Author: Xiao Li <[email protected]>

Closes apache#18379 from gatorsmile/CleanSample.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants