[SPARK-16217][SQL] Support SELECT INTO statement #14191
Can one of the admins verify this patch?
fromClause?
(WHERE where=booleanExpression)?)
| ((kind=SELECT setQuantifier? namedExpressionSeq fromClause?
| ((kind=SELECT setQuantifier? namedExpressionSeq (intoClause? fromClause)?
Hi, @wuxianxingkong.
Currently, the following case does not seem to be handled yet. Could you modify the syntax to support it too?
SELECT 1
INTO newtable
Hello, @dongjoon-hyun, thank you for your advice.
SELECT 1
INTO newtable
This won't work because we need the oldtable info to create newtable. So the SQL should be
SELECT 1
INTO newtable
FROM oldtable
The result from my test is: a new table called newtable was created with one column called 1, which has as many rows as oldtable, and all of its elements are 1.
Did you mean the case where there is no FROM?
In the Spark Shell, please run the following.
sql("select 1")
@dongjoon-hyun
At first, I modified the grammar:

But this affects the multiInsertQueryBody rule, i.e.:
FROM OLD_TABLE
INSERT INTO T1
SELECT C1
INSERT INTO T2
SELECT C2
The syntax tree before adding intoClause is:

After adding intoClause, the tree will be:

This is because INSERT is a non-reserved keyword and because of ANTLR's matching strategy.
One way I can think of is to change the grammar like this:

This solves the problem because the ANTLR parser chooses the alternative specified first.
The grammar can support "SELECT 1 INTO newtable" now.
But this makes the querySpecification rule confusing because of the duplication. Is there any way to make the syntax less verbose? Thanks.
Hi, @wuxianxingkong.
// Add organization statements.
optionalMap(ctx.queryOrganization)(withQueryResultClauses).
// Add insert.
optionalMap(ctx.insertInto())(withInsertInto)
This allows for the following syntax:
INSERT INTO tbl_a
SELECT *
INTO tbl_a
FROM tbl_b
Make sure that we cannot have both.
We also need to check what this does with multi-insert syntax, i.e.:
FROM tbl_a
INSERT INTO tbl_b
SELECT *
INSERT INTO tbl_c
SELECT *
INTO tbl_c
2. Add a check in the multi-insert query syntax: do not allow multi-insert and SELECT INTO to appear at the same time.
3. Add a check in the single-insert query syntax: do not allow INSERT INTO and SELECT INTO to appear at the same time.
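The two checks listed above (rejecting SELECT INTO combined with multi-insert, and combined with a single INSERT INTO) could be sketched roughly as follows. This is only a toy model under stated assumptions: `SelectBody`, `Query`, and `validate` are hypothetical names for illustration, not the types or methods used in this PR or in Spark's AstBuilder.

```scala
// Illustrative toy model only: these case classes are hypothetical stand-ins
// for the parse-tree contexts, not Spark's actual parser types.
object SelectIntoChecks {
  // One SELECT body; `into` is the optional target of a SELECT ... INTO clause.
  final case class SelectBody(into: Option[String])

  // A query with zero or more INSERT INTO targets and its SELECT bodies.
  final case class Query(insertTargets: Seq[String], bodies: Seq[SelectBody])

  // Returns Left(message) when SELECT INTO is combined with INSERT INTO
  // (single or multi-insert); otherwise returns Right(query) unchanged.
  def validate(q: Query): Either[String, Query] = {
    val hasSelectInto = q.bodies.exists(_.into.isDefined)
    if (hasSelectInto && q.insertTargets.size > 1)
      Left("multi-insert and SELECT INTO cannot appear in the same statement")
    else if (hasSelectInto && q.insertTargets.nonEmpty)
      Left("INSERT INTO and SELECT INTO cannot appear in the same statement")
    else
      Right(q)
  }
}
```

In a real parser the failing branches would more likely raise a parse error at the offending clause; returning an `Either` here just keeps the sketch self-contained.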
*/
protected def withSelectInto(
ctx: IntoClauseContext,
query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
Why throw a ParseException?
@wuxianxingkong Are you still working on this? Thanks!
We are closing it due to inactivity. Please do reopen it if you want to push it forward. Thanks!
What changes were proposed in this pull request?
This PR implements the SELECT INTO statement.
The SELECT INTO statement selects data from one table and inserts it into a new table as follows.
This statement is commonly used in SQL but not currently supported in SparkSQL.
We investigated Catalyst and found that this statement can be implemented by extending the grammar and reusing the logical plan of CTAS (CREATE TABLE AS SELECT).
The related JIRA is https://issues.apache.org/jira/browse/SPARK-16217
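To illustrate the idea of reusing the CTAS logical plan, here is a minimal sketch, assuming a much simplified plan model. `Relation`, `Project`, `CreateTableAsSelect`, `selectInto`, and `ctas` are hypothetical names for this example, not Catalyst's actual classes or the code in this patch.

```scala
// Illustrative toy model only: these plan nodes are hypothetical and far
// simpler than Catalyst's LogicalPlan hierarchy.
object SelectIntoAsCtas {
  sealed trait Plan
  final case class Relation(table: String) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan
  final case class CreateTableAsSelect(table: String, query: Plan) extends Plan

  // Plan for: CREATE TABLE target AS SELECT columns FROM source
  def ctas(target: String, columns: Seq[String], source: String): Plan =
    CreateTableAsSelect(target, Project(columns, Relation(source)))

  // Plan for: SELECT columns INTO target FROM source
  // The statement carries the same information as the CTAS form, so the
  // sketch simply builds the same plan node.
  def selectInto(columns: Seq[String], target: String, source: String): Plan =
    ctas(target, columns, source)
}
```

Under this model, `SELECT c1 INTO newtable FROM oldtable` and `CREATE TABLE newtable AS SELECT c1 FROM oldtable` produce equal plans, which is why no new plan node would be needed.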
How was this patch tested?
SQLQuerySuite.