-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feature: Support EXPLAIN COPY
#7291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fc3318d to
108b78e
Compare
| Ok(s.to_uppercase()) | ||
| } | ||
|
|
||
| /// DataFusion specific EXPLAIN (needed so we can EXPLAIN datafusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key thing -- we need a datafusion specific EXPLAIN statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to keep these datafusion specific statements in general vs. moving this upstream to the sqlparser crate? This distinction was difficult for me to understand when first diving into how statements are parsed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the high level rationale is that sqlparser-rs intends to parse sql for one or more existing "standard" sql dialects (like MySQL or Postgres). However, these statements (COPY and CREATE EXTERNAL TABLE are DataFusion specific extensions). I will make a PR with some docs to to try and clarify this rationale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR with some docs: #7318
| query error DataFusion error: SQL error: ParserError\("Expected end of statement, found: source_table"\) | ||
| query TT | ||
| EXPLAIN COPY source_table to 'test_files/scratch/table' (format parquet, per_thread_output true) | ||
| ---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example of the explain plan working
|
|
||
| ## explain nested | ||
| statement error Explain must be root of the plan | ||
| query error DataFusion error: Error during planning: Nested explain not supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these errors are a little more clear
| pub target: String, | ||
| /// Target specific options | ||
| pub options: HashMap<String, Value>, | ||
| pub options: Vec<(String, Value)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also changed the parser to preserve the order of the options so the output is consistent (rather than whatever order the hashmap decided). Without that the CI failed because the output order was different on windows than it was on Linux -- see this example
metesynnada
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Converting the hashmap into a vector was a clever move.
datafusion/sql/src/statement.rs
Outdated
| let plan = self.sql_statement_to_plan(statement)?; | ||
| let plan = self.statement_to_plan(statement)?; | ||
| if matches!(plan, LogicalPlan::Explain(_)) { | ||
| return plan_err!("Nested explain not supported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the error can be Nested EXPLAINs are not supported..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 in 5c46949
Co-authored-by: Metehan Yıldırım <[email protected]>
…n into alamb/explain_copy
jackwener
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job😍.
Next PR we can add it into doc.
https://arrow.apache.org/datafusion/user-guide/sql/explain.html
| fn explain_to_plan( | ||
| &self, | ||
| verbose: bool, | ||
| analyze: bool, | ||
| statement: Statement, | ||
| statement: DFStatement, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ExplainStatement as param?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are two places this is called -- one from the DataFusion ExplainStatement (introduced in this PR) and one from the SQL parser Explain. So I thought it best to support them both and simply pass the parameters on through that are actually used
Which issue does this PR close?
Part of #6539
Rationale for this change
It is important to see what plans are coming out for copy
What changes are included in this PR?
EXPLAINplansAre these changes tested?
Yes
Are there any user-facing changes?
EXPLAINetc works now forCOPYstatements