Contributor

@ted-jenks ted-jenks commented May 30, 2023

What changes were proposed in this pull request?

Make the CTAS commands extend the UnaryRunnableCommand trait rather than LeafRunnableCommand, so that they have a child (the query) that can be traversed.

Why are the changes needed?

The changes introduced to resolve SPARK-41713 in #39220 modified the CTAS commands from having a DataWritingCommand trait to a LeafRunnableCommand trait. The DataWritingCommand trait extends UnaryCommand, and has children set to the value of query in the CTAS command. This means that when transform is called to traverse the tree with the CTAS command at the root, the entire query is traversed. LeafRunnableCommand has a LeafLike trait which explicitly sets the value of children to Nil. This means that when transform is called on the command, no children are found and the query is unaffected by the rule.
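The difference described above can be sketched with a toy tree. This is a minimal illustration, not Spark's code: `Plan`, `Filter`, `LeafCtas`, `UnaryCtas` and `simplifyDuplicateAnd` are hypothetical names, and the rule is only a simplified stand-in for a boolean simplification. The only point it shows is that a `transform` which recurses through `children` never reaches a query held by a node that reports `Nil`.

```scala
// Toy model only: these are NOT Spark's classes, just a sketch of the mechanism.
object CtasTraversalSketch {
  sealed trait Expr
  case class EqualTo(column: String, value: Int) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  sealed trait Plan {
    def children: Seq[Plan]
    def withNewChildren(newChildren: Seq[Plan]): Plan

    // Apply the rule to this node, then recurse into whatever `children` exposes.
    def transform(rule: PartialFunction[Plan, Plan]): Plan = {
      val afterRule = rule.applyOrElse(this, identity[Plan])
      afterRule.withNewChildren(afterRule.children.map(_.transform(rule)))
    }
  }

  case class Relation(name: String) extends Plan {
    val children: Seq[Plan] = Nil
    def withNewChildren(newChildren: Seq[Plan]): Plan = this
  }

  case class Filter(condition: Expr, child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
    def withNewChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
  }

  // Leaf-style command: it holds the query but reports no children.
  case class LeafCtas(table: String, query: Plan) extends Plan {
    val children: Seq[Plan] = Nil
    def withNewChildren(newChildren: Seq[Plan]): Plan = this
  }

  // Unary-style command: the query is its single child.
  case class UnaryCtas(table: String, query: Plan) extends Plan {
    def children: Seq[Plan] = Seq(query)
    def withNewChildren(newChildren: Seq[Plan]): Plan = copy(query = newChildren.head)
  }

  // Simplified stand-in for a boolean simplification rule: `a AND a` becomes `a`.
  val simplifyDuplicateAnd: PartialFunction[Plan, Plan] = {
    case Filter(And(l, r), child) if l == r => Filter(l, child)
  }

  val query: Plan = Filter(And(EqualTo("id", 9), EqualTo("id", 9)), Relation("t"))
  val simplified: Plan = Filter(EqualTo("id", 9), Relation("t"))
}
```

In this sketch, `LeafCtas("out", query).transform(simplifyDuplicateAnd)` comes back unchanged, while the same call on `UnaryCtas("out", query)` returns a command whose query is `simplified`.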

In practice, this means that optimizer rules that rely on transform (such as BooleanSimplification) to traverse the tree do not work with a CTAS.

This can be demonstrated with a simple query in spark-shell. Without the CTAS we can run a command with an easily simplified boolean expression (id == 9 && id == 9) and see it gets optimized out:

Working - No Create Table

With a CTAS, the optimisation does not get applied (as we can see from the AND still present in the optimized and physical plans):

Not Working - Create Table

This works in 3.2.0 which had the old CTAS implementation:

Working - 3.2.0

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests cover the existing behavior. A new test asserts that CTAS command nodes are unary.

@ted-jenks ted-jenks changed the title from "Make CTAS Have a UnaryRunnableCommand Trait Supporting hildren" to "[WIP] Make CTAS Have a UnaryRunnableCommand Trait Supporting Children" May 30, 2023
@github-actions github-actions bot added the SQL label May 30, 2023
@ted-jenks ted-jenks marked this pull request as draft May 30, 2023 16:26
@ted-jenks ted-jenks changed the title from "[WIP] Make CTAS Have a UnaryRunnableCommand Trait Supporting Children" to "[SPARK-43883][SQL] Make CTAS Have a UnaryRunnableCommand Trait Supporting Children" May 30, 2023
@ulysses-you
Contributor

We run a nested query execution inside CTAS, so the real command for writing is V1WriteCommand. CTAS inherits from a leaf node to avoid applying rules more than once, because the rules are already applied in the nested query execution.

The explain output of CTAS does not show the nested query execution; you can check it in the UI.
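The double-application concern can be sketched with a toy model. Everything here is an assumption made for illustration (`Query`, `LeafCommand`, `UnaryCommand`, `CountingOptimizer` and `run` are hypothetical names, not Spark APIs): an outer optimizer pass walks the command's `children`, and the command then runs a nested execution that optimizes its query again.

```scala
// Toy model only: hypothetical names, not Spark's optimizer or command classes.
object NestedExecutionSketch {
  sealed trait Plan { def children: Seq[Plan] }
  case class Query(sql: String) extends Plan { val children: Seq[Plan] = Nil }
  case class LeafCommand(query: Query) extends Plan { val children: Seq[Plan] = Nil }
  case class UnaryCommand(query: Query) extends Plan { def children: Seq[Plan] = Seq(query) }

  // Counts how many times the "optimizer" visits a Query node.
  final class CountingOptimizer {
    var queryVisits = 0
    def optimize(plan: Plan): Unit = {
      plan match {
        case _: Query => queryVisits += 1
        case _ => ()
      }
      plan.children.foreach(optimize)
    }
  }

  // Running a command: the outer plan is optimized first, then the command
  // starts a nested execution that optimizes its query again before writing.
  def run(command: Plan, optimizer: CountingOptimizer): Int = {
    optimizer.optimize(command) // outer pass, follows `children`
    command match {
      case LeafCommand(q) => optimizer.optimize(q)  // nested pass
      case UnaryCommand(q) => optimizer.optimize(q) // nested pass
      case _ => ()
    }
    optimizer.queryVisits
  }
}
```

Under these assumptions the query of a `LeafCommand` is visited once (only by the nested pass), while the query of a `UnaryCommand` is visited twice (once by the outer pass and once by the nested pass).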

@ted-jenks ted-jenks closed this May 31, 2023