[SPARK-43883][SQL] Make CTAS Have a UnaryRunnableCommand Trait Supporting Children
#41386
What changes were proposed in this pull request?
Make CTAS commands use the `UnaryRunnableCommand` trait rather than `LeafRunnableCommand`, so that they have children (the query) that can be traversed.
Why are the changes needed?
The changes introduced to resolve SPARK-41713 in #39220 changed the CTAS commands from having a `DataWritingCommand` trait to having a `LeafRunnableCommand` trait. The `DataWritingCommand` trait extends `UnaryCommand` and has `children` set to the value of `query` in the CTAS command. This means that when `transform` is called to traverse the tree with the CTAS command at the root, the entire query is traversed. `LeafRunnableCommand` has a `LeafLike` trait, which explicitly sets the value of `children` to `Nil`. This means that when `transform` is called on the command, no children are found and the query is unaffected by the rule.
In practice, this means that optimizer rules that rely on `transform` to traverse the tree (such as `BooleanSimplification`) do not work with a CTAS.
This can be demonstrated with a simple query in spark-shell. Without the CTAS, a query with an easily simplified boolean expression (`id == 9 && id == 9`) has the redundant condition optimized out. With a CTAS, the optimisation is not applied: the `AND` is still present in the optimized and physical plans. The same CTAS is optimized as expected in 3.2.0, which had the old CTAS implementation.
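The traversal behaviour described above can be illustrated with a small, self-contained toy model. This is only a sketch: `Node`, `Scan`, `Filter`, `LeafCtas`, `UnaryCtas`, and the `simplify` rule are hypothetical stand-ins invented for illustration and are not Spark's actual classes or rules. The only thing it tries to show is that a rule applied through a children-driven `transform` never reaches a query held by a node that reports `Nil` as its children.

```scala
// Toy plan tree. None of these types are Spark's real classes.
sealed trait Node {
  def children: Seq[Node]
  def withNewChildren(newChildren: Seq[Node]): Node

  // Top-down rewrite: apply the rule to this node, then recurse into
  // whatever the (possibly rewritten) node reports as its children.
  def transform(rule: PartialFunction[Node, Node]): Node = {
    val afterRule = rule.applyOrElse(this, (n: Node) => n)
    afterRule.withNewChildren(afterRule.children.map(_.transform(rule)))
  }
}

case object Scan extends Node {
  def children: Seq[Node] = Nil
  def withNewChildren(newChildren: Seq[Node]): Node = this
}

// A filter whose condition is a conjunction of string predicates.
case class Filter(conjuncts: Seq[String], child: Node) extends Node {
  def children: Seq[Node] = Seq(child)
  def withNewChildren(newChildren: Seq[Node]): Node = copy(child = newChildren.head)
}

// Leaf-style command: it holds the query but exposes no children.
case class LeafCtas(table: String, query: Node) extends Node {
  def children: Seq[Node] = Nil
  def withNewChildren(newChildren: Seq[Node]): Node = this
}

// Unary-style command: the query is exposed as the single child.
case class UnaryCtas(table: String, query: Node) extends Node {
  def children: Seq[Node] = Seq(query)
  def withNewChildren(newChildren: Seq[Node]): Node = copy(query = newChildren.head)
}

object CtasTraversalSketch {
  // Hypothetical stand-in for a boolean simplification: drop duplicate conjuncts.
  val simplify: PartialFunction[Node, Node] = {
    case Filter(conjuncts, child) if conjuncts.distinct != conjuncts =>
      Filter(conjuncts.distinct, child)
  }

  val query: Node = Filter(Seq("id = 9", "id = 9"), Scan)

  val leafResult: Node = LeafCtas("t", query).transform(simplify)
  val unaryResult: Node = UnaryCtas("t", query).transform(simplify)

  def main(args: Array[String]): Unit = {
    println(leafResult)  // the query still carries the duplicated conjunct
    println(unaryResult) // the query has been simplified to a single conjunct
  }
}
```

In this sketch the rule itself is identical in both cases; the only difference is what each command node returns from `children`, which is the same distinction the PR draws between the leaf and unary command traits.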
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests cover the existing behavior. A new test asserts that CTAS command nodes are unary.