Description
Is your feature request related to a problem or challenge?
Broken out from #9577 where @mustafasrepo @comphead and @jayzhan211 and I were discussing optimizer performance
TLDR: the DataFusion optimizer is slow. I did some profiling locally by running the following:
cargo bench --bench sql_planner -- physical_plan_tpch_all
My analysis is that almost 40% of the planning time is spent in SimplifyExprs and CommonSubexprEliminate, and most of that time appears to be spent copying expressions.
Those passes themselves make a bunch of internal clones, which we are improving (e.g. @jayzhan211 on #9628), but I think there is a more fundamental structural problem.
I think a core challenge is that the OptimizerRule trait pretty much requires copying Exprs on each pass, because it takes a &LogicalPlan as input but produces an owned LogicalPlan as output:
// Required methods
fn try_optimize(
&self,
plan: &LogicalPlan,
config: &dyn OptimizerConfig
) -> Result<Option<LogicalPlan>, DataFusionError>;
This means any pass that works on Exprs must clone all Exprs (by calling LogicalPlan::expressions()), rewrite them, and then create a new LogicalPlan with those new Exprs.
Here is that pattern in the expression simplifier:
Describe the solution you'd like
Find some way to avoid cloning Exprs during LogicalPlan rewrite
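To make the difference concrete, here is a minimal sketch using toy types. The `Expr` and `Plan` types and the functions `simplify`, `optimize_borrowed`, and `optimize_owned` are hypothetical stand-ins invented for illustration, not DataFusion's actual types or API. It contrasts a rule that receives a borrowed plan (and so has to clone every expression before rewriting it) with one that receives an owned plan (and can move expressions instead).

```rust
// Toy stand-ins for DataFusion's Expr and LogicalPlan.
// These are hypothetical types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
struct Plan {
    exprs: Vec<Expr>,
}

// Constant-fold `Literal + Literal`. The expression is consumed,
// so sub-expressions are moved into the result rather than cloned.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (simplify(*l), simplify(*r)) {
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

// Borrowing style, analogous to `try_optimize(&LogicalPlan)`:
// the input cannot be moved out of, so every expression is
// cloned before it is rewritten.
fn optimize_borrowed(plan: &Plan) -> Plan {
    let exprs = plan.exprs.iter().cloned().map(simplify).collect();
    Plan { exprs }
}

// Owned style: the plan is consumed, so its expressions are
// moved through the rewrite without any copies.
fn optimize_owned(plan: Plan) -> Plan {
    let exprs = plan.exprs.into_iter().map(simplify).collect();
    Plan { exprs }
}
```

Both functions produce the same plan; the only difference is that `optimize_borrowed` pays for a deep copy of every expression tree on every pass, while `optimize_owned` does not.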
Update: here are the tasks:
Infrastructure Preparation
- Refactor Optimizer to use owned plans and TreeNode API (10% faster planning) #9948
- Avoid LogicalPlan::clone() in LogicalPlan::map_children when possible #9999
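As a rough illustration of how a clone can be avoided when mapping over children held behind an `Arc`, here is a sketch with a toy `Plan` type. The `Plan` struct and the `unwrap_arc` and `map_children` functions below are hypothetical and simplified for illustration (the real DataFusion signatures differ). The idea shown is only this: take the inner value out of the `Arc` when it is uniquely owned, and fall back to cloning when it is shared.

```rust
use std::sync::Arc;

// Toy plan node whose children are shared via Arc (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    name: String,
    inputs: Vec<Arc<Plan>>,
}

// Take the value out of the Arc if this is the only reference;
// clone only when the value is still shared elsewhere.
fn unwrap_arc<T: Clone>(arc: Arc<T>) -> T {
    Arc::try_unwrap(arc).unwrap_or_else(|shared| (*shared).clone())
}

// Rewrite each child with `f`, consuming the parent so that
// uniquely owned children are moved rather than copied.
fn map_children<F: FnMut(Plan) -> Plan>(plan: Plan, mut f: F) -> Plan {
    let Plan { name, inputs } = plan;
    let inputs = inputs
        .into_iter()
        .map(|child| Arc::new(f(unwrap_arc(child))))
        .collect();
    Plan { name, inputs }
}
```

When the optimizer holds the only reference to a plan (the common case during planning), this path performs no deep copies at all.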
Update OptimizerRules to avoid copying
- SimplifyExpressions: Introduce OptimizerRule::rewrite to rewrite in place, rewrite ExprSimplifier (20% faster planning) #9954
- CommonSubexprEliminate: Stop copying Exprs and LogicalPlans so much during Common Subexpression Elimination #9873
- DecorrelatePredicateSubquery: Stop copying LogicalPlan and Exprs in DecorrelatePredicateSubquery #10289
- EliminateCrossJoin: Stop copying LogicalPlan and Exprs in EliminateCrossJoin #10287
- EliminateDuplicatedExpr: refactor EliminateDuplicatedExpr optimizer pass to avoid clone #10218
- EliminateFilter: Stop copying LogicalPlan and Exprs in EliminateFilter #10288
- EliminateJoin: Implement rewrite for EliminateOneUnion and EliminateJoin #10184
- EliminateLimit: Stop copying LogicalPlan and Exprs in EliminateLimit #10212
- EliminateNestedUnion: Stop copying LogicalPlan and Exprs in EliminateNestedUnion #10296
- EliminateOneUnion: Implement rewrite for EliminateOneUnion and EliminateJoin #10184
- EliminateOuterJoin: Refactor EliminateOuterJoin to implement OptimizerRule::rewrite() #10081
- ExtractEquijoinPredicate: implement rewrite for ExtractEquijoinPredicate and avoid clone in filter #10165
- FilterNullJoinKeys: implement rewrite for FilterNullJoinKeys #10166
- OptimizeProjections: Stop copying LogicalPlan and Exprs in OptimizeProjections #10209
- PropagateEmptyRelation: Stop copying LogicalPlan and Exprs in PropagateEmptyRelation #10290
- PushDownFilter: Stop copying LogicalPlan and Exprs in PushDownFilter #10291
- PushDownLimit: Stop copying LogicalPlan and Exprs in PushDownLimit #10292
- ReplaceDistinctWithAggregate: Stop copying LogicalPlan and Exprs in ReplaceDistinctWithAggregate #10293
- RewriteDisjunctivePredicate: Stop copying LogicalPlan and Exprs in RewriteDisjunctivePredicate #10213
- ScalarSubqueryToJoin: Stop copying LogicalPlan and Exprs in ScalarSubqueryToJoin #10294
- SingleDistinctToGroupBy: Stop copying LogicalPlan and Exprs in SingleDistinctToGroupBy #10295
- UnwrapCastInComparison: Refactor UnwrapCastInComparison to implement OptimizerRule::rewrite() #10087 / Refactor UnwrapCastInComparison to remove Expr clones #10115
Update AnalyzerRules to avoid copying
- Analyzer Misc: Minor: Avoid copying all expressions in Analyzer / check_plan #9974
- InlineTableScan: Avoid copies in InlineTableScan via TreeNode API #10038
- ApplyFunctionRewrites: fix "NamedStructField should be rewritten in OperatorToFunction" in subquery regression (change ApplyFunctionRewrites to use TreeNode API) #10032
- TypeCoercion: Stop copying LogicalPlan and Exprs in TypeCoercion #10210
- TypeCoercion (more): Only recompute schema in TypeCoercion when necessary #10365
- CountWildcardRule: (needs a little reworking to avoid clones) Avoid copies in CountWildcardRule via TreeNode API #10066
Update Other to avoid copying
Describe alternatives you've considered
No response
Additional context
We have talked about various other ways to reduce copying of LogicalPlans as well as its challenges in other tickets:
