Give UNION ALL more opportunities for parallel plans in MPP. #1291
Reply: done in #1311.
Description
In CBDB, whose planner is derived from PostgreSQL, UNION ALL is currently planned in one of four ways:
three parallel variants (Parallel Append with partial subpaths, Parallel Append with mixed partial/non-partial subpaths, and Append with partial subpaths) and one non-parallel variant (Append with non-partial subpaths).
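As a rough illustration of when each of these four variants is available, here is a simplified Python model. It is not CBDB code: `AppendVariant`, `Subpath`, and `candidate_variants` are hypothetical names, and each branch is reduced to two properties (whether it has a partial path, and whether its non-partial path is parallel safe). It loosely follows PostgreSQL's `add_paths_to_append_rel()` but ignores costing; in the real planner a branch may also contribute its non-partial path to a mixed Parallel Append when that path is cheaper, even if a partial path exists.

```python
from dataclasses import dataclass
from enum import Enum


class AppendVariant(Enum):
    APPEND_NON_PARTIAL = "Append with non-partial subpaths (serial)"
    APPEND_PARTIAL = "Append with partial subpaths"
    PARALLEL_APPEND_PARTIAL = "Parallel Append with partial subpaths"
    PARALLEL_APPEND_MIXED = "Parallel Append with mixed partial/non-partial subpaths"


@dataclass
class Subpath:
    name: str
    has_partial_path: bool  # a partial path exists (each worker scans a share)
    parallel_safe: bool     # the non-partial path may run inside a worker


def candidate_variants(subpaths, enable_parallel_append=True):
    """Return the Append variants a planner could consider (no costing)."""
    # The serial Append over non-partial subpaths is always available.
    variants = [AppendVariant.APPEND_NON_PARTIAL]
    all_partial = all(s.has_partial_path for s in subpaths)
    if all_partial:
        # Parallel-oblivious Append: each worker runs every partial subpath.
        variants.append(AppendVariant.APPEND_PARTIAL)
        if enable_parallel_append:
            # Parallel-aware Append: workers spread across the subpaths.
            variants.append(AppendVariant.PARALLEL_APPEND_PARTIAL)
    elif enable_parallel_append and all(
        s.has_partial_path or s.parallel_safe for s in subpaths
    ):
        # Branches without a partial path are each run by a single worker.
        variants.append(AppendVariant.PARALLEL_APPEND_MIXED)
    return variants


mixed = [Subpath("t1", True, True), Subpath("t2", False, True)]
print([v.value for v in candidate_variants(mixed)])
```

`enable_parallel_append` mirrors the PostgreSQL setting of the same name; everything else is a deliberate simplification.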
These variants work well for local queries, but we have had to disable parallel execution whenever a subpath contains a Motion node, because of a critical correctness issue: when a Parallel Append worker marks a subnode as completed, the other workers stop executing it, so a Motion-containing branch can be skipped prematurely.
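To make the hazard concrete, here is a toy, single-threaded Python model (not CBDB code; all names are hypothetical). It assumes that each Motion-containing branch delivers a separate stream of rows to each worker, and that the parallel-aware variant keeps one shared "finished" flag per branch, loosely mirroring Parallel Append's per-subplan completion state. Workers run one after another so the result is deterministic.

```python
def parallel_aware_append(streams_per_branch, n_workers):
    """Toy Parallel Append over Motion branches.

    streams_per_branch[b][w] is the list of rows that the Motion in branch b
    delivers to worker w. One shared flag per branch records "finished".
    """
    finished = [False] * len(streams_per_branch)
    output = []
    for w in range(n_workers):  # workers run one after another in this model
        for b, streams in enumerate(streams_per_branch):
            if finished[b]:
                continue  # branch skipped: worker w never reads its stream
            output.extend(streams[w])
            finished[b] = True  # worker w exhausted only *its own* stream
    return output


def parallel_oblivious_append(streams_per_branch, n_workers):
    """Toy parallel-oblivious Append: every worker runs every branch."""
    output = []
    for w in range(n_workers):
        for streams in streams_per_branch:
            output.extend(streams[w])  # each worker drains its own stream
    return output


# Two Motion branches, two workers; six rows in total.
streams = [[[1, 2], [3]], [[10], [20, 30]]]
print(parallel_aware_append(streams, 2))      # [1, 2, 10]: rows 3, 20, 30 lost
print(parallel_oblivious_append(streams, 2))  # [1, 2, 10, 3, 20, 30]
```

The point of the model is only that "one worker reached the end of this branch" does not imply "all of this branch's rows were read" once each worker receives its own Motion stream.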
This limitation forces serial execution for most queries on distributed tables. The example plans show the typical shape: a Gather Motion on top of a serial Append whose branches each contain a Redistribute Motion, which leaves significant optimization opportunities unused.
However, these plans can still be parallelized.
I propose the following approach: use a parallel-aware Append when it is safe (no Motion nodes in any subpath), and automatically fall back to a parallel-oblivious Append when a Motion node is detected. This is correct because a regular Append reliably executes every subnode regardless of whether it contains a Motion, and CBDB's Motion nodes handle tuples individually without requiring coordination between workers.
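The proposed planning decision can be sketched as follows. This is a hypothetical Python model, not CBDB's planner code: `PlanNode`, `contains_motion`, and `build_union_all` are illustrative names, and a Motion is recognized simply by a node kind ending in "Motion". The Gather Motion that would sit above the Append is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    kind: str
    children: list["PlanNode"] = field(default_factory=list)


def contains_motion(node: PlanNode) -> bool:
    """True if the node or any descendant is a Motion (data redistribution)."""
    return node.kind.endswith("Motion") or any(
        contains_motion(child) for child in node.children
    )


def build_union_all(subpaths: list[PlanNode]) -> PlanNode:
    """Prefer a parallel-aware Append; fall back when a Motion is present."""
    if any(contains_motion(p) for p in subpaths):
        # Parallel-oblivious: every worker executes every branch, so no branch
        # is skipped because another worker marked it finished.
        return PlanNode("Append", subpaths)
    # No Motion anywhere: workers may share the branches via Parallel Append.
    return PlanNode("Parallel Append", subpaths)


plan = build_union_all([
    PlanNode("Redistribute Motion", [PlanNode("Parallel Seq Scan")]),
    PlanNode("Parallel Seq Scan"),
])
print(plan.kind)  # Append
```

Both branches still use partial (parallel) scans in the fallback case; only the Append itself becomes parallel-oblivious.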
The benefits extend beyond the UNION ALL operator itself: once the Append can run in parallel, its subpaths also gain more opportunities for parallel execution, which is particularly valuable for complex nested queries. This optimization should significantly improve TPC-DS benchmark performance and other analytical workloads involving distributed tables.
We plan to implement this through prototype development, TPC-DS testing, and edge-case validation. Community feedback is welcome on potential corner cases, benchmarking approaches, and real-world query patterns that might benefit most.
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?