[SPARK-24985][SQL]Avoid full outer join OOM on skewed dataset #29071

sidedoorleftroad · 2020-07-11T02:54:43Z

What issue does this pull request address?

JIRA: https://issues.apache.org/jira/browse/SPARK-24985
In full outer joins of large tables, data skew around the join keys in either of the joined tables causes OOM exceptions. While it's possible to work around this by increasing the heap size, Spark should be resilient to such issues, as skew can happen arbitrarily.

What changes were proposed in this pull request?

#16909 introduced ExternalAppendOnlyUnsafeRowArray and changed SortMergeJoinExec to use it for every join type except full outer join. This PR makes full outer joins use ExternalAppendOnlyUnsafeRowArray as well.
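To illustrate why a spillable buffer helps with skewed keys, here is a minimal Python sketch of an append-only row buffer that keeps rows in memory up to a threshold and writes the rest to a temporary file. This is not Spark's implementation: `SpillableRowBuffer` and `in_memory_threshold` are hypothetical names chosen for illustration, and pickled Python objects stand in for Spark's rows.

```python
import pickle
import tempfile


class SpillableRowBuffer:
    """Toy append-only row buffer that spills to a temp file past a threshold.

    Hypothetical illustration only; not Spark's ExternalAppendOnlyUnsafeRowArray.
    """

    def __init__(self, in_memory_threshold):
        self.in_memory_threshold = in_memory_threshold
        self._rows = []          # rows held in memory
        self._spill_file = None  # created lazily on the first spill
        self._spilled = 0        # number of rows written to disk

    def add(self, row):
        # Keep rows in memory until the threshold is reached.
        if len(self._rows) < self.in_memory_threshold:
            self._rows.append(row)
            return
        # Past the threshold, append the row to the end of the spill file.
        if self._spill_file is None:
            self._spill_file = tempfile.TemporaryFile()
        self._spill_file.seek(0, 2)
        pickle.dump(row, self._spill_file)
        self._spilled += 1

    def __len__(self):
        return len(self._rows) + self._spilled

    def __iter__(self):
        # In-memory rows come first, then spilled rows in insertion order.
        yield from self._rows
        pos = 0  # each iterator tracks its own offset into the spill file
        for _ in range(self._spilled):
            self._spill_file.seek(pos)
            row = pickle.load(self._spill_file)
            pos = self._spill_file.tell()
            yield row


# Usage: a "skewed" key with 5 matching rows, but only 2 are kept in memory.
buf = SpillableRowBuffer(in_memory_threshold=2)
for i in range(5):
    buf.add(("key", i))
rows = list(buf)
```

With a plain in-memory list (the analogue of ArrayBuffer), all rows for a heavily skewed key must fit on the heap at once; with a buffer like this, memory use is bounded by the threshold at the cost of disk I/O.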

Why are the changes needed?

The PR by @sujithjay uses ExternalAppendOnlyUnsafeRowArray instead of ArrayBuffer.
However, the performance of that code is very poor, because many iterators are created.
This PR holds on to the iterator to improve performance.
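The description does not say exactly how the iterators were used, so the following Python sketch only shows one plausible way that creating many iterators becomes expensive: emulating indexed access on an iterator-only buffer by starting a fresh iterator for every lookup. `CountingBuffer` and the helper functions are hypothetical names for illustration, not Spark code.

```python
class CountingBuffer:
    """Toy iterator-only buffer that counts iterators started and rows read."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.iterators_created = 0
        self.rows_read = 0

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        # Counted when the iterator is first advanced.
        self.iterators_created += 1
        for row in self._rows:
            self.rows_read += 1
            yield row


def get_by_fresh_iterator(buf, index):
    """Emulate indexed access by creating a new iterator and skipping ahead."""
    for i, row in enumerate(buf):
        if i == index:
            return row
    raise IndexError(index)


def scan_with_fresh_iterators(buf):
    # One new iterator per row: the total number of reads grows quadratically.
    return [get_by_fresh_iterator(buf, i) for i in range(len(buf))]


def scan_with_held_iterator(buf):
    # The iterator is created once and held for the whole scan.
    it = iter(buf)
    return [next(it) for _ in range(len(buf))]


slow = CountingBuffer(range(4))
slow_rows = scan_with_fresh_iterators(slow)

fast = CountingBuffer(range(4))
fast_rows = scan_with_held_iterator(fast)
```

Both scans return the same rows, but under these assumptions the first one starts an iterator per row and re-reads every earlier row each time, while the second starts a single iterator and reads each row once.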

Does this PR introduce any user-facing change?

No.

How was this patch tested?

JoinSuite and OuterJoinSuite

AmplabJenkins · 2020-07-11T02:58:04Z

Can one of the admins verify this patch?

sidedoorleftroad · 2020-07-11T02:59:55Z

@viirya Can you review this PR please?

github-actions · 2020-10-20T00:54:31Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

[SPARK-24985][SQL]Avoid full outer join OOM on skewed dataset

105354d

probot-autolabeler bot added the SQL label Jul 11, 2020

github-actions bot added the Stale label Oct 20, 2020

github-actions bot closed this Oct 21, 2020
