@sidedoorleftroad
Contributor

What issue does this pull request address?

JIRA: https://issues.apache.org/jira/browse/SPARK-24985
In full outer joins of large tables, data skew around the join keys of either joined table can cause OOM exceptions. While it is possible to work around this by increasing the heap size, Spark should be resilient to such issues, since skew can occur arbitrarily.

What changes were proposed in this pull request?

#16909 introduced ExternalAppendOnlyUnsafeRowArray and changed SortMergeJoinExec to use it for every join type except full outer join. This PR makes full outer joins use ExternalAppendOnlyUnsafeRowArray as well.
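
For readers unfamiliar with the idea, here is a minimal sketch of an append-only buffer that keeps rows in memory up to a threshold and writes the rest to disk. It is a toy model in Python, not Spark code: the class name `SpillableAppendOnlyBuffer`, the `in_memory_threshold` parameter, and the pickle-based spill file are all assumptions made for illustration, and the real ExternalAppendOnlyUnsafeRowArray may differ in every detail.

```python
import pickle
import tempfile


class SpillableAppendOnlyBuffer:
    """Toy append-only buffer: rows stay in memory up to a threshold, then go to disk."""

    def __init__(self, in_memory_threshold):
        self.in_memory_threshold = in_memory_threshold  # hypothetical knob
        self._rows = []          # rows held in memory
        self._spill_file = None  # created lazily on the first spill
        self._spilled = 0        # number of rows written to disk

    def append(self, row):
        if len(self._rows) < self.in_memory_threshold:
            self._rows.append(row)
            return
        if self._spill_file is None:
            self._spill_file = tempfile.TemporaryFile()
        # Always write at the end, even if a reader moved the file position.
        self._spill_file.seek(0, 2)
        pickle.dump(row, self._spill_file)
        self._spilled += 1

    def __len__(self):
        return len(self._rows) + self._spilled

    def __iter__(self):
        # In-memory rows first, then spilled rows in the order they were written.
        yield from self._rows
        pos = 0
        for _ in range(self._spilled):
            self._spill_file.seek(pos)
            row = pickle.load(self._spill_file)
            pos = self._spill_file.tell()  # remember where the next row starts
            yield row
```

With a threshold of 2 and five appended rows, only two rows stay on the heap while iteration still returns all five in insertion order. This is why a skewed join key with many matching rows need not exhaust memory.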

Why are the changes needed?

The earlier PR by @sujithjay uses ExternalAppendOnlyUnsafeRowArray instead of ArrayBuffer.
However, that code performs very poorly because it creates many iterators.
This PR holds on to the iterator to improve performance.
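
To illustrate the difference, here is a small Python model rather than the actual Spark change. `CountingBuffer`, `scan_with_fresh_iterators`, and `scan_with_held_iterator` are hypothetical names, and the "fresh iterator per row" access pattern is only a guess at the kind of behavior that makes repeated iterator creation expensive.

```python
class CountingBuffer:
    """Toy buffer that counts how many iterators were created over it."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.iterators_created = 0

    def __len__(self):
        return len(self._rows)

    def iterator(self, start=0):
        # Stand-in for an expensive operation (e.g. reopening spilled data).
        self.iterators_created += 1
        return iter(self._rows[start:])


def scan_with_fresh_iterators(buf):
    """Fetch row i by building a new iterator positioned at i, for every i."""
    return [next(buf.iterator(start=i)) for i in range(len(buf))]


def scan_with_held_iterator(buf):
    """Create one iterator up front and keep advancing it."""
    it = buf.iterator()
    return [row for row in it]
```

Both scans return the same rows, but for a buffer of n rows the first creates n iterators and the second creates one. If each creation has a real cost, holding the iterator removes that overhead.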

Does this PR introduce any user-facing change?

No.

How was this patch tested?

JoinSuite and OuterJoinSuite

@AmplabJenkins

Can one of the admins verify this patch?

@sidedoorleftroad
Contributor Author

@viirya Can you review this PR please?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Oct 20, 2020
@github-actions github-actions bot closed this Oct 21, 2020