-
Notifications
You must be signed in to change notification settings - Fork 28.9k
SPARK-2791: Fix committing, reverting and state tracking in shuffle file consolidation #1678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
QA tests have started for PR 1678. This patch merges cleanly. |
QA results for PR 1678: |
All changes from this PR are by @mridulm and are drawn from his work in apache#1609. This patch is intended to fix all major issues related to shuffle file consolidation that @mridulm found, while minimizing changes to the code, with the hope that it may be more easily merged into 1.1. This patch is **not** intended as a replacement for apache#1609, which provides many additional benefits, including fixes to ExternalAppendOnlyMap, improvements to DiskBlockObjectWriter's API, and several new unit tests. If it is feasible to merge apache#1609 for the 1.1 deadline, that is a preferable option.
QA tests have started for PR 1678. This patch merges cleanly. |
We can also merge it to 1.0 branch. |
That's true, this is small enough to add back into branch-1.0 too. We should see if we can do the same for the rest of #1609 too (ideally I would like to merge that whole PR into branch-1.0, but I'm not sure how easy that is). |
QA results for PR 1678: |
QA tests have started for PR 1678. This patch merges cleanly. |
QA results for PR 1678: |
Jenkins, retest this please. |
QA tests have started for PR 1678. This patch merges cleanly. |
QA results for PR 1678: |
Both prior failures are due to being unable to bind to ports, which sounds like leftover state from other jenkins runs (especially since they were two entirely disjoint sets of failed tests related to port binding). |
Jenkins, retest this please. |
QA tests have started for PR 1678. This patch merges cleanly. |
QA results for PR 1678: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert can throw exception : which will cause other writers to not revert.
We need to wrap it in try/catch, log and continue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert actually doesn't throw, per its (updated) comment.
Jenkins, test this please |
QA tests have started for PR 1678. This patch merges cleanly. |
QA results for PR 1678: |
@aarondav can you update the name of this PR to reference this sub-task JIRA instead: https://issues.apache.org/jira/browse/SPARK-2791? |
Updated title. |
Alright, going to merge it then. The changes look good to me. |
…ile consolidation All changes from this PR are by mridulm and are drawn from his work in apache#1609. This patch is intended to fix all major issues related to shuffle file consolidation that mridulm found, while minimizing changes to the code, with the hope that it may be more easily merged into 1.1. This patch is **not** intended as a replacement for apache#1609, which provides many additional benefits, including fixes to ExternalAppendOnlyMap, improvements to DiskBlockObjectWriter's API, and several new unit tests. If it is feasible to merge apache#1609 for the 1.1 deadline, that is a preferable option. Author: Aaron Davidson <[email protected]> Closes apache#1678 from aarondav/consol and squashes the following commits: 53b3f6d [Aaron Davidson] Correct behavior when writing unopened file 701d045 [Aaron Davidson] Rebase with sort-based shuffle 9160149 [Aaron Davidson] SPARK-2532: Minimal shuffle consolidation fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This catches Exception
but it could catch IOException
instead.
Co-authored-by: Russell Spitzer <[email protected]>
All changes from this PR are by @mridulm and are drawn from his work in #1609. This patch is intended to fix all major issues related to shuffle file consolidation that @mridulm found, while minimizing changes to the code, with the hope that it may be more easily merged into 1.1.
This patch is not intended as a replacement for #1609, which provides many additional benefits, including fixes to ExternalAppendOnlyMap, improvements to DiskBlockObjectWriter's API, and several new unit tests.
If it is feasible to merge #1609 for the 1.1 deadline, that is a preferable option.