Skip to content

Conversation

aarondav
Copy link
Contributor

All changes from this PR are by @mridulm and are drawn from his work in #1609. This patch is intended to fix all major issues related to shuffle file consolidation that @mridulm found, while minimizing changes to the code, with the hope that it may be more easily merged into 1.1.

This patch is not intended as a replacement for #1609, which provides many additional benefits, including fixes to ExternalAppendOnlyMap, improvements to DiskBlockObjectWriter's API, and several new unit tests.

If it is feasible to merge #1609 for the 1.1 deadline, that is a preferable option.

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA tests have started for PR 1678. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17535/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA results for PR 1678:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17535/consoleFull

aarondav added 2 commits July 30, 2014 19:08
All changes from this PR are by @mridulm and are drawn from his work in apache#1609.
This patch is intended to fix all major issues related to shuffle file consolidation
that @mridulm found, while minimizing changes to the code, with the hope that it may
be more easily merged into 1.1.

This patch is **not** intended as a replacement for apache#1609, which provides many
additional benefits, including fixes to ExternalAppendOnlyMap, improvements to
DiskBlockObjectWriter's API, and several new unit tests.

If it is feasible to merge apache#1609 for the 1.1 deadline, that is a preferable option.
@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA tests have started for PR 1678. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17537/consoleFull

@aarondav
Copy link
Contributor Author

@mridulm Please take a look if possible.

@mateiz This now interacts with the ExternalSorter stuff, and it's possible it partially helps fix a serialization bug (since some serializers apparently write things during a close() after a flush()).

@witgo
Copy link
Contributor

witgo commented Jul 31, 2014

We can also merge it to 1.0 branch.

@mateiz
Copy link
Contributor

mateiz commented Jul 31, 2014

That's true, this is small enough to add back into branch-1.0 too. We should see if we can do the same for the rest of #1609 too (ideally I would like to merge that whole PR into branch-1.0, but I'm not sure how easy that is).

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA results for PR 1678:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17537/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA tests have started for PR 1678. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17553/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA results for PR 1678:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17553/consoleFull

@aarondav
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA tests have started for PR 1678. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17565/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA results for PR 1678:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17565/consoleFull

@aarondav
Copy link
Contributor Author

Both prior failures are due to being unable to bind to ports, which sounds like leftover state from other jenkins runs (especially since they were two entirely disjoint sets of failed tests related to port binding).

@aarondav
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA tests have started for PR 1678. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17585/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 31, 2014

QA results for PR 1678:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17585/consoleFull

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert can throw exception : which will cause other writers to not revert.
We need to wrap it in try/catch, log and continue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert actually doesn't throw, per its (updated) comment.

@mateiz
Copy link
Contributor

mateiz commented Aug 1, 2014

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA tests have started for PR 1678. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17689/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 1, 2014

QA results for PR 1678:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17689/consoleFull

@mateiz
Copy link
Contributor

mateiz commented Aug 1, 2014

@aarondav can you update the name of this PR to reference this sub-task JIRA instead: https://issues.apache.org/jira/browse/SPARK-2791?

@aarondav aarondav changed the title SPARK-2532: Minimal shuffle consolidation fixes SPARK-2791: Fix committing, reverting and state tracking in shuffle file consolidation Aug 1, 2014
@aarondav
Copy link
Contributor Author

aarondav commented Aug 1, 2014

Updated title.

@mateiz
Copy link
Contributor

mateiz commented Aug 1, 2014

Alright, going to merge it then. The changes look good to me.

@asfgit asfgit closed this in 78f2af5 Aug 1, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…ile consolidation

All changes from this PR are by mridulm and are drawn from his work in apache#1609. This patch is intended to fix all major issues related to shuffle file consolidation that mridulm found, while minimizing changes to the code, with the hope that it may be more easily merged into 1.1.

This patch is **not** intended as a replacement for apache#1609, which provides many additional benefits, including fixes to ExternalAppendOnlyMap, improvements to DiskBlockObjectWriter's API, and several new unit tests.

If it is feasible to merge apache#1609 for the 1.1 deadline, that is a preferable option.

Author: Aaron Davidson <[email protected]>

Closes apache#1678 from aarondav/consol and squashes the following commits:

53b3f6d [Aaron Davidson] Correct behavior when writing unopened file
701d045 [Aaron Davidson] Rebase with sort-based shuffle
9160149 [Aaron Davidson] SPARK-2532: Minimal shuffle consolidation fixes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catches Exception but it could catch IOException instead.

sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants