Conversation

nsuthar

@nsuthar nsuthar commented Apr 23, 2014

Details: rootDirs in DiskBlockManagerSuite doesn't get the full path from rootDir0 and rootDir1.
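The fix being described can be sketched as follows. This is a hypothetical, minimal reconstruction, not the suite's actual code: the directory names (`spark-disk-test-0`, `spark-disk-test-1`) are illustrative. The point is that `rootDirs` should be built from the directories' absolute paths rather than their bare names.

```scala
import java.io.File

// Hypothetical sketch: two temp root dirs, standing in for the
// rootDir0 and rootDir1 that DiskBlockManagerSuite creates.
val tmp = System.getProperty("java.io.tmpdir")
val rootDir0 = new File(tmp, "spark-disk-test-0")
val rootDir1 = new File(tmp, "spark-disk-test-1")
rootDir0.mkdirs()
rootDir1.mkdirs()

// The point of the fix: join getAbsolutePath (full paths), not
// getName (bare directory names), when building the rootDirs string.
val rootDirs = rootDir0.getAbsolutePath + "," + rootDir1.getAbsolutePath
```

With `getName` the string would hold only the bare directory names, which would resolve against the current working directory instead of the temp location.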
@nsuthar
Author

nsuthar commented Apr 23, 2014

I have emailed the person who opened this Jira ... I am waiting for him to assign it to me ... if someone can assign it to me then I can change the Jira bug status appropriately.

Email: [email protected]

Thank you,
Niraj

@nsuthar nsuthar closed this Apr 23, 2014
@nsuthar nsuthar reopened this Apr 23, 2014
@AmplabJenkins

Can one of the admins verify this patch?

@rxin
Contributor

rxin commented Apr 24, 2014

Jenkins, test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14426/

@pwendell
Contributor

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14427/

Contributor

I don't think this import is required (and it's causing a compile error anyway).

rootDirs in DiskBlockManagerSuite doesn't get full path from rootDir0, rootDir1
@nsuthar
Author

nsuthar commented Apr 24, 2014

Removed the import statement causing the compilation error and built it locally ... seems to be fixed now.

@nsuthar
Author

nsuthar commented Apr 24, 2014

Build passed.

@nsuthar nsuthar closed this Apr 24, 2014
@nsuthar nsuthar deleted the bugFixes branch April 24, 2014 19:29
@nsuthar nsuthar restored the bugFixes branch April 24, 2014 19:32
@nsuthar nsuthar deleted the bugFixes branch April 24, 2014 19:34
pwendell pushed a commit to pwendell/spark that referenced this pull request Apr 27, 2014
Move the PR#517 of apache-incubator-spark to the apache-spark

Author: Mridul Muralidharan <[email protected]>

Closes apache#159 from mridulm/master and squashes the following commits:

5ff59c2 [Mridul Muralidharan] Change property in suite also
167fad8 [Mridul Muralidharan] Address review comments
9bda70e [Mridul Muralidharan] Address review comments, akwats add to failedExecutors
270d841 [Mridul Muralidharan] Address review comments
fa5d9f1 [Mridul Muralidharan] Bugfixes/improvements to scheduler : PR apache#517
yifeih pushed a commit to yifeih/spark that referenced this pull request May 8, 2019
Implements the shuffle locations API as part of SPARK-25299.

This adds an additional field to all `MapStatus` objects: a `MapShuffleLocations` that indicates where a task's map output is stored. This module is optional and implementations of the pluggable shuffle writers and readers can ignore it accordingly.

This API is designed with the use case in mind of future plugin implementations desiring to have the driver store metadata about where shuffle blocks are stored.

There are a few caveats to this design:

- We originally wanted to remove the `BlockManagerId` from `MapStatus` entirely and replace it with this object. However, doing this proves to be very difficult, as many places use the block manager ID for other kinds of shuffle data bookkeeping. As a result, we concede to storing the block manager ID redundantly here. However, the overhead should be minimal: because we cache block manager ids and default map shuffle locations, the two fields in `MapStatus` should point to the same object on the heap. Thus we add `O(M)` storage overhead on the driver, where for each map status we're storing an additional pointer to the same on-heap object. We will run benchmarks against the TPC-DS workload to see if there are significant performance repercussions for this implementation.

- `KryoSerializer` expects `CompressedMapStatus` and `HighlyCompressedMapStatus` to be serialized via reflection, so originally all fields of these classes needed to be registered with Kryo. However, the `MapShuffleLocations` is now pluggable. We think however that previously Kryo was defaulting to Java serialization anyways, so we now just explicitly tell Kryo to use `ExternalizableSerializer` to deal with these objects. There's a small hack in the serialization protocol that attempts to avoid serializing the same `BlockManagerId` twice in the case that the map shuffle locations is a `DefaultMapShuffleLocations`.
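The interning point in the first caveat can be illustrated with a small sketch. This is a hypothetical reconstruction, not the actual Spark API: `BlockManagerId`, `MapShuffleLocations`, `DefaultMapShuffleLocations`, and `MapStatus` here are simplified stand-ins showing how caching one locations object per block manager ID lets the two `MapStatus` fields share a single heap object.

```scala
import scala.collection.concurrent.TrieMap

// Simplified stand-ins for the real Spark types (hypothetical sketch).
final case class BlockManagerId(executorId: String, host: String, port: Int)

trait MapShuffleLocations

// Default case: the map output lives on the block manager that ran the
// task, so the locations object just wraps a BlockManagerId.
final class DefaultMapShuffleLocations(val blockManagerId: BlockManagerId)
    extends MapShuffleLocations

object DefaultMapShuffleLocations {
  // Interning cache: one instance per BlockManagerId, so every
  // MapStatus for the same executor points at the same heap object.
  private val cache = TrieMap.empty[BlockManagerId, DefaultMapShuffleLocations]
  def get(id: BlockManagerId): DefaultMapShuffleLocations =
    cache.getOrElseUpdate(id, new DefaultMapShuffleLocations(id))
}

final case class MapStatus(
    location: BlockManagerId,              // kept for existing bookkeeping
    shuffleLocations: MapShuffleLocations) // new pluggable field

val id = BlockManagerId("exec-1", "host-a", 7337)
val status = MapStatus(id, DefaultMapShuffleLocations.get(id))

// Repeated lookups return the identical object, so the per-MapStatus
// cost is one extra pointer to an already-shared instance.
val interned =
  DefaultMapShuffleLocations.get(id) eq DefaultMapShuffleLocations.get(id)
```

This matches the `O(M)` overhead claim above: M map statuses add M pointers, all referring to the same cached object per executor.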
squito pushed a commit to squito/spark that referenced this pull request Jul 18, 2019
Implements the shuffle locations API as part of SPARK-25299.
