Skip to content

Conversation

aarondav
Copy link
Contributor

@aarondav aarondav commented Nov 6, 2014

No description provided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that these checks were not useful (since the secret key is already doing the security checks). Additionally, it doesn't make sense on the Worker.

@aarondav
Copy link
Contributor Author

aarondav commented Nov 6, 2014

@pwendell Perhaps you could review this?

@aarondav aarondav changed the title [SPARK-4277] Support external shuffle service on executor [SPARK-4277] Support external shuffle service on worker Nov 6, 2014
@aarondav aarondav changed the title [SPARK-4277] Support external shuffle service on worker [SPARK-4277] Support external shuffle service on Standalone Worker Nov 6, 2014
@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23009 has started for PR 3142 at commit 258417c.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23012 has started for PR 3142 at commit 47f49d3.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23009 has finished for PR 3142 at commit 258417c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class WorkerShuffleService(sparkConf: SparkConf, securityManager: SecurityManager) extends Logging

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23009/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23014 has started for PR 3142 at commit 2dcdfc1.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23012 has finished for PR 3142 at commit 47f49d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class WorkerShuffleService(sparkConf: SparkConf, securityManager: SecurityManager) extends Logging

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23012/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23014 has finished for PR 3142 at commit 2dcdfc1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class WorkerShuffleService(sparkConf: SparkConf, securityManager: SecurityManager) extends Logging

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23014/
Test FAILed.

@andrewor14
Copy link
Contributor

retest this please

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some node-level environment variable in addition to this? The point is that if the Worker is started on a different port then it would be good if the executors can just pick it up. Otherwise the executors have no way of figuring out which other port the service is started on unless they go check out the worker logs or something. In Yarn for instance we have shared Yarn config used by both the executors and the NM. Though this is probably fine for first-cut implementation. Any thoughts @pwendell

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23022 has started for PR 3142 at commit 2dcdfc1.

  • This patch merges cleanly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be opposed to this being called StandaloneShuffleService now that you renamed the other one ExternalShuffleService. This name would match YarnShuffleService more. The existing name is also fine.

@aarondav
Copy link
Contributor Author

aarondav commented Nov 6, 2014

Addressed or responded to all comments, save the environment variable one.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23027 has started for PR 3142 at commit 3780bd7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 7, 2014

Test build #23022 has finished for PR 3142 at commit 2dcdfc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class WorkerShuffleService(sparkConf: SparkConf, securityManager: SecurityManager) extends Logging

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23022/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Nov 7, 2014

Test build #23027 has finished for PR 3142 at commit 3780bd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StandaloneWorkerShuffleService(sparkConf: SparkConf, securityManager: SecurityManager)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23027/
Test PASSed.

@andrewor14
Copy link
Contributor

Ok, LGTM. Now the dynamic allocation feature can be used on standalone mode too. Super cool. I merge.

asfgit pushed a commit that referenced this pull request Nov 7, 2014
Author: Aaron Davidson <[email protected]>

Closes #3142 from aarondav/worker and squashes the following commits:

3780bd7 [Aaron Davidson] Address comments
2dcdfc1 [Aaron Davidson] Add private[worker]
47f49d3 [Aaron Davidson] NettyBlockTransferService shouldn't care about app ids (it's only b/t executors)
258417c [Aaron Davidson] [SPARK-4277] Support external shuffle service on executor

(cherry picked from commit 6e9ef10)
Signed-off-by: Andrew Or <[email protected]>
@asfgit asfgit closed this in 6e9ef10 Nov 7, 2014
@pwendell
Copy link
Contributor

pwendell commented Nov 7, 2014

Hey Aaron - this LGTM and seems pretty straightforward. Just wondering, what is the behavior if it can't bind to that port? I'm thinking about the case where multiple workers are running on each machine. Will it crash the worker or just e.g. send a log message? If the latter, I think this would still work correctly... it would be a bit hackish, but one worker would grab the port first and then the others would not start a shuffle service.

@aarondav
Copy link
Contributor Author

aarondav commented Nov 7, 2014

The current behavior would crash the second worker. We could disable this behavior and revert to logging if SPARK_WORKER_INSTANCES is set and > 1, but I would be hesitant to just always catch errors from starting the server, as it would just continuously produce failing executors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants