@lianhuiwang
Contributor

Sometimes NMClient throws an exception while starting containers. For example, if spark_shuffle is not configured on some machines, it throws:
java.lang.Error: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist.
Because YarnAllocator uses a ThreadPoolExecutor to start containers, we cannot tell which container or hostname threw the exception. I think we should catch YarnException in ExecutorRunnable when starting a container, so that when an exception occurs we know the container ID or hostname of the failed container.

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27329 has started for PR 4554 at commit c02140f.

  • This patch merges cleanly.

Contributor

The same exception shouldn't be logged twice. If we're going to rethrow it, we should wrap it in another exception and include the additional info in that exception's message.
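
A minimal sketch of the wrap-and-rethrow pattern suggested above, in Scala. `LaunchException`, `startContainer`, and the container and host values are hypothetical stand-ins (for SparkException and the NMClient call, respectively) so the snippet runs without Spark or YARN on the classpath; it is not the code from this patch.

```scala
// Hypothetical stand-in for SparkException; not the real Spark class.
class LaunchException(message: String, cause: Throwable)
  extends Exception(message, cause)

object WrapExample {
  // Hypothetical launcher that always fails, standing in for the NMClient call.
  def startContainer(containerId: String): Unit =
    throw new IllegalStateException("The auxService:spark_shuffle does not exist")

  // Catch the failure once and rethrow it with the container id and hostname
  // in the message, keeping the original exception as the cause.
  // Nothing is logged here, so the exception is not reported twice.
  def launch(containerId: String, hostname: String): Unit =
    try {
      startContainer(containerId)
    } catch {
      case e: Exception =>
        throw new LaunchException(
          s"Exception while starting container $containerId on host $hostname", e)
    }

  def main(args: Array[String]): Unit = {
    try launch("container_01", "host-a.example.com")
    catch {
      case e: LaunchException =>
        println(e.getMessage)
        println(s"cause: ${e.getCause.getMessage}")
    }
  }
}
```

Whoever handles the wrapped exception sees both the added context and the original cause, so a single log statement at that level should be enough.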

Contributor

Also, a couple nits: we should use String interpolation instead of .format, and "start" should be "starting".
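
For illustration, the two styles side by side. The message text and the values are assumptions made up for this sketch, not the wording used in the patch.

```scala
object MessageExample {
  // Hypothetical values, for illustration only.
  val containerId = "container_01"
  val hostname = "host-a.example.com"

  // .format style: placeholders and arguments are kept apart.
  val viaFormat: String =
    "Exception while starting container %s on host %s".format(containerId, hostname)

  // String interpolation: the values are named inline in the message.
  val viaInterpolation: String =
    s"Exception while starting container $containerId on host $hostname"

  def main(args: Array[String]): Unit = {
    println(viaFormat)
    println(viaInterpolation)
  }
}
```

Both build the same string; the interpolated form is usually easier to read because each value appears where it is used.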

@sryza
Contributor

sryza commented Feb 12, 2015

Thanks for posting this @lianhuiwang. Left a couple comments.

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27339 has started for PR 4554 at commit caf5a99.

  • This patch merges cleanly.

@lianhuiwang
Contributor Author

@sryza thanks. I think I can use SparkException to wrap the exception. Could you review this again?

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27329 has finished for PR 4554 at commit c02140f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27329/

@SparkQA

SparkQA commented Feb 12, 2015

Test build #27339 has finished for PR 4554 at commit caf5a99.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27339/

Contributor

Nit: imports should be ordered alphabetically, so SparkException should come after SparkConf here

@andrewor14
Copy link
Contributor

LGTM. I'm going to merge this into master and 1.3 after fixing @sryza's comments myself. Thanks @lianhuiwang.

@andrewor14
Contributor

For branch-1.2 there are going to be significant merge conflicts, so I will merge it there once you open a PR against that branch.

@asfgit asfgit closed this in 947b8bd Feb 12, 2015
asfgit pushed a commit that referenced this pull request Feb 12, 2015
…MClient start contain...

Author: lianhuiwang <[email protected]>

Closes #4554 from lianhuiwang/SPARK-5759 and squashes the following commits:

caf5a99 [lianhuiwang] use SparkException to warp exception
c02140f [lianhuiwang] ExecutorRunnable should catch YarnException while NMClient start container

(cherry picked from commit 947b8bd)
Signed-off-by: Andrew Or <[email protected]>