-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-19757][CORE] DriverEndpoint#makeOffers race against CoarseGrainedSchedulerBackend#killExecutors #17091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6485fb8 to
6bc6b2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix the title and description?
The title should summarize the change, not the bug. The description should explain better why this is necessary, and you left parts of the template in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this racing against?
If it's requestTotalExecutors you're using the wrong lock, since that method is in a different class. It would be good to add comments mentioning the race (like the code handling the RegisterExecutor message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Updated the description and the title, and added comment to mention the race.
Fixed the synchronized statement.
|
ok to test |
|
Test build #73859 has finished for PR 17091 at commit
|
|
Can you add back the bug to the PR title? |
|
Sure. Thanks. |
|
@vanzin I updated the patch a little so as not to kill an executor if it is busy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should change. This is the user's code explicitly asking for executors to be killed, so there's no reason to not honor that request. It would also change the behavior of a public API, and there's no way for the user to know which executors were killed or not from the method's return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just found out that SPARK-16554 just fixed the issue of killing busy executors. Reverted this change and updated the patch. Thanks.
|
Test build #74046 has finished for PR 17091 at commit
|
|
Test build #74117 has finished for PR 17091 at commit
|
|
Did you mean to close this? |
|
Oh, sorry, I closed it by mistake. I'd like to get it in. Thanks. |
|
ok to test |
|
Test build #74214 has finished for PR 17091 at commit
|
|
Merging to master. |
What changes were proposed in this pull request?
While some executors are being killed due to idleness, if some new tasks come in, driver could assign them to some executors are being killed. These tasks will fail later when the executors are lost. This patch is to make sure CoarseGrainedSchedulerBackend#killExecutors and DriverEndpoint#makeOffers are properly synchronized.
How was this patch tested?
manual tests