Skip to content

Conversation

@hustfxj
Copy link
Contributor

@hustfxj hustfxj commented Mar 7, 2017

Cleaning the application may cost much time at worker, then it will block that the worker send heartbeats master because the worker is extend ThreadSafeRpcEndpoint. If the heartbeat from a worker is blocked by the message ApplicationFinished, master will think the worker is dead. If the worker has a driver, the driver will be scheduled by master again.
It had better reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block.

@jerryshao
Copy link
Contributor

I think you need to call cleanupApplicationThreadExecutor.shutdownNow() when worker is stopped.

logInfo(s"Cleaning up local directories for application $id")
dirList.foreach { dir =>
Utils.deleteRecursively(new File(dir))
cleanupApplicationThreadExecutor.submit(new Runnable {
Copy link
Member

@zsxwing zsxwing Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to just reuse the existing cleanupThreadExecutor like this:

      appDirectories.remove(id).foreach { dirList =>
        concurrent.Future {
          logInfo(s"Cleaning up local directories for application $id")
          dirList.foreach { dir =>
            Utils.deleteRecursively(new File(dir))
          }
        }(cleanupThreadExecutor).onFailure {
          case e: Throwable =>
            logError(s"Clean up app dir $dirList failed: ${e.getMessage}", e)
        }(cleanupThreadExecutor)
      }
      shuffleService.applicationRemoved(id)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, but I am worried that cleaning up the workDir and application all cost mush time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't become an issue. The worst case is there will be some pending tasks in the queue of cleanupThreadExecutor. Considering the number of applications is not huge, it won't be an issue.

@zsxwing
Copy link
Member

zsxwing commented Mar 8, 2017

ok to test

@SparkQA
Copy link

SparkQA commented Mar 8, 2017

Test build #74220 has finished for PR 17189 at commit a353262.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 9, 2017

Test build #74266 has finished for PR 17189 at commit cbf14b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// A separated thread to clean up the workDir. Used to provide the implicit parameter of `Future`
// methods.
// A separated thread to clean up the workDir and the finished application.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the directories of finished applications.

@zsxwing
Copy link
Member

zsxwing commented Mar 9, 2017

@hustfxj Looks pretty good. Could you update the PR title and description to reflect the latest changes?

@hustfxj hustfxj changed the title [SPARK-19831][CORE] Use a separate thread to clean up the finished application to avoid the block [SPARK-19831][CORE] Use a separate thread to clean up the directories of finished applications to avoid the block Mar 12, 2017
@hustfxj hustfxj changed the title [SPARK-19831][CORE] Use a separate thread to clean up the directories of finished applications to avoid the block [SPARK-19831][CORE] Reuse the existing cleanupThreadExecutor to clean up the directories of finished applications to avoid the block Mar 12, 2017
@hustfxj
Copy link
Contributor Author

hustfxj commented Mar 12, 2017

@zsxwing Of course, Thank you for your reminding

@SparkQA
Copy link

SparkQA commented Mar 12, 2017

Test build #74397 has finished for PR 17189 at commit 8e44aab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member

zsxwing commented Mar 12, 2017

LGTM. Merging to master. Thanks!

@asfgit asfgit closed this in 2f5187b Mar 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants