[SPARK-20653][core] Add cleaning of old elements from the status store. #19751
Conversation
This change restores the functionality that keeps a limited number of different types (jobs, stages, etc) depending on configuration, to avoid the store growing indefinitely over time. The feature is implemented by creating a new type (ElementTrackingStore) that wraps a KVStore and allows triggers to be set up for when elements of a certain type meet a certain threshold. Triggers don't need to necessarily only delete elements, but the current API is set up in a way that makes that use case easier. The new store also has a trigger for the "close" call, which makes it easier for listeners to register code for cleaning things up and flushing partial state to the store. The old configurations for cleaning up the stored elements from the core and SQL UIs are now active again, and the old unit tests are re-enabled.
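To make the trigger pattern concrete, here is a minimal sketch of how a count-based trigger store could work. This is NOT the real ElementTrackingStore API: the names (SketchStore, addTrigger) are simplified stand-ins, and the real store runs triggers asynchronously while this sketch runs them inline for simplicity.

```scala
import scala.collection.mutable

// Simplified model of a store that fires cleanup actions when the number of
// elements of a given type crosses a configured threshold.
class SketchStore {
  private val data = mutable.Map.empty[Class[_], mutable.Buffer[Any]]
  private val triggers = mutable.Map.empty[Class[_], mutable.Buffer[(Long, Long => Unit)]]

  // Run `action` (e.g. a cleanup) whenever the count of `klass` exceeds `threshold`.
  def addTrigger(klass: Class[_], threshold: Long)(action: Long => Unit): Unit =
    triggers.getOrElseUpdate(klass, mutable.Buffer.empty) += (threshold -> action)

  def count(klass: Class[_]): Long =
    data.get(klass).map(_.size.toLong).getOrElse(0L)

  // Write the value, then check the triggers registered for the value's type.
  def write(value: Any, checkTriggers: Boolean = true): Unit = {
    data.getOrElseUpdate(value.getClass, mutable.Buffer.empty) += value
    if (checkTriggers) {
      triggers.get(value.getClass).foreach { list =>
        val n = count(value.getClass)
        list.foreach { case (threshold, action) => if (n > threshold) action(n) }
      }
    }
  }
}

// Hypothetical element type, standing in for jobs/stages/etc.
case class Job(id: Int)
```

A listener would register one trigger per tracked type (jobs, stages, ...) with the threshold taken from the retained-elements configuration, and the action would delete the oldest elements.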
For context:
Test build #83867 has finished for PR 19751 at commit
@vanzin looks like this PR has conflicts now.
write(value)

if (checkTriggers && !stopped) {
  triggers.get(value.getClass()).foreach { list =>
we should remove the empty parens after getClass
I really prefer this style, since the original method is declared with the parentheses (it's a Java method, after all).
Test build #83927 has finished for PR 19751 at commit
doAsync {
  val count = store.count(value.getClass())
  list.foreach { t =>
    if (count > t.threshold) {
indent two spaces
val retainedStages = conf.getInt("spark.ui.retainedStages", SparkUI.DEFAULT_RETAINED_STAGES)
val retainedJobs = conf.getInt("spark.ui.retainedJobs", SparkUI.DEFAULT_RETAINED_JOBS)
val retainedTasks = conf.get(UI_RETAINED_TASKS)
val retainedStages = conf.getInt("spark.ui.retainedStages", 1000)
Why use a hard-coded value here? Maybe make the configurations in config.scala public, so that we don't need to write the default values in two places.
This class is being removed in a separate PR.
private def update(entity: LiveEntity, now: Long): Unit = {
  entity.write(kvstore, now)
private def update(entity: LiveEntity, now: Long, last: Boolean = false): Unit = {
nit: maybe change last to isLast ?
I prefer how the current version reads on the call site, e.g.:
update(exec, now, last = true)
Also, Spark generally avoids Java-beans-style prefixes in Scala code (like "is" or "get").
maybe checkTriggers is better than last?
def write(store: KVStore, now: Long): Unit = {
  store.write(doUpdate())
def write(store: ElementTrackingStore, now: Long, checkTriggers: Boolean = false): Unit = {
  store.write(doUpdate(), checkTriggers || lastWriteTime == 0L)
Can you explain why it checks triggers on the first write?
Test build #83957 has finished for PR 19751 at commit

retest this please

Test build #83975 has finished for PR 19751 at commit

Not retesting since there will probably be feedback and the failure seems unrelated, so better just wait.

Test build #84318 has finished for PR 19751 at commit

Test build #84665 has finished for PR 19751 at commit
first pass comments, will go through again this afternoon
 */
private[spark] abstract class LiveEntity {

  var lastWriteTime = 0L
minor: can the initial value be -1 instead? doesn't matter right now, but often in tests we use a manual clock starting at time 0. That would cause problems w/ this default.
 * internal state to the store (e.g. after the SHS finishes parsing an event log).
 *
 * The configured triggers are run on the same thread that triggered the write, after the write
 * has completed.
they are actually run in a different thread, right? (comment on addTrigger looks correct)
if (checkTriggers && !stopped) {
  triggers.get(value.getClass()).foreach { list =>
    doAsync {
      val count = store.count(value.getClass())
I appreciate that this is generic, but isn't this significantly more expensive than just having a special internal variable to track this for each class as you update? I'm imagining a job with tons of very quick tasks.
You could also pull the store.count() out of foreach in case there ever were multiple triggers associated with a class (though I guess there aren't right now).
ok on another read, I see that tasks are actually tracked specially and don't use this trigger mechanism. Still, this API isn't really that useful in the end -- it's good for jobs and stages, but not actually the right count for executors, and you don't use it for tasks. It still might be easier to just track those other counts directly, without going through kvstore.count().
kvstore.count is pretty cheap in-memory (it's basically a hash table lookup), and cheap enough on a disk store (its cost is dwarfed by the writes that actually trigger the call).
So while I do agree that the interface is not optimal, of the 4 call sites, it handles 3 without need for any special code, and the 4th (cleanupExecutors) would still need to call kvstore.count() or keep that count separately, so this sounds simpler.
// Because the limit is on the number of *dead* executors, we need to calculate whether
// there are actually enough dead executors to be deleted.
val threshold = conf.get(MAX_RETAINED_DEAD_EXECUTORS)
val dead = count - activeExecutorCount
KVStore has this:

/**
 * Returns the number of items of the given type which match the given indexed value.
 */
long count(Class<?> type, String index, Object indexedValue) throws Exception;

so with an API change you could get the right number directly from the store. (Though this conflicts with my other comment about not using kvstore.count() at all in the trigger, which I think is more important.)
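For illustration only, here is a minimal sketch of what such an indexed count looks like, modeled with a plain Scala collection. ExecSummary and countByIndex are hypothetical stand-ins, not the real KVStore types:

```scala
// Hypothetical stand-in for a stored executor summary.
case class ExecSummary(id: String, isActive: Boolean)

// Model of KVStore.count(type, index, indexedValue): count the elements
// whose indexed field equals the given value.
def countByIndex[T, V](items: Seq[T])(index: T => V)(value: V): Long =
  items.count(index(_) == value).toLong
```

With an API like this, the number of dead executors could be read directly from the store (count where the "active" index is false) instead of computing count - activeExecutorCount by hand.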
}

/** Turns a KVStoreView into a Scala sequence, applying a filter. */
def viewToSeq[T](
looks like you don't actually need a sequence at all in any of the call sites; you could just use an iterator. I'm thinking about the price of that if you're, say, cleaning up 100k tasks repeatedly.
This does give you a nice spot to include iter.close(), but I think you could change this to foreachWithMaxFilterClose or something to avoid ever creating the list.
I create a list explicitly to avoid consistency issues when deleting these elements. If I had an iterator instead, and I then called kvstore.delete, you could get a ConcurrentModificationException.
Since the cleanup code deletes more than necessary to just respect the limit (to avoid having to do this every time you write something), hopefully the cost is amortized a little.
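The consistency issue described above can be shown with a plain Java collection standing in for a KVStore view (illustrative only; unsafeDelete/safeDelete are hypothetical names): deleting from the backing collection while iterating it trips the fail-fast iterator, while materializing the elements first does not.

```scala
import java.util.{ArrayList, ConcurrentModificationException}
import scala.jdk.CollectionConverters._

// Unsafe pattern: delete from the backing collection mid-iteration.
// Returns true if a ConcurrentModificationException was thrown.
def unsafeDelete(tasks: ArrayList[Int]): Boolean =
  try {
    val it = tasks.iterator()
    while (it.hasNext) {
      val t = it.next()
      if (t % 2 == 0) tasks.remove(Integer.valueOf(t)) // mutates while iterating
    }
    false
  } catch {
    case _: ConcurrentModificationException => true
  }

// Safe pattern: materialize the list of elements to delete first, then delete.
def safeDelete(tasks: ArrayList[Int]): Unit = {
  val toDelete = tasks.asScala.filter(_ % 2 == 0).toList
  toDelete.foreach(t => tasks.remove(Integer.valueOf(t)))
}
```

This mirrors why viewToSeq builds a list before the cleanup code calls kvstore.delete on each element.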
// On live applications, try to delete finished tasks only; when in the SHS, treat all
// tasks as the same.
val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt) { t =>
  !live || t.info.status != TaskState.RUNNING.toString()
in the SHS, wouldn't you still prefer to delete finished tasks over live ones? In all cases, shouldn't you just try to delete finished tasks first, but still delete running tasks if need be?
I don't see any filter like this in the old code.
The old code deletes tasks in the order they arrive; it would be expensive to do that here since it would involve sorting the task list (cheap for disk store, expensive for in-memory).
I can keep the same filter behavior for both.
}

// Start 3 stages, all should be kept. Stop 2 of them, the oldest stopped one should be
// deleted. Start a new attempt of the second stopped one, and verify that the stage graph
oldest meaning smallest id? Or the order they are submitted in? With a non-linear stage DAG, the ordering of ids, start-time, and end-time can be arbitrary. Ids will correspond to submission order, but stage retries complicate that.
I guess I'm just trying to make sure I understand how the kvstore works, and if there is some important part I'm missing.
there is no DAG here, the test controls what "oldest" means. In this case "oldest" = "first stage in the list", which is also "smallest id", which is the actual behavior of the listener.
if (needReplay) {
  val trackingStore = new ElementTrackingStore(kvstore, conf)
  val listener = if (needReplay) {
listener is unused
ok after another pass, overall I think this is good. The only somewhat important comment I have is about the filtering done on tasks for cleanup: how it's different from before, and why it's different live vs. SHS.
Test build #84885 has finished for PR 19751 at commit

Test build #84909 has finished for PR 19751 at commit
}

case _ =>
  (new InMemoryStore(), true)
not related to this change; this comment is really for val _replay = !path.isDirectory(), but github won't let me put a comment there ...
I know we discussed this when you made this change, but I was still confused reading this bit of code on replay. Maybe you could just add a comment above that line like "the kvstore is deleted when we decide that the loaded data is stale -- see LoadedAppUI for a more extensive discussion of the lifecycle".
Doesn't seem worth a separate jira / pr just for that comment.
lgtm
Test build #84931 has finished for PR 19751 at commit

LGTM
kvstore.onFlush {
  if (!live) {
    flush()
hm, why only flush for history server?
Because the store is only closed on live applications when the context is shut down. So there's no more UI for you to see this.
import config._

private val triggers = new HashMap[Class[_], Seq[Trigger[_]]]()
use a mutable map?
Man, this has already been pushed... I'd appreciate it if reviews happened before changes are pushed.
  executor.shutdownNow()
}

flushTriggers.foreach { trigger =>
flush sounds like we would do it periodically; how about closeTriggers?
store.write(doUpdate())
def write(store: ElementTrackingStore, now: Long, checkTriggers: Boolean = false): Unit = {
  // Always check triggers on the first write, since adding an element to the store may
  // cause the maximum count for the element type to be exceeded.
hmm, for the first write, how can it exceed the maximum count?
Multiple jobs, multiple stages, multiple tasks, etc, etc, etc.