Conversation

@jisookim0513
Contributor

@jisookim0513 commented Jan 26, 2017

What changes were proposed in this pull request?

Starting with Spark 2.0, task metrics are reported as accumulators. This is good, but it also causes excessively large event logs, because the metrics are logged twice (once under "Accumulators" and once under "Task Metrics"). For applications with many tasks, event logs can reach tens of GB, which makes it infeasible for the Spark History Server to parse them and reconstruct the job UI.

This PR adds an option for EventLoggingListener not to log internal accumulators that carry task metrics. It also adds an option not to log the "Updated Block Statuses" metric, which is quite verbose and may not be needed in some cases.

After updating to Spark 2.0, the event log of one of our applications with over 50k tasks jumped from ~1 GB to over 40 GB. With this patch, event log sizes returned to roughly what they were with Spark 1.5.2.
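
For illustration, a minimal sketch of how the options proposed here would be set; the config keys below are the ones introduced by this PR and exist nowhere else:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  // Proposed in this PR: skip internal accumulables when serializing events.
  .set("spark.eventLog.omitInternalAccumulables", "true")
  // Proposed in this PR: skip the verbose "Updated Block Statuses" task metric.
  .set("spark.eventLog.omitUpdatedBlockStatuses", "true")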

How was this patch tested?

Unit tests. It was also run in production.

@vanzin
Contributor

vanzin commented Jan 26, 2017

ok to test

@SparkQA

SparkQA commented Jan 26, 2017

Test build #72043 has finished for PR 16714 at commit b0bebcc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 27, 2017

Test build #72048 has finished for PR 16714 at commit 5d6cf56.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jisookim0513
Contributor Author

Not sure why the second test build failed the PySpark unit tests; I only changed the comments.

@SparkQA

SparkQA commented Jan 27, 2017

Test build #72058 has finished for PR 16714 at commit f146121.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@drcrallen
Contributor

@vanzin can you check this out please?

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

@vanzin left a comment

See comments. Mostly I want to know why we'd even bother recording this information.

private val testing = sparkConf.getBoolean("spark.eventLog.testing", false)
private val outputBufferSize = sparkConf.getInt("spark.eventLog.buffer.kb", 100) * 1024
// To reduce the size of event logs, we can omit logging all internal accumulables for metrics.
private val omitInternalAccumulables =
Contributor

Is this information useful at all in event logs? If there's nothing that uses it, then not writing it is probably better than having yet another option that has to be documented and that people have to explicitly enable.

Contributor Author

I don't think this information is used to reconstruct the job UI. I am not sure how it got included in event logs in the first place, but some people might be using it to get internal metrics for a stage from the history server via its REST API. For example, the CPU time metric is not included in the stage metrics you get by querying the history server endpoint /applications/[app-id]/stages/[stage-id].
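
For reference, a hypothetical sketch of pulling stage metrics over that REST API; host, port, and the IDs are placeholders, and /api/v1 is the standard prefix for Spark's monitoring API:

import scala.io.Source

val appId = "app-20170126120000-0001"  // placeholder application ID
val stageId = 3                        // placeholder stage ID
val url = s"http://history-server:18080/api/v1/applications/$appId/stages/$stageId"
val json = Source.fromURL(url).mkString  // raw JSON; parse with any JSON library
println(json)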

Contributor

Actually, I see CPU time in both stage-level and task-level data in the REST API...

Do you mind checking the code for when this was introduced and whether it was a conscious decision (as in, it covered some use case we're not seeing)?

If possible, it's always better to avoid adding more config options, especially in this kind of situation. For example, if this data is needed for something, the config would be disabling that functionality, and it would be better to instead figure out how to save it in a way that does not waste so much space. And there's always compression.

Contributor Author

@vanzin I added CPU time because back then I was pulling stage metrics from the history server and needed it. Here's the PR for the change: #10212. Looking at the code, CPU time should be there, so the problem is probably on my end. That's a separate issue, though, and I don't think the CPU time metric increases the size of event logs much. I can't think of a use case for internal accumulables then, so I think it makes sense to delete this. Anyone who wants internal accumulables for stage metrics should be able to capture them when a stage finishes, rather than from the History Server.
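
As an aside, a minimal sketch of capturing internal accumulables at stage completion with a listener, rather than recovering them from event logs later; SparkListener and StageInfo.accumulables are Spark's public API, and the println is illustrative:

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Illustrative listener: read internal accumulables directly when each stage
// completes, instead of parsing them back out of the event logs.
class InternalAccumulablesListener extends SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    info.accumulables.values.filter(_.internal).foreach { acc =>
      println(s"stage ${info.stageId}: ${acc.name.getOrElse("<unnamed>")} = ${acc.value.getOrElse("")}")
    }
  }
}
// Register with: sc.addSparkListener(new InternalAccumulablesListener)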

sparkConf.getBoolean("spark.eventLog.omitInternalAccumulables", false)
// To reduce the size of event logs, we can omit logging the "Updated Block Statuses" metric.
private val omitUpdatedBlockStatuses =
sparkConf.getBoolean("spark.eventLog.omitUpdatedBlockStatuses", false)
Contributor

Similarly here.

A good way to measure "is it needed?" is to check whether the UI needs the information. If it doesn't, then it's probably something we can live without.

If the UI uses it, the option should be documented and its name should more accurately reflect what the user is giving up by disabling it (e.g. "spark.eventLog.simplifiedStorageInfo" or something).

Contributor Author

I am not sure whether updated block statuses are used for the UI. At first I wondered whether the information was used to reconstruct the Storage page, but I checked the usages of TaskMetrics.updatedBlockStatuses and it doesn't seem to be used anywhere except when task metrics are converted to a JSON object. Actually, I am not sure the Storage tab is working at all, unless I am missing something; I don't think /applications/[app-id]/storage/rdd returns any meaningful information.

@jasonmoore2k
Contributor

Would some of the other recent contributors to this area (e.g. @zsxwing or @JoshRosen) be able to comment on any use for these internal accumulables / block status updates, and on whether they can be removed from the event log? I couldn't see anything that breaks, and my event log went from 22 GB to 91 MB, so it makes a big difference.

def stageCompletedToJson(
    stageCompleted: SparkListenerStageCompleted,
    omitInternalAccums: Boolean = false): JValue = {
  val stageInfo = stageInfoToJson(stageCompleted.stageInfo)
Contributor

Were you intending to pass omitInternalAccums into this stageInfoToJson call?

Contributor Author

Yes, thank you for catching that. I think it got dropped while I was merging. Will fix.
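
A sketch of the fix under discussion, assuming the two-argument stageInfoToJson overload added elsewhere in this PR; the JSON DSL lines mirror JsonProtocol's existing structure, in whose scope stageInfoToJson and Utils are available:

import org.json4s.JsonAST.JValue
import org.json4s.JsonDSL._

def stageCompletedToJson(
    stageCompleted: SparkListenerStageCompleted,
    omitInternalAccums: Boolean = false): JValue = {
  // Forward the flag so stageInfoToJson can filter internal accumulables too.
  val stageInfo = stageInfoToJson(stageCompleted.stageInfo, omitInternalAccums)
  ("Event" -> Utils.getFormattedClassName(stageCompleted)) ~
  ("Stage Info" -> stageInfo)
}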

    event: SparkListenerEvent,
    omitInternalAccums: Boolean = false,
    omitUpdatedBlockStatuses: Boolean = false): JValue = {
  event match {
Contributor

stageSubmitted
stageCompleted
jobStart
jobEnd

You didn't seem to use omitInternalAccums/omitUpdatedBlockStatuses in these cases, although you had changed the underlying methods to support these flags. Intended?

Contributor Author

stageSubmitted/stageCompleted/jobStart should use omitInternalAccums, but jobEnd should not; jobEnd's interface hasn't changed. omitUpdatedBlockStatuses is intended to be used only for taskEnd, because that's when updated block statuses are reported. Thanks for catching this; I will add omitInternalAccums to stageSubmitted and jobStart.
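
A sketch of how the flags would be threaded through the match, per the reply above; the per-event helper names follow JsonProtocol, and the flag-taking overloads are this PR's additions:

// Assumes the org.apache.spark.scheduler event classes are in scope.
event match {
  // Internal accumulables can appear in stage/job start and stage completion
  // events, so those arms take the flag; jobEnd's interface is unchanged.
  case stageSubmitted: SparkListenerStageSubmitted =>
    stageSubmittedToJson(stageSubmitted, omitInternalAccums)
  case stageCompleted: SparkListenerStageCompleted =>
    stageCompletedToJson(stageCompleted, omitInternalAccums)
  case jobStart: SparkListenerJobStart =>
    jobStartToJson(jobStart, omitInternalAccums)
  case jobEnd: SparkListenerJobEnd =>
    jobEndToJson(jobEnd)
  // Updated block statuses are only reported at task end.
  case taskEnd: SparkListenerTaskEnd =>
    taskEndToJson(taskEnd, omitInternalAccums, omitUpdatedBlockStatuses)
  // ... remaining event types serialized as before ...
}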

@ajbozarth
Member

Is this still an issue or did #17412 fix this?

@jisookim0513
Contributor Author

jisookim0513 commented Apr 27, 2017

I would still like to keep all internal accumulators (not just updatedBlockStatuses) out of the event logs, as well as the updated block statuses metric. @vanzin, would you be OK with eliminating all internal accumulators and having an option to skip logging the updated block statuses metric?

@vanzin
Contributor

vanzin commented Apr 27, 2017

Having played with some of this code for other reasons, I can say that at least some of the internal accumulators are needed to rebuild the SQL UI.

As for logging updated block statuses, I still don't like the idea of an option. It's either needed or it's not.

@ajbozarth
Member

Didn't #17412 already get rid of block statuses though?

@vanzin
Contributor

vanzin commented Apr 27, 2017

I think so. Just replying to the question.

("Block ID" -> id.toString) ~
("Status" -> blockStatusToJson(status))
})
if (omitUpdatedBlockStatuses) {
Contributor Author

@vanzin @ajbozarth #17412 gets rid of updated block statuses from the accumulables, but not from task metrics. If you think it's OK to drop updated block statuses outright rather than behind an option, I can just remove them here.
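
A minimal demonstration of one way to drop the field outright, assuming JsonProtocol keeps using json4s: a field mapped to JNothing is omitted when the JSON is rendered.

import org.json4s.JsonAST.JNothing
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods.{compact, render}

// A JNothing-valued field is dropped during rendering, so mapping
// "Updated Blocks" to JNothing removes it from the serialized task metrics.
val metricsJson = ("Executor Run Time" -> 42L) ~ ("Updated Blocks" -> JNothing)
println(compact(render(metricsJson)))  // prints {"Executor Run Time":42}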

@jisookim0513
Contributor Author

@vanzin @ajbozarth if you think neither an option to skip logging internal accumulators (in my case I don't use the SQL UI) nor completely removing updated block statuses is needed, I can close this PR.

@ajbozarth
Member

Unless there's still an issue with file size, I think I'm good without this, but I'll defer to @vanzin.

@vanzin
Contributor

vanzin commented Apr 27, 2017

Options put the burden on the user to figure things out (do I need to set this or not?). If you want to investigate whether you can trim more data (e.g. internal metrics that just mirror stuff already in TaskMetrics) then there would be value. But adding an option doesn't really help users, since aside from you, probably nobody will ever even look at those.

@jisookim0513
Contributor Author

OK, not including the updated blocks in task metrics did reduce the size of our event logs, but I am closing this PR since the current implementation doesn't seem to be the right approach. Thanks for the input.
