
Conversation

@XuTingjun
Contributor

For long-running Spark applications (e.g. running for days / weeks), the Spark event log may grow to be very large.

I think grouping the event log by job is an acceptable resolution.

1. To group the event log, each application has two kinds of files: one meta file and many part files. We put the StageSubmitted / StageCompleted / TaskResubmit / TaskStart / TaskEnd / TaskGettingResult / JobStart / JobEnd events into the meta file, and all other events into the part files (a minimal sketch of this routing follows after this list). The event log then looks like this:
application_1439246697595_0001-meta
application_1439246697595_0001-part1
application_1439246697595_0001-part2

2. On the HistoryServer, every part file is treated as an application, and the meta file is replayed after each part file is replayed. Below is how a grouped application is displayed on the HistoryServer web UI:
[screenshot: grouped application listing in the HistoryServer web UI]
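
Here is a minimal sketch of the routing idea in Scala. It is a hypothetical illustration, not the actual patch: the `GroupedEventLogWriter` name, the file naming, and the byte-count rollover trigger are all invented, a real implementation would hook into Spark's `EventLoggingListener`, and `TaskResubmit` is omitted since its routing would be identical.

```scala
import java.io.{File, PrintWriter}

import org.apache.spark.scheduler._

// Hypothetical sketch: the lifecycle events named in the proposal go to a
// single "-meta" file; everything else goes to rolling "-part" files.
class GroupedEventLogWriter(appId: String, logDir: File, maxPartBytes: Long) {

  private val metaWriter = new PrintWriter(new File(logDir, s"$appId-meta"))
  private var partIndex = 1
  private var partBytes = 0L
  private var partWriter = newPartWriter()

  /** Route one already-JSON-serialized event to the meta or part file. */
  def log(event: SparkListenerEvent, json: String): Unit = event match {
    case _: SparkListenerStageSubmitted | _: SparkListenerStageCompleted |
         _: SparkListenerTaskStart | _: SparkListenerTaskEnd |
         _: SparkListenerTaskGettingResult |
         _: SparkListenerJobStart | _: SparkListenerJobEnd =>
      metaWriter.println(json)
    case _ =>
      partWriter.println(json)
      partBytes += json.length + 1  // account for the trailing newline
      if (partBytes >= maxPartBytes) {
        // Start a new part file once the current one is large enough.
        partWriter.close()
        partIndex += 1
        partBytes = 0L
        partWriter = newPartWriter()
      }
  }

  private def newPartWriter(): PrintWriter =
    new PrintWriter(new File(logDir, s"$appId-part$partIndex"))

  def close(): Unit = { metaWriter.close(); partWriter.close() }
}
```

With this layout, the HistoryServer can replay each part file followed by the meta file to reconstruct a group, as point 2 describes.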

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44216 has finished for PR 9246 at commit 62c982b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 23, 2015

Test build #44219 has finished for PR 9246 at commit b8f2b3c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@markhamstra
Contributor

@XuTingjun you speak of an "acceptable resolution", but you haven't adequately described the problem you are trying to resolve. Yes, the event log can get long, but I'm not seeing why that is inherently a problem or why you can't "acceptably resolve" your problem by post-processing the event log while leaving that log as a single, unified stream of events.

@andrewor14
Contributor

@XuTingjun I think this is something good to fix. I've noticed that uncompressed event logs can amount to as much as 15GB for a 5-minute application. However, I think a lot of this functionality is already implemented, so we should reuse existing code where possible. In particular, have you looked at RollingFileAppender? We can use that and specify a RollingPolicy based on the number of bytes written. That seems to achieve what we want.
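
To illustrate, here is a minimal sketch of the size-based rolling idea in Scala. It is hypothetical, not the actual `RollingFileAppender` (which lives in `org.apache.spark.util.logging`, is private to Spark, and is driven by an input stream); the `RollingEventWriter` name and its rollover scheme are invented for illustration.

```scala
import java.io.{File, PrintWriter}

// Hypothetical sketch of size-based rolling, in the spirit of
// RollingFileAppender with a size-based RollingPolicy: once the active
// event log exceeds maxBytes, rename it with a timestamp suffix and
// start a fresh file.
class RollingEventWriter(activeFile: File, maxBytes: Long) {

  private var bytesWritten = 0L
  private var writer = new PrintWriter(activeFile)

  def writeEvent(json: String): Unit = {
    if (bytesWritten + json.length > maxBytes) rollover()
    writer.println(json)
    bytesWritten += json.length + 1  // include the trailing newline
  }

  private def rollover(): Unit = {
    writer.close()
    // Rolled-over files keep a timestamp suffix; old ones could then be
    // compressed or deleted to bound total disk usage.
    val rolledOver = new File(activeFile.getParentFile,
      s"${activeFile.getName}.${System.currentTimeMillis()}")
    activeFile.renameTo(rolledOver)
    writer = new PrintWriter(activeFile)
    bytesWritten = 0L
  }

  def close(): Unit = writer.close()
}
```

The HistoryServer would then replay the rolled-over files in order followed by the active file, which keeps each individual file small without changing the event stream itself.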

@andrewor14
Contributor

By the way, since this patch was opened many months ago, it is now mostly stale. If you plan to work on this, would you mind closing this patch and re-opening one that uses the RollingFileAppender? Hopefully the size of the diff will be much smaller then.

@XuTingjun closed this Dec 15, 2015
@satlal

satlal commented May 13, 2016

@XuTingjun Revisiting this thread. Since this patch seems to be abandoned, were you able to work around the large log file issue for long-running streaming jobs?

@eladamitpxi

+1. Large / infinitely growing event log files are quite a problem for long-running streaming jobs.
