[SPARK-5210] Support group event log when app is long-running #9246
For long-running Spark applications (e.g. running for days / weeks), the Spark event log may grow to be very large.
I think grouping the event log by job is an acceptable solution.
1. … StageSubmitted/StageCompleted/TaskResubmit/TaskStart/TaskEnd/TaskGettingResult/JobStart/JobEnd events into the meta file, and put the other events into the part file. The event log looks like this:
2. To HistoryServer, every part file is treated as an application, and the meta file is replayed after the part file. Below is how the grouped app is displayed in the HistoryServer web UI:
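The replay order in step 2 (each part file becomes its own application, and the shared meta file is replayed after it) can be illustrated with a toy sketch. This is not the PR's code. The file names (`app-1.meta`, `app-1.part-N`), the in-memory `files` dict, and the "state" that just records event names are all hypothetical stand-ins. Only the `"Event"` JSON key and the `SparkListener…` event names follow Spark's JSON event log format.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical stand-in for event log files: file name -> JSON lines.
# The naming scheme "<app>.meta" / "<app>.part-N" is an assumption for illustration.
files: Dict[str, List[str]] = {
    "app-1.meta": ['{"Event": "SparkListenerEnvironmentUpdate"}'],
    "app-1.part-0": ['{"Event": "SparkListenerJobStart"}', '{"Event": "SparkListenerJobEnd"}'],
    "app-1.part-1": ['{"Event": "SparkListenerJobStart"}'],
}


def replay(lines: Iterable[str], state: Dict[str, List[str]]) -> None:
    """Apply each JSON event to a toy app state (here: just record the event name)."""
    for line in lines:
        event = json.loads(line)
        state.setdefault("events", []).append(event["Event"])


def load_grouped_apps(
    logs: Dict[str, List[str]], meta_name: str
) -> Dict[str, Dict[str, List[str]]]:
    """Treat every part file as its own application: replay it, then the meta file."""
    apps: Dict[str, Dict[str, List[str]]] = {}
    for name in sorted(logs):
        if name == meta_name:
            continue  # the meta file is shared, not an application of its own
        state: Dict[str, List[str]] = {}
        replay(logs[name], state)       # part file first
        replay(logs[meta_name], state)  # then the shared meta file
        apps[name] = state
    return apps


apps = load_grouped_apps(files, "app-1.meta")
for app_name, app_state in apps.items():
    print(app_name, app_state["events"])
```

Because each part file is replayed on its own, the HistoryServer side only ever loads one job group plus the meta file at a time. That bounded replay cost is the motivation for splitting a long-running application's log in the first place.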
