This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Discuss how to make it easier to debug when executors die because of memory limit #247

@kimoonkim

Description


@foxish

I was running the HDFS-in-K8s experiment using Spark TeraSort jobs. It turned out the default memory size for executors, 1 GB per executor, was far too small for the workload. Executor JVMs would just get killed and restarted. I ended up specifying 6 GB per executor.
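For reference, bumping the executor memory looked roughly like the following. This is only a sketch; the master URL and the trailing options are placeholders for whatever the actual submission setup uses:

    $ spark-submit \
        --deploy-mode cluster \
        --master k8s://https://<api-server-host>:<port> \
        --conf spark.executor.memory=6g \
        ... <other job options and the application jar>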

Learning the root cause was a painful process, though, because there is no easy way to see why the JVMs get killed. The reason does not show up in $ kubectl logs for the executor pods. I caught a glimpse of it only in the Kubernetes dashboard UI, when I happened to be visiting the pod page at the right time.
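For what it's worth, the termination reason is usually recorded in the pod status even though it never appears in the logs; something like the following can pull it out, where the executor pod name is a placeholder:

    $ kubectl describe pod <executor-pod-name> | grep -A 3 'Last State'
    $ kubectl get pod <executor-pod-name> \
        -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
    OOMKilled

The catch is that you have to already suspect a memory kill and know to look there in the first place.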

I wonder if there is a better way. I often hear that Spark can use a lot of memory depending on the application, so I fear many people will have to go through this troubleshooting without much help.
