This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Discuss how to make it easier to debug when executors die because of memory limit #247

@kimoonkim

Description


@foxish

I was running the HDFS-in-K8s experiment using Spark TeraSort jobs. It turned out the default memory size for executors, 1 GB per executor, was far too small for the workload. Executor JVMs would just get killed and restarted. I ended up specifying 6 GB per executor.
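For reference, bumping the executor memory looked roughly like the following. This is only a sketch; the master URL and the trailing options are placeholders for whatever the actual submission setup uses:

    $ spark-submit \
        --deploy-mode cluster \
        --master k8s://https://<api-server-host>:<port> \
        --conf spark.executor.memory=6g \
        ... <other job options and the application jar>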

Learning the root cause was a painful process, though, because there is no easy way to see why the JVMs get killed. The reason does not show up in $ kubectl logs for the executor pods. I caught a glimpse of it only in the Kubernetes dashboard UI, when I happened to be visiting the pod page at the right time.
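For what it's worth, the termination reason is usually recorded in the pod status even though it never appears in the logs; something like the following can pull it out, where the executor pod name is a placeholder:

    $ kubectl describe pod <executor-pod-name> | grep -A 3 'Last State'
    $ kubectl get pod <executor-pod-name> \
        -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
    OOMKilled

The catch is that you have to already suspect a memory kill and know to look there in the first place.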

I wonder if there is a better way. I often hear that Spark can use a lot of memory depending on the application, so I fear many people will have to go through this troubleshooting without much help.
