Make Spark jobs on Kubernetes more resilient when they encounter pod/executor failures.

This issue covers the desired behavior of Spark jobs when executors fail during a job run.
The current behavior of a Spark application on YARN when executors fail unexpectedly is to:
 - first identify that executors have failed
 - then take corrective action on the failures, for example relaunching failed tasks on other healthy executors and launching new executors.
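
The two steps above can be sketched as a single recovery pass. This is only an illustrative simulation, not Spark's scheduler code: the `Executor` class, the `recover_from_failures` function, the `exec-N` naming, and the least-loaded placement rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    # Hypothetical stand-in for an executor pod; not a Spark class.
    executor_id: str
    alive: bool = True
    tasks: list = field(default_factory=list)


def recover_from_failures(executors, next_id):
    """Detect failed executors, launch one replacement per failure,
    and move orphaned tasks onto healthy executors."""
    # Step 1: identify which executors failed.
    failed = [e for e in executors if not e.alive]
    healthy = [e for e in executors if e.alive]

    # Step 2a: launch a new executor for each failed one.
    for _ in failed:
        healthy.append(Executor(executor_id=f"exec-{next_id}"))
        next_id += 1

    # Step 2b: relaunch each orphaned task on the least-loaded healthy executor
    # (an assumed placement rule, chosen only to keep the sketch simple).
    for dead in failed:
        for task in dead.tasks:
            target = min(healthy, key=lambda e: len(e.tasks))
            target.tasks.append(task)
        dead.tasks = []

    return healthy, next_id


executors = [
    Executor("exec-1", tasks=["t1"]),
    Executor("exec-2", alive=False, tasks=["t2", "t3"]),
]
healthy, next_id = recover_from_failures(executors, next_id=3)
```

After the call, no task is lost: the two tasks from the failed executor are spread across the surviving executor and the newly launched replacement.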

The following document describes Spark's behavior on Kubernetes and YARN when executors are killed during a job run: https://docs.google.com/document/d/1GX__jsCbeCw4RrUpHLqtpAzHwV82NQrgjz1dCCqqRes/edit?usp=sharing

This is an umbrella issue for the following issues:
 - #134
 - #136





Make Spark jobs on Kubernetes more resilient when they encounter pod/executor failures. #133
