This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Description
This issue covers the desired behavior of Spark jobs when executors fail during a job run.
The current behavior of a Spark application on YARN when executors fail unexpectedly is to:
- first, identify that executors have failed
- then, take corrective action on those failures, e.g. relaunching failed tasks on other healthy executors and launching new executors.
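As a sketch of how this tolerance is tuned on YARN, the settings below (illustrative values, not recommendations) control how many failures Spark absorbs before giving up and whether replacement executors can be requested:

```
# spark-defaults.conf (illustrative values)
# Per-task retry budget before the job is aborted
spark.task.maxFailures            4
# Maximum executor failures tolerated before the YARN application fails
spark.yarn.max.executor.failures  8
# Allow Spark to request replacement executors as needed
spark.dynamicAllocation.enabled   true
```

Kubernetes does not currently honor an equivalent of `spark.yarn.max.executor.failures`, which is part of the behavior gap this issue tracks.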
The following document describes Spark behavior on Kubernetes and YARN when executors are killed during a Spark job run: https://docs.google.com/document/d/1GX__jsCbeCw4RrUpHLqtpAzHwV82NQrgjz1dCCqqRes/edit?usp=sharing
This is an umbrella issue for:
- #134
- #136