Commit 7c35133

add the risk associated with evolving set of reasons
1 parent 2cd94f4 commit 7c35133

1 file changed

+22
-6
lines changed
  • keps/sig-apps/3329-retriable-and-non-retriable-failures

keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md

Lines changed: 22 additions & 6 deletions
@@ -409,13 +409,29 @@ This might be a good place to talk about core concepts and how they relate.
 
 ### Risks and Mitigations
 
-<!-- Increased complexity of the Job management, by interaction with some of
-already exiting Job configuration options. To mitigate this, we introduce
-minimal API which will allow to satisfy the main usage scenario.-->
+#### Risk 1
 
-The Pod status (which includes exit codes) could be lost if the failed pod is garbage collected.
-First, it should be rather rare so the overall consequence of unnecessary pod restarts
-will be limited. Second, we can prevent this by using the feature of
+The list of available options for the `reason` field will evolve, with new
+values being added and potentially some values becoming obsolete. This can make
+it difficult to maintain a valid list of reasons enumerated in the
+Job configuration.
+
+First, the users of the Job configuration will be able to mitigate it with the
+API, which will allow them to specify that any failure of a pod with a non-empty
+`status.reason` can be ignored. This will eliminate the need to list all the
+values. Second, if a user decides to list all the `reasons` to be ignored, the
+consequence of a new reason added in a new version of Kubernetes is limited:
+the failed pod will increment the counter towards the `backoffLimit`, which is
+the current behaviour.
+
+#### Risk 2
+
+The Pod status (which includes the `reason` field and the container exit codes)
+could be lost if the failed pod is garbage collected.
+
+First, this should be rather
+rare, so the overall consequence of unnecessary pod restarts will be limited.
+Second, we can prevent this by using the feature of
 [job tracking with finalizers](https://kubernetes.io/docs/concepts/overview/working-with-objects/finalizers/).
 
 <!--

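The two mitigations described under Risk 1 can be illustrated with a minimal Go sketch. The `ignoreRule` type, its fields, and `countsTowardsBackoffLimit` are hypothetical names invented for this illustration and are not the API proposed by the KEP; `SomeFutureReason` is a placeholder for a reason value introduced by a later Kubernetes version.

```go
package main

import "fmt"

// ignoreRule is a hypothetical stand-in for the Job-level configuration
// discussed under Risk 1. It is NOT the API proposed by the KEP.
type ignoreRule struct {
	// anyReason ignores every failure of a pod with a non-empty status.reason.
	anyReason bool
	// reasons is an explicit list of status.reason values to ignore.
	reasons []string
}

// countsTowardsBackoffLimit reports whether a failed pod with the given
// status.reason increments the counter towards backoffLimit.
func countsTowardsBackoffLimit(rule ignoreRule, reason string) bool {
	if reason == "" {
		// No reason recorded: nothing to match, so the failure counts.
		return true
	}
	if rule.anyReason {
		// Catch-all: newly added reasons are ignored without a config change.
		return false
	}
	for _, r := range rule.reasons {
		if r == reason {
			return false
		}
	}
	// Unlisted (for example, newly introduced) reason: the failure counts,
	// which matches the behaviour without any ignore rule.
	return true
}

func main() {
	explicit := ignoreRule{reasons: []string{"Evicted"}}
	catchAll := ignoreRule{anyReason: true}

	fmt.Println(countsTowardsBackoffLimit(explicit, "Evicted"))          // false
	fmt.Println(countsTowardsBackoffLimit(explicit, "SomeFutureReason")) // true
	fmt.Println(countsTowardsBackoffLimit(catchAll, "SomeFutureReason")) // false
	fmt.Println(countsTowardsBackoffLimit(catchAll, ""))                 // true
}
```

Under these assumptions, an explicit list degrades safely when a new reason appears: the unlisted failure is simply counted, as it would be without the feature. The catch-all rule avoids maintaining the list at all.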