
Conversation

@DaneRushtonVista

I've observed in my test cluster (where we shut down workloads overnight) that if the Operator shuts down first, Terminating pods carrying the finalizer are left hanging even after the Operator restarts the following morning.
This is because the first time the Operator sees a Pod it treats it as an Add and doesn't check whether the Pod is already in the Terminating state.
I've copied what was being done for the Modified event type so that the finalizers get run.

In short: when the Operator starts, every object arrives as an Add, including objects already in the Terminating state.
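
For illustration, here is a minimal sketch of the kind of dispatch change being described, assuming only the k8s .NET client's V1Pod and WatchEventType types; the class and method names are hypothetical, not the actual KubeOps internals:

```csharp
// Hypothetical watcher dispatch illustrating the fix described above; the
// real KubeOps watcher types and method names differ. Only the k8s .NET
// client's WatchEventType and V1Pod types are assumed.
using System.Threading.Tasks;
using k8s;
using k8s.Models;

public class PodWatcherSketch
{
    public async Task OnWatchEventAsync(WatchEventType eventType, V1Pod pod)
    {
        // A pod deleted while the operator was down still arrives as an
        // Added event on startup, but with DeletionTimestamp already set.
        if (pod.Metadata?.DeletionTimestamp != null)
        {
            // Take the same path the Modified handler already takes for
            // terminating pods: run the finalizers so deletion can finish.
            await RunFinalizersAsync(pod);
            return;
        }

        switch (eventType)
        {
            case WatchEventType.Added:
                await OnAddedAsync(pod);
                break;
            case WatchEventType.Modified:
                await OnModifiedAsync(pod);
                break;
            case WatchEventType.Deleted:
                await OnDeletedAsync(pod);
                break;
        }
    }

    // Placeholders standing in for the operator's real handlers.
    private Task RunFinalizersAsync(V1Pod pod) => Task.CompletedTask;
    private Task OnAddedAsync(V1Pod pod) => Task.CompletedTask;
    private Task OnModifiedAsync(V1Pod pod) => Task.CompletedTask;
    private Task OnDeletedAsync(V1Pod pod) => Task.CompletedTask;
}
```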
Collaborator

@buehler left a comment

Can this be properly externalized into a method instead of copy-pasting?

@buehler
Collaborator

buehler commented Oct 28, 2025

Thanks for the collaboration, @DaneRushtonVista. When you say "copied over", can it be properly engineered to reduce clutter and maintain high code quality?

@DaneRushtonVista
Author

Thanks for the feedback. I've moved the duplicated finalizer handling up to remove the duplicate code, and retained the original Add vs. Modified handling for the non-terminating case, since the two paths still differ enough to stay separate.
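
Roughly the shape of that extraction, sketched with hypothetical handler names rather than the actual KubeOps methods:

```csharp
// Hypothetical refactoring: the terminating/finalizer path shared by Added
// and Modified lives in one helper instead of being copy-pasted.
using System.Threading.Tasks;
using k8s.Models;

public class PodEventHandlerSketch
{
    private async Task<bool> TryHandleTerminatingAsync(V1Pod pod)
    {
        if (pod.Metadata?.DeletionTimestamp == null)
        {
            return false; // not terminating; Added/Modified keep their own logic
        }

        await RunFinalizersAsync(pod); // shared finalizer path
        return true;
    }

    public async Task OnAddedAsync(V1Pod pod)
    {
        if (await TryHandleTerminatingAsync(pod)) return;
        // ...original Added handling...
    }

    public async Task OnModifiedAsync(V1Pod pod)
    {
        if (await TryHandleTerminatingAsync(pod)) return;
        // ...original Modified handling...
    }

    // Placeholder for the operator's real finalizer logic.
    private Task RunFinalizersAsync(V1Pod pod) => Task.CompletedTask;
}
```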

@kimpenhaus
Contributor

kimpenhaus commented Oct 29, 2025

@buehler @DaneRushtonVista My understanding of Kubernetes API watch events is that the Added event should be skipped rather than used to run the finalizer, because the Deleted event will be delivered to the watcher afterwards. The operator fetches all events that occurred since a point in time, either defined by a resource version (if known) or without one (receiving the latest batch held by the Kubernetes API server), so the chain of Added and Deleted events will be received. Running the finalizer from the Added event therefore doesn't seem appropriate to me; the upcoming Deleted event would then be skipped because the entity can no longer be found.

@buehler
Collaborator

buehler commented Oct 29, 2025

Hey @kimpenhaus
Good catch, actually. Can you confirm this, @DaneRushtonVista? Do two events actually get fired in such a case?

@ralf-cestusio

Regarding API events, my understanding (and maybe I slightly misunderstood what kimpenhaus said) is this:
When an operator adds a watch on a resource, it gets an Added event.
From there on it gets events about changes, and when a resource gets deleted (absolutely removed), the Deleted event.

Events are not sent about things that happened in the past.

So if a resource with a finalizer gets deleted while the operator is not running, Kubernetes sets a deletion timestamp.
When the operator starts and adds a watch, it gets an Added event
in which the deletion timestamp is already set.
There will not be any other events about changes until the finalizer is removed
(making spec changes to a resource marked for deletion is not valid in Kubernetes either).
After the finalizer is removed, the resource gets deleted and the Deleted event is received.
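
A small illustration of that behaviour, assuming the kubernetes-client for .NET (signatures may differ between client versions): a watch started without a resourceVersion first reports every existing pod as Added, and a pod deleted while no watcher was running shows up there with its deletion timestamp already set.

```csharp
// Illustration only: observe what a fresh watch delivers. Uses the
// kubernetes-client for .NET ("k8s" NuGet package); signatures may differ
// between client versions.
using System;
using k8s;
using k8s.Models;

class WatchDemo
{
    static void Main()
    {
        var client = new Kubernetes(KubernetesClientConfiguration.BuildDefaultConfig());

        // No resourceVersion given: existing pods are reported as Added
        // first, including pods already marked for deletion.
        var response = client.CoreV1.ListNamespacedPodWithHttpMessagesAsync(
            "default", watch: true);

        using var watcher = response.Watch<V1Pod, V1PodList>((eventType, pod) =>
        {
            var terminating = pod.Metadata?.DeletionTimestamp != null;
            var finalizers = string.Join(",", pod.Metadata?.Finalizers ?? Array.Empty<string>());
            Console.WriteLine($"{eventType} {pod.Metadata?.Name} terminating={terminating} finalizers=[{finalizers}]");
        });

        Console.ReadLine(); // keep the watch open
    }
}
```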

@kimpenhaus
Contributor

@ralf-cestusio No native speaker here :) but reading the Kubernetes docs, I understand that there is a historical list of events (which is, admittedly, not long 😄):

A given Kubernetes server will only preserve a historical record of changes for a limited time. Clusters using etcd 3 preserve changes in the last 5 minutes by default. When the requested watch operations fail because the historical version of that resource is not available, clients must handle the case by recognizing the status code 410 Gone, clearing their local cache, performing a new get or list operation, and starting the watch from the resourceVersion that was returned.

At least within this short window it should be fine to skip the event, since we would receive the Deleted event afterwards.

But to be honest, then I don't understand the main problem: how could those CRDs, for which we receive an Added event that already contains a deletion timestamp, have a finalizer attached that points to this operator if the operator didn't add it? Is it added by convention by a third party?

Why is the assumption to run the finalizer rather than to skip?

@ralf-cestusio

@kimpenhaus Yes, I think we have the same non-native-speaker problem.

In my work I deal a lot with worst-case edge cases, so that is the problem I'm trying to solve here, not the happy path.

If we manage to fall within the short list of historical events, all is well, because we land on the happy path (but in my understanding that only really happens when resuming a watch after a short outage; a new watch always starts at the latest resource version).

The problem is that operators can be down for extended periods (they shouldn't be, but they can be), so the goal is not to leave the system in an unrecoverably broken state.
If an operator set the finalizer in the past, that doesn't mean it is around at the time someone tries to delete the resource.
Which means we have to act on the Added event with an existing deletion timestamp, since without that we will never get into the finalizer flow.
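
To make "act on the Added event" concrete, here is a sketch against the kubernetes-client for .NET: do the operator's cleanup, then remove its own finalizer so Kubernetes can finish deleting the pod. The finalizer name is a placeholder and the patch call's signature may differ slightly between client versions.

```csharp
// Sketch of acting on an Added event that already carries a deletion
// timestamp: perform the external cleanup, then remove our finalizer so
// Kubernetes can finish deleting the pod. "example.com/pod-cleanup" is a
// placeholder finalizer name.
using System.Linq;
using System.Text.Json;
using System.Threading.Tasks;
using k8s;
using k8s.Models;

public static class FinalizerCleanupSketch
{
    private const string FinalizerName = "example.com/pod-cleanup"; // placeholder

    public static async Task FinalizeAsync(IKubernetes client, V1Pod pod)
    {
        var finalizers = pod.Metadata?.Finalizers;
        if (finalizers == null || !finalizers.Contains(FinalizerName))
        {
            return; // nothing for this operator to do
        }

        // ...external cleanup for the pod would happen here...

        // Merge-patch metadata.finalizers without our entry; once the list
        // is empty Kubernetes completes the deletion.
        var remaining = finalizers.Where(f => f != FinalizerName).ToList();
        var body = JsonSerializer.Serialize(new { metadata = new { finalizers = remaining } });
        var patch = new V1Patch(body, V1Patch.PatchType.MergePatch);

        await client.CoreV1.PatchNamespacedPodAsync(
            patch, pod.Metadata.Name, pod.Metadata.NamespaceProperty);
    }
}
```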

@kimpenhaus
Contributor

@ralf-cestusio Does this actually mean that if an operator connects to the API server and starts watching events without giving a resource version to start from, it will receive an Added event for every existing entity?

That would be the only reasonable explanation, in my opinion, for receiving an Added event with a deletion timestamp for an entity you already added a long time ago.

It's a pity that the docs are not clear on that, or at least I haven't found it. 😄

If it behaves like that, my concerns were unfounded and I agree it is correct to call the finalizer. Shouldn't I be able to see these events by shutting down the operator, waiting at least 5 minutes, and starting it again?

@ralf-cestusio

ralf-cestusio commented Oct 29, 2025

@kimpenhaus I agree that this is horrendously documented (and it also depends on how the watcher is implemented). But my experience with this operator is that I do get Added events for all existing CRDs of that kind.

Maybe someone else has tested this properly. (I just know that when I start the operator in debug I get bombarded by Added events. :)

e.g. example log:
{"@timestamp":"2025-10-29T10:55:52.8731671-04:00","level":"Information","messageTemplate":"Started resource watcher for {ResourceType}.","message":"Started resource watcher for \"V1VanguardSpace\".","fields":{"ResourceType":"V1VanguardSpace","SourceContext":"KubeOps.Operator.Watcher.LeaderAwareResourceWatcher","application":"vanquish","version":"1.0.0"}}
{"@timestamp":"2025-10-29T10:55:52.9225134-04:00","level":"Information","messageTemplate":"Received watch event \"{EventType}\" for \"{Kind}/{Name}\", last observed resource version: {ResourceVersion}.","message":"Received watch event \"Added\" for \"\"VanguardSpace\"/\"vanguard-retail\"\", last observed resource version: \"751693\".","fields":{"EventType":"Added","Kind":"VanguardSpace","Name":"vanguard-retail","ResourceVersion":"751693","SourceContext":"KubeOps.Operator.Watcher.LeaderAwareResourceWatcher","Namespace":"prod","Uid":"7f757dc8-b07b-4499-bce4-ee1decfac88a","application":"vanquish","version":"1.0.0"}}

@DaneRushtonVista
Author

The scenario I have, which resulted in me writing this PR, is:

  1. I created a K8s operator to watch Pods, because I want to perform cleanup actions in another system per pod.
  2. The finalizer is added to selected Pods via a webhook, but even if the PodOperator itself added the finalizer, the result would be the same.
  3. If the PodOperator isn't running when the Pods it was watching begin termination, those Pods get stuck because the finalizer isn't cleared, even after the PodOperator comes back. Yes, the PodOperator runs HA, but in a testing environment it's valid to shut things down out of hours to save costs.
  4. This causes the Terminating pods to hang around (a sketch for spotting such pods follows this list). The containers have stopped and the Nodes have scaled down, but Kubernetes still considers the Pods to exist because the finalizer hasn't been removed yet.
  5. When the PodOperator starts back up, it receives an Add event for Pods that are already in the Terminating state. No Modified events occur because the Pods aren't being modified. The current version of this SDK ignores that these Pods are awaiting finalization.
  6. This PR allows the finalizer to run and the already-Terminating pods to be fully deleted.
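
As an illustration of point 4 (not part of the PR), here is a sketch that reports pods stuck in Terminating with the finalizer still attached, using the kubernetes-client for .NET; the finalizer name is a placeholder.

```csharp
// Illustration only: list pods that are stuck in Terminating because a
// finalizer is still attached. "example.com/pod-cleanup" is a placeholder.
using System;
using System.Linq;
using System.Threading.Tasks;
using k8s;
using k8s.Models;

class StuckPodReport
{
    const string FinalizerName = "example.com/pod-cleanup"; // placeholder

    static async Task Main()
    {
        var client = new Kubernetes(KubernetesClientConfiguration.BuildDefaultConfig());
        var pods = await client.CoreV1.ListNamespacedPodAsync("default");

        var stuck = pods.Items.Where(p =>
            p.Metadata?.DeletionTimestamp != null &&
            (p.Metadata.Finalizers?.Contains(FinalizerName) ?? false));

        foreach (var pod in stuck)
        {
            Console.WriteLine(
                $"{pod.Metadata.Name} has been terminating since {pod.Metadata.DeletionTimestamp}");
        }
    }
}
```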
