Skip to content

Conversation

@exextatic
Copy link
Contributor

@exextatic exextatic commented Jun 26, 2024

Fixes #677.

This pull request fixes the following behavioral issues noted when testing a leader-aware operator with transient network issues:

  • In LeaderElectionBackgroundService, if elector.RunUntilLeadershipLostAsync() throws, the exception is not observed in the library and no further attempts to become the leader occur. The library now logs any unexpected exceptions and tries to become the leader again.
  • A leader could not stop and then subsequently start being a leader once more due to cancellation token sources not being recreated. The library now disposes and recreates the cancellation token sources as required.
  • LeaderAwareResourceWatcher<TEntity>.StoppedLeading would erroneously pass a cancelled cancellation token to ResourceWatcher<TEntity>. The library now passes the IHostApplicationLifetime.ApplicationStopped token to the ResourceWatcher<TEntity> - we can assume that ApplicationStopped is a good indication that the stop should no longer be graceful.

Due to the nature of these issues I've only been able to add test coverage in the form of unit tests.
I've tested these changes in practice using clumsy and an AKS cluster.

@buehler
Copy link
Collaborator

buehler commented Jun 27, 2024

Thank you @exextatic for your contribution!

It's fine to have unit tests, testing network loss is pretty hard to setup, as such I trust your message that you tested it ;-)

The linter is complaining about some white spaces though. Can you run dotnet format and push the commit again? I'd love to merge these changes.

@exextatic
Copy link
Contributor Author

Thank you @exextatic for your contribution!

It's fine to have unit tests, testing network loss is pretty hard to setup, as such I trust your message that you tested it ;-)

The linter is complaining about some white spaces though. Can you run dotnet format and push the commit again? I'd love to merge these changes.

Thank you for looking at the changes @buehler.
I've amended the whitespace formatting issue in my latest commit.

@buehler buehler enabled auto-merge (squash) June 28, 2024 08:22
@buehler buehler merged commit e0523ca into dotnet:main Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: LeaderAwareResourceWatcher does not regain leadership after network issues

2 participants