Skip to content

[v3.30] BPF: Fix that stale-NAT cleanup didn't wait for kube-proxy sync #10724

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

fasaxc
Copy link
Member

@fasaxc fasaxc commented Jul 24, 2025

Cherry-pick history (this PR at top)

Description

Conntrack cleanup could fire before kube-proxy had loaded the services, resulting in every NAT entry looking stale.

Related issues/PRs

Should fix #10670

Todos

  • Tests
  • Documentation
  • Release note

Release Note

eBPF: Fix race between loading kubernetes services and conntrack cleanup.  If conntrack cleanup ran before services were loaded, all service entries would look stale and get cleaned up.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

Conntrack cleanup could fire before kube-proxy had loaded the services,
resulting in _every_ NAT entry looking stale.
@Copilot Copilot AI review requested due to automatic review settings July 24, 2025 09:51
@fasaxc fasaxc added release-note-required Change has user-facing impact (no matter how small) cherry-pick-candidate docs-not-required Docs not required for this change labels Jul 24, 2025
@fasaxc fasaxc requested a review from a team as a code owner July 24, 2025 09:51
@fasaxc fasaxc added release-note-required Change has user-facing impact (no matter how small) cherry-pick-candidate docs-not-required Docs not required for this change labels Jul 24, 2025
@marvin-tigera marvin-tigera added this to the Calico v3.30.3 milestone Jul 24, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a race condition between BPF conntrack cleanup and kube-proxy service synchronization. The issue occurred when conntrack cleanup ran before kube-proxy had loaded the Kubernetes services, causing all NAT entries to appear stale and get incorrectly removed.

  • Adds a HasSynced() method to the DPSyncer interface to track synchronization status
  • Updates conntrack methods to check sync status before delegating to the syncer
  • Adds comprehensive tests to verify the race condition fix

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
felix/bpf/proxy/proxy.go Adds HasSynced() method to DPSyncer interface
felix/bpf/proxy/syncer.go Implements HasSynced() method returning internal sync state
felix/bpf/proxy/kube-proxy.go Updates conntrack methods to check HasSynced() before delegating
felix/bpf/proxy/proxy_test.go Updates mock syncer to implement new interface method
felix/bpf/proxy/kube-proxy_internal_test.go Adds tests for the sync status checking logic
Comments suppressed due to low confidence (2)

felix/bpf/proxy/kube-proxy_internal_test.go:34

  • The test comment and assertion don't match. The test expects true when not synced, but the error message says it 'should return true'. Consider clarifying whether this is the intended behavior or if there's a mismatch in the test logic.
		t.Errorf("ConntrackFrontendHasBackend should return true when syncer has not synced")

felix/bpf/proxy/kube-proxy_internal_test.go:38

  • The test expects false when synced, but the assertion logic appears inverted. The test calls kp.ConntrackFrontendHasBackend() and expects it to return false, but then reports an error saying it 'should return false', which suggests the condition check may be wrong.
		t.Errorf("ConntrackFrontendHasBackend should return false when syncer has synced")

@sridhartigera sridhartigera merged commit ab3d108 into projectcalico:release-v3.30 Jul 24, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-candidate docs-not-required Docs not required for this change release-note-required Change has user-facing impact (no matter how small)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants