# KEP-3015: PreferSameNode Traffic Distribution

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [DNS](#dns)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
      - [Prerequisite testing updates](#prerequisite-testing-updates)
      - [Unit tests](#unit-tests)
      - [Integration tests](#integration-tests)
      - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

<!--
**ACTION REQUIRED:** In order to merge code into a release, there must be an
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core
Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These
checklist items _must_ be updated for the enhancement to be released.
-->

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP extends KEP-4444 `TrafficDistribution` with a new value,
`PreferSameNode`, indicating traffic for a service should
preferentially be routed to endpoints on the same node as the client.

(This is the third attempt at this feature, which was previously
suggested as [`internalTrafficPolicy: PreferLocal`] and [Node-level
topology].)

[`internalTrafficPolicy: PreferLocal`]: https://github.com/kubernetes/enhancements/pull/3016
[Node-level topology]: https://github.com/kubernetes/enhancements/pull/3293

## Motivation

### Goals

- Allow configuring a service so that connections will be delivered to
  a local endpoint when possible, and a remote endpoint if not.

### Non-Goals

N/A

## Proposal

### User Stories

#### DNS

As a cluster administrator, I plan to run a DNS pod on each node, and
would like DNS requests from other pods to always go to the local DNS
pod, for efficiency. However, if no local DNS pod is available, DNS
should just go to a remote pod instead so it keeps working. There
should never be enough DNS traffic to overload any one endpoint, so
it's safe to use a TrafficDistribution mode that doesn't worry about
endpoint overload.

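For illustration, a Service for this use case might look like the
following (the names and selector are hypothetical; `PreferSameNode`
is the value proposed by this KEP):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  selector:
    k8s-app: node-local-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  trafficDistribution: PreferSameNode
```
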
### Risks and Mitigations

This is similar to the existing `PreferClose` mode (possibly to be
renamed `PreferSameZone`) and has the same sorts of risks.
We only use the new traffic distribution mode if the user explicitly
requests it, and in that case, the user is responsible for ensuring
that clients and servers are distributed in a way such that the
traffic distribution mode makes sense.

## Design Details

We will add a new field to `discoveryv1.EndpointHints`:

```golang
// EndpointHints provides hints describing how an endpoint should be consumed.
type EndpointHints struct {
	...

	// forNodes indicates the node(s) this endpoint should be targeted by.
	// +listType=atomic
	ForNodes []string `json:"forNodes,omitempty" protobuf:"bytes,2,name=forNodes"`
}
```

When updating EndpointSlices, if the EndpointSlice controller sees a
service with `PreferSameNode` traffic distribution, then for each
endpoint in the slice, it will add a `ForNodes` hint including the
name of the endpoint's node. (The field is an array for future
extensibility, but initially it will always have either 0 or 1
elements.) In addition, it will set the `ForZones` hint as it would
with `TrafficDistribution: PreferClose`, to allow older service
proxies to fall back to at least same-zone behavior.

When kube-proxy sees an Endpoint with the `ForNodes` hint set, it will
use that endpoint if the hint includes its own node name, and ignore
it otherwise, similarly to the `ForZones` hint.

### Test Plan

[X] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

N/A

##### Unit tests

Tests of validation, endpointslice-controller, and kube-proxy will be
updated.

<!--
Additionally, for Alpha try to enumerate the core package you will be touching
to implement this enhancement and provide the current unit coverage for those
in the form of:
- <package>: <date> - <current test coverage>
The data can be easily read from:
https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit

This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

- `<package>`: `<date>` - `<test coverage>`

##### Integration tests

N/A

##### e2e tests

E2E tests will be added, similar to the existing traffic distribution
tests, to cover the new option.

- <test>: <link to test coverage>

### Graduation Criteria

#### Alpha

- Feature implemented behind a feature flag.

- Unit tests for API enablement and endpoint selection.

#### Beta

- E2E tests completed and enabled.

- Enough time has passed since Alpha to avoid version skew issues.

#### GA

- Time passes, no major objections.

### Upgrade / Downgrade Strategy

No real issues, other than dealing with skew.

### Version Skew Strategy

In skewed clusters, it may not be possible for kube-controller-manager
to set the new EndpointSlice hint, or else kube-proxy may not be able
to see the hint. In this case, the service will fall back to
prefer-same-zone semantics rather than prefer-same-node. Users can
avoid problems with this by not using the feature until their cluster
is fully upgraded to a version that supports the feature.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [X] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: PreferSameNodeTrafficDistribution
  - Components depending on the feature gate:
    - kube-apiserver
    - kube-controller-manager
    - kube-proxy

###### Does enabling the feature change any default behavior?

No.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes.

###### What happens if we reenable the feature if it was previously rolled back?

It starts working again.

###### Are there any tests for feature enablement/disablement?

No.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

An initial rollout cannot fail and won't impact already-running
workloads, because at the time of the initial rollout, there cannot
already be any `TrafficDistribution: PreferSameNode` services.

A rollback has reasonable fallback behavior (as with downgrades), and
a re-rollout just updates the behavior of existing `PreferSameNode`
services in the expected way.

###### What specific metrics should inform a rollback?

There are no metrics that would inform anyone that the feature was
failing, but since the feature is opt-in, individual users can simply
stop using the feature if it is not working for them.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

No.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

By checking if any Service has `TrafficDistribution: PreferSameNode`.

###### How can someone using this feature know that it is working for their instance?

As with other topology features, there is no easy way for an end user
to reliably confirm that it is working correctly other than by
sniffing the network traffic, or else looking at the logs of each
endpoint to confirm that they are receiving the expected connections
and not receiving unexpected connections.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

The implementation of the feature itself has no SLOs. The effect it
has on the performance of end user workloads that use the feature
depends on those workloads.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

The implementation of the feature itself has no SLIs, other than the
generic kube-proxy metrics. User workloads that use the feature may
expose SLI information that the user can examine to determine how well
the feature is working for their workload.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

Not really; we don't know how fast the user's services are supposed to
be, so we can't really tell if we are improving them as much as they
hoped or not.

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

It depends on a service proxy that recognizes the new traffic
distribution value. We will update `kube-proxy` ourselves, but network
plugins / Kubernetes distributions that ship their own alternative
service proxies will also need to be updated to support the new value
before their users can make use of it. (Until then,
`TrafficDistribution: PreferSameNode` would be implemented as
`TrafficDistribution: PreferClose`.)

### Scalability

###### Will enabling / using this feature result in any new API calls?

No.

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

No (other than that it means people may set `TrafficDistribution` on
Services where they were not previously setting it).

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

No change from existing service/proxy behavior.

###### What are other known failure modes?

None known.

###### What steps should be taken if SLOs are not being met to determine the problem?

N/A

## Implementation History

- Initial proposal as `InternalTrafficPolicy: PreferLocal`: 2021-10-21
- Initial proposal as "Node-level topology": 2022-01-15
- Initial proposal as `TrafficDistribution: PreferSameNode`: 2025-02-06

## Drawbacks

## Alternatives

As noted, this is the third attempt at this feature.

The initial proposal ([#3016]) was for `internalTrafficPolicy:
PreferLocal`, but we decided that traffic policy was for
semantically-significant changes to how traffic was distributed,
whereas this is just a hint, like topology.

That led to the second attempt ([#3293]), which never got as far as
defining a specific API, but reframed the problem as being a kind of
topology hint. This eventually fizzled out because of people's
opinions at that time about how topology ought to work in Kubernetes.

However, KEP-4444 (TrafficDistribution) represents an updated
understanding of topology in Kubernetes, which makes the idea of
node-level topology palatable.

[#3016]: https://github.com/kubernetes/enhancements/pull/3016
[#3293]: https://github.com/kubernetes/enhancements/pull/3293