# KEP-3015: PreferSameNode Traffic Distribution

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [DNS](#dns)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
      - [Prerequisite testing updates](#prerequisite-testing-updates)
      - [Unit tests](#unit-tests)
      - [Integration tests](#integration-tests)
      - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

<!--
**ACTION REQUIRED:** In order to merge code into a release, there must be an
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core
Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These
checklist items _must_ be updated for the enhancement to be released.
-->

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP extends KEP-4444 `TrafficDistribution` with a new value,
`PreferSameNode`, indicating traffic for a service should
preferentially be routed to endpoints on the same node as the client.

(This is the third attempt at this feature, which was previously
suggested as [`internalTrafficPolicy: PreferLocal`] and [Node-level
topology].)

[`internalTrafficPolicy: PreferLocal`]: https://github.com/kubernetes/enhancements/pull/3016
[Node-level topology]: https://github.com/kubernetes/enhancements/pull/3293

## Motivation

### Goals

- Allow configuring a service so that connections will be delivered to
  a local endpoint when possible, and a remote endpoint if not.

### Non-Goals

N/A

## Proposal

### User Stories

#### DNS

As a cluster administrator, I plan to run a DNS pod on each node, and
would like DNS requests from other pods to always go to the local DNS
pod, for efficiency. However, if no local DNS pod is available, DNS
should just go to a remote pod instead so it keeps working. There
should never be enough DNS traffic to overload any one endpoint, so
it's safe to use a TrafficDistribution mode that doesn't worry about
endpoint overload.

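For illustration, a Service for this use case might look like the
following (the names and selector are hypothetical; `PreferSameNode`
is the value proposed by this KEP):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  selector:
    k8s-app: node-local-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
  trafficDistribution: PreferSameNode
```
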
### Risks and Mitigations

This is similar to the existing `PreferClose` mode (possibly to be
renamed `PreferSameZone`) and has the same sorts of risks.
We only use the new traffic distribution mode if the user explicitly
requests it, and in that case, the user is responsible for ensuring
that clients and servers are distributed in a way such that the
traffic distribution mode makes sense.

## Design Details

We will add a new field to `discoveryv1.EndpointHints`:

```golang
// EndpointHints provides hints describing how an endpoint should be consumed.
type EndpointHints struct {
	...

	// forNodes indicates the node(s) this endpoint should be targeted by.
	// +listType=atomic
	ForNodes []string `json:"forNodes,omitempty" protobuf:"bytes,2,name=forNodes"`
}
```

When updating EndpointSlices, if the EndpointSlice controller sees a
service with `PreferSameNode` traffic distribution, then for each
endpoint in the slice, it will add a `ForNodes` hint including the
name of the endpoint's node. (The field is an array for future
extensibility, but initially it will always have either 0 or 1
elements.) In addition, it will set the `ForZones` hint as it would
with `TrafficDistribution: PreferClose`, to allow older service
proxies to fall back to at least same-zone behavior.

When kube-proxy sees an Endpoint with the `ForNodes` hint set, it will
use that endpoint if the hint includes its own node name, and ignore
it otherwise, similarly to the `ForZones` hint.

### Test Plan

[X] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

N/A

##### Unit tests

Tests of validation, endpointslice-controller, and kube-proxy will be
updated.

<!--
Additionally, for Alpha try to enumerate the core package you will be touching
to implement this enhancement and provide the current unit coverage for those
in the form of:
- <package>: <date> - <current test coverage>
The data can be easily read from:
https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit

This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

- `<package>`: `<date>` - `<test coverage>`

##### Integration tests

N/A

##### e2e tests

E2E tests will be added, similar to the existing traffic distribution
tests, to cover the new option.

- <test>: <link to test coverage>

### Graduation Criteria

#### Alpha

- Feature implemented behind a feature flag.

- Unit tests for API enablement and endpoint selection.

#### Beta

- E2E tests completed and enabled.

- Enough time has passed since Alpha to avoid version skew issues.

#### GA

- Time passes, no major objections.

### Upgrade / Downgrade Strategy

No real issues, other than dealing with skew.

### Version Skew Strategy

In skewed clusters, it may not be possible for kube-controller-manager
to set the new EndpointSlice hint, or else kube-proxy may not be able
to see the hint. In this case, the service will fall back to
prefer-same-zone semantics rather than prefer-same-node. Users can
avoid problems with this by not using the feature until their cluster
is fully upgraded to a version that supports the feature.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [X] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: PreferSameNodeTrafficDistribution
  - Components depending on the feature gate:
    - kube-apiserver
    - kube-controller-manager
    - kube-proxy

###### Does enabling the feature change any default behavior?

No.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes.

###### What happens if we reenable the feature if it was previously rolled back?

It starts working again.

###### Are there any tests for feature enablement/disablement?

No.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

An initial rollout cannot fail and won't impact already-running
workloads, because at the time of the initial rollout, there cannot
already be any `TrafficDistribution: PreferSameNode` services.

A rollback has reasonable fallback behavior (as with downgrades), and
a re-rollout just updates the behavior of existing `PreferSameNode`
services in the expected way.

###### What specific metrics should inform a rollback?

There are no metrics that would inform anyone that the feature was
failing, but since the feature is opt-in, individual users can simply
stop using the feature if it is not working for them.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

No.

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No.

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

By checking if any Service has `TrafficDistribution: PreferSameNode`.

###### How can someone using this feature know that it is working for their instance?

As with other topology features, there is no easy way for an end user
to reliably confirm that it is working correctly other than by
sniffing the network traffic, or else looking at the logs of each
endpoint to confirm that they are receiving the expected connections
and not receiving unexpected connections.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

The implementation of the feature itself has no SLOs. The effect it
has on the performance of end user workloads that use the feature
depends on those workloads.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

The implementation of the feature itself has no SLIs, other than the
generic kube-proxy metrics. User workloads that use the feature may
expose SLI information that the user can examine to determine how well
the feature is working for their workload.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

Not really; we don't know how fast the user's services are supposed to
be, so we can't really tell if we are improving them as much as they
hoped or not.

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

It depends on a service proxy that recognizes the new traffic
distribution value. We will update `kube-proxy` ourselves, but network
plugins / Kubernetes distributions that ship their own alternative
service proxies will also need to be updated to support the new value
before their users can make use of it. (Until then,
`TrafficDistribution: PreferSameNode` would be implemented as
`TrafficDistribution: PreferClose`.)

### Scalability

###### Will enabling / using this feature result in any new API calls?

No.

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

No (other than that it means people may set `TrafficDistribution` on
Services where they were not previously setting it).

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

No change from existing service/proxy behavior.

###### What are other known failure modes?

None known.

###### What steps should be taken if SLOs are not being met to determine the problem?

N/A

## Implementation History

- Initial proposal as `InternalTrafficPolicy: PreferLocal`: 2021-10-21
- Initial proposal as "Node-level topology": 2022-01-15
- Initial proposal as `TrafficDistribution: PreferSameNode`: 2025-02-06

## Drawbacks

## Alternatives

As noted, this is the third attempt at this feature.

The initial proposal ([#3016]) was for `internalTrafficPolicy:
PreferLocal`, but we decided that traffic policy was for
semantically-significant changes to how traffic was distributed,
whereas this is just a hint, like topology.

That led to the second attempt ([#3293]), which never got as far as
defining a specific API, but reframed the problem as being a kind of
topology hint. This eventually fizzled out because of people's
opinions at that time about how topology ought to work in Kubernetes.

However, KEP-4444 (TrafficDistribution) represents an updated
understanding of topology in Kubernetes, which makes the idea of
node-level topology palatable.

[#3016]: https://github.com/kubernetes/enhancements/pull/3016
[#3293]: https://github.com/kubernetes/enhancements/pull/3293