Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-network/2433.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,5 @@ alpha:
approver: "@wojtek-t"
beta:
approver: "@wojtek-t"
stable:
approver: "@wojtek-t"
138 changes: 114 additions & 24 deletions keps/sig-network/2433-topology-aware-hints/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,10 +23,17 @@
- [Handling Node Updates](#handling-node-updates)
- [Future Expansion](#future-expansion)
- [Test Plan](#test-plan)
- [Controller Unit Tests](#controller-unit-tests)
- [Kube-Proxy Unit Tests](#kube-proxy-unit-tests)
- [e2e Tests](#e2e-tests)
- [Unit tests](#unit-tests)
- [Controller Unit Tests](#controller-unit-tests)
- [Kube-Proxy Unit Tests](#kube-proxy-unit-tests)
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Observability](#observability)
- [Events](#events)
- [Logic](#logic)
- [Sample Events](#sample-events)
- [Documentation](#documentation)
- [Limitations](#limitations)
- [Graduation Criteria](#graduation-criteria)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
Expand Down Expand Up @@ -368,12 +375,18 @@ In the future we may expand this functionality if needed. This could include:

- A new `RequireZone` algorithm that would keep endpoints in EndpointSlices for
the same zone they are in.
- A new option to specify a minimum threshold for the `PreferZone` approach.
- A new option to specify a minimum threshold for the `Auto` (PreferZone)
approach.
- Support for region based hints.

### Test Plan

#### Controller Unit Tests
#### Unit tests

- `k8s.io/pkg/controller/endpointslice`: `2022-10-05` - `73.1`
- `k8s.io/pkg/controller/endpointslice/topologycache`: `2022-10-05` - `75.4`

##### Controller Unit Tests
| Test Description | Expected Result |
| :--- | :--- |
| Feature On, 2+ zones | Hints set |
Expand All @@ -393,31 +406,29 @@ In the future we may expand this functionality if needed. This could include:
| Endpoint additions that require redistribution | Hints updated |
| Endpoint removals that require redistribution | Hints updated |

#### Kube-Proxy Unit Tests
##### Kube-Proxy Unit Tests
| Test Description | Expected Result |
| :--- | :--- |
| Feature On, hints matching zone | Endpoints filtered |
| Feature On, ExternalTrafficPolicy == 'Local', hints matching zone | Endpoints not filtered |
| Feature Off, hints matching zone | Endpoints not filtered |
| Feature On, no hints matching zone | Endpoints not filtered |

### e2e Tests
This represents the largest and most uncertain part of the testing effort. We
need to find a way to run e2e tests on multizone clusters. To limit flakiness,
those clusters likely need to have a consistent distribution of nodes across
zones. This will enable us to write predictable tests for topology aware
routing.
#### Integration tests

N/A

At a minimum, we likely want the following test:
#### e2e tests

- 3 zone cluster, with 1 equivalent node per zone
- Deploy a single pod to each node with a daemonset
- Create a Service that targets that daemonset
- Make requests from each zone and ensure that the request is routed to a pod in
the same zone
This feature has e2e test coverage with the ["Topology Hints"
test](https://github.com/kubernetes/kubernetes/blob/fbb6ccc0c62d2431dc3a18a4130d7fbae5c05acd/test/e2e/network/topology_hints.go#L43).
This is currently limited to a periodic run due to the nature of requiring a
multizone cluster to run. It has been [remarkably
stable](https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20multizone)
with 100% green runs.

We'll likely need more tests to properly vet this feature, but this one should
be straightforward to write and unlikely to be flaky.
As a prerequisite for GA, we will ensure that this test runs as a presubmit
if any code changes in kube-proxy or the EndpointSlice controller.

### Observability
We can reuse some of the metrics of EndpointSlice Controller that we already
Expand Down Expand Up @@ -454,13 +465,74 @@ EndpointSliceSyncs = metrics.NewCounterVec(

```

### Events
A common point of frustration among initial users of this feature was how
difficult it was to tell if this feature was enabled and working as intended.
Due to the nature of this design, even when a user opts in to the `Auto` mode
that is no guarantee that the controller logic will determine that there are
a sufficient number of endpoints to allocate them proportionally to each zone
in the cluster.

To make this feature easier to understand and use, the EndpointSlice controller
will publish events for a Service to describe if the feature has been enabled,
and if not, why not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a ServiceStatus field? (The samples below really sound like a Condition.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today the endpoint-slice controller doesn't have RBAC access to Service, and ServiceStatus doesn't have a Conditions field. I am not against either, per se, but it's worth considering. It DOES feel like a condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this feels like a condition. I wanted to add that, but I also didn't want this to be dependent on a change to Service API.


#### Logic

The EndpointSlice controller will track the known state of this feature for
each Service. When that state or the reason for it changes, the EndpointSlice
controller will publish a new Event to reflect the updated status of this
feature.

#### Sample Events

| Type | Reason | Message |
|-|-|-|
| Normal | TopologyAwareRoutingEnabled | Topology Aware Routing has been enabled |
| Normal | TopologyAwareRoutingDisabled | Topology Aware Routing configuration was removed |
| Warning | TopologyAwareRoutingDisabled | Insufficient number of Endpoints (n), impossible to safely allocate proportionally |
| Warning | TopologyAwareRoutingDisabled | 1 or more Endpoints do not have a Zone specified |
| Warning | TopologyAwareRoutingDisabled | 1 or more Nodes do not have allocatable CPU specified |
| Warning | TopologyAwareRoutingDisabled | Nodes only ready in 1 zone |

#### Documentation

The Topology Aware Hints documentation will be updated to describe the reason
each of these events may have been triggered, along with steps that can be taken
to recover from that state.

#### Limitations

Although the events described above should dramatically simplify the use of this
feature, there is a tiny edge case that will not be covered. If any
EndpointSlices for a Service do not include Hints, Kube-Proxy will not implement
this feature. This would happen if a user created custom EndpointSlices _and_
enabled Topology Aware Hints _and_ failed to set Hints on their custom
EndpointSlices. This seems very unlikely, but is mentioned here for the sake of
completeness.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EndpointSlice controller sees events even for EndpointSlices that it doesn't manage. It could check for this situation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that we could, but I'm worried about loosening our filtering here: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/endpointslice/endpointslice_controller.go#L357-L361. It seems like a place that we could easily break things if we were not careful. I'm not sure that this particular edge case justifies that risk.


### Graduation Criteria
**Alpha:**
- Basic functionality covered with unit tests described above.

**Beta:**
- Tests expanded to include e2e coverage described above.

**GA:**
- Feedback from real world usage shows that feature is working as intended
- Events are triggered on each Service to provide users with clear information
on when the feature transitioned between enabled and disabled states.
- Test coverage in EndpointSlice strategy to ensure that the Hints field is
dropped when the feature gate is not enabled.
- Test coverage in EndpointSlice controller for the transition from enabled to
disabled.
- Ensure that existing Topology Hints e2e test runs as a presubmit if any code
changes in kube-proxy or the EndpointSlice controller.
- Topology Hints e2e tests will graduate to conformance tests.
- Autoscaling and Scheduling SIGs have a plan to provide zone aware autoscaling
(and scheduling) that allows users to proportionally distribute endpoints
across zones.
Comment on lines +532 to +534
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you're claiming we're going to make GA in 1.26, that would imply that these plans already exist, in which case can you link to them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this process works. I'd thought that graduation criteria had to be met prior to making the change in upstream k8s OSS, not before the KEP updates including the graduation criteria merged. I will say that it will be quite a stretch to get this to GA in 1.26, I just want to make sure we're making real progress towards these goals in the 1.26 release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, true, the criteria don't have to be met yet, but it seems a little sketchy to claim we're planning to make GA for 1.26 when that depends on other teams doing work which isn't started yet...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danwinship I'm not tied to this getting to GA in 1.26, that does seem fairly unlikely at our current pace. Would you feel better if we set the GA target date as 1.27 in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if that needs to block this - Dan, you want to make the call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we don't claim 1.26 now then we're blocked from doing it later? So if there's any chance it might happen then I guess we want to say that...


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting here, since can't comment on unchanged line:

L563: Two things that are missing to me in tests are:

(a) the strategy-level test (like this: https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282 ) that is ensuring that dropping fields is working as expected

(b) Disablement test in endpoint-slice controller, sth like:

  • EPS with hints exist, EPS-controller is started with FG disabled, ensure that all EPS are rewritten without hints now

L581: Please update (I hope that some manual upgrade test were run, so please briefly describe it in the text - what setup and what you did)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wojtek-t! Addressed both of these with latest push (and added to graduation criteria).

### Version Skew Strategy
This KEP requires updates to both the EndpointSlice Controller and kube-proxy.
Thus there could be two potential version skew scenarios:
Expand Down Expand Up @@ -499,8 +571,13 @@ enabled even if the annotation has been set on the Service.
EndpointSlices for Services that have this feature enabled.

* **Are there any tests for feature enablement/disablement?**
This feature is not yet implemented but per-Service enablement/disablement is
covered in depth as part of the test plan.
Per Service enablement and disablement is covered in depth by unit tests. As a
prerequisite for graduation to GA, we will also add the following:

- Test coverage in EndpointSlice strategy to ensure that the Hints field is
dropped when the feature gate is not enabled.
- Test coverage in EndpointSlice controller for the transition from enabled to
disabled.

### Rollout, Upgrade and Rollback Planning

Expand All @@ -517,8 +594,10 @@ enabled even if the annotation has been set on the Service.
with before the feature was enabled.

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
This feature is not yet implemented but per-Service enablement/disablement is
covered in depth as part of the test plan.
Per-Service enablement/disablement is covered in depth and feature gate
enablement and disablement will be covered before the feature graduates to GA.
In addition, manual testing covering combinations of
upgrade->downgrade->upgrade cycles will be completed prior to GA graduation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too late now, but we should have done that before Beta (which enabled that by default).
Anyway - let's that that testing before graduating to GA and update the KEP with the results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will do. I'm pretty sure I did this kind of manual testing, but it was more than a year ago and I clearly failed to record any results in the GEP.


* **Is the rollout accompanied by any deprecations and/or removals of features,
APIs, fields of API types, flags, etc.?**
Expand Down Expand Up @@ -567,6 +646,10 @@ enabled even if the annotation has been set on the Service.
additional calls to the EndpointSlice API, but expect the increase to be
minimal.

The EndpointSlice controller will begin publishing Events for each Service
that has opted in to this feature when this transitions between enablement
states.

* **Will enabling / using this feature result in introducing new API types?**
No.

Expand Down Expand Up @@ -612,6 +695,13 @@ enabled even if the annotation has been set on the Service.

- KEP Merged: February 2021
- Alpha release: Kubernetes 1.21
- Beta Release: Kubernetes 1.23[^1]
- Feature Gate on-by default, feature available by default: 1.24
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally beta implies "feature gate on by default"... did we just declare beta in 1.23 but leave it disabled? Was that an accident or by design? Might be worth explaining just for clarity...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely an accident, and realized a few days too late :(. Will add a note.


[^1]: This was intended to also flip the feature gate to enabled by default, but
unfortunately that part was missed in 1.23. See
[#108747](https://github.com/kubernetes/kubernetes/pull/108747) for more
information.

## Drawbacks
1. Increased complexity in EndpointSlice controller
Expand Down
6 changes: 3 additions & 3 deletions keps/sig-network/2433-topology-aware-hints/kep.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,18 +23,18 @@ replaces:
- "github.com/kubernetes/enhancements/tree/master/keps/sig-network/536-topology-aware-routing"

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
stage: stable

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.23"
latest-milestone: "v1.26"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.21"
beta: "v1.23"
stable: "v1.25"
stable: "v1.26"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
Expand Down