- Components depending on the feature gate: kube-apiserver and kube-controller-manager
###### Does enabling the feature change any default behavior?
Yes. Terminating endpoints are now included in the EndpointSlice API. The `ready` condition of a terminating endpoint is always `false`, so consumers do not send traffic to terminating endpoints unless they explicitly check the new `serving` and `terminating` conditions.
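
The following is a minimal sketch of how a consumer might interpret the three conditions. `EndpointConditions` is a simplified stand-in with plain booleans (an assumption; it is not the real API type), and `usableEndpoint` and its `allowTerminating` parameter are hypothetical names used only for illustration.

```go
package main

import "fmt"

// EndpointConditions is a simplified stand-in for the endpoint conditions
// described above. Assumption: plain booleans, not the real API type.
type EndpointConditions struct {
	Ready       bool
	Serving     bool
	Terminating bool
}

// usableEndpoint is a hypothetical consumer-side check. A consumer that only
// looks at Ready never selects a terminating endpoint, because Ready is false
// for it. A consumer that opts in (allowTerminating) may additionally select
// endpoints that are terminating but still serving.
func usableEndpoint(c EndpointConditions, allowTerminating bool) bool {
	if c.Ready {
		return true
	}
	return allowTerminating && c.Terminating && c.Serving
}

func main() {
	// A terminating pod that still passes its readiness check.
	ep := EndpointConditions{Ready: false, Serving: true, Terminating: true}

	fmt.Println("ready-only consumer uses endpoint:", usableEndpoint(ep, false))
	fmt.Println("opt-in consumer uses endpoint:", usableEndpoint(ep, true))
}
```

A consumer that ignores `ready` entirely falls outside this sketch; that is the failure mode called out in the rollout and troubleshooting answers.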
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
Yes. On rollback, terminating endpoints will no longer be included in EndpointSlice and the `terminating` and `serving` conditions will not be set.
###### What happens if we reenable the feature if it was previously rolled back?
EndpointSlice will continue to have the `terminating` and `serving` conditions set, and terminating endpoints will be added to the EndpointSlice on its next sync.
###### Are there any tests for feature enablement/disablement?
Yes, there will be strategy API unit tests validating whether the new API fields are allowed based on the feature gate.
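
As a rough illustration of what such a test might exercise, the sketch below assumes the common pattern of clearing gated fields on write unless the stored object already uses them. That pattern is an assumption here, not something this KEP states. `Conditions` and `dropDisabledConditions` are hypothetical stand-ins, not the real strategy code.

```go
package main

import "fmt"

// Conditions is a simplified stand-in for endpoint conditions.
// A nil pointer means the condition is not set.
type Conditions struct {
	Ready       *bool
	Serving     *bool
	Terminating *bool
}

// dropDisabledConditions is a hypothetical helper. With the gate disabled it
// clears the gated fields on the incoming object, unless the existing object
// (oldC) already has one of them set. This in-use exception is an assumption.
func dropDisabledConditions(newC, oldC *Conditions, gateEnabled bool) {
	if gateEnabled {
		return
	}
	if oldC != nil && (oldC.Serving != nil || oldC.Terminating != nil) {
		return
	}
	newC.Serving = nil
	newC.Terminating = nil
}

func boolPtr(b bool) *bool { return &b }

func main() {
	cases := []struct {
		name string
		gate bool
		old  *Conditions
	}{
		{"gate enabled, create", true, nil},
		{"gate disabled, create", false, nil},
		{"gate disabled, update of object already using the fields", false, &Conditions{Serving: boolPtr(true)}},
	}
	for _, tc := range cases {
		c := &Conditions{Serving: boolPtr(true), Terminating: boolPtr(true)}
		dropDisabledConditions(c, tc.old, tc.gate)
		kept := c.Serving != nil && c.Terminating != nil
		fmt.Printf("%s: new conditions kept = %v\n", tc.name, kept)
	}
}
```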
### Rollout, Upgrade and Rollback Planning
###### How can a rollout fail? Can it impact already running workloads?
If there are consumers of EndpointSlice that do not check the `ready` condition, then they may unexpectedly start sending traffic to terminating endpoints.
It is assumed that almost all consumers of EndpointSlice check the `ready` condition prior to allowing traffic to a pod.
###### What specific metrics should inform a rollback?
Application-level traffic metrics showing increased packet loss or error rates.
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Not yet, but manual upgrade and rollback testing will be done prior to graduating the feature to Beta.
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
### Monitoring Requirements
###### How can an operator determine if the feature is in use by workloads?
The conditions will always be set for terminating pods, but consumers may choose to ignore them. It is up to consumers of the API to provide metrics on how the new conditions are being used.
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
Metrics will be added for total endpoints with the `serving` and `terminating` condition set.
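
A minimal sketch of the counting behind such metrics is shown below. It only tallies conditions over a list of endpoints; the `Endpoint` and `ConditionTotals` types and the `countConditions` function are hypothetical, and no real metric names or metrics library calls are implied.

```go
package main

import "fmt"

// Endpoint is a simplified stand-in for an EndpointSlice endpoint,
// keeping only its conditions. Assumption: plain booleans.
type Endpoint struct {
	Ready       bool
	Serving     bool
	Terminating bool
}

// ConditionTotals holds the totals such metrics could report.
type ConditionTotals struct {
	Serving     int
	Terminating int
}

// countConditions tallies how many endpoints have each condition set to true.
func countConditions(endpoints []Endpoint) ConditionTotals {
	var totals ConditionTotals
	for _, ep := range endpoints {
		if ep.Serving {
			totals.Serving++
		}
		if ep.Terminating {
			totals.Terminating++
		}
	}
	return totals
}

func main() {
	endpoints := []Endpoint{
		{Ready: true, Serving: true, Terminating: false},  // healthy pod
		{Ready: false, Serving: true, Terminating: true},  // terminating, still serving
		{Ready: false, Serving: false, Terminating: true}, // terminating, not serving
	}
	fmt.Printf("%+v\n", countConditions(endpoints))
}
```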
###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
N/A
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
N/A
### Dependencies
###### Does this feature depend on any specific services running in the cluster?
N/A
### Scalability
###### Will enabling / using this feature result in any new API calls?
Yes, there will be more writes to EndpointSlice when:
* a pod starts termination
* a pod's readiness changes during termination
###### Will enabling / using this feature result in introducing new API types?
No.
###### Will enabling / using this feature result in any new calls to the cloud provider?
No.
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
Yes, it will increase the size of EndpointSlice by adding two boolean fields for each endpoint.
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
The network programming latency SLO might be impacted by the additional writes to EndpointSlice.
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
More writes to EndpointSlice could result in more resource usage from etcd disk IO and network bandwidth for all watchers.
### Troubleshooting
###### How does this feature react if the API server and/or etcd is unavailable?
EndpointSlice conditions will get stale.
###### What are other known failure modes?
* Consumers of EndpointSlice that do not check the `ready` condition may unexpectedly use terminating endpoints.
###### What steps should be taken if SLOs are not being met to determine the problem?
* Disable the feature gate
* Check if consumers of EndpointSlice are using the `serving` or `terminating` condition
* Check etcd disk usage
## Implementation History
- [x] 2020-04-23: KEP accepted as implementable for v1.19
- [x] 2020-07-01: initial PR with alpha implementation merged for v1.20
- [x] 2020-05-12: KEP accepted as implementable for v1.22