- Components depending on the feature gate: kube-apiserver and kube-controller-manager
186
+
187
+
###### Does enabling the feature change any default behavior?
188
+
189
+
Yes, terminating endpoints are now included in the EndpointSlice API. The `ready` condition of a terminating endpoint will always be `false`, so consumers do not send traffic to terminating endpoints unless they explicitly check the new `serving` and `terminating` conditions.
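As an illustration of the consumer-side behavior described above, here is a minimal sketch. The `EndpointConditions` class and `accepts_traffic` function are hypothetical stand-ins written for this example, not the Kubernetes client types, and treating an unset `ready` as ready is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointConditions:
    """Hypothetical stand-in for the conditions on one EndpointSlice endpoint."""
    ready: Optional[bool] = None
    serving: Optional[bool] = None
    terminating: Optional[bool] = None


def accepts_traffic(cond: EndpointConditions, allow_terminating: bool = False) -> bool:
    """Decide whether a consumer would route traffic to this endpoint."""
    # Assumption of this sketch: an unset `ready` is treated as ready.
    ready = True if cond.ready is None else cond.ready
    if ready:
        return True
    if not allow_terminating:
        # Consumers that only look at `ready` never pick terminating endpoints,
        # because `ready` is always false for them.
        return False
    # Opt-in path: use a terminating endpoint only while it is still serving.
    return bool(cond.terminating) and bool(cond.serving)
```

A consumer that ignores the new conditions keeps its existing behavior; only a consumer that passes `allow_terminating=True` in this sketch would route to an endpoint that is terminating but still serving.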
190
+
191
+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
192
+
193
+
Yes. On rollback, terminating endpoints will no longer be included in EndpointSlice and the `terminating` and `serving` conditions will not be set.
194
+
195
+
###### What happens if we reenable the feature if it was previously rolled back?
196
+
197
+
EndpointSlice will continue to have the `terminating` and `serving` conditions set.
198
+
199
+
###### Are there any tests for feature enablement/disablement?
200
+
201
+
Yes, there will be integration and e2e tests validating whether EndpointSlice contains endpoints for pods that are terminating.
202
+
203
+
### Rollout, Upgrade and Rollback Planning
204
+
205
+
###### How can a rollout fail? Can it impact already running workloads?
206
+
207
+
If there are consumers of EndpointSlice that do not check the `ready` condition, then they may unexpectedly start sending traffic to terminating endpoints.
208
+
It is assumed that almost all consumers of EndpointSlice check the `ready` condition prior to allowing traffic to a pod.
209
+
210
+
###### What specific metrics should inform a rollback?
211
+
212
+
Application-level traffic metrics indicating increased packet loss or error rates.
213
+
214
+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
215
+
216
+
Not yet, but manual upgrade and rollback testing will be done prior to graduating the feature to Beta.
217
+
218
+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
219
+
220
+
No.
221
+
222
+
### Monitoring Requirements
223
+
224
+
###### How can an operator determine if the feature is in use by workloads?
225
+
226
+
The conditions will always be set for terminating pods, but consumers may choose to ignore them. It is up to consumers of the API to provide metrics
227
+
on how the new conditions are being used.
228
+
229
+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
230
+
231
+
N/A
232
+
233
+
###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs?
234
+
235
+
N/A
236
+
237
+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
238
+
239
+
N/A
240
+
241
+
### Dependencies
242
+
243
+
###### Does this feature depend on any specific services running in the cluster?
244
+
245
+
N/A
246
+
247
+
### Scalability
248
+
249
+
###### Will enabling / using this feature result in any new API calls?
250
+
251
+
Yes, there will be more writes to EndpointSlice for every pod when it begins terminating.
252
+
253
+
###### Will enabling / using this feature result in introducing new API types?
254
+
255
+
No.
256
+
257
+
###### Will enabling / using this feature result in any new calls to the cloud provider?
258
+
259
+
No.
260
+
261
+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
262
+
263
+
Yes, it will increase the size of EndpointSlice by adding two boolean fields for each endpoint.
264
+
265
+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
266
+
267
+
No.
268
+
269
+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
270
+
271
+
More writes to EndpointSlice could result in more resource usage from etcd disk IO and network bandwidth for all watchers.
272
+
273
+
### Troubleshooting
274
+
275
+
###### How does this feature react if the API server and/or etcd is unavailable?
276
+
277
+
EndpointSlice conditions will become stale.
278
+
279
+
###### What are other known failure modes?
280
+
281
+
* Consumers of EndpointSlice that do not check the `ready` condition may unexpectedly use terminating endpoints.
282
+
283
+
###### What steps should be taken if SLOs are not being met to determine the problem?
284
+
285
+
* Disable the feature gate
286
+
* Check if consumers of EndpointSlice are using the `serving` or `terminating` condition
287
+
* Check etcd disk usage
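When working through the steps above, it may help to see which endpoints are currently reported as terminating. The following is a rough sketch that scans an EndpointSlice list in JSON form, such as the output of `kubectl get endpointslices -o json`. The helper name `terminating_endpoints` and the sample data are made up for illustration.

```python
import json
from typing import Any, Dict, List


def terminating_endpoints(slice_list: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one record per endpoint whose `terminating` condition is true."""
    found = []
    for item in slice_list.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        for ep in item.get("endpoints") or []:
            cond = ep.get("conditions") or {}
            if cond.get("terminating"):
                found.append({
                    "slice": name,
                    "addresses": ep.get("addresses", []),
                    "ready": cond.get("ready"),
                    "serving": cond.get("serving"),
                })
    return found


# Made-up sample input, shaped like an EndpointSlice list.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "example-abc12"},
      "endpoints": [
        {"addresses": ["10.0.0.1"],
         "conditions": {"ready": true, "serving": true, "terminating": false}},
        {"addresses": ["10.0.0.2"],
         "conditions": {"ready": false, "serving": true, "terminating": true}}
      ]
    }
  ]
}
""")

if __name__ == "__main__":
    for record in terminating_endpoints(SAMPLE):
        print(record)
```

If the feature gate is disabled, the conditions are not set, so a scan like this would be expected to return nothing.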
288
+
151
289
## Implementation History
152
290
153
291
-[x] 2020-04-23: KEP accepted as implementable for v1.19
292
+
-[x] 2020-07-01: initial PR with alpha implementation merged for v1.20
293
+
-[x] 2020-05-12: KEP accepted as implementable for v1.22