Conversation

@andrewsykim (Member)

Signed-off-by: Andrew Sy Kim [email protected]

  • One-line PR description:

Updates the ServiceInternalTrafficPolicy feature to GA in v1.26. The feature has been in Beta since v1.22 and was blocked on a metric that was added in v1.24.
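
For reference, a minimal Service manifest using the field (a sketch; the name and ports are illustrative, not taken from this PR):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-local-svc        # hypothetical name
spec:
  selector:
    app: node-local-svc
  ports:
  - port: 80
    targetPort: 8080
  # Local: route in-cluster traffic only to endpoints on the client's node;
  # if the node has no ready endpoint, the traffic is dropped.
  internalTrafficPolicy: Local
```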

@k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA.) on Sep 13, 2022
@k8s-ci-robot added the kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory.), sig/network (Categorizes an issue or PR as relevant to SIG Network.), and size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) labels on Sep 13, 2022
@andrewsykim (Member Author)

/assign @thockin @danwinship @wojtek-t

@danwinship (Contributor)

I am still really really dubious about internal traffic policy.

The primary user story (DNS) is IMNSHO completely wrong and ought to be removed from the KEP, because nobody actually wants "DNS requests should either stay local or else fail". They want "DNS requests should stay local when possible, but should still work no matter what". And internal traffic policy can't do that. (Node-level topology can.)

The second user story is... weird. Is anybody actually using internal traffic policy that way?

Do we have any evidence of anyone using internal traffic policy in the real world in a way other than to make their DNS fast-but-fragile?

@andrewsykim (Member Author)

> The primary user story (DNS) is IMNSHO completely wrong and ought to be removed from the KEP, because nobody actually wants "DNS requests should either stay local or else fail". They want "DNS requests should stay local when possible, but should still work no matter what". And internal traffic policy can't do that. (#3293 can.)

The "PreferLocal" case was one of the main motivations for the field in the first place. I don't feel too strongly about how we achieve this (either internalTrafficPolicy: PreferLocal or topology based), but I agree it's something that should be addressed. I missed your KEP and will review it soon.

> The second user story is... weird. Is anybody actually using internal traffic policy that way?
>
> Do we have any evidence of anyone using internal traffic policy in the real world in a way other than to make their DNS fast-but-fragile?

The use-cases around "Local" are still valid, I think. The way I've been thinking about it is that in many cases where a Pod would need to use the downward API to get its host's IP, you can do this more elegantly with a static cluster IP that deterministically routes to the local host from any Pod. Pushing logs/metrics to a local daemon is just one example from the 2nd user story, but you can imagine other use-cases for this.
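
A hedged sketch of that pattern (the name, IP, and port are hypothetical, not from the KEP): a DaemonSet-backed Service pinned to a well-known cluster IP, which always resolves to the agent on the client's own node:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: log-agent            # hypothetical Service backed by a DaemonSet
spec:
  clusterIP: 10.96.0.50      # static, well-known IP chosen from the service CIDR
  internalTrafficPolicy: Local
  selector:
    app: log-agent
  ports:
  - port: 9880               # hypothetical log-forwarding port
```

Any Pod can then push to log-agent (or 10.96.0.50) and deterministically reach the agent on its own node, rather than having `status.hostIP` injected via the downward API into every client.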

Kubernetes also supports pod affinity rules that allow pods to be co-scheduled on the same node so it's not a stretch to think someone would also want those pods to strictly talk to each other locally via a Service. But even for this case, you could achieve that with node-level topology. At the time we went with a distinct field because the assumption was that topology was going to be strictly zone/regional (hence the addition of nodeName in v1). TBH I forget why we decided topology was zone/region only, maybe it was too expensive to generate hints at the granularity of nodes?

Anyways, I'll let others chime in on use-cases to avoid confirmation bias here :)

@avoltz commented Sep 22, 2022

> Pushing logs/metrics to a local daemon is just one example from the 2nd user story but you can imagine other use-cases for this.

I agree with this use case mentioned by @andrewsykim. This local-only routing has been useful in my development. PreferLocal sounds like a useful option, but I would rather leave that up to the user to choose and try to guide via documentation.

In my use case, the traffic is not relevant off the node.

@k8s-ci-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on Sep 22, 2022
@k8s-ci-robot removed the needs-rebase label on Sep 22, 2022
@danwinship (Contributor) commented Sep 26, 2022

> Pushing logs/metrics to a local daemon is just one example from the 2nd user story but you can imagine other use-cases for this.

The multi-cluster folks were talking about services where the same service IP resolves to different things in different clusters, and that sounds like pretty much the same thing. (But didn't we dislike that idea?) But then that makes me wonder if this should exist at the zone level too. (It's not quite the same as Topology Aware Hints because there the assumption is that the service is identical everywhere, not different in each zone.)

At any rate, this is very very different from what externalTrafficPolicy: Local means, so even if we want this feature, maybe it shouldn't be called internalTrafficPolicy: Local.

(Another example of how it is weird: it seems like any Service that has this "different instance on each node" behavior would have to be type: ClusterIP... because what would it mean to have a type: LoadBalancer, internalTrafficPolicy: Local service where the endpoints on different nodes did different things, but you can't control which endpoint you hit? So maybe it should really just be a new service type?)

> But even for this case, you could achieve that with node-level topology

Well, no, the node-level topology proposal (like the existing zone-level topology) has the rule that if you can't hit a node-level endpoint, you fall back to hitting a remote endpoint (because that's what you want for, eg, the DNS case). So it won't work for something where it's mandatory that you talk to a local endpoint.

@danwinship (Contributor)

> PreferLocal sounds like a useful option, but I would rather leave that up to the user to choose and try to guide via documentation.

Note that the features are not equivalent: internalTrafficPolicy: Local says "traffic should be sent to a local endpoint, and if there are no local endpoints then the traffic should be dropped". Node-local topology (previously proposed as internalTrafficPolicy: PreferLocal) says "traffic should be sent to a local endpoint, and if there are no local endpoints then the traffic should be sent to a remote endpoint".

> I agree with this use case mentioned by @andrewsykim. This local-only routing has been useful in my development.

Could you describe in more detail what you are doing?

Especially, I am concerned about the "if there are no local endpoints then the traffic should be dropped" part. It seems to me that this means internalTrafficPolicy: Local would only be useful if one of:

  • The service has at least 2 endpoints per node, so that you can roll out updates to it without causing an outage, OR
  • The cluster is basically "static", and the service will never be updated; if it was necessary to change something about the service, you would just deploy a new cluster instead, OR
  • The service is non-critical and so you don't care if it is sometimes unavailable to the clients.

@k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed the size/S label on Sep 27, 2022
@andrewsykim force-pushed the kep-2086 branch 3 times, most recently from 8af5fc0 to 2bd8de4, on September 27, 2022 at 15:06
@avoltz commented Sep 27, 2022

> Note that the features are not equivalent: internalTrafficPolicy: Local says "traffic should be sent to a local endpoint, and if there are no local endpoints then the traffic should be dropped". Node-local topology (previously proposed as internalTrafficPolicy: PreferLocal) says "traffic should be sent to a local endpoint, and if there are no local endpoints then the traffic should be sent to a remote endpoint".

That makes sense. I think both behaviors can be useful.

> Could you describe in more detail what you are doing?

> Especially, I am concerned about the "if there are no local endpoints then the traffic should be dropped" part. It seems to me that this means internalTrafficPolicy: Local would only be useful if one of:
>
>   • The service is non-critical and so you don't care if it is sometimes unavailable to the clients.

I think this is closest to my case: the client will tolerate the outage on the node, but the state in the service currently requires the local connection.

I understand it's a less common mode, but I've found this beta feature useful as-is. The docs seemed clear about this availability effect.

@andrewsykim (Member Author)

I removed the use-case around DNS per the SIG Network call earlier today. I also updated the Test Plan section to use the new format, with links to existing tests, and added steps for (manual) rollback testing.

@danwinship @avoltz PTAL.

@avoltz commented Sep 30, 2022

Looks good @andrewsykim, thanks.

There are some integration tests related to ITP in test/integration/service/service_test.go.

Should these be mentioned in the integration tests section?

ExternalNameServiceStopsDefaultingInternalTrafficPolicy
ExternalNameServiceDropsInternalTrafficPolicy
ConvertingToExternalNameServiceDropsInternalTrafficPolicy

@andrewsykim (Member Author)

Good catch! Updated the test plan to include those tests.

@thockin (Member) commented Oct 3, 2022

> The service has at least 2 endpoints per node, so that you can roll out updates to it without causing an outage, OR

DaemonSet does support MaxSurge and MaxUnavailable, so you can effect this (unless there's some corner case I am missing?)
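
For illustration, a sketch of that update strategy (the names and image are placeholders): with maxSurge, the replacement Pod becomes ready before the old one is removed, so a node never loses its only local endpoint mid-rollout.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent              # hypothetical
spec:
  selector:
    matchLabels:
      app: node-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1               # start the new Pod on each node first...
      maxUnavailable: 0         # ...and only then stop the old one
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent
        image: registry.example/agent:1.0   # placeholder image
```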

> The service is non-critical and so you don't care if it is sometimes unavailable to the clients.

For many applications, retry-until-it-works is OK.

> so even if we want this feature, maybe it shouldn't be called internalTrafficPolicy: Local

I am not in love with the name, for sure, but given the analogy of eTP it seems better than picking a whole new name, no? If there were no nodes, how would we describe this? Would we bend it to mean "same zone" or just make it not work?

I'm LGTM on this change, but I hate to feel like a steamroller - Dan do you want to debate more?

@andrewsykim (Member Author)

> I am not in love with the name, for sure, but given the analogy of eTP it seems better than picking a whole new name, no? If there were no nodes, how would we describe this? Would we bend it to mean "same zone" or just make it not work?

Agreed with this sentiment around naming, but I am also happy to debate better names further. Now is the last chance to change it :)

@wojtek-t (Member) left a comment

@andrewsykim - just some minor comments from me - other than that it LGTM

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

No, but this will be manually tested prior to beta. Automated testing will be done if the test tooling is available.
#### API Enablement Rollback
@wojtek-t (Member)

Sorry, I can't comment on unchanged lines, so commenting here:

L248: Have those metrics been added? Can you please update this answer?
[I think it's sync_proxy_rules_no_local_endpoints_total - if so, please just update the answer]

@andrewsykim (Member Author)

Thanks, updated


* Check Service to see if `internalTrafficPolicy` is set to `Local`.
* A per-node "blackhole" metric will be added to kube-proxy which represent Services that are being intentionally dropped (internalTrafficPolicy=Local and no endpoints). The metric will be named `kubeproxy/sync_proxy_rules/blackhole_total` (subject to rename).
* A per-node "blackhole" metric will be added to kube-proxy which represent Services that are being intentionally dropped (internalTrafficPolicy=Local and no endpoints). The metric will be named `kubeproxy/sync_proxy_rules_no_endpoints_total` (subject to rename).
@wojtek-t (Member)

I think it's sync_proxy_rules_no_local_endpoints_total instead?
If so, can you please update (and remove the "subject to rename" note :) )

@wojtek-t (Member)

Also L570: I think this metric was already added, right?

@andrewsykim (Member Author)

Thanks, updated

@danwinship (Contributor)

> DaemonSet does support MaxSurge and MaxUnavailable, so you can effect this (unless there's some corner case I am missing?)

Ah, I hadn't seen MaxSurge before. (MaxUnavailable doesn't help AFAICT.)

> I am not in love with the name, for sure, but given the analogy of eTP it seems better than picking a whole new name, no?

No, it's a false analogy. That's why I don't like the name. The two features have the same name because they're the same to us (the people hacking on kube-proxy), but they are completely different and unrelated from the perspective of actual users of the feature. (eTP:Local = "preserve client IP", iTP:Local = "only send to local endpoints"). Of course, the problem is that "Local" is actually a good description of iTP:Local, and it's eTP:Local that has the bad name...

> I'm LGTM on this change, but I hate to feel like a steamroller - Dan do you want to debate more?

No, my primary objection was claiming that DNS was a good use case for the feature, and the KEP no longer claims that. So, OK.

@thockin (Member) commented Oct 5, 2022

> No, it's a false analogy. That's why I don't like the name. The two features have the same name because they're the same to us (the people hacking on kube-proxy), but they are completely different and unrelated from the perspective of actual users of the feature. (eTP:Local = "preserve client IP", iTP:Local = "only send to local endpoints"). Of course, the problem is that "Local" is actually a good description of iTP:Local, and it's eTP:Local that has the bad name...

A different read: "Local" means "avoid extra network hops". eTP:Local doesn't make sense at all for LBs that don't bounce through nodes (direct to pod), so yeah - the error lies there, I think :)

@thockin (Member) commented Oct 5, 2022

Thanks!

/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) on Oct 5, 2022
@wojtek-t (Member) commented Oct 5, 2022

/label tide/merge-method-squash

/lgtm
/approve PRR

Thanks!

@k8s-ci-robot added the tide/merge-method-squash label (Denotes a PR that should be squashed by tide when it merges.) on Oct 5, 2022
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrewsykim, thockin, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Oct 5, 2022
@k8s-ci-robot merged commit a8590f3 into kubernetes:master on Oct 5, 2022
@k8s-ci-robot added this to the v1.26 milestone on Oct 5, 2022
ahmedtd pushed a commit to ahmedtd/enhancements that referenced this pull request Feb 2, 2023
…etes#3509)

* KEP-2086: promote ServiceInternalTrafficPolicy to GA in v1.26

Signed-off-by: Andrew Sy Kim <[email protected]>

* KEP-2086: document manual validation for rollback

Signed-off-by: Andrew Sy Kim <[email protected]>

* KEP-2086: use new test plan format, including links to tests

Signed-off-by: Andrew Sy Kim <[email protected]>

* KEP-2086: remove use-case about DNS optimization

Signed-off-by: Andrew Sy Kim <[email protected]>

* KEP-2086: fix metric name for sync_proxy_rules_no_local_endpoints_total

Signed-off-by: Andrew Sy Kim <[email protected]>

Signed-off-by: Andrew Sy Kim <[email protected]>