KEP-2086: promote ServiceInternalTrafficPolicy to GA in v1.26 #3509
Conversation
/assign @thockin @danwinship @wojtek-t
I am still really really dubious about internal traffic policy. The primary user story (DNS) is IMNSHO completely wrong and ought to be removed from the KEP, because nobody actually wants "DNS requests should either stay local or else fail". They want "DNS requests should stay local when possible, but should still work no matter what". And internal traffic policy can't do that. (Node-level topology can.) The second user story is... weird. Is anybody actually using internal traffic policy that way? Do we have any evidence of anyone using internal traffic policy in the real world in a way other than to make their DNS fast-but-fragile?
The "PreferLocal" case was one of the main motivations for the field in the first place. I don't feel too strongly about how we achieve this (either internalTrafficPolicy: PreferLocal or topology based), but I agree it's something that should be addressed. I missed your KEP and will review it soon.
The use-cases around "Local" are still valid I think. The way I've been thinking about it is that in many cases where a Pod would need to use the downward API to get a host's IP, you can do this more elegantly using a static cluster IP that deterministically routes to the local host from any Pod. Pushing logs/metrics to a local daemon is just one example from the 2nd user story, but you can imagine other use-cases for this. Kubernetes also supports pod affinity rules that allow pods to be co-scheduled on the same node, so it's not a stretch to think someone would also want those pods to strictly talk to each other locally via a Service. But even for this case, you could achieve that with node-level topology. At the time we went with a distinct field because the assumption was that topology was going to be strictly zone/regional (hence the addition of nodeName in v1). TBH I forget why we decided topology was zone/region only; maybe it was too expensive to generate hints at the granularity of nodes? Anyways, I'll let others chime in on use-cases to avoid confirmation bias here :)
I agree with this use case mentioned by @andrewsykim. This local-only routing has been useful in my development. PreferLocal sounds like a useful option, but I would rather leave that up to the user to choose and try to guide via documentation. In my use, traffic is not relevant off node.
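To make the "push to a local daemon" pattern above concrete, here is a minimal sketch (not taken from the KEP; the names and port are hypothetical) of a ClusterIP Service in front of a node-local agent, with `internalTrafficPolicy: Local` so that a client pod always reaches the agent on its own node:

```yaml
# Hypothetical Service for a node-local agent (e.g. backed by a DaemonSet).
# With internalTrafficPolicy: Local, kube-proxy only forwards traffic from a
# client pod to endpoints on that pod's own node; if there is no ready local
# endpoint, the traffic is dropped instead of being sent to another node.
apiVersion: v1
kind: Service
metadata:
  name: node-local-agent        # assumed name
spec:
  selector:
    app: node-local-agent       # assumed label on the DaemonSet pods
  internalTrafficPolicy: Local
  ports:
    - name: metrics
      port: 9090                # assumed port
      targetPort: 9090
```

Clients can then use the stable cluster IP (or the Service's DNS name) instead of discovering the host IP via the downward API; the trade-off, as discussed below, is that if the local agent pod is not ready, traffic to the cluster IP from that node is dropped rather than redirected to another node.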
The multi-cluster folks were talking about services where the same service IP resolves to different things in different clusters, and that sounds like pretty much the same thing. (But didn't we dislike that idea?) But then that makes me wonder if this should exist at the zone level too. (It's not quite the same as Topology Aware Hints, because there the assumption is that the service is identical everywhere, not different in each zone.) At any rate, this is very, very different from what topology-aware routing does. (Another example of how it is weird: it seems like any Service that has this "different instance on each node" behavior would have to be backed by a DaemonSet.)
Well, no, the node-level topology proposal (like the existing zone-level topology) has the rule that if you can't hit a node-level endpoint, you fall back to hitting a remote endpoint (because that's what you want for, e.g., the DNS case). So it won't work for something where it's mandatory that you talk to a local endpoint.
Note that the features are not equivalent: with internalTrafficPolicy: Local, traffic to the cluster IP is dropped when there are no local endpoints, whereas node-level topology would fall back to a remote endpoint.
Could you describe in more detail what you are doing? Especially, I am concerned about the "if there are no local endpoints then the traffic should be dropped" part. It seems to me that that means your application sees an outage whenever the local endpoint is unavailable, for example while the pod on that node is being updated.
That makes sense. I think both behaviors can be useful.
I think this is closest to my case. In my case, the client will tolerate the outage on the node, but the state in the service currently requires the local connection. I understand it's a less common mode, but I've found this beta feature useful as-is. The docs seemed clear about this availability effect.
I removed the use-case around DNS per the SIG Network call earlier today. I also updated the Test Plan section to use the new format with links to existing tests and added steps for (manual) rollback testing. @danwinship @avoltz PTAL. |
Looks good @andrewsykim, thanks. There are some integration tests related to ITP in test/integration/service/service_test.go (e.g. ExternalNameServiceStopsDefaultingInternalTrafficPolicy). Should these be mentioned in the integration tests section?
Good catch! Updated the test plan to include those tests |
DaemonSet does support MaxSurge and MaxUnavailable, so you can effect this (unless there's some corner case I am missing?)
For many applications, retry-until-it-works is OK.
I am not in love with the name, for sure, but given the analogy of `externalTrafficPolicy: Local`, I'm LGTM on this change. But I hate to feel like a steamroller - Dan, do you want to debate more?
Agreed with this sentiment around naming, but I am also happy to debate better names further. Now is the last chance to change it :)
@andrewsykim - just some minor comments from me - other than that it LGTM
> * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
>
>   No, but this will be manually tested prior to beta. Automated testing will be done if the test tooling is available.
>
> #### API Enablement Rollback
Sorry can't comment on unchanged lines so commenting here:
L248: Have those metrics been added? Can you please update this answer?
[I think it's sync_proxy_rules_no_local_endpoints_total - if so, please just update the answer]
Thanks, updated
> * Check Service to see if `internalTrafficPolicy` is set to `Local`.
> * A per-node "blackhole" metric will be added to kube-proxy which represents Services that are being intentionally dropped (internalTrafficPolicy=Local and no endpoints). The metric will be named ~~`kubeproxy/sync_proxy_rules/blackhole_total`~~ `kubeproxy/sync_proxy_rules_no_endpoints_total` (subject to rename).
I think it's sync_proxy_rules_no_local_endpoints_total instead?
If so, can you please update (and remove "subject to rename")? :)
Also L570: I think this metric was already added, right?
Thanks, updated
Ah, I hadn't seen MaxSurge before. (MaxUnavailable doesn't help AFAICT.)
No, it's a false analogy. That's why I don't like the name. The two features have the same name because they're the same to us (the people hacking on kube-proxy), but they are completely different and unrelated from the perspective of actual users of the feature. (eTP:Local = "preserve client IP", iTP:Local = "only send to local endpoints"). Of course, the problem is that "Local" is actually a good description of iTP:Local, and it's eTP:Local that has the bad name...
No, my primary objection was claiming that DNS was a good use case for the feature, and the KEP no longer claims that. So, OK.
A different read: "Local" means "avoid extra network hops".
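For readers following the naming debate, a short hypothetical sketch (not from the KEP) of where the two similarly named fields sit on a Service and what each one actually governs, per the discussion above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-svc             # hypothetical Service
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
    - port: 80
      targetPort: 8080
  # Governs traffic arriving from outside the cluster (NodePort/LoadBalancer):
  # "Local" keeps it on the node that received it, preserving the client IP.
  externalTrafficPolicy: Local
  # Governs in-cluster traffic to the cluster IP: "Local" sends it only to
  # endpoints on the client's own node and drops it if none are ready.
  internalTrafficPolicy: Local
```

The two values share a name but, as noted above, they solve unrelated problems from a user's point of view.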
Thanks! /lgtm
/label tide/merge-method-squash /lgtm Thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andrewsykim, thockin, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
…etes#3509)

* KEP-2086: promote ServiceInternalTrafficPolicy to GA in v1.26
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-2086: document manual validation for rollback
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-2086: use new test plan format, including links to tests
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-2086: remove use-case about DNS optimization
  Signed-off-by: Andrew Sy Kim <[email protected]>
* KEP-2086: fix metric name for sync_proxy_rules_no_local_endpoints_total
  Signed-off-by: Andrew Sy Kim <[email protected]>

Signed-off-by: Andrew Sy Kim <[email protected]>
Signed-off-by: Andrew Sy Kim <[email protected]>
Updates the ServiceInternalTrafficPolicy feature to GA in v1.26. The feature has been in Beta since v1.22 and was blocked on a metric that was added in v1.24.