-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add KEP for kubectl debug #1204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I published an example plugin at https://github.com/verb/kubectl-debug |
|
A demo of the plugin is http://bit.ly/kubectl-debug-demo |
|
|
||
| Options: | ||
| -a, --attach=true: Automatically attach to container once created. | ||
| -c, --container='': Container name. If omitted one will be chosen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If omitted, the first container in the pod will be chosen is the default in other commands, eg. kubectl attach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| -a, --attach=true: Automatically attach to container once created. | ||
| -c, --container='': Container name. If omitted one will be chosen. | ||
| -i, --stdin=true: Pass stdin to the container | ||
| -m, --image='busybox': Container image to use for debug container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-m might be confusing, I'd drop short for now. Also I'm inclined not to default this to anything, just require a user to always pass this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, done.
| #### Operations | ||
|
|
||
| Alice runs a service "neato" that consists of a statically compiled Go binary | ||
| running in a minimal container image. One of the its pods is suddenly having |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the its pods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
| running in a minimal container image. One of the its pods is suddenly having | ||
| trouble connecting to an internal service. Being in operations, Alice wants to | ||
| be able to inspect the running pod without restarting it, but she doesn't | ||
| necessarily need to enter the container itself. She wants to: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That argument falls short, I'd prefer to use the same argument you used at the beginning, that she doesn't have the ability to enter the container b/c it's a scratch image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario I'm envisioning is attaching a debugger to capture state of a hard-to-reproduce bug when it occurs in production. Restarting the container will destroy the state.
|
|
||
| ### Implementation Details/Notes/Constraints | ||
|
|
||
| 1. There are no guaranteed resources for ad-hoc troubleshooting. If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is rather undesirable effect, since it may cause an application downtime. I think we'll need to be explicit about that in the command's help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm even inclined to move this to risks section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to risks and added that we'll be able to mitigate this once support for resizing pods has been added.
| kubectl debug mypod --image=debian | ||
|
|
||
| Options: | ||
| -a, --attach=true: Automatically attach to container once created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered always attaching and removing the container when session ends, there's InterruptParent handler in https://github.com/kubernetes/kubernetes/blob/3ec5fe500d7a56be6ff6c15f916987eaa48c2e94/staging/src/k8s.io/kubectl/pkg/cmd/exec/exec.go#L136
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't know about InterruptParent, that seems useful. We haven't agreed on the semantics of deleting/recreating ephemeral containers yet, but it's important to me that we support multiple sequential commands in some manner. For example:
$ kubectl debug $pod ls /tmp
$ kubectl debug $pod rm /tmp/bigfile
Choosing multiple ephemeral container names would be unfortunate.
Is InterruptParent invoked on the stream being interrupted, or only on signals received by kubectl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
InterruptParentinvoked on the stream being interrupted, or only on signals received by kubectl?
Signal.
I have another question wrt attaching, is this to satisfy the use case where you can't use kubectl attach but debug with ephemeral container will work, is that correct?
| ### Container Namespace Targeting | ||
|
|
||
| In order to see processes running in other containers in the pod, [Process | ||
| Namespace Sharing] should be enabled for the pod. In cases where process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be called out both in command help and also in constraints section, w/o it kubectl debug won't be able to that much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make sure it's called out in command help (verb/kubectl-debug@64404a9, 1a39bf5) but I'm not sure it's a constraint since we will support targeting container namespaces. Based on feedback, here's how I think it will be used:
| shareProcessNamespace | container targeted? | See other processes? | Use case |
|---|---|---|---|
| true | - | yes | application troubleshooting |
| false | yes | yes | application troubleshooting |
| false | no | no | network troubleshooting |
Concrete example for network troubleshooting: kubedns wants to use distroless containers but wants to be able to run an ephemeral container to be able to dig against localhost.
|
@aylei you have a lot of expertise in this area, would you be interested in joining the effort to port your kubectl-debug plugin to kubectl proper? |
Yes, of course. I'm willing to assist in advancing the design and code implementation. However, I don't have much experience about the community process, could you please tell me how can I help now? |
|
@aylei Great! The first step is to iterate on the proposal in this PR until the approvers at the top agree that we've fleshed out enough of the details that the proposal is "implementable". I'd say we should start with:
|
kubectl debug
aylei
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest LGTM, I will try to expand the todo parts if you're not working on them now. 😊
| ``` | ||
| Examples: | ||
| # Create a copy of 'mypod' with the debugging image for container 'app' | ||
| kubectl debug mypod --copy-to=mypod-debug --image=myapp-image:debug --container=myapp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we override the command of the container 'myapp' via -- COMMAND?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point. Added.
|
|
||
| Examples: | ||
| # Get network address configuration from pod mypod | ||
| kubectl debug mypod ip addr list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the image is required, this doesn't make sense, it's rather kubectl exec ip addr list, unless this is to run a command in the ephemeral container b/c the main is a distroless so we can't exec into it, in which case you'll probably want to add --attach=false too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I neglected to update example when making image required. I would like to be able to set a default container image for debug, but I don't want to get hung up on it for v1.
--attach must be true because that's where the output of ip addr list will go. Unless we were to create the Ephemeral Container and do an exec rather than setting Container.Args to args. There would be some advantages to using exec this way, actually.
Rather than getting hung up on this, I've removed this example to focus on the interactive debugging use case for the first version.
| the copy to receive traffic from a service or a replicaset | ||
| to kill other pods. | ||
| --share-processes=true: When used with `--copy-to`, enable process namespace | ||
| sharing in the pod copy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add --node-name to be able to place the pod on a specific node, additionally this will be necessary in the node debug mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other options include --as-user --as-root, which would allow you to run as specific user or as root. --to-namespace would be also complementary to --copy-to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, we can have several others modifiers for the debug pod, including liveness and readiness probe (you might not care about those), as well as init containers, which might not be needed for debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about node-name, I'll add that. Or perhaps --same-node?
It would be useful to allow arbitrary edits prior to create. What do you think about an --edit that opens an editor similar to kubectl edit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: --node-name for node debug mode, I expected the node name to be a positional parameter like kubectl debug node/$node_name, but I haven't yet investigated the conventions for specifying resources on the command line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but node-name would be only for node debug, what if you want to land your pod on a specific node for other reasons.
| privileged pod constrained to this particular node and run in the host | ||
| namespaces. | ||
|
|
||
| TODO: how? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just create a pod, for a specific node, with HostNetwork, HostPid, HostPath set accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we don't need to have this rock solid, defined. We'll iterate as we go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, added a brief description.
|
|
||
| TODO: how? | ||
|
|
||
| TODO: Some CLI options only make sense for pods, and some only make sense for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not strongly convinced, but apiserver did something kubernetes/kubernetes#64517 that I was planning o implementing in kubectl, eventually. But let's start simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, I'll remove for now.
| troubleshooting session might resemble: | ||
|
|
||
| ``` | ||
| % kubectl debug -it -m debian neato-5thn0 -- bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's -m here, and --image is missing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-m was the short code for --image once upon a time. 😄
Updated here and elsewhere.
| kubectl debug mypod --image=debian | ||
|
|
||
| Options: | ||
| -a, --attach=true: Automatically attach to container once created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is
InterruptParentinvoked on the stream being interrupted, or only on signals received by kubectl?
Signal.
I have another question wrt attaching, is this to satisfy the use case where you can't use kubectl attach but debug with ephemeral container will work, is that correct?
soltysh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few more comments here and there, mostly nits. Let's get them shaped and merge this asap.
| privileged pod constrained to this particular node and run in the host | ||
| namespaces. | ||
|
|
||
| TODO: how? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we don't need to have this rock solid, defined. We'll iterate as we go.
Co-Authored-By: Aylei <[email protected]>
- Expand command arguments in examples - Add description of how node debug works - Mark implementable
verb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soltysh Thanks for the review. I addressed the comments and also updated the status to "implementable" so we can begin implementing for 1.18.
| ``` | ||
| Examples: | ||
| # Create a copy of 'mypod' with the debugging image for container 'app' | ||
| kubectl debug mypod --copy-to=mypod-debug --image=myapp-image:debug --container=myapp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point. Added.
|
|
||
| TODO: how? | ||
|
|
||
| TODO: Some CLI options only make sense for pods, and some only make sense for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, I'll remove for now.
| privileged pod constrained to this particular node and run in the host | ||
| namespaces. | ||
|
|
||
| TODO: how? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, added a brief description.
| troubleshooting session might resemble: | ||
|
|
||
| ``` | ||
| % kubectl debug -it -m debian neato-5thn0 -- bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-m was the short code for --image once upon a time. 😄
Updated here and elsewhere.
|
This lgtm, if nobody has any objections I'll tag it after today's SIG-CLI. |
|
Added the test plan we discussed at today's sig-cli. |
soltysh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: soltysh, verb The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Adds a provisional KEP proposing a new
kubectl debugcommand based on #277. KEP is not yet fully described, but feedback is welcome.