Skip to content

Conversation

@verb
Copy link
Contributor

@verb verb commented Aug 6, 2019

Adds a provisional KEP proposing a new kubectl debug command based on #277. KEP is not yet fully described, but feedback is welcome.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 6, 2019
@verb
Copy link
Contributor Author

verb commented Aug 15, 2019

I published an example plugin at https://github.com/verb/kubectl-debug

@verb
Copy link
Contributor Author

verb commented Sep 26, 2019

A demo of the plugin is http://bit.ly/kubectl-debug-demo


Options:
-a, --attach=true: Automatically attach to container once created.
-c, --container='': Container name. If omitted one will be chosen.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If omitted, the first container in the pod will be chosen is the default in other commands, eg. kubectl attach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

-a, --attach=true: Automatically attach to container once created.
-c, --container='': Container name. If omitted one will be chosen.
-i, --stdin=true: Pass stdin to the container
-m, --image='busybox': Container image to use for debug container.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-m might be confusing, I'd drop short for now. Also I'm inclined not to default this to anything, just require a user to always pass this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, done.

#### Operations

Alice runs a service "neato" that consists of a statically compiled Go binary
running in a minimal container image. One of the its pods is suddenly having
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the its pods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

running in a minimal container image. One of the its pods is suddenly having
trouble connecting to an internal service. Being in operations, Alice wants to
be able to inspect the running pod without restarting it, but she doesn't
necessarily need to enter the container itself. She wants to:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That argument falls short, I'd prefer to use the same argument you used at the beginning, that she doesn't have the ability to enter the container b/c it's a scratch image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario I'm envisioning is attaching a debugger to capture state of a hard-to-reproduce bug when it occurs in production. Restarting the container will destroy the state.


### Implementation Details/Notes/Constraints

1. There are no guaranteed resources for ad-hoc troubleshooting. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is rather undesirable effect, since it may cause an application downtime. I think we'll need to be explicit about that in the command's help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm even inclined to move this to risks section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to risks and added that we'll be able to mitigate this once support for resizing pods has been added.

kubectl debug mypod --image=debian

Options:
-a, --attach=true: Automatically attach to container once created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered always attaching and removing the container when session ends, there's InterruptParent handler in https://github.com/kubernetes/kubernetes/blob/3ec5fe500d7a56be6ff6c15f916987eaa48c2e94/staging/src/k8s.io/kubectl/pkg/cmd/exec/exec.go#L136

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't know about InterruptParent, that seems useful. We haven't agreed on the semantics of deleting/recreating ephemeral containers yet, but it's important to me that we support multiple sequential commands in some manner. For example:

$ kubectl debug $pod ls /tmp
$ kubectl debug $pod rm /tmp/bigfile

Choosing multiple ephemeral container names would be unfortunate.

Is InterruptParent invoked on the stream being interrupted, or only on signals received by kubectl?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is InterruptParent invoked on the stream being interrupted, or only on signals received by kubectl?

Signal.

I have another question wrt attaching, is this to satisfy the use case where you can't use kubectl attach but debug with ephemeral container will work, is that correct?

### Container Namespace Targeting

In order to see processes running in other containers in the pod, [Process
Namespace Sharing] should be enabled for the pod. In cases where process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called out both in command help and also in constraints section, w/o it kubectl debug won't be able to that much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make sure it's called out in command help (verb/kubectl-debug@64404a9, 1a39bf5) but I'm not sure it's a constraint since we will support targeting container namespaces. Based on feedback, here's how I think it will be used:

shareProcessNamespace container targeted? See other processes? Use case
true - yes application troubleshooting
false yes yes application troubleshooting
false no no network troubleshooting

Concrete example for network troubleshooting: kubedns wants to use distroless containers but wants to be able to run an ephemeral container to be able to dig against localhost.

@verb
Copy link
Contributor Author

verb commented Oct 29, 2019

@aylei you have a lot of expertise in this area, would you be interested in joining the effort to port your kubectl-debug plugin to kubectl proper?

@aylei
Copy link
Contributor

aylei commented Oct 29, 2019

@aylei you have a lot of expertise in this area, would you be interested in joining the effort to port your kubectl-debug plugin to kubectl proper?

Yes, of course. I'm willing to assist in advancing the design and code implementation. However, I don't have much experience about the community process, could you please tell me how can I help now?

@verb
Copy link
Contributor Author

verb commented Nov 1, 2019

@aylei Great! The first step is to iterate on the proposal in this PR until the approvers at the top agree that we've fleshed out enough of the details that the proposal is "implementable". I'd say we should start with:

  1. Please review the current state of the proposal in this PR and let me know your thoughts.
  2. You can prepare a PR to expand the proposal and add yourself as an author.

@verb
Copy link
Contributor Author

verb commented Nov 1, 2019

@soltysh I pushed a commit which attempts to capture the new scope we discussed. Please have a look at e995c36 and let me know if you agree.

(edited to update ref because I forgot to update the toc)

@verb verb changed the title Add KEP for Pod Troubleshooting in kubectl Add KEP for kubectl debug Nov 6, 2019
@verb verb changed the title Add KEP for kubectl debug Add KEP for kubectl debug Nov 6, 2019
Copy link
Contributor

@aylei aylei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest LGTM, I will try to expand the todo parts if you're not working on them now. 😊

```
Examples:
# Create a copy of 'mypod' with the debugging image for container 'app'
kubectl debug mypod --copy-to=mypod-debug --image=myapp-image:debug --container=myapp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we override the command of the container 'myapp' via -- COMMAND?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. Added.


Examples:
# Get network address configuration from pod mypod
kubectl debug mypod ip addr list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the image is required, this doesn't make sense, it's rather kubectl exec ip addr list, unless this is to run a command in the ephemeral container b/c the main is a distroless so we can't exec into it, in which case you'll probably want to add --attach=false too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I neglected to update example when making image required. I would like to be able to set a default container image for debug, but I don't want to get hung up on it for v1.

--attach must be true because that's where the output of ip addr list will go. Unless we were to create the Ephemeral Container and do an exec rather than setting Container.Args to args. There would be some advantages to using exec this way, actually.

Rather than getting hung up on this, I've removed this example to focus on the interactive debugging use case for the first version.

the copy to receive traffic from a service or a replicaset
to kill other pods.
--share-processes=true: When used with `--copy-to`, enable process namespace
sharing in the pod copy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add --node-name to be able to place the pod on a specific node, additionally this will be necessary in the node debug mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other options include --as-user --as-root, which would allow you to run as specific user or as root. --to-namespace would be also complementary to --copy-to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, we can have several others modifiers for the debug pod, including liveness and readiness probe (you might not care about those), as well as init containers, which might not be needed for debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about node-name, I'll add that. Or perhaps --same-node?

It would be useful to allow arbitrary edits prior to create. What do you think about an --edit that opens an editor similar to kubectl edit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: --node-name for node debug mode, I expected the node name to be a positional parameter like kubectl debug node/$node_name, but I haven't yet investigated the conventions for specifying resources on the command line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but node-name would be only for node debug, what if you want to land your pod on a specific node for other reasons.

privileged pod constrained to this particular node and run in the host
namespaces.

TODO: how?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just create a pod, for a specific node, with HostNetwork, HostPid, HostPath set accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we don't need to have this rock solid, defined. We'll iterate as we go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added a brief description.


TODO: how?

TODO: Some CLI options only make sense for pods, and some only make sense for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not strongly convinced, but apiserver did something kubernetes/kubernetes#64517 that I was planning o implementing in kubectl, eventually. But let's start simple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, I'll remove for now.

troubleshooting session might resemble:

```
% kubectl debug -it -m debian neato-5thn0 -- bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's -m here, and --image is missing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-m was the short code for --image once upon a time. 😄

Updated here and elsewhere.

kubectl debug mypod --image=debian

Options:
-a, --attach=true: Automatically attach to container once created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is InterruptParent invoked on the stream being interrupted, or only on signals received by kubectl?

Signal.

I have another question wrt attaching, is this to satisfy the use case where you can't use kubectl attach but debug with ephemeral container will work, is that correct?

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few more comments here and there, mostly nits. Let's get them shaped and merge this asap.

privileged pod constrained to this particular node and run in the host
namespaces.

TODO: how?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we don't need to have this rock solid, defined. We'll iterate as we go.

- Expand command arguments in examples
- Add description of how node debug works
- Mark implementable
Copy link
Contributor Author

@verb verb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soltysh Thanks for the review. I addressed the comments and also updated the status to "implementable" so we can begin implementing for 1.18.

```
Examples:
# Create a copy of 'mypod' with the debugging image for container 'app'
kubectl debug mypod --copy-to=mypod-debug --image=myapp-image:debug --container=myapp
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. Added.


TODO: how?

TODO: Some CLI options only make sense for pods, and some only make sense for
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, I'll remove for now.

privileged pod constrained to this particular node and run in the host
namespaces.

TODO: how?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added a brief description.

troubleshooting session might resemble:

```
% kubectl debug -it -m debian neato-5thn0 -- bash
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-m was the short code for --image once upon a time. 😄

Updated here and elsewhere.

@soltysh
Copy link
Contributor

soltysh commented Jan 15, 2020

This lgtm, if nobody has any objections I'll tag it after today's SIG-CLI.

@verb
Copy link
Contributor Author

verb commented Jan 15, 2020

Added the test plan we discussed at today's sig-cli.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 15, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: soltysh, verb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 15, 2020
@k8s-ci-robot k8s-ci-robot merged commit c283365 into kubernetes:master Jan 15, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 15, 2020
@verb verb deleted the cli-debug branch July 23, 2021 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants