Skip to content

Conversation

@krmayankk
Copy link

@rootfs here is a proposal to make rbd image non random that we talked about some time back

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 11, 2018
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 11, 2018
@krmayankk
Copy link
Author

/assign @saad-ali
/assign @rootfs

@k8s-github-robot k8s-github-robot added kind/design Categorizes issue or PR as related to design. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Feb 11, 2018
rbdImageName := b64.StdEncoding.EncodeToString([]byte(r.options.PVC.Name))

// Append the base64 encoding to the string `kubernetes-dynamic-pvc-`
rbdImageName = fmt.Sprintf("kubernetes-dynamic-pvc-%s", rbdImageName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned exposing PV name to rbd image name, for the following reason:

  • the image name could run out of limit (cc @dillaman for (krbd) image name length and validation)
  • it misses the randomness. Two PVCs with the same name (but different namespaces) then collide on PV name.
  • if the PVC name has special meaning (e.g. "my-highly-confidential-pvc"), the name is accidentally leaked. Admittedly this may not be a real use case though.

What about using special key/value pairs for the image, e.g. through rbd image-meta set? It doesn't solve the 3nd problem but it fixes the first two problems without regression.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rootfs Some older versions of librbd / krbd start having issues w/ names longer than 95 characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dillaman

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rootfs thanks for the pointer on image-meta , i think that looks much better solution since it doesnt change the existing rbd image name, allows for arbitrary key value to be stored and kinda hides the pvc information in the meta rather than making it obvious in the name. I just tried it and it works. I have updated the proposal, PTAL

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 20, 2018
@krmayankk krmayankk changed the title add proposal to make rbd image name non random add proposal to store additional metadata about rbd owner in rbd image Feb 21, 2018


// create RBD Image name
image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is image used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID())

// Create a base64 encoding of the PVC Namespace and Name
rbdImageName := b64.StdEncoding.EncodeToString([]byte(r.options.PVC.Name+"-"+r.options.PVC.Namespace))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- is not a delimiter, it doesn't help avoid collision. Try
r.options.PVC.Namespace+"/"+r.options.PVC.Name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this will break old krbd where image length cannot be over 95 bytes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed and moved this proposal at the bottom

of hiding the PVC name in the metadata rather than making it more obvious in the RBD image name. In
addition, Proposal 3, doesnt impose any limitations on the length of metadata that can be stored
and hence can accomodate any pvc names and namespaces which are stored as arbitrary key value pairs.
It also leaves room for storing any other metadata about the PVC.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved this proposal at the top as Finalized Proposal

@rootfs
Copy link
Contributor

rootfs commented Feb 21, 2018

let's go with rbd meta approach.

@krmayankk
Copy link
Author

@rootfs PTAL comments addressed. I will send out another PR for the implementation.

@krmayankk krmayankk force-pushed the runasg branch 2 times, most recently from 003161d to caf06e3 Compare March 6, 2018 05:57
@rootfs
Copy link
Contributor

rootfs commented Mar 14, 2018

/approve

@rootfs
Copy link
Contributor

rootfs commented Mar 14, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 14, 2018
@krmayankk
Copy link
Author

/assign @saad-ali

@saad-ali
Copy link
Member

Can you add a section about what happens when upgrading from a cluster without this logic to this logic. And what happens if I downgrade from a cluster with this logic to one without it?

Otherwise LGTM

The current implementation generates a UUID and the rbd image name becomes
image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID()). This RBD image
name is stored in the PV. The PV also has a reference to the PVC to which it binds.
The problem with this approach is that if there is a catastrophic etcd data loss

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just curious, if there is catastrophic etcd data loss, do you think with all other info lost, having deterministic volume name would help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have had use cases where you might want to import the volume in a new cluster and with out the meta-data details we don't know the origin of the volume.

@krmayankk
Copy link
Author

@sbezverk yes we will be able to reconstruct which rbd volumes belong to which customers and provide a fallback to let them copy the data or stitch the rbd image to pv/pvc mapping by hand . In our case everything else that we deploy in kubernetes has a source of truth in enterprise git so that data is not lost at all

@sbezverk
Copy link

@krmayankk thanks a lot for your reply. Make sense then.

@chakri-nelluri
Copy link
Contributor

@rootfs @saad-ali Isn't this kind of true for all plugins? Shall we make it generic for all plugins?

@krmayankk
Copy link
Author

@chakri-nelluri what do you mean by make it generic ? You mean move the part which generates the volume name inside the generic volume provisioner code rather than putting it in each volume provisioner ?

@chakri-nelluri
Copy link
Contributor

@krmayankk Yes. That's true. We would need it for all volumes which use dynamic provisioning.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2018

#### Downgrade from a K8s version with this metadata to a version without this metadata
After a downgrade, all existing RBD images will have the metadata set. New RBD images created after the
downgrade will not have this metadata.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saad-ali @rootfs could you review this again ? My proposal is to populate the meta on CreateImage and re-verify in AttachDisk, so that in the upgrade case, all older RBD images also get the image meta . If you have a better idea, please propose.

@saad-ali
Copy link
Member

@rootfs @saad-ali Isn't this kind of true for all plugins? Shall we make it generic for all plugins?

Can you add a section about what happens when upgrading from a cluster without this logic to this logic. And what happens if I downgrade from a cluster with this logic to one without it?

If backwards/forwards comparability review indicates that there are no issues with doing so, yes, this should apply to other volume plugins (e.g. GCE PD, AWS EBS, etc.). FWIW, we are planing to do this for for the external CSI provisioner: kubernetes-csi/external-provisioner#63

@krmayankk
Copy link
Author

krmayankk commented Mar 29, 2018

@saad-ali this proposal is talking about generating image names for image specific provisioners in a predictable manner or storing the pvc name as meta in the image name. The other proposal you reference is talking about generating pv names deterministically, which is still useful but not the same as this proposal . I dont see a way for us to be able to generate provisioner specific names deterministically mainly because different provsioners can have different unerlying limitations on naming etc, but maybe CSI should(if it doesnt arlreadty) exposes an api call to set meta and each proviosner can implement the way it chooses. For rbd it will be calling the api i propose here. the CSI api should be called and in many places it might be a noop if the underlying proviosner cannot implement it correctly.

@rootfs
Copy link
Contributor

rootfs commented Mar 29, 2018

For an out-of-band metadata change, this proposal doesn't change the existing rbd volume behavior. It remains to be seen whether such approach can be applied to other volume types. In the end, not every volume has out-of-band metadata.
Since this change is isolated to rbd, doesn't impact volume upgrade/downgrade process, and serves a use case for storage admin, I'd like to see it in coming releases.

@krmayankk please propose this change to Ceph CSI driver too.

@rootfs
Copy link
Contributor

rootfs commented Mar 29, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 29, 2018
@saad-ali
Copy link
Member

saad-ali commented Apr 4, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 4, 2018
@saad-ali
Copy link
Member

saad-ali commented Apr 4, 2018

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rootfs, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit e37d326 into kubernetes:master Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants