-
Notifications
You must be signed in to change notification settings - Fork 5.3k
add proposal to store additional metadata about rbd owner in rbd image #1790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| rbdImageName := b64.StdEncoding.EncodeToString([]byte(r.options.PVC.Name)) | ||
|
|
||
| // Append the base64 encoding to the string `kubernetes-dynamic-pvc-` | ||
| rbdImageName = fmt.Sprintf("kubernetes-dynamic-pvc-%s", rbdImageName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit concerned exposing PV name to rbd image name, for the following reason:
- the image name could run out of limit (cc @dillaman for (krbd) image name length and validation)
- it misses the randomness. Two PVCs with the same name (but different namespaces) then collide on PV name.
- if the PVC name has special meaning (e.g. "my-highly-confidential-pvc"), the name is accidentally leaked. Admittedly this may not be a real use case though.
What about using special key/value pairs for the image, e.g. through rbd image-meta set? It doesn't solve the 3nd problem but it fixes the first two problems without regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rootfs Some older versions of librbd / krbd start having issues w/ names longer than 95 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dillaman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rootfs thanks for the pointer on image-meta , i think that looks much better solution since it doesnt change the existing rbd image name, allows for arbitrary key value to be stored and kinda hides the pvc information in the meta rather than making it obvious in the name. I just tried it and it works. I have updated the proposal, PTAL
|
|
||
|
|
||
| // create RBD Image name | ||
| image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is image used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
| image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID()) | ||
|
|
||
| // Create a base64 encoding of the PVC Namespace and Name | ||
| rbdImageName := b64.StdEncoding.EncodeToString([]byte(r.options.PVC.Name+"-"+r.options.PVC.Namespace)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is not a delimiter, it doesn't help avoid collision. Try
r.options.PVC.Namespace+"/"+r.options.PVC.Name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, this will break old krbd where image length cannot be over 95 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed and moved this proposal at the bottom
| of hiding the PVC name in the metadata rather than making it more obvious in the RBD image name. In | ||
| addition, Proposal 3, doesnt impose any limitations on the length of metadata that can be stored | ||
| and hence can accomodate any pvc names and namespaces which are stored as arbitrary key value pairs. | ||
| It also leaves room for storing any other metadata about the PVC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved this proposal at the top as Finalized Proposal
|
let's go with rbd meta approach. |
|
@rootfs PTAL comments addressed. I will send out another PR for the implementation. |
003161d to
caf06e3
Compare
|
/approve |
|
/lgtm |
|
/assign @saad-ali |
|
Can you add a section about what happens when upgrading from a cluster without this logic to this logic. And what happens if I downgrade from a cluster with this logic to one without it? Otherwise LGTM |
| The current implementation generates a UUID and the rbd image name becomes | ||
| image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID()). This RBD image | ||
| name is stored in the PV. The PV also has a reference to the PVC to which it binds. | ||
| The problem with this approach is that if there is a catastrophic etcd data loss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just curious, if there is catastrophic etcd data loss, do you think with all other info lost, having deterministic volume name would help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have had use cases where you might want to import the volume in a new cluster and with out the meta-data details we don't know the origin of the volume.
|
@sbezverk yes we will be able to reconstruct which rbd volumes belong to which customers and provide a fallback to let them copy the data or stitch the rbd image to pv/pvc mapping by hand . In our case everything else that we deploy in kubernetes has a source of truth in enterprise git so that data is not lost at all |
|
@krmayankk thanks a lot for your reply. Make sense then. |
|
@chakri-nelluri what do you mean by make it generic ? You mean move the part which generates the volume name inside the generic volume provisioner code rather than putting it in each volume provisioner ? |
|
@krmayankk Yes. That's true. We would need it for all volumes which use dynamic provisioning. |
|
|
||
| #### Downgrade from a K8s version with this metadata to a version without this metadata | ||
| After a downgrade, all existing RBD images will have the metadata set. New RBD images created after the | ||
| downgrade will not have this metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If backwards/forwards comparability review indicates that there are no issues with doing so, yes, this should apply to other volume plugins (e.g. GCE PD, AWS EBS, etc.). FWIW, we are planing to do this for for the external CSI provisioner: kubernetes-csi/external-provisioner#63 |
|
@saad-ali this proposal is talking about generating image names for image specific provisioners in a predictable manner or storing the pvc name as meta in the image name. The other proposal you reference is talking about generating pv names deterministically, which is still useful but not the same as this proposal . I dont see a way for us to be able to generate provisioner specific names deterministically mainly because different provsioners can have different unerlying limitations on naming etc, but maybe CSI should(if it doesnt arlreadty) exposes an api call to set meta and each proviosner can implement the way it chooses. For rbd it will be calling the api i propose here. the CSI api should be called and in many places it might be a noop if the underlying proviosner cannot implement it correctly. |
|
For an out-of-band metadata change, this proposal doesn't change the existing rbd volume behavior. It remains to be seen whether such approach can be applied to other volume types. In the end, not every volume has out-of-band metadata. @krmayankk please propose this change to Ceph CSI driver too. |
|
/lgtm |
|
/lgtm |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rootfs, saad-ali The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@rootfs here is a proposal to make rbd image non random that we talked about some time back