-
Notifications
You must be signed in to change notification settings - Fork 5.3k
add proposal to store additional metadata about rbd owner in rbd image #1790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
128 changes: 128 additions & 0 deletions
128
contributors/design-proposals/storage/pv-to-rbd-mapping.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| # RBD Volume to PV Mapping | ||
|
|
||
| Authors: krmayankk@ | ||
|
|
||
| ### Problem | ||
|
|
||
| The RBD Dynamic Provisioner currently generates rbd volume names which are random. | ||
| The current implementation generates a UUID and the rbd image name becomes | ||
| image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID()). This RBD image | ||
| name is stored in the PV. The PV also has a reference to the PVC to which it binds. | ||
| The problem with this approach is that if there is a catastrophic etcd data loss | ||
| and all PV's are gone, there is no way to recover the mapping from RBD to PVC. The | ||
| RBD volumes for the customer still exist, but we have no way to tell which rbd | ||
| volumes belong to which customer. | ||
|
|
||
| ## Goal | ||
| We want to store some information about the PVC in RBD image name/metadata, so that | ||
| in catastrophic situations, we can derive the PVC name from rbd image name/metadata | ||
| and allow customer the following options: | ||
| - Backup RBD volume data for specific customers and hand them their copy before deleting | ||
| the RBD volume. Without knowing from rbd image name/metadata, which customers they | ||
| belong to we cannot hand those customers their data. | ||
| - Create PV with the given RBD name and pre-bind it to the desired PVC so that customer | ||
| can get its data back. | ||
|
|
||
| ## Non Goals | ||
| This proposal doesnt attempt to undermine the importance of etcd backups to restore | ||
| data in catastrophic situations. This is one additional line of defense in case our | ||
| backups are not working. | ||
|
|
||
| ## Motivation | ||
|
|
||
| We recently had an etcd data loss which resulted in loss of this rbd to pv mapping | ||
| and there was no way to restore customer data. This proposal aims to store pvc name | ||
| as metadata in the RBD image so that in catastrophic scenarios, the mapping can be | ||
| restored by just looking at the RBD's. | ||
|
|
||
| ## Current Implementation | ||
|
|
||
| ```go | ||
| func (r *rbdVolumeProvisioner) Provision() (*v1.PersistentVolume, error) { | ||
| ... | ||
|
|
||
| // create random image name | ||
| image := fmt.Sprintf("kubernetes-dynamic-pvc-%s", uuid.NewUUID()) | ||
| r.rbdMounter.Image = image | ||
| ``` | ||
| ## Finalized Proposal | ||
| Use `rbd image-meta set` command to store additional metadata in the RBD image about the PVC which owns | ||
| the RBD image. | ||
|
|
||
| `rbd image-meta set --pool hdd kubernetes-dynamic-pvc-fabd715f-0d24-11e8-91fa-1418774b3e9d pvcname <pvcname>` | ||
| `rbd image-meta set --pool hdd kubernetes-dynamic-pvc-fabd715f-0d24-11e8-91fa-1418774b3e9d pvcnamespace <pvcnamespace>` | ||
|
|
||
| ### Pros | ||
| - Simple to implement | ||
| - Does not cause regression in RBD image names, which remains same as earlier. | ||
| - The metada information is not immediately visible to RBD admins | ||
|
|
||
| ### Cons | ||
| - NA | ||
|
|
||
| Since this Proposal does not change the RBD image name and is able to store additional metadata about | ||
| the PVC to which it belongs, this is preferred over other two proposals. Also it does a better job | ||
| of hiding the PVC name in the metadata rather than making it more obvious in the RBD image name. The | ||
| metadata can only be seen by admins with appropriate permissions to run the rbd image-meta command. In | ||
| addition, this Proposal , doesnt impose any limitations on the length of metadata that can be stored | ||
| and hence can accommodate any pvc names and namespaces which are stored as arbitrary key value pairs. | ||
| It also leaves room for storing any other metadata about the PVC. | ||
|
|
||
|
|
||
| ### Upgrade/Downgrade Behavior | ||
|
|
||
| #### Upgrading from a K8s version without this metadata to a version with this metadata | ||
| The metadata for image is populated on CreateImage. After an upgrade, existing RBD Images will not have that | ||
| metadata set. When the next AttachDisk happens, we can check if the metadata is not set, set it. Cluster | ||
| administrators could also run a one time script to set this manually. For all newly created RBD images, | ||
| the rbd image metadata will be set properly. | ||
|
|
||
| #### Downgrade from a K8s version with this metadata to a version without this metadata | ||
| After a downgrade, all existing RBD images will have the metadata set. New RBD images created after the | ||
| downgrade will not have this metadata. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
|
||
| ## Proposal 1 | ||
|
|
||
| Make the RBD Image name as base64 encoded PVC name(namespace+name) | ||
|
|
||
| ```go | ||
| import b64 "encoding/base64" | ||
| ... | ||
|
|
||
|
|
||
| func (r *rbdVolumeProvisioner) Provision() (*v1.PersistentVolume, error) { | ||
| ... | ||
|
|
||
| // Create a base64 encoding of the PVC Namespace and Name | ||
| rbdImageName := b64.StdEncoding.EncodeToString([]byte(r.options.PVC.Name+"/"+r.options.PVC.Namespace)) | ||
|
|
||
| // Append the base64 encoding to the string `kubernetes-dynamic-pvc-` | ||
| rbdImageName = fmt.Sprintf("kubernetes-dynamic-pvc-%s", rbdImageName) | ||
| r.rbdMounter.Image = rbdImageName | ||
|
|
||
| ``` | ||
|
|
||
| ### Pros | ||
| - Simple scheme which encodes the fully qualified PVC name in the RBD image name | ||
|
|
||
| ### Cons | ||
| - Causes regression since RBD image names will change from one version of K8s to another. | ||
| - Some older versions of librbd/krbd start having issues with names longer than 95 characters. | ||
|
|
||
|
|
||
| ## Proposal 2 | ||
|
|
||
| Make the RBD Image name as the stringified PVC namespace plus PVC name. | ||
|
|
||
| ### Pros | ||
| - Simple to implement. | ||
|
|
||
| ### Cons | ||
| - Causes regression since RBD image names will change from one version of K8s to another. | ||
| - This exposes the customer name directly to Ceph Admins. Earlier it was hidden as base64 encoding | ||
|
|
||
|
|
||
| ## Misc | ||
| - Document how Pre-Binding of PV to PVC works in dynamic provisioning | ||
| - Document/Test if there are other issues with restoring PVC/PV after a | ||
| etcd backup is restored | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just curious, if there is
catastrophic etcd data loss, do you think with all other info lost, having deterministic volume name would help?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have had use cases where you might want to import the volume in a new cluster and with out the meta-data details we don't know the origin of the volume.