KVM disk-only based snapshot of volumes instead of taking VM's full snapshot and extracting disks #5297
Conversation
RodrigoDLopez left a comment
Nice enhancement @GutoVeronezi
I would like it if you could explain your tests more clearly; I think just organizing your text will help a lot.
The code LGTM.
Thanks @GutoVeronezi, I'll post my review this week.
```java
 * 2nd parameter: disk's label (target.dev tag from VM's XML);
 * 3rd parameter: the absolute path of the base file;
 */
private static final String COMMAND_MERGE_SNAPSHOT = "virsh blockcommit %s %s --base %s --active --wait --delete --pivot";
```
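With hypothetical values for the three parameters, the format string expands to a command like:

```bash
# Hypothetical domain name, disk label, and base path.
virsh blockcommit i-2-10-VM vda --base /var/lib/libvirt/images/base.qcow2 --active --wait --delete --pivot
```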
@GutoVeronezi, one suggestion: the latest libvirt Java API bindings support qemu-monitor-command, and maybe the block-commit command could replace this.
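As a hedged sketch of that suggestion (not this PR's code): assuming the binding exposes Domain#qemuMonitorCommand(String, int) as the comment above implies, a QMP block-commit could look roughly like this; the domain name, device alias, and base path are hypothetical.

```java
import org.libvirt.Connect;
import org.libvirt.Domain;
import org.libvirt.LibvirtException;

public class QmpBlockCommitSketch {
    // Merges the active layer of the given device down into basePath via QMP,
    // mirroring what "virsh blockcommit ... --active" does.
    static String blockCommit(Connect conn, String vmName, String device, String basePath)
            throws LibvirtException {
        Domain domain = conn.domainLookupByName(vmName);
        String qmp = String.format(
                "{\"execute\": \"block-commit\", \"arguments\": {\"device\": \"%s\", \"base\": \"%s\"}}",
                device, basePath);
        // qemuMonitorCommand is assumed to exist in the binding version in use; flags 0 = default.
        return domain.qemuMonitorCommand(qmp, 0);
    }
}
```

Note that on a running domain an active commit becomes a block job that still needs a block-job-complete (the pivot step), which this sketch omits.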
I spoke offline with @GutoVeronezi, and the flag --delete does not work with the version of libvirt that I have:
Using library: libvirt 5.0.0
Using API: QEMU 5.0.0
Running hypervisor: QEMU 2.12.0
Thanks @slavkap, I will make some changes soon.
@RodrigoDLopez thanks for the review. I just updated the tests description. If there is anything to improve, let me know.
… only if libvirt version is equal to or higher than 6.0.0
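As an illustration of such a gate (a sketch, not the commit's actual code), assuming the libvirt Java binding's Connect#getLibVirVersion(), which encodes 6.0.0 as 6000000:

```java
import org.libvirt.Connect;
import org.libvirt.LibvirtException;

public class MergeCommandSketch {
    // libvirt versions are encoded as major * 1_000_000 + minor * 1_000 + release.
    private static final long LIBVIRT_6_0_0 = 6_000_000L;

    static String mergeCommandFor(Connect conn) throws LibvirtException {
        // --delete is only honored by libvirt >= 6.0.0, per the review finding above.
        return conn.getLibVirVersion() >= LIBVIRT_6_0_0
                ? "virsh blockcommit %s %s --base %s --active --wait --delete --pivot"
                : "virsh blockcommit %s %s --base %s --active --wait --pivot";
    }
}
```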
Regarding @slavkap's #5297 (comment), the feature…
```java
storagePoolMgr.getStoragePoolByURI(snapshotImageStore.getUrl());
}

if (primaryPool.getType() == StoragePoolType.CLVM) {
```
Does it need to be part of storagePoolTypesThatSupportRevertSnapshot?
@nvazquez reverting snapshots that existed only in primary storage was not working correctly for KVM. The RBD validation was added in PR #3502 to allow Ceph to revert snapshots that existed only in primary storage; however, the rest still did not work. This PR fixes that error for some other storage pool types (which were added to the validation); CLVM was not our focus, so we tried not to touch it.
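A minimal sketch of the shape of that validation; the pool types listed here are illustrative assumptions, not the exact set added by this PR:

```java
import java.util.Arrays;
import java.util.List;

import com.cloud.storage.Storage.StoragePoolType;

public class RevertSnapshotSupportSketch {
    // Illustrative list; the real one is defined by this PR.
    private static final List<StoragePoolType> storagePoolTypesThatSupportRevertSnapshot =
            Arrays.asList(StoragePoolType.NetworkFilesystem, StoragePoolType.Filesystem, StoragePoolType.RBD);

    static void validate(StoragePoolType poolType) {
        if (!storagePoolTypesThatSupportRevertSnapshot.contains(poolType)) {
            throw new UnsupportedOperationException(String.format(
                    "Reverting snapshots kept only in primary storage is not supported for pool type [%s].",
                    poolType));
        }
    }
}
```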
@blueorangutan package
nvazquez left a comment
Code LGTM - have not tested it
@blueorangutan package

@nvazquez a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan package

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Packaging result: ✖️ el7 ✖️ el8 ✖️ debian ✔️ suse15. SL-JID 3112

@blueorangutan package

@GutoVeronezi a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 3118

@blueorangutan test

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Trillian test result (tid-3853)

Merging based on approvals and test results
ACS + XenServer works with differential snapshots: ACS takes a full snapshot of the volume, and the next ones are referenced as children of the previous snapshot until the chain reaches the limit defined in the global setting snapshot.delta.max; then a new full snapshot is taken.

PR #5297 introduced disk-only snapshots for KVM volumes. Among the changes, the delete process was also refactored. Before the changes, when one removed a snapshot that had children, ACS marked it as Destroyed and kept the Image entry in the table cloud.snapshot_store_ref as Ready. When ACS rotated the snapshots (the max delta was reached) and all the children were already marked as removed, ACS would remove the whole hierarchy, completing the differential snapshot cycle. After the changes, snapshots with children stopped being marked as removed, and the differential snapshot cycle was no longer completed. This PR intends to honor the differential snapshot cycle for XenServer again, marking snapshots as removed when they are deleted while having children, and following the differential snapshot cycle.

Also, when one takes a volume snapshot and ACS backs it up to secondary storage, ACS inserts two entries in the table cloud.snapshot_store_ref (Primary and Image). When one deletes a volume snapshot, ACS first tries to remove the snapshot from secondary storage and mark the Image entry as removed; then it tries to remove the snapshot from primary storage and mark the Primary entry as removed. If ACS cannot remove the snapshot from primary storage, it keeps the snapshot as BackedUp, even though it no longer exists in secondary storage and there is no SNAPSHOT.DELETE entry in cloud.usage_event. In the end, after the garbage collector runs, the snapshot is marked as BackedUp, has a value in the removed field, and is still being rated. This PR also addresses this situation.

Co-authored-by: GutoVeronezi <[email protected]>
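To make that bookkeeping concrete, a query along these lines could be used to inspect the two entries; the column names are assumptions inferred from the description above, not verified against the schema:

```sql
-- Hypothetical column names; expect one 'Primary' and one 'Image' row per backed-up snapshot.
SELECT snapshot_id, store_role, state, removed
FROM cloud.snapshot_store_ref
WHERE snapshot_id = 42;  -- hypothetical snapshot id
```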
…g allows (#8062) This removes the conditional logic that a code comment noted should be removed after PR #5297 is merged, applicable for ACS 4.18+. A VM snapshot can be allowed for VMs on KVM that have qemu-guest-agent running only when the global setting is enabled and memory isn't selected. Signed-off-by: Rohit Yadav <[email protected]>
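A hedged reading of that allow-condition (all names here are hypothetical, not the commit's actual identifiers):

```java
public class KvmVmSnapshotPolicySketch {
    // Sketch of the condition described in the commit above: allowed only when the
    // global setting is enabled, memory isn't selected, and qemu-guest-agent is running.
    static boolean isVmSnapshotAllowed(boolean globalSettingEnabled, boolean snapshotIncludesMemory,
            boolean qemuGuestAgentRunning) {
        return globalSettingEnabled && !snapshotIncludesMemory && qemuGuestAgentRunning;
    }
}
```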
Description
This PR solves issue #5124.
The main focus is to change how volume snapshots are taken in KVM. However, this proposal to change the take snapshot workflow in KVM impacts other snapshot workflows as well; issue #5124 has more details and flowcharts of all the impacted workflows we mapped.
The focus of this PR is to refactor KVM snapshots; other hypervisors' workflows should not be affected.
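For context, a disk-only external snapshot of a single volume, the operation this PR moves to, corresponds roughly to the following virsh invocation (illustrative values only; the PR drives libvirt through its API rather than the CLI):

```bash
# Illustrative values: domain name, snapshot name, disk label, and file path are hypothetical.
virsh snapshot-create-as i-2-10-VM --name snap1 --disk-only --atomic --no-metadata \
    --diskspec vda,snapshot=external,file=/var/lib/libvirt/images/snap1.qcow2
```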
Types of changes: Feature/Enhancement
How Has This Been Tested?
I added unit tests for several methods and tested the mapped workflows, numbered 1 to 9 below:
These workflows were tested twice: once with the global setting `snapshot.backup.to.secondary` as `true` and once as `false`.

To test workflows 1, 2, and 3, I created a VM called `Test1`, then I opened its console and created a file, which I called `testfile`. At first, I wrote `gold` in this file and took a snapshot that I called `gold` (this tests workflow 1). Then I wrote `silver` to the file and configured a recurring snapshot. I waited for the recurring snapshot and, when it was taken, I changed its name in the database to `silver`, to facilitate the steps (this tests workflow 2). Finally, I wrote `copper` to the file, stopped the VM, and took a snapshot (this tests workflow 3). After that, I destroyed VM `Test1`.

To test workflow 4, I created a template from snapshot `silver` and called it `template silver`. From it, I created a VM called `Test2`, then I opened its console and retrieved the content of `testfile` with the command `cat`. It returned `silver`, which is the value the file had when I took the snapshot in workflow 2. After that, I stopped VM `Test2`.

To test workflow 5, I created a volume from snapshot `gold` and called it `volume gold`, then I changed the root volume of VM `Test2` to `volume gold`. I started VM `Test2` and retrieved the content of `testfile` with the command `cat`. It returned `gold`, which is the value the file had when I took the snapshot in workflow 1.

To test workflows 6 and 7, I wrote `platinum` to the file `testfile` in VM `Test2`, then I took a snapshot and called it `platinum`. I migrated the VM and the volume to another host and primary storage with the API `migrateVirtualMachineWithVolume`. When the global setting `snapshot.backup.to.secondary` was `true`, the migration succeeded; when it was `false`, an exception was thrown, warning the user about snapshots on primary storage (this tests workflow 6). I then stopped the VM and migrated only the volume to another primary storage. Again, when `snapshot.backup.to.secondary` was `true`, the migration succeeded; when it was `false`, an exception was thrown, warning the user about snapshots on primary storage (this tests workflow 7).

To test workflow 8, I started VM `Test2` and wrote `diamond` to the file `testfile`. Then I stopped VM `Test2` and reverted the volume to the snapshot `platinum`. I opened the console again and retrieved the content of `testfile` with the command `cat`. It returned `platinum`, which is the value the file had when I took the snapshot before workflow 6.

To test workflow 9, I deleted all the snapshots.
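As a hedged verification step for tests like these (not part of the PR's test description), the qcow2 backing chain can be inspected after snapshot and revert operations to confirm which layers exist and what they are backed by:

```bash
# Hypothetical volume path; prints each layer of the chain and its backing file.
qemu-img info --backing-chain /var/lib/libvirt/images/<volume-uuid>.qcow2
```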