CloudStack fails to migrate VM with volume when there are datadisks attached #5410
Conversation
This has been tested with KVM. Pinging some of the experts in managed storage (@mike-tutkowski @slavkap) to check if there are any cases I am missing where such volume mapping is correct. Also, I would appreciate feedback regarding different hypervisors, such as VMware and Xen (@rhtyd @shwstppr @nvazquez @DaanHoogland).
nvazquez left a comment
I think this would break at least VMware offline migrations (https://github.com/apache/cloudstack/blob/main/engine/orchestration/src/main/java/org/apache/cloudstack/engine/orchestration/VolumeOrchestrator.java#L1314) cc. @rhtyd @shwstppr @weizhouapache @sureshanaparti @davidjumani - any input on this?
} else {
    volumeToPoolObjectMap.put(volume, currentPool);
@GabrielBrascher as you have seen, allVolumes is a confusing name here; it only contains the volumes that are not mapped via the API. I've not tested it, but VMware might expect mappings for each of the VM's volumes.
Instead of preventing the map from being added here, I feel we should look into the code that performs the migration and skip volumes that are already on the destination pool. For KVM this could be:
https://github.com/apache/cloudstack/blob/main/engine/storage/datamotion/src/main/java/org/apache/cloudstack/storage/motion/StorageSystemDataMotionStrategy.java#L1787
https://github.com/apache/cloudstack/blob/main/engine/storage/datamotion/src/main/java/org/apache/cloudstack/storage/motion/StorageSystemDataMotionStrategy.java#L1978-L1980
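In rough, hypothetical code, that alternative amounts to filtering the volume-to-pool map before any copy starts, so that entries whose source pool already equals the requested destination pool are skipped. This is a minimal sketch under that assumption; the types and names below are simplified stand-ins, not the actual StorageSystemDataMotionStrategy code.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: skip volumes that are already on their requested destination pool,
// instead of never adding them to the map. "Volume" is a simplified stand-in for
// CloudStack's VolumeInfo/DataStore objects.
public class SkipAlreadyPlacedVolumes {

    record Volume(String uuid, String currentPoolUuid) {}

    // Returns only the entries that actually need a copy operation.
    static Map<Volume, String> volumesThatNeedMigration(Map<Volume, String> volumeToDestinationPoolUuid) {
        return volumeToDestinationPoolUuid.entrySet().stream()
                .filter(e -> !e.getKey().currentPoolUuid().equals(e.getValue()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Volume sharedDataDisk = new Volume("data-disk", "nfs-pool");
        Volume localRootDisk = new Volume("root-disk", "local-pool-src");
        Map<Volume, String> requested = Map.of(
                sharedDataDisk, "nfs-pool",        // already on its destination pool -> skipped
                localRootDisk, "local-pool-dst");  // needs to move -> kept
        System.out.println(volumesThatNeedMigration(requested).keySet()); // only root-disk remains
    }
}
```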
@blueorangutan package

@shwstppr a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1145
Tested offline migration on a VMware env; the volume for which no mapping was provided in the API call got detached from the VM in vCenter.
Force-pushed from 2925a11 to 9dfd1ec.
I've found some issues when having Local + NFS attached; with Local + RBD data disk everything works fine. I will check the workflow and report back on the NFS tests.
@blueorangutan package

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

Packaging result: ✖️ el7 ✖️ el8 ✖️ debian ✖️ suse15. SL-JID 1209

@blueorangutan package

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1230
I am running a couple more tests. However, considering that the 4.16 cut will happen soon, I might reduce the scope to fix only RBD (data disk) + Local (root) and then block migrations with NFS (data disk) + Local (root).
@blueorangutan test

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@nvazquez a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@blueorangutan test matrix

@sureshanaparti a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests
Trillian test result (tid-2340)

Trillian test result (tid-2339)

Trillian test result (tid-2341)

@blueorangutan test centos7 vmware-67u3

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

Trillian Build Failed (tid-2348)

@blueorangutan test centos7 vmware-67u3

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

Trillian test result (tid-2349)
nvazquez left a comment
LGTM
GutoVeronezi left a comment
LGTM
@sureshanaparti @vladimirpetrov @borisstoyanov @nvazquez do we need any further manual testing before merging this?

@rhtyd yes, it requires some manual tests for the VMware migration cases.

@GabrielBrascher @nvazquez @rhtyd Tested the following VM migration with volume(s) scenarios with VMware 6.7.0. All passed.
Noticed an issue (#5558) with the storage type/offering for the volume, not related to this PR.
Description
This PR fixes issue #5407. In summary, volumes that should not be migrated are still being mapped in VirtualMachineManagerImpl.createStoragePoolMappingsForVolumes.

To give some context: the method createStoragePoolMappingsForVolumes is only used when migrating a VM together with its volume(s), i.e. via the API call migrateVirtualMachineWithVolume. In such cases, there are two options:

A. The user does not provide volumes, just the VM ID and the host ID;
B. The user provides the VM ID, the host ID, and a volume map, e.g. migrateto[0].volume=71f43cd6-69b0-4d3b-9fbc-67f50963d60b&migrateto[0].pool=a382f181-3d2b-4413-b92d-b8931befa7e1&migrateto[1].volume=88de0173-55c0-4c1c-a269-83d0279eeedf&migrateto[1].pool=95d6e97c-6766-4d67-9a30-c449c15011d1.

However, regardless of case A or B, createStoragePoolMappingsForVolumes iterates over all of the VM's volumes and adds them to be migrated. This causes exceptions when migrating a VM with a data disk attached whenever the (shared) data disk should not be migrated.

From my point of view, it does not make sense to add volumes that the user has not mapped to be migrated, unless (I) the VM is migrated to a different cluster or (II) the volume scope is LOCAL. Both of these cases are already covered by the if statement preceding the line removed in this PR.
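A minimal sketch of that rule, for illustration only; the class, method, and parameter names below are hypothetical and far simpler than the real createStoragePoolMappingsForVolumes. The idea: an unmapped volume is added to the migration map only when it cannot stay where it is.

```java
import java.util.Objects;

// Illustrative sketch, not the actual CloudStack code: decide whether a volume that the
// user did NOT explicitly map should still be migrated together with the VM.
public class UnmappedVolumeMappingRule {

    // currentPoolClusterId: cluster of the pool the volume currently resides on
    // targetHostClusterId:  cluster of the destination host
    // volumeScopeIsLocal:   true if the volume sits on host-local storage
    static boolean shouldMigrateUnmappedVolume(Long currentPoolClusterId,
                                               Long targetHostClusterId,
                                               boolean volumeScopeIsLocal) {
        boolean differentCluster = !Objects.equals(currentPoolClusterId, targetHostClusterId);
        // Only (I) a cross-cluster migration or (II) local-scoped storage forces the move;
        // otherwise the (shared) data disk stays on its current pool.
        return differentCluster || volumeScopeIsLocal;
    }

    public static void main(String[] args) {
        // Shared data disk, same cluster: must NOT be added to the migration map.
        System.out.println(shouldMigrateUnmappedVolume(1L, 1L, false)); // false
        // Local root disk: must be migrated along with the VM.
        System.out.println(shouldMigrateUnmappedVolume(1L, 1L, true));  // true
    }
}
```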
How Has This Been Tested?
Migrated a VM with the root disk on local storage.

The tests have been done via the UI as well as the API, and covered both cases:

A. The user does not provide volumes, just the VM ID and the host ID;
B. The user provides the VM ID, the host ID, and a volume map, e.g. migrateto[0].volume=71f43cd6-69b0-4d3b-9fbc-67f50963d60b&migrateto[0].pool=a382f181-3d2b-4413-b92d-b8931befa7e1&migrateto[1].volume=88de0173-55c0-4c1c-a269-83d0279eeedf&migrateto[1].pool=95d6e97c-6766-4d67-9a30-c449c15011d1 (an example of the full request is sketched below).

In both cases, the volume map was built according to the expected behavior. VMs that previously failed to migrate were migrated successfully, and only the correct volume was moved; data disks remained attached to the VM, on their original storage pool, and in the "Ready" state.
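For reference, this is roughly what a case-B request looks like when assembled as a complete API call. The virtualmachineid and hostid values are placeholders (they are not given in this PR), the volume/pool UUIDs are the ones quoted above, and the usual apikey/signature parameters are omitted.

```
GET /client/api?command=migrateVirtualMachineWithVolume
    &virtualmachineid=<vm-uuid>
    &hostid=<destination-host-uuid>
    &migrateto[0].volume=71f43cd6-69b0-4d3b-9fbc-67f50963d60b
    &migrateto[0].pool=a382f181-3d2b-4413-b92d-b8931befa7e1
    &migrateto[1].volume=88de0173-55c0-4c1c-a269-83d0279eeedf
    &migrateto[1].pool=95d6e97c-6766-4d67-9a30-c449c15011d1
    &response=json
```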