Conversation

@GabrielBrascher (Member) commented Nov 3, 2018

This PR proposes a new DataMotionStrategy, KvmNonManagedStorageDataMotionStrategy, that extends StorageSystemDataMotionStrategy. This strategy allows migrating KVM VMs whose ROOT volume is on local file-system primary storage (volumes under /var/lib/libvirt/images/).

This design also aims to ease the development of other non-managed KVM volume migrations, such as @nvazquez's implementation in PR #2983 (NFS to NFS).

@wido (Contributor) left a comment:

My main question is how this code handles an Instance that has a disk on local storage and one or more on shared storage such as NFS or Ceph.


for (VolumeInfo volumeInfo : volumeInfoSet) {
    StoragePoolVO storagePoolVO = _storagePoolDao.findById(volumeInfo.getPoolId());
    if (storagePoolVO.getPoolType() != StoragePoolType.Filesystem) {
Contributor:

What would happen if the VM has a ROOT volume on a local disk (Filesystem) and one or more data disks on NFS or Ceph, for example?

Member Author (@GabrielBrascher):

Unfortunately, this implementation does not work in such a case. The user would need to unplug the attached volume first and then migrate. I will check alternatives and ping you in case I find a better way. If I am not able to solve it, I will make sure to document it and send a proper log/message to the user.

Contributor (@wido):

Yes, if it doesn't work, we should throw a proper error message saying that we can't migrate the Instance as long as data disks are attached.

Check if numDataVolumes > 0, then throw an Exception.
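
A minimal sketch of that guard, assuming CloudStack's VolumeInfo/Volume.Type and CloudRuntimeException, with the volume set already collected for the migration; the names here are illustrative rather than final code (and, as the thread below shows, the PR ultimately ended up supporting attached data disks on shared storage instead):

    // Hypothetical guard, as suggested above: refuse the migration while data disks are attached.
    int numDataVolumes = 0;
    for (VolumeInfo volumeInfo : volumeInfoSet) {
        if (volumeInfo.getVolumeType() == Volume.Type.DATADISK) {
            numDataVolumes++;
        }
    }
    if (numDataVolumes > 0) {
        throw new CloudRuntimeException("Cannot migrate the instance while " + numDataVolumes
                + " data disk(s) are attached; detach the data disks and retry the migration.");
    }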

Member Author (@GabrielBrascher):

I agree with you, @wido, thanks for pointing that out.

Member:

Are you sure that it will not work?

I mean, the XML that describes the VM, and that is going to be used on the target host, will have an entry for a disk that is on a shared volume. Therefore, as long as that volume is connected to the target host, you should be good to go. That is the trick you are doing here, right? Because you are dealing with local storage, you need to first move/declare the volume on the target host, and then execute a migration command that takes care of the "copy/paste" of the local disk's contents to the target host.

Member Author (@GabrielBrascher):

@wido @rafaelweingartner I updated the code and tested it; it now migrates VMs with data disks attached on shared storage.
The added check can be found in StorageSystemDataMotionStrategy.copyAsync(Map, VirtualMachineTO, Host, Host, AsyncCompletionCallback):

                if (!shouldMigrateVolume(sourceStoragePool, destHost, destStoragePool)) {
                    continue;
                }
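
For readers of the thread, a rough sketch of the idea behind such a check (an illustration only, not necessarily the exact method body): volumes sitting on shared pools that the destination host can already reach are left in place, and only volumes on host-local Filesystem pools are copied along with the VM.

    // Illustrative sketch: only host-local (Filesystem) volumes need to be migrated
    // together with the VM; shared pools (NFS, Ceph, ...) are reachable from the
    // destination host, so their volumes can simply stay where they are.
    protected boolean shouldMigrateVolume(StoragePoolVO sourceStoragePool, Host destHost, StoragePoolVO destStoragePool) {
        return sourceStoragePool.getPoolType() == StoragePoolType.Filesystem;
    }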

Member:

Awesome. Thanks!

if (migrateDiskInfo != null && migrateDiskInfo.isSourceDiskOnStorageFileSystem()) {
    deleteLocalVolume(disk.getDiskPath());
} else {
    libvirtComputingResource.cleanupDisk(disk);
Contributor (@wido):

Would this clean up a disk that is non-local? Isn't there a potential danger that you remove a data disk?

Contributor:

isn't this the old behaviour anyway?

Member Author (@GabrielBrascher):

@wido the libvirtComputingResource.cleanupDisk(disk) call was already being executed. cleanupDisk just disconnects the disk from the source host after the VM has been migrated to another host; it works that way with iSCSI, for instance. I can also work on renaming/documenting the cleanupDisk method if you think that would be good.

The deleteLocalVolume method was added to handle the case where the disk is a file; in that case it deletes the file from the source host (local primary storage).
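
As a rough illustration of the deleteLocalVolume idea (a sketch only, not the code from this PR; the logger name is a placeholder):

    // Sketch: after a successful migration, remove the now-stale local image file
    // (e.g. under /var/lib/libvirt/images/) from the source host.
    private void deleteLocalVolume(String localPath) {
        try {
            java.nio.file.Files.deleteIfExists(java.nio.file.Paths.get(localPath));
        } catch (java.io.IOException e) {
            // A leftover file is preferable to failing the post-migration cleanup.
            logger.warn("Could not delete local volume file " + localPath, e);
        }
    }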

Member:

It will delete the disk only after a successful migration, right?


if (migrateStorage) {
-    flags |= 1 << 6;
+    flags += VIR_MIGRATE_NON_SHARED_DISK;
@nvazquez (Contributor), Nov 5, 2018:

Really like these changes 👍

Member:

Write this as flags |= VIR_MIGRATE_NON_SHARED_DISK. In general, enabling a flag is more readable as an OR (|) operation than as an addition.
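
A tiny example of why OR-ing is also safer than addition when building the bitmask (in libvirt, VIR_MIGRATE_NON_SHARED_DISK is 1 << 6, i.e. 64, and 128 is a different flag, VIR_MIGRATE_NON_SHARED_INC):

    int flags = 0;
    flags |= VIR_MIGRATE_NON_SHARED_DISK;   // flags == 64
    flags |= VIR_MIGRATE_NON_SHARED_DISK;   // still 64: OR is idempotent

    int flags2 = 0;
    flags2 += VIR_MIGRATE_NON_SHARED_DISK;  // flags2 == 64
    flags2 += VIR_MIGRATE_NON_SHARED_DISK;  // 128: accidentally a different libvirt flag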

@DaanHoogland (Contributor) left a comment:

Looks OK at first sight, but the change does not fit in my puny brain. Let's test extensively in all kinds of environments.

    return sourceText;
}

public boolean isSourceDiskOnStorageFileSystem() {
Contributor (@DaanHoogland):

For serialised objects like this, we'd better use wrapper classes instead of primitives; the Gson upgrade is going to demand it.

Member Author (@GabrielBrascher):

@DaanHoogland I was not aware of such an issue when using primitives. Shouldn't Gson work with primitives? Can you please share some details? I did some googling but could not find such information; if it is indeed a problem, I will also change all the primitives in MigrateCommand.java and MigrateDiskInfo.

Contributor (@DaanHoogland):

Old Gson does, but when we try to upgrade we'll have issues, so let's start using wrapper classes.

@GabrielBrascher (Member, Author), Nov 8, 2018:

@DaanHoogland, I was checking, and Gson 2.0 doesn't support type adapters for primitive types; however, the primitive types themselves are still supported.

The Gson documentation can mislead us into thinking that it does not support primitive types, but it does; what it does not support is registering type adapters for primitive types.

I did some testing with Gson 2.8.5 just to make sure; we can still marshal/unmarshal a primitive field with Gson 2.x.
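
For reference, a self-contained round-trip showing that a primitive boolean field serialises fine with a standalone Gson 2.x on the classpath (the DiskInfo class below is just an illustration, loosely mirroring MigrateDiskInfo's boolean):

    import com.google.gson.Gson;

    public class GsonPrimitiveCheck {
        // Illustrative DTO with a primitive field, similar in spirit to MigrateDiskInfo.
        static class DiskInfo {
            boolean sourceDiskOnStorageFileSystem;
        }

        public static void main(String[] args) {
            Gson gson = new Gson();
            DiskInfo info = new DiskInfo();
            info.sourceDiskOnStorageFileSystem = true;

            String json = gson.toJson(info);                     // {"sourceDiskOnStorageFileSystem":true}
            DiskInfo back = gson.fromJson(json, DiskInfo.class); // primitive round-trips fine
            System.out.println(json + " -> " + back.sourceDiskOnStorageFileSystem);
        }
    }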


@GabrielBrascher force-pushed the kvm-filesystem-datamotion-strategy branch 11 times, most recently from b64176f to 3d81ec4, on November 8, 2018 at 17:48.
private boolean migrateStorage;
private boolean autoConvergence;

/**
Member:

We need more of those, and please let's stop bit shifting.


@GabrielBrascher force-pushed the kvm-filesystem-datamotion-strategy branch from 3d81ec4 to 2f49d01 on November 9, 2018 at 13:58.
@rohityadavcloud (Member) left a comment:

Left some comments, overall lgtm!

@rohityadavcloud (Member):

Test run lgtm.

@GabrielBrascher force-pushed the kvm-filesystem-datamotion-strategy branch 2 times, most recently from 58bc835 to ebb81b1, on December 12, 2018 at 18:26.
@rohityadavcloud (Member):

Looks like some changes were force-pushed; will kick tests.
@blueorangutan package

@blueorangutan:

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@GabrielBrascher force-pushed the kvm-filesystem-datamotion-strategy branch from ebb81b1 to 039c1eb on December 12, 2018 at 18:32.
@blueorangutan:

Packaging result: ✖centos6 ✖centos7 ✖debian. JID-2488

@rohityadavcloud (Member):

@blueorangutan package

@blueorangutan:

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan:

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2489

@rohityadavcloud (Member):

@blueorangutan test

@blueorangutan:

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan:

Trillian test result (tid-3254)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 24251 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2997-t3254-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_virtio_scsi_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_multipleips_per_nic.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 68 look OK, 2 have error(s)
Only failed tests results shown below:

Test                                          Result   Time (s)   Test File
test_nic_secondaryip_add_remove               Error     33.40     test_multipleips_per_nic.py
test_04_rvpc_network_garbage_collector_nics   Failure  497.76     test_vpc_redundant.py

@GabrielBrascher (Member, Author):

Thanks for the feedback, @rhtyd, @DaanHoogland, @nvazquez, @wido, @rafaelweingartner, @mike-tutkowski! Trillian test is good, and we have 4 LGTMs. This one looks ready to merge.

The Travis failures are not related to this PR. I have been digging into what is causing Travis to fail on recent PRs.

@GabrielBrascher force-pushed the kvm-filesystem-datamotion-strategy branch from 039c1eb to 97771f2 on December 13, 2018 at 16:38.
@rafaelweingartner (Member):

Integration tests passed, reviews were made, and the requested adjustments were applied. The Travis problems are not related to this PR. Therefore, I am going to merge it.

@rafaelweingartner rafaelweingartner merged commit bf20940 into apache:master Dec 14, 2018
@rohityadavcloud (Member):

A few of my comments were not addressed or replied to, but I'll live with the changes. @GabrielBrascher @rafaelweingartner

@rafaelweingartner (Member):

@rhtyd which ones?

@rohityadavcloud (Member):

@rafaelweingartner go to the Files changed tab; there are a few minor comments, but nothing I cannot live with.

@rafaelweingartner (Member):

Well, I did check everything, and everything seems to have been answered/addressed already. There was only one comment of yours, regarding the flag building, where @GabrielBrascher used a more human-readable approach instead of shifting bits around. That seemed to be covered by the feedback of the community, which was very positive.

rohityadavcloud added a commit to shapeblue/cloudstack that referenced this pull request Sep 18, 2019
Considering the considerable effort that has gone into allowing KVM
to perform storage data motion, this PR proposes updating the
'hypervisor_capabilities' table, setting 'storage_motion_supported'
to '1' for KVM.

PRs that implemented KVM storage motion features:

Non-managed storages
 apache#2997 KVM VM live migration with ROOT volume on file storage type
 apache#2983 KVM live storage migration intra cluster from NFS source and destination
Managed storages
 apache#2298 CLOUDSTACK-9620: Enhancements for managed storage

Signed-off-by: Rohit Yadav <[email protected]>
rohityadavcloud added a commit that referenced this pull request Sep 19, 2019
- Removes CentOS6/el6 packaging (voting thread reference https://markmail.org/message/u3ka4hwn2lzwiero)
- Add upgrade path from 4.13 to 4.14
- Enable live storage migration support for KVM by default as el6 is deprecated
- PRs using live storage migration
  #2997 KVM VM live migration with ROOT volume on file storage type
  #2983 KVM live storage migration intra cluster from NFS source and destination
  #2298 CLOUDSTACK-9620: Enhancements for managed storage

Signed-off-by: Rohit Yadav <[email protected]>