Skip to content

Conversation

@Pearl1594
Copy link
Contributor

Description

This PR attempts to fix: #6114

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

@nvazquez nvazquez added this to the 4.17.0.0 milestone Mar 14, 2022
@nvazquez
Copy link
Contributor

@blueorangutan package

@blueorangutan
Copy link

@nvazquez a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2869

@nvazquez
Copy link
Contributor

@blueorangutan test ubuntu20 kvm-ubuntu20

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (ubuntu20 mgmt + kvm-ubuntu20) has been kicked to run smoke tests

@weizhouapache
Copy link
Member

@Pearl1594 @nvazquez @sureshanaparti
this sounds like a critical issue in 4.16 as well
(it seems this operation is not supported in centos7 with stock qemu)

@weizhouapache
Copy link
Member

@Pearl1594 can you re-target to 4.16 ?

@Pearl1594
Copy link
Contributor Author

@blueorangutan test ubuntu20 kvm-ubuntu20

@blueorangutan
Copy link

@Pearl1594 a Trillian-Jenkins test job (ubuntu20 mgmt + kvm-ubuntu20) has been kicked to run smoke tests

@Pearl1594 Pearl1594 force-pushed the ubuntu-migration-fix branch from 3a21627 to 50b4073 Compare March 15, 2022 12:03
@Pearl1594 Pearl1594 changed the base branch from main to 4.16 March 15, 2022 12:03
@blueorangutan
Copy link

Trillian Build Failed (tid-3614)

@Pearl1594 Pearl1594 force-pushed the ubuntu-migration-fix branch from 927744b to a92bb31 Compare March 16, 2022 06:19
@weizhouapache weizhouapache added the Severity:Critical Critical bug label Mar 16, 2022
@weizhouapache
Copy link
Member

the issue seems to be caused by #5410
@GabrielBrascher could you please review this PR ?

@Pearl1594 I am wondering if this is better

        if (canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost) == StrategyPriority.CANT_HANDLE) {
            return StrategyPriority.CANT_HANDLE;
        }

@Pearl1594
Copy link
Contributor Author

I am not sure if it would be correct @weizhouapache (I may be wrong), but based on the 4.15.2 code , it looks like if the strategy returned by canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost is CANT_HANDLE, then there are some subsequent checks done before returning can't handle

if (canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost) == StrategyPriority.CANT_HANDLE) {
                Set<VolumeInfo> volumeInfoSet = volumeMap.keySet();

                for (VolumeInfo volumeInfo : volumeInfoSet) {
                    StoragePoolVO storagePoolVO = _storagePoolDao.findById(volumeInfo.getPoolId());
                    if (storagePoolVO.getPoolType() != StoragePoolType.Filesystem && storagePoolVO.getPoolType() != StoragePoolType.NetworkFilesystem) {
                        return StrategyPriority.CANT_HANDLE;
                    }
                }
            }

@weizhouapache
Copy link
Member

I am not sure if it would be correct @weizhouapache (I may be wrong), but based on the 4.15.2 code , it looks like if the strategy returned by canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost is CANT_HANDLE, then there are some subsequent checks done before returning can't handle

if (canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost) == StrategyPriority.CANT_HANDLE) {
                Set<VolumeInfo> volumeInfoSet = volumeMap.keySet();

                for (VolumeInfo volumeInfo : volumeInfoSet) {
                    StoragePoolVO storagePoolVO = _storagePoolDao.findById(volumeInfo.getPoolId());
                    if (storagePoolVO.getPoolType() != StoragePoolType.Filesystem && storagePoolVO.getPoolType() != StoragePoolType.NetworkFilesystem) {
                        return StrategyPriority.CANT_HANDLE;
                    }
                }
            }

@Pearl1594
OK, so the logic in this PR looks like same as what before #5410 , right ?

@GabrielBrascher
Copy link
Member

@weizhouapache thanks for pinging me.
I will double-check the file history to get some context and remember what changed over time and why.
Thanks for stepping up with this PR, @Pearl1594!

I believe that most of the work I've done on that PR was related to extracting a few validations added by @nvazquez in 0fbf500 and align with the workflow in case of Local volume + attached data disks. Might be good if @nvazquez can join us and double-check it as well.

I have some meetings and tasks to run. Get back as soon as I can to refresh my memory and re-debug this flow.

Regarding @Pearl1594`s comment in #6116 (comment)
@Pearl1594 I believe that you are right indeed. There are some special cases to be handled, with regards to being NFS migration or a local storage migration, which led to the subsequent checks you've referenced.

@Pearl1594
Copy link
Contributor Author

I am not sure if it would be correct @weizhouapache (I may be wrong), but based on the 4.15.2 code , it looks like if the strategy returned by canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost is CANT_HANDLE, then there are some subsequent checks done before returning can't handle

if (canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost) == StrategyPriority.CANT_HANDLE) {
                Set<VolumeInfo> volumeInfoSet = volumeMap.keySet();

                for (VolumeInfo volumeInfo : volumeInfoSet) {
                    StoragePoolVO storagePoolVO = _storagePoolDao.findById(volumeInfo.getPoolId());
                    if (storagePoolVO.getPoolType() != StoragePoolType.Filesystem && storagePoolVO.getPoolType() != StoragePoolType.NetworkFilesystem) {
                        return StrategyPriority.CANT_HANDLE;
                    }
                }
            }

@Pearl1594 OK, so the logic in this PR looks like same as what before #5410 , right ?

Yes @weizhouapache

@weizhouapache
Copy link
Member

I am not sure if it would be correct @weizhouapache (I may be wrong), but based on the 4.15.2 code , it looks like if the strategy returned by canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost is CANT_HANDLE, then there are some subsequent checks done before returning can't handle

if (canHandleKVMNonManagedLiveNFSStorageMigration(volumeMap, srcHost, destHost) == StrategyPriority.CANT_HANDLE) {
                Set<VolumeInfo> volumeInfoSet = volumeMap.keySet();

                for (VolumeInfo volumeInfo : volumeInfoSet) {
                    StoragePoolVO storagePoolVO = _storagePoolDao.findById(volumeInfo.getPoolId());
                    if (storagePoolVO.getPoolType() != StoragePoolType.Filesystem && storagePoolVO.getPoolType() != StoragePoolType.NetworkFilesystem) {
                        return StrategyPriority.CANT_HANDLE;
                    }
                }
            }

@Pearl1594 OK, so the logic in this PR looks like same as what before #5410 , right ?

Yes @weizhouapache

@Pearl1594
OK, looks good to me. let's wait for the smoke test results

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm

@Pearl1594
Copy link
Contributor Author

@blueorangutan test ubuntu20 kvm-ubuntu20

@blueorangutan
Copy link

@Pearl1594 a Trillian-Jenkins test job (ubuntu20 mgmt + kvm-ubuntu20) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian Build Failed (tid-3635)

@Pearl1594
Copy link
Contributor Author

@blueorangutan package

@blueorangutan
Copy link

@Pearl1594 a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2904

@Pearl1594
Copy link
Contributor Author

@blueorangutan test ubuntu20 kvm-ubuntu20

@blueorangutan
Copy link

@Pearl1594 a Trillian-Jenkins test job (ubuntu20 mgmt + kvm-ubuntu20) has been kicked to run smoke tests

Copy link
Member

@GabrielBrascher GabrielBrascher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will run some manual tests ASAP.
Waiting also for the smoke test results.
But code LGTM

@GabrielBrascher
Copy link
Member

@Pearl1594 @weizhouapache tested and it is working fine when migrating VMs with KVM + Root on Local.
In case tests are all good, I am LGTM in having this one merged.

@blueorangutan
Copy link

Trillian test result (tid-3636)
Environment: kvm-ubuntu20 (x2), Advanced Networking with Mgmt server u20
Total time taken: 32410 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6116-t3636-kvm-ubuntu20.zip
Smoke tests completed. 92 look OK, 0 have errors
Only failed tests results shown below:

Test Result Time (s) Test File

Copy link
Contributor

@nvazquez nvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @Pearl1594 and @GabrielBrascher

@nvazquez nvazquez marked this pull request as ready for review March 17, 2022 05:09
@nvazquez
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@nvazquez
Copy link
Contributor

Tests on ubuntu are fixed, started a new round of tests for centos to ensure no regressions are added - can be merged after that

@Pearl1594
Copy link
Contributor Author

@blueorangutan test

@blueorangutan
Copy link

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-3662)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 46441 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6116-t3662-kvm-centos7.zip
Smoke tests completed. 86 look OK, 6 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_add_primary_storage_disabled_host Error 0.58 test_primary_storage.py
test_01_primary_storage_nfs Error 0.10 test_primary_storage.py
ContextSuite context=TestStorageTags>:setup Error 0.18 test_primary_storage.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 581.46 test_vpc_redundant.py
test_02_redundant_VPC_default_routes Failure 343.92 test_vpc_redundant.py
test_reboot_router Failure 461.39 test_network.py
test_01_secure_vm_migration Error 161.28 test_vm_life_cycle.py
test_02_unsecure_vm_migration Error 266.01 test_vm_life_cycle.py
test_03_secured_to_nonsecured_vm_migration Error 142.81 test_vm_life_cycle.py
test_08_migrate_vm Error 44.96 test_vm_life_cycle.py
test_02_list_snapshots_with_removed_data_store Error 9.53 test_snapshots.py
test_02_list_snapshots_with_removed_data_store Error 9.53 test_snapshots.py
test_hostha_kvm_host_degraded Error 698.20 test_hostha_kvm.py
test_hostha_kvm_host_fencing Error 681.64 test_hostha_kvm.py

@nvazquez
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link

Trillian test result (tid-3673)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31731 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6116-t3673-kvm-centos7.zip
Smoke tests completed. 91 look OK, 1 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_hostha_enable_ha_when_host_in_maintenance Error 300.68 test_hostha_kvm.py

@nvazquez
Copy link
Contributor

Merging based on approvals and test results on ubuntu and KVM (error not related to the PR)

@nvazquez nvazquez merged commit f8b648b into apache:4.16 Mar 21, 2022
rohityadavcloud pushed a commit to shapeblue/cloudstack that referenced this pull request May 18, 2022
* Fix migration of VM with volume on Ubuntu

* address comment

(cherry picked from commit f8b648b)
Signed-off-by: Rohit Yadav <[email protected]>
rohityadavcloud pushed a commit to shapeblue/cloudstack that referenced this pull request May 18, 2022
* Fix migration of VM with volume on Ubuntu

* address comment

(cherry picked from commit f8b648b)
Signed-off-by: Rohit Yadav <[email protected]>
rohityadavcloud pushed a commit to shapeblue/cloudstack that referenced this pull request May 18, 2022
* Fix migration of VM with volume on Ubuntu

* address comment

(cherry picked from commit f8b648b)
Signed-off-by: Rohit Yadav <[email protected]>
rohityadavcloud pushed a commit to shapeblue/cloudstack that referenced this pull request May 18, 2022
* Fix migration of VM with volume on Ubuntu

* address comment

(cherry picked from commit f8b648b)
Signed-off-by: Rohit Yadav <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

Smoke test failed on ubuntu 20.04

7 participants