-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix storage cleanup corner case preventing VM deletion #5575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2d8bccb to
eaa3fdb
Compare
|
@blueorangutan package |
|
@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
| destVol.setUuid(uuid); | ||
| destVol.setInstanceId(instanceId); | ||
| // Prevent storage cleanup corner case of ROOT volume deletion | ||
| srcVol.setVolumeType(Type.DATADISK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could that cause any regression, does this save the VO in DB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently testing for regressions. This saves the removed volume after storage migrations in DB as datadisk. @weizhouapache and I are considering the approach taken on restore VM as well: https://github.com/apache/cloudstack/blob/main/server/src/main/java/com/cloud/vm/UserVmManagerImpl.java#L7407-L7409
|
@blueorangutan package |
|
@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1563 |
|
@blueorangutan test matrix |
|
@nvazquez a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests |
|
Trillian test result (tid-2408)
|
|
Trillian test result (tid-2409)
|
|
Trillian test result (tid-2410)
|
weizhouapache
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm
engine/schema/src/main/java/com/cloud/storage/dao/VolumeDaoImpl.java
Outdated
Show resolved
Hide resolved
|
@blueorangutan package |
|
@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
DaanHoogland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the vmware storage processor addresses this in a harmfull way;
"to destroy a volume; get the vm and destroy that"
I don't think we can solve it there now, but it should be addressed there as well
it is formally right to detach the volume when it should no longer be associated with the VM so this CLGTM
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1568 |
|
@blueorangutan test matrix |
|
@nvazquez a Trillian-Jenkins matrix job (centos7 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests |
weizhouapache
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approved again
|
@blueorangutan package |
|
@vladimirpetrov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ el7 ✔️ el8 ✖️ debian ✔️ suse15. SL-JID 1572 |
|
@blueorangutan package |
|
@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
|
Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 1575 |
|
@blueorangutan test centos7 vmware-67u3 |
|
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests |
vladimirpetrov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM based on manual testing. Steps used:
(1) Create a VMWare 6.7u3 env with 2 pools (pool1, pool2).
(2) Create L2 network, and a VM. The created volume is on pool2.
(3) Put pool2 in maintenance mode, the VM is stopped.
(4) Start the VM. Its volume is copied to pool1.
(5) There are two records in the database with the same name ‘ROOT-xx’. Update the 'Expunged' volume like this:
MariaDB [cloud]> update volumes set removed=null,state='Destroy' where id=x;
(6) Change the following global settings, and restart the management server:
storage.cleanup.interval -> to 60 (seconds)
storage.cleanup.delay -> to 60 (seconds)
systemctl restart cloudstack-management
Result: Only the inactive volume is removed (expunged) and our VM with the active volume is still running fine.
2021-10-15 15:13:10,160 INFO [c.c.s.r.VmwareStorageProcessor] (DirectAgent-1:ctx-5a08de4b 10.0.35.14, cmd: DeleteCommand) (logid:5cbc532f) Executing resource DeleteCommand: {"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"volumeType":"DATADISK","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"43c3140e-8caf-38b8-9540-238fa5409707","name":"ref-trl-2009-v-M7-vladimir-petrov-esxi-pri2","id":2,"poolType":"NetworkFilesystem","host":"10.0.32.4","path":"/acs/primary/ref-trl-2009-v-M7-vladimir-petrov/ref-trl-2009-v-M7-vladimir-petrov-esxi-pri2","port":2049,"url":"NetworkFilesystem://10.0.32.4/acs/primary/ref-trl-2009-v-M7-vladimir-petrov/ref-trl-2009-v-M7-vladimir-petrov-esxi-pri2/?ROLE=Primary&STOREUUID=43c3140e-8caf-38b8-9540-238fa5409707","isManaged":false}},"name":"ROOT-6","size":2147483648,"path":"ROOT-6","volumeId":9,"accountId":2,"chainInfo":"{"diskDeviceBusName":"ide0:1","diskChain":["[43c3140e8caf38b89540238fa5409707] i-2-6-VM/ROOT-6.vmdk","[43c3140e8caf38b89540238fa5409707] e765b12f796432fc80cd4091e8c59f32/e765b12f796432fc80cd4091e8c59f32.vmdk"]}","format":"OVA","provisioningType":"THIN","poolId":2,"id":9,"hypervisorType":"VMware","directDownload":false,"deployAsIs":false}},"wait":0,"bypassHostMaintenance":false}
|
Trillian test result (tid-2422)
|
* Fix storage cleanup corner case * Improve deletion * Refactor
Description
This PR prevents a corner case for the storage cleanup task observed in Vmware in which:
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?