-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Closed
Milestone
Description
ISSUE TYPE
- Bug Report
COMPONENT NAME
- Storage & KVM
CLOUDSTACK VERSION
- 4.17.0.1
CONFIGURATION
- 1 X Management Server that hosts /primary & /secondary storage.
- 1 X KVM Server.
- Advanced Network - Default Security Group.
OS / ENVIRONMENT
Both Management & KVM run Almalinux 8
SUMMARY
- Making manual or recurring volume snapshots fails and times out after 3600 seconds. The timeout seems to happen once VM Snapshot is converted to qcow2.
STEPS TO REPRODUCE
- Run volume snapshot of linux VM while VM is online.
Here are a few log entries:
2022-10-17 20:51:30,536 WARN [o.a.c.f.j.i.AsyncJobMonitor] (Timer-0:ctx-ec4e885c) (logid:0ca5ca78) Task (job-956) has been pending for 3266 seconds
2022-10-17 20:51:30,568 DEBUG [c.c.s.StatsCollector] (secstorage-1:ctx-714f41f7) (logid:da002c15) Verifying image storage [1]. Capacity: total=[870 GB], used=[357 GB], threshold=[89.99999761581421%].
2022-10-17 20:51:30,570 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-714f41f7) (logid:da002c15) Zone [1] is ready to launch secondary storage VM.
2022-10-17 20:51:33,971 DEBUG [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-4:ctx-516d88d5) (logid:3305a3a6) HA health check task is running...
2022-10-17 20:51:35,610 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-15:null) (logid:) SeqA 5-1273: Processing Seq 5-1273: { Cmd , MgmtId: -1, via: 5, Ver: v1, Flags: 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":"11","_loadInfo":"{
2022-10-17 20:57:03,623 DEBUG [c.c.a.m.AgentAttache] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Seq 1-8475211548757852179: Waiting some more time because this is the current command
2022-10-17 20:57:03,626 WARN [c.c.a.m.AgentAttache] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Seq 1-8475211548757852179: Timed out on Seq 1-8475211548757852179: { Cmd , MgmtId: 2619780474, via: 1(KVMHOSTNAME), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.CreateObjectCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"volume":{"uuid":"635fd5b5-bf14-4cf4-979c-be8a649d5189","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"81ffca3a-9775-375d-a1c0-9504c0ec3d89","name":"Primary","id":"2","poolType":"NetworkFilesystem","host":"MGMNTSERVERIPADDRESS","path":"/mnt/primary","port":"2049","url":"NetworkFilesystem://MGMNTSERVERIPADDRESS/mnt/primary/?ROLE=Primary&STOREUUID=81ffca3a-9775-375d-a1c0-9504c0ec3d89","isManaged":"false"}},"name":"ROOT-19","size":"(50,00 GB) 53687091200","path":"635fd5b5-bf14-4cf4-979c-be8a649d5189","volumeId":"24","vmName":"i-2-19-VM","accountId":"2","format":"QCOW2","provisioningType":"THIN","poolId":"2","id":"24","deviceId":"0","cacheMode":"NONE","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"81ffca3a-9775-375d-a1c0-9504c0ec3d89","name":"Primary","id":"2","poolType":"NetworkFilesystem","host":"MGMNTSERVERIPADDRESS","path":"/mnt/primary","port":"2049","url":"NetworkFilesystem://MGMNTSERVERIPADDRESS/mnt/primary/?ROLE=Primary&STOREUUID=81ffca3a-9775-375d-a1c0-9504c0ec3d89","isManaged":"false"}},"vmName":"i-2-19-VM","name":"Test006","hypervisorType":"KVM","id":"27","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-17 20:57:03,626 DEBUG [c.c.a.m.AgentAttache] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Seq 1-8475211548757852179: Cancelling.
2022-10-17 20:57:03,626 DEBUG [o.a.c.s.RemoteHostEndPoint] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Failed to send command, due to Agent:1, com.cloud.exception.OperationTimedoutException: Commands 8475211548757852179 to Host 1 timed out after 3600
2022-10-17 20:57:03,627 DEBUG [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Failed to take snapshot: 27
2022-10-17 20:57:03,628 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) create snapshot Test006 failed: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:1, com.cloud.exception.OperationTimedoutException: Commands 8475211548757852179 to Host 1 timed out after 3600
2022-10-17 20:57:03,640 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Failed to take snapshot: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:1, com.cloud.exception.OperationTimedoutException: Commands 8475211548757852179 to Host 1 timed out after 3600
2022-10-17 20:57:03,646 DEBUG [c.c.s.s.SnapshotManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Failed to create snapshotcom.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:1, com.cloud.exception.OperationTimedoutException: Commands 8475211548757852179 to Host 1 timed out after 3600
2022-10-17 20:57:03,646 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Updating resource Type = snapshot count for Account = 2 Operation = decreasing Amount = 1
2022-10-17 20:57:03,650 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Updating resource Type = secondary_storage count for Account = 2 Operation = decreasing Amount = (50,00 GB) 53687091200
2022-10-17 20:57:03,654 ERROR [o.a.c.s.v.VolumeServiceImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Take snapshot: 24 failed
2022-10-17 20:57:03,655 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:1, com.cloud.exception.OperationTimedoutException: Commands 8475211548757852179 to Host 1 timed out after 3600
2022-10-17 20:57:03,655 INFO [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956 ctx-3220426d) (logid:917cdd39) Rethrow exception com.cloud.utils.exception.CloudRuntimeException: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:1, com.cloud.exception.OperationTimedoutException: Commands 8475211548757852179 to Host 1 timed out after 3600
2022-10-17 20:57:03,656 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Done with run of VM work job: com.cloud.vm.VmWorkTakeVolumeSnapshot for VM 19, job origin: 955
2022-10-17 20:57:03,656 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Unable to complete AsyncJobVO: {id:956, userId: 2, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkTakeVolumeSnapshot, cmdInfo: rO0ABXNyACVjb20uY2xvdWQudm0uVm1Xb3JrVGFrZVZvbHVtZVNuYXBzaG90BL5gG4Li1c8CAAZaAAthc3luY0JhY2t1cFoACXF1aWVzY2VWbUwADGxvY2F0aW9uVHlwZXQAKUxjb20vY2xvdWQvc3RvcmFnZS9TbmFwc2hvdCRMb2NhdGlvblR5cGU7TAAIcG9saWN5SWR0ABBMamF2YS9sYW5nL0xvbmc7TAAKc25hcHNob3RJZHEAfgACTAAIdm9sdW1lSWRxAH4AAnhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAE3QAFFZvbHVtZUFwaVNlcnZpY2VJbXBsAABwc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAAc3EAfgAHAAAAAAAAABtzcQB-AAcAAAAAAAAAGA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2619780474, completeMsid: null, lastUpdated: null, lastPolled: null, created: Mon Oct 17 19:57:03 SAST 2022, removed: null}, job origin:955
2022-10-17 20:57:03,667 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Complete async job-956, jobStatus: FAILED, resultCode: 0, result: rO0ABXNyAC9jb20uY2xvdWQudXRpbHMuZXhjZXB0aW9uLkNsb3VkUnVudGltZUV4Y2VwdGlvbgAAAABWTU9yAwABSQALY3NFcnJvckNvZGV4cgAaamF2YS5sYW5nLlJ1bnRpbWVFeGNlcHRpb26eXwZHCjSD5QIAAHhyABNqYXZhLmxhbmcuRXhjZXB0aW9u0P0fPho7HMQCAAB4cgATamF2YS5sYW5nLlRocm93YWJsZdXGNSc5d7jLAwAETAAFY2F1c2V0ABVMamF2YS9sYW5nL1Rocm93YWJsZTtMAA1kZXRhaWxNZXNzYWdldAASTGphdmEvbGFuZy9TdHJpbmc7WwAKc3RhY2tUcmFjZXQAHltMamF2YS9sYW5nL1N0YWNrVHJhY2VFbGVtZW50O0wAFHN1cHByZXNzZWRFeGNlcHRpb25zdAAQTGphdmEvdXRpbC9MaXN0O3hwcQB-AAh0AMRjb20uY2xvdWQudXRpbHMuZXhjZXB0aW9uLkNsb3VkUnVudGltZUV4Y2VwdGlvbjogRmFpbGVkIHRvIHNlbmQgY29tbWFuZCwgZHVlIHRvIEFnZW50OjEsIGNvbS5jbG91ZC5leGNlcHRpb24uT3BlcmF0aW9uVGltZWRvdXRFeGNlcHRpb246IENvbW1hbmRzIDg0NzUyMTE1NDg3NTc4NTIxNzkgdG8gSG9zdCAxIHRpbWVkIG91dCBhZnRlciAzNjAwdXIAHltMamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW50OwJGKjw8_SI5AgAAeHAAAAAuc3IAG2phdmEubGFuZy5TdGFja1RyYWNlRWxlbWVudGEJxZomNt2FAgAIQgAGZm9ybWF0SQAKbGluZU51bWJlckwAD2NsYXNzTG9hZGVyTmFtZXEAfgAFTAAOZGVjbGFyaW5nQ2xhc3NxAH4ABUwACGZpbGVOYW1lcQB-AAVMAAptZXRob2ROYW1lcQB-AAVMAAptb2R1bGVOYW1lcQB-AAVMAA1tb2R1bGVWZXJzaW9ucQB-AAV4cAEAAAG6dAADYXBwdAA-b3JnLmFwYWNoZS5jbG91ZHN0YWNrLnN0b3JhZ2Uuc25hcHNob3QuRGVmYXVsdFNuYXBzaG90U3RyYXRlZ3l0ABxEZWZhdWx0U25hcHNob3RTdHJhdGVneS5qYXZhdAAMdGFrZVNuYXBzaG90cHBzcQB-AAwBAAAE2HEAfgAOdAAuY29tLmNsb3VkLnN0b3JhZ2Uuc25hcHNob3QuU25hcHNob3RNYW5hZ2VySW1wbHQAGFNuYXBzaG90TWFuYWdlckltcGwuamF2YXEAfgARcHBzcQB-AAwC_____nB0AC1qZGsuaW50ZXJuYWwucmVmbGVjdC5OYXRpdmVNZXRob2RBY2Nlc3NvckltcGx0AB1OYXRpdmVNZXRob2RBY2Nlc3NvckltcGwuamF2YXQAB2ludm9rZTB0AAlqYXZhLmJhc2V0AAkxMS4wLjE2LjFzcQB-AAwCAAAAPnBxAH4AFnEAfgAXdAAGaW52b2tlcQB-ABlxAH4AGnNxAH4ADAIAAAArcHQAMWpkay5pbnRlcm5hbC5yZWZsZWN0LkRlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcGx0ACFEZWxlZ2F0aW5nTWV0aG9kQWNjZXNzb3JJbXBsLmphdmFxAH4AHHEAfgAZcQB-ABpzcQB-AAwCAAACNnB0ABhqYXZhLmxhbmcucmVmbGVjdC5NZXRob2R0AAtNZXRob2QuamF2YXEAfgAccQB-ABlxAH4AGnNxAH4ADAEAAAFYcQB-AA50AChvcmcuc3ByaW5nZnJhbWV3b3JrLmFvcC5zdXBwb3J0LkFvcFV0aWxzdAANQW9wVXRpbHMuamF2YXQAHmludm9rZUpvaW5wb2ludFVzaW5nUmVmbGVjdGlvbnBwc3EAfgAMAQAAAMZxAH4ADnQAPG9yZy5zcHJpbmdmcmFtZXdvcmsuYW9wLmZyYW1ld29yay5SZWZsZWN0aXZlTWV0aG9kSW52b2NhdGlvbnQAH1JlZmxlY3RpdmVNZXRob2RJbnZvY2F0aW9uLmphdmF0AA9pbnZva2VKb2lucG9pbnRwcHNxAH4ADAEAAACjcQB-AA5xAH4AKHEAfgApdAAHcHJvY2VlZHBwc3EAfgAMAQAAAGFxAH4ADnQAP29yZy5zcHJpbmdmcmFtZXdvcmsuYW9wLmludGVyY2VwdG9yLkV4cG9zZUludm9jYXRpb25JbnRlcmNlcHRvcnQAIEV4cG9zZUludm9jYXRpb25JbnRlcmNlcHRvci5qYXZhcQB-ABxwcHNxAH4ADAEAAAC6cQB-AA5xAH4AKHEAfgApcQB-ACxwcHNxAH4ADAEAAADXcQB-AA50ADRvcmcuc3ByaW5nZnJhbWV3b3JrLmFvcC5mcmFtZXdvcmsuSmRrRHluYW1pY0FvcFByb3h5dAAXSmRrRHluYW1pY0FvcFByb3h5LmphdmFxAH4AHHBwc3EAfgAMAP____9wdAAXY29tLnN1bi5wcm94eS4kUHJveHkyMjFwcQB-ABFwcHNxAH4ADAEAAAqDcQB-AA50ADZvcmcuYXBhY2hlLmNsb3Vkc3RhY2suc3RvcmFnZS52b2x1bWUuVm9sdW1lU2VydmljZUltcGx0ABZWb2x1bWVTZXJ2aWNlSW1wbC5qYXZhcQB-ABFwcHNxAH4ADAEAAAzLcQB-AA50ACZjb20uY2xvdWQuc3RvcmFnZS5Wb2x1bWVBcGlTZXJ2aWNlSW1wbHQAGVZvbHVtZUFwaVNlcnZpY2VJbXBsLmphdmF0AB1vcmNoZXN0cmF0ZVRha2VWb2x1bWVTbmFwc2hvdHBwc3EAfgAMAQAAEONxAH4ADnEAfgA6cQB-ADtxAH4APHBwc3EAfgAMAv____5wcQB-ABZxAH4AF3EAfgAYcQB-ABlxAH4AGnNxAH4ADAIAAAA-cHEAfgAWcQB-ABdxAH4AHHEAfgAZcQB-ABpzcQB-AAwCAAAAK3BxAH4AHnEAfgAfcQB-ABxxAH4AGXEAfgAac3EAfgAMAgAAAjZwcQB-ACFxAH4AInEAfgAccQB-ABlxAH4AGnNxAH4ADAEAAABrcQB-AA50ACJjb20uY2xvdWQudm0uVm1Xb3JrSm9iSGFuZGxlclByb3h5dAAaVm1Xb3JrSm9iSGFuZGxlclByb3h5LmphdmF0AA9oYW5kbGVWbVdvcmtKb2JwcHNxAH4ADAEAABDpcQB-AA5xAH4AOnEAfgA7cQB-AEVwcHNxAH4ADAL____-cHEAfgAWcQB-ABdxAH4AGHEAfgAZcQB-ABpzcQB-AAwCAAAAPnBxAH4AFnEAfgAXcQB-ABxxAH4AGXEAfgAac3EAfgAMAgAAACtwcQB-AB5xAH4AH3EAfgAccQB-ABlxAH4AGnNxAH4ADAIAAAI2cHEAfgAhcQB-ACJxAH4AHHEAfgAZcQB-ABpzcQB-AAwBAAABWHEAfgAOcQB-ACRxAH4AJXEAfgAmcHBzcQB-AAwBAAAAxnEAfgAOcQB-AChxAH4AKXEAfgAqcHBzcQB-AAwBAAAAo3EAfgAOcQB-AChxAH4AKXEAfgAscHBzcQB-AAwBAAAAYXEAfgAOcQB-AC5xAH4AL3EAfgAccHBzcQB-AAwBAAAAunEAfgAOcQB-AChxAH4AKXEAfgAscHBzcQB-AAwBAAAA13EAfgAOcQB-ADJxAH4AM3EAfgAccHBzcQB-AAwA_____3B0ABdjb20uc3VuLnByb3h5LiRQcm94eTIyOXBxAH4ARXBwc3EAfgAMAQAAAGZxAH4ADnQAIGNvbS5jbG91ZC52bS5WbVdvcmtKb2JEaXNwYXRjaGVydAAYVm1Xb3JrSm9iRGlzcGF0Y2hlci5qYXZhdAAGcnVuSm9icHBzcQB-AAwBAAACbHEAfgAOdAA_b3JnLmFwYWNoZS5jbG91ZHN0YWNrLmZyYW1ld29yay5qb2JzLmltcGwuQXN5bmNKb2JNYW5hZ2VySW1wbCQ1dAAYQXN5bmNKb2JNYW5hZ2VySW1wbC5qYXZhdAAMcnVuSW5Db250ZXh0cHBzcQB-AAwBAAAAMHEAfgAOdAA-b3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29udGV4dFJ1bm5hYmxlJDF0ABtNYW5hZ2VkQ29udGV4dFJ1bm5hYmxlLmphdmF0AANydW5wcHNxAH4ADAEAAAA3cQB-AA50AEJvcmcuYXBhY2hlLmNsb3Vkc3RhY2subWFuYWdlZC5jb250ZXh0LmltcGwuRGVmYXVsdE1hbmFnZWRDb250ZXh0JDF0ABpEZWZhdWx0TWFuYWdlZENvbnRleHQuamF2YXQABGNhbGxwcHNxAH4ADAEAAABmcQB-AA50AEBvcmcuYXBhY2hlLmNsb3Vkc3RhY2subWFuYWdlZC5jb250ZXh0LmltcGwuRGVmYXVsdE1hbmFnZWRDb250ZXh0cQB-AGF0AA9jYWxsV2l0aENvbnRleHRwcHNxAH4ADAEAAAA0cQB-AA5xAH4AZHEAfgBhdAAOcnVuV2l0aENvbnRleHRwcHNxAH4ADAEAAAAtcQB-AA50ADxvcmcuYXBhY2hlLmNsb3Vkc3RhY2subWFuYWdlZC5jb250ZXh0Lk1hbmFnZWRDb250ZXh0UnVubmFibGVxAH4AXXEAfgBecHBzcQB-AAwBAAACOHEAfgAOcQB-AFhxAH4AWXEAfgBecHBzcQB-AAwCAAACA3B0AC5qYXZhLnV0aWwuY29uY3VycmVudC5FeGVjdXRvcnMkUnVubmFibGVBZGFwdGVydAAORXhlY3V0b3JzLmphdmFxAH4AYnEAfgAZcQB-ABpzcQB-AAwCAAABCHB0AB9qYXZhLnV0aWwuY29uY3VycmVudC5GdXR1cmVUYXNrdAAPRnV0dXJlVGFzay5qYXZhcQB-AF5xAH4AGXEAfgAac3EAfgAMAgAABGhwdAAnamF2YS51dGlsLmNvbmN1cnJlbnQuVGhyZWFkUG9vbEV4ZWN1dG9ydAAXVGhyZWFkUG9vbEV4ZWN1dG9yLmphdmF0AAlydW5Xb3JrZXJxAH4AGXEAfgAac3EAfgAMAgAAAnRwdAAuamF2YS51dGlsLmNvbmN1cnJlbnQuVGhyZWFkUG9vbEV4ZWN1dG9yJFdvcmtlcnEAfgBzcQB-AF5xAH4AGXEAfgAac3EAfgAMAgAAAz1wdAAQamF2YS5sYW5nLlRocmVhZHQAC1RocmVhZC5qYXZhcQB-AF5xAH4AGXEAfgAac3IAH2phdmEudXRpbC5Db2xsZWN0aW9ucyRFbXB0eUxpc3R6uBe0PKee3gIAAHhweAAAEJp3CAAAAAAAAAAAeA
2022-10-17 20:57:03,668 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Publish async job-956 complete on message bus
2022-10-17 20:57:03,668 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Wake up jobs related to job-956
2022-10-17 20:57:03,668 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Update db status for job-956
2022-10-17 20:57:03,670 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Wake up jobs joined with job-956 and disjoin all subjobs created from job- 956
2022-10-17 20:57:03,676 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Done executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-956
2022-10-17 20:57:03,677 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-1:ctx-ecdbd329 job-955/job-956) (logid:917cdd39) Remove job-956 from job monitoring
2022-10-17 20:57:03,688 ERROR [o.a.c.a.c.u.s.CreateSnapshotCmd] (API-Job-Executor-1:ctx-4c0df155 job-955 ctx-04b40830) (logid:917cdd39) Failed to create snapshot due to an internal error creating snapshot for volume 635fd5b5-bf14-4cf4-979c-be8a649d5189
2022-10-17 20:57:03,696 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-4c0df155 job-955) (logid:917cdd39) Complete async job-955, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to create snapshot due to an internal error creating snapshot for volume 635fd5b5-bf14-4cf4-979c-be8a649d5189"}
2022-10-17 20:57:03,696 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-4c0df155 job-955) (logid:917cdd39) Publish async job-955 complete on message bus
2022-10-17 20:57:03,696 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-4c0df155 job-955) (logid:917cdd39) Wake up jobs related to job-955
2022-10-17 20:57:03,696 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-4c0df155 job-955) (logid:917cdd39) Update db status for job-955
2022-10-17 20:57:03,697 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-4c0df155 job-955) (logid:917cdd39) Wake up jobs joined with job-955 and disjoin all subjobs created from job- 955
2022-10-17 20:57:03,699 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-4c0df155 job-955) (logid:917cdd39) Done executing org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd for job-955
2022-10-17 20:57:03,699 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-1:ctx-4c0df155 job-955) (logid:917cdd39) Remove job-955 from job monitoring
EXPECTED RESULTS
- Volume snapshot is created and transported to primary storage.
ACTUAL RESULTS
- Volume Snap process can successfully create VM Snapshot, however, when conversion to qcow2 starts the process hangs and times out after 3600 seconds. Checking the logs, it does not ever reach the point in which the snapshot is copied to the snapshots directory on primary storage and it may be taking too long. Currently, the operation is being timeout after 3600 seconds. We increased the
waittime from 1800 seconds to 3600 seconds but made no difference, the time out continues to stop at 3600 seconds. - I was recommended:
If you have taken vm snapshot (or volume snapshot on running vms) multiple times, The QCOW2 image might be very big. The whole process (including vm snapshotting and qemu-img convert) might exceed the timeout limit (3600 seconds = 1 hour).But there's not a lot of snaps:
root@HOSTNAME ~ $ virsh snapshot-list i-2-19-VM
Name Creation Time State
-----------------------------------------------------------------------------------
0843e6b1-cb12-495c-8d5f-35c9a3338db7 2022-10-17 18:41:08 +0200 disk-snapshot
2e77acb7-1be4-4417-858e-a1d0890f44f0 2022-10-17 19:57:03 +0200 disk-snapshot
68476e2a-aec8-4637-9c4b-2dc49d79bd82 2022-10-18 09:44:21 +0200 disk-snapshot
c110058b-4661-42c0-b648-a54f4ba4beb7 2022-10-17 22:34:21 +0200 disk-snapshot
i-2-19-VM_VS_20221018072924 2022-10-18 09:29:25 +0200 running
root@HOSTNAME /mnt/81ffca3a-9775-375d-a1c0-9504c0ec3d89 $ du * -sh
48G 340cc676-4683-4424-beb7-8b33a5cb3dfe
42G 4a56546f-4f63-41c2-a314-27619fe5c2dc
15G 635fd5b5-bf14-4cf4-979c-be8a649d5189
21G 70ffef79-3aad-43ab-93d1-153fcd34ad15
26G 777b5149-50c0-4c0d-944b-873050c85419
20G 867a61e8-4635-40e4-b1b1-c94b2a325b49
4.8G b21551c6-8276-4e58-841e-4e5a02c0e163
11G d2c4feca-da5c-44b3-8ecc-40a01fd6c243
40G f50c0bca-3ae2-48ee-aa9d-0e2d4ec6ba92
24G f9336d09-0b3d-42d8-9a24-883b022b6712
8.9G f962cb2c-e03b-4ed2-a035-1870a367aa46
47G fd573618-885e-48c7-86e6-b019e2311d76
8.0K KVMHA
15G snapshots
root@HOSTNAME /mnt/81ffca3a-9775-375d-a1c0-9504c0ec3d89 $
I have started this matter via the email newsletter "Recurring Snaps Failing", and we have tested multiple solutions with no fix in sight.