KVMHAMonitor thread blocks indefinitely while NFS not available #2890

@csquire

ISSUE TYPE
  • Bug Report
COMPONENT NAME
KVM Agent
CLOUDSTACK VERSION
4.11.2.0-41120rc2
CONFIGURATION
OS / ENVIRONMENT
SUMMARY

Also see comment thread on PR #2722

We installed an RC release that includes PR #2722 on a test system and used iptables to drop NFS requests, expecting the host to be marked as `Disconnected`. Instead, the host is marked as `Down`. My investigation shows that the line `storage = conn.storagePoolLookupByUUIDString(uuid);` blocks indefinitely. As a result, `kvmheartbeat.sh` is never executed, a host investigation is started, the host with blocked NFS is marked as `Down`, and finally all VMs on that host are rescheduled, which results in duplicate VMs.

I pulled a thread dump and found that the KVMHAMonitor thread hangs at the following point until NFS is unblocked.

java.lang.Thread.State: RUNNABLE
      at com.sun.jna.Native.invokePointer(Native Method)
      at com.sun.jna.Function.invokePointer(Function.java:470)
      at com.sun.jna.Function.invoke(Function.java:404)
      at com.sun.jna.Function.invoke(Function.java:315)
      at com.sun.jna.Library$Handler.invoke(Library.java:212)
      at com.sun.proxy.$Proxy3.virStoragePoolLookupByUUIDString(Unknown Source)
      at org.libvirt.Connect.storagePoolLookupByUUIDString(Unknown Source)
      at com.cloud.hypervisor.kvm.resource.KVMHAMonitor$Monitor.runInContext(KVMHAMonitor.java:95)
      - locked <1afb3370> (a java.util.concurrent.ConcurrentHashMap)
      at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
      at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
      at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
      at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
      at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
      at java.lang.Thread.run(Thread.java:748)

 Locked ownable synchronizers:
      - None
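
One possible mitigation, sketched here only to illustrate the idea, is to run the blocking lookup on a separate thread and give up after a timeout, so the monitor loop can still reach the heartbeat step. This is not the CloudStack implementation: `BoundedLookup` and `callWithTimeout` are hypothetical names, and the `Callable` stands in for the `conn.storagePoolLookupByUUIDString(uuid)` call. Note that a thread stuck inside a native call generally cannot be interrupted, so the worker may stay blocked; the sketch only frees the caller.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedLookup {

    // Hypothetical helper: runs a possibly blocking call on a daemon worker
    // thread and returns Optional.empty() if it fails or does not finish in time.
    static <T> Optional<T> callWithTimeout(Callable<T> call, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-lookup");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            // The caller stops waiting; a worker blocked in native code may remain stuck.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a lookup that returns promptly.
        Optional<String> fast = callWithTimeout(() -> "pool-ok", 1, TimeUnit.SECONDS);
        System.out.println("fast: " + fast);

        // Stand-in for a lookup that hangs, as when NFS requests are dropped.
        Optional<String> slow = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "late";
        }, 200, TimeUnit.MILLISECONDS);
        System.out.println("slow: " + slow);
    }
}
```

The thread dump above also shows the lookup running while a monitor on a `ConcurrentHashMap` is held, so any real fix would need to consider whether waiting under that lock is acceptable.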
STEPS TO REPRODUCE

EXPECTED RESULTS
The host still runs kvmheartbeat.sh and shows as `Disconnected`
ACTUAL RESULTS
The host heartbeat hangs and the host gets marked as `Down` via host investigation
