KVM cluster with NFS primary storage – VM HA not working when host is powered down #11674
-
Thanks for opening your first issue here! Be sure to follow the issue template!
-
Could you please try the steps mentioned in this link? cc @rajujith
-
VM-HA functions properly, but only when HOST-HA is disabled. When HOST-HA is also enabled on the hosts, the log contains the entries mentioned above, and the VMs fail to start on the healthy hosts even after several hours of waiting.
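If it helps to confirm that behaviour, below is a minimal sketch (my own untested helper, not part of CloudStack's tooling) of disabling Host HA for the affected host through the API so that only VM HA is in play. The endpoint and keys are placeholders, the host UUID is the one from the log further down, and disableHAForHost / enableHAForHost are the Host HA framework APIs.

```python
# Minimal sketch, assuming a reachable management server and valid API/secret keys
# (all placeholders below). Signs a CloudStack API request the standard way
# (sorted params, URL-encoded values, lowercased query, HMAC-SHA1, base64) and
# calls disableHAForHost for one host.
import base64
import hashlib
import hmac
import urllib.parse

import requests

ENDPOINT = "http://mgmt.example.com:8080/client/api"  # placeholder
API_KEY = "your-api-key"                              # placeholder
SECRET_KEY = "your-secret-key"                        # placeholder

def call(command, **kwargs):
    """Sign and send one CloudStack API call, returning the parsed JSON response."""
    params = {"command": command, "response": "json", "apikey": API_KEY, **kwargs}
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}" for k, v in sorted(params.items())
    ).lower()
    digest = hmac.new(SECRET_KEY.encode(), query.encode(), hashlib.sha1).digest()
    params["signature"] = base64.b64encode(digest).decode()
    resp = requests.get(ENDPOINT, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Host UUID taken from the log snippet in this report; re-enable later with enableHAForHost.
print(call("disableHAForHost", id="f8f86177-f0e3-4994-8609-dd55e0e35a3e"))
```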
-
That's right, @akoskuczi-bw, VM-HA and host-HA do not play nice together. This is a known issue.
-
Problem
In a KVM cluster with NFS primary storage, VM HA does not work when a host is powered down.
Expected behavior
VMs from the failed host should be restarted on other available hosts in the cluster.
Actual behavior
The powered-down host is reported with state Down and HA state Fenced, but its VMs are not restarted on the other hosts. The management server log repeatedly shows a NoTransitionException.
Relevant log snippet
WARN [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-4:[ctx-c2bf501d]) (logid:96e12771) Unable to find next HA state for current HA state=[Fenced] for event=[Ineligible] for host Host {"id":4,"name":"csh-1-2.clab.run","type":"Routing","uuid":"f8f86177-f0e3-4994-8609-dd55e0e35a3e"} with id 4. com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new state from Fenced via Ineligible
at com.cloud.utils.fsm.StateMachine2.getTransition(StateMachine2.java:108)
at com.cloud.utils.fsm.StateMachine2.getNextState(StateMachine2.java:94)
at org.apache.cloudstack.ha.HAManagerImpl.transitionHAState(HAManagerImpl.java:153)
at org.apache.cloudstack.ha.HAManagerImpl.validateAndFindHAProvider(HAManagerImpl.java:233)
at org.apache.cloudstack.ha.HAManagerImpl$HAManagerBgPollTask.runInContext(HAManagerImpl.java:665)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
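Reading the message itself: the WARN comes from the Host HA state machine (com.cloud.utils.fsm.StateMachine2 in the trace above). The host's HA state is already Fenced and the background poll task fires an Ineligible event, for which no outgoing transition from Fenced is defined, so getNextState throws NoTransitionException and the warning repeats on every poll. Below is a rough Python sketch of that table-driven behaviour; the transition table is made up for illustration and is not CloudStack's real Host HA table.

```python
# Conceptual sketch of a table-driven FSM in the style of StateMachine2: when no
# transition is defined for the current (state, event) pair, it raises instead of
# returning a next state. The table below is illustrative only.
class NoTransitionError(Exception):
    pass

TRANSITIONS = {
    # (current HA state, event) -> next HA state; deliberately incomplete
    ("Available", "Suspect"): "Suspect",
    ("Suspect", "Fenced"): "Fenced",
}

def get_next_state(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise NoTransitionError(
            f"Unable to transition to a new state from {state} via {event}"
        ) from None

try:
    get_next_state("Fenced", "Ineligible")
except NoTransitionError as exc:
    print(exc)  # same wording as the log: "... from Fenced via Ineligible"
```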
Versions
Environment
The steps to reproduce the bug
1. Enable Host HA and VM HA in a KVM cluster (NFS primary storage).
2. Power off a host that runs VMs.
3. Observe host and VM states in the management server (see the polling sketch after this list).
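For step 3, here is a small polling sketch that watches host state, Host HA state, and HA-enabled VMs from the API rather than the UI. The endpoint and keys are placeholders, the signing helper is the same standard recipe as in the earlier sketch, and the hostha / haenable response fields are my assumption of where the Host HA framework and VM HA flag are reported.

```python
# Polling sketch for step 3 (placeholders for endpoint and API/secret keys;
# standard CloudStack request signing).
import base64, hashlib, hmac, time, urllib.parse
import requests

ENDPOINT = "http://mgmt.example.com:8080/client/api"  # placeholder
API_KEY, SECRET_KEY = "your-api-key", "your-secret-key"  # placeholders

def call(command, **kwargs):
    params = {"command": command, "response": "json", "apikey": API_KEY, **kwargs}
    query = "&".join(f"{k}={urllib.parse.quote(str(v), safe='')}"
                     for k, v in sorted(params.items())).lower()
    sig = base64.b64encode(
        hmac.new(SECRET_KEY.encode(), query.encode(), hashlib.sha1).digest()).decode()
    return requests.get(ENDPOINT, params={**params, "signature": sig}, timeout=30).json()

while True:
    for host in call("listHosts", type="Routing")["listhostsresponse"].get("host", []):
        # "hostha" sub-object should be present when the Host HA framework is enabled
        print(host["name"], host["state"], host.get("hostha"))
    for vm in call("listVirtualMachines", listall="true")[
            "listvirtualmachinesresponse"].get("virtualmachine", []):
        print(vm["name"], vm["state"], "ha=", vm.get("haenable"))
    time.sleep(30)
```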
What to do about it?
No response