Ensure affinity groups are honored when VMs are deployed in parallel #9201
|
@blueorangutan package |
|
@vishesh92 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## 4.19 #9201 +/- ##
============================================
+ Coverage 14.96% 15.67% +0.71%
- Complexity 10993 10997 +4
============================================
Files 5373 5010 -363
Lines 469248 439976 -29272
Branches 58782 54849 -3933
============================================
- Hits 70210 68967 -1243
+ Misses 391266 363377 -27889
+ Partials 7772 7632 -140
|
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9842 |
|
@blueorangutan test keepEnv |
|
@vishesh92 a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
[SF] Trillian test result (tid-10401)
|
|
@vishesh92 |
I tested with the simulator. I was able to reproduce and fix the issue for strict anti-affinity. |
thanks @vishesh92 |
|
@vishesh92 I tried twice. The first time it worked; the second time it did not.
Yes. This will reduce the occurrence of failures but not completely resolve the issue. |
rohityadavcloud
left a comment
LGTM, but I can't confirm the stability of the feature/testing, or whether the current smoke tests cover these changes
|
@vishesh92 cloudstack/server/src/main/java/com/cloud/deploy/DeploymentPlanningManagerImpl.java Lines 303 to 304 in cb9b313
rename it to |
I am also wondering which processes should be locked, perhaps until the host is determined? Maybe we could introduce some mechanism, for example
The problem is that it could lead to high latency if the platform is large. |
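The locking idea raised in this comment (hold a lock until the host is determined) could be sketched roughly as follows. This is only an illustration under stated assumptions, not what the PR implements: `AffinityGroupPlacer`, `place`, and `placeInParallel` are hypothetical names, hosts are plain strings, and the lock is scoped to a single affinity group so that deployments in unrelated groups are not serialized, which is one way to limit the latency concern.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, NOT CloudStack code: one lock per anti-affinity group,
// held from "pick a host" until the pick is recorded.
class AffinityGroupPlacer {
    private final List<String> hosts;
    private final Map<String, ReentrantLock> groupLocks = new ConcurrentHashMap<>();
    private final Map<String, Set<String>> hostsUsedByGroup = new ConcurrentHashMap<>();

    AffinityGroupPlacer(List<String> hosts) {
        this.hosts = new ArrayList<>(hosts);
    }

    // Returns a host not yet used by the group, or empty if none is left.
    Optional<String> place(String groupId) {
        ReentrantLock lock = groupLocks.computeIfAbsent(groupId, id -> new ReentrantLock());
        lock.lock();
        try {
            Set<String> used = hostsUsedByGroup.computeIfAbsent(groupId, id -> new HashSet<>());
            for (String host : hosts) {
                // add() returns false when the group already occupies this host.
                if (used.add(host)) {
                    return Optional.of(host);
                }
            }
            return Optional.empty();
        } finally {
            lock.unlock();
        }
    }

    // Simulates vmCount parallel deployments into the same group.
    static List<Optional<String>> placeInParallel(AffinityGroupPlacer placer, String groupId, int vmCount) {
        ExecutorService pool = Executors.newFixedThreadPool(vmCount);
        try {
            List<Future<Optional<String>>> futures = new ArrayList<>();
            for (int i = 0; i < vmCount; i++) {
                futures.add(pool.submit(() -> placer.place(groupId)));
            }
            List<Optional<String>> results = new ArrayList<>();
            for (Future<Optional<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the critical section covers both the check and the reservation, two parallel requests for the same group can never receive the same host; the cost is that requests within one group wait on each other.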
|
@vishesh92 can you review and advise further (PR marked waiting-for-author)? |
@rohityadavcloud I am running the capc e2e test for affinity group. Let me share the results once the job is complete. |
|
@weizhouapache @sureshanaparti I ran e2e test 5 times on an env with this patch. Here are the results: |
|
@sureshanaparti, will we try to merge this in 4.19.1?
|
@DaanHoogland This PR is targeted for 4.19.2 |
kiranchavala
left a comment
LGTM, tested with the following steps:
1. Create an affinity group of type "host anti-affinity (Strict)".
2. Deploy 2 VMs without any affinity group attached.
3. Check the VMs' host (they can be deployed on the same host).
Note: Make sure both VMs are deployed on a single host in the cluster.
4. Stop the VMs.
5. Change the affinity group of the VMs to "host anti-affinity (Strict)".
Note: Make sure the affinity group is attached to each VM.
6. Start the VMs (make sure to uncheck the option to start on the last host).
7. The VMs were deployed on different hosts.
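The final check in the steps above (VMs of a strict host anti-affinity group end up on different hosts) can be expressed as a small assertion helper. This is a minimal sketch: `AntiAffinityCheck` and `firstSharedHost` are hypothetical names, and the VM-to-host map is assumed to have been collected by whatever means the tester uses.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of CloudStack.
class AntiAffinityCheck {
    // Returns a host that runs more than one VM of the group, or empty if
    // every VM sits on its own host (the strict anti-affinity rule holds).
    static Optional<String> firstSharedHost(Map<String, String> vmToHost) {
        Set<String> seenHosts = new HashSet<>();
        for (String host : vmToHost.values()) {
            if (!seenHosts.add(host)) {
                return Optional.of(host);
            }
        }
        return Optional.empty();
    }
}
```

An empty result corresponds to the last step passing; a returned host name identifies where the rule was violated.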

Description
Fixes #7202 #9110
This PR doesn't completely solve the issue. Completely resolving it would require sacrificing the speed at which VMs can be deployed in parallel.
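The trade-off between correctness and parallel speed can be illustrated with a deterministic toy model of the underlying check-then-act race. It is a sketch under assumptions, not the planner's real logic: `PlanCommitRace`, `plan`, `interleaved`, and `serialized` are hypothetical names, and "planning" is reduced to picking the first host the group does not yet use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not CloudStack code.
class PlanCommitRace {
    // Picks the first host not yet used by the group, or null if none is free.
    static String plan(List<String> hosts, Set<String> used) {
        for (String host : hosts) {
            if (!used.contains(host)) {
                return host;
            }
        }
        return null;
    }

    // Two deployments plan before either one records its choice:
    // both see the same empty state and pick the same host.
    static List<String> interleaved(List<String> hosts) {
        Set<String> used = new HashSet<>();
        String first = plan(hosts, used);
        String second = plan(hosts, used);
        used.add(first);
        used.add(second);
        return Arrays.asList(first, second);
    }

    // Each deployment plans and records its choice before the next one starts:
    // correct, but the deployments no longer overlap.
    static List<String> serialized(List<String> hosts) {
        Set<String> used = new HashSet<>();
        String first = plan(hosts, used);
        used.add(first);
        String second = plan(hosts, used);
        used.add(second);
        return Arrays.asList(first, second);
    }
}
```

With two hosts, the interleaved order places both VMs on the first host, while the serialized order spreads them across both. Shrinking the window between plan and commit makes the bad interleaving rarer; only full serialization removes it.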