Test race condition #27
Conversation
…pped_in_parallel added
…d_stopped_in_parallel renamed to starting_and_stopping_100_pods_simultaneously_should_succeed and made asynchronous
soenkeliebau left a comment
Left two observations. Both are just things to maybe consider, and I am happy for this to be merged as-is as well.
I'll leave it up to you whether you want to change anything based on my comments.
pub fn get_allocatable_pods(node: &Node) -> u32 {
    node.status
        .as_ref()
        .and_then(|status| status.allocatable.as_ref())
This is not anything wrong with this test, just something I noticed when playing around:
This value is not updated by the Krustlet to reflect pods already assigned to the node. In my case I had one pod already running on a worker, max pods was set to 110, and the test created 110 additional pods (limited to this single node). The result was one "pending" pod: the check 110 <= 110 was satisfied, but it didn't account for the pod that was already there.
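For illustration, a minimal sketch of how a capacity check could subtract pods already bound to the node. The crate choices (kube, k8s-openapi, anyhow), the helper name get_free_pod_slots, and the completed body of get_allocatable_pods are assumptions for this sketch, not the test's actual code:

```rust
use k8s_openapi::api::core::v1::{Node, Pod};
use kube::{api::ListParams, Api, Client};

// Maximum pod capacity reported for this node. The part below the truncated
// diff hunk above is a guessed reconstruction: parse the "pods" quantity
// from `status.allocatable`.
fn get_allocatable_pods(node: &Node) -> u32 {
    node.status
        .as_ref()
        .and_then(|status| status.allocatable.as_ref())
        .and_then(|allocatable| allocatable.get("pods"))
        .and_then(|quantity| quantity.0.parse().ok())
        .unwrap_or(0)
}

// `allocatable` is a static capacity figure; it does not shrink as pods are
// scheduled, so pods already bound to the node have to be subtracted to get
// the number of slots that are actually free.
async fn get_free_pod_slots(client: Client, node: &Node) -> anyhow::Result<u32> {
    let node_name = node.metadata.name.clone().unwrap_or_default();
    let pods: Api<Pod> = Api::all(client);
    let assigned = pods
        .list(&ListParams::default().fields(&format!("spec.nodeName={}", node_name)))
        .await?
        .items
        .len() as u32;
    Ok(get_allocatable_pods(node).saturating_sub(assigned))
}
```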
I was aware of this behavior but too lazy to write a comment. I will do that.
Comment added. Thanks for the thorough review.
}

#[tokio::test(flavor = "multi_thread")]
async fn starting_and_stopping_100_pods_simultaneously_should_succeed() {
A general comment on this test: my understanding is that it is meant to stress test a single agent, correct?
In that case it might be useful to pick one of the available nodes and explicitly assign all pods to that node, either by setting nodeName directly in the spec or by adding a nodeSelector that targets that host (a sketch of the nodeName approach follows below).
Alternatively, maybe make it "NUM_PODS_PER_NODE" and multiply by the number of nodes, but that might create additional issues depending on how the scheduler assigns them.
Currently, from the perspective of an individual agent, this runs fairly dissimilar tests depending on whether it is run against a cluster with 1, 3, or 10 nodes.
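A minimal sketch of what pinning every test pod to one node via nodeName could look like with kube-rs. The function name, pod naming scheme, and container image are made up for illustration and are not the test's actual code:

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Api, PostParams};

// Create one test pod that is bound to `node_name` up front. Setting
// `nodeName` bypasses the scheduler, so every pod lands on the same agent
// regardless of how many nodes the cluster has.
async fn create_pinned_pod(
    pods: &Api<Pod>,
    index: usize,
    node_name: &str,
) -> kube::Result<Pod> {
    let pod: Pod = serde_json::from_value(serde_json::json!({
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": { "name": format!("race-condition-test-{}", index) },
        "spec": {
            "nodeName": node_name,
            "containers": [{
                "name": "main",
                // Placeholder image; the real test would use whatever
                // workload the agent integration tests normally deploy.
                "image": "registry.example.com/noop:latest"
            }]
        }
    }))
    .expect("valid pod manifest");
    pods.create(&PostParams::default(), &pod).await
}
```

An alternative would be a nodeSelector on the kubernetes.io/hostname label, which pins the pods to one node while still letting them pass through the scheduler.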
All pods are now scheduled on the same node via nodeName.
…d starts all pods on the same node
soenkeliebau left a comment
LGTM
FYI - I ran this ten times and once saw the test fail with only 97 out of 100 pods being deleted. I wasn't able to reproduce it, though, so I will assume it was a fluke from an earlier failed test that left some garbage on the agent.
Tests stackabletech/agent#168