This repository was archived by the owner on Dec 21, 2021. It is now read-only.

Conversation

@siegfriedweber
Member

…d_stopped_in_parallel renamed to starting_and_stopping_100_pods_simultaneously_should_succeed and made asynchronous
@siegfriedweber siegfriedweber requested a review from a team May 27, 2021 11:51
@siegfriedweber siegfriedweber self-assigned this May 27, 2021
soenkeliebau previously approved these changes May 27, 2021
Member

@soenkeliebau left a comment


Left two observations; both are just things to maybe consider, and I am happy for this to be merged as is.
I'll leave it up to you if you want to change anything based on my comments.

pub fn get_allocatable_pods(node: &Node) -> u32 {
    node.status
        .as_ref()
        .and_then(|status| status.allocatable.as_ref())
Member


This is not anything wrong with this test, just something I noticed when playing around:
This value is not updated by the Krustlet to reflect pods already assigned to the node. In my case I had one pod running on a worker, max pods was set to 110, and I created 110 additional pods in the test (limited to this single node). This resulted in one "pending" pod, because the check was satisfied (110 <= 110) but did not account for the one additional pod.
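
For illustration, a minimal sketch of how the capacity check could take already-assigned pods into account. This is not the PR's code; it assumes the kube and k8s-openapi crates, reuses the get_allocatable_pods helper shown above, and the field selector on spec.nodeName is an assumption about how existing pods could be counted.

// Sketch only: subtracts the pods already bound to the node from the
// allocatable pod count before deciding how many test pods still fit.
use k8s_openapi::api::core::v1::{Node, Pod};
use kube::{api::ListParams, Api, Client};

async fn get_free_pod_slots(client: Client, node: &Node) -> Result<u32, kube::Error> {
    let node_name = node.metadata.name.as_deref().unwrap_or_default();

    // Pods already assigned to this node, across all namespaces.
    let pods: Api<Pod> = Api::all(client);
    let assigned = pods
        .list(&ListParams::default().fields(&format!("spec.nodeName={}", node_name)))
        .await?
        .items
        .len() as u32;

    // Remaining capacity on this node.
    Ok(get_allocatable_pods(node).saturating_sub(assigned))
}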

Member Author


I was aware of this behavior but too lazy to write a comment. I will do that.

Member Author


Comment added. Thanks for the thorough review.

}

#[tokio::test(flavor = "multi_thread")]
async fn starting_and_stopping_100_pods_simultaneously_should_succeed() {
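
For context, a rough sketch of the shape such a concurrent start/stop test could take. This is not the PR's actual implementation; the kube and futures crates are assumed, and build_test_pod is a hypothetical helper that returns a Pod manifest.

// Rough sketch of the concurrent pattern the test name implies: create all
// pods at once, then delete them all at once, failing on the first error.
use futures::future::try_join_all;
use k8s_openapi::api::core::v1::Pod;
use kube::{
    api::{DeleteParams, PostParams},
    Api, Client,
};

const NUM_PODS: usize = 100;

#[tokio::test(flavor = "multi_thread")]
async fn starting_and_stopping_100_pods_simultaneously_should_succeed() {
    let client = Client::try_default().await.expect("Kubernetes client");
    let pods: Api<Pod> = Api::namespaced(client, "default");

    // Start all pods concurrently.
    let creations = (0..NUM_PODS).map(|i| {
        let pods = pods.clone();
        async move {
            // build_test_pod is a hypothetical helper producing a Pod manifest.
            let pod = build_test_pod(&format!("test-pod-{}", i));
            pods.create(&PostParams::default(), &pod).await
        }
    });
    try_join_all(creations).await.expect("all pods created");

    // Stop all pods concurrently.
    let deletions = (0..NUM_PODS).map(|i| {
        let pods = pods.clone();
        async move {
            pods.delete(&format!("test-pod-{}", i), &DeleteParams::default())
                .await
        }
    });
    try_join_all(deletions).await.expect("all pods deleted");
}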
Member


A general comment on this test: my understanding is that this is meant to stress test a single agent, correct?
In that case it might be useful to pick one of the available nodes and explicitly assign all pods to that node, either by directly setting nodeName in the spec or by adding a nodeSelector that targets that host.
Alternatively, maybe make it "NUM_PODS_PER_NODE" and multiply by the number of nodes, but that might create additional issues depending on how the scheduler assigns them.

Currently this runs fairly dissimilar tests from the perspective of an individual agent, depending on whether it is run against a cluster with 1, 3 or 10 nodes.

Member Author


All pods are now scheduled on the same node via nodeName.
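
For illustration, a minimal sketch of a pod manifest pinned to a single node via spec.nodeName. This is not the PR's code; the k8s-openapi and serde_json crates are assumed, and the image and command are placeholders.

// Minimal sketch: setting spec.nodeName bypasses the scheduler and binds the
// pod directly to the chosen node, so the whole batch lands on one agent.
use k8s_openapi::api::core::v1::Pod;

fn build_pinned_pod(name: &str, node_name: &str) -> Pod {
    serde_json::from_value(serde_json::json!({
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": { "name": name },
        "spec": {
            "nodeName": node_name,
            "containers": [{
                "name": "main",
                "image": "busybox",
                "command": ["sleep", "3600"]
            }]
        }
    }))
    .expect("valid pod manifest")
}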

Member

@soenkeliebau left a comment


LGTM

fyi - I ran this ten times and once saw the test fail with only 97 out of 100 pods being deleted. I wasn't able to reproduce it though, so I will assume it was a fluke from an earlier failed test that left some garbage on the agent.

@siegfriedweber siegfriedweber merged commit 725f304 into main May 27, 2021
@siegfriedweber siegfriedweber deleted the test_race_condition branch May 27, 2021 14:46