cluster-autoscaler: priority-expander sort order always changing #16664

@elliotdobson

/kind bug

1. What kops version are you running? The command kops version, will display
this information.

Client version: 1.28.5 (git-v1.28.5)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Server Version: v1.28.11

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Configure the cluster-autoscaler managed addon with expander: priority, and define multiple instance groups with autoscale: true and autoscalePriority set.
Then run kops update cluster a few times and observe that the sort order of the cluster-autoscaler-priority-expander ConfigMap changes on each run.

5. What happened after the commands executed?
The sort order of the instance groups in the cluster-autoscaler-priority-expander ConfigMap changes on every run, so kops reports that updates are required even though nothing should have changed.

First run (nothing should have changed):

Will modify resources:
  ManagedFile/cluster.example.com-addons-bootstrap
  	Contents            
  	                    	...
  	                    	    - id: k8s-1.15
  	                    	      manifest: cluster-autoscaler.addons.k8s.io/k8s-1.15.yaml
  	                    	+     manifestHash: 2632a6222ec08e5fd1166ff29c0fed020dd81f7ca9c8400a8ec0e24a48d4e2c9
  	                    	-     manifestHash: 9383ffd41a1eb9e3d299f9e3ddaf1f1a4d440aaba92dedd7b1d4bd2f0fa3818d
  	                    	      name: cluster-autoscaler.addons.k8s.io
  	                    	      selector:
  	                    	...
  	                    	

  ManagedFile/cluster.example.com-addons-cluster-autoscaler.addons.k8s.io-k8s-1.15
  	Contents            
  	                    	...
  	                    	    priorities: |-
  	                    	      0:
  	                    	+     - nodes-b.cluster.example.com
  	                    	      - nodes-a.cluster.example.com
  	                    	-     - nodes-b.cluster.example.com
  	                    	      10:
  	                    	      - nodes-cifs.cluster.example.com
  	                    	      20:
  	                    	+     - nodes-a-ondemand.cluster.example.com
  	                    	-     - nodes-b-ondemand.cluster.example.com
  	                    	+     - nodes-b-ondemand.cluster.example.com
  	                    	-     - nodes-a-ondemand.cluster.example.com
  	                    	      30:
  	                    	+     - nodes-b-worker.cluster.example.com
  	                    	-     - nodes-b-import-worker.cluster.example.com
  	                    	+     - nodes-b-courier-worker.cluster.example.com
  	                    	-     - nodes-b-worker.cluster.example.com
  	                    	+     - nodes-a-spot.cluster.example.com
  	                    	-     - nodes-a-worker.cluster.example.com
  	                    	-     - nodes-b-spot.cluster.example.com
  	                    	+     - nodes-b-import-worker.cluster.example.com
  	                    	-     - nodes-a-spot.cluster.example.com
  	                    	+     - nodes-a-courier-worker.cluster.example.com
  	                    	-     - nodes-b-courier-worker.cluster.example.com
  	                    	      - nodes-a-import-worker.cluster.example.com
  	                    	+     - nodes-b-spot.cluster.example.com
  	                    	+     - nodes-a-worker.cluster.example.com
  	                    	-     - nodes-a-courier-worker.cluster.example.com
  	                    	  kind: ConfigMap
  	                    	  metadata:
  	                    	...
  	                    	

Second run (again, nothing should have changed):

Will modify resources:
  ManagedFile/cluster.example.com-addons-bootstrap
  	Contents            
  	                    	...
  	                    	    - id: k8s-1.15
  	                    	      manifest: cluster-autoscaler.addons.k8s.io/k8s-1.15.yaml
  	                    	+     manifestHash: 312c1246f94a347ff98f6aba70d6060f0de5d1ae488bedfdcd082d7a14b2555e
  	                    	-     manifestHash: 9383ffd41a1eb9e3d299f9e3ddaf1f1a4d440aaba92dedd7b1d4bd2f0fa3818d
  	                    	      name: cluster-autoscaler.addons.k8s.io
  	                    	      selector:
  	                    	...
  	                    	

  ManagedFile/cluster.example.com-addons-cluster-autoscaler.addons.k8s.io-k8s-1.15
  	Contents            
  	                    	...
  	                    	    priorities: |-
  	                    	      0:
  	                    	+     - nodes-b.cluster.example.com
  	                    	      - nodes-a.cluster.example.com
  	                    	-     - nodes-b.cluster.example.com
  	                    	      10:
  	                    	      - nodes-cifs.cluster.example.com
  	                    	...
  	                    	      - nodes-a-ondemand.cluster.example.com
  	                    	      30:
  	                    	+     - nodes-a-courier-worker.cluster.example.com
  	                    	-     - nodes-b-import-worker.cluster.example.com
  	                    	+     - nodes-a-import-worker.cluster.example.com
  	                    	-     - nodes-b-worker.cluster.example.com
  	                    	+     - nodes-a-spot.cluster.example.com
  	                    	-     - nodes-a-worker.cluster.example.com
  	                    	+     - nodes-a-worker.cluster.example.com
  	                    	-     - nodes-b-spot.cluster.example.com
  	                    	+     - nodes-b-courier-worker.cluster.example.com
  	                    	-     - nodes-a-spot.cluster.example.com
  	                    	+     - nodes-b-import-worker.cluster.example.com
  	                    	-     - nodes-b-courier-worker.cluster.example.com
  	                    	+     - nodes-b-spot.cluster.example.com
  	                    	-     - nodes-a-import-worker.cluster.example.com
  	                    	+     - nodes-b-worker.cluster.example.com
  	                    	-     - nodes-a-courier-worker.cluster.example.com
  	                    	  kind: ConfigMap
  	                    	  metadata:
  	                    	...
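
The diffs above appear to reorder entries without adding or removing any. A minimal sketch of how one might confirm that is shown below: it parses two renderings of the priorities block into sets per priority and compares them, then compares a digest of the raw text. The helper names (`parse_priorities`, `content_hash`) are hypothetical, the sample text is a shortened excerpt of the priority 0 and 10 groups from the diff, and SHA-256 is used only for illustration, on the assumption that manifestHash is a digest of the rendered manifest bytes.

```python
import hashlib


def parse_priorities(text):
    """Parse a 'priorities' block into {priority: set of node group names}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A line such as "30:" starts a new priority level.
            current = int(line[:-1])
            groups[current] = set()
        elif line.startswith("- "):
            if current is None:
                raise ValueError("list entry found before any priority key")
            groups[current].add(line[2:].strip())
    return groups


def content_hash(text):
    """Digest of the rendered text; any reordering changes the result."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two renderings with identical membership but different ordering.
run_1 = """\
0:
- nodes-b.cluster.example.com
- nodes-a.cluster.example.com
10:
- nodes-cifs.cluster.example.com
"""
run_2 = """\
0:
- nodes-a.cluster.example.com
- nodes-b.cluster.example.com
10:
- nodes-cifs.cluster.example.com
"""

same_membership = parse_priorities(run_1) == parse_priorities(run_2)
same_hash = content_hash(run_1) == content_hash(run_2)
print(same_membership, same_hash)  # True False
```

If the membership matches while the digest differs, the reported change is purely an ordering artifact, which would also explain why manifestHash in the bootstrap file changes on each run.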
  	                    	

6. What did you expect to happen?
The sort order of the instance groups in the cluster-autoscaler-priority-expander ConfigMap should be consistent between runs.
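
For illustration, a deterministic rendering could look like the sketch below. It is not the kops implementation; `render_priorities` is a hypothetical helper, and the sketch assumes that order within a single priority level carries no meaning, so entries can be sorted by name. The instance group names and priorities are taken from the manifest in this report.

```python
from collections import defaultdict


def render_priorities(instance_groups, cluster_name):
    """Render {instance group name: priority} as a stable 'priorities' block."""
    by_priority = defaultdict(list)
    for name, priority in instance_groups.items():
        by_priority[priority].append(f"{name}.{cluster_name}")

    lines = []
    # Sort the priority levels, then the names within each level, so the
    # output does not depend on the iteration order of the input.
    for priority in sorted(by_priority):
        lines.append(f"{priority}:")
        for node_group in sorted(by_priority[priority]):
            lines.append(f"- {node_group}")
    return "\n".join(lines)


groups = {
    "nodes-b-spot": 30,
    "nodes-a-ondemand": 20,
    "nodes-a-worker": 30,
    "nodes-b-ondemand": 20,
    "nodes-a-spot": 30,
    "nodes-b-worker": 30,
}
# The same data presented in a different input order.
shuffled = dict(reversed(list(groups.items())))

first = render_priorities(groups, "cluster.example.com")
second = render_priorities(shuffled, "cluster.example.com")
print(first == second)  # True
```

With both levels sorted, repeated runs over unchanged instance groups produce byte-identical output, so no update would be reported.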

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
...
spec:
...
  cloudProvider: aws
  clusterAutoscaler:
    awsUseStaticInstanceList: false
    balanceSimilarNodeGroups: true
    cordonNodeBeforeTerminating: false
    cpuRequest: 100m
    enabled: true
    expander: priority
    maxNodeProvisionTime: 10m0s
    memoryRequest: 384Mi
    newPodScaleUpDelay: 30s
    scaleDownDelayAfterAdd: 10m0s
    scaleDownUnneededTime: 10m0s
    scaleDownUnreadyTime: 20m0s
    scaleDownUtilizationThreshold: "0.5"
    skipNodesWithLocalStorage: false
    skipNodesWithSystemPods: false
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-a
...
spec:
  autoscale: false
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-a-ondemand
...
spec:
  autoscale: true
  autoscalePriority: 20
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-a-spot
...
spec:
  autoscale: true
  autoscalePriority: 30
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-a-worker
...
spec:
  autoscale: true
  autoscalePriority: 30
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-b
...
spec:
  autoscale: false
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-b-ondemand
...
spec:
  autoscale: true
  autoscalePriority: 20
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-b-spot
...
spec:
  autoscale: true
  autoscalePriority: 30
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes-b-worker
...
spec:
  autoscale: true
  autoscalePriority: 30
...

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?
I also noticed that instance groups with autoscale: false are included in the cluster-autoscaler-priority-expander ConfigMap. In my opinion they shouldn't be; as expected, they are not included in the cluster-autoscaler Deployment.

Removing autoscale: false from an instance group does remove it from the cluster-autoscaler-priority-expander ConfigMap, which is good. However, it also adds the instance group to the cluster-autoscaler Deployment, which is not good.

So this behaviour needs tidying up too.
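
One way to keep the ConfigMap and the Deployment in agreement is to derive both from a single membership decision. The sketch below only illustrates that idea; `is_autoscaled` and `select_autoscaled` are hypothetical names, and how an unset autoscale field should be treated is left as an explicit parameter because this report does not settle it.

```python
def is_autoscaled(spec, default_when_unset):
    """Decide once whether an instance group takes part in autoscaling."""
    value = spec.get("autoscale")
    if value is None:
        # The field is absent; the caller chooses the policy for this case.
        return default_when_unset
    return bool(value)


def select_autoscaled(instance_groups, default_when_unset):
    """Return the sorted names of instance groups that should be autoscaled."""
    return sorted(
        name
        for name, spec in instance_groups.items()
        if is_autoscaled(spec, default_when_unset)
    )


instance_groups = {
    "nodes-a": {"autoscale": False},
    "nodes-a-ondemand": {"autoscale": True, "autoscalePriority": 20},
    "nodes-b": {},  # hypothetical: autoscale removed from the spec
}

# Both outputs are built from the same selection, so they cannot disagree.
members = select_autoscaled(instance_groups, default_when_unset=False)
configmap_members = list(members)
deployment_members = list(members)
print(configmap_members == deployment_members)  # True
```

Whichever default is chosen for the unset case, routing both consumers through the same predicate avoids the mismatch described above, where a group appears in one output but not the other.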
