Description
/kind bug
1. What kops version are you running? The command kops version will display
this information.
Client version: 1.28.5 (git-v1.28.5)
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Server Version: v1.28.11
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Configure the cluster-autoscaler managed addon with expander: priority, and define multiple instance groups with autoscale: true and autoscalePriority set.
Then run kops update cluster a few times and observe the sort order of the cluster-autoscaler-priority-expander configMap changing on each run.
5. What happened after the commands executed?
The sort order of the instance groups in the cluster-autoscaler-priority-expander ConfigMap changes on every run, so kops reports that updates are required even though nothing in the cluster spec has changed.
First run (nothing should have changed this time):
Will modify resources:
ManagedFile/cluster.example.com-addons-bootstrap
Contents
...
- id: k8s-1.15
manifest: cluster-autoscaler.addons.k8s.io/k8s-1.15.yaml
+ manifestHash: 2632a6222ec08e5fd1166ff29c0fed020dd81f7ca9c8400a8ec0e24a48d4e2c9
- manifestHash: 9383ffd41a1eb9e3d299f9e3ddaf1f1a4d440aaba92dedd7b1d4bd2f0fa3818d
name: cluster-autoscaler.addons.k8s.io
selector:
...
ManagedFile/cluster.example.com-addons-cluster-autoscaler.addons.k8s.io-k8s-1.15
Contents
...
priorities: |-
0:
+ - nodes-b.cluster.example.com
- nodes-a.cluster.example.com
- - nodes-b.cluster.example.com
10:
- nodes-cifs.cluster.example.com
20:
+ - nodes-a-ondemand.cluster.example.com
- - nodes-b-ondemand.cluster.example.com
+ - nodes-b-ondemand.cluster.example.com
- - nodes-a-ondemand.cluster.example.com
30:
+ - nodes-b-worker.cluster.example.com
- - nodes-b-import-worker.cluster.example.com
+ - nodes-b-courier-worker.cluster.example.com
- - nodes-b-worker.cluster.example.com
+ - nodes-a-spot.cluster.example.com
- - nodes-a-worker.cluster.example.com
- - nodes-b-spot.cluster.example.com
+ - nodes-b-import-worker.cluster.example.com
- - nodes-a-spot.cluster.example.com
+ - nodes-a-courier-worker.cluster.example.com
- - nodes-b-courier-worker.cluster.example.com
- nodes-a-import-worker.cluster.example.com
+ - nodes-b-spot.cluster.example.com
+ - nodes-a-worker.cluster.example.com
- - nodes-a-courier-worker.cluster.example.com
kind: ConfigMap
metadata:
...
Second run (again, nothing should have changed):
Will modify resources:
ManagedFile/cluster.example.com-addons-bootstrap
Contents
...
- id: k8s-1.15
manifest: cluster-autoscaler.addons.k8s.io/k8s-1.15.yaml
+ manifestHash: 312c1246f94a347ff98f6aba70d6060f0de5d1ae488bedfdcd082d7a14b2555e
- manifestHash: 9383ffd41a1eb9e3d299f9e3ddaf1f1a4d440aaba92dedd7b1d4bd2f0fa3818d
name: cluster-autoscaler.addons.k8s.io
selector:
...
ManagedFile/cluster.example.com-addons-cluster-autoscaler.addons.k8s.io-k8s-1.15
Contents
...
priorities: |-
0:
+ - nodes-b.cluster.example.com
- nodes-a.cluster.example.com
- - nodes-b.cluster.example.com
10:
- nodes-cifs.cluster.example.com
...
- nodes-a-ondemand.cluster.example.com
30:
+ - nodes-a-courier-worker.cluster.example.com
- - nodes-b-import-worker.cluster.example.com
+ - nodes-a-import-worker.cluster.example.com
- - nodes-b-worker.cluster.example.com
+ - nodes-a-spot.cluster.example.com
- - nodes-a-worker.cluster.example.com
+ - nodes-a-worker.cluster.example.com
- - nodes-b-spot.cluster.example.com
+ - nodes-b-courier-worker.cluster.example.com
- - nodes-a-spot.cluster.example.com
+ - nodes-b-import-worker.cluster.example.com
- - nodes-b-courier-worker.cluster.example.com
+ - nodes-b-spot.cluster.example.com
- - nodes-a-import-worker.cluster.example.com
+ - nodes-b-worker.cluster.example.com
- - nodes-a-courier-worker.cluster.example.com
kind: ConfigMap
metadata:
...
6. What did you expect to happen?
The sort order of the instance groups in the cluster-autoscaler-priority-expander ConfigMap should be consistent between runs, so that an unchanged cluster reports no pending updates.
7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
...
spec:
...
cloudProvider: aws
clusterAutoscaler:
awsUseStaticInstanceList: false
balanceSimilarNodeGroups: true
cordonNodeBeforeTerminating: false
cpuRequest: 100m
enabled: true
expander: priority
maxNodeProvisionTime: 10m0s
memoryRequest: 384Mi
newPodScaleUpDelay: 30s
scaleDownDelayAfterAdd: 10m0s
scaleDownUnneededTime: 10m0s
scaleDownUnreadyTime: 20m0s
scaleDownUtilizationThreshold: "0.5"
skipNodesWithLocalStorage: false
skipNodesWithSystemPods: false
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-a
...
spec:
autoscale: false
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-a-ondemand
...
spec:
autoscale: true
autoscalePriority: 20
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-a-spot
...
spec:
autoscale: true
autoscalePriority: 30
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-a-worker
...
spec:
autoscale: true
autoscalePriority: 30
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-b
...
spec:
autoscale: false
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-b-ondemand
...
spec:
autoscale: true
autoscalePriority: 20
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-b-spot
...
spec:
autoscale: true
autoscalePriority: 30
...
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
name: nodes-b-worker
...
spec:
autoscale: true
autoscalePriority: 30
...
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else we need to know?
I also noticed that instance groups with autoscale: false are included in the cluster-autoscaler-priority-expander ConfigMap. IMO they shouldn't be, since they are (as expected) not included in the cluster-autoscaler Deployment.
Removing autoscale: false from the instance group does remove it from the cluster-autoscaler-priority-expander ConfigMap, which is good, but it also adds the group to the cluster-autoscaler Deployment, which is not.
So this behaviour needs tidying up too.
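One way to keep the two outputs from disagreeing is to decide membership with a single shared predicate and build both the ConfigMap and the Deployment's node group list from its result. The Go sketch below is only an illustration: `instanceGroup`, `autoscaled`, and `selectAutoscaled` are hypothetical names, not kops types or functions, and the default applied when autoscale is omitted is left as a parameter because this report does not establish what it should be.

```go
package main

import "fmt"

// instanceGroup is a hypothetical, simplified stand-in for a kops InstanceGroup.
type instanceGroup struct {
	Name      string
	Autoscale *bool // nil means the field was omitted from the manifest
}

// autoscaled is the single place that decides whether a group is managed by
// cluster-autoscaler. defaultAutoscale is the value used when the field is
// omitted; which default is correct is an open question in this sketch.
func autoscaled(ig instanceGroup, defaultAutoscale bool) bool {
	if ig.Autoscale == nil {
		return defaultAutoscale
	}
	return *ig.Autoscale
}

// selectAutoscaled returns the names of the groups that pass the shared
// predicate, preserving input order.
func selectAutoscaled(igs []instanceGroup, defaultAutoscale bool) []string {
	var names []string
	for _, ig := range igs {
		if autoscaled(ig, defaultAutoscale) {
			names = append(names, ig.Name)
		}
	}
	return names
}

func main() {
	on, off := true, false
	igs := []instanceGroup{
		{Name: "nodes-a", Autoscale: &off},
		{Name: "nodes-a-spot", Autoscale: &on},
		{Name: "nodes-b"}, // autoscale omitted
	}
	// The same slice would feed both the priority-expander ConfigMap and the
	// Deployment's node group list, so the two cannot drift apart.
	fmt.Println(selectAutoscaled(igs, true))
}
```

Whatever default is chosen, a group would then appear in both places or in neither.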