HA Kubernetes Cluster Not Using Auto-Generated Public IP in Apache CloudStack 4.21.0.0 #11642
@hodie-aurora there is a similar upstream issue logged. As a workaround, pass the following flag when executing kubectl commands: kubectl --insecure-skip-tls-verify=true
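For example, assuming the kube.conf downloaded from the CloudStack UI is saved locally as ./kube.conf (the path is just an assumption here), the workaround looks like this:

```sh
# Workaround only: skip TLS certificate verification for these invocations.
# This does not fix the underlying certificate/endpoint mismatch.
kubectl --kubeconfig ./kube.conf --insecure-skip-tls-verify=true cluster-info
kubectl --kubeconfig ./kube.conf --insecure-skip-tls-verify=true get nodes
```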
@weizhouapache Since #11579 can reproduce this issue, can you confirm whether this is a bug introduced in CloudStack 4.21.0.0? If yes, I'd like to know when it might be fixed: will it be in 4.22, or in a minor patch release? If it is fixed in the 4.21 series, will the release packages be updated? Thank you.
@weizhouapache Following up on my previous comment, I believe that using kubectl --insecure-skip-tls-verify=true only works around the symptom of the access problem and doesn't resolve the root cause. The fundamental issue appears to be that during cluster initialization the Kubernetes API server is configured to point to the internal IP of a single control node VM (e.g., 10.1.0.219:6443) instead of the auto-generated public IP. If the cluster were properly set up to use the public IP (through the load balancer), the kubectl access problems would be resolved naturally, and the cluster would be truly highly available, meaning it could tolerate the failure of up to (but fewer than) half of the control nodes without the entire cluster going down. Is my understanding of the root cause correct? Thank you for any confirmation or additional insights!
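To check that hypothesis, these are the two checks I have in mind (assuming the control nodes follow the standard kubeadm layout and the downloaded kubeconfig is saved as ./kube.conf; both are assumptions on my part):

```sh
# 1) Which endpoint does the downloaded kubeconfig actually point at?
kubectl --kubeconfig ./kube.conf config view --minify \
  -o jsonpath='{.clusters[0].cluster.server}'; echo

# 2) From inside the cluster: which controlPlaneEndpoint was kubeadm initialized with?
#    (kubeadm stores its ClusterConfiguration in the kubeadm-config ConfigMap.)
kubectl -n kube-system get configmap kubeadm-config \
  -o jsonpath='{.data.ClusterConfiguration}' | grep -i controlPlaneEndpoint
```

If both show an internal control node IP rather than the public IP, that would match the behaviour described above.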
Environment
CloudStack Version: Upgraded to 4.21.0.0
Kubernetes Template: setup-v1.33.1-calico-x86_64.iso
Setup: Using VPC with a subnet, creating HA K8s cluster without specifying an external load balancer IP (expecting auto-generation of public IP with port forwarding and load balancing)
Description
Previously, in older versions of CloudStack, I successfully created HA Kubernetes clusters using the following workflow:
1. Create a VPC.
2. Create a subnet within the VPC.
3. Deploy an HA K8s cluster without filling in the external load balancer IP. This would automatically generate a public IP, configure port forwarding, and set up load balancing.
4. On a control node, running kubectl cluster-info would show the cluster endpoint pointing to the auto-generated public IP.
5. Downloading the kube.conf from the K8s cluster page in the CloudStack UI worked normally for remote access.
This behavior was logical and worked as expected.
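For reference, this is roughly how I would drive the same workflow from the CLI with CloudMonkey (cmk); the IDs below are placeholders and the parameter names are from memory, so please verify them against the 4.21 API documentation:

```sh
# Sketch only: create an HA CKS cluster without specifying an external LB IP,
# expecting CloudStack to acquire a public IP and create the load balancing and
# port forwarding rules itself. All <...> values are placeholders.
cmk create kubernetescluster \
  name=ha-demo \
  zoneid=<zone-uuid> \
  kubernetesversionid=<k8s-version-uuid> \
  serviceofferingid=<offering-uuid> \
  networkid=<vpc-tier-uuid> \
  controlnodes=3 \
  size=3
```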
However, after upgrading to CloudStack 4.21.0.0, I'm encountering an issue with the same workflow:
The public IP is still auto-generated.
Port forwarding and load balancing rules are created successfully (as shown in the UI).
But when I run kubectl cluster-info on a control node inside the cluster, the IP points to one of the internal control node IPs (e.g., 10.1.0.219:6443) instead of the public IP.
Attempting to use the downloaded kube.conf fails, likely due to certificate verification issues or inability to connect to the server via the public IP.
Running kubectl get pods -A also fails with TLS certificate verification errors: "Failed to verify certificate: x509: certificate is valid for [internal IPs], not [public IP]".
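One way to see which names and IPs the certificate served on the public endpoint actually contains (192.168.122.109 is the auto-generated public IP from my setup, shown in the screenshots below; substitute your own):

```sh
# Show the Subject Alternative Names of the certificate presented on the
# load-balanced public endpoint; these should explain the x509 error above.
echo | openssl s_client -connect 192.168.122.109:6443 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
```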
Screenshots attached for reference:
kubectl cluster-info output showing internal control node IP.

CloudStack UI showing auto-generated public IP (192.168.122.109) with port forwarding rules (private port 22 to public ports 2222-2225 TCP, mapping to control nodes at 10.1.0.x) and load balancing setup (api-lb on port 6443 TCP, active, pointing to control nodes at 10.1.0.44, 10.1.0.133, 10.1.0.219).
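For completeness, the same rules can be cross-checked from the API side with CloudMonkey; <public-ip-uuid> below is a placeholder for the UUID of the acquired address:

```sh
# Confirm the CloudStack-side rules match what the UI shows (IDs are placeholders).
cmk list publicipaddresses ipaddress=192.168.122.109
cmk list loadbalancerrules publicipid=<public-ip-uuid>
cmk list portforwardingrules ipaddressid=<public-ip-uuid>
```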
I'm not sure whether this is due to a configuration error on my end, a bug in 4.21.0.0, or a new mechanism introduced in this version (e.g., changes to the CloudStack Kubernetes Service/CKS plugin, which I read has enhancements for flexible node configurations and hypervisor selection in 4.21).
Questions
Is this a configuration issue? If so, what should I check or troubleshoot? For example:
Network settings in the VPC/subnet?
Kubernetes template compatibility with 4.21?
Any specific flags or options during cluster creation?
Certificate generation or API server config? (A node-side check for this is sketched right after these questions.)
Is this due to new features in 4.21? From what I've seen in the release notes and blogs (e.g., ShapeBlue's deep dive), CKS has been updated for better adaptability, including separate templates for worker/control/etcd nodes. If there's a new required step for HA public IP handling, what is the correct procedure to ensure the cluster uses the auto-generated public IP externally?
Workarounds or Fixes? Has anyone else encountered this? Any patches or config tweaks recommended?
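On the certificate/API-server point specifically, this is the node-side check I have in mind (paths assume the standard kubeadm layout that CKS nodes appear to use; adjust if your template differs):

```sh
# Run on a control node: list the IP/DNS SANs baked into the API server
# certificate, and the endpoint settings the kube-apiserver was started with.
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
sudo grep -E 'advertise-address|secure-port' /etc/kubernetes/manifests/kube-apiserver.yaml
```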
Additional Request: Tutorials
On a related note, does anyone have recommendations for comprehensive video or illustrated tutorials on using CloudStack? The official docs (docs.cloudstack.apache.org) cover basics but lack detailed walkthroughs for features like this K8s integration. Official full-series videos/articles would be ideal, but unofficial ones are welcome too.