Commit a4f0465

emeraldbay and Xin Wang authored
Add cert mgr installation which is required by HPTO (aws#180)
* Add cert mgr installation
* Add cert mgr installation
* update cert-mgr readme

---------

Co-authored-by: Xin Wang <[email protected]>
1 parent f747815 commit a4f0465

3 files changed: +28 -0 lines changed


helm_chart/HyperPodHelmChart/Chart.yaml

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,10 @@ version: 0.1.0
 appVersion: "1.16.0"

 dependencies:
+  - name: cert-manager
+    version: "v1.18.2"
+    repository: oci://quay.io/jetstack/charts
+    condition: cert-manager.enabled
   - name: training-operators
     version: "0.1.0"
     repository: "file://charts/training-operators"

helm_chart/HyperPodHelmChart/values.yaml

Lines changed: 9 additions & 0 deletions
@@ -115,6 +115,15 @@ namespace:
   create: true
   name: aws-hyperpod

+cert-manager:
+  enabled: true
+  namespace: cert-manager
+  global:
+    leaderElection:
+      namespace: cert-manager
+  crds:
+    enabled: true
+
 mlflow:
   enabled: false

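The `condition: cert-manager.enabled` entry added to Chart.yaml ties the new dependency to the `cert-manager.enabled` value that values.yaml now defaults to `true`. Below is a simplified Python model, not Helm's own code, of the documented `condition` rule as I understand it: the condition is a comma-separated list of value paths, the first path that resolves to a boolean decides, and the dependency stays enabled if none do. The names `resolve_path`, `dependency_enabled`, and `default_values` are hypothetical.

```python
"""Simplified model of how Helm decides whether the cert-manager
dependency declared in Chart.yaml is rendered.

Assumption: mirrors Helm's documented `condition` semantics (comma-separated
value paths, first path resolving to a boolean wins, dependency stays enabled
if none resolve). This is an illustration, not Helm's implementation.
"""

_MISSING = object()  # sentinel for a path that does not exist in the values


def resolve_path(values: dict, path: str):
    """Walk a dotted path such as 'cert-manager.enabled' through nested dicts."""
    node = values
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def dependency_enabled(values: dict, condition: str) -> bool:
    """Return True if a dependency guarded by `condition` would be rendered."""
    for path in condition.strip().split(","):
        if not path:
            continue
        value = resolve_path(values, path)
        if isinstance(value, bool):
            return value  # the first boolean hit decides
        # Missing or non-boolean paths are skipped (Helm logs a warning for non-bool).
    return True  # no usable condition path: the dependency stays enabled


# Hypothetical defaults copied from the values.yaml hunk above (trimmed).
default_values = {
    "cert-manager": {
        "enabled": True,
        "namespace": "cert-manager",
        "crds": {"enabled": True},
    },
    "mlflow": {"enabled": False},
}

if __name__ == "__main__":
    print(dependency_enabled(default_values, "cert-manager.enabled"))  # True
    print(dependency_enabled({"cert-manager": {"enabled": False}}, "cert-manager.enabled"))  # False
    print(dependency_enabled({}, "cert-manager.enabled"))  # True
```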

helm_chart/readme.md

Lines changed: 15 additions & 0 deletions
@@ -33,6 +33,7 @@ More information about orchestration features for cluster admins [here](https://
 | [Kubeflow Training Operator](https://www.kubeflow.org/docs/components/trainer/legacy-v1/overview/) | Installs operators for managing various machine learning training jobs, such as TensorFlow, PyTorch, and MXNet, providing native Kubernetes support for distributed training workloads. | | Yes |
 | HyperPod patching | Deploys the RBAC and controller resources needed for orchestrating rolling updates and patching workflows in SageMaker HyperPod clusters. Includes pod eviction and node monitoring. | HyperPod Resiliency | Yes |
 | hyperpod-inference-operator | Installs the HyperPod Inference Operator and its dependencies to the cluster, allowing cluster deployment and inferencing of JumpStart, s3-hosted, and FSx-hosted models | No |
+| [cert-manager](https://github.com/cert-manager/cert-manager) | Automatically provisions and manages TLS certificates in Kubernetes clusters. Provides certificate lifecycle management including issuance, renewal, and revocation for secure communications. | [Hyperpod training operator](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-eks-operator.html) | Yes |

 > **_Note_** The `mpijob` scheme is disabled in the Training Operator helm chart to avoid conflicting with the MPI Operator.

@@ -48,6 +49,20 @@ storage:
   enabled: true
 ```

+To enable cert-manager for TLS certificate management, pass in `--set cert-manager.enabled=true` when installing or upgrading the main chart or set the following in the values.yaml file:
+```
+cert-manager:
+  enabled: true
+  namespace: cert-manager
+  global:
+    leaderElection:
+      namespace: cert-manager
+  crds:
+    enabled: true
+```
+namespace specifies which name space cert-manager should be installed
+
+
 ---

 The following plugins are only required for HyperPod Resiliency if you are using the following supported devices, such as GPU/Neuron instances, unless you install these plugins on your own.
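The readme's `--set cert-manager.enabled=true` instruction works by layering a nested override onto the chart's defaults. The following minimal Python sketch shows that layering, assuming simplified `--set` parsing that handles only dotted map keys and the literals `true`/`false` (Helm's real parser also handles list indices, escaping, and numbers). `parse_set`, `deep_merge`, and `defaults` are hypothetical names. Because values.yaml in this commit already defaults `cert-manager.enabled` to `true`, the example overrides it with `false` so the effect is visible.

```python
"""Simplified model of how `--set cert-manager.enabled=...` overrides the
defaults in values.yaml.

Assumption: only dotted map keys and true/false literals are handled; real
Helm `--set` parsing also supports list indices, escaped commas, and numbers.
"""
import copy


def parse_set(expr: str) -> dict:
    """Turn 'a.b=c,d=e' into nested dicts, converting true/false to booleans."""
    result: dict = {}
    for assignment in expr.split(","):
        path, _, raw = assignment.partition("=")
        value = {"true": True, "false": False}.get(raw, raw)
        keys = path.split(".")
        node = result
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create intermediate maps as needed
        node[keys[-1]] = value
    return result


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied; nested maps merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults copied from the values.yaml hunk in this commit (trimmed).
defaults = {"cert-manager": {"enabled": True, "namespace": "cert-manager"}}

if __name__ == "__main__":
    final = deep_merge(defaults, parse_set("cert-manager.enabled=false"))
    print(final)  # {'cert-manager': {'enabled': False, 'namespace': 'cert-manager'}}
```

Merging key by key rather than replacing the whole `cert-manager` map is what lets a single flag flip `enabled` while `namespace` keeps its default.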
