Commit ae98bf7

Adding non goals and risks and mitigations
1 parent 0e02825 commit ae98bf7

1 file changed: 41 additions and 1 deletion

keps/sig-node/5419-pod-level-resources-in-place-resize/README.md

@@ -5,8 +5,10 @@
 - [Summary](#summary)
 - [Motivation](#motivation)
 - [Goals](#goals)
+- [Non Goals](#non-goals)
 - [Proposal](#proposal)
 - [Notes/Constraints/Caveats](#notesconstraintscaveats)
+- [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
 - [Design Principles](#design-principles)
 - [Components/Features changes](#componentsfeatures-changes)
@@ -102,7 +104,25 @@ This proposal aims to:
 1. Extend the In-Place Pod Resize (IPPR) functionality to support dynamic
 adjustments of pod-level CPU and Memory resources.
 2. Ensure compatibility and proper interaction between pod-level IPPR and existing container-level IPPR mechanisms.
-3. Provide clear mechanisms for tracking and reporting the actual allocated pod-level resources in PodStatus
+3. Provide clear mechanisms for tracking and reporting the actual allocated
+pod-level resources in PodStatus
+
+### Non Goals
+
+1. This KEP focuses solely on in-place resizing of core compute resources (CPU and
+Memory) at the pod level. Extending this functionality to other resource types
+(e.g., GPUs, network bandwidth) is outside the current scope.
+
+2. This KEP does not aim to implement dynamic changes to a pod's QoS class based on
+in-place resource resize operations.
+
+3. No dynamic adjustments for Init Containers that have already finished and can't
+be restarted.
+
+4. No automatic removal of lower-priority pods to make room for a pod that's resizing its resources.
+
+5. This KEP doesn't aim to fix every complex timing issue that can happen between
+the Kubelet and the scheduler during resizes that already exist in [KEP#1287](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/1287-in-place-update-pod-resources/README.md).

 ## Proposal
 ### Notes/Constraints/Caveats
@@ -118,6 +138,26 @@ This proposal aims to:

 3. This feature relies on the PodLevelResources, InPlacePodVerticalScaling and InPlacePodLevelResourcesVerticalScaling feature gates being enabled.

+### Risks and Mitigations
+
+1. Backward compatibility: For pods with pod-level resources, when Pod.Spec.Resources
+becomes representative of desired state, and Pod's actual resource configurations are
+tracked in Pod.Status.Resources, applications that query PodSpec and rely on
+Resources in PodSpec to determine resource configurations will see values that
+may not represent actual configurations. As a mitigation, this change needs to be
+documented and highlighted in the release notes, and in
+top-level Kubernetes documents.
+
+2. Resizing memory lower: Lowering cgroup memory limits may not work as pages could
+be in use, and approaches such as setting limit near current usage may be
+required. This issue needs further investigation.
+
+3. Scheduler race condition: If a resize happens concurrently with the scheduler
+evaluating the node where the pod is resized, it can result in a node being
+over-scheduled, which will cause the pod to be rejected with an OutOfCPU or
+OutOfMemory error. Solving this race condition is out of scope for this KEP, but
+a general solution may be considered in the future.
+
 ## Design Details

 ### Design Principles
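
Risk 2 in the added "Risks and Mitigations" section mentions "setting limit near current usage" as one possible approach when a memory limit is lowered. The sketch below illustrates that idea as plain arithmetic only. It is not the kubelet's behavior and not something this commit specifies (the KEP text says the issue needs further investigation). The names `ResizeDecision` and `plan_memory_limit`, and the 16 MiB `headroom` default, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class ResizeDecision:
    """Hypothetical result type: what to apply now, and whether more work remains."""
    applied_limit: int  # limit (bytes) that could be set right away
    deferred: bool      # True when the requested limit was not reached yet


def plan_memory_limit(current_limit: int,
                      current_usage: int,
                      requested_limit: int,
                      headroom: int = 16 * MIB) -> ResizeDecision:
    """Pick a memory limit to apply now, never going below usage + headroom.

    All values are in bytes. `headroom` is an assumed safety margin,
    not a value taken from the KEP.
    """
    # Increases (or no-ops) do not run into the "pages in use" problem.
    if requested_limit >= current_limit:
        return ResizeDecision(requested_limit, False)

    # For a decrease, stay just above current usage if the request is lower.
    floor = current_usage + headroom
    applied = min(current_limit, max(requested_limit, floor))
    return ResizeDecision(applied, applied != requested_limit)


if __name__ == "__main__":
    # Usage (300 MiB) is above the requested limit (256 MiB): settle near usage.
    print(plan_memory_limit(512 * MIB, 300 * MIB, 256 * MIB))
    # Usage (100 MiB) is well below the requested limit: apply it directly.
    print(plan_memory_limit(512 * MIB, 100 * MIB, 256 * MIB))
```

A caller in this sketch would re-run the planning step later when `deferred` is true, once usage has dropped. Whether a real implementation retries, reports the resize as in progress, or rejects it is left open by the source.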
