Commit 4ba2a6f

Merge pull request #780 from gnufied/csi-volume-resize
Start adding a KEP for csi volume resizing
2 parents d77bd5e + c4f4b1b commit 4ba2a6f

---
title: Support for CSI volume resizing
authors:
- "@gnufied"
owning-sig: sig-storage
participating-sigs:
- sig-storage
reviewers:
- "@saad-ali"
- "@jsafrane"
approvers:
- "@saad-ali"
- "@childsb"
creation-date: 2019-01-29
last-updated: 2019-01-29
status: implementable
see-also:
- "[Kubernetes Volume expansion](https://github.com/kubernetes/enhancements/issues/284)"
- "[Online resizing design](https://github.com/kubernetes/enhancements/pull/737)"
replaces:
superseded-by:
---

# Support for CSI volume resizing
## Table of Contents

* [Summary](#summary)
* [Motivation](#motivation)
  * [Goals](#goals)
  * [Non-Goals](#non-goals)
* [Proposal](#proposal)
  * [External resize controller](#external-resize-controller)
  * [Expansion on Kubelet](#expansion-on-kubelet)
    * [Offline volume resizing on kubelet](#offline-volume-resizing-on-kubelet)
    * [Online volume resizing on kubelet](#online-volume-resizing-on-kubelet)
  * [Risks and Mitigations](#risks-and-mitigations)
* [Test Plan](#test-plan)
* [Graduation Criteria](#graduation-criteria)
* [Implementation History](#implementation-history)
## Summary

To bring CSI volumes to feature parity with in-tree volumes, we need to implement support for resizing CSI volumes.

## Motivation

We recently added volume resizing support to the CSI spec. This proposal implements that feature in Kubernetes. Any CSI volume plugin that implements the necessary parts of the CSI spec will become resizable.
### Goals

To enable expansion of CSI volumes used by `PersistentVolumeClaim`s when the underlying CSI plugin supports volume expansion as a plugin capability.

### Non-Goals

The expansion capability of a CSI plugin will not be validated via a CSI RPC call when a user edits the PVC (i.e., the existing resize admission controller will not make CSI RPC calls). The responsibility for actually enabling expansion on specific StorageClasses (for example, by setting `allowVolumeExpansion: true`) still falls on the Kubernetes admin.

## Proposal

The design of CSI volume resizing consists of two parts: an external resize controller and volume expansion on kubelet.
### External resize controller

To support resizing of CSI volumes, an external resize controller will monitor all PVCs. If a PVC meets the following criteria, it will be added to the controller's workqueue:

- The driver name discovered from the PVC matches the name of the driver currently known to the external resize controller (obtained by querying driver info via a CSI RPC call).
- When the PVC is updated, comparing the old and new PVC objects shows that the user has requested more space.

Once a PVC is picked from the workqueue, the controller also compares the requested PVC size with the actual size of the volume recorded in the `PersistentVolume` object. If the PVC passes all these checks and the CSI plugin implements the `ControllerExpandVolume` RPC call, the controller will make a CSI `ControllerExpandVolume` call.

If the `ControllerExpandVolume` call succeeds and the plugin implements `NodeExpandVolume`:
- If `ControllerExpandVolumeResponse` returns `true` in `node_expansion_required`, the `FileSystemResizePending` condition will be added to the PVC and a `NodeExpandVolume` operation will be queued on kubelet. The volume size reported by the PV will also be updated to the new value.
- If `ControllerExpandVolumeResponse` returns `false` in `node_expansion_required`, the volume resize operation will be marked finished and both `pvc.Status.Capacity` and `pv.Spec.Capacity` will report the updated value.

If the plugin does not implement `NodeExpandVolume`, the volume resize operation will be marked finished and both `pvc.Status.Capacity` and `pv.Spec.Capacity` will report the updated value after the `ControllerExpandVolume` RPC call completes successfully.

If the `ControllerExpandVolume` call fails:
- The PVC will retain the `Resizing` condition and will have appropriate events added to it.
- The controller will retry the resize operation with exponential backoff, on the assumption that the failure is transient and will correct itself.

A general mechanism for recovering from resize failures will be implemented via https://github.com/kubernetes/kubernetes/issues/73036.
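The controller flow above can be sketched as a small state model. This is a hypothetical Python illustration, not the external resize controller's code: `PVC`, `PV`, `should_enqueue`, `needs_controller_expand`, `on_expand_success` and `on_expand_failure` are made-up names, objects are reduced to sizes in bytes plus a list of condition names, events are not modeled, and the backoff base and cap values are arbitrary assumptions. `capacity_bytes` and `node_expansion_required` mirror fields of the CSI `ControllerExpandVolumeResponse`.

```python
from dataclasses import dataclass, field
from typing import List

GiB = 1024 ** 3

# Hypothetical, simplified stand-ins for the Kubernetes objects (sizes in bytes).
@dataclass
class PVC:
    driver_name: str        # CSI driver name discovered from the PVC
    requested: int          # pvc.spec.resources.requests.storage
    status_capacity: int    # pvc.Status.Capacity
    conditions: List[str] = field(default_factory=list)

@dataclass
class PV:
    capacity: int           # pv.Spec.Capacity


def should_enqueue(old: PVC, new: PVC, known_driver: str) -> bool:
    """Workqueue filter: our driver, and the update asks for more space."""
    return new.driver_name == known_driver and new.requested > old.requested


def needs_controller_expand(pvc: PVC, pv: PV) -> bool:
    """After dequeue: resize only if the PV is still smaller than the request."""
    return pvc.requested > pv.capacity


def on_expand_success(pvc: PVC, pv: PV, capacity_bytes: int,
                      node_expand_supported: bool,
                      node_expansion_required: bool) -> str:
    """Apply a successful ControllerExpandVolume result to the PV and PVC."""
    pv.capacity = capacity_bytes  # the PV reports the new size in every success case
    # Assumption: the Resizing condition is cleared once the controller step succeeds.
    if "Resizing" in pvc.conditions:
        pvc.conditions.remove("Resizing")
    if node_expand_supported and node_expansion_required:
        pvc.conditions.append("FileSystemResizePending")  # kubelet finishes the resize
        return "node-expansion-pending"
    pvc.status_capacity = capacity_bytes  # resize is finished
    return "finished"


def on_expand_failure(pvc: PVC, attempt: int,
                      base_s: float = 1.0, cap_s: float = 300.0) -> float:
    """PVC keeps Resizing; return the next exponential backoff delay in seconds."""
    if "Resizing" not in pvc.conditions:
        pvc.conditions.append("Resizing")
    return min(base_s * (2 ** attempt), cap_s)
```

In this sketch the PV size is always updated on success, while `pvc.Status.Capacity` is only updated when no node-side step remains, matching the two bullets above.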
### Expansion on Kubelet

A CSI volume may require expansion on the node to finish volume resizing. In some cases, the entire resize operation can happen on the node, and the plugin may choose not to implement the `ControllerExpandVolume` CSI RPC call at all.

Kubernetes currently supports two modes of performing volume resize on kubelet; each is described below. For more information, please refer to the original volume resize proposal: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/grow-volume-size.md.
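Taken together, the controller and kubelet paths mean a resize passes through different RPCs depending on which ones the plugin implements. The hypothetical helper below (`Step` and `plan_resize` are illustrative names, not part of any Kubernetes or CSI API) summarizes those combinations; the node-only branch reflects the note above that a plugin may skip `ControllerExpandVolume` entirely, and `node_expansion_required` mirrors the `ControllerExpandVolumeResponse` field.

```python
from enum import Enum
from typing import List


class Step(Enum):
    CONTROLLER_EXPAND = "ControllerExpandVolume (external resize controller)"
    NODE_EXPAND = "NodeExpandVolume (kubelet)"


def plan_resize(controller_expand: bool, node_expand: bool,
                node_expansion_required: bool = True) -> List[Step]:
    """Return the RPCs a resize runs through, given which ones the plugin implements."""
    steps: List[Step] = []
    if controller_expand:
        steps.append(Step.CONTROLLER_EXPAND)
        # Node expansion follows only if the controller response asks for it.
        if node_expand and node_expansion_required:
            steps.append(Step.NODE_EXPAND)
    elif node_expand:
        # Node-only plugin: the entire resize happens on the node.
        steps.append(Step.NODE_EXPAND)
    # A plugin implementing neither RPC yields no steps (not resizable).
    return steps
```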
#### Offline volume resizing on kubelet

This is the default mode. In this mode, `NodeExpandVolume` will only be called when the volume is being mounted on the node. In other words, the pod that was using the volume must be re-created for expansion on the node to happen.

When a pod that uses the PVC is started, kubelet will compare `pvc.spec.resources.requests.storage` with `pvc.Status.Capacity`. It also compares the PVC's requested size with `pv.Spec.Capacity`. If the PV reports the same size as the PVC's spec but the PVC's status still reports a smaller value, kubelet determines that a volume expansion is pending on the node. At this point, if the plugin implements the `NodeExpandVolume` RPC call, kubelet will call it.

If `NodeExpandVolume` succeeds, kubelet will update `pvc.Status.Capacity` with the latest value and remove all resize-related conditions from the PVC.

If `NodeExpandVolume` fails, kubelet will add an event about the failed resize to both the PVC and the Pod, and the resize operation will be retried. The failure will prevent the pod from starting up.
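The offline check and mount-time handling described above can be sketched as follows. This is a hypothetical Python model, not kubelet code: `VolumeSizes`, `node_expansion_pending` and `on_mount` are illustrative names, `node_expand` is a callable standing in for the `NodeExpandVolume` RPC, a raised `RuntimeError` is assumed to signal failure, and events and condition removal are only noted in comments. The pending check follows the text literally (PV capacity equal to the PVC spec).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VolumeSizes:
    pvc_spec: int      # pvc.spec.resources.requests.storage
    pvc_status: int    # pvc.Status.Capacity
    pv_capacity: int   # pv.Spec.Capacity


def node_expansion_pending(s: VolumeSizes) -> bool:
    """PV already reports the requested size, but the PVC status still lags behind."""
    return s.pv_capacity == s.pvc_spec and s.pvc_status < s.pvc_spec


def on_mount(s: VolumeSizes, node_expand_supported: bool,
             node_expand: Callable[[], None]) -> str:
    """Offline mode: runs while kubelet mounts the volume for a starting pod."""
    if not (node_expansion_pending(s) and node_expand_supported):
        return "start-pod"
    try:
        node_expand()             # stands in for the NodeExpandVolume RPC
    except RuntimeError:
        return "retry"            # events on PVC and Pod; the pod does not start yet
    s.pvc_status = s.pvc_spec     # resize-related conditions would also be removed
    return "start-pod"
```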
#### Online volume resizing on kubelet

More details about online resizing can be found in [Online resizing design](https://github.com/kubernetes/enhancements/pull/737). In short, if the `ExpandInUsePersistentVolumes` feature gate is enabled, kubelet will periodically poll all PVCs that are being used on the node, compare `pvc.spec.resources.requests.storage` with `pvc.Status.Capacity` (and `pv.Spec.Capacity`), and make the same determination about whether node expansion is required for the volume.

In this mode, `NodeExpandVolume` can be called while the pod is running and the volume is in use. If, using the check described above, kubelet determines that volume expansion is needed on the node and the plugin implements the `NodeExpandVolume` RPC call, kubelet will call it (provided the volume has already been staged and published on the node).

If `NodeExpandVolume` succeeds, kubelet will update `pvc.Status.Capacity` with the latest value and remove all resize-related conditions from the PVC.

If `NodeExpandVolume` fails, kubelet will add an event about the failed resize to both the PVC and the Pod, and the resize operation will be retried.
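One pass of the online polling described above might look like the sketch below. It is a hypothetical Python model, not kubelet code: `InUseVolume` and `poll_once` are illustrative names, `node_expand` stands in for the `NodeExpandVolume` RPC (passing `None` models a plugin that does not implement it), a raised `RuntimeError` is assumed to signal failure, and `feature_enabled` stands in for the `ExpandInUsePersistentVolumes` feature gate. Volumes not yet staged and published are skipped, and failed volumes are simply retried on a later pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class InUseVolume:
    pvc_spec: int               # pvc.spec.resources.requests.storage
    pvc_status: int             # pvc.Status.Capacity
    pv_capacity: int            # pv.Spec.Capacity
    staged_and_published: bool  # volume already staged and published on this node


def poll_once(volumes: Dict[str, InUseVolume], feature_enabled: bool,
              node_expand: Optional[Callable[[str], None]]) -> List[str]:
    """One polling pass; returns names of volumes whose online expansion failed."""
    failed: List[str] = []
    if not feature_enabled or node_expand is None:
        return failed
    for name, v in volumes.items():
        pending = v.pv_capacity == v.pvc_spec and v.pvc_status < v.pvc_spec
        if not (pending and v.staged_and_published):
            continue
        try:
            node_expand(name)          # called while the pod keeps running
            v.pvc_status = v.pvc_spec  # conditions would be cleared as well
        except RuntimeError:
            failed.append(name)        # events on PVC and Pod; retried next pass
    return failed
```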
### Risks and Mitigations

Before this feature goes GA, we need to handle recovering from resize failures (https://github.com/kubernetes/kubernetes/issues/73036).
## Test Plan

* Unit tests for the external resize controller.
* E2E tests in Kubernetes that use the csi-mock driver for volume resizing:
  - (positive) Given a plugin that supports both control plane and node side resize, a CSI volume should be resizable and the resize should complete successfully.
  - (positive) Given a plugin that only requires control plane resize, a CSI volume should be resizable and the resize should complete successfully.
  - (positive) Given a plugin that only requires node side resize, a CSI volume should be resizable and the resize should complete successfully.
  - (positive) Given a plugin that supports online resizing, a CSI volume should be resizable and the online resize operation should complete successfully.
  - (negative) If control plane resize fails, the PVC should have appropriate events.
  - (negative) If node side resize fails, both the pod and the PVC should have appropriate events.
## Graduation Criteria

Once implemented, CSI volumes should be resizable, in line with the current in-tree implementation of volume resizing.

- *Alpha*: Initial support for CSI volume resizing. Released code will include an external CSI volume resize controller and changes to kubelet. The implementation will have unit tests and csi-mock driver e2e tests.
- *Beta*: More robust support for CSI volume resizing, including recovery from resize failures. Add e2e tests that use real drivers (`gce-pd` and `ebs` at minimum). Add metrics for volume resize operations.
- *GA*: CSI resizing in general will only go GA after the existing [Volume expansion](https://github.com/kubernetes/enhancements/issues/284) feature goes GA. Online resizing of CSI volumes depends on the [Online resizing](https://github.com/kubernetes/enhancements/pull/737) feature and will be available as a GA feature only when that feature goes GA.

Hopefully the content previously contained in [umbrella issues][] will be tracked in the `Graduation Criteria` section.

[umbrella issues]: https://github.com/kubernetes/kubernetes/issues/62096
## Implementation History

Major milestones in the life cycle of a KEP should be tracked in `Implementation History`.
Major milestones might include:

- the `Summary` and `Motivation` sections being merged signaling SIG acceptance
- the `Proposal` section being merged signaling agreement on a proposed design
- the date implementation started
- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
