Promote sysctls to Beta #8804
Merged: k8s-ci-robot merged 5 commits into `kubernetes:release-1.11` from `ingvagabund:update-sysctl-docs` on Jun 15, 2018.
````diff
@@ -1,13 +1,15 @@
 ---
-title: Using Sysctls in a Kubernetes Cluster
+title: Using sysctls in a Kubernetes Cluster
 reviewers:
 - sttts
 content_template: templates/task
 ---
 
 {{% capture overview %}}
+{{< feature-state for_k8s_version="v1.11" state="beta" >}}
 
-This document describes how sysctls are used within a Kubernetes cluster.
+This document describes how to configure and use kernel parameters within a
+Kubernetes cluster using the sysctl interface.
 
 {{% /capture %}}
 
````
````diff
@@ -74,7 +76,7 @@ application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a
 flag of the kubelet, e.g.:
 
 ```shell
-$ kubelet --experimental-allowed-unsafe-sysctls \
+$ kubelet --allowed-unsafe-sysctls \
   'kernel.msg*,net.ipv4.route.min_pmtu' ...
 ```
 
````
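The renamed `--allowed-unsafe-sysctls` flag takes a comma-separated list in which each entry is either an exact sysctl name or a pattern ending in `*`. The sketch below is a hypothetical shell helper (`unsafe_sysctl_allowed` is not part of kubelet) showing how such a list could be read, assuming a trailing `*` means a prefix match and a lone `*` matches every sysctl, the same pattern rule this change documents for PodSecurityPolicy. `kernel.sem` is used only as an example of an unsafe sysctl that the list does not cover.

```shell
#!/bin/sh
# Hypothetical helper, NOT kubelet's implementation: succeeds if the
# sysctl name in $1 matches an entry of the comma-separated list in $2
# (e.g. the value given to --allowed-unsafe-sysctls).
# Assumption: an entry ending in '*' is a prefix pattern; '*' alone matches all.
unsafe_sysctl_allowed() {
  rest=$2,
  while [ -n "$rest" ]; do
    entry=${rest%%,*}   # first remaining entry
    rest=${rest#*,}     # drop it from the list
    case $entry in
      '') ;;            # skip empty entries
      *\*)
        prefix=${entry%?}   # strip the trailing '*'
        case $1 in "$prefix"*) return 0 ;; esac
        ;;
      *)
        [ "$1" = "$entry" ] && return 0   # exact name
        ;;
    esac
  done
  return 1
}

# Usage with the flag value from the example above.
flag='kernel.msg*,net.ipv4.route.min_pmtu'
for s in kernel.msgmax net.ipv4.route.min_pmtu kernel.sem; do
  if unsafe_sysctl_allowed "$s" "$flag"; then
    echo "$s: allowed on this node"
  else
    echo "$s: not allowed on this node"
  fi
done
```

The list is split with parameter expansion rather than unquoted word splitting, so an entry such as `*` is never glob-expanded against files in the current directory.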
````diff
@@ -89,36 +91,49 @@ Only _namespaced_ sysctls can be enabled this way.
 ## Setting Sysctls for a Pod
 
 A number of sysctls are _namespaced_ in today's Linux kernels. This means that
-they can be set independently for each pod on a node. Being namespaced is a
-requirement for sysctls to be accessible in a pod context within Kubernetes.
+they can be set independently for each pod on a node. Only namespaced sysctls
+are configurable via the pod securityContext within Kubernetes.
 
-The following sysctls are known to be _namespaced_:
+The following sysctls are known to be namespaced. This list could change
+in future versions of the Linux kernel.
 
 - `kernel.shm*`,
 - `kernel.msg*`,
 - `kernel.sem`,
 - `fs.mqueue.*`,
 - `net.*`.
 
-Sysctls which are not namespaced are called _node-level_ and must be set
-manually by the cluster admin, either by means of the underlying Linux
-distribution of the nodes (e.g. via `/etc/sysctls.conf`) or using a DaemonSet
-with privileged containers.
+Sysctls with no namespace are called _node-level_ sysctls. If you need to set
+them, you must manually configure them on each node's operating system, or by
+using a DaemonSet with privileged containers.
````

Contributor: 👍

````diff
 
-The sysctl feature is an alpha API. Therefore, sysctls are set using annotations
-on pods. They apply to all containers in the same pod.
+Use the pod securityContext to configure namespaced sysctls. The securityContext
+applies to all containers in the same pod.
 
-Here is an example, with different annotations for _safe_ and _unsafe_ sysctls:
+This example uses the pod securityContext to set a safe sysctl
+`kernel.shm_rmid_forced` and two unsafe sysctls `net.ipv4.route.min_pmtu` and
+`kernel.msgmax` There is no distinction between _safe_ and _unsafe_ sysctls in
+the specification.
 
+{{< warning >}}
+Only modify sysctl parameters after you understand their effects, to avoid
+destabilizing your operating system.
+{{< /warning >}}
+
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: sysctl-example
-  annotations:
-    security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1
-    security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3
 spec:
+  securityContext:
+    sysctls:
+    - name: kernel.shm_rmid_forced
+      value: "0"
+    - name: net.ipv4.route.min_pmtu
+      value: "552"
+    - name: kernel.msgmax
+      value: "65536"
   ...
 ```
 {{% /capture %}}
````
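The namespaced versus node-level split in this hunk amounts to a name check against the listed entries. A minimal shell sketch, assuming the five entries quoted in the change are the complete list (the new text itself warns that the list could change with future kernels); `sysctl_scope` is a hypothetical helper, and `vm.swappiness` appears only as an example of a sysctl outside the list.

```shell
#!/bin/sh
# Hypothetical helper: classify a sysctl name using ONLY the list quoted
# in this change (kernel.shm*, kernel.msg*, kernel.sem, fs.mqueue.*, net.*).
# Assumption: that list is complete; it is not an authoritative kernel check.
sysctl_scope() {
  case $1 in
    kernel.shm*|kernel.msg*|kernel.sem|fs.mqueue.*|net.*)
      echo namespaced ;;   # may be set per pod via securityContext
    *)
      echo node-level ;;   # must be configured on each node instead
  esac
}

for s in kernel.shm_rmid_forced net.ipv4.route.min_pmtu kernel.msgmax vm.swappiness; do
  printf '%s: %s\n' "$s" "$(sysctl_scope "$s")"
done
```

All three sysctls from the pod example classify as `namespaced`, which is why they can appear under `securityContext.sysctls`; a `node-level` result means the value has to be set on the node's operating system or through a privileged DaemonSet, as the changed paragraph says.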
````diff
@@ -143,27 +158,52 @@ is recommended to use
 [taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
 to schedule those pods onto the right nodes.
 
-## PodSecurityPolicy Annotations
+## PodSecurityPolicy
+
+You can further control which sysctls can be set in pods by specifying lists of
+sysctls or sysctl patterns in the `forbiddenSysctls` and/or
+`allowedUnsafeSysctls` fields of the PodSecurityPolicy. A sysctl pattern ends
+with a `*` character, such as `kernel.*`. A `*` character on its own matches
+all sysctls.
+
+By default, all safe sysctls are allowed.
+
+Both `forbiddenSysctls` and `allowedUnsafeSysctls` are lists of plain sysctl names
+or sysctl patterns (which end with `*`). The string `*` matches all sysctls.
 
-The use of sysctl in pods can be controlled via annotation on the PodSecurityPolicy.
+The `forbiddenSysctls` field excludes specific sysctls. You can forbid a
+combination of safe and unsafe sysctls in the list. To forbid setting any
+sysctls, use `*` on its own.
 
-Sysctl annotation represents a whitelist of allowed safe and unsafe sysctls
-in a pod spec. It's a comma-separated list of plain sysctl names or sysctl patterns
-(which end in `*`). The string `*` matches all sysctls.
+If you specify any unsafe sysctl in the `allowedUnsafeSysctls` field and it is
+not present in the `forbiddenSysctls` field, that sysctl can be used in Pods
+using this PodSecurityPolicy. To allow all unsafe sysctls in the
+PodSecurityPolicy to be set, use `*` on its own.
 
-Here is an example, it authorizes binding user creating pod with corresponding sysctls.
+Do not configure these two fields such that there is overlap, meaning that a
+given sysctl is both allowed and forbidden.
 
+{{< warning >}}
+**Warning**: If you whitelist unsafe sysctls via the `allowedUnsafeSysctls` field
+in a PodSecurityPolicy, any pod using such a sysctl will fail to start
+if the sysctl is not whitelisted via the `--allowed-unsafe-sysctls` kubelet
+flag as well on that node.
````

Contributor: 👍

````diff
+{{< /warning >}}
+
+This example allows unsafe sysctls prefixed with `kernel.msg` to be set and
+disallows setting of the `kernel.shm_rmid_forced` sysctl.
+
 ```yaml
 apiVersion: policy/v1beta1
 kind: PodSecurityPolicy
 metadata:
   name: sysctl-psp
-  annotations:
-    security.alpha.kubernetes.io/sysctls: 'net.ipv4.route.*,kernel.msg*'
 spec:
+  allowedUnsafeSysctls:
+  - kernel.msg*
+  forbiddenSysctls:
+  - kernel.shm_rmid_forced
   ...
 ```
 
 {{% /capture %}}
````
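The PodSecurityPolicy rules in this hunk can be read as a small decision procedure: a match in `forbiddenSysctls` rejects the sysctl, safe sysctls are allowed by default, and any other (unsafe) sysctl is allowed only if it matches `allowedUnsafeSysctls`. Giving `forbiddenSysctls` precedence follows the text's "and it is not present in the `forbiddenSysctls` field" condition. This is a hypothetical shell sketch, not the admission controller's code: `matches_any`, `psp_allows`, and `SAFE_SYSCTLS` are illustrative names, and `SAFE_SYSCTLS` is assumed to contain only `kernel.shm_rmid_forced`, the one safe sysctl named in the pod example.

```shell
#!/bin/sh
# Hypothetical sketch of the PodSecurityPolicy sysctl rules; NOT the real
# admission code. Assumption: SAFE_SYSCTLS is illustrative and holds only
# the safe sysctl named in the pod example.
SAFE_SYSCTLS='kernel.shm_rmid_forced'

# matches_any NAME LIST: succeed if NAME equals an entry of the
# comma-separated LIST, or starts with an entry's text before a trailing '*'.
matches_any() {
  rest=$2,
  while [ -n "$rest" ]; do
    entry=${rest%%,*}
    rest=${rest#*,}
    case $entry in
      '') ;;
      *\*)
        prefix=${entry%?}
        case $1 in "$prefix"*) return 0 ;; esac
        ;;
      *)
        [ "$1" = "$entry" ] && return 0
        ;;
    esac
  done
  return 1
}

# psp_allows NAME FORBIDDEN ALLOWED_UNSAFE
psp_allows() {
  # forbiddenSysctls takes precedence over everything else
  if matches_any "$1" "$2"; then return 1; fi
  # safe sysctls are allowed by default
  if matches_any "$1" "$SAFE_SYSCTLS"; then return 0; fi
  # any other (unsafe) sysctl must match allowedUnsafeSysctls
  matches_any "$1" "$3"
}

# The sysctl-psp example: allowedUnsafeSysctls [kernel.msg*],
# forbiddenSysctls [kernel.shm_rmid_forced].
for s in kernel.shm_rmid_forced kernel.msgmax net.ipv4.route.min_pmtu; do
  if psp_allows "$s" 'kernel.shm_rmid_forced' 'kernel.msg*'; then
    echo "$s: allowed"
  else
    echo "$s: rejected"
  fi
done
```

With the `sysctl-psp` values, only `kernel.msgmax` is allowed; `kernel.shm_rmid_forced` is rejected because it is forbidden, and `net.ipv4.route.min_pmtu` because it is unsafe and not listed. Passing this check is not sufficient on its own: per the warning in the diff, the pod still fails to start unless the node's kubelet also allows the sysctl through `--allowed-unsafe-sysctls`.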
Why is this text changed? What is an "underlying node"?

Even with the change above, it still makes sense to me. Though, if it is still generally unclear, I can undo the change. @sttts WDYT?

I think I introduced this change to try to make it clear that you'd be messing with the underlying OS of the node, outside the scope of Kubernetes.

My point is that a node is a fixed concept in kube. What should the underlying node be? Write "underlying operating system" or "underlying machine".
Also, "operating system" is misleading. The operating system is visible all the time to each container, via the kernel's syscalls. The point here is that the distribution's user-level tooling has to be used, not the operating system in the sense of the kernel visible to containers.

Thanks, that helps. Let me try to fix this better.