Commit 1676fb0
bpf: Introduce SK_BPF_MEMCG_FLAGS and SK_BPF_MEMCG_EXCLUSIVE.
If a socket has sk->sk_memcg with SK_MEMCG_EXCLUSIVE, it is decoupled
from the global protocol memory accounting.
This is controlled by net.core.memcg_exclusive sysctl, but it lacks
flexibility.
Let's support flagging (and clearing) SK_MEMCG_EXCLUSIVE via
bpf_setsockopt() at the BPF_CGROUP_INET_SOCK_CREATE hook.
  u32 flags = SK_BPF_MEMCG_EXCLUSIVE;

  bpf_setsockopt(ctx, SOL_SOCKET, SK_BPF_MEMCG_FLAGS,
                 &flags, sizeof(flags));
As with net.core.memcg_exclusive, the flag is inherited by child sockets,
and BPF always takes precedence over the sysctl at socket(2) and accept(2).
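The precedence and inheritance rules above can be pictured with a small
userspace model. This is only a sketch under stated assumptions: every
name in it (model_sock, model_bpf_opt, model_socket, model_accept,
MODEL_MEMCG_EXCLUSIVE) is hypothetical and is not the kernel's code, and
it assumes that a child simply copies the listener's flag at accept(2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag bit; not the kernel's definition. */
#define MODEL_MEMCG_EXCLUSIVE (1u << 0)

/* Toy socket: only the memcg flag word is modelled. */
struct model_sock {
	unsigned int memcg_flags;
};

/* What an attached sock_create prog did, if anything. */
struct model_bpf_opt {
	bool called;        /* did the prog call bpf_setsockopt()? */
	unsigned int flags; /* value it passed for SK_BPF_MEMCG_FLAGS */
};

/* socket(2): the sysctl gives the default, then BPF may set or clear it. */
void model_socket(struct model_sock *sk, bool sysctl_exclusive,
		  const struct model_bpf_opt *bpf)
{
	assert(sk != NULL);
	sk->memcg_flags = sysctl_exclusive ? MODEL_MEMCG_EXCLUSIVE : 0;
	if (bpf != NULL && bpf->called)
		sk->memcg_flags = bpf->flags & MODEL_MEMCG_EXCLUSIVE;
}

/* accept(2): assumed here to copy the listener's flag to the child. */
void model_accept(const struct model_sock *listener, struct model_sock *child)
{
	assert(listener != NULL && child != NULL);
	child->memcg_flags = listener->memcg_flags;
}
```

In this model, a prog that passes flags = 0 clears the bit even when the
sysctl default is on, and a child accepted from that listener stays
cleared as well.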
SK_BPF_MEMCG_FLAGS is supported only at BPF_CGROUP_INET_SOCK_CREATE
and not on other hooks, for the following reasons:
  1. UDP charges memory under sk->sk_receive_queue.lock instead
     of lock_sock()
  2. For TCP child sockets, memory accounting is adjusted only in
     __inet_accept(), to which sk->sk_memcg allocation is deferred
  3. Modifying the flag after skb is charged to sk would require such
     an adjustment during bpf_setsockopt() and complicate the logic
     unnecessarily
We can support other hooks later if a real use case justifies that.
Most changes are inline and hard to trace, but a microbenchmark on
__sk_mem_raise_allocated() during neper/tcp_stream showed that more
samples completed faster with SK_MEMCG_EXCLUSIVE. This will be more
visible under tcp_mem pressure.
  # bpftrace -e 'kprobe:__sk_mem_raise_allocated { @start[tid] = nsecs; }
    kretprobe:__sk_mem_raise_allocated /@start[tid]/
    { @end[tid] = nsecs - @start[tid]; @times = hist(@end[tid]); delete(@start[tid]); }'
  # tcp_stream -6 -F 1000 -N -T 256
Without bpf prog:
  [128, 256)          3846 |                                                    |
  [256, 512)       1505326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [512, 1K)        1371006 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     |
  [1K, 2K)          198207 |@@@@@@                                              |
  [2K, 4K)           31199 |@                                                   |
With bpf prog in the next patch:
  (must be attached before tcp_stream)
  # bpftool prog load sk_memcg.bpf.o /sys/fs/bpf/sk_memcg type cgroup/sock_create
  # bpftool cgroup attach /sys/fs/cgroup/test cgroup_inet_sock_create pinned /sys/fs/bpf/sk_memcg
  [128, 256)          6413 |                                                    |
  [256, 512)       1868425 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [512, 1K)        1101697 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
  [1K, 2K)          117031 |@@@@                                                |
  [2K, 4K)           11773 |                                                    |
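To put a number on "more samples completed faster", the share of calls
that finished in under 512 ns can be computed from the two histograms.
The helper below is a minimal sketch: permille_below and the two array
names are made up for illustration, and only the bucket counts come from
the output above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the per-mille share of samples that fall in the first k of n
 * histogram buckets (integer division, rounded down).
 */
unsigned long long permille_below(const unsigned long long *counts,
				  size_t n, size_t k)
{
	unsigned long long below = 0, total = 0;
	size_t i;

	assert(counts != NULL && k <= n);
	for (i = 0; i < n; i++) {
		total += counts[i];
		if (i < k)
			below += counts[i];
	}
	return total ? below * 1000 / total : 0;
}

/* Bucket counts copied from the histograms: [128, 256) ... [2K, 4K). */
const unsigned long long hist_without_bpf[] = {
	3846, 1505326, 1371006, 198207, 31199
};
const unsigned long long hist_with_bpf[] = {
	6413, 1868425, 1101697, 117031, 11773
};
```

Calling permille_below(hist, 5, 2) on each array gives 485 without the
bpf prog and 603 with it, i.e. roughly 48.5% versus 60.4% of calls
returned in under 512 ns.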
Signed-off-by: Kuniyuki Iwashima <[email protected]>

1 parent abbf2e7, commit 1676fb0
4 files changed, +49 -0 lines changed:
- include/uapi/linux
- mm
- net/core
- tools/include/uapi/linux