net/mlx5e: Make DEFAULT_FRAG_SIZE relative to page size
When page size is 4K, DEFAULT_FRAG_SIZE of 2048 ensures that with 3
fragments per WQE, odd-indexed WQEs always share the same page with
their subsequent WQE, while WQEs consisting of 4 fragments do not.
However, this relationship does not hold for page sizes larger than 8K.
In this case, wqe_index_mask cannot guarantee that newly allocated WQEs
won't share the same page with old WQEs.
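The layout arithmetic can be reproduced with the standalone sketch below. It
is illustration only, not driver code: it assumes fragments of a fixed stride
are simply packed back to back across pages, which is a simplification of the
real fragment layout, and the helper name show_layout() is made up for this
example.

/* Standalone illustration only -- not mlx5e code. Fragments of a fixed
 * stride are assumed to be packed back to back across pages.
 */
#include <stdio.h>

static void show_layout(long page_size, long frag_stride,
			int frags_per_wqe, int num_wqes)
{
	long off = 0;

	printf("page_size=%ld frag_stride=%ld frags_per_wqe=%d\n",
	       page_size, frag_stride, frags_per_wqe);
	for (int w = 0; w < num_wqes; w++) {
		printf("  WQE %d: pages", w);
		for (int f = 0; f < frags_per_wqe; f++, off += frag_stride)
			printf(" %ld", off / page_size);
		printf("\n");
	}
}

int main(void)
{
	/* 4K pages, 2048-byte fragments, 3 frags per WQE: WQEs pair up,
	 * each pair spans three pages and no page is shared across pairs.
	 */
	show_layout(4096, 2048, 3, 4);

	/* 16K pages (larger than 8K), same fragment stride: several
	 * consecutive WQEs now land in the same page, so the pairwise
	 * grouping assumed for wqe_index_mask no longer holds.
	 */
	show_layout(16384, 2048, 3, 4);
	return 0;
}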
If the last WQE in a bulk processed by mlx5e_post_rx_wqes() shares a
page with its subsequent WQE, allocating a page for that WQE will
overwrite mlx5e_frag_page, preventing the original page from being
recycled. When the next WQE is processed, the newly allocated page will
be immediately recycled. In the next round, if these two WQEs are
handled in the same bulk, page_pool_defrag_page() will be called again
on the page, causing pp_frag_count to become negative[1].
Moreover, this can also lead to memory corruption, as the page may have
already been returned to the page pool and re-allocated to another WQE.
And since skb_shared_info is stored at the end of the first fragment,
its frags->bv_page pointer can be overwritten, leading to an invalid
memory access when processing the skb[2].
For example, on systems with an 8K page size (e.g. DEC Alpha) and a ConnectX-4
Lx MT27710 (MCX4121A-ACA_Ax) NIC, setting the MTU to 7657 or higher and
applying heavy network load (e.g. iperf) will first trigger a series of
WARNINGs[1] and eventually a crash[2].
Fix this by making DEFAULT_FRAG_SIZE always equal to half of the page
size.
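A minimal sketch of the change, assuming DEFAULT_FRAG_SIZE is a plain macro as
described above (the exact definition site and surrounding context are not
shown on this page; the upstream diff is authoritative):

-#define DEFAULT_FRAG_SIZE (2048)
+#define DEFAULT_FRAG_SIZE (PAGE_SIZE / 2)

On 4K page systems PAGE_SIZE / 2 is still 2048, so behavior there is
unchanged; on larger pages the fragment size scales with the page (e.g.
rerunning the sketch above with an 8192-byte fragment stride on 16K pages
reproduces the 4K pairing pattern).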
[1]
WARNING: CPU: 9 PID: 0 at include/net/page_pool/helpers.h:130
mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G W 6.6.0
walk_stackframe+0x0/0x190
show_stack+0x70/0x94
dump_stack_lvl+0x98/0xd8
dump_stack+0x2c/0x48
__warn+0x1c8/0x220
warn_slowpath_fmt+0x20c/0x230
mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
mlx5e_free_rx_wqes+0xcc/0x120 [mlx5_core]
mlx5e_post_rx_wqes+0x1f4/0x4e0 [mlx5_core]
mlx5e_napi_poll+0x1c0/0x8d0 [mlx5_core]
__napi_poll+0x58/0x2e0
net_rx_action+0x1a8/0x340
__do_softirq+0x2b8/0x480
[...]
[2]
Unable to handle kernel paging request at virtual address 393837363534333a
Oops [#1]
CPU: 72 PID: 0 Comm: swapper/72 Tainted: G W 6.6.0
Trace:
walk_stackframe+0x0/0x190
show_stack+0x70/0x94
die+0x1d4/0x350
do_page_fault+0x630/0x690
entMM+0x120/0x130
napi_pp_put_page+0x30/0x160
skb_release_data+0x164/0x250
kfree_skb_list_reason+0xd0/0x2f0
skb_release_data+0x1f0/0x250
napi_consume_skb+0xa0/0x220
net_rx_action+0x158/0x340
__do_softirq+0x2b8/0x480
irq_exit+0xd4/0x120
do_entInt+0x164/0x520
entInt+0x114/0x120
[...]
Fixes: 069d114 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Mingrui Cui <[email protected]>
Reviewed-by: Jacob Keller <[email protected]>
Signed-off-by: NipaLocal <nipa@local>