B4/xsk #10138

bastien-curutchet · 2025-10-29T13:20:23Z


Hi all,

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs are defined in xskxceiver.c. Since this script is used to test real hardware, the goal here is to leave it as it is, and only integrate the tests that run on veth peers into the test_progs framework.

I've looked into what could improve the speed in the CI:
- some tests are skipped when run on veth peers in a VM (because they rely on huge page allocation or HW rings). This skipping logic still takes some time and can be easily avoided.
- the TEARDOWN test is quite long (several seconds on its own) because it runs the same test 10 times in a row to ensure the teardown process works properly

With these tests fully skipped in the CI and the veth setup done only once for each mode (DRV / SKB), the execution time is reduced to about 5 seconds on my setup.

```
$ tools/testing/selftests/bpf/vmtest.sh -d $HOME/ebpf/output-regular/ -- time ./test_progs -t xsk
[...]
real 0m 5.04s
user 0m 0.38s
sys 0m 1.61s
```

It still feels a bit long, but there are 24 tests run in both DRV and SKB modes, which means around 100ms for each one. I'm not sure I can make it much faster without randomizing the tests so that not all of them run in every CI execution.

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test.
PATCH 8 to 13 handle all errors to release resources instead of calling exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI.
PATCH 15 integrates the CI tests into the test_progs framework.

To: Björn Töpel <[email protected]>
To: Magnus Karlsson <[email protected]>
To: Maciej Fijalkowski <[email protected]>
To: Jonathan Lemon <[email protected]>
To: Alexei Starovoitov <[email protected]>
To: Daniel Borkmann <[email protected]>
To: Andrii Nakryiko <[email protected]>
To: Martin KaFai Lau <[email protected]>
To: Eduard Zingerman <[email protected]>
To: Song Liu <[email protected]>
To: Yonghong Song <[email protected]>
To: John Fastabend <[email protected]>
To: KP Singh <[email protected]>
To: Stanislav Fomichev <[email protected]>
To: Hao Luo <[email protected]>
To: Jiri Olsa <[email protected]>
To: Mykola Lysenko <[email protected]>
To: Shuah Khan <[email protected]>
To: David S. Miller <[email protected]>
To: Jakub Kicinski <[email protected]>
To: Jesper Dangaard Brouer <[email protected]>
Cc: Thomas Petazzoni <[email protected]>
Cc: Alexis Lothore <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

---
Changes in v7:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v6: https://lore.kernel.org/r/[email protected]

Changes in v6:
- Set up the veth peer once for each mode instead of once for each subtest
- Rename the 'flaky' table to 'skip-ci' and move the automatically skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes testapp_stats_rx_dropped(), the second one fixes testapp_xdp_shared_umem(). The unnecessary frees (in testapp_stats_rx_full() and testapp_stats_fill_empty()) are removed
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6 ("selftests/bpf: Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xskxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1, 7 and 8)
- Link to v1: https://lore.kernel.org/r/[email protected]

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 7,
    "change-id": "20250218-xsk-0cf90e975d14",
    "prefixes": [
      "bpf-next"
    ],
    "history": {
      "v1": [
        "[email protected]"
      ],
      "v2": [
        "[email protected]"
      ],
      "v3": [
        "[email protected]"
      ],
      "v4": [
        "[email protected]"
      ],
      "v5": [
        "[email protected]"
      ],
      "v6": [
        "[email protected]"
      ]
    }
  }
}

AF_XDP features are tested by the test_xsk.sh script but not by the test_progs framework. The tests used by the script are defined in xskxceiver.c, which can't be integrated into the test_progs framework as is. Extract these test definitions from xskxceiver{.c/.h} and put them in new test_xsk{.c/.h} files. Keep the main() function and its unshared dependencies in xskxceiver to avoid impacting the test_xsk.sh script, which is often used to test real hardware. Move ksft_test_result_*() calls to xskxceiver.c to keep the kselftest's report valid. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

bitmap is used before being initialized. Initialize it to zero before using it. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
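
A minimal sketch of the pattern this commit addresses, not the patch itself: a bitmap whose bits are only ever OR-ed in must start from zero, otherwise stale stack contents look like set bits. `DEMO_NBITS`, `demo_set_bit()`, `demo_count_bits()` and `demo_mark_two()` are hypothetical names chosen for illustration and do not come from test_xsk.c.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NBITS 64 /* hypothetical bitmap size, not taken from the patch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS ((DEMO_NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Set one bit; the result is only meaningful if the bitmap was zeroed first. */
static void demo_set_bit(unsigned long *map, size_t bit)
{
	map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Count the set bits among the first nbits. */
static size_t demo_count_bits(const unsigned long *map, size_t nbits)
{
	size_t i, count = 0;

	for (i = 0; i < nbits; i++)
		if (map[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD)))
			count++;
	return count;
}

/* Zero the bitmap, mark two entries, and report how many bits are set. */
static size_t demo_mark_two(void)
{
	unsigned long map[DEMO_NWORDS];

	memset(map, 0, sizeof(map)); /* the initialization step */
	demo_set_bit(map, 3);
	demo_set_bit(map, 10);
	return demo_count_bits(map, DEMO_NBITS);
}
```

Without the `memset()`, the count returned by `demo_mark_two()` would depend on whatever happened to be on the stack.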

__testapp_validate_traffic() is supposed to return an integer value that tells whether the test passed (0), failed (-1) or was skipped (2). It actually returns a boolean in the end. This does no harm when the test is successful but can lead to misinterpretation in case of failure, as 1 will be returned instead of -1. Return TEST_FAILURE (-1) in case of failure, TEST_PASS (0) otherwise. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
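
The difference can be sketched as follows. The numeric values are the ones quoted in the commit message; `demo_result_buggy()` and `demo_result_fixed()` are hypothetical helpers for illustration, not functions from the series.

```c
#include <stdbool.h>

/* Values as stated in the commit message above. */
#define TEST_PASS 0
#define TEST_FAILURE (-1)
#define TEST_SKIP 2

/* Buggy shape: a boolean leaks out, so a failure is reported as 1. */
static int demo_result_buggy(bool failed)
{
	return failed;
}

/* Fixed shape: map the flag explicitly onto the documented codes. */
static int demo_result_fixed(bool failed)
{
	return failed ? TEST_FAILURE : TEST_PASS;
}
```

A caller that compares the result against `TEST_FAILURE` misses the failure in the buggy shape, because 1 matches none of the three documented codes.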

testapp_stats_rx_dropped() generates pkt_stream twice. The last generated is released by pkt_stream_restore_default() at the end of the test but we lose the pointer of the first pkt_stream. Release the 'middle' pkt_stream when it's getting replaced to prevent memory leaks. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
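
A simplified sketch of the leak and its fix, under the assumption that a stream is a single heap allocation. `struct demo_stream`, `demo_stream_generate()`, `demo_stream_delete()`, `demo_stream_replace()` and the `demo_live_streams` counter are hypothetical stand-ins, not the selftest's real pkt_stream helpers.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the selftest's packet stream type. */
struct demo_stream {
	int nb_pkts;
};

static int demo_live_streams; /* allocations not yet freed */

static struct demo_stream *demo_stream_generate(int nb_pkts)
{
	struct demo_stream *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->nb_pkts = nb_pkts;
	demo_live_streams++;
	return s;
}

static void demo_stream_delete(struct demo_stream *s)
{
	if (!s)
		return;
	free(s);
	demo_live_streams--;
}

/* Replace *slot with a new stream, releasing the previous one first. */
static int demo_stream_replace(struct demo_stream **slot, int nb_pkts)
{
	struct demo_stream *fresh = demo_stream_generate(nb_pkts);

	if (!fresh)
		return -1;
	demo_stream_delete(*slot); /* without this, the old stream leaks */
	*slot = fresh;
	return 0;
}
```

Generating twice into the same slot then leaves exactly one live stream, which the end-of-test restore can release.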

testapp_xdp_shared_umem() generates pkt_stream on each xsk from xsk_arr, where normally xsk_arr[0] gets pkt_streams and xsk_arr[1] has them NULLed. At the end of the test pkt_stream_restore_default() only releases xsk_arr[0], which leads to memory leaks. Release the missing pkt_stream at the end of testapp_xdp_shared_umem(). Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

The clean-up done at the end of a test in __testapp_validate_traffic() isn't wrapped in a function, which is inconvenient if we want to use it somewhere else in the code. Wrap the clean-up in two new functions: the first deletes the sockets, the second releases the umem. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

testapp_validate_traffic() doesn't release the sockets and the umem created by the threads if the test isn't currently in its last step. Thus, if the swap_xsk_resources() fails before the last step, the created resources aren't cleaned up. Clean the sockets and the umem in case of swap_xsk_resources() failure. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

init_iface() doesn't have any return value although it can fail. In case of failure it calls exit_with_error(), which exits the application immediately. This prevents the following tests from being run and isn't compliant with the CI. Add a return value to init_iface() so errors can be handled more smoothly. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
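
The general shape of this change, which the following error-handling commits repeat, might look like the sketch below. It is only an illustration under simplified assumptions: `struct demo_iface`, `demo_init_iface()` and `demo_run_test()` are hypothetical, and the real init_iface() performs far more setup.

```c
#include <stddef.h>

#define TEST_PASS 0
#define TEST_FAILURE (-1)

/* Hypothetical interface descriptor; the real struct has more fields. */
struct demo_iface {
	const char *name;
	int ready;
};

/* Returns 0 on success and -1 on failure; it never calls exit(). */
static int demo_init_iface(struct demo_iface *ifobj, const char *name)
{
	if (!name || !name[0])
		return -1;
	ifobj->name = name;
	ifobj->ready = 1;
	return 0;
}

/* The caller turns a setup error into a test failure and keeps running. */
static int demo_run_test(const char *ifname)
{
	struct demo_iface ifobj = { 0 };

	if (demo_init_iface(&ifobj, ifname))
		return TEST_FAILURE;
	return TEST_PASS;
}
```

Because the error travels up as a return value, the test runner can report the failure and still move on to the next test.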

xsk_reattach_xdp calls exit_with_error() on failures. This exits the program immediately. It prevents the following tests from being run and isn't compliant with the CI. Add a return value to the functions handling XDP attachments to handle errors more smoothly. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

exit_with_error() is called when gettimeofday() fails. This exits the program immediately. It prevents the following tests from being run and isn't compliant with the CI. Return TEST_FAILURE instead of calling exit_with_error(). Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

TX and RX workers can fail in many places. These failures trigger a call to exit_with_error() which exits the program immediately. It prevents the following tests from running and isn't compliant with the CI. Add return value to functions that can fail. Handle failures more smoothly through report_failure(). Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

…ails __testapp_validate_traffic() calls exit_with_error() on failures. This exits the program immediately. It prevents the following tests from running and isn't compliant with the CI. Return TEST_FAILURE instead of calling exit_with_error(). Release the resource of the 1st thread if a failure happens between its creation and the creation of the second thread. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

If any allocation in the pkt_stream_*() helpers fails, exit_with_error() is called. This terminates the program immediately. It prevents the following tests from running and isn't compliant with the CI. Return NULL in case of allocation failure. Return TEST_FAILURE when something goes wrong in the packet generation. Clean up the resources if a failure happens between two steps of a test. Move exit_with_error()'s definition into xskxceiver.c as it isn't used anywhere else now. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

The following tests won't fit in the CI:
- XDP_ADJUST_TAIL_* and SEND_RECEIVE_9K_PACKETS because of their flakiness
- UNALIGNED_* because they depend on huge page allocations
- *_RING_SIZE because they depend on HW rings
- TEARDOWN because it's too long

Remove these tests from the nominal tests table so they won't be run by the CI in an upcoming patch. Create a skip_ci_tests table to hold them. Use this skip_ci table in xskxceiver.c to keep all the tests available from the test_xsk.sh script.

Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
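
One way to picture the two-table split is sketched below. It is an assumption-laden illustration rather than the layout used in the series: `struct demo_test_spec`, the `demo_*` functions and the entries in the nominal table (SEND_RECEIVE, POLL_TX) are illustrative, and only TEARDOWN and SEND_RECEIVE_9K_PACKETS are names taken from the commit message.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical test descriptor: a name and a function returning 0 on pass. */
struct demo_test_spec {
	const char *name;
	int (*run)(void);
};

static int demo_pass(void)
{
	return 0;
}

/* Nominal table: what the CI runs (entry names are illustrative). */
static const struct demo_test_spec demo_ci_tests[] = {
	{ "SEND_RECEIVE", demo_pass },
	{ "POLL_TX", demo_pass },
};

/* Skip-CI table: still reachable from the standalone runner. */
static const struct demo_test_spec demo_skip_ci_tests[] = {
	{ "TEARDOWN", demo_pass },
	{ "SEND_RECEIVE_9K_PACKETS", demo_pass },
};

/* Run every test of a table and return how many passed. */
static size_t demo_run_table(const struct demo_test_spec *tests, size_t n)
{
	size_t i, passed = 0;

	for (i = 0; i < n; i++)
		if (tests[i].run() == 0)
			passed++;
	return passed;
}

/* CI entry point: nominal table only. */
static size_t demo_run_ci(void)
{
	return demo_run_table(demo_ci_tests, ARRAY_LEN(demo_ci_tests));
}

/* Standalone entry point: both tables, so no test is lost. */
static size_t demo_run_all(void)
{
	return demo_run_ci() +
	       demo_run_table(demo_skip_ci_tests, ARRAY_LEN(demo_skip_ci_tests));
}
```

Keeping the skipped tests in their own table, instead of deleting them, lets the hardware-oriented script iterate over both tables while the CI path only sees the first.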

test_xsk.c isn't part of the test_progs framework. Integrate the tests defined by test_xsk.c into the test_progs framework through a new file: prog_tests/xsk.c. ZeroCopy mode isn't tested in it as veth peers don't support it. Move test_xsk{.c/.h} to prog_tests/. Add the find_bit library to test_progs sources in the Makefile as it is used by test_xsk.c. Reviewed-by: Maciej Fijalkowski <[email protected]> Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 4481a85 to 5125528 Compare October 29, 2025 19:39

adding ci files

1e2d874

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 5125528 to 1e2d874 Compare October 30, 2025 01:28

bastien-curutchet added 16 commits October 31, 2025 08:40

selftests/bpf: test_xsk: Initialize bitmap before use

04e9657


bastien-curutchet force-pushed the b4/xsk branch from 3430305 to 3c6da8a Compare October 31, 2025 07:40

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 7 times, most recently from de0745f to efe6edf Compare November 6, 2025 01:56
