Skip to content

Conversation

@bastien-curutchet
Copy link
Contributor

No description provided.

Hi all,

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
are defined in xksxceiver.c. Since this script is used to test real
hardware, the goal here is to leave it as it is, and only integrate the
tests that run on veth peers into the test_progs framework.

I've looked into what could improve the speed in the CI:
- some tests are skipped when run on veth peers in a VM (because they
  rely on huge page allocation or HW rings). This skipping logic still
  takes some time and can be easily avoided.
- the TEARDOWN test is quite long (several seconds on its own) because
  it runs the same test 10 times in a row to ensure the teardown process
  works properly

With theses tests fully skipped in the CI and the veth setup done only
once for each mode (DRV / SKB), the execution time is reduced to about 5
seconds on my setup.
```
$ tools/testing/selftests/bpf/vmtest.sh -d $HOME/ebpf/output-regular/ -- time ./test_progs -t xsk
[...]
real    0m 5.04s
user    0m 0.38s
sys     0m 1.61s
```

It still feels a bit long, but there are 24 tests run in both DRV and
SKB modes which means around 100ms for each one. I'm not sure I can make
it much faster without randomizing the tests so that not all of them run
in every CI execution.

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the
tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test
PATCH 8 to 13 handle all errors to release resources instead of calling
exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI
PATCH 15 integrates the CI tests to the test_progs framework

To: Björn Töpel <[email protected]>
To: Magnus Karlsson <[email protected]>
To: Maciej Fijalkowski <[email protected]>
To: Jonathan Lemon <[email protected]>
To: Alexei Starovoitov <[email protected]>
To: Daniel Borkmann <[email protected]>
To: Andrii Nakryiko <[email protected]>
To: Martin KaFai Lau <[email protected]>
To: Eduard Zingerman <[email protected]>
To: Song Liu <[email protected]>
To: Yonghong Song <[email protected]>
To: John Fastabend <[email protected]>
To: KP Singh <[email protected]>
To: Stanislav Fomichev <[email protected]>
To: Hao Luo <[email protected]>
To: Jiri Olsa <[email protected]>
To: Mykola Lysenko <[email protected]>
To: Shuah Khan <[email protected]>
To: David S. Miller <[email protected]>
To: Jakub Kicinski <[email protected]>
To: Jesper Dangaard Brouer <[email protected]>
Cc: Thomas Petazzoni <[email protected]>
Cc: Alexis Lothore <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>

---
Changes in v7:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v6: https://lore.kernel.org/r/[email protected]

Changes in v6:
- Setup veth peer once for each mode instead of once for each substest
- Rename the 'flaky' table 'skip-ci' table and move the automatically
  skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes
  testapp_stats_rx_dropped(), the second one fixes
  testapp_xdp_shared_umem(). The unecessary frees (in
  testapp_stats_rx_full() and testapp_stats_fill_empty() are removed)
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6 ("selftests/bpf:
Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests
  to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xkxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1,
  7 and 8)
- Link to v1: https://lore.kernel.org/r/[email protected]

--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 7,
    "change-id": "20250218-xsk-0cf90e975d14",
    "prefixes": [
      "bpf-next"
    ],
    "history": {
      "v1": [
        "[email protected]"
      ],
      "v2": [
        "[email protected]"
      ],
      "v3": [
        "[email protected]"
      ],
      "v4": [
        "[email protected]"
      ],
      "v5": [
        "[email protected]"
      ],
      "v6": [
        "[email protected]"
      ]
    }
  }
}
AF_XDP features are tested by the test_xsk.sh script but not by the
test_progs framework. The tests used by the script are defined in
xksxceiver.c which can't be integrated in the test_progs framework as is.

Extract these test definitions from xskxceiver{.c/.h} to put them in new
test_xsk{.c/.h} files.
Keep the main() function and its unshared dependencies in xksxceiver to
avoid impacting the test_xsk.sh script which is often used to test real
hardware.
Move ksft_test_result_*() calls to xskxceiver.c to keep the kselftest's
report valid

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
bitmap is used before being initialized.

Initialize it to zero before using it.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
__testapp_validate_traffic is supposed to return an integer value that
tells if the test passed (0), failed (-1) or was skiped (2). It actually
returns a boolean in the end. This doesn't harm when the test is
successful but can lead to misinterpretation in case of failure as 1
will be returned instead of -1.

Return TEST_FAILURE (-1) in case of failure, TEST_PASS (0) otherwise.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
testapp_stats_rx_dropped() generates pkt_stream twice. The last
generated is released by pkt_stream_restore_default() at the end of the
test but we lose the pointer of the first pkt_stream.

Release the 'middle' pkt_stream when it's getting replaced to prevent
memory leaks.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
testapp_xdp_shared_umem() generates pkt_stream on each xsk from xsk_arr,
where normally xsk_arr[0] gets pkt_streams and xsk_arr[1] have them NULLed.
At the end of the test pkt_stream_restore_default() only releases
xsk_arr[0] which leads to memory leaks.

Release the missing pkt_stream at the end of testapp_xdp_shared_umem()

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
The clean-up done at the end of a test in __testapp_validate_traffic()
isn't wrapped in a function. It isn't convenient if we want to use it
somewhere else in the code.

Wrap the clean-up in two new functions : the first deletes the sockets,
the second releases the umem.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
testapp_validate_traffic() doesn't release the sockets and the umem
created by the threads if the test isn't currently in its last step.
Thus, if the swap_xsk_resources() fails before the last step, the
created resources aren't cleaned up.

Clean the sockets and the umem in case of swap_xsk_resources() failure.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
init_iface() doesn't have any return value while it can fail. In case of
failure it calls exit_on_error() which exits the application
immediately. This prevents the following tests from being run and isn't
compliant with the CI

Add a return value to init_iface() so errors can be handled more
smoothly.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
xsk_reattach_xdp calls exit_with_error() on failures. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Add a return value to the functions handling XDP attachments to handle
errors more smoothly.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
exit_with_error() is called when gettimeofday() fails. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_on_error().

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
TX and RX workers can fail in many places. These failures trigger a call
to exit_with_error() which exits the program immediately. It prevents the
following tests from running and isn't compliant with the CI.

Add return value to functions that can fail.
Handle failures more smoothly through report_failure().

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
…ails

__testapp_validate_traffic() calls exit_with_error() on failures. This
exits the program immediately. It prevents the following tests from
running and isn't compliant with the CI.

Return TEST_FAILURE instead of calling exit_with_error().
Release the resource of the 1st thread if a failure happens between its
creation and the creation of the second thread.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
If any allocation in the pkt_stream_*() helpers fail, exit_with_error() is
called. This terminates the program immediately. It prevents the following
tests from running and isn't compliant with the CI.

Return NULL in case of allocation failure.
Return TEST_FAILURE when something goes wrong in the packet generation.
Clean up the resources if a failure happens between two steps of a test.

Move exit_with_error()'s definition into xskxceiver.c as it isn't used
anywhere else now.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
Following tests won't fit in the CI:
- XDP_ADJUST_TAIL_* and SEND_RECEIVE_9K_PACKETS because of their
  flakyness
- UNALIGNED_* because they depend on huge page allocations
- *_RING_SIZE because they depend on HW rings
- TEARDOWN because it's too long

Remove these tests from the nominal tests table so they won't be
run by the CI in upcoming patch.
Create a skip_ci_tests table to hold them.
Use this skip_ci table in xskxceiver.c to keep all the tests available
from the test_xsk.sh script.

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
test_xsk.c isn't part of the test_progs framework.

Integrate the tests defined by test_xsk.c into the test_progs framework
through a new file : prog_tests/xsk.c. ZeroCopy mode isn't tested in it
as veth peers don't support it.

Move test_xsk{.c/.h} to prog_tests/.

Add the find_bit library to test_progs sources in the Makefile as it is
is used by test_xsk.c

Reviewed-by: Maciej Fijalkowski <[email protected]>
Signed-off-by: Bastien Curutchet (eBPF Foundation) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant