[CI Failure]: LoRA TP Test (Distributed) - lora/test_llama_tp.py::test_tp2_serialize_and_deserialize_lora #20723

Description

@mgoin

Name of failing test

lora/test_llama_tp.py::test_tp2_serialize_and_deserialize_lora

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

https://buildkite.com/vllm/ci/builds/23536/steps/canvas?sid=0197f0f3-a191-49c0-aef5-89d61c597808

[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) WARNING 07-09 16:17:11 [tensorizer.py:226] Provided both tensorizer_dir and tensorizer_uri. Inferring tensorizer_dir from tensorizer_uri as the latter takes precedence.

[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487] Traceback (most recent call last):
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 461, in worker_main
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]     worker = WorkerProc(*args, **kwargs)
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 358, in __init__
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]     self.worker.load_model()
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 186, in load_model
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]     self.model_runner.load_model()
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1773, in load_model
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]     model_loader = get_model_loader(self.load_config)
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 33, in get_model_loader
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]     return TensorizerLoader(load_config)
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/tensorizer_loader.py", line 45, in __init__
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]     self.tensorizer_config = TensorizerConfig(
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]                              ^^^^^^^^^^^^^^^^^
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "<string>", line 16, in __init__
[2025-07-09T23:17:11Z] (VllmWorker rank=0 pid=11292) ERROR 07-09 16:17:11 [multiproc_executor.py:487]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/tensorizer.py", line 232, in __post_init__

📝 History of failing test

It seems this failure was introduced by #19619, which added the following check in tensorizer.py:

        if self.tensorizer_dir and self.lora_dir:
            raise ValueError(
                "Only one of tensorizer_dir or lora_dir may be specified. "
                "Use lora_dir exclusively when serializing LoRA adapters, "
                "and tensorizer_dir or tensorizer_uri otherwise.")

The failing test itself wasn't changed by that PR; it now appears to trip this new conditional check, since the traceback ends in TensorizerConfig.__post_init__ (tensorizer.py, line 232), where the check lives.
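
Per the error message, the intended usage after #19619 appears to be one field or the other, never both. Again a sketch, not the test's actual code:

    # Serializing/deserializing model weights: tensorizer_uri (or tensorizer_dir) only.
    TensorizerConfig(tensorizer_uri="/tmp/model-tensors/model.tensors")

    # Serializing/deserializing LoRA adapters: lora_dir exclusively.
    TensorizerConfig(lora_dir="/tmp/lora-adapters")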

CC List

@sangstar @Eta0 @aarnphm @jeejeelee please take a look

Metadata

Labels: ci-failure (Issue about an unexpected test failure in CI)
Status: Done