### Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here
### 🐛 Describe the bug
(Worker_TP1 pid=43610) INFO 10-23 09:29:27 [gpu_model_runner.py:2602] Starting to load model /media/kwaishop-langbridge-kfs/chenjunjie/live-product-matcher-reject/sft_train/ms-swift/qwen3_vl_2b_1023_sft_afternoon_prompt_v6_merge...
(Worker_TP1 pid=43610) INFO 10-23 09:29:27 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP1 pid=43610) INFO 10-23 09:29:28 [cuda.py:366] Using Flash Attention backend on V1 engine.
(Worker_TP0 pid=43609) INFO 10-23 09:29:28 [gpu_model_runner.py:2602] Starting to load model /media/kwaishop-langbridge-kfs/chenjunjie/live-product-matcher-reject/sft_train/ms-swift/qwen3_vl_2b_1023_sft_afternoon_prompt_v6_merge...
(Worker_TP0 pid=43609) INFO 10-23 09:29:28 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP0 pid=43609) INFO 10-23 09:29:29 [cuda.py:366] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
(Worker_TP1 pid=43610) INFO 10-23 09:29:30 [default_loader.py:267] Loading weights took 2.22 seconds
(Worker_TP1 pid=43610) INFO 10-23 09:29:31 [gpu_model_runner.py:2653] Model loading took 2.7054 GiB and 2.993759 seconds
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.08s/it]
(Worker_TP0 pid=43609)
(Worker_TP0 pid=43609) INFO 10-23 09:29:32 [default_loader.py:267] Loading weights took 2.27 seconds
(Worker_TP0 pid=43609) INFO 10-23 09:29:33 [gpu_model_runner.py:2653] Model loading took 2.7054 GiB and 3.420745 seconds
(Worker_TP0 pid=43609) INFO 10-23 09:29:33 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker_TP1 pid=43610) INFO 10-23 09:29:33 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] WorkerProc hit an exception.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3361, in profile_run
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model.get_multimodal_embeddings(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1378, in get_multimodal_embeddings
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] vision_embeddings = self._process_image_input(multimodal_input)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1301, in _process_image_input
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return run_dp_sharded_mrope_vision_model(self.visual,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/vision.py", line 338, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] image_embeds_local = vision_model(pixel_values_local,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 517, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] hidden_states = blk(hidden_states,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 200, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] x = x + self.attn(self.norm1(x),
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 369, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = flash_attn_varlen_func(q,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 233, in flash_attn_varlen_func
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3361, in profile_run
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model.get_multimodal_embeddings(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1378, in get_multimodal_embeddings
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] vision_embeddings = self._process_image_input(multimodal_input)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1301, in _process_image_input
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return run_dp_sharded_mrope_vision_model(self.visual,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/vision.py", line 338, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] image_embeds_local = vision_model(pixel_values_local,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 517, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] hidden_states = blk(hidden_states,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 200, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] x = x + self.attn(self.norm1(x),
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 369, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = flash_attn_varlen_func(q,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 233, in flash_attn_varlen_func
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] result = result.result()
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] return self.__get_result()
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] raise self._exception
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] raise RuntimeError(
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] RuntimeError: Worker failed with error 'CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ', please check the stack trace above for the root cause
[rank1]:[W1023 09:29:42.972180911 TCPStore.cpp:138] [c10d] recvValueWithTimeout failed on SocketImpl(fd=90, addr=[localhost]:54338, remote=[localhost]:45167): Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:682 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x80 (0x7f89ca37eeb0 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: + 0x5d694d1 (0x7f89ae3ef4d1 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #2: + 0x5d6ab2d (0x7f89ae3f0b2d in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0x5d6b1e9 (0x7f89ae3f11e9 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::doWait(c10::ArrayRef<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::chrono::duration<long, std::ratio<1l, 1000l> >) + 0x1c6 (0x7f89ae3ec5f6 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::TCPStore::doGet(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x33 (0x7f89ae3edfe3 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::TCPStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x114 (0x7f89ae3ef0f4 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #7: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #8: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #9: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #10: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #11: c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId, bool, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, int) + 0x5c4 (0x7f896d8d9bd4 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #12: c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::Device&, c10d::OpType, int, bool) + 0x1bba (0x7f896d8dc67a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #13: c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) + 0x141f (0x7f896d8ffb7f in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #14: + 0x5d080e8 (0x7f89ae38e0e8 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #15: + 0x5d0effe (0x7f89ae394ffe in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #16: + 0x5d2dd0b (0x7f89ae3b3d0b in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #17: + 0xc78589 (0x7f89bd7b6589 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #18: + 0x3767d2 (0x7f89bceb47d2 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #19: VLLM::Worker_TP1() [0x543944]
frame #20: _PyObject_MakeTpCall + 0x2fc (0x51778c in VLLM::Worker_TP1)
frame #21: _PyEval_EvalFrameDefault + 0x6d2 (0x521952 in VLLM::Worker_TP1)
frame #22: + 0x85294a (0x7f89bd39094a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #23: + 0xb584ab (0x7f89bd6964ab in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #24: + 0x5b4222d (0x7f89ae1c822d in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptrtorch::jit::Operator, std::allocator<std::shared_ptrtorch::jit::Operator > > const&, pybind11::args const&, pybind11::kwargs const&, std::optionalc10::DispatchKey) + 0x394 (0x7f89bd449a04 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #26: torch::jit::_get_operation_for_overload_or_packet(std::vector<std::shared_ptrtorch::jit::Operator, std::allocator<std::shared_ptrtorch::jit::Operator > > const&, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optionalc10::DispatchKey) + 0x1a9 (0x7f89bd449dc9 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #27: + 0x81567a (0x7f89bd35367a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #28: + 0x3767d2 (0x7f89bceb47d2 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #29: VLLM::Worker_TP1() [0x543944]
frame #30: _PyObject_Call + 0xb5 (0x555cd5 in VLLM::Worker_TP1)
frame #31: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #32: _PyObject_FastCallDictTstate + 0x285 (0x519fc5 in VLLM::Worker_TP1)
frame #33: _PyObject_Call_Prepend + 0x66 (0x552ff6 in VLLM::Worker_TP1)
frame #34: VLLM::Worker_TP1() [0x6282e6]
frame #35: _PyObject_MakeTpCall + 0x2fc (0x51778c in VLLM::Worker_TP1)
frame #36: _PyEval_EvalFrameDefault + 0x6d2 (0x521952 in VLLM::Worker_TP1)
frame #37: VLLM::Worker_TP1() [0x56d11d]
frame #38: VLLM::Worker_TP1() [0x56ccad]
frame #39: _PyObject_Call + 0x122 (0x555d42 in VLLM::Worker_TP1)
frame #40: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #41: VLLM::Worker_TP1() [0x56d11d]
frame #42: VLLM::Worker_TP1() [0x56cce0]
frame #43: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #44: PyEval_EvalCode + 0xae (0x5de5ce in VLLM::Worker_TP1)
frame #45: VLLM::Worker_TP1() [0x61b7b7]
frame #46: VLLM::Worker_TP1() [0x616307]
frame #47: PyRun_StringFlags + 0x5f (0x61232f in VLLM::Worker_TP1)
frame #48: PyRun_SimpleStringFlags + 0x3a (0x611eca in VLLM::Worker_TP1)
frame #49: Py_RunMain + 0x4e1 (0x60f801 in VLLM::Worker_TP1)
frame #50: Py_BytesMain + 0x39 (0x5c6bb9 in VLLM::Worker_TP1)
frame #51: + 0x29d90 (0x7f89cb159d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #52: __libc_start_main + 0x80 (0x7f89cb159e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #53: VLLM::Worker_TP1() [0x5c69e9]
[rank1]:[W1023 09:29:42.988847516 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=90, addr=[localhost]:54338, remote=[localhost]:45167): Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:682 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x80 (0x7f89ca37eeb0 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: + 0x5d694d1 (0x7f89ae3ef4d1 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #2: + 0x5d6a8cd (0x7f89ae3f08cd in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #3: + 0x5d6b47a (0x7f89ae3f147a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > const&) + 0x31e (0x7f89ae3ec19e in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x398 (0x7f896d8d1b18 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xdbbf4 (0x7f895125abf4 in /root/miniconda3/envs/qwen3-vl/bin/../lib/libstdc++.so.6)
frame #7: + 0x94ac3 (0x7f89cb1c4ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: + 0x126850 (0x7f89cb256850 in /lib/x86_64-linux-gnu/libc.so.6)
[rank1]:[W1023 09:29:42.993390455 ProcessGroupNCCL.cpp:1783] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:42 [multiproc_executor.py:154] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=43418) Process EngineCore_DP0:
(EngineCore_DP0 pid=43418) Traceback (most recent call last):
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=43418) self.run()
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=43418) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=43418) raise e
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=43418) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=43418) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=43418) self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=43418) self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=43418) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=43418) result = result.result()
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=43418) return self.__get_result()
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=43418) raise self._exception
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(EngineCore_DP0 pid=43418) result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=43418) raise RuntimeError(
(EngineCore_DP0 pid=43418) RuntimeError: Worker failed with error 'CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=43418) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=43418) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=43418) Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=43418) ', please check the stack trace above for the root cause
(APIServer pid=43208) Traceback (most recent call last):
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/bin/vllm", line 7, in <module>
(APIServer pid=43208) sys.exit(main())
(APIServer pid=43208) ^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=43208) args.dispatch_function(args)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=43208) uvloop.run(run_server(args))
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=43208) return __asyncio.run(
(APIServer pid=43208) ^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=43208) return runner.run(main)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=43208) return self._loop.run_until_complete(task)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=43208) return await main
(APIServer pid=43208) ^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=43208) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=43208) async with build_async_engine_client(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=43208) return await anext(self.gen)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=43208) async with build_async_engine_client_from_engine_args(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=43208) return await anext(self.gen)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=43208) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=43208) return fn(*args, **kwargs)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=43208) return cls(
(APIServer pid=43208) ^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=43208) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=43208) return AsyncMPClient(*client_args)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=43208) super().__init__(
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=43208) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=43208) next(self.gen)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=43208) wait_for_engine_startup(
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=43208) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=43208) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
How can this be solved?
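For reference, the failure above ("CUDA error: the provided PTX was compiled with an unsupported toolchain") usually means the installed NVIDIA driver is older than the CUDA toolchain that the prebuilt flash-attention kernels loaded by vLLM (`torch.ops._vllm_fa2_C.varlen_fwd` in the traceback) were compiled with. Below is a minimal diagnostic sketch, not part of the original report, that only prints the versions that need to be compared; it uses standard PyTorch attributes and `nvidia-smi`:

```python
# Minimal diagnostic sketch (added for illustration, not from the original report).
# "PTX compiled with an unsupported toolchain" usually means the NVIDIA driver
# is older than the CUDA toolkit the kernels were built against, so the first
# step is to compare the driver version with the PyTorch CUDA build.
import subprocess

import torch

print("torch version:        ", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)        # toolkit of the installed wheel
print("CUDA available:       ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:               ", torch.cuda.get_device_name(0))

# Driver version reported by nvidia-smi; it must be new enough for the toolkit
# above, otherwise kernels shipped as PTX for a newer toolchain fail to load.
result = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
)
print("NVIDIA driver version:", result.stdout.strip())
```

If the driver turns out to be too old for the reported CUDA build, upgrading the driver (or installing a vLLM/PyTorch build that matches the driver's supported CUDA version) is the usual fix.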
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.