[Bug]: qwen3-vl-2b launch errors after ms-swift fine-tuning #27405

@Michel-debug

Description

Your current environment

The output of python collect_env.py
Your output of `python collect_env.py` here

🐛 Describe the bug

(Worker_TP1 pid=43610) INFO 10-23 09:29:27 [gpu_model_runner.py:2602] Starting to load model /media/kwaishop-langbridge-kfs/chenjunjie/live-product-matcher-reject/sft_train/ms-swift/qwen3_vl_2b_1023_sft_afternoon_prompt_v6_merge...
(Worker_TP1 pid=43610) INFO 10-23 09:29:27 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP1 pid=43610) INFO 10-23 09:29:28 [cuda.py:366] Using Flash Attention backend on V1 engine.
(Worker_TP0 pid=43609) INFO 10-23 09:29:28 [gpu_model_runner.py:2602] Starting to load model /media/kwaishop-langbridge-kfs/chenjunjie/live-product-matcher-reject/sft_train/ms-swift/qwen3_vl_2b_1023_sft_afternoon_prompt_v6_merge...
(Worker_TP0 pid=43609) INFO 10-23 09:29:28 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP0 pid=43609) INFO 10-23 09:29:29 [cuda.py:366] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
(Worker_TP1 pid=43610) INFO 10-23 09:29:30 [default_loader.py:267] Loading weights took 2.22 seconds
(Worker_TP1 pid=43610) INFO 10-23 09:29:31 [gpu_model_runner.py:2653] Model loading took 2.7054 GiB and 2.993759 seconds
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.08s/it]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:02<00:00, 2.08s/it]
(Worker_TP0 pid=43609)
(Worker_TP0 pid=43609) INFO 10-23 09:29:32 [default_loader.py:267] Loading weights took 2.27 seconds
(Worker_TP0 pid=43609) INFO 10-23 09:29:33 [gpu_model_runner.py:2653] Model loading took 2.7054 GiB and 3.420745 seconds
(Worker_TP0 pid=43609) INFO 10-23 09:29:33 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker_TP1 pid=43610) INFO 10-23 09:29:33 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] WorkerProc hit an exception.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3361, in profile_run
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model.get_multimodal_embeddings(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1378, in get_multimodal_embeddings
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] vision_embeddings = self._process_image_input(multimodal_input)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1301, in _process_image_input
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return run_dp_sharded_mrope_vision_model(self.visual,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/vision.py", line 338, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] image_embeds_local = vision_model(pixel_values_local,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 517, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] hidden_states = blk(hidden_states,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 200, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] x = x + self.attn(self.norm1(x),
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 369, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = flash_attn_varlen_func(q,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 233, in flash_attn_varlen_func
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return func(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model_runner.profile_run()
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3361, in profile_run
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] self.model.get_multimodal_embeddings(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1378, in get_multimodal_embeddings
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] vision_embeddings = self._process_image_input(multimodal_input)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 1301, in _process_image_input
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return run_dp_sharded_mrope_vision_model(self.visual,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/vision.py", line 338, in run_dp_sharded_mrope_vision_model
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] image_embeds_local = vision_model(pixel_values_local,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 517, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] hidden_states = blk(hidden_states,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 200, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] x = x + self.attn(self.norm1(x),
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._call_impl(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return forward_call(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 369, in forward
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] output = flash_attn_varlen_func(q,
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 233, in flash_attn_varlen_func
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] return self._op(*args, **kwargs)
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] torch.AcceleratorError: CUDA error: the provided PTX was compiled with an unsupported toolchain.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(Worker_TP0 pid=43609) ERROR 10-23 09:29:40 [multiproc_executor.py:671]
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] result = result.result()
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] return self.__get_result()
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] raise self._exception
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] raise RuntimeError(
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] RuntimeError: Worker failed with error 'CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:40 [core.py:708] ', please check the stack trace above for the root cause
[rank1]:[W1023 09:29:42.972180911 TCPStore.cpp:138] [c10d] recvValueWithTimeout failed on SocketImpl(fd=90, addr=[localhost]:54338, remote=[localhost]:45167): Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:682 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7f89ca37eeb0 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5d694d1 (0x7f89ae3ef4d1 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x5d6ab2d (0x7f89ae3f0b2d in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x5d6b1e9 (0x7f89ae3f11e9 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::doWait(c10::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::chrono::duration<long, std::ratio<1l, 1000l> >) + 0x1c6 (0x7f89ae3ec5f6 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::TCPStore::doGet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x33 (0x7f89ae3edfe3 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #6: c10d::TCPStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x114 (0x7f89ae3ef0f4 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #7: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #8: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #9: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #10: c10d::PrefixStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x30 (0x7f89ae39d700 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #11: c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) + 0x5c4 (0x7f896d8d9bd4 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #12: c10d::ProcessGroupNCCL::initNCCLComm(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::Device&, c10d::OpType, int, bool) + 0x1bba (0x7f896d8dc67a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #13: c10d::ProcessGroupNCCL::_allgather_base(at::Tensor&, at::Tensor&, c10d::AllgatherOptions const&) + 0x141f (0x7f896d8ffb7f in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #14: <unknown function> + 0x5d080e8 (0x7f89ae38e0e8 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #15: <unknown function> + 0x5d0effe (0x7f89ae394ffe in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0x5d2dd0b (0x7f89ae3b3d0b in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #17: <unknown function> + 0xc78589 (0x7f89bd7b6589 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0x3767d2 (0x7f89bceb47d2 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #19: VLLM::Worker_TP1() [0x543944]
frame #20: _PyObject_MakeTpCall + 0x2fc (0x51778c in VLLM::Worker_TP1)
frame #21: _PyEval_EvalFrameDefault + 0x6d2 (0x521952 in VLLM::Worker_TP1)
frame #22: <unknown function> + 0x85294a (0x7f89bd39094a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #23: <unknown function> + 0xb584ab (0x7f89bd6964ab in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #24: <unknown function> + 0x5b4222d (0x7f89ae1c822d in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::jit::invokeOperatorFromPython(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) + 0x394 (0x7f89bd449a04 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #26: torch::jit::_get_operation_for_overload_or_packet(std::vector<std::shared_ptr<torch::jit::Operator>, std::allocator<std::shared_ptr<torch::jit::Operator> > > const&, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) + 0x1a9 (0x7f89bd449dc9 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #27: <unknown function> + 0x81567a (0x7f89bd35367a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #28: <unknown function> + 0x3767d2 (0x7f89bceb47d2 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #29: VLLM::Worker_TP1() [0x543944]
frame #30: _PyObject_Call + 0xb5 (0x555cd5 in VLLM::Worker_TP1)
frame #31: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #32: _PyObject_FastCallDictTstate + 0x285 (0x519fc5 in VLLM::Worker_TP1)
frame #33: _PyObject_Call_Prepend + 0x66 (0x552ff6 in VLLM::Worker_TP1)
frame #34: VLLM::Worker_TP1() [0x6282e6]
frame #35: _PyObject_MakeTpCall + 0x2fc (0x51778c in VLLM::Worker_TP1)
frame #36: _PyEval_EvalFrameDefault + 0x6d2 (0x521952 in VLLM::Worker_TP1)
frame #37: VLLM::Worker_TP1() [0x56d11d]
frame #38: VLLM::Worker_TP1() [0x56ccad]
frame #39: _PyObject_Call + 0x122 (0x555d42 in VLLM::Worker_TP1)
frame #40: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #41: VLLM::Worker_TP1() [0x56d11d]
frame #42: VLLM::Worker_TP1() [0x56cce0]
frame #43: _PyEval_EvalFrameDefault + 0x53fe (0x52667e in VLLM::Worker_TP1)
frame #44: PyEval_EvalCode + 0xae (0x5de5ce in VLLM::Worker_TP1)
frame #45: VLLM::Worker_TP1() [0x61b7b7]
frame #46: VLLM::Worker_TP1() [0x616307]
frame #47: PyRun_StringFlags + 0x5f (0x61232f in VLLM::Worker_TP1)
frame #48: PyRun_SimpleStringFlags + 0x3a (0x611eca in VLLM::Worker_TP1)
frame #49: Py_RunMain + 0x4e1 (0x60f801 in VLLM::Worker_TP1)
frame #50: Py_BytesMain + 0x39 (0x5c6bb9 in VLLM::Worker_TP1)
frame #51: <unknown function> + 0x29d90 (0x7f89cb159d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #52: __libc_start_main + 0x80 (0x7f89cb159e40 in /lib/x86_64-linux-gnu/libc.so.6)
frame #53: VLLM::Worker_TP1() [0x5c69e9]

[rank1]:[W1023 09:29:42.988847516 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=90, addr=[localhost]:54338, remote=[localhost]:45167): Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:682 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x7f89ca37eeb0 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5d694d1 (0x7f89ae3ef4d1 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x5d6a8cd (0x7f89ae3f08cd in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x5d6b47a (0x7f89ae3f147a in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x31e (0x7f89ae3ec19e in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x398 (0x7f896d8d1b18 in /root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xdbbf4 (0x7f895125abf4 in /root/miniconda3/envs/qwen3-vl/bin/../lib/libstdc++.so.6)
frame #7: <unknown function> + 0x94ac3 (0x7f89cb1c4ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x126850 (0x7f89cb256850 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W1023 09:29:42.993390455 ProcessGroupNCCL.cpp:1783] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
(EngineCore_DP0 pid=43418) ERROR 10-23 09:29:42 [multiproc_executor.py:154] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=43418) Process EngineCore_DP0:
(EngineCore_DP0 pid=43418) Traceback (most recent call last):
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=43418) self.run()
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=43418) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=43418) raise e
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=43418) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in init
(EngineCore_DP0 pid=43418) super().init(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 92, in init
(EngineCore_DP0 pid=43418) self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=43418) self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=43418) return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 262, in collective_rpc
(EngineCore_DP0 pid=43418) result = result.result()
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 456, in result
(EngineCore_DP0 pid=43418) return self.__get_result()
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=43418) raise self._exception
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(EngineCore_DP0 pid=43418) result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=43418) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=43418) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=43418) raise RuntimeError(
(EngineCore_DP0 pid=43418) RuntimeError: Worker failed with error 'CUDA error: the provided PTX was compiled with an unsupported toolchain.
(EngineCore_DP0 pid=43418) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=43418) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=43418) Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=43418) ', please check the stack trace above for the root cause
(APIServer pid=43208) Traceback (most recent call last):
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/bin/vllm", line 7, in
(APIServer pid=43208) sys.exit(main())
(APIServer pid=43208) ^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=43208) args.dispatch_function(args)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=43208) uvloop.run(run_server(args))
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/uvloop/init.py", line 109, in run
(APIServer pid=43208) return __asyncio.run(
(APIServer pid=43208) ^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=43208) return runner.run(main)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=43208) return self._loop.run_until_complete(task)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/uvloop/init.py", line 61, in wrapper
(APIServer pid=43208) return await main
(APIServer pid=43208) ^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=43208) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=43208) async with build_async_engine_client(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=43208) return await anext(self.gen)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=43208) async with build_async_engine_client_from_engine_args(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=43208) return await anext(self.gen)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=43208) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/utils/init.py", line 1572, in inner
(APIServer pid=43208) return fn(*args, **kwargs)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=43208) return cls(
(APIServer pid=43208) ^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 134, in init
(APIServer pid=43208) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=43208) return AsyncMPClient(*client_args)
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 769, in init
(APIServer pid=43208) super().init(
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 448, in init
(APIServer pid=43208) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=43208) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/contextlib.py", line 144, in exit
(APIServer pid=43208) next(self.gen)
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=43208) wait_for_engine_startup(
(APIServer pid=43208) File "/root/miniconda3/envs/qwen3-vl/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=43208) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=43208) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

How can this be resolved?
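The root cause in the trace above is `CUDA error: the provided PTX was compiled with an unsupported toolchain`. This usually means the installed NVIDIA driver is older than the CUDA toolkit that the PyTorch/vLLM wheels were built with, so the driver's JIT cannot load the shipped PTX. A quick check is to compare the CUDA version reported by `nvidia-smi` against `torch.version.cuda`. The sketch below (illustrative only; version strings are made up) shows the comparison logic:

```python
def cuda_version_tuple(v: str) -> tuple[int, ...]:
    """Parse a CUDA version string like '12.2' into a comparable tuple."""
    return tuple(int(x) for x in v.split("."))


def driver_supports_toolkit(driver_cuda: str, toolkit_cuda: str) -> bool:
    """The driver must support at least the toolkit version the wheel was
    compiled with; otherwise the PTX JIT rejects the kernels."""
    return cuda_version_tuple(driver_cuda) >= cuda_version_tuple(toolkit_cuda)


# Hypothetical example: driver reports CUDA 12.2 (from nvidia-smi),
# but the torch wheel was built with CUDA 12.8 (torch.version.cuda).
print(driver_supports_toolkit("12.2", "12.8"))   # mismatch -> this error
print(driver_supports_toolkit("12.8", "12.4"))   # newer driver is fine
```

If the driver's CUDA version is lower than `torch.version.cuda`, either upgrade the NVIDIA driver or install a torch/vLLM build compiled against a CUDA version your driver supports.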

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
